
Useful scripts for data analysis.


There is a single entry point for all the commands produced so far. Refer to this documentation or run --help on the command line.

usage: phenix [-h] [--debug] [--version]
              {run_snp_pipeline,filter_vcf,prepare_reference,vcf2fasta} ...
--debug=False More verbose logging (default: turned off).
--version show program’s version number and exit

Run SNP pipeline.

Run the SNP pipeline with the specified mapper, variant caller and filters.
Available mappers: [‘bwa’, ‘bowtie2’]
Available variant callers: [‘mpileup’, ‘gatk’]
Available filters: [‘gq_score’, ‘dp4_ratio’, ‘ad_ratio’, ‘min_depth’, ‘mq_score’, ‘mq0_ratio’, ‘uncall_gt’, ‘qual_score’, ‘mq0f_ratio’]
Available annotators: [‘coverage’]

usage: phenix run_snp_pipeline [-h] [--workflow WORKFLOW] [--input INPUT]
                               [-r1 R1] [-r2 R2] [--reference REFERENCE]
                               [--sample-name SAMPLE_NAME] [--outdir OUTDIR]
                               [--config CONFIG] [--mapper MAPPER]
                               [--mapper-options MAPPER_OPTIONS] [--bam BAM]
                               [--variant VARIANT]
                               [--variant-options VARIANT_OPTIONS] [--vcf VCF]
                               [--filters FILTERS]
                               [--annotators ANNOTATORS [ANNOTATORS ...]]
--workflow, -w Undocumented
--input, -i Undocumented
-r1 R1/Forward read in Fastq format.
-r2 R2/Reverse read in Fastq format.
--reference, -r
 Reference to use for mapping.
--sample-name
 Name of the sample for mapper to include as read groups.
--outdir, -o Undocumented
--config, -c Undocumented
--mapper=bwa, -m=bwa
 Available mappers: [‘bwa’, ‘bowtie2’]
--mapper-options
 Custom mapper options (advanced)
--bam Undocumented
--variant=gatk, -v=gatk
 Available variant callers: [‘mpileup’, ‘gatk’]
--variant-options
 Custom variant caller options (advanced)
--vcf Undocumented
--filters Filters to be applied to the VCF as key:value pairs, separated by commas (,). Available filters: [‘gq_score’, ‘dp4_ratio’, ‘ad_ratio’, ‘min_depth’, ‘mq_score’, ‘mq0_ratio’, ‘uncall_gt’, ‘qual_score’, ‘mq0f_ratio’]
--annotators List of annotators to run before filters. Available: [‘coverage’]
 Keep intermediate files like BAMs and VCFs (default: False).
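
For scripted use, the invocation described above can be assembled programmatically. The following is a minimal sketch, assuming phenix is installed and available on the PATH. The helper name build_snp_pipeline_cmd, the file paths and the filter thresholds are hypothetical and chosen only for illustration; the flags themselves are taken from the usage text.

```python
import shlex


def build_snp_pipeline_cmd(r1, r2, reference, sample_name, outdir,
                           mapper="bwa", variant="gatk", filters=None):
    """Build an argv list for `phenix run_snp_pipeline` (hypothetical helper)."""
    cmd = [
        "phenix", "run_snp_pipeline",
        "-r1", r1,
        "-r2", r2,
        "--reference", reference,
        "--sample-name", sample_name,
        "--outdir", outdir,
        "--mapper", mapper,
        "--variant", variant,
    ]
    if filters:
        # --filters expects key:value pairs separated by commas.
        pairs = ",".join(f"{key}:{value}" for key, value in filters.items())
        cmd += ["--filters", pairs]
    return cmd


cmd = build_snp_pipeline_cmd(
    "sample_R1.fastq", "sample_R2.fastq", "reference.fa",
    "sample", "results",
    filters={"min_depth": 5, "mq_score": 30},  # illustrative thresholds only
)
# Print the command as a single shell-quoted line for inspection.
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to execute the pipeline.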

Filter a VCF.

Filter the VCF using provided filters.

usage: phenix filter_vcf [-h] --vcf VCF [--filters FILTERS | --config CONFIG]
                         --output OUTPUT [--reference REFERENCE] [--only-good]
--vcf, -v VCF file to (re)filter.
--filters, -f Filter(s) to apply as key:threshold pairs, separated by comma.
--config, -c Config with filters in YAML format. E.g. filters: - key: value
--output, -o Location for filtered VCF to be written.
--reference, -r
 mpileup versions <= 1.3 do not output all positions. This option is required to fix the reference base in the VCF.
--only-good
 Write only variants that PASS all filters (default: all variants are written).
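
The key:threshold format accepted by --filters can be illustrated with a small parser. This is a sketch of the string format only, not phenix's own parsing code; the function name parse_filters is hypothetical, and it assumes every threshold is numeric, which may not hold for every filter.

```python
def parse_filters(spec):
    """Split a 'key:threshold,key:threshold' string into a dict of floats."""
    filters = {}
    for pair in spec.split(","):
        key, _, threshold = pair.partition(":")
        if not key or not threshold:
            raise ValueError(f"expected key:threshold, got {pair!r}")
        # Assumption: thresholds are numeric.
        filters[key.strip()] = float(threshold)
    return filters


parsed = parse_filters("min_depth:5,mq_score:30,ad_ratio:0.9")
print(parsed)
```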

Create aux files for reference.

Prepare reference for SNP pipeline by generating required aux files.

usage: phenix prepare_reference [-h] --reference REFERENCE [--mapper MAPPER]
                                [--variant VARIANT]
--reference, -r
 Path to reference file to prepare.
--mapper Available mappers: [‘bwa’, ‘bowtie2’]
--variant Available variant callers: [‘mpileup’, ‘gatk’]

Convert VCFs to FASTA.

Combine multiple VCFs into a single FASTA file.

usage: phenix vcf2fasta [-h]
                        (--directory DIRECTORY | --input INPUT [INPUT ...])
                        [--regexp REGEXP] --out OUT
                        [--with-mixtures WITH_MIXTURES]
                        [--column-Ns COLUMN_NS] [--column-gaps COLUMN_GAPS]
                        [--sample-Ns SAMPLE_NS] [--sample-gaps SAMPLE_GAPS]
                        [--reference REFERENCE]
                        [--include INCLUDE | --exclude EXCLUDE]
                        [--with-stats WITH_STATS] [--tmp TMP]
--directory, -d
 Path to the directory with .vcf files.
--input, -i List of VCF files to process.
--regexp Regular expression for finding VCFs in a directory.
--out, -o Path to the output FASTA file.
--with-mixtures
 Specify this option with a threshold to output mixtures above this threshold.
--column-Ns Keeps columns with fraction of Ns below specified threshold.
--column-gaps Keeps columns with fraction of gaps below specified threshold.
--sample-Ns Keeps samples with fraction of Ns below specified threshold.
--sample-gaps Keeps samples with fraction of gaps below specified threshold.
--reference If path to reference specified (FASTA), then whole genome will be written.
--include Only include positions specified in the BED file in the FASTA.
--exclude Exclude any positions specified in the BED file.
--with-stats If a path is specified, then the positions of the output SNPs are stored in this file. Requires numpy and matplotlib.
--tmp Location for writing temp files (default: /tmp).
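
To make the column thresholds concrete, the sketch below shows what keeping columns with a fraction of Ns below a threshold could look like on a toy alignment. It is an illustration of the idea only, not phenix's implementation; the function name keep_columns and the sample data are hypothetical, and the strict "below" comparison is an assumption.

```python
def keep_columns(sequences, max_n_fraction):
    """Keep only alignment columns whose fraction of 'N' is below the threshold."""
    n_samples = len(sequences)
    length = len(next(iter(sequences.values())))
    kept = [
        i for i in range(length)
        # Assumption: "below" means strictly less than the threshold.
        if sum(seq[i] == "N" for seq in sequences.values()) / n_samples < max_n_fraction
    ]
    return {name: "".join(seq[i] for i in kept) for name, seq in sequences.items()}


# Toy alignment: the third column is all Ns and is dropped at a 0.5 threshold.
alignment = {
    "s1": "ACNT",
    "s2": "ANNT",
    "s3": "ACNG",
}
filtered = keep_columns(alignment, 0.5)
print(filtered)
```

The same pattern applies to gaps by counting "-" instead of "N", and to the per-sample options by computing the fraction along each sequence rather than each column.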