Scripts

Useful scripts for data analysis.

phenix

A single entry point, phenix, gives access to all commands produced so far. Refer to this documentation or use --help on the command line.

usage: phenix [-h] [--debug] [--version]
              {run_snp_pipeline,filter_vcf,prepare_reference,vcf2fasta} ...
Options:
--debug=False More verbose logging (default: turned off).
--version show program’s version number and exit
Sub-commands:
run_snp_pipeline

Run SNP pipeline.

Run the SNP pipeline with the specified mapper, variant caller and filters.
Available mappers: [‘bwa’, ‘bowtie2’]
Available variant callers: [‘mpileup’, ‘gatk’]
Available filters: [‘gq_score’, ‘dp4_ratio’, ‘ad_ratio’, ‘min_depth’, ‘mq_score’, ‘mq0_ratio’, ‘uncall_gt’, ‘qual_score’, ‘mq0f_ratio’]
Available annotators: [‘coverage’]

usage: phenix run_snp_pipeline [-h] [--workflow WORKFLOW] [--input INPUT]
                               [-r1 R1] [-r2 R2] [--reference REFERENCE]
                               [--sample-name SAMPLE_NAME] [--outdir OUTDIR]
                               [--config CONFIG] [--mapper MAPPER]
                               [--mapper-options MAPPER_OPTIONS] [--bam BAM]
                               [--variant VARIANT]
                               [--variant-options VARIANT_OPTIONS] [--vcf VCF]
                               [--filters FILTERS]
                               [--annotators ANNOTATORS [ANNOTATORS ...]]
                               [--keep-temp]
Options:
--workflow, -w Undocumented
--input, -i Undocumented
-r1 R1/Forward read in Fastq format.
-r2 R2/Reverse read in Fastq format.
--reference, -r
 Reference to use for mapping.
--sample-name=test_sample
 Name of the sample for mapper to include as read groups.
--outdir, -o Undocumented
--config, -c Undocumented
--mapper=bwa, -m=bwa
 Available mappers: [‘bwa’, ‘bowtie2’]
--mapper-options
 Custom mapper options (advanced)
--bam Undocumented
--variant=gatk, -v=gatk
 Available variant callers: [‘mpileup’, ‘gatk’]
--variant-options
 Custom variant caller options (advanced)
--vcf Undocumented
--filters Filters to be applied to the VCF as key:value pairs, separated by commas (,). Available filters: [‘gq_score’, ‘dp4_ratio’, ‘ad_ratio’, ‘min_depth’, ‘mq_score’, ‘mq0_ratio’, ‘uncall_gt’, ‘qual_score’, ‘mq0f_ratio’]
--annotators List of annotators to run before filters. Available: [‘coverage’]
--keep-temp=False
 Keep intermediate files like BAMs and VCFs (default: False).
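
As a minimal sketch, the options above can be assembled into a command line from Python. The file names, sample name and filter thresholds are placeholders (assumptions, not values from this documentation), and build_snp_pipeline_cmd is a hypothetical helper. Only flags listed above are used, and the command is printed rather than executed, so the sketch runs without phenix installed.

```python
import shlex


def build_snp_pipeline_cmd(r1, r2, reference, outdir, sample_name,
                           mapper="bwa", variant="gatk", filters=None):
    """Assemble an argument list for `phenix run_snp_pipeline`.

    Hypothetical helper: it only arranges flags documented above.
    """
    cmd = [
        "phenix", "run_snp_pipeline",
        "-r1", r1,
        "-r2", r2,
        "--reference", reference,
        "--sample-name", sample_name,
        "--outdir", outdir,
        "--mapper", mapper,
        "--variant", variant,
    ]
    if filters:
        # --filters takes key:value pairs separated by commas.
        pairs = ",".join(f"{key}:{value}" for key, value in filters.items())
        cmd += ["--filters", pairs]
    return cmd


# Placeholder inputs (hypothetical paths and thresholds).
cmd = build_snp_pipeline_cmd(
    r1="sample_R1.fastq",
    r2="sample_R2.fastq",
    reference="reference.fa",
    outdir="results",
    sample_name="sample1",
    filters={"min_depth": 5, "mq_score": 30},
)

# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it, provided phenix and the chosen mapper and variant caller are installed.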
filter_vcf

Filter a VCF.

Filter the VCF using provided filters.

usage: phenix filter_vcf [-h] --vcf VCF [--filters FILTERS | --config CONFIG]
                         --output OUTPUT [--reference REFERENCE] [--only-good]
Options:
--vcf, -v VCF file to (re)filter.
--filters, -f Filter(s) to apply as key:threshold pairs, separated by commas.
--config, -c Config with filters in YAML format. E.g. filters:-key:value
--output, -o Location for filtered VCF to be written.
--reference, -r
 mpileup versions <= 1.3 do not output all positions. The reference is required to fix the reference base in the VCF.
--only-good=False
 Write only variants that PASS all filters (default: all variants are written).
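
The key:threshold format accepted by --filters can be illustrated with a short parser. This is a sketch of the string format only, not phenix's own parsing code. The function name parse_filters, the example thresholds and the choice to read every threshold as a float are assumptions.

```python
def parse_filters(spec):
    """Split 'key:threshold,key:threshold' into a dict of float thresholds."""
    filters = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            # Tolerate a trailing comma or empty segment.
            continue
        key, sep, value = item.partition(":")
        if not sep or not key or not value:
            raise ValueError(f"expected key:threshold, got {item!r}")
        filters[key.strip()] = float(value)
    return filters


# Filter names come from the list of available filters; values are examples.
print(parse_filters("min_depth:5,ad_ratio:0.9"))
```

A string in this form is what would be passed on the command line, for example --filters min_depth:5,ad_ratio:0.9.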
prepare_reference

Create aux files for reference.

Prepare reference for SNP pipeline by generating required aux files.

usage: phenix prepare_reference [-h] --reference REFERENCE [--mapper MAPPER]
                                [--variant VARIANT]
Options:
--reference, -r
 Path to reference file to prepare.
--mapper Available mappers: [‘bwa’, ‘bowtie2’]
--variant Available variant callers: [‘mpileup’, ‘gatk’]
vcf2fasta

Convert VCFs to FASTA.

Combine multiple VCFs into a single FASTA file.

usage: phenix vcf2fasta [-h]
                        (--directory DIRECTORY | --input INPUT [INPUT ...])
                        [--regexp REGEXP] --out OUT
                        [--with-mixtures WITH_MIXTURES]
                        [--column-Ns COLUMN_NS] [--column-gaps COLUMN_GAPS]
                        [--sample-Ns SAMPLE_NS] [--sample-gaps SAMPLE_GAPS]
                        [--reference REFERENCE]
                        [--include INCLUDE | --exclude EXCLUDE]
                        [--with-stats WITH_STATS] [--tmp TMP]
Options:
--directory, -d
 Path to the directory with .vcf files.
--input, -i List of VCF files to process.
--regexp Regular expression for finding VCFs in a directory.
--out, -o Path to the output FASTA file.
--with-mixtures
 Specify this option with a threshold to output mixtures above this threshold.
--column-Ns Keeps columns with fraction of Ns below specified threshold.
--column-gaps Keeps columns with fraction of gaps below specified threshold.
--sample-Ns Keeps samples with fraction of Ns below specified threshold.
--sample-gaps Keeps samples with fraction of gaps below specified threshold.
--reference If a path to a reference (FASTA) is specified, the whole genome will be written.
--include Only include positions specified in the BED file in the FASTA.
--exclude Exclude any positions specified in the BED file.
--with-stats If a path is specified, the positions of the output SNPs are stored in this file. Requires numpy and matplotlib.
--tmp Location for writing temp files (default: /tmp).
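
To make the threshold options concrete, the sketch below shows how a --column-Ns style cut-off could select alignment columns. It illustrates the idea only and is not phenix's implementation. The function name keep_columns, the toy alignment and the strict "below" comparison are assumptions.

```python
def keep_columns(sequences, max_n_fraction):
    """Return indices of columns whose fraction of 'N' is below the threshold."""
    n_samples = len(sequences)
    kept = []
    # zip(*sequences) walks the alignment column by column.
    for index, column in enumerate(zip(*sequences)):
        n_fraction = sum(base == "N" for base in column) / n_samples
        if n_fraction < max_n_fraction:
            kept.append(index)
    return kept


# Toy alignment of three samples, four positions each (made-up data).
alignment = ["ACGN", "ANGN", "ACGT"]
print(keep_columns(alignment, 0.5))
```

With a threshold of 0.5, the last column is dropped because two of its three bases are N, while the second column, with one N in three, is kept. The same per-fraction logic would apply row-wise for the --sample-Ns and --sample-gaps options.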