Scripts
Useful scripts for data analysis.
phenix
There is a single entry point to access all commands we have produced so far. Please refer to this documentation or to --help on the command line.
usage: phenix [-h] [--debug] [--version]
{run_snp_pipeline,filter_vcf,prepare_reference,vcf2fasta} ...
- Options:
--debug=False More verbose logging (default: turned off).
--version Show program's version number and exit.
- Sub-commands:
- run_snp_pipeline
Run SNP pipeline.
Run the SNP pipeline with the specified mapper, variant caller and filters.
Available mappers: ['bwa', 'bowtie2']
Available variant callers: ['mpileup', 'gatk']
Available filters: ['gq_score', 'dp4_ratio', 'ad_ratio', 'min_depth', 'mq_score', 'mq0_ratio', 'uncall_gt', 'qual_score', 'mq0f_ratio']
Available annotators: ['coverage']
usage: phenix run_snp_pipeline [-h] [--workflow WORKFLOW] [--input INPUT] [-r1 R1] [-r2 R2] [--reference REFERENCE] [--sample-name SAMPLE_NAME] [--outdir OUTDIR] [--config CONFIG] [--mapper MAPPER] [--mapper-options MAPPER_OPTIONS] [--bam BAM] [--variant VARIANT] [--variant-options VARIANT_OPTIONS] [--vcf VCF] [--filters FILTERS] [--annotators ANNOTATORS [ANNOTATORS ...]] [--keep-temp]
- Options:
--workflow, -w Undocumented
--input, -i Undocumented
-r1 R1/Forward read in FASTQ format.
-r2 R2/Reverse read in FASTQ format.
--reference, -r Reference to use for mapping.
--sample-name=test_sample Name of the sample for the mapper to include in read groups.
--outdir, -o Undocumented
--config, -c Undocumented
--mapper=bwa, -m=bwa Available mappers: ['bwa', 'bowtie2']
--mapper-options Custom mapper options (advanced).
--bam Undocumented
--variant=gatk, -v=gatk Available variant callers: ['mpileup', 'gatk']
--variant-options Custom variant caller options (advanced).
--vcf Undocumented
--filters Filters to be applied to the VCF as key:value pairs, separated by commas (,). Available filters: ['gq_score', 'dp4_ratio', 'ad_ratio', 'min_depth', 'mq_score', 'mq0_ratio', 'uncall_gt', 'qual_score', 'mq0f_ratio']
--annotators List of annotators to run before filters. Available: ['coverage']
--keep-temp=False Keep intermediate files like BAMs and VCFs (default: False).
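As an illustration of the --filters format described above, the following minimal sketch assembles a run_snp_pipeline command line without executing it. The file names, the sample name and the threshold values are hypothetical placeholders rather than recommended settings, and format_filters is a helper invented for this example, not part of phenix.

```python
import shlex


def format_filters(filters):
    """Join filter thresholds into the 'key:value,key:value' form."""
    return ",".join(f"{key}:{value}" for key, value in filters.items())


# Threshold values are illustrative assumptions, not recommended settings.
filters = {"min_depth": 5, "mq_score": 30}

# All file and sample names below are hypothetical placeholders.
argv = [
    "phenix", "run_snp_pipeline",
    "-r1", "sample_R1.fastq",
    "-r2", "sample_R2.fastq",
    "--reference", "reference.fasta",
    "--sample-name", "sample1",
    "--outdir", "results",
    "--mapper", "bwa",
    "--variant", "gatk",
    "--filters", format_filters(filters),
]

# Print the command for inspection; nothing is executed here.
command = shlex.join(argv)
print(command)
```

Building the arguments as a list and joining them with shlex.join keeps each value correctly quoted if the command is later pasted into a shell.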
- filter_vcf
Filter a VCF.
Filter the VCF using provided filters.
usage: phenix filter_vcf [-h] --vcf VCF [--filters FILTERS | --config CONFIG] --output OUTPUT [--reference REFERENCE] [--only-good]
- Options:
--vcf, -v VCF file to (re)filter.
--filters, -f Filter(s) to apply as key:threshold pairs, separated by commas.
--config, -c Config with filters in YAML format. E.g. filters:-key:value
--output, -o Location for the filtered VCF to be written.
--reference, -r mpileup versions <= 1.3 do not output all positions. The reference is required to fix the reference base in the VCF.
--only-good=False Write only variants that PASS all filters (default: all variants are written).
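To make the key:threshold format concrete, here is a small sketch that parses and checks a filter string before it is passed to --filters. It is not how phenix itself parses the option. It assumes that the filter names listed for run_snp_pipeline also apply to filter_vcf and that every filter takes a value; parse_filters is a hypothetical helper.

```python
# Filter names as listed in this documentation; assumed to apply to
# filter_vcf as well.
AVAILABLE_FILTERS = {
    "gq_score", "dp4_ratio", "ad_ratio", "min_depth", "mq_score",
    "mq0_ratio", "uncall_gt", "qual_score", "mq0f_ratio",
}


def parse_filters(spec):
    """Split 'key:threshold,key:threshold' into a dict of strings."""
    parsed = {}
    for pair in spec.split(","):
        key, sep, value = pair.strip().partition(":")
        if not sep or not value:
            raise ValueError(f"expected key:threshold, got {pair!r}")
        if key not in AVAILABLE_FILTERS:
            raise ValueError(f"unknown filter {key!r}")
        # Thresholds are kept as strings; phenix decides how to read them.
        parsed[key] = value
    return parsed


print(parse_filters("min_depth:5,mq_score:30"))
```

Checking the string up front surfaces a misspelled filter name before a long-running job is started.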
- prepare_reference
Create aux files for reference.
Prepare reference for SNP pipeline by generating required aux files.
usage: phenix prepare_reference [-h] --reference REFERENCE [--mapper MAPPER] [--variant VARIANT]
- Options:
--reference, -r Path to reference file to prepare.
--mapper Available mappers: ['bwa', 'bowtie2']
--variant Available variants: ['mpileup', 'gatk']
- vcf2fasta
Convert VCFs to FASTA.
Combine multiple VCFs into a single FASTA file.
usage: phenix vcf2fasta [-h] (--directory DIRECTORY | --input INPUT [INPUT ...]) [--regexp REGEXP] --out OUT [--with-mixtures WITH_MIXTURES] [--column-Ns COLUMN_NS] [--column-gaps COLUMN_GAPS] [--sample-Ns SAMPLE_NS] [--sample-gaps SAMPLE_GAPS] [--reference REFERENCE] [--include INCLUDE | --exclude EXCLUDE] [--with-stats WITH_STATS] [--tmp TMP]
- Options:
--directory, -d Path to the directory with .vcf files.
--input, -i List of VCF files to process.
--regexp Regular expression for finding VCFs in a directory.
--out, -o Path to the output FASTA file.
--with-mixtures Specify this option with a threshold to output mixtures above this threshold.
--column-Ns Keeps columns with fraction of Ns below specified threshold.
--column-gaps Keeps columns with fraction of gaps below specified threshold.
--sample-Ns Keeps samples with fraction of Ns below specified threshold.
--sample-gaps Keeps samples with fraction of gaps below specified threshold.
--reference If a path to a reference (FASTA) is specified, the whole genome is written.
--include Only include positions in the BED file in the FASTA.
--exclude Exclude any positions specified in the BED file.
--with-stats If a path is specified, the positions of the output SNPs are stored in this file. Requires numpy and matplotlib.
--tmp Location for writing temp files (default: /tmp).
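The column thresholds can be pictured with a toy alignment. The sketch below illustrates the idea behind --column-Ns and is not the phenix implementation: it keeps only columns whose fraction of N characters is strictly below the threshold. The strict comparison, the sample names and the sequences are assumptions made for this example.

```python
def n_fraction(bases):
    """Fraction of 'N' characters in a sequence of bases."""
    return sum(base == "N" for base in bases) / len(bases)


def keep_columns(alignment, max_n_fraction):
    """Keep columns whose N fraction is below max_n_fraction."""
    names = list(alignment)
    # Transpose the sequences so each tuple is one alignment column.
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if n_fraction(col) < max_n_fraction]
    # Rebuild one sequence per sample from the surviving columns.
    return {
        name: "".join(col[i] for col in kept)
        for i, name in enumerate(names)
    }


# Hypothetical samples; the third column is N in two of three samples.
alignment = {
    "sample_a": "ACNT",
    "sample_b": "ANNT",
    "sample_c": "ACGT",
}

filtered = keep_columns(alignment, max_n_fraction=0.5)
print(filtered)
```

With a threshold of 0.5 the third column (two Ns out of three) is dropped, while the second column (one N out of three) is kept.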