Introduction

Author:Public Health England
Date:04 September, 2018

Installation

From source:

git clone https://github.com/phe-bioinformatics/PHEnix.git
pip2 install -e PHEnix

Directly from github:

pip install git+https://github.com/phe-bioinformatics/PHEnix.git

Note

Installing from Pip - Coming Soon.

Overview

This code was designed to allow users to input fastq files and a reference sequence and perform:

  • Reference mapping
  • VCF generation
  • VCF filtering
  • FASTA sequence of SNPs

The process is comprised of three steps:

  • Reference sequence preparation (prepare_reference.py)
  • Mapping and filtered VCF generation (run_snp_pipeline.py)
  • FASTA generation from single or multiple VCFs (vcf2fasta.py)

Example:

prepare a bwa and gatk reference using a fasta file (myref.fasta)

phenix.py prepare_reference \
--mapper bwa \
--variant gatk \
--reference myref.fasta

map, call and filter variants on fastq files (my.R1.fastq, my.R2.fastq). Filter SNPs on minimum depth, mapping quality and AD ratio

phenix.py run_snp_pipeline \
-r1 my.R1.fastq \
-r2 my.R2.fastq \
-r myref.fasta \
--sample-name mysample \
--mapper bwa \
--variant gatk \
--filters min_depth:5,mq_score:30,ad_ratio:0.9

generate a FASTA file of SNPs using filtered VCFs in current directory

phenix.py vcf2fastq -d ./ -o output.fasta --regex filtered

Requirements

A lot of functionality depends on the presence of existing 3rd party tools:

Mappers:

Variant Caller:

In order for them to function properly, they need to be already in you PATH. For commands that run through Java archives, please set appropriate environment variable (see below).

Python

  • Python >= 2.7
  • argparse
  • PyVCF
  • PyYAML
  • matplotlib (_optional_)
  • bintrees (_optional_) - If you are using vcf2fasta.py
  • numpy (_optional_)
  • matplotlib.venn (_optional_) - psycopg2 (_optional_)

3rd Party Requirements

Samtools

Samtools Samtools can be downloaded from https://github.com/samtools/samtools. It is used to filter and convert to SAM/BAM files and in mpileup variant caller.

BCFTools

BCFtools can be downloaded from https://github.com/samtools/bcftools. It is used for calling variants in mpileup.

BWA Heng Li’s mapper can be downloaded from https://github.com/lh3/bwa.

Bowtie2 Bowtie2 mapper available from https://github.com/BenLangmead/bowtie2.

GATK Set GATK_JAR - full path to the GATK Java archive.

Picard Tools

Picard is needed for GATK to create dictionary of reference fasta. Either set PICARD_TOOLS_PATH - path to directory where different Picard jars are or set PICARD_JAR - path to picard.jar. Older Picard distributions have many different jars (use first suggestion), where as newer versions have merged all into one jar file.