Pipeline hisat.py

Overview

This pipeline quantifies gene expression from FASTQ files using HISAT2.

Configuration

The pipeline requires a configured pipeline_hisat.yml file.

Default configuration files can be generated by executing:

python <srcdir>/pipeline_hisat.py config

Inputs

The pipeline requires the following inputs:

  1. samples.tsv: see Configuration files

  2. libraries.tsv: see Configuration files

  3. txseq annotations: the directory in which pipeline_ensembl.py was run to prepare the annotations.

  4. HISAT2 index: a HISAT2 index built with pipeline_hisat_index.py.

Requirements

The following software is required:

  1. HISAT2

Output files

The pipeline produces the following outputs:

  1. BAM files: these are written to the hisat.dir sub-folder and are named by sample_id.

Code

txseq.pipeline_hisat.firstPass(infile, sentinel)

Run a first HISAT2 pass to identify novel splice sites.
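A first pass of this kind can be sketched as building a hisat2 command that writes newly discovered splice sites with the real HISAT2 option --novel-splicesite-outfile. This is a minimal sketch, not the pipeline's actual code: the function name first_pass_command, its parameters, and the choice to discard the first-pass alignments to /dev/null are hypothetical assumptions for illustration.

```python
from typing import List, Optional


def first_pass_command(index: str,
                       fastq1: str,
                       fastq2: Optional[str],
                       novel_ss_out: str,
                       threads: int = 4) -> List[str]:
    """Build a hypothetical HISAT2 first-pass command line.

    Assumption: only the novel splice-site file matters in this pass, so
    the alignments themselves are sent to /dev/null.
    """
    cmd = ["hisat2", "-x", index, "-p", str(threads)]
    if fastq2 is None:
        cmd += ["-U", fastq1]                # single-end reads
    else:
        cmd += ["-1", fastq1, "-2", fastq2]  # paired-end mates
    # Write splice sites discovered in this sample to a per-sample file.
    cmd += ["--novel-splicesite-outfile", novel_ss_out, "-S", "/dev/null"]
    return cmd
```

The returned list could be run with subprocess.run(cmd, check=True); returning the argument list rather than executing it keeps the sketch easy to inspect.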

txseq.pipeline_hisat.novelSpliceSites(infiles, sentinel)

Collect the novel splice sites into a single file.
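Collecting the per-sample results can be sketched as a merge that de-duplicates and sorts splice sites. The sketch assumes each input file uses HISAT2's splice-site layout of four tab-separated columns (chromosome, left, right, strand). The function name collect_splice_sites is hypothetical, and it is not necessarily how novelSpliceSites is implemented.

```python
from typing import Iterable, List, Set, Tuple

# (chromosome, left, right, strand)
SpliceSite = Tuple[str, int, int, str]


def collect_splice_sites(paths: Iterable[str], outfile: str) -> int:
    """Merge per-sample splice-site files into one sorted, unique file.

    Returns the number of unique splice sites written to outfile.
    """
    sites: Set[SpliceSite] = set()
    for path in paths:
        with open(path) as handle:
            for line in handle:
                line = line.rstrip("\n")
                if not line:
                    continue  # skip blank lines
                chrom, left, right, strand = line.split("\t")[:4]
                # A set removes sites reported by more than one sample.
                sites.add((chrom, int(left), int(right), strand))

    # Sort by chromosome name, then coordinates, for deterministic output.
    ordered: List[SpliceSite] = sorted(sites)
    with open(outfile, "w") as out:
        for chrom, left, right, strand in ordered:
            out.write(f"{chrom}\t{left}\t{right}\t{strand}\n")
    return len(ordered)
```

Coordinates are parsed as integers so that sorting is numeric rather than lexical within each chromosome.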

txseq.pipeline_hisat.secondPass(infile, sentinel)

Align reads using HISAT2 with known and novel splice junctions.
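The second pass can be sketched as a hisat2 command that reads the merged novel splice sites back in through the real option --novel-splicesite-infile. Known junctions may already be built into the index, or they may be supplied with --known-splicesite-infile. This sketch treats the second route as optional. The function name second_pass_command and its parameters are hypothetical.

```python
from typing import List, Optional


def second_pass_command(index: str,
                        fastq1: str,
                        fastq2: Optional[str],
                        novel_ss_in: str,
                        sam_out: str,
                        known_ss_in: Optional[str] = None,
                        threads: int = 4) -> List[str]:
    """Build a hypothetical HISAT2 second-pass command line."""
    cmd = ["hisat2", "-x", index, "-p", str(threads)]
    if fastq2 is None:
        cmd += ["-U", fastq1]                # single-end reads
    else:
        cmd += ["-1", fastq1, "-2", fastq2]  # paired-end mates
    if known_ss_in is not None:
        # Assumption: known junctions passed as a file rather than via the index.
        cmd += ["--known-splicesite-infile", known_ss_in]
    # Reuse the merged novel splice sites from the first pass.
    cmd += ["--novel-splicesite-infile", novel_ss_in, "-S", sam_out]
    return cmd
```

HISAT2 writes SAM output, so producing the BAM files listed under Output files would need a conversion step, for example with samtools. That step is left out of this sketch.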