pipeline_hisat.py

Overview

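This pipeline aligns reads from FASTQ files with HISAT2 as a step towards quantifying gene expression. Alignment runs in two passes: a first pass identifies novel splice sites, and a second pass aligns the reads using both known and novel junctions.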

Configuration

The pipeline requires a configured pipeline_hisat.yml file.

Default configuration files can be generated by executing:

python <srcdir>/pipeline_hisat.py config

Inputs

The pipeline requires the following inputs:

samples.tsv: see Configuration files
libraries.tsv: see Configuration files
txseq annotations: the location where pipeline_ensembl.py was run to prepare the annotations.
Hisat index: a hisat2 index built with pipeline_hisat_index.py.
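
Before starting a run it can be useful to confirm that the HISAT2 index is complete. The sketch below is not part of the pipeline; the function name hisat2_index_files is hypothetical. It relies only on the naming used by hisat2-build, which writes eight files per index prefix, numbered 1 to 8, with the extension .ht2 (or .ht2l for large indexes).

```python
from pathlib import Path


def hisat2_index_files(index_prefix):
    """Return the eight HISAT2 index files for a prefix, or [] if incomplete.

    Hypothetical helper: checks for <prefix>.1.ht2 ... <prefix>.8.ht2,
    falling back to the large-index extension .ht2l.
    """
    prefix = Path(index_prefix)
    for ext in ("ht2", "ht2l"):
        files = [prefix.parent / f"{prefix.name}.{i}.{ext}" for i in range(1, 9)]
        if all(f.exists() for f in files):
            return files
    # No complete set of index files was found for either extension.
    return []
```

An empty list means the prefix does not point at a complete index, which is a cheap check to make before any alignment job is submitted.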

Requirements

The following software is required:

Hisat2

Output files

The pipeline produces the following outputs:

BAM files: these are found in the hisat.dir sub-folder and are named by sample_id.
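
To check which samples have produced output, the expected BAM paths can be derived from the sample identifiers. This is a minimal sketch that assumes each file is named <sample_id>.bam directly inside hisat.dir; the exact file name pattern is an assumption, and both function names are hypothetical.

```python
from pathlib import Path


def bam_paths(sample_ids, outdir="hisat.dir"):
    """Map each sample_id to its expected BAM path (assumed <sample_id>.bam)."""
    return {sample_id: Path(outdir) / f"{sample_id}.bam" for sample_id in sample_ids}


def missing_bams(sample_ids, outdir="hisat.dir"):
    """Return the sample_ids whose expected BAM file does not exist."""
    return [s for s, path in bam_paths(sample_ids, outdir).items() if not path.exists()]
```

Calling missing_bams with the sample identifiers from samples.tsv would then list the samples that still need to be aligned, provided the naming assumption holds for your run.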

Code

txseq.pipeline_hisat.firstPass(infile, sentinel): Run a first hisat pass to identify novel splice sites.

txseq.pipeline_hisat.novelSpliceSites(infiles, sentinel): Collect the novel splice sites into a single file.

txseq.pipeline_hisat.secondPass(infile, sentinel): Align reads using HISAT with known and novel junctions.
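
The three tasks above amount to a two-pass alignment. The sketch below illustrates that flow and is not the pipeline's actual implementation: the function names, the single-end input (-U), the thread count and the file layout are all assumptions. The hisat2 options used (--novel-splicesite-outfile, --novel-splicesite-infile, --known-splicesite-infile, -x, -U, -S, -p) are standard HISAT2 command-line options. The commands are only built as argument lists, not executed.

```python
def first_pass_cmd(index, fastq, novel_out, threads=4):
    """Build a hisat2 command that records novel splice sites.

    Alignments are discarded (-S /dev/null); only the splice sites are kept.
    Single-end input is assumed for illustration.
    """
    return ["hisat2", "-p", str(threads), "-x", index, "-U", fastq,
            "--novel-splicesite-outfile", novel_out, "-S", "/dev/null"]


def merge_splice_sites(paths, merged_path):
    """Write the de-duplicated union of per-sample splice sites to one file.

    Lines are sorted lexically; returns the number of distinct sites written.
    """
    sites = set()
    for path in paths:
        with open(path) as handle:
            sites.update(line.rstrip("\n") for line in handle if line.strip())
    with open(merged_path, "w") as out:
        for site in sorted(sites):
            out.write(site + "\n")
    return len(sites)


def second_pass_cmd(index, fastq, known_sites, novel_sites, sam_out, threads=4):
    """Build a hisat2 command that aligns with known and novel junctions."""
    return ["hisat2", "-p", str(threads), "-x", index, "-U", fastq,
            "--known-splicesite-infile", known_sites,
            "--novel-splicesite-infile", novel_sites,
            "-S", sam_out]
```

Merging the splice sites across all samples before the second pass means that every sample is aligned against the same set of junctions, so a junction discovered in one sample can also support reads in the others.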