pipeline_hisat_index.py

Overview

This pipeline builds a hisat2 HGFM (hierarchical graph FM) index that incorporates transcript information.

The pipeline is written using the CGAT-core workflow management system.

Configuration

The pipeline requires a configured pipeline_hisat_index.yml file.

A default configuration file can be generated by executing:

txseq hisat_index config

Input files

To build the index, the pipeline requires:

  1. A gzip compressed FASTA file containing the genome primary assembly sequences.

  2. A gzip compressed GTF file containing the transcript information.

The location of these files must be specified in the ‘pipeline_hisat_index.yml’ file.
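
Because both inputs must be gzip compressed, it can help to check them before starting a run. The following is a minimal sketch and not part of the pipeline: the function names `is_gzip_file` and `check_inputs` are hypothetical, and the check only inspects the two-byte gzip magic number rather than validating FASTA or GTF content.

```python
import os

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip_file(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def check_inputs(paths):
    """Return a list of human-readable problems; an empty list means all files look fine."""
    problems = []
    for path in paths:
        if not os.path.isfile(path):
            problems.append(f"{path}: file not found")
        elif not is_gzip_file(path):
            problems.append(f"{path}: not gzip compressed")
    return problems
```

Calling `check_inputs` with the genome and annotation paths taken from the configuration file returns an empty list when both files exist and look gzip compressed.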

Requirements

The following software is required:

  1. Hisat2

Output files

The pipeline generates a hisat2 HGFM index in the current folder.

Code

txseq.pipeline_hisat_index.spliceSites(infile, sentinel)

Extract the splice sites.
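
To illustrate what extracting splice sites from a GTF file involves, here is a minimal pure-Python sketch. It is not the pipeline's implementation, which is not shown on this page; the function name `splice_sites_from_gtf` is hypothetical. It assumes the output convention of one `(chromosome, left, right, strand)` record per intron, where `left` and `right` are the 0-based positions of the last base of the upstream exon and the first base of the downstream exon.

```python
from collections import defaultdict


def splice_sites_from_gtf(lines):
    """Derive unique splice junctions from the exon records of GTF lines."""
    exons_by_transcript = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features define splice sites
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])  # GTF is 1-based, inclusive
        transcript_id = None
        for attribute in fields[8].split(";"):
            key, _, value = attribute.strip().partition(" ")
            if key == "transcript_id":
                transcript_id = value.strip().strip('"')
        if transcript_id is None:
            continue
        exons_by_transcript[(chrom, strand, transcript_id)].append((start, end))

    junctions = set()
    for (chrom, strand, _), exons in exons_by_transcript.items():
        exons.sort()
        # Each pair of consecutive exons in a transcript flanks one intron.
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
            junctions.add((chrom, prev_end - 1, next_start - 1, strand))
    return sorted(junctions)


example_gtf = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tsrc\texon\t500\t600\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
]
sites = splice_sites_from_gtf(example_gtf)
```

For the three-exon example transcript, `sites` holds two junctions, one for each intron.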

txseq.pipeline_hisat_index.exons(infile, sentinel)

Extract the exons.

txseq.pipeline_hisat_index.transcriptomeIndex(infiles, sentinel)

Generate a HGFM index with transcripts.
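
The sketch below shows how the outputs of the two extraction tasks could be passed to `hisat2-build`, which accepts known splice sites via `--ss` and exons via `--exon`. It is an illustration under stated assumptions, not the command the pipeline actually runs: the function name `hisat2_build_command`, the file names, and the thread count are hypothetical, and the genome FASTA is assumed to have been decompressed beforehand.

```python
import shlex


def hisat2_build_command(genome_fasta, splice_sites, exons, index_base, threads=4):
    """Assemble a hisat2-build argument list for a transcript-aware index."""
    return [
        "hisat2-build",
        "-p", str(threads),     # number of build threads
        "--ss", splice_sites,   # splice-site file from the spliceSites step
        "--exon", exons,        # exon file from the exons step
        genome_fasta,           # reference sequences (assumed uncompressed)
        index_base,             # basename for the generated index files
    ]


# Hypothetical file names, for illustration only.
command = hisat2_build_command("genome.fa", "genome.ss", "genome.exon", "genome_tran")
command_line = shlex.join(command)
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting, while `shlex.join` gives a printable form for logging.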

txseq.pipeline_hisat_index.full()

Target to run the full pipeline.