pipeline_hisat_index.py
Overview
This pipeline builds a HISAT2 hierarchical graph FM (HGFM) index that incorporates transcript information (splice sites and exons).
The pipeline is written using the CGAT-core workflow management system.
Configuration
The pipeline requires a configured pipeline_hisat_index.yml file.
A default configuration file can be generated by executing:
txseq hisat_index config
Input files
To build the index, the pipeline requires:
A gzip compressed FASTA file containing the genome primary assembly sequences.
A gzip compressed GTF file containing the transcript information.
The locations of these files must be specified in the pipeline_hisat_index.yml file.
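Because both inputs must be gzip compressed, it can help to check them before starting a long index build. The following is a minimal sketch, not part of the pipeline itself; the helper name is_gzip is hypothetical, and the check only inspects the two-byte gzip signature rather than validating the FASTA or GTF content.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file exists and starts with the gzip magic bytes."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(2) == GZIP_MAGIC
```

A caller could run this on the genome and GTF paths taken from the configuration file and stop early if either check returns False.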
Requirements
The following software is required:
HISAT2
Output files
The pipeline generates a HISAT2 HGFM index in the current working directory.
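A HISAT2 index built by hisat2-build normally consists of eight files sharing a common prefix, with the suffixes .1.ht2 to .8.ht2 (large indexes use .ht2l instead). As a rough sketch for confirming that a build completed, assuming the small-index suffix and a hypothetical prefix supplied by the caller (this page does not state the prefix the pipeline uses):

```python
from pathlib import Path


def missing_index_files(prefix, folder="."):
    """List the expected small-index files (.1.ht2 to .8.ht2) that are absent."""
    expected = [f"{prefix}.{i}.ht2" for i in range(1, 9)]
    return [name for name in expected if not (Path(folder) / name).exists()]
```

An empty list indicates that all eight files are present in the folder.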
Code
- txseq.pipeline_hisat_index.spliceSites(infile, sentinel)
Extract the splice sites.
- txseq.pipeline_hisat_index.exons(infile, sentinel)
Extract the exons.
- txseq.pipeline_hisat_index.transcriptomeIndex(infiles, sentinel)
Generate an HGFM index with transcripts.
- txseq.pipeline_hisat_index.full()
Target to run the full pipeline.
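The three tasks above correspond to the steps that the HISAT2 manual describes for building a transcript-aware index. The following is a minimal sketch of the shell commands involved, not the pipeline's actual implementation. It assumes the tasks wrap hisat2_extract_splice_sites.py, hisat2_extract_exons.py and hisat2-build with its --ss and --exon options, and that the gzip compressed inputs are decompressed first. The function name, intermediate file names and default thread count are hypothetical.

```python
import shlex


def hisat_index_commands(genome_fa_gz, gtf_gz, index_prefix, threads=4):
    """Return shell commands for a transcript-aware HISAT2 index build (sketch)."""
    q = shlex.quote
    # Hypothetical intermediate file names derived from the index prefix.
    gtf = index_prefix + ".gtf"
    fasta = index_prefix + ".fa"
    splice_sites = index_prefix + ".ss"
    exons = index_prefix + ".exon"
    return [
        # Decompress the inputs (assumed necessary for the HISAT2 tools).
        f"gunzip -c {q(gtf_gz)} > {q(gtf)}",
        f"gunzip -c {q(genome_fa_gz)} > {q(fasta)}",
        # Counterpart of the spliceSites task.
        f"hisat2_extract_splice_sites.py {q(gtf)} > {q(splice_sites)}",
        # Counterpart of the exons task.
        f"hisat2_extract_exons.py {q(gtf)} > {q(exons)}",
        # Counterpart of the transcriptomeIndex task.
        f"hisat2-build -p {int(threads)} --ss {q(splice_sites)} "
        f"--exon {q(exons)} {q(fasta)} {q(index_prefix)}",
    ]
```

Returning the commands as strings keeps the sketch easy to inspect; in a CGAT-core pipeline each command would typically be submitted by its own task so that completed steps are not rerun.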