pipeline_salmon_index.py

Overview

This pipeline makes a Salmon SAF genome index. This type of index uses the full genome as a decoy: according to the Salmon authors this type of index ‘does the best job in avoiding spurious alignments to annotated transcripts’.

The pipeline is written using the CGAT-core workflow management system.

Configuration

The pipeline requires a configured pipeline_salmon_index.yml file.

A default configuration file can be generated by executing:

txseq salmon_index config

Input files

For building the SAF genome index the pipeline requires:

A gzip compressed FASTA file containing the transcript sequences.
A gzip compressed FASTA file containing the genome primary assembly sequences.

The locations of these two files must be specified in the ‘pipeline_salmon_index.yml’ file.
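Salmon's decoy-aware indexing takes these two inputs as a single "gentrome" file, with the transcript sequences first and the genome sequences last. It also takes a text file listing the names of the genome (decoy) sequences. The pipeline's own implementation is not shown here. The following is a minimal sketch of that preparation step, assuming well-formed gzip-compressed FASTA inputs. The function names `write_decoys` and `write_gentrome` are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def write_decoys(genome_fa_gz, decoys_txt):
    """Write one genome sequence name per line (the list passed to salmon index -d).

    Hypothetical helper: assumes every FASTA header has a name after '>'.
    """
    names = []
    with gzip.open(genome_fa_gz, "rt") as fh:
        for line in fh:
            if line.startswith(">"):
                # The sequence name is the first whitespace-delimited token after '>'.
                names.append(line[1:].split()[0])
    Path(decoys_txt).write_text("".join(name + "\n" for name in names))
    return names


def write_gentrome(transcripts_fa_gz, genome_fa_gz, gentrome_fa_gz):
    """Concatenate transcripts first, then the genome (decoys must come last).

    Concatenating gzip files byte-for-byte yields a valid multi-member gzip
    file, so the inputs do not need to be decompressed.
    """
    with open(gentrome_fa_gz, "wb") as out:
        for path in (transcripts_fa_gz, genome_fa_gz):
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```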

Requirements

The following software is required:

Salmon

Output files

The pipeline generates a salmon index called “salmon_index” in the current folder.

Code

txseq.pipeline_salmon_index.index(infile, sentinel): Generate a SAF genome index.

txseq.pipeline_salmon_index.full(): Target to run the full pipeline