pipeline_salmon_index.py
Overview
This pipeline builds a Salmon SAF genome index, which uses the full genome as a decoy. According to the Salmon authors, this type of index ‘does the best job in avoiding spurious alignments to annotated transcripts’.
The pipeline is written using the CGAT-core workflow management system.
Configuration
The pipeline requires a configured pipeline_salmon_index.yml file.
A default configuration file can be generated by executing:
txseq salmon_index config
Input files
For building the SAF genome index the pipeline requires:
A gzip compressed FASTA file containing the transcript sequences.
A gzip compressed FASTA file containing the genome primary assembly sequences.
The locations of these two files must be specified in the ‘pipeline_salmon_index.yml’ file.
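The two inputs are typically combined before indexing: the genome sequence names become the decoy list, and the transcripts are concatenated ahead of the genome into a single "gentrome" FASTA. The sketch below illustrates that preparation step in general terms. It is not the txseq implementation; the function names and file names are hypothetical and chosen only for illustration.

```python
import gzip
import shutil


def write_decoy_names(genome_fa_gz, decoys_txt):
    """Write one genome sequence name per line and return how many were written."""
    count = 0
    with gzip.open(genome_fa_gz, "rt") as fasta, open(decoys_txt, "w") as out:
        for line in fasta:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    # Keep only the identifier: drop ">" and any description text.
                    out.write(fields[0] + "\n")
                    count += 1
    return count


def write_gentrome(transcripts_fa_gz, genome_fa_gz, gentrome_fa_gz):
    """Concatenate transcripts, then genome, into one gzip file.

    Gzip members can be joined byte-wise, so no recompression is needed.
    Decoy (genome) sequences must come after the transcripts.
    """
    with open(gentrome_fa_gz, "wb") as out:
        for part in (transcripts_fa_gz, genome_fa_gz):
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
```

Each input FASTA is assumed to end with a newline, so that the last transcript sequence and the first genome header do not run together after concatenation.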
Requirements
The following software is required:
Salmon
Output files
The pipeline generates a salmon index called “salmon_index” in the current folder.
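For orientation, a decoy-aware index of this kind is usually built with a single "salmon index" call. The sketch below only assembles such a command. The helper name, the input file names and the default thread and k-mer values are assumptions made for illustration, and the options the pipeline actually passes may differ.

```python
def salmon_index_command(gentrome, decoys, index_dir="salmon_index",
                         threads=4, kmer=31):
    """Return the argument list for a decoy-aware 'salmon index' call."""
    return [
        "salmon", "index",
        "-t", str(gentrome),   # transcripts followed by genome (decoy) sequences
        "-d", str(decoys),     # text file with one decoy sequence name per line
        "-i", str(index_dir),  # output index folder
        "-k", str(kmer),       # k-mer size (assumed value)
        "-p", str(threads),    # number of threads (assumed value)
    ]


# Hypothetical input file names, used only for illustration.
cmd = salmon_index_command("gentrome.fa.gz", "decoys.txt")
```

The resulting list could be passed to subprocess.run(cmd, check=True) on a machine where Salmon is installed.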
Code
- txseq.pipeline_salmon_index.index(infile, sentinel)
Generate a SAF genome index.
- txseq.pipeline_salmon_index.full()
Target to run the full pipeline.