Building transcriptome indexes
Introduction
Txseq uses Hisat2 for alignment-based gene expression quantification and Salmon for quasi-alignment-based gene expression quantification. Before either tool can be used, a method-specific index must be built.
As with the sanitised genome sequences and gene annotations, it is recommended to build these indexes once in a central location so that they can be shared across projects.
Txseq provides dedicated pipelines for building Salmon and Hisat2 indexes, which are used as follows.
Note
If you are using the KIR BMRC workspace, Salmon and Hisat2 indexes built with txseq are available in the “/well/kir/projects/mirror/txseq/” directory.
Building a Salmon Index
Create a suitably named directory, then obtain and edit the pipeline_salmon_index.yml configuration file for pipeline_salmon_index.py with the following commands:
mkdir salmon.index.dir
cd salmon.index.dir
txseq salmon_index config
emacs pipeline_salmon_index.yml
After editing the YAML file to point to the locations of the “txseq.genome.fa.gz” and “txseq.transcript.fa.gz” files (see the “Genomes and Annotations” section) and setting the remaining parameters as required, run the pipeline with the following command:
txseq salmon_index make full -v5 -p20
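The Salmon index needs the genome as well as the transcript sequences, presumably because it is built decoy-aware, as the Salmon documentation recommends. This is an assumption: this page does not say what pipeline_salmon_index.py does internally. The bash sketch below shows that preparation step. The genome sequence IDs are written to a decoy list, and the genome is appended after the transcripts to form a combined "gentrome" file. The function name make_gentrome and the output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not txseq code): prepare decoy-aware Salmon index
# inputs following the approach described in the Salmon documentation.
set -euo pipefail

# make_gentrome GENOME_FA_GZ TRANSCRIPT_FA_GZ OUTDIR
make_gentrome() {
    local genome="$1"       # e.g. txseq.genome.fa.gz
    local transcripts="$2"  # e.g. txseq.transcript.fa.gz
    local outdir="$3"

    mkdir -p "$outdir"

    # Decoy names are the genome sequence IDs: each FASTA header with the
    # leading ">" and everything after the first space removed.
    gzip -dc "$genome" | grep '^>' | cut -d ' ' -f 1 | sed 's/^>//' \
        > "$outdir/decoys.txt"

    # Transcripts first, then the genome (decoy) sequences. Concatenated
    # gzip files form a valid multi-member gzip stream.
    cat "$transcripts" "$genome" > "$outdir/gentrome.fa.gz"
}

# Example (requires Salmon on the PATH; -p sets the number of threads):
# make_gentrome txseq.genome.fa.gz txseq.transcript.fa.gz salmon_inputs
# salmon index -t salmon_inputs/gentrome.fa.gz -d salmon_inputs/decoys.txt \
#     -i salmon_inputs/salmon_index -p 20
```

The order of the cat arguments matters because Salmon expects the decoy sequences to come after all of the transcripts in the combined file.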
Building a Hisat2 Index
Create a suitably named directory, then obtain and edit the pipeline_hisat_index.yml configuration file for pipeline_hisat_index.py with the following commands:
mkdir hisat.index.dir
cd hisat.index.dir
txseq hisat_index config
emacs pipeline_hisat_index.yml
After editing the YAML file to point to the locations of the “txseq.genome.fa.gz” and “txseq.geneset.gtf.gz” files (see the “Genomes and Annotations” section) and setting the remaining parameters as required, run the pipeline with the following command:
txseq hisat_index make full -v5 -p20
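Hisat2 indexes the genome sequence and can optionally incorporate splice sites and exons derived from a gene annotation. This is presumably why the pipeline asks for both “txseq.genome.fa.gz” and “txseq.geneset.gtf.gz”. The bash sketch below shows a splice-aware build using the helper scripts distributed with HISAT2 (hisat2_extract_splice_sites.py, hisat2_extract_exons.py) and hisat2-build. It is an illustration only and is not taken from pipeline_hisat_index.py. The function name build_hisat2_index and the HISAT2_* override variables are hypothetical. The inputs are decompressed first so the sketch does not rely on gzip support in these tools.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not txseq code): splice-aware HISAT2 index build.
set -euo pipefail

# Tool names can be overridden, e.g. to point at a specific installation.
: "${HISAT2_BUILD:=hisat2-build}"
: "${HISAT2_SS:=hisat2_extract_splice_sites.py}"
: "${HISAT2_EXON:=hisat2_extract_exons.py}"

# build_hisat2_index GENOME_FA_GZ GENESET_GTF_GZ INDEX_PREFIX [THREADS]
build_hisat2_index() {
    local genome="$1"         # e.g. txseq.genome.fa.gz
    local gtf="$2"            # e.g. txseq.geneset.gtf.gz
    local prefix="$3"         # basename for the .ht2 index files
    local threads="${4:-20}"

    local workdir
    workdir=$(mktemp -d)

    # Decompress the inputs into a scratch directory.
    gzip -dc "$genome" > "$workdir/genome.fa"
    gzip -dc "$gtf" > "$workdir/geneset.gtf"

    # Derive splice sites and exons from the gene annotation.
    "$HISAT2_SS" "$workdir/geneset.gtf" > "$workdir/genome.ss"
    "$HISAT2_EXON" "$workdir/geneset.gtf" > "$workdir/genome.exon"

    # Build the index, making sure the output directory exists.
    mkdir -p "$(dirname "$prefix")"
    "$HISAT2_BUILD" -p "$threads" \
        --ss "$workdir/genome.ss" --exon "$workdir/genome.exon" \
        "$workdir/genome.fa" "$prefix"

    rm -rf "$workdir"
}
```

For example, build_hisat2_index txseq.genome.fa.gz txseq.geneset.gtf.gz hisat2_index/txseq 20. A build that uses --ss and --exon needs considerably more memory than a plain genome index, so it may have to run on a high-memory node.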