Pipeline salmon.py

Overview

This pipeline quantifies gene expression from FASTQ files using Salmon.

Configuration

The pipeline requires a configured pipeline_salmon.yml file.

A default configuration file can be generated by executing:

txseq salmon config

Inputs

The pipeline requires the following inputs:

samples.tsv: see Configuration files
libraries.tsv: see Configuration files
txseq annotations: the location where pipeline_ensembl.py was run to prepare the annotations.
Salmon index: the location of a salmon index built with pipeline_salmon_index.py.

Requirements

The following software is required:

Salmon

Output files

The pipeline produces the following outputs:

per-sample salmon quantification results in the “salmon.dir” folder
a csvdb sqlite database that contains tables of gene and transcript counts and TPMs
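Because the table names in the database are not listed here, a quick way to see what was loaded is to ask SQLite itself. The following is a minimal sketch using only the Python standard library; the function name list_tables and the database path passed to it are hypothetical and should be adapted to the actual location of the csvdb file.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in an SQLite database, sorted by name."""
    # closing() makes sure the connection is closed when the block exits.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Calling list_tables with the path to the csvdb file returns the table names as a list of strings, which can then be queried with ordinary SQL.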

Note

It is strongly recommended to parse the raw Salmon results using the tximport Bioconductor R package for downstream analysis.

Code

txseq.pipeline_salmon.quant(infile, outfile): Per sample quantitation using salmon.

txseq.pipeline_salmon.loadSalmonTranscriptQuant(infiles, sentinel): Load the salmon transcript-level results.

txseq.pipeline_salmon.loadSalmonGeneQuant(infiles, sentinel): Load the salmon gene-level results.

txseq.pipeline_salmon.salmonTPMs(infile, outfile): Prepare a wide table of salmon TPMs (samples x transcripts|genes).
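As an illustration of what a wide TPM table looks like, the sketch below pivots per-sample Salmon quant.sf files (tab-separated, with Name and TPM columns) into one row per feature and one column per sample. It is not the pipeline's own implementation: the function names, the orientation of the table, and the choice to fill features missing from a sample with 0.0 are all assumptions made for the example.

```python
import csv


def read_tpms(quant_path):
    """Read one Salmon quant.sf file into a {feature_name: TPM} dict."""
    with open(quant_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["Name"]: float(row["TPM"]) for row in reader}


def wide_tpm_table(quant_files):
    """Build a wide table from a {sample_id: quant.sf path} mapping.

    Returns (header, rows): one row per feature, one column per sample.
    Features absent from a sample are filled with 0.0 (an assumption).
    """
    per_sample = {sample: read_tpms(path) for sample, path in quant_files.items()}
    samples = sorted(per_sample)
    features = sorted(set().union(*(tpms.keys() for tpms in per_sample.values())))
    header = ["id"] + samples
    rows = [
        [feature] + [per_sample[sample].get(feature, 0.0) for sample in samples]
        for feature in features
    ]
    return header, rows
```

The returned header and rows can be written out with csv.writer or loaded into a database table.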

txseq.pipeline_salmon.loadSalmonTPMs(infile, outfile): Load a wide table of salmon TPMs into the project database.

txseq.pipeline_salmon.tximeta(infile, outfile): Run tximeta to summarise counts at the gene and transcript levels.

txseq.pipeline_salmon.quantitation(): Quantitation target.

txseq.pipeline_salmon.loadTranscriptInfo(infile, outfile): Load the annotations for salmon into the project database.

txseq.pipeline_salmon.numberGenesDetected(infile, outfile): Count the number of genes detected at copy number > 0 in each sample.
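The per-sample detection count can be sketched as a simple threshold over a wide expression table. This is an illustrative example rather than the pipeline's code: the function name, the table layout (a header row followed by one row per gene, with the gene identifier in the first column) and the use of a strict "greater than" comparison against a threshold of 0 are assumptions.

```python
def number_genes_detected(header, rows, threshold=0.0):
    """Count, for each sample, the genes whose value is above the threshold.

    header: ["gene_id", sample_1, sample_2, ...]
    rows:   [[gene_id, value_1, value_2, ...], ...]
    """
    samples = header[1:]
    counts = {sample: 0 for sample in samples}
    for row in rows:
        # Skip the identifier column and compare each sample's value.
        for sample, value in zip(samples, row[1:]):
            if float(value) > threshold:
                counts[sample] += 1
    return counts
```

The result is a dictionary mapping each sample name to its number of detected genes, which is the kind of summary that can then be loaded into the project database.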

txseq.pipeline_salmon.loadNumberGenesDetected(infile, outfile): Load the numbers of genes expressed into the project database.