Pipeline salmon.py

Overview

This pipeline quantifies gene expression from FASTQ files using Salmon.

Configuration

The pipeline requires a configured pipeline_salmon.yml file.

A default configuration file can be generated by executing:

txseq salmon config

Inputs

The pipeline requires the following inputs:

  1. samples.tsv: see Configuration files

  2. libraries.tsv: see Configuration files

  3. txseq annotations: the location where pipeline_ensembl.py was run to prepare the annotations.

  4. Salmon index: the location of a salmon index built with pipeline_salmon_index.py.

Requirements

The following software is required:

  1. Salmon

Output files

The pipeline produces the following outputs:

  1. per-sample salmon quantification results in the “salmon.dir” folder

  2. a csvdb sqlite database that contains tables of gene and transcript counts and TPMs
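The table names inside the csvdb database are not listed here, so a quick way to see what was loaded is to ask SQLite directly. The following is a minimal sketch using only the Python standard library; the helper name list_tables is hypothetical and not part of txseq, and the database path is an assumption to adjust for your run directory.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in an SQLite database, sorted by name.

    Hypothetical helper for inspection only; not part of txseq.
    """
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    # Each row is a 1-tuple holding the table name.
    return [name for (name,) in rows]
```

For example, list_tables("csvdb") would return the tables in a database file named csvdb in the current directory (the file name is assumed here). Note that sqlite3.connect creates an empty database file if the path does not exist.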

Note

It is strongly recommended to parse the raw Salmon results using the tximport Bioconductor R package for downstream analysis.

Code

txseq.pipeline_salmon.quant(infile, outfile)

Per-sample quantitation using Salmon.

txseq.pipeline_salmon.loadSalmonTranscriptQuant(infiles, sentinel)

Load the salmon transcript-level results.

txseq.pipeline_salmon.loadSalmonGeneQuant(infiles, sentinel)

Load the salmon gene-level results.

txseq.pipeline_salmon.salmonTPMs(infile, outfile)

Prepare a wide table of salmon TPMs (samples x transcripts|genes).
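To illustrate what a wide TPM table looks like, here is a minimal sketch that combines the TPM column of several Salmon quant.sf files (tab-separated, with Name and TPM columns) into one table. It is not the pipeline's implementation: the function names, the input mapping, the choice of one row per feature and one column per sample, and filling missing entries with 0.0 are all assumptions made for this example.

```python
import csv


def read_tpms(quant_path):
    """Read one Salmon quant.sf file into a dict of {name: TPM}."""
    with open(quant_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["Name"]: float(row["TPM"]) for row in reader}


def wide_tpm_table(quant_files):
    """Build a wide table from per-sample quant.sf files.

    quant_files maps a sample name to the path of its quant.sf file
    (hypothetical input structure). Returns (header, rows), where each
    row is [name, tpm_in_sample_1, tpm_in_sample_2, ...].
    """
    samples = sorted(quant_files)
    per_sample = {s: read_tpms(quant_files[s]) for s in samples}
    # Union of all feature names seen in any sample.
    names = sorted(set().union(*per_sample.values()))
    header = ["Name"] + samples
    # Assumption: a feature absent from a sample is reported as 0.0.
    rows = [
        [name] + [per_sample[s].get(name, 0.0) for s in samples]
        for name in names
    ]
    return header, rows
```

A call might look like wide_tpm_table({"sample_a": "salmon.dir/sample_a/quant.sf"}), where the sample name and the sub-folder layout under salmon.dir are assumed for illustration.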

txseq.pipeline_salmon.loadSalmonTPMs(infile, outfile)

Load a wide table of salmon TPMs into the project database.

txseq.pipeline_salmon.tximeta(infile, outfile)

Run tximeta to summarise counts at gene and transcript level.

txseq.pipeline_salmon.quantitation()

Quantitation target.

txseq.pipeline_salmon.loadTranscriptInfo(infile, outfile)

Load the annotations for salmon into the project database.

txseq.pipeline_salmon.numberGenesDetected(infile, outfile)

Count the number of genes detected at copy number > 0 in each sample.
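The counting step can be sketched as follows. This is an illustration of the idea only, not the pipeline's code: the function name, the nested-dict input, and the example values are hypothetical, and which expression measure the pipeline thresholds (for example TPM or estimated read count) is not stated here.

```python
def count_genes_detected(values_by_sample, threshold=0.0):
    """Count, per sample, the genes whose value is strictly above threshold.

    values_by_sample maps sample name -> {gene_id: expression value}
    (hypothetical input structure).
    """
    return {
        sample: sum(1 for value in genes.values() if value > threshold)
        for sample, genes in values_by_sample.items()
    }


# Made-up values, for illustration only.
example = {
    "sample_a": {"gene1": 12.0, "gene2": 0.0, "gene3": 0.4},
    "sample_b": {"gene1": 0.0, "gene2": 0.0, "gene3": 7.1},
}
detected = count_genes_detected(example)
# detected == {"sample_a": 2, "sample_b": 1}
```

The comparison is strict, so a gene with a value of exactly 0 is not counted as detected.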

txseq.pipeline_salmon.loadNumberGenesDetected(infile, outfile)

Load the numbers of genes expressed into the project database.