Pipeline salmon.py
Overview
This pipeline quantifies gene expression from FASTQ files using Salmon.
Configuration
The pipeline requires a configured pipeline_salmon.yml file.
A default configuration file can be generated by executing:
txseq salmon config
Inputs
The pipeline requires the following inputs:
samples.tsv: see Configuration files
libraries.tsv: see Configuration files
txseq annotations: the location where pipeline_ensembl.py was run to prepare the annotations.
Salmon index: the location of a Salmon index built with pipeline_salmon_index.py.
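Before launching a run it can help to confirm that these inputs are actually present. The sketch below is a hypothetical helper, not part of txseq: the file names samples.tsv and libraries.tsv come from the list above, while the three directory arguments are illustrative stand-ins for wherever the configuration files, the pipeline_ensembl.py run and the Salmon index live on your system.

```python
from pathlib import Path


def missing_inputs(config_dir, annotation_dir, index_dir):
    """Return the required inputs that could not be found (hypothetical helper).

    config_dir, annotation_dir and index_dir are illustrative arguments; txseq
    itself reads these locations from its own configuration.
    """
    required = [
        Path(config_dir) / "samples.tsv",    # sample table
        Path(config_dir) / "libraries.tsv",  # library table
        Path(annotation_dir),                # where pipeline_ensembl.py was run
        Path(index_dir),                     # Salmon index directory
    ]
    return [str(p) for p in required if not p.exists()]
```

An empty list means every expected path exists. It does not check that the files are well formed.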
Requirements
The following software is required:
Salmon
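One way to check that Salmon is available before starting is to look it up on the PATH and ask for its version. This is a minimal sketch, not something the pipeline provides. The executable name "salmon" is an assumption about your installation, and because the version string may be written to stdout or stderr depending on the release, both are checked.

```python
import shutil
import subprocess


def salmon_version(executable="salmon"):
    """Return the output of `salmon --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    # Some releases print the version to stderr rather than stdout.
    return (result.stdout or result.stderr).strip()
```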
Output files
The pipeline produces the following outputs:
per-sample salmon quantification results in the “salmon.dir” folder
a csvdb SQLite database that contains tables of gene- and transcript-level counts and TPMs
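Because csvdb is an ordinary SQLite file, Python's built-in sqlite3 module can inspect it. The sketch below only lists the tables. The exact table names the pipeline creates are not documented here, so check them this way rather than assuming them. Note that sqlite3.connect creates an empty database if the path does not exist.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path="csvdb"):
    """Return the names of all tables in a SQLite database file."""
    with closing(sqlite3.connect(db_path)) as con:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Once a table name is known, a query such as `con.execute("SELECT * FROM <table> LIMIT 5")` shows its layout.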
Note
It is strongly recommended to parse the raw Salmon results using the tximport Bioconductor R package for downstream analysis.
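tximport remains the recommended route. For a quick look at a single sample in Python, Salmon's standard per-sample output file, quant.sf, is a tab-separated table with the columns Name, Length, EffectiveLength, TPM and NumReads. The sketch below assumes each sample's quant.sf can be found under salmon.dir. It is an illustration, not a replacement for tximport.

```python
import csv


def read_quant_sf(path):
    """Parse a Salmon quant.sf file into {transcript_id: (TPM, NumReads)}."""
    quant = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            quant[row["Name"]] = (float(row["TPM"]), float(row["NumReads"]))
    return quant
```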
Code
- txseq.pipeline_salmon.quant(infile, outfile)
Per-sample quantitation using Salmon.
- txseq.pipeline_salmon.loadSalmonTranscriptQuant(infiles, sentinel)
Load the salmon transcript-level results.
- txseq.pipeline_salmon.loadSalmonGeneQuant(infiles, sentinel)
Load the salmon gene-level results.
- txseq.pipeline_salmon.salmonTPMs(infile, outfile)
Prepare a wide table of salmon TPMs (samples x transcripts|genes).
- txseq.pipeline_salmon.loadSalmonTPMs(infile, outfile)
Load a wide table of salmon TPMs into the project database.
- txseq.pipeline_salmon.tximeta(infile, outfile)
Run tximeta to summarise counts at the gene and transcript level.
- txseq.pipeline_salmon.quantitation()
Quantitation target.
- txseq.pipeline_salmon.loadTranscriptInfo(infile, outfile)
Load the annotations for salmon into the project database.
- txseq.pipeline_salmon.numberGenesDetected(infile, outfile)
Count the number of genes detected at copy number > 0 in each sample.
- txseq.pipeline_salmon.loadNumberGenesDetected(infile, outfile)
Load the number of genes detected in each sample into the project database.
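The per-sample quant step runs `salmon quant` once per sample. The sketch below builds such a command line using standard Salmon options (-i index, -l library type, -p threads, -1/-2 for paired-end mates or -r for single-end reads, -o output directory). It is an illustration only: the function name is hypothetical, one FASTQ file per mate is assumed, and the options txseq actually passes are set by the pipeline and its pipeline_salmon.yml.

```python
def salmon_quant_command(index, outdir, read1, read2=None, libtype="A", threads=4):
    """Build a `salmon quant` argument list for one sample (hypothetical helper)."""
    cmd = ["salmon", "quant", "-i", index, "-l", libtype, "-p", str(threads)]
    if read2 is None:
        cmd += ["-r", read1]               # single-end reads
    else:
        cmd += ["-1", read1, "-2", read2]  # paired-end mates
    cmd += ["-o", outdir]                  # Salmon writes quant.sf here
    return cmd
```

The list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues.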
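Gene-level results are usually derived from transcript-level ones by summing over the transcripts of each gene. For example, tximport's default behaviour sums both abundances and counts. The sketch below shows that summation with a transcript-to-gene mapping. It assumes the mapping is available as a plain dictionary, and it is not necessarily how loadSalmonGeneQuant or tximeta compute their values.

```python
from collections import defaultdict


def summarise_to_genes(transcript_values, tx2gene):
    """Sum per-transcript values (e.g. TPM or NumReads) to gene level.

    tx2gene maps transcript IDs to gene IDs (assumed input format).
    """
    genes = defaultdict(float)
    for tx, value in transcript_values.items():
        genes[tx2gene[tx]] += value
    return dict(genes)
```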
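The wide TPM table (samples x transcripts|genes) and the per-sample count of detected genes are both simple reshapes of per-sample values. The sketch below illustrates the idea on nested dictionaries. The function names are hypothetical, features missing from a sample are filled with 0.0 as an assumption, and "detected" is taken to mean a value above a threshold of 0, matching the copy number > 0 rule above. Which quantity serves as the copy number (for example TPM or NumReads) is not specified here.

```python
def wide_table(per_sample):
    """Turn {sample: {feature: value}} into (header, rows), one row per sample."""
    features = sorted({f for values in per_sample.values() for f in values})
    header = ["sample"] + features
    rows = [
        [sample] + [per_sample[sample].get(f, 0.0) for f in features]  # 0.0 if absent
        for sample in sorted(per_sample)
    ]
    return header, rows


def number_genes_detected(per_sample, threshold=0.0):
    """Count features with a value strictly above `threshold` in each sample."""
    return {
        sample: sum(1 for v in values.values() if v > threshold)
        for sample, values in per_sample.items()
    }
```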