Pipeline feature_counts.py
Overview
This pipeline counts the number of reads mapped to transcript/gene models. It uses the featureCounts program from the Subread package.
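As a rough illustration of what a single featureCounts run looks like, the sketch below builds a command line for one BAM file. The helper name build_featurecounts_command, its parameters, and the file names are hypothetical; the options the pipeline actually passes are controlled by its configuration file and are not shown here. The flags used (-a, -o, -t, -g, -s, -T) are standard featureCounts options.

```python
from typing import List


def build_featurecounts_command(
    bam_path: str,
    gtf_path: str,
    out_path: str,
    threads: int = 4,
    strandedness: int = 0,
) -> List[str]:
    """Return a featureCounts argument list for one BAM file (illustrative only)."""
    if strandedness not in (0, 1, 2):
        raise ValueError("strandedness must be 0, 1 or 2")
    return [
        "featureCounts",
        "-a", gtf_path,           # annotation file (GTF)
        "-o", out_path,           # output count table
        "-t", "exon",             # feature type to count
        "-g", "gene_id",          # attribute used to group features into genes
        "-s", str(strandedness),  # 0 unstranded, 1 stranded, 2 reversely stranded
        "-T", str(threads),       # number of threads
        bam_path,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine where Subread is installed.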
Configuration
The pipeline requires a configured pipeline_feature_counts.yml file.
A default configuration file can be generated by executing:
txseq feature_counts config
Inputs
The pipeline requires the following inputs:
samples.tsv: see Configuration files
txseq annotations: the location where pipeline_ensembl.py was run to prepare the annotations.
BAM files: the location of a folder containing the BAM files, named by “sample_id”.
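Before starting a run it can help to confirm that every sample has a BAM file in the input folder. The sketch below is a minimal check, assuming the files are named exactly "<sample_id>.bam"; that naming pattern and the helper missing_bam_files are assumptions made for illustration, not part of the pipeline.

```python
from pathlib import Path
from typing import Iterable, List


def missing_bam_files(sample_ids: Iterable[str], bam_dir: str) -> List[str]:
    """Return the sample ids with no '<sample_id>.bam' file in bam_dir."""
    folder = Path(bam_dir)
    # Assumption: one BAM file per sample, named '<sample_id>.bam'
    return [s for s in sample_ids if not (folder / f"{s}.bam").is_file()]
```

An empty list means that a BAM file was found for every sample id supplied.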
Requirements
The following software is required:
Subread
Output files
The pipeline produces the following outputs:
per-sample results: in the “feature.counts.dir” subdirectory
an SQLite database: in a file named “csvdb”, which contains the per-gene counts.
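The “csvdb” file can be inspected with Python's standard sqlite3 module. The table names are not documented in this section, so the sketch below only lists whatever tables are present; the helper list_tables is hypothetical.

```python
import os
import sqlite3
from typing import List


def list_tables(db_path: str = "csvdb") -> List[str]:
    """Return the names of all tables in the SQLite database at db_path."""
    # sqlite3.connect() would silently create a missing file, so check first
    if not os.path.isfile(db_path):
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Once a table name is known, its contents can be read with an ordinary SELECT statement over the same connection.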
Code
- txseq.pipeline_feature_counts.count(infile, sentinel)
Run featureCounts.
- txseq.pipeline_feature_counts.loadCounts(infiles, outfile)
Combine the count data and load it into the project database.
- txseq.pipeline_feature_counts.geneCounts(infile, outfile)
Prepare a gene-by-sample table of featureCounts counts.
- txseq.pipeline_feature_counts.loadGeneCounts(infile, outfile)
Load the gene-by-sample matrix of count data into the project database.
- txseq.pipeline_feature_counts.loadTranscriptInfo(infile, outfile)
Load the annotations for salmon into the project database.
- txseq.pipeline_feature_counts.nGenesDetected(infile, outfile)
Count the genes detected by featureCounts (count > 0) in each sample.
- txseq.pipeline_feature_counts.loadNGenesDetected(infile, outfile)
Load the numbers of genes detected into the project database.
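The gene-by-sample table and the per-sample count of detected genes can be sketched in plain Python as follows. This is an illustration of the idea, not the pipeline's own code: the function names are hypothetical, and it assumes each featureCounts table was produced from a single BAM file with integer counts, so that the last column holds that sample's counts. It relies on the standard featureCounts layout of a leading "#" comment line followed by a tab-separated header beginning with Geneid.

```python
import csv
from typing import Dict, Iterable


def read_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse one single-sample featureCounts table into {gene_id: count}."""
    # Skip the leading '#' comment line that records the command used
    rows = csv.reader((ln for ln in lines if not ln.startswith("#")), delimiter="\t")
    header = next(rows)
    gene_col = header.index("Geneid")
    count_col = len(header) - 1  # assumption: one BAM per table, counts in last column
    return {row[gene_col]: int(row[count_col]) for row in rows if row}


def gene_by_sample(per_sample: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Return {gene_id: {sample_id: count}}, using 0 where a gene is absent."""
    genes = sorted({g for counts in per_sample.values() for g in counts})
    return {g: {s: c.get(g, 0) for s, c in per_sample.items()} for g in genes}


def n_genes_detected(matrix: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Return {sample_id: number of genes with count > 0}."""
    detected: Dict[str, int] = {}
    for row in matrix.values():
        for sample, count in row.items():
            detected[sample] = detected.get(sample, 0) + int(count > 0)
    return detected
```

Each per-sample table is read with read_counts(open(path)), the results are collected in a dictionary keyed by sample id, and that dictionary is passed to gene_by_sample and then n_genes_detected.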