Pipeline feature_counts.py

Overview

This pipeline counts the reads mapped to transcript/gene models. It uses the featureCounts program from the Subread package.

Configuration

The pipeline requires a configured pipeline_feature_counts.yml file.

A default configuration file can be generated by executing:

txseq feature_counts config

Inputs

The pipeline requires the following inputs:

  1. samples.tsv: see Configuration files

  2. txseq annotations: the location where pipeline_ensembl.py was run to prepare the annotations.

  3. bam files: the location of a folder containing the bam files named by “sample_id”.
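Before launching the pipeline it can help to confirm that every sample has a matching bam file. The following is a minimal sketch, not part of txseq: it assumes that samples.tsv is tab-separated with a "sample_id" column and that each bam file is named "<sample_id>.bam". The function name missing_bams is hypothetical.

```python
import csv
from pathlib import Path


def missing_bams(samples_tsv, bam_dir, id_column="sample_id"):
    """Return the sample ids that have no '<sample_id>.bam' file in bam_dir.

    Assumptions (not confirmed by the txseq documentation): samples.tsv is
    tab-separated, has a header row containing `id_column`, and the bam files
    carry a plain '.bam' extension.
    """
    with open(samples_tsv, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        sample_ids = [row[id_column] for row in reader]

    bam_dir = Path(bam_dir)
    return [s for s in sample_ids if not (bam_dir / f"{s}.bam").is_file()]
```

An empty list means that every sample listed in samples.tsv has a bam file in the folder.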

Requirements

The following software is required:

  1. Subread

Output files

The pipeline produces the following outputs:

  1. per-sample results: in the “feature.counts.dir” subdirectory

  2. An SQLite database: a file named “csvdb” that contains the per-gene counts.
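The table names inside “csvdb” are not listed here, so one way to find them is to ask SQLite itself. This is a small sketch using Python's standard sqlite3 module; the helper name list_tables is hypothetical and the default path assumes the script is run from the pipeline's working directory.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path="csvdb"):
    """Return the names of all tables in an SQLite database, sorted by name."""
    # closing() guarantees the connection is closed; a bare `with conn:` only
    # manages transactions.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Once the table holding the per-gene counts has been identified, it can be read with an ordinary SELECT statement.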

Code

txseq.pipeline_feature_counts.count(infile, sentinel)

Run featureCounts.
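For orientation, a featureCounts call for a single bam file might be assembled as below. This is an illustrative sketch, not the command the pipeline actually builds: the function name and defaults are hypothetical, and only standard featureCounts options are used (-a annotation file, -o output file, -T threads, -s strandedness, -t feature type, -g attribute used to group features).

```python
def feature_counts_command(bam, gtf, out, threads=1, strand=0,
                           feature_type="exon", attribute="gene_id"):
    """Build an argument list for one featureCounts run.

    `strand` follows the featureCounts convention: 0 unstranded,
    1 stranded, 2 reversely stranded. Paired-end options are left out
    because they depend on the library and the Subread version.
    """
    if strand not in (0, 1, 2):
        raise ValueError("strand must be 0, 1 or 2")
    return [
        "featureCounts",
        "-a", str(gtf),
        "-o", str(out),
        "-T", str(threads),
        "-s", str(strand),
        "-t", feature_type,
        "-g", attribute,
        str(bam),
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where Subread is installed.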

txseq.pipeline_feature_counts.loadCounts(infiles, outfile)

Combine and load count data in the project database.

txseq.pipeline_feature_counts.geneCounts(infile, outfile)

Prepare a gene-by-sample table of featureCounts counts.
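As an illustration of what a gene-by-sample table involves, the sketch below merges per-sample featureCounts output files in plain Python. It is not the pipeline's own implementation. It assumes the standard featureCounts output layout (a leading "#" comment line, a header row, the gene id in the first column and the count for a single bam file in the last column) and integer counts; the function names are hypothetical.

```python
import csv


def read_counts(path):
    """Read one featureCounts output file into a {gene_id: count} dict."""
    counts = {}
    with open(path, newline="") as handle:
        # Skip the '#' comment line that featureCounts writes first.
        data_lines = (line for line in handle if not line.startswith("#"))
        rows = csv.reader(data_lines, delimiter="\t")
        next(rows)  # header: Geneid, Chr, Start, End, Strand, Length, <bam>
        for row in rows:
            if row:
                counts[row[0]] = int(row[-1])
    return counts


def gene_by_sample(paths_by_sample):
    """Combine per-sample counts into (sample_ids, {gene_id: [counts...]}).

    Counts are ordered like sample_ids; a gene absent from a file gets 0.
    """
    samples = sorted(paths_by_sample)
    per_sample = {s: read_counts(paths_by_sample[s]) for s in samples}
    genes = sorted(set().union(*per_sample.values()))
    table = {g: [per_sample[s].get(g, 0) for s in samples] for g in genes}
    return samples, table
```

Keeping the sample order in a separate list makes it straightforward to write the table out with one header row and one row per gene.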

txseq.pipeline_feature_counts.loadGeneCounts(infile, outfile)

Load the gene-by-sample matrix of count data in the project database.

txseq.pipeline_feature_counts.loadTranscriptInfo(infile, outfile)

Load the annotations for salmon into the project database.

txseq.pipeline_feature_counts.nGenesDetected(infile, outfile)

Count the genes detected by featureCounts (counts > 0) in each sample.
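The "counts > 0" rule can be sketched as follows. This is an illustration rather than the pipeline's code; it assumes the counts are held as a list of sample ids plus a dict mapping each gene id to a list of counts in the same sample order, and the function name is hypothetical.

```python
def n_genes_detected(samples, table, threshold=0):
    """Return {sample_id: number of genes with count > threshold}.

    `samples` is a list of sample ids; `table` maps gene_id to a list of
    counts ordered like `samples`.
    """
    return {
        sample: sum(1 for counts in table.values() if counts[i] > threshold)
        for i, sample in enumerate(samples)
    }


# Example with made-up counts for three genes in two samples.
example = {"geneA": [5, 0], "geneB": [0, 7], "geneC": [3, 0]}
detected = n_genes_detected(["s1", "s2"], example)
# detected == {"s1": 2, "s2": 1}
```

The threshold parameter is included only to make the "> 0" rule explicit; raising it gives a stricter definition of detection.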

txseq.pipeline_feature_counts.loadNGenesDetected(infile, outfile)

Load the numbers of genes expressed into the project database.