Pipeline bam_qc.py

Overview

This pipeline computes QC statistics from BAM files using the Picard toolkit and some custom scripts.

Configuration

The pipeline requires a configured pipeline_bam_qc.yml file.

Default configuration files can be generated by executing:

txseq pipeline_bam_qc config

Inputs

The pipeline requires the following inputs:

samples.tsv: see Configuration files
BAM files: the location of a folder containing the BAM files, named by “sample_id”.
txseq annotations: the location where pipeline_ensembl.py was run to prepare the annotations.
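
Before launching the pipeline it can help to confirm that every sample has a BAM file. The sketch below is not part of txseq. It assumes that samples.tsv is tab-separated with a "sample_id" column and that the files are named <sample_id>.bam; both assumptions should be checked against the actual configuration. The function name find_missing_bams is hypothetical.

```python
import csv
from pathlib import Path


def find_missing_bams(samples_tsv, bam_dir, suffix=".bam"):
    """Return the sample_ids listed in samples_tsv that have no BAM in bam_dir.

    Assumptions (not confirmed by the pipeline documentation):
    samples_tsv has a "sample_id" column and BAMs are named <sample_id><suffix>.
    """
    with open(samples_tsv, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        sample_ids = [row["sample_id"] for row in reader]

    bam_dir = Path(bam_dir)
    return [s for s in sample_ids if not (bam_dir / f"{s}{suffix}").exists()]
```

An empty list means every listed sample has a matching file under the assumed naming scheme.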

Requirements

The following software is required:

Picard

Output files

The pipeline produces the following outputs:

Picard rnaseq metrics: in the bam.qc.dir/rnaseq.metrics.dir
Picard alignment summary metrics: in the bam.qc.dir/alignment.summary.metrics.dir
Fraction of spliced reads: in the bam.qc.dir/fraction.spliced.dir
An SQLite database: in a file named “csvdb”, which contains tables of the QC metrics, with key metrics summarised in the “qc_summary” table.
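
As a minimal sketch of how the summary could be inspected after a run, the following uses Python's standard sqlite3 module to read the “qc_summary” table from “csvdb”. The helper name read_qc_summary is hypothetical, and the columns returned depend on the pipeline version, so none are assumed here.

```python
import sqlite3


def read_qc_summary(db_path="csvdb"):
    """Return (column_names, rows) for the qc_summary table in db_path."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute("SELECT * FROM qc_summary")
        columns = [description[0] for description in cur.description]
        rows = cur.fetchall()
    finally:
        con.close()
    return columns, rows
```

Calling read_qc_summary() from the directory containing “csvdb” returns the column names and one tuple per row of the table.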

Code

txseq.pipeline_bamqc.flatGeneset(infile, sentinel): Prepare a flat version of the geneset for the Picard CollectRnaSeqMetrics module.

txseq.pipeline_bamqc.collectRnaSeqMetrics(infile, sentinel): Run Picard CollectRnaSeqMetrics on the bam files.

txseq.pipeline_bamqc.loadCollectRnaSeqMetrics(infiles, outfile): Load the RNA-seq metrics to the project database.

txseq.pipeline_bamqc.threePrimeBias(infile, outfile): Compute a sensible three prime bias metric from the Picard coverage histogram.
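
The formula used by threePrimeBias is not stated here. Purely as an illustration of the idea, the sketch below takes a coverage histogram keyed by normalized transcript position (0 to 100, 5' to 3') and returns the ratio of mean coverage near the 3' end to mean coverage over the transcript body. The function name, the window boundaries (20 and 80) and the ratio itself are assumptions, not the pipeline's definition.

```python
def three_prime_bias(coverage, body_start=20, tail_start=80):
    """Ratio of mean 3'-end coverage to mean transcript-body coverage.

    coverage: dict mapping normalized position (0-100) to normalized coverage.
    The windows are illustrative assumptions:
      body = body_start <= position < tail_start
      tail = position >= tail_start
    """
    body = [c for p, c in coverage.items() if body_start <= p < tail_start]
    tail = [c for p, c in coverage.items() if p >= tail_start]
    if not body or not tail:
        return float("nan")

    body_mean = sum(body) / len(body)
    tail_mean = sum(tail) / len(tail)
    # Avoid division by zero when the transcript body has no coverage.
    return tail_mean / body_mean if body_mean else float("nan")
```

Under this definition a value near 1 indicates even coverage, and values above 1 indicate enrichment towards the 3' end.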

txseq.pipeline_bamqc.loadThreePrimeBias(infiles, outfile): Load the metrics in the project database.

txseq.pipeline_bamqc.estimateLibraryComplexity(infile, sentinel): Run Picard EstimateLibraryComplexity on the BAM files.

txseq.pipeline_bamqc.loadEstimateLibraryComplexity(infiles, outfile): Load the complexity metrics to a single table in the project database.

txseq.pipeline_bamqc.alignmentSummaryMetrics(infile, sentinel): Run Picard CollectAlignmentSummaryMetrics on the BAM files.

txseq.pipeline_bamqc.loadAlignmentSummaryMetrics(infiles, outfile): Load the alignment summary metrics to a single table in the project database.

txseq.pipeline_bamqc.insertSizeMetricsAndHistograms(infile, sentinels): Run Picard CollectInsertSizeMetrics on the BAM files to collect summary metrics and histograms.

txseq.pipeline_bamqc.loadInsertSizeMetrics(infiles, outfile): Load the insert size metrics to a single table of the project database.

txseq.pipeline_bamqc.loadInsertSizeHistograms(infiles, outfile): Load the histograms to a single table of the project database.
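
Picard metrics files hold the summary metrics and the histogram in one tab-separated text file: a "## METRICS CLASS" line introduces the metrics table, a "## HISTOGRAM" line introduces the histogram, and a blank line ends each section. The sketch below shows one way such a file could be split before loading. It is not the pipeline's own loader, and the name parse_picard_metrics is hypothetical.

```python
def parse_picard_metrics(text):
    """Split Picard metrics text into 'metrics' and 'histogram' row dicts.

    All values are returned as strings; type conversion is left to the caller.
    """
    sections = {"metrics": [], "histogram": []}
    current, header = None, None
    for line in text.splitlines():
        if line.startswith("## METRICS CLASS"):
            current, header = "metrics", None
        elif line.startswith("## HISTOGRAM"):
            current, header = "histogram", None
        elif not line.strip():
            current = None  # a blank line closes the open section
        elif line.startswith("#") or current is None:
            continue  # other comment lines, or text outside a section
        elif header is None:
            header = line.split("\t")  # first line of a section is its header
        else:
            sections[current].append(dict(zip(header, line.split("\t"))))
    return sections
```

Each returned row is a dict keyed by the section's column names, which maps naturally onto one database row per sample or per histogram bin.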

txseq.pipeline_bamqc.fractionSpliced(infile, sentinel): Compute the fraction of reads containing a splice junction. Paired-endedness is ignored, and only uniquely mapping reads are considered.
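
To illustrate the calculation, the sketch below counts spliced reads from SAM-format text lines (for example the output of samtools view). A read is treated as spliced if its CIGAR string contains an N operation. It is treated as uniquely mapping if it is a primary, mapped alignment carrying the tag NH:i:1; this is an assumption that only holds when the aligner writes the NH tag. Each read is counted on its own, so pairing is ignored. The pipeline's actual implementation may differ.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar):
    """True if the CIGAR string contains an N (skipped region) operation."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def fraction_spliced(sam_lines):
    """Fraction of uniquely mapped reads that span a splice junction."""
    total = spliced = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        # Assumption: the aligner reports the number of hits in the NH tag.
        if "NH:i:1" not in fields[11:]:
            continue
        total += 1
        spliced += is_spliced(cigar)
    return spliced / total if total else float("nan")
```

The function returns NaN when no read passes the filters, which avoids reporting a misleading zero for an empty or unmapped sample.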

txseq.pipeline_bamqc.loadFractionSpliced(infiles, outfile): Load fractions of spliced reads to a single table of the project database.

txseq.pipeline_bamqc.loadSampleInformation(infile, outfile): Load the sample information table to the project database.

txseq.pipeline_bamqc.qcSummary(infiles, outfile): Create a summary table of relevant QC metrics.

txseq.pipeline_bamqc.loadQCSummary(infile, outfile): Load summary to project database.

txseq.pipeline_bamqc.qc(): Target for executing quality control.