Pipeline bam_qc.py

Overview

This pipeline computes QC statistics from BAM files. It uses the Picard toolkit and custom scripts.

Configuration

The pipeline requires a configured pipeline_bam_qc.yml file.

Default configuration files can be generated by executing:

txseq pipeline_bam_qc config

Inputs

The pipeline requires the following inputs:

  1. samples.tsv: see Configuration files

  2. BAM files: the location of a folder containing the BAM files, named by “sample_id”.

  3. txseq annotations: the location where pipeline_ensembl.py was run to prepare the annotations.
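To illustrate how the first two inputs relate, the sketch below checks that every sample listed in samples.tsv has a matching BAM file. It assumes that samples.tsv is tab-separated with a `sample_id` column and that BAM files are named `<sample_id>.bam`; both conventions, and the helper name `find_missing_bams`, are hypothetical and not taken from txseq itself.

```python
import csv
import os


def find_missing_bams(samples_tsv, bam_dir):
    """Return the sample_ids in samples_tsv that have no BAM file in bam_dir.

    Assumes (hypothetically) a 'sample_id' column and '<sample_id>.bam' names.
    """
    missing = []
    with open(samples_tsv, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            sample_id = row["sample_id"]
            bam_path = os.path.join(bam_dir, sample_id + ".bam")
            if not os.path.exists(bam_path):
                missing.append(sample_id)
    return missing
```

Running such a check before launching the pipeline surfaces naming mismatches early, rather than as a failed Picard job part-way through.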

Requirements

The following software is required:

  1. Picard

Output files

The pipeline produces the following outputs:

  1. Picard rnaseq metrics: in the bam.qc.dir/rnaseq.metrics.dir

  2. Picard alignment summary metrics: in the bam.qc.dir/alignment.summary.metrics.dir

  3. Fraction of spliced reads: in the bam.qc.dir/fraction.spliced.dir

  4. An sqlite database: in a file named “csvdb” which contains tables of the QC metrics, with key metrics summarised in the “qc_summary” table.
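Because csvdb is an ordinary SQLite file, the summary can be inspected with Python's built-in sqlite3 module. This sketch relies only on the file name csvdb and the qc_summary table named above; the table's columns are not documented here, so rows are returned as dictionaries keyed by whatever columns exist. The function name `read_qc_summary` is illustrative.

```python
import sqlite3


def read_qc_summary(db_path="csvdb"):
    """Return every row of the qc_summary table as a list of dicts."""
    connection = sqlite3.connect(db_path)
    try:
        connection.row_factory = sqlite3.Row  # allows access by column name
        rows = connection.execute("SELECT * FROM qc_summary").fetchall()
        return [dict(row) for row in rows]
    finally:
        connection.close()
```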

Code

txseq.pipeline_bamqc.flatGeneset(infile, sentinel)

Prepare a refFlat-format ("flat") version of the geneset, as required by the Picard CollectRnaSeqMetrics module.
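Picard CollectRnaSeqMetrics reads gene models in the UCSC refFlat layout: gene name, transcript name, chromosome, strand, transcript start and end, CDS start and end, exon count, exon starts and exon ends, with 0-based starts. The sketch below converts GTF exon and CDS records into that layout. It assumes Ensembl-style `transcript_id` and `gene_name` attributes; it is an illustration of the format, not txseq's flatGeneset implementation, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_name "ACTB";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_to_refflat(gtf_lines):
    """Convert GTF exon and CDS records into refFlat rows (tab-joined strings)."""
    info = {}
    exons = defaultdict(list)
    cds = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        feature = fields[2]
        if feature not in ("exon", "CDS"):
            continue  # gene/transcript/UTR lines are not needed here
        attributes = dict(ATTRIBUTE_RE.findall(fields[8]))
        tx_id = attributes["transcript_id"]
        # Assumption: fall back to gene_id when gene_name is absent.
        gene = attributes.get("gene_name", attributes.get("gene_id", tx_id))
        info[tx_id] = (gene, fields[0], fields[6])
        # GTF is 1-based and closed; refFlat starts are 0-based, ends unchanged.
        interval = (int(fields[3]) - 1, int(fields[4]))
        (exons if feature == "exon" else cds)[tx_id].append(interval)

    rows = []
    for tx_id, (gene, chrom, strand) in info.items():
        tx_exons = sorted(exons[tx_id])
        if not tx_exons:
            continue  # a transcript needs at least one exon
        tx_start = tx_exons[0][0]
        tx_end = max(end for _, end in tx_exons)
        if cds[tx_id]:
            cds_start = min(start for start, _ in cds[tx_id])
            cds_end = max(end for _, end in cds[tx_id])
        else:
            # Common UCSC convention for non-coding transcripts.
            cds_start = cds_end = tx_end
        rows.append("\t".join([
            gene, tx_id, chrom, strand, str(tx_start), str(tx_end),
            str(cds_start), str(cds_end), str(len(tx_exons)),
            "".join(f"{start}," for start, _ in tx_exons),
            "".join(f"{end}," for _, end in tx_exons),
        ]))
    return rows
```

The trailing commas in the exon start and end lists follow the style of UCSC's own refFlat files.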

txseq.pipeline_bamqc.collectRnaSeqMetrics(infile, sentinel)

Run Picard CollectRnaSeqMetrics on the BAM files.
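A sketch of how one such call could be assembled from Python. It assumes Picard is launched as `java -jar picard.jar` and uses Picard's newer dash-prefixed argument syntax (older releases use the `KEY=VALUE` form). The helper name and the file names in the usage line, such as `geneset_flat.txt`, are hypothetical; the strandedness value must match the library protocol.

```python
import shlex

# Values accepted by Picard's STRAND_SPECIFICITY argument.
STRAND_CHOICES = {
    "NONE",
    "FIRST_READ_TRANSCRIPTION_STRAND",
    "SECOND_READ_TRANSCRIPTION_STRAND",
}


def rnaseq_metrics_command(bam, ref_flat, outfile, strand="NONE",
                           picard_jar="picard.jar"):
    """Build (but do not run) a Picard CollectRnaSeqMetrics command."""
    if strand not in STRAND_CHOICES:
        raise ValueError(f"unknown strand specificity: {strand}")
    return [
        "java", "-jar", picard_jar, "CollectRnaSeqMetrics",
        "-I", bam,
        "-O", outfile,
        "--REF_FLAT", ref_flat,
        "--STRAND_SPECIFICITY", strand,
    ]


cmd = rnaseq_metrics_command("s1.bam", "geneset_flat.txt", "s1.rnaseq.metrics")
print(shlex.join(cmd))
```

Returning an argument list rather than a shell string lets the command be passed directly to `subprocess.run(cmd, check=True)` without quoting problems.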

txseq.pipeline_bamqc.loadCollectRnaSeqMetrics(infiles, outfile)

Load the metrics into the project database.

txseq.pipeline_bamqc.threePrimeBias(infile, outfile)

Compute a three-prime bias metric from the Picard coverage histogram.
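The exact metric txseq computes is not described here. As one plausible definition, the sketch below takes the ratio of mean normalized coverage over the 3'-most positions to that over the 5'-most positions. It assumes the histogram maps normalized positions 0 to 100 (with 0 at the 5' end) to normalized coverage, and the 20-position window is an arbitrary illustrative choice.

```python
def three_prime_bias(histogram, window=20):
    """Ratio of mean 3' coverage to mean 5' coverage.

    `histogram` maps normalized position (assumed 0 = 5' end) to coverage.
    Returns NaN when the 5' window has zero coverage (ratio undefined).
    """
    positions = sorted(histogram)
    five_prime = [histogram[p] for p in positions[:window]]
    three_prime = [histogram[p] for p in positions[-window:]]
    five_mean = sum(five_prime) / len(five_prime)
    if five_mean == 0:
        return float("nan")
    return (sum(three_prime) / len(three_prime)) / five_mean
```

Values near 1 indicate even coverage along transcripts; values well above 1 suggest 3' bias, as seen with degraded RNA or 3'-targeted protocols.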

txseq.pipeline_bamqc.loadThreePrimeBias(infiles, outfile)

Load the metrics into the project database.

txseq.pipeline_bamqc.estimateLibraryComplexity(infile, sentinel)

Run Picard EstimateLibraryComplexity on the BAM files.

txseq.pipeline_bamqc.loadEstimateLibraryComplexity(infiles, outfile)

Load the complexity metrics to a single table in the project database.
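Several loaders in this module combine per-sample results into a single table. A minimal sketch of that pattern with the standard sqlite3 module follows; the function name and the example table name `library_complexity` are hypothetical, and txseq's own loaders may use different tooling.

```python
import sqlite3


def load_metrics_table(per_sample_metrics, table, connection):
    """Write {sample_id: {metric: value}} to one table with one row per sample.

    The columns are sample_id plus the union of all metric names; samples
    lacking a metric get NULL. Names are quoted but not otherwise sanitised.
    """
    columns = sorted({name for metrics in per_sample_metrics.values()
                      for name in metrics})
    column_sql = ", ".join(f'"{name}"' for name in ["sample_id"] + columns)
    placeholders = ", ".join("?" * (len(columns) + 1))
    connection.execute(f'DROP TABLE IF EXISTS "{table}"')
    # No declared column types: SQLite stores each value as supplied.
    connection.execute(f'CREATE TABLE "{table}" ({column_sql})')
    for sample_id, metrics in sorted(per_sample_metrics.items()):
        values = [sample_id] + [metrics.get(name) for name in columns]
        connection.execute(
            f'INSERT INTO "{table}" VALUES ({placeholders})', values)
    connection.commit()
```

Keeping one row per sample keyed by sample_id makes the later summary step a simple join.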

txseq.pipeline_bamqc.alignmentSummaryMetrics(infile, sentinel)

Run Picard CollectAlignmentSummaryMetrics on the BAM files.

txseq.pipeline_bamqc.loadAlignmentSummaryMetrics(infiles, outfile)

Load the alignment summary metrics into a single table in the project database.

txseq.pipeline_bamqc.insertSizeMetricsAndHistograms(infile, sentinels)

Run Picard CollectInsertSizeMetrics on the BAM files to collect summary metrics and histograms.

txseq.pipeline_bamqc.loadInsertSizeMetrics(infiles, outfile)

Load the insert size metrics into a single table in the project database.

txseq.pipeline_bamqc.loadInsertSizeHistograms(infiles, outfile)

Load the histograms into a single table in the project database.
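Stacking histograms from many samples into one table is easiest in long format, with one row per sample and insert size. The sketch below assumes histogram rows have already been parsed into dicts and that they carry `insert_size` and `All_Reads.fr_count` columns, as in Picard's output for forward-reverse pairs; the function name is hypothetical.

```python
def histogram_to_long(sample_id, rows, count_column="All_Reads.fr_count"):
    """Turn parsed histogram rows into (sample_id, insert_size, count) tuples.

    Rows with an empty count are skipped; counts written either as
    integers or with a decimal point are accepted.
    """
    records = []
    for row in rows:
        count = row.get(count_column, "")
        if count == "":
            continue
        records.append((sample_id, int(row["insert_size"]), int(float(count))))
    return records
```

The resulting tuples can be inserted with `executemany` into a table with columns sample_id, insert_size and count.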

txseq.pipeline_bamqc.fractionSpliced(infile, sentinel)

Compute the fraction of reads containing a splice junction:

  * paired-endedness is ignored

  * only uniquely mapping reads are considered.
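A sketch of the calculation under two assumptions: a read is spliced when its CIGAR string contains an `N` (skipped reference) operation, and a read is uniquely mapped when its NH tag equals 1, as set by aligners such as STAR and HISAT2. The function takes plain (CIGAR, NH) pairs so it runs without extra dependencies; with pysam these could come from each read's `cigarstring` and `get_tag("NH")`. The function name is hypothetical.

```python
import re

# One CIGAR operation; 'N' marks a skipped region (an intron) in RNA-seq.
CIGAR_OP_RE = re.compile(r"\d+([MIDNSHP=X])")


def fraction_spliced(alignments):
    """Fraction of uniquely mapped reads whose CIGAR contains an N operation.

    `alignments` yields (cigar_string, nh) pairs. Each record counts on its
    own (mates are not combined) and only records with NH == 1 are kept.
    """
    unique = spliced = 0
    for cigar, nh in alignments:
        if nh != 1 or cigar in (None, "*"):
            continue  # skip multi-mapped or unaligned records
        unique += 1
        if "N" in CIGAR_OP_RE.findall(cigar):
            spliced += 1
    return spliced / unique if unique else 0.0
```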

txseq.pipeline_bamqc.loadFractionSpliced(infiles, outfile)

Load the fractions of spliced reads into a single table in the project database.

txseq.pipeline_bamqc.loadSampleInformation(infile, outfile)

Load the sample information table to the project database.

txseq.pipeline_bamqc.qcSummary(infiles, outfile)

Create a summary table of relevant QC metrics.
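Which metrics txseq treats as relevant is not listed here. As a sketch of the mechanics only, the function below builds a qc_summary table by left-joining per-sample tables on sample_id, so every sample in the first table is kept. It assumes each table has a sample_id column and that the selected column names are distinct across tables; the function name and the table and column names in the test are hypothetical.

```python
import sqlite3


def build_qc_summary(connection, selections):
    """Create qc_summary by left-joining per-sample tables on sample_id.

    `selections` maps table name -> list of columns to keep; the first
    table supplies the set of samples.
    """
    tables = list(selections)
    base = tables[0]
    columns = [f'"{base}".sample_id AS sample_id'] + [
        f'"{t}"."{c}" AS "{c}"' for t in tables for c in selections[t]]
    joins = " ".join(
        f'LEFT JOIN "{t}" ON "{t}".sample_id = "{base}".sample_id'
        for t in tables[1:])
    connection.execute("DROP TABLE IF EXISTS qc_summary")
    connection.execute(
        f'CREATE TABLE qc_summary AS SELECT {", ".join(columns)} '
        f'FROM "{base}" {joins}')
    connection.commit()
```

Using the sample information table as the first entry would keep samples that failed a QC step, with NULLs marking the missing metrics.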

txseq.pipeline_bamqc.loadQCSummary(infile, outfile)

Load the summary table into the project database.

txseq.pipeline_bamqc.qc()

Target for executing quality control.