Pipeline bam_qc.py
Overview
This pipeline computes QC statistics from BAM files using the Picard toolkit and some custom scripts.
Configuration
The pipeline requires a configured pipeline_bam_qc.yml file.
Default configuration files can be generated by executing:
txseq pipeline_bam_qc config
Inputs
The pipeline requires the following inputs:
samples.tsv: see Configuration files
BAM files: the location of a folder containing the BAM files, named by “sample_id”.
txseq annotations: the location where pipeline_ensembl.py was run to prepare the annotations.
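Before launching the pipeline it can be useful to confirm that every sample has a BAM file. The following is a minimal sketch, not part of txseq: the function name find_missing_bams is hypothetical, and it assumes that samples.tsv is tab-separated with a "sample_id" column and that each BAM file is named "<sample_id>.bam". Adjust both assumptions to match your project.

```python
import csv
from pathlib import Path


def find_missing_bams(samples_tsv, bam_dir, id_column="sample_id", suffix=".bam"):
    """Return the sample ids that have no matching BAM file in bam_dir.

    Assumptions (not confirmed by the pipeline documentation):
    - samples_tsv is tab-separated with a header containing id_column
    - BAM files are named <sample_id><suffix>
    """
    with open(samples_tsv, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        sample_ids = [row[id_column] for row in reader]

    bam_dir = Path(bam_dir)
    # Keep only the ids for which the expected file is absent.
    return [s for s in sample_ids if not (bam_dir / f"{s}{suffix}").is_file()]
```

An empty list means that every sample listed in samples.tsv has a BAM file under the assumed naming scheme.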
Requirements
The following software is required:
Picard
Output files
The pipeline produces the following outputs:
Picard rnaseq metrics: in the bam.qc.dir/rnaseq.metrics.dir
Picard alignment summary metrics: in the bam.qc.dir/alignment.summary.metrics.dir
Fraction of spliced reads: in the bam.qc.dir/fraction.spliced.dir
An SQLite database: a file named “csvdb” containing tables of the QC metrics, with key metrics summarised in the “qc_summary” table.
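The summary table can be inspected with Python's standard sqlite3 module. This is a minimal sketch: the helper name read_qc_summary is hypothetical, and because the column names of “qc_summary” are not listed here, the code reads them from the database rather than assuming them.

```python
import sqlite3


def read_qc_summary(db_path="csvdb", table="qc_summary"):
    """Return (column_names, rows) for the given table of the QC database."""
    con = sqlite3.connect(db_path)
    try:
        cursor = con.execute(f'SELECT * FROM "{table}"')
        # cursor.description holds one 7-tuple per column; the name is first.
        columns = [d[0] for d in cursor.description]
        rows = cursor.fetchall()
    finally:
        con.close()
    return columns, rows


if __name__ == "__main__":
    columns, rows = read_qc_summary()
    print("\t".join(columns))
    for row in rows:
        print("\t".join(str(value) for value in row))
```

Run it from the directory that contains “csvdb”, or pass the path to the database as db_path.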
Code
- txseq.pipeline_bamqc.flatGeneset(infile, sentinel)
Prepare a flat version of the geneset for the Picard CollectRnaSeqMetrics module.
- txseq.pipeline_bamqc.collectRnaSeqMetrics(infile, sentinel)
Run Picard CollectRnaSeqMetrics on the BAM files.
- txseq.pipeline_bamqc.loadCollectRnaSeqMetrics(infiles, outfile)
Load the RNA-seq metrics into the project database.
- txseq.pipeline_bamqc.threePrimeBias(infile, outfile)
Compute a sensible three prime bias metric from the Picard coverage histogram.
- txseq.pipeline_bamqc.loadThreePrimeBias(infiles, outfile)
Load the metrics in the project database.
- txseq.pipeline_bamqc.estimateLibraryComplexity(infile, sentinel)
Run Picard EstimateLibraryComplexity on the BAM files.
- txseq.pipeline_bamqc.loadEstimateLibraryComplexity(infiles, outfile)
Load the complexity metrics to a single table in the project database.
- txseq.pipeline_bamqc.alignmentSummaryMetrics(infile, sentinel)
Run Picard CollectAlignmentSummaryMetrics on the BAM files.
- txseq.pipeline_bamqc.loadAlignmentSummaryMetrics(infiles, outfile)
Load the alignment summary metrics to a single table in the project database.
- txseq.pipeline_bamqc.insertSizeMetricsAndHistograms(infile, sentinels)
Run Picard CollectInsertSizeMetrics on the BAM files to collect summary metrics and histograms.
- txseq.pipeline_bamqc.loadInsertSizeMetrics(infiles, outfile)
Load the insert size metrics to a single table of the project database.
- txseq.pipeline_bamqc.loadInsertSizeHistograms(infiles, outfile)
Load the histograms to a single table of the project database.
- txseq.pipeline_bamqc.fractionSpliced(infile, sentinel)
Compute the fraction of reads containing a splice junction. Paired-endedness is ignored and only uniquely mapping reads are considered.
- txseq.pipeline_bamqc.loadFractionSpliced(infiles, outfile)
Load fractions of spliced reads to a single table of the project database.
- txseq.pipeline_bamqc.loadSampleInformation(infile, outfile)
Load the sample information table to the project database.
- txseq.pipeline_bamqc.qcSummary(infiles, outfile)
Create a summary table of relevant QC metrics.
- txseq.pipeline_bamqc.loadQCSummary(infile, outfile)
Load summary to project database.
- txseq.pipeline_bamqc.qc()
Target for executing quality control.
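The idea behind the fractionSpliced task listed above can be illustrated with a short sketch. This is not the pipeline's implementation. It works on SAM-format text lines (for example, the output of samtools view) and rests on two assumptions: a read is treated as spliced when its CIGAR string contains an N operation, and as uniquely mapping when it carries the optional tag NH:i:1, which only some aligners emit. Each alignment record is counted on its own, so paired-endedness is ignored.

```python
def fraction_spliced(sam_lines):
    """Fraction of uniquely mapped reads whose CIGAR contains an N (skipped region).

    Assumption: unique mapping is indicated by the optional tag NH:i:1.
    Records without that tag are skipped.
    """
    total = 0
    spliced = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        cigar = fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        # Optional tags start at the twelfth column.
        if "NH:i:1" not in fields[11:]:
            continue
        total += 1
        if "N" in cigar:
            spliced += 1
    return spliced / total if total else float("nan")
```

A possible use is to pipe the output of samtools view into a script that passes sys.stdin to this function.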