Workflow Overview
=================

Introduction
------------

Txseq is designed to efficiently parallelise the processing of bulk RNA-sequencing data on compute clusters. The workflow can start from either FASTQ or BAM inputs.

1. Assessment of read quality
------------------------------

Read quality can be assessed using the `FASTQC quality control tool `_ with :doc:`pipeline_fastqc.py `.

2. Mapping and Quantitation
---------------------------

Txseq supports the following workflows:

#. `Salmon `_ and `tximeta `_ for fast, sensitive and accurate pseudo-alignment-based quantitation (see :doc:`pipeline_salmon.py ` for more details).

#. `Hisat2 `_ and `featureCounts `_ for more traditional mapping-based quantitation (see :doc:`pipeline_hisat.py ` and :doc:`pipeline_feature_counts ` for more details).

It is recommended to run both workflows. The BAM files generated by Hisat2 are required to generate post-mapping QC statistics with Picard. These statistics give essential insight into library quality that cannot be obtained from Salmon analysis alone.

3. Post-mapping QC
------------------

Useful insight can be gained from examining read mapping statistics. Txseq can compute a suite of metrics using the 'CollectRnaSeqMetrics', 'EstimateLibraryComplexity', 'CollectAlignmentSummaryMetrics' and 'CollectInsertSizeMetrics' tools from the `Picard toolkit `_. It also computes the fraction of spliced reads. This functionality is implemented in :doc:`pipeline_bam_qc`.

4. Downstream analysis
----------------------

For statistical and exploratory data analysis it is recommended to use the tximeta length-corrected count estimates from Salmon. For visualising gene expression levels, it is recommended to use Salmon TPMs after applying an inter-sample normalisation routine such as upper-quartile normalisation.

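As an illustration of the inter-sample normalisation step, the following is a minimal sketch of one common form of upper-quartile normalisation, written in plain Python. It is not Txseq's own implementation. The function names, the toy TPM values, the choice to compute the upper quartile over non-zero values only, and the rescaling to the mean upper quartile are all assumptions made for this example.

```python
from statistics import mean, quantiles


def upper_quartile(values):
    """Return the 75th percentile of the non-zero entries in ``values``."""
    nonzero = [v for v in values if v > 0]
    if len(nonzero) < 2:
        raise ValueError("need at least two non-zero values")
    # quantiles(n=4) returns the three quartile cut points; index 2 is Q3.
    return quantiles(nonzero, n=4, method="inclusive")[2]


def upper_quartile_normalise(tpm_by_sample):
    """Scale each sample so that all samples share the same upper quartile.

    ``tpm_by_sample`` maps a sample name to a list of TPM values, with
    genes in the same order for every sample (hypothetical layout).
    """
    uq = {name: upper_quartile(vals) for name, vals in tpm_by_sample.items()}
    # Rescale to the mean upper quartile so values stay on a TPM-like scale.
    target = mean(list(uq.values()))
    return {
        name: [x * target / uq[name] for x in vals]
        for name, vals in tpm_by_sample.items()
    }


# Toy data: sample_b has exactly twice the signal of sample_a.
tpm = {
    "sample_a": [0.0, 10.0, 20.0, 30.0, 40.0],
    "sample_b": [0.0, 20.0, 40.0, 60.0, 80.0],
}
normalised = upper_quartile_normalise(tpm)
```

After normalisation the two toy samples have the same upper quartile, so a global difference in signal between samples no longer dominates a plot of expression levels.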
Examples of how the outputs can be used to assess read quality, perform exploratory analysis and perform differential expression analysis are provided as R markdown notebooks for the :doc:`mouse hsc example `.