Welcome to the GitHub repository for DiMSum: An error model and pipeline for analyzing deep mutational scanning (DMS) data and diagnosing common experimental pathologies.
The DiMSum pipeline processes raw sequencing reads (in FASTQ format) or variant counts from DMS experiments to calculate estimates of variant fitness (and associated error). These estimates are suitable for use in downstream analyses of epistasis and protein structure determination.
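As a rough illustration of what a fitness estimate is, the sketch below applies the log-enrichment definition commonly used for DMS data: fitness = ln(output count / input count) for a variant, minus the same quantity for the wild type. The wild type therefore scores 0, and variants depleted during selection score below 0. This is not DiMSum's implementation. DiMSum's estimates also come with an error model, which this sketch ignores. The function name `fitness_sketch` and the three-column count table are hypothetical, and zero counts are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of DiMSum: log-enrichment fitness of each variant
# relative to wild type, fitness = ln(out/in) - ln(out_wt/in_wt).
# Input: a tab-separated table with columns sequence, input_count, output_count
# (an assumed format). Counts must be positive; zero counts are not handled.
fitness_sketch() {
  local wt="$1" table="$2"
  awk -v wt="$wt" 'BEGIN { FS = "\t" }
    {
      # Store every row and remember the wild-type output/input ratio.
      seq[NR] = $1; in_count[NR] = $2; out_count[NR] = $3
      if ($1 == wt) wt_ratio = $3 / $2
    }
    END {
      if (wt_ratio == "") { print "wild-type sequence not found" > "/dev/stderr"; exit 1 }
      for (i = 1; i <= NR; i++)
        printf "%s\t%.4f\n", seq[i], log(out_count[i] / in_count[i]) - log(wt_ratio)
    }' "$table"
}

# Example: the wild-type count doubles between input and output; the variant's halves.
counts="$(mktemp)"
printf 'AGCTAGCT\t100\t200\nAGCTAGTT\t100\t50\n' > "$counts"
fitness_sketch AGCTAGCT "$counts"   # prints AGCTAGCT 0.0000 and AGCTAGTT -1.3863 (tab-separated)
rm -f "$counts"
```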
The DiMSum pipeline consists of five stages grouped into two modules that can be run independently:
Further details of individual DiMSum pipeline stages can be found here.
The easiest way to install DiMSum is by using the bioconda package.
conda install -c bioconda r-dimsum
See the full Installation Instructions for further details and alternative installation options.
In the example below, DiMSum will obtain variant sequences by aligning paired-end reads in the directory "FASTQ_dir", count variant occurrences for all samples specified in the supplied Experimental Design File ("experimentDesign.txt") and calculate fitness (and error) for all variants relative to the indicated wild-type sequence.
DiMSum --fastqFileDir FASTQ_dir --experimentDesignPath experimentDesign.txt --wildtypeSequence AGCTAGCT
By default, output files are saved to the folder "DiMSum_Project" in the current working directory.
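Because DiMSum takes the list of samples to process from the Experimental Design File, a quick check that every listed FASTQ file is present in the FASTQ directory can catch path mistakes before a long run starts. The helper below is a hypothetical sketch (`check_fastq_files` is not part of DiMSum). It assumes a tab-separated design file whose header includes `pair1` and `pair2` columns naming the read files relative to the FASTQ directory; check the DiMSum documentation for the exact file format.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of DiMSum: confirm that every FASTQ
# file named in the experimental design file exists in the FASTQ directory.
# Assumes a tab-separated design file whose header contains columns named
# "pair1" and "pair2" holding file names relative to the FASTQ directory.
# Returns 0 if all files exist, 1 if any are missing, and 2 if the design file
# cannot be read or lacks those columns.
check_fastq_files() {
  local fastq_dir="$1" design="$2" files missing
  # List the pair1/pair2 entries of every data row, one file name per line.
  files="$(awk 'BEGIN { FS = "\t" }
    NR == 1 {
      for (i = 1; i <= NF; i++) { if ($i == "pair1") p1 = i; if ($i == "pair2") p2 = i }
      if (!p1 || !p2) { print "pair1/pair2 columns not found" > "/dev/stderr"; exit 1 }
      next
    }
    { print $p1; print $p2 }' "$design")" || return 2
  # Collect the paths that do not exist as regular files.
  missing="$(printf '%s\n' "$files" | while IFS= read -r f; do
    if [ -n "$f" ] && [ ! -f "$fastq_dir/$f" ]; then
      printf '%s\n' "$fastq_dir/$f"
    fi
  done)"
  if [ -n "$missing" ]; then
    printf 'missing FASTQ files:\n%s\n' "$missing" >&2
    return 1
  fi
}
```

For the example command above, this would be run as `check_fastq_files FASTQ_dir experimentDesign.txt` before calling DiMSum.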
All bug reports are highly appreciated. You can submit a bug report here on GitHub as an issue, or send an email to firstname.lastname@example.org.
Please cite the following publication if you use DiMSum:
Faure, A.J., Schmiedel, J.M., Baeza-Centurion, P. & Lehner, B. DiMSum: an error model and pipeline for analyzing deep mutational scanning data and diagnosing common experimental pathologies. Genome Biol 21, 207 (2020). doi:10.1186/s13059-020-02091-3
(Vector illustration credit: Vecteezy!)