README.md

TraceQC

TraceQC is a R package for quality control (QC) of CRISPR Lineage Tracing Sequence Data.

Installation

if(!requireNamespace("devtools", quietly=TRUE)) install.packages("devtools")
devtools::install_github("LiuzLab/TraceQC")

To install the Python packages for traceQC, run:

pip install biopython pandas tqdm pysam

Tutorial

The tutorial of TraceQC pipeline for bulk DNA sequencing is here. The dataset is sampled from hgRNA dataset.

The tutorial of TraceQC pipeline for single-cell RNA sequencing is here. The dataset is sampled from Carlin dataset.

TraceQC annotated reference format

The reference is a text file which contains information as follows:

ATGGACTATCATATGCTTACCG...CCGGTAGACGCACCTCCACCCCACAGTGGGGTTAGAGCTAGAAATA
target 23 140

The first line of the reference file is required should be the construct sequence. The second line is also required should be the target barcode region of the construct. In this lines, two numbers next to a region name specify the start and end locations of the region. Locations should be 1-based, i.e. the first location is indicated as 1. Users can optionally add additional regions such as spacer region or PAM region in the same format. Here is an example of the refenence file with additional regions:

ATGGACTATCATATGCTTACCG...CCGGTAGACGCACCTCCACCCCACAGTGGGGTTAGAGCTAGAAATA
target 24 140
spacer 88 107
PAM 108 110

The examples of annotated hgRNA reference sequence is aviable here. The examples of annotated Carlin reference sequence is aviable here.

TraceQC output format

| Column | Description | | ----------- | ----------- | | character | A mutation identification string | | type | The type of mutation (deletion, insertion and substitution). | | start | The starting positioin of mutation. | | length | The length of mutation. | | alt | The altered sequence. | | count | The read count of mutation. | | cell | The cell IDs that contain this mutation. |

Documentation

The full documentation of TraceQC functions is available here.

References

Kalhor, R., Mali, P., & Church, G. M. (2017). Rapidly evolving homing CRISPR barcodes. Nature methods, 14(2), 195-200.

Bowling, S., Sritharan, D., Osorio, F. G., Nguyen, M., Cheung, P., Rodriguez-Fraticelli, A., ... & Camargo, F. D. (2020). An engineered CRISPR-Cas9 mouse line for simultaneous readout of lineage histories and gene expression profiles in single cells. Cell, 181(6), 1410-1422.



LiuzLab/TraceQC documentation built on April 19, 2022, 1:29 p.m.