sc_trim_barcode: sc_trim_barcode
In LuyiTian/scPipe: Pipeline for single cell multi-omic data pre-processing

R Documentation

Description

Reformat fastq files so barcode and UMI sequences are moved from the sequence into the read name.

Usage

sc_trim_barcode(
  outfq,
  r1,
  r2 = NULL,
  read_structure = list(bs1 = -1, bl1 = 0, bs2 = 6, bl2 = 8, us = 0, ul = 6),
  filter_settings = list(rmlow = TRUE, rmN = TRUE, minq = 20, numbq = 2)
)

Arguments

`outfq`	the output fastq file, in which the barcode and UMI have been moved into the read name. Files ending in `.gz` will be automatically compressed.
`r1`	read one for paired-end reads. This read should contain the transcript.
`r2`	read two for paired-end reads, NULL for single-end reads. (default: NULL)
`read_structure`	a list containing the read structure configuration. `bs1`: starting position of the barcode in read one, or -1 if there is no barcode in read one; `bl1`: length of the barcode in read one (if there is no barcode in read one, this number is used for trimming the beginning of read one); `bs2`: starting position of the barcode in read two; `bl2`: length of the barcode in read two; `us`: starting position of the UMI; `ul`: length of the UMI.
`filter_settings`	a list containing the read filter settings. `rmlow`: whether to remove low quality reads; `rmN`: whether to remove reads that contain N in the UMI or cell barcode; `minq`: the minimum base quality allowed; `numbq`: the maximum number of bases allowed to have quality below `minq`.
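
The filter settings can be read as a simple rule, sketched below in base R. This is an illustration only and not scPipe's implementation: the helper `passes_filter` is hypothetical, the quality strings are made up, Phred+33 encoding is assumed, and it is assumed that a read is dropped when more than `numbq` of the checked bases fall below `minq`. Which bases scPipe actually checks is not stated in this page.

```r
# Hypothetical helper: TRUE if a barcode/UMI segment would be kept.
# seq and qual are the bases and the quality string of that segment.
passes_filter <- function(seq, qual, rmlow = TRUE, rmN = TRUE,
                          minq = 20, numbq = 2) {
  # rmN: reject any segment that contains an ambiguous base
  if (rmN && grepl("N", seq, fixed = TRUE)) {
    return(FALSE)
  }
  if (rmlow) {
    # Assumption: Phred+33 encoding, so ASCII code minus 33 is the score
    scores <- utf8ToInt(qual) - 33
    # Reject when more than numbq bases fall below minq
    if (sum(scores < minq) > numbq) {
      return(FALSE)
    }
  }
  TRUE
}

# "I" encodes Q40 and "#" encodes Q2 under Phred+33
keep_clean   <- passes_filter("ACGTACGT", "IIIIIIII")  # no low-quality bases
keep_two_low <- passes_filter("ACGTACGT", "II##IIII")  # two bases below minq
keep_three   <- passes_filter("ACGTACGT", "II###III")  # three bases below minq
keep_with_n  <- passes_filter("ACGNACGT", "IIIIIIII")  # contains N
```

With the default `minq = 20` and `numbq = 2`, only the last two segments would be rejected under these assumptions.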

Details

Positions used in this function are 0-indexed, so they start from 0 rather than 1. The default read structure represents CEL-seq paired-end reads: the first read contains the transcript, and the second read contains a UMI in its first 6bp followed by an 8bp barcode. The read structure is therefore list(bs1=-1, bl1=0, bs2=6, bl2=8, us=0, ul=6). Here bs1=-1, bl1=0 gives a negative start position and zero length for the barcode on read one, which denotes "no barcode" on read one. bs2=6, bl2=8 indicates a barcode in read two that starts at the 7th base and is 8bp long. us=0, ul=6 indicates a UMI that starts at the first base of read two and is 6bp long.
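
Because R's own string functions are 1-indexed, the 0-indexed convention is easy to get wrong. The sketch below shows how the default CEL-seq positions map onto a read, using only base R. The helper `extract_segment` and the 20bp sequence are made up for illustration and are not part of scPipe.

```r
# Hypothetical helper: slice a segment given a 0-indexed start and a length.
extract_segment <- function(read, start, len) {
  # substr() is 1-indexed and inclusive, so shift the start by one
  substr(read, start + 1, start + len)
}

# Default CEL-seq read structure from the Usage section
read_structure <- list(bs1 = -1, bl1 = 0, bs2 = 6, bl2 = 8, us = 0, ul = 6)

# Made-up read two: 6bp UMI, then 8bp barcode, then 6 trailing bases
r2_seq <- "AACCGGTTTTGGGGAAACCC"

umi     <- extract_segment(r2_seq, read_structure$us,  read_structure$ul)
barcode <- extract_segment(r2_seq, read_structure$bs2, read_structure$bl2)

umi      # "AACCGG"
barcode  # "TTTTGGGG"
```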

For a typical Drop-seq experiment the read structure will be list(bs1=-1, bl1=0, bs2=0, bl2=12, us=12, ul=8), which means that read one contains only the transcript and the first 12bp of read two are the cell barcode, followed by an 8bp UMI.
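
A call using the Drop-seq layout might look like the sketch below. The file names are hypothetical placeholders, and the call is guarded so that it only runs when scPipe is installed and the input files exist. The small check beforehand confirms that the UMI starts exactly where the barcode ends.

```r
# Drop-seq layout described above (0-indexed positions on read two)
dropseq_structure <- list(bs1 = -1, bl1 = 0, bs2 = 0, bl2 = 12, us = 12, ul = 8)

# The barcode covers 0-indexed positions 0 to 11, so the UMI should begin at 12
barcode_end <- dropseq_structure$bs2 + dropseq_structure$bl2
stopifnot(barcode_end == dropseq_structure$us)

# Hypothetical input and output paths; replace these with real files
r1_file  <- "dropseq_R1.fastq"
r2_file  <- "dropseq_R2.fastq"
out_file <- "dropseq_trimmed.fastq"

# Only run when scPipe is available and the placeholder inputs exist
if (requireNamespace("scPipe", quietly = TRUE) &&
    file.exists(r1_file) && file.exists(r2_file)) {
  scPipe::sc_trim_barcode(out_file, r1_file, r2_file,
                          read_structure = dropseq_structure)
}
```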

Value

Generates a trimmed fastq file at the path given by outfq.

Examples

data_dir="celseq2_demo"
## Not run: 
# for the complete workflow, refer to the vignettes
...
sc_trim_barcode(file.path(data_dir, "combined.fastq"),
   file.path(data_dir, "simu_R1.fastq"),
   file.path(data_dir, "simu_R2.fastq"))
...

## End(Not run)

LuyiTian/scPipe documentation built on Dec. 11, 2023, 8:21 p.m.