sc_trim_barcode: sc_trim_barcode

View source: R/wrapper_scPipeCPP.R

sc_trim_barcode R Documentation

sc_trim_barcode

Description

Reformat fastq files so barcode and UMI sequences are moved from the sequence into the read name.

Usage

sc_trim_barcode(
  outfq,
  r1,
  r2 = NULL,
  read_structure = list(bs1 = -1, bl1 = 0, bs2 = 6, bl2 = 8, us = 0, ul = 6),
  filter_settings = list(rmlow = TRUE, rmN = TRUE, minq = 20, numbq = 2)
)

Arguments

outfq

the output fastq file, in which the barcode and UMI have been moved into the read name. Files ending in .gz will be automatically compressed.

r1

read one for paired-end reads. This read should contain the transcript.

r2

read two for paired-end reads, NULL for single-end reads. (default: NULL)

read_structure

a list containing the read structure configuration:

  • bs1: starting position of barcode in read one. -1 if no barcode in read one.

  • bl1: length of barcode in read one. If there is no barcode in read one, this number is used for trimming the beginning of read one.

  • bs2: starting position of barcode in read two

  • bl2: length of barcode in read two

  • us: starting position of UMI

  • ul: length of UMI

filter_settings

a list containing the read filter settings:

  • rmlow: whether to remove low quality reads.

  • rmN: whether to remove reads that contain N in the UMI or cell barcode.

  • minq: the minimum base quality allowed.

  • numbq: the maximum number of bases allowed to have quality below minq.
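The interplay of these four settings can be sketched in base R. This is an illustration only, not scPipe's implementation: passes_filter is a hypothetical helper, Phred+33 quality encoding is assumed, and the rule that a segment is rejected when more than numbq bases fall below minq is an assumption drawn from the descriptions above. Which bases of a read scPipe actually checks is not stated here, so the helper simply takes one sequence segment and its quality string.

```r
# Hypothetical helper, not part of scPipe: decide whether a sequence segment
# passes the filter settings described above.
# Assumption: qualities are Phred+33 encoded.
passes_filter <- function(seq, qual,
                          filter_settings = list(rmlow = TRUE, rmN = TRUE,
                                                 minq = 20, numbq = 2)) {
  # rmN: reject segments that contain an N base
  if (filter_settings$rmN && grepl("N", seq, fixed = TRUE)) {
    return(FALSE)
  }
  # rmlow: reject segments with more than numbq bases below minq (assumed rule)
  if (filter_settings$rmlow) {
    phred <- utf8ToInt(qual) - 33L
    if (sum(phred < filter_settings$minq) > filter_settings$numbq) {
      return(FALSE)
    }
  }
  TRUE
}

passes_filter("ACGTACGT", "IIIIIIII")  # all bases Q40: TRUE
passes_filter("ACGNACGT", "IIIIIIII")  # contains N: FALSE
passes_filter("ACGTACGT", "###IIIII")  # three bases at Q2, limit is two: FALSE
```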

Details

Positions used in this function are 0-indexed, so they start from 0 rather than 1. The default read structure in this function represents CEL-seq paired-end reads, which contain a transcript in the first read and, in the second read, a 6bp UMI followed by an 8bp barcode. The read structure is therefore list(bs1=-1, bl1=0, bs2=6, bl2=8, us=0, ul=6). bs1=-1, bl1=0 gives a negative start position and zero length for the barcode on read one, which denotes "no barcode" on read one. bs2=6, bl2=8 indicates a barcode in read two that starts at the 7th base and is 8bp long. us=0, ul=6 indicates a UMI that starts at the first base of read two and is 6bp long.

For a typical Drop-seq experiment the read structure will be list(bs1=-1, bl1=0, bs2=0, bl2=12, us=12, ul=8), which means read one contains only the transcript, and the first 12bp of read two are the cell barcode, followed by an 8bp UMI.
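To make the 0-indexed positions concrete, the following base R sketch cuts the UMI and barcode out of a read two sequence for both read structures. take_segment is a hypothetical helper and the sequences are made up for illustration; this is not how scPipe performs the trimming internally.

```r
# Hypothetical helper, not part of scPipe: cut a 0-indexed (start, length)
# segment out of a read. substr() is 1-indexed and inclusive, hence the + 1.
take_segment <- function(read, start, len) {
  substr(read, start + 1L, start + len)
}

# CEL-seq: 6bp UMI at the start of read two, followed by an 8bp barcode.
celseq <- list(bs1 = -1, bl1 = 0, bs2 = 6, bl2 = 8, us = 0, ul = 6)
celseq_r2 <- "AACCGGTTTTGGCCAA"                      # made-up sequence
take_segment(celseq_r2, celseq$us, celseq$ul)        # UMI: "AACCGG"
take_segment(celseq_r2, celseq$bs2, celseq$bl2)      # barcode: "TTTTGGCC"

# Drop-seq: 12bp cell barcode at the start of read two, followed by an 8bp UMI.
dropseq <- list(bs1 = -1, bl1 = 0, bs2 = 0, bl2 = 12, us = 12, ul = 8)
dropseq_r2 <- "ACGTACGTACGTGGGGCCCC"                 # made-up sequence
take_segment(dropseq_r2, dropseq$bs2, dropseq$bl2)   # barcode: "ACGTACGTACGT"
take_segment(dropseq_r2, dropseq$us, dropseq$ul)     # UMI: "GGGGCCCC"
```

The same list objects could be passed as the read_structure argument of sc_trim_barcode.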

Value

generates a trimmed fastq file named outfq

Examples

data_dir="celseq2_demo"
## Not run: 
# for the complete workflow, refer to the vignettes
...
sc_trim_barcode(file.path(data_dir, "combined.fastq"),
   file.path(data_dir, "simu_R1.fastq"),
   file.path(data_dir, "simu_R2.fastq"))
...

## End(Not run)

LuyiTian/scPipe documentation built on Dec. 11, 2023, 8:21 p.m.