sc_atac_trim_barcode: demultiplex raw single-cell ATAC-Seq fastq reads

sc_atac_trim_barcode    R Documentation

demultiplex raw single-cell ATAC-Seq fastq reads

Description

Single-cell data need to be demultiplexed in order to retain the information about which cell barcode each read belongs to. Here we reformat fastq files so that the barcode(s) (and, if available, the UMI sequences) are moved from the sequence into the read name. Since scATAC-Seq data are mostly paired-end, both 'r1' and 'r2' are demultiplexed in this function.
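
To illustrate what moving a barcode into the read name means, here is a minimal base-R sketch. The helper tag_read_name and the "@<barcode>#<original name>" layout are assumptions made for illustration only; they are not necessarily the exact format that sc_atac_trim_barcode writes.

```r
# Hypothetical helper (not part of scPipe): prepend a barcode to the name
# line of a single FASTQ record given as a character vector of 4 lines.
# The "@<barcode>#<name>" layout is an assumed format for illustration.
tag_read_name <- function(record, barcode) {
  stopifnot(length(record) == 4, startsWith(record[1], "@"))
  # Drop the leading "@", then rebuild the name with the barcode in front
  record[1] <- paste0("@", barcode, "#", substring(record[1], 2))
  record
}

read   <- c("@read1/1", "ACGTACGTAC", "+", "IIIIIIIIII")
tagged <- tag_read_name(read, "TTGCAAGC")
tagged[1]
```

Only the name line changes in this sketch; the sequence and quality lines are left as they were.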

Usage

sc_atac_trim_barcode(
  r1,
  r2,
  bc_file = NULL,
  valid_barcode_file = "",
  output_folder = "",
  umi_start = 0,
  umi_length = 0,
  umi_in = "both",
  rmN = FALSE,
  rmlow = FALSE,
  min_qual = 20,
  num_below_min = 2,
  id1_st = -0,
  id1_len = 16,
  id2_st = 0,
  id2_len = 16,
  no_reverse_complement = FALSE
)

Arguments

r1

read one for paired-end reads.

r2

read two for paired-end reads, NULL if single-end.

bc_file

the barcode information, either in fastq format (e.g. from 10x-ATAC) or as a .csv file (in which case the barcodes are expected in the second column). For the fastq approach, this can currently be a list of barcode files.

valid_barcode_file

optional path to a file of valid (expected) barcode sequences to be found in the bc_file (.txt, can be .txt.gz). Must contain one barcode per line, in the second column, with columns separated by a comma (default = ""). If given, each barcode from bc_file is matched against the valid barcode of best fit (allowing a Hamming distance of 1). If a fastq bc_file is provided, barcodes with a higher quality, as given by the fastq quality scores, are prioritised.
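
The matching against valid barcodes can be pictured with the following base-R sketch. The helpers hamming and correct_barcode are hypothetical and only illustrate the idea of a Hamming distance of 1; returning NA for ties is an assumption of this sketch, and the quality-based prioritisation mentioned above is not modelled.

```r
# Hypothetical helpers (not part of scPipe).
# Number of mismatching positions between two equal-length strings.
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Return the unique valid barcode within max_dist mismatches of `observed`,
# or NA if there is none (or, by assumption, if several tie for best).
correct_barcode <- function(observed, valid, max_dist = 1) {
  candidates <- valid[nchar(valid) == nchar(observed)]
  if (length(candidates) == 0) return(NA_character_)
  d <- vapply(candidates, hamming, integer(1), b = observed,
              USE.NAMES = FALSE)
  best <- which(d == min(d))
  if (min(d) > max_dist || length(best) != 1) return(NA_character_)
  candidates[best]
}

valid <- c("AAAA", "CCCC", "GGTT")
correct_barcode("AAAT", valid)  # one mismatch away from "AAAA"
correct_barcode("ACAC", valid)  # two mismatches from the nearest: rejected
```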

output_folder

the output directory for the demultiplexed fastq files, in which the barcode and UMI have been moved into the read name. Files ending in .gz will be automatically compressed.

umi_start

if available, the start position of the molecular identifier.

umi_length

if available, the length of the molecular identifier.

umi_in

the read(s) in which the UMI is located (default = "both").

rmN

logical, whether to remove reads that contain N in the UMI or cell barcode.

rmlow

logical, whether to remove reads that have low quality barcode sequences.

min_qual

the minimum base pair quality that is allowed (default = 20).

num_below_min

the maximum number of base pairs below the quality threshold that is allowed (default = 2).
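
How rmN, rmlow, min_qual and num_below_min might interact is sketched below in base R. The helper keep_barcode is hypothetical; the Phred+33 encoding and the strict "more than num_below_min" comparison are assumptions of this sketch rather than confirmed details of the implementation.

```r
# Decode a FASTQ quality string into Phred scores (assumes Phred+33).
phred <- function(qual) utf8ToInt(qual) - 33L

# Hypothetical filter (not part of scPipe): TRUE if the read is kept.
keep_barcode <- function(seq, qual, rmN = FALSE, rmlow = FALSE,
                         min_qual = 20, num_below_min = 2) {
  # rmN: discard barcodes containing an ambiguous base
  if (rmN && grepl("N", seq, fixed = TRUE)) return(FALSE)
  # rmlow: discard barcodes with too many low-quality bases
  # (assumed rule: strictly more than num_below_min bases under min_qual)
  if (rmlow && sum(phred(qual) < min_qual) > num_below_min) return(FALSE)
  TRUE
}

keep_barcode("ACGT", "II##", rmlow = TRUE)  # two low-quality bases
keep_barcode("ACGT", "I###", rmlow = TRUE)  # three low-quality bases
keep_barcode("ACNT", "IIII", rmN = TRUE)    # contains an N
```

With the defaults, "I" decodes to 40 and "#" to 2, so the first call keeps the read and the other two discard it.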

id1_st

barcode start position (0-indexed) for read 1, which is an extra parameter that is needed if the bc_file is in a .csv format.

id1_len

barcode length for read 1, which is an extra parameter that is needed if the bc_file is in a .csv format.

id2_st

barcode start position (0-indexed) for read 2, which is an extra parameter that is needed if the bc_file is in a .csv format.

id2_len

barcode length for read 2, which is an extra parameter that is needed if the bc_file is in a .csv format.
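
Because the start positions are 0-indexed while R strings are 1-indexed, the following base-R sketch shows how a start/length pair maps onto substr. Both helpers are hypothetical, and whether the bases before the barcode are retained after trimming is an assumption of this sketch.

```r
# Hypothetical helpers (not part of scPipe).
# Extract `len` bases starting at 0-indexed position `start0`.
extract_barcode <- function(seq, start0, len) {
  substr(seq, start0 + 1, start0 + len)
}

# Remove the barcode and keep everything on either side of it
# (keeping the bases before the barcode is an assumption).
trim_barcode <- function(seq, start0, len) {
  paste0(substr(seq, 1, start0),
         substr(seq, start0 + len + 1, nchar(seq)))
}

read_seq <- "AAAACCCCGGGGTTTTACGTACGT"
barcode  <- extract_barcode(read_seq, start0 = 0, len = 16)
insert   <- trim_barcode(read_seq, start0 = 0, len = 16)
```

With start0 = 0 and len = 16, matching the defaults of id1_st and id1_len, the first 16 bases form the barcode and the remaining 8 bases are the trimmed read.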

no_reverse_complement

specifies if the reverse complement of the barcode sequence should be used for barcode error correction (only when barcode sequences are provided as fastq files). FALSE (default) lets the function decide whether to use reverse complement, and TRUE forces the function to use the forward barcode sequences.
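
For reference, a reverse complement can be computed in base R as sketched below. The helper reverse_complement is hypothetical, and the orientation check that follows (comparing exact matches in each orientation) is only an assumed heuristic; the documentation does not state how the function actually decides.

```r
# Hypothetical helper (not part of scPipe): reverse complement of a DNA string.
reverse_complement <- function(seq) {
  comp <- chartr("ACGTN", "TGCAN", seq)            # complement each base
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")  # then reverse
}

valid    <- c("AACGT", "GGGTT")
observed <- c("ACGTT", "AACCC", "ACGTT")

# Assumed heuristic: prefer the orientation with more exact matches.
forward_hits <- sum(observed %in% valid)
revcomp_hits <- sum(vapply(observed, reverse_complement, character(1),
                           USE.NAMES = FALSE) %in% valid)
use_revcomp  <- revcomp_hits > forward_hits
```

In this toy data none of the observed barcodes match as given, but all of them match after reverse complementing, so use_revcomp is TRUE.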

Value

None (invisible 'NULL')

Examples

data.folder <- system.file("extdata", package = "scPipe", mustWork = TRUE)
r1 <- file.path(data.folder, "small_chr21_R1.fastq.gz")
r2 <- file.path(data.folder, "small_chr21_R3.fastq.gz")

# Using a barcode fastq file:
barcode_fastq <- file.path(data.folder, "small_chr21_R2.fastq.gz")

sc_atac_trim_barcode(
  r1            = r1,
  r2            = r2,
  bc_file       = barcode_fastq,
  rmN           = TRUE,
  rmlow         = TRUE,
  output_folder = tempdir())

# Using a barcode csv file:
barcode_1000 <- file.path(data.folder, "chr21_modified_barcode_1000.csv")

## Not run: 
sc_atac_trim_barcode(
  r1            = r1,
  r2            = r2,
  bc_file       = barcode_1000,
  id1_st        = 0,
  rmN           = TRUE,
  rmlow         = TRUE,
  output_folder = tempdir())

## End(Not run)

LuyiTian/scPipe documentation built on Dec. 11, 2023, 8:21 p.m.