bc_extract_sc_sam: Extract barcode from single-cell sequencing sam file

bc_extract_sc_sam    R Documentation

Extract barcode from single-cell sequencing sam file

Description

bc_extract_sc_sam extracts cellular barcode, UMI, and lineage barcode sequences from a 10X Genomics scRNASeq sam file (or a bam file with a similar data structure). This function cannot process a bam file directly; users need to convert the bam file to a sam file before running it. See the examples.

Usage

bc_extract_sc_sam(sam, pattern, cell_barcode_tag = "CR", umi_tag = "UR")

bc_extract_sc_bam(bam, pattern, cell_barcode_tag = "CR", umi_tag = "UR")

Arguments

sam

A string, the path to the sam file (for example, a file containing the un-mapped sequences).

pattern

A string, defines the regular expression used to match the barcode sequence. The barcode sequence should be in the first capture group. See the documentation of bc_extract and the examples for more information.

cell_barcode_tag

A string, defines the tag of the cellular barcode field in the sam file. The default is "CR".

umi_tag

A string, defines the tag of the UMI field in the sam file. The default is "UR".

bam

A string, defines the bam file, which will be converted to a sam file.
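
To illustrate how pattern, cell_barcode_tag, and umi_tag relate to a sam record, here is a minimal base R sketch. It does not use CellBarcode internals, and the package's actual matching may differ in detail. The sam record, its sequences, and the helper names get_tag and get_barcode are hypothetical and exist only for this illustration.

```r
# Hypothetical single sam record (tab-separated); all values are invented.
seq_field <- "CCAGATCAGACGTACGTTGTGGTACC"
sam_line <- paste(
  "read1", "4", "*", "0", "0", "*", "*", "0", "0",
  seq_field,                          # SEQ field (column 10)
  strrep("F", nchar(seq_field)),      # QUAL field (column 11)
  "CR:Z:AAACCTGAGAAACCAT",            # cellular barcode tag
  "UR:Z:GTTACGTACG",                  # UMI tag
  sep = "\t"
)

fields <- strsplit(sam_line, "\t", fixed = TRUE)[[1]]

# Return the value of an optional field written as "TAG:TYPE:VALUE",
# or NA when the tag is absent (hypothetical helper).
get_tag <- function(fields, tag) {
  optional <- fields[-(1:11)]
  hit <- optional[startsWith(optional, paste0(tag, ":"))]
  if (length(hit) == 0) return(NA_character_)
  sub("^[^:]+:[^:]+:", "", hit[1])
}

# Return the first capture group of `pattern`, or NA when the
# sequence does not match (hypothetical helper).
get_barcode <- function(seq, pattern) {
  m <- regmatches(seq, regexec(pattern, seq))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

cell_barcode <- get_tag(fields, "CR")
umi <- get_tag(fields, "UR")
barcode <- get_barcode(fields[10], "AGATCAG(.*)TGTGGTA")
```

The part of the read between the two constant flanking sequences is what the first capture group returns; a read without both flanks yields NA in this sketch.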

Details

Although the function 'bc_extract_sc_bam' can process a bam file directly, its optimization is still in progress; for now, it is much more efficient to use 'samtools' to produce the sam file.

In addition, if the barcode sequence does not map to the reference genome, the user should use samtools to extract the un-mapped reads and save them in sam format for use as the input. This can save a lot of time. To get the un-mapped reads:

samtools view -f 4 input.bam > output.sam 
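
The same command can be driven from an R session. The following is a minimal sketch, assuming samtools is installed and on the PATH. The helper names unmapped_args and extract_unmapped are hypothetical and are not part of CellBarcode.

```r
# Build the samtools arguments equivalent to:
#   samtools view -f 4 input.bam
# (hypothetical helper; the path is shell-quoted for system2()).
unmapped_args <- function(bam) {
  c("view", "-f", "4", shQuote(bam))
}

# Run samtools and redirect its standard output to `sam`,
# mirroring the `> output.sam` redirection (hypothetical helper).
extract_unmapped <- function(bam, sam) {
  samtools <- Sys.which("samtools")
  if (!nzchar(samtools)) stop("samtools was not found on the PATH")
  status <- system2(samtools, unmapped_args(bam), stdout = sam)
  if (status != 0) stop("samtools exited with status ", status)
  invisible(sam)
}

# Example call (not run here because it needs samtools and a bam file):
# extract_unmapped("input.bam", "scRNASeq_10X.sam")
```

The resulting sam file can then be passed as the sam argument of bc_extract_sc_sam.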

Value

A BarcodeObj object with each cell as a sample.

See Also

bc_extract, bc_extract_sc_fastq

Examples

## NOT run
# When the barcode sequence is not mapped to the
# reference genome, it is much more efficient to use
# the un-mapped sequences as the input.

## Get un-mapped reads
# samtools view -f 4 input.bam > scRNASeq_10X.sam 

sam_file <- system.file("extdata", "scRNASeq_10X.sam", package = "CellBarcode")

bc_extract_sc_sam(
  sam = sam_file,
  pattern = "AGATCAG(.*)TGTGGTA",
  cell_barcode_tag = "CR",
  umi_tag = "UR"
)

## Read bam file directly
bam_file <- system.file("extdata", "scRNASeq_10X.bam", package = "CellBarcode")
bc_extract_sc_bam(
  bam = bam_file,
  pattern = "AGATCAG(.*)TGTGGTA",
  cell_barcode_tag = "CR",
  umi_tag = "UR"
)


wenjie1991/CellBarocde documentation built on Aug. 10, 2024, 11:03 a.m.