SplitBam: Utility to split a bam file into multiple bam files based on...

View source: R/split_bams.R

SplitBamR Documentation

Utility to split a bam file into multiple bam files based on the barcode

Description

Given a bam file that was processed by CellRanger, splitBam splits the bam into multiple bam files, one per cell barcode. Bam file needs to have the barcode stored in the "CB" field.

Usage

SplitBam(
  bam,
  cellbc.df,
  outdir = NULL,
  yieldSize = 1e+06,
  gtf_gr = NULL,
  geneSymbol = NULL,
  gi_ext = 50,
  rle_output = FALSE,
  exportFastqHeader = FALSE,
  genomicRegion = NULL,
  bamTags = c("CB", "UB"),
  what = c("qname", "flag", "rname", "strand", "pos")
)

Arguments

bam

CellRanger outputted bam file with the CB field

cellbc.df

data frame of the cell barcode, needs to have the column names: "celltype" and "cellbc"

outdir

directory to output the bam files. The bam files will be called [celltype].bam. If NULL no BAM file created.

yieldSize

number of lines of bam files to load. Default: 1000000

gtf_gr

gene model genomic ranges. Only used if geneSymbol is defined.

geneSymbol

Gene symbol. Used to identify the genomic coordinates to extract reads from.

gi_ext

The number of nucleotides to extend the genomic interval in extracting reads from (default 50).

rle_output

If TRUE will generate and return rle_list object

exportFastqHeader

If TRUE will generate a txt output file that has same prefix as bam file containing fastq header IDs

genomicRegion

Granges object of genomic region to extract. Only used if geneSymbol not defined.

bamTags

BAM field tag identifiers to extract. Default is c("CB", "UB").

what

What BAM fields to copy into new file. Default is c('qname', 'flag', 'rname', 'strand', 'pos')

Value

a rleList of coverage for each cell type

Examples

library('Sierra')

# Example 1 split the entire BAM file for each cell type
## Not run: 
extdata_path <- system.file("extdata",package = "scpolya")
load(paste(extdata_path,"TIP_vignette_gene_Seurat.RData",sep="/"))
cellbc.df <- data.frame(celltype=genes.seurat@active.ident, 
                        cellbc= names(genes.seurat@active.ident))
bamfile <- c(paste0(extdata_path,"/Vignette_example_TIP_sham.bam")

SplitBam(bam, cellbc.df)

## End(Not run)

# Example 2 extract reads that overlap a gene

extdata_path <- system.file("extdata",package = "Sierra")
gtf.file <- paste0(extdata_path,"/Vignette_cellranger_genes_subset.gtf")
gtf.gr <- rtracklayer::import(gtf.file)

load(paste(extdata_path,"TIP_vignette_gene_Seurat.RData",sep="/"))
cellbc.df <- data.frame(celltype=genes.seurat@active.ident, 
                       cellbc= names(genes.seurat@active.ident))
  
# Modify cellbc.df so that the barcodes match what is in the BAM file                     
cellbc.df$cellbc <- sub("(.*)-.*", "\\1", cellbc.df$cellbc)
cellbc.df$cellbc <- paste0(cellbc.df$cellbc, "-1")
                       
                       
bam.file <- paste0(extdata_path,"/Vignette_example_TIP_mi.bam")
outdir <-  tempdir()  # change this to a meaningful location
SplitBam(bam.file, cellbc.df, outdir=outdir, gtf_gr=gtf.gr, geneSymbol="Dnajc19")



VCCRI/Sierra documentation built on July 3, 2023, 6:39 a.m.