reads_from_bam: Get reads from a bam file
In Close-your-eyes/scexpr: Lots of helpers and wrappers to analyze scRNAseq data sets

reads_from_bam

R Documentation

Get reads from a bam file

Description

This is basically a wrapper around Rsamtools::ScanBamParam and Rsamtools::scanBam. The output from scanBam is processed to a data frame and additional columns are attached. Providing an exact range has been found to not always work as expected. E.g. there were reads in chr6 outside the exonic regions of HLA-A that could be mapped to HLA-A. This may be an individual problem of the underlying BAM file (mapping). In order to not miss any relevant reads, one may pass a wider genomic range for reads to return (e.g. whole chr6 if HLA loci are of interest, see example).

Usage

reads_from_bam(
  file_path,
  genomic_ranges,
  add_tags = c("CR", "CB", "CY", "AS", "UR", "UY", "HI", "NH", "nM", "RE"),
  add_flags = NULL,
  read_scores = T,
  revcomp_minus_strand = T,
  lapply_fun = lapply,
  ...
)

Arguments

`file_path`	path to a position-sorted bam file; index file (.bai) has to be in the same directory
`genomic_ranges`	GRanges object with n genomic ranges
`add_tags`	tags to extract from bam file, passed to Rsamtools::ScanBamParam(); character(0) for nothing; missing tags do not seem to matter
`add_flags`	a named list of columns to add; each element must have n elements (length of genomic_ranges)
`read_scores`	calculate read scores from PhredQuality
`revcomp_minus_strand`	calculate reverse complement of reads on minus strand; passed as reverseComplement to Rsamtools::ScanBamParam
`lapply_fun`	lapply function name without quotes; lapply, pbapply::pblapply or parallel::mclapply are suggested
`...`	additional argument to the lapply function; mainly mc.cores when parallel::mclapply is chosen

Details

Read scores: https://support.illumina.com/help/BaseSpace_OLH_009008/Content/Source/Informatics/BS/QualityScoreEncoding_swBS.htm CellRanger tags: Cell barcode (CR), error-corrected Cell barcode (CB), Cell barcode read quality (CY), Alignment score (AS), UMI (UR), UMI read quality (UY), Query hit index (HI), Number of reported alignments for query (NH), Number of mismatches per pair (nM), Region type (E = exonic, N = intronic, I = intergenic) (RE)

Value

a data frame of reads

Examples

## Not run: 
The seqnames of different BAM file may require different formats:
GenomicRanges::GRanges(seqnames = "chr1", strand = "+", ranges = IRanges::IRanges(start = start, end = end)) or
GenomicRanges::GRanges(seqnames = "1", strand = "+", ranges = IRanges::IRanges(start = start, end = end))

# genomic range over part of chromosome 6 (or whole)
chr6 <- GenomicRanges::GRanges(seqnames = "6", strand = "+",
ranges = IRanges::IRanges(start = 29000000, end = 35000000))
# ranges = IRanges::IRanges(start = 1, end = 536870912))

# alternatively multiple regions of HLA-A exons (in hg19)
# these may have to be obtained from the BAM file, e.g. IGV browser; or the reference genome
hlaa <- GenomicRanges::GRanges(
  seqnames = "6", strand = "+",
  ranges = IRanges::IRanges(
    start = c(29910247, 29910534, 29911045, 29911899, 29912277, 29912836, 29913011, 29913228),
    end = c(29910403, 29910803, 29911320, 29912174, 29912393, 29912868, 29913058, 29913661)
  )
)

reads <- scexpr::reads_from_bam(
  file_path = "my_bam_path",
  genomic_ranges = chr6,
  lapply_fun = parallel::mclapply, mc.cores = parallel::detectCores()
)

# passing multiple regions may return reads twice or multiple times
# if these reads overlap two or more of the regions (see ?scanBam and ?ScanBamParam)
reads <- scexpr::reads_from_bam(
  file_path = "my_bam_path",
  genomic_ranges = hlaa,
  lapply_fun = parallel::mclapply, mc.cores = parallel::detectCores(),
  add_flags = list(exon = c(1:8), gene = rep("A", 8))
)

# filter and process reads
reads <- reads[which(reads$minQual >= 27), ]
reads <- reads[which(reads$n_belowQ30 <= 3), ]
# filter duplicate reads
# if additional flags like exons have been passed
# these columns will prevent dplyr::distinct from filtering
reads <- dplyr::distinct(reads, start, seq, .keep_all = T)
# only reads with standard nucleotides
reads <- reads[which(!grepl("[^ACTGU]", reads[,"seq",drop=T])),]
# readNames were found to be not unique in any case
# (same name for reads with different start and different seq)
reads$readName <- make.unique(reads$readName)

## End(Not run)

Close-your-eyes/scexpr documentation built on April 21, 2023, 10:27 a.m.

Close-your-eyes/scexpr index

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Close-your-eyes/scexpr
Lots of helpers and wrappers to analyze scRNAseq data sets

reads_from_bam: Get reads from a bam file
In Close-your-eyes/scexpr: Lots of helpers and wrappers to analyze scRNAseq data sets

Get reads from a bam file

Description

Usage

Arguments

Details

Value

Examples

Related to reads_from_bam in Close-your-eyes/scexpr...

R Package Documentation

Browse R Packages

We want your feedback!

Close-your-eyes/scexpr Lots of helpers and wrappers to analyze scRNAseq data sets

reads_from_bam: Get reads from a bam file In Close-your-eyes/scexpr: Lots of helpers and wrappers to analyze scRNAseq data sets

Get reads from a bam file

Description

Usage

Arguments

Details

Value

Examples

Related to reads_from_bam in Close-your-eyes/scexpr...

R Package Documentation

Browse R Packages

We want your feedback!

Close-your-eyes/scexpr
Lots of helpers and wrappers to analyze scRNAseq data sets

reads_from_bam: Get reads from a bam file
In Close-your-eyes/scexpr: Lots of helpers and wrappers to analyze scRNAseq data sets