blacklisthighmap: Blacklist High Mappability Regions in Genomic Data

View source: R/preprocessing-blacklisthighmap.R

blacklisthighmap    R Documentation

Blacklist High Mappability Regions in Genomic Data

Description

This function processes genomic data to remove scores that fall within blacklisted regions or have low mappability, and computes weighted means for overlapping windows. The process ensures the integrity of genomic scores by focusing on high mappability regions and excluding blacklisted intervals.

Usage

blacklisthighmap(maptrackpath, blacklistpath, exptabpath,
   nbcputrans, allwindowsbed, windsize, genomename, saveobjectpath = NA,
   tmpfold = file.path(tempdir(), "tmptepr"), reload = FALSE, showtime = FALSE,
   showmemory = FALSE, chromtab = NA, forcechrom = FALSE, verbose = TRUE)

Arguments

maptrackpath

Character string. Path to the mappability track file.

blacklistpath

Character string. Path to the blacklist regions file.

exptabpath

Character string. Path to the experiment table file. The table must contain columns named 'condition', 'replicate', 'strand', and 'path'.

nbcputrans

Number of CPU cores to use for transcript-level operations.

allwindowsbed

Data frame. BED-formatted data frame obtained with the function 'makewindows'.

windsize

An integer specifying the size of the genomic windows.

genomename

Character string. A valid UCSC genome name. It is used to retrieve chromosome metadata, such as names and lengths.

saveobjectpath

Path to the folder where intermediate R objects are saved. Default is NA, in which case no R objects are saved.

tmpfold

A character string specifying the temporary folder for saving output files. The temporary files contain the scores for each bedgraph on each chromosome. Default is file.path(tempdir(), "tmptepr").

reload

Logical. If 'TRUE', reloads existing saved objects to avoid recomputation. Default is 'FALSE'. If a previous run failed while an object was being saved, delete that object before reloading, since it may be incomplete.

showtime

A logical value indicating whether to display processing time. Default is FALSE.

showmemory

A logical value indicating whether to display memory usage during processing. Default is FALSE.

chromtab

A Seqinfo object retrieved with the rtracklayer method SeqinfoForUCSCGenome. If NA, the method is called automatically. Default is NA.

forcechrom

Logical indicating if the presence of non-canonical chromosomes in chromtab (if not NA) should trigger an error. Default is FALSE.

verbose

A logical value indicating whether to display detailed processing messages. Default is TRUE.

Details

The 'blacklisthighmap' function iterates through chromosomes, removes the genomic scores that overlap blacklisted regions, and computes a weighted mean whenever several scores overlap the same window. The function uses parallel processing for efficiency (nbcputrans) and supports saving (saveobjectpath) and reloading (reload) intermediate results, so that a repeated run does not have to recompute them.
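The blacklist filtering described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the data frames, the helper 'overlapsany', and the choice to mask scores with NA rather than drop them are all assumptions made for illustration.

```r
# Hypothetical toy data: scored intervals and one blacklisted interval on a
# single chromosome (BED-style half-open coordinates: start inclusive,
# end exclusive).
scores <- data.frame(start = c(0, 200, 400, 600),
                     end   = c(200, 400, 600, 800),
                     score = c(1.5, 3.0, 0.5, 2.0))
blacklist <- data.frame(start = 350, end = 450)

# Hypothetical helper: two half-open intervals overlap when each one starts
# before the other one ends.
overlapsany <- function(start, end, regions) {
    vapply(seq_along(start), function(i) {
        any(start[i] < regions$end & regions$start < end[i])
    }, logical(1))
}

# Assumption: overlapping scores are masked with NA instead of being dropped.
isblacklisted <- overlapsany(scores$start, scores$end, blacklist)
scores$score[isblacklisted] <- NA
print(scores)
```

The same overlap test would apply to low mappability intervals, with the mappability track playing the role of 'blacklist'.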

The main steps include:
- Reading and processing bedGraph values.
- Removing scores overlapping with blacklisted or low mappability regions.
- Computing weighted means for overlapping scores in genomic windows.
- Saving the processed results to the specified path (tmpfold).
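The weighted mean step can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the window coordinates and the 'bg' data frame are made up, and weighting each score by the width of its overlap with the window is an assumed weighting scheme.

```r
# Hypothetical example: one 200 bp window overlapped by three scored intervals.
windstart <- 1000
windend <- 1200
bg <- data.frame(start = c(950, 1050, 1150),
                 end   = c(1050, 1150, 1300),
                 score = c(2, 4, 8))

# Width of the part of each interval that lies inside the window
# (clamped at zero for intervals that do not reach the window).
overlapwidth <- pmin(bg$end, windend) - pmax(bg$start, windstart)
overlapwidth <- pmax(overlapwidth, 0)

# Assumed weighting: each score counts in proportion to its overlap width.
windscore <- weighted.mean(bg$score, w = overlapwidth)
print(windscore)
```

Here the overlap widths are 50, 100, and 50 bp, so the window score is (2 * 50 + 4 * 100 + 8 * 50) / 200 = 4.5.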

If chromtab is left as NA, the chromosome information is retrieved automatically from the UCSC server using 'genomename'. Otherwise, a Seqinfo object can be created beforehand with: chromtab <- rtracklayer::SeqinfoForUCSCGenome(genomename)

Value

This function does not return a value directly. It saves intermediate results to 'tmpfold'. These intermediate files are then combined by the function 'createtablescores'.
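Because the results are written to disk rather than returned, it can be useful to check what a run has produced. The base R sketch below only lists the content of the default 'tmpfold' location. It makes no assumption about how the files are named or organised, and the variable 'tmpfiles' is introduced here for illustration.

```r
# Default temporary folder, as given in the Usage section.
tmpfold <- file.path(tempdir(), "tmptepr")

# List whatever files exist there; the folder is absent until the function
# has been run in the current R session.
if (dir.exists(tmpfold)) {
    tmpfiles <- list.files(tmpfold, recursive = TRUE, full.names = TRUE)
} else {
    tmpfiles <- character(0)
}
message(length(tmpfiles), " intermediate file(s) found in ", tmpfold)
```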

See Also

createtablescores, makewindows

Examples


exptabpath <- system.file("extdata", "exptab-preprocessing.csv", package="tepr")
gencodepath <- system.file("extdata", "gencode-chr13.gtf", package = "tepr")
maptrackpath <- system.file("extdata", "k50.umap.chr13.hg38.0.8.bed",
    package = "tepr")
blacklistpath <- system.file("extdata", "hg38-blacklist-chr13.v2.bed",
    package = "tepr")
windsize <- 200
genomename <- "hg38"
chromtabtest <- rtracklayer::SeqinfoForUCSCGenome(genomename)
allchromvec <- GenomeInfoDb::seqnames(chromtabtest)
chromtabtest <- chromtabtest[allchromvec[which(allchromvec == "chr13")], ]

## Writing an experiment table with the full bedgraph paths to the current directory
expdfpre <- read.csv(exptabpath)
bgpathvec <- sapply(expdfpre$path, function(x) system.file("extdata", x,
    package = "tepr"))
expdfpre$path <- bgpathvec
write.csv(expdfpre, file = "exptab-preprocessing.csv", row.names = FALSE,
    quote = FALSE)
exptabpath <- "exptab-preprocessing.csv"

## Objects required to call blacklisthighmap
allannobed <- retrieveanno(exptabpath, gencodepath, verbose = FALSE)
allwindowsbed <- makewindows(allannobed, windsize, verbose = FALSE)

## Test blacklisthighmap
blacklisthighmap(maptrackpath, blacklistpath, exptabpath,
    nbcputrans = 1, allwindowsbed, windsize, genomename,
    chromtab = chromtabtest, verbose = FALSE)


tepr documentation built on June 8, 2025, 10:46 a.m.