View source: R/preprocessing-blacklisthighmap.R
blacklisthighmap | R Documentation |
This function processes genomic data to remove scores that fall within blacklisted regions or have low mappability, and computes weighted means for overlapping windows. The process ensures the integrity of genomic scores by focusing on high mappability regions and excluding blacklisted intervals.
blacklisthighmap(maptrackpath, blacklistpath, exptabpath,
    nbcputrans, allwindowsbed, windsize, genomename, saveobjectpath = NA,
    tmpfold = file.path(tempdir(), "tmptepr"), reload = FALSE, showtime = FALSE,
    showmemory = FALSE, chromtab = NA, forcechrom = FALSE, verbose = TRUE)
maptrackpath |
Character string. Path to the mappability track file. |
blacklistpath |
Character string. Path to the blacklist regions file. |
exptabpath |
Path to the experiment table file containing a table with columns named 'condition', 'replicate', 'strand', and 'path'. |
nbcputrans |
Number of CPU cores to use for transcript-level operations. |
allwindowsbed |
Data frame. BED-formatted data frame obtained with the function 'makewindows'. |
windsize |
An integer specifying the size of the genomic windows. |
genomename |
Character string. A valid UCSC genome name. It is used to retrieve chromosome metadata, such as names and lengths. |
saveobjectpath |
Path to save intermediate R objects. Default is 'NA' and R objects are not saved. |
tmpfold |
A character string specifying the temporary folder for saving output files. The temporary files contain the scores for each bedgraph on each chromosome. Default is 'file.path(tempdir(), "tmptepr")'. |
reload |
Logical. If 'TRUE', reloads previously saved objects instead of recomputing them. Default is 'FALSE'. If a previous run failed while saving an object, delete that object before reloading. |
showtime |
A logical value indicating whether to display processing time. |
showmemory |
A logical value indicating whether to display memory usage during processing. |
chromtab |
A Seqinfo object retrieved with the rtracklayer method SeqinfoForUCSCGenome. If NA, the method is called automatically. Default is NA. |
forcechrom |
Logical indicating whether the presence of non-canonical chromosomes in chromtab (if not NA) should trigger an error. Default is 'FALSE'. |
verbose |
A logical value indicating whether to display detailed processing messages. |
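As a minimal sketch of the table expected by exptabpath, the snippet below builds a data frame with the four documented columns and checks that none is missing. The check itself is a hypothetical helper, not part of tepr, and the condition, strand, and file names are placeholder values chosen for illustration only.

```r
# Columns named in the documentation of 'exptabpath'
requiredcols <- c("condition", "replicate", "strand", "path")

# Placeholder experiment table: every value below is an assumption made
# for illustration, not taken from the package data
exptab <- data.frame(
    condition = c("ctrl", "ctrl"),
    replicate = c(1, 1),
    strand = c("plus", "minus"),
    path = c("ctrl_rep1_plus.bg", "ctrl_rep1_minus.bg"))

# Stop early if a documented column is absent
missingcols <- setdiff(requiredcols, colnames(exptab))
if (length(missingcols) > 0)
    stop("Missing columns: ", paste(missingcols, collapse = ", "))
```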
The 'blacklisthighmap' function iterates through chromosomes, processes genomic scores by removing those overlapping with blacklisted regions, and ensures that scores within windows are computed using a weighted mean when overlaps occur. The function uses parallel processing for efficiency and supports saving (saveobjectpath) and reloading (reload) intermediate results to optimize workflow.
The main steps include:
- Reading and processing bedGraph values.
- Removing scores overlapping with blacklisted or low mappability regions.
- Computing weighted means for overlapping scores in genomic windows.
- Saving the processed results to the specified path (tmpfold).
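The weighted mean step can be illustrated with a small base R sketch. It assumes that each score is weighted by the number of bases it shares with the window and that coordinates are half-open, as in BED files. Both points are assumptions, and window_weighted_mean is a hypothetical helper rather than the package's implementation.

```r
# Hypothetical helper (not part of tepr): weighted mean of the scores
# overlapping one window, each score weighted by its overlap width in bp.
window_weighted_mean <- function(winstart, winend, starts, ends, scores) {
    # Width of the intersection between each scored interval and the window
    overlapwidth <- pmin(ends, winend) - pmax(starts, winstart)
    keep <- overlapwidth > 0
    if (!any(keep)) return(NA_real_)
    stats::weighted.mean(scores[keep], w = overlapwidth[keep])
}

# A 200 bp window [1000, 1200) overlapped by two bedGraph intervals:
# 50 bp with score 2 and 150 bp with score 6
res <- window_weighted_mean(1000, 1200,
    starts = c(950, 1050), ends = c(1050, 1300), scores = c(2, 6))
res
```

Here the result is (2 * 50 + 6 * 150) / 200 = 5, and a window with no overlapping interval yields NA.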
If chromtab is left as NA, the chromosome information is automatically retrieved from the UCSC server using 'genomename'. Otherwise, the Seqinfo object can be retrieved beforehand with:
chromtab <- rtracklayer::SeqinfoForUCSCGenome(genomename)
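Before passing a custom chromtab, it can help to see which sequence names might be considered non-canonical. The sketch below works on a plain character vector, such as the one returned by GenomeInfoDb::seqnames(chromtab). The noncanonical helper is hypothetical, and its definition of canonical (chr1 to chr22 style numbers, chrX, chrY, chrM) is an assumption that may differ from the rule applied by the package.

```r
# Hypothetical helper (not part of tepr): return the sequence names that do
# not look like canonical chromosomes (chr<number>, chrX, chrY, chrM).
noncanonical <- function(chromnames) {
    chromnames[!grepl("^chr([0-9]+|X|Y|M)$", chromnames)]
}

# Example names; with a Seqinfo object one would pass
# as.character(GenomeInfoDb::seqnames(chromtab)) instead
chromnames <- c("chr1", "chr13", "chrX", "chrUn_KI270742v1",
    "chr1_KI270706v1_random")
flagged <- noncanonical(chromnames)
flagged
```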
This function does not return a value directly. It saves intermediate results to 'tmpfold'. These intermediate files are then combined by the function 'createtablescores'.
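Since nothing is returned, one way to confirm that a run produced output is to list the contents of the temporary folder. This is a minimal sketch that assumes the default tmpfold from the function signature was used. It makes no assumption about the names or format of the files written there.

```r
# Default temporary folder taken from the function signature
tmpfold <- file.path(tempdir(), "tmptepr")

# List whatever has been written there so far; the vector is empty if
# blacklisthighmap has not been run in this R session
if (dir.exists(tmpfold)) {
    tmpfiles <- list.files(tmpfold, recursive = TRUE, full.names = TRUE)
} else {
    tmpfiles <- character(0)
}
length(tmpfiles)
```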
createtablescores, makewindows
exptabpath <- system.file("extdata", "exptab-preprocessing.csv", package="tepr")
gencodepath <- system.file("extdata", "gencode-chr13.gtf", package = "tepr")
maptrackpath <- system.file("extdata", "k50.umap.chr13.hg38.0.8.bed",
    package = "tepr")
blacklistpath <- system.file("extdata", "hg38-blacklist-chr13.v2.bed",
    package = "tepr")
windsize <- 200
genomename <- "hg38"
chromtabtest <- rtracklayer::SeqinfoForUCSCGenome(genomename)
allchromvec <- GenomeInfoDb::seqnames(chromtabtest)
chromtabtest <- chromtabtest[allchromvec[which(allchromvec == "chr13")], ]
## Writing the experiment table, with full bedgraph paths, to the current directory
expdfpre <- read.csv(exptabpath)
bgpathvec <- sapply(expdfpre$path, function(x) system.file("extdata", x,
    package = "tepr"))
expdfpre$path <- bgpathvec
write.csv(expdfpre, file = "exptab-preprocessing.csv", row.names = FALSE,
    quote = FALSE)
exptabpath <- "exptab-preprocessing.csv"
## Necessary result to call blacklisthighmap
allannobed <- retrieveanno(exptabpath, gencodepath, verbose = FALSE)
allwindowsbed <- makewindows(allannobed, windsize, verbose = FALSE)
## Test blacklisthighmap
blacklisthighmap(maptrackpath, blacklistpath, exptabpath,
    nbcputrans = 1, allwindowsbed, windsize, genomename,
    chromtab = chromtabtest, verbose = FALSE)