remp | R Documentation |
remp is used to predict genome-wide methylation levels of locus-specific repetitive elements (RE).
Two major RE types in humans, Alu element (Alu) and LINE-1 (L1), are available.
remp(
  methyDat = NULL,
  REtype = c("Alu", "L1", "ERV"),
  Seq.GR = NULL,
  parcel = NULL,
  work.dir = tempdir(),
  win = 1000,
  method = c("rf", "xgbTree", "svmLinear", "svmRadial", "naive"),
  autoTune = TRUE,
  param = NULL,
  seed = NULL,
  ncore = NULL,
  BPPARAM = NULL,
  verbose = FALSE
)
methyDat |
A |
REtype |
Type of RE. Currently |
Seq.GR |
A |
parcel |
An |
work.dir |
Path to the directory where the annotation data generated by |
win |
An integer specifying window size to confine the upstream and downstream flanking
region centered on the predicted CpG in RE for prediction. Default = |
method |
Name of model/approach for prediction. Currently |
autoTune |
Logical parameter. If |
param |
A list specifying fixed model tuning parameter(s) (not applicable for Random Forest, see Details).
For Extreme Gradient Boosting, |
seed |
Random seed for Random Forest model for reproducible prediction results.
Default is |
ncore |
Number of cores used for parallel computing. By default, max number of cores available
in the machine will be utilized. If |
BPPARAM |
An optional |
verbose |
Logical parameter. Should the function be verbose? |
Before running remp, users should make sure the methylation data have gone through proper quality control, background correction, and normalization procedures. Both beta values and M values are allowed. Rows represent probes and columns represent samples. For array data, please make sure the row names specify the Illumina probe ID (e.g. cg00000029). For sequencing data, please provide the genomic locations of CpGs in a GRanges object and specify it using the Seq.GR parameter.
win = 1000 is based on previous findings showing that neighboring CpGs are more likely to be co-modified within 1000 bp. Users can specify a narrower window size for a slight improvement in prediction accuracy at the cost of fewer predicted RE. A window size greater than 1000 is not recommended, as the machine learning models would not learn much useful information for prediction and would instead introduce noise.
The Random Forest model (method = "rf") is recommended, as it offers more accurate prediction and also enables the prediction reliability functionality. Prediction reliability is estimated by conditional standard deviation using Quantile Regression Forest. Please note that if parallel computing is allowed, parallel Random Forest (powered by package ranger) will be used automatically. The performance of the Random Forest model is often relatively insensitive to the choice of mtry. Therefore, auto-tune is turned off when using Random Forest, and mtry is set to one third of the total number of predictors. For SVM, if autoTune = TRUE, the preset tuning parameter search grid can be accessed and modified using remp_options.
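The input layout described in the Details (probes in rows, samples in columns, Illumina probe IDs as row names, beta or M values) can be checked with a few lines of base R before calling remp. The following is only a minimal sketch: the toy matrix, the probe-ID pattern, and the helper names beta_to_m and m_to_beta are illustrative assumptions and are not part of the package.

```r
# Hypothetical toy data: 3 probes x 2 samples of beta values (not real measurements).
# matrix() fills column by column, so the first three numbers belong to sample1.
beta <- matrix(
  c(0.10, 0.50, 0.90,
    0.20, 0.40, 0.80),
  nrow = 3,
  dimnames = list(
    c("cg00000029", "cg00000108", "cg00000109"),  # probe IDs as row names
    c("sample1", "sample2")                       # samples as column names
  )
)

# Assumption for illustration: CpG probe IDs look like "cg" followed by 8 digits.
# Real arrays may carry other probe types, so adapt this check as needed.
stopifnot(all(grepl("^cg[0-9]{8}$", rownames(beta))))

# Beta values are proportions and must lie in [0, 1].
stopifnot(all(beta >= 0 & beta <= 1, na.rm = TRUE))

# Standard logit (base 2) relationship between beta values and M values.
beta_to_m <- function(b) log2(b / (1 - b))
m_to_beta <- function(m) 2^m / (2^m + 1)

m_values <- beta_to_m(beta)
```

Either beta or m_values keeps the same row and column names, so either matrix has the shape the Details describe. Beta values of exactly 0 or 1 map to infinite M values, so such entries may need attention before conversion.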
A REMProduct object containing predicted RE methylation results.
See initREMP to prepare the necessary annotation database before running remp.
# Obtain example Illumina data (450k)
if (!exists("GM12878_450k")) GM12878_450k <- getGM12878("450k")

# Make sure you have run 'initREMP' first. See ?initREMP.
if (!exists("remparcel")) {
  data(Alu.hg19.demo)
  remparcel <- initREMP(arrayType = "450k", REtype = "Alu",
                        annotation.source = "AH", genome = "hg19",
                        RE = Alu.hg19.demo, ncore = 1, verbose = TRUE)
}

# With data template pre-built. See ?rempTemplate.
if (!exists("template")) {
  template <- rempTemplate(GM12878_450k, parcel = remparcel,
                           win = 1000, verbose = TRUE)
}

# Run remp with pre-built template:
remp.res <- remp(template, ncore = 1)

# Or run remp without pre-built template (identical results):
## Not run: 
remp.res <- remp(GM12878_450k, REtype = "Alu", parcel = remparcel,
                 ncore = 1, verbose = TRUE)
## End(Not run)

remp.res
details(remp.res)
rempB(remp.res) # Methylation data (beta value)

# Extract CpG location information.
# (This accessor is inherited from class 'RangedSummarizedExperiment')
rowRanges(remp.res)

# RE annotation information
rempAnnot(remp.res)

# Add gene annotation
remp.res <- decodeAnnot(remp.res, type = "symbol")
rempAnnot(remp.res)

# (Recommended) Trim off less reliable predictions
remp.res <- rempTrim(remp.res)

# Obtain RE-level methylation (aggregate by mean)
remp.res <- rempAggregate(remp.res)
rempB(remp.res) # Methylation data (beta value)

# Extract RE location information
rowRanges(remp.res)

# Density plot across predicted RE
remplot(remp.res)