initREMP: RE Annotation Database Initialization


Description

initREMP initializes the annotation database used for repetitive element (RE) methylation prediction. Three human RE types are available: Alu elements (Alu), LINE-1 (L1), and endogenous retroviruses (ERV).

Usage

initREMP(
  arrayType = c("450k", "EPIC", "Sequencing"),
  REtype = c("Alu", "L1", "ERV"),
  annotation.source = c("AH", "UCSC"),
  genome = c("hg19", "hg38"),
  RE = NULL,
  Seq.GR = NULL,
  ncore = NULL,
  BPPARAM = NULL,
  export = FALSE,
  work.dir = tempdir(),
  verbose = FALSE
)

Arguments

arrayType

Methylation profiling platform: the Illumina array type "450k" or "EPIC", or "Sequencing" for sequencing-based data. Default = "450k".

REtype

Type of RE. Currently "Alu", "L1", and "ERV" are supported.

annotation.source

Character. The source of the annotation databases, namely the RefSeq Gene annotation database and the RepeatMasker annotation database. If "AH", the databases are obtained from the AnnotationHub package. If "UCSC", they are downloaded from the UCSC website http://hgdownload.cse.ucsc.edu/goldenpath. The corresponding build ("hg19" or "hg38") is specified with the genome parameter.

genome

Character. The human genome build, either "hg19" or "hg38". Note that if annotation.source == "AH", only the hg19 database is available.

RE

A GRanges object containing user-specified RE genomic locations. If NULL, the function retrieves the RepeatMasker RE database from AnnotationHub (build hg19) or downloads it from the UCSC website (build hg19 or hg38).

Seq.GR

A GRanges object containing the genomic locations of the CpGs profiled by a sequencing platform. Must not be NULL if arrayType == "Sequencing". The locations can be in either the hg19 or the hg38 build, but the build must match the one given in genome. See Details.

ncore

Number of cores used for parallel computing. By default (NULL), the maximum number of cores available on the machine is used. If ncore = 1, computation runs serially without parallelization.

BPPARAM

An optional BiocParallelParam instance determining the parallel back-end used during evaluation. If not specified, the default back-end on the machine is used.

export

Logical. Should the returned REMParcel object be saved to the local machine? See Details.

work.dir

Path to the directory where the generated data will be saved. Used only when export = TRUE. If not specified, the temporary directory tempdir() is used.

verbose

Logical. Should the function be verbose (print progress messages)?
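For sequencing data, Seq.GR is simply a GRanges object with one range per profiled CpG. As a minimal sketch, assuming the CpG positions come from a hypothetical table cpg_table (the coordinates below are made up), such an object can be built with GenomicRanges:

```r
library(GenomicRanges)  # also attaches IRanges, which provides IRanges()

# Hypothetical input: CpG coordinates from a sequencing experiment
# (made-up, 1-based positions assumed to be in the hg19 build).
cpg_table <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  pos = c(10525L, 10542L, 15865L)
)

# One 1-bp range per profiled CpG; strand defaults to "*".
Seq.GR <- GRanges(
  seqnames = cpg_table$chr,
  ranges = IRanges(start = cpg_table$pos, width = 1)
)

# Record the build on the object so it is explicit which build is used.
genome(Seq.GR) <- "hg19"
```

Such an object would then be passed as Seq.GR together with arrayType = "Sequencing" and the matching genome value. GRanges coordinates are 1-based, so 0-based positions (for example, the start column of a BED file) need 1 added first.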

Details

Currently, we support three major types of RE in the human genome: Alu, L1, and ERV. The main purpose of initREMP is to generate and annotate CpG/RE data using the RefSeq Gene annotation database (from AnnotationHub or UCSC; see annotation.source). These annotation data are required for RE methylation prediction in remp.

Because the generated data can be very large but can be reused, we recommend saving the output of initREMP to the local machine (see export and work.dir), so that the function only needs to be run once as long as the RE database does not change. To minimize the size of the resulting data, annotation data are generated only for REs containing RE-CpGs that have neighboring profiled CpGs. By default, neighboring CpGs are confined to a 1200 bp flanking window; this window size can be modified using remp_options.

Note that the RefSeq Gene database from UCSC is dynamic (updated periodically) and reflects the latest knowledge of genes, whereas the database from AnnotationHub is static. Using different sources will have a slight impact on the predicted RE methylation and on the gene annotation of the final results.

For sequencing methylation data, specify the genomic locations of the profiled CpGs as a GRanges object in Seq.GR. For an example of Seq.GR, run minfi::getLocations(IlluminaHumanMethylation450kanno.ilmn12.hg19) (the row names of the CpGs in Seq.GR can be NULL). Make sure the genome build of Seq.GR matches the build specified in the genome parameter (default "hg19").
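To make the flanking-window filter concrete, here is a minimal base-R sketch with made-up coordinates. It assumes the 1200 bp window extends 1200 bp on each side of an RE-CpG; the hypothetical function count_neighbors() is for illustration only and is not how REMP implements the filter.

```r
# Illustrative sketch only, not REMP's implementation. For each RE-CpG, count
# profiled CpGs on the same chromosome lying within `flank` bp on either side.
# Treating the window as +/- flank bp is an assumption made for illustration.
count_neighbors <- function(re_chr, re_pos, prof_chr, prof_pos, flank = 1200) {
  vapply(seq_along(re_pos), function(i) {
    sum(prof_chr == re_chr[i] & abs(prof_pos - re_pos[i]) <= flank)
  }, integer(1))
}

# Made-up coordinates for three RE-CpGs and three profiled CpGs.
re_chr   <- c("chr1", "chr1", "chr2")
re_pos   <- c(5000, 20000, 5000)
prof_chr <- c("chr1", "chr1", "chr1")
prof_pos <- c(4000, 6300, 19000)

n_nb <- count_neighbors(re_chr, re_pos, prof_chr, prof_pos)
keep <- n_nb > 0  # RE-CpGs that would be retained under this sketch
```

Here the first two RE-CpGs each have one profiled CpG within 1200 bp and are kept, while the RE-CpG on chr2 has none. For genome-scale data, GenomicRanges::findOverlaps() with its maxgap argument is the idiomatic way to perform this kind of neighbor search.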

Value

An REMParcel object containing data needed for RE methylation prediction.

See Also

See remp for RE methylation prediction.

Examples

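if (!exists("remparcel")) {
  # Load the Alu demo RE data (hg19) shipped with the package
  data(Alu.hg19.demo)
  # Build the annotation parcel for 450k data, running serially (ncore = 1)
  remparcel <- initREMP(arrayType = "450k", 
                        REtype = "Alu", 
                        annotation.source = "AH",
                        genome = "hg19",
                        RE = Alu.hg19.demo, 
                        ncore = 1,
                        verbose = TRUE)
}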
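Since the Details section recommends running initREMP once and reusing its output, one option is to cache the returned REMParcel as an .rds file with base R's saveRDS() and readRDS(). This is a minimal sketch assuming a hypothetical helper cached_build() and a hypothetical file name; neither is part of REMP, and this is independent of export = TRUE, whose output file naming is not described on this page.

```r
# Hypothetical helper (not part of REMP): return the object stored at `path`
# if the file exists; otherwise build it with `build_fun`, save it, return it.
cached_build <- function(path, build_fun) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  obj <- build_fun()
  saveRDS(obj, path)
  obj
}

# Example use (downloads annotation data, so not run here):
# remparcel <- cached_build(
#   file.path(tempdir(), "remparcel_Alu_450k_hg19.rds"),  # hypothetical name
#   function() initREMP(arrayType = "450k", REtype = "Alu", genome = "hg19")
# )
```

On the first call the parcel is built and written to disk; later calls with the same path only read the file. Note that tempdir() is cleared when the R session ends, so a persistent directory is needed for reuse across sessions.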


YinanZheng/REMP documentation built on May 14, 2022, 5:58 p.m.