preciseTAD: Precise TAD boundary prediction at base-level resolution...
In preciseTAD: preciseTAD: A machine learning framework for precise TAD boundary prediction


Precise TAD boundary prediction at base-level resolution using density-based spatial clustering and partitioning techniques

preciseTAD(
  genomicElements.GR,
  featureType = "distance",
  CHR,
  chromCoords = NULL,
  tadModel,
  threshold = 1,
  verbose = TRUE,
  parallel = NULL,
  DBSCAN_params,
  flank
)

`genomicElements.GR`	`GRangesList` object containing GRanges from each ChIP-seq BED file that was used to train the predictive model (can be obtained using `bedToGRangesList`). Required.
`featureType`	Controls how the feature space is constructed: one of "binary", "oc", "op", "signal", or "distance" (log2-transformed). Default is "distance".
`CHR`	Controls which chromosome to predict boundaries on at base-level resolution. Required.
`chromCoords`	List containing the starting and ending bp coordinates that define the region of the linear genome to make predictions on. If chromCoords is not specified, predictions are made on the entire chromosome. Default is NULL.
`tadModel`	Model object used to obtain predicted probabilities at base-level resolution (examples include `gbm`, `glmnet`, `svm`, `glm`, etc.). A random forest model can be obtained using `preciseTAD::TADrandomForest`. Required.
`threshold`	Bases with predicted probabilities greater than or equal to this value are labeled as potential TAD boundaries. Values in the range 0.95-1.0 are suggested. Default is 1.
`verbose`	Option to print progress. Default is TRUE.
`parallel`	Option to parallelise the computation of predicted probabilities. Must be a number giving the number of cores to use in parallel. Default is NULL.
`DBSCAN_params`	Parameters passed to `dbscan` in list form containing 1) eps and 2) MinPts. Required.
`flank`	Controls how far (in bp) to flank the predicted TAD boundaries when calculating normalized enrichment. Normalized enrichment is calculated as the total number of peak regions that overlap with flanked predicted boundaries divided by the number of predicted boundaries. The recommended value is the data resolution. Required.
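
The normalized enrichment described for `flank` can be illustrated with a small base-R sketch. The boundary positions and peak intervals below are made-up toy values, and the overlap counting is written by hand rather than taken from the package, so treat it as an illustration of the definition and not as preciseTAD's internal code. It assumes that every overlapping peak/boundary pair counts once.

```r
# Toy inputs (hypothetical values, not package data)
boundaries <- c(17400000, 17850000, 18500000)   # predicted boundary points (bp)
peaks <- data.frame(
  start = c(17398000, 17404000, 17700000, 18499500),
  end   = c(17398500, 17404500, 17700400, 18501000)
)
flank <- 5000

# Extend each boundary by `flank` bp on both sides
flanked <- data.frame(start = boundaries - flank, end = boundaries + flank)

# Two closed intervals overlap when each one starts before the other ends
overlaps <- outer(
  seq_len(nrow(peaks)), seq_len(nrow(flanked)),
  function(i, j) peaks$start[i] <= flanked$end[j] &
                 flanked$start[j] <= peaks$end[i]
)

# Assumption: each overlapping peak/boundary pair is counted once
normalized_enrichment <- sum(overlaps) / length(boundaries)
print(normalized_enrichment)
```

Here three of the four toy peaks fall inside a flanked boundary, so with three boundaries the ratio is 1. A larger `flank` widens each window and can only keep or raise the count.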

A list containing 3 elements: 1) the genomic coordinates spanning each preciseTAD predicted boundary region (PTBR); 2) the genomic coordinates of the preciseTAD predicted boundary points (PTBP); and 3) a named list of summary statistics: PTBRWidth, the width of each PTBR; PTBRCoverage, the proportion of bases within a PTBR whose predicted probabilities equal or exceed the threshold (t = 1 by default); DistanceBetweenPTBR, the genomic distance between the end of the previous PTBR and the start of the subsequent PTBR; NumSubRegions, the number of subregions in each PTBR cluster; SubRegionWidth, the width of each subregion forming a PTBR; DistBetweenSubRegions, the genomic distance between the end of the previous PTBR-specific subregion and the start of the subsequent PTBR-specific subregion; and the normalized enrichment of the genomic annotations used in the model around flanked PTBPs.
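
To make the first three summary statistics concrete, the following base-R sketch computes them from toy data. The flagged positions and the cluster labels are hypothetical (the labels stand in for what a density-based clustering step might return), and the inclusive width convention (end - start + 1) is an assumption; the package may compute these quantities differently.

```r
# Bases whose predicted probability met the threshold (toy positions, in bp)
flagged <- c(100, 101, 102, 105, 106, 300, 301, 304)
# Hypothetical cluster labels, one per flagged base
cluster <- c(1, 1, 1, 1, 1, 2, 2, 2)

by_cluster <- split(flagged, cluster)

# Each region spans the first to the last flagged base of its cluster
ptbr_start <- vapply(by_cluster, min, numeric(1))
ptbr_end   <- vapply(by_cluster, max, numeric(1))

# Width, assuming both end bases are counted
ptbr_width <- ptbr_end - ptbr_start + 1

# Coverage: share of bases in the region that met the threshold
ptbr_coverage <- lengths(by_cluster) / ptbr_width

# Distance from the end of one region to the start of the next
dist_between <- unname(ptbr_start[-1] - ptbr_end[-length(ptbr_end)])

print(data.frame(start = ptbr_start, end = ptbr_end,
                 width = ptbr_width, coverage = ptbr_coverage))
print(dist_between)
```

In this toy case the first region spans 100-106 with 5 of its 7 bases flagged, the second spans 300-304 with 3 of 5 flagged, and the two regions are 194 bp apart.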

# Read in ARROWHEAD-called TADs at 5 kb
data(arrowhead_gm12878_5kb)

# Extract unique boundaries
bounds.GR <- extractBoundaries(domains.mat = arrowhead_gm12878_5kb,
                               preprocess = FALSE,
                               CHR = c("CHR21", "CHR22"),
                               resolution = 5000)

# Read in GRangesList of 26 TFBS and filter to include only CTCF, RAD21,
# SMC3, and ZNF143
data(tfbsList)

tfbsList_filt <- tfbsList[which(names(tfbsList) %in%
                                  c("Gm12878-Ctcf-Broad",
                                    "Gm12878-Rad21-Haib",
                                    "Gm12878-Smc3-Sydh",
                                    "Gm12878-Znf143-Sydh"))]

# Create the binned data matrix for CHR21 (training) and CHR22 (testing)
# using 5 kb binning, distance-type predictors from 4 TFBS from
# the GM12878 cell line, and random under-sampling
set.seed(123)
tadData <- createTADdata(bounds.GR = bounds.GR,
                         resolution = 5000,
                         genomicElements.GR = tfbsList_filt,
                         featureType = "distance",
                         resampling = "rus",
                         trainCHR = "CHR21",
                         predictCHR = "CHR22")

# Fit a random forest using TADrandomForest with fixed hyperparameters
# (mtry = 2, ntree = 500, nodesize = 1) and 3-fold CV
set.seed(123)
tadModel <- TADrandomForest(trainData = tadData[[1]],
                            testData = tadData[[2]],
                            tuneParams = list(mtry = 2,
                                            ntree = 500,
                                            nodesize = 1),
                            cvFolds = 3,
                            cvMetric = "Accuracy",
                            verbose = TRUE,
                            model = TRUE,
                            importances = TRUE,
                            impMeasure = "MDA",
                            performances = TRUE)

# Apply preciseTAD to a specific 2 Mb section of CHR22 (17000000-19000000)
set.seed(123)
pt <- preciseTAD(genomicElements.GR = tfbsList_filt,
                 featureType = "distance",
                 CHR = "CHR22",
                 chromCoords = list(17000000, 19000000),
                 tadModel = tadModel[[1]],
                 threshold = 1.0,
                 verbose = TRUE,
                 parallel = NULL,
                 DBSCAN_params = list(10000, 3),
                 flank = 5000)

preciseTAD documentation built on Nov. 8, 2020, 6:51 p.m.
