createTADdata: Function to create a data matrix used for building a...
In dozmorovlab/preciseTAD: preciseTAD: A machine learning framework for precise TAD boundary prediction

createTADdata

R Documentation

Function to create a data matrix used for building a predictive model to classify boundary regions from functional genomic elements

Description

Function to create a data matrix used for building a predictive model to classify boundary regions from functional genomic elements

Usage

createTADdata(
  bounds.GR,
  resolution,
  genomicElements.GR,
  featureType = "distance",
  resampling,
  trainCHR,
  predictCHR = NULL,
  genome = "hg19"
)

Arguments

`bounds.GR`	a GRanges object with chromosomal coordinates of TAD boundaries used to identify positive cases (can be obtained using `extractBoundaries`). Required.
`resolution`	Numeric, the bin width used to partition the genome; should match the resolution at which TADs were called. Required.
`genomicElements.GR`	a GRangesList object containing one GRanges object per ChIP-seq dataset to use as predictors in the random forest model (can be obtained using `bedToGRangesList`). Required.
`featureType`	Character, controls how the feature space is constructed. One of "binary" (overlap yes/no), "oc" (overlap counts, the number of overlaps), "op" (overlap percent, the percent of bin width covered by the genomic annotation), or "distance" (log2-transformed distance from the center of the nearest genomic annotation to the center of the bin). Default is "distance".
`resampling`	Character, controls if and how the data should be resampled to create balanced classes of boundary vs. nonboundary regions. One of "none" (no resampling), "ros" (random over-sampling), or "rus" (random under-sampling). Required.
`trainCHR`	Character vector of chromosomes to use to build the binned data matrix for training. Required.
`predictCHR`	Character vector of chromosomes to use to build the binned data matrix for testing. Default is NULL, indicating no test data is created. If trainCHR = predictCHR, the data are split 7:3 into training and test sets.
`genome`	Character, version of the human genome assembly, used to filter out bases overlapping centromeric regions. Accepted values: "hg19" (default) or "hg38".
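
The four `featureType` options can be illustrated on a single toy bin. The sketch below uses base R only and is not the package's internal code: the object names (`bin_start`, `bin_end`, `ann`) and coordinates are made up, "op" is expressed as a fraction rather than a percentage, and the `+ 1` pseudocount inside `log2()` is an assumption added here so that a zero distance stays finite.

```r
# Toy illustration only; preciseTAD itself works on GRanges objects
# and may differ in details such as coordinate conventions.

# One 5 kb bin (1-based, inclusive coordinates; hypothetical values)
bin_start <- 10001
bin_end   <- 15000

# Three hypothetical annotation intervals for one genomic element
ann <- data.frame(start = c(9500, 12000, 20000),
                  end   = c(10500, 12499, 20999))

# Number of bases each annotation shares with the bin (0 if disjoint)
ov_width <- pmax(0, pmin(ann$end, bin_end) - pmax(ann$start, bin_start) + 1)

# "binary": does any annotation overlap the bin?
binary <- as.integer(any(ov_width > 0))

# "oc": how many annotations overlap the bin?
oc <- sum(ov_width > 0)

# "op": share of the bin width covered (assumes annotations do not
# overlap each other, otherwise bases would be double counted)
op <- sum(ov_width) / (bin_end - bin_start + 1)

# "distance": log2 of the distance from the bin center to the center
# of the nearest annotation (the + 1 pseudocount is an assumption)
bin_center  <- (bin_start + bin_end) / 2
ann_centers <- (ann$start + ann$end) / 2
distance    <- log2(min(abs(ann_centers - bin_center)) + 1)

print(c(binary = binary, oc = oc, op = op, distance = distance))
```

In the real function each element of `genomicElements.GR` would contribute one such column per bin, so the choice of `featureType` changes the scale of the predictors but not their number.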

Value

A list containing two data.frames: 1) the training data and 2) the test data (only if predictCHR is not NULL; otherwise it is NA). In each data.frame, "y" indicates whether the corresponding bin is a TAD boundary, and the subsequent columns hold the association measures between bins and the genomic annotations.
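
Because boundary bins are usually far rarer than nonboundary bins, the class balance of "y" in the training data depends on the `resampling` argument. The following base R sketch shows the general idea of random under-sampling and random over-sampling on a toy data.frame. It is an illustration under assumptions, not the package's implementation: the data frame `toy`, the column `feature1`, and the class labels "Yes"/"No" are hypothetical.

```r
set.seed(123)  # reproducible toy example

# Hypothetical training frame: 4 boundary bins and 16 nonboundary bins
toy <- data.frame(
  y        = factor(c(rep("Yes", 4), rep("No", 16)), levels = c("No", "Yes")),
  feature1 = rnorm(20)
)

minority <- which(toy$y == "Yes")  # row indices of boundary bins
majority <- which(toy$y == "No")   # row indices of nonboundary bins

# Random under-sampling: keep every minority row and draw an equally
# sized random subset of the majority rows (without replacement)
rus <- toy[c(minority, sample(majority, length(minority))), ]

# Random over-sampling: keep every majority row and draw minority rows
# with replacement until both classes have the same size
ros <- toy[c(majority, sample(minority, length(majority), replace = TRUE)), ]

print(table(rus$y))
print(table(ros$y))
```

Under-sampling discards data but keeps every row unique, while over-sampling keeps all rows at the cost of duplicated boundary bins; which trade-off is preferable depends on how many boundaries the training chromosomes contain.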

Examples

# Create training data for CHR21 and testing data for CHR22 with
# 5 kb binning, oc-type predictors from 26 different transcription factor
# binding sites from the GM12878 cell line, and random under-sampling

# Load the package and read in ARROWHEAD-called TADs at 5 kb
library(preciseTAD)
data(arrowhead_gm12878_5kb)

# Extract unique boundaries
bounds.GR <- extractBoundaries(domains.mat = arrowhead_gm12878_5kb,
                               filter = FALSE,
                               CHR = c("CHR21", "CHR22"),
                               resolution = 5000)

# Read in GRangesList of 26 TFBS
data(tfbsList)

# Build the training (CHR21) and test (CHR22) data matrices
tadData <- createTADdata(bounds.GR = bounds.GR,
                         resolution = 5000,
                         genomicElements.GR = tfbsList,
                         featureType = "oc",
                         resampling = "rus",
                         trainCHR = "CHR21",
                         predictCHR = "CHR22")

dozmorovlab/preciseTAD documentation built on April 26, 2022, 9:42 a.m.
