MatchRegionStats: Match DNA sequence characteristics

View source: R/utilities.R

MatchRegionStats    R Documentation

Match DNA sequence characteristics

Description

Return a vector of genomic regions that match the distribution of a set of query regions for any given set of characteristics, specified in the input meta.feature dataframe.

Usage

MatchRegionStats(
  meta.feature,
  query.feature,
  features.match = c("GC.percent"),
  n = 10000,
  verbose = TRUE,
  ...
)

Arguments

meta.feature

A dataframe containing DNA sequence information for the candidate features to choose from.

query.feature

A dataframe containing DNA sequence information for features to match.

features.match

Which features of the query to match when selecting a set of regions. A vector of column names present in the feature metadata can be supplied to match multiple characteristics at once. Default is GC content.

n

Number of regions to select, with characteristics matching the query

verbose

Display messages

...

Arguments passed to other functions

Details

For each characteristic to be matched, a density distribution is estimated from the query regions using the density function, and a weight is assigned to each feature in the dataset based on that distribution. If multiple characteristics are to be matched (for example, GC content and overall accessibility), a joint weight is computed by multiplying the individual weights. A set of features with characteristics matching the query regions is then selected using the sample function, with the probability of selecting each feature proportional to its joint weight.
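
The weighting scheme described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the Signac source: the helper name match_regions_sketch, the toy columns and values, the default density bandwidth, and the 1e-4 floor on weights are all hypothetical choices made for the example.

```r
# Hypothetical helper illustrating the weighting idea; not the package implementation.
match_regions_sketch <- function(meta, query, features, n) {
  weights <- rep(1, nrow(meta))
  for (f in features) {
    # Kernel density estimate of the query values for this characteristic
    d <- stats::density(query[[f]])
    # Evaluate the density at each candidate value by linear interpolation;
    # candidates outside the estimated range come back as NA
    w <- stats::approx(x = d$x, y = d$y, xout = meta[[f]])$y
    w[is.na(w)] <- 0
    # Assumed floor so that no candidate has a weight of exactly zero
    w <- pmax(w, 1e-4)
    # Joint weight is the product of the per-characteristic weights
    weights <- weights * w
  }
  # Draw n candidates without replacement, with probability proportional to weight
  picked <- sample.int(n = nrow(meta), size = n, prob = weights)
  rownames(meta)[picked]
}

set.seed(1)
# Toy regions with made-up values for two characteristics
meta <- data.frame(
  GC.percent = runif(500, min = 20, max = 80),
  count = rpois(500, lambda = 50),
  row.names = paste0("region", 1:500)
)
query <- meta[1:20, ]
candidates <- meta[21:500, ]
selected <- match_regions_sketch(
  meta = candidates,
  query = query,
  features = c("GC.percent", "count"),
  n = 30
)
```

Here selected holds the row names of 30 candidate regions, drawn so that regions whose values are common among the query regions are more likely to be picked.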

Value

Returns a character vector of the selected genomic regions.

Examples

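# Retrieve the feature metadata stored in the 'peaks' assay
metafeatures <- SeuratObject::GetAssayData(
  object = atac_small[['peaks']], layer = 'meta.features'
)
# Use the first 10 features as the query and the rest as candidates
query.feature <- metafeatures[1:10, ]
features.choose <- metafeatures[11:nrow(metafeatures), ]
# Select 10 candidates whose 'percentile' values match the query distribution
MatchRegionStats(
  meta.feature = features.choose,
  query.feature = query.feature,
  features.match = "percentile",
  n = 10
)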

timoast/signac documentation built on Aug. 23, 2024, 1:48 a.m.