MatchRegionStats: Match DNA sequence characteristics

View source: R/utilities.R

MatchRegionStats    R Documentation

Match DNA sequence characteristics

Description

Return a vector of genomic regions whose characteristics match the distribution observed in a set of query regions. Any set of characteristics can be matched, provided they are present as columns in the input meta.feature dataframe.

Usage

MatchRegionStats(
  meta.feature,
  query.feature,
  features.match = c("GC.percent"),
  n = 10000,
  verbose = TRUE,
  ...
)

Arguments

meta.feature

A dataframe containing DNA sequence information for the features to choose from.

query.feature

A dataframe containing DNA sequence information for features to match.

features.match

Which features of the query to match when selecting a set of regions. A vector of column names present in the feature metadata can be supplied to match multiple characteristics at once. Default is GC content.

n

Number of regions to select, with characteristics matching the query.

verbose

Display messages.

...

Arguments passed to other functions.

Details

For each characteristic to match, a density distribution is estimated from the query regions using the stats::density() function, and a weight is then assigned to each candidate region in meta.feature based on that density distribution. If multiple characteristics are to be matched (for example, GC content and overall accessibility), a joint weight is computed for each region by multiplying its individual weights. A set of regions with characteristics matching the query regions is then selected using the base::sample() function, with the probability of selecting each region proportional to its joint weight.
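
The procedure described above can be sketched in base R as follows. This is an illustration only and not Signac's source code: the simulated data, the helper density_weights(), the use of stats::approx() to read the estimated density at each candidate value, and sampling without replacement are all assumptions made for the example.

```r
# Illustrative sketch only: base R, hypothetical data, not Signac's source code.
set.seed(42)

# Hypothetical candidate regions with two characteristics per region.
n_pool <- 2000
pool <- data.frame(
  GC.percent = runif(n_pool, min = 20, max = 80),
  count = runif(n_pool, min = 0, max = 100),
  row.names = paste0("region", seq_len(n_pool))
)

# Hypothetical query regions whose distribution we want to mimic.
query <- data.frame(
  GC.percent = rnorm(200, mean = 60, sd = 4),
  count = rnorm(200, mean = 50, sd = 10)
)

# Estimate the query density for one characteristic, then read off the
# density at each candidate's value (linear interpolation is an assumption).
density_weights <- function(query_values, pool_values) {
  d <- stats::density(query_values)
  # rule = 2 reuses the boundary density for values outside the estimated range
  stats::approx(x = d$x, y = d$y, xout = pool_values, rule = 2)$y
}

# Joint weight per candidate: product of the per-characteristic weights.
characteristics <- c("GC.percent", "count")
weights <- rep(1, n_pool)
for (ch in characteristics) {
  weights <- weights * density_weights(query[[ch]], pool[[ch]])
}

# Draw candidates with probability proportional to the joint weight.
selected <- sample(rownames(pool), size = 100, prob = weights)

# The selected regions should have GC content concentrated near the query's.
summary(pool[selected, "GC.percent"])
```

Because the weights come from the query density, candidates whose values are common among the query regions are drawn more often, so the selected set approximates the query distribution even though the candidate pool itself is uniform.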

Value

Returns a character vector of the selected regions.

Examples

metafeatures <- SeuratObject::GetAssayData(
  object = atac_small[['peaks']], layer = 'meta.features'
)
query.feature <- metafeatures[1:10, ]
features.choose <- metafeatures[11:nrow(metafeatures), ]
MatchRegionStats(
  meta.feature = features.choose,
  query.feature = query.feature,
  features.match = "percentile",
  n = 10
)

Signac documentation built on April 1, 2026, 5:06 p.m.