windows_indiv_roh: Detect runs of homozygosity using a sliding-window approach
In tidypopgen: Tidy Population Genetics

windows_indiv_roh

R Documentation

Description

This function uses a sliding-window approach to look for runs of homozygosity (or heterozygosity) in a diploid genome. It is based on the package detectRUNS, which implements an approach equivalent to the one in PLINK.
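
The sliding-window idea can be illustrated with a small base R sketch. This is a simplified illustration, not the tidypopgen or detectRUNS implementation: the helper name snp_in_run_sketch, the 0/1/2 genotype coding, and the toy data are assumptions made for this example. Each window is scored as homozygous if it contains few enough heterozygous and missing calls, and a SNP is flagged when the proportion of homozygous windows covering it reaches the threshold.

```r
# Simplified, illustrative sketch of sliding-window run calling.
# NOT the tidypopgen/detectRUNS code; names and coding are assumptions.
# geno: genotypes coded 0/1/2 (1 = heterozygous), NA = missing.
snp_in_run_sketch <- function(geno, window_size = 5, threshold = 0.05,
                              max_opp_window = 1, max_miss_window = 1) {
  n <- length(geno)
  n_windows <- n - window_size + 1
  stopifnot(n_windows >= 1)
  is_het <- !is.na(geno) & geno == 1
  is_miss <- is.na(geno)
  # A window counts as homozygous if it holds few enough
  # heterozygous and missing calls
  window_ok <- vapply(seq_len(n_windows), function(start) {
    idx <- start:(start + window_size - 1)
    sum(is_het[idx]) <= max_opp_window &&
      sum(is_miss[idx]) <= max_miss_window
  }, logical(1))
  # For each SNP, the proportion of the windows covering it
  # that were scored as homozygous
  prop_ok <- vapply(seq_len(n), function(i) {
    covering <- max(1, i - window_size + 1):min(i, n_windows)
    mean(window_ok[covering])
  }, numeric(1))
  prop_ok >= threshold
}

# Toy individual: 7 homozygous SNPs followed by 7 heterozygous SNPs
geno <- c(0, 2, 2, 0, 0, 2, 0, 1, 1, 1, 1, 1, 1, 1)
in_run <- snp_in_run_sketch(geno,
  window_size = 5, threshold = 0.05,
  max_opp_window = 0, max_miss_window = 0
)
# Consecutive TRUE values form a candidate run
rle(in_run)
```

In this toy case the first seven SNPs are flagged and the rest are not. Candidate runs built this way would then still need to pass the run-level filters (min_snp, max_gap, min_length_bps, min_density).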

Usage

windows_indiv_roh(
  .x,
  window_size = 15,
  threshold = 0.05,
  min_snp = 3,
  heterozygosity = FALSE,
  max_opp_window = 1,
  max_miss_window = 1,
  max_gap = 10^6,
  min_length_bps = 1000,
  min_density = 1/1000,
  max_opp_run = NULL,
  max_miss_run = NULL
)

gt_roh_window(
  .x,
  window_size = 15,
  threshold = 0.05,
  min_snp = 3,
  heterozygosity = FALSE,
  max_opp_window = 1,
  max_miss_window = 1,
  max_gap = 10^6,
  min_length_bps = 1000,
  min_density = 1/1000,
  max_opp_run = NULL,
  max_miss_run = NULL
)

Arguments

`.x`	a gen_tibble
`window_size`	the size of the sliding window, in number of SNP loci (default = 15)
`threshold`	the minimum proportion of overlapping windows of the same state (homozygous/heterozygous) required to call a SNP as part of a RUN (default = 0.05)
`min_snp`	minimum n. of SNPs in a RUN (default = 3)
`heterozygosity`	should we look for runs of heterozygosity (instead of homozygosity)? (default = FALSE)
`max_opp_window`	max n. of SNPs of the opposite type (e.g. heterozygous SNPs for runs of homozygosity) in the sliding window (default = 1)
`max_miss_window`	max n. of missing SNPs in the sliding window (default = 1)
`max_gap`	max distance between consecutive SNPs for them to still be considered part of the same potential run (default = 10^6 bps)
`min_length_bps`	minimum length of run in bps (defaults to 1000 bps = 1 kbps)
`min_density`	minimum n. of SNPs per kbps (default = 1/1000, as given in the Usage signature)
`max_opp_run`	max n. of opposite genotype SNPs in the run (optional)
`max_miss_run`	max n. of missing SNPs in the run (optional)
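
As a rough illustration of how the run-level arguments min_snp, min_length_bps and min_density act together, the following base R sketch filters a made-up table of candidate runs. It is not the package's code: the candidate values are invented, and computing length as to minus from and density as SNPs per kbps are assumptions made for this example.

```r
# Hypothetical candidate runs (made-up values, for illustration only)
candidates <- data.frame(
  nSNP = c(2, 40, 5, 120),
  from = c(1000, 50000, 200000, 1000000),
  to   = c(1500, 950000, 200800, 4000000)
)

min_snp <- 3
min_length_bps <- 1000
min_density <- 1 / 1000 # assumed here to be in SNPs per kbps

# Assumption: run length is the distance between first and last SNP
candidates$lengthBps <- candidates$to - candidates$from
candidates$snp_per_kbps <- candidates$nSNP / (candidates$lengthBps / 1000)

# A run is kept only if it passes all three filters
keep <- candidates$nSNP >= min_snp &
  candidates$lengthBps >= min_length_bps &
  candidates$snp_per_kbps >= min_density

candidates[keep, ]
```

Here the first candidate fails on min_snp and the third on min_length_bps, so only the second and fourth are retained.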

Details

This function returns a data frame with all runs detected in the dataset. The data frame can, in turn, be used as input for the functions of the detectRUNS package that create plots and produce statistics from the results (see the plotting and statistics functions in the detectRUNS manual, and/or refer to the detectRUNS vignette).

If the gen_tibble is grouped, then the grouping variable is used to fill in the 'group' column. Otherwise, the 'group' column is filled with the same values as the 'id' column. Note that this behaviour is different from other windowed operations in tidypopgen, which return a list for grouped gen_tibbles; this different behaviour is designed to maintain compatibility with detectRUNS.

The old name for this function, gt_roh_window, is still available, but it is soft deprecated and will be removed in future versions of tidypopgen.

Value

A data frame with runs of homozygosity or heterozygosity in the analysed dataset. The returned data frame contains the following seven columns: "group", "id", "chrom", "nSNP", "from", "to", "lengthBps" (group: population, breed, case/control etc.; id: individual identifier; chrom: chromosome on which the run is located; nSNP: number of SNPs in the run; from: starting position of the run, in bps; to: end position of the run, in bps; lengthBps: length of the run, in bps)
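
Because the result is an ordinary data frame, it can be summarised with base R. The sketch below uses a made-up data frame with the seven documented columns (it is not output from real data) and a hypothetical genome length; it counts runs and sums their length per individual.

```r
# Hypothetical result with the seven documented columns (values are made up)
roh <- data.frame(
  group = c("popA", "popA", "popA", "popB"),
  id = c("ind1", "ind1", "ind2", "ind3"),
  chrom = c("1", "2", "1", "1"),
  nSNP = c(50, 30, 80, 20),
  from = c(100000, 500000, 200000, 300000),
  to = c(1100000, 900000, 2200000, 700000),
  lengthBps = c(1000000, 400000, 2000000, 400000),
  stringsAsFactors = FALSE
)

# Total run length per individual (keeping the group label)
per_ind <- aggregate(lengthBps ~ group + id, data = roh, FUN = sum)
# Number of runs per individual (same formula, so same row order)
per_ind$n_runs <- aggregate(lengthBps ~ group + id,
  data = roh, FUN = length
)$lengthBps

# Proportion of an assumed genome length covered by runs
genome_length_bps <- 2.6e9 # hypothetical value, for illustration only
per_ind$prop_in_runs <- per_ind$lengthBps / genome_length_bps

per_ind
```

Individuals with no detected runs do not appear in the result at all, so they would need to be added back with zeros before computing group-level averages.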

Examples


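library(tidypopgen)
library(dplyr) # for %>% and group_by()

# Example PLINK .ped file shipped with the detectRUNS package
sheep_ped <- system.file("extdata", "Kijas2016_Sheep_subset.ped",
  package = "detectRUNS"
)
# Read the genotypes into a gen_tibble backed by a temporary file
sheep_gt <- tidypopgen::gen_tibble(sheep_ped,
  backingfile = tempfile(),
  quiet = TRUE
)
# Group by population so that the 'group' column of the result holds populations
sheep_gt <- sheep_gt %>% group_by(population)
# Detect runs of homozygosity with the default settings
sheep_roh <- windows_indiv_roh(sheep_gt)
# Plot the detected runs with detectRUNS
detectRUNS::plot_Runs(runs = sheep_roh)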

tidypopgen documentation built on Aug. 28, 2025, 1:08 a.m.