windows_indiv_roh: Detect runs of homozygosity using a sliding-window approach

View source: R/windows_indiv_roh.R

windows_indiv_roh    R Documentation

Detect runs of homozygosity using a sliding-window approach

Description

This function uses a sliding-window approach to look for runs of homozygosity (or heterozygosity) in a diploid genome. It is based on the package detectRUNS, which implements an approach equivalent to the one in PLINK.

Usage

windows_indiv_roh(
  .x,
  window_size = 15,
  threshold = 0.05,
  min_snp = 3,
  heterozygosity = FALSE,
  max_opp_window = 1,
  max_miss_window = 1,
  max_gap = 10^6,
  min_length_bps = 1000,
  min_density = 1/1000,
  max_opp_run = NULL,
  max_miss_run = NULL
)

gt_roh_window(
  .x,
  window_size = 15,
  threshold = 0.05,
  min_snp = 3,
  heterozygosity = FALSE,
  max_opp_window = 1,
  max_miss_window = 1,
  max_gap = 10^6,
  min_length_bps = 1000,
  min_density = 1/1000,
  max_opp_run = NULL,
  max_miss_run = NULL
)

Arguments

.x

a gen_tibble

window_size

the size of the sliding window, in number of SNP loci (default = 15)

threshold

the minimum proportion of overlapping windows in the same state (homozygous/heterozygous) required to call a SNP as part of a RUN (default = 0.05)

min_snp

minimum n. of SNP in a RUN (default = 3)

heterozygosity

should we look for runs of heterozygosity (instead of homozygosity)? (default = FALSE)

max_opp_window

max n. of SNPs of the opposite type (e.g. heterozygous snps for runs of homozygosity) in the sliding window (default = 1)

max_miss_window

max. n. of missing SNP in the sliding window (default = 1)

max_gap

max distance between consecutive SNPs for them to still be considered part of the same potential run (default = 10^6 bps)

min_length_bps

minimum length of run in bps (defaults to 1000 bps = 1 kbps)

min_density

minimum n. of SNPs per kbps (default = 1/1000)

max_opp_run

max n. of opposite genotype SNPs in the run (optional)

max_miss_run

max n. of missing SNPs in the run (optional)
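
To make the interaction between window_size, max_opp_window, max_miss_window and threshold more concrete, the following base R toy sketches the window logic for runs of homozygosity. It is a simplified illustration and not the code used by tidypopgen or detectRUNS: the function name call_snps_in_run and the genotype coding (0 and 2 = homozygous, 1 = heterozygous, NA = missing) are assumptions made for this example, and the run-level filters (max_gap, min_snp, min_length_bps, min_density) are left out.

```r
# Toy sketch of sliding-window SNP calling (hypothetical helper, not the
# package implementation). Assumed coding: 0/2 = homozygous,
# 1 = heterozygous, NA = missing.
call_snps_in_run <- function(geno, window_size = 15, threshold = 0.05,
                             max_opp_window = 1, max_miss_window = 1) {
  n <- length(geno)
  n_windows <- n - window_size + 1
  stopifnot(n_windows >= 1)
  is_het <- !is.na(geno) & geno == 1
  is_miss <- is.na(geno)
  # Step 1: a window is accepted if it contains at most max_opp_window
  # heterozygous SNPs and at most max_miss_window missing SNPs
  window_ok <- vapply(seq_len(n_windows), function(start) {
    idx <- start:(start + window_size - 1)
    sum(is_het[idx]) <= max_opp_window && sum(is_miss[idx]) <= max_miss_window
  }, logical(1))
  # Step 2: for each SNP, compute the proportion of the windows covering it
  # that were accepted, and compare it with the threshold
  prop_ok <- vapply(seq_len(n), function(i) {
    first <- max(1, i - window_size + 1)
    last <- min(i, n_windows)
    mean(window_ok[first:last])
  }, numeric(1))
  prop_ok >= threshold
}

# Ten SNPs with a block of three heterozygous calls in the middle
geno <- c(0, 0, 2, 0, 1, 1, 1, 0, 2, 2)
in_run <- call_snps_in_run(geno,
  window_size = 3, threshold = 0.05,
  max_opp_window = 0, max_miss_window = 0
)
in_run
# TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE
```

With the low default threshold, a SNP is called as soon as a small share of the windows covering it are accepted, so the homozygous SNPs flanking the heterozygous block are still included. Raising threshold to 0.5 in this toy would drop SNPs 4 and 8, whose covering windows are mostly rejected.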

Details

This function returns a data frame with all runs detected in the dataset. This data frame can, in turn, be passed to the functions of the detectRUNS package that create plots and compute statistics from the results (see the plot and statistics functions in the detectRUNS manual, and/or refer to the detectRUNS vignette).

If the gen_tibble is grouped, then the grouping variable is used to fill in the 'group' column. Otherwise, the 'group' column is filled with the same values as the 'id' column. Note that this behaviour is different from other windowed operations in tidypopgen, which return a list for grouped gen_tibbles; this different behaviour is designed to maintain compatibility with detectRUNS.

The old name for this function, gt_roh_window, is still available, but it is soft deprecated and will be removed in future versions of tidypopgen.

Value

A data frame with the runs of homozygosity (or heterozygosity) detected in the analysed dataset. The returned data frame contains the following seven columns: "group", "id", "chrom", "nSNP", "from", "to", "lengthBps" (group: population, breed, case/control etc.; id: individual identifier; chrom: chromosome on which the run is located; nSNP: number of SNPs in the run; from: starting position of the run, in bps; to: end position of the run, in bps; lengthBps: size of the run)
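
As a rough illustration of how a data frame with these seven columns might be summarised, the base R sketch below computes the number of runs and the total run length per individual. The runs data frame is made up by hand for this example (it is not output from a real dataset), and the names per_indiv, total_length_bps and n_runs are hypothetical.

```r
# Hand-made toy data frame with the seven columns described above
# (values are invented for illustration only)
runs <- data.frame(
  group = c("A", "A", "A", "B"),
  id = c("ind1", "ind1", "ind2", "ind3"),
  chrom = c("1", "2", "1", "1"),
  nSNP = c(20, 35, 18, 50),
  from = c(100000, 500000, 200000, 1000000),
  to = c(600000, 2500000, 450000, 4000000),
  lengthBps = c(500000, 2000000, 250000, 3000000),
  stringsAsFactors = FALSE
)

# Total length covered by runs and number of runs, per individual
total_length <- tapply(runs$lengthBps, runs$id, sum)
n_runs <- tapply(runs$lengthBps, runs$id, length)

per_indiv <- data.frame(
  id = names(total_length),
  total_length_bps = as.vector(total_length),
  n_runs = as.vector(n_runs[names(total_length)])
)
per_indiv
```

The same pattern could be applied to the "group" column to obtain per-population summaries; detectRUNS also provides its own summary functions for this kind of output.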

Examples


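library(tidypopgen)
library(dplyr) # provides %>% and group_by()

# Locate the example sheep PED file shipped with detectRUNS
sheep_ped <- system.file("extdata", "Kijas2016_Sheep_subset.ped",
  package = "detectRUNS"
)
# Read it into a gen_tibble backed by a temporary file
sheep_gt <- tidypopgen::gen_tibble(sheep_ped,
  backingfile = tempfile(),
  quiet = TRUE
)
# Group by population so that the 'group' column of the output is filled in
sheep_gt <- sheep_gt %>% group_by(population)
# Detect runs of homozygosity with the default settings
sheep_roh <- windows_indiv_roh(sheep_gt)
# Plot the detected runs with detectRUNS
detectRUNS::plot_Runs(runs = sheep_roh)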


tidypopgen documentation built on Aug. 28, 2025, 1:08 a.m.