View source: R/windows_indiv_roh.R
windows_indiv_roh | R Documentation |
This function uses a sliding-window approach to look for runs of homozygosity
(or heterozygosity) in a diploid genome. It is based on the package
detectRUNS, which implements an approach equivalent to the one in PLINK.
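The sliding-window idea can be illustrated with a few lines of base R. This is a simplified sketch of a PLINK-style scan, not the code used by tidypopgen or detectRUNS: the toy genotype vector and the small window size are made up for illustration, and the real function additionally applies max_gap, min_snp, min_length_bps, min_density and the per-run limits.

```r
# Toy genotypes for one individual on one chromosome (hypothetical data):
# 0 = homozygous, 1 = heterozygous, NA = missing.
geno <- c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0)

# Illustrative settings (smaller than the package defaults so the toy
# example stays readable).
window_size <- 4
max_opp_window <- 0
max_miss_window <- 0
threshold <- 0.05

n <- length(geno)
n_windows <- n - window_size + 1

# Step 1: a window counts as homozygous if it holds at most
# max_opp_window heterozygous and max_miss_window missing genotypes.
window_ok <- vapply(seq_len(n_windows), function(start) {
  w <- geno[start:(start + window_size - 1)]
  sum(w == 1, na.rm = TRUE) <= max_opp_window &&
    sum(is.na(w)) <= max_miss_window
}, logical(1))

# Step 2: for each SNP, compute the proportion of the windows covering it
# that are homozygous; the SNP is called "in a run" if that proportion
# reaches the threshold.
snp_prop <- vapply(seq_len(n), function(i) {
  covering <- max(1, i - window_size + 1):min(i, n_windows)
  mean(window_ok[covering])
}, numeric(1))
snp_in_run <- snp_prop >= threshold

which(snp_in_run)
# 1 2 3 4 12 13 14 15 16
```

In this toy example the homozygous SNPs at positions 6 to 8 are not called, because no window of four SNPs fits between the flanking heterozygous sites.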
windows_indiv_roh(
  .x,
  window_size = 15,
  threshold = 0.05,
  min_snp = 3,
  heterozygosity = FALSE,
  max_opp_window = 1,
  max_miss_window = 1,
  max_gap = 10^6,
  min_length_bps = 1000,
  min_density = 1/1000,
  max_opp_run = NULL,
  max_miss_run = NULL
)
gt_roh_window(
  .x,
  window_size = 15,
  threshold = 0.05,
  min_snp = 3,
  heterozygosity = FALSE,
  max_opp_window = 1,
  max_miss_window = 1,
  max_gap = 10^6,
  min_length_bps = 1000,
  min_density = 1/1000,
  max_opp_run = NULL,
  max_miss_run = NULL
)
.x | a gen_tibble |
window_size | the size of the sliding window, in number of SNP loci (default = 15) |
threshold | the proportion of overlapping windows of the same state (homozygous/heterozygous) required to call a SNP in a run (default = 0.05) |
min_snp | minimum number of SNPs in a run (default = 3) |
heterozygosity | should we look for runs of heterozygosity (instead of homozygosity)? (default = FALSE) |
max_opp_window | maximum number of SNPs of the opposite type (e.g. heterozygous SNPs for runs of homozygosity) in the sliding window (default = 1) |
max_miss_window | maximum number of missing SNPs in the sliding window (default = 1) |
max_gap | maximum distance between consecutive SNPs for them to still be considered part of the same potential run (default = 10^6 bps) |
min_length_bps | minimum length of a run in bps (default = 1000 bps = 1 kbps) |
min_density | minimum SNP density of a run (default = 1/1000) |
max_opp_run | maximum number of opposite-genotype SNPs in the run (optional) |
max_miss_run | maximum number of missing SNPs in the run (optional) |
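As an illustration of how the arguments combine, the sketch below runs the scan with stricter filters than the defaults. It assumes that the tidypopgen and detectRUNS packages are installed; the specific values (at least 20 SNPs and 250 kbps per run) are arbitrary choices for this example, not recommendations.

```r
# Example sheep data shipped with detectRUNS.
sheep_ped <- system.file("extdata", "Kijas2016_Sheep_subset.ped",
  package = "detectRUNS"
)
sheep_gt <- tidypopgen::gen_tibble(sheep_ped,
  backingfile = tempfile(),
  quiet = TRUE
)

# Keep the default window settings, but only retain runs with at least
# 20 SNPs that span at least 250 kbps (illustrative thresholds).
sheep_roh_strict <- tidypopgen::windows_indiv_roh(
  sheep_gt,
  window_size = 15,
  threshold = 0.05,
  min_snp = 20,
  min_length_bps = 250000
)
head(sheep_roh_strict)
```

Setting heterozygosity = TRUE in the same call would scan for runs of heterozygosity instead.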
This function returns a data frame with all runs detected in the
dataset. The data frame is, in turn, the input for other functions of the
detectRUNS package that create plots and produce statistics from the
results (see the plots and statistics functions in the detectRUNS manual,
and/or refer to the detectRUNS vignette).
If the gen_tibble is grouped, then the grouping variable is used to
fill in the 'group' column. Otherwise, the 'group' column is filled with
the same values as the 'id' column. Note that this behaviour
differs from other windowed operations in tidypopgen, which
return a list for grouped gen_tibbles; this difference is
designed to maintain compatibility with detectRUNS.
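The effect of grouping on the 'group' column can be checked directly. The sketch below assumes that tidypopgen, detectRUNS and dplyr are installed, and that the sheep example data contains a 'population' column, as in the example for this function.

```r
sheep_ped <- system.file("extdata", "Kijas2016_Sheep_subset.ped",
  package = "detectRUNS"
)
sheep_gt <- tidypopgen::gen_tibble(sheep_ped,
  backingfile = tempfile(),
  quiet = TRUE
)

# Ungrouped input: 'group' is expected to repeat the individual 'id'.
runs_ungrouped <- tidypopgen::windows_indiv_roh(sheep_gt)
all(as.character(runs_ungrouped$group) == as.character(runs_ungrouped$id))

# Grouped input: 'group' is expected to hold the grouping variable.
runs_grouped <- tidypopgen::windows_indiv_roh(
  dplyr::group_by(sheep_gt, population)
)
unique(runs_grouped$group)
```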
The old name for this function, gt_roh_window, is still available, but
it is soft deprecated and will be removed in future versions of
tidypopgen.
A data frame with runs of homozygosity or heterozygosity in the analysed dataset. The returned data frame contains the following seven columns: "group", "id", "chrom", "nSNP", "from", "to", "lengthBps" (group: population, breed, case/control etc.; id: individual identifier; chrom: chromosome on which the run is located; nSNP: number of SNPs in the run; from: starting position of the run, in bps; to: end position of the run, in bps; lengthBps: length of the run, in bps)
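Because the result is an ordinary data frame, it can be summarised with base R. The sketch below uses a small made-up table with the seven documented columns (all values are hypothetical) to compute the total run length per individual.

```r
# Hypothetical runs table mimicking the documented output columns.
runs <- data.frame(
  group = c("popA", "popA", "popA", "popB"),
  id = c("ind1", "ind1", "ind2", "ind3"),
  chrom = c("1", "2", "1", "1"),
  nSNP = c(40L, 25L, 60L, 30L),
  from = c(100000, 500000, 200000, 50000),
  to = c(600000, 800000, 1400000, 450000),
  lengthBps = c(500000, 300000, 1200000, 400000),
  stringsAsFactors = FALSE
)

# Total length of runs per individual, keeping the group label.
total_len <- aggregate(lengthBps ~ group + id, data = runs, FUN = sum)
total_len

# Number of runs detected per individual.
table(runs$id)
```

With real output from windows_indiv_roh(), the same calls apply unchanged, since they rely only on the documented column names.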
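# dplyr provides %>% and group_by()
library(dplyr)
# path to the example sheep data shipped with detectRUNS
sheep_ped <- system.file("extdata", "Kijas2016_Sheep_subset.ped",
  package = "detectRUNS"
)
sheep_gt <- tidypopgen::gen_tibble(sheep_ped,
  backingfile = tempfile(),
  quiet = TRUE
)
# group by population so that the 'group' column holds the population
sheep_gt <- sheep_gt %>% group_by(population)
sheep_roh <- tidypopgen::windows_indiv_roh(sheep_gt)
detectRUNS::plot_Runs(runs = sheep_roh)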