fill_gap: Fill in a gap in recombination distance

View source: R/fill_gap.R

fill_gap R Documentation

Fill in a gap in recombination distance

Description

Fill in a gap in recombination distance

Usage

fill_gap(df, start, end, n_snps, method = "value")

Arguments

df

A dataframe containing information on the SNPs in a single gap region. It must contain the column "dist", giving each SNP's recombination distance (in cM), but may otherwise contain any number of additional columns

start

Numeric - the start position of the gap in cM (only used when method is set to 'value')

end

Numeric - the end position of the gap in cM (only used when method is set to 'value')

n_snps

Integer - the ideal number of SNPs to select from the region. The actual number of SNPs selected may be lower than this value when setting method to 'value'

method

String - either 'value' (the default) or 'percentile'

Details

This function attempts to find SNPs that fill in gaps (in terms of recombination distance) on chromosomes. It requires a dataframe containing the cM coordinates of the SNPs within the gap region and a value for 'n_snps', which specifies the maximum number of SNPs to return. Note that the gap region should contain more SNPs than 'n_snps'. The function generates "virtual SNPs" that are evenly spaced across the gap region, either by actual recombination distance in cM (method 'value') or by percentile (method 'percentile'). With the 'value' method, the function will typically return fewer SNPs than 'n_snps': because the actual SNPs in the region are unlikely to be uniformly distributed, multiple virtual SNPs may correspond to the same actual SNP. With the 'percentile' method, the function returns 'n_snps' SNPs unless the region contains fewer SNPs than 'n_snps'.
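The virtual-SNP idea described above can be illustrated with a small base R sketch. This is not the package's implementation: the helper name select_by_virtual_snps() is hypothetical, the 'percentile' branch is approximated by even spacing over SNP ranks, and the weights are assumed to count the actual SNPs closest to each selected SNP. It is intended only to show why the 'value' method can return fewer SNPs than 'n_snps'.

```r
# Illustrative sketch only; NOT the package's fill_gap() implementation.
# select_by_virtual_snps() is a hypothetical helper name.
select_by_virtual_snps <- function(df, start, end, n_snps,
                                   method = c("value", "percentile")) {
  method <- match.arg(method)
  df <- df[order(df$dist), , drop = FALSE]

  if (method == "value") {
    # Virtual SNPs evenly spaced in cM between start and end
    virtual <- seq(start, end, length.out = n_snps)
    # Index of the actual SNP nearest to each virtual SNP
    nearest <- vapply(virtual,
                      function(v) which.min(abs(df$dist - v)),
                      integer(1))
  } else {
    # Assumption: approximate percentiles by even spacing over SNP ranks
    k <- min(n_snps, nrow(df))
    nearest <- round(seq(1, nrow(df), length.out = k))
  }

  # Several virtual SNPs may map to the same actual SNP, so deduplicate
  chosen <- sort(unique(as.integer(nearest)))

  # Assumption: weight = number of actual SNPs closest to each chosen SNP
  owner <- vapply(df$dist,
                  function(d) which.min(abs(df$dist[chosen] - d)),
                  integer(1))
  out <- df[chosen, , drop = FALSE]
  out$weights <- tabulate(owner, nbins = length(chosen))
  out
}

# Toy gap region: three clustered SNPs and two isolated ones (made-up values)
gap <- data.frame(dist = c(0.1, 0.15, 0.3, 5, 9.8))
by_value <- select_by_virtual_snps(gap, start = 0, end = 10, n_snps = 5,
                                   method = "value")
by_pct <- select_by_virtual_snps(gap, start = 0, end = 10, n_snps = 3,
                                 method = "percentile")
```

In this toy region the last two virtual SNPs (at 7.5 and 10 cM) both fall closest to the actual SNP at 9.8 cM, so the 'value' branch selects fewer SNPs than requested, while the rank-based branch selects exactly the requested number.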

This function can be used to find SNPs that are approximately equally spaced across a chromosome by considering the entire chromosome as a gap. If using the 'value' method, simply set the start value to 0 and the end value to the expected length of the chromosome in cM.
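A whole-chromosome call might be set up as follows. The SNP positions, the column name snp_id, and the chromosome length of 120 cM are made up for illustration, and the call is guarded so that it only runs when fill_gap() is available in the session (for example after loading the package). Whether the input needs to be converted to a data.table first is not stated here, so treat this as a sketch.

```r
# Toy chromosome with made-up cM positions (not real data)
set.seed(1)
chr_len_cm <- 120  # assumed expected chromosome length in cM
snps <- data.frame(
  snp_id = paste0("snp", seq_len(200)),  # hypothetical identifier column
  dist   = sort(runif(200, min = 0, max = chr_len_cm))
)

# Treat the entire chromosome as a single gap from 0 to its expected length.
# Skipped when fill_gap() has not been loaded into the session.
if (exists("fill_gap", mode = "function")) {
  spaced <- fill_gap(snps, start = 0, end = chr_len_cm,
                     n_snps = 20, method = "value")
}
```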

Value

The input data.table with an added weights column and with rows filtered to include only the selected SNPs. The weights column represents the clustering of SNPs, i.e. the number of SNPs that are "represented" by each selected SNP.


etnite/bwardr documentation built on Jan. 6, 2023, 7:12 a.m.