rank2map: Convert SNP Ranks To Windows Corresponding to Mapping...


rank2map R Documentation

Convert SNP Ranks To Windows Corresponding to Mapping Distance

Description

For each ordered single nucleotide polymorphism (SNP), this function estimates which SNPs fall within a window spanning a user-defined distance, measured in SNP positions mapped to a reference. Each window is centered on the mapped position of its SNP. Converting the SNP rank position metric to a mapped position metric is useful for kernel smoothing of the diem output state along a genomic sequence.

Usage

rank2map(includedSites, ChosenSites = "all", windowSize = 1e+07, nCores = 1)

Arguments

includedSites

A character path to a file with columns CHROM and POS.

ChosenSites

A logical vector indicating which sites are to be included in the analysis.

windowSize

A numeric window size for metric conversion in base-pairs.

nCores

A numeric value giving the number of cores to use for parallelisation. Must be nCores = 1 on Windows.

Details

Single nucleotide polymorphisms (SNPs) tend to be spread across a genome randomly. To facilitate interpretation of the diem output, the marker states should be assessed on the metric of their position along chromosomes (contigs). The windows for kernel smoothing might contain a variable number of markers. This function estimates which markers should be assessed together given their proximity on a chromosome.

Values in includedSites are in essence SNP positions in BED format with a header. The includedSites file should ideally be generated by vcf2diem to ensure congruence across all analyses.
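As a rough illustration of the expected input, the sketch below writes and reads back a toy file with a header and the columns CHROM and POS. The tab separator, the chromosome names and the positions are assumptions made for this example; files produced by vcf2diem may contain additional columns.

```r
# Toy file in the assumed layout: tab-separated, header with CHROM and POS
toy <- data.frame(CHROM = c("chr1", "chr1", "chr2"), POS = c(100, 140, 50))
path <- tempfile(fileext = ".txt")
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back and check that the required columns are present
sites <- read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
stopifnot(all(c("CHROM", "POS") %in% colnames(sites)))

# Positions should be non-decreasing within each chromosome
sorted_ok <- all(tapply(sites$POS, sites$CHROM, function(p) !is.unsorted(p)))
```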

The function reads SNP positions from the specified BED-like file and divides the genome into segments based on chromosomes. Each segment is then processed to identify genomic windows encompassing each SNP, considering the specified window size. This process is parallelized to enhance performance, and each SNP is considered within its chromosomal context to ensure accurate window placement.
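The procedure described above can be approximated in a few lines of base R. This is a minimal sketch and not the code used by rank2map: the helper windows_by_position is hypothetical, the toy positions are invented, and the sketch assumes that window boundaries are inclusive and that row indices refer to the whole file rather than to each chromosome.

```r
# Hypothetical toy data in the CHROM/POS layout
sites <- data.frame(
  CHROM = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
  POS   = c(100, 140, 400, 420, 50, 90)
)

# Illustrative helper, not part of diemr
windows_by_position <- function(sites, windowSize) {
  half <- windowSize / 2
  res <- matrix(NA_integer_, nrow = nrow(sites), ncol = 2,
                dimnames = list(NULL, c("start", "end")))
  # Process each chromosome separately so windows never span two chromosomes
  for (chr in unique(sites$CHROM)) {
    idx <- which(sites$CHROM == chr)
    pos <- sites$POS[idx]
    for (i in seq_along(pos)) {
      # Markers whose mapped position lies within half a window of marker i
      inside <- which(abs(pos - pos[i]) <= half)
      res[idx[i], ] <- c(idx[min(inside)], idx[max(inside)])
    }
  }
  res
}

w <- windows_by_position(sites, windowSize = 100)
```

Each row of w holds the first and last row index of the markers that share a window with the marker in that row.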

The minimum value of windowSize is 3, but for genomic data the window size should be at least two orders of magnitude larger. A useful approximation of the minimum window size is $(genome size) / ((number of SNPs) / 2)$.
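As a worked example of this approximation, the genome size and SNP count below are hypothetical values chosen only to make the arithmetic easy to follow.

```r
genome_size <- 2e9   # hypothetical genome size in base pairs
n_snps      <- 1e6   # hypothetical number of SNPs

# Approximate minimum window size: genome size / (number of SNPs / 2)
min_window <- genome_size / (n_snps / 2)
min_window
```

The result equals twice the average spacing between SNPs, here 4000 base pairs.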

Value

A two-column matrix with the number of rows corresponding to the number of ChosenSites, indicating start and end indices of adjacent markers that are within an interval of length windowSize centered on the specific marker.
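The returned matrix might be used as sketched below. The matrix w is a hypothetical result for six markers, written out by hand rather than produced by rank2map.

```r
# Hypothetical result: column 1 holds start indices, column 2 end indices
w <- matrix(c(1, 1, 3, 3, 5, 5,
              2, 2, 4, 4, 6, 6), ncol = 2)

# Number of markers in the window of each marker
markers_per_window <- w[, 2] - w[, 1] + 1

# Indices of the markers assessed together with marker 3
neighbours <- seq(w[3, 1], w[3, 2])
```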

Note

The unit of parallelization when using nCores > 1 is set per chromosome. This may differ from the parallelization approach used in diem, where processing of compartment files is parallelized. Note that while compartment files can correspond to chromosomes, this is not necessarily the case.
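Per-chromosome parallelization can be illustrated with the parallel package that ships with R. This is a sketch under stated assumptions and not the internal code of rank2map: the toy data and the per-chromosome task (counting markers within half a window of each marker) are invented for illustration. The sketch keeps nCores at 1 because forking through mclapply is not available on Windows.

```r
library(parallel)

# Hypothetical toy data
sites <- data.frame(
  CHROM = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2"),
  POS   = c(100, 140, 400, 420, 50, 90)
)
windowSize <- 100
nCores <- 1  # values above 1 rely on forking, which Windows does not support

# One list element per chromosome, keeping the order found in the data
by_chrom <- split(sites$POS, factor(sites$CHROM, levels = unique(sites$CHROM)))

# Each chromosome is one unit of work
counts <- mclapply(by_chrom, function(pos) {
  vapply(pos, function(p) sum(abs(pos - p) <= windowSize / 2), integer(1))
}, mc.cores = nCores)

n_in_window <- unlist(counts, use.names = FALSE)
```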

Author(s)

Natalia Martinkova

Filip Jagos 521160@mail.muni.cz

Examples

 ## Not run: 
 # Run this example in a working directory with write permissions
 myo <- system.file("extdata", "myotis.vcf", package = "diemr")
 vcf2diem(myo, "myo")
 rank2map("myo-includedSites.txt", windowSize = 50)
 
## End(Not run) 

diemr documentation built on Sept. 23, 2024, 5:10 p.m.