This function estimates variable recombination rates from population genetic data.
To do so, it applies a segmentation algorithm with a specified segment length (segLength) and type-I error probability (alpha, α). The returned object can be plotted with the plot function of the package stepR.
LDJump(seqName, alpha = 0.05, quant = 0.35, segLength = 1000, pathLDhat = "",
       pathPhi = "", format = "fasta", refName = NULL, start = NULL, constant = F,
       rescale = F, status = T, polyThres = 0, cores = 1, accept = F,
       demography = F, regMod = "", out = "", lengthofseq = NULL, chr = NULL,
       startofseq = NULL, endofseq = NULL)
seqName |
A character string containing the full path and the name of the sequence file in |
alpha |
A value from the interval (0,1) for the type-I error probability α used in the segmentation algorithm. We recommend using 0.05. The recombination map can be estimated efficiently (without recalculating all summary statistics) under several type-I errors when |
quant |
A value between 0.1 and 0.5, in steps of 0.05, giving the quantile used in the quantile regression. We recommend the 0.35 quantile, which is the default value. |
segLength |
An integer value for the length of the segments, provided by the user. The default value of 1000 is our recommended value (1 kb). The number of resulting segments, based on the sequence length, is calculated within the function. |
pathLDhat |
A character string containing the path to LDhat. This path and the installation of LDhat are necessary for the computations of the package. |
pathPhi |
A character string containing the path to PhiPack. This path and the installation of PhiPack are necessary for the computations of the package. |
format |
A character string which can be |
refName |
An (optional) path to the reference sequence for the region of interest downloaded from e.g. http://phase3browser.1000genomes.org/index.html. Only to be used in case that |
start |
An (optional) integer value which reflects the starting position of the sequences in bp. Only to be used in case that |
constant |
an optional logical value: by default |
rescale |
an optional logical value: by default |
status |
an optional logical value: by default |
polyThres |
a numeric value between 0 and 1. Used in data manipulation function |
cores |
a positive integer value, 1 by default, giving the number of cores to be used. When set to an integer larger than one, that number of cores is used to compute the recombination map. |
accept |
an optional logical value: by default |
demography |
an optional logical value: by default |
regMod |
an optional character string: for the default empty string "" |
out |
an optional character string: by default an empty string "". Can be set to any user-defined string in order to rename all output files used within |
lengthofseq |
an integer value describing the length of the sequence (Only required when running |
chr |
either an integer value between 1 and 22 or a character value "X"/"Y" describing which chromosome is used to run |
startofseq |
an integer value describing at which position the sequence to be analyzed starts (Only required when running |
endofseq |
an integer value describing at which position the sequence to be analyzed ends (Only required when running |
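The constraints on quant and segLength described above can be illustrated with a short R sketch. The helper names (valid_quants, is_valid_quant, n_segments) are hypothetical and not part of LDJump, and the rounding rule for a trailing partial segment is an assumption, since the number of segments is calculated inside the function.

```r
# Hypothetical helpers for illustration only; not part of LDJump.

# Quantiles permitted by the documentation: 0.1 to 0.5 in steps of 0.05.
valid_quants <- seq(0.1, 0.5, by = 0.05)

is_valid_quant <- function(quant) {
  # Compare with a tolerance because seq() returns floating-point values.
  any(abs(valid_quants - quant) < 1e-8)
}

# Assumed rule: a trailing partial segment counts as one more segment.
n_segments <- function(seq_length, segLength = 1000) {
  stopifnot(seq_length > 0, segLength > 0)
  ceiling(seq_length / segLength)
}

is_valid_quant(0.35)               # default quantile
n_segments(1e6, segLength = 1000)  # 1 Mb sequence in 1 kb segments
```

Under these assumptions, a sequence of 1,000,000 bp with the default segLength gives 1000 segments.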
The following list is returned in the case of estimating variable recombination rates (constant == FALSE).
seq.full.cor |
The final estimate of the recombination map. It can be plotted with the plot function of the stepR package. |
pr.full.cor |
The (constant) estimates of the recombination rate per segment. |
help |
A helper matrix containing the summary statistics per segment used in the regression model. |
alpha |
The type-I error probability α. |
nn |
The number of individuals (more precisely sequences) for which the recombination map was estimated. |
ll |
Total sequence length |
segs |
The number of segments by which the sequence is divided. Resulting from the user-defined segment length (segLength). |
For constant recombination rate estimation across the whole sequences (constant == TRUE), we provide the same list except for seq.full.cor.
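To show how the returned list might be inspected, the sketch below builds a mock object with the documented element names. All values are made up, the helper has_map is hypothetical, and the assumption that pr.full.cor holds one estimate per segment is inferred from the description above rather than confirmed by the package.

```r
# Mock object with the documented element names; all values are invented.
mock_result <- list(
  seq.full.cor = NULL,                         # placeholder for the fitted map
  pr.full.cor  = c(0.010, 0.012, 0.031, 0.009),
  help         = matrix(0, nrow = 4, ncol = 2),
  alpha        = 0.05,
  nn           = 16,
  ll           = 4000,
  segs         = 4
)

# Hypothetical helper: is the variable-rate map part of the result?
has_map <- function(res) "seq.full.cor" %in% names(res)

# With constant == TRUE the same list is documented to lack seq.full.cor.
constant_result <- mock_result[names(mock_result) != "seq.full.cor"]

# Assumed layout: one rate estimate per segment.
segment_table <- data.frame(
  segment = seq_len(mock_result$segs),
  rate    = mock_result$pr.full.cor
)

has_map(mock_result)
has_map(constant_result)
segment_table
```

Checking for seq.full.cor by name, rather than by position, keeps downstream code working for both the variable and the constant case.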
This function only works on Unix and requires PhiPack to be installed. We strongly recommend also installing LDhat (Auton and McVean (2007)) in order to decrease the computational cost of estimating recombination maps. Please check all paths carefully: the path to PhiPack, the path to LDhat if it is used, and the paths to the sequence files.
Versions older than v 0.2.1 required lookup tables for the pairwise estimate of LDhat. These files should be located in the path "pathToLDhat/LDhat-master/lk_files". Lookup tables are included with LDhat, but we still provide several lookup tables here. We strongly recommend using the most recent version of LDJump to estimate recombination rates.
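The path checks recommended in this note could be automated before calling LDJump. The following is a minimal sketch using base R only. The function check_setup is hypothetical, and it assumes that pathPhi and pathLDhat point to directories, which the documentation does not state explicitly.

```r
# Hypothetical pre-flight check for illustration; not part of LDJump.
check_setup <- function(seqName, pathPhi, pathLDhat = "") {
  problems <- character(0)
  if (.Platform$OS.type != "unix") {
    problems <- c(problems, "LDJump is documented to work only on Unix")
  }
  if (!file.exists(seqName)) {
    problems <- c(problems, paste("sequence file not found:", seqName))
  }
  # Assumption: the PhiPack path is a directory.
  if (!dir.exists(pathPhi)) {
    problems <- c(problems, paste("PhiPack path not found:", pathPhi))
  }
  # LDhat is optional, so only check the path when one is given.
  if (nzchar(pathLDhat) && !dir.exists(pathLDhat)) {
    problems <- c(problems, paste("LDhat path not found:", pathLDhat))
  }
  problems
}

# An empty character vector means no problems were detected.
check_setup("/pathToSample/sample.fa", pathPhi = "/pathToPhi")
```

Returning the list of problems, instead of stopping at the first one, lets a user fix all path issues in a single pass.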
Philipp Hermann philipp.hermann@jku.at, Andreas Futschik, Fardokhtsadat Mohammadi fardokht.fm@gmail.com
Auton, A. and McVean, G. (2007). Recombination rate estimation in the presence of hotspots. Genome Research, 17(8), 1219-1227.
Bruen, T. C., Philippe, H., and Bryant, D. (2006). A simple and robust statistical test for detecting the presence of recombination. Genetics, 172(4):2665-2681.
Frick, K., Munk, A., and Sieling, H. (2014). Multiscale change-point inference. Journal of the Royal Statistical Society: Series B, 76(3), 495-580.
Futschik, A., Hotz, T., Munk, A., and Sieling, H. (2014). Multiscale DNA partitioning: Statistical evidence for segments. Bioinformatics, 30(16), 2255-2262.
Hermann, P., Heissl, A., Tiemann-Boege, I., and Futschik, A. (2019), LDJump: Estimating Variable Recombination Rates from Population Genetic Data. Mol Ecol Resour. doi:10.1111/1755-0998.12994.
Jombart T. and Ahmed I. (2011) adegenet 1.3-1: new tools for the analysis of genome-wide SNP data. Bioinformatics. doi:10.1093/bioinformatics/btr521
Knaus BJ and Grünwald NJ (2017). VCFR: a package to manipulate and visualize variant call format data in R. Molecular Ecology Resources, 17(1), pp. 44-53. ISSN 757, doi:10.1111/1755-0998.12549.
McVean, G. A. T., Myers, S. R., Hunt, S., Deloukas, P., Bentley, D. R., and Donnelly, P. (2004). The fine-scale structure of recombination rate variation in the human genome. Science, 304(5670), 581-584.
Paradis E., Claude J. & Strimmer K. 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289-290.
The 1000 Genomes Project Consortium (2015). A global reference for human genetic variation. Nature, 526(7571), 68-74.
Wood, S.N. (2011) Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B) 73(1):3-36
summary_statistics, vcfR_to_fasta, getPhi, get_smuce, smuceR, rq, gam, vcfR2DNAbin, diseq, genotype, readDNAStringSet, calcRegMod
##### Do not run these examples #####
##### results = LDJump(fileName, alpha = 0.05, segLength = 1000, #####
#####                  pathLDhat = getwd(), format = "fasta") #####
##### plot(results) #####
##### results = LDJump("/pathToSample/HatLandscapeN16Len1000000Nrhs15_th0.01_540_1.fa", #####
#####   alpha = 0.05, segLength = 1000, pathLDhat = "/pathToLDhat", pathPhi = "/pathToPhi", #####
#####   format = "fasta", refName = NULL) #####