haplotest | R Documentation |
Selection can be tested using either haplotype or allele frequencies. For haplotype frequencies, three methods are available: the basic test, haplotype filtering using SNPs, and the haplotype-block-based test. All tests use the adapted chi-squared test, which corrects for the variance according to whether the frequencies are known or estimated. The haplotype filtering and haplotype-block-based methods require both haplotype and allele frequencies as well as the haplotype structure.
haplotest(
  haplotype = NULL,
  haplotype_frequency = NULL,
  allele_frequency = NULL,
  coverage_matrix = NULL,
  deltat = 10,
  Ne = 1000,
  repli = 1,
  test_type = "haplotype",
  test_method = "base",
  p_combine_method = "omnibus",
  freq_est = "known",
  significance = 0.05,
  seed = 2021
)
haplotype |
Binary numeric matrix describing the haplotype structure. The columns correspond to the haplotypes and the rows to the SNPs. |
haplotype_frequency |
Numeric matrix, with each i,j element being the frequency of haplotype i at sequenced time point j. |
allele_frequency |
Numeric matrix, with each i,j element being the frequency of SNP i at sequenced time point j. |
coverage_matrix |
Numeric matrix, required if frequencies are estimated. Sample size matrix for either the haplotype or the allele frequencies. If test_method == "hap_filt" is used for the haplotype-based test, sample size matrices for both the haplotypes and the alleles must be supplied, in the form list(haplotype_sample_size, allele_sample_size). |
deltat |
Numeric, number of generations between each pair of time points of interest. |
Ne |
Numeric vector with length equal to the number of replicates, giving Ne (effective population size) for each replicate population. If Ne changes over time, supply a numeric matrix whose columns are the replicates and whose rows give Ne at each sequenced time point. |
repli |
Numeric, specifying the number of replicated populations. |
test_type |
Factor, either "haplotype" or "allele", denoting whether to use the haplotype-based test or the SNP-based test. |
test_method |
Factor. For the haplotype-based test, can be "base", "hap_filt" or "hap_block", denoting the basic haplotype-based test, haplotype filtering using the SNP-based test, or the haplotype-block-based test. For the SNP-based test, only "base" is accepted. |
p_combine_method |
Factor, method of p-value combination. Can be "omnibus" (Futschik et al. 2019), "harmonic" (Wilson 2019), "vovk" (Vovk & Wang 2018), "bonferroni" or "BH" (Benjamini & Hochberg 1995). |
freq_est |
Factor, whether the frequencies are known or estimated. Input "known" if all frequencies are known, "half" if starting frequencies are known and others estimated, "est" if all are estimated. |
significance |
Numeric, must be between 0 and 1. Significance level of the SNP-based test when the "hap_filt" method is used. |
seed |
Numeric, random seed for the run. |
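The input shapes described in the arguments can be sketched in base R. This is an illustration only: the dimensions and values are made up, a single replicate is shown, and the assumption that each sample size matrix has the same dimensions as its frequency matrix is not stated in this page. The matrix product used for the allele frequencies mirrors the one in the examples section.

```r
# Toy inputs built with base R only; all sizes and values are made up.
set.seed(1)
n_snp <- 6
n_hap <- 3
n_time <- 4

# haplotype: binary matrix, rows = SNPs, columns = haplotypes
haplotype <- matrix(rbinom(n_snp * n_hap, 1, 0.5), nrow = n_snp, ncol = n_hap)

# haplotype_frequency: rows = haplotypes, columns = time points;
# each column is normalised so the frequencies sum to 1
raw <- matrix(runif(n_hap * n_time), nrow = n_hap, ncol = n_time)
haplotype_frequency <- sweep(raw, 2, colSums(raw), "/")

# allele_frequency: rows = SNPs, columns = time points
allele_frequency <- haplotype %*% haplotype_frequency

# coverage_matrix for "hap_filt" with estimated frequencies:
# list(haplotype_sample_size, allele_sample_size)
# (assumption: each matrix matches the shape of its frequency matrix)
coverage_matrix <- list(
  matrix(80, nrow = n_hap, ncol = n_time),
  matrix(80, nrow = n_snp, ncol = n_time)
)

# Ne: one value per replicate, or a matrix with
# rows = sequenced time points and columns = replicates
Ne_constant <- c(500, 500)
Ne_varying <- matrix(c(500, 450, 400, 350,
                       500, 480, 460, 440), nrow = n_time, ncol = 2)
```

Building the allele frequencies from the haplotype structure keeps the two frequency matrices consistent with each other, which matters for the methods that use both.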
A numeric vector of p-values from the test after multiple testing corrections.
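To give a feel for what some of the p_combine_method options do, here is a minimal base R sketch applied to made-up p-values. It is not the package's internal code: it uses the standard p.adjust() function for the "bonferroni" and "BH" corrections, and computes only the raw, uncalibrated harmonic mean of the p-values, without the further calibration discussed by Wilson (2019). The omnibus and Vovk & Wang methods are not shown.

```r
# Made-up p-values for illustration; these are not output of haplotest().
p <- c(0.001, 0.02, 0.30, 0.75)

# Per-test adjusted p-values using base R's p.adjust()
p_bonferroni <- p.adjust(p, method = "bonferroni")
p_bh <- p.adjust(p, method = "BH")

# Raw (uncalibrated, equally weighted) harmonic mean of the p-values
hmp_raw <- length(p) / sum(1 / p)

print(p_bonferroni)
print(p_bh)
print(hmp_raw)
```

The Bonferroni adjustment is the most conservative of the three, while the harmonic mean is dominated by the smallest p-values.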
Spitzer, K., Pelizzola, M., Futschik, A., (2019), Modifying the Chi-square and the CMH test for population genetic inference: adapting to over-dispersion, The Annals of Applied Statistics 14(1): 202-220.
Futschik, A., Taus, T., Zehetmayer, S., (2019), An omnibus test for the global null hypothesis, Statistical Methods in Medical Research 28(8): 2292-2304.
Wilson, D. J., (2019), The harmonic mean p-value for combining dependent tests, Proceedings of the National Academy of Sciences of the United States of America 116(4): 1195-1200.
Benjamini, Y., Hochberg, Y., (1995), Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society Series B 57(1): 289-300.
Vovk, V., Wang, R., (2018), Combining p-values via averaging, Biometrika 107(4): 791-808.
# We show here examples going from simulated data to the p-values of a haplotype-based test.
# The data simulation part can be ignored if a real data set is used.

########## Known frequency simulated data using basic haplotype based test ##########

# Simulate haplotype structure with 500 SNPs and 5 haplotypes:
hap_struc = Haplotype_sim(nsnp = 500, nhap = 5)

# Simulation of selected SNP, with 1 selected SNP private to 1 haplotype,
# 1 population and 0.05 selective strength:
SNP_sel = benef_sim(haplotype = hap_struc, n_benef = 1, min = 0.05, max = 0.05, fix_sel = 1)

# Simulation of haplotype trajectory, 60 generations, sequenced every 10 generations, Ne of 500:
hap_freq = Frequency_sim(haplotype = hap_struc, t = 6, tdelta = 10, benef_sim = SNP_sel, Ne = 500)[[1]]

# Computation of p-value using basic haplotype based test and omnibus p-value combination method:
pval = haplotest(haplotype = hap_struc, haplotype_frequency = hap_freq,
                 deltat = 10, Ne = 500,
                 test_type = "haplotype", test_method = "base",
                 p_combine_method = "omnibus", freq_est = "known")

########## Known frequency simulated data using HapSNP ##########

# In addition to the previous simulations, we need to compute the allele frequencies:
allele_freq = hap_struc %*% hap_freq

# Computation of p-value using HapSNP and omnibus p-value combination method:
pval = haplotest(haplotype = hap_struc, haplotype_frequency = hap_freq,
                 allele_frequency = allele_freq,
                 deltat = 10, Ne = 500,
                 test_type = "haplotype", test_method = "hap_filt",
                 p_combine_method = "omnibus", freq_est = "known")

########## Estimated frequency simulated data using basic haplotype based test ##########

# Simulation of noisy haplotype frequencies with mean sample size 80:
hap_freq_est = hap_err_sim(hap_freq, 80)

# Computation of p-value using basic haplotype based test and omnibus p-value combination method:
pval = haplotest(haplotype = hap_struc, haplotype_frequency = hap_freq_est[[1]],
                 coverage_matrix = hap_freq_est[[2]],
                 deltat = 10, Ne = 500,
                 test_type = "haplotype", test_method = "base",
                 p_combine_method = "omnibus", freq_est = "est")
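When working with real data instead of the simulated inputs, it can help to check that the matrices agree in shape before running the test. The helper below is a hypothetical base R sketch and is not part of the package; the name check_inputs and its checks are assumptions, based only on the layouts described in the arguments (SNPs by haplotypes, haplotypes by time points, SNPs by time points) for a single replicate.

```r
# Hypothetical helper (not part of the package): checks that the inputs
# have mutually consistent dimensions for a single replicate.
check_inputs <- function(haplotype, haplotype_frequency, allele_frequency = NULL) {
  # haplotype must be a binary matrix: rows = SNPs, columns = haplotypes
  stopifnot(is.matrix(haplotype), all(haplotype %in% c(0, 1)))
  # haplotype_frequency needs one row per haplotype
  stopifnot(is.matrix(haplotype_frequency),
            ncol(haplotype) == nrow(haplotype_frequency))
  if (!is.null(allele_frequency)) {
    # allele_frequency needs one row per SNP and one column per time point
    stopifnot(is.matrix(allele_frequency),
              nrow(allele_frequency) == nrow(haplotype),
              ncol(allele_frequency) == ncol(haplotype_frequency))
  }
  invisible(TRUE)
}

# Tiny made-up example: 3 SNPs, 2 haplotypes, 2 time points
hap <- matrix(c(1, 0, 0, 1, 1, 1), nrow = 3, ncol = 2)
freq <- matrix(c(0.5, 0.5, 0.6, 0.4), nrow = 2, ncol = 2)
ok <- check_inputs(hap, freq, hap %*% freq)
```

A mismatch, such as a frequency matrix with the wrong number of rows, stops with an error naming the failed condition.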