haplotest: Testing for selection using temporal haplotype/allele...

View source: R/haplotest.R

haplotest R Documentation

Testing for selection using temporal haplotype/allele frequencies.

Description

Selection can be tested using either haplotype or allele frequencies. For haplotype frequencies, three methods are available: the basic haplotype based test, haplotype filtering using SNPs, and the haplotype block based test. All tests use the adapted chi-squared test (Spitzer et al. 2019) to account for the extra variance (over-dispersion) during testing; the correction depends on whether the frequencies are known or estimated. The haplotype filtering and haplotype block based methods require the haplotype structure as well as both haplotype and allele frequencies.
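The variance correction can be illustrated with a minimal R sketch under stated assumptions: haplotype frequencies known exactly at two time points t generations apart, a single replicate, and a diploid Wright-Fisher population with 2Ne gene copies. Under these assumptions drift gives each haplotype frequency change a variance of c * x0 * (1 - x0), with c = 1 - (1 - 1/(2Ne))^t, so dividing the Pearson-type sum of squared frequency changes by c yields a statistic that is approximately chi-squared with k - 1 degrees of freedom for k haplotypes. The function adapted_chisq_known is hypothetical and not part of haplotest; the package's own implementation (replicates, several time points, estimated frequencies) may differ.

```r
# Hypothetical helper, not part of haplotest: drift-adjusted Pearson-type
# chi-squared test for a change in haplotype frequencies between two time
# points. Assumes known frequencies and a diploid Wright-Fisher population
# with 2 * Ne gene copies.
adapted_chisq_known <- function(x0, xt, t, Ne) {
  stopifnot(length(x0) == length(xt), all(x0 > 0))
  # Fraction of x0 * (1 - x0) turned into variance by drift over t generations
  drift <- 1 - (1 - 1 / (2 * Ne))^t
  # Pearson-type statistic on frequencies, rescaled by the drift factor
  stat <- sum((xt - x0)^2 / x0) / drift
  df <- length(x0) - 1
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Invented frequencies of 3 haplotypes, 10 generations apart, Ne = 500
x0 <- c(0.2, 0.3, 0.5)
xt <- c(0.3, 0.25, 0.45)
res <- adapted_chisq_known(x0, xt, t = 10, Ne = 500)
res$p.value
```

Without the division by c, the same frequency change would be judged against sampling variance alone, which ignores drift and inflates false positives.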

Usage

haplotest(
  haplotype = NULL,
  haplotype_frequency = NULL,
  allele_frequency = NULL,
  coverage_matrix = NULL,
  deltat = 10,
  Ne = 1000,
  repli = 1,
  test_type = "haplotype",
  test_method = "base",
  p_combine_method = "omnibus",
  freq_est = "known",
  significance = 0.05,
  seed = 2021
)

Arguments

haplotype

Binary numeric matrix describing the haplotype structure. The columns correspond to the haplotypes and the rows to the SNPs.

haplotype_frequency

Numeric matrix, with each i,j element being the frequency of haplotype i at sequenced time point j.

allele_frequency

Numeric matrix, with each i,j element being the frequency of SNP i at sequenced time point j.
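To make the expected shapes concrete, here is a toy sketch with made-up values (4 SNPs, 3 haplotypes, 2 time points). It assumes, as in the Examples below, that a 1 in haplotype marks a haplotype carrying the tracked allele, so the SNP frequencies are the matrix product of the haplotype structure and the haplotype frequencies.

```r
# Toy inputs with invented values: rows = SNPs, columns = haplotypes
haplotype <- matrix(c(1, 0, 0,
                      1, 1, 0,
                      0, 0, 1,
                      0, 1, 1),
                    nrow = 4, byrow = TRUE)
# Rows = haplotypes, columns = sequenced time points; each column sums to 1
haplotype_frequency <- cbind(t0 = c(0.5, 0.3, 0.2),
                             t1 = c(0.6, 0.3, 0.1))
# Frequency of each SNP = summed frequency of the haplotypes carrying it
allele_frequency <- haplotype %*% haplotype_frequency
allele_frequency  # rows = SNPs, columns = time points
```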

coverage_matrix

Numeric matrix of sample sizes for the haplotype or allele frequencies; required if the frequencies are estimated. If test_method = "hap_filt" is used with the haplotype based test, sample sizes for both haplotypes and alleles are required, supplied in the form list(haplotype_sample_size, allele_sample_size).
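A hypothetical sketch of the list form for test_method = "hap_filt", using toy dimensions of 3 haplotypes, 4 SNPs and 2 time points. The sample sizes are invented, and it is an assumption that each sample size matrix has the same dimensions as the frequency matrix it accompanies.

```r
# Invented sample sizes; rows match the frequency matrices, columns = time points
haplotype_sample_size <- matrix(c(80, 75,
                                  80, 75,
                                  80, 75), nrow = 3, byrow = TRUE)
allele_sample_size <- matrix(c(60, 90,
                               55, 85,
                               70, 65,
                               58, 72), nrow = 4, byrow = TRUE)
# Haplotype sample sizes first, allele sample sizes second
coverage_matrix <- list(haplotype_sample_size, allele_sample_size)
str(coverage_matrix)
```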

deltat

Numeric, number of generations between consecutive time points of interest.

Ne

Numeric vector with one entry per replicate, giving the effective population size (Ne) of each replicated population. If Ne changes over time, supply a numeric matrix instead, with one column per replicate and one row per sequenced time point.
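The two accepted shapes of Ne can be sketched with invented values for 3 replicates and 4 sequenced time points. The second half is an illustration only: it assumes the Ne in row k holds for the deltat generations after time point k, so the heterozygosity retained under drift is a product of per-interval factors. Whether haplotest combines a time-varying Ne this way is not stated on this page.

```r
repli <- 3
deltat <- 10
# Constant Ne: a numeric vector with one entry per replicate population
Ne_const <- c(500, 500, 800)
# Time-varying Ne: one row per sequenced time point, one column per replicate
Ne_varying <- matrix(c(500, 500, 800,
                       450, 520, 780,
                       400, 540, 760,
                       380, 560, 740),
                     nrow = 4, byrow = TRUE)
stopifnot(length(Ne_const) == repli, ncol(Ne_varying) == repli)

# Assumption: row k applies to the deltat generations following time point k,
# so the last row is not used for any interval
n_time <- nrow(Ne_varying)
retained <- apply((1 - 1 / (2 * Ne_varying[-n_time, , drop = FALSE]))^deltat, 2, prod)
drift_rep <- 1 - retained  # one drift factor per replicate over the whole period
drift_rep
```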

repli

Numeric, specifying the number of replicated populations.

test_type

Character string, either "haplotype" or "allele", denoting whether to use the haplotype based test or the SNP based test.

test_method

Character string. For the haplotype based test, one of "base", "hap_filt" or "hap_block", denoting the basic haplotype based test, haplotype filtering using the SNP based test, or the haplotype block based test, respectively. The SNP based test accepts only "base".

p_combine_method

Character string, method of p-value combination: "omnibus" (Futschik, A. et al. 2019), "harmonic" (Wilson, D.J. 2019), "vovk" (Vovk, V. et al. 2018), "bonferroni", or "BH" (Benjamini & Hochberg 1995).
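For orientation, the base R function p.adjust implements "bonferroni" and "BH" directly. The sketch below adds two simple averaging rules on invented p-values: the unweighted harmonic mean p-value of Wilson (2019), which is only approximately valid without Wilson's correction, and e times the geometric mean, one of the averaging rules shown to be valid by Vovk and Wang. Which rule haplotest uses for "vovk", and how it computes "omnibus", is not stated here, so this is not the package's implementation; the omnibus test is omitted.

```r
p <- c(0.01, 0.04, 0.03, 0.5)  # invented p-values
K <- length(p)

# Multiple-testing corrections available in base R
p_bonf <- p.adjust(p, method = "bonferroni")
p_bh <- p.adjust(p, method = "BH")

# Unweighted harmonic mean p-value (Wilson 2019); only approximately valid
# without Wilson's correction for the number of tests
hmp <- 1 / mean(1 / p)

# Averaging rule from Vovk & Wang: e times the geometric mean, capped at 1
p_geo <- min(1, exp(1) * exp(mean(log(p))))

c(bonferroni_min = min(p_bonf), BH_min = min(p_bh), harmonic = hmp, geometric = p_geo)
```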

freq_est

Character string, whether the frequencies are known or estimated: "known" if all frequencies are known, "half" if the starting frequencies are known and the others estimated, and "est" if all are estimated.
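Why freq_est matters can be seen from the variance of a single allele frequency change under neutrality. The sketch below assumes a diploid Wright-Fisher population and binomial sampling with replacement, independent of drift; each estimated time point adds its own sampling term. When the starting frequency is itself estimated, its estimate would be plugged in for p0. The helper freq_change_var is hypothetical and only illustrates the direction of the correction, not haplotest's exact formula.

```r
# Hypothetical helper (not part of haplotest): approximate variance of the
# observed change in one allele frequency over t generations under neutrality.
freq_change_var <- function(p0, t, Ne, freq_est = c("known", "half", "est"),
                            n0 = NA, nt = NA) {
  freq_est <- match.arg(freq_est)
  drift <- 1 - (1 - 1 / (2 * Ne))^t
  v <- p0 * (1 - p0) * drift                  # genetic drift alone
  if (freq_est %in% c("half", "est")) {
    stopifnot(!is.na(nt))
    # sampling at the later time point; E[pt (1 - pt)] = p0 (1 - p0) (1 - drift)
    v <- v + p0 * (1 - p0) * (1 - drift) / nt
  }
  if (freq_est == "est") {
    stopifnot(!is.na(n0))
    # sampling at the starting time point as well
    v <- v + p0 * (1 - p0) / n0
  }
  v
}

sapply(c("known", "half", "est"), function(m)
  freq_change_var(0.3, t = 10, Ne = 500, freq_est = m, n0 = 80, nt = 80))
```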

significance

Numeric between 0 and 1, significance level of the SNP based test used when test_method = "hap_filt".

seed

Numeric, random seed used for the run.

Value

A numeric vector of p-values after multiple testing correction, as chosen by p_combine_method.

References

Spitzer, K., Pelizzola, M., Futschik, A., (2019), Modifying the Chi-square and the CMH test for population genetic inference: adapting to over-dispersion, The Annals of Applied Statistics 14(1): 202-220.

Futschik, A., Taus, T., Zehetmayer, S., (2019), An omnibus test for the global null hypothesis, Statistical Methods in Medical Research 28(8): 2292-2304.

Wilson, D. J., (2019), The harmonic mean p-value for combining dependent tests, Proceedings of the National Academy of Sciences of the United States of America 116(4): 1195-1200.

Benjamini, Y., Hochberg, Y., (1995), Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society Series B 57(1): 289-300.

Vovk, V., Wang, R., (2018), Combining p-values via averaging, Biometrika 107(4): 791–808.

Examples

#The examples below go from simulated data to the p-values of a haplotype based test.
#The data simulation part can be ignored if a real data set is used.
##########Known frequency simulated data using basic haplotype based test##########

#Simulate haplotype structure with 500 SNPs and 5 haplotypes:
hap_struc = Haplotype_sim(nsnp = 500,nhap = 5)

#Simulation of the selected SNP: 1 selected SNP private to 1 haplotype, 1 population, and selection strength 0.05:
SNP_sel = benef_sim(haplotype = hap_struc, n_benef = 1, min = 0.05, max = 0.05, fix_sel = 1)

#Simulation of haplotype trajectories over 60 generations, sequenced every 10 generations, with Ne of 500:
hap_freq = Frequency_sim(haplotype = hap_struc, t = 6, tdelta = 10, benef_sim = SNP_sel,Ne = 500)[[1]]

#Computation of p-value using basic haplotype based test and omnibus p-value combination method:
pval = haplotest(haplotype = hap_struc, haplotype_frequency = hap_freq, deltat = 10, Ne = 500, test_type = "haplotype", test_method = "base", p_combine_method = "omnibus", freq_est = "known")

##########Known frequency simulated data using HapSNP##########

#In addition to the previous simulations, we need to compute the allele frequencies:
allele_freq = hap_struc %*% hap_freq

#Computation of p-value using HapSNP and omnibus p-value combination method:
pval = haplotest(haplotype = hap_struc, haplotype_frequency = hap_freq, allele_frequency = allele_freq, deltat = 10, Ne = 500, test_type = "haplotype", test_method = "hap_filt", p_combine_method = "omnibus", freq_est = "known")

##########Estimated frequency simulated data using basic haplotype based test##########

#Simulation of noisy haplotype frequencies with mean sample size 80:
hap_freq_est = hap_err_sim(hap_freq, 80)

#Computation of p-value using basic haplotype based test and omnibus p-value combination method:
pval = haplotest(haplotype = hap_struc, haplotype_frequency = hap_freq_est[[1]], coverage_matrix = hap_freq_est[[2]], deltat = 10, Ne = 500, test_type = "haplotype", test_method = "base", p_combine_method = "omnibus", freq_est = "est")
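hap_err_sim() is not documented on this page. To show what an estimated-frequency input and its coverage_matrix might look like, here is a hypothetical base R stand-in. It assumes that each time point gets a Poisson-distributed sample size with the given mean and that haplotypes are then resampled multinomially; the real function may add noise differently.

```r
set.seed(2021)
# Hypothetical stand-in for hap_err_sim(): for each time point, draw a
# Poisson sample size with the given mean, then multinomially resample the
# haplotype frequencies at that sample size.
sim_noisy_hap_freq <- function(hap_freq, mean_size) {
  n_hap <- nrow(hap_freq)
  n_time <- ncol(hap_freq)
  sizes <- pmax(1L, rpois(n_time, mean_size))  # avoid a sample size of 0
  est <- vapply(seq_len(n_time), function(j) {
    counts <- rmultinom(1, size = sizes[j], prob = hap_freq[, j])
    as.vector(counts) / sizes[j]
  }, numeric(n_hap))
  # All haplotypes at a time point share the same sample size
  coverage <- matrix(sizes, nrow = n_hap, ncol = n_time, byrow = TRUE)
  list(est, coverage)
}

toy_hap_freq <- cbind(c(0.5, 0.3, 0.2), c(0.6, 0.3, 0.1))
out <- sim_noisy_hap_freq(toy_hap_freq, 80)
out[[1]]  # estimated haplotype frequencies (haplotypes x time points)
out[[2]]  # sample sizes, in the shape passed as coverage_matrix
```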


xthchen/haplotest documentation built on Nov. 29, 2022, 12:07 p.m.