freqpcr: Estimate population allele frequency from the set of Cq...

View source: R/02_freqpcr.R

freqpcrR Documentation

Estimate population allele frequency from the set of Cq measurements.

Description

The function estimates the population allele frequency using the dataset of Cq values measured over length(N) bulk samples, each of which has a sample size of N[i] as the number of individuals included. N[i] can be 1, meaning that every individual is analyzed separately. For the ith sample, the four Cq values were measured as housek0[i], target0[i], housek1[i], and target1[i]. The function can estimate up to five parameters simultaneously when the Cq sets are available for more than two (bulk) samples: P, K, targetScale, sdMeasure, and EPCR.
Since v0.3.2, user can also use an experimental ‘continuous model’ by specifying A instead of N. That is, each sample DNA is directly extracted from the environment and the sample allele ratio y follows y ~ Beta(apk, a(1-p)k) instead of y ~ Beta(mk, (n-m)k), m ~ Binomial(n, p), where p and k are the population allele frequency and the gamma shape parameter of the individual DNA yield, respectively. Each element of A, a is a scaling factor of relative DNA contents between the samples. The continuous model is likely when each sample directly comes from the environment e.g., water sampling in an eDNA analysis or cell culture in a petri dish.
Since v0.4.0, freqpcr() also works without specifying housek0 and target0 i.e., it can estimate population allele frequency from ΔCq values instead of ΔΔCq. In this setting, the sizes of targetScale and sdMeasure should be fixed in addition to EPCR and zeroAmount.

Usage

freqpcr(
  N,
  A,
  housek0,
  target0,
  housek1,
  target1,
  P = NULL,
  K = NULL,
  targetScale = NULL,
  sdMeasure = NULL,
  EPCR = 0.99,
  XInit0 = c(P = NULL, K = NULL, targetScale = NULL, sdMeasure = NULL, EPCR = NULL),
  zeroAmount = NULL,
  beta = TRUE,
  diploid = FALSE,
  pvalue = 0.05,
  gradtol = 1e-04,
  steptol = 1e-09,
  iterlim = 100,
  maxtime = 600,
  print.level = 1,
  ...
)

Arguments

N

Sample sizes as a numeric vector. N[i] signifies the number of individuals (both for haploidy and diploidy) contained in the ith bulk sample. N must not contain a missing value (NA). If N is not applicable (= even not 1), feed A instead of N and then the estimation process runs with the ‘continuous model’.

A

Use instead of N in the continuous model. This is a scale factor to control the relative DNA content between samples. A[i] can take any positive value, but must not be NA. Considering the case you have arranged each sample by e.g. water filrtation or extraction from a culture in a petri dish, it is convenient to define the unit size of A[i] == 1.0 to be same as the vessel volume (e.g. 2.0 for two petri dishs, 0.5 for half bottle of water, etc.). When neither N nor A is specified by the user, the function stops. If both N and A are specified, only N is evaluated.

housek0

A numeric vector. In RED-ΔΔCq method, housek0 is the Cq values of the test sample without the restriction enzyme digestion, which is amplified with the primer set for a housekeeping gene. In general ΔΔCq analyses, housek0 is defined for the control sample (typically, 100% mutant) solution, which is also amplified with the primer set for the housekeeping gene.
Since v0.4.0, you can run freqpcr() without specifying housek0 and target0 (a ΔCq method). As this setting halves the effective data points, it is recommended to fix other parameters, especially targetScale.
The four Cq arguments, housek0, target0, housek1, and target1, all must have the same data length. They also must be the same length as N or A. If the Cq dataset has missing values, they must be filled with NA so that the length of the data vectors will not differ.

target0

In RED-ΔΔCq method, target0[i] signifies the measured Cq value of the ith bulk sample without the digestion, for which both alleles, wild-type (S: susceptible) and mutant (R: resistant to a pesticide), on the target locus are amplified. In general ΔΔCq analyses, target0 is the Cq values of the pure-S control sample, which is amplified with a R-allele-specific primer set.

housek1

The Cq values of the test sample measured on the housekeeping gene after the restriction enzyme digestion (in RED-ΔΔCq method), or the test sample amplified on the housekeeping gene (in general ΔΔCq analyses).

target1

For each test sample with unknown allele-ratio, target1[i] is defined as the Cq value for the target locus amplified after the restriction enzyme digestion (in RED-ΔΔCq method), or the target locus amplified with the R-allele-specific primer set (in general ΔΔCq analyses).

P

Scalar. Population allele frequency from which the test samples are derived. Default is NULL and to be estimated. If the parameter is known, it is given as a numeric between 0 and 1.

K

Scalar. The gamma shape parameter of the individual DNA yield. Default is NULL and to be estimated. If known, given as a positive numeric.

targetScale

(δ_{T}) Scalar. The relative template DNA amount of the target locus to the houskeeping locus. If known, given as a positive numeric.

sdMeasure

(σ_{c}) Scalar. The measurement error (standard deviation) on each Cq value following Normal(0, σ_{c}^2). If known, given as a positive numeric.

EPCR

(η) Scalar. Amplification efficiency per PCR cycle. If known, given as a positive numeric. When EPCR = 1, template DNA doubles every cycle (EPCR + 1 = 2).

XInit0

Optionally the initial value for the parameter optimization can be specified, but it is strongly recommended to keep the argument as is. Unlike XInit in knownqpcr(), the argument is not directly passed to the optimizer; used only when each parameter is set unknown (the parameter is absent or specified as NULL).

zeroAmount

(In RED-ΔΔCq method) residue rate of restriction enzyme digestion, or (in general ΔΔCq analyses) small portion of the off-target allele on the target locus of the test sample, which will be amplified in the PCR. It needs to be always specified by the user as a number between 0 and 1, usually near 0.

beta

Whether to use the beta distribution to approximate the sample allele ratio instead of specifying individual gamma distribution for each of the allelic DNA amounts? Default is TRUE, which accelerates the calculation.

diploid

Is the target organism diploidy? Default is FALSE, assuming haploidy. Current implementation of diploidy assumes i.i.d. between the amounts of R and S chromosomes owned by a heterozygote individual, which is unlikely in many animals but necessary for the calculation in a realistic time.

pvalue

The two-sided confidence interval is calculated at the last iteration at given significance level. Default is 0.05, which returns the 95% Wald's CI (2.5 to 97.5 percentile) based on the Hessian matrix.

gradtol, steptol, iterlim

Control parameters passed to nlm(). gradtol and steptol are the positive scalars giving the tolerance to terminate the algorithm and the minimum allowable relative step length. iterlim specifies the maximum number of iterations to be performed before the program is terminated (and evaluated at the last iteration). Usually 30 iterations are enough.

maxtime

A positive scalar to set the maximum calculation time in seconds to abort the optimizer (and return error). The total calculation time largely depends on N[i], the number of individuals contained in each bulk sample.

print.level

print.level=1 (the default) shows the initial values of the parameters and likelihood as well as the output in the last iteration. print.level=2 shows the parameter values and gradients in every step. print.level=0 does not output any intermediate state to R console, simply returning the result summary.

...

Additional arguments passed to the function.

Value

Object of the S4 class CqFreq. The slot report is a matrix and each row contains the estimated parameter value with 100*(1-pvalue)% confidence interval. The following parameters are returned:

  1. P, the population allele frequency from which the test samples are derived.

  2. K, the gamma shape parameter of the individual DNA yield.

  3. targetScale (δ_{T}), the relative template DNA amount of the target to the houskeeping loci.

  4. EPCR (η), the amplification efficiency per PCR cycle.

  5. sdMeasure or "Cq measurement error" (σ_{c}).

Choise of the parameters to be estimated

Estimation is conducted only for parameters where the values are not specified or specified explicitly as NULL. If one feeds a value e.g. K=1 or sdMeasure=0.24, it is then treated as fixed parameter. Notwithstanding, EPCR is estimated only when EPCR = NULL is specified explicitly.
You must verify the size of EPCR and zeroAmount beforehand because they are not estimable from the experiments with unknown allele ratios. Although targetScale and sdMeasure are estimable within freqpcr(), it is better to feed the known values, especially when you have only a few bulk samples (length(N) <= 3). Fixing targetScale and sdMeasure is also strongly recommended when housek0 and target0 are absent (ΔCq method). The functions knownqpcr() or knownqpcr_unpaired(), depending on your data format, provide the procedure to estimate the sizes of the experimental parameters using the DNA solutions of known allele mixing ratios.
For the unknown parameters, XInit0 optionally specifies initial values for the optimization using nlm() though the use of internal default is strongly recommended. The specification as a fixed parameter has higher priority than the specification in XInit0. Every user-specified parameter values, fixed or unknown, must be given in linear scale (e.g. between 0 and 1 for the allele frequency); they are transformed internally to log or logit.

See Also

Other estimation procedures: knownqpcr_unpaired(), knownqpcr(), sim_dummy()

Examples


# Dummy Cq dataset with six bulk samples, each of which comprises of eight haploids.
EPCR <- 0.95; zeroAmount <- 0.0016; # True values for the two must be known.
P <- 0.75
dmy_cq <- make_dummy(   rand.seed=1, P=P, K=2, ntrap=6, npertrap=8,
                        scaleDNA=1e-07, targetScale=1.5, baseChange=0.3,
                        EPCR=EPCR, zeroAmount=zeroAmount,
                        sdMeasure=0.3, diploid=FALSE   )
print(dmy_cq)

# Estimation with freqpcr, where P, K, targetScale, and baseChange are marked unknown.
result <- freqpcr( N=dmy_cq@N, housek0=dmy_cq@housek0, target0=dmy_cq@target0,
                   housek1=dmy_cq@housek1, target1=dmy_cq@target1,
                   EPCR=EPCR, zeroAmount=zeroAmount, beta=TRUE, print.level=0 )
print(result)

# Estimation with freqpcr, assuming diploidy.
result <- freqpcr( N=dmy_cq@N, housek0=dmy_cq@housek0, target0=dmy_cq@target0,
                   housek1=dmy_cq@housek1, target1=dmy_cq@target1,
                   EPCR=EPCR, zeroAmount=zeroAmount, beta=TRUE, diploid=TRUE )

# Estimation where you have knowledge on the size of K.
result <- freqpcr( N=dmy_cq@N, housek0=dmy_cq@housek0, target0=dmy_cq@target0,
                   housek1=dmy_cq@housek1, target1=dmy_cq@target1,
                   K=2,
                   EPCR=EPCR, zeroAmount=zeroAmount, beta=TRUE, print.level=2 )
# (>= v0.3.2)
# Provided the model is continuous (specify A instead of N).
result <- freqpcr( A=dmy_cq@N, housek0=dmy_cq@housek0, target0=dmy_cq@target0,
                   housek1=dmy_cq@housek1, target1=dmy_cq@target1,
                   K=2, EPCR=EPCR, zeroAmount=zeroAmount, beta=TRUE, print.level=1 )
# (>= v0.4.0)
# If the dataset lacks control samples (housek0 and target0 are absent).
# Fixing the sizes of targetScale and sdMeasure is strongly recommended.
result <- freqpcr( N=dmy_cq@N, housek1=dmy_cq@housek1, target1=dmy_cq@target1,
                   K=2, EPCR=EPCR, zeroAmount=zeroAmount,
                   targetScale=1.5, sdMeasure=0.3, beta=TRUE, print.level=1 )


freqpcr documentation built on March 18, 2022, 7:25 p.m.