runLogRegTest: Logistic Regression Test

Description

This function performs a binomial logistic regression test for a genetic cline among three populations, implementing the appropriate resampling strategies to model the total sampling error associated with either Sanger sequencing (population sampling error) or next-generation sequencing (population + sequencer sampling error).
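The idea of testing a cline slope against resampled data can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the allele counts, the helper `fit_slope`, the pooled-frequency null model, and the two-sided p-value are all hypothetical choices, and only population sampling error (the Sanger case) is resampled here.

```r
# Hypothetical data for one SNP in three populations ordered along a transect.
set.seed(1)
pop_index   <- 1:3               # assumed position of each population along the cline
tot_alleles <- c(100, 100, 100)  # total alleles (2n) sampled per population
alt_alleles <- c(20, 45, 70)     # observed alternate allele counts

# Hypothetical helper: slope (change in log odds per population step)
# of a binomial logistic regression of allele counts on population index.
fit_slope <- function(alt, tot, x) {
  fit <- glm(cbind(alt, tot - alt) ~ x, family = binomial)
  unname(coef(fit)["x"])
}

obs_slope <- fit_slope(alt_alleles, tot_alleles, pop_index)

# Assumed null model: no cline, so every population shares the pooled frequency.
p_null <- sum(alt_alleles) / sum(tot_alleles)
num_bs <- 200

# Resample allele counts under the null and refit the regression each time.
null_slopes <- replicate(num_bs, {
  alt_bs <- rbinom(length(tot_alleles), size = tot_alleles, prob = p_null)
  fit_slope(alt_bs, tot_alleles, pop_index)
})

# Two-sided proportion of null slopes at least as extreme as the observed slope.
slope_p <- mean(abs(null_slopes) >= abs(obs_slope))
```

For next-generation sequencing data, a second binomial draw of reads given the resampled allele frequencies would presumably be layered on top of the population draw; that step is not shown here.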

Usage

runLogRegTest(dataFile, OutputBase=NULL, mainDirectory=getwd(), NumBS=1000, NGSdata=T, nCores=NULL)

Arguments

dataFile

A data.frame or a character string indicating the path to the csv file. Required. Each row is a SNP with a unique snpID, and the columns give, for each pool, the total number of alleles (2n, TotAlleles), read depth (DP), reference alleles (RefAl), alternate alleles (AltAl), reference reads (RD), and alternate reads (AD). The pool is indicated by the number in each column name. Use simulate_data to generate an example and view the required format.

OutputBase

A character string indicating the base of the output filename, to which the number of bootstraps and an indicator of whether sequencer sampling error was modeled (ResampReads) will be appended. If NULL (default), no output file is written.

mainDirectory

A character string indicating the working directory to use for input and optional output files. Default is getwd().

NumBS

An integer >= 2 indicating the number of bootstrap iterations to perform. Default is 1000.

NGSdata

A logical indicating whether the resampling strategy for next-generation sequencing data (TRUE, Default) or Sanger sequencing data (FALSE) should be implemented.

nCores

An integer indicating the number of cores to use in parallel processing, or NULL (default) to use the maximum number of available cores minus 1. Note that parallel processing is not supported on Windows.
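The default described above (available cores minus 1) could be resolved along these lines. This is a minimal sketch, assuming the base `parallel` package is used; the helper name `resolve_cores` and the fallback to a single core on Windows are illustrative assumptions, not confirmed package behavior.

```r
library(parallel)

# Hypothetical helper: turn an nCores argument into a usable core count.
resolve_cores <- function(nCores = NULL) {
  if (is.null(nCores)) {
    available <- detectCores()               # may return NA on some systems
    if (is.na(available)) available <- 1L
    nCores <- max(1L, available - 1L)        # leave one core free
  }
  # Forking via mclapply is unavailable on Windows, so fall back to one core.
  if (.Platform$OS.type == "windows") nCores <- 1L
  as.integer(nCores)
}

n_cores <- resolve_cores()

# Toy workload standing in for per-SNP bootstrap work.
squares <- mclapply(1:4, function(i) i^2, mc.cores = n_cores)
```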

Value

The runLogRegTest function returns a data.frame and, if OutputBase is supplied, also writes a csv file to the working directory. Each row is a SNP, where ObsSlope is the observed slope of the logistic regression line (the rate of change in the log odds) and SlopeP is used to evaluate its significance.
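One way to work with the returned table is to screen SNPs on SlopeP. The sketch below uses a made-up results table rather than real output; the presence of a snpID column in the output, the Benjamini-Hochberg adjustment, and the 0.05 threshold are assumptions for illustration only.

```r
# Hypothetical results table mimicking the documented columns.
results <- data.frame(
  snpID    = c("snp1", "snp2", "snp3"),
  ObsSlope = c(0.9, -0.1, 1.4),
  SlopeP   = c(0.004, 0.62, 0.03)
)

# Adjust for multiple testing across SNPs (an assumed, optional step).
results$SlopeP_adj <- p.adjust(results$SlopeP, method = "BH")

# Keep SNPs whose adjusted p-value falls below the chosen threshold.
sig_snps <- results[results$SlopeP_adj < 0.05, ]
```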

Author(s)

Rebecca M. Hamner, Jason D. Selwyn, Evan Krell, Scott A. King, Christopher E. Bird

References

Hamner, R.M., J.D. Selwyn, E. Krell, S.A. King, and C.E. Bird. In review. Modeling next-generation sequencer sampling error in pooled population samples dramatically reduces false positives in genetic structure tests.

See Also

simulate_data, runAMOVA

Examples

# simulate data file
dataFile <- simulate_data(rep(50, 3), rep(100, 3), rep(0.5, 3), 5, file_name=T)
# run logistic regression test
LogRegResults <- runLogRegTest(dataFile, OutputBase=NULL, mainDirectory=getwd(), NumBS=10, NGSdata=T, nCores=NULL)

cbirdlab/impostar documentation built on June 1, 2019, 7:08 p.m.