runLogRegTest: Logistic Regression Test
In cbirdlab/impostar: ImPoStAR: Implement Population Structure Analyses in R

This function performs a binomial logistic regression test for a genetic cline among three populations, implementing the appropriate resampling strategies to model the total sampling error associated with either Sanger sequencing (population sampling error) or next-generation sequencing (population + sequencer sampling error).
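As a rough illustration of the idea, the sketch below fits a binomial logistic regression to one SNP across three pools using base R's glm, then builds a null distribution of slopes by resampling alleles (population sampling error) and then reads (sequencer sampling error). It is not the package's implementation. The input values, the helper fit_slope, the shared-frequency null model, and the two-sided p-value are all assumptions made for illustration.

```r
set.seed(1)

# Hypothetical single-SNP input for three pools ordered along the cline
tot_alleles <- c(100, 100, 100)  # 2n per pool
depth       <- c(50, 50, 50)     # read depth per pool
alt_reads   <- c(10, 25, 40)     # alternate reads per pool

# Slope (change in log odds per pool step) of a binomial logistic regression.
# fit_slope is a hypothetical helper, not a function from the package.
fit_slope <- function(alt, total, x = seq_along(alt)) {
  fit <- glm(cbind(alt, total - alt) ~ x, family = binomial)
  unname(coef(fit)["x"])
}

obs_slope <- fit_slope(alt_reads, depth)

# Assumed null model: all pools share one alternate-allele frequency
p_null <- sum(alt_reads) / sum(depth)
n_bs   <- 200  # number of resampling iterations for this sketch

null_slopes <- replicate(n_bs, {
  # Population sampling error: draw alleles for each pool
  alleles <- rbinom(length(tot_alleles), tot_alleles, p_null)
  # Sequencer sampling error: draw reads from each pool's sampled frequency
  reads <- rbinom(length(depth), depth, alleles / tot_alleles)
  fit_slope(reads, depth)
})

# Two-sided p-value: share of null slopes at least as extreme as the observed one
p_value <- mean(abs(null_slopes) >= abs(obs_slope))
```

For Sanger-type data, the same sketch would skip the read-resampling step and fit the regression to the resampled allele counts directly.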

runLogRegTest(dataFile, OutputBase=NULL, mainDirectory=getwd(), NumBS=1000, NGSdata=T, nCores=NULL)

`dataFile`	A data.frame, or a character string giving the path to a csv file. Required. Each row is a SNP with a unique snpID, and the columns give, for each pool, the total number of alleles (2n, TotAlleles), read depth (DP), reference alleles (RefAl), alternate alleles (AltAl), reference reads (RD), and alternate reads (AD). The number in each column name identifies the pool. Use simulate_data to generate an example and view the required format.
`OutputBase`	A character string indicating the base of the output filename, to which the number of bootstraps and an indication of whether sequencer sampling error was modeled (ResampReads) are appended. If NULL (default), no output file is written.
`mainDirectory`	A character string indicating the working directory to use for input and optional output files. Default is getwd().
`NumBS`	An integer >=2 indicating the number of bootstrap iterations to perform. Default is 1000.
`NGSdata`	A logical indicating whether to apply the resampling strategy for next-generation sequencing data (TRUE, default) or for Sanger sequencing data (FALSE).
`nCores`	An integer indicating the number of cores to use in parallel processing, or NULL (default) to use the maximum number of available cores minus 1. Note that parallel processing is not supported on Windows.

The runLogRegTest function returns a data.frame and, if OutputBase is supplied, also writes a csv file to the working directory. Each row is a SNP, where ObsSlope is the observed slope of the logistic regression line (the rate of change in the log odds across populations) and SlopeP is the p-value used to evaluate its significance.
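A minimal sketch of how the returned table might be screened for candidate clinal SNPs is shown below. The data.frame res is a made-up stand-in rather than real output. ObsSlope and SlopeP are the columns named above, while the presence of a snpID column in the output, the 0.05 threshold, and the Benjamini-Hochberg adjustment are assumptions for illustration.

```r
# Hypothetical stand-in for the data.frame returned by runLogRegTest
res <- data.frame(
  snpID    = c("snp1", "snp2", "snp3", "snp4"),
  ObsSlope = c(0.02, 1.10, -0.85, 0.30),
  SlopeP   = c(0.81, 0.001, 0.004, 0.20),
  stringsAsFactors = FALSE
)

# Optional: adjust for testing many SNPs (Benjamini-Hochberg, base R)
res$SlopeP_adj <- p.adjust(res$SlopeP, method = "BH")

# Keep SNPs whose adjusted p-value falls below the assumed threshold,
# ordered from steepest to shallowest observed slope
alpha <- 0.05
sig <- res[res$SlopeP_adj < alpha, ]
sig <- sig[order(-abs(sig$ObsSlope)), ]
print(sig)
```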

Rebecca M. Hamner, Jason D. Selwyn, Evan Krell, Scott A. King, Christopher E. Bird

Hamner, R.M., J.D. Selwyn, E. Krell, S.A. King, and C.E. Bird. In review. Modeling next-generation sequencer sampling error in pooled population samples dramatically reduces false positives in genetic structure tests.

simulate_data, runAMOVA

# simulate data file
dataFile <- simulate_data(rep(50, 3), rep(100, 3), rep(0.5, 3), 5, file_name=TRUE)
# run logistic regression test (NumBS is kept small here so the example runs quickly)
LogRegResults <- runLogRegTest(dataFile, OutputBase=NULL, mainDirectory=getwd(), NumBS=10, NGSdata=TRUE, nCores=NULL)