run_stego: Similarity Test for Genetic Outliers
In dschlauch/WESTGO: Similarity Test for Estimation of Genetic Outliers


Runs the STEGO analysis. The only required argument is a genotype matrix, coded 0/1 for phased data or 0/1/2 for unphased data.
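As an illustration of the two encodings, the sketch below builds a small simulated matrix in each form. The layout is an assumption and is not stated on this page: variants are in rows (consistent with the `blocksize` description), and phased data holds the two haplotypes of each individual in adjacent columns. The object names are hypothetical.

```r
set.seed(1)
n_variants    <- 6
n_individuals <- 4

# Phased: one column per haplotype (two per individual), entries are 0 or 1.
# ASSUMPTION: variants in rows, haplotypes of one individual in adjacent columns.
phased_geno <- matrix(rbinom(n_variants * 2 * n_individuals, 1, 0.3),
                      nrow = n_variants)

# Unphased: one column per individual, entries are 0, 1 or 2
# (the sum of that individual's two haplotypes).
odd_cols  <- seq(1, ncol(phased_geno), by = 2)
even_cols <- seq(2, ncol(phased_geno), by = 2)
unphased_geno <- phased_geno[, odd_cols] + phased_geno[, even_cols]

dim(phased_geno)    # 6 variants x 8 haplotypes
dim(unphased_geno)  # 6 variants x 4 individuals
```

Under these assumptions, `phased_geno` would be passed with `phased = TRUE` and `unphased_geno` with `phased = FALSE`.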

run_stego(genotypes, phased = T, groups = "all.together",
  sampleNames = NULL, labels = NA, super = NA, minVariants = 5,
  blocksize = NA, simFun = NULL, saveDir = NA, verbose = F,
  cores = NULL)

`genotypes`	data object containing the phased or unphased genotypes by samples
`phased`	logical indicating whether the data are phased (TRUE) or unphased (FALSE)
`groups`	character specifying how samples are grouped for analysis: one of "all.together", "each.separately" or "pairwise.within.superpop". Default is "all.together" (run the analysis on all samples at once).
`sampleNames`	character vector with unique identifiers for each sample
`labels`	character covariates, such as population membership. This is unused if groups is "all.together".
`super`	character covariates, such as population membership. This is used only if groups is "pairwise.within.superpop".
`minVariants`	integer specifying the minimum number of occurrences of the minor allele for a variant to be included in the analysis. Default is 5; minimum allowed is 2.
`blocksize`	integer specifying the number of consecutive rows in the data matrix to be treated as an LD block. One variant is chosen from each block for the analysis. Default is NA (no LD pruning, equivalent to blocksize=1).
`simFun`	function for similarity comparison, such as cor or cov. Default is NULL.
`saveDir`	file in which to save the results. Default is NA (no saving).
`verbose`	logical indicating whether to output status updates during analysis run
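The `minVariants` and `blocksize` arguments both reduce the set of variants that enter the analysis. The following is a minimal sketch of what such filtering could look like, not the package's implementation. The helper names are hypothetical, the input is assumed to be a phased 0/1 matrix with variants in rows, and the first row of each block is kept because this page does not say how the variant is chosen within a block or in which order the two steps are applied.

```r
# Hypothetical helper: minor-allele occurrences per variant (row) of a 0/1 matrix.
minor_allele_count <- function(g) {
  alt <- rowSums(g)
  pmin(alt, ncol(g) - alt)
}

# Keep variants whose minor allele occurs at least `minVariants` times.
filter_min_variants <- function(g, minVariants = 5) {
  g[minor_allele_count(g) >= minVariants, , drop = FALSE]
}

# Treat every `blocksize` consecutive rows as one block and keep one row per block.
# ASSUMPTION: the first row of each block is kept; the package may choose differently.
prune_blocks <- function(g, blocksize = 1) {
  keep <- which((seq_len(nrow(g)) - 1) %% blocksize == 0)
  g[keep, , drop = FALSE]
}

set.seed(42)
g <- matrix(rbinom(20 * 10, 1, 0.3), nrow = 20)  # 20 variants x 10 haplotypes

filtered <- filter_min_variants(g, minVariants = 2)
pruned   <- prune_blocks(g, blocksize = 5)       # keeps rows 1, 6, 11 and 16
```

With `blocksize = 1` every row forms its own block, so `prune_blocks()` returns the matrix unchanged, which matches the stated equivalence between NA and blocksize=1.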

List with class "stego" containing

`summary`	Summary statistics, including p-values, FDR, and kinship coefficient estimates for all pairs of individuals
`s_matrix_dip`	A matrix of pairwise s statistics between all individuals
`s_matrix_hap`	For phased data only, a matrix of pairwise s statistics between all haplotypes
`var_s_dip`	numeric estimate of the variance of pairwise subject test statistics
`var_s_hap`	For phased data only, a numeric estimate of the variance of pairwise haploid test statistics
`simMat`	If simFun is used, a similarity matrix between all individuals
`analysisType`	character indicating what manner the subjects were grouped in the analysis
`pkweightsMean`	numeric value for the whole dataset as a function of the observed allele frequencies
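To show the kind of function `simFun` expects and the shape of the resulting `simMat`, the sketch below applies base R's `cor` directly to a simulated unphased matrix. It assumes variants in rows and individuals in columns; how run_stego calls `simFun` internally is not documented on this page, so this only shows what `cor` returns for such an input.

```r
set.seed(7)
# ASSUMPTION: 50 variants in rows, 5 individuals in columns, coded 0/1/2.
geno <- matrix(rbinom(50 * 5, 2, 0.3), nrow = 50)

# cor() correlates the columns of a matrix, so the result has one row and
# one column per individual.
sim <- cor(geno)

dim(sim)  # 5 x 5
```

The result is symmetric with ones on the diagonal. Passing `cov` instead would give a matrix of the same shape on the covariance scale.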

Dan Schlauch dschlauch@fas.harvard.edu

data(toyGenotypes)
sampleNames <- paste("Sample",1:100)

res <- run_stego(toyGenotypes, sampleNames=sampleNames)
plot(res, plotname="All Samples")

labels <- paste("Group",c(LETTERS[rep(1:5,20)]))
res <- run_stego(toyGenotypes, groups="each.separately", labels=labels)
plot(res)

labels <- paste("Group",c(LETTERS[rep(1:5,10)],LETTERS[rep(6:10,10)]))
super <- c(rep("Super A",50), rep("Super B",50))
res <- run_stego(toyGenotypes, groups="pairwise.within.superpop", labels=labels, super=super)
plotFromGSM(res)