R/gwas3.R

#' gwas3: A Package for Multi-stage GWAS Analysis.
#'
#' The gwas3 package contains functions for a three-stage filtering
#' GWAS method. It can also be extended to variable selection for
#' other types of datasets.
#' The gwas3 package provides four categories of functions:
#' a GWAS data simulator, simple initial filters, an elastic net for
#' second-stage filtering, and
#' random forests and trees for final SNP selection.
#'
#' @section Gwas3 functions:
#' The gwas3 functions are \code{gwas_sim}, \code{snp_filter},
#' \code{filter_subset},
#' \code{elasticnet}, \code{snp_rf}, \code{snp_tree}, and \code{tree_plot}.
#' The functions are intended to be used
#' successively to perform a comprehensive GWAS, but they can also be
#' used as standalone functions for variable selection.
#'
#' @section GWAS data simulator:
#' \code{gwas_sim} uses a set of real genotypes to generate simulated
#' GWAS data. The user specifies the number of SNPs that will influence
#' the phenotype and the heritability (R-squared), and the simulator
#' returns a phenotype that depends only on those SNPs. The
#' simulator randomly selects the SNPs that will contribute to the
#' phenotype. The function returns the phenotype, SNP locations,
#' effect size, and the estimated heritability between the genotype
#' and the simulated phenotype. Note that the genotype is not changed,
#' so all of its physical characteristics, such as linkage
#' disequilibrium, are preserved.
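#'
#' As a minimal sketch of this idea (not the \code{gwas_sim}
#' implementation; the random genotype matrix, the variable names, and
#' the normally distributed effect sizes are illustrative assumptions),
#' a phenotype with a target heritability could be simulated as follows:
#' \preformatted{
#' set.seed(1)
#' n <- 200   # individuals
#' p <- 50    # SNPs
#' # Hypothetical stand-in for a real genotype matrix, coded 0/1/2
#' geno <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)
#'
#' n_causal <- 5   # number of SNPs that influence the phenotype
#' h2 <- 0.4       # target heritability (R-squared)
#'
#' causal <- sort(sample(p, n_causal))   # randomly chosen causal SNPs
#' beta <- rnorm(n_causal)               # assumed effect sizes
#'
#' # Genetic value: genotype times effect, summed over the causal SNPs
#' g <- rowSums(sweep(geno[, causal, drop = FALSE], 2, beta, "*"))
#'
#' # Noise variance chosen so that var(g) / var(y) is close to h2
#' e <- rnorm(n, sd = sqrt(var(g) * (1 - h2) / h2))
#' y <- g + e
#'
#' # Estimated heritability; geno itself is left unchanged
#' est_h2 <- cor(g, y)^2
#' }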
#'
#' @section Primary filter functions:
#' \code{snp_filter} uses distance correlation, logistic regression,
#' or linear regression for initial filtering. It returns a
#' vector of p-values or distance correlations.
#'
#' \code{filter_subset} uses the genotypes (x-variables) and the
#' vector returned by \code{snp_filter} and returns a trimmed
#' dataset that includes only the SNPs (or x-variables) that
#' pass the user-specified threshold.
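#'
#' A rough sketch of this two-step filter, using per-SNP linear
#' regression in base R rather than the package functions (the data,
#' the names, and the 0.05 threshold are assumptions for illustration):
#' \preformatted{
#' set.seed(2)
#' n <- 100
#' p <- 20
#' geno <- matrix(rbinom(n * p, size = 2, prob = 0.4), nrow = n,
#'                dimnames = list(NULL, paste0("snp", 1:p)))
#' y <- 2 * geno[, "snp3"] + rnorm(n)
#'
#' # Initial filter statistic: slope p-value from regressing y on each SNP
#' pvals <- apply(geno, 2, function(snp) {
#'   summary(lm(y ~ snp))$coefficients[2, 4]
#' })
#'
#' # Keep only the SNPs that pass the threshold
#' trimmed <- geno[, pvals < 0.05, drop = FALSE]
#' }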
#'
#' @section Second stage filter function:
#' \code{elasticnet} uses the genotypes from a full or trimmed dataset
#' and the phenotype (y-variable) and computes a series of elastic net
#' models with different levels of alpha. The function returns a list
#' of datasets, each containing the SNPs that have
#' non-zero coefficients for a given level of alpha in
#' an elastic net model. The alpha value is also returned
#' in a list.
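#'
#' The general technique can be sketched with the \code{glmnet} package.
#' This is an assumption-laden illustration, not the \code{elasticnet}
#' source: the alpha grid, the use of \code{cv.glmnet}, and the choice
#' of \code{lambda.min} are all hypothetical choices made here.
#' \preformatted{
#' library(glmnet)
#' set.seed(3)
#' n <- 100
#' p <- 30
#' x <- matrix(rbinom(n * p, size = 2, prob = 0.4), nrow = n,
#'             dimnames = list(NULL, paste0("snp", 1:p)))
#' storage.mode(x) <- "double"
#' y <- x[, "snp1"] - x[, "snp2"] + rnorm(n)
#'
#' alphas <- c(0.25, 0.5, 0.75)
#' fits <- lapply(alphas, function(a) {
#'   cv <- cv.glmnet(x, y, alpha = a)
#'   # Coefficients at the cross-validated lambda, intercept dropped
#'   b <- as.matrix(coef(cv, s = "lambda.min"))[-1, 1]
#'   # Dataset restricted to SNPs with non-zero coefficients
#'   list(alpha = a, data = x[, b != 0, drop = FALSE])
#' })
#' }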
#'
#' @section Final stage SNP selection functions:
#' \code{snp_rf} uses a list of datasets returned by the
#' \code{elasticnet} function and computes a random forest
#' model for each of the datasets. A list is returned
#' that has three elements for each dataset: a
#' numeric vector with the importance measure for each SNP,
#' a variable importance plot with at least 15 of the
#' most important variables, and the alpha that
#' \code{elasticnet} used to generate the dataset.
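#'
#' A sketch of the same pattern with the \code{randomForest} package
#' (the \code{datasets} list, its element names, and the permutation
#' importance measure are assumptions; the plot is drawn here rather
#' than returned):
#' \preformatted{
#' library(randomForest)
#' set.seed(5)
#' n <- 100
#' p <- 20
#' x <- matrix(rbinom(n * p, size = 2, prob = 0.4), nrow = n,
#'             dimnames = list(NULL, paste0("snp", 1:p)))
#' y <- x[, "snp1"] + rnorm(n)
#'
#' # Hypothetical stand-in for a list of datasets with their alphas
#' datasets <- list(list(alpha = 0.5, data = x))
#'
#' results <- lapply(datasets, function(d) {
#'   rf <- randomForest(x = d$data, y = y, importance = TRUE)
#'   imp <- importance(rf, type = 1)[, 1]   # one value per SNP
#'   varImpPlot(rf, n.var = min(15, ncol(d$data)))
#'   list(importance = imp, alpha = d$alpha)
#' })
#' }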
#'
#' \code{snp_tree} uses a list of datasets returned by the
#' \code{elasticnet} function and computes a classification or
#' regression tree for each of the datasets. The trees are
#' pruned using the 1-se rule. A list is returned
#' that has three elements for each dataset: a
#' pruned tree that is an rpart object, a character
#' vector that lists all of the SNPs where the tree
#' splits, and the alpha that \code{elasticnet} used
#' to generate the dataset.
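#'
#' The 1-se pruning rule can be sketched with \code{rpart} directly.
#' This is an illustration under assumed data and names, not the
#' \code{snp_tree} source: the simplest tree whose cross-validated
#' error is within one standard error of the minimum is kept.
#' \preformatted{
#' library(rpart)
#' set.seed(4)
#' n <- 300
#' d <- data.frame(snp1 = rbinom(n, 2, 0.4),
#'                 snp2 = rbinom(n, 2, 0.4),
#'                 snp3 = rbinom(n, 2, 0.4))
#' d$y <- 2 * d$snp1 + rnorm(n)
#'
#' # Grow a deliberately large regression tree
#' fit <- rpart(y ~ ., data = d, method = "anova", cp = 0.001)
#'
#' # 1-se rule on the cross-validation table
#' cp_tab <- fit$cptable
#' best <- which.min(cp_tab[, "xerror"])
#' cutoff <- cp_tab[best, "xerror"] + cp_tab[best, "xstd"]
#' cp_1se <- cp_tab[which(cp_tab[, "xerror"] <= cutoff)[1], "CP"]
#' pruned <- prune(fit, cp = cp_1se)
#'
#' # SNPs on which the pruned tree splits
#' split_snps <- setdiff(unique(as.character(pruned$frame$var)), "<leaf>")
#' }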
#'
#' \code{tree_plot} takes a list generated by the \code{snp_tree}
#' function and plots the trees using the \code{rpart.plot} package.
#'
#' @docType package
#' @name gwas3
NULL
jillbo1000/gwas3 documentation built on June 14, 2019, 3:08 a.m.