R/gdsCoxSurv.R

Defines functions gdsCoxSurv

Documented in gdsCoxSurv

#' Fit Cox survival to all variants from a standard IMPUTE2 output after
#' genotype imputation
#'
#' Performs survival analysis using Cox proportional hazard models on imputed
#' genetic data from GDS files.
#'
#' @param gdsfile path to .gds file. Location of the .gds file should also contain
#' .snp.rdata and .scan.rdata files.
#' @param covariate.file data.frame comprising phenotype information, all 
#'  covariates to be added in the model must be numeric.
#' @param id.column character giving the name of the ID column
#'  in covariate.file.
#' @param sample.ids character vector of sample IDs to keep in
#'  survival analysis
#' @param time.to.event character of column name in covariate.file that
#'  represents the time interval of interest in the analysis
#' @param event character of column name in covariate.file that represents
#'  the event of interest to be included in the analysis
#' @param covariates character vector with exact names of columns in
#'  covariate.file to include in analysis
#' @param inter.term character string giving the column name of the covariate
#'  that will be added to the interaction term with
#'  SNP (e.g. \code{term*SNP}). See details.
#' @param print.covs character string of either \code{"only"}, \code{"all"}
#'  or \code{"some"}, defining which covariate statistics should be printed to
#'  the output. See details.
#' @param out.file character of output file name (do not include extension) 
#' @param chunk.size integer number of variants to process per thread
#' @param maf.filter numeric to filter minor allele frequency
#'  (i.e. choosing 0.05 means filtering MAF>0.05). User can set this to 
#' \code{NULL} if no filtering is preffered. Default is 0.05.
#' @param flip.dosage logical to flip which allele the dosage was
#'  calculated on, default \code{flip.dosage=TRUE}
#' @param verbose logical for messages that describe which part of the
#'  analysis is currently being run
#' @param clusterObj A cluster object that can be used with the
#'  \code{parApply} function. See details.
#' 
#' @details 
#' Testing for SNP-covariate interactions:          
#' User can define the column name of the covariate that will be included in
#'  the interaction term. 
#' For example, for given covariates \code{a} and \code{b}, where \code{c} is
#'  defined as the \code{inter.term} the model will be:
#'  \code{~ a + b + c + SNP + c*SNP}.
#' 
#' Printing results of other covariates:       
#' \code{print.covs} argument controls the number of covariates will be printed
#'  as output. The function is set to \code{only}
#'  by default and will only print the SNP or if an interaction term is given, 
#'  the results of the interaction 
#'  term (e.g. \code{SNP*covariate}). Whereas,  \code{all} will print results
#'  (coef, se.coef, p.value etc) of all covariates included in the model.
#'  \code{some} is only applicable if an interaction term is given and will 
#'  print the results for SNP, covariate tested for interaction and the
#'  interaction term. User should be mindful about using the \code{all} option,
#'  as it will likely slow down the analysis and will increase
#'  the output file size. 
#'
#' User defined parallelization:  
#' This function uses \code{parApply} from \code{parallel} package to fit 
#'  models to SNPs in parallel. 
#' User is not required to set any options for the parallelization. 
#' However, advanced users who wish to optimize it, can provide a 
#'  cluster object generated by \code{makeCluster} family of functions that 
#'  suits their need and platform.
#'
#' @return
#' Saves two text files directly to disk:  
#' \code{.coxph} extension containing CoxPH survival analysis results.  
#' \code{.snps_removed} extension containing SNPs that were removed
#'  due to low variance or user-defined thresholds.   
#' 
#' @examples
#' gdsfile <- system.file(package="gwasurvivr",
#'                        "extdata",
#'                        "gds_example.gds")
#' covariate.file <- system.file(package="gwasurvivr", 
#'                               "extdata",
#'                               "simulated_pheno.txt")
#' covariate.file <- read.table(covariate.file,
#'                              sep=" ",
#'                              header=TRUE,
#'                              stringsAsFactors = FALSE)
#' covariate.file$SexFemale <- ifelse(covariate.file$sex=="female", 1L, 0L)
#' sample.ids <- covariate.file[covariate.file$group=="experimental",]$ID_2
#' gdsCoxSurv(gdsfile=gdsfile,
#'            covariate.file=covariate.file,
#'            id.column="ID_2",
#'            sample.ids=sample.ids,
#'            time.to.event="time",
#'            event="event",
#'            covariates=c("age", "SexFemale", "DrugTxYes"),
#'            inter.term=NULL,
#'            print.covs="only",
#'            out.file="impute_example",
#'            chunk.size=50,
#'            maf.filter=0.005,
#'            flip.dosage=TRUE,
#'            verbose=TRUE,
#'            clusterObj=NULL)  
#'  
#' @importFrom survival Surv coxph.fit
#' @importFrom matrixStats rowMeans2 rowVars rowSds
#' @importFrom SummarizedExperiment rowRanges
#' @importFrom utils write.table
#' @importFrom stats pnorm rnorm setNames complete.cases
#' @import parallel
#' @import GWASTools
#'  
#' @export

gdsCoxSurv <- function(gdsfile,
                       covariate.file,
                       id.column,
                       sample.ids = NULL,
                       time.to.event,
                       event,
                       covariates,
                       inter.term = NULL,
                       print.covs = "only",
                       out.file,
                       chunk.size = 5000,
                       maf.filter = 0.05,
                       exclude.snps=NULL,
                       flip.dosage = TRUE,
                       verbose = TRUE,
                       clusterObj = NULL)
{
    
    coxSurv(createGdsCoxSurv(gdsfile,
                                 covariate.file,
                                 id.column,
                                 sample.ids,
                                 time.to.event,
                                 event,
                                 covariates,
                                 inter.term,
                                 print.covs,
                                 out.file,
                                 chunk.size,
                                 maf.filter,
                                 exclude.snps,
                                 flip.dosage,
                                 verbose,
                                 clusterObj))

}
suchestoncampbelllab/survivR documentation built on March 10, 2021, 1:21 p.m.