R/impute2CoxSurv.R

Defines functions impute2CoxSurv

Documented in impute2CoxSurv

#' Fit Cox survival to all variants from a standard IMPUTE2 output after
#' genotype imputation
#'
#' Performs survival analysis using Cox proportional hazard models on imputed
#' genetic data from IMPUTE2 output
#'
#' @param impute.file character of IMPUTE2 file
#' @param sample.file character of sample file affiliated with IMPUTE2 file
#' @param chr numeric denoting chromosome number
#' @param covariate.file data.frame comprising phenotype information, all 
#'  covariates to be added in the model must be numeric.
#' @param id.column character giving the name of the ID column
#'  in covariate.file.
#' @param sample.ids character vector of sample IDs to keep in
#'  survival analysis
#' @param time.to.event character of column name in covariate.file that
#'  represents the time interval of interest in the analysis
#' @param event character of column name in covariate.file that represents
#'  the event of interest to be included in the analysis
#' @param covariates character vector with exact names of columns in
#'  covariate.file to include in analysis
#' @param inter.term character string giving the column name of the covariate
#'  that will be added to the interaction term with
#'  SNP (e.g. \code{term*SNP}). See details.
#' @param print.covs character string of either \code{"only"}, \code{"all"}
#'  or \code{"some"}, defining which covariate statistics should be printed to
#'  the output. See details.
#' @param out.file character of output file name (do not include extension) 
#' @param chunk.size integer number of variants to process per thread
#' @param exclude.snps a character vector listing the rsIDs of SNPs that will be excluded from analyses
#' @param maf.filter numeric to filter minor allele frequency
#'  (i.e. choosing 0.05 means filtering MAF>0.05). User can set this to 
#' \code{NULL} if no filtering is preffered. Default is 0.05.
#' @param flip.dosage logical to flip which allele the dosage was
#'  calculated on, default \code{flip.dosage=TRUE}
#' @param verbose logical for messages that describe which part of the
#'  analysis is currently being run
#' @param clusterObj A cluster object that can be used with the
#'  \code{parApply} function. See details.
#' @param keepGDS logical to keep GDS files (compressed IMPUTE2 files) 
#' after the analysis. Defaults to \code{FALSE}.
#' 
#' @details 
#' Testing for SNP-covariate interactions:          
#' User can define the column name of the covariate that will be included in
#'  the interaction term. 
#' For example, for given covariates \code{a} and \code{b}, where \code{c} is
#'  defined as the \code{inter.term} the model will be:
#'  \code{~ a + b + c + SNP + c*SNP}.
#' 
#' Printing results of other covariates:       
#' \code{print.covs} argument controls the number of covariates will be printed
#'  as output. The function is set to \code{only}
#'  by default and will only print the SNP or if an interaction term is given, 
#'  the results of the interaction 
#'  term (e.g. \code{SNP*covariate}). Whereas,  \code{all} will print results
#'  (coef, se.coef, p.value etc) of all covariates included in the model.
#'  \code{some} is only applicable if an interaction term is given and will 
#'  print the results for SNP, covariate tested for interaction and the
#'  interaction term. User should be mindful about using the \code{all} option,
#'  as it will likely slow down the analysis and will increase
#'  the output file size. 
#'
#' User defined parallelization:  
#' This function uses \code{parApply} from \code{parallel} package to fit 
#'  models to SNPs in parallel. 
#' User is not required to set any options for the parallelization. 
#' However, advanced users who wish to optimize it, can provide a 
#'  cluster object generated by \code{makeCluster} family of functions that 
#'  suits their need and platform.
#'
#' @return
#' Saves two text files directly to disk:  
#' \code{.coxph} extension containing CoxPH survival analysis results.  
#' \code{.snps_removed} extension containing SNPs that were removed
#'  due to low variance or user-defined thresholds.   
#' 
#' @examples
#' impute.file <- system.file(package="gwasurvivr",
#'                            "extdata",
#'                            "impute_example.impute2.gz")
#' sample.file <- system.file(package="gwasurvivr",
#'                            "extdata", 
#'                            "impute_example.impute2_sample")
#' covariate.file <- system.file(package="gwasurvivr", 
#'                               "extdata",
#'                               "simulated_pheno.txt")
#' covariate.file <- read.table(covariate.file,
#'                              sep=" ",
#'                              header=TRUE,
#'                              stringsAsFactors = FALSE)
#' covariate.file$SexFemale <- ifelse(covariate.file$sex=="female", 1L, 0L)
#' sample.ids <- covariate.file[covariate.file$group=="experimental",]$ID_2
#' impute2CoxSurv(impute.file=impute.file,
#'               sample.file=sample.file,
#'               chr=14,
#'               covariate.file=covariate.file,
#'               id.column="ID_2",
#'               sample.ids=sample.ids,
#'               time.to.event="time",
#'               event="event",
#'               covariates=c("age", "SexFemale", "DrugTxYes"),
#'               inter.term=NULL,
#'               print.covs="only",
#'               out.file="impute_example",
#'               chunk.size=50,
#'               maf.filter=0.005,
#'               exclude.snps=NULL,
#'               flip.dosage=TRUE,
#'               verbose=TRUE,
#'               clusterObj=NULL,
#'               keepGDS=FALSE)  
#'  
#' @importFrom survival Surv coxph.fit
#' @importFrom matrixStats rowMeans2 rowVars rowSds
#' @importFrom SummarizedExperiment rowRanges
#' @importFrom utils write.table
#' @importFrom stats pnorm rnorm setNames complete.cases
#' @import parallel
#' @import GWASTools
#'  
#' @export

impute2CoxSurv <- function(impute.file,
                           sample.file,
                           chr,
                           covariate.file,
                           id.column,
                           sample.ids=NULL,
                           time.to.event, 
                           event,
                           covariates,
                           inter.term=NULL,
                           print.covs="only",
                           out.file,
                           chunk.size=10000,
                           maf.filter=0.05,
                           exclude.snps=NULL,
                           flip.dosage=TRUE,
                           verbose=TRUE,
                           clusterObj=NULL,
                           keepGDS=FALSE
                           )
{
    
    coxSurv(createImpute2CoxSurv(impute.file,
                                 sample.file,
                                 chr,
                                 covariate.file,
                                 id.column,
                                 sample.ids,
                                 time.to.event, 
                                 event,
                                 covariates,
                                 inter.term,
                                 print.covs,
                                 out.file,
                                 chunk.size,
                                 maf.filter,
                                 exclude.snps,
                                 flip.dosage,
                                 verbose,
                                 clusterObj,
                                 keepGDS))
    
}
suchestoncampbelllab/gwasurvivr documentation built on March 18, 2021, 5:50 p.m.