R/plinkCoxSurv.R

Defines functions plinkCoxSurv

Documented in plinkCoxSurv

#' Fit Cox survival to all variants from PLINK binary files (.BED, .BIM, .FAM)
#'
#' Performs survival analysis using Cox proportional hazard models on directly 
#' typed data in PLINK format
#'
#' @param bed.file character of name of plink files without extension 
#' @param covariate.file data.frame comprising phenotype information, all 
#'  covariates to be added in the model must be numeric.
#' @param id.column character giving the name of the ID column
#'  in covariate.file.
#' @param sample.ids character vector of sample IDs to keep in
#'  survival analysis
#' @param time.to.event character of column name in covariate.file that
#'  represents the time interval of interest in the analysis
#' @param event character of column name in covariate.file that represents
#'  the event of interest to be included in the analysis
#' @param covariates character vector with exact names of columns in
#'  covariate.file to include in analysis
#' @param inter.term character string giving the column name of the covariate
#'  that will be added to the interaction term with
#'  SNP (e.g. \code{term*SNP}). See details.
#' @param print.covs character string of either \code{"only"}, \code{"all"}
#'  or \code{"some"}, defining which covariate statistics should be printed to
#'  the output. See details.
#' @param out.file character of output file name (do not include extension) 
#' @param chunk.size integer number of variants to process per thread
#' @param maf.filter numeric to filter minor allele frequency
#'  (i.e. choosing 0.05 means filtering MAF>0.05). User can set this to 
#' \code{NULL} if no filtering is preffered. Default is 0.05.
#' @param flip.dosage logical to flip which allele the dosage was
#'  calculated on, default \code{flip.dosage=TRUE}
#' @param verbose logical for messages that describe which part of the
#'  analysis is currently being run
#' @param clusterObj A cluster object that can be used with the
#'  \code{parApply} function. See details.
#' 
#' @details 
#' Testing for SNP-covariate interactions:          
#' User can define the column name of the covariate that will be included in
#'  the interaction term. 
#' For example, for given covariates \code{a} and \code{b}, where \code{c} is
#'  defined as the \code{inter.term} the model will be:
#'  \code{~ a + b + c + SNP + c*SNP}.
#' 
#' Printing results of other covariates:       
#' \code{print.covs} argument controls the number of covariates will be printed
#'  as output. The function is set to \code{only}
#'  by default and will only print the SNP or if an interaction term is given, 
#'  the results of the interaction 
#'  term (e.g. \code{SNP*covariate}). Whereas,  \code{all} will print results
#'  (coef, se.coef, p.value etc) of all covariates included in the model.
#'  \code{some} is only applicable if an interaction term is given and will 
#'  print the results for SNP, covariate tested for interaction and the
#'  interaction term. User should be mindful about using the \code{all} option,
#'  as it will likely slow down the analysis and will increase
#'  the output file size. 
#'
#' User defined parallelization:  
#' This function uses \code{parApply} from \code{parallel} package to fit 
#'  models to SNPs in parallel. 
#' User is not required to set any options for the parallelization. 
#' However, advanced users who wish to optimize it, can provide a 
#'  cluster object generated by \code{makeCluster} family of functions that 
#'  suits their need and platform.
#'
#' @return
#' Saves two text files directly to disk:  
#' \code{.coxph} extension containing CoxPH survival analysis results.  
#' \code{.snps_removed} extension containing SNPs that were removed
#'  due to low variance or user-defined thresholds.   
#' 
#' @examples
#' bed.file <- system.file(package="gwasurvivr",
#'                        "extdata",
#'                        "plink_example.bed")
#' covariate.file <- system.file(package="gwasurvivr", 
#'                               "extdata",
#'                               "simulated_pheno.txt")
#' covariate.file <- read.table(covariate.file,
#'                              sep=" ",
#'                              header=TRUE,
#'                              stringsAsFactors = FALSE)
#' covariate.file$SexFemale <- ifelse(covariate.file$sex=="female", 1L, 0L)
#' sample.ids <- covariate.file[covariate.file$group=="experimental",]$ID_2
#' plinkCoxSurv(bed.file=bed.file,
#'              covariate.file=covariate.file,
#'              id.column="ID_2",
#'              sample.ids=sample.ids,
#'              time.to.event="time",
#'              event="event",
#'              covariates=c("age", "SexFemale", "DrugTxYes"),
#'              inter.term=NULL,
#'              print.covs="only",
#'              out.file="impute_example",
#'              chunk.size=50,
#'              maf.filter=0.005,
#'              flip.dosage=TRUE,
#'              verbose=TRUE,
#'              clusterObj=NULL)  
#'  
#' @importFrom survival Surv coxph.fit
#' @importFrom matrixStats rowMeans2 rowVars rowSds
#' @importFrom SummarizedExperiment rowRanges
#' @importFrom utils write.table
#' @importFrom stats pnorm rnorm setNames complete.cases
#' @importFrom SNPRelate snpgdsBED2GDS
#' @import parallel
#' @import GWASTools
#'  
#' @export


plinkCoxSurv <- function(bed.file,
                         covariate.file,
                         id.column,
                         sample.ids=NULL, 
                         time.to.event, 
                         event,
                         covariates,
                         inter.term=NULL,
                         print.covs="only",
                         out.file,
                         chunk.size=10000,
                         maf.filter=0.005,
                         exclude.snps=NULL,
                         flip.dosage=TRUE,
                         verbose=TRUE,
                         clusterObj=NULL,
                         keepGDS = FALSE)
{
  
  coxSurv(createPlinkCoxSurv(bed.file,
                             covariate.file,
                             id.column,
                             sample.ids, 
                             time.to.event, 
                             event,
                             covariates,
                             inter.term,
                             print.covs,
                             out.file,
                             chunk.size,
                             maf.filter,
                             exclude.snps,
                             flip.dosage,
                             verbose,
                             clusterObj,
                             keepGDS))

} 
suchestoncampbelllab/survivR documentation built on March 10, 2021, 1:21 p.m.