BiocStyle::markdown()
knitr::opts_chunk$set(echo = TRUE, cache=FALSE, eval = TRUE) library(knitr)
gwasurvivr
can be used to perform survival analyses of imputed genotypes from Sanger and Michigan imputation servers and IMPUTE2 software. This vignette is a tutorial on how to perform these analyses. This package can be run locally on a Linux, Mac OS X, Windows or conveniently batched on a high performing computing cluster. gwasurvivr
iteratively processes the data in chunks and therefore intense memory requirements are not necessary.
gwasurvivr
package comes with three main functions to perform survival analyses using Cox proportional hazard (Cox PH) models depending on the imputation method used to generate the genotype data:
michiganCoxSurv
: Performs survival analysis on imputed genetic data stored in compressed VCF files generated via Michigan imputation server. sangerCoxSurv
: Performs survival analysis on imputed genetic data stored in compressed VCF files generated via Sanger imputation server. impute2CoxSurv
: Performs survival analysis on imputed genetic data from IMPUTE2 output.plinkCoxSurv
: Performs survival analysis on directly typed genetic data from PLINK files (.BED, .BIM, and .FAM) gdsCoxSurv
: Performs survival analysis on genetic data that user has already converted from IMPUTE2 format to GDS format. All functions fit a Cox PH model to each SNP including other user defined covariates and will save the results as a text file directly to the disk that contains survival analysis results. gwasurvivr
functions can also test for interaction of SNPs with a given covariate. See examples for further details.
All three functions require the following main arguments:
vcf.file
: A character string giving the path to genotype data file (impute.file
for IMPUTE2; bed.file
for PLINK)covariate.file
: A data frame comprising sample IDs (that link to the genotype data samples), phenotype (time, event) and additional covariate dataid.column
: A character string providing sample ID column name in covariate.filetime.to.event
: A character string that contains the time column name in covariate.fileevent
: Character string that contains the event column name in covariate.filecovariates
: Character vector with contains column names in covariate.file of covariates to include in modelout.file
: A character string giving output file nameFurther arguments can be passed depending on the user preference. For example, user can define minor allele frequency (MAF) or info/R2 score threshold to filter out SNPs that have low MAF or info/R2 score to avoid false-positive signals and to reduce computing time. User can also define a subset of samples to be analyzed by providing the set of sample IDs. Users can also control how chunk size -- the number of rows (SNPs) to include for each iteration.
IMPORTANT: In the covariate.file
, categorical variables need to be converted to indicator (dummy) variables and be of class numeric
. Ordinal variables represented as characters, ie "low", "medium" and "high" should be converted to the appropriate numeric values as well. For VCF files -- an index (.tbi or .csi) file must be accompanied with the vcf file, with the exact same name (plus the extension) and in the same directory. Index files can alternatively be created using tabix/bcftools.
The output for the 3 main functions in gwasurvivr
are very similar but with subtle differences. In general the output includes the following main fields: RSID, TYPED, CHR, POS, REF, ALT, Allele Frequencies*, INFO*, PVALUES, HRs, HR confidence intervals, coefficient estimates, standard errors, Z-statistic, N, and NEVENT. Allele frequencies and INFO differ by the input imputation software.
Note: Invoking the inter.term
argument for any of the functions will make the PVALUE and HRs and HR confidence intervals represent the INTERACTION term and not the SNP alone.
The non-software specific fields are summarized below:
cols <- c("RSID", "CHR", "POS", "REF", "ALT", "SAMP_FREQ_ALT", "SAMP_MAF", "PVALUE", "HR", "HR_lowerCI", "HR_upperCI", "COEF", "SE.COEF", "Z", "N", "NEVENT") desc <- c("SNP ID", "Chromosome number", "Genomic Position (BP)", "Reference Allele", "Alternate Allele", "Alternate Allele frequency in sample being tested", "Minor allele frequency in sample being tested", "P-value of single SNP or interaction term", "Hazard Ratio (HR)", "Lower bound 95% CI of HR", "Upper bound 95% CI of HR", "Estimated coefficient of SNP", "Standard error of coefficient estimate", "Z-statistic", "Number of individuals in sample being tested", "Number of events that occurred in sample being tested") df <- cbind(cols, desc) colnames(df) <- c("Column", "Description") kable(df)
The software specific fields are summarized below:
1. michiganCoxSurv
unique output columns are AF, MAF, R2, ER2. They are summarized below.
cols <- c("TYPED", "AF", "MAF", "R2", "ER2") desc <- c("Imputation status: TRUE (SNP IS TYPED)/FALSE (SNP IS IMPUTED)", "Minimac3 output Alternate Allele Frequency", "Minimac3 output of Minor Allele Frequency", "Imputation R2 score (minimac3 $R^2$)", "Minimac3 ouput empirical $R^2$") df <- cbind(cols, desc) colnames(df) <- c("Column", "Description") kable(df)
Please see Minimac3 Info File for details on output
sangerCoxSurv
cols <- c("TYPED", "RefPanelAF", "INFO") desc <- c("Imputation status: TRUE (SNP IS TYPED)/FALSE (SNP IS IMPUTED)", "HRC Reference Panel Allele Frequency", "Imputation INFO score from PBWT") df <- cbind(cols, desc) colnames(df) <- c("Column", "Description") kable(df)
impute2CoxSurv
cols <- c("TYPED", "A0", "A1", "exp_freq_A1") desc <- c("`---` is imputed, repeated RSID is typed", "Allele coded 0 in IMPUTE2", "Allele coded 1 in IMPUTE2", "Expected allele frequency of alelle code A1") df <- cbind(cols, desc) colnames(df) <- c("Column", "Description") kable(df)
More statistics can be printed out by invoking the print.covs
argument and setting it to print.covs=all
(single SNP/SNP*covariate interaction) or print.covs=some
(SNP*covariate ineraction). These options are available mainly for modeling purposes (covariate selection) and aren't suggested for very large analyses as it will greatly increase the number of columns in the output, depending on how many covariates users are adjusting for.
Install gwasurvivr
from the Sucheston-Campbell Lab Github repository using devtools
.
devtools::install_github("suchestoncampbelllab/gwasurvivr")
Or please install from the devel
branch of Bioconductor (R version >= 3.5)
if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") BiocManager::install("gwasurvivr", version = "devel")
Note: This package depends on GWASTools
which uses netcdf framework on linux. Therefore, for linux users, please install libnetcdf-dev
and netcdf-bin
before installing gwasurvivr
. These linux libraries may already installed on an academic computing cluster.
CRAN packages:
1. ncdf4
2. matrixStats
3. parallel
4. survival
install.packages(c("ncdf4", "matrixStats", "parallel", "survival"))
Bioconductor packages:
1. GWASTools
2. VariantAnnotation
3. SummarizedExperiment
4. SNPRelate
BiocManager::install("GWASTools") BiocManager::install("VariantAnnotation") BiocManager::install("SummarizedExperiment") BiocManager::install("SNPRelate")
Load gwasurvivr
.
library(gwasurvivr)
gwasurvivr
uses parallel
package for its internal parallelization to fit the Cox PH models. Users are not required to define a parallelization setup, by default gwasurvivr
functions will detect the user's operating system and set the cluster object to FORK
if the platform is Linux/OS X and to SOCK
if Windows. However, parallelization settings can be modified by the user if needed. Users are given two ways to define their cluster settings:
1. Setting the number of cores to be used:
Linux/OS X users can run analyses on a prespecified number of cores by setting the option in the R session as shown below. This option should be defined in the R session before running any of the gwasurvivr
functions. Here we decide to use 4 cores. This option is not available to Windows users.
options("gwasurvivr.cores"=4)
2. Providing a user defined cluster object
To modify more settings, users can also provide a "cluster object" to any of the gwasurvivr
functions. The cluster object can be generated via makeCluster
, makePSOCKcluster
, makeForkCluster
functions from parallel
package or similar cluster object functions from snow
or snowfall
packages. This method can be applied on any operating system. User can create a cluster object before running any of the functions and pass the cluster object to the clusterObj
argument as shown below. For further details see ??parallel::parallel
.
library(parallel) cl <- makeCluster(detectCores()) impute2CoxSurv(..., clusterObj=cl) sangerCoxSurv(..., clusterObj=cl) michiganCoxSurv(..., clusterObj=cl)
Michigan Imputation Server pre-phases typed genotypes using HAPI-UR, SHAPEIT, or EAGLE (default is EAGLE2), imputes using Minimac3 imputation engine and outputs Blocked GNU Zip Format VCF files (.vcf.gz
). These .vcf.gz
files are used as input for gwasurvivr
. Minimac uses slightly different metrics to assess imputation quality ($R^2$ versus info score) and complete details as to minimac output are available on the Minimac3 Wikipage.
The function, michiganCoxSurv
uses a modification of Cox proportional hazard regression from the R library survival:::coxph.fit
. Built specifically for genetic data, michiganCoxSurv
allows the user to filter on info ($R^2$) score (imputation quality metric) and minor allele frequency from the reference panel used for imputation using RefPanelAF
as the input arguement for maf.filter
. Users are also provided with the sample minor allele frequency (MAF) in the sangerCoxSurv
output, which can be used for filtering post analysis.
Samples can be selected for analyses by providing a vector of sample.ids
. The output from Sanger imputation server returns the samples as SAMP1, ..., SAMPN
, where N
is the total number of samples. The sample order corresponds to the sample order in the vcf.file
used for imputation. Note, sample order can also be found in the .fam
file if genotyping data were initially in .bed
, .bim
and .fam
(PLINK) format prior to conversion to VCF. If no sample list is specified all samples are included in the analyses.
vcf.file <- system.file(package="gwasurvivr", "extdata", "michigan.chr14.dose.vcf.gz") pheno.fl <- system.file(package="gwasurvivr", "extdata", "simulated_pheno.txt") pheno.file <- read.table(pheno.fl, sep=" ", header=TRUE, stringsAsFactors = FALSE) head(pheno.file) # recode sex column and remove first column pheno.file$SexFemale <- ifelse(pheno.file$sex=="female", 1L, 0L) # select only experimental group sample.ids sample.ids <- pheno.file[pheno.file$group=="experimental",]$ID_2 head(sample.ids)
In this example, we will select samples from the experimental
group and will test survival only on these patients. The first column in the pheno.file
are sample IDs, which link the phenotype file to the imputation file. We include age
, DrugTxYes
, and sex
in the survival model as covariates.
We perform the analysis using the experimental
group to demonstrate how one may want to prepare their data if interested in testing only on a subset of samples (i.e. a case-control study and survival of cases is of interest). Note that how the IDs (sample.ids
) need to be a vector of class character
. The chunk.size
refers to size of each data chunk read in and is defaulted to 10,000 rows. Users can customize that to their needs. The larger the chunk.size
the more memory (RAM) required to run the analysis. The recommended chunk.size=10000
and probably should not exceed chunk.size=100000
. This does not mean that gwasurvivr
is limited to only 100,000 SNPs, it just is how many SNPs are analyzed for each iteration.
By default survival estimates and pvalues for the SNP adjusted for other covariates are outputted (print.covs='only'
), however users can select print.covs=all
to get the coefficient estimates for covariates included in the model. Depending on the number of covariates included this can add substantially to output file size.
Next we run michiganCoxSurv
with the default, print.covs="only"
, load the results into R and provide descriptions of output by column. We will then run the analysis again using print.covs="all"
. verbose=TRUE
is used for these examples so the function display messages while running.
Use ?michiganCoxSurv
for argument specific documentation.
print.covs="only"
michiganCoxSurv(vcf.file=vcf.file, covariate.file=pheno.file, id.column="ID_2", sample.ids=sample.ids, time.to.event="time", event="event", covariates=c("age", "SexFemale", "DrugTxYes"), inter.term=NULL, print.covs="only", out.file="michigan_only", r2.filter=0.3, maf.filter=0.005, chunk.size=100, verbose=TRUE, clusterObj=NULL)
michiganCoxSurv(vcf.file=vcf.file, covariate.file=pheno.file, id.column="ID_2", sample.ids=sample.ids, time.to.event="time", event="event", covariates=c("age", "SexFemale", "DrugTxYes"), inter.term=NULL, print.covs="only", out.file=tempfile("michigan_only"), r2.filter=0.3, maf.filter=0.005, chunk.size=100, verbose=TRUE, clusterObj=NULL)
Here we load the data and glimpse the first few values in each column outputted from the SNP*interaction survival analyses using print.covs="only"
.
michigan_only <- read.table("michigan_only.coxph", sep="\t", header=TRUE, stringsAsFactors = FALSE)
michigan_only <- read.table(dir(tempdir(), pattern="^michigan_only.*\\.coxph$", full.names = TRUE), sep="\t", header=TRUE, stringsAsFactors = FALSE)
str(head(michigan_only))
A SNP*covariate interaction can be implemented using the inter.term
argument. In this example, we will use DrugTxYes
from the covariate file as the covariate we want to test for interaction with the SNP.
print.covs="only"
michiganCoxSurv(vcf.file=vcf.file, covariate.file=pheno.file, id.column="ID_2", sample.ids=sample.ids, time.to.event="time", event="event", covariates=c("age", "SexFemale", "DrugTxYes"), inter.term="DrugTxYes", print.covs="only", out.file="michigan_intx_only", r2.filter=0.3, maf.filter=0.005, chunk.size=100, verbose=FALSE, clusterObj=NULL)
michiganCoxSurv(vcf.file=vcf.file, covariate.file=pheno.file, id.column="ID_2", sample.ids=sample.ids, time.to.event="time", event="event", covariates=c("age", "SexFemale", "DrugTxYes"), inter.term="DrugTxYes", print.covs="only", out.file=tempfile("michigan_intx_only"), r2.filter=0.3, maf.filter=0.005, chunk.size=100, verbose=FALSE, clusterObj=NULL)
Here we load the data and glimpse the first few values in each column outputted from the SNP*interaction survival analyses using print.covs="only"
.
michigan_intx_only <- read.table("michigan_intx_only.coxph", sep="\t", header=TRUE, stringsAsFactors = FALSE)
michigan_intx_only <- read.table(dir(tempdir(), pattern="^michigan_intx_only.*\\.coxph$", full.names = TRUE), sep="\t", header=TRUE, stringsAsFactors = FALSE)
str(head(michigan_intx_only))
Sanger Imputation Server pre-phases typed genotypes using either SHAPEIT or EAGLE, imputes genotypes using PBWT algorithm and outputs a .vcf.gz
file for each chromosome. These .vcf.gz
files are used as input for gwasurvivr
. The function, sangerCoxSurv
uses a modification of Cox proportional hazard regression from survival::coxph.fit
. Built specifically for genetic data, sangerCoxSurv
allows the user to filter on info score (imputation quality metric) and minor allele frequency from the reference panel used for imputation using RefPanelAF
as the input arguement for maf.filter
. Users are also provided with the sample minor allele frequency in the sangerCoxSurv
output.
Samples can be selected for analyses by providing a vector of sample.ids
. The output from Sanger imputation server returns the samples as SAMP1, ..., SAMPN
, where N
is the total number of samples. The sample order corresponds to the sample order in the VCF file used for imputation. Note, sample order can also be found in the .fam
file if genotyping data were initially in .bed
, .bim
and .fam
(PLINK) format prior to conversion to VCF. If no sample list is specified, all samples are included in the analyses.
In this example, we will select samples from the experimental
group and will test survival only on these patients. The first column in the pheno.file
are sample IDs (we will match on these). We include age
, DrugTxYes
, and sex
in the survival model as covariates.
vcf.file <- system.file(package="gwasurvivr", "extdata", "sanger.pbwt_reference_impute.vcf.gz") pheno.fl <- system.file(package="gwasurvivr", "extdata", "simulated_pheno.txt") pheno.file <- read.table(pheno.fl, sep=" ", header=TRUE, stringsAsFactors = FALSE) head(pheno.file) # recode sex column and remove first column pheno.file$SexFemale <- ifelse(pheno.file$sex=="female", 1L, 0L) # select only experimental group sample.ids sample.ids <- pheno.file[pheno.file$group=="experimental",]$ID_2 head(sample.ids)
We perform the analysis using the experimental
group to demonstrate how one may want to prepare their data if not initially all samples are patients or cases (i.e. a case-control study and survival of cases is of interest). We also are showing how the IDs (sample.ids
) need to be a vector of class character
. The chunk.size
refers to size of each data chunk read in and is defaulted to 10,000 rows, users can customize that to their needs. The larger the chunk.size
the more memory (RAM) required to run the analysis. The recommended chunk.size=10000
and probably should not exceed chunk.size=100000
. This does not mean that gwasurvivr
is limited to only 100,000 SNPs, it just is how many SNPs are analyzed for each iteration.
By default survival estimates and pvalues for the SNP adjusted for other covariates are outputted (print.covs='only'
), however users can select print.covs=all
to get the coefficient estimates for covariates included in the model. Depending on the number of covariates included this can add substantially to output file size.
Use ?sangerCoxSurv
for argument specific documentation.
Next we run sangerCoxSurv
with the default, print.covs="only"
, load the results into R and provide descriptions of output by column. verbose=TRUE
is used for these examples so the function display messages while running.
print.covs="only"
sangerCoxSurv(vcf.file=vcf.file, covariate.file=pheno.file, id.column="ID_2", sample.ids=sample.ids, time.to.event="time", event="event", covariates=c("age", "SexFemale", "DrugTxYes"), inter.term=NULL, print.covs="only", out.file="sanger_only", info.filter=0.3, maf.filter=0.005, chunk.size=100, verbose=TRUE, clusterObj=NULL)
sangerCoxSurv(vcf.file=vcf.file, covariate.file=pheno.file, id.column="ID_2", sample.ids=sample.ids, time.to.event="time", event="event", covariates=c("age", "SexFemale", "DrugTxYes"), inter.term=NULL, print.covs="only", out.file=tempfile("sanger_only"), info.filter=0.3, maf.filter=0.005, chunk.size=100, verbose=TRUE, clusterObj=NULL)
Here we load the data and glimpse the first few values in each column from the survival analyses.
sanger_only <- read.table(dir(tempdir(), pattern="^sanger_only.*\\.coxph$", full.names = TRUE), sep="\t", header=TRUE, stringsAsFactors = FALSE)
str(head(sanger_only))
Column names with descriptions from the survival analyses with covariates, specifying the default print.covs="only"
A SNP*covariate interaction can be implemented using the inter.term
argument. In this example, we will use DrugTxYes
from the covariate file as the covariate we want to test for interaction with the SNP.
print.covs="only"
sangerCoxSurv(vcf.file=vcf.file, covariate.file=pheno.file, id.column="ID_2", sample.ids=sample.ids, time.to.event="time", event="event", covariates=c("age", "SexFemale", "DrugTxYes"), inter.term="DrugTxYes", print.covs="only", out.file="sanger_intx_only", info.filter=0.3, maf.filter=0.005, chunk.size=100, verbose=TRUE, clusterObj=NULL)
sangerCoxSurv(vcf.file=vcf.file, covariate.file=pheno.file, id.column="ID_2", sample.ids=sample.ids, time.to.event="time", event="event", covariates=c("age", "SexFemale", "DrugTxYes"), inter.term="DrugTxYes", print.covs="only", out.file=tempfile("sanger_intx_only"), info.filter=0.3, maf.filter=0.005, chunk.size=100, verbose=TRUE, clusterObj=NULL)
IMPUTE2 is a genotype imputation and haplotype phasing program (Howie et al 2009). IMPUTE2 outputs 6 files for each chromosome chunk imputed (usually 5 MB in size). Only 2 of these files are required for analyses using gwasurvivr
:
.impute
) .sample
) More information can be read about these formats
We are going to load in and pre-process the genetic data and the phenotypic data (covariate.file
).
impute.file <- system.file(package="gwasurvivr", "extdata", "impute_example.impute2.gz") sample.file <- system.file(package="gwasurvivr", "extdata", "impute_example.impute2_sample") covariate.file <- system.file(package="gwasurvivr", "extdata", "simulated_pheno.txt") covariate.file <- read.table(covariate.file, sep=" ", header=TRUE, stringsAsFactors = FALSE) covariate.file$SexFemale <- ifelse(covariate.file$sex=="female", 1L, 0L) sample.ids <- covariate.file[covariate.file$group=="experimental",]$ID_2
To perform survival analysis using IMPUTE2 the function arguments are very similarto michiganCoxSurv
and sangerCoxSurv
, however the function now takes a chromosome arguement. This is needed to properly annotate the file output with the chromosome that these SNPs are in. This is purely an artifact of IMPUTE2 and how we leverage GWASTools
in this function.
First we will do the analysis with no interaction term, followed by doing the analysis with the interaction term. The recommended output setting for single SNP analysis is print.cov="only"
.
impute2CoxSurv(impute.file=impute.file, sample.file=sample.file, chr=14, covariate.file=covariate.file, id.column="ID_2", sample.ids=sample.ids, time.to.event="time", event="event", covariates=c("age", "SexFemale", "DrugTxYes"), inter.term=NULL, print.covs="only", out.file="impute_example_only", chunk.size=100, maf.filter=0.005, exclude.snps=NULL, flip.dosage=TRUE, verbose=TRUE, clusterObj=NULL, keepGDS=FALSE)
impute2CoxSurv(impute.file=impute.file, sample.file=sample.file, chr=14, covariate.file=covariate.file, id.column="ID_2", sample.ids=sample.ids, time.to.event="time", event="event", covariates=c("age", "SexFemale", "DrugTxYes"), inter.term=NULL, print.covs="only", out.file=tempfile("impute_example_only"), chunk.size=100, maf.filter=0.005, exclude.snps=NULL, flip.dosage=TRUE, verbose=TRUE, clusterObj=NULL, keepGDS=FALSE)
Here we load the data and glimpse the first few values in each column output.
impute2_res_only <- read.table("impute_example_only.coxph", sep="\t", header=TRUE, stringsAsFactors = FALSE) str(head(impute2_res_only))
impute2_res_only <- read.table(dir(tempdir(), pattern="^impute_example_only.*\\.coxph$", full.names=TRUE), sep="\t", header=TRUE, stringsAsFactors = FALSE) str(head(impute2_res_only))
Now we will perform a SNP*covariate interaction survival analysis using impute2CoxSurv
.
impute2CoxSurv(impute.file=impute.file, sample.file=sample.file, chr=14, covariate.file=covariate.file, id.column="ID_2", sample.ids=sample.ids, time.to.event="time", event="event", covariates=c("age", "SexFemale", "DrugTxYes"), inter.term="DrugTxYes", print.covs="only", out.file="impute_example_intx", chunk.size=100, maf.filter=0.005, exclude.snps=NULL, flip.dosage=TRUE, verbose=FALSE, clusterObj=NULL, keepGDS=FALSE)
impute2CoxSurv(impute.file=impute.file, sample.file=sample.file, chr=14, covariate.file=covariate.file, id.column="ID_2", sample.ids=sample.ids, time.to.event="time", event="event", covariates=c("age", "SexFemale", "DrugTxYes"), inter.term="DrugTxYes", print.covs="only", out.file=tempfile("impute_example_intx"), chunk.size=100, maf.filter=0.005, exclude.snps=NULL, flip.dosage=TRUE, verbose=FALSE, clusterObj=NULL, keepGDS=FALSE)
Here we load the data and glimpse the first few values in each column outputted from the SNP*interaction survival analyses using print.covs="only"
.
impute2_res_intx <- read.table("impute_example_intx.coxph", sep="\t", header=TRUE, stringsAsFactors = FALSE) str(head(impute2_res_intx))
impute2_res_intx <- read.table(dir(tempdir(), pattern="^impute_example_intx.*\\.coxph$", full.names=TRUE), sep="\t", header=TRUE, stringsAsFactors = FALSE) str(head(impute2_res_intx))
Batch jobs for multiple analyses and different subsets are easy to implement using gwasurvivr
. These types of analyses should be reserved for usage on a UNIX-based high performance computing cluster. This is facilitated by the package batch
, which can internalize R variables from bash. First write an R script (e.g. mysurvivalscript.R
) to pass in bash.
Note: it is important to refer to the Getting Started part of the manual for packages that need to be installed for this
## mysurvivalscript.R library(gwasurvivr) library(batch) parseCommandArgs(evaluate=TRUE) # this is loaded in the batch package options("gwasurvivr.cores"=4) vcf.file <- system.file(package="gwasurvivr", "extdata", vcf.file) pheno.fl <- system.file(package="gwasurvivr", "extdata", pheno.file) pheno.file <- read.table(pheno.fl, sep=" ", header=TRUE, stringsAsFactors = FALSE) # recode sex column and remove first column pheno.file$SexFemale <- ifelse(pheno.file$sex=="female", 1L, 0L) # select only experimental group sample.ids sample.ids <- pheno.file[pheno.file$group=="experimental",]$ID_2 ## -- unlist the covariates ## (refer below to the shell script as to why we are doing this) covariates <- unlist(sapply(covariates, strsplit, "_", 1, "[[")) sangerCoxSurv(vcf.file=vcf.file, covariate.file=pheno.file, id.column="ID_2", sample.ids=sample.ids, time.to.event=time, event=event, covariates=covariates, inter.term=NULL, print.covs="only", out.file=out.file, info.filter=0.3, maf.filter=0.005, chunk.size=100, verbose=TRUE, clusterObj=NULL)
Now we can run a shell script. This can be used well with manifest files to set up multiple runs with different outcomes and different subsets of samples. We define a manifest file has columns that corresond to the functions that the user wants to pass and each row is a separate analysis that a user may want to run. The covariates are separated by an underscore ("_"
). This is so it can be passed properly, and also why we used str_split
to split the covariates.
#!/bin/bash DIRECTORY=/path/to/dir/impute_chr module load R R --script ${DIRECTORY}/survival/code/mysurvivalscript.R -q --args \ vcf.file ${DIRECTORY}/sanger.pbwt_reference_impute.vcf.gz \ pheno.file ${DIRECTORY}/phenotype_data/simulated_pheno.txt \ covariates DrugTxYes_age_SexFemale\ time.to.event time \ event event \ out.file ${DIRECTORY}/survival/results/sanger_example_output
The file paths above are completely arbitrary and were just used as an example of how file structure may be and where desirable output would be stored.
Exactly the same as for sangerCoxSurv
but this time with the input arguments for impute2CoxSurv
. See ?impute2CoxSurv
for help
Exactly the same as for sangerCoxSurv
but this time with the input arguments for michiganCoxSurv
. See ?michiganCoxSurv
for help
Here is the output of sessionInfo()
on the system that this document was compiled:
sessionInfo()
Terry M. Therneau and Patricia M. Grambsch (2000). Modeling Survival Data: Extending the Cox Model. Springer, New York. ISBN 0-387-98784-3.
Martin Morgan, Valerie Obenchain, Jim Hester and Hervé Pagès (2017). SummarizedExperiment: SummarizedExperiment container. R package version 1.6.3.
Gogarten SM, Bhangale T, Conomos MP, Laurie CA, McHugh CP, Painter I, Zheng X, Crosslin DR, Levine D, Lumley T, Nelson SC, Rice K, Shen J, Swarnkar R, Weir BS and Laurie CC (2012). “GWASTools: an R/Bioconductor package for quality control and analysis of genome-wide association studies.” Bioinformatics, 28(24), pp. 3329-3331. doi: 10.1093/bioinformatics/bts610.
B. N. Howie, P. Donnelly and J. Marchini (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genetics 5(6): e1000529
Das S, Forer L, Schönherr S, Sidore C, Locke AE, Kwong A, Vrieze S, Chew EY, Levy S, McGue M, Schlessinger D, Stambolian D, Loh PR, Iacono WG, Swaroop A, Scott LJ, Cucca F, Kronenberg F, Boehnke M, Abecasis GR, Fuchsberger C. Next-generation genotype imputation service and methods. Nature Genetics 48, 1284–1287 (2016). 27571263
Efficient haplotype matching and storage using the Positional Burrows-Wheeler Transform (PBWT)", Richard Durbin Bioinformatics 30:1266-72 (2014).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.