library(phenopredict) ## Track time spent on making the vignette startTime <- Sys.time()
## load libraries library('devtools') install_github("leekgroup/phenopredict") # document("/users/sellis/phenopredict") library('phenopredict')
data(sysdata, package='phenopredict') #loads cm, cm_new, regiondata, pheno to run example
The first step in building the predictor is filtering the regions that should be used for prediction. This step will require an expression data set (expression
) in which your samples are in columns and your expression data are in rows. A corresponding GRanges object (regiondata
) is required. The rows of this object should correspond to those in your expression data set. Finally, you will have to provide a phenotype file (phenodata
). The rows of this phenotype file will contain the samples included in your expression data set and the columns can include any phenotype information. phenotype
specifies the phenotype upon which you want to build the predictor. covariates
specifies those covariates you would like to include in the predictor.
Note: the speed of this step depends on both the number of regions in your expression
set AND (even moreso on) the number of levels in your phenotype
being predicted.
# number of regions in expression data nrow(cm) # number of samples included in example ncol(cm) inputdata<-filter_regions(expression=cm, regiondata=regiondata ,phenodata=pheno, phenotype="Sex", covariates=NULL,type="factor", numRegions=100) # taking a look at output of filter_regions() dim(inputdata$covmat) inputdata$regiondata head(inputdata$regioninfo)
After selecting the regions for your phenotype of choice, you'll have to calculate the coefficient estimates for these regions. This will output the coefficient regions for your selected regions (rows) for each of the categories of your predictor (columns)
predictor<-build_predictor(inputdata=inputdata ,phenodata=pheno, phenotype="Sex", covariates=NULL,type="factor", numRegions=10) #number of probes used for prediction length(predictor$trainingProbes) #this contains the coefficient estimates used for prediction. # the number of rows corresponds to the number of sites used for prediction # while the columns corresponds to the number of categories of your phenotype. dim(predictor$coefEsts)
If you want to test the accuracy of your build predictor on data in which the phenotype is known, use test_predictions().
predictions_test <-test_predictor(inputdata=inputdata ,phenodata=pheno, phenotype="Sex", covariates=NULL,type="factor",predictordata=predictor) # get summary of how prediction is doing predictions_test$summarized
To calculate the predictions for your chosen phenotype in a new data set, you will have to first extract the coverage matrix for the regions from filter_regions() for your new data set. extract_data() supplies this:
# looking at the input data for extract_data dim(cm_new) test_data<-extract_data(newexpression=cm_new, newregiondata=regiondata, predictordata=predictor)
Finally, with coefficient estimates calculated and the appropriate regions selected from the test data set, you can now make your predictions!
predictions<-predict_pheno(inputdata_test= test_data, phenodata=pheno, phenotype="Sex", covariates=NULL,type="factor", predictordata = predictor) #looking at the output table(predictions)
## Time spent creating this report: diff(c(startTime, Sys.time())) ## Date this report was generated message(Sys.time()) ## Reproducibility info options(width = 120) devtools::session_info()
Code for creating the vignette
## Create the vignette library('rmarkdown') system.time(render('/users/sellis/phenopredict/vignettes/phenopredict-quickstart.Rmd', BiocStyle::html_document())) ## Extract the R code library('knitr') knit('/users/sellis/phenopredict/vignettes/phenopredict-quickstart.Rmd', tangle = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.