get.post.probs: Main posterior probability calculation
In kdkorthauer/MADGiC: Fits an empirical Bayesian hierarchical model to obtain posterior probabilities that each gene is a driver.


This function reads in an MAF data file, exome annotation, and pre-computed prior information, and then fits a hierarchical empirical Bayesian model to obtain posterior probabilities that each gene is a driver.

  get.post.probs(maf.file,
    exome.file = system.file("data/exome_36.RData", package = "MADGiC"),
    gene.rep.expr.file = system.file("data/gene.rep.expr.RData", package = "MADGiC"),
    gene.names.file = system.file("data/gene_names.txt", package = "MADGiC"),
    prior.file = system.file("data/prior.RData", package = "MADGiC"),
    alpha = 0.2, beta = 6, N = 20, replication.file = NULL,
    expression.file = NULL)

`maf.file`	name of an MAF (Mutation Annotation Format) data file containing the somatic mutations. Currently, NCBI builds 36 and 37 are supported.
`exome.file`	name of an .RData file that annotates every position of the exome for how many transitions/transversions are possible, whether each change is silent or nonsilent, and the SIFT scores for each possible change.
`gene.rep.expr.file`	name of an .RData file that annotates every gene for its Ensembl name, chromosome, base pair positions, replication timing region, and expression level.
`gene.names.file`	name of a text file containing the Ensembl names of all genes.
`prior.file`	name of an .RData file containing a named vector of prior probabilities that each gene is a driver, obtained from positional information in the COSMIC database.
`N`	integer number of simulated datasets to be used in the estimation of the null distribution of functional impact scores. The default value is 20 (see `shuffle.muts`).
`expression.file`	(optional) name of a .txt file containing gene expression data if user wishes to supply one (default is to use an average expression signal of the CCLE). The .txt file should have two columns and no header. The first column should contain the Ensembl Gene ID (using Ensembl 54 for hg18) and the second column should contain the expression measurements. These can be raw or log-scaled but should be normalized if normalization is desired.
`replication.file`	(optional) name of a .txt file containing replication timing data if user wishes to supply one (default is to use data from Chen et al. (2010)). The .txt file should have two columns and no header. The first column should contain the Ensembl Gene ID (using Ensembl 54 for hg18) and the second column should contain the replication timing measurements.
`alpha`	numeric value of first shape parameter of the prior Beta distribution on the probability of mutation for driver genes. Default value of 0.2 is chosen as a compromise between a cancer type with a relatively low mutation rate (Ovarian cancer, fitted value from COSMIC of 0.15) and one with a comparatively high mutation rate (Squamous cell lung, fitted value from COSMIC of 0.27), but results are robust to changes in this parameter. Note that intuitively (and empirically), a higher mutation rate overall leads to a higher driver mutation rate overall - and thus less mass is concentrated in the left tail of the distribution.
`beta`	numeric value of second shape parameter of the prior Beta distribution on the probability of mutation for driver genes. Default value of 6 is chosen as a compromise between a cancer type with a relatively low mutation rate (Ovarian cancer, fitted value from COSMIC of 6.6) and one with a comparatively high mutation rate (Squamous cell lung, fitted value from COSMIC of 5.83), but results are robust to changes in this parameter. Note that intuitively (and empirically), a higher mutation rate overall leads to a higher driver mutation rate overall - and thus less mass is concentrated in the left tail of the distribution.
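To make the role of `alpha` and `beta` concrete, the sketch below compares the three Beta priors quoted in the argument descriptions, using only base R. It relies on the standard fact that a Beta(alpha, beta) distribution has mean alpha / (alpha + beta). The helper name `beta_summary` and the 0.01 cutoff used to illustrate "mass in the left tail" are assumptions made for this illustration and are not part of MADGiC.

```r
# Shape parameters quoted in the argument descriptions above.
priors <- data.frame(
  label = c("default", "ovarian", "squamous_lung"),
  alpha = c(0.2, 0.15, 0.27),
  beta  = c(6, 6.6, 5.83)
)

# Hypothetical helper (not a MADGiC function): prior mean and the
# probability mass below an arbitrary, illustrative cutoff.
beta_summary <- function(alpha, beta, cutoff = 0.01) {
  c(mean = alpha / (alpha + beta),
    left_mass = pbeta(cutoff, alpha, beta))
}

# One row per prior, with the two summary columns appended.
prior_summary <- cbind(
  priors,
  t(mapply(beta_summary, priors$alpha, priors$beta))
)
print(prior_summary)
```

The default prior's mean falls between the ovarian and squamous cell lung values, and the ovarian prior places more mass below the cutoff than the squamous cell lung prior, in line with the note that a higher mutation rate leaves less mass in the left tail.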

The typical user needs to specify only the MAF file they wish to analyze. The other inputs (exome annotation, gene annotation, gene names, and prior probabilities) have been precomputed and are distributed with this package.

A named vector of posterior probabilities that each gene is a driver.
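Because the return value is a named numeric vector, it can be ranked and filtered with base R. The sketch below uses a made-up vector in place of real output; the gene names, the probabilities, and the 0.9 cutoff are all assumptions chosen for illustration, not values produced or recommended by MADGiC.

```r
# Hypothetical stand-in for the return value of get.post.probs():
# a named numeric vector of posterior probabilities (names are made up).
post.probs <- c(GENE_A = 0.998, GENE_B = 0.12, GENE_C = 0.95, GENE_D = 0.004)

# Rank genes from most to least likely driver.
ranked <- sort(post.probs, decreasing = TRUE)

# Keep genes above an illustrative cutoff
# (0.9 is an assumption, not a MADGiC default).
cutoff <- 0.9
candidate.drivers <- names(ranked)[ranked > cutoff]
print(candidate.drivers)
```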

## Not run: 

# pointer to the MAF file to be analyzed
maf.file <- system.file("data/OV.maf",package="MADGiC")

# calculation of posterior probabilities that each gene is a driver
post.probs <- get.post.probs(maf.file)

# Modify default settings to match TCGA ovarian analysis in paper
post.probs <- get.post.probs(maf.file, N=100, alpha=0.15, beta=6.6)

## End(Not run)
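The `expression.file` argument expects a .txt file with two columns and no header: Ensembl Gene ID first, expression measurement second. A minimal sketch of writing such a file with base R follows. The gene IDs and values are placeholders, and tab separation is an assumption, since the argument description does not name a delimiter.

```r
# Placeholder expression measurements keyed by Ensembl Gene ID
# (IDs and values are made up for illustration).
expr <- data.frame(
  gene  = c("ENSG00000000001", "ENSG00000000002", "ENSG00000000003"),
  value = c(5.2, 0.0, 11.7)
)

# Two columns, no header, as described for `expression.file`.
# Tab separation is an assumption; the documentation names no delimiter.
expression.file <- tempfile(fileext = ".txt")
write.table(expr, file = expression.file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# Read the file back to confirm the layout.
check <- read.table(expression.file, sep = "\t", stringsAsFactors = FALSE)
print(check)

# With MADGiC installed, the file could then be supplied as (not run here):
# post.probs <- get.post.probs(maf.file, expression.file = expression.file)
```

The same two-column layout is described for `replication.file`, so an analogous file could be prepared for replication timing measurements.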

kdkorthauer/MADGiC documentation built on June 13, 2020, 1:35 p.m.
