Performs an MGSA analysis

Description

Estimates the marginal posterior of the MGSA problem with an MCMC sampling algorithm.

Usage

mgsa(o, sets, population = NULL, p = seq(min(0.1, 1/length(sets)), min(0.3,
  20/length(sets)), length.out = 10), ...)

## S4 method for signature 'integer,list'
mgsa(o, sets, population = NULL, p = seq(1, min(20,
  floor(length(sets)/3)), length.out = 10)/length(sets), ...)

## S4 method for signature 'numeric,list'
mgsa(o, sets, population = NULL, p = seq(1, min(20,
  floor(length(sets)/3)), length.out = 10)/length(sets), ...)

## S4 method for signature 'character,list'
mgsa(o, sets, population = NULL, p = seq(1,
  min(20, floor(length(sets)/3)), length.out = 10)/length(sets), ...)

## S4 method for signature 'logical,list'
mgsa(o, sets, population = NULL, p = seq(min(0.1,
  1/length(sets)), min(0.3, 20/length(sets)), length.out = 10), ...)

## S4 method for signature 'character,MgsaSets'
mgsa(o, sets, population = NULL,
  p = seq(min(0.1, 1/length(sets)), min(0.3, 20/length(sets)), length.out =
  10), ...)

Arguments

o

The observations. A numeric, integer, character or logical vector. See Details.

sets

The sets. Either an MgsaSets object or a list. For a list, each entry is a vector of type numeric, integer or character. See Details.

population

The total population. Optional. A numeric, integer or character vector. Defaults to NULL. See Details.

p

Grid of values for the parameter p. Values represent probabilities of term activity and therefore must be in [0,1].

...

Optional arguments that are passed to the methods. Supported parameters are:

alpha

Grid of values for the parameter alpha. Values represent probabilities of false-positive events and hence must be in [0,1]. A numeric vector.

beta

Grid of values for the parameter beta. Values represent probabilities of false-negative events and hence must be in [0,1]. A numeric vector.

steps

The number of steps of each run of the MCMC sampler. An integer of length 1. A recommended value is 1e6 or greater.

burnin

The number of burn-in MCMC steps before sample collection begins. An integer of length 1. A recommended value is half of the total number of MCMC steps.

thin

The sample collecting period. An integer of length 1. A recommended value is 100 to reduce autocorrelation of subsequently collected samples.

flip.freq

The frequency of the MCMC Gibbs step that randomly flips the state of a random set from active to inactive or vice versa. A numeric in (0,1].

restarts

The number of different runs of the MCMC sampler. An integer of length 1. Must be greater than or equal to 1. A recommended value is 5 or greater.

threads

The number of threads that should be used for concurrent restarts. A value of 0 means to use all available cores. Defaults to 0.
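
The optional MCMC arguments above might be combined as in the following minimal sketch. The variable names (n_steps, n_burnin, n_thin, n_restarts) and the chosen values are assumptions made for illustration; in particular, the step count is deliberately smaller than the recommended 1e6 so that the sketch finishes quickly. The call is only attempted if the mgsa package is installed.

```r
# Hypothetical settings for illustration only.
n_steps    <- 1e5          # below the recommended 1e6, to keep the run short
n_burnin   <- n_steps / 2  # half of the total steps, as recommended
n_thin     <- 100          # collect one sample every 100 steps
n_restarts <- 5            # several independent runs of the sampler

if (requireNamespace("mgsa", quietly = TRUE)) {
  fit <- mgsa::mgsa(
    c("A", "B"),
    list(set1 = LETTERS[1:3], set2 = LETTERS[2:4]),
    steps = n_steps, burnin = n_burnin, thin = n_thin,
    restarts = n_restarts, threads = 1
  )
  print(fit)
}
```

Setting threads = 1 here simply keeps the sketch single-threaded; the default of 0 would use all available cores.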

Details

The function can handle items (such as genes) encoded as character or integer. For convenience, numeric items can also be provided, but their values should essentially be integers. The type of items in the observations o, in the sets and in the optional population should be consistent. In the case of character items, o and population should be of type character and sets can be either an MgsaSets or a list of character vectors. In the case of integer items, o should be of type integer, numeric (but essentially with integer values), or logical, and the entries in sets as well as the population should be integer. When o is logical, it is first coerced to integer with a call to which. Observations outside the population are not taken into account. If population is NULL, it is defined as the union of all sets.
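
The preprocessing rules described above can be illustrated with base R alone. This is only a sketch of the described behaviour, not the package's internal code, and the variable names (o_logical, o_used, and so on) are made up for the example.

```r
# Items encoded as integers; a logical vector marks which items were observed.
o_logical <- c(TRUE, TRUE, FALSE, FALSE)
sets <- list(set1 = 1:3, set2 = 2:4)

# A logical o is turned into integer item indices with which().
o <- which(o_logical)

# With population = NULL, the population is the union of all sets.
population <- Reduce(union, sets)

# Observations outside the population are not taken into account.
o_used <- intersect(o, population)
```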

The default grid of values for p is such that between 1 and 20 sets are active in expectation. The lower limit is constrained to be at most 0.1 and the upper limit at most 0.3, independently of the total number of sets, to make sure that complex solutions are penalized. Marginal posteriors of activity of each set are estimated using an MCMC sampler as described in Bauer et al., 2010. Because convergence of an MCMC sampler is difficult to assess, it is recommended to run it several times (using restarts). If variations between runs are too large (see MgsaResults), the number of steps (steps) of each MCMC run should be increased.
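
As a rough check of the statement about the default grid, the expression from the generic's signature in Usage can be evaluated for a given number of sets. The helper name default_p_grid and the choice of 200 sets are assumptions for this sketch.

```r
# Default grid for p, copied from the generic's signature, as a function of
# the number of sets (default_p_grid is a hypothetical helper name).
default_p_grid <- function(n_sets) {
  seq(min(0.1, 1 / n_sets), min(0.3, 20 / n_sets), length.out = 10)
}

n_sets <- 200
p <- default_p_grid(n_sets)

# Expected number of active sets for each grid value of p.
expected_active <- n_sets * p
range(expected_active)

# With very few sets, the caps of 0.1 and 0.3 take over.
range(default_p_grid(5))
```

With 200 sets the expected number of active sets spans 1 to 20; with only 5 sets the grid is capped and runs from 0.1 to 0.3.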

Value

An MgsaMcmcResults object.

References

Bauer S., Gagneur J. and Robinson P. GOing Bayesian: model-based gene set analysis of genome-scale data. Nucleic Acids Research (2010) http://nar.oxfordjournals.org/content/38/11/3523.full

See Also

MgsaResults, MgsaMcmcResults

Examples

## observing items A and B, with sets {A,B,C} and {B,C,D}
mgsa(c("A", "B"), list(set1 = LETTERS[1:3], set2 = LETTERS[2:4]))

## same case with integer representation of the items and logical observation
mgsa(c(TRUE,TRUE,FALSE,FALSE), list(set1 = 1:3, set2 = 2:4))

## a small example with gene ontology sets and plot
data(example)
fit = mgsa(example_o, example_go)
## Not run:
plot(fit)
## End(Not run)
