# mgsa-methods: Performs an MGSA analysis

In mgsa: Model-based gene set analysis

## Description

Estimate the marginal posterior of the MGSA problem with an MCMC sampling algorithm.

## Usage

```
mgsa(o, sets, population = NULL,
  p = seq(min(0.1, 1/length(sets)), min(0.3, 20/length(sets)), length.out = 10), ...)

## S4 method for signature 'integer,list'
mgsa(o, sets, population = NULL,
  p = seq(1, min(20, floor(length(sets)/3)), length.out = 10)/length(sets), ...)

## S4 method for signature 'numeric,list'
mgsa(o, sets, population = NULL,
  p = seq(1, min(20, floor(length(sets)/3)), length.out = 10)/length(sets), ...)

## S4 method for signature 'character,list'
mgsa(o, sets, population = NULL,
  p = seq(1, min(20, floor(length(sets)/3)), length.out = 10)/length(sets), ...)

## S4 method for signature 'logical,list'
mgsa(o, sets, population = NULL,
  p = seq(min(0.1, 1/length(sets)), min(0.3, 20/length(sets)), length.out = 10), ...)

## S4 method for signature 'character,MgsaSets'
mgsa(o, sets, population = NULL,
  p = seq(min(0.1, 1/length(sets)), min(0.3, 20/length(sets)), length.out = 10), ...)
```

## Arguments

| Argument | Description |
|---|---|
| `o` | The observations. It can be a `numeric`, `integer`, `character` or `logical`. See details. |
| `sets` | The sets. It can be an `MgsaSets` or a `list`. In the latter case, each list entry is a vector of type `numeric`, `integer` or `character`. See details. |
| `population` | The total population. Optional. A `numeric`, `integer` or `character` vector. Defaults to `NULL`. See details. |
| `p` | Grid of values for the parameter p. Values represent probabilities of term activity and therefore must be in [0,1]. |
| `...` | Optional arguments that are passed to the methods. The supported parameters are listed in the table below. |

Supported parameters passed through `...`:

| Parameter | Description |
|---|---|
| `alpha` | Grid of values for the parameter alpha. Values represent probabilities of false-positive events and hence must be in [0,1]. `numeric`. |
| `beta` | Grid of values for the parameter beta. Values represent probabilities of false-negative events and hence must be in [0,1]. `numeric`. |
| `steps` | The number of steps of each run of the MCMC sampler. `integer` of length 1. A recommended value is 1e6 or greater. |
| `burnin` | The number of burn-in MCMC steps before sample collection begins. `integer` of length 1. A recommended value is half of the total MCMC steps. |
| `thin` | The sample collecting period. `integer` of length 1. A recommended value is 100, to reduce autocorrelation between successively collected samples. |
| `flip.freq` | The frequency of the MCMC Gibbs step that randomly flips the state of a random set from active to inactive or vice versa. `numeric` from (0,1]. |
| `restarts` | The number of different runs of the MCMC sampler. `integer` of length 1. Must be greater than or equal to 1. A recommended value is 5 or greater. |
| `threads` | The number of threads that should be used for concurrent restarts. A value of 0 means to use all available cores. Defaults to 0. |
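As a rough illustration of how the optional MCMC parameters might be passed through `...`, the sketch below builds an argument list from the recommendations quoted above (`steps` of 1e6, `burnin` of half the steps, `thin` of 100, `restarts` of 5). The variable names `mcmc_args`, `o` and `sets` are hypothetical, the values are not claimed to be the package defaults, and the call to `mgsa` is only attempted if the package happens to be installed.

```r
# Sketch only: argument names are taken from the Arguments section;
# the values follow the recommendations quoted there, not verified defaults.
steps <- 1e6L
mcmc_args <- list(
  steps    = steps,          # MCMC steps per run
  burnin   = steps %/% 2L,   # half of the total steps, as recommended
  thin     = 100L,           # sample collecting period
  restarts = 5L,             # number of independent MCMC runs
  threads  = 0L              # 0 means: use all available cores
)

# Toy observations and sets (same as the first documented example).
o    <- c("A", "B")
sets <- list(set1 = LETTERS[1:3], set2 = LETTERS[2:4])

# Run only when the mgsa package is available.
if (requireNamespace("mgsa", quietly = TRUE)) {
  fit <- do.call(mgsa::mgsa, c(list(o = o, sets = sets), mcmc_args))
  print(fit)
}
```

Collecting the settings in a list keeps them in one place, so the same configuration can be reused across several calls.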

## Details

The function can handle items (such as genes) encoded as `character` or `integer`. For convenience, `numeric` items can also be provided, but these values should essentially be integers. The type of items in the observations `o`, the `sets` and the optional `population` should be consistent. In the case of `character` items, `o` and `population` should be of type `character` and `sets` can either be an `MgsaSets` or a `list` of `character` vectors. In the case of `integer` items, `o` should be of type `integer`, `numeric` (but essentially with integer values), or `logical`, and the entries in `sets` as well as the `population` should be `integer`. When `o` is `logical`, it is first coerced to integer by a call to `which`. Observations outside the `population` are not taken into account. If `population` is `NULL`, it is defined as the union of all sets.
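The input conventions described above can be mimicked in base R. This is a minimal sketch of the stated behaviour (coercion of logical observations with `which`, the default population as the union of all sets, and dropping observations outside the population); it is not the package's internal code, and all variable names are hypothetical.

```r
# Logical observations: TRUE at positions 1 and 2.
o_logical <- c(TRUE, TRUE, FALSE, FALSE)

# Coercion to integer item indices, as described for logical `o`.
o_integer <- which(o_logical)

# Sets given as a list of integer vectors.
sets <- list(set1 = 1:3, set2 = 2:4)

# With population = NULL, the population is the union of all sets.
population <- unique(unlist(sets, use.names = FALSE))

# Observations outside the population are not taken into account.
o_used <- intersect(o_integer, population)

print(o_integer)
print(population)
print(o_used)
```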

The default grid of values for p is chosen such that between 1 and 20 sets are active in expectation. The lower limit is constrained to be at most 0.1 and the upper limit at most 0.3, independently of the total number of sets, so that complex solutions are penalized. Marginal posteriors of activity of each set are estimated using an MCMC sampler as described in Bauer et al., 2010. Because convergence of an MCMC sampler is difficult to assess, it is recommended to run it several times (using `restarts`). If variations between runs are too large (see `MgsaResults`), the number of steps (`steps`) of each MCMC run should be increased.
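To make the default grid for p concrete, the sketch below evaluates the default expression from the generic signature in the Usage section for a given number of sets and reports the expected number of active sets (p times the number of sets). The helper name `default_p_grid` is hypothetical and exists only for this illustration.

```r
# Hypothetical helper: reproduces the default `p` expression of the generic
# shown in the Usage section for a given number of sets.
default_p_grid <- function(n_sets) {
  seq(min(0.1, 1 / n_sets), min(0.3, 20 / n_sets), length.out = 10)
}

# Many sets: the grid spans 1/n to 20/n, i.e. 1 to 20 active sets in expectation.
p_large <- default_p_grid(100)
print(range(p_large))
print(range(p_large * 100))

# Few sets: the caps of 0.1 and 0.3 take over.
p_small <- default_p_grid(5)
print(range(p_small))
```

With 100 sets the caps are not reached, so the grid corresponds to between 1 and 20 expected active sets; with only 5 sets the limits of 0.1 and 0.3 apply instead.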

## Value

An `MgsaMcmcResults` object.

## References

Bauer S., Gagneur J. and Robinson P. GOing Bayesian: model-based gene set analysis of genome-scale data. Nucleic Acids Research (2010) http://nar.oxfordjournals.org/content/38/11/3523.full

## See Also

`MgsaResults`, `MgsaMcmcResults`

## Examples

```
## observing items A and B, with sets {A,B,C} and {B,C,D}
mgsa(c("A", "B"), list(set1 = LETTERS[1:3], set2 = LETTERS[2:4]))

## same case with integer representation of the items and logical observation
mgsa(c(TRUE,TRUE,FALSE,FALSE), list(set1 = 1:3, set2 = 2:4))

## a small example with gene ontology sets and plot
data(example)
fit = mgsa(example_o, example_go)
## Not run: 
plot(fit)

## End(Not run)
```