Description Usage Arguments Details Value Note Author(s) References See Also Examples
This function fits a non-homogeneous hidden Markov model to CGH data through bayesian methods and Reversible Jump Markov chain Montecarlo.
1 2 3 4 5 6 7 8 9 | RJaCGH(y, Chrom = NULL, Start=NULL, End=NULL, Pos = NULL,
Dist=NULL, probe.names=NULL, maxVar=NULL, model = "Genome",
var.equal=TRUE, max.dist=NULL, normal.reference=0,
window = NULL, burnin = 10000, TOT =10000,
k.max = 6, stat = NULL, mu.alfa = NULL, mu.beta = NULL,
s1=NULL, s2=NULL, init.mu=NULL, init.sigma.2=NULL, init.beta=NULL,
prob.k = NULL, jump.parameters=list(),
start.k = NULL, RJ=TRUE, NC = 1, deltaT = 2, delete_gzipped =
.__DELETE_GZIPPED, singleState = FALSE)
|
y |
Vector with Log Ratio observations. |
Chrom |
Vector with Chromosome indicator. |
Start |
Vector with start positions of the probes. |
End |
Vector with end positions of the probes. |
Pos |
Vector with Positions of every gene. They can be absolute
to the genome or relative to the chromosome. They should be ordered
within every chromosome. This is, the arrays must be ordered by their
positions in the genome. They must be integers. Positions can be
specified with |
Dist |
Optional vector of distances between genes. It should be a vector
of length length(y)-1. Note that when |
probe.names |
Character vector with the number of the probes. |
maxVar |
Maximum value for the variance of the states. If
|
model |
if |
var.equal |
Logical. If |
max.dist |
maximal distance between spots. When two spots have
a distance between them as far or further than |
normal.reference |
The value considered as the mean of the normal
state. See details. By default is |
window |
Multiplier of the standard deviation of the data to determine the width of the normal state. See details. Default (window = NULL) is 1. |
burnin |
Number of burn-in iterations in the Markov Chain |
TOT |
Number of iterations after the burn-in |
k.max |
Maximum number of hidden states to fit. |
stat |
Initial Distribution for the hidden states. Must be a
vector of size 1 + 2 + ... +k.max. If |
mu.alfa |
Hyperparameter. See details |
mu.beta |
Hyperparameter. See details |
prob.k |
Hyperparameter. See details |
jump.parameters |
List with the parameters for the MCMC jumps. See details. |
start.k |
Initial number of states. if |
RJ |
Logical. If |
s1 |
Standard deviation for the creation of a new mean in the
birth move (first attempt). If |
s2 |
Standard deviation for the creation of a new mean in the
birth move (second attempt). If |
init.mu |
Starting values for |
init.sigma.2 |
Starting values for |
init.beta |
Starting values for |
NC |
Number of coupled parallel chains. See details |
deltaT |
Heat parameter for tempering the parallel chains. See details |
delete_gzipped |
Should temporal files with viterbi paths be deleted? |
singleState |
Logical. If TRUE, each hidden state is classified
into a state of copy number alteration. If FALSE, a probability
is assigned to each hidden state of being gained and lost. See
|
RJaCGH fits the following bayesian model: There is a priori
distribution for the number of hidden states (different copy numbers)
as stated by prob.k
. If NULL
, a uniform distribution between 1
and k.max
is used.
The hidden states follow a normal distribution which mean (mu
)
follows
itself a normal distribution with mean
mu.alfa
and stdev mu.beta
. If NULL
, these are the
median of the data and the range. The square
root of the variance (sigma.2
)of the hidden states
follows a uniform distribution between $0$ and maxVar
.
The model for the transition matrix is based on a random matrix beta whose diagonal is zero. The transition matrix, Q, has the form: Q[i,j] = exp(-beta[i,j] + beta[i,j]*x) / sum(i,.) exp(-beta[i,.] + beta[i,.]*x
The prior distribution for beta is gamma with parameters 1, 1. The x are the distances between positions, normalized to lay between zero and 1 (x=diff(Pos) / max(diff(Pos)))
RJaCGH performs Markov Chain MonteCarlo with Reversible Jump to sample
for the posterior distribution.
NC
sets the number of chains from we will sample in
parallel. Each of them is tempered in order to escape from local
maximum. The temper parameter is deltaT
, and it can be
a value greater than $0$. each chain is tempered according to:
$ 1 / (1 + deltaT
* NC
) $
Every sweep is performed for all chains and has 3 steps plus another one common for all of them:
1.- A Metropolis-Hastings move is used to update, for a fixed number
of hidden states, mu
, sigma.2
and beta
. A
symmetric proposal with a normal distribution and standard deviation
sigma.tau.mu
, sigma.tau.sigma.2
and
sigma.tau.beta
is sampled.
2.- A transdimensional move is chosen, between birth (a new hidden
state is sampled from the prior) or death (an existing hidden state is
erased). Both moves are tried using delayed rejection. That is, if
the move is rejected, is given another try. The means for the new
state are drawn for the priors, but the standard deviation can be
set for the two stages with parameters s1
and s2
.
3.- Another transdimensional move is performed; an split move (divide
an existing state in two) or a combine move (join two adjacent
states). The length of the split is sampled from a normal distribution
with standard deviation tau.split.mu
for the mu
and
tau.split.beta
for beta
.
4.- If NC
is greater than 1, a swap move is tried to exchange
information from two of the coupled parallel chains.
jump.parameters
must be a list with the parameters for the
moves. It must have components sigma.tau.mu
,
sigma.tau.sigma.2
, sigma.tau.beta
These are vectors of
length k.max
. tau.split.mu
, tau.split.beta
are vectors of
length 1. If any of them is NULL, a call to the internal function
get.jump()
is made to find 'good' values.
A relabelling of hidden states is performed to match biological
states. See details in relabelStates
.
The initial values of the chain are drawn from an overdispersed
distribution. One can start the chain in a given point with
the parameters start.k
(model to start from $1, ...,
max.k
$) and the initial values init.mu
,
init.sigma.2
(vectors of dimension start.k
) and
init.beta
(matrix of positive values with start.k
)
rows and start.k
) columns. The diagonal must be zero.
The object returned follows a hierarchy:
If y is a matrix or data.frame (i.e., several arrays), an object of
class RJaCGH.array
is returned, with components:
[[]] |
A list with an object of corresponding class (see below) for every array. |
array.names |
Vector with the names of the arrays. |
If model
is "Genome", an object of class RJaCGH.Genome
is returned, with components:
[[]] |
a list with as many objects as k.max, with the fits. |
k |
sequence of number of hidden states sampled. |
prob.b |
Number of birth moves performed in first and second proposal.(Includes burn-in. |
prob.d |
Number of death moves performed in first and second proposal.(Includes burn-in. |
prob.s |
Number of split moves performed (Includes burn-in. |
prob.c |
Number of combine moves performed (Includes burn-in. |
prob.e |
Number of exchange (swap) moves performed between tempered chains and with cool (main) chain. |
y |
y vector. |
Pos |
Pos vector. |
model |
model. |
Chrom |
Chromosome vector. |
x |
x vector of distances between genes. |
viterbi |
A list with as many components as chromosomes. For each
chromosome, a list, with at least three components: gzipped\_sequence:
the compacted and gzipped sequence of states; num\_sequences: the
number of sequences in that compacted set of sequences;
sum\_mcmc\_iter: the number of times a viterbi sequence was
obtained. All these are only of use for calls to the pREC funcitons
( |
If model
is "Chrom", an object of class RJaCGH.Chrom
is
returned, with the following components:
[[]] |
a list with as
many components as chromosomes, of class |
Pos |
Pos vector. |
Start |
Start positions. |
End |
End positions. |
probe.names |
Names of the probes. |
model |
model. |
Chrom |
Chromosome vector. |
viterbi |
Identical structure as above. |
If no model
was
specified and no Chrom
was given, an object of class
RJaCGH
is returned, with components k
, prob.b
,
prob.d
, prob.s
, prob.c
, y
, Pos
,
x
, viterbi
(but as there are no chroms, viterbi has a
one component list —analogous to having a single chromosome), as
described before, plus a list with as many components of number of max
hidden states fitted. The length of k
equals aproximately $2$
times TOT
, because in every sweep of the algorithm there are
two tries to jump between models, so two times to explore the
probability of the number of hidden states. For every hidden markov
model fitted, a list is returned with components:
mu |
a matrix with the means sampled |
sigma.2 |
a matrix with the variances sampled |
beta |
an array of dimension 3 with beta values sampled |
stat |
vector of initial distribution |
loglik |
log likelihoods of every MCMC iteration |
prob.mu |
probability of aceptance of |
prob.sigma.2 |
probability of aceptance of |
prob.beta |
probability of aceptance of |
state.labels |
Labels of the biological states. |
prob.states |
Marginal posterior probabilities of belonging to every hidden state. |
The number of rows of components mu
, sigma.2
and
beta
is random, because it depends on the number of times
a particular model is visited and on the number of moves between
models, because when we visit a new model we also explore the space
of its means, variances and parameters of its transition functions.
The data must be ordered by chromosome and within chromosome by position.
Oscar M. Rueda and Ramon Diaz Uriarte
Rueda OM, Diaz-Uriarte R. Flexible and Accurate Detection of Genomic Copy-Number Changes from aCGH. PLoS Comput Biol. 2007;3(6):e122
Cappe, Moulines and Ryden, 2005. Inference in Hidden Markov Models. Springer.
Green, P.J. (1995) Reversible Jump Markov Chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711-732.
Green, P.J. and Antonietta, M. (2001) Delayed Rejection in Reversible Jump Metropolis Hastings. Biometrika, 88 (4), 1035-1053.
Geyer, C. J. (1991). Markov Chain Monte Carlo Maximum Likelihood. Proceedings of the 23th Symposium on the Interface, 156-163.
summary.RJaCGH
,
states
, modelAveraging
,
plot.RJaCGH
, trace.plot
,
relabelStates
, pREC_A
,
pREC_S
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | y <- c(rnorm(100, 0, 1), rnorm(10, -3, 1), rnorm(20, 3, 1),
rnorm(100,0, 1))
Pos <- sample(x=1:500, size=230, replace=TRUE)
Pos <- cumsum(Pos)
Chrom <- rep(1:23, rep(10, 23))
jp <- list(sigma.tau.mu=rep(0.05, 4), sigma.tau.sigma.2=rep(0.03, 4),
sigma.tau.beta=rep(0.07, 4), tau.split.mu=0.1, tau.split.beta=0.1)
fit.chrom <- RJaCGH(y=y, Pos=Pos, Chrom=Chrom, model="Chrom",
burnin=10, TOT=1000, k.max = 4,
jump.parameters=jp)
##RJ results for chromosome 5
table(fit.chrom[["array1"]][[5]]$k)
fit.genome <- RJaCGH(y=y, Pos=Pos, Chrom=Chrom, model="Genome",
burnin=100, TOT=1000, jump.parameters=jp, k.max = 4)
## Results for the model with 3 states:
summary(fit.genome)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.