supcluster | R Documentation |
We assume that each individual has a set of features and an outcome. We further assume that the features are organized in clusters, with a random effect for each cluster, and that the outcome is related to the random effects by a linear regression. The function supcluster performs an MCMC to determine the parameters of this model, including the cluster membership of each feature. The program can also perform the estimation without considering the outcome. The outcome can be any data object, as long as it is related to the individual through a frailty.
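The model described above can be illustrated by simulating data from it. The following is a minimal sketch in base R, not code from the package: it assumes Gaussian random effects drawn once per individual and cluster, Gaussian noise, and arbitrary sizes and coefficients. The names `membership`, `frailty`, `beta` and `sim_data` are hypothetical and chosen only for this illustration.

```r
# Hypothetical simulation of the clustered-feature model (not package code).
set.seed(1)
n_ind      <- 50   # number of individuals (assumed)
n_feat     <- 12   # number of features (assumed)
n_clusters <- 3    # number of clusters (assumed)

# Assumed cluster membership of each feature; this is the quantity
# that supcluster is described as estimating.
membership <- rep(seq_len(n_clusters), length.out = n_feat)

# One random effect (frailty) per individual and cluster.
frailty <- matrix(rnorm(n_ind * n_clusters), nrow = n_ind, ncol = n_clusters)

# Each feature equals the random effect of its cluster plus independent noise.
features <- frailty[, membership] +
  matrix(rnorm(n_ind * n_feat, sd = 0.5), nrow = n_ind, ncol = n_feat)
colnames(features) <- paste0("f", seq_len(n_feat))

# The outcome is a linear regression on the cluster random effects.
beta    <- c(1, -0.5, 0)   # assumed regression coefficients
outcome <- as.vector(frailty %*% beta) + rnorm(n_ind, sd = 0.5)

# Data frame with the features in columns 1:12 and the outcome last.
sim_data <- data.frame(features, outcome = outcome)
```

In this sketch the third cluster has a zero coefficient, so its features carry no information about the outcome. The simulated features can be negative, so a call on data of this kind would presumably need log.transform=FALSE.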
supcluster(data,outcome,features,log.transform=TRUE,maxclusters=10, nstart=100,n=500,shape=1,scale=1,alpha=1,betaP=1,fixj="random", fbeta=FALSE,starting.value=NULL,nchains=1,linkLikelihood = NULL)
data |
A data frame of the input data |
outcome |
Either the variable number or the variable name of the outcome variable. If |
features |
A list of features, given either as variable names or as column numbers; the two cannot be mixed |
log.transform |
Log transform the feature data. Generally used when the features are gene expressions |
maxclusters |
The maximum number of clusters used |
nstart |
The first nstart-1 values of each MCMC chain are not reported; they are used as a “burn in”. |
n |
The number of MCMC iterations for each chain |
shape |
The shape parameter for the prior on the variance components |
scale |
The starting scale parameter for the prior on the variance components |
alpha |
The value to use for the Dirichlet prior parameter |
betaP |
The prior precision of the regression parameters. |
fixj |
If |
fbeta |
If TRUE then the outcome is not used in the clustering algorithm |
starting.value |
Starting value for the MCMC. It should be left as NULL when multiple chains are run, in which case the starting cluster membership is determined by |
nchains |
Number of chains to run |
linkLikelihood |
Likelihood function for the model linking actual outcome data to the per-patient frailty. The input of the function is a vector of length |
A compound list is returned. At the first level is the chain number. At the second level there are two elements
inp |
This has two values |
parms |
This is a |
When the feature space is large this program runs slowly. In the example only 20 iterations were used for the burn in and only 80 iterations were run. In general this would not be adequate to fully explore the feature space.
David A. Schoenfeld, Jessie Hsu
Hsu, Jessie J., Dianne M. Finkelstein, and David A. Schoenfeld. "Outcome-driven cluster analysis with application to microarray data." PloS one 10.11 (2015): e0141874.
concordmap, compare.chains, beta.by.gene
##---- Should be DIRECTLY executable !! ----
##-- ==> Define data, use random,
##-- or do help(data=index) for the standard data sets
##--Note you need to change nstart and n in example to get enough iterations

#run supcluster on trauma data. Note: nstart and n must be increased to,say, 2000,3000
#and maxclusters increased to 20
data("trauma_data")
us=supcluster(trauma_data,outcome="outcome",features=1:87,
              maxclusters=5,nstart=5,n=20,fbeta=FALSE)

#creates plot in paper
usm=concordmap(us,chains=1,sort.genes=TRUE)
image(1:87,1:87,usm$map,xlab='Genes',ylab='Genes',
      main="Trauma Data Example",
      col=gray(16:1 / 16))

#Associate genes with clusters
data("gene_names")
betas=colSums(us[[1]]$parms[,3:22])
outpt=data.frame(cluster.number=usm$clusters,beta=betas[usm$clusters],gene_names[usm$order,])