BANFF2: Bayesian nonparametric feature selection over large-scale...
In BANFF: Bayesian Network Feature Finder


Main function. It alternates two steps: given the density specification, update the selection indicator z by Swendsen-Wang; given the selection indicator z, update the density specification by DPM fitting.

BANFF2(net,test.stat,pvalue.stat=FALSE,candidate.z.set=c(-1,0,1),
seed.main=1024,na.action=c("NN","Bayes","na.remove"),niter.densupd=5,niter=10,
paras=list(tau=c(2,10,2),alpha=NULL,gamma=NULL,xi=NULL, beta=rep(10,3),
rho=c(1.003,0.479,0.988,0.000),pivec=c(0.15,0.7,0.15),densAcc=0.001,
null.quantile=c(0.25, 0.75),null.method="biGaussianModeSplit",
transitionMatrix.Z.11=0.6,miss.stat=2,min.node=5),
para.DPM=NULL,para.HODC=NULL,para.DMH=NULL)
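
Under standard R argument semantics, supplying `paras` replaces the whole default list shown in the usage. Whether BANFF2 fills in missing elements internally is not stated here, so the following minimal sketch passes a complete list and changes only two entries with base R's `modifyList()`. The names `default.paras` and `my.paras` are illustrative and not part of the package; the default values are copied from the usage above.

```r
# Defaults copied from the usage above (illustrative object, not exported by BANFF).
default.paras <- list(
  tau = c(2, 10, 2), alpha = NULL, gamma = NULL, xi = NULL,
  beta = rep(10, 3),
  rho = c(1.003, 0.479, 0.988, 0.000),
  pivec = c(0.15, 0.7, 0.15),
  densAcc = 0.001,
  null.quantile = c(0.25, 0.75),
  null.method = "biGaussianModeSplit",
  transitionMatrix.Z.11 = 0.6, miss.stat = 2, min.node = 5
)

# Override two elements; every other element keeps its default value.
my.paras <- modifyList(default.paras,
                       list(pivec = c(0.1, 0.8, 0.1), min.node = 10))

# The call itself is commented out because it needs the BANFF package and data:
# res <- BANFF2(net, test.stat, paras = my.paras)
```

Elements that are NULL in `default.paras` (alpha, gamma, xi) stay in the list, because `modifyList()` only touches the names present in its second argument.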

`net`	A 0/1 adjacency matrix indicating whether each pair of nodes is directly connected.
`test.stat`	The observed test statistics. Missing values are represented as NAs. If they are p-values, pvalue.stat should be TRUE.
`pvalue.stat`	Logical. Whether test.stat is given as p-values. Default FALSE.
`candidate.z.set`	Candidate values of the selection indicator z. The default, c(-1,0,1), covers three regulation types: -1=down-regulated, 0=not differentially expressed, 1=up-regulated.
`seed.main`	Set seed before iteration for generating reproducible results. Default=1024.
`na.action`	The method used to impute missing values. Can be "NN", "Bayes", or "na.remove".
`niter.densupd`	The total number of iterations for updating density. Default=5.
`niter`	The total number of iterations for study. Default=10.
`paras`	A list of hyper-parameters and other parameters used in preparation. Elements:
  niter.densupd: the density specification is updated by DPM from iteration 1 up to this maximum step. Default=20.
  tau: a three-element vector. Default=c(2,10,2).
  alpha: a three-element vector. Default=NULL.
  gamma: a three-element vector. Default=NULL.
  xi: a three-element vector. Default=NULL.
  beta: a three-element vector. Default=rep(10,3).
  rho: a four-element vector indicating local smoothness for the Potts prior. Default=c(1.003,0.479,0.988,0.000). Note: the default value was calculated by DMH based on the structure of data(net).
  pivec: a three-element vector containing global prior knowledge about the selection indicator z. Default=c(0.15,0.7,0.15).
  densAcc: a number giving the precision of the K-L integration when the numerical approximation is used. Default=0.001.
  null.quantile: a two-element vector giving the lower and upper quantiles used to calculate the prior null density when it is not provided by biologists. Default=c(0.25, 0.75).
  null.method: a character string naming the method used to estimate the null density. "biGaussian": EM algorithm for a mixture of two univariate normals; "biGaussianWith0Cutoff": assumes all negative test statistics form one normal and all positive test statistics form the other, with mixing proportions proportional to the number of observations in each class; "biGaussianMean0": the null is formed by two half normals; "biGaussianModeSplit": splits the data at the median, then flips each part to the other side to estimate a normal distribution.
  transitionMatrix.Z.11: the [1,1] element of the transition matrix for z. Default=0.6.
  miss.stat: used to impute NAs in test.stat when the Double Metropolis-Hastings sampler (DMH) is applied to find the hyperparameters rho and pi.
  min.node: the minimum number of nodes in each group.
`para.DPM`	A list; if NULL, the default values are used. Elements:
  niter: default=10
  nsample: default=10
  KLrange: default=c(-6,6); usually a range wider than c(floor(min(test.stat,na.rm=TRUE)),ceiling(max(test.stat,na.rm=TRUE))) is considered
  KLprecision: default=0.001
  KLNullmethod: default="biGaussianMean0"
  mcmc: a list, default=list(nburn=10000,nsave=100,nskip=0,ndisplay=10000)
  prior: a list, default=list(alpha=3,m1=rep(0,1),psiinv1=diag(0.5,1),nu1=4,tau1=1,tau2=100)
`para.HODC`	A list; if NULL, the default values are used. Elements:
  nsample: default=10
  KLrange: default=c(-6,6); usually a range wider than c(floor(min(test.stat,na.rm=TRUE)),ceiling(max(test.stat,na.rm=TRUE))) is considered
  KLprecision: default=0.001
  KLNullmethod: default="biGaussianMean0"
  mcmc: a list, default=list(nburn=1000,nsave=100,nskip=0,ndisplay=1000)
  prior: a list in which each element specifies the prior used when fitting the density for one class label z. The default parameters are the same for every class: alpha=3,m2=rep(0,1),s2=diag(100000,1),psiinv2=diag(temp.sdlist[1],1),nu1=4,nu2=4,tau1=1,tau2=100
`para.DMH`	If rho and pivec are not given, DMH is used to pre-calculate them. The default is a list with elements:
  niter: default=1000
  pistat: default=c(0.25,0.5,0.25)
  pisd: default=rep(0.03,3)
  rhostat: default=c(1,0.5,1,0)
  rhosd: default=rep(0.03,4)
  rhoLowB: default=c(0,0,0,0)
  rhoUpB: default=c(1.5,1.5,1.5,1.5)
  piLowB: default=c(0,0,0)
  piUpB: default=c(1,1,1)
  niter.z: default=1
  replaceInf: default=-99999
  DMHplot: default=FALSE
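
To make the expected inputs concrete, the sketch below builds a small symmetric 0/1 adjacency matrix from an edge list and a matching vector of statistics with one missing value, using base R only. The edge list, the object names (`edges`, `net.toy`, `stat.toy`, `p.toy`), and the choice of 1 for "connected" are assumptions made for illustration. The last line mirrors the `pnorm()` conversion that the package example uses together with `pvalue.stat=TRUE`.

```r
# Hypothetical undirected edge list over 5 nodes (illustrative data).
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(1, 5))
n <- 5

net.toy <- matrix(0, n, n)
net.toy[edges] <- 1            # mark edge i -> j
net.toy[edges[, 2:1]] <- 1     # mark edge j -> i so the matrix is symmetric

set.seed(1024)
stat.toy <- rnorm(n)           # one test statistic per node
stat.toy[2] <- NA              # missing values are coded as NA

p.toy <- pnorm(stat.toy)       # p-value scale; the NA is carried through
```

Indexing a matrix with a two-column matrix of (row, column) pairs sets exactly those cells, which keeps the construction free of explicit loops.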

The fully Bayesian updating algorithm proceeds as follows:

Input data r and graph G=<V,E>
Update z|theta via Swendsen-Wang
Update theta|z via DPM Fitting
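
The alternation between the two updates can be illustrated with a deliberately simplified toy sampler. This is not the package's implementation: single-site Gibbs updates under a Potts-type prior with one smoothness parameter stand in for Swendsen-Wang, and a plug-in Gaussian refit per class stands in for DPM fitting. All data, names, and parameter values below are assumptions chosen for illustration.

```r
set.seed(1024)

# Toy data: a chain graph with 30 nodes and three blocks of true labels.
n <- 30
idx <- cbind(1:(n - 1), 2:n)
net.toy <- matrix(0, n, n)
net.toy[idx] <- 1
net.toy[idx[, 2:1]] <- 1
z.true <- rep(c(-1, 0, 1), each = 10)
r <- rnorm(n, mean = 3 * z.true, sd = 1)

labels <- c(-1, 0, 1)
rho <- 0.5                                          # single smoothness value (assumption)
theta <- list(mean = c(-2, 0, 2), sd = c(1, 1, 1))  # initial class densities
# Start each node at the class whose mean is closest to its statistic.
z <- labels[apply(abs(outer(r, theta$mean, "-")), 1, which.min)]

n.iter <- 50
z.track <- matrix(NA, n.iter, n)
for (it in seq_len(n.iter)) {
  # Step 1: update z | theta, one node at a time (Gibbs stand-in for Swendsen-Wang).
  for (i in seq_len(n)) {
    nb <- which(net.toy[i, ] == 1)
    logp <- sapply(seq_along(labels), function(k) {
      dnorm(r[i], theta$mean[k], theta$sd[k], log = TRUE) +
        rho * sum(z[nb] == labels[k])               # reward agreeing with neighbours
    })
    z[i] <- sample(labels, 1, prob = exp(logp - max(logp)))
  }
  # Step 2: update theta | z by refitting a Gaussian per class (stand-in for DPM).
  for (k in seq_along(labels)) {
    rk <- r[z == labels[k]]
    if (length(rk) >= 2) {
      theta$mean[k] <- mean(rk)
      theta$sd[k] <- max(sd(rk), 0.1)               # floor avoids a degenerate density
    }
  }
  z.track[it, ] <- z                                # one row of the trace per iteration
}
```

The sketch only conveys the control flow: each sweep first resamples labels given the current densities, then refreshes the densities given the labels, and records the labels in a trace.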

A list:

`initialValue`	initial parameter list
`zTrack`	trace for z
`FinalValue`	final parameter list
`iters`	total iterations
`rmisTrack`	(only if test.stat contains NAs) trace of the imputed test statistics, recorded only for the entries that were NA.
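
A trace such as `zTrack` is typically reduced to one label per node. The sketch below shows a per-node majority vote after discarding burn-in, in the spirit of the `method="MajorVote"` option used in the package example. It assumes the trace is a matrix with one row per iteration and one column per node, which is not confirmed here, and `major_vote` is a hypothetical helper rather than a package function.

```r
# Hypothetical helper: most frequent label per node after dropping burn-in rows.
major_vote <- function(z.track, nburn = 0, labels = c(-1, 0, 1)) {
  kept <- z.track[seq_len(nrow(z.track)) > nburn, , drop = FALSE]
  apply(kept, 2, function(col) {
    counts <- table(factor(col, levels = labels))
    labels[which.max(counts)]   # ties go to the first label in `labels`
  })
}

# Illustrative trace: 6 iterations (rows) for 3 nodes (columns).
z.track <- rbind(
  c( 0,  0,  0),
  c( 0,  1,  0),
  c(-1,  1,  0),
  c(-1,  1,  0),
  c(-1,  1,  1),
  c(-1,  0,  1)
)
est <- major_vote(z.track, nburn = 2)
print(est)
```

Tabulating against a fixed set of levels keeps the vote well defined even when a label never appears for a node; how the package itself breaks ties is not specified here.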

## Not run: 
## The simulation settings based on real gene network (takes time)
data("net")
data("test.stat")
## Three-class run on the raw test statistics
res <- BANFF2(net, test.stat, niter=300, na.action="NN")
## Two-class run on p-values; the paras vectors are shortened to match candidate.z.set=c(0,1)
res <- BANFF2(net, pnorm(test.stat), pvalue.stat=TRUE, candidate.z.set=c(0,1),
              na.action="NN", niter=300,
              paras=list(tau=c(2,10), alpha=NULL, gamma=NULL, xi=NULL, beta=rep(10,2),
                         rho=c(1,0.5,0), pivec=c(0.2,0.8), densAcc=0.001,
                         null.quantile=c(0.25, 1), null.method="biGaussianModeSplit",
                         transitionMatrix.Z.11=0.6, miss.stat=2, min.node=5))

## A toy example
simdata <- SimulatedDataGenerator(nnode=100, missing=TRUE, missrate=0.1, dist="norm",
                                  plot=TRUE, nbin=c(20,20,10), rng=1024)
res <- BANFF2(net=simdata$net, test.stat=simdata$testcov, niter=100, na.action="NN")
## Summarize the z trace by majority vote, with nburn=10 as burn-in
classLabelEst <- SummaryClassLabel(simdata$net, simdata$testcov, res$zTrack,
                                   method="MajorVote", nburn=10)
print(table(classLabelEst))

## End(Not run)