BANFF2: Bayesian nonparametric feature selection over large-scale...


View source: R/bayeslabel.R

Description

Main function. It alternates two steps: given the density specification, update the selection indicator z by Swendsen-Wang; given the selection indicator z, update the density specification by DPM fitting.

Usage

BANFF2(net,test.stat,pvalue.stat=FALSE,candidate.z.set=c(-1,0,1),
seed.main=1024,na.action=c("NN","Bayes","na.remove"),niter.densupd=5,niter=10,
paras=list(tau=c(2,10,2),alpha=NULL,gamma=NULL,xi=NULL, beta=rep(10,3),
rho=c(1.003,0.479,0.988,0.000),pivec=c(0.15,0.7,0.15),densAcc=0.001,
null.quantile=c(0.25, 0.75),null.method="biGaussianModeSplit",
transitionMatrix.Z.11=0.6,miss.stat=2,min.node=5),
para.DPM=NULL,para.HODC=NULL,para.DMH=NULL)

Arguments

net

The adjacency matrix, with 0/1 entries indicating whether two nodes are directly connected.
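As a minimal sketch of this kind of input, a symmetric 0/1 adjacency matrix can be assembled from an edge list in base R. The node count and edge list are made up for illustration, and the zero diagonal is an assumption; the text above does not say how self-loops should be coded.

```r
# Hypothetical 5-node undirected network given as an edge list
edges <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(1, 5))
n.node <- 5

# Start from an all-zero matrix and mark each edge in both directions
net.toy <- matrix(0, nrow = n.node, ncol = n.node)
net.toy[edges] <- 1
net.toy[edges[, 2:1]] <- 1

# Sanity checks: symmetric, binary, zero diagonal (the last is an assumption)
stopifnot(isSymmetric(net.toy),
          all(net.toy %in% c(0, 1)),
          all(diag(net.toy) == 0))
```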

test.stat

The observed test statistics. Missing values are represented as NAs. If they are p-values, pvalue.stat should be TRUE.

pvalue.stat

Logical. Whether test.stat is given as p-values or not. Default=FALSE.
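The Examples section passes pnorm(test.stat) together with pvalue.stat=TRUE. The sketch below shows only that conversion on made-up z-type statistics in base R; how BANFF2 handles p-values internally is not described here, so nothing about that is implied.

```r
set.seed(1)
z.stat <- rnorm(10)   # made-up z-type test statistics
z.stat[3] <- NA       # missing values stay NA through the conversion

# Lower-tail probabilities, as used in the Examples section
p.stat <- pnorm(z.stat)
stopifnot(all(p.stat >= 0 & p.stat <= 1, na.rm = TRUE))
stopifnot(is.na(p.stat[3]))

# qnorm() inverts pnorm(), so the original scale can be recovered
z.back <- qnorm(p.stat)
stopifnot(isTRUE(all.equal(z.back, z.stat)))
```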

candidate.z.set

The set of candidate class labels. The default, c(-1,0,1), covers three regulation types: -1=down-regulated, 0=not differentially expressed, 1=up-regulated.

seed.main

Set seed before iteration for generating reproducible results. Default=1024.

na.action

The method used to impute missing values. Can be "NN", "Bayes", or "na.remove".
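To make the choice concrete, here is a rough base R sketch of two ways to handle a missing statistic on a toy network: filling it from observed neighbours, and removing the node. The helper impute_by_neighbours is hypothetical, and reading "NN" as a neighbour mean is an assumption made for illustration; it is not the package's implementation.

```r
# Toy chain network 1-2-3-4 with one missing statistic
net.toy <- matrix(0, 4, 4)
net.toy[cbind(1:3, 2:4)] <- 1
net.toy[cbind(2:4, 1:3)] <- 1
stat.toy <- c(-2.1, NA, 0.3, 1.8)

# Hypothetical helper: fill each NA with the mean of its observed neighbours
impute_by_neighbours <- function(net, stat) {
  filled <- stat
  for (i in which(is.na(stat))) {
    nb <- which(net[i, ] == 1)
    nb.obs <- stat[nb][!is.na(stat[nb])]
    if (length(nb.obs) > 0) filled[i] <- mean(nb.obs)
  }
  filled
}
stat.nn <- impute_by_neighbours(net.toy, stat.toy)

# Removal analogue: drop nodes with missing statistics from both inputs
keep <- !is.na(stat.toy)
net.rm <- net.toy[keep, keep, drop = FALSE]
stat.rm <- stat.toy[keep]
```

Removing a node also removes its edges, so in this toy chain node 1 ends up isolated; imputation keeps the network intact.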

niter.densupd

The total number of iterations for updating density. Default=5

niter

The total number of iterations for study. Default=10.

paras

A list containing hyperparameters and other parameters used in preparation.

  • niter.densupd The density specification is updated by DPM fitting during iterations 1 through niter.densupd. Default=20.

  • tau A three-element vector, default=c(2,10,2);

  • alpha A three-element vector. Default=NULL.

  • gamma A three-element vector. Default=NULL.

  • xi A three-element vector. Default=NULL.

  • beta A three-element vector. Default=rep(10,3).

  • rho A four-element vector indicating local smoothness for the Potts prior. Default=c(1.003,0.479,0.988,0.000). Note: the default value is calculated by DMH based on the structure of data(net).

  • pivec A three-element vector. Default=c(0.15,0.7,0.15). Contains prior knowledge globally about selection indicator z.

  • densAcc A number giving the precision of the K-L integration when the numerical approximation is used. Default=0.001.

  • null.quantile A two-element vector giving the lower and upper quantiles used to calculate the prior null density when it is not supplied by biologists. Default=c(0.25, 0.75).

  • null.method A character string giving the method used to estimate the null density. "biGaussian": EM algorithm for a mixture of two univariate normals. "biGaussianWith0Cutoff": assumes all negative test statistics form one normal and all positive test statistics form the other, with the mixing proportions proportional to the number of observations in each class. "biGaussianMean0": the null is formed by two half normals. "biGaussianModeSplit": splits the data at the median value, then flips each part to the other side to estimate a normal distribution.

  • transitionMatrix.Z.11 [1,1] element in transition matrix for z. Default=0.6.

  • miss.stat Used to impute NAs in test.stat when applying the Double Metropolis-Hastings sampler (DMH) to find the hyperparameters rho and pi.

  • min.node The minimum number of nodes in each group.
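One way to change a few of these entries without retyping the rest is to start from the defaults shown in Usage and override by name. This is a sketch in base R using modifyList; the override values are arbitrary, and the check that pivec sums to 1 is an assumption suggested by the default rather than a documented requirement. Whether BANFF2 accepts a partial paras list is not stated, so the sketch keeps the list complete.

```r
# Defaults copied from the Usage section
paras.default <- list(
  tau = c(2, 10, 2), alpha = NULL, gamma = NULL, xi = NULL, beta = rep(10, 3),
  rho = c(1.003, 0.479, 0.988, 0.000), pivec = c(0.15, 0.7, 0.15),
  densAcc = 0.001, null.quantile = c(0.25, 0.75),
  null.method = "biGaussianModeSplit",
  transitionMatrix.Z.11 = 0.6, miss.stat = 2, min.node = 5
)

# Override selected entries; entries not named here keep their defaults
paras.custom <- modifyList(paras.default,
                           list(pivec = c(0.1, 0.8, 0.1), min.node = 10))

# Assumed sanity check: prior class probabilities sum to one
stopifnot(isTRUE(all.equal(sum(paras.custom$pivec), 1)))
stopifnot(setequal(names(paras.custom), names(paras.default)))
```

The resulting list would then be passed as paras=paras.custom.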

para.DPM

A list object; if NULL, default values are used. It contains:

  • niter default=10

  • nsample default=10

  • KLrange default=c(-6,6); usually a range wider than c(floor(min(test.stat,na.rm=TRUE)),ceiling(max(test.stat,na.rm=TRUE))) is used

  • KLprecision default=0.001

  • KLNullmethod default="biGaussianMean0"

  • mcmc a list, default=list(nburn=10000,nsave=100,nskip=0,ndisplay=10000)

  • prior a list, default=list(alpha=3,m1=rep(0,1),psiinv1=diag(0.5,1),nu1=4,tau1=1,tau2=100)
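The KLrange guidance above can be checked against data before fitting. The following base R sketch computes the integer range covering some made-up statistics and widens it; the one-unit margin is an arbitrary choice for illustration, not a package recommendation.

```r
set.seed(1024)
stat.toy <- c(rnorm(50, sd = 2), NA)   # made-up statistics with one NA

# Tightest integer range covering the observed statistics
data.range <- c(floor(min(stat.toy, na.rm = TRUE)),
                ceiling(max(stat.toy, na.rm = TRUE)))

# Widen by one unit on each side, and never go narrower than the default c(-6,6)
KLrange <- c(min(-6, data.range[1] - 1), max(6, data.range[2] + 1))
```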

para.HODC

A list object; if NULL, default values are used. It contains:

  • nsample default=10

  • KLrange default=c(-6,6); usually a range wider than c(floor(min(test.stat,na.rm=TRUE)),ceiling(max(test.stat,na.rm=TRUE))) is used

  • KLprecision default=0.001

  • KLNullmethod default="biGaussianMean0"

  • mcmc a list, default=list(nburn=1000,nsave=100,nskip=0,ndisplay=1000)

  • prior a list; the default is a list in which each element specifies the prior used when fitting the density for one class label z. The default parameters are the same for every class, a list containing: alpha=3,m2=rep(0,1),s2=diag(100000,1),psiinv2=diag(temp.sdlist[1],1),nu1=4,nu2=4,tau1=1,tau2=100

para.DMH

If rho and pivec are not given, DMH is used to pre-calculate them. The default is a list object containing:

  • niter default=1000

  • pistat default=c(0.25,0.5,0.25)

  • pisd default=rep(0.03,3)

  • rhostat default=c(1,0.5,1,0)

  • rhosd default=rep(0.03,4)

  • rhoLowB default=c(0,0,0,0)

  • rhoUpB default=c(1.5,1.5,1.5,1.5)

  • piLowB default=c(0,0,0)

  • piUpB default=c(1,1,1)

  • niter.z default=1

  • replaceInf default=-99999

  • DMHplot default=FALSE

Details

The fully Bayesian updating algorithm is executed as below:

Value

A list:

initialValue

initial parameter list

zTrack

trace for z

FinalValue

final parameter list

iters

total iterations

rmisTrack

(if test.stat contains NAs) trace of the imputed test statistics (only for the entries that were NA)
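The Examples section summarises zTrack with SummaryClassLabel(..., method="MajorVote", nburn=10). As a rough illustration of what a majority vote over a label trace involves, here is a base R sketch on a made-up trace. The layout (iterations in rows, nodes in columns) and the helper majority_label are assumptions; the actual structure of zTrack is not documented above.

```r
# Made-up trace: 6 iterations (rows) by 4 nodes (columns), labels in {-1, 0, 1}
z.trace <- rbind(
  c(0, 1, -1, 0),
  c(0, 0, -1, 0),
  c(1, 1, -1, 0),
  c(0, 1, -1, 1),
  c(0, 1,  0, 0),
  c(0, 1, -1, 0)
)
n.burn <- 2   # discard the first two iterations as burn-in

# Hypothetical helper: most frequent label (ties go to the first entry of labels)
majority_label <- function(z, labels = c(-1, 0, 1)) {
  labels[which.max(tabulate(match(z, labels), nbins = length(labels)))]
}

kept <- z.trace[seq_len(nrow(z.trace)) > n.burn, , drop = FALSE]
label.est <- apply(kept, 2, majority_label)
print(table(factor(label.est, levels = c(-1, 0, 1))))
```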

Examples

## Not run: 
## The simulation settings based on real gene network (takes time)
data("net")
data("test.stat")
res=BANFF2(net,test.stat,niter=300,na.action="NN")
res=BANFF2(net,pnorm(test.stat),pvalue.stat=TRUE,candidate.z.set=c(0,1),na.action="NN",
niter=300,
paras=list(tau=c(2,10),alpha=NULL,gamma=NULL,xi=NULL, beta=rep(10,2),rho=c(1,0.5,0),
pivec=c(0.2,0.8),densAcc=0.001,null.quantile=c(0.25, 1),
null.method="biGaussianModeSplit",transitionMatrix.Z.11=0.6,miss.stat=2,min.node=5))

## A toy example
simdata=SimulatedDataGenerator(nnode=100,missing=TRUE,missrate=0.1,dist="norm",
plot=TRUE,nbin=c(20,20,10),rng=1024)
res=BANFF2(net=simdata$net,test.stat=simdata$testcov,niter=100,na.action="NN")
classLabelEst=SummaryClassLabel(simdata$net,simdata$testcov,res$zTrack,
method="MajorVote",nburn=10)
print(table(classLabelEst))

## End(Not run)

BANFF documentation built on May 29, 2017, 11:59 a.m.
