CSclone: Clustering model which contains the CNAs mutation and SNVs...
In CSclone: Bayesian Nonparametric Modeling in R


View source: R/CSclone.R

CSclone is the main function of the package. It fits a clustering model that covers both copy number alteration (CNA) mutations and single nucleotide variant (SNV) mutations.

CSclone(somatic.id, DNAcopy.object, mcmc = list(nburn = 5000, nsave = 10000,
  nskip = 1, ndisplay = 1000), y, set = 100, alpha = 1, max.k = NULL,
  method = "ward.D2", prior = c(1, 1))

`somatic.id`	is a vector of s observations marking which loci have a somatic mutation.
`DNAcopy.object`	is a list, the output of the DNAcopy package.
`mcmc`	is a list giving the MCMC parameters. The list must include the following integers: nburn, the number of burn-in scans; nskip, the thinning interval; nsave, the total number of scans to be saved; and ndisplay, the number of saved scans between progress reports (the function reports on screen each time ndisplay iterations have been carried out). The default is list(nburn=5000,nsave=10000,nskip=1,ndisplay=1000).
`y`	is a matrix giving the binomial data. The first column is the B allele read depth and the second column is the total read depth.
`set`	is a number used to make the simulation result reproducible. The default is 100.
`alpha`	is a number giving the concentration parameter of the Dirichlet process. The default is 1.
`max.k`	is a number limiting the maximum number of clusters. The default is NULL.
`method`	is the agglomeration method to be used; it is applied to the Hamming distance results. The default is "ward.D2".
`prior`	is a vector of two numbers giving the parameters of the Beta distribution. The default is c(1, 1).
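
The `method` values follow the agglomeration names accepted by `stats::hclust`. The sketch below illustrates, under assumptions, how a Hamming distance between cluster labelings could be passed to hierarchical clustering. It is not taken from the package source: the `labels` matrix, its layout (loci in rows, saved scans in columns), and the choice of three groups are all hypothetical.

```r
# Hypothetical saved labelings: rows are loci, columns are saved MCMC scans.
# The values are invented for illustration only.
labels <- rbind(
  c(1, 1, 1, 1),
  c(1, 1, 1, 2),
  c(2, 2, 2, 2),
  c(2, 2, 3, 2),
  c(3, 3, 3, 3)
)

# Hamming distance between two loci: the number of scans in which
# their labels differ.
n <- nrow(labels)
hamming <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    hamming[i, j] <- sum(labels[i, ] != labels[j, ])
  }
}

# Agglomerate with the method name that CSclone uses by default,
# then cut the tree into an assumed number of groups.
tree <- hclust(as.dist(hamming), method = "ward.D2")
groups <- cutree(tree, k = 3)
```

Loci whose labels rarely differ across scans end up in the same group, so here loci 1 and 2 share a group, as do loci 3 and 4.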

CSclone is a two-step model. The first step clusters the non-CNA somatic mutations with a Dirichlet process mixture model (DPMM) using a Binomial distribution; the posterior distribution is a Beta distribution. The second step classifies the CNA mutations and the somatic mutations that occur together with CNA mutations. The example uses a small dataset so that it runs quickly.
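
The Beta posterior in the first step follows from standard Beta-Binomial conjugacy. As a minimal sketch for a single locus, assuming the `prior` vector holds the two Beta shape parameters (the read counts below are hypothetical, and this is not the package's internal code):

```r
# Beta(a, b) prior on the B allele proportion; c(1, 1) is the CSclone default.
prior <- c(1, 1)

# Hypothetical read counts for one locus: 30 B allele reads out of 100 total.
b_reads <- 30
total_reads <- 100

# Conjugate update: a Binomial likelihood with a Beta prior
# gives a Beta posterior with these shape parameters.
post_a <- prior[1] + b_reads
post_b <- prior[2] + (total_reads - b_reads)

# Posterior mean of the proportion.
post_mean <- post_a / (post_a + post_b)
```

With the default uniform prior, the posterior mean stays close to the observed fraction of B allele reads.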

cluster is a vector giving the group assigned to each somatic mutation.

segment is a matrix. The first column is the chromosome, the second column is the starting position, the third column is the end position, the fourth column is the copy number, the fifth column is the proportion of mutation, the sixth column is the number of loci, the seventh column is the starting locus number, and the eighth column is the end locus number.

mcmc is a list giving the MCMC parameters.

alpha is a number giving the concentration parameter of the Dirichlet process.

y is a matrix giving the binomial data.

group.prop is a number giving the predicted proportion of each group.

SNV is a matrix giving the somatic mutation at every locus. The first column is the B allele read depth, the second column is the total read depth, and the third column is the proportion of somatic mutation.
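
The sketch below shows one way to read these components. It uses a mock list with the documented component names rather than real CSclone output; every value and the column names assigned to `segment` are hypothetical.

```r
# Mock return value with the documented component names; all values are invented.
fit <- list(
  cluster = c(1, 1, 2),
  segment = matrix(c(1, 100, 5000, 2, 0.4, 3, 1, 3), nrow = 1)
)

# Hypothetical column names following the documented column order of `segment`.
colnames(fit$segment) <- c("chr", "start", "end", "copy.number",
                           "proportion", "n.loci", "start.locus", "end.locus")

# Number of somatic mutations assigned to each group.
group_sizes <- table(fit$cluster)

# Copy number of the first segment.
first_copy_number <- fit$segment[1, "copy.number"]
```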

Peter Wu (peter123wu0@gmail.com)

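library(CSclone)
library(DNAcopy)  # provides CNA(), smooth.CNA() and segment()
set.seed(100)     # make the sampled positions and proportions repeatable
# Short MCMC run so the example finishes quickly
mcmc=list(nburn=200,nsave=500,nskip=1,ndisplay=1000)
p=c(0.2,0.4,0.6,0.8)
# 1000 loci on two chromosomes, 500 loci each, at sorted random positions
chrs=rep(c(1:2),times=rep(500,2))
pos=sort(sample(size=1000,x=1:10^7))
# Proportions drawn from p for the pc and ps arguments of simu.data
pc=sample(rep(p,4*c(0,0,0.5,0.5)))
ps=sample(rep(p,50*c(0.3,0.3,0.2,0.2)))
x=simu.data(n.germline=1000,pc=pc,read=200,ps=ps,dis="Negative binomial",parameter=0.75)
snv.id=x$snv.id
y=x$y
row.names(y)=paste0(chrs,"_",pos)
# Median-centered log2 of the total read depth, used as the log ratio for DNAcopy
logR=log(y[,2],base=2)-median(log(y[,2],base=2))
data=data.frame(chr=chrs,pos=pos,logR=logR)
rownames(data)=paste0("SNP",1:nrow(data))
# Segment the log ratios with DNAcopy
CNA.object=CNA(data$logR,data$chr,data$pos,data.type="logratio",sampleid="test")
smoothed.CNA.object=smooth.CNA(CNA.object)
DNAcopy.object=segment(smoothed.CNA.object, undo.splits = "sdundo",undo.SD = 3, verbose = 1)
# Fit the model and compare the estimated clusters with the simulated proportions
fit=CSclone(somatic.id=snv.id,DNAcopy.object=DNAcopy.object,mcmc=mcmc,y=y)
result=fit$cluster
tf_similar(real=ps,cluster=result)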