postIntensityiid: Posterior intensity for persistence diagrams modeled as IID...
In maroulaslab/BayesTDA: Bayesian inference for persistence diagrams

Description Usage Arguments Details Value References Examples

This function uses the Bayesian inference obtained by characterizing tilted persistence diagrams (PDs) as i.i.d. cluster point processes. A tilted persistence diagram is produced by the mapping T(b,d)=(b,d-b). A i.i.d. cluster point process is a generalization of Poisson point process and characterized by the cardinality and intensity simultaneously. Similarly as Poisson point process the intensity reflects the expected number of elements and serves as an analog to the first order moment for a random variable. Cardinality takes the form of probability mass function(pmf) of the number of points in the point process. Hence i.i.d point process characterization of any object needs to integrate the intensity and cardinality densities. To that end, we define the prior object of the Bayesian model as i.i.d cluster point process with intensity as Gaussian mixture:

λ_X(x) = ∑_{j = 1}^{N}c^X_{j}N*(x;μ^X_{j},σ^X_{j}I)---------------------(1)

and cardinality pmf as

p_X(n) = B(N_0,n)p_x^n (1-p_x)^{N_0-n}---------------------(2)

where N* is the restricted Gaussian on the wedge W where the tilted PD is defined and B(N_0,n) is the binomial coefficient. The weights c^X_j, means μ^X_j and variance σ^X_j of the Gaussian mixture and the probability p_x depend on the prior knowledge about a PD. Posterior intensity is computed from a prior intensity and a set of observed persistence diagrams. The observed PDs can exhibit two features: expected and unexpected features. Features which are believed to be associated to prior are called expected feature and others are unexpected features. The unexpected features of the observed PDs are also defined similarly as prior having the intensity as Gaussian mixture form:

λ_Y(y) = ∑_{i = 1}^{M}c^Y_{i}N*(y;μ^Y_{i},σ^Y_{i}I)---------------------(3)

and cardinality pmf as

p_Y(n) = B(M_0,n)p_y^n (1-p_y)^{M_0-n}.---------------------(4)

Their respective weights c^Y_i, means μ^Y_i and variance σ^Y_i and probabilty p_y need to be predefined. The expected features formed the likelihood density l(y|x)=N*(y;x,σ I), where the variance σis a preselected parameter that reflect the level of faith on the observed PDs.

1	postIntensityiid(x,Dy,alpha,prob.prior,weight.prior,mean.prior,sigma.prior,sigma.y,prob.unexpected,weights.unexpected,mean.unexpected,sigma.unexpected,Nmax)

`x:`	a 2 by 1 vector where the posterior intensity is computed. Cosequently, x is a coordinate in the wedge where the tilted PD is defined.
`Dy:`	a list of n vectors(2 by 1) representing points observed in a tilted persistence diagram of a fixed homological feature.
`alpha:`	0<=alpha<1. The probability of a feature in the prior will be detected in the observation.
`prob.prior:`	The prior cardinality pmf is defined as a binomial and prob.prior is the probability term (p_x in Eqn.(2)).
`weight.prior:`	a N by 1 vector of mixture weights (c^X_{j} of Eqn (1)) for the prior density estimation.
`mean.prior:`	a list of N vectors(2 by 1) each represets mean of the prior density (μ^X_{j} of Eqn (1)).
`sigma.prior:`	a N by 1 vector of positive constants, σ^X_{j} of Eqn (1).
`sigma.y:`	a positive constant. Variance coefficient (σ) of the likelihood density l(y\|x) defined in the description above. This represents the degree of faith on the observed PDs representing the prior.
`prob.unexpected:`	The unexpected feature cardinality pmf is defined as a binomial and prob.unexpected is the probability term (p_y in Eqn.(4)).
`weights.unexpected:`	a M by 1 vector of mixture weights for the unexpected features. i.e., (c^Y_{i} of Eqn (3) above.
`mean.unexpected:`	a list of M vectors (2 by 1),each represets mean of the Gaussian mixture density (μ^Y_{i} of Eqn (3)) for the unexpected features.
`sigma.unexpected:`	a M by 1 vector of positive constants, σ^Y_{i} of Eqn (3).
`Nmax:`	TThe maximum number of points on which the posterior cardinality will be truncated, i.e., p_n is computed for n=0 to n=Nmax. Also, this Nmax will be used to define prior cardinality too. So, it should be large enough so that all Binomials involved in the computation make sense

Required packages are mvtnorm, polynom, purrr and TDA

The function postIntensityiid returns posterior intensity given prior and set of observed PDs using Bayesian framework, where PDs are characterized by IID cluster point process.

Bayesian Inference for Persistent Homology, V Maroulas, F Nasrin, C Oballe, https://arxiv.org/abs/1901.02034

# sample data created from a unit circle to define prior
 set.seed(88)
 t = seq(from=0, to = 1, by = 0.01)
 x = cos(2*pi*t)
 y = sin(2*pi*t)
 coord.mat = cbind(x,y)

 # persistence diagram using ripsDiag of TDA package.  
 #library(TDA)
 circle.samp = coord.mat[sample(1:nrow(coord.mat),50),]
 pd.circle = ripsDiag(circle.samp,maxscale=3,maxdimension = 1)
 circle.diag = matrix(pd.circle$diagram,ncol=3)
 circle.holes = matrix(circle.diag[which(circle.diag[,1]==1),],ncol=3)
 # convert the persistence diagram to a tilted persistence diagram.
 circle.tilted = cbind(circle.holes[,2],circle.holes[,3]-circle.holes[,2])
 circle.df = data.frame(Birth = circle.tilted[,1], Persistence = circle.tilted[,2]) ## create the dataframe consisting of pd coordinates.
 
 # these parameters are required to define the object unexpected
 unex.mean = list(c(0.5,0))
 unex.noise = 1
 unex.weight = 1
 unex.card = length(unex.mean)
 # these parameters are required to define the object prior
 inf.mean = list(c(0.5,1.2))
 inf.noise = 0.2
 inf.weight = 1
 inf.card = length(inf.mean)
 ## plot the prior intensity on a grid. Required package mvtnorm and ggplot2
 #library(mvtnorm)
 #library(ggplot2)
 values.x = seq(from = 0, to = 3, by = 0.1)
 values.y = seq(from = 0, to = 4, by = 0.1)
 grid = expand.grid(values.x,values.y)
 inf.dens = apply(grid,1,Wedge_Gaussian_Mixture,inf.weight,inf.mean,inf.noise)
 max_inf.dens = max(inf.dens)
 normalized_inf.dens = inf.dens/max_inf.dens
 inf.df = data.frame(Birth = grid[,1],Persistence = grid[,2],Intensity = normalized_inf.dens)
 g.inf = ggplot(data = inf.df, aes(x = Birth, y = Persistence)) + geom_tile(aes(fill=Intensity)) + coord_equal()
 g.inf = g.inf + scale_fill_gradient2(low = "blue", high = "red", mid = "yellow",midpoint = 0.5, limit = c(0,1))
 g.inf = g.inf + labs(title='Informative Prior',x='Birth',y='Persistence',fill='')  + theme_grey(base_size=16) +theme(plot.title = element_text(hjust = 0.5))
 print(g.inf)
 
 ## next we define the likelihood from the observed pds. Here we consider a dataset from the same unit circle that is perturbed by a Gaussian noise.
 set.seed(7)
 sy = 0.01 #the variance of the likelihood density function
 circle.noise = 0.001 ## the variance coefficient of noise in this dataset 
 noise = rmvnorm(nrow(coord.mat),mean = c(0,0), sigma = circle.noise*diag(2))
 coord.mat = coord.mat + noise ## gives us the dataset which we will consider as observation
 
 ## following section creates observed PD
 circle.samp = coord.mat[sample(1:nrow(coord.mat),50),]
 pd.circle = ripsDiag(circle.samp,maxscale=3,maxdimension = 1)
 circle.diag = matrix(pd.circle$diagram,ncol=3)
 circle.holes = matrix(circle.diag[which(circle.diag[,1]==1),],ncol=3)
 circle.tilted = cbind(circle.holes[,2],circle.holes[,3]-circle.holes[,2])
 circle.df = data.frame(Birth = circle.tilted[,1], Persistence = circle.tilted[,2]) 
 dy = lapply(1:nrow(circle.df),function(x){unlist(c(circle.df[x,][1],circle.df[x,][2]))}) ## creates the parameter Dy
 plot(x,y, type='l',col = 'red', lwd = 1.5, asp = 1)
 points(circle.samp[,1],circle.samp[,2],pch = 20, col = 'blue')
     ## next we compute the posterior over a grid
    alpha = 0.99 #probablity of a feature in prior to be detected in observations
    Nmax = 7
    sig_Dyo = 0.01
    prob_Dx = inf.card/Nmax
    
    intens.inf = apply(grid,1,postIntensityiid,dy,alpha,prob_Dx,inf.weight,
    inf.mean,inf.noise,sig_Dyo,unex.weight,unex.mean,unex.noise,Nmax)
    
    max_intens.inf = max(intens.inf)
    normalized_intens.inf = intens.inf/max_intens.inf
    post.df = data.frame(Birth = grid[,1],Persistence = grid[,2], Intensity = normalized_intens.inf)
    g.post.inf = ggplot(data = post.df, aes(x = Birth, y = Persistence)) + geom_tile(aes(fill=Intensity)) + coord_equal()+
    scale_fill_gradient2(low = "blue", high = "red", mid = "yellow",midpoint = 0.5, limit = c(0,1))+
    geom_point(data = circle.df, aes(x = Birth, y= Persistence), size=2, color = "Green")+
    labs(title='Posterior Using Inf.',x='Birth',y='Persistence', fill = "") + theme_grey(base_size=16) +  theme(plot.title = element_text(hjust = 0.5))
    print(g.post.inf)