rgeode: GEOmetric Density Estimation.

View source: R/rgeode.R

Description

It selects the principal directions of the data and performs inference. GEODE can also handle missing data.

Usage

rgeode(Y, d = 6, burn = 1000, its = 2000, tol = 0.01, atau = 1/20,
  asigma = 1/2, bsigma = 1/2, starttime = NULL, stoptime = NULL,
  fast = TRUE, c0 = -1, c1 = -0.005)

Arguments

Y

array_like
A real-valued input matrix (or data frame) of dimensions (n, D) containing the data.

d

int, optional
A conservative upper bound for the true dimension of the data. The true dimension is assumed to be smaller than d.

burn

int, optional
Number of burn-in iterations of the Gibbs sampler. It also acts as the stopping time for the selection of the principal axes.

its

int, optional
number of iterations that must be performed after the burn-in.

tol

double, optional
Threshold for adaptively removing redundant dimensions. It is compared with the ratio \frac{α_j^2(t)}{\max_i α_i^2(t)}.

atau

double, optional
The parameter a_τ of the truncated Exponential (the prior for τ_j).

asigma

double, optional
The shape parameter a_σ of the truncated Gamma (the prior for σ^2).

bsigma

double, optional
The rate parameter b_σ of the truncated Gamma (the prior for σ^2).

starttime

int, optional
Starting time for adaptive pruning. It must be less than the number of burn-in iterations.

stoptime

int, optional
Stopping time for adaptive pruning. It must be less than the number of burn-in iterations.

fast

bool, optional
If TRUE, the fast rank-d SVD is used; otherwise the classical SVD is used.

c0

double, optional
Additive constant for the exponent of the pruning step.

c1

double, optional
Multiplicative constant for the exponent of the pruning step.
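
As an illustration of how tol, c0 and c1 might act during pruning, the base R sketch below flags the directions whose ratio α_j^2(t) / max_i α_i^2(t) falls below tol. The values in alpha2 are made up for the example. The form exp(c0 + c1 * t) for the probability of attempting a pruning step at iteration t is an assumption suggested by the descriptions of c0 and c1, not something this page states, and the package's internal code may differ.

```r
# Hypothetical values of alpha_j^2 at some iteration t (illustration only)
alpha2 <- c(4.0, 2.5, 0.9, 0.03, 0.001)
tol <- 0.01                      # default value of tol in rgeode

# Ratio that tol is compared with
ratio <- alpha2 / max(alpha2)

# Assumption: directions whose ratio is below tol are treated as redundant
keep <- ratio >= tol
redundant <- which(!keep)
print(redundant)

# Assumption: a pruning step is attempted with probability exp(c0 + c1 * t),
# so that adaptation becomes rarer as the chain advances (c1 < 0)
c0 <- -1
c1 <- -0.005
p_adapt <- function(t) exp(c0 + c1 * t)
print(p_adapt(c(1, 500, 1000)))
```

With the default negative c1, the assumed probability decreases with t, which is consistent with pruning being confined to the burn-in phase.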

Details

GEOmetric Density Estimation (rgeode) is a fast algorithm performing inference on normally distributed data. It is essentially divided into two principal steps: the selection of the principal axes of the data, followed by a Gibbs sampler that draws from the full conditional posteriors of the parameters in order to perform inference.

It takes several quantities as input:

  • A rectangular (n, D) matrix Y, on which a fast rank-d SVD is run.

  • The conservative upper bound d on the true dimension of the data.

  • A set of tuning parameters.

The conservative upper bound d must be chosen such that d > p, where p is the true dimension, and d << D.
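
To make the idea of a rank-d SVD on an (n, D) matrix concrete, here is a minimal base R sketch of a randomized range-finder SVD. The helper rank_d_svd and its oversample argument are hypothetical names introduced for this example. This is one common way to obtain a fast truncated SVD and is not claimed to be the routine RGeode uses when fast = TRUE.

```r
# Hypothetical helper: randomized rank-d SVD of an (n, D) matrix Y.
# Not the RGeode implementation; a sketch of the general technique.
rank_d_svd <- function(Y, d, oversample = 5) {
  k <- min(d + oversample, nrow(Y), ncol(Y))
  # Random test matrix (D x k) used to probe the column space of Y
  Omega <- matrix(rnorm(ncol(Y) * k), ncol(Y), k)
  # Orthonormal basis (n x k) for the range of Y %*% Omega
  Q <- qr.Q(qr(Y %*% Omega))
  # Project Y onto that basis and take the SVD of the small (k x D) matrix
  B <- crossprod(Q, Y)
  s <- svd(B)
  idx <- seq_len(d)
  list(u = (Q %*% s$u)[, idx, drop = FALSE],
       d = s$d[idx],
       v = s$v[, idx, drop = FALSE])
}

set.seed(1)
n <- 100; D <- 50; p <- 3
# Data of exact rank p, so a rank-3 factorization captures it fully
Y <- matrix(rnorm(n * p), n, p) %*% matrix(rnorm(p * D), p, D)

approx <- rank_d_svd(Y, d = 3)
exact  <- svd(Y, nu = 3, nv = 3)

# Largest discrepancy between approximate and exact leading singular values
err <- max(abs(approx$d - exact$d[1:3]))
print(err)
```

The small matrix B has only k rows, so its SVD is much cheaper than the SVD of Y when d << D, which is the regime the condition above describes.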

Value

rgeode returns a list containing the following components:

InD

array_like
The chosen principal axes.

u

matrix
Contains the samples from the full conditional posterior of the u_j's. Each column stores one iteration.

tau

matrix
Contains the samples from the full conditional posterior of the tau_j's.

sigmaS

array_like
Contains the samples from the full conditional posterior of sigma.

W

matrix
Containing the principal singular vectors.

Miss

list
Contains all the information about missing data. This component is not provided when there are no missing data.

  • id_m array
    It contains the set of rows with missing data.

  • pos_m list
    It contains the set of missing data positions for each row with missing values.

  • yms list
    The list contains the pseudo-observations that replace the missing data. Each element of the list represents the simulated data at a given iteration.
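
The following base R sketch shows how quantities shaped like id_m and pos_m could be derived from a data matrix with missing entries. It mirrors the descriptions above and is not taken from the package source; the toy matrix Y is made up for the example.

```r
# Toy (n, D) matrix with a few missing entries (illustration only)
Y <- matrix(as.numeric(1:12), nrow = 4, ncol = 3)
Y[2, 3] <- NaN
Y[4, c(1, 2)] <- NA

# Rows that contain at least one missing value (is.na is TRUE for NA and NaN)
id_m <- which(rowSums(is.na(Y)) > 0)

# For each such row, the column positions of the missing values
pos_m <- lapply(id_m, function(i) which(is.na(Y[i, ])))

print(id_m)
print(pos_m)
```

Here id_m is a vector of row indices and pos_m is a list with one element per row in id_m, matching the array and list types described above.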

Note

The Miss component is filled only when the data contain missing values.

Author(s)

L. Rimella, lorenzo.rimella@hotmail.it

Examples

library(MASS)
library(RGeode)

####################################################################
# WITHOUT MISSING DATA
####################################################################
# Define the dataset
D= 200
n= 500
d= 10
d_true= 3

set.seed(321)

mu_true= runif(d_true, -3, 10)

Sigma_true= matrix(0,d_true,d_true)
diag(Sigma_true)= c(runif(d_true, 10, 100))

W_true = svd(matrix(rnorm(D*d_true, 0, 1), d_true, D))$v

sigma_true = abs(runif(1,0,1))

mu= W_true%*%mu_true
C= W_true %*% Sigma_true %*% t(W_true)+ sigma_true* diag(D)

y= mvrnorm(n, mu, C)

################################
# GEODE: Without missing data
################################

start.time <- Sys.time() 
GEODE= rgeode(Y= y, d)
Sys.time()- start.time

# SIGMAS
#plot(seq(110,3000,by=1),GEODE$sigmaS[110:3000],ty='l',col=2,
#     xlab= 'Iteration', ylab= 'sigma^2', main= 'Simulation of sigma^2')
#abline(v=800,lwd= 2, col= 'blue')
#legend('bottomright',c('Posterior of sigma^2', 'Stopping time'),
#       lwd=c(1,2),col=c(2,4),cex=0.55, border='black', box.lwd=3)
       
       
####################################################################
# WITH MISSING DATA
####################################################################

###########################
#Insert NaN
n_m = 5 #number of data vectors containing missing features
d_m = 1  #number of missing features

data_miss= sample(seq(1,n),n_m)

features= sample(seq(1,D), d_m)
for(i in 2:n_m)
{
  features= rbind(features, sample(seq(1,D), d_m))
}

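# Set the selected entries to NaN. The last selected row gets one fewer
# missing feature than the others (none at all when d_m = 1).
for(i in 1:length(data_miss))
{
  if(i==length(data_miss))
  {
    y[data_miss[i],features[i,][-1]]= NaN
  }
  else
  {
    y[data_miss[i],features[i,]]= NaN
  }
}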

################################
# GEODE: With missing data
################################
set.seed(321)
start.time <- Sys.time() 
GEODE= rgeode(Y= y, d)
Sys.time()- start.time

# SIGMAS
#plot(seq(110,3000,by=1),GEODE$sigmaS[110:3000],ty='l',col=2,
#     xlab= 'Iteration', ylab= 'sigma^2', main= 'Simulation of sigma^2')
#abline(v=800,lwd= 2, col= 'blue')
#legend('bottomright',c('Posterior of sigma^2', 'Stopping time'),
#       lwd=c(1,2),col=c(2,4),cex=0.55, border='black', box.lwd=3)



####################################################################
####################################################################

LorenzoRimella/RGeode documentation built on May 22, 2019, 12:22 p.m.