Internal Functions of modi-package

Description

The modi package contains internal functions that users normally do not call directly. They are written specifically for the package and serve mainly to make its main functions faster and more efficient.

Calculation of distances for Epidemic Algorithm for multivariate outlier detection and imputation: .EA.dist(data,n,p,weights,reach,transmission.function, power, distance.type, maxl)

Non-zero non-missing minimum function: .nz.min(x)

Addressing function for Epidemic Algorithm: .ind.dij(i, j, n)

Addressing function for Epidemic Algorithm: .ind.dijs(i, js, n)

Sum of weights for observations < value (if lt=TRUE) or observations = value (if lt=FALSE): .sum.weights(observations,weights,value,lt=TRUE)

Definition of the sweep and reverse-sweep operator: .sweep.operator(M,k,reverse=FALSE)

psi-function (defined in Little and Smith for ER algorithm): .psi.lismi(d,present,psi.par=c(2,1.25))

EM for multivariate normal data: .EM.normal(data, weights=rep(1,nrow(data)), n=sum(weights) ,p=ncol(data), s.counts, s.id, S, T.obs, start.mean=rep(0,p),start.var=diag(1,p),numb.it=10,Estep.output=F)

ER for multivariate normal data: .ER.normal(data, weights=rep(1,nrow(data)), psi.par=c(2,1.25), np=sum(weights) ,p=ncol(data), s.counts, s.id, S, missing.items, nb.missing.items, start.mean=rep(0,p),start.var=diag(1,p),numb.it=10,Estep.output=F,tolerance=1e-06)
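As an illustration of the sweep and reverse-sweep operator listed above, the following is a minimal sketch and not the package's code. The name sweep_sketch is hypothetical, k is taken to be a single pivot index, and the sign convention (pivot replaced by -1/pivot, as in Little and Rubin) is an assumption; the convention used by .sweep.operator is not stated here.

```r
# Hypothetical sketch of a sweep / reverse-sweep on pivot k of a square matrix.
sweep_sketch <- function(M, k, reverse = FALSE) {
  stopifnot(is.matrix(M), nrow(M) == ncol(M), length(k) == 1, M[k, k] != 0)
  pivot <- M[k, k]
  col_k <- M[, k]
  row_k <- M[k, ]
  # All entries outside row/column k: m_jl - m_jk * m_kl / m_kk
  out <- M - outer(col_k, row_k) / pivot
  # Row and column k: divided by the pivot, with opposite sign when reversing
  sgn <- if (reverse) -1 else 1
  out[, k] <- sgn * col_k / pivot
  out[k, ] <- sgn * row_k / pivot
  # Pivot element itself
  out[k, k] <- -1 / pivot
  out
}

G <- matrix(c(4, 2, 2, 3), 2, 2)
H <- sweep_sketch(G, 1)                      # sweep on the first variable
back <- sweep_sketch(H, 1, reverse = TRUE)   # reverse sweep undoes it
```

Under this convention, a reverse sweep on the same pivot restores the original matrix, which is a convenient check for any implementation.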

Arguments

data

a data frame or matrix with the data

n

nrow(data)

p

ncol(data)

weights

a vector of positive sampling weights

reach

if reach="max" the maximal nearest neighbour distance is used as the basis for the transmission function, otherwise the weighted (1-(p+1)/n) quantile of the nearest neighbour distances is used.

transmission.function

form of the transmission function of distance d: "step" is a heaviside function which jumps to 1 at d0, "linear" is linearly decreasing from 1 to 0 between 0 and d0, "power" is (beta*d+1)^(-p) with p=ncol(data) as default, "root" is the function 1-(1-d/d0)^(1/maxl)

power

sets p=power

maxl

Maximum number of steps without infection

monitor

if TRUE verbose output on epidemic

x

vector of numeric values

i

index for row

j

index for column

js

vector of indices of columns

observations

vector of observations that are compared with value

value

an integer, indicating the threshold for the sum of weights computation

lt

if TRUE, sum of weights for observations < value is returned. If FALSE, sum of weights for observations = value is returned

M

an array, including a matrix

k

a vector giving the subscripts which the function will be applied over. E.g., for a matrix 1 indicates rows, 2 indicates columns

reverse

logical value

s.counts

counts of the different missingness patterns ordered alphabetically

s.id

indices of the last observation of each missingness pattern in the dataset ordered by missingness pattern

S

total number of different missingness patterns

T.obs

Sufficient statistics on complete observations

start.mean

starting value for mean vector

start.var

starting value for the covariance matrix

numb.it

number of iterations

Estep.output

logical, TRUE if verbose output is desired

psi.par

further parameters passed to the psi-function

np

population size

missing.items

Indices of missing items

nb.missing.items

number of missing items

tolerance

stop iterations when change is below tolerance
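The arguments x, observations, weights, value and lt above belong to two small helpers. The sketch below shows one way such helpers could look, based only on their one-line descriptions. The names nz_min and sum_weights are hypothetical, and returning NA for an input without non-zero values is an assumption, not documented behaviour.

```r
# Hypothetical sketch: smallest value that is neither zero nor missing.
nz_min <- function(x) {
  x <- x[!is.na(x) & x != 0]
  if (length(x) == 0) NA_real_ else min(x)   # assumption: NA if nothing is left
}

# Hypothetical sketch: sum of the weights of observations below (lt = TRUE)
# or equal to (lt = FALSE) a threshold value.
sum_weights <- function(observations, weights, value, lt = TRUE) {
  stopifnot(length(observations) == length(weights))
  selected <- if (lt) observations < value else observations == value
  sum(weights[which(selected)])              # which() drops missing comparisons
}

obs <- c(1, 2, 2, 3)
w <- c(10, 20, 30, 40)
below <- sum_weights(obs, w, value = 2)              # weights of obs < 2
equal <- sum_weights(obs, w, value = 2, lt = FALSE)  # weights of obs == 2
smallest <- nz_min(c(0, NA, 3, 1.5, 0))
```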

Details

.EA.dist creates a vector of length n*(n-1)/2 in the global environment. To avoid memory problems this vector is not (!) passed as a function result.
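A vector of length n*(n-1)/2 holds one distance per unordered pair of observations, so an addressing function is needed to find the entry for a pair (i, j). The sketch below illustrates the idea with the ordering used by stats::dist. The name pair_index is hypothetical, and whether .ind.dij and .ind.dijs use this same ordering is an assumption.

```r
# Hypothetical sketch: position of the pair (i, j), i != j, in a condensed
# distance vector ordered like stats::dist (lower triangle, column by column).
pair_index <- function(i, j, n) {
  lo <- pmin(i, j)
  hi <- pmax(i, j)
  stopifnot(all(lo >= 1), all(hi <= n), all(lo < hi))
  n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo
}

set.seed(1)
x <- matrix(rnorm(10), nrow = 5)   # 5 observations, 2 variables
d <- dist(x)                       # condensed vector of length 5 * 4 / 2
full <- as.matrix(d)               # full 5 x 5 matrix, for comparison only

one <- as.vector(d)[pair_index(2, 4, n = 5)]            # distance between 2 and 4
several <- as.vector(d)[pair_index(4, c(1, 2, 3), 5)]   # vectorised over j
```

Storing only the condensed vector roughly halves the memory needed compared with the full n by n matrix, which matters for large n.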

Value

A list with two components. The first component, output, is a list with components

sample.spatial.median.index

The index of the observation with minimal sum of absolute distances to all other points

max.min.di

The maximum distance to a nearest neighbour

d0

The reach of the transmission function

The second component is

min.dist2nn

A vector of the distances to the nearest neighbour
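To make these quantities concrete, the following sketch computes unweighted analogues of sample.spatial.median.index, max.min.di and min.dist2nn from a small full distance matrix. It is an illustration only: it assumes Euclidean distances, ignores the sampling weights, and uses a full matrix rather than the condensed vector, so it need not match what .EA.dist returns.

```r
# Four points in the plane; the last one lies far from the others.
x <- rbind(c(0, 0), c(1, 0), c(0, 2), c(5, 5))

D <- as.matrix(dist(x))   # assumption: Euclidean distance
diag(D) <- NA             # a point is not its own neighbour

# Distance from each observation to its nearest neighbour
min_dist2nn <- apply(D, 1, min, na.rm = TRUE)

# Largest of the nearest-neighbour distances
max_min_di <- max(min_dist2nn)

# Observation with the smallest sum of distances to all other points
spatial_median_index <- which.min(rowSums(D, na.rm = TRUE))
```

In this example the isolated fourth point determines max_min_di, which shows how a single remote observation can enlarge a reach based on the maximal nearest-neighbour distance.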

Author(s)

Cédric Béguin, Beat Hulliger

References

Béguin, C., and Hulliger, B. (2004). Multivariate outlier detection in incomplete survey data: The epidemic algorithm and transformed rank correlations. Journal of the Royal Statistical Society, A 167(Part 2), 275-294.