# modi-internal: Internal Functions of modi-package

In modi: Multivariate outlier detection and imputation for incomplete survey data

## Description

The `modi-package` contains internal functions which are normally not called directly by the user. The internal functions are specifically built for the `modi-package` and are mainly used to improve efficiency and speed in the main functions of the package.

Calculation of distances for Epidemic Algorithm for multivariate outlier detection and imputation: `.EA.dist(data,n,p,weights,reach,transmission.function, power, distance.type, maxl)`

Non-zero non-missing minimum function: `.nz.min(x)`

Addressing function for Epidemic Algorithm: `.ind.dij(i, j, n)`

Addressing function for Epidemic Algorithm: `.ind.dijs(i, js, n)`

Sum of weights for observations < value (if `lt=TRUE`) or observations = value (if `lt=FALSE`): `.sum.weights(observations,weights,value,lt=TRUE)`

Definition of the sweep and reverse-sweep operator: `.sweep.operator(M,k,reverse=FALSE)`

psi-function (defined in Little and Smith for ER algorithm): `.psi.lismi(d,present,psi.par=c(2,1.25))`

EM for multivariate normal data: `.EM.normal(data, weights=rep(1,nrow(data)), n=sum(weights) ,p=ncol(data), s.counts, s.id, S, T.obs, start.mean=rep(0,p),start.var=diag(1,p),numb.it=10,Estep.output=F)`

ER for multivariate normal data: `.ER.normal(data, weights=rep(1,nrow(data)), psi.par=c(2,1.25), np=sum(weights) ,p=ncol(data), s.counts, s.id, S, missing.items, nb.missing.items, start.mean=rep(0,p),start.var=diag(1,p),numb.it=10,Estep.output=F,tolerance=1e-06)`
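The sweep and reverse-sweep operator listed above is a standard tool in EM-type estimation for multivariate normal data. The following is a minimal sketch of one common textbook convention for a symmetric matrix; `sweep_sketch` is a hypothetical name, and neither the sign convention nor the handling of `k` is confirmed to match the package's `.sweep.operator`.

```r
# Hypothetical helper, not the package's .sweep.operator.
# Sweeps a symmetric matrix M on each pivot in k; reverse = TRUE undoes a sweep.
sweep_sketch <- function(M, k, reverse = FALSE) {
  sgn <- if (reverse) -1 else 1
  for (piv in k) {
    d <- M[piv, piv]                  # pivot element m_kk
    col <- M[, piv]                   # pivot column (equals pivot row by symmetry)
    H <- M - outer(col, col) / d      # h_jl = m_jl - m_jk * m_kl / m_kk
    H[piv, ] <- sgn * col / d         # pivot row
    H[, piv] <- sgn * col / d         # pivot column
    H[piv, piv] <- -1 / d             # pivot element
    M <- H
  }
  M
}

S <- matrix(c(4, 2, 2, 3), nrow = 2)
swept <- sweep_sketch(S, 1:2)                          # sweep on every pivot
restored <- sweep_sketch(swept, 2:1, reverse = TRUE)   # undo both sweeps
```

Under this convention, sweeping on all pivots yields the negative inverse of the matrix, and reverse-sweeping the same pivots recovers the original matrix.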

## Arguments

| Argument | Description |
|---|---|
| `data` | a data frame or matrix with the data |
| `n` | `nrow(data)` |
| `p` | `ncol(data)` |
| `weights` | a vector of positive sampling weights |
| `reach` | if `reach="max"` the maximal nearest neighbour distance is used as the basis for the transmission function, otherwise the weighted (1-(p+1)/n) quantile of the nearest neighbour distances is used. |
| `transmission.function` | form of the transmission function of distance `d`: `"step"` is a heaviside function which jumps to `1` at d0, `"linear"` is linearly decreasing from 1 to 0 between 0 and d0, `"power"` is (beta*d+1)^(-p) with p=ncol(data) as default, `"root"` is the function 1-(1-d/d0)^(1/maxl) |
| `power` | sets `p=power` |
| `maxl` | Maximum number of steps without infection |
| `monitor` | if `TRUE` verbose output on epidemic |
| `x` | vector of numeric values |
| `i` | index for row |
| `j` | index for column |
| `js` | vector of indices of columns |
| `observations` | Number of observations |
| `value` | an integer, indicating the threshold for the sum of weights computation |
| `lt` | if TRUE, sum of weights for observations < `value` is returned. If FALSE, sum of weights for observations = `value` is returned |
| `M` | an array, including a matrix |
| `k` | a vector giving the subscripts which the function will be applied over. E.g., for a matrix 1 indicates rows, 2 indicates columns |
| `reverse` | logical value |
| `s.counts` | counts of the different missingness patterns ordered alphabetically |
| `s.id` | indices of the last observation of each missingness pattern in the dataset ordered by missingness pattern |
| `S` | total number of different missingness patterns |
| `T.obs` | Sufficient statistics on complete observations |
| `start.mean` | starting value for mean vector |
| `start.var` | starting value for variance vector |
| `numb.it` | number of iterations |
| `Estep.output` | logical, TRUE if verbose output is desired |
| `psi.par` | further parameters passed to the psi-function |
| `np` | population size |
| `missing.items` | Indices of missing items |
| `nb.missing.items` | number of missing items |
| `tolerance` | stop iterations when change is below tolerance |
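To make the `transmission.function` formulas concrete, here is a small sketch that evaluates the `"linear"`, `"power"` and `"root"` forms exactly as they are written in the argument description. The function name `transmission_sketch`, the default values, and the restriction to distances between 0 and d0 are assumptions made for illustration. `beta` has no documented default here, so it is a plain parameter of the sketch. The `"step"` form is left out because the description does not state its value on each side of d0.

```r
# Hypothetical helper, not part of modi: evaluates the documented formulas
# for distances d with 0 <= d <= d0. All default values are illustrative only.
transmission_sketch <- function(d, d0, type = c("linear", "power", "root"),
                                power = 2, beta = 1, maxl = 5) {
  type <- match.arg(type)
  stopifnot(all(d >= 0), all(d <= d0))
  switch(type,
    linear = 1 - d / d0,                    # decreases linearly from 1 to 0
    power  = (beta * d + 1)^(-power),       # (beta*d+1)^(-p) with p = power
    root   = 1 - (1 - d / d0)^(1 / maxl)    # formula as given in the text
  )
}

d <- c(0, 0.5, 1)
lin <- transmission_sketch(d, d0 = 1, type = "linear")
pw  <- transmission_sketch(d, d0 = 1, type = "power", power = 2, beta = 1)
rt  <- transmission_sketch(d, d0 = 1, type = "root", maxl = 5)
```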

## Details

`.EA.dist` creates a vector of length n*(n-1)/2 in the global environment. To avoid memory problems this vector is not (!) passed as a function result.
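A vector of length n*(n-1)/2 holds each pairwise distance once, so an addressing function is needed to find the entry for a given pair (i, j). The sketch below uses the storage order of base R's `dist()` (lower triangle, column by column). Whether `.ind.dij` uses the same layout is an assumption, and `ind_dij_sketch` is a hypothetical name.

```r
# Hypothetical addressing helper for a condensed distance vector, assuming
# the layout of base R's dist(): lower triangle stored column by column.
ind_dij_sketch <- function(i, j, n) {
  lo <- pmin(i, j)
  hi <- pmax(i, j)
  stopifnot(all(lo >= 1), all(hi <= n), all(lo < hi))
  n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo
}

set.seed(1)
x <- matrix(rnorm(10), nrow = 5)     # 5 observations, 2 variables
d_vec <- as.vector(dist(x))          # condensed vector of length 5*4/2 = 10
d_mat <- as.matrix(dist(x))          # full 5 x 5 matrix, for comparison only

d_24 <- d_vec[ind_dij_sketch(2, 4, n = 5)]   # distance between rows 2 and 4
```

The full matrix is built here only to check the addressing. Avoiding it is the point of the condensed storage.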

## Value

A list with two components. The first component, `output`, is a list with components

| Component | Description |
|---|---|
| `sample.spatial.median.index` | The index of the observation with minimal sum of absolute distances to all other points |
| `max.min.di` | The maximum distance to a nearest neighbour |
| `d0` | The reach of the transmission function |

The second component is

| Component | Description |
|---|---|
| `min.dist2nn` | A vector of the distances to the nearest neighbour |
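The quantities described above can be illustrated on a small complete data set with base R. This is a sketch only: it assumes Euclidean distances, ignores sampling weights and missing values, takes `reach="max"` so that the reach equals the largest nearest-neighbour distance, and builds the full distance matrix, which the internal function avoids. The variable names are chosen to mirror the documented components.

```r
set.seed(42)
x <- matrix(rnorm(20), nrow = 10)    # 10 complete observations, 2 variables

D <- as.matrix(dist(x))              # Euclidean distances (an assumption)

# Observation with the smallest sum of distances to all other points
sample_spatial_median_index <- unname(which.min(rowSums(D)))

# Nearest-neighbour distance of each observation (self-distance excluded)
diag(D) <- Inf
min_dist2nn <- unname(apply(D, 1, min))

# Largest nearest-neighbour distance; with reach = "max" it is used as d0
max_min_di <- max(min_dist2nn)
d0 <- max_min_di
```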

## Author(s)

Cédric Béguin, Beat Hulliger

## References

Béguin, C., and Hulliger, B. (2004). Multivariate outlier detection in incomplete survey data: The epidemic algorithm and transformed rank correlations. Journal of the Royal Statistical Society, A 167(Part 2), 275-294.

modi documentation built on May 31, 2017, 5 a.m.