dbstats-package: Distance-based statistics (dbstats)

dbstats-package    R Documentation

Distance-based statistics (dbstats)

Description

This package contains functions for distance-based prediction methods.

These are methods for prediction where predictor information is coded as a matrix of distances between individuals.

In the currently implemented methods, the response is a univariate variable, as in the ordinary linear model or the generalized linear model.

Distances can either be input directly as a distance matrix, a squared distance matrix, or an inner-products matrix (see GtoD2), or be computed from observed explanatory variables.
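The relation between an inner-products matrix and a squared distance matrix can be sketched in base R. This is only an illustration of the standard identity d2_ij = g_ii + g_jj - 2 g_ij for Euclidean data; it is not taken from the GtoD2 source, and the object names (Z, G, D2_from_G) are chosen here for the example.

```r
# Base-R sketch (an assumption-laden illustration, not the package's code):
# recover squared Euclidean distances from an inner-products (Gram) matrix.
set.seed(1)
Z  <- matrix(rnorm(20), nrow = 5)              # 5 individuals, 4 numeric variables
Zc <- scale(Z, center = TRUE, scale = FALSE)   # column-centred data
G  <- Zc %*% t(Zc)                             # inner-products matrix

g <- diag(G)
D2_from_G <- outer(g, g, "+") - 2 * G          # d2_ij = g_ii + g_jj - 2 * g_ij
D2_direct <- as.matrix(dist(Z))^2              # squared distances computed directly

# Both routes should agree up to floating-point error.
max_abs_diff <- max(abs(D2_from_G - D2_direct))
```

Centring the columns does not change the distances, which is why the two matrices coincide.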

Notation convention: in distance-based methods we must distinguish observed explanatory variables, which we denote by Z or z, from Euclidean coordinates, which we denote by X or x. For an explanation of both terms, see the references below.

Observed explanatory variables z may be a mixture of continuous and qualitative explanatory variables, or more general quantities.
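The distinction between z and x can be illustrated with classical metric scaling in base R. The sketch below assumes purely numeric z and plain Euclidean distances, and uses stats::cmdscale only to show what "Euclidean coordinates" means; it is not a description of how dbstats works internally.

```r
# Observed explanatory variables z (numeric only, to keep the sketch simple).
set.seed(2)
Z <- data.frame(z1 = rnorm(6), z2 = runif(6))

# Distances between individuals, then Euclidean coordinates x obtained
# from those distances by classical multidimensional scaling.
D <- dist(Z)
X <- cmdscale(D, k = 2)

# The coordinates X reproduce the distances computed from Z,
# although X itself generally differs from Z (rotation and centring).
recon_err <- max(abs(as.matrix(dist(X)) - as.matrix(D)))
```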

dbstats does not provide specific functions for computing distances, depending instead on other functions and packages, such as:

  • dist in the stats package.

  • dist in the proxy package. When the proxy package is loaded, its dist function supersedes the one in the stats package.

  • daisy in the cluster package. Compared to both instances of dist above, whose input must be numeric variables, the main feature of daisy is its ability to handle other variable types as well (e.g. nominal, ordinal, (a)symmetric binary), even when different types occur in the same data set.

    Strictly speaking, the restriction to numeric input applies only to the default behaviour of both dist functions: the dist function in the proxy package can also evaluate distances between observations with a user-provided function, passed as an argument, so it can deal with any type of data. See the examples in pr_DB.
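As a minimal sketch of the options listed above, the following compares stats::dist on numeric variables with cluster::daisy on a mixed data set. The data frame and its column names are invented for the example, and the Gower metric is just one reasonable choice for mixed types.

```r
library(cluster)  # provides daisy()

# Numeric variables only: stats::dist (Euclidean by default).
# Calling stats::dist explicitly avoids ambiguity if proxy is loaded.
Z_num <- data.frame(age    = c(23, 45, 31, 60),
                    income = c(1.8, 3.2, 2.5, 4.1))
d_num <- stats::dist(Z_num)

# Mixed variable types: daisy with Gower's coefficient handles
# numeric, nominal and ordinal columns in the same data set.
Z_mix <- data.frame(
  Z_num,
  sex   = factor(c("f", "m", "f", "m")),
  grade = factor(c("low", "high", "mid", "high"),
                 levels = c("low", "mid", "high"), ordered = TRUE)
)
d_mix <- daisy(Z_mix, metric = "gower")
```

Both d_num and d_mix inherit from class "dist", so either can be converted with as.matrix() or passed on to functions that accept a distance object.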

Functions in the dbstats package:

Linear and local linear models with a continuous response:

  • dblm for distance-based linear models.

  • ldblm for local distance-based linear models.

  • dbplsr for distance-based partial least squares.

Generalized linear and local generalized linear models with a numeric response:

  • dbglm for distance-based generalized linear models.

  • ldbglm for local distance-based generalized linear models.
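A possible usage sketch for the simplest of these functions, dblm, is given below. It assumes that dblm accepts a dist object and a numeric response with default settings, as in the package's own help examples; the simulated data and the coefficient values are invented here, and the call is guarded so the sketch still runs when dbstats is not installed.

```r
# Simulated data: three numeric predictors and a continuous response.
set.seed(3)
n <- 100
Z <- matrix(rnorm(n * 3), nrow = n)
y <- as.numeric(Z %*% c(1, 2, 3) + rnorm(n))

# Predictor information coded as distances between individuals.
D <- dist(Z)

# Fit a distance-based linear model with default settings
# (assumption: dblm dispatches on the "dist" class of D).
if (requireNamespace("dbstats", quietly = TRUE)) {
  fit <- dbstats::dblm(D, y)
  print(summary(fit))
}
```

See the help pages of dblm, ldblm, dbplsr, dbglm and ldbglm for the arguments each function actually supports.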

Details

Package: dbstats
Type: Package
Version: 2.0.2
Date: 2024-01-26
License: GPL-2
LazyLoad: yes

Author(s)

Boj, Eva <evaboj@ub.edu>, Caballe, Adria <adria.caballe@upc.edu>, Delicado, Pedro <pedro.delicado@upc.edu> and Fortiana, Josep <fortiana@ub.edu>

References

Boj E, Caballe A, Delicado P, Esteve A, Fortiana J (2016). Global and local distance-based generalized linear models. TEST 25, 170-195.

Boj E, Delicado P, Fortiana J (2010). Distance-based local linear regression for functional predictors. Computational Statistics and Data Analysis 54, 429-437.

Boj E, Grane A, Fortiana J, Claramunt MM (2007). Implementing PLS for distance-based regression: computational issues. Computational Statistics 22, 237-248.

Boj E, Grane A, Fortiana J, Claramunt MM (2007). Selection of predictors in distance-based regression. Communications in Statistics B - Simulation and Computation 36, 87-98.

Cuadras CM, Arenas C, Fortiana J (1996). Some computational aspects of a distance-based model for prediction. Communications in Statistics B - Simulation and Computation 25, 593-609.

Cuadras C, Arenas C (1990). A distance-based regression model for prediction with mixed data. Communications in Statistics A - Theory and Methods 19, 2261-2279.

Cuadras CM (1989). Distance analysis in discrimination and classification using both continuous and categorical variables. In: Y. Dodge (ed.), Statistical Data Analysis and Inference. Amsterdam, The Netherlands: North-Holland Publishing Co., pp. 459-473.


dbstats documentation built on May 29, 2024, 1:11 a.m.