lassie: Local Association Measures

View source: R/lassie.R

lassie R Documentation

Local Association Measures

Description

Estimates local (and global) association measures: Ducher's Z, Lewontin's D, pointwise mutual information, normalized pointwise mutual information and chi-squared residuals.

Usage

lassie(x, select, continuous, breaks, measure = "chisq", default_breaks = 4)

Arguments

x

data.frame or matrix.

select

optional vector of column numbers or column names specifying a subset of data to be used. By default, uses all columns.

continuous

optional vector of column numbers or column names specifying continuous variables that should be discretized. By default, assumes that every variable is categorical.

breaks

numeric vector or list passed on to cut to discretize continuous variables. A numeric vector is applied to all continuous variables. To specify variable-specific breaks, use a list whose names are column names (not numbers) and whose values are the breaks for that column. Any continuous variable without specified breaks uses default_breaks.

measure

name of measure to be used:

  • 'chisq': Chi-squared residuals.

  • 'd': Lewontin's D.

  • 'z': Ducher's Z.

  • 'pmi': Pointwise mutual information (in bits).

  • 'npmi': Normalized pointwise mutual information (Bouma).

  • 'npmi2': Normalized pointwise mutual information (Multivariate).
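To make some of these local measures concrete, here is a minimal base R sketch that computes pointwise mutual information, Bouma's normalized variant, and chi-squared (Pearson) residuals by hand for a small made-up contingency table. The table and all variable names are illustrative assumptions; the formulas are the textbook definitions, and the scaling or conventions used inside lassie may differ. Lewontin's D, Ducher's Z and the multivariate 'npmi2' are not reproduced here.

```r
# Hypothetical 2 x 2 table of counts (illustrative data, not from zebu)
counts <- matrix(c(30, 10,
                   10, 50),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(a = c("a1", "a2"), b = c("b1", "b2")))

n    <- sum(counts)
obs  <- counts / n                           # observed joint probabilities
expd <- outer(rowSums(obs), colSums(obs))    # expected under independence

# Pointwise mutual information, in bits: one value per cell
pmi <- log2(obs / expd)

# Normalized PMI (Bouma): PMI divided by -log2 of the joint probability
npmi <- pmi / (-log2(obs))

# Pearson chi-squared residuals on the count scale
chisq_res <- (counts - n * expd) / sqrt(n * expd)

round(pmi, 3)
round(npmi, 3)
round(chisq_res, 3)
```

Each cell gets its own value: positive where the combination occurs more often than independence would predict, negative where it occurs less often. Summing the squared residuals gives the usual global chi-squared statistic.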

default_breaks

default break points for discretizations. Same syntax as in cut.
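Because breaks and default_breaks are handed to cut, it can help to see how base R's cut treats the two accepted forms. The sketch below uses only base R; resolve_breaks is a hypothetical helper written for this illustration to mimic the fallback described above, and is not a function of the package.

```r
mpg <- mtcars$mpg

# A single number asks cut() for that many equal-width intervals
equal_width <- cut(mpg, breaks = 4)
table(equal_width)

# A numeric vector gives explicit break points.
# Values outside the outermost breaks become NA (here, mpg above 30).
bins <- cut(mpg, breaks = c(10, 15, 25, 30))
table(bins, useNA = "ifany")

# Hypothetical helper: per-variable breaks from a named list,
# falling back to a default when the variable is not listed
breaks <- list(mpg = c(10, 15, 25, 30))
default_breaks <- 4
resolve_breaks <- function(var) {
  if (!is.null(breaks[[var]])) breaks[[var]] else default_breaks
}

resolve_breaks("mpg")  # the user-defined break points
resolve_breaks("hp")   # not listed, so the default is used
```

When supplying explicit break points, it is worth checking that they span the full range of the variable, since cut silently assigns NA to anything outside them.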

Value

An instance of S3 class lassie with the following objects:

  • data: raw and preprocessed data.frames (see preprocess).

  • prob: probability arrays (see estimate_prob).

  • global: global association (see local_association).

  • local: local association arrays (see local_association).

  • lassie_params: parameters used in lassie.

See Also

Results can be visualized using plot.lassie and print.lassie methods. plot.lassie is only available in the bivariate case and returns a tile plot representing the probability or local association measure matrix. print.lassie shows an array or a data.frame.

Results can be saved using write.lassie.

The permtest function assesses the significance of local and global association values using p-values estimated by permutations.

The chisqtest function assesses significance in the case of two-dimensional chi-squared analysis.
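The idea of a p-value estimated by permutations can be sketched in base R as follows. This is a generic illustration for a global chi-squared statistic, not the implementation behind permtest: the choice of statistic, the number of permutations (nb here) and the helper chisq_stat are all assumptions made for the example.

```r
set.seed(1)  # make the illustration reproducible

# Two categorical variables from mtcars
x <- factor(mtcars$cyl)
y <- factor(mtcars$am)

# Global association statistic: chi-squared without continuity correction.
# Warnings about small expected counts are suppressed because the p-value
# comes from permutations, not from the chi-squared approximation.
chisq_stat <- function(a, b) {
  unname(suppressWarnings(chisq.test(table(a, b), correct = FALSE)$statistic))
}

observed <- chisq_stat(x, y)

# Shuffling y breaks any association while keeping both marginals fixed
nb <- 999
permuted <- replicate(nb, chisq_stat(x, sample(y)))

# Share of permutations at least as extreme as the observed value,
# with the usual +1 correction so the estimate is never exactly zero
p_value <- (1 + sum(permuted >= observed)) / (nb + 1)
p_value
```

The same scheme extends to local measures by recomputing the per-cell values on each shuffled dataset and comparing cell by cell, in which case some form of multiple-testing adjustment is usually needed.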

Examples

# In this example, we will use the 'mtcars' dataset

# Selecting a subset of mtcars.
# Takes column names or numbers.
# If nothing was specified, all variables would have been used.
select <- c('mpg', 'cyl') # or select <- c(1, 2)

# Specifying 'mpg' as a continuous variable.
# Takes column names or numbers.
# If nothing was specified, all variables would have been treated as categorical.
continuous <- 'mpg' # or continuous <- 1

# How should breaks be specified?
# Specifying equal-width discretization with 5 bins for all continuous variables ('mpg')
# breaks <- 5

# Specifying user-defined breakpoints for all continuous variables.
# breaks <- c(10, 15, 25, 30)

# Same thing but only for 'mpg'.
# Here both notations are equivalent because 'mpg' is the only continuous variable.
# This notation is useful if you wish to specify different break points for different variables.
# breaks <- list('mpg' = 5)
# breaks <- list('mpg' = c(10, 15, 25, 30))

# Calling lassie
# Not specifying breaks means that the value in default_breaks (4) will be used.
las <- lassie(mtcars, select = c(1, 2), continuous = 1)

# Print local association to console as an array
print(las)

# Print local association and probabilities
# Here only rows having a positive local association are printed
# The data.frame is also sorted by observed probability
print(las, type = 'df', range = c(0, 1), what_sort = 'obs')

# Plot results as heatmap
plot(las)

# Plot observed probabilities using different colors
plot(las, what_x = 'obs', low = 'white', mid = 'grey', high = 'black', text_colour = 'red')


olivmrtn/zebu documentation built on Aug. 31, 2023, 6:34 p.m.