lassie | R Documentation |
Estimates local (and global) association measures: Ducher's Z, Lewontin's D, pointwise mutual information, normalized pointwise mutual information and chi-squared residuals.
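To make the measures named above concrete, here is a minimal base R sketch that computes several of them by hand for a 2 x 2 table. The counts are invented for illustration, and the formulas are the common textbook definitions; it is an assumption that lassie uses the same scaling and logarithm base. Ducher's Z is left out of the sketch.

```r
# Joint counts for two categorical variables (illustrative numbers only).
counts <- matrix(c(30, 10, 10, 50), nrow = 2, byrow = TRUE,
                 dimnames = list(a = c("a1", "a2"), b = c("b1", "b2")))
n <- sum(counts)

obs <- counts / n                            # observed joint probabilities
exp_p <- outer(rowSums(obs), colSums(obs))   # expected under independence

# Lewontin's D: observed minus expected probability.
d <- obs - exp_p

# Pointwise mutual information (log base 2 is an arbitrary choice here).
pmi <- log2(obs / exp_p)

# Normalized PMI: PMI divided by -log2 of the observed probability.
npmi <- pmi / (-log2(obs))

# Chi-squared (Pearson) residuals, computed on the count scale.
chisq_res <- (counts - n * exp_p) / sqrt(n * exp_p)
```

Each result is a matrix with one local value per combination of categories. A positive value means the combination occurs more often than independence would predict, and the squared chi-squared residuals sum to the global Pearson chi-squared statistic.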
lassie(x, select, continuous, breaks, measure = "chisq", default_breaks = 4)
x | data.frame or matrix.
select | optional vector of column numbers or column names specifying a subset of data to be used. By default, uses all columns.
continuous | optional vector of column numbers or column names specifying continuous variables that should be discretized. By default, assumes that every variable is categorical.
breaks | numeric vector or list passed on to
measure | name of measure to be used:
default_breaks | default break points for discretizations. Same syntax as in
An instance of S3 class lassie with the following objects:
data: raw and preprocessed data.frames (see preprocess).
prob: probability arrays (see estimate_prob).
global: global association (see local_association).
local: local association arrays (see local_association).
lassie_params: parameters used in lassie.
Results can be visualized using the plot.lassie and print.lassie methods. plot.lassie is only available in the bivariate case and returns a tile plot representing the probability or local association measure matrix. print.lassie shows an array or a data.frame.
Results can be saved using write.lassie.
The permtest function assesses the significance of local and global association values using p-values estimated by permutations.
The chisqtest function assesses significance in the case of a two-dimensional chi-squared analysis.
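The idea behind permutation-based significance can be sketched in base R without the package. The example below is an illustrative assumption about the general technique, not a reproduction of permtest: the helper chisq_stat, the number of permutations and the one-sided p-value formula are all choices made for this sketch.

```r
set.seed(1)  # make the permutations reproducible

# Discretize 'mpg' into 4 equal-width bins and treat 'cyl' as categorical.
mpg_bin <- cut(mtcars$mpg, breaks = 4)
cyl <- factor(mtcars$cyl)

# Hypothetical helper: global Pearson chi-squared from a contingency table.
chisq_stat <- function(a, b) {
  counts <- table(a, b)
  expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)
  sum((counts - expected)^2 / expected)
}

observed <- chisq_stat(mpg_bin, cyl)

# Null distribution: shuffling one variable breaks any association
# while leaving both marginal distributions unchanged.
n_perm <- 1000
null_stats <- replicate(n_perm, chisq_stat(sample(mpg_bin), cyl))

# One-sided permutation p-value with the usual +1 correction.
p_value <- (sum(null_stats >= observed) + 1) / (n_perm + 1)
```

The same shuffling scheme applies to a local measure: compute the local value for each cell on every permuted dataset, then compare each observed cell with its own null distribution.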
# In this example, we will use the 'mtcars' dataset
# Selecting a subset of mtcars.
# Takes column names or numbers.
# If nothing was specified, all variables would have been used.
select <- c('mpg', 'cyl') # or select <- c(1, 2)
# Specifying 'mpg' as a continuous variable
# Takes column names or numbers.
# If nothing was specified, all variables would have been treated as categorical.
continuous <- 'mpg' # or continuous <- 1
# How should breaks be specified?
# Specifying equal-width discretization with 5 bins for all continuous variables ('mpg')
# breaks <- 5
# Specifying user-defined breakpoints for all continuous variables.
# breaks <- c(10, 15, 25, 30)
# Same thing but only for 'mpg'.
# Here both notations are equivalent because 'mpg' is the only continuous variable.
# This notation is useful if you wish to specify different break points for different variables
# breaks <- list('mpg' = 5)
# breaks <- list('mpg' = c(10, 15, 25, 30))
# Calling lassie
# Not specifying breaks means that the value in default_breaks (4) will be used.
las <- lassie(mtcars, select = c(1, 2), continuous = 1)
# Print local association to console as an array
print(las)
# Print local association and probabilities
# Here only rows having a positive local association are printed
# The data.frame is also sorted by observed probability
print(las, type = 'df', range = c(0, 1), what_sort = 'obs')
# Plot results as heatmap
plot(las)
# Plot observed probabilities using different colors
plot(las, what_x = 'obs', low = 'white', mid = 'grey', high = 'black', text_colour = 'red')
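The two ways of specifying breaks discussed in the example comments (a number of bins or explicit break points) can be tried out directly with base R's cut(), which accepts both forms. This is only a sketch of the two styles of discretization; whether lassie handles interval boundaries and out-of-range values in the same way is an assumption.

```r
# Equal-width discretization of 'mpg' into 5 bins.
mpg_equal <- cut(mtcars$mpg, breaks = 5)
table(mpg_equal)

# User-defined break points. With cut(), values that fall outside
# the outermost break points (here, above 30) become NA.
mpg_custom <- cut(mtcars$mpg, breaks = c(10, 15, 25, 30))
table(mpg_custom, useNA = "ifany")
```

Checking the resulting tables before an analysis helps spot empty bins or observations dropped by break points that do not cover the full range of the variable.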