lassie  R Documentation 
Estimates local (and global) association measures: Ducher's Z, Lewontin's D, pointwise mutual information, normalized pointwise mutual information and chi-squared residuals.
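As an illustration of what these local measures quantify, the following base R sketch computes them cell by cell for a two-way contingency table. It is not the code used by lassie: the textbook definitions, the natural logarithm, the reading of chi-squared residuals as Pearson residuals, and the convention of setting normalized pointwise mutual information to -1 for empty cells are all assumptions made for this example.

```r
# Base R sketch of local association measures for two categorical variables.
# Illustration only; lassie's own estimates may differ in log base or scaling.
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)
n <- sum(tab)

p_obs <- tab / n              # observed joint probabilities p(x, y)
p_x <- rowSums(p_obs)         # marginal probabilities p(x)
p_y <- colSums(p_obs)         # marginal probabilities p(y)
p_exp <- outer(p_x, p_y)      # joint probabilities expected under independence

# Lewontin's D: observed minus expected probability
lewontin_d <- p_obs - p_exp

# Ducher's Z: D divided by the largest magnitude it can reach given the margins
d_max <- outer(p_x, p_y, pmin) - p_exp
d_min <- p_exp - pmax(outer(p_x, p_y, "+") - 1, 0)
ducher_z <- ifelse(lewontin_d >= 0, lewontin_d / d_max, lewontin_d / d_min)

# Pointwise mutual information (-Inf for empty cells) and its normalized form
pmi <- log(p_obs / p_exp)
npmi <- ifelse(p_obs > 0, pmi / -log(p_obs), -1)  # assumed convention: -1 if empty

# Chi-squared (Pearson) residuals computed on counts
expected <- n * p_exp
chisq_res <- (tab - expected) / sqrt(expected)
```

Ducher's Z and normalized pointwise mutual information are bounded between -1 and 1, which makes them easier to compare across cells than the unbounded measures.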
lassie(x, select, continuous, breaks, measure = "chisq", default_breaks = 4)
x 
data.frame or matrix. 
select 
optional vector of column numbers or column names specifying a subset of data to be used. By default, uses all columns. 
continuous 
optional vector of column numbers or column names specifying continuous variables that should be discretized. By default, assumes that every variable is categorical. 
breaks 
numeric vector or list passed on to 
measure 
name of measure to be used:

default_breaks 
default break points for discretizations.
Same syntax as in 
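The two ways of giving break points (a single number of bins, or a vector of cut points) can be illustrated with base R's cut(). This is a stand-in chosen for the example. Whether lassie discretizes in exactly this way, including how it treats values outside the outermost break points, is an assumption.

```r
# Base R stand-in for discretizing a continuous variable.
# Illustration only; not necessarily what lassie does internally.
mpg <- mtcars$mpg

# A single number requests that many equal-width bins.
mpg_equal_width <- cut(mpg, breaks = 5)

# A numeric vector gives explicit break points.
# With cut(), values outside [10, 30] fall in no interval and become NA.
mpg_user_breaks <- cut(mpg, breaks = c(10, 15, 25, 30), include.lowest = TRUE)

table(mpg_equal_width)
table(mpg_user_breaks, useNA = "ifany")
```

With explicit break points it is worth checking that they span the range of the data, since cut() silently returns NA for values that fall outside them.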
An instance of S3 class lassie with the following objects:
data: raw and preprocessed data.frames (see preprocess).
prob: probability arrays (see estimate_prob).
global: global association (see local_association).
local: local association arrays (see local_association).
lassie_params: parameters used in lassie.
Results can be visualized using the plot.lassie and print.lassie methods. plot.lassie is only available in the bivariate case and returns a tile plot representing the probability or local association measure matrix. print.lassie shows an array or a data.frame.
Results can be saved using write.lassie.
The permtest function assesses the significance of local and global association values using p-values estimated by permutations.
The chisqtest function assesses the significance in the case of two-dimensional chi-squared analysis.
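The idea behind permutation-based p-values can be sketched in base R as follows. This is not the permtest implementation: the helper local_d, the choice of Lewontin's D as the measure, the 999 permutations, the two-sided comparison and the add-one correction are all assumptions made for the illustration.

```r
# Base R sketch of permutation p-values for a local association measure.
# Illustration only; permtest may use a different scheme.
set.seed(1)
x <- mtcars$cyl
y <- mtcars$gear

# Hypothetical helper: Lewontin's D for every cell of the two-way table
local_d <- function(x, y) {
  p <- table(x, y) / length(x)
  p - outer(rowSums(p), colSums(p))
}

observed <- local_d(x, y)

# Shuffling y removes any association with x while keeping both margins fixed.
n_perm <- 999
perm <- replicate(n_perm, as.vector(local_d(x, sample(y))))  # cells x permutations

# Share of permutations at least as extreme as the observed value, per cell.
# The added 1 counts the observed table itself, so no p-value is exactly zero.
exceed <- abs(perm) >= abs(as.vector(observed))
p_values <- (rowSums(exceed) + 1) / (n_perm + 1)
p_values <- array(p_values, dim = dim(observed), dimnames = dimnames(observed))
p_values
```

Because one p-value is produced per cell, some form of multiple testing adjustment (for example with p.adjust) is usually appropriate before interpreting individual cells.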
# In this example, we will use the 'mtcars' dataset
# Selecting a subset of mtcars.
# Takes column names or numbers.
# If nothing was specified, all variables would have been used.
select <- c('mpg', 'cyl') # or select <- c(1, 2)
# Specifying 'mpg' as a continuous variable using its column name
# Takes column names or numbers.
# If nothing was specified, all variables would have been treated as categorical.
continuous <- 'mpg' # or continuous <- 1
# How should breaks be specified?
# Specifying equal-width discretization with 5 bins for all continuous variables ('mpg')
# breaks <- 5
# Specifying user-defined breakpoints for all continuous variables.
# breaks <- c(10, 15, 25, 30)
# Same thing but only for 'mpg'.
# Here both notations are equivalent because 'mpg' is the only continuous variable.
# This notation is useful if you wish to specify different break points for different variables
# breaks <- list('mpg' = 5)
# breaks <- list('mpg' = c(10, 15, 25, 30))
# Calling lassie
# Not specifying breaks means that the value in default_breaks (4) will be used.
las <- lassie(mtcars, select = c(1, 2), continuous = 1)
# Print local association to console as an array
print(las)
# Print local association and probabilities
# Here only rows having a positive local association are printed
# The data.frame is also sorted by observed probability
print(las, type = 'df', range = c(0, 1), what_sort = 'obs')
# Plot results as heatmap
plot(las)
# Plot observed probabilities using different colors
plot(las, what_x = 'obs', low = 'white', mid = 'grey', high = 'black', text_colour = 'red')
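Beyond the print and plot methods, the components listed in the return value can be inspected directly. The sketch below assumes that lassie is provided by the zebu package and that the returned object is a list whose elements can be accessed with `$`. Neither is stated on this page, so the code is guarded and only runs when zebu is installed.

```r
# Inspecting the components of a lassie object.
# Assumptions: the function lives in the 'zebu' package and the S3 object
# is a list with the element names given in the return value description.
if (requireNamespace("zebu", quietly = TRUE)) {
  las <- zebu::lassie(mtcars, select = c("mpg", "cyl"), continuous = "mpg")

  names(las)        # expected: data, prob, global, local, lassie_params
  las$global        # global association value
  las$local         # array of local association values
  str(las$prob)     # probability arrays used to derive the measures
}
```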