distr.table.xy: Analysis of a bivariate distribution using cross-tables

distr.table.xy    R Documentation

Analysis of a bivariate distribution using cross-tables

Description

distr.table.xy() displays tables of joint or conditional distributions.

Usage

distr.table.xy(
  x,
  y,
  freq = "counts",
  freq.type = "joint",
  total = TRUE,
  breaks.x,
  breaks.y,
  adj.breaks = TRUE,
  interval.x = FALSE,
  interval.y = FALSE,
  f.digits = 2,
  p.digits = 0,
  force.digits = FALSE,
  data,
  ...
)

Arguments

x, y

Unquoted names of the variables whose joint distribution is to be analysed. x and y can each be the name of a vector or a factor in the workspace, or the name of a column in the data frame specified in the data argument. Note that in the table x is displayed on the rows and y on the columns.

freq

A character vector specifying the set of frequencies to be displayed (more than one option can be specified). Allowed options (possibly abbreviated) are "counts", "percentages" and "proportions".

freq.type

A character vector specifying the types of frequencies to be displayed (more than one type can be specified). Allowed options are "joint" (default) for joint frequencies, "x|y" (or "column") for the distributions of x conditional on y, and "y|x" (or "row") for the distributions of y conditional on x.
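The three frequency types can be illustrated with base R alone. The following is a minimal sketch on made-up toy vectors (not taken from MktDATA); it uses table() and prop.table() to show what joint, x|y and y|x frequencies mean, and is not meant to reproduce how distr.table.xy() computes or formats its output.

```r
# Toy data (hypothetical, for illustration only)
x <- c("a", "a", "b", "b", "b", "c")
y <- c("u", "v", "u", "u", "v", "v")

# Joint counts: x on the rows, y on the columns
joint <- table(x, y)

# Joint proportions: each cell divided by the grand total
joint.prop <- prop.table(joint)

# x | y ("column"): each column is a distribution of x and sums to 1
x.given.y <- prop.table(joint, margin = 2)

# y | x ("row"): each row is a distribution of y and sums to 1
y.given.x <- prop.table(joint, margin = 1)

print(joint)
print(round(x.given.y, 2))
print(round(y.given.x, 2))
```

Multiplying any of the proportion tables by 100 gives the corresponding percentages.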

total

Logical value indicating whether the sum of the requested frequencies should be added to the table; defaults to TRUE.
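As a rough base-R analogue for joint counts, addmargins() appends row and column totals to a cross-table. The sketch below uses made-up toy vectors and only illustrates the idea of adding totals; the exact layout and labels produced by distr.table.xy() may differ.

```r
# Toy data (hypothetical, for illustration only)
x <- c("a", "a", "b", "b", "b", "c")
y <- c("u", "v", "u", "u", "v", "v")

joint <- table(x, y)

# Append a "Sum" row and a "Sum" column holding the marginal totals
with.totals <- addmargins(joint)
print(with.totals)
```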

breaks.x, breaks.y

Allow classifying the variables x and/or y, if numerical, into intervals. Each can be an integer indicating the number of equal-width intervals used to classify x and/or y, or a vector of increasing numeric values defining the endpoints of the intervals (closed on the left and open on the right; the last interval is also closed on the right). To cover the entire range of values taken by a variable, its minimum and maximum values must lie between the first and the last break. Breaks covering only a portion of the variable's range may also be specified.
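The interval convention described above (closed on the left, last interval closed on both sides) matches what base R's cut() produces with right = FALSE and include.lowest = TRUE. The following sketch uses made-up values and is only an illustration of that convention; it is an assumption that this mirrors the classification done internally, and the way distr.table.xy() reports values falling outside the breaks may differ from the NA that cut() returns.

```r
# Hypothetical numeric values and breaks, for illustration only
v <- c(0, 3, 5, 9.9, 10, 20)
breaks <- c(0, 5, 10, 20)

# right = FALSE: intervals closed on the left and open on the right
# include.lowest = TRUE (with right = FALSE): the last interval is
# also closed on the right, so the maximum break is included
classes <- cut(v, breaks = breaks, right = FALSE, include.lowest = TRUE)
print(table(classes))

# Breaks covering only part of the range: in base R, values outside
# the breaks are classified as NA
partial <- cut(v, breaks = c(5, 10, 20), right = FALSE, include.lowest = TRUE)
print(sum(is.na(partial)))
```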

adj.breaks

Logical value indicating whether the endpoints of the intervals of a numerical variable (x or y) classified into intervals should be displayed without scientific notation; defaults to TRUE.

interval.x, interval.y

Logical values indicating whether x and/or y are variables measured in classes (TRUE). If the detected intervals are not consistent (e.g. overlapping intervals, or intervals whose upper endpoint is lower than the lower one), the variable is tabulated as it is, even if results are not necessarily consistent; both default to FALSE.

f.digits, p.digits

Integer values specifying the number of decimals used to round proportions (default: f.digits = 2) and percentages (default: p.digits = 0), respectively. If the chosen rounding would display some non-zero values as zero, the number of decimals is increased so that all values have at least one significant digit, unless the argument force.digits is set to TRUE.

force.digits

Logical value indicating whether proportions and percentages should be rounded to exactly the number of decimals specified in f.digits and p.digits, even if non-zero values are then rounded to zero; defaults to FALSE.
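The interplay between the digits arguments and force.digits can be sketched as follows. The helper round_nonzero() is hypothetical (it is not part of UBStats) and its arguments digits, force and max.digits are illustrative names; it only mimics the documented behaviour of adding decimals until no non-zero value is shown as zero, and the package's actual rule may differ in detail.

```r
# Hypothetical helper, not part of UBStats: round to `digits` decimals and,
# unless `force` is TRUE, add decimals until no non-zero value rounds to zero
round_nonzero <- function(p, digits = 2, force = FALSE, max.digits = 10) {
  if (!force) {
    while (digits < max.digits &&
           any(p != 0 & round(p, digits) == 0, na.rm = TRUE)) {
      digits <- digits + 1
    }
  }
  round(p, digits)
}

props <- c(0.004, 0.496, 0.5)   # made-up proportions

# Two decimals would turn 0.004 into 0, so three decimals are used
print(round_nonzero(props))

# Forced rounding keeps two decimals; 0.004 is shown as 0
print(round_nonzero(props, force = TRUE))
```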

data

An optional data frame containing x and/or y. If not found in data, the variables are taken from the environment from which distr.table.xy() is called.

...

Additional arguments to be passed to low level functions.

Value

A list whose elements are the requested tables (converted to data frames). Each table lists the values taken by the two variables, arranged in standard order (logical, alphabetical or numerical order for vectors, order of levels for factors, ordered intervals for classified variables or for variables measured in classes), together with the specified joint or conditional frequencies.

Author(s)

Raffaella Piccarreta raffaella.piccarreta@unibocconi.it

See Also

distr.plot.xy() for plotting a bivariate distribution.

distr.table.x() for tabulating a univariate distribution.

distr.plot.x() for plotting a univariate distribution.

Examples

data(MktDATA, package = "UBStats")

# Character vectors, factors, and discrete numeric vectors
# - Default: joint counts
distr.table.xy(LikeMost, Children, data = MktDATA) 

# - Joint and conditional distribution of x|y
#   counts and proportions, no totals
distr.table.xy(LikeMost, Education, freq = c("counts","Prop"), 
               freq.type = c("joint","x|y"), total = FALSE,
               data = MktDATA)
# - Joint and conditional row and column distributions (%) 
distr.table.xy(CustClass, Children, freq = "Percentages", 
               freq.type = c("joint","row","column"),
               data = MktDATA)

# Numerical variables classified or measured in classes
# - A numerical variable classified into intervals 
#   and a factor
distr.table.xy(CustClass, TotPurch, 
               breaks.y = c(0,5,10,15,20,35),
               freq = c("Counts","Prop"), freq.type = "y|x", 
               data = MktDATA)

# - Two numerical variables, one measured in classes
#   and the other classified into intervals 
distr.table.xy(Income.S, TotPurch, interval.x = TRUE,
               breaks.y = c(0,5,10,15,20,35),
               freq = c("Counts","Prop"), 
               freq.type = c("row","col"), data = MktDATA)

# Argument force.digits
# - Default: manages possible excess of rounding
distr.table.xy(CustClass, Children, freq = "Percentages",
               freq.type = "x|y", data = MktDATA)
# - Force to the required rounding
distr.table.xy(CustClass, Children, freq = "Percentages", 
               freq.type = c("x|y"), 
               force.digits = TRUE, data = MktDATA)

# Store the list of requested tables in an object
tables.xy <- distr.table.xy(Income.S, TotPurch,
                          interval.x = TRUE,
                          breaks.y = c(0,5,10,15,20,35),
                          freq = c("Counts","Prop"), 
                          freq.type = c("joint","row","col"), 
                          data = MktDATA)


UBStats documentation built on Sept. 11, 2024, 6:52 p.m.