zPatterns: Find and display patterns of zeros/missing values in a data...


zPatterns    R Documentation

Find and display patterns of zeros/missing values in a data set

Description

This function summarises the patterns of zero and/or missing values in a data set and returns a vector of pattern numbers.

Usage

zPatterns(X, label = NULL, plot = TRUE,
             axis.labels = c("Component", "Pattern ID"),
             bar.ordered = as.character(c(FALSE,FALSE)),
             bar.colors = c("red3", "red3"), bar.labels = FALSE,
             show.means = FALSE, round.means = 2, cex.means = 1,
             type.means = c("cgm","am"),
             cell.colors = c("dodgerblue", "white"),
             cell.labels = c(label, paste("No", label)), cex.axis = 1.1,
             grid.color = "black", grid.lty = "dotted",
             legend = TRUE, suppress.print = FALSE, ...)

Arguments

X

Data set (matrix or data.frame class).

label

Unique label (numeric or character) used to identify zeros/unobserved values in X.

plot

Logical value indicating whether a graphical summary of the patterns is produced or not (default plot=TRUE).

axis.labels

Vector of axis labels for the table of patterns (format c("x-axis","y-axis")).

bar.ordered

Vector of two logical values indicating whether the table of patterns is ordered by frequency. The first element refers to the patterns and the second to the components, so either or both can be ordered (default c(FALSE,FALSE)).

bar.colors

Colors for the margin barplots (format c("col.top","col.right")).

bar.labels

Logical value indicating if labels showing percentages must be added to the margin barplots (default bar.labels=FALSE).

show.means

Logical value indicating if mean values by pattern are shown on the graphical summary table (default show.means=FALSE).

round.means

When show.means=TRUE, number of decimal places for the mean values shown (default round.means=2).

cex.means

When show.means=TRUE, numeric character expansion factor controlling the text size of the mean values shown (default cex.means=1).

type.means

When show.means=TRUE, statistic used for computing the means. Either compositional geometric mean (type.means="cgm", in percentage units, default) or standard arithmetic mean (type.means="am").

cell.colors

Vector of colors for the table cells (format c("col.unobserved","col.observed")).

cell.labels

Labels for the cells (format c("Unobserved","Observed"), default c(label,paste("No",label))).

cex.axis

Axis labels scaling factor relative to default.

grid.color

Color of the grid lines (default "black").

grid.lty

Style of the grid lines (default "dotted", see lty in par).

legend

Logical value indicating if a legend must be included (default legend=TRUE).

suppress.print

Suppress printed feedback (default suppress.print=FALSE).

...

Other graphical parameters.
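The two statistics offered by type.means can be contrasted with a minimal base R sketch. It assumes the usual definition of the compositional geometric mean (column-wise geometric means rescaled to sum to 100); the helper names cgm and am and the toy data are hypothetical, and this is not the package's internal code.

```r
# Toy composition (hypothetical data): rows are samples, columns are parts.
X <- matrix(c(1,  4,  5,
              4, 16, 20),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("A", "B", "C")))

# Compositional geometric mean: column-wise geometric means,
# closed (rescaled) so that the parts sum to 100.
# Assumes strictly positive, fully observed entries.
cgm <- function(X) {
  g <- exp(colMeans(log(X)))
  100 * g / sum(g)
}

# Standard arithmetic mean by column, for comparison.
am <- function(X) colMeans(X)

cgm_values <- round(cgm(X), 2)
am_values  <- round(am(X), 2)
print(cgm_values)
print(am_values)
```

The geometric version depends only on the ratios between parts, which is why it is the default choice for compositional data.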

Value

Vector (factor type) of pattern IDs corresponding to each row of X.

By default, a summary table is printed showing the patterns in the data according to label, together with some summary statistics: number of zero/missing components by pattern (No.Unobs), pattern frequency as a percentage, percentage of zero/missing values by component (column), and overall percentage of zero/missing values in the data set. The symbols + and - indicate, respectively, zero/missing and observed components within each pattern.

When plot=TRUE, a graphical version of the summary table is also produced. It includes barplots on the margins displaying the percentage of zero/missing values and, if show.means=TRUE, the compositional geometric means by pattern (expressed in percentage scale). Standard arithmetic means can also be shown for ordinary data (type.means="am"), but this is not recommended for compositional data.
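The quantities in the printed summary can be reproduced by hand on a small example. The following base R sketch uses hypothetical toy data and variable names and only illustrates the statistics described above; it is not the function's actual implementation, and its output layout differs from what zPatterns prints.

```r
# Toy data (hypothetical): 0 marks an unobserved value.
X <- data.frame(a = c(1, 0, 3, 0),
                b = c(2, 5, 0, 4),
                c = c(0, 1, 2, 1))
label <- 0

# TRUE where a cell matches the label (for label = NA, is.na() would be needed instead).
unobs <- as.matrix(X) == label

# One "+"/"-" string per row: "+" = zero/missing, "-" = observed.
keys <- apply(unobs, 1, function(r) paste(ifelse(r, "+", "-"), collapse = ""))

# Per-pattern statistics.
patterns   <- unique(keys)
no_unobs   <- nchar(gsub("-", "", patterns, fixed = TRUE))  # unobserved parts per pattern
pattern_pc <- 100 * as.vector(table(factor(keys, levels = patterns))) / nrow(X)

# Percentage unobserved by component and overall.
comp_pc    <- 100 * colMeans(unobs)
overall_pc <- 100 * mean(unobs)

print(data.frame(Pattern = patterns, No.Unobs = no_unobs, Percent = pattern_pc))
print(comp_pc)
print(overall_pc)
```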

Each pattern is assigned an ID number and, by default, the patterns are arranged in the table in the order in which they are found in the data set. The argument bar.ordered can be used to rearrange the display according to the frequencies of patterns, of unobserved values by component, or both.
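As a rough illustration of this numbering and ordering, the base R sketch below assigns IDs by first appearance and then derives a frequency-based display order. The toy data and variable names are hypothetical, and sorting in decreasing order is an assumption rather than something stated here.

```r
# Toy data (hypothetical): 0 marks an unobserved value.
X <- matrix(c(1, 2, 0,
              0, 5, 1,
              3, 0, 2,
              0, 4, 1,
              0, 7, 3),
            ncol = 3, byrow = TRUE)
unobs <- X == 0

# One "+"/"-" string per row describing its pattern.
keys <- apply(unobs, 1, function(r) paste(ifelse(r, "+", "-"), collapse = ""))

# IDs follow the order in which each pattern first appears in the data.
pattern_id <- factor(match(keys, unique(keys)))

# Pattern IDs sorted by decreasing frequency (assumed direction).
by_pattern <- names(sort(table(pattern_id), decreasing = TRUE))

# Column indices sorted by decreasing share of unobserved values (assumed direction).
by_component <- order(colMeans(unobs), decreasing = TRUE)

print(pattern_id)
print(by_pattern)
print(by_component)
```

Because the result is a factor with one entry per row, it can be used directly for subsetting, for example X[pattern_id == 2, ].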

A warning message is shown if zeros or NA values not identified by label are present in the data set. These are ignored in the graphical display and the numerical summaries of patterns, which are based only on label.
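The label-only behaviour can be mimicked with a small base R helper. This is a sketch under stated assumptions: flag_unobserved, its warning text, and the toy data are all hypothetical and are not taken from the package.

```r
# Toy data (hypothetical) containing both zeros and NA values.
X <- data.frame(a = c(1, 0, NA, 0),
                b = c(3, 4, 5, 0),
                c = c(NA, 1, 2, 3))

# Hypothetical helper: flag only the cells matching `label`
# and warn when other zeros or NA values are present.
flag_unobserved <- function(X, label) {
  X <- as.matrix(X)
  is_zero <- !is.na(X) & X == 0
  if (is.na(label)) {
    flagged <- is.na(X)
    if (any(is_zero)) warning("zeros not identified by label are ignored")
  } else {
    flagged <- !is.na(X) & X == label
    if (anyNA(X) || any(is_zero & !flagged)) {
      warning("zeros or NA values not identified by label are ignored")
    }
  }
  flagged
}

zero_only <- suppressWarnings(flag_unobserved(X, label = 0))
na_only   <- suppressWarnings(flag_unobserved(X, label = NA))

print(sum(zero_only))  # cells counted when label = 0
print(sum(na_only))    # cells counted when label = NA
```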

See the 'plus' functions (e.g. lrEMplus, multReplus) to deal with zeros and missing data simultaneously.

References

Palarea-Albaladejo J. and Martin-Fernandez JA. zCompositions – R package for multivariate imputation of left-censored data under a compositional approach. Chemometrics and Intelligent Laboratory Systems 2015; 143: 85-96.

See Also

lrEM, lrEMplus, lrDA, multRepl, multReplus, multLN, multKM, cmultRepl

Examples

data(LPdata)

pattern.ID <- zPatterns(LPdata,label=0)

LPdata[pattern.ID==5,]
LPdata[pattern.ID==7,]
LPdata[pattern.ID==10,]

# Modify cell labels and show percentages along with barplots
pattern.ID <- zPatterns(LPdata,label=0,
              cell.labels=c("Zero","Non-zero"),bar.labels=TRUE)

# Show compositional geometric means (in %) per zero pattern
zPatterns(LPdata,label=0,show.means=TRUE)

# Same but ordered by pattern frequency and incidence of zeros by component
zPatterns(LPdata,label=0,bar.ordered=c(TRUE,TRUE),bar.labels=TRUE,show.means=TRUE)

# Data set with zeros and missing data (0 = zero; NA = missing) (see lrEMplus function).
data(LPdataZM)

# Show missingness patterns only
zPatterns(LPdataZM,label=NA)

# Show zero patterns only and means by pattern based on available data
# (blanks indicate not enough data available for computation)
zPatterns(LPdataZM,label=0,show.means=TRUE)

zCompositions documentation built on June 22, 2024, 9:46 a.m.