count_cells: Count number of observations of specific kind


View source: R/missing_values.R

Description

A generic function to count the number of observations of a certain kind (e.g. NAs or extreme values) by cell (e.g. each combination of region and eye-tracking measure).

Usage

count_cells(dat, by, ..., cast.formula, value.name = "count")

Arguments

dat

data.frame containing data to process

by

unquoted column name for the main grouping variable. See "Details".

...

unquoted column names of the other grouping variables. See "Details".

cast.formula

character. A formula for dcast() (from the 'reshape2' package). The LHS is the grouping variable; the RHS defines the cells.

value.name

character. Name of the column containing the counts in the resulting data.frame.

Details

The function assumes that the dat argument has already been subset to contain only the values of interest. E.g., if you want to count NAs, you need to filter out all non-NAs before passing the data set to this function. The package contains two convenience functions that do the subsetting for you (the names are self-explanatory): count_NAs and count_extremes.
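The pre-filtering step described above might look like the following minimal sketch in base R. The toy data frame and its column names (subj, region, measure, value) are assumptions made for illustration; they are not taken from the package.

```r
# Toy long-format data; all column names here are hypothetical.
dat <- data.frame(
  subj    = c("s1", "s1", "s2", "s2"),
  region  = c("critical", "critical", "critical", "spillover"),
  measure = c("tt", "ff", "tt", "tt"),
  value   = c(NA, 250, NA, NA),
  stringsAsFactors = FALSE
)

# Keep only the rows of interest (here: missing values) before counting.
nas_only <- dat[is.na(dat$value), ]
```

A data frame like nas_only is the kind of input the dat argument expects; count_NAs is documented as doing this subsetting for you.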

The by argument would typically contain the subject or item column name. This is the main grouping variable, which will be displayed on the y axis in the summary plots.

The column names in the ... argument define which columns will act as grouping variables for dplyr::group_by, thus defining the smallest subset of the data in which the observations should be counted. In a typical use for eye-tracking data, there will be two components here: the region and measure column names. In general, the values in the first item will be varied slowest, and the values in the last item will be varied fastest.

Thus, e.g., if by = subj and ... = region.col, measure.col, the observations will be counted in each region, in each measure, for each subject.
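To make the grouping concrete, here is a rough base R approximation of counting observations per subject, region and measure. It is only a sketch: it uses aggregate() rather than the dplyr::group_by approach the function is described as using, and the data and column names are hypothetical.

```r
# Hypothetical, already-filtered data: one row per observation to be counted.
obs <- data.frame(
  subj    = c("s1", "s1", "s1", "s2"),
  region  = c("critical", "critical", "spillover", "critical"),
  measure = c("tt", "tt", "tt", "ff"),
  stringsAsFactors = FALSE
)

# Count rows in each subject x region x measure combination.
obs$n <- 1L
counts <- aggregate(n ~ subj + region + measure, data = obs, FUN = sum)
```

Each row of counts corresponds to one smallest subset of the data, i.e. one combination of the by column and the columns that would be passed in ....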

Value

data.frame with 3 columns (default names):

  1. Name of the by argument, typically subj or item; contains the unique values from the corresponding column in dat

  2. cell; contains data subset identifiers. In a typical use, if used with regions and measures, it would be region_measure, e.g. "critical_tt"

  3. count; contains the count of observations in each subset of the data.
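The shape of the result can be illustrated with a small base R sketch. The input counts and the way the cell identifier is pasted together are assumptions based on the description above (region_measure, e.g. "critical_tt"), not the package's actual code.

```r
# Hypothetical per-group counts (subject x region x measure).
counts <- data.frame(
  subj    = c("s1", "s1", "s2"),
  region  = c("critical", "spillover", "critical"),
  measure = c("tt", "tt", "ff"),
  count   = c(2L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Collapse the grouping columns into one "cell" identifier, region_measure.
result <- data.frame(
  subj  = counts$subj,
  cell  = paste(counts$region, counts$measure, sep = "_"),
  count = counts$count,
  stringsAsFactors = FALSE
)
```

Here result has the three columns listed above, with the default names subj, cell and count.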

See Also

count_NAs, count_extremes


antonmalko/ettools documentation built on May 28, 2019, 3:35 p.m.