FactorUtils: R Factor Utilities

R Factor Utilities

Description

Utilities to manipulate R factors, extending the ones in regtools.

Usage

levelCounts(data)
dataToTopLevels(data,lowCountThresholds)
factorToTopLevels(f,lowCountThresh=0)
cartesianFactor(dataName,factorNames,fNameSep = ".")
qeRareLevels(x, yName, yesYVal = NULL)

Arguments

data

A data frame or equivalent.

f

An R factor.

lowCountThresh

Factor levels with counts below this value will not be used for this factor.

lowCountThresholds

An R list of column names and their corresponding values of lowCountThresh.

dataName

A quoted name of a data frame or equivalent.

factorNames

A vector of R factor names.

fNameSep

A character to be used as a delimiter in the names of the levels of the output factor.

x

A data frame.

yName

Quoted name of the response variable.

yesYVal

In the case of binary Y, the factor level to be considered positive.

Details

Often an R factor has one or more levels that are rare in the data. This can cause problems, for example in cross-validation: a level in the test set might be "new," not having appeared in the training set. To address this, factorToTopLevels removes rare levels from a factor; dataToTopLevels applies this to an entire data frame.
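
As a rough base-R illustration of the idea (not the package's implementation), the sketch below keeps only levels whose counts reach a threshold. The helper name dropRareLevels is hypothetical, and the choice to turn observations of dropped levels into NA is an assumption made for this sketch; factorToTopLevels may handle them differently.

```r
# Hypothetical helper, NOT part of qeML: keep only levels whose count is
# at least lowCountThresh; observations of dropped levels become NA.
dropRareLevels <- function(f, lowCountThresh = 0) {
  counts <- table(f)
  keep <- names(counts)[counts >= lowCountThresh]
  factor(f, levels = keep)  # values not in 'keep' are set to NA
}

f <- factor(c("a", "a", "a", "b", "b", "c"))  # level "c" occurs only once
g <- dropRareLevels(f, lowCountThresh = 2)
levels(g)       # "a" "b"
sum(is.na(g))   # 1
```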

To help identify such levels, the function levelCounts simply applies table() to each column of data, returning the result as an R list. (If there are more than 10 levels, it returns NA.)
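
The core of this step can be sketched in base R with lapply() and table(). The toy data frame below is made up for illustration, and the sketch omits the more-than-10-levels check, so it is only an approximation of what levelCounts returns.

```r
# Toy data frame (illustrative values only)
df <- data.frame(
  gender = factor(c("male", "female", "male", "male")),
  occ    = factor(c("100", "101", "100", "102"))
)

# Apply table() to each column; the result is a named list of count tables
counts <- lapply(df, table)
counts$gender   # counts for "female" and "male"
counts$occ      # counts for "100", "101", "102"
```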

The function cartesianFactor generates a "superfactor" from individual ones; e.g. if factors f1 and f2 have n1 and n2 levels, the output is a new factor with n1 * n2 levels.
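
Base R's interaction() builds a similar cross-product factor, which may help in picturing the output. The sketch below uses made-up factors; it illustrates the n1 * n2 level count and the "." separator, and does not show how cartesianFactor is implemented.

```r
f1 <- factor(c("female", "male", "female", "male"))  # 2 levels
f2 <- factor(c("100", "101", "102", "100"))          # 3 levels

# Cross the two factors; all level combinations are kept by default
z <- interaction(f1, f2, sep = ".")
nlevels(z)          # 6, i.e. 2 * 3
as.character(z[1])  # "female.100"
```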

The function qeRareLevels checks each column of a data frame to see whether it is an R factor with rare levels.
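
A check of this general kind can be sketched in base R as follows. The helper rareLevelsByColumn and its minCount argument are hypothetical and are not part of qeML; qeRareLevels has a different interface (it also takes yName and yesYVal) and may report its findings in another form.

```r
# Hypothetical helper, NOT qeML's qeRareLevels: for each factor column,
# list the levels that occur fewer than minCount times.
rareLevelsByColumn <- function(x, minCount = 2) {
  isFac <- vapply(x, is.factor, logical(1))
  lapply(x[isFac], function(col) {
    counts <- table(col)
    names(counts)[counts < minCount]
  })
}

df <- data.frame(
  g = factor(c("a", "a", "b")),  # "b" is rare
  h = factor(c("u", "u", "u")),  # no rare levels
  n = c(1, 2, 3)                 # not a factor, so skipped
)
rare <- rareLevelsByColumn(df)
rare$g   # "b"
rare$h   # character(0)
```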

Author(s)

Norm Matloff

Examples


library(qeML)
data(svcensus)
levelCounts(svcensus)  # e.g. finds there are 15182 men, 4908 women
f1 <- svcensus$gender  # 2 levels
f2 <- svcensus$occ  # 6 levels
z <- cartesianFactor('svcensus',c('gender','occ'))
head(z)
# [1] female.102 male.101   female.102 male.100   female.100 male.100  
# 12 Levels: female.100 female.101 female.102 female.106 ... male.141


matloff/qeML documentation built on Dec. 15, 2024, 10:15 a.m.