factorsDummies: Factor Conversion Utilities

factorsToDummies    R Documentation

Factor Conversion Utilities

Description

Utilities for converting back and forth between factors and dummy variables.

Usage

xyDataframeToMatrix(xy)
dummiesToInt(dms,inclLast=FALSE)
factorToDummies(f,fname,omitLast=FALSE,factorInfo=NULL)
factorsToDummies(dfr,omitLast=FALSE,factorsInfo=NULL,dfOut=FALSE)
dummiesToFactor(dms,inclLast=FALSE) 
charsToFactors(dtaf)
factorTo012etc(f,earlierLevels = NULL)
discretize(x,endpts)
getDFclasses(dframe)
hasCharacters(dfr)
hasFactors(x)
toAllNumeric(w,factorsInfo=NULL)
toSubFactor(f,saveLevels,lumpedLevel="zzzOther")
toSuperFactor(inFactor,superLevels)

Arguments

dfOut

If TRUE, return a data frame, otherwise a matrix.

dms

Matrix or data frame of dummy columns.

inclLast

When forming a factor from dummies, include the last dummy as a level if this is TRUE.

xy

A data frame intended for prediction, with "Y" in the last column.

saveLevels

In collapsing a factor, which levels to retain.

lumpedLevel

Name of new level to be created from levels not retained.

x

A numeric vector, except in hasFactors, where it is a data frame.

endpts

Vector to be used as breaks in the call to cut. To avoid NAs, the range of endpts must cover the range of the input vector x.

f

A factor.

inFactor

Original factor, to be extended.

superLevels

New levels to be added to the original factor.

earlierLevels

Previous levels found for this factor.

fname

A factor name.

dfr

A data frame.

w

A data frame.

dframe

A data frame, for which we wish to find the column classes.

omitLast

If TRUE, then generate only k-1 dummies from k factor levels.

factorsInfo

Attribute from output of factorsToDummies.

factorInfo

Attribute from output of factorToDummies.

dtaf

A data frame.

Details

Many R users prefer to express categorical data as R factors, or often work with data that is of this type to begin with. On the other hand, many regression packages, e.g. lars, disallow factors. These utilities facilitate conversion from one form to another.

Here is an overview of the roles of the various functions:

  • factorToDummies: Convert one factor to dummies, yielding a matrix of dummies corresponding to that factor.

  • factorsToDummies: Convert all factors to dummies, yielding a matrix of dummies, corresponding to all factors in the input data frame.

  • dummiesToFactor: Convert a set of related dummies to a factor.

  • factorTo012etc: Convert a factor to a numeric code, starting at 0.

  • dummiesToInt: Convert a related set of dummies to a numeric code, starting at 0.

  • charsToFactors: Convert all character columns in a data frame to factors.

  • toAllNumeric: Convert all factors in a data frame to dummies, yielding a new version of the data frame, including its original nonfactor columns.

  • toSubFactor: Coalesce some levels of a factor, yielding a new factor.

  • toSuperFactor: Add levels to a factor. Typically used in prediction contexts, in which a factor in a data point to be predicted does not have all the levels of the same factor in the training set.

  • xyDataframeToMatrix: Given a data frame to be used in a training set, with "Y" a factor in the last column, change to all numeric, with dummies in place of all "X" factors and in place of the "Y" factor.
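
As a rough illustration of what several of these conversions amount to, here is a base R sketch. It does not call regtools and is not the package's implementation; the helper names toDummiesSketch and dummiesToCodeSketch are hypothetical. It builds one 0/1 column per level, recovers a 0-based integer code from the dummies, rebuilds the factor, and then lumps unretained levels into a single new level.

```r
# Hypothetical base R helpers; NOT the regtools implementation.

# One 0/1 column per level, with columns named fname.level
toDummiesSketch <- function(f, fname) {
  lvls <- levels(f)
  dms <- sapply(lvls, function(lvl) as.integer(f == lvl))
  matrix(dms, nrow = length(f),
         dimnames = list(NULL, paste(fname, lvls, sep = ".")))
}

# 0-based code: index of the column holding the 1, minus 1
dummiesToCodeSketch <- function(dms) {
  max.col(dms, ties.method = "first") - 1L
}

f <- factor(c("abc", "de", "f", "de"))
dms <- toDummiesSketch(f, "x")        # 4 x 3 matrix of 0s and 1s
codes <- dummiesToCodeSketch(dms)     # same codes as as.integer(f) - 1L

# Rebuild the factor from the codes and the known levels
f2 <- factor(levels(f)[codes + 1L], levels = levels(f))

# Collapse levels: keep "a" and "b", lump everything else together
g <- factor(c("a", "b", "c", "d", "a"))
keep <- c("a", "b")
gChar <- as.character(g)
gChar[!(gChar %in% keep)] <- "zzzOther"
gSub <- factor(gChar)
```

The round trip only works because the full set of levels is known when the factor is rebuilt, which is the same information the package stores in its attributes.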

The optional argument factorsInfo is intended for use in prediction contexts. Typically a set of new cases will not have all levels of the factor in the training set. Without this argument, only an incomplete set of dummies would be generated for the set of new cases.

A key point about changing factors to dummies is that, for later prediction after fitting a model in our training set, one needs to use the same transformations. Say a factor has levels 'abc', 'de' and 'f' (and omitLast = FALSE). If we later have a set of say two new cases to predict, and their values for this factor are 'de' and 'f', we would generate dummies for them but not for 'abc', incompatible with the three dummies used in the training set.

Thus the factor names and levels are saved in attributes of the output, and can be supplied as input in later calls. The relations are as follows:

  • factorsToDummies calls factorToDummies on each factor it finds in its input data frame

  • factorToDummies outputs and later inputs factorInfo

  • factorsToDummies outputs and later inputs factorsInfo
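
The level-mismatch problem described above can be reproduced in base R alone. The following is a minimal sketch, not regtools code; the variable names are illustrative. It shows that building dummies from the new cases alone gives two columns, while re-leveling the new cases with the saved training levels gives the three columns the fitted model expects.

```r
# Base R sketch of the level-mismatch problem; NOT regtools code.
train <- factor(c("abc", "de", "f", "de"))
newx  <- factor(c("de", "f"))     # new cases lack the level "abc"

# Naive: levels are taken from the new data alone, so only 2 columns
naive <- sapply(levels(newx), function(lvl) as.integer(newx == lvl))

# Consistent: impose the saved training levels before making dummies
savedLvls <- levels(train)
newx2 <- factor(as.character(newx), levels = savedLvls)
consistent <- sapply(savedLvls, function(lvl) as.integer(newx2 == lvl))

ncol(naive)        # 2
ncol(consistent)   # 3; the "abc" column is all zeros
```

Saving the levels at training time and reusing them at prediction time is the role that factorInfo and factorsInfo play in the functions documented here.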

Other functions:

  • getDFclasses: Return a vector of the classes of the columns of a data frame.

  • discretize: Partition range of a vector into (not necessarily equal-length) intervals, and construct a factor from the labels of the intervals that the input elements fall into.

  • hasCharacters, hasFactors: Each returns a logical scalar, TRUE if the input data frame has any character columns or any factor columns, respectively.
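
The base R operations behind these helpers can be sketched as follows. This is an illustration under the assumption that the breaks are passed to cut with its default settings; it is not the regtools source, and the variable names are made up for the example. It shows why the breaks must cover the range of the input, and how column classes can be inspected.

```r
# Base R sketch; NOT the regtools source.
x <- c(1.5, 3.2, 7.8, 9.9)

# Breaks covering range(x): every element falls into some interval
ok <- cut(x, breaks = c(0, 2, 5, 10))

# Breaks that stop short of max(x): 9.9 falls outside and becomes NA
short <- cut(x, breaks = c(0, 2, 5, 8))

# Column classes, and checks for factor or character columns
dfr <- data.frame(u = x, g = ok, s = c("a", "b", "a", "b"),
                  stringsAsFactors = FALSE)
classes    <- sapply(dfr, class)
anyFactors <- any(sapply(dfr, is.factor))
anyChars   <- any(sapply(dfr, is.character))
```

Note that with the defaults of cut, intervals are open on the left, so a value exactly equal to the smallest break would also produce an NA.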

Value

The function factorToDummies returns a matrix of dummy variables, while factorsToDummies returns a new version of the input data, as a matrix or, if dfOut is TRUE, as a data frame, in which each factor is replaced by columns of dummies. The function charsToFactors is similar in that it returns a new version of its input data frame, but it changes character columns to factors.

Author(s)

Norm Matloff

Examples

# omitLast=TRUE: 3 levels yield 2 dummies; the last level 'f' is coded as all 0s
x <- factor(c('abc','de','f','de'))
xd <- factorToDummies(x,'x',omitLast=TRUE)
xd
#      x.abc x.de
# [1,]     1    0
# [2,]     0    1
# [3,]     0    0
# [4,]     0    1
# attr(,"factorInfo")
# attr(,"factorInfo")$fname
# [1] "x"
# 
# attr(,"factorInfo")$omitLast
# [1] TRUE
# 
# attr(,"factorInfo")$fullLvls
# [1] "abc" "de"  "f"  
# new cases lacking level 'f'; reuse the saved factorInfo so the columns match xd
w <- factor(c('de','abc','abc'))
wd <- factorToDummies(w,'x',factorInfo=attr(xd,'factorInfo'))
wd
#      x.abc x.de
# [1,]     0    1
# [2,]     1    0
# [3,]     1    0
# attr(,"factorInfo")
# attr(,"factorInfo")$fname
# [1] "x"
# 
# attr(,"factorInfo")$omitLast
# [1] TRUE
# 
# attr(,"factorInfo")$fullLvls
# [1] "abc" "de"  "f"  


regtools documentation built on March 31, 2022, 1:06 a.m.