factorsToDummies (R Documentation)
Utilities for converting back and forth between factors and dummy variables.
xyDataframeToMatrix(xy)
dummiesToInt(dms,inclLast=FALSE)
factorToDummies(f,fname,omitLast=FALSE,factorInfo=NULL)
factorsToDummies(dfr,omitLast=FALSE,factorsInfo=NULL,dfOut=FALSE)
dummiesToFactor(dms,inclLast=FALSE)
charsToFactors(dtaf)
factorTo012etc(f,earlierLevels = NULL)
discretize(x,endpts)
getDFclasses(dframe)
hasCharacters(dfr)
hasFactors(x)
toAllNumeric(w,factorsInfo=NULL)
toSubFactor(f,saveLevels,lumpedLevel="zzzOther")
toSuperFactor(inFactor,superLevels)
dfOut: If TRUE, return a data frame, otherwise a matrix.
dms: Matrix or data frame of dummy columns.
inclLast: When forming a factor from dummies, include the last dummy as a level if this is TRUE.
xy: A data frame intended for prediction, with "Y" in the last column.
saveLevels: In collapsing a factor, which levels to retain.
lumpedLevel: Name of the new level to be created from the levels not retained.
x: A numeric vector, except in hasFactors, where it is a data frame.
endpts: Vector of interval endpoints, used by discretize to partition the range of x.
f: A factor.
inFactor: Original factor, to be extended.
superLevels: New levels to be added to the original factor.
earlierLevels: Previous levels found for this factor.
fname: A factor name.
dfr: A data frame.
w: A data frame.
dframe: A data frame, for which we wish to find the column classes.
omitLast: If TRUE, then generate only k-1 dummies from k factor levels.
factorsInfo: Attribute from the output of factorsToDummies.
factorInfo: Attribute from the output of factorToDummies.
dtaf: A data frame.
Many R users prefer to express categorical data as R factors, or work with data that is already of this type. On the other hand, many regression packages, e.g. lars, disallow factors. These utilities facilitate conversion between the two forms.
Here is an overview of the roles of the various functions:
factorToDummies: Convert one factor to dummies, yielding a matrix of dummies corresponding to that factor.
factorsToDummies: Convert all factors to dummies, yielding a matrix of dummies corresponding to all factors in the input data frame.
dummiesToFactor: Convert a set of related dummies to a factor.
factorTo012etc: Convert a factor to a numeric code, starting at 0.
dummiesToInt: Convert a related set of dummies to a numeric code, starting at 0.
charsToFactors: Convert all character columns in a data frame to factors.
toAllNumeric: Convert all factors in a data frame to dummies, yielding a new version of the data frame, including its original nonfactor columns.
toSubFactor: Coalesce some levels of a factor, yielding a new factor.
toSuperFactor: Add levels to a factor. Typically used in prediction contexts, in which a factor in a data point to be predicted does not have all the levels of the same factor in the training set.
xyDataframeToMatrix: Given a data frame to be used as a training set, with "Y" a factor in the last column, change to all numeric, with dummies in place of all "X" factors and in place of the "Y" factor.
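To make the basic conversions in this overview concrete, here is a minimal base-R sketch. The helpers sketchToDummies, sketchDummiesToFactor, sketchTo012 and sketchLump are hypothetical names invented for illustration; they only approximate the roles of factorToDummies, dummiesToFactor, factorTo012etc and toSubFactor, and are not the package's source. The sketch assumes the full set of k dummies is kept and that dummy columns are named fname.level.

```r
# Hypothetical base-R helpers; NOT the implementations documented on this page.

# factor -> 0/1 integer matrix with one column per level, named "fname.level"
sketchToDummies <- function(f, fname) {
  lvls <- levels(f)
  m <- outer(as.character(f), lvls, "==") * 1L
  colnames(m) <- paste(fname, lvls, sep = ".")
  m
}

# full set of k dummies -> factor; assumes every row contains exactly one 1
sketchDummiesToFactor <- function(dms, fname) {
  lvls <- substring(colnames(dms), nchar(fname) + 2)  # strip the "fname." prefix
  factor(lvls[max.col(dms, ties.method = "first")], levels = lvls)
}

# factor -> integer code starting at 0 (first level -> 0, second -> 1, ...)
sketchTo012 <- function(f) as.integer(f) - 1L

# keep the levels in saveLevels and lump all other levels into lumpedLevel
sketchLump <- function(f, saveLevels, lumpedLevel = "zzzOther") {
  ch <- as.character(f)
  ch[!ch %in% saveLevels] <- lumpedLevel
  factor(ch, levels = c(saveLevels, lumpedLevel))
}

f  <- factor(c("abc", "de", "f", "de"))
d  <- sketchToDummies(f, "x")          # columns x.abc, x.de, x.f
f2 <- sketchDummiesToFactor(d, "x")    # recovers the original values
sketchTo012(f)                         # 0 1 2 1
sketchLump(f, "de")                    # levels: "de", "zzzOther"
```

If only k-1 dummies were kept (omitLast = TRUE), an all-zero row stands for the omitted last level, and sketchDummiesToFactor would misread it; the sketch handles only the full set of k dummies.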
The optional argument factorsInfo is intended for use in prediction contexts. Typically a set of new cases will not have all levels of the factor in the training set. Without this argument, only an incomplete set of dummies would be generated for the set of new cases.
A key point about changing factors to dummies is that, for later prediction after fitting a model on our training set, one needs to use the same transformations. Say a factor has levels 'abc', 'de' and 'f' (and omitLast = FALSE). If we later have a set of, say, two new cases to predict, and their values for this factor are 'de' and 'f', we would generate dummies for them but not for 'abc', which is incompatible with the three dummies used in the training set.
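The mismatch described above can be reproduced with base R alone. This sketch uses stats::model.matrix rather than the functions documented here, and saves the training levels by hand in a variable named trainLevels (a name chosen for illustration).

```r
# Training data: the factor has three levels, 'abc', 'de' and 'f'
train <- factor(c("abc", "de", "f", "de"))
trainLevels <- levels(train)        # save the levels seen at training time

# Two new cases to predict, whose values are only 'de' and 'f'
newCases <- c("de", "f")

# Naive encoding: factor() infers the levels from the new cases alone,
# so the dummy matrix has just two columns (gde, gf)
naive <- model.matrix(~ g - 1, data.frame(g = factor(newCases)))

# Reusing the training levels yields all three columns;
# the 'abc' column is all zeros, but it is present
fixed <- model.matrix(~ g - 1,
                      data.frame(g = factor(newCases, levels = trainLevels)))

ncol(naive)   # 2
ncol(fixed)   # 3
```

Fixing the level set at training time and reapplying it to new data is, roughly, the job that the saved attributes perform for the functions on this page.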
Thus the factor names and levels are saved in attributes, which can later be supplied as input. The relations are as follows:
factorsToDummies calls factorToDummies on each factor it finds in its input data frame.
factorToDummies outputs, and later inputs, factorInfo.
factorsToDummies outputs, and later inputs, factorsInfo.
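A minimal sketch of this attribute mechanism follows. sketchFactorToDummies is a hypothetical function, not the package's source; the fields it stores (fname, omitLast, fullLvls) are copied from the factorInfo attribute shown in the example output on this page, but how the real function uses them is an assumption. The last lines show what goes wrong when the saved information is not passed back in.

```r
# Hypothetical sketch, NOT the package source: encode a factor as dummies and
# attach the information needed to repeat the same encoding on new data.
sketchFactorToDummies <- function(f, fname, omitLast = FALSE, factorInfo = NULL) {
  if (!is.null(factorInfo)) {
    # reuse the name, full level set and omitLast choice saved at training time
    fname    <- factorInfo$fname
    omitLast <- factorInfo$omitLast
    lvls     <- factorInfo$fullLvls
  } else {
    lvls <- levels(f)
  }
  keep <- if (omitLast) lvls[-length(lvls)] else lvls   # k-1 or k dummies
  m <- outer(as.character(f), keep, "==") * 1L
  colnames(m) <- paste(fname, keep, sep = ".")
  attr(m, "factorInfo") <- list(fname = fname, omitLast = omitLast, fullLvls = lvls)
  m
}

x  <- factor(c("abc", "de", "f", "de"))
xd <- sketchFactorToDummies(x, "x", omitLast = TRUE)   # columns x.abc, x.de

w  <- factor(c("de", "abc", "abc"))                     # 'f' never occurs
wd <- sketchFactorToDummies(w, "x", factorInfo = attr(xd, "factorInfo"))
# wd has the same columns as xd: x.abc, x.de

wBad <- sketchFactorToDummies(w, "x", omitLast = TRUE)
# without factorInfo, w's own levels are abc and de, so only x.abc is produced
```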
Other functions:
getDFclasses: Return a vector of the classes of the columns of a data frame.
discretize: Partition the range of a vector into (not necessarily equal-length) intervals, and construct a factor from the labels of the intervals that the input elements fall into.
hasCharacters, hasFactors: Return logical scalars; hasCharacters is TRUE if the input data frame has any character columns, and hasFactors is TRUE if it has any factor columns.
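The utilities in this list, along with charsToFactors from the overview, have short base-R counterparts. The sketch below uses hypothetical names and is not the package's code; in particular, sketchDiscretize assumes the endpoints are used as the breaks of cut(), with intervals closed on the right and the lowest endpoint included, choices made here for illustration that may differ from discretize itself.

```r
# Hypothetical base-R counterparts; NOT the implementations documented here.

# endpts become the breaks of cut(): intervals (e1,e2], (e2,e3], ...;
# include.lowest = TRUE closes the first interval on the left as well.
# Values outside the endpoints become NA.
sketchDiscretize <- function(x, endpts) cut(x, breaks = endpts, include.lowest = TRUE)

# class of each column (first class only, e.g. "POSIXct" for date-times)
sketchDFclasses <- function(dframe)
  vapply(dframe, function(col) class(col)[1], character(1))

# TRUE if any column is character / factor
sketchHasChars   <- function(dfr) any(vapply(dfr, is.character, logical(1)))
sketchHasFactors <- function(x)   any(vapply(x, is.factor, logical(1)))

# convert every character column to a factor, leaving other columns unchanged
sketchCharsToFactors <- function(dtaf) {
  isChar <- vapply(dtaf, is.character, logical(1))
  dtaf[isChar] <- lapply(dtaf[isChar], factor)
  dtaf
}

xf <- sketchDiscretize(c(0.5, 2, 3.7, 9), c(0, 1, 5, 10))
levels(xf)        # "[0,1]" "(1,5]" "(5,10]"
as.integer(xf)    # 1 2 2 3

d  <- data.frame(age = c(30, 41), city = c("Davis", "Reno"),
                 stringsAsFactors = FALSE)
sketchDFclasses(d)            # age: "numeric", city: "character"
sketchHasChars(d)             # TRUE
d2 <- sketchCharsToFactors(d)
sketchHasFactors(d2)          # TRUE
```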
The function factorToDummies returns a matrix of dummy variables, while factorsToDummies returns a new version of the input data (a matrix by default, or a data frame if dfOut = TRUE), in which each factor is replaced by columns of dummies. The function factorToDummies is similar, but changes character vectors to factors.
Norm Matloff
x <- factor(c('abc','de','f','de'))
xd <- factorToDummies(x,'x',omitLast=TRUE)
xd
#      x.abc x.de
# [1,]     1    0
# [2,]     0    1
# [3,]     0    0
# [4,]     0    1
# attr(,"factorInfo")
# attr(,"factorInfo")$fname
# [1] "x"
#
# attr(,"factorInfo")$omitLast
# [1] TRUE
#
# attr(,"factorInfo")$fullLvls
# [1] "abc" "de" "f"

w <- factor(c('de','abc','abc'))
wd <- factorToDummies(w,'x',factorInfo=attr(xd,'factorInfo'))
wd
#      x.abc x.de
# [1,]     0    1
# [2,]     1    0
# [3,]     1    0
# attr(,"factorInfo")
# attr(,"factorInfo")$fname
# [1] "x"
#
# attr(,"factorInfo")$omitLast
# [1] TRUE
#
# attr(,"factorInfo")$fullLvls
# [1] "abc" "de" "f"