Factor-class: Factor objects

Factor-class    R Documentation

Factor objects

Description

The Factor class plays a role similar to that of factor in base R (a.k.a. ordinary factor), except that the levels of a Factor object can be any vector-like object: an ordinary vector, a Vector derivative, or even an ordinary factor or another Factor object.

A notable difference with ordinary factors is that Factor objects cannot contain NAs, at least for now.

Usage

Factor(x, levels, index=NULL, ...)  # constructor function

Arguments

x, levels

At least one of x and levels must be specified. If index is NULL, both can be specified.

When levels is specified, it must be a vector-like object (see above) with no duplicates (i.e. anyDuplicated(levels) must return 0).

When x and levels are both specified, they should typically have the same class (or, at least, match(x, levels) must work on them), and all the elements in x must be represented in levels (i.e. the integer vector returned by match(x, levels) should contain no NAs). See Details section below.

index

NULL or an integer (or numeric) vector of valid positive indices (no NAs) into levels. See Details section below.

...

Optional metadata columns.

Details

There are 4 different ways to use the Factor() constructor function:

  1. Factor(x, levels) (i.e. index is missing): In this case match(x, levels) is used internally to encode x as a Factor object. An error is raised if some elements in x cannot be matched to levels, so make sure that all the elements in x are represented in levels when doing Factor(x, levels).

  2. Factor(x) (i.e. levels and index are missing): This is equivalent to Factor(x, levels=unique(x)).

  3. Factor(levels=levels, index=index) (i.e. x is missing): In this case the encoding of the Factor object is supplied via index, that is, index must be an integer (or numeric) vector of valid positive indices (no NAs) into levels. This is the most efficient way to construct a Factor object.

  4. Factor(levels=levels) (i.e. x and index are missing): This is a convenient way to construct a 0-length Factor object with the specified levels. In other words, it's equivalent to Factor(levels=levels, index=integer(0)).
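The four calling conventions above can be sketched with ordinary character vectors as levels. This is a minimal illustration that assumes the S4Vectors package is installed; the object names (lv, x, fa, fb, fc, fd) are made up for the example.

```r
library(S4Vectors)

lv <- c("a", "b", "c")   # illustrative levels, no duplicates
x  <- c("b", "a", "b")

fa <- Factor(x, levels=lv)                     # 1. encoded via match(x, lv)
fb <- Factor(x)                                # 2. levels default to unique(x)
fc <- Factor(levels=lv, index=c(2L, 1L, 2L))   # 3. encoding supplied directly
fd <- Factor(levels=lv)                        # 4. 0-length Factor with levels lv

as.integer(fa)   # 2 1 2, i.e. match(x, lv)
levels(fb)       # "b" "a", i.e. unique(x)
length(fd)       # 0

## An element of x that is not represented in levels triggers an error:
res <- tryCatch(Factor("z", levels=lv), error=function(e) e)
inherits(res, "error")
```

Forms 1 and 3 describe the same object here; form 3 simply skips the internal match() step.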

Value

A Factor object.

Accessors

Factor objects support the same set of accessors as ordinary factors. That is:

  • length(x) to get the length of Factor object x.

  • names(x) and names(x) <- value to get and set the names of Factor object x.

  • levels(x) and levels(x) <- value to get and set the levels of Factor object x.

  • nlevels(x) to get the number of levels of Factor object x.

  • as.integer(x) to get the encoding of Factor object x. Note that length(as.integer(x)) and names(as.integer(x)) are the same as length(x) and names(x), respectively.

In addition, because Factor objects are Vector derivatives, they support the mcols() and metadata() getters and setters.
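As a quick illustration of the accessors listed above, the sketch below builds a small Factor with character levels and queries it. The object name fx and the score metadata column are hypothetical and chosen only for the example; it assumes S4Vectors is installed.

```r
library(S4Vectors)

fx <- Factor(c("b", "a", "b"), levels=c("a", "b", "c"))
names(fx) <- c("n1", "n2", "n3")   # names setter

length(fx)       # 3
names(fx)        # "n1" "n2" "n3"
levels(fx)       # "a" "b" "c"
nlevels(fx)      # 3
as.integer(fx)   # the encoding, carrying the names of fx

## Metadata columns, as for any Vector derivative
## ('score' is a made-up column name):
mcols(fx) <- DataFrame(score=c(0.1, 0.2, 0.3))
mcols(fx)$score
```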

Decoding a Factor

unfactor(x) can be used to decode Factor object x. It returns an object of the same class as levels(x) and same length as x. Note that it is the analog of as.character() on ordinary factors, with the notable difference that unfactor(x) propagates the names on x.

For convenience, unfactor(x) also works on ordinary factor x.

unfactor() supports extra arguments use.names and ignore.mcols to control whether the names and metadata columns on the Factor object to decode should be propagated or not. By default they are propagated, that is, the default values for use.names and ignore.mcols are TRUE and FALSE, respectively.
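The decoding behavior described above can be tried on a small Factor with character levels. This is only a sketch with illustrative object names (fx, f), assuming S4Vectors is installed.

```r
library(S4Vectors)

fx <- Factor(c("b", "a", "b"), levels=c("a", "b", "c"))
names(fx) <- c("n1", "n2", "n3")

dec <- unfactor(fx)                        # same class as levels(fx), names kept
dec_nonames <- unfactor(fx, use.names=FALSE)   # same values, names dropped

names(dec)            # "n1" "n2" "n3"
names(dec_nonames)    # NULL

## unfactor() also accepts an ordinary factor:
f <- factor(c("b", "a", "b"))
unfactor(f)
```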

Coercion

From vector or Vector to Factor: coercion of a vector-like object x to Factor is supported via as(x, "Factor") and is equivalent to Factor(x). There are 2 IMPORTANT EXCEPTIONS to this:

  1. If x is an ordinary factor, as(x, "Factor") returns a Factor with the same levels, encoding, and names, as x. Note that after coercing an ordinary factor to Factor, going back to factor again (with as.factor()) restores the original object with no loss.

  2. If x is a Factor object, as(x, "Factor") is either a no-op (when x is a Factor instance), or a demotion to Factor (when x is a Factor derivative like GRangesFactor).

From Factor to integer: as.integer(x) is supported on Factor object x and returns its encoding (see Accessors section above).

From Factor to factor: as.factor(x) is supported on Factor object x and returns an ordinary factor where the levels are as.character(levels(x)).

From Factor to character: as.character(x) is supported on Factor object x and is equivalent to unfactor(as.factor(x)), which is also equivalent to as.character(unfactor(x)).
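The coercions above, including the difference between as(x, "Factor") and Factor(x) on an ordinary factor, can be sketched as follows. Object names (f, ff, fg, fi) are illustrative, and the example assumes S4Vectors is installed.

```r
library(S4Vectors)

f <- factor(c("b", "a", "b"), levels=c("a", "b", "c"))

ff <- as(f, "Factor")         # keeps the levels and encoding of f
levels(ff)                    # "a" "b" "c"
as.integer(ff)                # 2 1 2
identical(as.factor(ff), f)   # round trip back to the ordinary factor

fg <- Factor(f)               # levels are unique(f), itself an ordinary factor
is.factor(levels(fg))

## A Factor with integer levels, coerced to factor and to character:
fi <- Factor(c(10L, 20L, 10L))
levels(as.factor(fi))         # "10" "20", i.e. as.character(levels(fi))
as.character(fi)              # "10" "20" "10"
```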

Subsetting

A Factor object can be subsetted with [, like an ordinary factor.
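A short sketch of subsetting, using illustrative object names (fx, sub, rest) and assuming S4Vectors is installed. As with ordinary factors, the subset is expected to keep the full set of levels.

```r
library(S4Vectors)

fx <- Factor(c("b", "a", "b", "c"), levels=c("a", "b", "c"))
names(fx) <- c("n1", "n2", "n3", "n4")

sub <- fx[c(4, 1)]   # positional subsetting
as.integer(sub)      # 3 2
names(sub)           # "n4" "n1"
levels(sub)          # levels are carried over from fx

rest <- fx[-1]       # negative indices drop elements
length(rest)         # 3
```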

Concatenation

Two or more Factor objects can be concatenated with c(). Note that, unlike with ordinary factors, c() on Factor objects preserves the class, i.e. it returns a Factor object. In other words, c() acts as an endomorphism on Factor objects.

The levels of c(x, y) are obtained by appending to levels(x) the levels in levels(y) that are "new" i.e. that are not already in levels(x).

append(), which is implemented on top of c(), also works on Factor objects.
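The level-merging rule can be seen on two small Factor objects whose levels only partly overlap. This is a sketch with made-up object names (fx, fy, fxy, fxy2), assuming S4Vectors is installed.

```r
library(S4Vectors)

fx <- Factor(c("b", "a"), levels=c("a", "b"))
fy <- Factor(c("c", "b"), levels=c("b", "c"))

fxy <- c(fx, fy)
is(fxy, "Factor")   # c() stays within the Factor class
levels(fxy)         # "a" "b" "c": levels(fx), then the new level from fy
unfactor(fxy)       # "b" "a" "c" "b"

fxy2 <- append(fx, fy)   # built on top of c()
```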

Comparing & Ordering

Factor objects support comparing (e.g. ==, !=, <=, <, match()) and ordering (e.g. order(), sort(), rank()) operations. All these operations behave like they would on the unfactored versions of their operands.

For example F1 <= F2, match(F1, F2), and sort(F1), are equivalent to unfactor(F1) <= unfactor(F2), match(unfactor(F1), unfactor(F2)), and sort(unfactor(F1)), respectively.
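The equivalences stated above can be checked directly. The sketch below uses IRanges levels, as in the Examples section, so it assumes the IRanges package is installed; the objects ir, F1, and F2 are illustrative and unrelated to those defined in the Examples.

```r
library(IRanges)   # also attaches S4Vectors

ir <- IRanges(c(3, 1, 3, 2), width=5)
F1 <- Factor(ir, levels=IRanges(1:4, width=5))   # explicit levels
F2 <- Factor(rev(ir))                            # levels are unique(rev(ir))

## Each operation on the Factor objects is compared with the same
## operation on the decoded objects:
eq_same    <- all((F1 == F2) == (unfactor(F1) == unfactor(F2)))
order_same <- identical(order(F1), order(unfactor(F1)))
match_same <- identical(match(F1, F2),
                        match(unfactor(F1), unfactor(F2)))

c(eq_same, order_same, match_same)
```

Note that F1 and F2 have different levels, so the comparison cannot be a plain comparison of their integer encodings.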

Author(s)

Hervé Pagès, with contributions from Aaron Lun

See Also

  • factor in base R.

  • GRangesFactor objects in the GenomicRanges package.

  • IRanges objects in the IRanges package.

  • Vector objects for the parent class.

  • anyDuplicated in the BiocGenerics package.

Examples

showClass("Factor")  # Factor extends Vector

## ---------------------------------------------------------------------
## CONSTRUCTOR & ACCESSORS
## ---------------------------------------------------------------------
library(IRanges)
set.seed(123)
ir0 <- IRanges(sample(5, 8, replace=TRUE), width=10,
               names=letters[1:8], ID=paste0("ID", 1:8))

## Use explicit levels:
ir1 <- IRanges(1:6, width=10)
F1 <- Factor(ir0, levels=ir1)
F1
length(F1)
names(F1)
levels(F1)  # ir1
nlevels(F1)
as.integer(F1)  # encoding

## If we don't specify the levels, they'll be set to unique(ir0):
F2 <- Factor(ir0)
F2
length(F2)
names(F2)
levels(F2)  # unique(ir0)
nlevels(F2)
as.integer(F2)

## ---------------------------------------------------------------------
## DECODING
## ---------------------------------------------------------------------
unfactor(F1)

stopifnot(identical(ir0, unfactor(F1)))
stopifnot(identical(ir0, unfactor(F2)))

unfactor(F1, use.names=FALSE)
unfactor(F1, ignore.mcols=TRUE)

## ---------------------------------------------------------------------
## COERCION
## ---------------------------------------------------------------------
F2b <- as(ir0, "Factor")  # same as Factor(ir0)
stopifnot(identical(F2, F2b))

as.factor(F2)
as.factor(F1)

as.character(F1)  # same as unfactor(as.factor(F1)),
                  # and also same as as.character(unfactor(F1))

## On an ordinary factor 'f', 'as(f, "Factor")' and 'Factor(f)' are
## NOT the same:
f <- factor(sample(letters, 500, replace=TRUE), levels=letters)
as(f, "Factor")  # same levels as 'f'
Factor(f)        # levels are unique(f), itself an ordinary factor!

stopifnot(identical(f, as.factor(as(f, "Factor"))))

## ---------------------------------------------------------------------
## CONCATENATION
## ---------------------------------------------------------------------
ir3 <- IRanges(c(5, 2, 8:6), width=10)
F3 <- Factor(levels=ir3, index=2:4)
F13 <- c(F1, F3)
F13
levels(F13)

stopifnot(identical(c(unfactor(F1), unfactor(F3)), unfactor(F13)))

## ---------------------------------------------------------------------
## COMPARING & ORDERING
## ---------------------------------------------------------------------
F1 == F2   # same as unfactor(F1) == unfactor(F2)

order(F1)  # same as order(unfactor(F1))
order(F2)  # same as order(unfactor(F2))

## The levels of the Factor influence the order of the table:
table(F1)
table(F2)

Bioconductor/S4Vectors documentation built on Nov. 17, 2024, 6:55 p.m.