
The `dfm` class is a type of Matrix-class object with additional slots, described below. quanteda uses two subclasses of the `dfm` class, depending on how the object is stored: a `dfmSparse` object if it can be represented by a sparse matrix, or a `dfmDense` object if it is dense. See Details.

```
## S4 method for signature 'dfmDense'
t(x)
## S4 method for signature 'dfmSparse'
t(x)
## S4 method for signature 'dfmSparse'
colSums(x, na.rm = FALSE, dims = 1L, ...)
## S4 method for signature 'dfmSparse'
rowSums(x, na.rm = FALSE, dims = 1L, ...)
## S4 method for signature 'dfmSparse'
colMeans(x, na.rm = FALSE, dims = 1L, ...)
## S4 method for signature 'dfmSparse'
rowMeans(x, na.rm = FALSE, dims = 1L, ...)
## S4 method for signature 'dfmSparse,numeric'
e1 + e2
## S4 method for signature 'numeric,dfmSparse'
e1 + e2
## S4 method for signature 'dfmDense,numeric'
e1 + e2
## S4 method for signature 'numeric,dfmDense'
e1 + e2
## S4 method for signature 'dfm,index,index,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'dfm,index,index,logical'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'dfm,missing,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'dfm,missing,missing,logical'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'dfm,index,missing,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'dfm,index,missing,logical'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'dfm,missing,index,missing'
x[i, j, ..., drop = FALSE]
## S4 method for signature 'dfm,missing,index,logical'
x[i, j, ..., drop = FALSE]
```

| Argument | Description |
|---|---|
| `x` | the dfm object |
| `na.rm` | if `TRUE`, missing values are omitted from the calculations |
| `dims` | ignored |
| `...` | additional arguments not used here |
| `e1` | first quantity in "+" operation for dfm |
| `e2` | second quantity in "+" operation for dfm |
| `i` | index for documents |
| `j` | index for features |
| `drop` | always set to `FALSE` |

The `dfm` class is a virtual class that contains one of two subclasses for holding the cell counts of document-feature matrices: `dfmSparse` or `dfmDense`.

The `dfmSparse` class is a sparse matrix version of `dfm-class`, inheriting dgCMatrix-class from the Matrix package. It is the default object type created when feature counts are the object of interest, as typical text-based feature counts tend to contain many zeroes. As long as subsequent transformations of the dfm preserve cells with zero counts, the dfm should remain sparse.
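
As a rough illustration of the last point, the sketch below uses a plain `dgCMatrix` from the Matrix package as a stand-in for a `dfmSparse` object. This is an assumption made for illustration only; the toy counts and the names `counts` and `scaled` are hypothetical. Scaling the matrix leaves zero cells at zero, so the result stays sparse.

```r
library(Matrix)

# Toy 3 x 4 count matrix, mostly zeroes; a stand-in for a dfmSparse object.
counts <- sparseMatrix(i = c(1, 2, 3, 3), j = c(1, 2, 2, 4),
                       x = c(2, 1, 3, 1), dims = c(3, 4))
class(counts)

# Multiplying by a constant maps zero to zero, so sparsity is preserved.
scaled <- counts * 2
is(scaled, "sparseMatrix")
```

Any transformation that maps zero to zero behaves the same way, which is why simple rescaling of counts does not force a change of storage.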

When the Matrix package implements sparse integer matrices, we will switch the default object class to this object type, as integers are 4 bytes each (compared to the current numeric double type, which requires 8 bytes per cell).
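
The 4-byte versus 8-byte difference can be checked in base R. The following is a minimal sketch that measures ordinary vectors rather than dfm objects; the variable names are hypothetical.

```r
# Base R only: compare the storage of one million integers and doubles.
n <- 1e6
int_bytes <- as.numeric(object.size(integer(n)))
dbl_bytes <- as.numeric(object.size(numeric(n)))

# Per-element cost, ignoring the small fixed vector header.
round(int_bytes / n)   # about 4 bytes per integer
round(dbl_bytes / n)   # about 8 bytes per double
```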

The `dfmDense` class is a dense matrix version of `dfm-class`, inheriting dgeMatrix-class from the Matrix package. dfm objects whose cells no longer contain zeroes after weighting or other transformations are automatically converted to the `dfmDense` class. Such an object will necessarily be much larger than one of `dfmSparse` class, because every cell is recorded as a numeric (double) value requiring 8 bytes of storage.
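
The size penalty of dense storage can be sketched with the Matrix package directly. The example assumes a plain sparse matrix as a stand-in for a dfm; the dimensions and the names `sp` and `dn` are hypothetical. Adding a non-zero constant fills every cell, so Matrix returns a dense result.

```r
library(Matrix)

# Hypothetical sparse matrix: 1000 x 1000 with 1000 non-zero cells.
set.seed(1)
sp <- sparseMatrix(i = sample(1000), j = sample(1000),
                   x = rep(1, 1000), dims = c(1000, 1000))

# Adding a non-zero constant leaves no zero cells, so the result is dense.
dn <- sp + 0.5
is(dn, "denseMatrix")

# The dense version stores all one million cells as doubles.
object.size(sp)
object.size(dn)
```

The dense object holds a value for every cell, while the sparse one records only the non-zero cells and their positions.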

`settings`
settings that govern corpus handling and subsequent downstream operations, including the settings used to clean and tokenize the texts, and to create the dfm. See `settings`.

`weighting`
the feature weighting applied to the dfm. Default is `"frequency"`, indicating that the values in the cells of the dfm are simple feature counts. To change this, use the `weight` method.

`smooth`
a smoothing parameter, defaults to zero. Can be changed using either the `smooth` or the `weight` methods.

`Dimnames`
These are inherited from Matrix-class but are named `docs` and `features` respectively.

See also: `dfm`

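```
library("quanteda")

# dfm subsetting: build a dfm from three short documents
x <- dfm(tokens(c("this contains lots of stopwords",
                  "no if, and, or but about it: lots",
                  "and a third document is it"),
                remove_punct = TRUE))
x[1:2, ]     # first two documents, all features
x[1:2, 1:5]  # first two documents, first five features

# fcm subsetting: build a feature co-occurrence matrix from two documents
y <- fcm(tokens(c("this contains lots of stopwords",
                  "no if, and, or but about it: lots"),
                remove_punct = TRUE))
y[1:3, ]     # first three rows, all columns
y[4:5, 1:5]  # rows 4 to 5, columns 1 to 5
```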

quanteda documentation built on May 19, 2017, 8:44 a.m.
