dfm_group: Combine documents in a dfm by a grouping variable


View source: R/dfm_group.R

Description

Combine the documents in a dfm according to a grouping variable, which may be one of the docvars attached to the dfm. This gives the same result as passing the "groups" argument to dfm.

Usage

dfm_group(x, groups = NULL, fill = FALSE)

Arguments

x

a dfm

groups

either: a character vector containing the names of document variables to be used for grouping; or a factor, or an object that can be coerced into a factor, whose length (or number of rows) equals the number of documents. See groups for details.

fill

logical; if TRUE and groups is a factor, then use all levels of the factor when forming the new "documents" of the grouped dfm. This will result in documents with zero feature counts for levels not observed. Has no effect if the groups variable(s) are not factors.

Value

dfm_group returns a dfm with one document per unique combination of group values, whose cell values are the original feature counts summed within each group. Document-level variables that do not vary within a group are retained in docvars.
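As a rough illustration of the summing described in this section, the following base R sketch reproduces the arithmetic on a small dense count matrix. It is not quanteda's implementation: it uses base R's rowsum() in place of dfm_group, and the object names counts, grp and grouped are hypothetical, chosen only for this sketch.

```r
# Base R sketch only; quanteda operates on sparse dfm objects, not dense matrices.
# Feature counts for the four example texts, one row per document.
counts <- matrix(
  c(2, 1, 0, 0,   # "a a b"
    1, 1, 2, 0,   # "a b c c"
    1, 0, 1, 2,   # "a c d d"
    1, 0, 2, 1),  # "a c c d"
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("text", 1:4), c("a", "b", "c", "d"))
)

# One group label per document (hypothetical name).
grp <- c("grp1", "grp1", "grp2", "grp2")

# Sum the rows that share a group label; one output row per unique group.
grouped <- rowsum(counts, group = grp)
print(grouped)
```

Each row of grouped holds the column-wise sum of the documents assigned to that group, which is the relationship between input and output that the text describes.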

Setting fill = TRUE "pads" the dfm with document groups that were not observed but for which an empty document is still needed. If groups is a factor of dates, for instance, fill = TRUE ensures that the grouped dfm contains one row per date level, regardless of whether any documents existed for that date.
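The padding behaviour can be sketched in base R as follows. This is an illustration under stated assumptions rather than quanteda's own code: the dates, the matrix counts, and the names observed and filled are all hypothetical, and a dense matrix stands in for the dfm.

```r
# Base R sketch only; the data and object names are illustrative assumptions.
counts <- matrix(
  c(2, 1, 0,
    1, 1, 2,
    1, 0, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("text", 1:3), c("a", "b", "c"))
)

# A factor of dates with one level ("2018-11-19") that no document uses.
dates <- factor(
  c("2018-11-18", "2018-11-18", "2018-11-20"),
  levels = c("2018-11-18", "2018-11-19", "2018-11-20")
)

# Without padding: only the observed dates get a row.
observed <- rowsum(counts, group = as.character(dates))

# With padding: start from an all-zero row per factor level,
# then copy in the sums for the levels that were observed.
filled <- matrix(
  0, nrow = nlevels(dates), ncol = ncol(counts),
  dimnames = list(levels(dates), colnames(counts))
)
filled[rownames(observed), ] <- observed
print(filled)
```

The unobserved level ends up as a row of zeros, which corresponds to the empty document described for fill = TRUE.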

Examples

mycorpus <- corpus(c("a a b", "a b c c", "a c d d", "a c c d"),
                   docvars = data.frame(grp = c("grp1", "grp1", "grp2", "grp2")))
mydfm <- dfm(mycorpus)
dfm_group(mydfm, groups = "grp")
dfm_group(mydfm, groups = c(1, 1, 2, 2))

# equivalent
dfm(mydfm, groups = "grp")
dfm(mydfm, groups = c(1, 1, 2, 2))

Example output

quanteda version 0.99
Using 2 of 1 threads for parallel computing

Attaching package: 'quanteda'

The following object is masked from 'package:utils':

    View

Document-feature matrix of: 2 documents, 4 features (25% sparse).
2 x 4 sparse Matrix of class "dfmSparse"
      features
docs   a b c d
  grp1 3 2 2 0
  grp2 2 0 3 3
Document-feature matrix of: 2 documents, 4 features (25% sparse).
2 x 4 sparse Matrix of class "dfmSparse"
    features
docs a b c d
   1 3 2 2 0
   2 2 0 3 3
Document-feature matrix of: 2 documents, 4 features (25% sparse).
2 x 4 sparse Matrix of class "dfmSparse"
      features
docs   a b c d
  grp1 3 2 2 0
  grp2 2 0 3 3
Document-feature matrix of: 2 documents, 4 features (25% sparse).
2 x 4 sparse Matrix of class "dfmSparse"
    features
docs a b c d
   1 3 2 2 0
   2 2 0 3 3

quanteda documentation built on Nov. 20, 2018, 1:04 a.m.