dfm_subset: Extract a subset of a dfm


View source: R/dfm_subset.R

Description

Returns the documents of a dfm that meet specified conditions, including logical conditions on docvars (document-level variables). dfm_subset works like subset.data.frame: it uses non-standard evaluation to evaluate the conditions against the docvars in the dfm.

Usage

dfm_subset(x, subset, select, ...)

Arguments

x

dfm object to be subsetted

subset

logical expression indicating the documents to keep: missing values are taken as false

select

expression indicating the docvars to select from the dfm; or a dfm object, in which case the returned dfm will contain the same documents as the original dfm, even if these are empty. See Details.

...

not used
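
Because dfm_subset is documented as behaving like subset.data.frame, the treatment of missing values in subset can be previewed in base R. The sketch below uses a hypothetical data frame, docvars_df, as a stand-in for the docvars of a dfm; it illustrates the documented semantics only and is not quanteda's implementation.

```r
# Hypothetical stand-in for the docvars of a four-document dfm;
# the grp value for d2 is missing.
docvars_df <- data.frame(grp = c(1, NA, 2, 3),
                         row.names = c("d1", "d2", "d3", "d4"))

# subset() evaluates the condition inside the data frame (non-standard
# evaluation) and treats NA results as FALSE, so d2 is dropped.
kept <- subset(docvars_df, grp > 1)
print(rownames(kept))

# The equivalent explicit logical vector: NA is converted to FALSE.
cond <- docvars_df$grp > 1
keep <- cond & !is.na(cond)
print(keep)
```

A logical vector such as keep is also the kind of value that can be supplied directly as the subset argument, as in the "supplied vector" case in the Examples.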

Details

To select or subset features, see dfm_select instead.

When select is a dfm, the returned dfm will be equal in document dimension and order to the dfm used for selection. This is the document-level counterpart of calling dfm_select with a dfm as pattern: dfm_select matches features, while dfm_subset matches documents.
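
The document-matching behaviour described above can be sketched with plain matrices. This is only an illustration of the idea under assumed names: match_documents is a hypothetical helper, and a dense base R matrix stands in for a dfm. It is not how quanteda implements the operation.

```r
# Hypothetical helper: reshape the rows of `x` to the document names in
# `target_docs`, keeping their order and zero-filling documents absent from `x`.
match_documents <- function(x, target_docs) {
  out <- matrix(0, nrow = length(target_docs), ncol = ncol(x),
                dimnames = list(docs = target_docs, features = colnames(x)))
  shared <- intersect(target_docs, rownames(x))
  out[shared, ] <- x[shared, , drop = FALSE]
  out
}

# Feature counts for d1 = "a b b c" and d2 = "b b c d".
m1 <- matrix(c(1, 2, 1, 0,
               0, 2, 1, 1),
             nrow = 2, byrow = TRUE,
             dimnames = list(docs = c("d1", "d2"),
                             features = c("a", "b", "c", "d")))

# Match against the document names d3, d1, d2, in that order.
res <- match_documents(m1, c("d3", "d1", "d2"))
print(res)
```

The result has one row per requested document, in the requested order, with an all-zero row for d3 because it does not occur in m1.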

Value

dfm object, with a subset of documents (and docvars) selected according to arguments

See Also

subset.data.frame

Examples

library("quanteda")

testcorp <- corpus(c(d1 = "a b c d", d2 = "a a b e",
                     d3 = "b b c e", d4 = "e e f a b"),
                   docvars = data.frame(grp = c(1, 1, 2, 3)))
testdfm <- dfm(testcorp)
# selecting on a docvars condition
dfm_subset(testdfm, grp > 1)
# selecting on a supplied vector
dfm_subset(testdfm, c(TRUE, FALSE, TRUE, FALSE))

# selecting on a dfm
dfm1 <- dfm(c(d1 = "a b b c", d2 = "b b c d"))
dfm2 <- dfm(c(d1 = "x y z", d2 = "a b c c d", d3 = "x x x"))
dfm_subset(dfm1, subset = dfm2)
dfm_subset(dfm1, subset = dfm2[c(3,1,2), ])

Example output

quanteda version 0.99
Using 2 of 1 threads for parallel computing

Attaching package: 'quanteda'

The following object is masked from 'package:utils':

    View

Document-feature matrix of: 2 documents, 6 features (41.7% sparse).
2 x 6 sparse Matrix of class "dfmSparse"
    features
docs a b c d e f
  d3 0 2 1 0 1 0
  d4 1 1 0 0 2 1
Document-feature matrix of: 2 documents, 6 features (41.7% sparse).
2 x 6 sparse Matrix of class "dfmSparse"
    features
docs a b c d e f
  d1 1 1 1 1 0 0
  d3 0 2 1 0 1 0
Document-feature matrix of: 3 documents, 4 features (50% sparse).
3 x 4 sparse Matrix of class "dfmSparse"
    features
docs a b c d
  d1 1 2 1 0
  d2 0 2 1 1
  d3 0 0 0 0
Document-feature matrix of: 3 documents, 4 features (50% sparse).
3 x 4 sparse Matrix of class "dfmSparse"
    features
docs a b c d
  d3 0 0 0 0
  d1 1 2 1 0
  d2 0 2 1 1

quanteda documentation built on Nov. 20, 2018, 1:04 a.m.