slice.corpus: Subset documents using their positions


View source: R/slice.R

Description

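slice() lets you index documents by their (integer) positions. It allows you to select, remove, and duplicate documents. It is accompanied by a number of helpers for common use cases: slice_head() and slice_tail() select the first or last documents, slice_sample() selects documents at random, and slice_min() and slice_max() select the documents with the lowest or highest values of a variable.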

Usage

## S3 method for class 'corpus'
slice(.data, ..., .preserve = FALSE)

## S3 method for class 'corpus'
slice_head(.data, ..., n, prop)

## S3 method for class 'corpus'
slice_tail(.data, ..., n, prop)

## S3 method for class 'corpus'
slice_sample(.data, ..., n, prop, weight_by = NULL, replace = FALSE)

## S3 method for class 'corpus'
slice_min(.data, ..., n, prop, with_ties = TRUE)

## S3 method for class 'corpus'
slice_max(.data, ..., n, prop, with_ties = TRUE)

Arguments

.data

A quanteda corpus object; the S3 methods documented here are for class 'corpus'.

...

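For slice(): <data-masking> integer document positions. Provide either positive values to keep or negative values to drop; the values must be either all positive or all negative. Positions beyond the number of documents are silently ignored.

For the slice_*() helpers, these arguments are passed on to the underlying methods. For slice_min() and slice_max(), the variable to order by is supplied here, as in the Examples.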

.preserve

Relevant when the .data input is grouped. If .preserve = FALSE (the default), the grouping structure is recalculated based on the resulting data, otherwise the grouping is kept as is.

n, prop

Provide either n, the number of documents, or prop, the proportion of documents to select. If neither is supplied, n = 1 will be used.

If n is greater than the number of rows in the group (or prop > 1), the result will be silently truncated to the group size. If the proportion of a group size is not an integer, it is rounded down.

weight_by

Sampling weights. This must evaluate to a vector of non-negative numbers the same length as the input. Weights are automatically standardised to sum to 1.

replace

Should sampling be performed with (TRUE) or without (FALSE, the default) replacement.

with_ties

Should ties be kept together? The default, TRUE, may return more rows than you request. Use FALSE to ignore ties, and return the first n rows.
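
The n, prop and with_ties rules described above can be illustrated in base R on a plain numeric vector. This is only a sketch of the documented behaviour, not the package's implementation: resolve_n() and min_with_ties() are hypothetical helpers introduced here for illustration, and the use of minimum ranks to detect ties is an assumption.

```r
# Hypothetical helper: turn n / prop into a count of documents,
# following the rules stated for the n, prop argument.
resolve_n <- function(size, n = NULL, prop = NULL) {
  if (!is.null(n) && !is.null(prop)) stop("supply either n or prop, not both")
  if (is.null(n) && is.null(prop)) n <- 1      # default when neither is supplied
  if (!is.null(prop)) n <- floor(prop * size)  # non-integer counts are rounded down
  min(n, size)                                 # silently truncate to the available size
}

# Hypothetical helper: positions of the n smallest values of x.
# With with_ties = TRUE, values tied with the n-th smallest are kept too,
# so more than n positions may be returned (assumed: minimum-rank ties).
min_with_ties <- function(x, n, with_ties = TRUE) {
  if (with_ties) {
    keep <- sum(rank(x, ties.method = "min") <= n)
  } else {
    keep <- min(n, length(x))
  }
  head(order(x), keep)
}

x <- c(5, 1, 3, 1, 3)
resolve_n(length(x), prop = 0.5)        # proportion of 5 items, rounded down
min_with_ties(x, n = 3)                 # ties at the cut-off are kept together
min_with_ties(x, n = 3, with_ties = FALSE)
```

With with_ties = FALSE the result always has at most n positions, which is the behaviour to choose when an exact count matters more than treating tied documents alike.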

Value

An object of the same type as .data, here a corpus. Each document may appear zero, one, or many times in the output, depending on the positions selected.

Examples

slice(data_corpus_inaugural, 2:5)
slice(data_corpus_inaugural, 55:n())
slice_head(data_corpus_inaugural, n = 2)
slice_tail(data_corpus_inaugural, n = 3)
slice_tail(data_corpus_inaugural, prop = .05)

set.seed(42)
slice_sample(data_corpus_inaugural, n = 3)
slice_sample(data_corpus_inaugural, prop = .10, replace = TRUE)

data_corpus_inaugural <- data_corpus_inaugural %>%
    mutate(ntoks = ntoken(data_corpus_inaugural))
# shortest three texts
slice_min(data_corpus_inaugural, ntoks, n = 3)
# longest three texts
slice_max(data_corpus_inaugural, ntoks, n = 3)

quanteda/quanteda.tidy documentation built on April 11, 2021, 3:44 p.m.