corpus_frame: Corpus Data Frame


View source: R/corpus_frame.R

Description

Create or test for corpus objects.

Usage


Arguments

...

data frame columns for corpus_frame; further arguments passed to as_corpus_text from as_corpus_frame.

row.names

character vector of row names for the corpus object.

filter

text filter object for the "text" column in the corpus object.

x

object to be coerced or tested.

Details

These functions create a corpus object or convert another object to one. A corpus object is just a data frame with special functions for printing and a column named "text" of type "corpus_text".

corpus_frame has similar semantics to the data.frame function, except that string columns do not get converted to factors.
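As a minimal sketch of the string-column behaviour, the snippet below builds the same two-column table both ways. The "author" column is a made-up illustration, not part of the package. Note that from R 4.0.0 onward data.frame also keeps strings as character by default, so the difference only shows on older R versions.

```r
library(corpus)

# corpus_frame(): string columns stay character, no extra argument needed
cf <- corpus_frame(text = c("goodnight", "moon"),
                   author = c("brown", "brown"))  # "author" is illustrative
class(cf)                # expected to include "corpus_frame" and "data.frame"
is.character(cf$author)  # not a factor

# data.frame(): pass stringsAsFactors = FALSE explicitly to get the same
# result on R versions before 4.0.0
df <- data.frame(text = c("goodnight", "moon"),
                 author = c("brown", "brown"),
                 stringsAsFactors = FALSE)
is.character(df$author)
```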

as_corpus_frame converts another object to a corpus data frame object. By default, the method converts x to a data frame with a column named "text" of type "corpus_text", and sets the class attribute of the result to c("corpus_frame", "data.frame").

is_corpus_frame tests whether x is a data frame with a column named "text" of type "corpus_text".
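A short sketch of the test, assuming only the behaviour described above: a plain data frame whose "text" column is an ordinary character vector should fail the check, and the same data after conversion should pass it.

```r
library(corpus)

# plain data frame: "text" is an ordinary character column
plain <- data.frame(text = c("goodnight", "moon"),
                    stringsAsFactors = FALSE)
is_corpus_frame(plain)

# after conversion, "text" has type "corpus_text"
converted <- as_corpus_frame(plain)
is_corpus_frame(converted)
```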

as_corpus_frame is generic: you can write methods to handle specific classes of objects.
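One way such a method might look is sketched below. The "memo" class and its "body" and "sender" fields are hypothetical, and the method's argument list (x, filter, ..., row.names) is an assumption based on the arguments documented on this page, not a confirmed signature. The method reshapes the object into a data frame with a "text" column and delegates to the existing data frame conversion.

```r
library(corpus)

# hypothetical S3 class: a list of message bodies and senders
memo <- structure(list(body = c("goodnight", "moon"),
                       sender = c("a", "b")),
                  class = "memo")

# assumed argument list, mirroring the arguments documented above
as_corpus_frame.memo <- function(x, filter = NULL, ..., row.names = NULL) {
    # put the message bodies in the required "text" column
    df <- data.frame(text = x$body, sender = x$sender,
                     stringsAsFactors = FALSE)
    # delegate to the conversion for data frames
    as_corpus_frame(df, filter = filter, ..., row.names = row.names)
}

cf <- as_corpus_frame(memo)
is_corpus_frame(cf)
```

Inside a package, the method would also need to be registered (for example with an S3method directive in NAMESPACE) rather than simply defined at top level.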

Value

corpus_frame creates a data frame with a column named "text" of type "corpus_text", and a class attribute set to c("corpus_frame", "data.frame").

as_corpus_frame attempts to coerce its argument to a corpus data frame object, setting the row.names and calling as_corpus_text on the "text" column with the filter and ... arguments.
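The following sketch shows a filter being passed through as_corpus_frame to the "text" column. It assumes that text_filter accepts a map_case property and that text_filter applied to a "corpus_text" vector returns the filter attached to it; both come from the corpus package rather than from this page, so check them against the text_filter documentation.

```r
library(corpus)

# assumed: map_case is a text_filter property; FALSE disables case folding
f <- text_filter(map_case = FALSE)

# names on the character vector become row names of the result
cf <- as_corpus_frame(c(a = "Goodnight", b = "Moon"), filter = f)
rownames(cf)

# inspect the filter attached to the "text" column
text_filter(cf$text)$map_case
```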

is_corpus_frame returns TRUE or FALSE depending on whether its argument is a valid corpus object or not.

See Also

corpus-package, print.corpus_frame, corpus_text, read_ndjson.

Examples

# convert a data frame:
emoji <- data.frame(text = sapply(0x1f600 + 1:30, intToUtf8),
                    stringsAsFactors = FALSE)
as_corpus_frame(emoji)

# construct directly (no need for stringsAsFactors = FALSE):
corpus_frame(text = sapply(0x1f600 + 1:30, intToUtf8))
    
# convert a character vector:
as_corpus_frame(c(a = "goodnight", b = "moon")) # keeps names
as_corpus_frame(c(a = "goodnight", b = "moon"), row.names = NULL) # drops names

Example output

   text      
1  \U0001f601
2  \U0001f602
3  \U0001f603
4  \U0001f604
5  \U0001f605
6  \U0001f606
7  \U0001f607
8  \U0001f608
9  \U0001f609
10 \U0001f60a
11 \U0001f60b
12 \U0001f60c
13 \U0001f60d
14 \U0001f60e
15 \U0001f60f
16 \U0001f610
17 \U0001f611
18 \U0001f612
19 \U0001f613
20 \U0001f614
.  (30 rows total)
   text      
1  \U0001f601
2  \U0001f602
3  \U0001f603
4  \U0001f604
5  \U0001f605
6  \U0001f606
7  \U0001f607
8  \U0001f608
9  \U0001f609
10 \U0001f60a
11 \U0001f60b
12 \U0001f60c
13 \U0001f60d
14 \U0001f60e
15 \U0001f60f
16 \U0001f610
17 \U0001f611
18 \U0001f612
19 \U0001f613
20 \U0001f614
.  (30 rows total)
  text     
a goodnight
b moon     
  text     
a goodnight
b moon     

corpus documentation built on May 2, 2021, 9:06 a.m.