as.textmeta.corpus: Transform corpus to textmeta


View source: R/as.textmeta.corpus.R

Description

Converts a corpus object (the structure in which the quanteda package stores text data) to a textmeta object.

Usage

as.textmeta.corpus(
  corpus,
  cols,
  dateFormat = "%Y-%m-%d",
  idCol = "id",
  dateCol = "date",
  titleCol = "title",
  textCol = "texts",
  duplicateAction = TRUE,
  addMetadata = TRUE
)

Arguments

corpus

Object of class corpus, package quanteda.

cols

Character: vector of the names of the columns that should be kept.

dateFormat

Character: string giving the format of the dates in the date column, as used by as.Date.

idCol

Character: string with column name of the IDs in corpus - named "id" in the resulting data.frame.

dateCol

Character: string with column name of the Dates in corpus - named "date" in the resulting data.frame.

titleCol

Character: string with column name of the Titles in corpus - named "title" in the resulting data.frame.

textCol

Character: string with column name of the Texts in corpus - the Texts are stored as a list named by "id".

duplicateAction

Logical: Should deleteAndRenameDuplicates be applied to the created textmeta object?

addMetadata

Logical: Should the metadata of corpus be added to the meta component of the textmeta object? If column names conflict, the metadata columns are overwritten by the document-specific columns.
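The dateFormat argument above is described as a format string for as.Date. As a minimal base R sketch of how such format strings behave (the date values and variable names here are illustrative assumptions, not part of tosca), the same day can be written in two notations and parsed with the matching format. A format that does not match the input yields NA.

```r
# Illustrative only: base R behaviour of as.Date() with format strings.
# The variable names and date strings are assumptions made for this sketch.

# ISO notation, matching the default dateFormat "%Y-%m-%d"
iso <- as.Date("1885-01-02", format = "%Y-%m-%d")

# Day.month.year notation needs a different format string
dotted <- as.Date("02.01.1885", format = "%d.%m.%Y")

# A format that does not fit the input cannot be parsed and gives NA
mismatch <- as.Date("02.01.1885", format = "%Y-%m-%d")

print(iso)
print(dotted)
print(mismatch)
```

If the date column of a corpus uses a notation other than the default, passing the matching string as dateFormat is presumably the intended remedy.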

Value

textmeta object

Examples

texts <- c("Give a Man a Fish, and You Feed Him for a Day.
 Teach a Man To Fish, and You Feed Him for a Lifetime",
 "So Long, and Thanks for All the Fish",
 "A very able manipulative mathematician, Fisher enjoys a real mastery
 in evaluating complicated multiple integrals.")

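# Build a quanteda corpus without document variables and convert it
corp <- quanteda::corpus(x = texts)
obj <- as.textmeta.corpus(corp, addMetadata = FALSE)

# Add document variables named after the default idCol, dateCol and titleCol
quanteda::docvars(corp, "title") <- c("Fishing", "Don't panic!", "Sir Ronald")
quanteda::docvars(corp, "date") <- c("1885-01-02", "1979-03-04", "1951-05-06")
quanteda::docvars(corp, "id") <- c("A", "B", "C")
quanteda::docvars(corp, "additionalVariable") <- 1:3

# Convert again, now with all default arguments
obj <- as.textmeta.corpus(corp)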

tosca documentation built on Oct. 28, 2021, 5:07 p.m.