write.corpus: writes texts to disk in a format supported by the stylo package


View source: R/write.corpus.R

Description

writes texts to disk in a format supported by the stylo package
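stylo reads a corpus as a directory of plain-text files and derives each text's class from the part of the file name before the first underscore (e.g. `Austen_Emma.txt` belongs to class `Austen`), so presumably this is the layout write.corpus() produces. The base R sketch below only illustrates that layout and is not the write.corpus() implementation; the helpers write_stylo_sketch() and sanitize_sketch(), and the column names `author`, `title` and `text`, are hypothetical.

```r
# Sketch only: the column names `author`, `title` and `text` are assumed,
# not necessarily those returned by get.texts().
sanitize_sketch <- function(x) {
  # replace every run of non-alphanumeric characters (including "_")
  # with "-", so that "_" only separates the class from the subclass
  gsub("[^[:alnum:]]+", "-", as.character(x))
}

write_stylo_sketch <- function(texts, directory = "corpus") {
  dir.create(directory, showWarnings = FALSE, recursive = TRUE)
  # file name: <class>_<subclass>.txt, e.g. "Austen_Emma.txt"
  files <- paste0(sanitize_sketch(texts$author), "_",
                  sanitize_sketch(texts$title), ".txt")
  paths <- file.path(directory, files)
  # one file per text
  for (i in seq_along(paths)) {
    writeLines(texts$text[i], con = paths[i])
  }
  invisible(paths)
}
```

For example, `write_stylo_sketch(df, "corpus")` with a data frame holding the rows ("Dickens", "Hard Times", ...) creates `corpus/Dickens_Hard-Times.txt`.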

Usage

write.corpus(data, groupBy1 = character(), groupBy2 = character(),
  directory = "corpus", sample = 1, limit = 10^7)

Arguments

data

set of texts to be written (as obtained from get.texts(), possibly filtered)

groupBy1

name of the variable (or a vector of variable names) defining the stylo text class, i.e. the variable used to group texts (e.g. author)

groupBy2

name of the variable (or a vector of variable names) defining the stylo text subclass (e.g. text title)

directory

path of the directory to write the corpus into

sample

fraction of texts to sample from each group (from 0 to 1)

limit

maximum number of characters in each file
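Because groupBy1 and groupBy2 may each name several variables, their values have to be merged into a single label before they can appear in a file name. The sketch below shows one way to do that, together with a character limit. The helper names make_label_sketch() and truncate_sketch(), the "-" separator, and the assumption that limit truncates longer texts (rather than, say, splitting them into several files) are all hypothetical and not taken from write.corpus().

```r
# Sketch only: the "-" separator and truncation to `limit` characters are
# assumptions about how write.corpus() might behave, not its actual code.
make_label_sketch <- function(texts, vars) {
  # no grouping variables: every text gets an empty label
  if (length(vars) == 0) return(rep("", nrow(texts)))
  # clean each column, then join the columns row-wise with "-"
  parts <- lapply(texts[vars], function(x) {
    gsub("[^[:alnum:]]+", "-", as.character(x))
  })
  # unname() keeps column names from clashing with paste()'s own arguments
  do.call(paste, c(unname(parts), sep = "-"))
}

truncate_sketch <- function(text, limit = 10^7) {
  # keep at most `limit` characters of each text
  substr(text, 1, limit)
}
```

With groupBy1 = c("author", "year"), a text by "Jane Austen" from 1815 would get the class label `Jane-Austen-1815`.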

Value

a data.frame with all texts that were not sampled
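Together, the sample argument and the returned value describe a per-group split: a fraction of each group's texts is written, and the rest are returned. The Examples rely on this split to fill "primary_set" and "secondary_set". Below is a base R sketch of such a split. The function name sample_split_sketch() and the rounding rule (round(fraction * n) texts per group) are assumptions, and write.corpus() may sample differently.

```r
# Sketch only: grouping and rounding rules are assumptions,
# write.corpus() may sample differently.
sample_split_sketch <- function(texts, groupVars, fraction = 1) {
  stopifnot(fraction >= 0, fraction <= 1)
  n <- nrow(texts)
  # one group id per row; without grouping variables all rows form one group
  group <- if (length(groupVars) == 0) {
    rep(1L, n)
  } else {
    interaction(texts[groupVars], drop = TRUE)
  }
  rowsByGroup <- split(seq_len(n), group)
  # within each group draw round(fraction * size) row indices at random
  picked <- unlist(lapply(rowsByGroup, function(i) {
    i[sample.int(length(i), round(fraction * length(i)))]
  }), use.names = FALSE)
  # a logical mask avoids the texts[-integer(0), ] pitfall when nothing is picked
  keep <- seq_len(n) %in% picked
  list(sampled = texts[keep, , drop = FALSE],   # would be written to disk
       rest    = texts[!keep, , drop = FALSE])  # would be returned
}
```

Passing the `rest` element to a second call plays the same role as passing `secondarySet` to the second write.corpus() call in the Examples.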

See Also

get.texts

Examples

## Not run: 
  library(dplyr)  # provides %>% and filter()

  # connect to the texts database
  db.connect()

  # fetch all texts and keep only those from the "British Fiction" source
  allTexts <- get.texts()
  interestingTexts <- allTexts %>%
    filter(source == 'British Fiction')

  # write to disk in the stylo package format for the stylo() function,
  # which by default reads texts from a directory called "corpus";
  # the author is the main category and the title the subcategory
  write.corpus(interestingTexts, groupBy1 = 'author', groupBy2 = 'title',
               directory = 'corpus')

  # write to disk in the stylo package format for the classify() function,
  # which uses two sets of texts in the directories "primary_set" and
  # "secondary_set"; 70% of the texts in each group go to "primary_set",
  # and the returned (not sampled) texts are written to "secondary_set"
  secondarySet <- write.corpus(interestingTexts, groupBy1 = 'author',
                               groupBy2 = 'title', directory = 'primary_set',
                               sample = 0.7)
  write.corpus(secondarySet, groupBy1 = 'author', groupBy2 = 'title',
               directory = 'secondary_set')

## End(Not run)

zozlak/styloWorkshop documentation built on May 5, 2019, 1:37 a.m.