Parses all .txt files downloaded from atekst within a directory (including subfolders). A regular expression (pattern) can be used to select which files to parse. The function returns a data frame with the headline, paper, date, time, mode (net/print), URL, and text of each article. To speed up parsing, the function can run in parallel: set parallel to TRUE and set cores. For large corpora, it is recommended to run the function once and save the resulting data frame as an .RData file. It can then be loaded into R (using load()) in a fraction of the time it takes to parse the whole corpus again.
dir | Directory containing atekst .txt files.
recursive | If
regex | Regular expression (pattern) to use for selecting files to parse.
parallel | If
cores | The number of cores to use (if
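To make the interplay of dir, recursive, and regex more concrete, the following is a minimal sketch of how files in a directory tree can be selected with base R's list.files(). Whether read.atekst.dir uses list.files() internally is an assumption; the directory name, file names, and pattern below are invented for illustration.

```r
# Hypothetical illustration: build a small directory tree in a temp location.
root <- file.path(tempdir(), "atekst-demo")
dir.create(file.path(root, "sub"), recursive = TRUE, showWarnings = FALSE)
writeLines("a", file.path(root, "one.txt"))
writeLines("b", file.path(root, "sub", "two.txt"))
writeLines("c", file.path(root, "notes.md"))

# Select files whose names match the pattern, descending into subfolders.
# This mirrors the roles of dir, regex, and recursive; it is not the
# package's actual implementation.
files <- list.files(root, pattern = "\\.txt$", recursive = TRUE,
                    full.names = TRUE)
basename(files)
```

With recursive = FALSE, list.files() would only return matches from the top-level directory, so the file in sub would be skipped.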
corpus <- read.atekst.dir("some/directory")
save(corpus, file = "atekst-corpus.RData")
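The save-once, load-later workflow recommended in the description can be sketched as follows. Because the parsed corpus is not available here, a one-row stand-in data frame is used. The column names follow the description, but the column types and all values are assumptions made for illustration.

```r
# In real use the corpus would come from the parser, for example:
# corpus <- read.atekst.dir("some/directory", parallel = TRUE, cores = 2)

# Stand-in for the returned data frame (invented values, assumed types).
corpus <- data.frame(
  headline = "Example headline",
  paper    = "Example paper",
  date     = as.Date("2020-01-01"),
  time     = "12:00",
  mode     = "net",
  url      = "http://example.com/article",
  text     = "Article text.",
  stringsAsFactors = FALSE
)

# Save the data frame once ...
path <- file.path(tempdir(), "atekst-corpus.RData")
save(corpus, file = path)

# ... and restore it in a later session instead of parsing again.
rm(corpus)
loaded <- load(path)  # load() returns the names of the restored objects
nrow(corpus)
```

Note that load() restores the object under the name it was saved with (here corpus), so assigning the result of load() to a variable gives the object's name, not the data frame itself.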