View source: R/read.atekst.dir_function.R
Parse all .txt files downloaded from atekst within a directory (including subfolders). A regular expression (regex) can be used to select which files to parse. The function returns a data frame with the headline, paper, date, time, mode (net/print), url, and text of each article. To speed up parsing, the function can run in parallel by setting parallel to TRUE and setting cores. When working with large corpora, it is recommended to run the function once and save the resulting data frame as an .RData file. That way it can be loaded into R (using load()) in a fraction of the time it takes to parse the whole corpus.
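The parse-once, reload-later workflow described above can be sketched as a small cache check. This is a minimal illustration, not part of the package: the one-row data frame is a stand-in for the output of read.atekst.dir(), its column names are assumed from the description, and the cache location in tempdir() is chosen only so the sketch runs anywhere.

```r
# Hypothetical cache location; in practice this would be a path you keep.
cache_file <- file.path(tempdir(), "atekst-corpus.RData")

if (file.exists(cache_file)) {
  # Fast path: restore the previously saved object named `corpus`.
  load(cache_file)
} else {
  # Slow path. In real use this would be:
  #   corpus <- read.atekst.dir("some/directory")
  # The stand-in below only mimics the assumed column layout.
  corpus <- data.frame(
    headline = "Example headline",
    paper    = "Example paper",
    date     = "2015-01-01",
    time     = "12:00",
    mode     = "net",
    url      = "http://example.com/article",
    text     = "Example article text.",
    stringsAsFactors = FALSE
  )
  save(corpus, file = cache_file)
}
```

Note that load() restores the object under the name it was saved with, so corpus is available afterwards without an assignment.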
dir | Directory containing atekst .txt files.
recursive | If TRUE, files in subfolders are parsed as well.
regex | Regular expression (pattern) used to select the files to parse.
parallel | If TRUE, the files are parsed in parallel.
cores | The number of cores to use (if parallel is TRUE).
corpus <- read.atekst.dir("some/directory")
save(corpus, file = "atekst-corpus.RData")
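A parallel run might look like the sketch below. The parallel and cores arguments come from the argument list above; picking the core count with parallel::detectCores() and leaving one core free is an assumption made here, not something the package prescribes. The call is guarded so that the snippet also runs when the package providing read.atekst.dir() is not loaded.

```r
# Choose a core count, leaving one core free.
# detectCores() can return NA, so fall back to a single core.
n_cores <- parallel::detectCores()
n_cores <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)

# read.atekst.dir() only exists once its package is loaded;
# the guard keeps this sketch runnable on its own.
if (exists("read.atekst.dir")) {
  corpus <- read.atekst.dir("some/directory", parallel = TRUE, cores = n_cores)
  save(corpus, file = "atekst-corpus.RData")
}
```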