knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(textoteR)
This package makes it easier to convert text corpora from one format to another. The formats currently available are TXM (multiple .txt files plus one metadata.csv file), IRaMuTeQ (a single .txt file with starred tags), and R tibble.
The package ships with a small example TXM corpus (multiple .txt files + one metadata.csv file): 9 famous fables by La Fontaine.
Here I get the path to the corpus according to where the package is installed:
path_to_txm_corpus=system.file("extdata/fables", package="textoteR")
print(path_to_txm_corpus)
Here are the files in the directory:
list.files(path_to_txm_corpus)
Here is how you can convert this corpus into an R tibble:
txm_to_rtibble(from_dir=path_to_txm_corpus)
The format of the corpus can also be changed from TXM to IRaMuTeQ through:
txm_to_iramuteq(from_dir=path_to_txm_corpus, filename="fables_iramuteq.txt")
See the first 10 lines of the created file "fables_iramuteq.txt" (removed from local files right afterwards):
cat(readLines("fables_iramuteq.txt")[1:10], sep="\n")
file.remove("fables_iramuteq.txt")
The package also ships with a small example IRaMuTeQ corpus (a single .txt file with starred tags): 5 speeches delivered by French President Macron during the COVID-19 crisis in 2020.
path_to_iramuteq_corpus=system.file("extdata", package="textoteR")
iramuteq_to_rtibble(from_dir=path_to_iramuteq_corpus, filename="macron_covid.txt")
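For readers unfamiliar with the "starred tags" layout, here is a minimal base R sketch of what such a file looks like under the usual IRaMuTeQ convention: each text is preceded by a line starting with four stars, followed by starred `*variable_value` tags. The `toy` data frame and its column names are hypothetical, and the exact output written by the textoteR functions may differ in detail.

```r
# Hypothetical toy data: two short texts with two metadata variables.
toy <- data.frame(
  id   = c("a", "b"),
  year = c("2019", "2020"),
  text = c("First short text.", "Second short text."),
  stringsAsFactors = FALSE
)

# One header line per text: four stars, then one starred tag per variable.
header <- paste0("**** *id_", toy$id, " *year_", toy$year)

# Interleave headers and texts: header 1, text 1, header 2, text 2.
iramuteq_lines <- as.vector(rbind(header, toy$text))

cat(iramuteq_lines, sep = "\n")
```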
The format of the corpus can also be changed from IRaMuTeQ to TXM through:
iramuteq_to_txm(from_dir=path_to_iramuteq_corpus, filename="macron_covid.txt", to_dir="macron_covid_corpus")
See the content of directory "macron_covid_corpus" (removed from local files right afterwards), and the content of file txt1.txt:
list.files("macron_covid_corpus")
cat(readLines("macron_covid_corpus/txt1.txt"))
unlink("macron_covid_corpus",recursive=TRUE)
The package includes an R tibble, LVtweets, which holds tweets together with their metadata variables.
head(LVtweets)
Here is how you can export such data into an IRaMuTeQ or TXM format:
rtibble_to_txm(rtibble=LVtweets, to_dir="LVtweets_txm")
list.files("LVtweets_txm")
# remove directory:
unlink("LVtweets_txm", recursive=TRUE)
rtibble_to_iramuteq(rtibble=LVtweets, filename="LVtweets_ira.txt")
# remove file:
file.remove("LVtweets_ira.txt")
Note that tweets often contain special characters (e.g. emojis) and links that can cause TXM or IRaMuTeQ imports to fail. Such text should generally be cleaned in R before conversion to either format.
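As a rough illustration of that kind of cleaning, here is a minimal base R sketch. The helper `clean_tweet_text()` is hypothetical (it is not part of textoteR), and the rules it applies are assumptions about what may trouble an import: it drops http(s) links and characters outside the Basic Multilingual Plane (where most emojis live), then tidies whitespace. Accented letters are kept. Some symbols and emojis sit inside the Basic Multilingual Plane and would survive this filter.

```r
# Hypothetical helper, not part of textoteR: strips links and most emojis
# from a character vector while keeping accented letters.
clean_tweet_text <- function(x) {
  x <- enc2utf8(x)
  # Remove http(s) links.
  x <- gsub("https?://\\S+", "", x, perl = TRUE)
  # Remove characters beyond the Basic Multilingual Plane (most emojis).
  x <- gsub("[\\x{10000}-\\x{10FFFF}]", "", x, perl = TRUE)
  # Collapse runs of whitespace and trim both ends.
  x <- gsub("\\s+", " ", x, perl = TRUE)
  trimws(x)
}

# Example on a made-up tweet.
clean_tweet_text("Bonjour \U0001F600 https://t.co/abc  \u00e9t\u00e9")
```

The cleaned vector can then replace the text column of the tibble before calling rtibble_to_txm() or rtibble_to_iramuteq(); which column that is depends on the data at hand.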