library(tropr, warn.conflicts = F) library(dplyr, warn.conflicts = F) library(ggplot2, warn.conflicts = F)
Either you try stable CRAN version
install.packages("tropr")
Or unstable development version
devtools::install_github("zedoul/tropr")
You'll need to use library
to load as follows:
library(tropr)
library(tropr) .url <- "http://tvtropes.org/pmwiki/pmwiki.php/Main/SenseiChan" content <- trope_content(.url) .df <- as.data.frame(content) head(.df[, c("category", "link")])
The output is tidy
, so that you can easily turn them into a plot as follows:
library(ggplot2) q <- ggplot(.df, aes(category)) + geom_bar() + coord_flip() + ylab("Freq") + xlab("")
plot(q)
It would be very hard to implement and to maintain data fetching module from
the bottom. tropr
provides easy-to-use function trope_cache
which will check
redirects, filtering, and data fetching automatically.
Here is one example of how you can use trope_cache
.
library(tropr) .urls <- c("http://tvtropes.org/pmwiki/pmwiki.php/Main/SenseiChan") res <- trope_cache(.urls, depth = 3, filter_pattern = "/Main/")
It will fetch a set of tropes starting from .urls
in three tree-depth.
You can set filter option, where the example uses /Main/
, to fetch specific
category that you are interested in.
Once it cached, you can fetch those data with trope_data
function, regardless
of your network connection status.
Note that cache directory is temporary given from tempdir()
unless you set
it by yourself with cache_dir
parameter.
Apart from page analysis, tropr also provides functions for page history
analysis. Here is the example with Little Witch Academia
page history.
library(tropr) library(ggplot2) library(dplyr) .url <- "http://tvtropes.org/pmwiki/pmwiki.php/Characters/LittleWitchAcademia" hist_content <- trope_history(.url) .data <- aggr_history_daily_count(hist_content) .resolution <- "1 week" # In this case, we are interested in history only after the TV release, and # before the episode 23 .data <- .data[c(48:192), ] q <- ggplot(.data, aes(x = date, y = count)) + geom_line(size= .8) + theme(axis.text.x = element_text(angle = 50, hjust = 1, vjust = 1, size = 8), plot.title = element_text(hjust = 0.5)) + scale_x_date(date_breaks = .resolution) + ggtitle("Edits on Little Witch Academia Main Page in TV Tropes") + xlab("") + ylab("#edit")
plot(q)
And we can compare impacts of all episodes 1-25, with others as follows:
library(tropr) library(ggplot2) library(magrittr) impacts <- .data %>% group_by(date = cut(date, "week")) %>% summarise(impact = sum(count)) impacts$date <- paste0("7 days from ", as.character(impacts$date), " (", paste("Episode", rep(1:25)), ")") q <- ggplot(impacts, aes(x = reorder(date, impact), y = impact, fill = impact)) + geom_bar(stat= "identity") + theme(axis.text.y = element_text(size = 6)) + coord_flip() + ggtitle("Weekly edits on Little Witch Academia Characters Page in TV Tropes") + ylab("") + xlab("")
plot(q)
It is important to check the distribution of editors' commitments on the page. Following example shows how you can do with tropr. Note that this will create distribution not based on subset like previous examples, but on whole dataset.
.data <- aggr_history_editor_count(hist_content) q <- ggplot(.data, aes(x = reorder(editor, count), y = count, fill = count)) + geom_bar(stat= "identity") + theme(axis.text.y = element_text(size = 3)) + coord_flip() + ggtitle("Editors on Little Witch Academia Characters Page in TV Tropes\n", paste("from", tail(hist_content)[5, "datetime"], "to", head(hist_content[1, "datetime"]))) + ylab("") + xlab("")
plot(q)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.