knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4 )
R is known to be a very powerful language for the analysis and visualization. For visualizing data, there are better existing solutions that provide quite bit of interactivity with the ActivityInfo data e.g. ActivityInfo's built-in visualization tools, Power BI, Tableau etc.
However, there are number of things one can only do with R like advanced analysis such as prediction, statistical analysis, text mining and so on.
R is a batteries-included language that has many built-in calls for many
statistical analysis. Besides, it has external package system that one of the
best ones is ggplot
that provides powerful graphics to the users.
library(activityinfo)
We have an example fake dataset illustrating the building maintainance of the high schools in the Netherlands.
We first pull the data from ActivityInfo and we then use R calls to perform
analyses. The schools
table below is the saved & cleaned version of the data
pulled via the ActivityInfo API R client's queryTable()
call.
pat <- system.file("extdata", "schools.csv", package = "activityinfo") schools <- read.csv(pat, stringsAsFactors = FALSE) head(schools)
As an advanced analysis, we try to predict the building value based on the square meters of the building. For this analysis we use a linear regression model. On the one column, we have a high school that it has the square meters of the building and on the other column, we have a building value information.
library(ggplot2) ggplot(schools, aes(square_meters_of_school, building_value)) + geom_point(aes(color = as.factor(is_painted))) + geom_smooth(method = "lm", se = FALSE) + scale_y_continuous(labels = scales::dollar_format( scale = 1 / 1e6, prefix = "\u20ac", suffix = "M" )) + scale_color_discrete( name = "The building painted", breaks = c(1, 0), labels = c("Yes", "No") ) + labs( title = "Building value vs. sqm of building", x = "Square meters of building", y = "Building value" ) + theme_minimal()
We count the number of words in the description field, which is a multi-line narrative field in the ActivityInfo, and produce a bar chart.
library(quanteda) n_tokens <- ntoken(char_tolower(schools$situation_description), remove_punct = TRUE) n_tokens <- as.integer(n_tokens) df <- data.frame( school = schools$school_name, n_tokens = n_tokens, stringsAsFactors = FALSE ) df
ggplot(df, aes(x = reorder(school, n_tokens), y = n_tokens)) + geom_bar(stat = "identity", fill = "steelblue") + coord_flip() + theme_minimal()
For deeper information how the textual data stored in the ActivityInfo is analyzed, please check the QualMiner project.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.