```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# ideadata

ideadata helps data analysts at IDEA Public Schools access and use data stored in IDEA's data warehouse. ideadata does so in a tidyverse-friendly way.

ideadata is an anagram for "data aide". Hence the logo. Ooooh Yeah!

## Installation

Since ideadata is an internal IDEA package, there is only a development version, which you can install from GitHub with:

``` {r install, eval=FALSE}
install.packages("remotes")
remotes::install_github("idea-analytics/ideadata")
```

Alternatively, renv::install("idea-analytics/ideadata@main") also works.

## Example

Here's how you connect to a table in the warehouse.

```r
library(dplyr)
library(ideadata)

schools <- get_schools()

head(schools)
```

The schools object above is a tbl object, so it works with dplyr verbs and functions. In the background, dplyr and dbplyr generate SQL that is sent to the database you are connected to, and all computation (e.g., filtering, selecting, joining, calculations, aggregation) is completed on the remote SQL Server instance, not on your computer.

Nevertheless, you will eventually want to pull that data down onto your machine so that R or Python can do the things the database can't (like modeling or graphics).

Pulling that data down is easy with dplyr::collect():

```r
library(dplyr)

schools_df <- schools %>%
  collect() %>%
  janitor::clean_names()
```

(Here, janitor::clean_names() converts all the column names to snake_case.)
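To see the difference between a lazy tbl and collected data without touching the warehouse, here is a minimal sketch that uses an in-memory SQLite database as a stand-in. The table name schools_demo, its toy rows, and the SchoolName column are made up for illustration, and the sketch assumes the DBI, RSQLite, and dbplyr packages are installed.

```r
library(dplyr)

# Stand-in for a warehouse table: a toy in-memory SQLite table
# (assumption: this is NOT the real schools table)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "schools_demo", data.frame(
  SchoolState = c("TX", "TX", "LA"),
  SchoolName  = c("A", "B", "C")
))

# A lazy tbl: no rows have been pulled into R yet
schools_demo <- tbl(con, "schools_demo")

# These verbs only build up a query; nothing runs locally
tx_schools <- schools_demo %>%
  filter(SchoolState == "TX") %>%
  select(SchoolName)

# Print the SQL that dbplyr generated for the pipeline
show_query(tx_schools)

# collect() runs the query and returns a local tibble
tx_df <- collect(tx_schools)

DBI::dbDisconnect(con)
```

The same pattern should apply to a tbl returned by ideadata, although the generated SQL will be in the SQL Server dialect rather than SQLite's.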

### What if I am pulling down lots of data (say, millions of rows)?

In this instance the database connection may fail. It's not ideal, but it happens. One way to deal with this is to pull down the data piecemeal. The collector() function in ideadata makes this task trivial. It takes one or more column names from the table you want to pull down. These columns are used to break up the data into smaller sets, which are pulled down from the database onto your computer and then recombined into a single table.

```r
schools_df <- schools %>%
  collector(SchoolState, CountyName) %>%
  janitor::clean_names()
```
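The piecemeal idea can be sketched in plain dplyr. The helper below, collect_in_pieces(), is hypothetical and is not how collector() is implemented. It only illustrates the general approach of splitting on one column, collecting each piece, and binding the results. The demo again uses a toy in-memory SQLite table and assumes DBI, RSQLite, dbplyr, and rlang are installed.

```r
library(dplyr)

# Hypothetical helper (NOT ideadata's collector()): collect a lazy tbl
# one piece at a time, splitting on a single column given as a string.
# Rows where the split column is NA are not handled in this sketch.
collect_in_pieces <- function(lazy_tbl, split_col) {
  col <- rlang::sym(split_col)

  # Small query: the distinct values of the splitting column
  keys <- lazy_tbl %>%
    distinct(!!col) %>%
    pull()

  # One query per value, each returning a smaller local tibble
  pieces <- lapply(keys, function(key) {
    lazy_tbl %>%
      filter((!!col) == !!key) %>%
      collect()
  })

  # Recombine the pieces into a single local table
  bind_rows(pieces)
}

# Toy stand-in for a warehouse table
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "schools_demo", data.frame(
  SchoolState = c("TX", "TX", "LA", "FL"),
  SchoolName  = c("A", "B", "C", "D")
))

demo_df <- collect_in_pieces(tbl(con, "schools_demo"), "SchoolState")

DBI::dbDisconnect(con)
```

Splitting on a column with a moderate number of distinct values keeps each query small without issuing thousands of round trips.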


idea-analytics/ideadata documentation built on Feb. 1, 2024, 5:40 a.m.