The ideadata
package utilizes the simple concept of get_*
functions, which is named after the table that you want to connect to in the warehouse in one line. Functions like [get_students] are useful because they lower your cognitive load ("What DB and schema do I pull students from and what's the SQL look like") and they ensure that anyone using this package (read: R&A) are always pulling the same data when they start an analysis, save any processing logic.
One big downside: they need to be created one at a time and so if a new set of tables is added to the data warehouse, there can a lag between when those tables are added and when there is a corresponding get_*
function in this package.
get_table()
To mitigate against the time Chris has to develop new get_*
functions and the ever evolving nature of data warehouse, ideadata
now has a new workhourse function: get_table()
.
This function does exactly what it says: it get's tables from the warehouse by looking up table location in the MetaData
table in the Documentation
schema of R&A's SQL Server instance. That table updates frequently and contains a comprehensive view of the data warehouse at any time.
When you load ideadata
(i.e., when you run library(ideadat)
the package makes a connection to the MetaData
table and saves it as a dataframe names warehouse_meta_data
that you can access by running data(warehouse_meta_data)
.
Not that you might need to detach and reload ideadata
if you know the warehouse has a new table but get_table()
can't find it. This will force the warehouse_meta_data
table to update.
If the table is uniquely named in the data warehouse then the usage is very straightforward. You need to simply the table name to function:
library('ideadata') #> ℹ Loading ideadata #> ── Gathering warehouse metadata ─────────────────────────────────────────────── ideadata 3.0.1 ── #> ℹ Checking credentials ... #> christopher.haid's Kereberos TGT is current #> ✓ Success: Warehouse metadata gathered! #> ℹ The warehouse currently houses: #> 2,226 tables #> in 134 databases #> on 17 servers #> WHEE! ideadata and you are healthily neat! #library(ideadata) library(tidyverse) #> Warning: package 'tidyverse' was built under R version 4.0.2 #> ── Attaching packages ──────────────────────────────────────────────────────── tidyverse 1.3.1 ── #> ✓ ggplot2 3.3.6 ✓ purrr 0.3.4 #> ✓ tibble 3.1.6 ✓ stringr 1.4.0 #> ✓ tidyr 1.2.0 ✓ forcats 0.5.1 #> ✓ readr 2.1.2 #> Warning: package 'tibble' was built under R version 4.0.2 #> Warning: package 'tidyr' was built under R version 4.0.5 #> Warning: package 'readr' was built under R version 4.0.5 #> Warning: package 'forcats' was built under R version 4.0.2 #> ── Conflicts ─────────────────────────────────────────────────────────── tidyverse_conflicts() ── #> x lubridate::as.difftime() masks base::as.difftime() #> x lubridate::date() masks base::date() #> x dplyr::filter() masks stats::filter() #> x lubridate::intersect() masks base::intersect() #> x dplyr::lag() masks stats::lag() #> x lubridate::setdiff() masks base::setdiff() #> x lubridate::union() masks base::union() oe_responses <- get_table("OEResponses") #> christopher.haid's Kereberos TGT is current head(oe_responses) #> # Source: lazy query [?? x 7] #> # Database: Microsoft SQL Server 15.00.4198[dbo@RGVPDRA-DASQL/GPTW] #> ID AcademicYear OEQuestionKey OEResponseKey GroupType Group Response #> <int64> <chr> <int> <int> <chr> <chr> <chr> #> 1 1 2020-2021 5 1 Region Austin "I would not change a thing;… #> 2 2 2020-2021 5 2 Region Austin "As a new teacher I had hear… #> 3 3 2020-2021 5 3 Region Austin "Choose locations more caref… #> 4 4 2020-2021 5 4 Region Austin "At my previous campus, I wo… #> 5 5 2020-2021 5 5 Region Austin "To treat it's employees lik… #> 6 6 2020-2021 5 6 Region Austin "More check-ins with manager"
Not too bad.
This function will fail if the tables is not uniquely named in the warehouse.
Why?
Because there is no way to know which table you really mean. One solution, if you know the location, is to provide the table name, database name, and schema to the .table_name
, database_name
, .schema
, and .server_name
arguments of get_table()
. That will locate the right server and make the connection and avoid failure.
But here is something cool: What if you don't know that info? When get_table
fails in this scenario it fails informatively. get_table
will show you all the tables it's found in the warehouse and give you code you can copy and paste to get the table you want.
Let's say you ask for the Schools
table
schools <- get_table("Schools") #> ! There are 22 tables with that name in our warehouse #> ℹ You'll need to specify the database and schema name with db target. #> ✓ Any of these should work: #> get_table(.table_name = "Schools", .database_name = "CSIInstruction", .schema = "dbo", .server_name = "1064618-SQLDI") #> get_table(.table_name = "Schools", .database_name = "CSIInstructionStage", .schema = "dbo", .server_name = "1064618-SQLDI") #> get_table(.table_name = "Schools", .database_name = "IDEAInstruction", .schema = "dbo", .server_name = "1064618-SQLDI") #> get_table(.table_name = "Schools", .database_name = "PROD1_2021ss", .schema = "Schools", .server_name = "1065574-SQLPRD1") #> get_table(.table_name = "Schools", .database_name = "PROD1_2020ss", .schema = "Schools", .server_name = "1065574-SQLPRD1") #> get_table(.table_name = "Schools", .database_name = "PROD1", .schema = "Schools", .server_name = "RGVPDSD-DWPRD1") #> get_table(.table_name = "Schools", .database_name = "SRC_Florida_Schools", .schema = "Powerschool", .server_name = "RGVPDSD-DWSRC1") #> get_table(.table_name = "Schools", .database_name = "SRC_Louisiana_Schools", .schema = "Powerschool", .server_name = "RGVPDSD-DWSRC1") #> get_table(.table_name = "Schools", .database_name = "SRC_Texas_Travis_Schools", .schema = "Skyward", .server_name = "RGVPDSD-DWSRC1") #> get_table(.table_name = "Schools", .database_name = "TravisSnapshotData", .schema = "MidLand.PM", .server_name = "RGVPDSD-DWSRC1") #> get_table(.table_name = "Schools", .database_name = "SRC_EA", .schema = "dbo", .server_name = "RGVPDSD-DWSRC2") #> get_table(.table_name = "Schools", .database_name = "SRC_Ohio_Schools", .schema = "Powerschool", .server_name = "RGVPDSD-DWSRC1") #> get_table(.table_name = "Schools", .database_name = "SRC_Florida_Schools_Focus_Dev", .schema = "Focus", .server_name = "RGVPDSD-DWSRC1") #> get_table(.table_name = "Schools", .database_name = "SRC_Ohio_Assessments", .schema = "Edcite", .server_name = "RGVPDSD-DWSRC2") #> get_table(.table_name = "Schools", .database_name = "SRC_StateAssessments", .schema = "dbo", .server_name = "RGVPDSD-DWSRC2") #> get_table(.table_name = "Schools", .database_name = "CSIInstructionDebug", .schema = "dbo", .server_name = "1064618-SQLDI") #> get_table(.table_name = "Schools", .database_name = "IDEAInstructionDebug", .schema = "dbo", .server_name = "1064618-SQLDI") #> get_table(.table_name = "Schools", .database_name = "IDEAInstructionStage", .schema = "dbo", .server_name = "1064618-SQLDI") #> get_table(.table_name = "Schools", .database_name = "PROD1", .schema = "Schools", .server_name = "1065574-SQLPRD1") #> get_table(.table_name = "Schools", .database_name = "Dashboard", .schema = "dbo", .server_name = "791150-HQVRA") #> get_table(.table_name = "Schools", .database_name = "Dashboard", .schema = "dbo", .server_name = "RGVPDRA-DASQL") #> get_table(.table_name = "Schools", .database_name = "SRC_Texas_Schools", .schema = "Powerschool", .server_name = "RGVPDSD-DWSRC1")
Oh no! There are 21 tables named Schools
in the warehouse on 17 databases. Yikes.
But look at that error: It's got what you need!
You can copy and paste any one of those lines to get the unique table you are seeking (in this case the Schools
table in Schools
on Prod1
).
schools <- get_table(.table_name = "Schools", .database_name = "PROD1", .schema = "Schools") #> ! There are 2 tables with that name in our warehouse #> ℹ You'll need to specify the database and schema name with db target. #> ✓ Any of these should work: #> get_table(.table_name = "Schools", .database_name = "PROD1", .schema = "Schools", .server_name = "RGVPDSD-DWPRD1") #> get_table(.table_name = "Schools", .database_name = "PROD1", .schema = "Schools", .server_name = "1065574-SQLPRD1") head(schools) #> NULL
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.