knitr::opts_chunk$set(echo = TRUE)

getcdw is a tool to make it easier to query the CADS data warehouse from R.

You can install getcdw from github:

devtools::install_github("tarakc02/getcdw")

Queries

You can type a query:

# dplyr just prints the returned data.frames prettier
library(dplyr)
library(getcdw)

get_cdw("select report_name from cdw.d_entity_mv where entity_id = 640993")

You can also pass a query from a file, for instance: get_cdw("sql/filename.sql").

Table and column search

Look for tables with find_tables():

# want the committees table, but don't remember the exact name:
find_tables("committee")

# or view all of the d_bio tables:
find_tables("d_bio")

# all of the giving summary tables are like sf_something_summary_mv:
find_tables("^sf_.+_summary_mv$")

Or look for specific columns, either throughout the database or within a given table:

# what's the name of the household_id column in d_entity?
find_columns("household", table_name = "d_entity_mv")

# where are all of the places degree information appears?
find_columns("degree")

Code search

You can search through TMS tables to find codes:

# where/how do we record peace_corps participation?
find_codes("peace corps")

# interest in neuroscience may present itself in terms of a major/minor, 
# an explicit interest code, attendance at an event, etc. let's look for
# all possibilities:
find_codes("neuroscience")

# i just want to look for neuro-related event codes
find_codes("neuro", "^activity$")

dplyr backend

If you use dplyr, you can now query the data warehouse directly, rather than writing SQL and pulling data into a local data frame:

# you have to have dplyr loaded to use dplyr
library(dplyr)

# use src_oracle("DSNNAME") to connect to dsn DSNNAME
# for example: src_oracle("CDW2") or src_oracle("URELUAT")
cdw <- src_oracle()

# each table you reference is described using the dplyr's tbl() function
entity <- tbl(cdw, "cdw.d_entity_mv")
transactions <- tbl(cdw, "cdw.f_transaction_detail_mv")

# note that auto-complete works inside the dplyr verbs!
entity <- entity %>% select(entity_id, person_or_org)
entity

# the top donors of 2001, along with whether they are people or organizations
transactions %>%
    filter(between(giving_record_dt,
                   to_date('01/01/2001', 'mm/dd/yyyy'),
                   to_date('12/31/2001', 'mm/dd/yyyy'))) %>%
    filter(pledged_basis_flg == "Y") %>%
    group_by(donor_entity_id_nbr) %>%
    summarise(giving = sum(benefit_dept_credited_amt)) %>%
    inner_join(entity, by = c("donor_entity_id_nbr" = "entity_id")) %>%
    arrange(desc(giving))

# dplyr will not actually pull the data into memory, which makes things
# pretty fast. when you're ready to actually download a data frame, run collect():

transactions %>%
    filter(between(giving_record_dt,
                   to_date('01/01/2001', 'mm/dd/yyyy'),
                   to_date('12/31/2001', 'mm/dd/yyyy'))) %>%
    filter(pledged_basis_flg == "Y") %>%
    group_by(donor_entity_id_nbr) %>%
    summarise(giving = sum(benefit_dept_credited_amt)) %>%
    filter(giving > 5000000) %>% 
    collect

Parameterized Templates

The parameterize_template function allows you to turn templates that have ##highlighted## variables into functions:

f <- parameterize_template("my name is ##name##")
f("tarak")

# need named arguments when more than 1 parameter:
g <-parameterize_template("hi i'm ##name##, i am ##age## years old") 
g(name = "tarak", age = 36)

parameterize_template is useful in creating reports because it allows you to write SQL with parameters, with the added benefit of the convenient function interface.

template <- "
select donor_entity_id_nbr, 
sum(benefit_dept_credited_amt) as giving
from cdw.f_transaction_detail_mv
where pledged_basis_flg = 'Y'
and giving_record_dt between to_date('##from_date##', 'mm/dd/yyyy') 
and to_date('##to_date##', 'mm/dd/yyyy')
group by donor_entity_id_nbr
having sum(benefit_dept_credited_amt) > 5000000"

top_donors_between <- parameterize_template(template)
report <- top_donors_between(from_date = "01/01/2001", to_date = "12/31/2001")
get_cdw(report)

Just like with get_cdw, parameterize_template works with files also, for example parameterize_template('sql/my-template.sql').



tarakc02/getcdw documentation built on April 2, 2022, 4:13 p.m.