rcrunchbase is an R client for the CrunchBase API (http://data.crunchbase.com/), supplemented with (hopefully) helpful functions that aim to create a compositional query flow. As much as possible, complex queries can be built up from simple requests. The intent is for rcrunchbase to handle the messy stuff while you focus on getting the data you want.

A simple request

Let's start by looking at the most basic query we can make, pulling the profile for the Golden State Warriors:

library(rcrunchbase)
gsw <- crunchbase_get_details("organizations/golden-state-warriors")

# take a look at the structure
ls.str(gsw[[1]])

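All objects from the CrunchBase API can be categorized by type. The two types that appear in this walkthrough are node details (the profile of a single entity) and collections (sets of entities).

Accordingly, there are two functions in rcrunchbase that reflect this structure: crunchbase_get_details, used above to retrieve node details, and a counterpart for retrieving collections.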

So, our gsw object is a node detail. You may have noticed that the object is wrapped in a one-element list: this is because crunchbase_get_details allows you to pull multiple entities with a single query.
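Because the result is always a list, the same indexing works whether one entity or several were requested. The sketch below uses a hand-built stand-in rather than a live API call; mock_details and its field names are hypothetical and only mimic the list-of-entities shape described above.

```r
# Hypothetical stand-in for a crunchbase_get_details() result:
# a list with one element per requested entity. The field names are
# illustrative assumptions, not the real API schema.
mock_details <- list(
    list(properties = list(name = "Team A"), relationships = list()),
    list(properties = list(name = "Team B"), relationships = list())
)

# a single entity is reached with [[ ]], just like gsw[[1]]
first_name <- mock_details[[1]]$properties$name

# the same field can be pulled from every entity in one pass
all_names <- vapply(mock_details, function(node) node$properties$name, character(1))
```

Using vapply rather than sapply here guarantees a character vector even if the list happens to be empty.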

Let's inspect our gsw object:

names(gsw[[1]])
ls.str(gsw[[1]]$relationships)

Notice that the relationships section includes a number of data frames. In fact, each relationship is a collection object. crunchbase_expand_section allows us to usefully open up any of these relationships and inspect them in a natural way:

crunchbase_expand_section(gsw, "current_team")
# note that we can also expand multiple sections and combine the results:
crunchbase_expand_section(gsw, c("current_team", "past_team"))

These three functions can be combined in diverse ways, resulting in a much richer and more expressive approach to the API. To take full advantage of the compositional nature of these functions, it's useful to have a "piping" operator to pass results of one function to inputs for the next function. I prefer the %>% operator from the magrittr package.

library(magrittr)

gs_invests <- crunchbase_expand_section(gsw, c("current_team", "past_team")) %>%
    crunchbase_get_details %>%
    crunchbase_expand_section("investments")

# a list of all companies that current and past team members at the Golden State Warriors have invested in:
gs_invests$relationships.funding_round.relationships.funded_organization.properties.name
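To see what the pipe is doing independently of the API, here is a minimal sketch with toy functions. The helpers get_ids, double_it, and add_n are hypothetical stand-ins for a three-step query, not rcrunchbase functions; only %>% itself comes from magrittr.

```r
library(magrittr)

# Toy stand-ins for a three-step query (hypothetical helpers)
get_ids   <- function(x) x$id
double_it <- function(ids) ids * 2
add_n     <- function(ids, n) ids + n

input <- data.frame(id = 1:3)

# nested form: has to be read inside-out
nested <- add_n(double_it(get_ids(input)), 10)

# piped form: reads top to bottom; the left-hand side becomes the first
# argument of the next call, and a bare function name (no parentheses)
# is allowed, as with crunchbase_get_details above
piped <- input %>%
    get_ids %>%
    double_it %>%
    add_n(10)
```

Both forms compute the same value; the piped version simply keeps the steps in the order they happen, which is what makes longer query chains readable.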

As you can see, once you get the hang of it, this process can go on and on. The following completes a query that answers the question, "find all people listed in CrunchBase who work or worked for an organization that received investments from someone who works or worked with the Golden State Warriors." This type of question can come up in relationship mining, for instance:

inv_recips <- gs_invests %>% 
    crunchbase_get_details(df_path = "relationships.funding_round.relationships.funded_organization.properties.api_path") %>% 
    crunchbase_expand_section(c("current_team", "past_team"))

# how many people did we identify?
nrow(inv_recips)

# who are they?
head(inv_recips[c("relationships.person.properties.first_name", "relationships.person.properties.last_name", "properties.title")])
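The dotted column names get long after a few expansions, so it can help to select columns by pattern instead of typing each path. The sketch below runs on a made-up data frame; mock_recips and its rows are hypothetical, and only the column-name style is copied from the output above.

```r
# Hypothetical stand-in for an expanded result; the rows are made up
mock_recips <- data.frame(
    relationships.person.properties.first_name = c("Alex", "Sam"),
    relationships.person.properties.last_name  = c("Doe", "Roe"),
    properties.title = c("Engineer", "Advisor"),
    stringsAsFactors = FALSE
)

# find every column that describes the person, without typing full paths
person_cols <- grep("^relationships\\.person\\.", names(mock_recips), value = TRUE)

# build a compact table, keeping only the last segment of each path as the name
people <- mock_recips[person_cols]
names(people) <- sub("^.*\\.", "", person_cols)
```

Stripping the prefix is only safe when the last path segments are unique among the selected columns; otherwise the shortened names would collide.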


gwcottle/rcrunchbase documentation built on Jan. 25, 2021, 2:40 a.m.