knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
system("docker run --name neo4j --env NEO4J_AUTH=neo4j/password --publish=7474:7474 --publish=7687:7687 -d neo4j")
library(neo4r)
con <- neo4j_api$new(
  url = "http://localhost:7474",
  user = "neo4j",
  password = "password"
)
# Wait until the Neo4J endpoint answers with status 200
while (try(con$ping()) != 200) {
  Sys.sleep(3)
}
Disclaimer: this package is still under active development. Read NEWS.md to keep up with the latest changes.
Read complementary documentation at https://neo4j-rstats.github.io/user-guide/
The goal of {neo4r} is to provide a modern and flexible Neo4J driver for R.
It's modern in the sense that the results are returned as tibbles whenever possible, it relies on modern tools, and it is designed to work with pipes. Our goal is to provide a driver that can be easily integrated into a data analysis workflow, especially by providing an API that works smoothly with other data analysis packages ({dplyr} or {purrr}) and graph packages ({igraph}, {ggraph}, {visNetwork}...).
It's flexible in the sense that it is rather unopinionated about how it returns results: it tries to stay as close as possible to the way Neo4J returns data, so you keep control over how you process the results. At the same time, the result is not too complex, so the "heavy lifting" of data wrangling is not left to the user.
The connection object is also an easy-to-control R6 object, allowing you to update and query information from the API.
Please note that for now, the connection is only possible through http / https.
You can install {neo4r} from GitHub with:
# install.packages("remotes")
remotes::install_github("neo4j-rstats/neo4r")
or from CRAN:
install.packages("neo4r")
Start by creating a new connection object with neo4j_api$new:
library(neo4r)
con <- neo4j_api$new(
  url = "http://localhost:7474",
  user = "neo4j",
  password = "plop"
)
This connection object is designed to interact with the Neo4J API.
It comes with some methods to retrieve information from it. ping(), for example, tests if the endpoint is available.
# Test the endpoint (this will not work, since the password is wrong):
con$ping()
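When a server has just been started, it can take a few seconds before ping() succeeds. The following is a minimal sketch of a polling helper with an upper bound on the number of attempts. `wait_for_endpoint()` and `fake_ping()` are hypothetical names used for illustration only; they are not part of {neo4r}, and the sketch assumes that a successful ping returns the status code 200.

```r
# Hypothetical helper (not part of {neo4r}): call `ping_fun` until it
# returns 200 or the number of attempts is exhausted.
wait_for_endpoint <- function(ping_fun, max_tries = 10, delay = 3) {
  for (i in seq_len(max_tries)) {
    # Treat any error (e.g. connection refused) as "not ready yet"
    status <- tryCatch(ping_fun(), error = function(e) NA_integer_)
    if (isTRUE(status == 200)) {
      return(TRUE)
    }
    Sys.sleep(delay)
  }
  FALSE
}

# Stand-in for a real ping: fails twice, then succeeds
calls <- 0
fake_ping <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("connection refused")
  200
}

ready <- wait_for_endpoint(fake_ping, max_tries = 5, delay = 0)
```

With a live server, passing `function() con$ping()` as `ping_fun` should behave the same way, under the assumption stated above.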
Being an R6 object, con is flexible in the sense that you can change url, user and password at any time:
con$reset_user("neo4j")
con$reset_password("password")
con$ping()
Other methods:
# Get Neo4J Version
con$get_version()
# List constraints (if any)
con$get_constraints()
# Get a vector of labels (if any)
con$get_labels()
# Get a vector of relationships (if any)
con$get_relationships()
# Get index
con$get_index()
You can either create a separate query or insert it inside the call_neo4j() function.
The call_neo4j() function takes several arguments:
+ query: the cypher query
+ con: the connection object
+ type: "row" or "graph": whether to return the results as a list of tibbles, or as a graph object (with $nodes and $relationships)
+ output: the output format (R or json)
+ include_stats: whether or not to include the stats about the call
+ meta: whether or not to include the meta arguments of the nodes when calling with "row"
Starting at version 0.1.3, the play_movies() function returns the full cypher query to create the movie graph example from the Neo4J examples.
play_movies() %>% call_neo4j(con)
The user chooses whether or not to return a list of tibbles when calling the API. You get as many objects as specified in the RETURN cypher statement.
library(magrittr)
'MATCH (tom {name: "Tom Hanks"}) RETURN tom;' %>%
  call_neo4j(con)
'MATCH (cloudAtlas {title: "Cloud Atlas"}) RETURN cloudAtlas;' %>%
  call_neo4j(con)
"MATCH (people:Person)-[relatedTo]-(:Movie {title: 'Cloud Atlas'}) RETURN people.name, Type(relatedTo), relatedTo" %>%
  call_neo4j(con, type = 'row')
By default, results are returned as an R list of tibbles. For example here, RETURN tom will return a one-element list, with the element named tom. We think this is the most "truthful" way to represent the output of Neo4J calls.
When you return two node types, you'll get two results in the form of two tibbles: the result is a two-element list, with each element named the way it was specified in the Cypher query.
'MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(tomHanksMovies) RETURN tom,tomHanksMovies' %>% call_neo4j(con)
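To picture the shape of such a result without a running server, here is a hand-built stand-in. The values are typed by hand to mimic a two-element result; this is not real output from call_neo4j(), and plain data frames are used in place of tibbles to keep the sketch dependency-free.

```r
# Hand-built stand-in for a two-element "row" result (not real server output)
res <- list(
  tom = data.frame(name = "Tom Hanks", born = 1956L, stringsAsFactors = FALSE),
  tomHanksMovies = data.frame(
    title = c("Cast Away", "Apollo 13"),
    released = c(2000L, 1995L),
    stringsAsFactors = FALSE
  )
)

# One element per variable named in the RETURN clause
names(res)

# Each element is an ordinary table and can be processed on its own
movies_after_1996 <- subset(res$tomHanksMovies, released > 1996)
```

Because each element is a regular table, it can be passed on to {dplyr} or {purrr} individually.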
Results can also be returned in JSON, for example for writing to a file:
tmp <- tempfile(fileext = ".json")
'MATCH (people:Person) RETURN people.name LIMIT 1' %>%
  call_neo4j(con, output = "json") %>%
  write(tmp)
jsonlite::read_json(tmp)
If you set the type argument to "graph", you'll get a graph result:
'MATCH (tom:Person {name: "Tom Hanks"})-[act:ACTED_IN]->(tomHanksMovies) RETURN act,tom,tomHanksMovies' %>% call_neo4j(con, type = "graph")
The result is returned with one node or relationship per row.
Due to the specific data format of Neo4J, a node or relationship can have more than one label and property. That's why the result is returned, by design, as a list-dataframe.
We have designed several functions to unnest the output:
+ unnest_nodes(), which unnests a node dataframe:
res <- 'MATCH (tom:Person {name:"Tom Hanks"})-[a:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors) RETURN m AS acted,coActors.name' %>%
  call_neo4j(con, type = "graph")
unnest_nodes(res$nodes)
Please note that this function will return NA for the properties that aren't in a node.
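The NA behaviour can be illustrated with a small base R sketch on a hand-built list-dataframe. This only mimics the idea of spreading a nested properties column into one column per property; it is not the implementation of unnest_nodes(), and the example nodes are made up for illustration.

```r
# Hand-built node table with list-columns (illustrative data only)
nodes <- data.frame(id = c("1", "2"), stringsAsFactors = FALSE)
nodes$label <- list("Person", "Movie")
nodes$properties <- list(
  list(name = "Tom Hanks", born = 1956),
  list(title = "Cloud Atlas", released = 2012)
)

# Collect every property name seen in any node
all_props <- unique(unlist(lapply(nodes$properties, names)))

# Build one column per property, using NA where a node lacks it
prop_cols <- lapply(all_props, function(p) {
  vapply(nodes$properties, function(x) {
    if (is.null(x[[p]])) NA_character_ else as.character(x[[p]])
  }, character(1))
})
names(prop_cols) <- all_props

flat <- cbind(nodes["id"], as.data.frame(prop_cols, stringsAsFactors = FALSE))
flat
```

In this sketch all property values are converted to character for simplicity; the Person row has NA for title and released, and the Movie row has NA for name and born.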
Also, it is possible to unnest either the properties or the labels :
res %>% extract_nodes() %>% unnest_nodes(what = "properties")
res %>% extract_nodes() %>% unnest_nodes(what = "label")
+ unnest_relationships()
There is only one nested column in the relationship table, so the function is quite straightforward:
'MATCH (people:Person)-[relatedTo]-(:Movie {title: "Cloud Atlas"}) RETURN people.name, Type(relatedTo), relatedTo' %>% call_neo4j(con, type = "graph") %>% extract_relationships() %>% unnest_relationships()
Note that unnest_relationships() only does one level of unnesting.
+ unnest_graph()
This function takes a graph result and applies both unnest_nodes() and unnest_relationships() to it.
'MATCH (people:Person)-[relatedTo]-(:Movie {title: "Cloud Atlas"}) RETURN people.name, Type(relatedTo), relatedTo' %>% call_neo4j(con, type = "graph") %>% unnest_graph()
There are two convenient functions to extract nodes and relationships:
'MATCH (bacon:Person {name:"Kevin Bacon"})-[*1..4]-(hollywood) RETURN DISTINCT hollywood' %>% call_neo4j(con, type = "graph") %>% extract_nodes()
'MATCH p=shortestPath( (bacon:Person {name:"Kevin Bacon"})-[*]-(meg:Person {name:"Meg Ryan"}) ) RETURN p' %>% call_neo4j(con, type = "graph") %>% extract_relationships()
In order to be converted into a graph object:
+ The nodes should be a dataframe whose first column is a series of unique IDs, understood as "names" by igraph. These are the ID columns from Neo4J. Other columns are considered attributes.
+ The relationships need a start and an end, i.e. startNode and endNode in the Neo4J results.
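The two requirements above can be checked before handing the tables to a graph package. The sketch below is a hypothetical validator written in base R; `check_graph_inputs()` is not part of {neo4r} or {igraph}, and the tables are hand-built for illustration.

```r
# Hypothetical validator (not part of {neo4r} or {igraph})
check_graph_inputs <- function(nodes, relationships) {
  ids <- nodes[[1]]
  # Requirement 1: the first column holds unique node IDs
  if (anyDuplicated(ids) > 0) {
    stop("node IDs in the first column must be unique")
  }
  # Requirement 2: every startNode / endNode refers to a known node
  endpoints <- c(relationships$startNode, relationships$endNode)
  unknown <- setdiff(endpoints, ids)
  if (length(unknown) > 0) {
    stop("relationships refer to unknown node IDs: ",
         paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

nodes <- data.frame(
  id = c("1", "2", "3"),
  name = c("Tom Hanks", "Cast Away", "Apollo 13"),
  stringsAsFactors = FALSE
)
rels <- data.frame(
  startNode = c("1", "1"),
  endNode = c("2", "3"),
  type = "ACTED_IN",
  stringsAsFactors = FALSE
)

ok <- check_graph_inputs(nodes, rels)

# A relationship pointing at a missing node is reported as an error
bad <- tryCatch(
  check_graph_inputs(nodes, transform(rels, endNode = c("2", "9"))),
  error = function(e) conditionMessage(e)
)
```

Running a check like this first tends to give a clearer message than a failure inside the graph constructor.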
Here is how to create a graph object from a {neo4r} result:
G <- "MATCH a=(p:Person {name: 'Tom Hanks'})-[r:ACTED_IN]->(m:Movie) RETURN a;" %>%
  call_neo4j(con, type = "graph")

library(dplyr)
library(purrr)
# Create a dataframe with col 1 being the ID,
# and col 2 being the names
G$nodes <- G$nodes %>%
  unnest_nodes(what = "properties") %>%
  # We're extracting the first label of each node, but
  # this column can also be removed if not needed
  mutate(label = map_chr(label, 1))
head(G$nodes)
We then reorder the relationship table:
G$relationships <- G$relationships %>%
  unnest_relationships() %>%
  select(startNode, endNode, type, everything()) %>%
  mutate(roles = unlist(roles))
head(G$relationships)
graph_object <- igraph::graph_from_data_frame(
  d = G$relationships,
  directed = TRUE,
  vertices = G$nodes
)
plot(graph_object)
This can also be used with {ggraph}:
library(ggraph)
graph_object %>%
  ggraph() +
  geom_node_label(aes(label = label)) +
  geom_edge_link() +
  theme_graph()
{visNetwork} expects a nodes data frame with an id column and an edges data frame with from and to columns (from ?visNetwork::visNetwork).
visNetwork is smart enough to transform a list column into several labels, so we don't have to worry too much about this one.
Here's how to convert our {neo4r} result:
G <- "MATCH a=(p:Person {name: 'Tom Hanks'})-[r:ACTED_IN]->(m:Movie) RETURN a;" %>%
  call_neo4j(con, type = "graph")

# We'll just unnest the properties
G$nodes <- G$nodes %>%
  unnest_nodes(what = "properties")
head(G$nodes)

# Turn the relationships into the from / to / label format:
G$relationships <- G$relationships %>%
  unnest_relationships() %>%
  select(from = startNode, to = endNode, label = type)
head(G$relationships)

visNetwork::visNetwork(G$nodes, G$relationships)
You can simply send queries as we have just seen, by writing the cypher query and calling the API.
vec_to_cypher() creates a list:
vec_to_cypher(iris[1, 1:3], "Species")
vec_to_cypher_with_var() creates a cypher call starting with a variable:
vec_to_cypher_with_var(iris[1, 1:3], "Species", a)
This can be combined inside a cypher call:
paste("MERGE", vec_to_cypher(iris[1, 1:3], "Species"))
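To give an idea of what turning R values into a cypher fragment involves, here is a hypothetical base R helper that builds a node pattern from a named list. `to_cypher_node()` is an illustration only: it is not the implementation of vec_to_cypher(), its output format is an assumption, and it does not escape quotes inside values.

```r
# Hypothetical helper (not part of {neo4r}): build a cypher node pattern
# from a named list of properties and a label. Values are not escaped.
to_cypher_node <- function(props, label) {
  values <- vapply(props, as.character, character(1))
  pairs <- sprintf("`%s`: '%s'", names(props), values)
  sprintf("(:`%s` {%s})", label, paste(pairs, collapse = ", "))
}

node <- to_cypher_node(
  list(Sepal.Length = 5.1, Sepal.Width = 3.5),
  "Species"
)
query <- paste("MERGE", node)
query
```

A production version would need to escape quotes and backticks in values and names before the query is sent to a server.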
read_cypher() reads a cypher file and returns a tibble of all the calls:
read_cypher("data-raw/create.cypher")
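Splitting a cypher file into individual calls can be sketched in base R as follows. This is a naive illustration under the assumption that statements are separated by semicolons; it is not how read_cypher() is implemented, and it would break on semicolons inside string literals. The file content is made up for the example.

```r
# Write a small example file (illustrative content only)
path <- tempfile(fileext = ".cypher")
writeLines(c(
  "CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE;",
  "CREATE (a:Person {name: 'Ada'});"
), path)

# Read the file, then split it into one statement per row
text <- paste(readLines(path), collapse = " ")
statements <- trimws(strsplit(text, ";", fixed = TRUE)[[1]])
statements <- statements[nzchar(statements)]

calls <- data.frame(
  cypher = paste0(statements, ";"),
  stringsAsFactors = FALSE
)
calls
```

Each row of `calls` then holds one statement that could be sent on its own.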
send_cypher() reads a cypher file and sends it to the API. By default, the stats are returned.
send_cypher("data-raw/constraints.cypher", con)
The load_csv() function sends a csv from a url to the Neo4J browser.
The args are:
+ on_load: the code to execute on load
+ con: the connection object
+ url: the url of the csv to send
+ header: whether or not the csv has a header
+ periodic_commit: the volume for PERIODIC COMMIT
+ as: the AS argument for LOAD CSV
+ format: the format of the result
+ include_stats: whether or not to include the stats
+ meta: whether or not to return the meta information
Let's use the Neo4J northwind-graph example for that.
# Create the query that will create the nodes and relationships
on_load_query <- 'CREATE (n:Product)
  SET n = row,
  n.unitPrice = toFloat(row.unitPrice),
  n.unitsInStock = toInteger(row.unitsInStock),
  n.unitsOnOrder = toInteger(row.unitsOnOrder),
  n.reorderLevel = toInteger(row.reorderLevel),
  n.discontinued = (row.discontinued <> "0");'
# Send the csv
load_csv(
  url = "http://data.neo4j.com/northwind/products.csv",
  con = con,
  header = TRUE,
  periodic_commit = 50,
  as = "row",
  on_load = on_load_query
)
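For orientation, the arguments above map onto the parts of a Cypher LOAD CSV statement. The sketch below assembles such a statement as a string; `build_load_csv()` is a hypothetical helper, its default values are arbitrary, and the exact statement that load_csv() sends is an assumption not confirmed by this document.

```r
# Hypothetical helper (not part of {neo4r}): assemble a LOAD CSV statement.
# The defaults below are arbitrary choices for this sketch.
build_load_csv <- function(url, on_load, header = TRUE,
                           periodic_commit = 1000, as = "csv") {
  paste(
    sprintf("USING PERIODIC COMMIT %d", as.integer(periodic_commit)),
    if (header) "LOAD CSV WITH HEADERS" else "LOAD CSV",
    sprintf("FROM '%s' AS %s", url, as),
    on_load
  )
}

stmt <- build_load_csv(
  url = "http://data.neo4j.com/northwind/products.csv",
  on_load = "CREATE (n:Product) SET n = row;",
  periodic_commit = 50,
  as = "row"
)
stmt
```

The `as` value is the name that the on_load code uses to refer to each csv line, which is why the example query above reads from row.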
{neo4r} comes with a Connection Pane interface for RStudio.
Once installed, you can go to the "Connections" pane and use the widget to connect to the Neo4J server.
You can get an RStudio / Neo4J sandbox with Docker :
docker pull colinfay/neo4r-docker
docker run -e PASSWORD=plop -e ROOT=TRUE -d -p 8787:8787 neo4r
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
system("docker stop neo4j && sleep 2 && docker rm neo4j")