A general purpose R interface to Solr
Development now follows Solr v7 and later. Solr v7 introduced many changes, so many functions here may not work with Solr installations older than v7.
Be aware that some functions currently work only in certain Solr modes. For example, collection_create() won't work unless you are in SolrCloud mode, but in that case you should get an error message saying so.
Development currently targets Solr v8.2.0.
The first thing to look at is SolrClient, which instantiates a client connection to your Solr instance. ping and schema are helpful functions to look at after instantiating your client.
There are two ways to use solrium:
1. Call methods on the SolrClient object
2. Pass the SolrClient object to functions
For example, if we instantiate a client with conn <- SolrClient$new(), then the first way is conn$search(...) and the second way is solr_search(conn, ...). Offering both styles should make the package friendlier to more people: those who prefer an object-oriented approach and those who prefer a functional one.
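To make the two styles concrete, here is a minimal base-R sketch of the general pattern. It is not solrium's code (SolrClient is an R6 class). A hypothetical toy client, new_toy_client(), carries a search method, and a hypothetical function, toy_search(), takes the client as its first argument and delegates to that method. The sketch only builds a request URL and never contacts a server; the local URL is an assumption.

```r
# Hypothetical toy client (not solrium's implementation): an environment
# holding state plus a "method", standing in for an R6 object like SolrClient.
new_toy_client <- function(base_url) {
  self <- new.env()
  self$base_url <- base_url
  # Object-oriented style: the method is called on the client itself.
  self$search <- function(params) {
    query <- paste(names(params), unlist(params), sep = "=", collapse = "&")
    paste0(self$base_url, "?", query)
  }
  self
}

# Functional style: a plain function takes the client as its first
# argument and delegates to the client's method.
toy_search <- function(conn, params) conn$search(params)

conn <- new_toy_client("http://localhost:8983/solr/gettingstarted/select")
oo_style <- conn$search(list(q = "*:*", rows = 2))
fn_style <- toy_search(conn, list(q = "*:*", rows = 2))
identical(oo_style, fn_style)
```

Because the functional form is only a thin wrapper, both calls return the same value. You can pick either style without any difference in behaviour.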
Collections
Functions whose names start with collection work with Solr collections in SolrCloud mode. These functions won't work in Solr standard mode.
Cores
Functions whose names start with core work with Solr cores in standard Solr mode. These functions won't work in SolrCloud mode.
Documents
The following functions work with documents in Solr:
#> - add
#> - delete_by_id
#> - delete_by_query
#> - update_atomic_json
#> - update_atomic_xml
#> - update_csv
#> - update_json
#> - update_xml
Search
Search functions, including solr_parse for parsing the results of the different functions appropriately:
#> - solr_all
#> - solr_facet
#> - solr_get
#> - solr_group
#> - solr_highlight
#> - solr_mlt
#> - solr_parse
#> - solr_search
#> - solr_stats
Stable version from CRAN
install.packages("solrium")
Or the development version from GitHub
remotes::install_github("ropensci/solrium")
library("solrium")
Use SolrClient$new() to initialize your connection. These examples use a remote Solr server (the PLOS search API), but the same approach works with a local Solr server.
(cli <- SolrClient$new(host = "api.plos.org", path = "search", port = NULL))
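Passing port = NULL matters for hosted services that are reached on the standard HTTP(S) port. A hypothetical helper, solr_base_url(), sketches how host, path, port and scheme might combine into a base URL. It is not a solrium function. Its defaults (port 8983, Solr's usual local port, and scheme "http") are assumptions, not a description of SolrClient internals.

```r
# Hypothetical helper, not part of solrium: combine the connection
# settings into a base URL, leaving out the port when it is NULL.
solr_base_url <- function(host, path = NULL, port = 8983, scheme = "http") {
  authority <- if (is.null(port)) host else paste0(host, ":", port)
  url <- paste0(scheme, "://", authority)
  if (!is.null(path)) url <- paste0(url, "/", path)
  url
}

# Remote PLOS server, as in the SolrClient$new() call above
solr_base_url(host = "api.plos.org", path = "search", port = NULL)
# Local server with the assumed defaults
solr_base_url(host = "127.0.0.1")
```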
You can also set whether you want simple or detailed error messages (via errors), whether to print the URL used in each function call (via verbose), and your proxy settings (via proxy) if needed. For example:
SolrClient$new(errors = "complete")
The print method for the connection object shows your settings:
cli
To set up a local Solr server, run the following from your Solr installation directory:
bin/solr start -e cloud -noprompt
bin/post -c gettingstarted example/exampledocs/*.xml
(res <- cli$search(params = list(q='*:*', rows=2, fl='id')))
And you can get search metadata from the attributes:
attributes(res)
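Storing metadata as attributes keeps it attached to the result without adding columns to the data. The base-R sketch below shows that mechanism on a made-up data.frame. The attribute names numFound and start are assumptions for illustration, so inspect attributes(res) on a real result to see the names solrium actually uses.

```r
# Hypothetical result: a data.frame of documents with metadata stored as
# attributes. The attribute names here are illustrative assumptions.
res <- data.frame(id = c("doc1", "doc2"))
res <- structure(res, numFound = 1234L, start = 0L)

# Read a single piece of metadata
attr(res, "numFound")
# List only the extra attributes (drop the data.frame's own bookkeeping)
setdiff(names(attributes(res)), c("names", "row.names", "class"))
```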
Most recent publication by journal
cli$group(params = list(q='*:*', group.field='journal', rows=5, group.limit=1, group.sort='publication_date desc', fl='publication_date, score'))
First publication by journal
cli$group(params = list(q = '*:*', group.field = 'journal', group.limit = 1, group.sort = 'publication_date asc', fl = c('publication_date', 'score'), fq = "publication_date:[1900-01-01T00:00:00Z TO *]"))
Search group query: last 3 publications of 2013
gq <- 'publication_date:[2013-01-01T00:00:00Z TO 2013-12-31T00:00:00Z]'
cli$group(params = list(q='*:*', group.query = gq, group.limit = 3, group.sort = 'publication_date desc', fl = 'publication_date'))
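The inclusive upper bound 2013-12-31T00:00:00Z above stops at midnight, so a document timestamped later on 31 December would fall outside the range. A hypothetical helper, year_range_query(), sketches an alternative that uses Solr's mixed range syntax: an inclusive lower bound "[" and an exclusive upper bound "}". The resulting string could be passed as group.query in place of gq.

```r
# Hypothetical helper: build a Solr range query covering one calendar year.
# "[" makes the lower bound inclusive and "}" makes the upper bound
# exclusive, so everything on 31 December is included.
year_range_query <- function(field, year) {
  sprintf("%s:[%d-01-01T00:00:00Z TO %d-01-01T00:00:00Z}",
          field, as.integer(year), as.integer(year) + 1L)
}

gq <- year_range_query("publication_date", 2013)
gq
```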
Search group with format simple
cli$group(params = list(q='*:*', group.field='journal', rows=5, group.limit=3, group.sort='publication_date desc', group.format='simple', fl='journal, publication_date'))
cli$facet(params = list(q='*:*', facet.field='journal', facet.query=c('cell', 'bird')))
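Solr accepts some parameters, such as facet.query, more than once in a single request. That is why the call above passes a vector. The hypothetical helper to_query_string() (not a solrium function, and not a claim about how solrium builds requests) sketches how a named parameter list with vector values could be flattened into a query string with repeated keys. It URL-encodes each value with utils::URLencode().

```r
# Hypothetical helper, not solrium's internals: flatten a named list into
# a query string, repeating the key once per element of a vector value.
to_query_string <- function(params) {
  pairs <- unlist(lapply(names(params), function(key) {
    values <- as.character(params[[key]])
    encoded <- vapply(values, utils::URLencode, character(1),
                      reserved = TRUE, USE.NAMES = FALSE)
    paste0(key, "=", encoded)
  }))
  paste(pairs, collapse = "&")
}

to_query_string(list(q = "ecology", facet.field = "journal",
                     facet.query = c("cell", "bird")))
```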
cli$highlight(params = list(q='alcohol', hl.fl = 'abstract', rows=2))
out <- cli$stats(params = list(q='ecology', stats.field=c('counter_total_all','alm_twitterCount'), stats.facet='journal'))
out$data
solr_mlt is a function that returns, for each document matched by the query, other documents similar to it (Solr's MoreLikeThis).
out <- cli$mlt(params = list(q='title:"ecology" AND body:"cell"', mlt.fl='title', mlt.mindf=1, mlt.mintf=1, fl='counter_total_all', rows=5))
out$docs
out$mlt
solr_parse is a general purpose parser function with extension methods solr_parse.sr_search, solr_parse.sr_facet, and solr_parse.sr_high, for parsing solr_search, solr_facet, and solr_highlight function output, respectively. solr_parse is used internally within those three functions (solr_search, solr_facet, solr_highlight) to do parsing. You can optionally get back raw JSON or XML from solr_search, solr_facet, and solr_highlight by setting the parameter raw=TRUE, and then parse it after the fact with solr_parse. All you need to know is that solr_parse can parse that raw output.
For example:
(out <- cli$highlight(params = list(q='alcohol', hl.fl = 'abstract', rows=2), raw=TRUE))
Then parse
solr_parse(out, 'df')
Progress bars are only supported in the core search methods: search, facet, group, mlt, stats, high, all.
library(httr)
invisible(cli$search(params = list(q='*:*', rows=100, fl='id'), progress = httr::progress()))
|==============================================| 100%
Function queries let you compute on numeric fields in the Solr index (addition, multiplication, etc. on one or more fields) and use the result to score and sort results. For example, here we search on the product of counter_total_all and alm_twitterCount, using Solr's special _val_ field, which embeds a function query in the main query.
cli$search(params = list(q='_val_:"product(counter_total_all,alm_twitterCount)"', rows=5, fl='id,title', fq='doc_type:full'))
Here, we search for the papers with the highest counter_total_all
cli$search(params = list(q='_val_:"max(counter_total_all)"', rows=5, fl='id,counter_total_all', fq='doc_type:full'))
Or with the most tweets
cli$search(params = list(q='_val_:"max(alm_twitterCount)"', rows=5, fl='id,alm_twitterCount', fq='doc_type:full'))
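All three function queries above have the same shape. A hypothetical helper, val_query() (not part of solrium), builds that q string from a function name and one or more field names. Other Solr functions, such as sum(), can then be tried the same way.

```r
# Hypothetical helper: build a q string that embeds a function query via
# Solr's _val_ hook, e.g. _val_:"product(a,b)".
val_query <- function(fn, fields) {
  sprintf('_val_:"%s(%s)"', fn, paste(fields, collapse = ","))
}

val_query("product", c("counter_total_all", "alm_twitterCount"))
val_query("max", "counter_total_all")
```

The result can be passed directly as q, e.g. cli$search(params = list(q = val_query("max", "alm_twitterCount"), rows = 5)).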
USGS BISON service
The occurrences service
conn <- SolrClient$new(scheme = "https", host = "bison.usgs.gov", path = "solr/occurrences/select", port = NULL)
conn$search(params = list(q = '*:*', fl = c('decimalLatitude','decimalLongitude','scientificName'), rows = 2))
The species names service
conn <- SolrClient$new(scheme = "https", host = "bison.usgs.gov", path = "solr/scientificName/select", port = NULL)
conn$search(params = list(q = '*:*'))
PLOS Search API
Most of the examples above use the PLOS search API... :)
This isn't as complete as the search functions shown above, but we're getting there.
conn <- SolrClient$new()
Many functions, e.g.:
core_create()
core_rename()
core_status()
Create a core
conn$core_create(name = "foo_bar")
Many functions, e.g.:
collection_create()
collection_list()
collection_addrole()
Create a collection
conn$collection_create(name = "hello_world")
Add documents. You can add them from files (JSON, XML, or CSV format) and from R objects (so far including data.frame and list types).
df <- data.frame(id = c(67, 68), price = c(1000, 500000000))
conn$add(df, name = "books")
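Conceptually, each row of the data.frame becomes one Solr document whose field names are the column names. The base-R sketch below shows that row-to-document mapping as a list of named lists. It is illustrative only and does not describe how solrium converts data internally.

```r
df <- data.frame(id = c(67, 68), price = c(1000, 500000000))

# One named list per row: names are Solr field names, values are the
# row's values for those fields.
docs <- lapply(seq_len(nrow(df)), function(i) as.list(df[i, , drop = FALSE]))

length(docs)      # one document per row
docs[[1]]$id      # field "id" of the first document
```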
Delete documents by id
conn$delete_by_id(name = "books", ids = c(3, 4))
Or by query
conn$delete_by_query(name = "books", query = "manu:bank")
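Deletions are sent to Solr as update requests. In Solr's JSON update syntax, deleting by id looks like {"delete":["3","4"]} and deleting by query looks like {"delete":{"query":"manu:bank"}}. The hypothetical helpers below sketch those request bodies in base R. They are not solrium functions, and whether solrium sends exactly these bodies (rather than, say, XML) is not claimed here.

```r
# Hypothetical helpers: build Solr JSON update bodies for deletions.
# encodeString() wraps each value in double quotes and escapes embedded quotes.
delete_ids_body <- function(ids) {
  quoted <- encodeString(as.character(ids), quote = '"')
  sprintf('{"delete":[%s]}', paste(quoted, collapse = ","))
}

delete_query_body <- function(query) {
  sprintf('{"delete":{"query":%s}}', encodeString(query, quote = '"'))
}

delete_ids_body(c(3, 4))
delete_query_body("manu:bank")
```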
Get citation information for solrium in R by running citation(package = 'solrium').