A general purpose R interface to Apache Solr
Stable version from CRAN
install.packages("solrium")
Or the development version from GitHub
install.packages("devtools")
devtools::install_github("ropensci/solrium")
Load
library("solrium")
You can set up a connection to a remote Solr instance or to one running on your local machine.
(conn <- SolrClient$new(host = "api.plos.org", path = "search", port = NULL))
solr_search() only returns the docs element of a Solr response body. If docs is all you need, then this function will do the job. If you need only facet data or only mlt data, see the appropriate functions for each of those below. Another function, solr_all(), has a similar interface to solr_search() in terms of parameters, but returns all parts of the response body, including facets, mlt, groups, stats, etc., as long as you request those.
solr_search() returns only docs. A basic search:
conn$search(params = list(q = '*:*', rows = 2, fl = 'id'))
Search in specific fields with the field:term syntax
Search for the word "ecology" in the title and "cell" in the body
conn$search(params = list(q = 'title:"ecology" AND body:"cell"', fl = 'title', rows = 5))
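The fielded query string above can also be assembled programmatically. The following is a minimal sketch in base R; field_query() is a hypothetical helper written for illustration and is not part of solrium. It assumes every term should be quoted and that all clauses are joined with AND.

```r
# Hypothetical helper (not part of solrium): build a fielded Solr query
# from a named character vector, quoting each term and joining with AND.
field_query <- function(terms) {
  clauses <- sprintf('%s:"%s"', names(terms), terms)
  paste(clauses, collapse = " AND ")
}

q <- field_query(c(title = "ecology", body = "cell"))
print(q)
```

The resulting string can be passed as the q parameter, for example conn$search(params = list(q = q, fl = 'title', rows = 5)).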
Wildcards
Search for words that start with "cell" in the title field
conn$search(params = list(q = 'title:"cell*"', fl = 'title', rows = 5))
Proximity search
Search for the words "stem" and "cell" within seven words of each other
conn$search(params = list(q = 'everything:"stem cell"~7', fl = 'title', rows = 3))
Range searches
Search for articles with a Twitter count between 5 and 50
conn$search(params = list(q = '*:*', fl = c('alm_twitterCount', 'id'), fq = 'alm_twitterCount:[5 TO 50]', rows = 10))
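When the bounds of a range come from variables, the filter query can be built with sprintf(). This is a small sketch in base R; range_fq() is a hypothetical helper, not a solrium function, and it assumes an inclusive range, which is what square brackets mean in Solr query syntax.

```r
# Hypothetical helper (not part of solrium): build an inclusive
# range filter of the form field:[lower TO upper].
range_fq <- function(field, lower, upper) {
  sprintf("%s:[%s TO %s]", field, lower, upper)
}

fq <- range_fq("alm_twitterCount", 5, 50)
print(fq)
```

The string can then be supplied as the fq parameter in place of the literal used in the call above.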
Boosts
Assign a higher boost to title matches than to abstract matches (compare the two calls)
conn$search(params = list(q = 'title:"cell" abstract:"science"', fl = 'title', rows = 3))
conn$search(params = list(q = 'title:"cell"^1.5 AND abstract:"science"', fl = 'title', rows = 3))
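To try several boost values without editing the query by hand, the boosted clause can be generated. The sketch below uses only base R; boost_clause() is a hypothetical helper for illustration and not part of solrium.

```r
# Hypothetical helper (not part of solrium): build a quoted field clause
# with a boost factor, e.g. title:"cell"^1.5
boost_clause <- function(field, term, boost) {
  sprintf('%s:"%s"^%s', field, term, boost)
}

# Combine a boosted title clause with an unboosted abstract clause.
q <- paste(boost_clause("title", "cell", 1.5), 'abstract:"science"', sep = " AND ")
print(q)
```

Passing q to conn$search() reproduces the second call above; changing the boost argument lets you compare how the ranking shifts.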
solr_all() differs from solr_search() in that it allows specifying facets, mlt, groups, stats, etc., and returns all of those. It defaults to parsetype = "list" and wt = "json", whereas solr_search() defaults to parsetype = "df" and wt = "csv". As a result, solr_all() returns a list by default, whereas solr_search() returns a data.frame.
A basic search, just docs output
conn$all(params = list(q = '*:*', rows = 2, fl = 'id'))
Get docs, mlt, and stats output
conn$all(params = list(q = 'ecology', rows = 2, fl = 'id', mlt = 'true', mlt.count = 2, mlt.fl = 'abstract', stats = 'true', stats.field = 'counter_total_all'))
conn$facet(params = list(q = '*:*', facet.field = 'journal', facet.query = c('cell', 'bird')))
conn$highlight(params = list(q = 'alcohol', hl.fl = 'abstract', rows = 2))
out <- conn$stats(params = list(q = 'ecology', stats.field = c('counter_total_all', 'alm_twitterCount'), stats.facet = c('journal', 'volume')))
out$data
out$facet
solr_mlt() returns documents similar to those matched by the query (Solr's "more like this" feature)
out <- conn$mlt(params = list(q = 'title:"ecology" AND body:"cell"', mlt.fl = 'title', mlt.mindf = 1, mlt.mintf = 1, fl = 'counter_total_all', rows = 5))
out$docs
out$mlt
solr_group() returns search results grouped by the values of a field
conn$group(params = list(q = 'ecology', group.field = 'journal', group.limit = 1, fl = c('id', 'alm_twitterCount')))
solr_parse() is a general purpose parser function with extension methods for parsing outputs from functions in solrium. solr_parse() is used internally within functions to do parsing after retrieving data from the server. You can optionally get back raw json, xml, or csv by setting raw = TRUE, and then parse afterwards with solr_parse().
For example:
(out <- conn$highlight(params = list(q = 'alcohol', hl.fl = 'abstract', rows = 2), raw = TRUE))
Then parse
solr_parse(out, 'df')
Please report any issues or bugs.