solr_group: Grouped search

View source: R/solr_group.r

solr_groupR Documentation

Grouped search

Description

Returns only group items

Usage

solr_group(
  conn,
  name = NULL,
  params = NULL,
  body = NULL,
  callopts = list(),
  raw = FALSE,
  parsetype = "df",
  concat = ",",
  progress = NULL,
  ...
)

Arguments

conn

A solrium connection object, see SolrClient

name

Name of a collection or core. Or leave as NULL if not needed.

params

(list) a named list of parameters, results in a GET request as long as no body parameters given

body

(list) a named list of parameters, if given a POST request will be performed

callopts

Call options passed on to crul::HttpClient

raw

(logical) If TRUE, returns raw data in format specified by wt param

parsetype

(character) One of 'list' or 'df'

concat

(character) Character to concatenate elements of longer than length 1. Note that this only works reliably when data format is json (wt='json'). The parsing is more complicated in XML format, but you can do that on your own.

progress

a function with logic for printing a progress bar for an HTTP request, ultimately passed down to curl. only supports httr::progress for now. See the README for an example.

...

Further args to be combined into query

Value

XML, JSON, a list, or data.frame

Group parameters

  • q Query terms, defaults to ':', or everything.

  • fq Filter query, this does not affect the search, only what gets returned

  • fl Fields to return

  • wt (character) Data type returned, defaults to 'json'. One of json or xml. If json, uses fromJSON to parse. If xml, uses xmlParse to parse. csv is only supported in solr_search and solr_all.

  • key API key, if needed.

  • group.field (fieldname) Group based on the unique values of a field. The field must currently be single-valued and must be either indexed, or be another field type that has a value source and works in a function query - such as ExternalFileField. Note: for Solr 3.x versions the field must by a string like field such as StrField or TextField, otherwise a http status 400 is returned.

  • group.func (function query) Group based on the unique values of a function query. Solr4.0 This parameter only is supported on 4.0

  • group.query (query) Return a single group of documents that also match the given query.

  • rows (number) The number of groups to return. Defaults to 10.

  • start (number) The offset into the list of groups.

  • group.limit (number) The number of results (documents) to return for each group. Defaults to 1.

  • group.offset (number) The offset into the document list of each group.

  • sort How to sort the groups relative to each other. For example, sort=popularity desc will cause the groups to be sorted according to the highest popularity doc in each group. Defaults to "score desc".

  • group.sort How to sort documents within a single group. Defaults to the same value as the sort parameter.

  • group.format One of grouped or simple. If simple, the grouped documents are presented in a single flat list. The start and rows parameters refer to numbers of documents instead of numbers of groups.

  • group.main (logical) If true, the result of the last field grouping command is used as the main result list in the response, using group.format=simple

  • group.ngroups (logical) If true, includes the number of groups that have matched the query. Default is false. Solr4.1 WARNING: If this parameter is set to true on a sharded environment, all the documents that belong to the same group have to be located in the same shard, otherwise the count will be incorrect. If you are using SolrCloud, consider using "custom hashing"

  • group.cache.percent (0-100) If > 0 enables grouping cache. Grouping is executed actual two searches. This option caches the second search. A value of 0 disables grouping caching. Default is 0. Tests have shown that this cache only improves search time with boolean queries, wildcard queries and fuzzy queries. For simple queries like a term query or a match all query this cache has a negative impact on performance

References

See https://lucene.apache.org/solr/guide/8_2/collapse-and-expand-results.html for more information.

See Also

solr_highlight(), solr_facet()

Examples

## Not run: 
# connect
(cli <- SolrClient$new())

# by default we do a GET request
cli$group("gettingstarted",
  params = list(q='*:*', group.field='compName_s'))
# OR
solr_group(cli, "gettingstarted",
  params = list(q='*:*', group.field='compName_s'))

# connect
(cli <- SolrClient$new(host = "api.plos.org", path = "search", port = NULL))

# Basic group query
solr_group(cli, params = list(q='ecology', group.field='journal',
  group.limit=3, fl=c('id','score')))
solr_group(cli, params = list(q='ecology', group.field='journal',
  group.limit=3, fl='article_type'))

# Different ways to sort (notice diff btw sort of group.sort)
# note that you can only sort on a field if you return that field
solr_group(cli, params = list(q='ecology', group.field='journal', group.limit=3,
   fl=c('id','score')))
solr_group(cli, params = list(q='ecology', group.field='journal', group.limit=3,
   fl=c('id','score','alm_twitterCount'), group.sort='alm_twitterCount desc'))
solr_group(cli, params = list(q='ecology', group.field='journal', group.limit=3,
   fl=c('id','score','alm_twitterCount'), sort='score asc',
   group.sort='alm_twitterCount desc'))

# Two group.field values
out <- solr_group(cli, params = list(q='ecology', group.field=c('journal','article_type'),
  group.limit=3, fl='id'), raw=TRUE)
solr_parse(out)
solr_parse(out, 'df')

# Get two groups, one with alm_twitterCount of 0-10, and another group
# with 10 to infinity
solr_group(cli, params = list(q='ecology', group.limit=3, fl=c('id','alm_twitterCount'),
 group.query=c('alm_twitterCount:[0 TO 10]','alm_twitterCount:[10 TO *]')))

# Use of group.format and group.simple.
## The raw data structure of these two calls are slightly different, but
## the parsing inside the function outputs the same results. You can
## of course set raw=TRUE to get back what the data actually look like
solr_group(cli, params = list(q='ecology', group.field='journal', group.limit=3,
  fl=c('id','score'), group.format='simple'))
solr_group(cli, params = list(q='ecology', group.field='journal', group.limit=3,
  fl=c('id','score'), group.format='grouped'))
solr_group(cli, params = list(q='ecology', group.field='journal', group.limit=3,
  fl=c('id','score'), group.format='grouped', group.main='true'))

# xml back
solr_group(cli, params = list(q='ecology', group.field='journal', group.limit=3,
  fl=c('id','score'), wt = "xml"))
solr_group(cli, params = list(q='ecology', group.field='journal', group.limit=3,
  fl=c('id','score'), wt = "xml"), parsetype = "list")
res <- solr_group(cli, params = list(q='ecology', group.field='journal', group.limit=3,
  fl=c('id','score'), wt = "xml"), raw = TRUE)
library("xml2")
xml2::read_xml(unclass(res))

solr_group(cli, params = list(q='ecology', group.field='journal', group.limit=3,
  fl='article_type', wt = "xml"))
solr_group(cli, params = list(q='ecology', group.field='journal', group.limit=3,
  fl='article_type', wt = "xml"), parsetype = "list")

## End(Not run)

ropensci/solrium documentation built on Sept. 12, 2022, 3:01 p.m.