solr_mlt: "more like this" search

View source: R/solr_mlt.r

solr_mltR Documentation

"more like this" search

Description

Returns only more like this items

Usage

solr_mlt(
  conn,
  name = NULL,
  params = NULL,
  body = NULL,
  callopts = list(),
  raw = FALSE,
  parsetype = "df",
  concat = ",",
  optimizeMaxRows = TRUE,
  minOptimizedRows = 50000L,
  progress = NULL,
  ...
)

Arguments

conn

A solrium connection object, see SolrClient

name

Name of a collection or core. Or leave as NULL if not needed.

params

(list) a named list of parameters, results in a GET request as long as no body parameters given

body

(list) a named list of parameters, if given a POST request will be performed

callopts

Call options passed on to crul::HttpClient

raw

(logical) If TRUE, returns raw data in format specified by wt param

parsetype

(character) One of 'list' or 'df'

concat

(character) Character to concatenate elements of longer than length 1. Note that this only works reliably when data format is json (wt='json'). The parsing is more complicated in XML format, but you can do that on your own.

optimizeMaxRows

(logical) If TRUE, then rows parameter will be adjusted to the number of returned results by the same constraints. It will only be applied if rows parameter is higher than minOptimizedRows. Default: TRUE

minOptimizedRows

(numeric) used by optimizedMaxRows parameter, the minimum optimized rows. Default: 50000

progress

a function with logic for printing a progress bar for an HTTP request, ultimately passed down to curl. only supports httr::progress for now. See the README for an example.

...

Further args to be combined into query

Value

XML, JSON, a list, or data.frame

More like this parameters

  • q Query terms, defaults to ':', or everything.

  • fq Filter query, this does not affect the search, only what gets returned

  • mlt.count The number of similar documents to return for each result. Default is 5.

  • mlt.fl The fields to use for similarity. NOTE: if possible these should have a stored TermVector DEFAULT_FIELD_NAMES = new String[] "contents"

  • mlt.mintf Minimum Term Frequency - the frequency below which terms will be ignored in the source doc. DEFAULT_MIN_TERM_FREQ = 2

  • mlt.mindf Minimum Document Frequency - the frequency at which words will be ignored which do not occur in at least this many docs. DEFAULT_MIN_DOC_FREQ = 5

  • mlt.minwl minimum word length below which words will be ignored. DEFAULT_MIN_WORD_LENGTH = 0

  • mlt.maxwl maximum word length above which words will be ignored. DEFAULT_MAX_WORD_LENGTH = 0

  • mlt.maxqt maximum number of query terms that will be included in any generated query. DEFAULT_MAX_QUERY_TERMS = 25

  • mlt.maxntp maximum number of tokens to parse in each example doc field that is not stored with TermVector support. DEFAULT_MAX_NUM_TOKENS_PARSED = 5000

  • mlt.boost (true/false) set if the query will be boosted by the interesting term relevance. DEFAULT_BOOST = false

  • mlt.qf Query fields and their boosts using the same format as that used in DisMaxQParserPlugin. These fields must also be specified in mlt.fl.

  • fl Fields to return. We force 'id' to be returned so that there is a unique identifier with each record.

  • wt (character) Data type returned, defaults to 'json'. One of json or xml. If json, uses fromJSON to parse. If xml, uses xmlParse to parse. csv is only supported in solr_search and solr_all.

  • start Record to start at, default to beginning.

  • rows Number of records to return. Defaults to 10.

  • key API key, if needed.

References

See https://lucene.apache.org/solr/guide/8_2/morelikethis.html for more information.

Examples

## Not run: 
# connect
(conn <- SolrClient$new(host = "api.plos.org", path = "search", port = NULL))

# more like this search
conn$mlt(params = list(q='*:*', mlt.count=2, mlt.fl='abstract', fl='score',
  fq="doc_type:full"))
conn$mlt(params = list(q='*:*', rows=2, mlt.fl='title', mlt.mindf=1,
  mlt.mintf=1, fl='alm_twitterCount'))
conn$mlt(params = list(q='title:"ecology" AND body:"cell"', mlt.fl='title',
  mlt.mindf=1, mlt.mintf=1, fl='counter_total_all', rows=5))
conn$mlt(params = list(q='ecology', mlt.fl='abstract', fl='title', rows=5))
solr_mlt(conn, params = list(q='ecology', mlt.fl='abstract',
  fl=c('score','eissn'), rows=5))
solr_mlt(conn, params = list(q='ecology', mlt.fl='abstract',
  fl=c('score','eissn'), rows=5, wt = "xml"))

# get raw data, and parse later if needed
out <- solr_mlt(conn, params=list(q='ecology', mlt.fl='abstract', fl='title',
 rows=2), raw=TRUE)
solr_parse(out, "df")

## End(Not run)

ropensci/solrium documentation built on Sept. 12, 2022, 3:01 p.m.