solr_search: Solr search.


View source: R/solr_search.r

Description

Solr search.

Usage

solr_search(q = "*:*", sort = NULL, start = 0, rows = NULL,
  pageDoc = NULL, pageScore = NULL, fq = NULL, fl = NULL,
  defType = NULL, timeAllowed = NULL, qt = NULL, wt = "json",
  NOW = NULL, TZ = NULL, echoHandler = NULL, echoParams = NULL,
  key = NULL, base = NULL, callopts = list(), raw = FALSE,
  parsetype = "df", concat = ",", ..., verbose = TRUE)

Arguments

q

Query terms. Defaults to '*:*', which matches everything.

sort

Field to sort on. You can specify descending (e.g., score desc) or ascending (e.g., score asc) order, sort by two fields (e.g., score desc, price asc), or sort by a function (e.g., sum(x_f, y_f) desc, which sorts by the sum of x_f and y_f in descending order).
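A sort value is a comma-separated list of "field direction" pairs. The following is a minimal sketch of assembling such a string in R; build_sort is a hypothetical helper written for illustration and is not part of the solr package.

```r
# Hypothetical helper (not part of the solr package): build a Solr sort
# string from parallel vectors of fields (or functions) and directions.
build_sort <- function(fields, directions) {
  stopifnot(length(fields) == length(directions))
  stopifnot(all(directions %in% c("asc", "desc")))
  # paste() pairs each field with its direction, then joins the pairs
  paste(fields, directions, collapse = ", ")
}

sort_one <- build_sort("score", "desc")
sort_two <- build_sort(c("score", "price"), c("desc", "asc"))
sort_fun <- build_sort("sum(x_f, y_f)", "desc")
print(sort_two)
```

The resulting string could then be supplied as the sort argument, for example sort = sort_two.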

start

Record to start at. Defaults to 0, the beginning.

rows

Number of records to return. Defaults to 10.

pageDoc

If you expect to be paging deeply into the results (say beyond page 10, assuming rows=10) and you are sorting by score, you may wish to add the pageDoc and pageScore parameters to your request. These two parameters tell Solr (and Lucene) what the last result (Lucene internal docid and score) of the previous page was, so that when scoring the query for the next set of pages, it can ignore any results that occur higher than that item. To get the Lucene internal doc id, you will need to add [docid] to the &fl list. e.g., q=*:*&start=10&pageDoc=5&pageScore=1.345&fl=[docid],score

pageScore

See pageDoc notes.
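To make the deep-paging request above concrete, here is a small sketch that assembles the same parameters into a query string. build_query is a hypothetical helper, not a function of the solr package, and it does no URL encoding; the docid and score values are the illustrative ones from the pageDoc description.

```r
# Hypothetical helper (not part of the solr package): join named parameters
# into an unencoded query string, for illustration only.
build_query <- function(params) {
  values <- vapply(params, as.character, character(1))
  paste(names(values), values, sep = "=", collapse = "&")
}

# Last hit of the previous page: Lucene internal docid 5 with score 1.345.
# "[docid]" in fl asks Solr to return the internal docid with each hit.
deep_page <- build_query(list(
  q = "*:*", start = 10, pageDoc = 5, pageScore = 1.345, fl = "[docid],score"
))
print(deep_page)
```

In a real call, the pageDoc and pageScore values would come from the last record of the previously fetched page.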

fq

Filter query. This restricts which documents are returned without affecting how the search itself is scored.

fl

Fields to return

defType

Specify the query parser to use with this request.

timeAllowed

The time allowed for a search to finish. This value only applies to the search and not to requests in general. Time is in milliseconds. Values <= 0 mean no time restriction. Partial results may be returned (if there are any).

qt

Which query handler to use.

wt

Data type returned, defaults to 'json'

NOW

Set a fixed time for evaluating date-based expressions.

TZ

Time zone, you can override the default.

echoHandler

If the echoHandler parameter is true, Solr places the name of the handler used in the response to the client for debugging purposes.

echoParams

The echoParams parameter tells Solr what kinds of Request parameters should be included in the response for debugging purposes, legal values include:

  • none - don't include any request parameters for debugging

  • explicit - include the parameters explicitly specified by the client in the request

  • all - include all parameters involved in this request, either specified explicitly by the client, or implicit because of the request handler configuration.

key

API key, if needed.

base

URL endpoint.

callopts

Call options passed on to httr::GET

raw

(logical) If TRUE, returns raw data in format specified by wt param

parsetype

(character) One of 'list' or 'df'

concat

(character) Character used to concatenate the values of elements longer than length 1. Note that this only works reliably when the data format is JSON (wt='json'). Parsing is more complicated in XML format, but you can do that on your own.
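As an illustration of what concatenation means here, the sketch below collapses a multi-valued field into a single string so that a record fits in one data.frame row. collapse_doc and the sample record are hypothetical and only approximate the idea; they are not the package's internal parsing code.

```r
# Hypothetical helper (not part of the solr package): collapse every field
# of one record to a single string, joining multiple values with `concat`.
collapse_doc <- function(doc, concat = ",") {
  vapply(doc, function(x) paste(x, collapse = concat), character(1))
}

# Made-up record with one single-valued and one multi-valued field
doc <- list(id = "doc-1", author = c("Smith", "Jones", "Lee"))

row_comma <- collapse_doc(doc)               # default separator ","
row_semi  <- collapse_doc(doc, concat = ";") # custom separator
print(row_semi)
```

A separator that cannot occur in the field values, such as ";" for names that contain commas, makes the collapsed string easier to split again later.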

...

Further args.

verbose

If TRUE (default), the URL used for the call is printed to the console.

Value

XML or JSON (when raw=TRUE, in the format given by wt), or a list or data.frame, depending on parsetype.

References

See http://wiki.apache.org/solr/#Search_and_Indexing for more information.

See Also

solr_highlight, solr_facet

Examples

## Not run: 
url <- 'http://api.plos.org/search'

solr_search(q='*:*', rows=2, fl='id', base=url)

# Search for word ecology in title and cell in the body
solr_search(q='title:"ecology" AND body:"cell"', fl='title', rows=5, base=url)

# Search for word "cell" and not "lines" in the title field
solr_search(q='title:"cell" -title:"lines"', fl='title', rows=5, base=url)

# Wildcards
## Search for word that starts with "cell" in the title field
solr_search(q='title:"cell*"', fl='title', rows=5, base=url)

# Proximity searching
## Search for words "sports" and "alcohol" within seven words of each other
solr_search(q='everything:"sports alcohol"~7', fl='abstract', rows=3, base=url)

# Range searches
## Search for articles with Twitter count between 5 and 10
solr_search(q='*:*', fl=c('alm_twitterCount','title'), fq='alm_twitterCount:[5 TO 10]',
rows=3, base=url)

# Boosts
## Assign higher boost to title matches than to abstract matches (compare the two calls)
solr_search(q='title:"cell" abstract:"science"', fl='title', rows=3,
   base=url)
solr_search(q='title:"cell"^1.5 AND abstract:"science"', fl='title', rows=3,
   base=url)

# Parse data, using the USGS BISON API
url <- "http://bisonapi.usgs.ornl.gov/solr/occurrences/select"
out <- solr_search(q='*:*', fl=c('scientificName','decimalLatitude','decimalLongitude'),
   base=url, raw=TRUE)
solr_parse(out, 'df')
## gives the same result
solr_search(q='*:*', fl=c('scientificName','decimalLatitude','decimalLongitude'), base=url)

## You can choose how to combine elements longer than length 1
solr_search(q='*:*', fl=c('scientificName','decimalLatitude','decimalLongitude'), base=url,
   parsetype='df', concat=';')

# Using the USGS BISON API (http://bison.usgs.ornl.gov/services.html#solr)
## the species names endpoint
url2 <- "http://bisonapi.usgs.ornl.gov/solr/scientificName/select"
solr_search(q='*:*', base=url2, parsetype='list')

# FunctionQuery queries
## This kind of query allows you to use the actual values of fields to calculate
## relevancy scores for returned documents

## Here, we search on the product of counter_total_all and alm_twitterCount
## metrics for articles in PLOS Journals
url <- 'http://api.plos.org/search'
solr_search(q="{!func}product($v1,$v2)", v1 = 'sqrt(counter_total_all)',
   v2 = 'log(alm_twitterCount)', rows=5, fl=c('id','title'), fq='doc_type:full',
   base=url)

## here, search on the product of counter_total_all and alm_twitterCount, using
## a new temporary field "_val_"
solr_search(q='_val_:"product(counter_total_all,alm_twitterCount)"',
   rows=5, fl=c('id','title'), fq='doc_type:full', base=url)

## papers with the highest counter_total_all values
solr_search(q='_val_:"max(counter_total_all)"',
   rows=5, fl=c('id','counter_total_all'), fq='doc_type:full', base=url)

## papers with most tweets
solr_search(q='_val_:"max(alm_twitterCount)"',
   rows=5, fl=c('id','alm_twitterCount'), fq='doc_type:full', base=url)

## End(Not run)

solr documentation built on May 29, 2017, 10:50 p.m.