SolrList-class: SolrList

SolrList-class R Documentation

SolrList

Description

The SolrList object makes Solr data accessible through a list-like interface. This interface is appropriate when the data are highly ragged.

Details

A SolrList behaves largely like an ordinary list. It provides the same basic accessors (length, names, [, [<-, [[, [[<-, $, $<-, head, tail, etc.) and can be coerced to a list via as.list. Supported types of data manipulation include subset, transform, sort, xtabs, aggregate, unique, summary, etc.

An obvious difference between a SolrList and an ordinary list is that we know the SolrList contains only documents, which are themselves represented as named lists of fields, usually vectors of length one. This constraint enables us to provide the convenience of accessing fields by slicing across every document. We can pass a field selection to the second argument of [. As with a data frame, selecting a single column with e.g. x[,"foo"] will return the field as a vector, filling in NAs wherever a document lacks a value for the field.
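
The field-slicing behaviour described above can be pictured with plain R lists. The following is a minimal sketch in base R, not rsolr's implementation: docs and the helper field_slice are hypothetical names introduced only for illustration, and no Solr server is involved.

```r
# Three "documents" as named lists; document "b" has no price field.
docs <- list(
  a = list(name = "pen",    price = 1.5),
  b = list(name = "ink"),
  c = list(name = "laptop", price = 900)
)

# Hypothetical helper: pull one length-one numeric field out of every
# document, using NA wherever the field is absent.
field_slice <- function(docs, field) {
  vapply(docs, function(doc) {
    value <- doc[[field]]
    if (is.null(value)) NA_real_ else as.numeric(value)
  }, numeric(1))
}

prices <- field_slice(docs, "price")
```

Here prices is a named numeric vector with an NA for document "b". On a SolrList the analogous request is written x[,"price"], as described above.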

The names are taken from the field declared in the schema to represent the unique document key. Schemas are not strictly required to declare such a field, so if there is no unique key, the names are NULL.

Field restrictions passed to e.g. [ or subset(fields=) may be specified by name or by wildcard pattern (glob). A row index passed to [ must be either a character vector of identifiers or a SolrPromise/SolrExpression. A character vector of identifiers must have length <= 1024, must not contain NAs, and requires a unique key in the schema. If a SolrPromise or SolrExpression evaluates to NA for some rows, those rows are excluded from the result, as with subset. Using a SolrPromise or SolrExpression is recommended, as filtering happens at the database.
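
Two of the conventions above, glob-style field selection and the exclusion of rows whose filter evaluates to NA, have close analogues in base R. The sketch below only illustrates those semantics on local data; the field names and the data frame are made up, and rsolr's own pattern matching and filtering are performed against Solr and may differ in detail.

```r
# Glob-style selection, mimicked locally with utils::glob2rx().
field_names <- c("id", "name", "price", "price_c", "inStock")
pattern <- utils::glob2rx("price*")   # translate the glob to a regular expression
selected <- grep(pattern, field_names, value = TRUE)

# NA handling: subset() drops rows whose condition evaluates to NA.
df <- data.frame(id = c("a", "b", "c"),
                 price = c(50, NA, 900),
                 stringsAsFactors = FALSE)
kept <- subset(df, price > 100)
```

In this sketch selected holds the two fields whose names start with "price", and kept contains only row "c": row "a" fails the condition and row "b" is dropped because the comparison yields NA.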

A SolrList can be made lazy by calling defer on it, so that all column retrieval, e.g., via [, returns a SolrPromise object. Many operations on promises are deferred until the promise is finally fulfilled, either by being shown or through explicit coercion to an R vector.

A note for developers: SolrFrame and SolrList share common functionality through the base Solr class. Much of the functionality mentioned here is actually implemented as methods on the Solr class.

Accessors

These are some accessors that SolrList adds on top of the basic list accessors. Most of these are for advanced use only.

  • ndoc(x): Gets the number of documents (rows); serves as an abstraction over SolrFrame and SolrList

  • nfield(x): Gets the number of fields (columns); serves as an abstraction over SolrFrame and SolrList

  • ids(x): Gets the document unique identifiers (may be NULL, treated as rownames); serves as an abstraction over SolrFrame and SolrList

  • fieldNames(x, ...): Gets the name of each field represented by any document in the Solr core, with ... being passed down to fieldNames on SolrCore.

  • core(x): Gets the SolrCore wrapped by x

  • query(x): Gets the query that is being constructed by x

Extended API

Most of the typical data frame accessors and data manipulation functions will work analogously on SolrList (see Details). Below, we list some of the non-standard methods that might be seen as an extension of the data frame API.

  • rename(x, ...): Renames the columns of x, where the names and character values of ... indicate the mapping (newname = oldname).

  • defer(x): Returns a SolrList that yields SolrPromise objects instead of vectors whenever a field is retrieved

  • searchDocs(x, q): Performs a conventional document search using the query string q. The main difference from filtering is that (by default) Solr will order the result by score, i.e., how well each document matches the query.
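
The newname = oldname convention used by rename can be sketched on a plain named list. This is an illustration in base R under stated assumptions, not the rsolr method: rename_fields is a hypothetical helper, and doc is made-up data.

```r
# Hypothetical helper mirroring the (newname = oldname) convention.
rename_fields <- function(x, ...) {
  mapping <- c(...)                  # names are new names, values are old names
  idx <- match(mapping, names(x))    # positions of the old names in x
  stopifnot(!anyNA(idx))             # every old name must exist
  names(x)[idx] <- names(mapping)
  x
}

doc <- list(id = "a1", price = 10)
renamed <- rename_fields(doc, cost = "price")
```

After the call, renamed has the fields id and cost, with the value of price carried over unchanged.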

Constructor

  • SolrList(uri, ...): Constructs a new SolrList instance, representing a Solr core located at uri, which should be a string or a RestUri object. The ... are passed to the SolrQuery constructor.

Evaluation

  • eval(expr, envir, enclos): Evaluates R language expr in the SolrList envir, using enclos as the enclosing environment.

Coercion

  • as.data.frame(x, row.names=NULL, optional=FALSE, fill=FALSE): Downloads the data into an actual data.frame, specifically an instance of DocDataFrame. If fill is FALSE, only the fields represented in at least one document are added as columns.

  • as.list(x), as(x, "DocCollection"): Coerces x into the corresponding list, specifically an instance of DocList.

Author(s)

Michael Lawrence

See Also

SolrFrame for representing a Solr collection as a table instead of a list

Examples


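     solr <- TestSolr()          # start a test Solr instance
     sr <- SolrList(solr$uri)    # represent its core as a SolrList
     length(sr)                  # number of documents
     head(sr)                    # first few documents
     sr[["GB18030TEST"]]         # a single document, selected by name
     # Solr tends to crash for some reason running this inside R CMD check
     ## Not run:
     as.list(subset(sr, price > 100))[,"price"]
     ## End(Not run)
     solr$kill()                 # shut down the test instance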


lawremi/rsolr documentation built on May 28, 2022, 6:17 a.m.