vfb_synonym_query: Find canonical ontology term using VFB SOLR to query synonyms

View source: R/vfb_extended_query.R

Description

Find canonical ontology term using VFB SOLR to query synonyms

Usage

vfb_synonym_query(
  query,
  exact = TRUE,
  quote = NA,
  searchfields = c("synonym", "label"),
  fields = "short_form label synonym",
  verbose = interactive(),
  ...
)

Arguments

query

A character vector containing one or more queries - see details. Wildcards (?*) can be used, and have a similar meaning to their use in the shell. Queries are case insensitive.

exact

Whether to do an exact match (i.e. the query must match the whole string, the default) or to allow partial matches against synonyms. See Query details and examples.

quote

Whether to quote the query so that individual terms must all be matched in the order given or to OR each term in the query. The default quote=NA will quote when exact=TRUE and the query does not contain wildcards. See Query details and Examples.

searchfields

Character vector specifying fields to search. The default searches both synonyms and the canonical term label (since the canonical term is not included in the synonym list).

fields

The fields to return (defaults to 'short_form label synonym'). Setting fields="" implies all fields (but note that not all results may have the same fields; see the discussion in Details).

verbose

Whether to print messages to the screen.

...

Additional arguments passed to pbsapply when there are multiple input queries and eventually vfb_solr_query. You can use this e.g. to set the number of returned rows.

Details

Note that the data.frame that is returned may contain a list in the synonym column because there will likely be multiple synonyms for a given query.

When query has length > 1, i.e. there are multiple query terms, the calls to vfb_solr_query are wrapped in an sapply-style loop (pbsapply, see the ... argument). You can pass sapply arguments in ..., such as simplify=FALSE, if you wish. By default the return value will be a matrix whose rows can be indexed by field name and whose columns can be indexed by query. However, this form is not guaranteed if you ask for fields that are present for only some of the results.
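The matrix layout described above can be illustrated without contacting the server. This is a minimal sketch in base R: mock_query is a hypothetical stand-in for a single-query call (it is not part of vfbr, and its return values are made up), and plain sapply is used in place of pbsapply on the assumption that both simplify results in the same way.

```r
# mock_query is a hypothetical stand-in for a single-query call; it returns
# one named entry per requested field, without contacting any server.
mock_query <- function(query) {
  list(
    short_form = paste0("ID_", query),
    label = tolower(query),
    synonym = c(query, paste(query, "neuron"))
  )
}

queries <- c("PAM", "SOG")
res <- sapply(queries, mock_query)

dim(res)               # fields by queries
res[["label", "PAM"]]  # index by field name, then by query

# simplify = FALSE keeps one list element per query instead of a matrix
res_list <- sapply(queries, mock_query, simplify = FALSE)
names(res_list)
```

Using simplify=FALSE may be the safer choice when different queries can return different sets of fields, since no matrix has to be formed.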

Value

A data.frame containing one or more result rows, ordered according to the solr result score, with attributes

Query details

The exact and quote arguments have a profound effect on how SOLR carries out searches. Setting both arguments to FALSE gives the least specific search and you may want to do this if you do not find any results with an initial query using the default values.

In general SOLR will break any text in fields like synonym into separate tokens based on whitespace, hyphens etc., so the value:

inter-antennal lobe tract

will map onto 4 tokens:

inter antennal lobe tract

This means that a search with query "inter antennal lobe tract" and exact=FALSE will succeed even though the query lacks the hyphen.
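The effect of tokenisation can be approximated in base R. This is only a rough sketch: the tokenise helper below is hypothetical and assumes that tokens are lower-cased and split on any run of non-alphanumeric characters, whereas the real behaviour is determined by the analyser configured in the SOLR schema.

```r
# Rough stand-in for SOLR tokenisation (an assumption, not the real analyser):
# lower-case the text, then split on any run of non-alphanumeric characters.
tokenise <- function(x) {
  tokens <- strsplit(tolower(x), "[^[:alnum:]]+")[[1]]
  tokens[nzchar(tokens)]
}

field_tokens <- tokenise("inter-antennal lobe tract")
query_tokens <- tokenise("inter antennal lobe tract")

length(field_tokens)                    # number of tokens in the field value
identical(field_tokens, query_tokens)   # hyphenated and spaced forms agree
```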

When exact=TRUE, the values will not be tokenised and the query must match the whole field.

The quote argument modifies the query. Without quoting, the query is tokenised as just described for fields in the database; tokens can be matched in any order and only one token must match (i.e. they are ORed together). When quote=TRUE, the tokens must match in the order given (this is achieved by wrapping the query in double quotes before passing it to SOLR). For example:

vfb_synonym_query("antennal lobe tract", exact=FALSE, quote=TRUE)

returns 2 results at the time of writing whereas

vfb_synonym_query("antennal lobe tract", exact=FALSE, quote=FALSE)

returns 194 results.
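The documented rule for the default quote=NA, together with the double-quote wrapping, can be sketched as follows. Both helpers are hypothetical and only mirror the text of this page, not the package source. In particular, treating quote=NA as FALSE when exact=FALSE is an assumption drawn from the wording of the quote argument.

```r
# Hypothetical helpers: NOT part of vfbr, they only mirror the documented rules.
resolve_quote <- function(query, exact = TRUE, quote = NA) {
  if (is.na(quote)) {
    has_wildcard <- grepl("[?*]", query)
    # Documented default: quote when exact=TRUE and there are no wildcards.
    # Assumption: every other case resolves to no quoting.
    quote <- exact && !has_wildcard
  }
  quote
}

# Wrap the query in double quotes when quoting applies.
prepare_query <- function(query, exact = TRUE, quote = NA) {
  if (resolve_quote(query, exact, quote)) paste0('"', query, '"') else query
}

prepare_query("antennal lobe tract")                 # quoted by default
prepare_query("PAM-*")                               # wildcard, left unquoted
prepare_query("PAM-*", exact = FALSE, quote = TRUE)  # explicit quote wins
```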

See https://github.com/EBISPOT/OLS/blob/master/ols-solr/src/main/solr-conf/ontology/conf/schema.xml for the definition of the synonym_s exact match field.

See Also

vfb_solr_query, vfb_autocomplete_query

Examples

vfb_synonym_query("SOG")
# anything with a synonym that includes SOG
vfb_synonym_query("SOG", exact=FALSE)

# query for PAM exactly
vfb_synonym_query("PAM")
# any term with a synonym containing PAM
vfb_synonym_query("PAM", exact = FALSE)

# terms with synonyms that start with "PAM-"
vfb_synonym_query("PAM-*")
# nb this doesn't return any results when exact=FALSE because solr matches
# against tokenised strings (i.e. strings that have broken on spaces and
# other non word characters such as dashes and underscores).
vfb_synonym_query("PAM-*", exact = FALSE)
# However if you quote the query you will get results
vfb_synonym_query("PAM-*", exact = FALSE, quote=TRUE)



# Search for MBON-01 to MBON-22 (sprintf is used to 0 pad the numbers)
vfb_synonym_query(sprintf("MBON-%02d",1:22))

# You can also use a wild card search, which is much faster since it only
# makes a single solr query but the hits are returned in an arbitrary order.
mbondf <- vfb_synonym_query("MBON-??")
# then you can pick out your preferred synonym
# note that we use the glob2rx function to convert solr's simple shell-style
# wild card syntax to a regular expression
mbondf$aso <- sapply(mbondf$synonym, function(x) grep(glob2rx("MBON-??"), x, value = TRUE))
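
The synonym-picking step above can be tried offline. This is a minimal sketch that assumes the synonym column is a list of character vectors; the synonym values are made up for illustration and do not come from VFB. It uses lapply rather than sapply so that the result stays a list even when a row has no matching synonym or more than one.

```r
# Made-up synonym lists standing in for the synonym column of a real result.
synonyms <- list(
  c("MBON-01", "MBON-gamma5beta'2a"),
  c("mushroom body output neuron 2", "MBON-02"),
  c("no short name here")
)

# Convert the shell-style glob to an anchored regular expression.
rx <- utils::glob2rx("MBON-??")

# lapply always returns a list, so rows with zero or several matches are safe.
short_names <- lapply(synonyms, function(x) grep(rx, x, value = TRUE))
lengths(short_names)
```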

jefferis/vfbr documentation built on Feb. 17, 2021, 4:46 p.m.