Index the content of a data.frame with the Xapian search engine.
xindex(formula, data, path, language = c("none", "english", "en", "danish",
  "da", "dutch", "nl", "english_lovins", "lovins", "english_porter", "porter",
  "finnish", "fi", "french", "fr", "german", "de", "german2", "hungarian", "hu",
  "italian", "it", "kraaij_pohlmann", "norwegian", "nb", "nn", "no",
  "portuguese", "pt", "romanian", "ro", "russian", "ru", "spanish", "es",
  "swedish", "sv", "turkish", "tr"))
formula |
A formula with a symbolic description of the index plan for the columns in the data.frame. The details of the index plan specification are given under 'Details'. |
data |
The data.frame to index. |
path |
A character vector specifying the path to a Xapian database. If there is already a database in the specified directory, it will be opened. If there is no existing database in the specified directory, Xapian will try to create a new empty database there. |
language |
Either the English name for the language or the two-letter ISO 639 code. Default is 'none'. |
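As a minimal sketch of the two ways to give 'language', the calls below pass the English name and the two-letter code for the same language. The data.frame 'docs' and its 'text' column are hypothetical, and the calls only run if the 'xapr' package is installed.

```r
## Hypothetical toy data; the column name 'text' is illustrative only.
docs <- data.frame(text = c("running shoes", "a runner runs"),
                   stringsAsFactors = FALSE)

## Helper (not part of xapr): create a fresh directory for a database.
new_db_dir <- function() {
    path <- tempfile(pattern = "xapr-")
    dir.create(path)
    path
}

if (requireNamespace("xapr", quietly = TRUE)) {
    ## English name for the language
    db_name <- xapr::xindex(~ text, docs, new_db_dir(), language = "english")
    ## Two-letter code for the same language
    db_code <- xapr::xindex(~ text, docs, new_db_dir(), language = "en")
}
```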
The index plan for 'xindex' is specified symbolically. An index plan has the form 'data ~ terms', where 'data' is the blob of data returned from a request and the 'terms' are the basis for a search in Xapian. A first-order term indexes the text in the column as free text. A specification of the form 'first:second' indicates that the text in 'second' should be indexed with prefix 'first'.
The prefix is a short string at the beginning of the term to indicate which field the term indexes. Valid prefixes are: 'A', 'D', 'E', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'X', 'Y' and 'Z'. See http://xapian.org/docs/omega/termprefixes for a list of conventional prefixes.
The specification 'first*second' is the same as 'second + first:second'. The prefix 'X' will create a user defined prefix by appending the uppercase 'second' to 'X'. The prefix 'Q' will use data in the 'second' column as a unique identifier for the document. NA values in columns to be indexed are skipped.
If no response is given, e.g. '~ second + first:second', the row number is written as data to the document.
The specification '~X*.' creates prefix terms with all columns plus free text.
If the response contains one or more columns, e.g. 'col_1 + col_2 ~ X*.', the response is first converted to 'JSON'. A compact form to convert all fields to 'JSON' and to enable free text search on all fields is '.~.'. It is also possible to drop response fields, e.g. '. - col_1 - col_2 ~ X*.' includes all fields in the response except 'col_1' and 'col_2'.
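The index plan forms described above can be sketched side by side on a small data.frame. The data.frame 'books' and its columns are hypothetical, as is the helper 'new_db_dir'. Each plan is written to its own directory, and the calls only run if the 'xapr' package is installed.

```r
## Hypothetical toy data; the column names are illustrative only.
books <- data.frame(
    id     = c("b1", "b2", "b3"),
    title  = c("Pocket watch", "Sundial", "Leather case"),
    author = c("Dent", "Unknown", "Unknown"),
    stringsAsFactors = FALSE
)

## Helper (not part of xapr): create a fresh directory for each database,
## since xindex opens an existing database if one is found at 'path'.
new_db_dir <- function() {
    path <- tempfile(pattern = "xapr-")
    dir.create(path)
    path
}

if (requireNamespace("xapr", quietly = TRUE)) {
    library(xapr)

    ## No response: the row number is written as the document data.
    ## 'title' is indexed as free text and with prefix 'S'.
    db_rownum <- xindex(~ title + S:title, books, new_db_dir())

    ## Prefix terms for all columns plus free text.
    db_prefix <- xindex(~ X*., books, new_db_dir())

    ## Store all fields as JSON and index all fields as free text.
    db_json <- xindex(. ~ ., books, new_db_dir())

    ## Store all fields except 'author' as JSON.
    db_drop <- xindex(. - author ~ X*., books, new_db_dir())

    ## Use 'id' as the unique identifier, following the form used in
    ## the package example.
    db_id <- xindex(. ~ S*title + Q:id, books, new_db_dir())
}
```

A separate directory per plan keeps the databases independent, so the effect of each formula can be inspected on its own with 'summary'.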
A xapian_database object.
## Not run:
## This example is borrowed from "Getting Started with Xapian"
## http://getting-started-with-xapian.readthedocs.org/en/latest/index.html
## where the example is implemented in Python.
##
## We are going to build a simple search system based on museum catalogue
## data released under the Creative Commons Attribution-NonCommercial-
## ShareAlike license (http://creativecommons.org/licenses/by-nc-sa/3.0/)
## by the Science Museum in London, UK.
## (http://api.sciencemuseum.org.uk/documentation/collections/)
## The first 100 rows of the museum catalogue data are distributed with
## the 'xapr' package
filename <- system.file("extdata/NMSI_100.csv", package="xapr")
nmsi <- read.csv(filename, as.is = TRUE, na.strings="")
## Create a temporary directory to hold the database
path <- tempfile(pattern="xapr-")
dir.create(path)
## Index the 'TITLE' and 'DESCRIPTION' fields with both a suitable
## prefix and without a prefix for general search. Use the 'id_NUMBER'
## as unique identifier. Store all the fields as JSON for display
## purposes.
db <- xindex(. ~ S*TITLE + X*DESCRIPTION + Q:id_NUMBER, nmsi, path)
## Display a summary of the Xapian database
summary(db)
## Run a search and display docid (rowname) and TITLE from each match
xsearch(db, "watch", TITLE ~ .)
## Run a search with multiple words
xsearch(db, "Dent watch", TITLE ~ .)
## Run a search with prefix
xsearch(db, "title:sunwatch", TITLE ~ title:S)
## Run a search with multiple prefixes
xsearch(db,
"description:\"leather case\" AND title:sundial",
TITLE ~ title:S + description:XDESCRIPTION)
## End(Not run)