Description
Scroll search function
Usage

scroll(
  conn,
  x,
  time_scroll = "1m",
  raw = FALSE,
  asdf = FALSE,
  stream_opts = list(),
  ...
)

scroll_clear(conn, x = NULL, all = FALSE, ...)
Arguments

conn: an Elasticsearch connection object, see connect()

x: (character) For scroll(), a single scroll id; for scroll_clear(), one or more scroll ids

time_scroll: (character) Specify how long a consistent view of the index should be maintained for scrolled search, e.g., "30s", "1m". See units-time.

raw: (logical) If TRUE, raw JSON is returned; if FALSE (default), the response is parsed to a list

asdf: (logical) If TRUE, results are parsed with jsonlite::fromJSON() to a data.frame; if FALSE (default), a list is returned

stream_opts: (list) A list of options passed to jsonlite::stream_out() for streaming results to a file (see the Examples)

...: Curl args passed on to crul::verb-POST

all: (logical) If TRUE, clear all scroll search contexts rather than specific scroll ids (scroll_clear() only)
Value

scroll() returns a list, identical to what Search() returns, with an attribute "scroll" that is the scroll value set via the time_scroll parameter. scroll_clear() returns a boolean (TRUE on success).
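For example, a minimal sketch (assuming a live cluster and the con <- connect() connection used in the Examples below):

res <- Search(con, time_scroll = "1m")
page <- scroll(con, res$`_scroll_id`)
attr(page, "scroll")                  # "1m", the time_scroll value
scroll_clear(con, res$`_scroll_id`)   # TRUE on success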
Scores

Scores will be the same for all documents that are returned from a scroll request. Dems da rules.
Inputs

Inputs to scroll() can be one of:

list - This usually will be the output of Search(), but you could in theory make a list yourself with the appropriate elements

character - A scroll id - this is typically the scroll id output from a call to Search(), accessed like res$`_scroll_id`

All other classes passed to scroll() will fail with a message. Lists passed to scroll() without a _scroll_id element will trigger an error.
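A rough sketch of the two accepted inputs (again assuming the con connection and shakespeare index used in the Examples below):

res <- Search(con, index = "shakespeare", q = "a*", time_scroll = "1m")
# 1) pass the whole result list returned by Search()
page_a <- scroll(con, res)
# 2) pass just the scroll id as a character string
page_b <- scroll(con, res$`_scroll_id`)
# each call fetches the next page for that scroll id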
Lists output from Search() should have an attribute ("scroll") that is the scroll value set in the Search() request; if that attribute is missing from the list, we'll attempt to use the time_scroll parameter value set in the scroll() function call.

The output of scroll() has the scroll time value as an attribute, so the output can be passed back into scroll() to continue.
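A compact sketch of that continuation pattern (same connection and index assumptions as the Examples below):

res <- Search(con, index = "shakespeare", q = "a*", time_scroll = "1m",
  body = '{"sort": ["_doc"]}')
out <- res$hits$hits
while (length(res$hits$hits) > 0) {
  # the stored scroll time travels with `res`, so no time_scroll is needed here
  res <- scroll(con, res)
  out <- c(out, res$hits$hits)
}
length(out)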
Clear scroll

Search contexts are automatically removed when the scroll timeout has been exceeded. Keeping scrolls open has a cost, so scrolls should be explicitly cleared with scroll_clear() as soon as the scroll is no longer needed.
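One way to make that cleanup hard to forget is to wrap the scroll in a function and register scroll_clear() with on.exit(); fetch_all() below is a hypothetical helper, not part of the package:

fetch_all <- function(con, index, query) {
  res <- Search(con, index = index, q = query, time_scroll = "1m",
    body = '{"sort": ["_doc"]}')
  # free the search context even if the loop below fails part way through
  on.exit(scroll_clear(con, res$`_scroll_id`), add = TRUE)
  out <- res$hits$hits
  while (length(res$hits$hits) > 0) {
    res <- scroll(con, res$`_scroll_id`, time_scroll = "1m")
    out <- c(out, res$hits$hits)
  }
  out
}
# docs <- fetch_all(con, "shakespeare", "a*")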
Sliced scrolling

For scroll queries that return many documents, it is possible to split the scroll into multiple slices that can be consumed independently. See the example in this man file.
Aggregations

If the request specifies aggregations, only the initial search response will contain the aggregation results.
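A sketch of that behaviour, assuming the shakespeare index from the Examples has a keyword-type speaker field to aggregate on:

body <- '{
  "aggs": { "speakers": { "terms": { "field": "speaker" } } },
  "sort": ["_doc"]
}'
res <- Search(con, index = "shakespeare", time_scroll = "1m", body = body)
res$aggregations$speakers   # aggregation results appear in the first response only
nxt <- scroll(con, res$`_scroll_id`)
nxt$aggregations            # NULL on subsequent scroll pages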
Examples

## Not run:
# connection setup
(con <- connect())
# Basic usage - can use across all indices
res <- Search(con, time_scroll="1m")
scroll(con, res)$`_scroll_id`
# use on a specific index - and specify a query
res <- Search(con, index = 'shakespeare', q="a*", time_scroll="1m")
res$`_scroll_id`
# Setting "sort=_doc" to turn off sorting of results - faster
res <- Search(con, index = 'shakespeare', q="a*", time_scroll="1m",
body = '{"sort": ["_doc"]}')
res$`_scroll_id`
# Pass scroll_id to scroll function
scroll(con, res$`_scroll_id`)
# Get all results - one approach is to use a while loop
res <- Search(con, index = 'shakespeare', q="a*", time_scroll="5m",
body = '{"sort": ["_doc"]}')
out <- res$hits$hits
hits <- 1
while(hits != 0){
res <- scroll(con, res$`_scroll_id`, time_scroll="5m")
hits <- length(res$hits$hits)
if(hits > 0)
out <- c(out, res$hits$hits)
}
length(out)
res$hits$total
out[[1]]
# clear scroll
## individual scroll id
res <- Search(con, index = 'shakespeare', q="a*", time_scroll="5m",
body = '{"sort": ["_doc"]}')
scroll_clear(con, res$`_scroll_id`)
## many scroll ids
res1 <- Search(con, index = 'shakespeare', q="c*", time_scroll="5m",
body = '{"sort": ["_doc"]}')
res2 <- Search(con, index = 'shakespeare', q="d*", time_scroll="5m",
body = '{"sort": ["_doc"]}')
nodes_stats(con, metric = "indices")$nodes[[1]]$indices$search$open_contexts
scroll_clear(con, c(res1$`_scroll_id`, res2$`_scroll_id`))
nodes_stats(con, metric = "indices")$nodes[[1]]$indices$search$open_contexts
## all scroll ids
res1 <- Search(con, index = 'shakespeare', q="f*", time_scroll="1m",
body = '{"sort": ["_doc"]}')
res2 <- Search(con, index = 'shakespeare', q="g*", time_scroll="1m",
body = '{"sort": ["_doc"]}')
res3 <- Search(con, index = 'shakespeare', q="k*", time_scroll="1m",
body = '{"sort": ["_doc"]}')
scroll_clear(con, all = TRUE)
## sliced scrolling
body1 <- '{
"slice": {
"id": 0,
"max": 2
},
"query": {
"match" : {
"text_entry" : "a*"
}
}
}'
body2 <- '{
"slice": {
"id": 1,
"max": 2
},
"query": {
"match" : {
"text_entry" : "a*"
}
}
}'
res1 <- Search(con, index = 'shakespeare', time_scroll="1m", body = body1)
res2 <- Search(con, index = 'shakespeare', time_scroll="1m", body = body2)
scroll(con, res1$`_scroll_id`)
scroll(con, res2$`_scroll_id`)
out1 <- list()
hits <- 1
while(hits != 0){
tmp1 <- scroll(con, res1$`_scroll_id`)
hits <- length(tmp1$hits$hits)
if(hits > 0)
out1 <- c(out1, tmp1$hits$hits)
}
out2 <- list()
hits <- 1
while(hits != 0){
tmp2 <- scroll(con, res2$`_scroll_id`)
hits <- length(tmp2$hits$hits)
if(hits > 0)
out2 <- c(out2, tmp2$hits$hits)
}
c(
lapply(out1, "[[", "_source"),
lapply(out2, "[[", "_source")
)
# using jsonlite::stream_out
res <- Search(con, time_scroll = "1m")
file <- tempfile()
scroll(con,
x = res$`_scroll_id`,
stream_opts = list(file = file)
)
jsonlite::stream_in(file(file))
unlink(file)
## stream_out and while loop
(file <- tempfile())
res <- Search(con, index = "shakespeare", time_scroll = "5m",
size = 1000, stream_opts = list(file = file))
while(!inherits(res, "warning")) {
res <- tryCatch(scroll(
conn = con,
x = res$`_scroll_id`,
time_scroll = "5m",
stream_opts = list(file = file)
), warning = function(w) w)
}
NROW(df <- jsonlite::stream_in(file(file)))
head(df)
## End(Not run)