KibioR is a lightweight package for data manipulation with Elasticsearch. Its main features allow easy import, export, download, upload, search, and sharing of data on any Elasticsearch-based open architecture, scaling to billions of records and terabytes of data.
Kibior is a Kibio/Elasticsearch client written as an R6 class. Instances of Kibior are objects that expose Elasticsearch features through predefined requests, such as searching massive amounts of data, joining in-memory data with Elasticsearch indices, and pushing and pulling data to and from multiple Elasticsearch servers. This small utility was built in the context of massive data arriving in biology and bioinformatics, but it is completely generic and can be applied to other fields. Added to R scripts, it can perform several useful tasks such as saving intermediary results, sharing them with collaborators, and automating the import and upload of many files.
R6Class object.
A client to send, retrieve, search, join data in Elasticsearch.
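A minimal end-to-end sketch of typical usage (assumptions: a reachable Elasticsearch instance on localhost:9200 and the dplyr package installed; the index name "sw" is arbitrary):
\dontrun{
# connect to a local Elasticsearch server
kc <- Kibior$new("localhost", 9200)
# push an in-memory dataset to the "sw" index
kc$push(dplyr::starwars, index_name = "sw")
# pull it back, filtering with the Elasticsearch query string syntax
tall_chars <- kc$pull("sw", query = "height:>180", columns = c("name", "height"))
# join in-memory data against the stored index
kc$inner_join(dplyr::starwars, "sw")
}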
| Argument | Type | Details | Default |
| host | character | address or name of the Elasticsearch server | "localhost" |
| port | numeric | port of the Elasticsearch server | 9200 |
| user | character | if required by the server, the username for authentication | NULL |
| pwd | character | if required by the server, the password for authentication | NULL |
| verbose | logical | verbose mode | FALSE |
created
verbose: verbose mode, prints out more information during execution
quiet_progress: progress bar quiet mode, toggles the progress bar
quiet_results: results quiet mode, toggles results printing
host: Access and change the Elasticsearch host
port: Access and change the Elasticsearch port
endpoint: Access the Elasticsearch main endpoint
user: Access the Elasticsearch user.
pwd: Access the Elasticsearch password.
connection: Access the Elasticsearch connection object.
head_search_size: Access and change the head size default value.
cluster_name: Access the cluster name, if and only if already connected.
cluster_status: Access the cluster status, if and only if already connected.
nb_documents: Access the current cluster total number of documents, if and only if already connected.
version: Access the Elasticsearch version, if and only if already connected.
elastic_wait: Access and change the Elasticsearch wait time for update commands, if and only if already connected.
valid_joins: Access the valid joins available in Kibior.
valid_count_types: Access the valid count types available (mainly observations = rows, variables = columns).
valid_elastic_metadata_types: Access the valid Elasticsearch metadata types available.
valid_push_modes: Access the valid push modes available.
shard_number: Access and modify the number of allocated primary shards when creating an Elasticsearch index.
shard_replicas_number: Access and modify the number of allocated replicas in an Elasticsearch index.
default_id_col: Access and modify the default ID column/field created when pushing data to Elasticsearch.
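As a short, hedged illustration (assuming 'kc' is an already connected Kibior instance), active bindings are read and set like regular fields:
\dontrun{
# read-only bindings reflect the connected cluster
kc$version
kc$cluster_name
kc$nb_documents
# writable bindings tune the client behaviour
kc$verbose <- TRUE
kc$quiet_progress <- TRUE
kc$shard_number <- 1
}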
new(): Kibior$new(host = "localhost", port = 9200, user = NULL, pwd = NULL, verbose = getOption("verbose"))
host: The target host to connect to the Elasticsearch REST API (default: "localhost").
port: The target port (default: 9200).
user: If the server needs authentication, your username (default: NULL).
pwd: If the server needs authentication, your password (default: NULL).
verbose: The verbose mode (default: FALSE).
Initialize a new object, automatically called when calling 'Kibior$new()'.
a new instance/object of Kibior
\dontrun{
# default initialization, connect to "localhost:9200"
kc <- Kibior$new()
# connect to "192.168.2.145:9200"
kc <- Kibior$new("192.168.2.145")
# connect to "es:15005", verbose mode activated
kc <- Kibior$new(host = "elasticsearch", port = 15005, verbose = TRUE)
# connect to "192.168.2.145:9450" with credentials "foo:bar"
kc <- Kibior$new(host = "192.168.2.145", port = 9450, user = "foo", pwd = "bar")
# connect to "elasticsearch:9200"
kc <- Kibior$new("elasticsearch")
# get kibior var from env (".Renviron" file or local env)
dd <- system.file("doc_env", "kibior_build.R", package = "kibior")
source(dd, local = TRUE)
kc <- .kibior_get_instance_from_env()
kc$quiet_progress <- TRUE
# preparing all examples (do not mind this for this method)
delete_if_exists <- function(index_names){
tryCatch(
expr = { kc$delete(index_names) },
error = function(e){ }
)
}
delete_if_exists(c(
"aaa",
"bbb",
"ccc",
"ddd",
"sw",
"sw_naboo",
"sw_tatooine",
"sw_alderaan",
"sw_from_file",
"storms",
"starwars"
))
}
print(): Kibior$print()
Print basic information about the current object.
\dontrun{
print(kc)
}
eq(): Kibior$eq(other = NULL)
other: Another instance/object of Kibior (default: NULL).
Tells if another instance of Kibior has the same 'host:port' pair.
TRUE if hosts and ports are identical, else FALSE
\dontrun{
kc$eq(kc)
}
ne(): Kibior$ne(other = NULL)
other: Another instance/object of Kibior (default: NULL).
Tells if another instance of Kibior has a different 'host:port' pair.
TRUE if the 'host:port' pairs differ, else FALSE
\dontrun{
kc$ne(kc)
}
create(): Kibior$create(index_name, force = FALSE)
index_name: a vector of index names to create (default: NULL).
force: erase already existing identical index names? (default: FALSE).
Create one or several indices in Elasticsearch.
a list containing results of creation per index
\dontrun{
kc$create("aaa")
kc$create(c("bbb", "ccc"))
}
list(): Kibior$list(get_specials = FALSE)
get_specials: a boolean to get special indices (default: FALSE).
List indices in Elasticsearch.
a list of index names, NULL if no index found
\dontrun{
kc$list()
kc$list(get_specials = TRUE)
}
has(): Kibior$has(index_name)
index_name: a vector of index names to check.
Does Elasticsearch have one or several of these indices?
a list with TRUE for each index found, else FALSE
\dontrun{
kc$has("aaa")
kc$has(c("bbb", "ccc"))
}
delete(): Kibior$delete(index_name)
index_name: a vector of index names to delete.
Delete one or several indices in Elasticsearch.
a list containing results of deletion per index, or NULL if no index name match
\dontrun{
kc$delete("aaa")
kc$delete(c("bbb", "ccc"))
}
add_description(): Kibior$add_description(index_name, dataset_name, source_name, index_description, version, change_log, website, direct_download, version_date, license, contact, references, columns = list(), force = FALSE)
index_name: the index name to describe
dataset_name: the full-length dataset name
source_name: the source/website/entity full-length name
index_description: the index description, should be explicit
version: the version of the source dataset
change_log: what has changed since the last version
website: the url of the source dataset website
direct_download: the direct download url of the dataset source
version_date: the version or build date
license: the license attached to this dataset, can be a url
contact: a mailto/contact
references: papers and other references (e.g. doi, url)
columns: a list of (column_name = column_description) pairs to register (default: list())
force: if FALSE, raise an error if the description already exists; TRUE overwrites it (default: FALSE)
Add a description of a pushed dataset.
the index name if complete, else an error
\dontrun{
kc$add_description(
index_name = "sw",
dataset_name = "starwars",
source_name = "Package dplyr",
index_description = "Description of starwars characters, the data comes from the Star
Wars API.",
version = "dplyr (1.0.0)",
link = "http://swapi.dev/",
direct_download_link = "http://swapi.dev/",
version_date = "2020-05-28",
license_link = "MIT",
columns = list(
"name" = "Name of the character",
"height" = "Height (cm)",
"mass" = "Weight (kg)",
"hair_color" = "Hair colors",
"skin_color" = "Skin colors",
"eye_color" = "Eye colors",
"birth_year" = "Year born (BBY = Before Battle of Yavin)",
"sex" = "The biological sex of the character, namely male, female,
hermaphroditic, or none (as in the case for Droids).",
"gender" = "The gender role or gender identity of the character as determined by
their personality or the way they were progammed (as in the case for Droids
).",
"homeworld" = "Name of homeworld",
"species" = "Name of species",
"films" = "List of films the character appeared in",
"vehicles" = "List of vehicles the character has piloted",
"starships" = "List of starships the character has piloted"
)
)
}
has_description(): Kibior$has_description(index_name)
index_name: the index name to check for a description
Does the description exist?
a list split by index, with TRUE if the description is found, else FALSE. Unknown index names are removed.
\dontrun{
kc$has_description("s*")
kc$has_description(c("sw", "asdf"))
}
missing_descriptions(): Kibior$missing_descriptions()
List indices that do not have descriptions.
a vector of index names that have no description.
\dontrun{
kc$missing_descriptions()
}
remove_description(): Kibior$remove_description(index_name)
index_name: the index name whose description should be removed
Remove a description.
a vector of indices not present in description.
\dontrun{
# remove the description of 'test' index
kc$remove_description("test")
}
clean_descriptions(): Kibior$clean_descriptions()
Remove all descriptions that do not have an associated index.
a list of index names which have been removed from descriptions.
\dontrun{
# remove descriptions of indices that no longer exist
kc$clean_descriptions()
}
describe(): Kibior$describe(index_name, columns = NULL, pretty = FALSE)
index_name: the index name to describe
columns: a vector of column names to describe (default: NULL)
pretty: pretty-print the result (default: FALSE)
Get the description of indices and their columns.
all description, grouped by indices
\dontrun{
kc$describe("s*")
kc$describe("sw", columns = c("name", "height"))
}
describe_index(): Kibior$describe_index(index_name)
index_name: the index name to describe
Get the description text of indices.
a list of description text, grouped by indices
\dontrun{
kc$describe_index("s*")
}
describe_columns(): Kibior$describe_columns(index_name, columns)
index_name: the index name to describe
columns: a vector of column names to describe
Get the description text of index columns.
a list of description text, grouped by indices
\dontrun{
kc$describe_columns("s*", c("name", "height"))
}
infos(): Kibior$infos()
Get information about the Elasticsearch cluster.
a list of statistics about the cluster
\dontrun{
kc$infos()
}
ping(): Kibior$ping()
Ping the cluster connection.
the ping result with some basic information
\dontrun{
kc$ping()
}
mappings(): Kibior$mappings(index_name)
index_name: a vector of index names to get mappings for.
Get mappings of indices
the list of indices, containing their mapping
\dontrun{
kc$mappings()
kc$mappings("sw")
kc$mappings(c("sw", "sw_naboo"))
}
settings(): Kibior$settings(index_name)
index_name: a vector of index names to get settings for.
Get settings of indices
the list of indices, containing their settings
\dontrun{
kc$settings()
kc$settings("sw")
kc$settings(c("sw", "sw_tatooine"))
}
aliases(): Kibior$aliases(index_name)
index_name: a vector of index names to get aliases for.
Get aliases of indices
the list of indices, containing their aliases
\dontrun{
kc$aliases()
kc$aliases("sw")
kc$aliases(c("sw", "sw_alderaan"))
}
dim(): Kibior$dim(index_name)
index_name: a vector of index names to get dimensions for.
Shortcut to '$count()' to match the classical 'dim()' function pattern '[rows cols]'.
the list of indices, containing their number of observations and variables.
\dontrun{
# Couple [<nb obs> <nb var>] in "sw"
kc$dim("sw")
# Couple [<nb obs> <nb var>] in indices "sw_naboo" and "sw_alderaan"
kc$dim(c("sw_naboo", "sw_alderaan"))
}
columns(): Kibior$columns(index_name)
index_name: a vector of index names, can be a pattern.
Get fields/columns of indices.
a list of indices, each containing their fields/columns.
\dontrun{
kc$columns("sw") # direct search
kc$columns("sw_*") # pattern search
}
count(): Kibior$count(index_name, type = "observations", query = NULL)
index_name: a vector of index names to count in.
type: a string representing the type to count: "observations" (lines) or "variables" (columns) (default: "observations").
query: a string using the query string syntax (default: NULL).
Count observations or variables in Elasticsearch data
the list of indices, containing their number of observations or variables. Use '$dim()' for both
\dontrun{
# Number of observations (nb of records) in "sw"
kc$count("sw")
# Number of observations in indices "sw_naboo" and "sw_tatooine"
kc$count(c("sw_naboo", "sw_tatooine"))
# Number of variables (nb of columns) in index "sw_naboo"
kc$count("sw_naboo", type = "variables")
}
avg(): Kibior$avg(index_name, columns, query = NULL)
index_name: a vector of index names.
columns: a vector of column names.
query: a string using the query string syntax (default: NULL).
Get the average of numeric columns.
a tibble with the average, one line per matching index and column.
\dontrun{
# Avg of "sw" column "height"
kc$avg("sw", "height")
# if pattern
kc$avg("s*", "height")
# multiple indices, multiple columns
kc$avg(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
}
mean(): Kibior$mean(index_name, columns, query = NULL)
index_name: a vector of index names.
columns: a vector of column names.
query: a string using the query string syntax (default: NULL).
Get the mean of numeric columns.
a tibble with the mean, one line per matching index and column.
\dontrun{
# mean of "sw" column "height"
kc$mean("sw", "height")
# if pattern
kc$mean("s*", "height")
# multiple indices, multiple columns
kc$mean(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
}
min(): Kibior$min(index_name, columns, query = NULL)
index_name: a vector of index names.
columns: a vector of column names.
query: a string using the query string syntax (default: NULL).
Get the minimum of numeric columns.
a tibble with the minimum, one line per matching index and column.
\dontrun{
# min of "sw" column "height"
kc$min("sw", "height")
# if pattern
kc$min("s*", "height")
# multiple indices, multiple columns
kc$min(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
}
max(): Kibior$max(index_name, columns, query = NULL)
index_name: a vector of index names.
columns: a vector of column names.
query: a string using the query string syntax (default: NULL).
Get the maximum of numeric columns.
a tibble with the maximum, one line per matching index and column.
\dontrun{
# max of "sw" column "height"
kc$max("sw", "height")
# if pattern
kc$max("s*", "height")
# multiple indices, multiple columns
kc$max(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
}
sum(): Kibior$sum(index_name, columns, query = NULL)
index_name: a vector of index names.
columns: a vector of column names.
query: a string using the query string syntax (default: NULL).
Get the sum of numeric columns.
a tibble with the sum, one line per matching index and column.
\dontrun{
# sum of "sw" column "height"
kc$sum("sw", "height")
# if pattern
kc$sum("s*", "height")
# multiple indices, multiple columns
kc$sum(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
}
stats(): Kibior$stats(index_name, columns, sigma = NULL, query = NULL)
index_name: a vector of index names.
columns: a vector of column names.
sigma: the sigma value used for the standard deviation bounds (default: NULL).
query: a string using the query string syntax (default: NULL).
Produces descriptive statistics of a column. Returns a tibble composed of: count, min, max, avg, sum, sum_of_squares, variance, std_deviation (+ upper and lower bounds). Two warnings here, one for the count and one for the standard deviation. 1/ Counts: they are approximate, see the vignette. 2/ Standard deviation: as stated in the Elasticsearch documentation, "The standard deviation and its bounds are displayed by default, but they are not always applicable to all data-sets. Your data must be normally distributed for the metrics to make sense. The statistics behind standard deviations assumes normally distributed data, so if your data is skewed heavily left or right, the value returned will be misleading."
a tibble with descriptive stats, one line per matching index.
\dontrun{
# Stats of "sw" column "height"
kc$stats("sw", "height")
# if pattern
kc$stats("s*", "height")
# multiple indices and sigma definition
kc$stats(c("sw", "sw2"), "height", sigma = 2.5)
# multiple indices, multiple columns
kc$stats(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
}
percentiles(): Kibior$percentiles(index_name, columns, percents = NULL, query = NULL)
index_name: a vector of index names.
columns: a vector of column names.
percents: a numeric vector of percents to use (default: NULL).
query: a string using the query string syntax (default: NULL).
Get percentiles of numeric columns.
a list of tibbles, split by index, with percentiles as tibble columns.
\dontrun{
# percentiles of "sw" column "height", default is with q1, q2 and q3
kc$percentiles("sw", "height")
# if pattern
kc$percentiles("s*", "height")
# defining percents to get
kc$percentiles("s*", "height", percents = c(20, 25))
# multiple indices, multiple columns
kc$percentiles(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
}
q1(): Kibior$q1(index_name, columns, query = NULL)
index_name: a vector of index names.
columns: a vector of column names.
query: a string using the query string syntax (default: NULL).
Get Q1 percentiles from numeric columns.
a list of tibbles, split by index, with Q1 as a tibble column.
\dontrun{
# Q1 of "sw" column "height"
kc$q1("sw", "height")
# if pattern
kc$q1("s*", "height")
# multiple indices, multiple columns
kc$q1(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
}
q2(): Kibior$q2(index_name, columns, query = NULL)
index_name: a vector of index names.
columns: a vector of column names.
query: a string using the query string syntax (default: NULL).
Get Q2 percentiles from numeric columns.
a list of tibbles, split by index, with Q2 as a tibble column.
\dontrun{
# Q2 of "sw" column "height"
kc$q2("sw", "height")
# if pattern
kc$q2("s*", "height")
# multiple indices, multiple columns
kc$q2(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
}
median(): Kibior$median(index_name, columns, query = NULL)
index_name: a vector of index names.
columns: a vector of column names.
query: a string using the query string syntax (default: NULL).
Get the median from numeric columns. Basically a wrapper around '$q2()'.
a list of tibbles, split by index, with the median as a tibble column.
\dontrun{
# median of "sw" column "height"
kc$median("sw", "height")
# if pattern
kc$median("s*", "height")
# multiple indices, multiple columns
kc$median(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
}
q3(): Kibior$q3(index_name, columns, query = NULL)
index_name: a vector of index names.
columns: a vector of column names.
query: a string using the query string syntax (default: NULL).
Get Q3 percentiles from numeric columns.
a list of tibbles, split by index, with Q3 as a tibble column.
\dontrun{
# Q3 of "sw" column "height"
kc$q3("sw", "height")
# if pattern
kc$q3("s*", "height")
# multiple indices, multiple columns
kc$q3(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
}
summary(): Kibior$summary(index_name, columns, query = NULL)
index_name: a vector of index names.
columns: a vector of column names.
query: a string using the query string syntax (default: NULL).
Summary of numeric columns. Cumulates '$min()', '$max()', '$q1()', '$q2()', '$q3()'.
a list of tibbles, split by index.
\dontrun{
# summary of "sw" column "height"
kc$summary("sw", "height")
# if pattern
kc$summary("s*", "height")
# multiple indices, multiple columns
kc$summary(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
}
keys(): Kibior$keys(index_name, column, max_size = 1000)
index_name: an index name.
column: a field name of this index (default: NULL).
max_size: the maximum number of results to return (default: 1000).
Get distinct key elements of a specific column.
a vector of key values from this field/column
\dontrun{
kc$keys("sw", "name")
kc$keys("sw", "eye_color")
}
bam_to_tibble(): Kibior$bam_to_tibble(bam_data = NULL)
bam_data: data from a BAM file (default: NULL).
Transformation function for collapsing the BAM list of lists format into a single list as per the Rsamtools vignette
a tibble of BAM data
\dontrun{
dd_bai <- system.file("extdata", "test.bam.bai", package = "kibior")
bam_param <- Rsamtools::ScanBamParam(what = c("pos", "qwidth"))
bam_data <- Rsamtools::scanBam(dd_bai, param = bam_param)
kc$bam_to_tibble(bam_data)
}
soft_cast(): Kibior$soft_cast(data, caster = getFromNamespace("as_tibble", "tibble"), caster_args = list(.name_repair = "unique"), warn = TRUE)
data: data to cast.
caster: the caster closure/function (default: tibble::as_tibble)
caster_args: other caster args (default: list(.name_repair = "unique"))
warn: print a warning on error? (default: TRUE)
Casting function that tries to apply a casting closure to the data. Uses tibble::as_tibble() by default.
the cast data, or the unchanged data if the cast failed.
\dontrun{
kc$soft_cast(datasets::iris)
}
get_resource(): Kibior$get_resource(url_or_filepath, fileext = NULL)
url_or_filepath: a filepath or a URL.
fileext: the file extension (default: NULL).
Get a resource from a local filepath or a URL through a tempfile. If the file exists locally, its filepath will be returned; if not, the method will try to download the data and return the temporary filepath.
a filepath.
\dontrun{
kc$get_resource(system.file("R", "kibior.R", package = "kibior"))
kc$get_resource("https://ftp.ncbi.nlm.nih.gov/entrez/README")
}
export(): Kibior$export(data, filepath, format = "csv", force = FALSE)
data: an index name or in-memory data to be exported to a file.
filepath: the filepath to export to, must contain the file extension.
format: the file format to use (default: "csv").
force: overwrite the file? (default: FALSE).
Export data to a file. Needs the 'rio' package from CRAN. Some data formats are not installed by default; use 'rio::install_formats()' to be able to parse them.
the filepath if correctly exported, else an error
\dontrun{
f <- tempfile(fileext=".csv")
# export data from the "sw" Elasticsearch index to a file
kc$export(data = "sw", filepath = f)
# export in-memory data to the same file, overwriting it
kc$export(data = dplyr::starwars, filepath = f, force = TRUE)
}
import_tabular(): Kibior$import_tabular(filepath, to_tibble = TRUE, fileext = ".csv")
filepath: the filepath to import from, must contain the file extension.
to_tibble: return the result as a tibble? If FALSE, the raw default rio::import() format will be used (default: TRUE).
fileext: the file extension (default: ".csv").
Import method for tabular data. Needs the 'rio' package from CRAN. Works mainly with CSV, TSV, TAB, TXT and ZIPped formats.
data contained in the file as a tibble, or NULL.
\dontrun{
f <- tempfile(fileext = ".csv")
rio::export(ggplot2::diamonds, f)
# import to in-memory variable
kc$import_tabular(filepath = f)
# import raw data
kc$import_tabular(filepath = f, to_tibble = FALSE)
}
import_features(): Kibior$import_features(filepath, to_tibble = TRUE, fileext = ".gtf")
filepath: the filepath to import from, must contain the file extension.
to_tibble: return the result as a tibble? If FALSE, the raw default rtracklayer::import() format will be used (default: TRUE).
fileext: the file extension (default: ".gtf").
Import method for features data. Needs the 'rtracklayer' package from Bioconductor. Works with BED, GTF, GFFx, and GZIPped formats.
data contained in the file as a tibble, or NULL.
\dontrun{
# get sample files
f_gff <- system.file("extdata", "chr_y.gff3.gz", package = "kibior")
f_bed <- system.file("extdata", "cpg.bed", package = "kibior")
# import to in-memory variable
kc$import_features(filepath = f_bed)
kc$import_features(filepath = f_gff)
# import raw data
kc$import_features(filepath = f_bed, to_tibble = FALSE)
kc$import_features(filepath = f_gff, to_tibble = FALSE)
}
import_alignments(): Kibior$import_alignments(filepath, to_tibble = TRUE, fileext = ".bam")
filepath: the filepath to import from, should contain the file extension.
to_tibble: return the result as a tibble? If FALSE, the raw default Rsamtools::scanBam() format will be used (default: TRUE).
fileext: the file extension (default: ".bam").
Import method for alignments data. Needs the 'Rsamtools' package from Bioconductor. Works with the BAM format.
data contained in the file as a tibble, or NULL.
\dontrun{
# get sample file
f_bai <- system.file("extdata", "test.bam.bai", package = "kibior")
# import to in-memory variable
kc$import_alignments(filepath = f_bai)
# import raw data
kc$import_alignments(filepath = f_bai, to_tibble = FALSE)
}
import_json(): Kibior$import_json(filepath, to_tibble = TRUE, fileext = ".json")
filepath: the filepath to import from, should contain the file extension.
to_tibble: return the result as a tibble? If FALSE, the raw dataframe format will be used (default: TRUE).
fileext: the file extension (default: ".json").
Import method for the JSON format. Needs the 'jsonlite' package from CRAN.
data contained in the file as a tibble, dataframe or NULL.
\dontrun{
# get sample file
f_json <- system.file("extdata", "storms100.json", package = "kibior")
# import to in-memory variable
kc$import_json(f_json)
# import raw data
kc$import_json(f_json, to_tibble = FALSE)
}
import_sequences(): Kibior$import_sequences(filepath, to_tibble = TRUE, fasta_type = "auto")
filepath: the filepath to import from, should contain the file extension.
to_tibble: return the result as a tibble? If FALSE, the raw default Biostrings format will be used (default: TRUE).
fasta_type: type of parsing. It can be "dna", "rna", "aa" or "auto" (default: "auto")
Import method for sequences data. Needs the 'Biostrings' package from Bioconductor. Works with FASTA formats.
data contained in the file as a tibble, or NULL.
\dontrun{
# get sample file
f_dna <- system.file("extdata", "dna_human_y.fa.gz", package = "kibior")
f_rna <- system.file("extdata", "ncrna_mus_musculus.fa.gz", package = "kibior")
f_aa <- system.file("extdata", "pep_mus_spretus.fa.gz", package = "kibior")
# import to in-memory variable
kc$import_sequences(filepath = f_dna, fasta_type = "dna")
# import raw data
kc$import_sequences(filepath = f_rna, to_tibble = FALSE, fasta_type = "rna")
# import auto
kc$import_sequences(filepath = f_aa)
}
guess_import(): Kibior$guess_import(filepath, to_tibble = TRUE)
filepath: the filepath to import from, must contain the file extension.
to_tibble: return the result as a tibble? (default: TRUE).
Import method that tries to guess the right importation method, and also tries to read compressed data. This method calls the other import_* methods when trying. Some data formats are not installed by default; use 'rio::install_formats()' to be able to parse them.
data contained in the file, or NULL.
\dontrun{
# get sample file
f_dna <- system.file("extdata", "dna_human_y.fa.gz", package = "kibior")
f_rna <- system.file("extdata", "ncrna_mus_musculus.fa.gz", package = "kibior")
f_aa <- system.file("extdata", "pep_mus_spretus.fa.gz", package = "kibior")
f_bai <- system.file("extdata", "test.bam.bai", package = "kibior")
f_gff <- system.file("extdata", "chr_y.gff3.gz", package = "kibior")
f_bed <- system.file("extdata", "cpg.bed", package = "kibior")
# import
kc$guess_import(f_dna)
kc$guess_import(f_rna)
kc$guess_import(f_aa)
kc$guess_import(f_bai)
kc$guess_import(f_gff)
kc$guess_import(f_bed)
}
import(): Kibior$import(filepath, import_type = "auto", push_index = NULL, push_mode = "check", id_col = NULL, to_tibble = TRUE)
filepath: the filepath to import from, must contain the file extension.
import_type: can be one of "auto", "tabular", "features", "alignments", "sequences" (default: "auto").
push_index: the name of the index where to push data (default: NULL).
push_mode: the push mode (default: "check").
id_col: the column name of unique IDs (default: NULL).
to_tibble: return the result as a tibble? (default: TRUE).
Generic import method. This method will call other import_* methods when trying. Some data formats are not installed by default.
data contained in the file, or NULL.
\dontrun{
# get sample file
f_aa <- system.file("extdata", "pep_mus_spretus.fa.gz", package = "kibior")
f_gff <- system.file("extdata", "chr_y.gff3.gz", package = "kibior")
f_bai <- system.file("extdata", "test.bam.bai", package = "kibior")
# import
kc$import(filepath = f_aa)
# import to Elasticsearch index ("sw_from_file") if not exists
kc$import(filepath = f_bai, push_index = "sw_from_file")
# import to index by recreating it, then pull indexed data
kc$import(filepath = f_gff, push_index = "sw_from_file",
push_mode = "recreate")
}
push(): Kibior$push(data, index_name, bulk_size = 1000, mode = "check", id_col = NULL)
data: the data to push.
index_name: the index name to use in Elasticsearch.
bulk_size: the number of records to send to Elasticsearch in a row (default: 1000).
mode: the push mode, can be "check", "recreate" or "update" (default: "check").
id_col: a column name to use as ID, must be composed of unique elements (default: NULL).
Push data from in-memory to Elasticsearch. Everything is done by bulk.
the index_name given if the push ended well, else an error.
\dontrun{
# erase the last push data by recreating the index and re-pushing data
kc$push(dplyr::starwars, index_name = "sw", mode = "recreate")
# characters names are unique, can be used as ID
kc$push(dplyr::starwars, index_name = "sw", mode = "recreate", id_col = "name")
# a bit more complicated: update some data of the dataset "starwars"
# 38 of the 87 records are selected
some_new_data <- dplyr::filter(dplyr::starwars, height > 180)
# make them all "gender <- female"
some_new_data["gender"] <- "female"
# update that applies, based on character names to match the right records
kc$push(some_new_data, "sw", mode = "update", id_col = "name")
# view result by querying
kc$pull("sw", query = "height:>180", columns = c("name", "gender"))
}
pull(): Kibior$pull(index_name, bulk_size = 500, max_size = NULL, scroll_timer = "3m", keep_metadata = FALSE, columns = NULL, query = NULL)
index_name: the index name to use in Elasticsearch.
bulk_size: the number of records retrieved from Elasticsearch per bulk request (default: 500).
max_size: the maximum number of records Elasticsearch will send (default: NULL (all data)).
scroll_timer: the time the scroll API will keep the request alive to scroll over the result (default: "3m" (3 minutes)).
keep_metadata: does Elasticsearch need to send metadata? Data columns will be prefixed by "_source." (default: FALSE).
columns: a vector of columns to select (default: NULL (all columns)).
query: a string formatted in the Elasticsearch query string syntax, see the links in the references for the syntax details (default: NULL)
Simple query syntax examples: 'field:value' for an exact term, 'field:>10' for a numeric range, and '&&' / '||' for boolean combinations.
Pull data from Elasticsearch. Everything is done by bulk. This method is essentially a wrapper around '$search()' with parameter 'head = FALSE'.
a list of datasets corresponding to the pull request, else an error. Keys of the list are the index names matching the request, values are the associated tibbles.
\dontrun{
# push some data sample
kc$push(dplyr::storms, "storms")
# get the whole "sw" index
kc$pull("sw")
# get the whole "sw" index with all metadata
kc$pull("sw", keep_metadata = TRUE)
# get only "name" and "status" columns of indices starting with "s"
# columns not found will be ignored
kc$pull("s*", columns = c("name", "status"))
# limit the size of the result to 10
kc$pull("storms", max_size = 10, bulk_size = 10)
# use Elasticsearch query syntax to select and filter on all indices, for all data
# Here, we want to search for all records that match the conditions:
# field "height" is strictly more than 180 AND field homeworld is "Tatooine" OR "Naboo"
r <- kc$pull("sw", query = "height:>180 && homeworld:(Tatooine || Naboo)")
# it can be used in conjunction with `columns` to select only columns that matter
r <- kc$pull("sw", query = "height:>180 && homeworld:(Tatooine || Naboo)", columns =
c("name", "hair_color", "homeworld"))
}
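Since the result is a list keyed by matching index names, here is a small sketch of accessing one of the returned tibbles (assuming the "sw" index exists):
\dontrun{
r <- kc$pull("sw", query = "height:>180")
# each element of the list is a tibble named after its index
sw_tall <- r$sw
dplyr::glimpse(sw_tall)
}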
move(): Kibior$move(from_index, to_index, from_instance = NULL, force = FALSE, copy = FALSE)
from_index: the source index name (default: NULL).
to_index: the destination index name (default: NULL).
from_instance: if not NULL, the Kibior object of another instance; if NULL, this instance will be used (default: NULL).
force: does the destination index need to be erased? (default: FALSE)
copy: does the destination have to be a copy of the source? FALSE will delete the source index, TRUE will keep it (default: FALSE).
Move data from one index to another. It needs to be configured in the 'config/elasticsearch.yml' file to actually work.
the reindex result
\dontrun{
kc$push(dplyr::starwars, "sw", mode = "recreate")
# move data from an index to another (change name, same instance)
r <- kc$move(from_index = "sw", to_index = "sw_new")
kc$pull("sw_new")
kc$list()
}
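A hedged sketch of pulling an index from another Kibior instance with '$move()'; the remote host used here is hypothetical, and the destination cluster must allow remote reindexing in its 'config/elasticsearch.yml':
\dontrun{
# connect to a second, remote instance (hypothetical host)
kc_remote <- Kibior$new(host = "192.168.2.146", port = 9200)
# copy the remote "sw" index into this instance, keeping the source
kc$move(from_index = "sw", to_index = "sw_imported",
        from_instance = kc_remote, copy = TRUE)
}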
copy(): Kibior$copy(from_index, to_index, from_instance = NULL, force = FALSE)
from_index: the source index name (default: NULL).
to_index: the destination index name (default: NULL).
from_instance: if not NULL, the Kibior object of another instance; if NULL, this instance will be used (default: NULL).
force: does the destination index need to be erased? (default: FALSE)
Copy data from one index to another. It needs to be configured in the 'config/elasticsearch.yml' file to actually work. This method is a wrapper around '$move(copy = TRUE)'.
the reindex result
\dontrun{
# copy data from one index to another (same instance)
r <- kc$copy(from_index = "sw_new", to_index = "sw")
kc$pull(c("sw", "sw_new"))
kc$list()
}
match(): Kibior$match(index_name)
index_name: the index name to use in Elasticsearch, can be a pattern with '*'.
Match requested index names against Elasticsearch indices list.
a vector of matching index names, NULL if nothing matches.
\dontrun{
# search "sw" index name
kc$match("sw")
# search all starting with an "s"
kc$match("s*")
# get all index name, identical to `$list()`
kc$match("*")
# search multiple names
kc$match(c("sw", "sw_new", "nope"))
# search multiple names with pattern
kc$match(c("s*", "nope"))
}
search(): Kibior$search(index_name = "_all", keep_metadata = FALSE, columns = NULL, bulk_size = 500, max_size = NULL, scroll_timer = "3m", head = TRUE, query = NULL)
index_name: the index name to use in Elasticsearch (default: "_all").
keep_metadata: does Elasticsearch need to send metadata? Data columns will be prefixed by "_source." (default: FALSE).
columns: a vector of columns to select (default: NULL (all columns)).
bulk_size: the number of records retrieved from Elasticsearch per bulk request (default: 500).
max_size: the maximum number of records Elasticsearch will send (default: NULL (all data)).
scroll_timer: the time the scroll API will keep the request alive to scroll over the result (default: "3m" (3 minutes)).
head: a boolean limiting the search result and time (default: TRUE)
query: a string formatted in the Elasticsearch query string syntax, see the links in the references for the syntax details (default: NULL)
Search data from Elasticsearch. The goal of this method is to quickly discover which data are interesting, thus 'head = TRUE' by default. If you want to get all data, use 'head = FALSE' or '$pull()'. Everything is done by bulk.
a list of datasets corresponding to the search request, else an error. Keys of the list are the index names matching the request, values are the associated tibbles.
\dontrun{
# search "sw" index, head mode on
kc$search("sw")
# search "sw" index with all metadata, head mode on
kc$search("sw", keep_metadata = TRUE)
# get only "name" field of the head of indices starting with "s"
# if an index does not have the "name" field, it will be empty
kc$search("s*", columns = "name")
# limit the size of the result to 50 over the whole index
kc$search("storms", max_size = 50, bulk_size = 50, head = FALSE)
# use Elasticsearch query syntax to select and filter on all indices, for all data
# Here, we want to search for all records that match the conditions:
# field "height" is strictly more than 180 AND field homeworld is "Tatooine" OR "Naboo"
kc$search("*", query = "height:>180 && homeworld:(Tatooine || Naboo)")
# it can be used in conjunction with `columns` to select only columns that matter
kc$search("*", query = "height:>180 && homeworld:(Tatooine || Naboo)", columns =
c("name", "hair_color", "homeworld"))
}
inner_join(): Kibior$inner_join(...)
...: see 'join()' params.
Execute an inner join between two datasets using 'dplyr' joins. The datasets can be in-memory (variable name) or the name of a currently stored Elasticsearch index. Joins cannot be done on columns of type "list" ("by" argument).
a tibble
\dontrun{
# some data for joins examples
kc$push(ggplot2::diamonds, "diamonds")
# prepare join datasets, only the biggest diamonds are selected (9 records)
sup_carat <- dplyr::filter(ggplot2::diamonds, carat > 3.5)
r <- kc$push(sup_carat, "diamonds_superior")
# execute an inner_join with one index and one in-memory dataset
kc$inner_join(ggplot2::diamonds, "diamonds_superior")
# execute an inner_join with one index queried, and one in-memory dataset
kc$inner_join(ggplot2::diamonds, "diamonds", right_query = "carat:>3.5")
}
full_join(): Kibior$full_join(...)
...: see 'join()' params.
Execute a full join between two datasets using 'dplyr' joins. The datasets can be in-memory (variable name) or the name of a currently stored Elasticsearch index. Joins cannot be done on columns of type "list" ("by" argument).
a tibble
\dontrun{
# prepare join datasets, fair cuts
fair_cut <- dplyr::filter(ggplot2::diamonds, cut == "Fair") # 1605 lines
sup_carat <- kc$pull("diamonds_superior")$diamonds_superior
# execute a full_join with one index and one in-memory dataset
kc$full_join(fair_cut, "diamonds_superior")
# execute a full_join with one index queried, and one in-memory dataset
kc$full_join(sup_carat, "diamonds", right_query = "cut:fair")
}
left_join(): Kibior$left_join(...)
...: see 'join()' params.
Execute a left join between two datasets using 'dplyr' joins. The datasets can be in-memory (variable name) or the name of a currently stored Elasticsearch index. Joins cannot be done on columns of type "list" ("by" argument).
a tibble
\dontrun{
# prepare join datasets, fair cuts
fair_cut <- dplyr::filter(ggplot2::diamonds, cut == "Fair") # 1605 lines
sup_carat <- kc$pull("diamonds_superior")$diamonds_superior
# execute a left_join with one index and one in-memory dataset
kc$left_join(fair_cut, "diamonds_superior")
# execute a left_join with one index queried, and one in-memory dataset
kc$left_join(sup_carat, "diamonds", right_query
= "cut:fair")
}
right_join(): Kibior$right_join(...)
...: see 'join()' params.
Execute a right join between two datasets using 'dplyr' joins. The datasets can be in-memory (variable name) or the name of a currently stored Elasticsearch index. Joins cannot be done on columns of type "list" ("by" argument).
a tibble
\dontrun{
# prepare join datasets, fair cuts
fair_cut <- dplyr::filter(ggplot2::diamonds, cut == "Fair") # 1605 lines
sup_carat <- kc$pull("diamonds_superior")$diamonds_superior
# execute a right_join with one index and one in-memory dataset
kc$right_join(fair_cut, "diamonds_superior")
# execute a right_join with one index queried, and one in-memory dataset
kc$right_join(sup_carat, "diamonds", right_query = "cut:fair")
}
semi_join(): Kibior$semi_join(...)
...: see 'join()' params.
Execute a semi join between two datasets using 'dplyr' joins. The datasets can be in-memory (variable name) or the name of a currently stored Elasticsearch index. Joins cannot be done on columns of type "list" ("by" argument).
a tibble
\dontrun{
# prepare join datasets, fair cuts
fair_cut <- dplyr::filter(ggplot2::diamonds, cut == "Fair") # 1605 lines
sup_carat <- kc$pull("diamonds_superior")$diamonds_superior
# execute a semi_join with one index and one in-memory dataset
kc$semi_join(fair_cut, "diamonds_superior")
# execute a semi_join with one index queried, and one in-memory dataset
kc$semi_join(sup_carat, "diamonds", right_query = "cut:fair")
}
anti_join(): Kibior$anti_join(...)
...: see 'join()' params.
Execute an anti join between two datasets using 'dplyr' joins. The datasets can be in-memory (variable name) or the name of a currently stored Elasticsearch index. Joins cannot be done on columns of type "list" ("by" argument).
a tibble
\dontrun{
# prepare join datasets, fair cuts
fair_cut <- dplyr::filter(ggplot2::diamonds, cut == "Fair") # 1605 lines
sup_carat <- kc$pull("diamonds_superior")$diamonds_superior
# execute a anti_join with one index and one in-memory dataset
kc$anti_join(fair_cut, "diamonds_superior")
# execute a anti_join with one index queried, and one in-memory dataset
kc$anti_join(sup_carat, "diamonds", right_query = "cut:fair")
#
# Do not mind this, removing example indices
elastic::index_delete(kc$connection, "*")
kc <- NULL
}
clone(): The objects of this class are cloneable with this method.
Kibior$clone(deep = FALSE)
deep: Whether to make a deep clone.
Régis Ongaro-Carcy, regis.ongaro-carcy2@crchudequebec.ulaval.ca
Kibio.science: http://kibio.science,
Elasticsearch documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
Use '$count()' for a more accurate count; the counts returned by aggregations such as '$stats()' are approximate.
https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#time-units for time units, and https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax for the Elasticsearch query string syntax.
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html for the Elasticsearch reindex feature used by '$move()' and '$copy()'.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 
916 917 918 919 920 921 922 923 924 | ## ------------------------------------------------
## Method `Kibior$new`
## ------------------------------------------------
## Not run:
# default initiatlization, connect to "localhost:9200"
kc <- Kibior$new()
# connect to "192.168.2.145:9200"
kc <- Kibior$new("192.168.2.145")
# connect to "es:15005", verbose mode activated
kc <- Kibior$new(host = "elasticsearch", port = 15005, verbose = TRUE)
# connect to "192.168.2.145:9450" with credentials "foo:bar"
kc <- Kibior$new(host = "192.168.2.145", port = 9450, user = "foo", pwd = "bar")
# connect to "elasticsearch:9200"
kc <- Kibior$new("elasticsearch")
# get kibior var from env (".Renviron" file or local env)
dd <- system.file("doc_env", "kibior_build.R", package = "kibior")
source(dd, local = TRUE)
kc <- .kibior_get_instance_from_env()
kc$quiet_progress <- TRUE
# preparing all examples (do not mind this for this method)
delete_if_exists <- function(index_names){
tryCatch(
expr = { kc$delete(index_names) },
error = function(e){ }
)
}
delete_if_exists(c(
"aaa",
"bbb",
"ccc",
"ddd",
"sw",
"sw_naboo",
"sw_tatooine",
"sw_alderaan",
"sw_from_file",
"storms",
"starwars"
))
## End(Not run)
## ------------------------------------------------
## Method `Kibior$print`
## ------------------------------------------------
## Not run:
print(kc)
## End(Not run)
## ------------------------------------------------
## Method `Kibior$eq`
## ------------------------------------------------
## Not run:
kc$eq(kc)
## End(Not run)
## ------------------------------------------------
## Method `Kibior$ne`
## ------------------------------------------------
## Not run:
kc$ne(kc)
## End(Not run)
## ------------------------------------------------
## Method `Kibior$create`
## ------------------------------------------------
## Not run:
kc$create("aaa")
kc$create(c("bbb", "ccc"))
## End(Not run)
## ------------------------------------------------
## Method `Kibior$list`
## ------------------------------------------------
## Not run:
kc$list()
kc$list(get_specials = TRUE)
## End(Not run)
## ------------------------------------------------
## Method `Kibior$has`
## ------------------------------------------------
## Not run:
kc$has("aaa")
kc$has(c("bbb", "ccc"))
## End(Not run)
## ------------------------------------------------
## Method `Kibior$delete`
## ------------------------------------------------
## Not run:
kc$delete("aaa")
kc$delete(c("bbb", "ccc"))
## End(Not run)
## ------------------------------------------------
## Method `Kibior$add_description`
## ------------------------------------------------
## Not run:
kc$add_description(
index_name = "sw",
dataset_name = "starwars",
source_name = "Package dplyr",
index_description = "Description of starwars characters, the data comes from the Star
Wars API.",
version = "dplyr (1.0.0)",
link = "http://swapi.dev/",
direct_download_link = "http://swapi.dev/",
version_date = "2020-05-28",
license_link = "MIT",
columns = list(
"name" = "Name of the character",
"height" = "Height (cm)",
"mass" = "Weight (kg)",
"hair_color" = "Hair colors",
"skin_color" = "Skin colors",
"eye_color" = "Eye colors",
"birth_year" = "Year born (BBY = Before Battle of Yavin)",
"sex" = "The biological sex of the character, namely male, female,
hermaphroditic, or none (as in the case for Droids).",
"gender" = "The gender role or gender identity of the character as determined by
their personality or the way they were progammed (as in the case for Droids
).",
"homeworld" = "Name of homeworld",
"species" = "Name of species",
"films" = "List of films the character appeared in",
"vehicles" = "List of vehicles the character has piloted",
"starships" = "List of starships the character has piloted"
)
)
## End(Not run)
## ------------------------------------------------
## Method `Kibior$has_description`
## ------------------------------------------------
## Not run:
kc$has_description("s*")
kc$has_description(c("sw", "asdf"))
## End(Not run)
## ------------------------------------------------
## Method `Kibior$missing_descriptions`
## ------------------------------------------------
## Not run:
kc$missing_descriptions()
## End(Not run)
## ------------------------------------------------
## Method `Kibior$remove_description`
## ------------------------------------------------
## Not run:
# remove the description of 'test' index
kc$remove_description("test")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$clean_descriptions`
## ------------------------------------------------
## Not run:
# remove the description of 'test' index
kc$clean_descriptions()
## End(Not run)
## ------------------------------------------------
## Method `Kibior$describe`
## ------------------------------------------------
## Not run:
kc$describe("s*")
kc$describe("sw", columns = c("name", "height"))
## End(Not run)
## ------------------------------------------------
## Method `Kibior$describe_index`
## ------------------------------------------------
## Not run:
kc$describe_index("s*")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$describe_columns`
## ------------------------------------------------
## Not run:
kc$describe_columns("s*", c("name", "height"))
## End(Not run)
## ------------------------------------------------
## Method `Kibior$infos`
## ------------------------------------------------
## Not run:
kc$infos()
## End(Not run)
## ------------------------------------------------
## Method `Kibior$ping`
## ------------------------------------------------
## Not run:
kc$ping()
## End(Not run)
## ------------------------------------------------
## Method `Kibior$mappings`
## ------------------------------------------------
## Not run:
kc$mappings()
kc$mappings("sw")
kc$mappings(c("sw", "sw_naboo"))
## End(Not run)
## ------------------------------------------------
## Method `Kibior$settings`
## ------------------------------------------------
## Not run:
kc$settings()
kc$settings("sw")
kc$settings(c("sw", "sw_tatooine"))
## End(Not run)
## ------------------------------------------------
## Method `Kibior$aliases`
## ------------------------------------------------
## Not run:
kc$aliases()
kc$aliases("sw")
kc$aliases(c("sw", "sw_alderaan"))
## End(Not run)
## ------------------------------------------------
## Method `Kibior$dim`
## ------------------------------------------------
## Not run:
# Couple [<nb obs> <nb var>] in "sw"
kc$dim("sw")
# Couple [<nb obs> <nb var>] in indices "sw_naboo" and "sw_alderaan"
kc$dim(c("sw_naboo", "sw_alderaan"))
## End(Not run)
## ------------------------------------------------
## Method `Kibior$columns`
## ------------------------------------------------
## Not run:
kc$columns("sw") # direct search
kc$columns("sw_*") # pattern search
## End(Not run)
## ------------------------------------------------
## Method `Kibior$count`
## ------------------------------------------------
## Not run:
# Number of observations (nb of records) in "sw"
kc$count("sw")
# Number of observations in indices "sw_naboo" and "sw_tatooine"
kc$count(c("sw_naboo", "sw_tatooine"))
# Number of variables (nb of columns) in index "sw_naboo"
kc$count("sw_naboo", type = "variables")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$avg`
## ------------------------------------------------
## Not run:
# Avg of "sw" column "height"
kc$avg("sw", "height")
# if pattern
kc$avg("s*", "height")
# multiple indices, multiple columns
kc$avg(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$mean`
## ------------------------------------------------
## Not run:
# mean of "sw" column "height"
kc$mean("sw", "height")
# if pattern
kc$mean("s*", "height")
# multiple indices, multiple columns
kc$mean(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$min`
## ------------------------------------------------
## Not run:
# min of "sw" column "height"
kc$min("sw", "height")
# if pattern
kc$min("s*", "height")
# multiple indices, multiple columns
kc$min(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$max`
## ------------------------------------------------
## Not run:
# max of "sw" column "height"
kc$max("sw", "height")
# if pattern
kc$max("s*", "height")
# multiple indices, multiple columns
kc$max(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$sum`
## ------------------------------------------------
## Not run:
# sum of "sw" column "height"
kc$sum("sw", "height")
# if pattern
kc$sum("s*", "height")
# multiple indices, multiple columns
kc$sum(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$stats`
## ------------------------------------------------
## Not run:
# Stats of "sw" column "height"
kc$stats("sw", "height")
# if pattern
kc$stats("s*", "height")
# multiple indices and sigma definition
kc$stats(c("sw", "sw2"), "height", sigma = 2.5)
# multiple indices, multiple columns
kc$stats(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$percentiles`
## ------------------------------------------------
## Not run:
# percentiles of "sw" column "height", default is with q1, q2 and q3
kc$percentiles("sw", "height")
# if pattern
kc$percentiles("s*", "height")
# defining percents to get
kc$percentiles("s*", "height", percents = c(20, 25))
# multiple indices, multiple columns
kc$percentiles(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$q1`
## ------------------------------------------------
## Not run:
# Q1 of "sw" column "height"
kc$q1("sw", "height")
# if pattern
kc$q1("s*", "height")
# multiple indices, multiple columns
kc$q1(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$q2`
## ------------------------------------------------
## Not run:
# Q2 of "sw" column "height"
kc$q2("sw", "height")
# if pattern
kc$q2("s*", "height")
# multiple indices, multiple columns
kc$q2(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$median`
## ------------------------------------------------
## Not run:
# median of "sw" column "height"
kc$median("sw", "height")
# if pattern
kc$median("s*", "height")
# multiple indices, multiple columns
kc$median(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$q3`
## ------------------------------------------------
## Not run:
# Q3 of "sw" column "height"
kc$q3("sw", "height")
# if pattern
kc$q3("s*", "height")
# multiple indices, multiple columns
kc$q3(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$summary`
## ------------------------------------------------
## Not run:
# summary of "sw" column "height"
kc$summary("sw", "height")
# if pattern
kc$summary("s*", "height")
# multiple indices, multiple columns
kc$summary(c("sw", "sw2"), c("height", "mass"), query = "homeworld:naboo")
## End(Not run)
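All of the descriptive helpers above share the same `(index, column, query)` call pattern, so they can also be collected programmatically. A minimal sketch, not from the package documentation, reusing only methods already shown:
## Not run:
# gather several statistics on the same field in one pass
fns <- list(min = kc$min, max = kc$max, median = kc$median)
lapply(fns, function(f) f("sw", "height"))
## End(Not run)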
## ------------------------------------------------
## Method `Kibior$keys`
## ------------------------------------------------
## Not run:
kc$keys("sw", "name")
kc$keys("sw", "eye_color")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$bam_to_tibble`
## ------------------------------------------------
## Not run:
f_bai <- system.file("extdata", "test.bam.bai", package = "kibior")
bam_param <- Rsamtools::ScanBamParam(what = c("pos", "qwidth"))
bam_data <- Rsamtools::scanBam(f_bai, param = bam_param)
kc$bam_to_tibble(bam_data)
## End(Not run)
## ------------------------------------------------
## Method `Kibior$soft_cast`
## ------------------------------------------------
## Not run:
kc$soft_cast(datasets::iris)
## End(Not run)
## ------------------------------------------------
## Method `Kibior$get_resource`
## ------------------------------------------------
## Not run:
kc$get_resource(system.file("R", "kibior.R", package = "kibior"))
kc$get_resource("https://ftp.ncbi.nlm.nih.gov/entrez/README")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$export`
## ------------------------------------------------
## Not run:
f <- tempfile(fileext = ".csv")
# export data from the Elasticsearch index "sw" to a file
kc$export(data = "sw", filepath = f)
# export in-memory data to the same file, overwriting it (force = TRUE)
kc$export(data = dplyr::starwars, filepath = f, force = TRUE)
## End(Not run)
## ------------------------------------------------
## Method `Kibior$import_tabular`
## ------------------------------------------------
## Not run:
f <- tempfile(fileext = ".csv")
rio::export(ggplot2::diamonds, f)
# import to in-memory variable
kc$import_tabular(filepath = f)
# import raw data
kc$import_tabular(filepath = f, to_tibble = FALSE)
## End(Not run)
## ------------------------------------------------
## Method `Kibior$import_features`
## ------------------------------------------------
## Not run:
# get sample files
f_gff <- system.file("extdata", "chr_y.gff3.gz", package = "kibior")
f_bed <- system.file("extdata", "cpg.bed", package = "kibior")
# import to in-memory variable
kc$import_features(filepath = f_bed)
kc$import_features(filepath = f_gff)
# import raw data
kc$import_features(filepath = f_bed, to_tibble = FALSE)
kc$import_features(filepath = f_gff, to_tibble = FALSE)
## End(Not run)
## ------------------------------------------------
## Method `Kibior$import_alignments`
## ------------------------------------------------
## Not run:
# get sample file
f_bai <- system.file("extdata", "test.bam.bai", package = "kibior")
# import to in-memory variable
kc$import_alignments(filepath = f_bai)
# import raw data
kc$import_alignments(filepath = f_bai, to_tibble = FALSE)
## End(Not run)
## ------------------------------------------------
## Method `Kibior$import_json`
## ------------------------------------------------
## Not run:
# get sample file
f_json <- system.file("extdata", "storms100.json", package = "kibior")
# import to in-memory variable
kc$import_json(f_json)
# import raw data
kc$import_json(f_json, to_tibble = FALSE)
## End(Not run)
## ------------------------------------------------
## Method `Kibior$import_sequences`
## ------------------------------------------------
## Not run:
# get sample files
f_dna <- system.file("extdata", "dna_human_y.fa.gz", package = "kibior")
f_rna <- system.file("extdata", "ncrna_mus_musculus.fa.gz", package = "kibior")
f_aa <- system.file("extdata", "pep_mus_spretus.fa.gz", package = "kibior")
# import to in-memory variable
kc$import_sequences(filepath = f_dna, fasta_type = "dna")
# import raw data
kc$import_sequences(filepath = f_rna, to_tibble = FALSE, fasta_type = "rna")
# import with automatic type detection
kc$import_sequences(filepath = f_aa)
## End(Not run)
## ------------------------------------------------
## Method `Kibior$guess_import`
## ------------------------------------------------
## Not run:
# get sample files
f_dna <- system.file("extdata", "dna_human_y.fa.gz", package = "kibior")
f_rna <- system.file("extdata", "ncrna_mus_musculus.fa.gz", package = "kibior")
f_aa <- system.file("extdata", "pep_mus_spretus.fa.gz", package = "kibior")
f_bai <- system.file("extdata", "test.bam.bai", package = "kibior")
f_gff <- system.file("extdata", "chr_y.gff3.gz", package = "kibior")
f_bed <- system.file("extdata", "cpg.bed", package = "kibior")
# import
kc$guess_import(f_dna)
kc$guess_import(f_rna)
kc$guess_import(f_aa)
kc$guess_import(f_bai)
kc$guess_import(f_gff)
kc$guess_import(f_bed)
## End(Not run)
## ------------------------------------------------
## Method `Kibior$import`
## ------------------------------------------------
## Not run:
# get sample files
f_aa <- system.file("extdata", "pep_mus_spretus.fa.gz", package = "kibior")
f_gff <- system.file("extdata", "chr_y.gff3.gz", package = "kibior")
f_bai <- system.file("extdata", "test.bam.bai", package = "kibior")
# import
kc$import(filepath = f_aa)
# import to the Elasticsearch index "sw_from_file" if it does not already exist
kc$import(filepath = f_bai, push_index = "sw_from_file")
# import to the index by recreating it, then pull the indexed data
kc$import(filepath = f_gff, push_index = "sw_from_file", push_mode = "recreate")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$push`
## ------------------------------------------------
## Not run:
# erase previously pushed data by recreating the index and pushing again
kc$push(dplyr::starwars, index_name = "sw", mode = "recreate")
# character names are unique and can be used as IDs
kc$push(dplyr::starwars, index_name = "sw", mode = "recreate", id_col = "name")
# a bit more complicated: update part of the "starwars" dataset
# the filter selects 38 records out of 87
some_new_data <- dplyr::filter(dplyr::starwars, height > 180)
# set their gender to "female"
some_new_data["gender"] <- "female"
# apply the update, matching records on character names
kc$push(some_new_data, "sw", mode = "update", id_col = "name")
# view result by querying
kc$pull("sw", query = "height:>180", columns = c("name", "gender"))
## End(Not run)
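The set of values accepted by the `mode` argument can be checked on the instance itself through the `valid_push_modes` active binding; a minimal sketch:
## Not run:
# list the push modes accepted by $push()
kc$valid_push_modes
## End(Not run)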
## ------------------------------------------------
## Method `Kibior$pull`
## ------------------------------------------------
## Not run:
# push some sample data
kc$push(dplyr::storms, "storms")
# get the whole "sw" index
kc$pull("sw")
# get the whole "sw" index with all metadata
kc$pull("sw", keep_metadata = TRUE)
# get only "name" and "status" columns of indices starting with "s"
# columns not found will be ignored
kc$pull("s*", columns = c("name", "status"))
# limit the size of the result to 10
kc$pull("storms", max_size = 10, bulk_size = 10)
# use the Elasticsearch query syntax to select and filter data
# here, we search for all records that match the conditions:
# field "height" is strictly greater than 180 AND field "homeworld" is "Tatooine" OR "Naboo"
r <- kc$pull("sw", query = "height:>180 && homeworld:(Tatooine || Naboo)")
# it can be used in conjunction with `columns` to select only the columns that matter
r <- kc$pull("sw", query = "height:>180 && homeworld:(Tatooine || Naboo)",
  columns = c("name", "hair_color", "homeworld"))
## End(Not run)
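Since `$pull()` results can be accessed by index name (as in the join examples below), they plug directly into a regular dplyr pipeline; a hedged sketch reusing the data pushed above:
## Not run:
# refine pulled data in memory with dplyr
tall_chars <- kc$pull("sw", query = "height:>180")$sw
dplyr::select(tall_chars, name, height, homeworld)
## End(Not run)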
## ------------------------------------------------
## Method `Kibior$move`
## ------------------------------------------------
## Not run:
kc$push(dplyr::starwars, "sw", mode = "recreate")
# move data from one index to another (renames the index, same instance)
r <- kc$move(from_index = "sw", to_index = "sw_new")
kc$pull("sw_new")
kc$list()
## End(Not run)
## ------------------------------------------------
## Method `Kibior$copy`
## ------------------------------------------------
## Not run:
# copy data from one index to another (same instance)
r <- kc$copy(from_index = "sw_new", to_index = "sw")
kc$pull(c("sw", "sw_new"))
kc$list()
## End(Not run)
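A handy pattern (sketch only, built from the methods shown above) is to copy an index to a backup name before a destructive "recreate" push:
## Not run:
# keep a backup, then rebuild the original index with fresh data
kc$copy(from_index = "sw", to_index = "sw_backup")
kc$push(dplyr::starwars, index_name = "sw", mode = "recreate")
## End(Not run)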
## ------------------------------------------------
## Method `Kibior$match`
## ------------------------------------------------
## Not run:
# search "sw" index name
kc$match("sw")
# search all starting with an "s"
kc$match("s*")
# get all index names, identical to `$list()`
kc$match("*")
# search multiple names
kc$match(c("sw", "sw_new", "nope"))
# search multiple names with pattern
kc$match(c("s*", "nope"))
## End(Not run)
## ------------------------------------------------
## Method `Kibior$search`
## ------------------------------------------------
## Not run:
# search "sw" index, head mode on
kc$search("sw")
# search "sw" index with all metadata, head mode on
kc$search("sw", keep_metadata = TRUE)
# get only "name" field of the head of indices starting with "s"
# if an index does not have the "name" field, it will be empty
kc$search("s*", columns = "name")
# limit the result to 50 records over the whole index (head mode off)
kc$search("storms", max_size = 50, bulk_size = 50, head = FALSE)
# use the Elasticsearch query syntax to select and filter, on all indices and all data
# here, we search for all records that match the conditions:
# field "height" is strictly greater than 180 AND field "homeworld" is "Tatooine" OR "Naboo"
kc$search("*", query = "height:>180 && homeworld:(Tatooine || Naboo)")
# it can be used in conjunction with `columns` to select only the columns that matter
kc$search("*", query = "height:>180 && homeworld:(Tatooine || Naboo)",
  columns = c("name", "hair_color", "homeworld"))
## End(Not run)
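In head mode, the number of records returned per index is driven by the `head_search_size` active binding, which is documented as both readable and changeable; a minimal sketch (the value 20 is arbitrary):
## Not run:
kc$head_search_size         # current default head size
kc$head_search_size <- 20   # assumption: assignment is allowed, per "access and change"
kc$search("sw")
## End(Not run)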
## ------------------------------------------------
## Method `Kibior$inner_join`
## ------------------------------------------------
## Not run:
# some data for joins examples
kc$push(ggplot2::diamonds, "diamonds")
# prepare the join datasets; only the biggest diamonds are selected (9 records)
sup_carat <- dplyr::filter(ggplot2::diamonds, carat > 3.5)
r <- kc$push(sup_carat, "diamonds_superior")
# execute an inner_join with one index and one in-memory dataset
kc$inner_join(ggplot2::diamonds, "diamonds_superior")
# execute an inner_join with one index queried and one in-memory dataset
kc$inner_join(ggplot2::diamonds, "diamonds", right_query = "carat:>3.5")
## End(Not run)
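A hedged variant (not from the package documentation): assuming both sides of a join can also be given as index names, two Elasticsearch indices can be joined directly without pulling either one first.
## Not run:
# join two indices by name (assumption: the left side may be an index name too)
kc$inner_join("diamonds", "diamonds_superior")
## End(Not run)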
## ------------------------------------------------
## Method `Kibior$full_join`
## ------------------------------------------------
## Not run:
# prepare join datasets, fair cuts
fair_cut <- dplyr::filter(ggplot2::diamonds, cut == "Fair") # 1610 rows
sup_carat <- kc$pull("diamonds_superior")$diamonds_superior
# execute a full_join with one index and one in-memory dataset
kc$full_join(fair_cut, "diamonds_superior")
# execute a full_join with one index queried, and one in-memory dataset
kc$full_join(sup_carat, "diamonds", right_query = "cut:fair")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$left_join`
## ------------------------------------------------
## Not run:
# prepare join datasets, fair cuts
fair_cut <- dplyr::filter(ggplot2::diamonds, cut == "Fair") # 1610 rows
sup_carat <- kc$pull("diamonds_superior")$diamonds_superior
# execute a left_join with one index and one in-memory dataset
kc$left_join(fair_cut, "diamonds_superior")
# execute a left_join with one index queried, and one in-memory dataset
kc$left_join(sup_carat, "diamonds", right_query = "cut:fair")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$right_join`
## ------------------------------------------------
## Not run:
# prepare join datasets, fair cuts
fair_cut <- dplyr::filter(ggplot2::diamonds, cut == "Fair") # 1610 rows
sup_carat <- kc$pull("diamonds_superior")$diamonds_superior
# execute a right_join with one index and one in-memory dataset
kc$right_join(fair_cut, "diamonds_superior")
# execute a right_join with one index queried, and one in-memory dataset
kc$right_join(sup_carat, "diamonds", right_query = "cut:fair")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$semi_join`
## ------------------------------------------------
## Not run:
# prepare join datasets, fair cuts
fair_cut <- dplyr::filter(ggplot2::diamonds, cut == "Fair") # 1610 rows
sup_carat <- kc$pull("diamonds_superior")$diamonds_superior
# execute a semi_join with one index and one in-memory dataset
kc$semi_join(fair_cut, "diamonds_superior")
# execute a semi_join with one index queried, and one in-memory dataset
kc$semi_join(sup_carat, "diamonds", right_query = "cut:fair")
## End(Not run)
## ------------------------------------------------
## Method `Kibior$anti_join`
## ------------------------------------------------
## Not run:
# prepare join datasets, fair cuts
fair_cut <- dplyr::filter(ggplot2::diamonds, cut == "Fair") # 1610 rows
sup_carat <- kc$pull("diamonds_superior")$diamonds_superior
# execute an anti_join with one index and one in-memory dataset
kc$anti_join(fair_cut, "diamonds_superior")
# execute an anti_join with one index queried and one in-memory dataset
kc$anti_join(sup_carat, "diamonds", right_query = "cut:fair")
#
# clean up: remove the example indices created above
elastic::index_delete(kc$connection, "*")
kc <- NULL
## End(Not run)
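To tie the pieces together, here is a compact end-to-end sketch using only methods illustrated above; it assumes a reachable Elasticsearch instance on localhost:9200.
## Not run:
kc <- Kibior$new("localhost", 9200)          # connect
kc$push(dplyr::starwars, "sw")               # index in-memory data
kc$search("sw", query = "height:>180")       # quick look at matching records
tall <- kc$pull("sw", query = "height:>180")$sw
kc$export(data = tall, filepath = tempfile(fileext = ".csv"))
elastic::index_delete(kc$connection, "sw")   # clean up
## End(Not run)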