README.md
In regisoc/kibior: A Simple Data Management and Sharing Tool

kibior: easy scientific data handling, searching and sharing with Elasticsearch

Version: 0.1.1

| | |
|-|-|
| What | kibior is an R package that eases the pain of data handling in science, most notably with biological data. |
| Where | kibior uses Elasticsearch as its database and search engine. |
| Who | kibior is built for data science and data manipulation, for any data-related task and notably for sharing data. It mainly targets bioinformaticians and, more broadly, data scientists. |
| When | Available now from this repository or from CRAN. |
| Public instances | Use the `$get_kibio_instance()` method to connect to Kibio and access known datasets. See Kibio datasets at the end of this document for a complete list. |
| Cite this package | In an R session, run `citation("kibior")`. |
| Publication | 10.1093/bioinformatics/btab157 |

This package allows:

- Pushing, pulling, joining, sharing and searching tabular data between an R session and one or more Elasticsearch instances/clusters.
- Querying and filtering massive data with the Elasticsearch engine.
- Multiple live Elasticsearch connections to different addresses.
- Method autocompletion in suitable environments (e.g. R CLI, RStudio).
- Importing and exporting datasets from and to files.
- Server-side execution of most operations (i.e. on Elasticsearch instances/clusters).

# Get from CRAN
install.packages("kibior")

# or get the latest from GitHub
devtools::install_github("regisoc/kibior")

# load
library(kibior)

# Get a specific instance
kc <- Kibior$new("server_or_address", port)

# Or try something bigger...
kibio <- Kibior$get_kibio_instance()
kibio$list()

The examples below cover a selection of the features offered by kibior. See the Introduction vignette for more advanced usage.

Example: `push` datasets

# Push data (R memory -> Elasticsearch)
dplyr::starwars %>% kc$push("sw")
dplyr::storms %>% kc$push("st")

Example: `pull` datasets

# Pull data with columns selection (Elasticsearch -> R memory)
kc$pull("sw", query = "homeworld:(naboo || tatooine)", 
              columns = c("name", "homeworld", "height", "mass", "species"))
# see vignette for query syntax
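
To make the effect of the `query` and `columns` arguments more concrete, here is a minimal local sketch of the same selection in base R. It does not call kibior or Elasticsearch. The `toy` data frame and its values are invented for illustration, and the case-insensitive comparison assumes the default Elasticsearch text analysis, which lowercases terms.

```r
# Toy stand-in for the "sw" dataset (values are made up, not real data)
toy <- data.frame(
  name      = c("A", "B", "C", "D"),
  homeworld = c("Naboo", "Tatooine", "Alderaan", NA),
  height    = c(165, 172, 150, 180),
  mass      = c(45, 77, 49, 80),
  species   = c("Human", "Human", "Human", "Droid"),
  eye_color = c("brown", "blue", "brown", "red"),
  stringsAsFactors = FALSE
)

# Rough local equivalent of: query = "homeworld:(naboo || tatooine)"
# (rows with a missing homeworld are never selected)
wanted_worlds <- c("naboo", "tatooine")
keep_rows <- !is.na(toy$homeworld) & tolower(toy$homeworld) %in% wanted_worlds

# Rough local equivalent of: columns = c("name", "homeworld", ...)
wanted_cols <- c("name", "homeworld", "height", "mass", "species")
local_pull <- toy[keep_rows, wanted_cols]

print(local_pull)
```

With kibior, the filtering and column selection happen on the Elasticsearch side, so only the matching rows and requested columns are transferred to the R session.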

Example: `copy` datasets

# Copy dataset (Elasticsearch internal operation)
kc$copy("sw", "sw_copy")

Example: `delete` datasets


# Delete datasets
kc$delete("sw_copy")

Example: `list`, `match` dataset names

# List available datasets
kc$list()

# Search for index names starting with "s"
kc$match("s*")
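
As a rough illustration of what a wildcard pattern such as `"s*"` selects, the sketch below applies the same pattern to a local character vector with base R. It assumes the pattern behaves like a shell-style glob; the index names are hypothetical and no Elasticsearch call is made.

```r
# Hypothetical index names, as a server might list them
index_names <- c("sw", "st", "sw_copy", "users")

# utils::glob2rx() converts a wildcard pattern into a regular expression
pattern <- utils::glob2rx("s*")

# Keep only the names that match the pattern
matched <- index_names[grepl(pattern, index_names)]
print(matched)
```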

Example: get `columns` names and list unique `keys` in values

# Get columns of all datasets starting with "s"
kc$columns("s*")

# Get unique values of a column
kc$keys("sw", "homeworld")

Example: some Elasticsearch basic statistical methods

# Count number of lines in dataset
kc$count("st")

# Count number of lines with query (name of the storm is Anita)
kc$count("st", query = "name:anita")

# Generic stats on two columns
kc$stats("sw", c("height", "mass"))

# Specific descriptive stats with query
kc$avg("sw", c("height", "mass"), query = "homeworld:naboo")
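
For readers who want to see what a count or an average restricted by a query corresponds to, here is a small in-memory sketch in base R. The `toy_storms` data frame, its `wind` column values and the storm names are invented for illustration; the lowercase comparison assumes case-insensitive matching on the Elasticsearch side. With kibior, these computations run on the Elasticsearch side rather than in R.

```r
# Toy stand-in for the "st" dataset (values are made up, not real data)
toy_storms <- data.frame(
  name = c("Anita", "Anita", "Bob", "Clara"),
  wind = c(30, 50, 40, 60),
  stringsAsFactors = FALSE
)

# Rough local equivalent of: kc$count("st")
n_all <- nrow(toy_storms)

# Rough local equivalent of: kc$count("st", query = "name:anita")
is_anita <- tolower(toy_storms$name) == "anita"
n_anita <- sum(is_anita)

# Rough local equivalent of an average restricted by the same query
avg_wind_anita <- mean(toy_storms$wind[is_anita], na.rm = TRUE)

print(c(n_all = n_all, n_anita = n_anita, avg_wind_anita = avg_wind_anita))
```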

Example: `join`

# Inner join between:
#   1/ an Elasticsearch-based dataset with query ("sw"), 
#   2/ and an in-memory R dataset (dplyr::starwars) 
kc$inner_join("sw", dplyr::starwars, 
              left_query = "hair_color:black",
              left_columns = c("name", "mass", "height"),
              by = "name")
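
The semantics of the join above (filter the left side, keep a subset of its columns, then match rows on a key) can be sketched locally with base R's `merge()`. This is only an illustration under stated assumptions: the `left` and `right` data frames and their values are invented, and the code does not use kibior or Elasticsearch.

```r
# Toy stand-in for the Elasticsearch-based dataset (values are made up)
left <- data.frame(
  name       = c("A", "B", "C"),
  mass       = c(80, 70, 60),
  height     = c(180, 170, 160),
  hair_color = c("black", "blond", "black"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the in-memory R dataset (values are made up)
right <- data.frame(
  name    = c("A", "C", "D"),
  species = c("Human", "Droid", "Human"),
  stringsAsFactors = FALSE
)

# Rough local equivalent of: left_query = "hair_color:black"
left_rows <- left$hair_color == "black"

# Rough local equivalent of: left_columns = c("name", "mass", "height")
left_sub <- left[left_rows, c("name", "mass", "height")]

# merge() keeps only keys present on both sides by default (inner join)
joined <- merge(left_sub, right, by = "name")
print(joined)
```

Only the rows whose `name` appears on both sides survive, which is what distinguishes an inner join from the other join types.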

regisoc/kibior documentation built on Aug. 15, 2021, 9:51 p.m.
