
Tools for KI Data Scientists


The kitools package provides utilities for data scientists working on Knowledge Integration (KI) projects and supports the workflows within those projects.

The kitools package is available as both Python and R packages. To tailor this documentation to your language of choice, use the R/Python toggle in the navigation bar. This introduction provides background and an overview of package functionality. To learn how to install and use the package, follow the links in the "Articles" menu in the navigation bar.

KI Data Workflows

The primary workflow supported by kitools is working with data that is stored in repositories on content nodes.

Types of data in KI

In KI projects, there are three major classes of data:

Types of repositories on content nodes

Associating data with a KI project

KI data scientists work on a local workstation, so they need a way to bring the data they analyze onto that workstation and to share the data and results that their analysis produces.

The kitools package provides functionality that helps you register all data associated with your analysis and handles pushing and pulling that data to and from the content node. This is handled through the notion of a "KI project".

KI projects

At the heart of the kitools package is the notion of a "KI project": a directory on the data scientist's workstation that stores all analysis data, code, and artifacts, paired with a corresponding analysis repository on Synapse.

The package provides a function that initializes a KI project directory and its Synapse space (or associates the project with an existing analysis Synapse space), along with functions that register the datasets associated with the analysis. All datasets are tracked in a KI project "manifest" file, which maps each piece of data in the local directory to its location on Synapse.
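To make the idea of a manifest concrete, here is a minimal Python sketch of a file that maps local paths to remote identifiers and survives a round trip through JSON. It is not the kitools manifest format or API: the `Manifest` class, its field names, the file paths, and the `syn000001`-style IDs are all hypothetical placeholders chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Manifest:
    """Illustrative manifest: local relative path -> remote identifier.

    Hypothetical structure; the real kitools manifest may differ.
    """

    project_uri: Optional[str] = None  # placeholder ID of the analysis space
    files: Dict[str, Optional[str]] = field(default_factory=dict)

    def register(self, local_path: str, remote_id: Optional[str] = None) -> None:
        # A remote_id of None means "tracked locally, not yet pushed".
        self.files[local_path] = remote_id

    def unpushed(self) -> List[str]:
        # Paths that are registered but have no remote location yet.
        return sorted(p for p, rid in self.files.items() if rid is None)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Manifest":
        return cls(**json.loads(text))


manifest = Manifest(project_uri="syn000001")              # placeholder ID
manifest.register("data/core/study_a.csv", "syn000002")   # already on the remote
manifest.register("data/artifacts/model_fit.rds")         # local only so far

# Serialize and reload, as a manifest file on disk would be.
restored = Manifest.from_json(manifest.to_json())
```

Keeping the mapping in a plain, serializable structure means the project directory alone is enough to tell which files have a remote counterpart and which still need to be pushed.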

Typical KI project workflow

A typical KI analysis begins by specifying the core datasets that the analysis will use. The package provides functions to add these datasets to the project manifest and to pull them from their corresponding core data repository spaces on Synapse.

Then, throughout the analysis, as the data scientist adds auxiliary data or creates analysis artifacts and results, these can be added to the manifest and pushed to the KI project's analysis repository.
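The add, pull, and push steps of this workflow can be sketched in Python against an in-memory stand-in for the content node. Everything here is an assumption made for illustration: `FakeRemote`, `Project`, their methods, and the `fake0001`-style IDs are hypothetical and do not correspond to kitools functions or to a Synapse client.

```python
from typing import Dict, List, Optional


class FakeRemote:
    """In-memory stand-in for a content node; not a Synapse client."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}
        self._next = 1

    def upload(self, content: bytes) -> str:
        remote_id = f"fake{self._next:04d}"  # sequential placeholder IDs
        self._next += 1
        self._store[remote_id] = content
        return remote_id

    def download(self, remote_id: str) -> bytes:
        return self._store[remote_id]


class Project:
    """Hypothetical project: local files plus a manifest of remote IDs."""

    def __init__(self, remote: FakeRemote) -> None:
        self.remote = remote
        self.local: Dict[str, bytes] = {}             # path -> file content
        self.manifest: Dict[str, Optional[str]] = {}  # path -> remote ID

    def add(self, path: str, remote_id: Optional[str] = None,
            content: Optional[bytes] = None) -> None:
        # Register a dataset; content is given for locally created files.
        self.manifest[path] = remote_id
        if content is not None:
            self.local[path] = content

    def pull(self) -> List[str]:
        # Fetch every registered file that has a remote ID but no local copy.
        pulled = []
        for path, remote_id in list(self.manifest.items()):
            if remote_id is not None and path not in self.local:
                self.local[path] = self.remote.download(remote_id)
                pulled.append(path)
        return pulled

    def push(self) -> List[str]:
        # Upload every local file that has no remote ID yet and record the ID.
        pushed = []
        for path, remote_id in list(self.manifest.items()):
            if remote_id is None and path in self.local:
                self.manifest[path] = self.remote.upload(self.local[path])
                pushed.append(path)
        return pushed


remote = FakeRemote()
core_id = remote.upload(b"id,value\n1,3.5\n")  # a core dataset already on the remote

project = Project(remote)
project.add("data/core/study.csv", remote_id=core_id)       # register core data
pulled = project.pull()                                      # bring it to the workstation

project.add("results/summary.txt", content=b"mean=3.5\n")   # new analysis artifact
pushed = project.push()                                      # share it via the remote
```

The sketch deliberately treats pull and push as operations over the manifest rather than over individual files, which mirrors the role the text gives the manifest as the single record of what belongs to the project.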

You can learn the specifics of doing this in the "Articles" documents accessible from the navigation bar.

Why use KI projects and kitools

There are many reasons to closely track the data used in KI analyses and to manage the push and pull of that data between the analyst's workstation and the content node. These include:

Dive in

To get started using kitools, visit the installation and setup guide.



ki-tools/kitools-r documentation built on Jan. 7, 2020, 5:46 a.m.