The ML Studio is an interactive platform for data visualization, statistical modeling and machine learning applications. It is built on the Shiny and shinydashboard interface, with Plotly interactive data visualization, DT HTML tables, and H2O machine learning and deep learning algorithms. The ML Studio provides a set of tools for the data science pipeline workflow.
Please note: a full version of this vignette (with gif documentation) is available to download here
The package can be installed from GitHub with the devtools package (if devtools is not installed, use install.packages("devtools") to install it first).
# Install the MLstudio
devtools::install_github("RamiKrispin/MLstudio")
Please note: the H2O package may require additional Java add-ons (if not already installed). It is therefore listed under the "Suggests" packages of the MLstudio package (and not under Imports or Depends) and won't be installed automatically when MLstudio is installed. More information about installing H2O can be found in the H2O documentation (under the "INSTALL IN R" tab).
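Because suggested packages are not installed automatically, it can help to check for them before launching the app. The snippet below is a minimal sketch of such a check; the helper name `missing_packages` is hypothetical and is not part of MLstudio.

```r
# Hypothetical helper (not part of MLstudio): return the names of the
# packages in `pkgs` that cannot be loaded in the current R installation.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# H2O is a suggested package, so check for it explicitly
to_install <- missing_packages("h2o")

if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "),
          ". See the H2O documentation for installation instructions.")
}
```

The check only reports what is missing; installing H2O (and Java, where needed) is left to the user.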
The app is launched from R and opens in the default web browser (it runs best on Google Chrome). To open the app, use:
# Launch the MLstudio
runML()
The ML Studio provides the user with the ability to load (or remove), modify, visualize and analyze multiple datasets at the same time.
Under the "Data" tab there are two sub-tabs:
Load - set of tools to load data into the platform (from R environment, R datasets and/or csv file)
Prep - data prep tools:
Variables summary
There are three methods to load a dataset into the platform:
Loading the dataset from the R environment, currently supporting data frame, data table, matrix and ts objects.
Loading a dataset available in an installed package, supporting data frame, data table, matrix and ts objects.
Loading from a CSV file.
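To make the supported object types concrete, the sketch below lists which objects in an environment belong to the classes named above, and reads a small CSV file with base R. The function `list_supported_objects` is hypothetical and only illustrates the idea; it is not how the app itself necessarily detects datasets.

```r
# Hypothetical helper: names of objects in `env` that are data frames
# (data.table objects also inherit from data.frame), matrices or ts objects.
list_supported_objects <- function(env = globalenv()) {
  supported <- c("data.frame", "matrix", "ts")
  object_names <- ls(envir = env)
  is_supported <- vapply(
    object_names,
    function(nm) inherits(get(nm, envir = env), supported),
    logical(1)
  )
  object_names[is_supported]
}

# Objects like these, created before launching the app, are of the supported types
cars_df <- mtcars                 # data frame
air_ts  <- AirPassengers          # ts object
num_mat <- matrix(1:6, nrow = 2)  # matrix

list_supported_objects()

# A CSV file can be inspected with base R before loading it in the app
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)
iris_from_csv <- read.csv(csv_path)
str(iris_from_csv)
```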
The variable attributes are shown in the middle table of the "Prep" tab; a more in-depth summary is available in the variable summary box. The variable attributes option can be used to modify the attributes if needed. The table is interactive: its fields can be sorted and a search option is available.
A data summary function is available on the "Prep" tab under the "Select Option" dropdown menu. This is a dplyr-based function that summarizes data by a specific group. The summary categories currently available are count, mean, sd, max and min.
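A grouped summary of this kind can be reproduced directly with dplyr. The example below is an illustrative sketch using the built-in mtcars dataset; the grouping and value columns (cyl and mpg) are arbitrary choices, and this is not the app's internal code.

```r
library(dplyr)

# Summarize miles per gallon by number of cylinders, using the same
# five summary categories that the app offers
summary_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    count = n(),
    mean  = mean(mpg),
    sd    = sd(mpg),
    max   = max(mpg),
    min   = min(mpg)
  )

summary_by_cyl
```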
Utilizing Plotly interactive data visualization tools along with Shiny engine, the ML Studio provides the user with effective tools for data exploration. The "Visualization" tab provides key functionality:
Application for multivariate visualization - scatter, line, boxplot, histogram, density, and correlation plots
Application for time series visualization - seasonality, boxplot and lags plots
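Similar plots can be built directly with the plotly package. The sketch below uses the built-in iris and AirPassengers datasets, which are assumptions chosen for illustration; it shows a scatter plot, a boxplot, and a simple seasonality plot, and is not the code the app runs.

```r
library(plotly)

# Scatter plot of two numeric variables, colored by a categorical one
p_scatter <- plot_ly(
  data = iris,
  x = ~Sepal.Length, y = ~Sepal.Width, color = ~Species,
  type = "scatter", mode = "markers"
)

# Boxplot of one numeric variable per category
p_box <- plot_ly(data = iris, y = ~Sepal.Length, color = ~Species, type = "box")

# Seasonality plot: one line per year of a monthly time series
air_df <- data.frame(
  year  = as.integer(floor(time(AirPassengers))),
  month = as.integer(cycle(AirPassengers)),
  value = as.numeric(AirPassengers)
)
p_season <- plot_ly(
  data = air_df,
  x = ~month, y = ~value, split = ~year,
  type = "scatter", mode = "lines"
)

p_scatter
```

Printing any of the three objects in an interactive session opens the plot in the viewer or browser.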
The modeling applications of the ML Studio are still under development. Currently, four classification models from the H2O package are available (Deep Learning, GBM, GLM and Random Forest).
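For reference, a classification model of this kind can also be trained with the h2o package outside the app. The following is a minimal sketch using a GBM on the iris dataset; the dataset, the split ratio and the parameter values are assumptions for illustration and do not reflect the app's defaults. It requires the h2o package and a working Java installation.

```r
library(h2o)

# Start (or connect to) a local H2O cluster
h2o.init()

# Move the data into H2O and split it into training and testing sets
iris_hex <- as.h2o(iris)
splits <- h2o.splitFrame(iris_hex, ratios = 0.8, seed = 1234)
train <- splits[[1]]
test  <- splits[[2]]

# Train a GBM classifier to predict Species from the other columns
model <- h2o.gbm(
  x = setdiff(names(iris), "Species"),
  y = "Species",
  training_frame = train,
  ntrees = 50,
  seed = 1234
)

# Evaluate on the held-out data
perf <- h2o.performance(model, newdata = test)
h2o.confusionMatrix(perf)
```

The other three model types have analogous functions in the h2o package (h2o.deeplearning, h2o.glm and h2o.randomForest), which accept the same x, y and training_frame arguments.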
Features that are under development:
Deep learning applications with Keras
Time series and forecasting: