datascience.eda.R

This package provides functions that help data scientists with common tasks during the exploratory data analysis (EDA) stage of a data science project. The functions run a preliminary analysis of common column types (numeric, categorical and text columns) and also try several experimental clusterings on the dataset.

Our functions are tailored to our own experience. Similar packages have also been published; a few good ones worth mentioning:

Main functions

Installation

You can install the development version of datascience.eda with:

# install.packages("devtools")
devtools::install_github("UBC-MDS/datascience.eda.R")

Example

explore_KMeans_clustering and explore_DBSCAN_clustering

library(datascience.eda)
library(palmerpenguins)

# you can call each clustering algorithm separately 
explore_KMeans_clustering(penguins, centers = seq(3, 5))
explore_DBSCAN_clustering(penguins, eps = c(1), minPts = c(5))

# Or call explore_clustering() to apply both KMeans and DBSCAN at once
explore_clustering(penguins)
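
To give a feel for what sweeping `centers` over 3 to 5 involves, here is a minimal base-R sketch that fits `stats::kmeans` once per candidate value and compares the total within-cluster sum of squares. It is an illustration only and not how `explore_KMeans_clustering` is implemented; the built-in `iris` data, the scaling step, `nstart = 10` and all variable names are assumptions chosen for the sketch.

```r
# Illustration only: plain stats::kmeans, not the package's own code.
set.seed(42)                      # k-means starts from random centers

# Assumption: use the four numeric iris columns, standardised.
x <- scale(iris[, 1:4])

# Candidate numbers of clusters, mirroring centers = seq(3, 5) above.
centers <- seq(3, 5)

# Fit one model per candidate; nstart = 10 restarts each fit ten times
# and keeps the best solution.
fits <- lapply(centers, function(k) kmeans(x, centers = k, nstart = 10))

# Total within-cluster sum of squares for each fit (lower = tighter clusters).
wss <- vapply(fits, function(fit) fit$tot.withinss, numeric(1))

data.frame(centers = centers, tot_withinss = wss)
```

Comparing `tot_withinss` across rows is one common way to judge how much each extra cluster helps; the package may report different diagnostics.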

explore_text_columns

library(sacred)
results <- explore_text_columns(apocrypha)

explore_numeric_columns

results <- explore_numeric_columns(penguins)

explore_categorical_columns

library(dplyr)
library(MASS)
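# Convert the Sex and Clap columns of MASS::survey to character columns
df <- data.frame(lapply(survey[, c("Sex", "Clap")], as.character),
                 stringsAsFactors = FALSE) %>% tibble()

results <- explore_categorical_columns(df, c("Sex", "Clap"))
# Render the first element of the result as a table
results[[1]] %>% knitr::kable()
# Show the two items stored in the second element
results[[2]][[1]]
results[[2]][[2]]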


UBC-MDS/datascience.eda.R documentation built on March 24, 2021, 2:22 a.m.