In nietodaniel/LargeDataExplorer: Powerful package to clean and re-format very large datasets after classifying its variables by their usefulness for machine learning

LargeDataExplorer

Powerful package to clean and re-format very large datasets after classifying their variables by how useful they are for machine learning. (Full Package Information)

Because summary() isn't enough when you have more than 200 columns and gigabytes of data, and you can't easily tell which variables carry no relevant information.

# install.packages("devtools")
library(devtools)
devtools::install_github("nietodaniel/LargeDataExplorer")
library(LargeDataExplorer)

LargeDataExplorer can automatically classify the variables of a dataset into the following categories:

LDE.Explore() classifies the variables by their type and usefulness and generates descriptive statistics, but does not transform the data. This makes it useful for datasets of several gigabytes, where RAM is limited.

df <- secop1.full                              # Example dataset of government purchases included in this package. See the full package information.
keyNamesMatch <- c("key", "id")                # Variable names that start or end with these strings are assigned as keys, e.g. c("key", "id", "code"). A character vector, or NULL to skip key detection.
Explore.df <- LDE.Explore(df, keyNamesMatch)
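The internals of LDE.Explore() are not described here, so the following is only a base-R sketch of the kind of per-column profile such an exploration can report. The function `profile_columns()` and the `toy` data frame are hypothetical and not part of LargeDataExplorer; the key rule (a case-insensitive prefix or suffix match against `keyNamesMatch`) is an assumption based on the comment above.

```r
# Hypothetical helper, NOT part of LargeDataExplorer: a base-R column profile.
# Assumption: "key" columns are those whose names start or end with any
# keyNamesMatch string, compared case-insensitively.
profile_columns <- function(df, keyNamesMatch = NULL) {
  col_names <- names(df)
  is_key <- rep(FALSE, length(col_names))
  if (!is.null(keyNamesMatch)) {
    lower_names <- tolower(col_names)
    for (k in tolower(keyNamesMatch)) {
      # Flag names that begin or end with the current pattern
      is_key <- is_key | startsWith(lower_names, k) | endsWith(lower_names, k)
    }
  }
  data.frame(
    variable   = col_names,
    class      = vapply(df, function(x) class(x)[1], character(1)),
    na_share   = vapply(df, function(x) mean(is.na(x)), numeric(1)),          # fraction of missing values
    n_distinct = vapply(df, function(x) length(unique(x[!is.na(x)])), integer(1)),  # distinct non-NA values
    is_key     = is_key,
    row.names  = NULL,
    stringsAsFactors = FALSE
  )
}

# Hypothetical toy data: one key, one numeric, one constant, one all-NA column
toy <- data.frame(
  contract_id = 1:4,
  amount      = c(10.5, 20, NA, 7),
  country     = c("CO", "CO", "CO", "CO"),
  notes       = NA,
  stringsAsFactors = FALSE
)
profile_columns(toy, c("key", "id"))
```

Plain prefix/suffix matching is deliberately simple: "id" also matches names such as `paid` or `idea`, so a stricter rule (for example requiring a separator, as in `_id`) may be preferable on real data.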

LDE.AutoProcess() returns a cleaned and reformatted dataset after removing variables that are not useful. It also returns the descriptive statistics and the variable classification.

df <- secop1.full
keyNamesMatch <- c("key", "id")                # See LDE.Explore()
Auto.df <- LDE.AutoProcess(df, keyNamesMatch)
df.clean <- Auto.df$df.filtered                # Cleaned dataset
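The exact criteria LDE.AutoProcess() uses to decide that a variable is not useful are left to the full package information. As an illustration only, the hypothetical base-R function `drop_uninformative()` below drops columns that are entirely NA or constant; both criteria are assumptions, and this is not the package's implementation.

```r
# Hypothetical sketch, NOT LargeDataExplorer's algorithm: drop columns that
# carry no information for a model, using two assumed criteria:
#   1. every value is NA, or
#   2. all non-NA values are identical (a constant column).
drop_uninformative <- function(df) {
  keep <- vapply(df, function(x) {
    values <- x[!is.na(x)]
    length(values) > 0 && length(unique(values)) > 1
  }, logical(1))
  list(
    filtered = df[, keep, drop = FALSE],  # columns that passed both checks
    removed  = names(df)[!keep]           # names of the dropped columns
  )
}

# Hypothetical toy data
toy <- data.frame(
  contract_id = 1:4,
  amount      = c(10.5, 20, NA, 7),
  country     = c("CO", "CO", "CO", "CO"),
  notes       = NA,
  stringsAsFactors = FALSE
)
res <- drop_uninformative(toy)
names(res$filtered)  # -> "contract_id" "amount"
res$removed          # -> "country" "notes"
```

Treating a column with a single distinct value plus NAs as constant is a judgment call: the missingness pattern itself can carry signal, so a real cleaner may keep such columns.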

Author: Daniel Nieto-González, CEO of Digital MedTools

nietodaniel/LargeDataExplorer documentation built on Sept. 20, 2020, 7:57 p.m.
