autodb is an R package for automatic normalisation of a data frame to third normal form, with the intention of easing the process of data cleaning. (Usage to design your actual database for you is not advised.)
You can install the development version of autodb from GitHub with:
# install.packages("devtools")
devtools::install_github("CharnelMouse/autodb")
Turning a simple data frame into a database:
library(autodb)
#>
#> Attaching package: 'autodb'
#> The following object is masked from 'package:stats':
#>
#> decompose
summary(ChickWeight)
#> weight Time Chick Diet
#> Min. : 35.0 Min. : 0.00 13 : 12 1:220
#> 1st Qu.: 63.0 1st Qu.: 4.00 9 : 12 2:120
#> Median :103.0 Median :10.00 20 : 12 3:120
#> Mean :121.8 Mean :10.72 10 : 12 4:118
#> 3rd Qu.:163.8 3rd Qu.:16.00 17 : 12
#> Max. :373.0 Max. :21.00 19 : 12
#> (Other):506
db <- autodb(ChickWeight, name = "ChickWeight")
db
#> database ChickWeight with 2 relations
#> 4 attributes: weight, Time, Chick, Diet
#> relation Chick: Chick, Diet; 50 records
#> key 1: Chick
#> relation Time_Chick: Time, Chick, weight; 578 records
#> key 1: Time, Chick
#> references:
#> Time_Chick.{Chick} -> Chick.{Chick}
graphviz_text <- gv(db)
DiagrammeR::grViz(graphviz_text)
Using the exclude
argument to forbid certain variables from appearing
in keys:
summary(CO2)
#> Plant Type Treatment conc uptake
#> Qn1 : 7 Quebec :42 nonchilled:42 Min. : 95 Min. : 7.70
#> Qn2 : 7 Mississippi:42 chilled :42 1st Qu.: 175 1st Qu.:17.90
#> Qn3 : 7 Median : 350 Median :28.30
#> Qc1 : 7 Mean : 435 Mean :27.21
#> Qc3 : 7 3rd Qu.: 675 3rd Qu.:37.12
#> Qc2 : 7 Max. :1000 Max. :45.50
#> (Other):42
db2_noexclude <- autodb(CO2, name = "CO2")
db2_noexclude
#> database CO2 with 3 relations
#> 5 attributes: Plant, Type, Treatment, conc, uptake
#> relation Plant: Plant, Type, Treatment; 12 records
#> key 1: Plant
#> relation Plant_conc: Plant, conc, Treatment, uptake; 84 records
#> key 1: Plant, conc
#> key 2: Treatment, conc, uptake
#> relation conc_uptake: conc, uptake, Type; 82 records
#> key 1: conc, uptake
#> references:
#> Plant_conc.{Plant} -> Plant.{Plant}
#> Plant_conc.{conc, uptake} -> conc_uptake.{conc, uptake}
graphviz_text2_noexclude <- gv(db2_noexclude)
DiagrammeR::grViz(graphviz_text2_noexclude)
db2 <- autodb(CO2, name = "CO2", exclude = "uptake")
db2
#> database CO2 with 2 relations
#> 5 attributes: Plant, Type, Treatment, conc, uptake
#> relation Plant: Plant, Type, Treatment; 12 records
#> key 1: Plant
#> relation Plant_conc: Plant, conc, uptake; 84 records
#> key 1: Plant, conc
#> references:
#> Plant_conc.{Plant} -> Plant.{Plant}
graphviz_text2 <- gv(db2)
DiagrammeR::grViz(graphviz_text2)
There are also functions for doing each step of the database creation separately, including functional dependency detection and normalisation. See the vignette for more details.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.