
homodatum

Overview

homodatum helps you manage data frames in a more human way. The package attaches metadata to data frames by defining new classes for variables (hdTypes) and for data frames (hdFringe), and gives both more descriptive properties.

Installation

Install the development version of homodatum from GitHub with:

# install.packages("remotes")
remotes::install_github("datasketch/homodatum")

Example

This basic example shows how the package works.

Let's load the homodatum package:

library(homodatum)

New type of values

One of the main features of this package is a set of new variable types that carry more metadata and information than base R types. The variable types that homodatum provides can be listed with available_hdTypes():

Available hdTypes for variables

id     label
___    Null
Uid    Uid
Cat    Categorical
Bin    Binary
Seq    Sequential
Num    Numeric
Pct    Percentage
Dst    Distribution
Dat    Date
Yea    Year
Mon    Month
Day    Day
Wdy    Day of week
Ywe    Week in year
Dtm    Date time
Hms    Time HMS
Min    Minutes
Sec    Seconds
Hie    Hierarchy
Grp    Group
Txt    Text
Mny    Money
Gnm    Geo name
Gcd    Geo code
Glt    Geo latitude
Gln    Geo longitude
Img    Image
Aud    Audio
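As a rough illustration of how these ids are used, the base R sketch below stores a few id/label pairs from the listing above and assigns an id to each column of a small data frame. The names hd_labels and guess_hdtype are hypothetical and not part of homodatum. The rule used here (numeric columns are Num, everything else is Cat) is a deliberately crude assumption and does not claim to reproduce the package's own type inference.

```r
# A few id/label pairs copied from the listing above
hd_labels <- c(Cat = "Categorical", Num = "Numeric", Dat = "Date", Txt = "Text")

# Hypothetical, deliberately crude rule (NOT homodatum's inference):
# numeric columns become "Num", everything else becomes "Cat"
guess_hdtype <- function(x) if (is.numeric(x)) "Num" else "Cat"

df <- data.frame(name = c("Roberta", "Ruby", "Roberta", "Maria"),
                 age  = c(98, 43, 98, 12))

# One id per column, then the matching human-readable label
col_types  <- vapply(df, guess_hdtype, character(1))
col_labels <- hd_labels[col_types]

# Joining the ids with "-" gives a compact description of the table shape
group <- paste(col_types, collapse = "-")
```

For this data the sketch assigns Cat to name and Num to age, and group is "Cat-Num".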

New type of data frame

To give more detailed information about a data frame, homodatum provides the function fringe(). It takes a data frame and converts it into a more informative object with added properties: a dictionary, the type of each variable, a name and description for the data frame, and several summary statistics computed from the variables according to their type.

Creating a fringe object:

# Create a dataframe
df <- data.frame(name = c("Roberta", "Ruby", "Roberta", "Maria"),
                 age  = c(98, 43, 98, 12))

# Create a fringe object
fr <- fringe(df)

This is how it looks with all the properties added:

str(fr)
#> List of 9
#>  $ data       : tibble[,2] (S3: tbl_df/tbl/data.frame/hd_tbl)
#>   ..$ name: Cat [1:4] Roberta, Ruby, Roberta, Maria
#>    .. ..@ categories  : chr [1:3] "Roberta" "Ruby" "Maria"
#>    .. ..@ n_categories: int 3
#>    .. ..@ stats       :List of 4
#>    .. .. ..$ n_unique: int 3
#>    .. .. ..$ n_na    : int 0
#>    .. .. ..$ pct_na  : num 0
#>    .. .. ..$ summary : tibble [4 × 4] (S3: tbl_df/tbl/data.frame)
#>    .. .. .. ..$ category: chr [1:4] "Maria" "Roberta" "Ruby" NA
#>    .. .. .. ..$ n       : int [1:4] 1 2 1 0
#>    .. .. .. ..$ dist    : num [1:4] 0.25 0.5 0.25 0
#>    .. .. .. ..$ names   : logi [1:4] NA NA NA NA
#>   ..$ age : Num [1:4] 98, 43, 98, 12
#>    .. ..@ stats:List of 5
#>    .. .. ..$ n_unique: int 3
#>    .. .. ..$ n_na    : int 0
#>    .. .. ..$ pct_na  : num 0
#>    .. .. ..$ min     : num 12
#>    .. .. ..$ max     : num 98
#>  $ dic        : tibble [2 × 3] (S3: tbl_df/tbl/data.frame)
#>   ..$ id    : chr [1:2] "name" "age"
#>   ..$ label : chr [1:2] "name" "age"
#>   ..$ hdType: hdType [1:2] Cat, Num
#>  $ frtype     : frType [1:1] Cat-Num
#>    ..@ hdTypes: hdType [1:2] Cat, Num
#>    ..@ group  : chr "Cat-Num"
#>  $ group      : chr "Cat-Num"
#>  $ name       : chr "df"
#>  $ description: chr ""
#>  $ slug       : chr "df"
#>  $ meta       : list()
#>  $ stats      :List of 3
#>   ..$ nrow     : int 4
#>   ..$ ncol     : int 2
#>   ..$ col_stats:List of 2
#>   .. ..$ name:List of 4
#>   .. .. ..$ n_unique: int 3
#>   .. .. ..$ n_na    : int 0
#>   .. .. ..$ pct_na  : num 0
#>   .. .. ..$ summary : tibble [4 × 4] (S3: tbl_df/tbl/data.frame)
#>   .. .. .. ..$ category: chr [1:4] "Maria" "Roberta" "Ruby" NA
#>   .. .. .. ..$ n       : int [1:4] 1 2 1 0
#>   .. .. .. ..$ dist    : num [1:4] 0.25 0.5 0.25 0
#>   .. .. .. ..$ names   : logi [1:4] NA NA NA NA
#>   .. ..$ age :List of 5
#>   .. .. ..$ n_unique: int 3
#>   .. .. ..$ n_na    : int 0
#>   .. .. ..$ pct_na  : num 0
#>   .. .. ..$ min     : num 12
#>   .. .. ..$ max     : num 98
#>  - attr(*, "class")= chr "fringe"
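The categories and n_categories attributes shown on the name column can be imitated in base R to see what "a vector with metadata" means in practice. This is only a sketch: new_toy_cat and the class name toy_cat are hypothetical, and the code does not claim to mirror how homodatum builds its Cat class.

```r
# Hypothetical constructor (not part of homodatum): a character vector
# that carries its distinct categories as attributes
new_toy_cat <- function(x) {
  x <- as.character(x)
  cats <- unique(x[!is.na(x)])          # distinct values, in order of appearance
  structure(x,
            categories   = cats,
            n_categories = length(cats),
            class        = c("toy_cat", "character"))
}

v <- new_toy_cat(c("Roberta", "Ruby", "Roberta", "Maria"))

attr(v, "categories")    # the distinct names
attr(v, "n_categories")  # how many there are
```

Keeping the metadata on the vector itself means it travels with the column when the column is placed in a data frame or list.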

You can inspect specific attributes of the fringe object, such as:

fr$data
#> # A tibble: 4 × 2
#>   name      age
#>   <Cat>   <Num>
#> 1 Roberta    98
#> 2 Ruby       43
#> 3 Roberta    98
#> 4 Maria      12
fr$dic
#> # A tibble: 2 × 3
#>   id    label hdType  
#>   <chr> <chr> <hdType>
#> 1 name  name  Cat     
#> 2 age   age   Num
fr$stats
#> $nrow
#> [1] 4
#> 
#> $ncol
#> [1] 2
#> 
#> $col_stats
#> $col_stats$name
#> $col_stats$name$n_unique
#> [1] 3
#> 
#> $col_stats$name$n_na
#> [1] 0
#> 
#> $col_stats$name$pct_na
#> [1] 0
#> 
#> $col_stats$name$summary
#> # A tibble: 4 × 4
#>   category     n  dist names
#>   <chr>    <int> <dbl> <lgl>
#> 1 Maria        1  0.25 NA   
#> 2 Roberta      2  0.5  NA   
#> 3 Ruby         1  0.25 NA   
#> 4 <NA>         0  0    NA   
#> 
#> 
#> $col_stats$age
#> $col_stats$age$n_unique
#> [1] 3
#> 
#> $col_stats$age$n_na
#> [1] 0
#> 
#> $col_stats$age$pct_na
#> [1] 0
#> 
#> $col_stats$age$min
#> [1] 12
#> 
#> $col_stats$age$max
#> [1] 98
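To make the meaning of these statistics concrete, the following base R sketch recomputes them by hand for the same toy data. The helper basic_stats is hypothetical and not part of homodatum, and excluding NA from n_unique is an assumption made here; the example data contains no missing values, so that choice does not affect the numbers.

```r
# Same toy data as in the example above
df <- data.frame(name = c("Roberta", "Ruby", "Roberta", "Maria"),
                 age  = c(98, 43, 98, 12))

# Hypothetical helper (not part of homodatum): stats shared by both columns.
# Assumption: NA is not counted as a unique value.
basic_stats <- function(x) {
  list(n_unique = length(unique(x[!is.na(x)])),
       n_na     = sum(is.na(x)),
       pct_na   = mean(is.na(x)))
}

# Numeric column: the shared stats plus the range
age_stats <- c(basic_stats(df$age),
               list(min = min(df$age, na.rm = TRUE),
                    max = max(df$age, na.rm = TRUE)))

# Categorical column: count and share of each category
counts <- table(df$name)
name_summary <- data.frame(category = names(counts),
                           n        = as.integer(counts),
                           dist     = as.numeric(counts) / sum(counts))
```

For this data, age_stats holds 3 unique values, no missing values, a minimum of 12 and a maximum of 98, and name_summary lists Maria, Roberta and Ruby with shares 0.25, 0.5 and 0.25, in line with the fr$stats output above.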

Learn about the many ways to format date values in vignette("set-name").



jpmarindiaz/homodatum documentation built on May 1, 2023, 7:24 p.m.