README.md

dataset

The dataset package is a data package created for training courses at the LG Leadership Academy (LG 인화원).

Installation

You can install the dataset package with the code below. The package is hosted on GitHub, so the remotes package is required; install it first if it is not already installed.

install.packages("remotes")
remotes::install_github("lgleadershipacademy/dataset")
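
If you would rather not reinstall remotes every time, the check can be made explicit. The following is a minimal sketch, not part of the dataset package: `is_installed()` and `install_dataset()` are hypothetical helper names introduced here for illustration, built only on base R's `requireNamespace()` and the same `remotes::install_github()` call shown above.

```r
# Hypothetical helpers (not provided by the dataset package).

# TRUE when the package can be loaded, FALSE otherwise.
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# Install remotes only when it is missing, then install dataset from GitHub.
install_dataset <- function() {
  if (!is_installed("remotes")) {
    install.packages("remotes")
  }
  remotes::install_github("lgleadershipacademy/dataset")
}
```

Calling `install_dataset()` once should leave both packages available; nothing is downloaded until the function is called.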

Available datasets

library(dataset)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
list_of_dataset()
#>  [1] "avocado"        "churn"          "emp_tmnt"       "emp_attr"       "mall"          
#>  [6] "credit_train"   "credit_test"    "games"          "retail"         "mining_samples"

Using the data

Every dataset in the list can be used directly by its name, except those whose name ends in _sample, such as mining_sample.

avocado
#> # A tibble: 18,249 x 13
#>    Date                AveragePrice `Total Volume` `4046` `4225` `4770` `Total Bags` `Small Bags`
#>    <dttm>                     <dbl>          <dbl>  <dbl>  <dbl>  <dbl>        <dbl>        <dbl>
#>  1 2015-12-27 00:00:00         1.33         64237.  1037. 5.45e4   48.2        8697.        8604.
#>  2 2015-12-20 00:00:00         1.35         54877.   674. 4.46e4   58.3        9506.        9408.
#>  3 2015-12-13 00:00:00         0.93        118220.   795. 1.09e5  130.         8145.        8042.
#>  4 2015-12-06 00:00:00         1.08         78992.  1132  7.20e4   72.6        5811.        5677.
#>  5 2015-11-29 00:00:00         1.28         51040.   941. 4.38e4   75.8        6184.        5986.
#>  6 2015-11-22 00:00:00         1.26         55980.  1184. 4.81e4   43.6        6684.        6556.
#>  7 2015-11-15 00:00:00         0.99         83454.  1369. 7.37e4   93.3        8319.        8197.
#>  8 2015-11-08 00:00:00         0.98        109428.   704. 1.02e5   80          6829.        6267.
#>  9 2015-11-01 00:00:00         1.02         99811.  1022. 8.73e4   85.3       11388.       11105.
#> 10 2015-10-25 00:00:00         1.07         74339.   842. 6.48e4  113          8626.        8061.
#> # ... with 18,239 more rows, and 5 more variables: `Large Bags` <dbl>, `XLarge Bags` <dbl>,
#> #   type <chr>, year <dbl>, region <chr>
glimpse(avocado)
#> Observations: 18,249
#> Variables: 13
#> $ Date           <dttm> 2015-12-27, 2015-12-20, 2015-12-13, 2015-12-06, 2015-11-29, 2015-11-22,...
#> $ AveragePrice   <dbl> 1.33, 1.35, 0.93, 1.08, 1.28, 1.26, 0.99, 0.98, 1.02, 1.07, 1.12, 1.28, ...
#> $ `Total Volume` <dbl> 64236.62, 54876.98, 118220.22, 78992.15, 51039.60, 55979.78, 83453.76, 1...
#> $ `4046`         <dbl> 1036.74, 674.28, 794.70, 1132.00, 941.48, 1184.27, 1368.92, 703.75, 1022...
#> $ `4225`         <dbl> 54454.85, 44638.81, 109149.67, 71976.41, 43838.39, 48067.99, 73672.72, 1...
#> $ `4770`         <dbl> 48.16, 58.33, 130.50, 72.58, 75.78, 43.61, 93.26, 80.00, 85.34, 113.00, ...
#> $ `Total Bags`   <dbl> 8696.87, 9505.56, 8145.35, 5811.16, 6183.95, 6683.91, 8318.86, 6829.22, ...
#> $ `Small Bags`   <dbl> 8603.62, 9408.07, 8042.21, 5677.40, 5986.26, 6556.47, 8196.81, 6266.85, ...
#> $ `Large Bags`   <dbl> 93.25, 97.49, 103.14, 133.76, 197.69, 127.44, 122.05, 562.37, 283.83, 56...
#> $ `XLarge Bags`  <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, ...
#> $ type           <chr> "conventional", "conventional", "conventional", "conventional", "convent...
#> $ year           <dbl> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, ...
#> $ region         <chr> "Albany", "Albany", "Albany", "Albany", "Albany", "Albany", "Albany", "A...
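
Several avocado columns, such as `Total Volume` and `4046`, have names that are not valid R identifiers, so they must be wrapped in backticks or referenced as strings. The sketch below illustrates this on a small stand-in data frame; `avocado_demo` is a hypothetical object built from the first three rows printed above (with `Date` simplified to a plain date), so it runs without the package installed. The same column syntax should apply to the real `avocado` tibble.

```r
# Stand-in for the first three rows of avocado (values copied from the output above).
# check.names = FALSE keeps the non-syntactic column names unchanged.
avocado_demo <- data.frame(
  Date = as.Date(c("2015-12-27", "2015-12-20", "2015-12-13")),
  AveragePrice = c(1.33, 1.35, 0.93),
  `Total Volume` = c(64236.62, 54876.98, 118220.22),
  `4046` = c(1036.74, 674.28, 794.70),
  check.names = FALSE
)

# Backticks with $ for names containing spaces or starting with a digit.
total_volume <- sum(avocado_demo$`Total Volume`)

# [[ ]] with a string is an equivalent way to reach the same column.
share_4046 <- avocado_demo$`4046` / avocado_demo[["Total Volume"]]

# Ordinary names such as AveragePrice need no quoting.
cheapest_week <- avocado_demo$Date[which.min(avocado_demo$AveragePrice)]
```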

mining_sample

The mining dataset is too large to bundle, so it is provided as a download; the package itself includes only the first 100 rows, as mining_sample. To load the full dataset, use the get_mining_data() function.

mining <- get_mining_data()
glimpse(mining)
#> Observations: 737,453
#> Variables: 24
#> $ date                           <dttm> 2017-03-10 01:00:00, 2017-03-10 01:00:00, 2017-03-10 01...
#> $ `% Iron Feed`                  <dbl> 55.2, 55.2, 55.2, 55.2, 55.2, 55.2, 55.2, 55.2, 55.2, 55...
#> $ `% Silica Feed`                <dbl> 16.98, 16.98, 16.98, 16.98, 16.98, 16.98, 16.98, 16.98, ...
#> $ `Starch Flow`                  <dbl> 3019.53, 3024.41, 3043.46, 3047.36, 3033.69, 3079.10, 31...
#> $ `Amina Flow`                   <dbl> 557.434, 563.965, 568.054, 568.665, 558.167, 564.697, 56...
#> $ `Ore Pulp Flow`                <dbl> 395.713, 397.383, 399.668, 397.939, 400.254, 396.533, 39...
#> $ `Ore Pulp pH`                  <dbl> 10.0664, 10.0672, 10.0680, 10.0689, 10.0697, 10.0705, 10...
#> $ `Ore Pulp Density`             <dbl> 1.74, 1.74, 1.74, 1.74, 1.74, 1.74, 1.74, 1.74, 1.74, 1....
#> $ `Flotation Column 01 Air Flow` <dbl> 249.214, 249.719, 249.741, 249.917, 250.203, 250.730, 25...
#> $ `Flotation Column 02 Air Flow` <dbl> 253.235, 250.532, 247.874, 254.487, 252.136, 248.906, 25...
#> $ `Flotation Column 03 Air Flow` <dbl> 250.576, 250.862, 250.313, 250.049, 249.895, 249.521, 24...
#> $ `Flotation Column 04 Air Flow` <dbl> 295.096, 295.096, 295.096, 295.096, 295.096, 295.096, 29...
#> $ `Flotation Column 05 Air Flow` <dbl> 306.4, 306.4, 306.4, 306.4, 306.4, 306.4, 306.4, 306.4, ...
#> $ `Flotation Column 06 Air Flow` <dbl> 250.225, 250.137, 251.345, 250.422, 249.983, 250.356, 25...
#> $ `Flotation Column 07 Air Flow` <dbl> 250.884, 248.994, 248.071, 251.147, 248.928, 251.873, 25...
#> $ `Flotation Column 01 Level`    <dbl> 457.396, 451.891, 451.240, 452.441, 452.441, 444.384, 44...
#> $ `Flotation Column 02 Level`    <dbl> 432.962, 429.560, 468.927, 458.165, 452.900, 443.269, 44...
#> $ `Flotation Column 03 Level`    <dbl> 424.954, 432.939, 434.610, 442.865, 450.523, 460.449, 45...
#> $ `Flotation Column 04 Level`    <dbl> 443.558, 448.086, 449.688, 446.210, 453.670, 439.920, 43...
#> $ `Flotation Column 05 Level`    <dbl> 502.255, 496.363, 484.411, 471.411, 462.598, 451.588, 44...
#> $ `Flotation Column 06 Level`    <dbl> 446.370, 445.922, 447.826, 437.690, 443.682, 433.539, 44...
#> $ `Flotation Column 07 Level`    <dbl> 523.344, 498.075, 458.567, 427.669, 425.679, 425.458, 43...
#> $ `% Iron Concentrate`           <dbl> 66.91, 66.91, 66.91, 66.91, 66.91, 66.91, 66.91, 66.91, ...
#> $ `% Silica Concentrate`         <dbl> 1.31, 1.31, 1.31, 1.31, 1.31, 1.31, 1.31, 1.31, 1.31, 1....


lgleadershipacademy/dataset documentation built on May 7, 2019, 6:58 p.m.