value_attribute_classify: Value/Attribute Classifier

Description Usage Arguments Details Value See Also Examples

View source: R/value_attribute_classify.R

Description

After as_cell_df (entry point to tidycells) you may need to use this function or individual Value/Attribute Classifier-functions as listed below in "see also" - section.

Here the idea is to classify all cells into either value, attribute, empty which will be used by analyze_cells for further processing.

Usage

1

Arguments

d

a Cell DF

classifier

a classifier

Details

In order to understand the data orientation and detect data-blocks Cell DF requires additional column named type. This type column potentially contains either value, attribute, empty. The value are given corresponding to cells with observations in it. The tag, attribute is for the identifier of these cells. Lastly, empty cells are useless cells or cells with no meaningful information.

For classifier following options are present:

Each of the above are available as individual functions. Those can also be directly applied on a cell-df. However, it is recommended to use value_attribute_classify as it tests for integrity after classification.

Value

a Cell DF with Value/Attribute Classification. The underlying tibble will contain an extra column named type.

See Also

Individual classifier functions:

For interactive Value/Attribute Classification check visual_va_classify

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
iris %>%
  as_cell_df() %>%
  sample_based_classifier(value_sample = "setosa") %>%
  plot()

iris %>%
  as_cell_df() %>%
  sample_based_classifier(value_sample = "setosa") %>%
  numeric_values_classifier() %>%
  plot()

if (rlang::is_installed("tidyxl")) {
  cdn <- system.file("extdata", "RBI_HBS_Table_No_166.xlsx", package = "tidycells") %>%
    tidyxl::xlsx_cells()
  cdn <- cdn %>%
    dplyr::filter(sheet == sheet[1]) %>%
    as_cell_df()

  # all of these are same except value_attribute_classify will perform validate_cells once again
  cd1 <- sample_based_classifier(cdn, value_sample = "APR")
  cd2 <- sample_based_classifier(value_sample = "APR")(cdn)
  cd3 <- value_attribute_classify(cdn,
    classifier = sample_based_classifier(value_sample = "APR")
  )
  # see it
  plot(cd3)
}

tidycells documentation built on March 26, 2020, 7:35 p.m.