read_cells: Read Cells from file
In r-rudra/tidycells: Read Tabular Data from Diverse Sources and Easily Make Them Tidy

read_cells


Read Cells from file

Description

This function is designed to read cell-level information (and then analyze, compose, and collate the columns) from many file types such as xls, pdf, and doc. It is a wrapper around functions from multiple packages, so support for a specific file type depends on which packages are installed. To see the list of supported file types and any packages they require, run read_cells() in the console with no arguments. File formats are detected from the file content, not just the extension: if a PDF file has its extension removed (or changed to, say, .xlsx), read_cells will still detect it as a PDF and read its content.
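The idea of detecting a format from content rather than from the extension can be illustrated with a minimal base-R sketch that inspects a file's leading "magic" bytes. This is only an illustration: `sniff_file_type()` is a hypothetical helper, not a tidycells function, and read_cells does not necessarily detect formats this way. It recognises three well-known signatures: `%PDF` for PDF, the ZIP header used by xlsx/docx, and the OLE2 compound-document header used by legacy xls/doc.

```r
# Hypothetical helper (NOT part of tidycells): guesses a file's format from
# its leading "magic" bytes and ignores the file extension entirely.
sniff_file_type <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  head_bytes <- readBin(con, what = "raw", n = 8L)

  # TRUE if the file starts with the raw signature `sig`
  starts_with <- function(sig) {
    length(head_bytes) >= length(sig) &&
      identical(head_bytes[seq_along(sig)], sig)
  }

  if (starts_with(charToRaw("%PDF"))) return("pdf")
  # ZIP container: used by xlsx, docx and other Office Open XML formats
  if (starts_with(as.raw(c(0x50, 0x4B, 0x03, 0x04)))) {
    return("zip (e.g. xlsx/docx)")
  }
  # OLE2 compound document: used by legacy xls and doc files
  if (starts_with(as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)))) {
    return("ole2 (e.g. xls/doc)")
  }
  "unknown"
}

# A file whose content is PDF but whose name claims to be .xlsx
f <- tempfile(fileext = ".xlsx")
writeBin(charToRaw("%PDF-1.4\n"), f)
sniff_file_type(f)
#> [1] "pdf"
```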

Note:

read_cells is intended to work with any kind of data. However, if it fails at an intermediate stage, it raises a warning and returns the results up to the last successfully processed stage.
The heuristic algorithms are not (yet) well optimized, so they may be slow on large files.
If the target table has numerical values as data and text as their attributes (identifiers of the data elements), the straightforward method is sufficient in most situations. Otherwise, you may need to use other functions.

A Word of Warning:

The functions used inside read_cells are based on heuristic algorithms, so the outcomes may be unexpected. The recommended approach is to first try read_cells on the target file. If the outcome is as expected, you are done. If not, try again with read_cells(file_name, at_level = "compose"). If the output is still not as expected, other functions are required: start again with read_cells(file_name, at_level = "make_cells") and proceed with the further functions.
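A minimal sketch of the escalation described above, assuming tidycells is installed and that CSV is among the supported file types on your system (run read_cells() to check). The temporary CSV and its contents are hypothetical stand-ins for your own target file.

```r
library(tidycells)

# Hypothetical stand-in for your target file: a small CSV in a temp location.
file_name <- tempfile(fileext = ".csv")
write.csv(
  data.frame(name = c("Alice", "Bob", "Carol"),
             math = c(80, 65, 92),
             physics = c(71, 88, 79)),
  file_name, row.names = FALSE
)

# 1. Try the full pipeline first (default at_level = "collate").
out <- read_cells(file_name)

# 2. If that output is not as expected, stop at the compose stage instead.
out_compose <- read_cells(file_name, at_level = "compose")

# 3. If it is still not right, stop once the cells are made and continue
#    with the downstream functions (numeric_values_classifier, analyze_cells,
#    compose_cells, collate_columns) by hand.
cells <- read_cells(file_name, at_level = "make_cells")
```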

Usage

read_cells(
  x,
  at_level = c("collate", "detect_and_read", "make_cells", "va_classify", "analyze",
    "compose"),
  omit = NULL,
  simplify = TRUE,
  compose_main_cols_only = TRUE,
  from_level,
  silent = TRUE,
  ...
)

Arguments

`x`	either a valid file path or a `read_cell_part`
`at_level`	the level up to which to process. Should be one of `detect_and_read`, `make_cells`, `va_classify`, `analyze`, `compose`, `collate`, or simply a number (e.g. 1 means `detect_and_read`, 5 means `compose`).
`omit`	(Optional) the file types to omit, as a character vector.
`simplify`	whether to simplify the output. (Default `TRUE`). If set to `FALSE` a `read_cell_part` will be returned.
`compose_main_cols_only`	whether to compose main columns only. (Default `TRUE`).
`from_level`	(Optional) overrides the start level (`read_cells` will process the stages after `from_level`).
`silent`	if `TRUE`, no messages will be displayed. (Default `TRUE`)
`...`	further arguments
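To make the numeric form of `at_level` concrete, here is a small base-R sketch of the mapping, mirroring the order in which `at_level` lists the stages (so 6 would correspond to `collate`). `read_cells_levels` and `resolve_level()` are hypothetical names for illustration only; they are not exported by tidycells, and read_cells may resolve levels differently internally.

```r
# Hypothetical (not tidycells API): stage names in the documented order,
# so that position 1 is detect_and_read and position 5 is compose.
read_cells_levels <- c("detect_and_read", "make_cells", "va_classify",
                       "analyze", "compose", "collate")

# Resolve an at_level given either as a number or as a stage name.
resolve_level <- function(at_level) {
  if (is.numeric(at_level)) {
    idx <- as.integer(at_level)
    if (length(idx) != 1L || is.na(idx) ||
        idx < 1L || idx > length(read_cells_levels)) {
      stop("numeric at_level must be between 1 and ", length(read_cells_levels))
    }
    return(read_cells_levels[idx])
  }
  # Validates the name against the known stages (partial matching allowed)
  match.arg(at_level, read_cells_levels)
}

resolve_level(1)          # "detect_and_read"
resolve_level(5)          # "compose"
resolve_level("analyze")  # "analyze"
```

Under this reading, read_cells(file_name, at_level = 2) would stop at the same stage as read_cells(file_name, at_level = "make_cells").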

Details

When called with the default at_level, it performs the following set of actions.

detect_and_read: Detect the file type based on content and attempt to read the file into a format suitable for conversion to a cell_df.
make_cells: Convert the file content to cell_df using as_cell_df.
va_classify: Run Value Attribute Classification using numeric_values_classifier.
analyze: Analyze the cells using analyze_cells.
compose: Compose the cell-analysis to a tidy form using compose_cells.
collate: Finally, collate columns based on content using collate_columns.
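The stages listed above can also be run one by one with the functions they name. A minimal sketch, assuming tidycells is installed: it starts from a small, hypothetical in-memory data.frame, so the detect_and_read step is skipped and as_cell_df plays the role of make_cells. Passing `take_col_names = TRUE` (to keep the column headers as cells) is an assumption about as_cell_df's interface; check ?as_cell_df in your installed version.

```r
library(tidycells)

# Hypothetical input: numeric values (data) identified by text (attributes).
marks <- data.frame(
  name = c("Alice", "Bob", "Carol"),
  math = c(80, 65, 92),
  physics = c(71, 88, 79),
  stringsAsFactors = FALSE
)

cd  <- as_cell_df(marks, take_col_names = TRUE)  # make_cells
cd  <- numeric_values_classifier(cd)             # va_classify
ca  <- analyze_cells(cd)                          # analyze
dc  <- compose_cells(ca)                          # compose
out <- collate_columns(dc)                        # collate
out
```

Running the stages separately lets you inspect or adjust any intermediate object (for example, the classified cell_df) before passing it to the next function.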

[Figure: flowchart of the read_cells processing stages]

Value

If simplify = TRUE, the kind of object returned differs by level (it depends on at_level). If at_level = "compose", only the final tibble is returned; otherwise, if the output is not NULL, it will carry an attribute named "read_cells_stage".

If simplify = FALSE, a read_cell_part is returned, which you can process manually and then pass back to read_cells to continue (from_level may be useful in that case).
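A minimal sketch of the stop-and-resume pattern, assuming tidycells is installed, that CSV is a supported type, and that read_cells resumes from the stage recorded in the read_cell_part when one is passed as `x` (if it does not in your version, supply `from_level` explicitly). The temporary file and its contents are hypothetical.

```r
library(tidycells)

# Hypothetical input file
file_name <- tempfile(fileext = ".csv")
write.csv(data.frame(name = c("Alice", "Bob"), math = c(80, 65)),
          file_name, row.names = FALSE)

# Stop after value/attribute classification and keep the intermediate object.
rcp <- read_cells(file_name, at_level = "va_classify", simplify = FALSE)
class(rcp)  # expected to include "read_cell_part"

# ... inspect or adjust rcp manually here ...

# Hand the part back to read_cells to run the remaining stages.
out <- read_cells(rcp)

# With simplify = TRUE, a non-NULL intermediate output records its stage.
cells <- read_cells(file_name, at_level = "make_cells")
attr(cells, "read_cells_stage")
```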

r-rudra/tidycells documentation built on Feb. 22, 2025, 11:25 a.m.
