This function is designed to read cell-level information
(and then finally analyze, compose and collate_columns)
from many file types such as xls, pdf, doc, etc.
This is a wrapper around functions from multiple packages, so support for a specific file type depends on
the installed packages. To see the list of supported file types and any potentially required packages, just
run read_cells() in the console. This function identifies the file format from the file's content, not just from its
extension. That means if a file is saved as pdf and the extension is then removed (or changed to something else),
read_cells will still detect it as pdf and read its content.
read_cells is supposed to work for any kind of data. However, if it fails at an intermediate stage, it will raise
a warning and return the results up to the last successfully processed stage.
The heuristic algorithms are not well optimized (yet), so they may be slow on large files.
If the target table has numerical values as data and text as their attributes (identifiers of the data elements), the straightforward method is sufficient in the majority of situations. Otherwise, you may need to use other functions.
A Word of Warning:
The functions used inside read_cells are based on heuristic algorithms, so outcomes may be unexpected.
It is recommended to first try read_cells on the target file. If the outcome is as expected, that is fine.
If not, try again with read_cells(file_name, at_level = "compose"). If the output is still not as expected,
other functions need to be used. In that case, start again with
read_cells(file_name, at_level = "make_cells")
and proceed to further functions.
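The escalation order described in the warning can be sketched as follows. This is only an illustration: `try_levels` is a hypothetical name, the sample file is the one used in the package example on this page, and in practice you would stop at the first call whose output looks right rather than run all three.

```r
# Escalation order taken from the text above; `try_levels` is a hypothetical name.
try_levels <- list(
  default    = list(),                          # read_cells(file_name)
  compose    = list(at_level = "compose"),      # read_cells(file_name, at_level = "compose")
  make_cells = list(at_level = "make_cells")    # read_cells(file_name, at_level = "make_cells")
)

# Only run the calls if tidycells is installed.
if (requireNamespace("tidycells", quietly = TRUE)) {
  fold <- system.file("extdata", "messy", package = "tidycells", mustWork = TRUE)
  file_name <- list.files(fold, pattern = "^csv.", full.names = TRUE)

  # Call read_cells once per entry; keep the error object if a call fails.
  results <- lapply(try_levels, function(args) {
    tryCatch(
      do.call(tidycells::read_cells, c(list(file_name), args)),
      error = function(e) e
    )
  })

  # Compare the kind of object returned at each level.
  str(results, max.level = 1)
}
```

Looking at the three results side by side makes it easier to decide at which stage the heuristics stop matching your expectations.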
either a valid file path or a
till which level to process.
Should be one of
(Optional) the file-types to omit. A character vector.
whether to simplify the output. (Default
whether to compose main columns only. (Default
(Optional) override start level. (
It performs the following set of actions if called with default
detect_and_read: Detect file type based on content and attempt to read the same in a format suitable to convert as
make_cells: Convert the file content to
va_classify: Run Value Attribute Classification using
analyze: Analyze the cells using
compose: Compose the cell-analysis to a tidy form using
collate: Finally, collate columns based on content using
Here is the flowchart of the same:
If simplify=TRUE then a different kind of object is returned at different levels (depends on
If at_level="compose" then only the final tibble is returned; otherwise, if the output is not
NULL, an attribute will be present
If simplify=FALSE then it will return a read_cell_part, which you can process manually
and then continue again with read_cells (from_level may be useful at that point).
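A minimal sketch of the manual route, assuming that the intermediate object can be handed back to read_cells as its first argument to resume processing (this is an assumption based on the description above, not something this page shows). `stop_level` is a hypothetical variable name, and the sample file is the one from the package example.

```r
# `stop_level` is a hypothetical name; "make_cells" is one of the stage
# names listed on this page.
stop_level <- "make_cells"

# Only run the calls if tidycells is installed.
if (requireNamespace("tidycells", quietly = TRUE)) {
  fold <- system.file("extdata", "messy", package = "tidycells", mustWork = TRUE)
  fcsv <- list.files(fold, pattern = "^csv.", full.names = TRUE)

  # Stop early and keep the intermediate object instead of a simplified result.
  part <- tidycells::read_cells(fcsv, at_level = stop_level, simplify = FALSE)
  print(class(part))

  # ... inspect or adjust `part` manually here ...

  # Assumption: passing the intermediate object back as the first argument
  # resumes processing from the stage where it stopped.
  final <- tidycells::read_cells(part)
}
```

If resuming does not start from the stage you expect, the from_level argument mentioned above is the documented way to override the start level.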
# see supported files
read_cells()

fold <- system.file("extdata", "messy", package = "tidycells", mustWork = TRUE)
# File extension is intentionally given wrong
# while filename is the actual identifier of the file type
fcsv <- list.files(fold, pattern = "^csv.", full.names = TRUE)

# read the data
read_cells(fcsv)
read_cells(fcsv, simplify = FALSE)