read_cells: Read Cells from file

Description

This function is designed to read cell-level information (and finally analyze, compose and collate the columns) from many file types, such as xls, pdf and doc. It is a wrapper around functions from multiple packages, so support for a specific file type depends on which packages are installed. To see the list of supported file types and any required packages, run read_cells() with no arguments in the console. The file format is detected from the file content, not just from the extension: if a PDF file has its extension removed (or changed to, say, .xlsx), read_cells will still detect it as a PDF and read its content.
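Content-based detection can be checked directly. The sketch below assumes tidycells is installed with its bundled extdata/messy sample folder (the same one used in the Examples). It copies one sample file to a temporary path with no extension at all and reads both copies; if detection ignores the extension, the two results should have the same number of rows. The name no_ext is purely illustrative.

```r
library(tidycells)

# Sample files shipped with tidycells (same folder as in the Examples)
fold <- system.file("extdata", "messy", package = "tidycells", mustWork = TRUE)
fcsv <- list.files(fold, pattern = "^csv.", full.names = TRUE)[1]

# Copy the file to a temporary path that has no extension
# (tempfile() uses fileext = "" by default)
no_ext <- tempfile(pattern = "no_extension_")
file.copy(fcsv, no_ext)

# Both calls are expected to detect the format from the file content
with_ext <- read_cells(fcsv)
without_ext <- read_cells(no_ext)

# Compare the sizes of the two results
nrow(with_ext) == nrow(without_ext)
```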

Note

A word of warning:

The functions used inside read_cells are based on heuristic algorithms, so the outcome may be unexpected. It is recommended to first try read_cells on the target file. If the outcome is as expected, you are done. If not, try again with read_cells(file_name, at_level = "compose"). If the output is still not as expected, other functions need to be used: start again with read_cells(file_name, at_level = "make_cells") and proceed with the subsequent functions manually.
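The recommended fallback sequence can be sketched as follows, assuming tidycells and its bundled sample files are available. Deciding which stage's output to keep is a judgement call made by inspecting each result; the object names are only illustrative.

```r
library(tidycells)

fold <- system.file("extdata", "messy", package = "tidycells", mustWork = TRUE)
fcsv <- list.files(fold, pattern = "^csv.", full.names = TRUE)[1]

# Step 1: run the full pipeline (default at_level = "collate")
res_collate <- read_cells(fcsv)

# Step 2: if that output is not as expected, stop at the compose stage
res_compose <- read_cells(fcsv, at_level = "compose")

# Step 3: if that is still not right, stop right after the cells are made
# and continue manually with further tidycells functions
res_cells <- read_cells(fcsv, at_level = "make_cells")
```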

Usage

read_cells(
  x,
  at_level = c("collate", "detect_and_read", "make_cells", "va_classify", "analyze",
    "compose"),
  omit = NULL,
  simplify = TRUE,
  compose_main_cols_only = TRUE,
  from_level,
  silent = TRUE,
  ...
)

Arguments

x

Either a valid file path or a read_cell_part (as returned by read_cells with simplify = FALSE).

at_level

The level up to which to process. Should be one of detect_and_read, make_cells, va_classify, analyze, compose, collate. Alternatively, a number giving the position of the level in this sequence (for example, 1 means detect_and_read and 5 means compose).

omit

(Optional) The file types to omit, as a character vector.

simplify

Whether to simplify the output (default TRUE). If set to FALSE, a read_cell_part is returned.

compose_main_cols_only

Whether to compose main columns only (default TRUE).

from_level

(Optional) Override the starting level; read_cells will process the stages after from_level.

silent

If TRUE, no messages are displayed (default TRUE).

...

Further arguments.
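A few of these arguments in use, as a sketch that assumes tidycells and its sample files are installed. Per the at_level description, the numeric form 5 should select the same stage as "compose", so both calls are expected to give equal results; with silent = FALSE, read_cells may print progress messages while it runs.

```r
library(tidycells)

fold <- system.file("extdata", "messy", package = "tidycells", mustWork = TRUE)
fcsv <- list.files(fold, pattern = "^csv.", full.names = TRUE)[1]

# at_level given by name and by position (5 = "compose")
by_name <- read_cells(fcsv, at_level = "compose")
by_number <- read_cells(fcsv, at_level = 5)

# silent = FALSE allows read_cells to display its messages
verbose_run <- read_cells(fcsv, at_level = "make_cells", silent = FALSE)
```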

Details

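With the default at_level ("collate"), it performs the following stages in order: detect_and_read, make_cells, va_classify, analyze, compose and collate.

Flowchart of these stages (image not reproduced in this text version).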
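One way to see the stages listed above is to stop the pipeline after each one in turn and record what kind of object comes back. This is a sketch assuming tidycells and its sample files are installed; the exact classes reported depend on the package version, so none are hard-coded here.

```r
library(tidycells)

fold <- system.file("extdata", "messy", package = "tidycells", mustWork = TRUE)
fcsv <- list.files(fold, pattern = "^csv.", full.names = TRUE)[1]

# Stages in the order read_cells runs them by default
stages <- c("detect_and_read", "make_cells", "va_classify",
            "analyze", "compose", "collate")

# Stop after each stage and keep the first class of the output
stage_class <- vapply(stages, function(stage) {
  out <- read_cells(fcsv, at_level = stage)
  class(out)[1]
}, character(1))

stage_class
```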

Value

If simplify = TRUE, the kind of object returned depends on at_level. If at_level = "compose", only the final tibble is returned; otherwise, if the output is not NULL, it carries an attribute named "read_cells_stage".

If simplify = FALSE, a read_cell_part is returned, which you can process manually and then pass back to read_cells to continue (from_level may be useful in that case).
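The two return modes can be compared side by side. This sketch assumes tidycells and its sample files are installed. It also assumes that passing a read_cell_part back to read_cells resumes from the stage stored in it; if that does not hold in your version, supply from_level explicitly.

```r
library(tidycells)

fold <- system.file("extdata", "messy", package = "tidycells", mustWork = TRUE)
fcsv <- list.files(fold, pattern = "^csv.", full.names = TRUE)[1]

# simplify = TRUE (default): stage output tagged with "read_cells_stage"
cells <- read_cells(fcsv, at_level = "make_cells")
attr(cells, "read_cells_stage")

# simplify = FALSE: a read_cell_part that can be inspected or edited
part <- read_cells(fcsv, at_level = "make_cells", simplify = FALSE)
class(part)

# Hand the part back to read_cells to run the remaining stages
# (assumption: the stored stage is picked up automatically)
final <- read_cells(part)
```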

Examples

library(tidycells)

# see supported files
read_cells()

fold <- system.file("extdata", "messy", package = "tidycells", mustWork = TRUE)
# The file extension is intentionally wrong; the file name prefix
# (here "csv.") identifies the actual file type
fcsv <- list.files(fold, pattern = "^csv.", full.names = TRUE)[1]
# read the data (fully processed, then as a read_cell_part)
read_cells(fcsv)
read_cells(fcsv, simplify = FALSE)

tidycells documentation built on March 26, 2020, 7:35 p.m.