wb_to_df: Create a data frame from a Workbook
In openxlsx2: Read, Write and Edit 'xlsx' Files

wb_to_df

R Documentation

Create a data frame from a Workbook

Description

Simple function to create a data.frame from a sheet in workbook. Simple as in it was simply written down. read_xlsx() and wb_read() are just internal wrappers of wb_to_df() intended for people coming from other packages.

Usage

wb_to_df(
  file,
  sheet,
  start_row = 1,
  start_col = NULL,
  row_names = FALSE,
  col_names = TRUE,
  skip_empty_rows = FALSE,
  skip_empty_cols = FALSE,
  skip_hidden_rows = FALSE,
  skip_hidden_cols = FALSE,
  rows = NULL,
  cols = NULL,
  detect_dates = TRUE,
  na.strings = "#N/A",
  na.numbers = NA,
  fill_merged_cells = FALSE,
  dims,
  show_formula = FALSE,
  convert = TRUE,
  types,
  named_region,
  keep_attributes = FALSE,
  check_names = FALSE,
  show_hyperlinks = FALSE,
  ...
)

read_xlsx(
  file,
  sheet,
  start_row = 1,
  start_col = NULL,
  row_names = FALSE,
  col_names = TRUE,
  skip_empty_rows = FALSE,
  skip_empty_cols = FALSE,
  rows = NULL,
  cols = NULL,
  detect_dates = TRUE,
  named_region,
  na.strings = "#N/A",
  na.numbers = NA,
  fill_merged_cells = FALSE,
  check_names = FALSE,
  show_hyperlinks = FALSE,
  ...
)

wb_read(
  file,
  sheet = 1,
  start_row = 1,
  start_col = NULL,
  row_names = FALSE,
  col_names = TRUE,
  skip_empty_rows = FALSE,
  skip_empty_cols = FALSE,
  rows = NULL,
  cols = NULL,
  detect_dates = TRUE,
  named_region,
  na.strings = "NA",
  na.numbers = NA,
  check_names = FALSE,
  show_hyperlinks = FALSE,
  ...
)

Arguments

`file`	An xlsx file, wbWorkbook object or URL to xlsx file.
`sheet`	Either sheet name or index. When missing the first sheet in the workbook is selected.
`start_row`	first row to begin looking for data.
`start_col`	first column to begin looking for data.
`row_names`	If `TRUE`, the first col of data will be used as row names.
`col_names`	If `TRUE`, the first row of data will be used as column names.
`skip_empty_rows`	If `TRUE`, empty rows are skipped.
`skip_empty_cols`	If `TRUE`, empty columns are skipped.
`skip_hidden_rows`	If `TRUE`, hidden rows are skipped.
`skip_hidden_cols`	If `TRUE`, hidden columns are skipped.
`rows`	A numeric vector specifying which rows in the xlsx file to read. If `NULL`, all rows are read.
`cols`	A numeric vector specifying which columns in the xlsx file to read. If `NULL`, all columns are read.
`detect_dates`	If `TRUE`, attempt to recognize dates and perform conversion.
`na.strings`	A character vector of strings which are to be interpreted as `NA`. Blank cells will be returned as `NA`.
`na.numbers`	A numeric vector of digits which are to be interpreted as `NA`. Blank cells will be returned as `NA`.
`fill_merged_cells`	If `TRUE`, the value in a merged cell is given to all cells within the merge.
`dims`	Character string of type "A1:B2" as optional dimensions to be imported.
`show_formula`	If `TRUE`, the underlying Excel formulas are shown.
`convert`	If `TRUE`, a conversion to dates and numerics is attempted.
`types`	A named numeric indicating, the type of the data. Names must match the returned data. See Details for more.
`named_region`	Character string with a `named_region` (defined name or table). If no sheet is selected, the first appearance will be selected. See `wb_get_named_regions()`
`keep_attributes`	If `TRUE` additional attributes are returned. (These are used internally to define a cell type.)
`check_names`	If `TRUE` then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names.
`show_hyperlinks`	If `TRUE` instead of the displayed text, hyperlink targets are shown.
`...`	additional arguments

Details

The returned data frame will have named rows matching the rows of the worksheet. With col_names = FALSE the returned data frame will have column names matching the columns of the worksheet. Otherwise the first row is selected as column name.

Depending if the R package hms is loaded, wb_to_df() returns hms variables or string variables in the hh:mm:ss format.

The types argument can be a named numeric or a character string of the matching R variable type. Either c(foo = 1) or c(foo = "numeric").

0: character
1: numeric
2: Date
3: POSIXct (datetime)
4: logical

If no type is specified, the column types are derived based on all cells in a column within the selected data range, excluding potential column names. If keep_attr is TRUE, the derived column types can be inspected as an attribute of the data frame.

wb_to_df() will not pick up formulas added to a workbook object via wb_add_formula(). This is because only the formula is written and left to be evaluated when the file is opened in a spreadsheet software. Opening, saving and closing the file in a spreadsheet software will resolve this.

Before release 1.15, datetime variables (in 'yyyy-mm-dd hh:mm:ss' format) were imported using the user's local system timezone (Sys.timezone()). This behavior has been updated. Now, all datetime variables are imported with the timezone set to "UTC". If automatic date detection and conversion are enabled but the conversion is unsuccessful (for instance, in a column containing a mix of data types like strings, numbers, and dates) dates might be displayed as a Unix timestamp. Usually they are converted to character for character columns. If date detection is disabled, dates will show up as a spreadsheet date format. To convert these, you can use the functions convert_date(), convert_datetime(), or convert_hms(). If types are specified, date detection is disabled.

Examples

###########################################################################
# numerics, dates, missings, bool and string
example_file <- system.file("extdata", "openxlsx2_example.xlsx", package = "openxlsx2")
wb1 <- wb_load(example_file)

# import workbook
wb_to_df(wb1)

# do not convert first row to column names
wb_to_df(wb1, col_names = FALSE)

# do not try to identify dates in the data
wb_to_df(wb1, detect_dates = FALSE)

# return the underlying Excel formula instead of their values
wb_to_df(wb1, show_formula = TRUE)

# read dimension without colNames
wb_to_df(wb1, dims = "A2:C5", col_names = FALSE)

# read selected cols
wb_to_df(wb1, cols = c("A:B", "G"))

# read selected rows
wb_to_df(wb1, rows = c(2, 4, 6))

# convert characters to numerics and date (logical too?)
wb_to_df(wb1, convert = FALSE)

# erase empty rows from dataset
wb_to_df(wb1, skip_empty_rows = TRUE)

# erase empty columns from dataset
wb_to_df(wb1, skip_empty_cols = TRUE)

# convert first row to rownames
wb_to_df(wb1, sheet = 2, dims = "C6:G9", row_names = TRUE)

# define type of the data.frame
wb_to_df(wb1, cols = c(2, 5), types = c("Var1" = 0, "Var3" = 1))

# start in row 5
wb_to_df(wb1, start_row = 5, col_names = FALSE)

# na string
wb_to_df(wb1, na.strings = "a")

###########################################################################
# Named regions
file_named_region <- system.file("extdata", "namedRegions3.xlsx", package = "openxlsx2")
wb2 <- wb_load(file_named_region)

# read dataset with named_region (returns global first)
wb_to_df(wb2, named_region = "MyRange", col_names = FALSE)

# read named_region from sheet
wb_to_df(wb2, named_region = "MyRange", sheet = 4, col_names = FALSE)

# read_xlsx() and wb_read()
example_file <- system.file("extdata", "openxlsx2_example.xlsx", package = "openxlsx2")
read_xlsx(file = example_file)
df1 <- wb_read(file = example_file, sheet = 1)
df2 <- wb_read(file = example_file, sheet = 1, rows = c(1, 3, 5), cols = 1:3)

openxlsx2 documentation built on June 8, 2025, 11:24 a.m.