knitr::opts_chunk$set(echo=TRUE, collapse=T, comment='#>')
library(RefManageR)
bib <- ReadBib('xlsx.bib')
BibOptions(check.entries = FALSE, style = "markdown", cite.style = "numeric",bib.style = "numeric")

Introduction

The package xlsx makes possible to interact with Excel 2007 files from R r Cite(bib,'R-base'). While a power R user usually does not need to use Excel or even avoids it altogether, there are cases when being able to generate Excel output or read Excel files into R is a must. One such case is in an office environment when you need to collaborate with co-workers who use Excel as their primary tool. Another case is to use Excel for basic reporting of results. For moderate data sets, the formatting capabilities of Excel can prove useful. A flexible way to manipulate Excel 2007 xlsx files from R would then be a nice addition.

The xlsx package focuses on Excel 2007 because for Excel 97 there are already several solutions, RODBC r Cite(bib,'RODBC'), readxl``r Cite(bib,'readxl'), etc. While some of these packages work with Excel 2007 files too, the contribution of the xlsx package is different.

The xlsx Excel 2007 file format is essentially a zipped set of xml files. It is possible to interact directly with these files from R as shown by the package RExcelXML``r Cite(bib,'RExcelXML'). All the functionality of the xlsx package can be replicated with RExcelXML package or by extending it. Working directly with the zipped xml files, and using the xml schema to extract the useful information into R gives you ultimate control.

The approach taken in the xlsx package is to use a proven, existing API between Java and Excel 2007 and use the rJava r Cite(bib,'rJava') package to link Java and R. The advantage of this approach is that the code on the R side is very compact, easy to maintain, and easy to extend even for people with little Java experience. All the heavy lifting of parsing XML schemas is being done in Java. We also benefit from a mature software project with many developers, test suites, and users that report issues on the Java side. In principle, this should make the maintainance of the xlsx package easy. The Java API used by the xlsx is one project of the Apache Software Foundation, called Apache POI and can be found at https://poi.apache.org/.

The Apache POI Project's mission is to create and maintain Java APIs for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2). These include Excel, Word, and PowerPoint documents. While the focus of the xlsx package has been only on Excel files, extensions for Word and PowerPoint documents are available in the ReporteRs package.

That said, sometimes using Java from R can be a bit tricky to configure and work with. While using Apache POI allows us to benefit from a robust Java community, others might prefer to use a C++ API and Rcpp to interact with the raw XML in a way more natural to R. For more reading on this approach, see the openxlsx package or officer (for Word / PowerPoint).

High level API

See read.xlsx for reading the sheet of an xlsx file into a data.frame. See write.xlsx writing a data.frame to an xlsx file.

Low level API

See Workbook for creating workbooks. See Worksheet for code to manipulate worksheets. See Cell for manipulating cells.

See OtherEffects for various spreadsheet effects, for example, merge cells, auto size columns, create freeze panels, create split panels, set print area, set the zoom, etc.

Use PrintSetup for customizing the settings for printing.

Cell Formatting

See CellStyle for how to format a particular cell.

Use Font to set a font.

Conclusion

By adding a lightweight R layer on top of the Apache project Java interface to Excel 2007 documents, we achieve a multi-platform solution for interacting with Excel 2007 file formats from R.

References

PrintBibliography(bib, .opts = list(check.entries = FALSE, sorting = "ynt"))


colearendt/xlsx documentation built on Feb. 4, 2022, 7:34 p.m.