knitr::opts_chunk$set(echo=TRUE, collapse=T, comment='#>') library(RefManageR) bib <- ReadBib('xlsx.bib') BibOptions(check.entries = FALSE, style = "markdown", cite.style = "numeric",bib.style = "numeric")
The package xlsx
makes possible to interact with Excel 2007 files from R r
Cite(bib,'R-base')
. While a power R user usually does not need to use Excel or
even avoids it altogether, there are cases when being able to generate Excel
output or read Excel files into R is a must. One such case is in an office
environment when you need to collaborate with co-workers who use Excel as their
primary tool. Another case is to use Excel for basic reporting of results. For
moderate data sets, the formatting capabilities of Excel can prove useful. A
flexible way to manipulate Excel 2007 xlsx files from R would then be a nice
addition.
The xlsx
package focuses on Excel 2007 because for Excel 97 there are already
several solutions, RODBC
r Cite(bib,'RODBC')
, readxl``r
Cite(bib,'readxl')
, etc. While some of these packages work with Excel 2007
files too, the contribution of the xlsx
package is different.
The xlsx Excel 2007 file format is essentially a zipped set of xml
files. It is possible to interact directly with these files from R
as shown by the package RExcelXML``r Cite(bib,'RExcelXML')
. All the
functionality of the xlsx
package can be replicated with
RExcelXML
package or by extending it. Working directly with
the zipped xml files, and using the xml schema to extract the useful
information into R gives you ultimate control.
The approach taken in the xlsx
package is to use a proven,
existing API between Java and Excel 2007 and use the rJava
r Cite(bib,'rJava')
package to link Java and R. The advantage of this
approach is that the code on the R side is very compact, easy to
maintain, and easy to extend even for people with little Java
experience. All the heavy lifting of parsing XML schemas is being
done in Java. We also benefit from a mature software project with
many developers, test suites, and users that report issues on the Java
side. In principle, this should make the maintainance of the
xlsx
package easy. The Java API used by the xlsx
is one
project of the Apache Software Foundation, called Apache POI and can
be found at http://poi.apache.org/.
The Apache POI Project's mission is to create and maintain Java APIs
for manipulating various file formats based upon the Office Open XML
standards (OOXML) and Microsoft's OLE 2 Compound Document format
(OLE2). These include Excel, Word, and PowerPoint documents. While
the focus of the xlsx
package has been only on Excel files, extensions for
Word and PowerPoint documents are available in the
ReporteRs
package.
That said, sometimes using Java from R can be a bit tricky to configure and work
with. While using Apache POI allows us to benefit from a robust Java community,
others might prefer to use a C++ API and Rcpp
to interact with the raw XML in
a way more natural to R. For more reading on this approach, see the
openxlsx
package or officer
(for Word / PowerPoint).
See read.xlsx
for reading the sheet of an xlsx file into a
data.frame. See write.xlsx
writing a data.frame to an xlsx
file.
See Workbook
for creating workbooks. See Worksheet
for
code to manipulate worksheets. See Cell
for manipulating
cells.
See OtherEffects
for various spreadsheet effects, for example,
merge cells, auto size columns, create freeze panels, create split
panels, set print area, set the zoom, etc.
Use PrintSetup
for customizing the settings for printing.
See CellStyle
for how to format a particular cell.
Use Font
to set a font.
By adding a lightweight R layer on top of the Apache project Java interface to Excel 2007 documents, we achieve a multi-platform solution for interacting with Excel 2007 file formats from R.
PrintBibliography(bib, .opts = list(check.entries = FALSE, sorting = "ynt"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.