tables: Read lists and tables from a Word XML document

lists {RWordXML}    R Documentation

Read lists and tables from a Word XML document

Description

These functions retrieve the content of the tables and lists within a Word document, as well as the XML nodes associated with them. getImages similarly retrieves the images in the document.

Usage

lists(x, nested = FALSE, ...)
tables(x, convert = readTable, ...)
getTableNodes(doc, ...)
getListNodes(doc, ...)
getImages(doc, ...)
readList(node, elFun = xmlValue, nested = FALSE, level = 0, ...)
readTable(node, as.data.frame = TRUE, colClasses = character(), header = FALSE, stringsAsFactors = FALSE, 
          elFun = cellValue, ..., rows = xmlChildren(node)[names(node) == "tr"])

Arguments

x

the WordArchive document, for example as returned by wordDoc

doc

the WordArchive document, for example as returned by wordDoc

node

the XML node to process: the node that starts the list for readList, or the table node for readTable

as.data.frame

a logical value that controls whether the result should be structured as a data frame

colClasses

a character vector giving the classes for each of the columns in the table when it is converted to a data frame

header

a logical value that controls whether the first row of the table is treated as the header, giving the names of the columns.

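stringsAsFactors

a logical value that controls whether character columns are converted to factors when the result is a data frame

elFun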

the function used to process each node: each list item for readList, or each table cell for readTable (where the default is cellValue)

nested

a logical value that controls whether lists within lists are returned with that structure or if the results are returned as a single "flat" list.

level

an integer value used internally for recursive calls.

rows

a list of the XML nodes (the table rows) to process. By default, these are all the tr child nodes of node; supplying a subset allows the caller to omit rows that should not be processed.

convert

a function that is used to process each table node in the document and convert its contents into an R object

...

additional arguments
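The rows and header arguments above can be illustrated with a minimal sketch that uses only the XML package (whose xmlChildren and xmlValue appear in the Usage section). It is not the package's implementation: sketchReadTable is a hypothetical stand-in that copies readTable's documented default for rows, uses xmlValue where readTable uses cellValue, and runs on a small hand-written WordprocessingML table rather than a real document.

```r
library(XML)

# Hypothetical sample data: a Word table (w:tbl) whose first row is a header.
xml <- paste0(
  '<w:tbl xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">',
  '<w:tr><w:tc><w:p><w:r><w:t>name</w:t></w:r></w:p></w:tc>',
  '<w:tc><w:p><w:r><w:t>score</w:t></w:r></w:p></w:tc></w:tr>',
  '<w:tr><w:tc><w:p><w:r><w:t>a</w:t></w:r></w:p></w:tc>',
  '<w:tc><w:p><w:r><w:t>1</w:t></w:r></w:p></w:tc></w:tr>',
  '<w:tr><w:tc><w:p><w:r><w:t>b</w:t></w:r></w:p></w:tc>',
  '<w:tc><w:p><w:r><w:t>2</w:t></w:r></w:p></w:tc></w:tr>',
  '</w:tbl>')
node <- xmlRoot(xmlParse(xml, asText = TRUE))

# sketchReadTable is a hypothetical stand-in for readTable, not its real code.
# It copies readTable's documented default for `rows` and uses xmlValue
# where readTable uses cellValue.
sketchReadTable <- function(node, header = FALSE,
                            rows = xmlChildren(node)[names(node) == "tr"]) {
  # One character vector of cell (w:tc) text per row.
  cells <- unname(lapply(rows, function(r) {
    vapply(xmlChildren(r)[names(r) == "tc"], xmlValue, character(1),
           USE.NAMES = FALSE)
  }))
  # With header = TRUE, the first row supplies the column names.
  body <- if (header) cells[-1] else cells
  df <- as.data.frame(do.call(rbind, body), stringsAsFactors = FALSE)
  if (header) names(df) <- cells[[1]]
  df
}

full <- sketchReadTable(node, header = TRUE)

# Passing a subset of the rows skips the others (here, the last data row).
allRows <- xmlChildren(node)[names(node) == "tr"]
partial <- sketchReadTable(node, header = TRUE, rows = allRows[-3])
```

Keeping rows as an argument with a computed default, as readTable's signature does, lets a caller filter rows without rewriting the cell-extraction logic.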

Value

lists returns either a vector of the list elements or, if nested is TRUE and there are nested lists, a list in which each element is either a single value or a sub-list.

tables returns a list of the tables in the Word document. Each table is converted to an R object by convert, so the type of the elements in the result is controlled by that function. By default (readTable), these are data frames.
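Because the element type depends on convert, a caller can supply a different converter. The sketch below is a hypothetical converter, tableDims, that returns each table's number of rows and columns instead of a data frame. It assumes, based on the description of convert rather than the package source, that convert is called with one table node and possibly further arguments (hence the ...). It is exercised here with the XML package on a hand-written fragment.

```r
library(XML)

# tableDims is a hypothetical converter: it receives one table node (and
# ignores any extra arguments) and returns the table's size.
tableDims <- function(node, ...) {
  rows <- xmlChildren(node)[names(node) == "tr"]
  # Count the w:tc cells in each w:tr row.
  ncell <- vapply(rows, function(r) sum(names(r) == "tc"), integer(1),
                  USE.NAMES = FALSE)
  c(rows = length(rows), cols = max(ncell))
}

# Hypothetical sample data: a 2 x 3 table with empty cells.
emptyCell <- "<w:tc><w:p/></w:tc>"
emptyRow <- paste0("<w:tr>", strrep(emptyCell, 3), "</w:tr>")
xml <- paste0(
  '<w:tbl xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">',
  strrep(emptyRow, 2), '</w:tbl>')
dims <- tableDims(xmlRoot(xmlParse(xml, asText = TRUE)))
```

With a document d from wordDoc, tables(d, convert = tableDims) would then be expected to return a list of such vectors, one per table.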

getTableNodes and getListNodes return a list of XMLInternalElementNode objects.

Author(s)

Duncan Temple Lang duncan@wald.ucdavis.edu

Examples

library(RWordXML)

# Here we get the lists
d = wordDoc(system.file("SampleDocs", "sampleLists.docx", package = "RWordXML"))
lists(d)

# Here we get the tables
d = wordDoc(system.file("SampleDocs", "sampleTables.docx", package = "RWordXML"))
tables(d)

# Here we get the images
d = wordDoc(system.file("SampleDocs", "Images2.docx", package = "RWordXML"))
getImages(d)

duncantl/RWordXML documentation built on Nov. 23, 2023, 4:23 p.m.