docx_extract_tbl: Extract a table from a Word document


View source: R/docx-extract-tbl.r

Description

Given a document read with read_docx and the number of a table to extract, extract the contents of that table to a data.frame. Optional arguments indicate whether the table has a header row, whether line breaks within cells should be preserved, and whether leading/trailing whitespace should be trimmed from cells.

Usage

docx_extract_tbl(
  docx,
  tbl_number = 1,
  header = TRUE,
  preserve = FALSE,
  trim = TRUE
)

Arguments

docx

docx object read with read_docx

tbl_number

which table to extract (defaults to 1)

header

treat the first row of the table as a header row? (default: TRUE)

preserve

preserve line breaks within a cell? (default: FALSE) NOTE: when TRUE, this overrides trim.

trim

trim leading/trailing whitespace (if any) in cells? (default: TRUE)
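To make the effect of header and trim concrete, here is a minimal base-R sketch. It is a hypothetical illustration, not docxtractr's implementation: the helper rows_to_df() and the fallback column names V1, V2, ... are assumptions, the cells are assumed to have already been read from the document as plain strings, and preserve is not modeled.

```r
# Hypothetical illustration only: NOT docxtractr's internal code.
# Models how `header` and `trim` shape a table whose cells have already
# been read as character strings, one character vector per table row.
rows_to_df <- function(rows, header = TRUE, trim = TRUE) {
  # Stack the rows into a character matrix (one matrix row per table row).
  m <- do.call(rbind, rows)
  if (trim) {
    # Strip leading/trailing whitespace in every cell, keeping the matrix shape.
    m[] <- trimws(m)
  }
  if (header) {
    # Use the first row as column names and drop it from the body.
    col_names <- m[1, ]
    m <- m[-1, , drop = FALSE]
  } else {
    # Assumed fallback: generic names V1, V2, ...
    col_names <- paste0("V", seq_len(ncol(m)))
  }
  df <- as.data.frame(m, stringsAsFactors = FALSE)
  names(df) <- col_names
  rownames(df) <- NULL
  df
}

raw <- list(c(" Foo", "Bar "), c("Aa ", " Bb"), c("Dd", "Ee"))
rows_to_df(raw)
rows_to_df(raw, header = FALSE, trim = FALSE)
```

With the defaults, the first row supplies the column names (Foo, Bar) and the stray spaces are stripped; with header = FALSE and trim = FALSE, all three rows stay in the body with their whitespace intact.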

Value

data.frame (a tibble, as the example output shows)

See Also

docx_extract_all, docx_extract_tbl, assign_colnames

Examples

# Extract the third table from the bundled example document data3.docx
doc3 <- read_docx(system.file("examples/data3.docx", package="docxtractr"))
docx_extract_tbl(doc3, 3)

# Extract table 2 without and then with preservation of line breaks inside cells
intracell_whitespace <- read_docx(system.file("examples/preserve.docx", package="docxtractr"))
docx_extract_tbl(intracell_whitespace, 2, preserve=FALSE)
docx_extract_tbl(intracell_whitespace, 2, preserve=TRUE)

Example output

# A tibble: 6 x 2
  Foo   Bar  
  <chr> <chr>
1 Aa    Bb   
2 Dd    Ee   
3 Gg    Hh   
4 1     2    
5 Zz    Jj   
6 Tt    ii   
# A tibble: 2 x 4
  X     Kite  Lemur      Madagascar
  <chr> <chr> <chr>      <chr>     
1 Nanny Open  Port       Quarter   
2 Rain  Sand  Television Unicorn   
# A tibble: 2 x 4
  X     Kite  Lemur      Madagascar
  <chr> <chr> <chr>      <chr>     
1 Nanny Open  Port       Quarter   
2 Rain  Sand  Television Unicorn   

docxtractr documentation built on July 8, 2020, 6:23 p.m.