docx_summary (R Documentation)
Read the content of a Word document and return a data.frame representing the document.
docx_summary(x, preserve = FALSE, remove_fields = FALSE, detailed = FALSE)
x
an rdocx object.
preserve
If
remove_fields
If TRUE, prevent field codes from appearing in the returned data.frame.
detailed
Should run-level information be included in the data.frame? Defaults to FALSE.
A data.frame with the following columns depending on the value of detailed:
When detailed = FALSE (default), the data.frame contains:
doc_index: Document element index (integer).
content_type: Type of content: "paragraph" or "table cell" (character).
style_name: Name of the paragraph style (character).
text: Collapsed text content of the paragraph or cell (character).
table_index: Index of the table (integer). NA for non-table content.
row_id: Row position in table (integer). NA for non-table content.
cell_id: Cell position in table row (integer). NA for non-table content.
is_header: Whether the row is a table header (logical). NA for non-table content.
row_span: Number of rows spanned by the cell (integer). 0 for merged cells. NA for non-table content.
col_span: Number of columns spanned by the cell (character). NA for non-table content.
table_stylename: Name of the table style (character). NA for non-table content.
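The table columns above can be used to rebuild a table as a matrix. The following is a minimal sketch, not part of officer: `table_to_matrix()` is a hypothetical helper, `mock` is a hand-built stand-in for the output of docx_summary(), and the code assumes that row_id and cell_id are 1-based positions.

```r
# Hypothetical helper (not an officer function): reshape the "table cell"
# rows of one table into a character matrix indexed by row_id and cell_id.
table_to_matrix <- function(summary_df, index) {
  is_cell <- summary_df$content_type %in% "table cell" &
    summary_df$table_index %in% index
  cells <- summary_df[is_cell, , drop = FALSE]
  if (nrow(cells) == 0L) {
    return(matrix(NA_character_, nrow = 0L, ncol = 0L))
  }
  out <- matrix(
    NA_character_,
    nrow = max(cells$row_id),
    ncol = max(cells$cell_id)
  )
  # Assumption: row_id and cell_id are 1-based positions
  out[cbind(cells$row_id, cells$cell_id)] <- cells$text
  out
}

# Hand-built stand-in for docx_summary() output: one paragraph, one 2 x 2 table
mock <- data.frame(
  doc_index = c(1L, 2L, 2L, 2L, 2L),
  content_type = c("paragraph", rep("table cell", 4)),
  text = c("Intro", "a", "b", "c", "d"),
  table_index = c(NA, 1L, 1L, 1L, 1L),
  row_id = c(NA, 1L, 1L, 2L, 2L),
  cell_id = c(NA, 1L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)

m <- table_to_matrix(mock, 1L)
```

On a real document, the same helper could be called as `table_to_matrix(docx_summary(doc), index)`. Merged cells may leave NA entries in the result.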
When detailed = TRUE, the data.frame contains additional run-level information:
run_index: Index of the run within the paragraph (integer).
run_content_index: Index of content element within the run (integer).
run_content_text: Text content of the run element (character).
image_path: Path to embedded image stored in the temporary directory
associated with the rdocx object (character).
Images should be copied to a permanent location before closing the R
session if needed.
field_code: Field code content (character).
footnote_text: Footnote text content (character).
link: Hyperlink URL (character).
link_to_bookmark: Internal bookmark anchor name for hyperlinks (character).
bookmark_start: Names of the bookmarks starting on this paragraph
(values are concatenated with '|').
character_stylename: Name of the character/run style (character).
sz: Font size in half-points (integer).
sz_cs: Complex script font size in half-points (integer).
font_family_ascii: Font family for ASCII characters (character).
font_family_eastasia: Font family for East Asian characters (character).
font_family_hansi: Font family for high ANSI characters (character).
font_family_cs: Font family for complex script characters (character).
bold: Whether the run is bold (logical).
italic: Whether the run is italic (logical).
underline: Whether the run is underlined (logical).
color: Text color in hexadecimal format (character).
shading: Shading pattern (character).
shading_color: Shading foreground color (character).
shading_fill: Shading background fill color (character).
keep_with_next: Whether paragraph should stay with next (logical).
align: Paragraph alignment (character).
level: Numbering level (integer). NA if not a numbered list.
num_id: Numbering definition ID (integer). NA if not a numbered list.
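A few of the run-level columns need a small transformation before use. The following sketch works on `runs`, a hand-built stand-in for a few rows of the detailed output (the values are invented for illustration). Treating NA in bold as "not bold" is an assumption.

```r
# Hand-built stand-in for a few rows of docx_summary(doc, detailed = TRUE)
runs <- data.frame(
  doc_index = c(1L, 1L, 2L),
  run_index = c(1L, 2L, 1L),
  run_content_text = c("Hello ", "world", "Bye"),
  sz = c(24L, 24L, 22L),
  bold = c(FALSE, TRUE, NA),
  bookmark_start = c("intro|summary", "intro|summary", NA),
  stringsAsFactors = FALSE
)

# sz is expressed in half-points, so halve it to get the size in points
runs$size_pt <- runs$sz / 2

# %in% treats NA as "not bold" (assumption: NA means the property is unset)
bold_text <- runs$run_content_text[runs$bold %in% TRUE]

# Paste the runs of each document element back together
para_text <- tapply(
  runs$run_content_text,
  runs$doc_index,
  paste,
  collapse = ""
)

# Bookmark names are concatenated with '|'; fixed = TRUE is needed
# because "|" is a regular-expression metacharacter
bookmarks <- strsplit(runs$bookmark_start[1], "|", fixed = TRUE)[[1]]
```

In real output, run_content_text may be NA for runs that carry no text (for example images), so filtering with `!is.na()` before pasting is likely to be necessary.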
Documents included with body_add_docx() will
not be accessible in the results.
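library(officer)

# Path to the example document shipped with officer
example_docx <- system.file(
  package = "officer",
  "doc_examples/example.docx"
)
doc <- read_docx(example_docx)

# Paragraph- and cell-level summary
docx_summary(doc)

# Run-level summary
docx_summary(doc, detailed = TRUE)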
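Because image_path points into a temporary directory, images that must outlive the session have to be copied elsewhere. The following is a minimal sketch using base R only: `save_images()` is a hypothetical helper, not an officer function, and the demo uses a placeholder file rather than a real image.

```r
# Hypothetical helper (not an officer function): copy image files to a
# directory that outlives the R session and return the new paths.
save_images <- function(paths, dest_dir) {
  paths <- unique(paths[!is.na(paths)])  # runs without an image have NA
  paths <- paths[file.exists(paths)]     # skip files that are already gone
  dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
  ok <- file.copy(paths, dest_dir, overwrite = TRUE)
  file.path(dest_dir, basename(paths))[ok]
}

# Demo with a placeholder file standing in for an extracted image
src <- tempfile(fileext = ".png")
writeLines("placeholder", src)
dest <- file.path(tempdir(), "saved_images")  # use a permanent directory in practice
saved <- save_images(c(src, NA, src), dest)
```

With a real document, the helper could be called as `save_images(docx_summary(doc, detailed = TRUE)$image_path, "images")`, where `"images"` is any directory of your choosing.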