Description Usage Arguments Details Value
View source: R/split_metadata.R
Split metadata from a worksheet or region of a worksheet (a worksheet view).
1 2 3 4 5 6 | split_metadata(x, include_merged = TRUE, min_jump = 2, min_data_block = 5)
split_metadata_find(x, include_merged = TRUE, min_jump = 2,
min_data_block = 5)
split_metadata_apply(x, n)
|
x |
A worksheet or a worksheet view |
include_merged |
Include merged cells when computing row width. This is generally going to be what you want to do, but if you have a sheet with metadata rows that are entirely merged across, you probably will do better by turning this option off (in that case they'd be counted as a single column wide). |
min_jump |
The minimum number of columns increase that we'll count as moving from the metadata block to the data block. A single column (1) is probably going to be prone to false positives, and in sheets with this pattern the jump is often quite large. |
min_data_block |
The minimum number of rows without decreasing in size before we can conclude that we're in the data block. If we run off the end of the worksheet before reaching this number, we'll conclude no metadata was found. |
n |
A scalar integer representing the number of rows to
consider to be metadata (i.e., equivalent to the return value of
|
There is a common pattern in spreadsheets where data is stored as:
1 2 3 4 5 | mmm
mmm
HHHHHHHH
dddddddd
dddddddd
|
where m
is some metadata (perhaps indicating table name,
creator, dates, etc), H
is the header and d
is the
actual data. This function will split the metadata (m
)
part off, leaving a table that is more suitable for further
processing. In many ways this is like the skip
argument to
readxl::read_excel
and read.csv
, but we will retain
the metadata (somewhere!).
The idea here is that the metadata block starts when we get a
shift in the number of non-blank cells. There needs to be some
heuristics here to help, and things will need to be tuneable: do
merged cells count as non-empty (include_merged
; default is
to include them), how big a jump we look for (min_jump
;
default is 2 columns), how many rows of the same size do we look
for in the data block (min_data_block
; default is 5 rows).
Other things that might be useful, but which aren't supported yet, include looking for different colours and fonts in the metadata and the main block; when we switch from one to the other we're likely to see things like a change here.
For split_metadata
and split_metadata_apply
,
a worksheet view; in this view the data$metadata
element
will be a worksheet view of the metadata. For
split_metadata_find
, a single integer representing the
number of rows of metadata found (with zero indicating no
metadata).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.