Description
hypoparsr takes a different approach to CSV parsing: it creates multiple parsing hypotheses for a given file and ranks them based on data quality features. parse_file() creates and returns the ranked parsing results.
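To make the idea of ranked parsing hypotheses concrete, here is a minimal base-R sketch. It is not how hypoparsr works internally. The sample text, the candidate delimiters, and the single "number of columns" quality feature are assumptions chosen only for illustration.

```r
# Hypothetical sketch (not hypoparsr's implementation): build one parsing
# hypothesis per candidate delimiter, then rank them by a crude quality feature.
text <- "a;b;c\n1;2;3\n4;5;6"
delims <- c(",", ";", "\t")

# Parse the same text once per delimiter; each result is one "hypothesis"
hypotheses <- lapply(delims, function(d) {
  read.table(text = text, sep = d, header = TRUE, stringsAsFactors = FALSE)
})

# Quality feature: number of columns (more columns suggests a better split)
n_cols <- vapply(hypotheses, ncol, integer(1))

# Rank hypotheses from best to worst and keep the top one
ranking <- order(n_cols, decreasing = TRUE)
best <- hypotheses[[ranking[1]]]
delims[ranking[1]]
```

Here only the ";" hypothesis splits the text into three columns, so it ranks first. The other two hypotheses are kept in the list and could still be inspected, which mirrors the idea of returning all hypotheses together with their ranking.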
Usage

parse_file(file, pruning_level = 0.1, quality_weights = c(warnings = -1,
  edits = -1, moves = -1, confidence = 1, total_cells = 1, typed_cells = 1,
  empty_header = -1, empty_cells = -1, non_latin_chars = -1,
  row_col_ratio = 1))
Arguments

file
    Path to a CSV file.
pruning_level
    Numeric value between 0 and 1 that defines the lower threshold for the confidence values of parsing hypotheses. The higher the value, the fewer hypotheses are created, and the more likely it is that the correct hypothesis is omitted.
quality_weights
    A named vector of numeric quality feature weights that influence the hypothesis ranking (see the default in Usage). Positive weights improve the ranking of results with the respective characteristic; negative weights penalize it.
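One way to picture the effect of quality_weights is as a weighted sum of feature values per hypothesis. The sketch below assumes that simple linear scoring; hypoparsr's actual scoring and any feature normalization may differ. It reuses the default weight names from Usage, while the two hypotheses h1 and h2 and their feature values are made up for illustration.

```r
# Hypothetical sketch: score each hypothesis as a weighted sum of its
# quality features. Weight names mirror parse_file()'s default
# quality_weights; the feature values are invented for illustration.
weights <- c(warnings = -1, edits = -1, moves = -1, confidence = 1,
             total_cells = 1, typed_cells = 1, empty_header = -1,
             empty_cells = -1, non_latin_chars = -1, row_col_ratio = 1)

# One row per hypothetical hypothesis, one column per quality feature
features <- rbind(
  h1 = c(warnings = 0, edits = 0, moves = 0, confidence = 0.9,
         total_cells = 1, typed_cells = 1, empty_header = 0,
         empty_cells = 0, non_latin_chars = 0, row_col_ratio = 0.5),
  h2 = c(warnings = 2, edits = 1, moves = 0, confidence = 0.4,
         total_cells = 0.3, typed_cells = 0.1, empty_header = 1,
         empty_cells = 0.2, non_latin_chars = 0, row_col_ratio = 1)
)

# Align feature columns with the weight names, then take weighted sums
scores <- drop(features[, names(weights)] %*% weights)

# Highest score first
sort(scores, decreasing = TRUE)
```

With these illustrative numbers, h1 scores 3.4 and h2 scores -2.4, so h1 ranks first. Penalized characteristics such as warnings pull a score down, and flipping the sign of a weight would reverse that feature's influence on the ranking.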
Value

A hypoparsr_result object, which contains all created hypotheses and their ranking. Call as.data.frame() on this object to retrieve the highest-ranked parsing result, or pass rank (e.g. rank = 2) to retrieve a lower-ranked one.
Examples

# generate a CSV file from the built-in iris data set
csv <- tempfile()
write.csv(iris, csv, row.names = FALSE)

# call hypoparsr
res <- hypoparsr::parse_file(csv)

# get result data frames
best_guess <- as.data.frame(res)                   # highest-ranked hypothesis
second_best_guess <- as.data.frame(res, rank = 2)  # second-ranked hypothesis