Multi-hypothesis parsing of CSV files.
hypoparsr takes a different approach to CSV parsing: it creates multiple parsing hypotheses for a given file and ranks them by data quality features.
parse_file creates and returns the ranked parsing results.
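To illustrate the multi-hypothesis idea, the following base-R sketch builds one hypothesis per candidate delimiter and ranks the hypotheses with a toy quality score. It is not hypoparsr's implementation. The score function, the candidate delimiters, and the sample text are assumptions chosen for illustration, and a real parser may vary more than the delimiter.

```r
# Sketch only: not hypoparsr's algorithm. One hypothesis per delimiter.
raw_text <- "id;name;score\n1;Ann;3.5\n2;Bob;4.0\n3;Cy;2.5"

candidate_delims <- c(",", ";", "\t", "|")

# Hypothetical toy quality score (an assumption): if every line splits into
# the same number of fields, score = that field count; otherwise score = 0.
score_hypothesis <- function(text, delim) {
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
  n_fields <- lengths(strsplit(lines, delim, fixed = TRUE))
  consistent <- length(unique(n_fields)) == 1
  if (consistent) n_fields[1] else 0
}

# Score every hypothesis; vapply names the results after the delimiters.
scores <- vapply(candidate_delims, score_hypothesis, numeric(1), text = raw_text)

# Rank hypotheses from best to worst and parse with the top-ranked delimiter.
ranking <- sort(scores, decreasing = TRUE)
best_delim <- names(ranking)[1]
best_table <- read.table(text = raw_text, sep = best_delim, header = TRUE)

print(ranking)
print(best_table)
```

With this input, only the ";" hypothesis yields three consistent columns, so it ranks first.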
Path to a CSV file.
A numeric value between 0 and 1 that defines the lower threshold for the confidence values of parsing hypotheses. The higher the value, the fewer hypotheses are created, and the correct hypothesis might be omitted.
A named list of numeric weights for the quality features, which influence the hypothesis ranking. Positive weights improve the rank of results that exhibit the respective characteristic; negative weights penalize such results.
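The interaction of a confidence threshold and named feature weights can be sketched as a prune-then-score step. In this minimal base-R sketch, the hypothesis table, the feature names (non_empty, typed, one_column), their values, and the function rank_hypotheses are all hypothetical; hypoparsr's own feature names and scoring may differ.

```r
# Hypothetical hypotheses: a confidence value plus quality features in [0, 1].
hypotheses <- data.frame(
  id         = c("h1", "h2", "h3"),
  confidence = c(0.9, 0.6, 0.2),
  non_empty  = c(0.95, 0.80, 0.99),
  typed      = c(0.70, 0.90, 0.10),
  one_column = c(0, 0, 1)
)

# Hypothetical named weights: positive rewards a feature, negative penalizes it.
weights <- list(non_empty = 1, typed = 2, one_column = -3)
confidence_threshold <- 0.3

rank_hypotheses <- function(h, weights, threshold) {
  # Drop hypotheses whose confidence falls below the threshold.
  kept <- h[h$confidence >= threshold, , drop = FALSE]
  # Score = weighted sum of the features named in `weights`.
  feature_matrix <- as.matrix(kept[names(weights)])
  kept$score <- as.vector(feature_matrix %*% unlist(weights))
  # Highest score first.
  kept[order(kept$score, decreasing = TRUE), , drop = FALSE]
}

ranked <- rank_hypotheses(hypotheses, weights, confidence_threshold)
print(ranked[, c("id", "score")])
```

Here h3 is pruned by the threshold, and h2 (score 2.6) outranks h1 (score 2.35) because the weight of 2 on typed outweighs h2's lower non_empty value.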
A hypoparsr_result object, which contains all created hypotheses and their ranking. Call as.data.frame() on this object to retrieve the highest-ranked parsing result.
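Calling as.data.frame() on a result object works through ordinary S3 dispatch. The sketch below mimics that pattern with a hypothetical class, ranked_parse_result, and a hypothetical constructor, make_result. It does not reproduce the actual structure of hypoparsr_result.

```r
# Hypothetical result object: parsed tables plus one ranking score per table.
make_result <- function(tables, scores) {
  structure(list(tables = tables, scores = scores),
            class = "ranked_parse_result")
}

# S3 method (hypothetical class): as.data.frame() returns the top-ranked table.
as.data.frame.ranked_parse_result <- function(x, row.names = NULL,
                                              optional = FALSE, ...) {
  x$tables[[which.max(x$scores)]]
}

res <- make_result(
  tables = list(data.frame(a = 1:2), data.frame(a = 1:2, b = c("x", "y"))),
  scores = c(0.4, 0.9)
)
best <- as.data.frame(res)
print(best)
```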