Match isolated quotes across records
Look for unmatched quotes in a character vector. If one is found, look for a matching quote at the start of the next character string in the vector, possibly after a blank line. If one is found, merge the two strings and return the resulting shortened character vector.
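One way to read "unmatched quote" is a string holding an odd number of quote characters. The following is a minimal sketch under that assumption. The helpers countQuotes and hasUnmatchedQuote are hypothetical names introduced for illustration and are not part of the package.

```r
# Hypothetical helpers, not part of the package.
# Count occurrences of the quote character in each string.
countQuotes <- function(x, Quote = '"') {
  # gregexpr() returns -1 when there is no match, so count only positive hits
  vapply(gregexpr(Quote, x, fixed = TRUE),
         function(m) sum(m > 0), integer(1))
}

# Assumption: a string has an unmatched quote when its quote count is odd.
hasUnmatchedQuote <- function(x, Quote = '"') {
  countQuotes(x, Quote) %% 2 == 1
}

chvec <- c('abc', 'de"f', ' ', '",', 'g"h', 'matched"quotes"', '')
which(hasUnmatchedQuote(chvec))
# 2 4 5
```

The string 'matched"quotes"' contains two quote characters, so it is not flagged.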
a character vector to scan for unmatched Quote characters
maximum number of characters in the following string to
concatenate two adjacent strings (possibly separated by
a blank line) with unmatched Quote characters
optional arguments for
This function was written to help parse data from the US Department
of Health and Human Services on
cyber-security breaches affecting 500 or more individuals.
As of 2014-06-03, the csv version of these data included commas
in quotes that are not
sep characters, quotes that are
not matched, and lines with zero characters followed by lines
with 3 characters being a quote and a comma. This function
was written to drop the blank lines and append the quote-comma
line to the preceding line so that it contained matching quotes.
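The clean-up described above can be sketched as follows. This is a simplified illustration, not the package's implementation: the function name mergeQuoteLines, the argument maxChars, and the defaults sep = ' ' and maxChars = 2 are assumptions, and the sketch does not attach the attributes that the real function returns.

```r
# Hypothetical sketch, not the package's implementation.
mergeQuoteLines <- function(x, Quote = '"', sep = ' ', maxChars = 2) {
  # Number of quote characters in each string (gregexpr gives -1 for no match)
  nQuotes <- vapply(gregexpr(Quote, x, fixed = TRUE),
                    function(m) sum(m > 0), integer(1))
  odd <- nQuotes %% 2 == 1          # TRUE where a quote is unmatched
  drop <- rep(FALSE, length(x))     # lines to remove from the result
  i <- 1
  while (i < length(x)) {
    if (odd[i]) {
      j <- i + 1
      # Skip at most one blank line after the unmatched quote
      if (j < length(x) && !nzchar(trimws(x[j]))) j <- j + 1
      # Merge only a short following line that also has an unmatched quote
      if (odd[j] && nchar(x[j]) <= maxChars) {
        x[i] <- paste(x[i], x[j], sep = sep)
        drop[(i + 1):j] <- TRUE
        i <- j
      }
    }
    i <- i + 1
  }
  x[!drop]
}

chvec <- c('abc', 'de"f', ' ', '",', 'g"h', 'matched"quotes"', '')
mergeQuoteLines(chvec)
```

In this input, the blank third element is dropped and the fourth element is appended to the second, while 'g"h' is left alone because the line after it has no unmatched quote.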
The input character vector, possibly shortened, with the following attributes explaining what was found:
unmatchedQuotes: indices of the input x with an unmatched Quote.
blankLinesDropped: indices of the input x that were dropped because they (1) followed an unmatched Quote and (2) contained no non-blank characters.
quoteLinesAppended: indices of the input x that were concatenated with a preceding line because the two lines contained unmatched Quote characters, and concatenating them produced a line with all Quote characters matched.
ncharsAppended: an integer vector of the same length as quoteLinesAppended giving the number of characters in the second line concatenated onto the previous line.
chvec <- c('abc', 'de"f', ' ', '",',
           'g"h', 'matched"quotes"', '')
ch. <- matchQuote(chvec)

# check
chv. <- c('abc', 'de"f ",', 'g"h', 'matched"quotes"', '')
attr(chv., 'unmatchedQuotes') <- c(2, 4, 5)
attr(chv., 'blankLinesDropped') <- 3
attr(chv., 'quoteLinesAppended') <- 4
attr(chv., 'ncharsAppended') <- 2
all.equal(ch., chv.)