Swim_Parse: Formats swimming and diving data read with 'read_results'...

View source: R/Swim_Parse.R

Swim_ParseR Documentation

Formats swimming and diving data read with read_results into a data frame

Description

Takes the output of read_results and cleans it, yielding a data frame of swimming (and diving) results

Usage

Swim_Parse(
  file,
  avoid = NULL,
  typo = typo_default,
  replacement = replacement_default,
  format_results = TRUE,
  splits = FALSE,
  split_length = 50,
  relay_swimmers = FALSE
)

swim_parse(
  file,
  avoid = NULL,
  typo = typo_default,
  replacement = replacement_default,
  format_results = TRUE,
  splits = FALSE,
  split_length = 50,
  relay_swimmers = FALSE
)

Arguments

file

output from read_results

avoid

a list of strings. Rows in file containing these strings will not be included. For example "Pool:", often used to label pool records, could be passed to avoid. The default is avoid_default, which contains many strings similar to "Pool:", such as "STATE:" and "Qual:". Users can supply their own lists to avoid. avoid is handled before typo and replacement.

typo

a list of strings that are typos in the original results. swim_parse is particularly sensitive to accidental double spaces, so "Central High School", with two spaces between "Central" and "High" is a problem, which can be fixed. Pass "Central High School" to typo. Unexpected commas as also an issue, for example "Texas, University of" should be fixed using typo and replacement

replacement

a list of fixes for the strings in typo. Here one could pass "Central High School" (one space between "Central" and "High") and "Texas" to replacement fix the issues described in typo

format_results

should the results be formatted for analysis (special strings like "DQ" replaced with NA, Finals as definitive column)? Default is TRUE

splits

either TRUE or the default, FALSE - should swim_parse attempt to include splits.

split_length

either 25 or the default, 50, the length of pool at which splits are recorded. Not all results are internally consistent on this issue - some have races with splits by 50 and other races with splits by 25.

relay_swimmers

either TRUE or the default, FALSE - should relay swimmers be reported. Relay swimmers are reported in separate columns named Relay_Swimmer_1 etc.

Value

returns a data frame with columns Name, Place, Age, Team, Prelims, Finals, Points, Event & DQ. Note all swims will have a Finals, even if that time was actually swam in the prelims (i.e. a swimmer did not qualify for finals). This is so that final results for an event can be generated from just one column.

See Also

swim_parse must be run on the output of read_results

Examples

## Not run: 
swim_parse(read_results("http://www.nyhsswim.com/Results/Boys/2008/NYS/Single.htm", node = "pre"),
 typo = c("-1NORTH ROCKL"), replacement = c("1-NORTH ROCKL"),
 splits = TRUE,
 relay_swimmers = TRUE)
 
## End(Not run)
## Not run: 
swim_parse(read_results("inst/extdata/Texas-Florida-Indiana.pdf"),
 typo =  c("Indiana  University", ", University of"), replacement = c("Indiana University", ""),
 splits = TRUE,
 relay_swimmers = TRUE)
 
## End(Not run)

gpilgrim2670/SwimmeR documentation built on March 26, 2023, 5:05 p.m.