In ropensci/stats19: Work with Open Road Traffic Casualty Data from Great Britain

Responses to review 2 of stats19

Thanks for the detailed review. We think all the suggestions make sense, and link to previous discussions about combining the 3 stage (dl, read, format) process into a single function call, which we were thinking of calling get_stats19(). There reason for splitting the process up is to ensure maximum transparency and to give the user control over what the package is doing. However, as long as it is properly documents, we think the benefits of a get_stats19() function will outweigh any possible negatives we can think of so we plan to go ahead and create this function.

We would very much welcome a pull request that address some of the other issues you mention.

Responses to other issues/questions/comments are provided below. We suspect that some of the comments refer to an older version of the package, which is completely understandable (the package has evolved since the initial submission!) and explains why some of the responses are short / questions.

[x]Clearly much effort has gone into this package. I greatly support the sentiment behind making these data available in R having done the same for other data sets, myself. This package should be a great benefit to researchers in this area. I really appreciate the slim dependencies. It will make this package much easier to maintain into the future.

Thanks for the comments, we have indeed tried to keep dependencies to a minimum but consider readr and tibble worthwhile. readxl and curl have been demoted to Suggests, as detailed in another comment.

[x]I found the package to be well documented, the vignette is helpful in illustrating some use cases for the data along with how to access it and the code is clear.

Thanks. If you think of other was we can communicate the value of the data, do let us know (I think the second mapping figure could be improved...).

[ ] Some of the functionality I find to be mildly confusing like downloading and then importing the files into the R session and then formatting. As a user I'd prefer it all in one step, but there are likely use cases I'm not aware of that mean that this is useful so some examples of this would be useful I think.

We have long been planning to add a get_stats19() function as per https://github.com/ITSLeeds/stats19/issues/11 The review comment, combined with further discussion, has triggered us to re-prioritise it. It's been beneficial to polish each of the component functions first, however, and good to document each stage for maximum transparency, however, so we plan to keep the dl, read and format functions exported.

[ ] My general comments on the code follow and sections for the DESCRTIPTION and Vignette as well. I've commented quite a bit on grammar and spelling as I think that the polish of a package is important as it lends to the perception of the quality.

Agreed.

[ ] Per rOpenSci policy, avoid start-up messages, rely on the documentation for citation information: https://ropensci.github.io/dev_guide/building.html#recommended-scaffolding.

The guidance is to 'Only use package startup messages when necessary'. A case can be made that this is necessary. As with osmdata, the package provides access to data that has a license that requires it to be cited. The osmdata load message is as follows:

library(osmdata)

We fully agree with the reasoning behind remove package startup messages however. As a compromise, we've shortened the startup from 4 lines to 2:

# before:
# Data provided under the conditions of the Open Government License.
# If you use data from this package, mention the source
# (UK Department for Transport), cite the package, and link to:
# www.nationalarchives.gov.uk/doc/open-government-licence/version/3/.

# after:
library(stats19)

[ ] Avoid long lines >80 chars

running goodpractice::gp() found the following lines with > 80 lines:

    R/format.R:62:1
    R/format.R:67:1
    R/read.R:141:1
    R/utils.R:167:1

All these have been fixed.

[ ] Inconsistent use of white spaces in code, see find_file_name() if statements for examples.
[ ] The package does not pass R CMD check. curl, readxl and tibble are all listed as an Imports in DESCRIPTION but not imported from. With curl being used in tests, this means it should be in Suggests, I think. The others should be removed.

curl is used in the tests and readxl is used in the examples. These have been demoted to Suggests. tibble has been removed from the DESCRIPTION file.

[ ] I don't think it's good form to have an example that won't work on Windows in the help file for stats19_schema, from data.R - line 17? Most of what I see there would be better served in a data_raw folder showing how the data were created with the documentation actually documenting what the variables are not how they were created, see http://r-pkgs.had.co.nz/data.html and for an example, https://github.com/ropensci/GSODR/tree/master/data-raw.
[ ] I would suggest to use proper formatting in help files, when naming packages, e.g. \pkg{stats19} and when referring to documented functions or data, e.g. \code{\link{stats19_schema}}, or with single quotes around abbreviations, e.g. 'DfT'. @ColinFay has an excellent page that outlines the formatting options and when/how to use them, https://colinfay.me/writing-r-extensions/writing-r-documentation-files.html. This will greatly enhance the users' experience when using the help files by making them more readable.
[ ] I also would suggest making use of @seealso in documentation. For example, the dl_stats19() example works great in the help files, but from there I have the data but it's not in R. Using the @seealso you can let the user know about the read_*() functions.
[ ] I downloaded files using dl_stats19(), selecting "Casualties", and then ran read_collisions() and got Is it possible to be more descriptive and say that I've used the wrong read_*() based on the file/data found and offer to import it?
[ ] Missing "." after "e.g." in dl.R on lines 8 and 9, there may be others that I didn't spy.
[ ] Capitalisation in help files is inconsistent, e.g. lines 123-125 of read.R, parameter descriptions are mixed upper and lower case for first word after parameter itself. This applies to other functions where the descriptions are given in all lower case for other functions or upper case.
[ ] Testing the functionality, I get this, when I expect it to tell me that deaths is not a valid input. But then when I hit escape, I expect it simply exit, not provide a warning message on the way out as well.

dl_stats19(year = 1979, type = "deaths")
No files of that type found for that year.
This will download 240 MB+ (1.8 GB unzipped).
Files identified: Stats19-Data1979-2004.zip

Download now (y = enter, n = esc)? 

Warning message:
In find_file_name(years = year, type = type) :
  Coordinates unreliable in this data.

[ ] I got caught out when using the interactive features. I read "y = enter" but hit "y" thinking that would work as well as hitting "enter", but R cancelled the operation anyway just as if I'd hit "esc"
[ ] Per a recent conversation with CRAN, you should use donttest() rather than dontrun() for examples you don't want to be run on CRAN. Then set .travis.yml to run them by using r_check_args: --as-cran --run-donttest. This may not be appropriate in all cases, e.g. interactive functions.
[ ] When validating user inputs and using stop() it's nice to use call. = FALSE to simplify the error message that the user receives.
[ ] Consider using hoardr for managing user-saved files on disk that aren't in tempdir()?
[ ] When using utils::download.file(), you should use mode = "wb" or Windows users may end up with corrupted downloads in my experience. curl::curl_download() does the same thing but uses more updated ways of doing it and defaults to using a binary mode (wb).
[ ] I don't think that there is much need for the Attempting download from or Reading in: message. If it takes that long, I would suggest to use a progress bar to show progress. But this is just a personal observation.
[ ] Consider setting up a pkgdown site? It's easy to do and you can automate deployment with your Travis-CI so it's less to remember.

Tests

[ ] I'm unclear how the interactive portion of the package functions is handled in testing? There are ways to handle this, but I don't see any implemented and when I run devtools::test() I'm asked to provide my own input.
[ ] Suggest using skip_on_cran() since some of the tests can take some time to execute due to download times.

DESCRIPTION File

[ ] In the DESCRIPTION file, Mark's author entry is missing his ORCID.
[ ] More information in the DESCRIPTION's Description field would be desirable, a link to the data's website or other information to give more background perhaps.
[ ] STATS19 should be in "'" in DESCRIPTION for CRAN, i.e., 'STATS19', I think.
[ ] Check spelling in DESCRIPTION file, see: "analysie"
[ ] The Description should include a link to the DfT website.
[ ] Language field should be set, Language: en-GB

README File(s)

[ ] Use remotes::install_github() in place of devtools::install_github() in README.
[ ] The code style is inconsistent in the README.Rmd file in the code chunks, e.g. line 85 is missing space around =.
[ ] The example in the README showing two steps seems necessarily confusing to new users. If there is a good reason for having the raw data in R, document in a vignette why this is useful and show the two-step process, but if normal users won't do this, I wouldn't show it in the quick-start.
[ ] Line 43 of README uses inconsistent "(" around the phrases with the other two read_* function description.

Vignette

[ ] Run spell-check on it.
[ ] The term "attach"" has a specific meaning in R. Suggest rewording the portion about installation and loading the package to omit the use of "attach", since you're not using attach() in the R sense (and really shouldn't use it anyway).
[ ] I would describe why a user might want or need to install the Development version from GitHub in the vignette. Presumably if they are reading the vignette, they've already installed the package from CRAN (in the future).
[ ] Try to consistently use function() to identify functions in the vignette text. This also means that if/when you use pkgdown to build a site, the functions are linked to the help file.
[ ] In the introduction, the description of why there are read_*() and format_*() functions is confusing. To me, it reads as if format is only a parameter for read_*() in the introduction. I was left wondering why it's documented there or why the format_*()s even exist until I reached the end of the vignette.
[ ] There is a comma out of place in Vignette,
[ ] Format: Each of the read_*() functions has a format parameter which, when TRUE, adds

should be

[ ] Format: Each of the read_*() functions has a format parameter, which, when TRUE, adds
[x] I'm unsure about including a package that's not on CRAN in the vignette (ukboundaries), something like this should be listed in Suggests, but it's not on CRAN, @sckott do you have any thoughts?

This is a good point. Fixed, by adding a much more useful dataset, representing the juristictions of polic forces across England and Wales.

[ ] The first figures in the sf section after the join aren't immediately clear to me. The axis lack labels, I'm not really sure what I'm looking at.

ropensci/stats19
Work with Open Road Traffic Casualty Data from Great Britain

In ropensci/stats19: Work with Open Road Traffic Casualty Data from Great Britain

Responses to review 2 of stats19

Tests

DESCRIPTION File

README File(s)

Vignette

Meta

R Package Documentation

Browse R Packages

We want your feedback!

ropensci/stats19 Work with Open Road Traffic Casualty Data from Great Britain

In ropensci/stats19: Work with Open Road Traffic Casualty Data from Great Britain

Responses to review 2 of stats19

Tests

DESCRIPTION File

README File(s)

Vignette

Meta

R Package Documentation

Browse R Packages

We want your feedback!

ropensci/stats19
Work with Open Road Traffic Casualty Data from Great Britain