Responses to review 1 of stats19

Thanks for the review. After making some changes and fixes to the package, we have had a chance to take in and act on each of the comments. The code base has evolved substantially since the review, but the fundamental design of the package, with its three-stage API mirroring workflows that existed before the package was developed, remains unchanged. That is:

dl_stats19(year = 2017)
# Multiple matches. Which do you want to download?
# 
# 1: dftRoadSafetyData_Vehicles_2017.zip
# 2: dftRoadSafetyData_Casualties_2017.zip
# 3: dftRoadSafetyData_Accidents_2017.zip
dl_stats19(year = 2017, type = "ac")
# Files identified: dftRoadSafetyData_Accidents_2017.zip
# 
# Wanna do it (y = enter, n = esc)? 
dl_stats19(year = 1985)
# Year not in range, changing to match 1979:2004 data
# This file is over 240 MB in size.
# Once unzipped it is over 1.8 GB.
# Files identified: Stats19-Data1979-2004.zip
# 
# Wanna do it (y = enter, n = esc)?
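The file matching behind these messages can be sketched roughly as below. This is a minimal illustration, not the package's code: `match_stats19_file()` is a hypothetical helper, and it assumes that the file-name pattern seen for 2017 holds for other recent years, which may not be true of every release.

```r
# Hypothetical helper, not part of stats19: picks candidate file names for a
# given year and (optionally) table type, mirroring the messages shown above.
match_stats19_file <- function(year, type = NULL) {
  tables <- c("Vehicles", "Casualties", "Accidents")
  if (year %in% 1979:2004) {
    # All tables for these years ship in a single large archive
    return("Stats19-Data1979-2004.zip")
  }
  # Assumption: other years follow the naming pattern seen for 2017
  files <- paste0("dftRoadSafetyData_", tables, "_", year, ".zip")
  if (!is.null(type)) {
    # Case-insensitive partial match, so type = "ac" selects Accidents
    files <- files[grepl(type, tables, ignore.case = TRUE)]
  }
  files
}

match_stats19_file(2017)               # three candidate files
match_stats19_file(2017, type = "ac")  # the Accidents file only
match_stats19_file(1985)               # the 1979-2004 archive
```

When more than one file matches, the real function asks the user which one to download rather than returning all of them.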

We'll focus on areas flagged in the review for the rest of this response:

I would tease a bit more of what's in these data sets. I wasn't entirely sure until I downloaded and opened the supporting documentation. If I were searching for this kind of data, and I didn't know what STATS19 was, I'd like to know I'm in the right place after scanning the README. Maybe a map?

We have added a map (well, technically 9 maps!) and a couple of time series plots showing the scale of the data. A sample of the additional casualty and vehicle tables has also been added to show more clearly the richness of the data provided.

I couldn't load the vignette from the console:

We also could not see the vignette when installing using devtools::install_github(build_vignettes = TRUE). But we can see the vignette if we install locally.

This was the code we ran:

devtools::install(build_vignettes = TRUE)
vignette(package = "stats19")

Several of the examples failed:

These have now been fixed - thanks for testing and reporting.

I couldn't find any explicit contributing guidelines in the README, and there is no CONTRIBUTING document.

A CONTRIBUTING file has now been added. Thank you.

The package has an obvious research application according to JOSS's definition

There is no paper.md.

One is added with:

Review Comments

A superb and essential package--we need this data and we need it in these formats. The download-format-read-explore workflow is intuitive and relatively frictionless. I have only some brief comments:

Thank you.

I wonder if you could possibly merge the formatting and reading step with a raw = TRUE or format = TRUE argument in the read_* functions. But perhaps that's my tendency towards abstraction. Something like ac = read_accidents(year = 2017, format = TRUE)

Done, appreciate your input.
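The general pattern of folding the formatting step into the reader can be sketched as follows. This is a self-contained illustration under stated assumptions: `read_toy()`, `format_toy()` and `toy_schema` are hypothetical stand-ins, not the package's functions or its real schema, and the two-row CSV is made up purely for the example.

```r
# Toy stand-in for a schema: maps integer codes to labels for one variable
toy_schema <- data.frame(
  variable = c("accident_severity", "accident_severity", "accident_severity"),
  code     = c(1, 2, 3),
  label    = c("Fatal", "Serious", "Slight"),
  stringsAsFactors = FALSE
)

# Hypothetical formatter: replace integer codes with labels from the schema
format_toy <- function(x, schema = toy_schema) {
  for (v in intersect(names(x), unique(schema$variable))) {
    lookup <- schema[schema$variable == v, ]
    x[[v]] <- lookup$label[match(x[[v]], lookup$code)]
  }
  x
}

# Hypothetical reader: one call reads and, optionally, formats
read_toy <- function(file, format = TRUE) {
  x <- read.csv(file, stringsAsFactors = FALSE)
  if (format) x <- format_toy(x)
  x
}

# Made-up two-row file, for illustration only
tmp <- tempfile(fileext = ".csv")
writeLines(c("accident_index,accident_severity", "A1,3", "A2,1"), tmp)

raw <- read_toy(tmp, format = FALSE)  # codes kept as integers
ac  <- read_toy(tmp, format = TRUE)   # codes replaced by labels
```

Keeping the formatter as a separate function means the raw data stay available to users who want to inspect or recode them differently.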

My personal preference would be to have the schema from dl_schema lazily loaded with the package.

The DESCRIPTION file has the line LazyData, which means stats19_schema is lazy loaded.

According to the vignette, the dl_* functions are interactive, although the interactivity is commented out in the code. Will the interactivity be returning? Or does the vignette need to be updated?

The interactivity is back in, as shown above.

Out of curiosity, what's happening with https://github.com/cyipt/stats19? It was updated recently.

@mem48 answered this: cyipt/stats19 is not actually a proper R package. It is a repo containing scripts for the CyIPT project. It uses different sources (UK DS) and has a different usage, so there is currently no need to adapt it to use this package. Malcolm is one of the contributors to this package.

I confess I wish the package name was more expressive--stats19 sounds like an introductory statistics class.

This is a reasonable point that we have thought of and discussed. We are open minded about changing the name but, as with so many things, there are pluses and minuses (outlined for some options below):

The main benefit we can see of changing the name would be making the package easier to find. We think good documentation, a clear description and some write-ups of the package and what it does could address this issue. We've explored the stats19 name and it links directly to (and is almost synonymous with) road crash data. See https://en.wikipedia.org/wiki/STATS19 for an excellent example (we plan to add this link to the README).

So the name is OK, we think, but we're open minded to the alternative names mentioned above and perhaps to names we've not thought of.

This data will be used to make many maps. I personally would love a nudge in that direction in either the README or the vignette.

Definitely. Thank you very much for your input.



ITSLeeds/stats19 documentation built on May 4, 2019, 7:35 a.m.