Move large files out of folder under version control. i.e. moved to ../large-data-files
Use NPSY4 data files for testing. They don't have the idiosyncracies as in the ihs4p data.
In the ihs4p_x0_rf1_daily.csv data, some columns have invalid dates, e.g. (3rd and 4th columns invalid dates): rf_19830227 rf_19830228 rf_19830229 rf_19830230 rf_19830301 1 0 10 0 0 0 2 0 10 0 0 0 3 0 10 0 0 0 4 0 10 0 0 0
Looks like each year has values for: 29 February (only an issue for common years) 30 February 31 April 31 June But not values for: 31 September 31 November
to_long
sets these as NA
values. For now, just exclude before
enumerate_seasons
is called
Also, each year only has data from 01 January to 31 August
Data coming in are wide, where:
E.g. y3_hhid rf_19830101 rf_19830102 rf_19830103 rf_19830104 1 0001-002 0 0 0 0 2 0003-001 0 0 0 0 3 0003-003 0 0 0 0 4 0015-001 0 0 0 0 5 0004-001 0 0 0 0
Generally a choice of either doing rain or temperature. This should be dictated by the function chosen, not an argument passed to a function
Summary stats are created for a season, defined a priori through starting month, ending month, and day of month. Some challenges with this:
Will probably need to convert to long and parse column names into three separate columns: year, month, day
Output example [1:5, 1:5] of data/results_rain.csv: y3_hhid mean_season_1983 median_season_1983 sd_season_1983 total_season_1983 1 0001-002 10.20635 10 8.187635 643 2 0003-001 10.20635 10 8.187635 643 3 0003-003 10.20635 10 8.187635 643 4 0015-001 10.20635 10 8.187635 643 5 0004-001 10.20635 10 8.187635 643
Biggest issue will be enumerating each "season"
Start with:
Mean daily rainfall Median daily rainfall Standard deviation of daily rainfall
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.