tools/Rwrapper_Explanations/2_SWSF_p0of5_Notes_v51.R

#---------------------------------------------------------------------------------------#

#------------------------FRAMEWORK FOR SOILWAT2 SIMULATIONS: CREATING SIMULATION RUNS, EXECUTING SIMULATIONS, AND AGGREGATING OUTPUTS

#---------------------------------------------------------------------------------------#

#------CODE developed and written by
# - Daniel R Schlaepfer (dschlaep@uwyo.edu, drs): 2009-2014
# - Donovan Miller (dlm): 2012
# - Ryan Murphy (rjm): 2012-2014
#for contact and further information see also: sites.google.com/site/drschlaepfer

#The R code below was tested on R version 3.1.1

#------DISCLAIMER: This program is distributed in the hope that it will be useful,
#but WITHOUT ANY WARRANTY; without even the implied warranty of
#MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

#------USER INPUT OPTIONS
#Folder hierarchy
#  rSFSW2 consists of one main project folder (dir_prj)
#  the main project folder contains three sub-folders
#    - 'dir_in_sw': for the input data (with three sub-folders: SOILWAT2-run project, datafiles, and  treatments),
#    - 'dir_out_sw': for the individual simulation SOILWAT2-runs (these can also be under a different parent directory defined by 'dir.runs', e.g., at a external hard disk with lots of space), and
#    - 'dir_out': for the aggregated output files

#rSFSW2 abilities include four different 'actions'
#  = "sim_create": puts information and files together for each simulation SOILWAT2 run
#  = "sim_execute": executes the SOILWAT2 simulations
#  = "sim_aggregate": calculates aggregated response variables from the SOILWAT2 output
#  = "concat_dbOut": takes all individual temporary output files (those generated by each call to do_OneSite), sorts and concatenates the information, and writes all into one .csv output file

#Input data for the SOILWAT2 simulation runs can come from one of five sources ('source_input')
#  = "datafiles&treatments":  - if actions include "sim_create", then
#                  - 1. step: if flagged in variable 'do.ExternalInformation', then data from external sources is accessed prior to starting simulations runs for all runs at once, and written to the corresponding 'datafile.X' and the use-flag set appropriately (see step 4.). Options include
#                        - 'calculate field capacity and wilting point based on soil texture' (data is written to 'datafile.soils'): calculate soil texture based on Cosby et al. 1984 for those soil layers for which the use flag of sand and clay is set; executed after eventual call to extract soil texture information
#                        - 'extract soil data from CONUS-SOIL' (data is written to 'datafile.soils' and 'datafile.soillayers'): extracts soil depth, bulk density, sand and clay content, and sets soil layer depths according to the CONUS-SOIL database
#                        - 'extract sky/climate data from Climate Atlas for the US by NOAA' (data is written to 'datafile.cloud'),
#                        - 'extract climate change scenarios' (data is written to 'datafile.weathersetup'),
#                        - 'extract topography and elevation' (elevation, aspect, and slope is written to 'datafile.SWRunInformation'),
#                  - 2. step: code creates SOILWAT2-runs with copies of SOILWAT2 input files from 'dir_in_sw'
#                  - 3. step: if flagged in 'datafile.treatments', then entire SOILWAT2 input files or substantial chunks are replaced with files from folder 'dir.sw.tr'
#                        Options for input files: sw, filesin, prodin, siteparamin, soilsin, weathersetupin, cloudin
#                        Options for chunks:
#                          - 'weather' (the entire folder for the weather data, i.e., cloudin and yearly files of daily weather, is copied)
#                          - 'climate.temp' (monthly deltaTempC addands are loaded from table according to scenario name, which are the columns in table)
#                          - 'climate.ppt' (monthly PPT scaling factors are loaded from table according to scenario name, which are the columns in table)
#                          - 'ppt.shifted' & 'shiftedPPT.Category' (monthly PPT scaling factors are loaded from table according to scenario name, which are the columns in table, rows/sites are selected by 'shiftedPPT.Category')
#                          - 'lookup transpiration regions for each soil layer based on a table' (data is written to 'datafile.soils'): region distribution type for each SOILWAT2-run is loaded from 'datafile.treatments';
#                          - 'lookup evaporation coefficients for each soil layer based on a table' (data is written to 'datafile.soils'): coefficient distribution type for each SOILWAT2-run is loaded from 'datafile.treatments';
#                          - 'lookup transpiration coefficients for each soil layer based on a table' (data is NOT written to 'datafile.soils'): coefficient distribution type for each SOILWAT2-run is loaded from 'datafile.treatments';
#                          - 'lookup snow density based on a table' (data is written to 'datafile.cloud'): snow density category for each SOILWAT2-run is loaded from 'datafile.treatments';
#                  - 4-. step: if flagged in variable 'do.ExternalInformation', then data from external sources is accessed during each simulations runs and added to the SOILWAT2-run. Options include
#                        - 'extract gridded daily weather from Maurer et al. 2002' (entire 'dirname.sw.runs.weather' is replaced), [data = Daily 1/8-degree gridded meteorological data [1 Jan 1949 - 31 Dec 2010]; http://www.engr.scu.edu/~emaurer/gridded_obs/index_gridded_obs.html; Maurer, E.P., A.W. Wood, J.C. Adam, D.P. Lettenmaier, and B. Nijssen, 2002, A Long-Term Hydrologically-Based Data Set of Land Surface Fluxes and States for the Conterminous United States, J. Climate 15, 3237-3251.]
#                  - 4. step: if flagged in 'datafile.X' (with X = input file name), then information is added per flagged element to SOILWAT2 input files taken from .csv datafiles in 'dir_in_dat'
#                - if actions does not include "sim_create", then code assumes that all SOILWAT2-runs are pre-prepared in the directory 'dir_out_sw' (5. source);
#                  i.e., code obtains the list of simulation runs from 'datafile.SWRunInformation' and executes and/or aggregates the runs
#  = "folders":  assumes that all SOILWAT2-runs are pre-prepared in the directory 'dir_out_sw' (5. source); 'actions' == "sim_create" is ignored
#          i.e., unlike above, code obtains the list of simulation runs from reading the folder content and executes and/or aggregates the runs (this mode doesn't rely on 'SWRunInformation' for SOILWAT2 input);
#          except if Index_RunInformation != NULL, then selected 'SWRunInformation' columns are appended to aggregated output

#  'datafile.SWRunInformation': datafile describes the design of the simulation experiment with the following columns (these and additional columns can be appended to the aggregated output files if selected by Index_RunInformation:
#            this datafile is in all cases needed except if 'source_input' == "folders" & Index_RunInformation == NULL
#            each row represents one SOILWAT2 simulation (multiplied by Scenario_No)
#            Mandatory columns for all simulations, if this datafile present:
#              - 'Label' (string) = name of the SOILWAT2-run
#              - 'Include_YN' (0/1) = whether (1) or not (0) the run is included in the simulations;
#              - 'Scenario_No' (integer) = number of climate scenarios in addition to the standard scenario (e.g., Scenario_No == 2, for scenarios current, B1, and A2)
#            Additional mandatory columns if 'source_input' == "datafiles&treatments":
#              - 'ScenarioX' (string) = name of scenario number X; NOTE: if extracting climate change information then these scenario names must correspond in alphabetical order to the folder names in dir_external/ExtractClimateScenarios
#            Additional mandatory columns if 'source_input' == "datafiles&treatments" & 'actions' == "sim_create":
#              - 'X_WGS84' (double) = longitude in degrees (WGS84)
#              - 'Y_WGS84' (double) = latitude in degrees (WGS84)
#              - 'ELEV_m' (double) = elevation (m a.s.l.)
#            Optional:
#              - 'ASPECT' (double) = aspect (degrees)
#              - 'SLOPE' (double) = slope (degrees)
#              - other variables

#  'datafile.treatments': datafile describes the treatment design
#            - the first line ('UseInformationToCreateSoilWatRuns') specifies which treatments are to be included (flags) (1) or not (0)
#            - Options for input files: for each SOILWAT2-run, enter name of input file to be copied from 'dir_in_treat' to the SOILWAT2-run folder
#              - sw, filesin, prodin, siteparamin, soilsin, weathersetupin, cloudin
#            - Options for information chunks:
#              - 'LookupWeatherFolder': enter name of weather folder to be copied from 'dir_in_treat' to the SOILWAT2-run folder
#              - 'LookupClimateTemp' and 'LookupClimatePPT': enter name of column of table in 'dir_in_treat' that is to be written to the weathersetup input file of the SOILWAT2-run folder
#              - 'LookupShiftedPPT': enter name of column-group of table in 'dir_in_treat' that is to be written (multiplied to existing factors) to the weathersetup input file of the SOILWAT2-run folder
#              - 'LookupShiftedPPTCategory': enter name of row of table in 'dir_in_treat'
#              - 'LookupSnowDensity_Category': enter name of row in snow-density table
#              - 'LookupTranspRegions_Type': enter name of row in transpiration-region table
#              - 'LookupTranspCoefs_Type'{Grass, Shrub, Tree}: enter name of column in transpiration coefficient table

#  'datafile.soillayers':  datafile describes the soil layer structure: soil depth and layers
#            this datafile is needed if 'source_input' == "datafiles&treatments"
#            Mandatory columns:
#              - 'Label' (string) = name of the SOILWAT2-run
#              - 'SoilDepth_cm' (double) = soil depth in cm
#              - 'depth_LX' (double) = lower depth (cm) of soil layer X (X <= 'slyrs_maxN')

#For most SOILWAT2 input files there is a datafile with adjustable parameters:
#  the first line ('UseInformationToCreateSoilWatRuns') of each datafile specifies which information is to be added to the input files (flags) (1) or not (0)
#  the first column: 'Label' (string) = name of the SOILWAT2-run
#  currently available datafiles:
#    - 'datafile.cloud'
#    - 'datafile.prod'
#    - 'datafile.siteparam'
#    - 'datafile.soils'
#    - 'datafile.weathersetup'

#  'datafile.climatescenarios': datafile describes the monthly PPT-factors and Temp-addands for each climate scenario, including the standard scenario, defined by 'Scenario_No' and 'Scenario_X' in 'datafile.SWRunInformation'
#  the first line of datafile specifies whether information is to be added to the input files (flags)
#  NOTE: there are three ways to incorporate climate scenarios
#      1. Use 'Scenario_No', i.e., > 0, then each row represents 1+Scenario_No SOILWAT2-runs, the standard and 'Scenario_No' climate scenarios, as defined in 'datafile.climatescenarios'
#      2. For 'Scenario_No' == 0, then each row represents one climate scenario, the one in the standard columns, i.e., ending in "sc00", of 'datafile.climatescenarios'
#      3. Use one or a combinations of the treatment options of (weather, climate.temp, climate.ppt) and set 'Scenario_No' == 0, then each row represents its own climate scenario

#Output options: two types of aggregated output files are generated: (i) overal aggregated response variables, (ii) separate files for daily mean and daily stats::sd of selected response variables (for two or three aggregated soil layers)
#  filename.aggregatedResults: name of file for (i) output
#  output_aggregate_daily: select response variables for which (ii) is calculated



#------ADDING FUNCTIONALIY
#Adding new treatments
#  1. Add new column to 'datafile.treatments'
#  2. If this is to add a new input file then
#    - Add code to add the input file in the 3. step of the create section in the do_OneSite function
#     If this is to import data from external resources then
#    - Add code to import data in the 1. step of the external data section before to the do_OneSite function

#Adding new parameters to an existing datafile:
#  1. Add new column to datafile (with first row = use flag and then the data)
#  2. Add code to add this new information to the input file in the 4. step of the create section in the do_OneSite function

#Adding a new datafile
#  1. Create a new datafile
#  2. Define its name at input section 'filenames of files that contain input data or treatment codes'
#  3. Add code to import datafile and its use flags in the preparation section 'import data'
#  4. Add code to add its new information to the input file in the 4. step of the create section in the do_OneSite function

#Adding aggregation response variables
#  1. If its a new response variable group, then add a new line to 'output_aggregates'
#  2. Update 'n_variables' (number of response variables) in the preparation section 'constants'
#  3. Add code to calculate response variables in the aggregate section of do_OneSite function
#  4. Update res, resultfiles.Aggregates.header, and nv accordingly

#Adding a daily response variable
#  1. Add variable name to option list for output_aggregate_daily
#  2. Add code to calculate response variable in the 'daily output' part of the aggregate section of do_OneSite function
#  3. Update scaler, agg.file, agg.analysis, and agg.agg accordingly

#soilWat shared object data indexs are listed in output.

#---------------------------------------------------------------------------------------#
#---------------------------------------------------------------------------------------#


#---------------------------------------------------------------------------------------#

#------------------------FRAMEWORK FOR SOILWAT2 SIMULATIONS: CREATING SIMULATION RUNS, EXECUTING SIMULATIONS, AND AGGREGATING OUTPUTS

#---------------------------------------------------------------------------------------#

#------CODE developed and written by
# - Daniel R Schlaepfer (dschlaep@uwyo.edu, drs): 2009-2013
# - Donovan Miller (dlm): 2012
# - Ryan Murphy (rjm): 2012-2013
#for contact and further information see also: sites.google.com/site/drschlaepfer

#The R code below was tested on R version 3.0.2

#------DISCLAIMER: This program is distributed in the hope that it will be useful,
#but WITHOUT ANY WARRANTY; without even the implied warranty of
#MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

#------SOILWAT2 Simulation Framework, rSFSW2

#------NOTES:
#  - the code performs only rudimentary error checking and handling
#  - SOILWAT2 is forced by:
#    - daily: rainfall (cm), maximum and minimum air temperature at 2-m height (C)
#    - mean monthly: wind speed at 2-m height (miles/h before v24, m/s starting with v24), relative humidity at 2-m height (%), and cloud cover (%)

#------VERSIONING
#  v37 (20111213):
#  v38  (20111227-20120104):
#    - added to datafile.treatments: YearStart and YearEnd to adjust simulation period for each SOILWAT2-run individually
#    - adjusted for treatment = 'LookupWeatherFolder' so that if the weather folder does not contain cloudin and treatment != 'cloudin', then cloudin is copied from 'dir_in_sw'
#    - fixed bug in function 'TranspCoeffByVegType' where if 'DepthCM', the coefficients for the deepest soil layer were not added up if the soil layer was deeper than the coefficients
#    - fixed bug in 'ExtractSkyDataFromNOAAClimateAtlasUS': codes for cover and wind were erroneous
#    - fixed bug in 'ExtractSoilDataFromCONUSSOILFromSTATSGO': extraction of values was erroneous
#    - fixed bug in 'add soil information to soilsin': re-adjustment of deepest soil layer when multiple lowest soil layer data was empty, was erroneous
#    - fixed bug in function 'setSecondLayer' and 'setThirdLayer': DeepestSecondDailyAggLayer and DeepestThirdDailyAggLayer were calculated erroneously if soil layer re-adjustment decreased number of soil layers below desired value
#    - fixed bug in 'determine soil layer structure': if not create, then re-adjustment of deepest soil layer when multiple lowest soil layer data was empty, was erroneous
#    - converted header information from levels of factors to characters if necessary
#    - function 'SWPtoVWC': can now handle NAs in sand or clay
#    - improved dir.remove: removes now also files in recursive directories
#  v39 (20120104-20120113):
#    - aggregated output files contain only selected output variables and no longer all
#    - added aggregated output:
#      - stats::sd for climate and weather, for climate correlations, and for water balance fluxes
#      - group 'dailyWeatherEventSizeDistribution' including frequency distribution of prcp-event size and of duration of dry days (no prcp); also added two constants to determine bin sizes
#      - group 'dailySWPdrynessDurationDistribution' including cummulative frequency distribution values (i.e., sampled at deciles plus at extremes) of duration of dry soils in top and bottom soils for each of the four seasons for each of the SWP.crit
#    - improved 'LookupEvapCoefs' and 'LookupTranspRegions': if datafile has from previous run more soil layers than current, these old data are now deleted
#    - introduced function 'simTiming' which produces a list 'simTime': to better handle run-specific timing
#    - fixed bug in 'dailySnowpack': checks now that there are enough snow years available (i.e., 2) to calculate response variables
#  v40 (20120113-20120123):
#    - added flag 'deleteSoilWatOutputAfterAggregation': deletes all SOILWAT2 simulation output after aggregation for run is complete, but maintains the SOILWAT2 input file structure; this safes space (e.g., for 1000 years of complete output = 490 MB); if other response variables are needed, then only actions == c("sim_execute", "sim_aggregate") are needed, but not "sim_create"
#    - fixed bug in column names of daily aggregations
#    - added vector 'delete.exceptions': if 'deleteSoilWatOutputAfterAggregation' is turned on, then it doesn't delete files listed
#    - fixed bug in 'dailyWeatherEventSizeDistribution': incorrect result or subscript out of bounds if no data present for a summary bin
#  v41 (20120123-20120217):
#    - added flag 'resume': if true, then performs actions only if not already completed, i.e., SOILWAT2-run folder present for 'create', output folder not empty unless deleteSoilWatOutputAfterAggregation == TRUE & delete.exceptions == NULL for 'execute', and appropriate temporary output files present for 'aggregate'
#    - fixed bug in calculation of 'doy_ForEachUsedDay', 'year_ForEachUsedDay', and 'month_ForEachUsedDay': required output file may not have been available if 'deleteSoilWatOutputAfterAggregation' was on and actions != execute
#    - fixed bug in daily aggregation: rain, snowfall, snowmelt
#    - added daily aggregation response variables: PET, TotalPrecipitation
#    - fixed bug in 'dailyWeatherEventSizeDistribution': subscript out of bounds if data present for only one bin larger than the largest summary bin
#    - changed daily aggregation 'SWA' from daily means of SWA (truncated at 0) to daily SWA based on daily means of SWC: otherwise if a single day had SWA > 0 then daily mean for that doy > 0 even though daily average of SWP could be < SWPcrit
#    - added external data source option for 'CalculateBareSoilEvaporationCoefficientsFromSoilTexture'
#    - added code to delete duplicated input files originating by copying 'swruns' and then adding treatment files
#    - fixed bug in calculation of number of layers if soildepth was exactly a lower layer boundary
#    - column names of daily aggregation 'SWA' include now value of SWPcrit instead an index
#    - added daily aggregation response variables: minimum and maximum temperature
#    - fixed bug in ETA calculation: didn't ignore excluded runs
#    - fixed bug in calculation of ifirst: if first row was not included then no column headers were generated
#    - fixed bug in concatenating output files: if error writing file then all temp files still deleted instead of saved for later attempts
#    - fixed simTime
#  v42  (20120217-20120503)
#    - added aggregation: Artemisia tridentata germination and first-year seedling survival
#    - added option for 4 aggregated soil layers for daily aggregation responses
#    - fixed bug in calculation of daily aggregation of SWA
#    - unified simulation output file access: data is read only once
#    - temporary variables are now removed for each aggregation group
#    - fixed bug in aggregation of 'dailySnowpack': apply failed if nrow(matrix) == 1
#    - added aggregation 'dailyWeatherGeneratorCharacteristics': duration of wet spells (days with precipitation), duration of dry spells (days without precipitation), daily temperature standard deviation
#    - fixed bug in daily aggregation of SWA: soiltexture
#    - fixed bug in column names for aggregation of 'dailyWeatherEventSizeDistribution'
#    - fixed bug in aggregation 'monthlyCorrelations' when there was no precipitation during a given year
#    - fixed bug in creation of soils.in: if use_imperm not set and soilsin contained no data, then set impermeability to 0
#    - fixed bug in aggregation of 'dailyWeatherEventSizeDistribution': if only 1 or 2 years of simulation data
#  v43 (20120503-20120604)
#    - added option for aggregated time series output
#    - added option for 'ArtemisiaTridentataRegeneration' yearly time series output of germination and seedling survival successes
#    - fixed bug in aggregation of 'ArtemisiaTridentataRegeneration': number of years was mis-calculated
#    - fixed bug in determination of layers_depth if adjusted during soilsin creation: added function: layers_depth <- adjustLayersDepth(layers_depth, d)
#    - fixed bug in creation of soils.in: if no data for impermeability in datafile and datafile different soil layer structure than soilsin, then set impermeability to 0
#    - added overall timing measurements and output
#    - fixed bug in determination of 'layers_depth.datafile': if more depths values available in datafile than column 'SoilDepth_cm' indicated, then values for soilsin taken from datafile (which may be empty) instead of input file
#  v44 (20120604-20120612)
#    - (DLM) began updating the framework for use with SOILWAT2 v_23 (now includes: soil temperature)
#    - (DLM) updated the adding of site information to siteparamin to account for changes in the siteparamin file... (IE: the inclusion of the soil temperature constants).  updated the line numbers for the transp regions part.
#    - (DLM) updated the adding of soil information to soilsin to include the initial soil temperature for each soil layer.  also changed the SW_Runs_InputData_Soils.csv file to include columns for the initial soil temperature for each soil layer.
#    - (DLM) added new aggregation response variable "monthlySoilTemp" to output monthly soil temperature
#    - (DLM) added new daily response variable "SoilTemperature" to output daily soil temperature
#    - added to datafile 'SWRuns_InputData_siteparam_v10.csv': option to turn on/off soil temperature calculations, and option to adjust constant soil temperature at lower boundary (estimated as mean annual air temperature)
#    - added option to extract data externally 'EstimateConstantSoilTemperatureAtUpperAndLowerBoundaryAsMeanAnnualAirTemperature': calculates mean annual air temperature from daily weather input previous to runs; data per run needs to be stored temporarilly and afterwards concatenated and stored into datafile.siteparam
#    - added option to extract data externally 'EstimateInitialSoilTemperatureForEachSoilLayer': estimate initial soil temperature values for each layer as linear regression = f(soil layer depth) between (0 cm, 0C) and (lower bound, lower bound constant temperature); data per run needs to be stored temporarilly and afterwards concatenated and stored into datafile.soils
#    - fixed bug in 'collect_ResultsWithTemporaryDataFrame' if there was only one run to aggregate
#    - fixed bug in creation of soils.in: if datafile.soils used and not data for initial soil temperatures, but these calculated in EstimateInitialSoilTemperatureForEachSoilLayer, then values were not taken from the calculated ones
#  v45 (20120612-20120705)
#    - updated 'ArtemisiaTridentataRegeneration' and made it on average 15% faster
#    - renamed 'ArtemisiaTridentataRegeneration' to 'dailyRegeneration_byTempSWPSnow'
#    - parameter values for 'dailyRegeneration_byTempSWPSnow' are read in from a species.csv file from folder input/regeneration
#    - added option for multiple species in 'dailyRegeneration_byTempSWPSnow': for each file in folder input/regeneration, 'dailyRegeneration_byTempSWPSnow' is called
#    - added option 'be.quiet': if TRUE, no reports on timing and runs is printed
#  v46 (20120705-20120725)
#    - (DLM) added a few changes for running on JANUS, made a R script to run the framework (rToCallR.R) that sets a global variable "use_janus" to make a few minor changes for specifically running the code on JANUS.  (ie. doesn't load libraries since they're not installed on JANUS, sets the project directory differently, sets the number of cores to use differently, & calls the C code slightly differently).
#    - (DLM) made a few small changes to facilitate running the code on R version 2.13 (ie. the version on JANUS) mainly wrote a function list.dirs2 to handle list.dir calls, since the list.dir function is different in 2.13 and 2.15.  also made a small edit to the dir.copy function (commented out 1 line that was giving an error & added a new one that does the same thing and doesn't throw an error).  Code is running on both version 2.13 & 2.15 now!
#    - framework is no longer compatible with SOILWAT2 versions prior to v20 due to requirements for sw.inputs and sw.output folders
#    - added options to extract data from global datasets: "ExtractGriddedDailyWeatherFromNCEPCFSR_Global", "ExtractClimateChangeScenariosMaurer2009_Global", "ExtractSkyDataFromNCEPCFSR_Global"
#    - added input datafile 'SWRuns_InputData_ClimateScenarios_Values_v10.csv': SOILWAT2 climate scenario factors (delta temperature, precipitation multiplier) are not taken from 'SWRuns_InputData_ClimateScenarios_Change_v10.csv', but are calculated as difference between climate scenario values and current climate calculated from daily input data (goal: match timeframe of model and change)
#    - added option to 'actions': concatenate temporarily stored files after simulation runs or not (e.g., run everything on JANUS and then concatenate locally because it's not parallelized)
#    - fixed bug in aggregation of mean or median aggregated soil layer response variables: previously, aggregated mean or medians were simple averages across soil layers thereby values from wide soil layers were underrepresented; now, aggregated means are weighted by soil layer width (note, SWP aggregated across soil layers is now done by first aggregating VWC and then converting to SWP)
#    - added function 'VWCtoSWP': converting VWC for layers with known sand and clay contents to SWP
#    - aggregation of SWP across soil layers is now done by first aggregating VWC (weighted by soil layer widths) and then converting to SWP because SWP is not linear
#    - improved 'resume': now checks first for each temporary output file and then decides whether it needs to re-execute SOILWAT2 and aggregate or not
#    - improved handling of 'include_YN': the work-horse function is now only called for runs/sites with include_YN > 0
#  v47 (20120725-20120829)
#    - implemented parallelization with doSNOW/snow as an option to doMC/multicore - JANUS: only use snow, because multicore cannot access cores outside master node
#    - calculation of 'calculate doy_ForEachUsedDay, month_ForEachUsedDay, and year_ForEachUsedDay' is expensive for many years: creation of these vectors once before each simulation is started if possible, i.e., already executed and all simulations have same simulation years
#    - temporary files written to a run-specific sub-directory in dir_out.temp instead of all files to dir_out.temp: linear increase in computation time per run if too many files in dir_out.temp (limit ca. 600, 000 files)
#    - (DLM) added function dir.create2: made this function b/c dir.create wasn't always working correctly on JANUS for some reason... so if the simulations are being run on JANUS then it uses the system mkdir call to make the directories.
#    - added option 'deleteTemporaryAggregationFiles' to delete all temporary aggregation files in folder dir_out.temp after successful concatenation (or not if FALSE)
#    - (DLM) concatenation made much faster: concatenation step only creates the file list once instead of each time
#    - concatenation of temporary aggregated output is now performed in parallel
#    - added option to check completeness of SOILWAT2 simulation directories and of temporary output aggregation files 'checkCompleteness'; creates a list with missing directories and files
#    - (DLM) added option to use Rmpi for the multi-threading, it's performing better on JANUS and is slightly easier to debug then using SNOW (as it will print out an error message for every site that fails, unlike SNOW which will only print out the first error).  The only bad thing is that the output is harder to read as all the print statements that are made inside the do_OneSite() function are seperated into log files for each thread by Rmpi.  To keep all the logfiles change the call to mpi.close.Rslaves(dellog = TRUE) to mpi.close.Rslaves(dellog = FALSE).  It's fairly easy to check if it ran right because if any of them do not it will still print the error to the main logfile (or stdout if running locally).
#    - (DLM) changed dir.create2 & file.copy2 functions to recursively call themselves until the file/dir is copied/created.  This is because for some strange reason when using MPI on JANUS, the functions both don't work everytime and fail to throw an error when they don't work.  It's quite aggravating really, but this is the only way I got it actually working.
#    - (DLM) program is still called the same on JANUS when using Rmpi, ex: "R --no-save --worker < 2_SoilWatSimulationFramework_PC_GlobalDrylandEcoHydro_PrelimSim_v47.R", and is running correctly on JANUS again
#    - (DLM) fixed bug in daily aggregation if no daily aggregation was selected
#    - fixed bug in "CalculateBareSoilEvaporationCoefficientsFromSoilTexture"
#    - fixed bug in "CalculateFieldCapacityANDWiltingPointFromSoilTexture": call to SWPtoVWC was not designed for matrices of sand and clay
#  v48 (20120829-20121018)
#    - replaced variable in 'ExtractSkyDataFromNOAAClimateAtlas_USA': replaced 'Sky Cover' with 'Percent Sunshine' as SOILWAT2's PET function estimates clearsky = 1 - cloudcover (hence, percent sunshine fits SOILWAT2's interpretation better than sky cover; PET values were too low generally as sky cover values are generally high in the NOAA US Climate Atlas)
#    - added option in overall aggregations: 'yearlyUNAridityIndex' = mean(PPT/PET)
#    - added option 'deleteSoilWatFolderAfterAggregation': to completely delete each SOILWAT2 run folder after completion of all required 'actions'; calculations of 'todo' actions (picking up after stop) accounts for 'deleteSoilWatFolderAfterAggregation'
#    - added option in overall aggregations: 'monthlyPlantGrowthControls', see Nemani RR, Keeling CD, Hashimoto H et al. (2003) Climate-Driven Increases in Global Terrestrial Net Primary Production from 1982 to 1999. Science, 300, 1560-1563.
#    - fixed bug: guaranteed that all climate change entries in 'weathersetup' are finite: this was not the case for instance if any(meanMonthlyClimate$meanMonthlyPPTcm == 0)
#    - added code for storing to disk/reading from disk/creating the list of temporary output files to concatenate (because creating this list can be quite costly), and checking whether enough temporary files are on disk for successful concatenations
#    - fixed bug: in 'dailySWPbasedRegeneration' swp and SWE were not of the same time frame
#    - introduced sw_v24: main change: unit of wind speed at 2-m height (miles/h before v24, m/s starting with v24)
#    - added code to check if input datafile measurement height above ground of windspeed is SOILWAT2 required 2-m, if not convert with function 'adjust.WindspeedHeight'
#    - new vegetation interception coefficients for shrubs
#  v49 (20121102-20130801)
#    - (rjm) adjusted line numbers for prod.in file
#    - (rjm) Fixed sim timing to have Start and End not Start and Start
#    - (rjm) Aggregate for each senario, has a header that includes YearStart and YearEnd
#    - (rjm) created variables   shrub_limit       - potential natural vegetation based on climate data (Jose Paruelo et al. 1996, 1998)
#    - (rjm)           growseason_Tlimit_C   - used to define threshold temp of a growing month
#    - (rjm)            grass.c4.fractionG, grass.c3.fractionG, grass.Annual.fraction - represent percentage of Grass based on various values.
#    - (rjm) added code to find relative composition of vegetation based on climate
#    - (rjm) added code to adjust composition based on mean temp and mean ppt.
#    - (rjm) added code to calc rooting fraction based on those relative composition
#    - (rjm) Fixed the code so that all 'create' sections based on climate is carried out after the climate change scenario are made
#    - more verbose during preparation stages of code
#    - streamlined code for parallelization
#    - added option in overall aggregations: 'dailyC4_TempVar' = c('mean July minimum temperature', 'mean length of annual freeze-free periods (days)', 'mean annual degree-days above 65F (day C)'
#    - replaced simTime$mo == 1:12 with global variable SFSW2_glovars[["st_mo"]]: functions to extract external information don't need to import simTime any more
#    - added function 'simTiming_ForEachUsedTimeUnit' to calculate necessary timing variables such as 'doy_ForEachUsedDay' without readin from SOILWAT2 files (which was quite costly)
#    - accounting for North/South hemispheres:
#      - added global option flag 'adjust_NorthSouth': if TRUE then add a set of counting of the timing variables for sites with latitude < 0 (i.e., southern hemisphere), which have time shifted by 6 months (e.g., July becomes 1st month, etc.); i.e., introduced simTime2 as calculated by function 'simTiming_ForEachUsedTimeUnit'
#        - this affects aggregation of: monthlySWPdryness, dailySWPdrynessANDwetness, dailySnowpack, dailyC4_TempVar, dailySWPbasedRegeneration, dailyRegeneration_byTempSWPSnow (param$Doy_SeedDispersalStart0 must be set correctly)
#      - (rjm) added global option flag 'adjust_veg_input_NS' to shift monthly production values in prod.in file by six months
#    - moved scenarios from master datafile to script because it should be the same for every simulation run
#    - fixed bug in "CalculateBareSoilEvaporationCoefficientsFromSoilTexture": affected layers were incorrectly estimated when limit was a soil layer boundary
#    - fixed bug in "CalculateFieldCapacityANDWiltingPointFromSoilTexture": data was incorrectly assigned to simulation runs
#    - changed aggregated output names for SWPcrit from 1, 2, 3, ... to actual values, e.g., 1500kPa
#    - efforts made to reduce memory load for forked processes:
#      (i) deleted all global variables when not needed any more,
#      (ii) 'large' variables such as SWRunInformation or sw_input_climscen_values only passed row-wise to function do_OneSite()
#        - problems: works on multicore, works on snow (but each process utilizes 2x memory than if run with multicore), does not work on mpi
#    - added global option for daily aggregations: output for every soil layer instead for aggregated soil layers
#    - added option to treatment datafile: 'Vegetation_Biomass_ScalingFactor': litter and total biomass values are multiplied by scaling factor
#    - added option to treatment datafile: 'Vegetation_Height_ScalingFactor': height of vegetation (either constant or as described by a tangens-function(biomass)) is multiplied by scaling factor
#    - added overall output option: replaced 'yearlyUNAridityIndex' with 'yearlymonthlyTemperateDrylandIndices': includes UN Aridity Index, Trewartha's D, and the combination of both
#    - added name of climate scenarios to output filenames
#    - (rjm) introduced sw_v25: main change: topography (slope, aspect as inputs to siteparam.in) affects PET; datafile siteparam adjusted
#    - (rjm)  added code to handle ensemble.families & ensemble.quantiles, new function collect_EnsembleFromScenarios
#    - (rjm) tested relative composition based on climate, adjust composition based on mean temp and mean ppt, rooting profile
#    - (rjm) fixed line numbers to prod file in my code, added infile <- file(infilename, "w+b") in my code, fixed mpi bug, replaced TranspirationCoefficients.csv, removed soillayers_depth <- NULL from my code, removed double tab when writing to prodin file line 6 in my code
#    - (rjm) 'input_FractionVegetationComposition' in output_aggregates: add three columns ("SWinput_Fraction_{Grasses, Shrubs, Trees}") to overall aggregation file: with content of line 6 of file prodin
#    - (rjm) 'input_VegetationPeak' in output_aggregates:add one column ("SWinput_PeakLiveBiomass_Month") to overall aggregation file: with content of the month corresponding to the maximal value, weighted by fractional composition of grasses, shrubs, and trees, of biomass * percLive
#    - (rjm) 'input_ClimatePerturbations' in output_aggregates:' add 36 columns ("SWinput_ClimatePerturbations_{PrcpMultiplier, TminAddand, TmaxAddand}_m1:12") to overall aggregation file: with content of the mean monthly climate perturbations from file weathsetupin lines 17-28
#    - (rjm) add column to treatment datafile 'PotentialNaturalVegetation_Composition_basedOnReferenceOrScenarioClimate" with values "Reference" or "Scenario" : calculate C3, C4, and shrub potential compositions using 'SiteClimate_Ambient', if "Reference", and 'SiteClimate_Scenario', if "Scenario" or not set
#    - (rjm) introduced sw_v26: main change: a set fraction of ponded water can be runoff
#    - (drs) updated 'PotentialNaturalVegetation_CompositionShrubsC3C4_Paruelo1996': fraction of C4 grasses accounts now limits set by Teeri JA, Stowe LG (1976); If no or only one successful equation, then   add 100% C3 if MAT < 10 C, 100% shrubs if MAP < 600 mm, and 100% C4 if MAT >= 10C & MAP >= 600 mm
#    - (drs) updated 'max.duration': added handling of durations of length 0
#    - (drs) added stats::sd to output: 'dailySWPextremes', 'dailySnowpack'
#    - (drs)  incorporated circular calculations for time output (via library 'circular' accessed with wrapper functions 'circ_mean' and 'circ_sd'): 'dailySWPextremes', 'dailySnowpack', 'monthlySWPdryness', 'dailySWPdrynessANDwetness'
#    - (drs) fixed code after introduction of 'experimental design' so that do_OneSite is called with proper inputs only if include_YN > 0
#    - (drs) in 'create' at the end of each scenario-loop: control that transpiration regions are within limits of adjusted soil depth and rooting depth
#    - (drs) fixed 'makeInputForExperimentalDesign' & !makeOutputDB: parsing of the files was incorrect if separator was already used by text; added a (hopefully) unique 'ExpInput_Seperator'
#    - (drs) generalized experimental design: structure of 'datafile.Experimentals' does no longer need to be a copy of the structure of 'datafile.treatments'; currently, experimental design could now cover any changes to sw_input_treatments, sw_input_site, and sw_input_soils
#    - (drs) fixed concatenation: 'tempFiles_N' wasn't exported to nodes
#    - (drs) fixed aggregation of 'input_VegetationPeak': 2nd column wasn't labelled correctly
#    - (drs) output of functions circ.xxx (e.g., xxx = {mean, range, stats::sd}) give now numeric result, instead of class circular; i.e., this caused some inadverted problems converting results to vectors
#    - (drs) added to aggregation of 'yearlymonthlyTemperateDrylandIndices': indices based on climate normals in addition to meanĀ±stats::sd of time series
#    - (drs) added columns for source information to datafile.cloud, which will be promoted to the cloudin file
#    - (drs) function 'get.LookupSnowDensity' replaces months with NA or 0 with an estimate of density for freshly fallen snow
#    - (drs) snow density values from datafile.cloud are now tagged with hemisphere, and if different than location adjusted
#    - (drs) fixed bug that failed daily aggregation of 'EvaporationTotal' and 'EvaporationSoil' if soil evaporation was only one layer deep
#    - (drs) fixed bug in calculation of 'count.AllConcats': if makeInputForExperimentalDesign was TRUE but trowExperimentals not used, then count.AllConcats was too large
#    - (drs) fixed bug in ensembles.maker$outputs (wrong dimensions) and created a own section for do_ensembles with timing, pulling together the version for temporary files and the one with the MPI-based sql-database
#    - (drs) fixed bug in 'get.LookupSnowDensity': wrong dimension of references; added support of references for 'ExtractSkyDataFromNOAAClimateAtlas_USA'
#    - (drs) fixed bug in 'AdjRootProfile': formatting of very small numbers was incorrect while writing to soilsin file
#    - (drs) fixed bug in daily aggregation of response variables with many soil layers: dimension of weight factors was incorrect for weighted means
#    - (drs) increased precision of climate change values: changed number of digits of monthly PPT and T scalers written to weatherin from 2 to 4 (differences between input and output could be larger than 0.5%)
#    - (drs) fixed bug in how 'experimental design' is assigned to the underlaying inputs: subsetting threw error if multiple columns didn't match beside at least a matching one. (Bug in R? that data.frame[i, c(0, 0, 3, 0)]  <- 3 shows error of 'duplicate subscripts for columns')
#    - (drs) fixed bug in climate scenario creation: error due to wrong dimensions in scalors if treatments of ClimateScenario_{Temp, PPT}_PerturbationInMeanSeasonalityBothOrNone were 'None'
#    - (drs) fixed bug and numerical instability in circ_mean: wrong cycle for x = int+1 yielded 0 instead of int; rounding error for x = 366 if int = 365 yielded 366 instead of 365
#    - (drs) fixed bug in treatment option 'Vegetation_Biomass_Scaling': no scaling occured if 'Vegetation_Biomass_ScalingSeason_AllGrowingORNongrowing' was 'All'
#    - (drs) fixed bug in creation of soilsin from input datafile: writing to file of impermeability was done after rounding to 0 digits, thereby missing all content except 0 and 1
#    - (drs) aggregation options 'dailySWPextremes', 'dailyTranspirationExtremes', 'dailyTotalEvaporationExtremes', and 'dailyDrainageExtremes', 'dailyAETExtremes': adapted that the mean instead of the first of multiple extreme days is used
#    - (drs) aggregation options 'dailyTranspirationExtremes', 'dailyTotalEvaporationExtremes', 'dailyDrainageExtremes', and 'dailyAETExtremes': fixed a index bug in the header
#    - (drs) aggregation option 'RainOnSnowOfMAP': added stats::sd
#    - (drs) aggregation option 'dailySWPdrynessIntensity': added mean of the amount of dry-period missing water, as well as the duration and the number
#    - (drs) fixed bug in aggregation options 'dailySWPdrynessEventSizeDistribution' and 'dailySWPdrynessIntensity': used xx.dy.all instead of xx.dy (yearly aggregations are based on xx.dy-formatted data)
#    - (drs) generalized functions 'start10days' and 'end10days' (only used in aggregation option 'dailySWPdrynessANDwetness'), i.e., removed exlcusion of dry periods in first 90 days, generalized from fix 10 days to n days periods; replaced with 'startDoyOfDuration', 'endDoyAfterDuration'
#    - (drs) added aggregation option 'monthlySPEIEvents': duration and intensity of the standardized precipitation-evapotranspiration index at different scales
#    - (drs) renamed 'SWCtot' -> 'SWC' and 'SWCvol' -> 'VWC'
#    - (drs) updated n_variables to represent actual numbers of aggregations: apparently, in the past, when aggregation options were added, the update of n_variables was forgotten
#    - (drs) fixed bug in 'PotentialNaturalVegetation_CompositionShrubsC3C4_Paruelo1996': if all fractions set, but didn't quite sum up to 1 (due to that fractions in prodin are only written with 3 digits), then error was thrown
#    - (drs) added information on number of runs, scenarios, concatenations, ensembles, and workers to the overall timing file
#    - (drs) adjusted 'adjustLayersDepth': included a round(, 0) because the wrapper only handles 1-cm resolution of soil depths (maily because of the trco)
#    - (drs) aggregation option 'dailyWeatherEventSizeDistribution": re-formulated counts per year to frequencies per year, added mean count per year and stats::sd to output (adjusted n_variables by +4)
#    - (drs) aggregation option 'dailySWPdrynessEventSizeDistribution": re-formulated counts per year to frequencies per year, added mean count per year and stats::sd to output (adjusted n_variables by +4 per icrit)
#    - (drs) added action 'ensemble': as separate from concatenation
#    - (drs) re-organized 'ensemble's:
#          - ensemble levels are now taken by ranks instead of quantiles (type 3, even order statistics); they are taken individually for each variable = column
#          - ensembles are taken directly for all response variables except for stats::sd which are the ones associated at the scenario-level with the selected ensemble variable
#          - i.e., all output must be organized in two files: one for the means, etc., and one file for the SDs organized in the same columns --> overall aggregated output is now split up in two files like the daily aggregations already were
#          - option 'save.scenario.ranks' if TRUE then for each ensemble.levels an additional file is generated with the scenario numbers corresponding to the ensemble.levels
#    - (drs) re-arranged overall aggregated output: it is now split up in two files (means and others, optional SDs), like the daily aggregations already were
#          - Column names always consist of three parts:
#            1. variable name: sortable,
#              i.e., most common name parts go first,
#              if monthly then at the end m1 ... m12, separated by "."
#              if for top and bottom layers, then ".topLayers", ".bottomLayers"
#            2. units: "_mm", "_fraction", "_TF" (true or false), etc. if unitless then "_none"
#            3. type of aggregation: "_mean", "_sd", "_const", "_median", "_quantilePROBS", etc.
#    - (drs) fixed bug in some of the aggregations if bottomL == 0
#    - (drs) fixed bug in create 'control transpiration regions for adjusted soil depth and rooting depth' if number of soil layers == 1
#    - (drs) fixed bug in some of the aggregations if topL == 1
#    - (drs) changed functions 'circ.' {mean, range, stats::sd}: if all elements of x are NA and na.rm == TRUE, then output was 'numeric(0)' which caused aggregated output variables from numeric vectors into a list; changed that these functions now put out NA instead of 'numeric(0)'
#    - (drs) faster version of 'ExtractGriddedDailyWeatherFromMaurer2002_NorthAmerica'
#    - (drs) deleted empty line(s) in soilsin -> r wrapper hangs with empty last line
#    - (drs) added option 'print.debug' to print statements about code advancement (may be useful for debugging)
#    - (drs) fixed bug in 'dailyRegeneration_byTempSWPSnow': if only one soil layer, then variable 'swp' needs to be forced to be a matrix
#    - (drs) fixed bug in 'SiteClimate': MAT was taken as mean of monthly values which biased the result; instead the function is now correctly taking the mean of the mean annual temperature values
#    - (drs) added code that reports if SOILWAT2 execution or 'ExtractGriddedDailyWeatherFromMaurer2002_NorthAmerica' failed
#    - (drs) fixed bug in 'PotentialNaturalVegetation_CompositionShrubsC3C4_Paruelo1996': if no vegetation was estimated for the components to be estimated and the fixed ones summed up to 0, then the result contained NA: now, vegetation is forced > 0
#    - (drs) fixed bug in section of concatenation: if actions included concatenation and ensemble, but not aggregation, then no concatenation occurred mistakenly
#    - (drs) added option 'concatenation.inMemory': #concatenation.inMemory: all temp output is loaded into a giant data.frame, then written to final file; if !concatenation.inMemory, temp output is read and immediately appended to final file
#    - (drs) renamed function 'collect_ResultsWithTemporaryDataFrame' to 'concatenate_TemporaryResultFiles'
#    - (drs) added two functions 'concatenate_TemporaryResultFiles': one to concatenate in memory (as before), one to concatenate via append (new); these are selected via option 'concatenation.inMemory'
#    - (drs) fixed bug in 'concatenate_TemporaryResultFiles': if cleanup and concatenation resumed, then already final files deleted
#  v50 (20130801-20130909)
#    - (rjm) fixed database creation so page size is max and using REAL instead of double.
#    - (rjm) Each node writes out a sql file with all the data from run. This is loaded into the dbTables.db database after runs are finished
#    - (rjm) Ensembles via database now works in with Rmpi. A database for each scenario table is generated with its ensembles.
#    - (rjm) The weather is loaded from a database by the master node and gives it to worker node via memory OR it can look it up in files
#    - (rjm) The soilwat setup files are loaded in memory via Rsoilwat structures objects and passed to the workers who then work with that instead of files.
#    - (rjm) daily values are sorted to match database.
#    - (rjm) fixed SWA output only outputing for one crit
#    - (drs) 'deleteSoilWatFolderAfterAggregation' set to FALSE: requests SOILWAT2 input/output to be stored on disk
#    - (drs) 'yearlyWaterBalanceFluxes': fixed column names (percolation and hydred were switched)
#    - (drs) experimental design can replace information from the prodin datafile
#    - (drs) added overall aggregation option 'dailyFrostInSnowfreePeriod'
#    - (drs) fixed temporary db output if create_treatments == "Exclude_ClimateAmbient": superfluous end of line and header was factor values instead of characters
#    - (drs) fixed adding treatment/experimental information to siteparam: flags were vectors of 0/1 and thus not suitable to subset
#    - (drs) added overall aggregation option 'input_TranspirationCoeff'
#    - (drs) fixed TranspCoeffByVegType(): read in .csv table in two pieces: tr_input_TranspCoeff and tr_input_TranspCoeff_Code, so that tr_input_TranspCoeff is numeric and there can be no errors translating levels into numbers
#    - (drs) fixed max.duration(): circumvented that 'no non-missing arguments to max'
#    - (drs) fixed aggregation of 'input_TranspirationCoeff': if there is only one aggLs or no bottomL
#    - (drs) fixed get_Response_aggL(): transp and hydred output is for each soil layer for total and 3 vegtypes; # soil layer calculation was incorrect
#    - (drs) added to aggregation 'input_FractionVegetationComposition': C3, C4, and annual-grass fractions
#    - (drs) fixed create:soils: comparison of soil layer structure
#    - (drs) adjusted wrapper function circ_sd() because circular::sd.circular() can return NaN instead of 0 [in packageVersion("circular") <= "0.4.7"]
#    - (drs) scale TranspCoeffByVegType() to 1 as SOILWAT2 does: co/sum(co)
#    - (drs) added output option 'input_SoilProfile'
#    - (drs) added 'adjustType' option to function TranspCoeffByVegType(): with 'positive' as recommended for grasses (built-in Soilwat) and 'inverse' as recommended for woody plants and forbs
#    - (drs) added output options 'dailyRechargeExtremes' and 'dailyHotDays'; added option for multiple Tmin values in 'dailyFrostInSnowfreePeriod'
#    - (drs) added output options 'dailySuitablePeriodsDuration', 'dailySuitablePeriodsAvailableWater', and 'dailySuitablePeriodsDrySpells'
#    - (drs) fixed 'startDoyOfDuration' and 'endDoyOfDuration': picked first/last period of length of first suitable period instead of picking first/last suitable period of length of first/last suitable period
Burke-Lauenroth-Lab/SoilWat_R_Wrapper documentation built on Aug. 14, 2020, 5:17 p.m.