AggInputs: R6 Class for storing inputs to the 'AggDat' class

AggInputs    R Documentation

R6 Class for storing inputs to the 'AggDat' class

Description

R6 class for storing the information needed by the 'AggDat' class, which executes the aggregation methods. This includes the field and year to aggregate data for, the locations to aggregate data to, the response variable to use, the experimental variable, the length of year to gather data for, etc.

Inputs can be supplied directly to this class during instantiation; however, this is NOT recommended except for advanced users. It is recommended that the user supply the database connection and use the interactive selection methods to select the inputs.

This class is passed to the 'AggDat' class that executes the methods for aggregating data and storing in the database. Most methods are executed in the database.

Public fields

dbCon

Database connection object connected to an OFPE formatted database, see DBCon class.

boundary_import

Yes/No, will user be uploading their own field boundary? Used for spatially querying the database for intersecting data.

boundary_location

Only relevant if boundary_import == "Yes". Is the location of the shapefile containing the field boundary to use for spatial queries.

fieldname

Name of the field to aggregate data for. Selected from the 'all_farms.fields' table of an OFPE formatted database.

farmername

Name of the farmer that owns the selected field.

respvar

Response variable to aggregate data for, select/input 'Yield', 'Protein', 'Satellite'. 'Satellite' data aggregates only remotely sensed data and does not include any on-farm collected data. This option is important because the user needs to aggregate 'Satellite' data for years that they would like to simulate management outcomes in during the analysis and simulation step of the OFPE data cycle.

expvar

Experimental variable to aggregate data for. Select or supply 'As-Applied Nitrogen' or 'As-Applied Seed Rate'. This is the type of input that was experimentally varied across the field as part of the on-farm experimentation.

cy_resp

The year of interest of the selected response variable. This is considered the 'current year' (CY), and differs from the 'previous year' (PY), which is the latest year before the CY during which the field was cropped. This is separate from the cy_exp variable because in some instances the applied data were categorized as, or applied in, the year prior (e.g. winter wheat (WW) seeding occurs in the fall of the year before harvest the following August).

py_resp

The year prior to the selected CY in which a crop was harvested in the specified field. This is the latest year before the CY during which the field was cropped. If you do not have data from any previous year, you can provide a year for the sake of labels and annotations in output figures.

cy_exp

The year of interest of the selected experimental variable. This is the year that the experimental variable was applied to grow the crop in the selected 'cy_resp' year. This is separate from the cy_resp variable because in some instances the applied data were categorized as, or applied in, the year prior (e.g. winter wheat (WW) seeding occurs in the fall of the year before harvest the following August).

py_exp

The application year of the experimental variable that was used to grow the crop in the 'py_resp' year. If you do not have data from any previous year, you can provide a year for the sake of labels and annotations in output figures.

GRID

Determines the location of the aggregated data. Either select 'Grid' or 'Observed'. 'Grid' aggregates data to the centroids of a 10m grid laid across the field, while 'Observed' aggregates data to the locations of the observed response variable (yield or protein). Note that when 'Satellite' is selected there are no observed points to use, so the 'Grid' option is selected by default.

dat_used

Option for the length of year to use for data. Either select 'Decision Point' or 'Full Year'. In winter wheat conventional systems and organic spring wheat systems, there is a decision point around March 30th at which farmers must make decisions on their fertilizer or seeding rates, respectively. When making these decisions, farmers will only have data up to that decision point. This option determines the covariate data aggregated for the CY, where 'Decision Point' aggregates data for the CY from January 1st to March 30th of that year, while 'Full Year' aggregates data for the CY from January 1st to December 31st. This is a residual special use case for Paul Hegedus' 2020 ICPA work.
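As a rough illustration of the two options, the sketch below computes the CY date window that each one implies. The helper name covariate_window is hypothetical and is not part of OFPE; the March 30th cutoff is taken from the description above.

```r
# Hypothetical helper (not part of OFPE): date window of CY covariate data
# implied by each 'dat_used' option, using the March 30th decision point
# stated in the field description.
covariate_window <- function(year, dat_used = c("Decision Point", "Full Year")) {
  dat_used <- match.arg(dat_used)
  start <- as.Date(sprintf("%d-01-01", year))
  end <- if (dat_used == "Decision Point") {
    as.Date(sprintf("%d-03-30", year))
  } else {
    as.Date(sprintf("%d-12-31", year))
  }
  list(start = start, end = end)
}

decision_window <- covariate_window(2020, "Decision Point")
full_window <- covariate_window(2020, "Full Year")
print(decision_window)
print(full_window)
```

Using match.arg() means that any value other than the two documented options raises an error rather than silently falling through to the full year.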

cy_resp_files

Vector of names. Based on the user's selected field and year, the database can be queried for the original file names of the data uploaded to the database. The user selects the file name(s) corresponding to the response variable data in the specified field and year, in this case the 'cy_resp' year. Multiple files can be selected because several files may correspond to the data in the given year and field (e.g. if harvest took multiple days).

py_resp_files

Vector of names. Based on the user's selected field and year, the database can be queried for the original file names of the data uploaded to the database. The user selects the file name(s) corresponding to the response variable data in the specified field and year, in this case the 'py_resp' year. Multiple files can be selected because several files may correspond to the data in the given year and field (e.g. if harvest took multiple days). If you do not have data available from a previous year and are not using the interactive input selectors, simply input "None"; when using the interactive methods this is handled for you.

cy_exp_files

Data.frame with 2 columns, 'orig_file' and 'table'. Based on the user's selected field and year, the database can be queried for the original file names of the data uploaded to the database. The user selects the file name(s) corresponding to the experimental variable data in the specified field and year, in this case the 'cy_exp' year. Multiple files can be selected because several files may correspond to the data in the given year and field (e.g. if application took multiple days). The 'orig_file' column contains the original name of the file uploaded to the database with the raw data, and 'table' contains the table within the farmer schema in which the data is housed. The table is needed because some as-applied experimental data are point vectors and some are polygons.

py_exp_files

Data.frame with 2 columns, 'orig_file' and 'table'. Based on the user's selected field and year, the database can be queried for the original file names of the data uploaded to the database. The user selects the file name(s) corresponding to the experimental variable data in the specified field and year, in this case the 'py_exp' year. Multiple files can be selected because several files may correspond to the data in the given year and field (e.g. if application took multiple days). The 'orig_file' column contains the original name of the file uploaded to the database with the raw data, and 'table' contains the table within the farmer schema in which the data is housed. The table is needed because some as-applied experimental data are point vectors and some are polygons. If no data from the desired year is available, put "None" in the 'orig_file' column.
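For users who supply the file selections directly instead of using the interactive methods, the following sketch shows one way the four '*_files' inputs could be laid out. All file names and table names are made up for illustration, and the 'table' value used alongside "None" is an assumption, since the documentation does not say what it should hold in that case.

```r
# Hypothetical file and table names, for illustration only.
cy_exp_files <- data.frame(
  orig_file = c("field1_N_2020_day1.shp", "field1_N_2020_day2.shp"),
  table = c("aa_n_pnts", "aa_n_pnts"),
  stringsAsFactors = FALSE
)

# No previous-year application data: "None" goes in 'orig_file'.
# The NA in 'table' is an assumption, not documented behavior.
py_exp_files <- data.frame(
  orig_file = "None",
  table = NA_character_,
  stringsAsFactors = FALSE
)

# Response files are plain character vectors of original file names.
cy_resp_files <- c("field1_yield_2020_day1.shp", "field1_yield_2020_day2.shp")
py_resp_files <- "None"

print(cy_exp_files)
print(py_exp_files)
```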

save_in_db

Yes/No. Logical, whether to save the aggregated data into the OFPE formatted database. Not an option if you are supplying your own boundary for which to aggregate data.

export

Yes/No. Logical, whether to export the aggregated data as a '.csv' file. If yes, the user will need to provide the 'export_name'.

export_name

If exporting the aggregated data as a '.csv', the user needs to specify the name of the file to export. This includes the file path.

cy_resp_col

Data.frame with 3 columns, 'resp', 'dist', and 'orig_file'. It identifies the column containing data for the selected response variable ('resp') and the column (if any) that corresponds to the distance between observations ('dist'). The 'dist' selection is optional (can be NA). When supplied, it is used during a cleaning step in which points are removed if they are more than 4 SD from the mean distance between observations, as this indicates that the equipment was moving at an abnormal speed, potentially resulting in erroneous measurements. The 'orig_file' values are the same as those selected in the 'cy_resp_files' vector.

py_resp_col

Data.frame with 3 columns, 'resp', 'dist', and 'orig_file'. It identifies the column containing data for the selected response variable ('resp') and the column (if any) that corresponds to the distance between observations ('dist'). The 'dist' selection is optional. When supplied, it is used during a cleaning step in which points are removed if they are more than 4 SD from the mean distance between observations, as this indicates that the equipment was moving at an abnormal speed, potentially resulting in erroneous measurements. The 'orig_file' values are the same as those selected in the 'py_resp_files' vector.
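The 4 SD distance filter described for 'cy_resp_col' and 'py_resp_col' can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the cleaning code used by 'AggDat': the function name remove_dist_outliers, the column names, and the data are all hypothetical, and dropping rows with a missing distance is a choice made here for simplicity.

```r
# Hypothetical helper (not OFPE's implementation): drop rows whose distance
# value lies more than n_sd standard deviations from the mean distance.
remove_dist_outliers <- function(dat, dist_col, n_sd = 4) {
  # 'dist' is optional; with no distance column the data are returned as-is.
  if (is.null(dist_col) || is.na(dist_col)) {
    return(dat)
  }
  d <- dat[[dist_col]]
  keep <- abs(d - mean(d, na.rm = TRUE)) <= n_sd * sd(d, na.rm = TRUE)
  # Rows with a missing distance are dropped as well (a choice made here).
  dat[!is.na(keep) & keep, , drop = FALSE]
}

# Made-up yield points: 30 evenly spaced and one logged after a 100 m jump.
yld <- data.frame(
  yield = c(rep(60, 30), 5),
  distance = c(rep(2, 30), 100)
)
cleaned <- remove_dist_outliers(yld, "distance")
print(nrow(cleaned))

# With dist = NA (no distance column selected) nothing is removed.
untouched <- remove_dist_outliers(yld, NA)
print(nrow(untouched))
```

Note that the mean and standard deviation are computed with the outlier included, so a single extreme point in a very small file may inflate the SD enough to survive the filter.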

cy_exp_col

Data.frame with 4 columns, 'EXP', 'dist', 'product', and 'orig_file'. It identifies the column containing data for the selected experimental variable ('EXP') and the column (if any) that corresponds to the distance between observations ('dist'). The 'dist' selection is optional (can be NA). When supplied, it is used during a cleaning step in which points are removed if they are more than 4 SD from the mean distance between observations, as this indicates that the equipment was moving at an abnormal speed, potentially resulting in erroneous measurements. The 'product' column is also optional and not relevant for seeding rate data; it is used if there is a column corresponding to the product applied. When left blank, it is assumed that the 'EXP' column contains data in lbs/acre and the user is not given an option to provide a conversion rate. It is good practice to always select a column here (even if there is no specific 'product' column) so that a conversion factor is stated explicitly. The 'orig_file' values are the same as those selected in the 'cy_exp_files' table.

py_exp_col

Data.frame with 4 columns, 'EXP', 'dist', 'product', and 'orig_file'. It identifies the column containing data for the selected experimental variable ('EXP') and the column (if any) that corresponds to the distance between observations ('dist'). The 'dist' selection is optional (can be NA). When supplied, it is used during a cleaning step in which points are removed if they are more than 4 SD from the mean distance between observations, as this indicates that the equipment was moving at an abnormal speed, potentially resulting in erroneous measurements. The 'product' column is also optional and not relevant for seeding rate data; it is used if there is a column corresponding to the product applied. When left blank, it is assumed that the 'EXP' column contains data in lbs/acre and the user is not given an option to provide a conversion rate. It is good practice to always select a column here (even if there is no specific 'product' column) so that a conversion factor is stated explicitly. The 'orig_file' values are the same as those selected in the 'py_exp_files' table.

cy_exp_conv

Data.frame with 3 columns, 'FORMULA', 'conversion', and 'orig_file'. Based on the user's selection of 'product' in the cy_exp_col data.frame, the formula from the 'product' column is extracted and used to ask the user for the desired conversion factor from the product applied to lbs per acre. Again, this is only really applicable for fertilizer rates unless seeding rates are reported in units besides lbs per acre. The 'orig_file' values are the same as those selected in the 'cy_exp_files' table.

py_exp_conv

Data.frame with 3 columns, 'FORMULA', 'conversion', and 'orig_file'. Based on the user's selection of 'product' in the py_exp_col data.frame, the formula from the 'product' column is extracted and used to ask the user for the desired conversion factor from the product applied to lbs per acre. Again, this is only really applicable for fertilizer rates unless seeding rates are reported in units besides lbs per acre. The 'orig_file' values are the same as those selected in the 'py_exp_files' table.
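To make the relationship between the '*_exp_col' and '*_exp_conv' inputs concrete, here is a small sketch that converts as-applied product rates to lbs of nitrogen per acre. The column names, file name, and data are hypothetical, and applying the factor by simple multiplication, matched on the product formula, is an assumption about how the conversion is used. The 0.46 factor reflects urea being 46% nitrogen by weight.

```r
# Hypothetical column, product, and file names, for illustration only.
cy_exp_col <- data.frame(
  EXP = "rate", dist = NA_character_, product = "product",
  orig_file = "field1_N_2020.shp", stringsAsFactors = FALSE
)
cy_exp_conv <- data.frame(
  FORMULA = "UREA", conversion = 0.46,
  orig_file = "field1_N_2020.shp", stringsAsFactors = FALSE
)

# Made-up as-applied records in lbs of product per acre.
applied <- data.frame(
  rate = c(100, 150, 200),
  product = "UREA",
  stringsAsFactors = FALSE
)

# Assumption: the factor is applied by multiplication, matched on product.
idx <- match(applied[[cy_exp_col$product]], cy_exp_conv$FORMULA)
applied$n_lbs_ac <- applied[[cy_exp_col$EXP]] * cy_exp_conv$conversion[idx]
print(applied)
```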

size

Optional, the size, in meters, of the grid cells laid across the field. Mostly necessary when 'Grid' is selected for the 'GRID' parameter; however, it is also the scale at which the finest resolution cleaning of data occurs. Defaults to 10m if left NULL.
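The 'Grid' option aggregates data to the centroids of grid cells of the given size. The sketch below generates such centroids for a rectangular bounding box in base R. It is only an illustration: the function grid_centroids is hypothetical, it assumes projected coordinates in meters, and the package itself performs spatial operations in the database rather than in R.

```r
# Hypothetical helper (not part of OFPE): centroids of square grid cells
# covering a bounding box given in projected coordinates (meters).
grid_centroids <- function(xmin, ymin, xmax, ymax, size = 10) {
  if (is.null(size)) size <- 10  # documented fallback when size is NULL
  # A partial cell at the upper edge is kept only if its centroid
  # falls inside the box.
  xs <- seq(xmin + size / 2, xmax, by = size)
  ys <- seq(ymin + size / 2, ymax, by = size)
  expand.grid(x = xs, y = ys)
}

# A 30 m by 20 m box with the default 10 m cells.
cells <- grid_centroids(0, 0, 30, 20)
print(cells)
```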

Methods

Public methods


Method new()

Initialize an object for storing aggregation inputs.

Usage
AggInputs$new(
  dbCon,
  boundary_import = "No",
  boundary_location = NULL,
  fieldname = NULL,
  farmername = NULL,
  respvar = NULL,
  expvar = NULL,
  cy_resp = NULL,
  py_resp = NULL,
  cy_exp = NULL,
  py_exp = NULL,
  GRID = NULL,
  dat_used = NULL,
  cy_resp_files = NULL,
  py_resp_files = NULL,
  cy_exp_files = NULL,
  py_exp_files = NULL,
  save_in_db = NULL,
  export = NULL,
  export_name = NULL,
  cy_resp_col = NULL,
  py_resp_col = NULL,
  cy_exp_col = NULL,
  py_exp_col = NULL,
  cy_exp_conv = NULL,
  py_exp_conv = NULL,
  size = 10
)
Arguments
dbCon

Database connection object connected to an OFPE formatted database, see DBCon class.

boundary_import

Yes/No, will user be uploading their own field boundary? Used for spatially querying the database for intersecting data.

boundary_location

Only relevant if boundary_import == "Yes". Is the location of the shapefile containing the field boundary to use for spatial queries.

fieldname

Name of the field to aggregate data for. Selected from the 'all_farms.fields' table of an OFPE formatted database.

farmername

Name of the farmer that owns the selected field.

respvar

Response variable to aggregate data for, select/input 'Yield', 'Protein', 'Satellite'. 'Satellite' data aggregates only remotely sensed data and does not include any on-farm collected data.

expvar

Experimental variable to aggregate data for, select/input 'As-Applied Nitrogen' or 'As-Applied Seed Rate'. This is the type of input that was experimentally varied across the field as part of the on-farm experimentation.

cy_resp

The year of interest of the selected response variable. This is considered the 'current year' (CY), and differs from the 'previous year' (PY), which is the latest year before the CY during which the field was cropped. This is separate from the cy_exp variable because in some instances the applied data were categorized as, or applied in, the year prior (e.g. winter wheat (WW) seeding occurs in the fall of the year before harvest the following August).

py_resp

The year prior to the selected CY in which a crop was harvested in the specified field. This is the latest year before the CY during which the field was cropped. If you do not have data from any previous year, you can provide a year for the sake of labels and annotations in output figures.

cy_exp

The year of interest of the selected experimental variable. This is the year that the experimental variable was applied to grow the crop in the selected 'cy_resp' year. This is separate from the cy_resp variable because in some instances the applied data were categorized as, or applied in, the year prior (e.g. winter wheat (WW) seeding occurs in the fall of the year before harvest the following August).

py_exp

The application year of the experimental variable that was used to grow the crop in the 'py_resp' year. If you do not have data from any previous year, you can provide a year for the sake of labels and annotations in output figures.

GRID

Determines the location of the aggregated data. Either select 'Grid' or 'Observed'. 'Grid' aggregates data to the centroids of a 10m grid laid across the field, while 'Observed' aggregates data to the locations of the observed response variable (yield or protein). Note that when 'Satellite' is selected there are no observed points to use, so the 'Grid' option is selected by default.

dat_used

Option for the length of year to use for CY data. Either select 'Decision Point' or 'Full Year'. In winter wheat conventional systems and organic spring wheat systems, there is a decision point around March 30th at which farmers must make decisions on their fertilizer or seeding rates, respectively. When making these decisions, farmers will only have data up to that decision point. This option determines the covariate data aggregated for the CY, where 'Decision Point' aggregates data for the CY from January 1st to March 30th of that year, while 'Full Year' aggregates data for the CY from January 1st to December 31st. This is a residual special use case for Paul Hegedus' 2020 ICPA work.

cy_resp_files

Vector of names. Based on the user's selected field and year, the database can be queried for the original file names of the data uploaded to the database. The user selects the file name(s) corresponding to the response variable data in the specified field and year, in this case the 'cy_resp' year. Multiple files can be selected because several files may correspond to the data in the given year and field (e.g. if harvest took multiple days).

py_resp_files

Vector of names. Based on the user's selected field and year, the database can be queried for the original file names of the data uploaded to the database. The user selects the file name(s) corresponding to the response variable data in the specified field and year, in this case the 'py_resp' year. Multiple files can be selected because several files may correspond to the data in the given year and field (e.g. if harvest took multiple days). If you do not have data available from a previous year and are not using the interactive input selectors, simply input "None"; when using the interactive methods this is handled for you.

cy_exp_files

Data.frame with 2 columns, 'orig_file' and 'table'. Based on the user's selected field and year, the database can be queried for the original file names of the data uploaded to the database. The user selects the file name(s) corresponding to the experimental variable data in the specified field and year, in this case the 'cy_exp' year. Multiple files can be selected because several files may correspond to the data in the given year and field (e.g. if application took multiple days). The 'orig_file' column contains the original name of the file uploaded to the database with the raw data, and 'table' contains the table within the farmer schema in which the data is housed. The table is needed because some as-applied experimental data are point vectors and some are polygons.

py_exp_files

Data.frame with 2 columns, 'orig_file' and 'table'. Based on the user's selected field and year, the database can be queried for the original file names of the data uploaded to the database. The user selects the file name(s) corresponding to the experimental variable data in the specified field and year, in this case the 'py_exp' year. Multiple files can be selected because several files may correspond to the data in the given year and field (e.g. if application took multiple days). The 'orig_file' column contains the original name of the file uploaded to the database with the raw data, and 'table' contains the table within the farmer schema in which the data is housed. The table is needed because some as-applied experimental data are point vectors and some are polygons. If no data from the desired year is available, put "None" in the 'orig_file' column.

save_in_db

Yes/No. Logical, whether to save the aggregated data into the OFPE formatted database. Not an option if you are supplying your own boundary for which to aggregate data.

export

Yes/No. Logical, whether to export the aggregated data as a '.csv' file. If yes, the user will need to provide the 'export_name'.

export_name

If exporting the aggregated data as a '.csv', the user needs to specify the name of the file to export. This includes the file path.

cy_resp_col

Data.frame with 3 columns, 'resp', 'dist', and 'orig_file'. It identifies the column containing data for the selected response variable ('resp') and the column (if any) that corresponds to the distance between observations ('dist'). The 'dist' selection is optional (can be NA). When supplied, it is used during a cleaning step in which points are removed if they are more than 4 SD from the mean distance between observations, as this indicates that the equipment was moving at an abnormal speed, potentially resulting in erroneous measurements. The 'orig_file' values are the same as those selected in the 'cy_resp_files' vector.

py_resp_col

Data.frame with 3 columns, 'resp', 'dist', and 'orig_file'. It identifies the column containing data for the selected response variable ('resp') and the column (if any) that corresponds to the distance between observations ('dist'). The 'dist' selection is optional. When supplied, it is used during a cleaning step in which points are removed if they are more than 4 SD from the mean distance between observations, as this indicates that the equipment was moving at an abnormal speed, potentially resulting in erroneous measurements. The 'orig_file' values are the same as those selected in the 'py_resp_files' vector.

cy_exp_col

Data.frame with 4 columns, 'EXP', 'dist', 'product', and 'orig_file'. It identifies the column containing data for the selected experimental variable ('EXP') and the column (if any) that corresponds to the distance between observations ('dist'). The 'dist' selection is optional (can be NA). When supplied, it is used during a cleaning step in which points are removed if they are more than 4 SD from the mean distance between observations, as this indicates that the equipment was moving at an abnormal speed, potentially resulting in erroneous measurements. The 'product' column is also optional and not relevant for seeding rate data; it is used if there is a column corresponding to the product applied. When left blank, it is assumed that the 'EXP' column contains data in lbs/acre and the user is not given an option to provide a conversion rate. It is good practice to always select a column here (even if there is no specific 'product' column) so that a conversion factor is stated explicitly. The 'orig_file' values are the same as those selected in the 'cy_exp_files' table.

py_exp_col

Data.frame with 4 columns, 'EXP', 'dist', 'product', and 'orig_file'. It identifies the column containing data for the selected experimental variable ('EXP') and the column (if any) that corresponds to the distance between observations ('dist'). The 'dist' selection is optional (can be NA). When supplied, it is used during a cleaning step in which points are removed if they are more than 4 SD from the mean distance between observations, as this indicates that the equipment was moving at an abnormal speed, potentially resulting in erroneous measurements. The 'product' column is also optional and not relevant for seeding rate data; it is used if there is a column corresponding to the product applied. When left blank, it is assumed that the 'EXP' column contains data in lbs/acre and the user is not given an option to provide a conversion rate. It is good practice to always select a column here (even if there is no specific 'product' column) so that a conversion factor is stated explicitly. The 'orig_file' values are the same as those selected in the 'py_exp_files' table.

cy_exp_conv

Data.frame with 3 columns, 'FORMULA', 'conversion', and 'orig_file'. Based on the user's selection of 'product' in the cy_exp_col data.frame, the formula from the 'product' column is extracted and used to ask the user for the desired conversion factor from the product applied to lbs per acre. Again, this is only really applicable for fertilizer rates unless seeding rates are reported in units besides lbs per acre. The 'orig_file' values are the same as those selected in the 'cy_exp_files' table.

py_exp_conv

Data.frame with 3 columns, 'FORMULA', 'conversion', and 'orig_file'. Based on the user's selection of 'product' in the py_exp_col data.frame, the formula from the 'product' column is extracted and used to ask the user for the desired conversion factor from the product applied to lbs per acre. Again, this is only really applicable for fertilizer rates unless seeding rates are reported in units besides lbs per acre. The 'orig_file' values are the same as those selected in the 'py_exp_files' table.

size

Optional, the size, in meters, of the grid cells laid across the field. Mostly necessary when 'Grid' is selected for the 'GRID' parameter; however, it is also the scale at which the finest resolution cleaning of data occurs. Defaults to 10m if left NULL.

Returns

A new 'AggInputs' object.
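Because supplying inputs directly bypasses the interactive selectors, it can help to check a set of inputs against the constraints stated in the argument descriptions before passing them to AggInputs$new(). The following is a minimal sketch of such a check. The function check_agg_inputs is hypothetical and not part of OFPE, and it covers only the rules spelled out in this documentation.

```r
# Hypothetical pre-flight check (not part of OFPE) built from the argument
# descriptions above. Returns a character vector of problems, empty if none.
check_agg_inputs <- function(x) {
  problems <- character(0)
  allowed <- list(
    boundary_import = c("Yes", "No"),
    respvar = c("Yield", "Protein", "Satellite"),
    expvar = c("As-Applied Nitrogen", "As-Applied Seed Rate"),
    GRID = c("Grid", "Observed"),
    dat_used = c("Decision Point", "Full Year")
  )
  for (nm in names(allowed)) {
    if (!isTRUE(x[[nm]] %in% allowed[[nm]])) {
      problems <- c(problems, sprintf(
        "'%s' must be one of: %s", nm, paste(allowed[[nm]], collapse = ", ")
      ))
    }
  }
  # 'Satellite' has no observed points, so only 'Grid' makes sense.
  if (identical(x$respvar, "Satellite") && !identical(x$GRID, "Grid")) {
    problems <- c(problems, "'Satellite' data can only be aggregated to 'Grid'")
  }
  # Saving in the database is not an option with a user-supplied boundary.
  if (identical(x$boundary_import, "Yes") && identical(x$save_in_db, "Yes")) {
    problems <- c(problems, "cannot save in the database with a user-supplied boundary")
  }
  # Exporting a '.csv' requires a file name (including the path).
  if (identical(x$export, "Yes") && is.null(x$export_name)) {
    problems <- c(problems, "'export_name' is required when export is 'Yes'")
  }
  problems
}

inputs <- list(
  boundary_import = "No", respvar = "Yield", expvar = "As-Applied Nitrogen",
  GRID = "Grid", dat_used = "Full Year", save_in_db = "Yes", export = "No"
)
ok_problems <- check_agg_inputs(inputs)

bad <- inputs
bad$respvar <- "Satellite"
bad$GRID <- "Observed"
bad$export <- "Yes"
bad_problems <- check_agg_inputs(bad)
print(bad_problems)
```

Returning all problems at once, rather than stopping at the first, lets the user correct every input in a single pass.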


Method selectInputs()

Interactive method for selecting aggregation input options. The description below describes the process of interactively selecting the necessary parameters for automated data aggregation.

The user selects whether to import a bounding box or to select a field from the database. If the user uploads their own field boundary, they are asked for the field name and to identify the associated farmer. If using a field boundary from the database, the user simply selects which field.

Select the response variable to aggregate data for (yield or protein). Using this information, the database is queried for the years in which data are available for this field. If the user chose to collect only satellite data, they can choose any year from 2000 to the present, provided that data from that year have been gathered in the database. This 'Satellite' data is used in the analysis and simulation step to simulate management outcomes under a selection of the years for which the user aggregated 'Satellite' data. Also select the experimental variable to aggregate data for (as-applied nitrogen or as-applied seed rate).

Select a data constraint for determining the time span for which to gather data. If 'Decision Point' is selected, then data from the current year is gathered up until 03-31. If 'Full Year' is selected, then data from the current year is gathered past the decision point and harvest through the entire year to 12-31 of the current selected year.

Select the variable that was experimentally varied across the field. This package was developed with options for "As-Applied Nitrogen" and "As-Applied Seed Rate".

User needs to select whether to aggregate data to a grid or to use observed locations. This is required now because if using observed locations the user can only select one file, whereas with the grid option the user can select multiple files because data will be averaged to the grid cell centroid locations. This only applies to yield or protein data because by default satellite data is aggregated to the grid cell locations.

Select the current year and previous year(s) to aggregate data for. If the user imports their own field boundary, it needs to be added to a temporary folder so that PostGIS functions can be run within the database rather than in R.

Using the selected experimental variable the database is queried for years that data is available for this field. Select current year and previous year(s) to get experimental data for.

Get column names from the current year response variable table to identify the columns that correspond to the response and to the distance between observations. The distance column is used to clean the data and will be removed eventually. It is typically not present in protein data and can be omitted, as it is an optional argument to the cleaning function. If no response files for the current year are selected, this section is skipped.

Also, get column names from the previous year response variable table to identify the columns that correspond to the response and to the distance between observations. The distance column is used to clean the data and will be removed eventually. It is typically not present in protein data and can be omitted, as it is an optional argument to the cleaning function. If no previous year response variable is selected, this section is skipped.

Get column names from the current year experimental data to identify the column that corresponds to the experimental variable. Get column names from the previous year experimental data to identify the column that corresponds to the experimental variable.

Fill in parameters for export, such as whether to save in the database or to export as a .csv file. If the user imported their own field boundary for aggregating data, it will not be saved in the database.

Usage
AggInputs$selectInputs()
Arguments
None

No arguments are needed because the inputs are passed in during class instantiation.

Returns

A completed 'AggInputs' object.


Method clone()

The objects of this class are cloneable with this method.

Usage
AggInputs$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

DBCon for the database connection class, AggDat for the class responsible for aggregating on-farm data, AggGEE for the class responsible for aggregating Google Earth Engine data.


paulhegedus/OFPE documentation built on Nov. 23, 2022, 5:09 a.m.