DatClass: R6 Class for storing inputs and data for analysis and...
In paulhegedus/OFPE: On-Farm Precision Experiments (OFPE) Data Management and Analysis Tools

DatClass

R Documentation

R6 Class for storing inputs and data for analysis and simulations

Description

R6 Class for storing user specified inputs and processing data for the analysis/simulation and Rx building steps of the OFPE data cycle. This object includes user selections such as the field and year of data to export from an OFPE database and the type of data (grid or observed) for analysis and simulation/prescription generation.

Inputs can be supplied directly to this class during instantiation, however this is NOT recommended except for advanced users. It is recommended that the user supplies the database connection and uses the interactive selection methods to select user inputs.

This class stores inputs from the user and has the methods for for exporting data from the database and processing the data for analysis, simulation, and prescription building.

Public fields

dbCon: Database connection object connected to an OFPE formatted database, see DBCon class.
farmername: Name of the farmer that owns the selected field.
fieldname: Name of the field for analysis. Selected from the 'all_farms.fields' table of an OFPE formatted database.
respvar: Response variable(s) to optimize experimental inputs based off of. The user can select 'Yield' and/or 'Protein' based on data availability. User must select at least 'Yield'.
expvar: Experimental variable to optimize, select/input 'As-Applied Nitrogen' or 'As-Applied Seed Rate'. This is the type of input that was experimentally varied across the field as part of the on-farm experimentation.
sys_type: Provide the type of system used in the experiment. This determines the price used for calculating net-return and for the net-return of the opposite type. Select from "Conventional" and "Organic". The net-returns will be calculated with the corresponding economic data based on this choice, and the 'NRopp' management scenario (see SimClass$executeSim) will be based on the opposite (e.g. if you are growing conventional wheat, the management outcome 'NRopp' shows the net-return calculated from organically grown wheat). In the example, organic prices are calculated from 0 N fertilizer rates, however with seeding rates it is purely the difference in the price received used to calculate net-return.
yldyears: The year(s) of interest for the yield response variables in the selected field. This must be a named list with the specified field names.
proyears: The year(s) of interest for the protein response variables in the selected field. This must be a named list with the specified field names.
mod_grid: Select whether to use gridded or observed data locations for the analysis step. See the 'AggInputs' class for more information on the 'GRID' option. The user must have aggregated data with the specified GRID option prior to this step. (i.e. you will not have access to data aggregated with the 'Grid' option if you have not executed the process of aggregation with the 'Grid' option selected. The same principle applies for the 'Observed' option. It is recommended that the analysis is performed with 'Observed' data, and for the simulation to be performed with 'Grid' data.
sim_grid: Select whether to use gridded or observed data locations for the simulation and subsequent prescription building step. See the 'AggInputs' class for more information on the 'GRID' option. The user must have aggregated data with the specified GRID option prior to this step. (i.e. you will not have access to data aggregated with the 'Grid' option if you have not executed the process of aggregation with the 'Grid' option selected. The same principle applies for the 'Observed' option. It is recommended that the analysis is performed with 'Observed' data, and for the simulation to be performed with 'Grid' data.
dat_used: Option for the length of year to use data in the analysis, simulation, and prescription building steps. See the 'AggInputs' class documentation for more information on the 'dat_used' selection.
center: TRUE/FALSE. Option for whether to center explanatory data around each explanatory variables mean or to use the raw observed explanatory varaible data. Centering is recommended as it puts variables on similar scales and makes the model fitting process less error prone.
split_pct: Select the percentage of data to use for the training dataset in the analysis step. The training dataset is used to fit the model to each of the crop responses. The difference will be split into a validation dataset that is used to evaluate the model performance on data it has not 'seen' before.
clean_rate: Select the maximum rate that could be realistically be applied by the application equipment (sprayer or seeder). This is used for a rudimentary cleaning of the data that removes observations with as-applied rates above this user supplied threshold. Rates above this threshold should be able to be classified as machine measurement errors. For example, based on knowledge of the prescription/ experiment applied and taking into account double applications on turns, a rate for as-applied nitrogen might be something like 300 - 400 lbs N/acre.
mod_dat: Based off of the user selections such as 'mod_grid', this is a named list for each response variable ('yld' and/or 'pro'). The data in each of these lists are processed and then split into training and validation datasets. This data is used for the model fitting and evaluations steps.
sim_dat: Based off of the user selections such as 'sim_grid', this is a named list for each year specified in the SimClass 'sim_years' field. The data in each of these lists are processed and used in the Monte Carlo simulation.
mod_num_means: Named vector of the means for each numerical covariate, including the experimental variable. This is used for converting centered data back to the original form. The centering process does not center four numerical variables; the x and y coordinates, the response variable (yld/pro), and the experimental variable. This is for the data specified from the analysis data inputs (grid specific).
sim_num_means: Named vector of the means for each numerical covariate, including the experimental variable. This is used for converting centered data back to the original form. The centering process does not center three numerical variables; the x and y coordinates, the response variable (yld/pro) and the experimental variable. This is for the data specified from the analysis data inputs (grid specific).
opp_sys_type: Opposite of the user selected system type ('sys_type'). This is used to select the correct price received to calculate 'NRopp' in the Monte Carlo simulation.
fieldname_codes: Data.frame with a column for the names of the fields selected by the user and a corresponding code. This is used in the simulation data when being passed to C++ functions as purely numeric matrices.
SI: Logical, whether to use SI units. If TRUE, yield and experimental data are converted to kg/ha. If FALSE, the default values from the database are used. These are bu/ac for yield and lbs/ac for experimental data (nitrogen or seed).

Methods

Method `new()`

Usage

DatClass$new(
  dbCon,
  farmername = NULL,
  fieldname = NULL,
  respvar = NULL,
  expvar = NULL,
  sys_type = NULL,
  yldyears = NULL,
  proyears = NULL,
  mod_grid = NULL,
  dat_used = NULL,
  center = NULL,
  split_pct = NULL,
  SI = FALSE,
  clean_rate = NULL
)

Arguments

dbCon: Database connection object connected to an OFPE formatted database, see DBCon class.
farmername: Name of the farmer that owns the selected field.
fieldname: Name of the field to for analysis. Selected from the 'all_farms.fields' table of an OFPE formatted database.
respvar: Response variable(s) to optimize experimental inputs based off of. The user can select 'Yield' and/or 'Protein' based on data availability. User must select at least 'Yield'.
expvar: Experimental variable to optimize, select/input 'As-Applied Nitrogen' or 'As-Applied Seed Rate'. This is the type of input that was experimentally varied across the field as part of the on-farm experimentation.
sys_type: Provide the type of system used in the experiment. This determines the price used for calculating net-return and for the net-return of the opposite type. Select from "Conventional" and "Organic". The net-returns will be calculated with the corresponding economic data based on this choice, and the 'NRopp' management scenario (see SimClass$executeSim) will be based on the opposite (e.g. if you are growing conventional wheat, the management outcome 'NRopp' shows the net-return calculated from organically grown wheat). In the example, organic prices are calculated from 0 N fertilizer rates, however with seeding rates it is purely the difference in the price received used to calculate net-return.
yldyears: The year(s) of interest for the yield response variables in the selected field. This must be a named list with the specified field names.
proyears: The year(s) of interest for the protein response variables in the selected field. This must be a named list with the specified field names.
mod_grid: Select whether to use gridded or observed data locations for the analysis step. See the 'AggInputs' class for more information on the 'GRID' option. The user must have aggregated data with the specified GRID option prior to this step. (i.e. you will not have access to data aggregated with the 'Grid' option if you have not executed the process of aggregation with the 'Grid' option selected. The same principle applies for the 'Observed' option. It is recommended that the analysis is performed with 'Observed' data, and for the simulation to be performed with 'Grid' data.
dat_used: Option for the length of year to use data in the analysis, simulation, and prescription building steps. See the 'AggInputs' class documentation for more information on the 'dat_used' selection.
center: TRUE/FALSE. Option for whether to center explanatory data around each explanatory variables mean or to use the raw observed explanatory varaible data. Centering is recommended as it puts variables on similar scales and makes the model fitting process less error prone.
split_pct: Select the percentage of data to use for the training dataset in the analysis step. The training dataset is used to fit the model to each of the crop responses. The difference will be split into a validation dataset that is used to evaluate the model performance on data it has not 'seen' before.
SI: Logical, whether to use SI units. If TRUE, yield and experimental data are converted to kg/ha. If FALSE, the default values from the database are used. These are bu/ac for yield and lbs/ac for experimental data (nitrogen or seed).
clean_rate: Select the maximum rate that could be realistically be applied by the application equipment (sprayer or seeder). This is used for a rudimentary cleaning of the data that removes observations with as-applied rates above this user supplied threshold. Rates above this threshold should be able to be classified as machine measurement errors. For example, based on knowledge of the prescription/ experiment applied and taking into account double applications on turns, a rate for as-applied nitrogen might be something like 300 - 400 lbs N/acre. NOTE: make sure to specify in the correct units, for example if SI = FALSE specify in lbs/ac, otherwise in kg/ha.

Returns

A new 'AggInputs' object.

Method `selectInputs()`

Interactive method for selecting inputs related to the data used in the analysis, simulation, and subsequent prescription generation steps. The description below describes the process of interactively selecting the necessary parameters needed for the automated analysis, simulation, and prescription building.

The user first selects a farmer for which they want to analyze a field from, which is used to compile a list of available fields ready for analysis, indicated by its presence in the farmername_a schema of the OFPE database.

The user then selects the response variables to optimize on and the experimental variable to optimize. The user must know what data is available for the specific field (i.e. if the user select 'Protein' they must have aggregated protein data for the specified field, or if the user selects 'As-Applied Seed Rate' seed rates must have been the experimental variable of interest when aggregating data).

The user then selects the location of aggregated data to use for both the analysis and simulation/prescription building steps. The user also needs to select the length of the year for which 'current' year data was aggregated for (March 30th decision point or the full year).

The user also has the choice of which vegetation index data to use as covariates, as well as the preferred source for precipitation and growing degree day data. Finally, the user has the option of whether to center covariate data or to use the raw observed data for analysis and simulation and the percent of data to use in the training data for model fitting. The rest of the data is withheld for validation.

Usage

DatClass$selectInputs()

Arguments

None: No arguments needed because passed in during class instantiation.

Returns

A 'DatClass' object with complete user selections.

Method `setupDat()`

This function calls the private methods for data gathering and processing. The data gather step takes the user selected inputs for the field, the response variables, and the data types ('mod_grid') and exports the appropriate data into a a list, called 'mod_dat' with lists, named for each response variable ('yld' and/or 'pro') with each data type data from all fields selected.

The processing step goes through each data frame contained in the nested 'mod_dat' list and trims the data based on the user selections for the vegetation index and precipitation and growing degree day sources. If the user selected to center the covariate data, the values of each variable will be subtracted from the mean of that variable. In this case, a named vector of each variable and the mean will be created for reverting back to observed values.

After this step, the data in 'mod_dat' is split into training and validation sets based on the percentage of data the user selected to include in the training dataset.

Usage

DatClass$setupDat()

Arguments

None: No arguments needed because passed in during class

Returns

A named list with training and validation data, called 'mod_dat', for each response variable ('yld' and/or 'pro').

Method `getSimDat()`

This function calls the private methods for data gathering and processing. The gathering process takes the vector of simulation years and gathers the appropriate 'sat' data from the OFPE database and then processes the data using the same parameters as for the data used in the model fitting process.

Usage

DatClass$getSimDat(sim_years)

Arguments

sim_years: Vector of years available in the database to gather to simulate management outcomes in.

Returns

A data.table with the user specified data for the simulation.

Method `clone()`

The objects of this class are cloneable with this method.

Usage

DatClass$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

paulhegedus/OFPE
On-Farm Precision Experiments (OFPE) Data Management and Analysis Tools

DatClass: R6 Class for storing inputs and data for analysis and...
In paulhegedus/OFPE: On-Farm Precision Experiments (OFPE) Data Management and Analysis Tools

R6 Class for storing inputs and data for analysis and simulations

Description

Public fields

Methods

Public methods

Method `new()`

Usage

Arguments

Returns

Method `selectInputs()`

Usage

Arguments

Returns

Method `setupDat()`

Usage

Arguments

Returns

Method `getSimDat()`

Usage

Arguments

Returns

Method `clone()`

Usage

Arguments

See Also

Related to DatClass in paulhegedus/OFPE...

R Package Documentation

Browse R Packages

We want your feedback!

paulhegedus/OFPE On-Farm Precision Experiments (OFPE) Data Management and Analysis Tools

DatClass: R6 Class for storing inputs and data for analysis and... In paulhegedus/OFPE: On-Farm Precision Experiments (OFPE) Data Management and Analysis Tools

R6 Class for storing inputs and data for analysis and simulations

Description

Public fields

Methods

Public methods

Method new()

Usage

Arguments

Returns

Method selectInputs()

Usage

Arguments

Returns

Method setupDat()

Usage

Arguments

Returns

Method getSimDat()

Usage

Arguments

Returns

Method clone()

Usage

Arguments

See Also

Related to DatClass in paulhegedus/OFPE...

R Package Documentation

Browse R Packages

We want your feedback!

paulhegedus/OFPE
On-Farm Precision Experiments (OFPE) Data Management and Analysis Tools

DatClass: R6 Class for storing inputs and data for analysis and...
In paulhegedus/OFPE: On-Farm Precision Experiments (OFPE) Data Management and Analysis Tools

Method `new()`

Method `selectInputs()`

Method `setupDat()`

Method `getSimDat()`

Method `clone()`