ImputeSuperIndividuals_StoX3: Impute missing super-individual data

ImputeSuperIndividuals_StoX3    R Documentation

Impute missing super-individual data

Description

WARNING, DEPRECATED FUNCTION: This is the old imputation function used in StoX 3.0.0 through 3.6.2. The function has a weakness when hauls are assigned to AcousticPSUs in more than one stratum in BioticAssignment. The resulting SuperIndividuals then contain duplicated individuals and consequently non-unique values in the Individual column, which this function uses to identify the rows to impute from. As a result, values are imputed only from the first of the rows sharing a duplicated Individual, so the information in the other rows is not available, which may lead to incomplete imputation.
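The first-match behaviour described above can be illustrated with a small base R sketch. It is not the package's actual code: the toy table, its values and the use of match() are assumptions chosen only to show how a lookup keyed on a non-unique Individual column reaches just the first of the duplicated rows.

```r
# Toy table (made-up values): individual "I1" appears twice because its haul
# is assigned to AcousticPSUs in two different strata.
superInd <- data.frame(
  Individual = c("I1", "I1", "I2"),
  Stratum = c("S1", "S2", "S1"),
  stringsAsFactors = FALSE
)

# match() returns the position of the FIRST matching element only,
# so both "I1" rows resolve to row 1.
hits <- match(superInd$Individual, superInd$Individual)
hits

# Rows that can never be reached through a lookup on Individual
unreachable <- setdiff(seq_len(nrow(superInd)), hits)
superInd[unreachable, ]
```

In this sketch the second "I1" row (stratum "S2") is never selected, which mirrors the incomplete imputation the warning describes.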

Usage

ImputeSuperIndividuals_StoX3(
  SuperIndividualsData,
  ImputationMethod = c("RandomSampling", "Regression"),
  ImputeAtMissing = character(),
  ImputeByEqual = character(),
  ToImpute = character(),
  ImputationLevels = c("Haul", "Stratum", "Survey"),
  Seed = 1,
  RegressionDefinition = c("FunctionParameter", "FunctionInput"),
  GroupingVariables = character(),
  RegressionModel = c("SimpleLinear", "Power"),
  RegressionTable = data.table::data.table(),
  Regression
)

Arguments

SuperIndividualsData

The SuperIndividualsData data.

ImputationMethod

The method to use for the imputation. Currently, only "RandomSampling" is implemented, but it may be accompanied by "Regression" in a coming release.

ImputeAtMissing

A single string naming the variable which, when missing, identifies the target individuals to impute data to. I.e., if the variable named by ImputeAtMissing is missing for an individual, the imputation is performed for that individual. In StoX 3.0.0 and older, ImputeAtMissing was hard coded to IndividualAge.

ImputeByEqual

A vector of strings naming the variable(s) which, when identical to those of the target individual, identify the source individuals to impute data from. The source individuals must also have a non-missing value of the ImputeAtMissing variable. In StoX 3.0.0 and older, ImputeByEqual was hard coded to c("SpeciesCategory","IndividualTotalLength").

ToImpute

A vector of strings naming the variable(s) to impute (copy to the target individual). Values that are not missing are not imputed. Note that values are only imputed when ImputeAtMissing is missing, so including many variables in ToImpute is only recommended if all these are present for the individuals (see Details). In StoX 3.0.0 and older, ToImpute was hard coded to all available variables of the BioticData contained in the SuperIndividualsData.

ImputationLevels

A vector of strings naming the levels at which to impute, defaulting to c("Haul", "Stratum", "Survey"). To prevent imputation at the Survey level, use c("Haul", "Stratum").

Seed

An integer giving the seed to use for the random sampling used to obtain the imputed data.

RegressionDefinition

Character: A string naming the method to use, one of FunctionParameter to define the Regression on the fly in this function (using GroupingVariables, RegressionModel and RegressionTable), or FunctionInput to import Regression process data from a previously run process using the function

GroupingVariables

An optional vector of strings defining variables serving as grouping variables in the RegressionTable. Setting this adds its elements as columns in the RegressionTable in the GUI.

RegressionModel

Character: A string naming the model to use for the regression. See Details for options.

RegressionTable

A table with one row defining the name of the dependent variable (column name DependentVariable), the name of the independent variable (column name IndependentVariable), and the Intersect and Slope if RegressionModel = "SimpleLinear" and Factor and Exponent if RegressionModel = "Power".

Regression

The Regression process data.
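As a rough illustration of the RegressionModel and RegressionTable arguments, the sketch below assumes the conventional forms suggested by the parameter names: y = Intersect + Slope * x for "SimpleLinear" and y = Factor * x^Exponent for "Power". This page does not spell out the formulas, so both are assumptions. The helper functions and the numeric values are hypothetical and are not part of RstoxBase.

```r
# Hypothetical helpers; the formulas are assumed from the parameter names.
simpleLinear <- function(x, Intersect, Slope) Intersect + Slope * x
power <- function(x, Factor, Exponent) Factor * x^Exponent

# One-row table in the layout described for RegressionTable with
# RegressionModel = "Power" (made-up parameter values).
regressionTable <- data.frame(
  DependentVariable = "IndividualRoundWeight",
  IndependentVariable = "IndividualTotalLength",
  Factor = 0.01,
  Exponent = 3,
  stringsAsFactors = FALSE
)

# Predicted value of the dependent variable for a length of 20
predicted <- power(20, regressionTable$Factor, regressionTable$Exponent)
predicted
```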

Details

Because of the weakness described in the Description, the function is deprecated, and the function ImputeSuperIndividuals, which considers the unique Individual column when imputing, should be used instead. However, due to the difference in the imputation method, the results will differ between the two functions even when all values of Individual are unique. Existing StoX projects saved with StoX <= 3.6.2 are changed to use ImputeSuperIndividuals_StoX3 when opened in StoX >= 4.0.0, but the recommendation is to change these projects to use ImputeSuperIndividuals instead.

For each (target) individual with a missing value in ImputeAtMissing, identify all (source) individuals in the haul for which ImputeAtMissing is non-missing and for which the values in ImputeByEqual are identical to those of the target individual. Then sample one of these source individuals, and copy the values of ToImpute to the target individual. Only values that are non-missing are copied from the sampled individual, and only missing values in the target individual are replaced. If no source individuals are found in the haul, expand the search to the stratum, and finally to the survey. If no source individuals are found in the survey, leave the target individual unchanged.
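The search described above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the RstoxBase implementation: imputeSketch is a hypothetical helper, the data are made up, and the table is assumed to carry Haul, Stratum and Survey columns directly.

```r
# Hypothetical helper (not part of RstoxBase): hierarchical random-sampling imputation.
imputeSketch <- function(data, imputeAtMissing, imputeByEqual, toImpute,
                         levels = c("Haul", "Stratum", "Survey"), seed = 1) {
  set.seed(seed)
  out <- data  # source values are always read from the unmodified input
  isSource <- !is.na(data[[imputeAtMissing]])
  for (i in which(!isSource)) {
    # Sources must equal the target in every ImputeByEqual variable
    sameKey <- rep(TRUE, nrow(data))
    for (v in imputeByEqual) {
      sameKey <- sameKey & !is.na(data[[v]]) & data[[v]] == data[[v]][i]
    }
    for (lev in levels) {
      candidates <- which(isSource & sameKey & data[[lev]] == data[[lev]][i])
      if (length(candidates) == 0) next  # widen the search to the next level
      # Index via sample.int() to avoid sample()'s special case for a single number
      src <- candidates[sample.int(length(candidates), 1)]
      for (v in toImpute) {
        # Replace only missing target values, and only with non-missing source values
        if (is.na(out[[v]][i]) && !is.na(data[[v]][src])) {
          out[[v]][i] <- data[[v]][src]
        }
      }
      break
    }
  }
  out  # targets without any source at any level are left unchanged
}

toy <- data.frame(
  Survey = "Sv",
  Stratum = c("S1", "S1", "S1", "S2"),
  Haul = c("H1", "H1", "H2", "H3"),
  SpeciesCategory = "herring",
  IndividualTotalLength = c(20, 20, 25, 25),
  IndividualAge = c(NA, 3, NA, 5),
  stringsAsFactors = FALSE
)

res <- imputeSketch(toy, "IndividualAge",
                    c("SpeciesCategory", "IndividualTotalLength"), "IndividualAge")
res$IndividualAge
```

In the toy data, the first individual finds a source in its own haul, while the third finds none in its haul or stratum and is imputed from the survey level.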

When ToImpute contains variables other than the one given by ImputeAtMissing, there is a risk that values remain missing even after successful imputation. E.g., if ImputeAtMissing is IndividualAge, and ToImpute includes IndividualRoundWeight, then the weight is only imputed when age is missing. Super-individuals with age but not weight will then still have missing weight. Variables that are naturally connected, such as IndividualRoundWeight and WeightMeasurement, or IndividualTotalLength and LengthResolution, should both be included in ToImpute.
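A small base R sketch of the age and weight example above, using made-up values, shows which rows remain incomplete. It only mimics the selection of target rows and does not call any RstoxBase function.

```r
# Made-up data: row 2 has an age but no weight.
toy <- data.frame(
  IndividualAge = c(NA, 4, 5),
  IndividualRoundWeight = c(NA, NA, 120)
)

# Only rows where the ImputeAtMissing variable (here IndividualAge) is missing are targets
isTarget <- is.na(toy$IndividualAge)

# Rows with an age but without a weight are never visited by the imputation
stillMissing <- !isTarget & is.na(toy$IndividualRoundWeight)
which(stillMissing)
```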

Value

An object of StoX data type SuperIndividualsData.

See Also

SuperIndividuals for distributing Abundance to the Individuals.


StoXProject/RstoxBase documentation built on July 14, 2024, 9:39 a.m.