chooseSpatialModelOnIC.asrtests: Uses information criteria to choose the best fitting spatial...

View source: R/spatial.funcs.v7.r

chooseSpatialModelOnIC.asrtestsR Documentation

Uses information criteria to choose the best fitting spatial model for accounting for local spatial variation.

Description

For a response variable measured on a potentially irregular grid of rows and columns of the units, uses information criteria (IC) to decide whether the fit and parsimony of the model fitted to a set of data can be improved by adding, to the fitted model stored in the supplied asrtests.object, one of the following spatial models to account for the local spatial variation: (i) a two-dimensional first-order autocorrelation model, (ii) a two-dimensional tensor-product natural cubic smoothing spline model (TPNCSS), (iii) a two-dimensional tensor-product penalized P-spline model (TPPCS) model, or (iv) a two-dimensional tensor-product penalized linear spline model with first-difference penalties (TPP1LS). The models from which to select can be reduced to a subset of these four models. For each model, a term from the spatial model is only added to the supplied model if the IC of the supplied model is decreased with the addition of that term. If no term improves the IC when a local spatial variation model is added, then the supplied, nonspatial model will be returned. The data can be arranged in sections, for each of which there is a grid and for which the model is to be fitted separately. Also, the rows and columns of a grid are not necessarily one observational unit wide. For TPPCS models, the improvement in the fit from rotating the eigenvectors of the penalty matrix can be investigated; if there is no improvement, the unrotated fit will be returned.

One or more rows is added to the test.summary data.frame of the asrtests.object, for each section and each spatial model, stating whether or not the new model has been swapped for a model in which the spatial model has been added to the supplied model. Convergence in fitting the model is checked and a note included in the action if there was not. All components of the asrtests.object are updated to exhibit the differences between the supplied and any new model.

Usage

## S3 method for class 'asrtests'
chooseSpatialModelOnIC(asrtests.obj, trySpatial = "all", 
                       sections = NULL, 
                       row.covar = "cRow", col.covar = "cCol", 
                       row.factor = "Row", col.factor = "Col", 
                       corr.funcs = c("ar1", "ar1"), 
                       row.corrFitfirst = TRUE, allow.corrsJointFit = TRUE, 
                       dropRowterm = NULL, dropColterm = NULL, 
                       nsegs = NULL, nestorder = c(1,1), 
                       usRandLinCoeffs = TRUE, 
                       rotateX = FALSE, ngridangles = c(18, 18), 
                       which.rotacriterion = "AIC", nrotacores = 1, 
                       asreml.option = "mbf", tpps4mbf.obj = NULL, 
                       allow.unconverged = FALSE, 
                       allow.fixedcorrelation = FALSE,
                       checkboundaryonly = FALSE, update = FALSE, 
                       maxit = 30, IClikelihood = "full", which.IC = "AIC", 
                       return.asrts = "best", ...)

Arguments

asrtests.obj

An asrtests.object containing the components (i) asreml.obj, (ii) wald.tab, and (iii) test.summary.

trySpatial

A character string nominating the types of spatial model whose fits are to be assessed. Possible values are none, corr, TPNCSS, TPPCS, and TPP1LS. If set to none, then just the supplied nonspatial model and the information about its information criteria will be returned.

sections

A single character string that specifies the name of the column in the data.frame that contains the factor that identifies different sections of the data to which separate spatial models are to be fitted. Note that, for other terms that involve sections in the random formula, there should be separate terms for each level of sections. For example, in a blocked experiment involving multiple sites, there should be a sum of separate term for the Blocks at each Site i.e. a formula that contains terms like at(Site, i):Block for each site and these are separated by '+'. Otherwise, the combined term (e.g. Site:Block) will impact on the fitting of the local spatial models for the different Sites. Similarly, if considerations other than the fitting of spatial models do not require a residual variance structure e.g. heterogeneous residual variances depending on treatments, there should be separate units terms for each of the sections. This will allow the fitting of a nugget variance for each of the sections. To achieve, this use the sum of dsum terms for each of the sections, each term being of the form dsum(~ units | Section, levels = list(i)) where i is a level of Section. It is important to specify separate terms, rather than compound terms obtained by specifying multiple levels in the list or leaving levels out altogether. Separate terms are needed so that terms for individual sections can be manipulated independently.

row.covar

A single character string nominating a numeric that contains the values of a centred covariate indexing the rows of a grid. The numeric must be a column in the data.frame stored in the asreml.obj that is a component of the supplied asrtests.obj.

col.covar

A single character string nominating a numeric that contains the values of a centred covariate indexing the columns of a grid. The numeric must be a column in the data.frame stored in the asreml.obj that is a component of the supplied asrtests.obj.

row.factor

A single character string nominating a factor that indexes the rows of a grid that are to be one dimension of a spatial correlation model. The factor must a column in the data.frame stored in the asreml.obj that is a component of the supplied asrtests.obj.

col.factor

A single character string nominating a factor that indexes the columns of a grid that are to be one dimension of a spatial correlation model. The factor must a column in the data.frame stored in the asreml.obj that is a component of the supplied asrtests.obj.

corr.funcs

A single character string of length two that specifies the asreml one-dimensional correlation or variance model function for the row and column dimensions of a two-dimensional separable spatial correlation model to be fitted when spatial.model is corr; the two-dimensional model is fitted as a random term. If a correlation or variance model is not to be investigated for one of the dimensions, specify "" for that dimension.

row.corrFitfirst

A logical. If TRUE then, in fitting the model for spatial.model set to corr, the row correlation or variance function is fitted first, followed by the addition of the column correlation or variance function. If FALSE, the order of fitting is reversed.

allow.corrsJointFit

A logical which, if TRUE, will allow the simultaneous fitting of correlation functions for the two dimensions of the grid when separate fits have failed to fit any correlation functions. This argument is available for when a joint fit hangs the system.

dropRowterm

A single character string nominating a factor in the data.frame that has as many levels as there are unique values in row.covar. This argument is applicable for spatial.model set to TPNCSS or TPPS. It is used to remove a term corresponding to the dropRowterm and a random row deviations term based on row.covar will be included in the model. If the argument is NULL, it is assumed that such a term is not included in the fitted model stored in asrtests.obj.

dropColterm

A single character string nominating a factor in the data.frame that has as many levels as there are unique values in col.covar. This argument is applicable for spatial.model set to TPNCSS or TPPS. It is used to remove a term corresponding to the dropColterm and a random column deviations term based on col.covar will be included in the model. If the argument is NULL, it is assumed that such a term is not included in the fitted model stored in asrtests.obj.

nsegs

A pair of numeric values giving the number of segments into which the column and row ranges are to be split, respectively, for fitting a P-spline model (TPPS) (each value specifies the number of internal knots + 1). If not specified, then (number of unique values - 1) is used in each dimension; for a grid layout with equal spacing, this gives a knot at each data value. If sections is not NULL and the grid differs between the sections, then nsegs will differ between the sections.

nestorder

A numeric of length 2. The order of nesting for column and row dimensions, respectively, in fitting a P-spline model (TPPS). A value of 1 specifies no nesting, a value of 2 generates a spline with half the number of segments in that dimension, etc. The number of segments in each direction must be a multiple of the order of nesting.

usRandLinCoeffs

A logical which, if TRUE, will attempt to fit an unstructured variance model to the constant and linear terms in the interactions for constant and linear terms in one grid dimension interacting with smoooth terms in the second grid dimension. The unstructured variance model can only be fitted if both the constant and linear interaction terms have been retained in the fitted model. This argument can be used to omit the attempt to fit an unstructured variance model when the attempt results in a stystem error.

rotateX

A logical indicating whether to rotate the eigenvectors of the penalty matrix, as described by Piepho, Boer and Williams (2022), when fitting a P-spline (TPSS). Setting rotateX to TRUE results in an optimized rotation produced by exploring a two-dimensional grid of rotation angle pairs, and for each pair analyzing the data under a model that omits the random interaction terms. The angle pair with the minimum deviance is used to apply an optimized rotation. Rotation of the eigenvectors is only relevant for difforder values greater than 1 and has only been implemented for difforder equal to 2.

ngridangles

A numeric of length 2. The numbers of angles between 0 and 90 degrees for each of the row and column dimensions to be used in determining the optimal pair of angles for rotating the eignevectors of the penalty matrix of a P-spline (TPSS). Specifying factors of 90 will result in integer-valued angles. The number of grid points, and hence re-analyses will be the product of the values of (ngridangles + 1).

which.rotacriterion

A single character string nominating which of the criteria, out of the deviance, the likelihood, the AIC and the BIC, is to be used in determining the optimal rotation of the eigenvectors of the penalty matrix. The deviance uses the REML value computed by asreml; the other criteria use the full likelihood, evaluated using the REML estimates, that is computed by infoCriteria.asreml.

nrotacores

A numeric specifying the number of cores to deploy for running the analyses required to search the two-diemsional grid of rotation angles when rotateX is TRUE. Parallel processing has been implemented for analyzing, for each column angle, the set of angles to be investigated for the row dimension. The default value of one means that parallel processing will not be used. The value chosen for nrotacores needs to balanced against the other processes that are using parallel processing at the same time.

asreml.option

A single character string specifying whether the grp or mbf methods are to be used to supply externally formed covariate matrices to asreml when fitting a P-spline (TPSS). Compared to the mbf method, the grp method is somewhat faster, but creates large asrtests.objects for which the time it takes to save them can exceed any gains in execution speed. The grp method adds columns to the data.frame containing the data. On the other hand, the mbf method adds only the fixed covariates to data and stores the random covariates in the environment of the internal function that calls the spline-fitting function; there are three smaller data.frames for each section that are not stored in the asreml.object resulting from the fitted model.

tpps4mbf.obj

An object made with makeTPPSplineMats.data.frame that contains the spline basis information for fitting P-splines. The argument tpps4mbf.obj only needs to be set when the mbf option of asreml.option is being used and it is desired to use mbf data.frames that have been created and stored prior to calling chooseSpatialModelOnIC.asrtests. If tpps4mbf.obj is NULL,
makeTPPSplineMats.data.frame will be called internally to produce the required mbf data.frames.

allow.unconverged

A logical indicating whether to accept a new model even when it does not converge. If FALSE and the fit of the new model does not converge, the supplied asrtests.obj is returned. Also, if FALSE and the fit of the new model has converged, but that of the old model has not, the new model will be accepted.

allow.fixedcorrelation

A logical indicating whether to accept a new model even when it contains correlations in the model whose values have been designated as fixed, bound or singular. If FALSE and the new model contains correlations whose values have not been able to be estimated, the supplied asrtests.obj is returned. The fit in the asreml.obj component of the supplied asrtests.obj will also be tested and a warning issued if both fixed correlations are found in it and allow.fixedcorrelation is FALSE.

checkboundaryonly

If TRUE then boundary and singular terms are not removed by rmboundary.asrtests; a warning is issued instead. Note that, for correlation models, the fitting of each dimension and the test for a nugget term are performed with checkboundaryonly set to TRUE and its supplied setting only honoured using a call to rmboundary.asrtests immediately prior to returning the final result of the fitting.

update

If TRUE, and set.terms is NULL, then newfit.asreml is called to fit the model to be tested, using the values of the variance parameters stored in the asreml.object, that is stored in asrtests.obj, as starting values. If FALSE or set.terms is not NULL, then newfit.asreml will not use the stored variance parameter values as starting values when fitting the new model, the only modifications being (i) to fit aptial terms and (ii) those specified via ....

which.IC

A character specifying the information criterion to be used in selecting the best model. Possible values are AIC and BIC. The value of the criterion for supplied model must exceed that for changed model for the changed model to be returned. (For choosing the rotation angle of the eigenvectors of the penalty matrix, see which.rotacriterion.

maxit

A numeric specifying the maximum number of iterations that asreml should perform in fitting a model.

IClikelihood

A character specifying whether Restricted Maximum Likelihood (REML) or the full likelihood, evaluated using REML estimates, (full) are to be used in calculating the information criteria to be included in the test.summary of an asrtests.object or to be used in choosing the best model.

return.asrts

A character string specifying whether the asrtests.object for the best fitting model (smallest AIC or BIC) is returned or the asrtests.objects resulting from the attempted fits of all of the models specifed using trySpatial are returned.

...

Further arguments passed to changeModelOnIC.asrtests, asreml and tpsmmb.

Details

For each spatial model that is to be fitted, a fitted spatial model is only returned if it improves the fit over an above that achieved with the model fit supplied in the asrtests.obj, because terms in the spatial model are not added unless model fit is improved by their addition as measured by an IC. If return.asrts is all, then this applies to each spatial model specified by trySpatial. To force a spatial model to be fitted use addSpatialModel.asrtests. The model fit supplied in the asrtests.obj should not include terms that will be included in any local spatial model. All spatial model terms are fitted as fixed or random. Consequently, the residual model does not have to be iid. The improvement in the fit resulting from the addition of a spatial model to the supplied model is evaluated. Note that the data must be in the order that correponds to the residual argument with a variable to the right of another variable changes levels in the data frame faster than those of the preceding variables e.g. Row:Column implies that all levels for Column in consecutive rows of the data.frame with a single Row level.

For the corr spatial model, the default model is an autocorrelation model of order one (ar1) for each dimension. However, any of the single dimension correlation/variance models from asreml can be specified for each dimension, as can no correlation model for a dimension; the models for the two dimensions can differ. Using a forward selection procedure, a series of models are tried, without removing boundary or singular terms, beginning with the addition of row correlation and followed by the addition of column correlation or, if the row.corrFitfirst is set to FALSE, the reverse order. If the fitting of the first-fitted correlation did not result in a model change because the fitting did not converge or correlations were fixed, but the fit of the second correlation was successful, then adding the first correlation will be retried. If one of the metric correlation functions is specified (e.g. exp), then the row.covar or col.covar will be used in the spatial model. However, because the correlations are fitted separately for the two dimensions, the row.factor and col.factor are needed for all models and is used for a dimension that does not involve a correlation/variance function for the fit being performed. Also, the correlation models are fitted as random terms and so the correlation model will include a variance parameter for the grid even when ar1 is used to specify the correlation model, i.e. the model fitted is a variance model and there is no difference between ar1 and ar1v in fitting the model. The variance parameter for this term represents the spatial variance and the fit necessarily includes a nugget term, this being the residual variance. If any correlation is retained in the model, for a section if sections is not NULL, then the need for a nuggest term is assessed by fixing the corresponding residual variance to one, unless there are multiple residual variances and these are not related to the sections. Once the fitting of the correlation model has been completed, the rmboundary function will be executed with the checkboundaryonly value supplied in the chooseSpatialModelOnIC.asrtests call. Finally, checking for bound and singular random terms associated with the correlation model and residual terms will be carried out when there are correlation terms in the model and checkboundaryonly has been set to FALSE; as many as possible will be removed from the fitted model, in some cases by fixing variance terms to one.

The tensor-product natural-cubic-smoothing-spline (TPNCSS) spatial model is as decribed by Verbyla et al. (2018), the tensor-product penalized-cubic-spline (TPPCS) model is similar to that described by Rodriguez-Alvarez et al. (2018), and the tensor-product, first-difference-penalty, linear spline (TPP1LS) model is amongst those described by Piepho, Boer and Williams (2022). The fixed terms for the spline models are row.covar + col.covar + row.covar:col.covar and the random terms are spl(row.covar) + spl(col.covar) + dev(row.covar) + dev(col.covar) + spl(row.covar):col.covar + row.covar:spl(col.covar) + spl(row.covar):spl(col.covar), except that spl(row.covar) + spl(col.covar) is not included in TPPCS and TPP1LS models. The supplied model should not include any of these terms. However, any fixed or random main-effect Row or Column term that has been included as an initial model for comparison with a spatial model can be removed prior to fitting the spatial model using dropRowterm or dropColterm. For the penalized P-spline models with second-order differencing, the model matrices used to fit the random terms spl(row.covar):col.covar and row.covar:spl(col.covar) are transformed using the spectral decomposition of their penalty matrices, and unstructured variance models across the columns of each of them will be tried. For TPPCS, it is also possible to investigate the rotation of the penalty matrix eigenvectors for the random terms spl(row.covar):col.covar and row.covar:spl(col.covar) (for more information see Piepho, Boer and Williams, 2022). The investigation takes the form of a search over a grid of rotation angles for a reduced model; the fit of the full model wuth rotation using the optimal rotation angles will only be returned if it improves on the fit of the full, unrotated model.

The TPPCS and TPP1LS models are fitted using functions from the R package TPSbits authored by Sue Welham (2022). There are two methods for supplying the spline basis information produced by tpsmmb to asreml. The grp method adds it to the data.frame supplied in the data argument of the asreml call. The mbf method creates smaller data.frames with the spline basis information in the same environment as the internal function that calls the spline-fitting function. If it is desired to use in a later session, an asreml function, or asrtests function that calls asreml, (e.g. predict.asreml, predictPlus.asreml, or changeTerms.asrtests) on an asreml.object created using mbf terms, then the mbf data.frames will need to be recreated using makeTPPSplineMats.data.frame in the new session, supplying, if there has been rotation of the penalty matrix eigenvectors, the theta values that are returned as the attribute theta.opt of the asreml.obj.

All models utlize the function changeModelOnIC.asrtests to assess the model fit, the information critera used in assessing the fit being calculated using infoCriteria. Arguments from tpsmmb and changeModelOnIC.asrtests can be supplied in calls to chooseSpatialModelOnIC.asrtests and will be passed on to the relevant function throught the ellipses argument (...).

The data for experiment can be divided into sections and an attempt to fit the same spatial model to each is made. The fit may differ for each of the sections, but the fit over all of the sections is assessed. For more detail see sections above.

Each combination of a row.coords and a col.coords does not have to specify a single observation; for example, to fit a local spatial model to the main units of a split-unit design, each combination would correspond to a main unit and all subunits of the main unit would would have the same combination.

Value

A list containing four components: (i) asrts, (ii) spatial.IC, (iii) best.spatial.mod, and (iv) best.spatial.IC.

The component asrts itself holds a list of one or more asrtests.objects, either the best overall out of the supplied model and the spatial models, or, for each spatial model, the best out of the supplied model and that spatial model. Each asrtests.object contains the components: (i) asreml.obj, (ii) wald.tab, and (iii) test.summary. If the asrtests.object is the result of fitting a TPPCS model with an exploration of the rotation of the eigenvectors of the penalty matrix for the linear components, then the asreml.obj will have an attribute theta.opt that contains the optimal rotation angles of the eigenvectors.

The spatial.IC component holds a data.frame with summary of the values of the information criteria for the supplied model and those resulting from adding the spatial model to the supplied model. In the se of a sptial correlation model, the information criteira for the selected spatial correlation model is returned. If a spatial model could not be fitted, then all returned values will be NA).

The best.spatial component is a character giving the name of the best spatial model, and best.spatial.AIC gives the value of its AIC.

Author(s)

Chris Brien

References

Piepho, H.-P., Boer, M. P., & Williams, E. R. (2022). Two-dimensional P-spline smoothing for spatial analysis of plant breeding trials. Biometrical Journal, 64, 835-857.

Rodriguez-Alvarez, M. X., Boer, M. P., van Eeuwijk, F. A., & Eilers, P. H. C. (2018). Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spatial Statistics, 23, 52-71.

Verbyla, A. P., De Faveri, J., Wilkie, J. D., & Lewis, T. (2018). Tensor Cubic Smoothing Splines in Designed Experiments Requiring Residual Modelling. Journal of Agricultural, Biological and Environmental Statistics, 23(4), 478-508.

Welham, S. J. (2022) TPSbits: Creates Structures to Enable Fitting and Examination of 2D Tensor-Product Splines using ASReml-R. Version 1.0.0 https://mmade.org/tpsbits/

See Also

as.asrtests, makeTPPSplineMats.data.frame, addSpatialModelOnIC.asrtests,
addSpatialModel.asrtests, changeModelOnIC.asrtests, changeTerms.asrtests,
rmboundary.asrtests, testranfix.asrtests, testresidual.asrtests, newfit.asreml,
reparamSigDevn.asrtests, changeTerms.asrtests, infoCriteria.asreml

Examples

## Not run: 

data(Wheat.dat)

#Add row and column covariates
Wheat.dat <- within(Wheat.dat, 
                    {
                      cColumn <- dae::as.numfac(Column)
                      cColumn <- cColumn  - mean(unique(cColumn))
                      cRow <- dae::as.numfac(Row)
                      cRow <- cRow - mean(unique(cRow))
                    })

#Fit initial model
current.asr <- asreml(yield ~ Rep + WithinColPairs + Variety, 
                      random = ~ Row + Column,
                      data=Wheat.dat)

#Create an asrtests object, removing boundary terms
current.asrt <- as.asrtests(current.asr, NULL, NULL, 
                            label = "Random Row and Column effects")
current.asrt <- rmboundary(current.asrt)

# Choose the best of four models for the local spatial variation
current.asrt <- chooseSpatialModelOnIC(current.asrt, 
                                       row.covar = "cRow", col.covar = "cColumn",
                                       dropRowterm = "Row", dropColterm = "Column",
                                       asreml.option = "grp")

## End(Not run)

asremlPlus documentation built on Nov. 5, 2023, 5:07 p.m.