intervalPVA.data.frame: Selects a subset of variables using Principal Variable...

intervalPVA.data.frame    R Documentation

Selects a subset of variables using Principal Variable Analysis (PVA), based on the observed values within a specified time interval

Description

Principal Variable Analysis (PVA) (Cumming and Wooff, 2007) selects a subset from a set of variables such that the variables in the subset are as uncorrelated as possible, in an effort to ensure that all aspects of the variation in the data are covered. Here, all observations in a specified time interval are used for calculating the correlations on which the selection is based.

Usage

## S3 method for class 'data.frame'
intervalPVA(obj, responses, times = "Days", start.time, end.time, 
            nvarselect = NULL, p.variance = 1, include = NULL, 
            plot = TRUE, ...)

Arguments

obj

A data.frame containing the columns of variables from which the selection is to be made.

responses

A character giving the names of the columns in obj from which the variables are to be selected.

times

A character giving the name of the column in obj containing the times at which the data were collected, stored as a numeric, factor, or character. It is used to identify the subset of observations; if it is a factor or character, its values should be numerics stored as characters.

start.time

A numeric giving the time, in terms of values in times, at which the time interval begins; observations at this time and up to and including end.time will be included.

end.time

A numeric giving the time, in terms of values in times, at the end of the interval; observations after this time will not be included.

nvarselect

A numeric specifying the number of variables to be selected, which includes those listed in include. If nvarselect = 1, as many variables are selected as are needed to satisfy p.variance.

p.variance

A numeric specifying the minimum proportion of the variance that the selected variables must account for.

include

A character giving the names of the columns in obj for the variables whose selection is mandatory.

plot

A logical indicating whether a plot of the cumulative proportion of the variance explained is to be produced.

...

Allows arguments to be passed to other functions.
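
As a rough illustration of the interval defined by start.time and end.time, the sketch below subsets a small made-up data.frame so that observations at both end points are retained. The data, the object names and the conversion step are assumptions made for this example; they are not taken from the function's source.

```r
# Hypothetical long-format data; the times column DAP is stored as a factor
dat <- data.frame(DAP = factor(rep(c(28, 31, 35, 38), each = 2)),
                  PSA = c(1.2, 1.4, 2.0, 2.3, 3.1, 3.3, 4.0, 4.4))

# Convert factor labels to numbers via character; as.numeric() applied
# directly to a factor would return the level codes 1, 2, 3, 4 instead
times_num <- as.numeric(as.character(dat$DAP))

# Keep observations from start.time = 31 up to and including end.time = 35
in_interval <- times_num >= 31 & times_num <= 35
subset_dat <- dat[in_interval, ]
```

Going through as.character first is what makes factor-valued times usable here, which is presumably why the values are required to be numerics stored as characters.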

Details

The variable that is most correlated with the other variables is selected first for inclusion. The partial correlation for each of the remaining variables, given the first selected variable, is calculated and the most correlated of these variables is selected for inclusion next. Then the partial correlations are adjusted for the second included variable. This process is repeated until the specified criteria have been satisfied. The possibilities are to:

  1. select all variables in increasing order of amount of information they provide (the default, nvarselect = NULL and p.variance = 1);

  2. select exactly nvarselect variables;

  3. select just enough variables, up to a maximum of nvarselect variables, to explain at least p.variance*100 per cent of the total variance.
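
The greedy procedure described above can be sketched in a few lines of R. This is a minimal illustration and not the code used by intervalPVA or PVA: the function name pva_sketch is hypothetical, it works with partial covariances derived from the correlation matrix (a Schur complement update) rather than rescaled partial correlations, it ignores include, and its output columns only loosely mirror those listed under Value.

```r
# Illustrative greedy principal-variable selection on a correlation matrix R
# (an assumption-laden sketch, not the growthPheno implementation)
pva_sketch <- function(R) {
  vars <- colnames(R)
  S <- R                         # partial covariance matrix, updated each step
  remaining <- vars
  out <- data.frame(Variable = character(0), h.partial = numeric(0))
  for (k in seq_along(vars)) {
    Sk <- S[remaining, remaining, drop = FALSE]
    # variance of the remaining variables explained by each candidate
    h <- colSums(Sk^2) / diag(Sk)
    best <- names(h)[which.max(h)]
    out <- rbind(out, data.frame(Variable = best, h.partial = unname(h[best])))
    # remove the part of S explained by the newly selected variable
    S <- S - tcrossprod(S[, best]) / S[best, best]
    remaining <- setdiff(remaining, best)
  }
  out$Added.Propn <- out$h.partial / length(vars)   # trace of R is length(vars)
  out$Cumulative.Propn <- cumsum(out$Added.Propn)
  out
}

# Simulated data in which b is nearly a copy of a
set.seed(1)
X <- matrix(rnorm(200), ncol = 4,
            dimnames = list(NULL, c("a", "b", "c", "d")))
X[, "b"] <- X[, "a"] + 0.1 * X[, "b"]
sel <- pva_sketch(cor(X))

# Analogue of p.variance = 0.9: keep just enough variables to reach 90 per cent
n.needed <- which(sel$Cumulative.Propn >= 0.9)[1]
sel[seq_len(n.needed), ]
```

The sketch assumes a full-rank correlation matrix; with exactly collinear variables a diagonal element of the partial covariance matrix becomes zero and the division would need guarding.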

Value

A data.frame giving the results of the variable selection. It will contain the columns Variable, Selected, h.partial, Added.Propn and Cumulative.Propn.

Author(s)

Chris Brien

References

Cumming, J. A. and D. A. Wooff (2007) Dimension reduction via principal variables. Computational Statistics and Data Analysis, 52, 550–565.

See Also

PVA, rcontrib

Examples

data(exampleData)
longi.dat <- prepImageData(data=raw.dat, smarthouse.lev=1)
longi.dat <- within(longi.dat, 
                    {
                      Max.Height <- pmax(Max.Dist.Above.Horizon.Line.SV1,  
                                         Max.Dist.Above.Horizon.Line.SV2)
                      Density <- PSA/Max.Height
                      PSA.SV <- (PSA.SV1 + PSA.SV2) / 2
                      Image.Biomass <- PSA.SV * (PSA.TV^0.5)
                      Centre.Mass <- (Center.Of.Mass.Y.SV1 + Center.Of.Mass.Y.SV2) / 2
                      Compactness.SV <- (Compactness.SV1 + Compactness.SV2) / 2
                    })
responses <- c("PSA","PSA.SV","PSA.TV", "Image.Biomass", "Max.Height","Centre.Mass",
               "Density", "Compactness.TV", "Compactness.SV")
results <- intervalPVA(longi.dat, responses, times = "DAP", 
                       start.time = 31, end.time = 31,
                       p.variance = 0.9, plot = FALSE)

growthPheno documentation built on Sept. 11, 2024, 6:42 p.m.