# PVA.matrix: Selects a subset of variables using Principal Variable... In growthPheno: Functional Analysis of Phenotypic Growth Data to Smooth and Extract Traits

 PVA.matrix R Documentation

## Selects a subset of variables using Principal Variable Analysis (PVA) based on a correlation matrix

### Description

Principal Variable Analysis (PVA) (Cumming and Wooff, 2007) selects a subset from a set of variables such that the variables in the subset are as uncorrelated as possible, in an effort to ensure that all aspects of the variation in the data are covered.

### Usage

```r
## S3 method for class 'matrix'
PVA(obj, responses, nvarselect = NULL, p.variance = 1, include = NULL,
    plot = TRUE, ...)
```

### Arguments

| Argument | Description |
|---|---|
| `obj` | A `matrix` containing the correlation matrix for the variables from which the selection is to be made. |
| `responses` | A `character` giving the names of the rows and columns in `obj`, being the names of the variables from which the selection is to be made. |
| `nvarselect` | A `numeric` specifying the number of variables to be selected, which includes those listed in `include`. If `nvarselect = NULL`, as many variables are selected as are needed to satisfy `p.variance`. |
| `p.variance` | A `numeric` specifying the minimum proportion of the variance that the selected variables must account for. |
| `include` | A `character` giving the names of the columns in `obj` for the variables whose selection is mandatory. |
| `plot` | A `logical` indicating whether a plot of the cumulative proportion of the variance explained is to be produced. |
| `...` | Allows passing of arguments to other functions. |
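As a minimal sketch of preparing `obj` and `responses`, assuming a data frame of numeric variables (the simulated `dat` below is hypothetical and not part of the package), the correlation matrix can be computed with base R's `cor()`, which names its rows and columns after the input columns:

```r
# Hypothetical simulated data; any data frame of numeric variables would do
set.seed(1)
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
dat$x3 <- dat$x1 + rnorm(50, sd = 0.5)   # x3 is constructed to correlate with x1

responses <- names(dat)
R <- cor(dat[responses])                 # dimnames are taken from the columns of dat

# Check that responses matches the row and column names of the matrix
stopifnot(identical(rownames(R), responses),
          identical(colnames(R), responses))

# With growthPheno installed, the matrix method could then be called as:
# results <- growthPheno::PVA(R, responses, p.variance = 0.9, plot = FALSE)
```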

### Details

The variable that is most correlated with the other variables is selected first for inclusion. The partial correlation for each of the remaining variables, given the first selected variable, is calculated and the most correlated of these variables is selected for inclusion next. Then the partial correlations are adjusted for the second included variable. This process is repeated until the specified criteria have been satisfied. The possibilities are:

1. the default (`nvarselect = NULL` and `p.variance = 1`), which selects all variables in increasing order of amount of information they provide;

2. to select exactly `nvarselect` variables;

3. to select just enough variables, up to a maximum of `nvarselect` variables, to explain at least `p.variance`*100 per cent of the total variance.
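The greedy selection described above can be illustrated with a rough sketch. This is not the package's implementation: `pva_sketch` is a hypothetical helper, it works with the residual (partial) covariance matrix rather than re-standardised partial correlations, it measures the variance explained by the reduction in the trace of that matrix, and it assumes that `R` is a positive-definite correlation matrix with row and column names. The `include` argument is not handled.

```r
# Hypothetical helper, not part of growthPheno: a greedy principal-variable sketch.
# Assumes R is a positive-definite correlation matrix with row and column names.
pva_sketch <- function(R, p.variance = 1, nvarselect = ncol(R)) {
  S <- R                        # residual covariance; starts as the correlation matrix
  total <- sum(diag(R))         # total variance (number of variables for a correlation matrix)
  remaining <- colnames(R)
  selected <- character(0)
  cum.propn <- numeric(0)
  explained <- 0
  while (length(selected) < nvarselect && length(remaining) > 0) {
    # Variance of all variables explained by each candidate: the sum of its
    # squared residual covariances divided by its own residual variance
    h <- vapply(remaining, function(v) sum(S[, v]^2) / S[v, v], numeric(1))
    best <- names(which.max(h))
    explained <- explained + h[[best]]
    # Partial the chosen variable out of the residual covariance matrix
    S <- S - tcrossprod(S[, best]) / S[best, best]
    selected <- c(selected, best)
    remaining <- setdiff(remaining, best)
    cum.propn <- c(cum.propn, explained / total)
    # Stop once the requested proportion of the variance is reached
    if (explained / total >= p.variance) break
  }
  data.frame(Variable = selected, Cumulative.Propn = cum.propn)
}

# Small made-up correlation matrix for illustration only
R <- matrix(c(1.0, 0.8, 0.1,
              0.8, 1.0, 0.2,
              0.1, 0.2, 1.0),
            nrow = 3, dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
pva_sketch(R, p.variance = 0.8)   # stops once at least 80% of the variance is covered
pva_sketch(R)                     # default: orders all of the variables
```

In this sketch, the defaults correspond to the first possibility in the list, and setting `nvarselect` together with `p.variance` corresponds to the third.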

### Value

A `data.frame` giving the results of the variable selection. It will contain the columns `Variable`, `Selected`, `h.partial`, `Added.Propn` and `Cumulative.Propn`.

### Author(s)

Chris Brien

### References

Cumming, J. A. and D. A. Wooff (2007) Dimension reduction via principal variables. Computational Statistics and Data Analysis, 52, 550–565.

### See Also

`PVA`, `PVA.data.frame`, `intervalPVA.data.frame`, `rcontrib`

### Examples

```r
library(growthPheno)
data(exampleData)

# Prepare the raw imaging data
longi.dat <- prepImageData(data = raw.dat, smarthouse.lev = 1)

# Derive additional traits from the side-view (SV) and top-view (TV) measurements
longi.dat <- within(longi.dat,
                    {
                      Max.Height <- pmax(Max.Dist.Above.Horizon.Line.SV1,
                                         Max.Dist.Above.Horizon.Line.SV2)
                      Density <- PSA / Max.Height
                      PSA.SV <- (PSA.SV1 + PSA.SV2) / 2
                      Image.Biomass <- PSA.SV * (PSA.TV^0.5)
                      Centre.Mass <- (Center.Of.Mass.Y.SV1 + Center.Of.Mass.Y.SV2) / 2
                      Compactness.SV <- (Compactness.SV1 + Compactness.SV2) / 2
                    })

# Variables from which the selection is to be made
responses <- c("PSA", "PSA.SV", "PSA.TV", "Image.Biomass", "Max.Height", "Centre.Mass",
               "Density", "Compactness.TV", "Compactness.SV")

# Correlation matrix of the responses
R <- Hmisc::rcorr(as.matrix(longi.dat[responses]))$r

# Select variables to explain at least 90 per cent of the total variance
results <- PVA(R, responses, p.variance = 0.9, plot = FALSE)
```

growthPheno documentation built on May 29, 2024, 6:03 a.m.