pcurve: Principal Curve Analysis

Fits a principal curve to a numeric multivariate dataset in arbitrary dimensions. Produces diagnostic plots.

pcurve(x, xcan = NULL, start = "ca", rank = FALSE, cv.fit = FALSE,
       penalty = 1, cv.all = FALSE, df = "vary", fit.meth = "spline",
       canfit = "lm", candf = FALSE, vary.adj = FALSE, subset,
       robust = FALSE, lowf = 0.5, min.df, max.df, max.df.cv.fit,
       ext.dist = TRUE, ext.dc = 0.9, metric = "bray", latent = FALSE,
       plot.pca = TRUE, thresh = 0.001, plot.true = TRUE,
       plot.init = FALSE, plot.segs = TRUE, plot.resp = TRUE,
       plot.cov = TRUE, maxit = 10, stretch = 2, fits = FALSE,
       prnt.fits = TRUE, trace = TRUE, trace.all = FALSE, pch = 1,
       row.chk0 = FALSE, col.chk0 = TRUE, use.loc = FALSE)

`x`	numeric data matrix or data.frame.
`xcan`	data.frame or matrix of explanatory variables to be used in constrained PCs.
`start`	specifies how to determine the starting configuration (location of points on initial curve): "ca" = correspondence analysis; "pca" = principal components analysis with Euclidean metric; "pca.bc" = principal components analysis with Bray-Curtis metric; "mds" = non-metric multidimensional scaling with Euclidean metric; "mds.bc" = non-metric multidimensional scaling with Bray-Curtis metric; "cs.bc" = classical scaling (metric multidimensional scaling) with Bray-Curtis metric; "ran" = random start. Alternatively, if start is numeric and of length dim(x)[1], a user-supplied configuration is used.
`rank`	if TRUE starting configuration is transformed to rank
`cv.fit`	if TRUE a final iteration using cross-validation is done.
`penalty`	penalty for smoothing spline. A value of 1 corresponds to no penalty with values > 1 giving a less-smoothed fit. Increasing the penalty for small data sets can reduce over-fitting. If penalty = "np", penalty = 1 for N > 1000, penalty = 2 for N <= 100, and penalty = 4 - log(N, 10) for N > 100 and N <= 1000.
`cv.all`	if TRUE a cross-validated smoothing spline fit is done at each iteration.
`df`	if numeric specifies the df for the smoothing spline.
`fit.meth`	specifies smoother. "spline" = smooth.spline, "poisson" = Poisson generalized additive model, "binomial" = binomial generalized additive model, "lowess" = lowess smoother (this argument is overridden by robust = TRUE).
`canfit`	"lm" or "gam", model used to relate pc to xcan.
`candf`	if canfit = "gam", df for model. May be a single value or a vector of FALSE or positive integers indicating df for each explanatory variable in xcan. If FALSE, this is equivalent to fx=FALSE in `gam`, and the df is selected by GCV.UBRE.
`vary.adj`	if FALSE the same df are used for the smooth of each variable, otherwise each variable has its own df.
`subset`	used to take a subset of x and start (if numeric).
`robust`	if TRUE uses lowess smooths, if FALSE uses smoothing spline.
`lowf`	specifies the span of the lowess smooth.
`min.df`	specifies the min df for the smoothing.
`max.df`	specifies the max df for smoothing during cross-validation.
`max.df.cv.fit`	specifies the max df for the smoothing.
`ext.dist`	if TRUE extended dissimilarities are used in the calculation of the initial configuration, using the flexible shortest path. If FALSE standard dissimilarities are used (see De'ath, 1999b and `stepacross` in package vegan).
`ext.dc`	critical distance, the toolong argument in `stepacross`.
`metric`	dissimilarity metric, the method argument in `vegdist` in package vegan.
`latent`	if FALSE locations are rescaled after each iteration to give distance along the curve; if TRUE no rescaling is done.
`plot.pca`	if TRUE the fitting is plotted (assuming plot.true = TRUE) in the first 2 dimensions of PCA space.
`thresh`	threshold value of difference in cross-validation for ceasing iteration
`plot.true`	if TRUE the fitting process is plotted.
`plot.init`	if TRUE the initial fits to each variable are plotted.
`plot.segs`	if TRUE segments linking the fitted points on the curves to their corresponding data points are plotted.
`plot.resp`	if TRUE the final response curves are plotted.
`plot.cov`	if TRUE covariate partial effects are plotted (only if xcan is not null).
`maxit`	specifies the maximum number of iterations.
`stretch`	end segments of the curve are stretched by this factor at each iteration.
`fits`	if TRUE value of pcurve includes diagnostics for each variable.
`prnt.fits`	if TRUE statistics on model fits are printed.
`trace`	if TRUE prints out fitting diagnostics at each iteration.
`trace.all`	if TRUE prints out all curve details at each iteration.
`pch`	symbol for plots
`row.chk0`	if TRUE checks for and removes rows of x identically 0.
`col.chk0`	if TRUE checks for and removes columns of x identically 0.
`use.loc`	if TRUE pauses during the fitting displays (left mouse-click to progress to next plot).
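
The penalty = "np" rule described for the `penalty` argument can be written out as a small function. This is a minimal sketch of the documented rule only; `np_penalty` is a hypothetical helper name and is not part of pcurve.

```r
# Hypothetical helper (not part of pcurve): reproduces the documented
# penalty = "np" rule for a data set with N observations.
np_penalty <- function(N) {
  if (N > 1000) {
    1                 # large data sets: no extra penalty
  } else if (N <= 100) {
    2                 # small data sets: fixed penalty of 2
  } else {
    4 - log(N, 10)    # in between: decreases from 2 towards 1
  }
}

# Penalty for a range of sample sizes
sapply(c(50, 100, 500, 1000, 5000), np_penalty)
```

Under this rule the penalty is continuous in N: it equals 2 at N = 100 and falls to 1 at N = 1000.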

See De'ath (1999a) for a full discussion of the functions and their application.
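
For orientation, the alternating smooth-and-project idea behind principal curves (Hastie and Stuetzle, 1989) can be sketched in base R. This is a simplified illustration and not pcurve's implementation: the simulated data, the fixed df = 4, the 200-point grid and the nearest-grid-point projection are all assumptions made for brevity.

```r
# Simulated 2-variable data lying near a parabola (illustrative assumption).
set.seed(42)
n <- 100
t_true <- sort(runif(n))
x <- cbind(t_true + rnorm(n, sd = 0.05),
           t_true^2 + rnorm(n, sd = 0.05))

# Starting locations: scores on the first principal component.
lambda <- prcomp(x)$x[, 1]

for (it in 1:10) {
  # Smoothing step: fit each variable against the current locations
  # and evaluate the fitted curve on a fine grid.
  grid <- seq(min(lambda), max(lambda), length.out = 200)
  curve_pts <- sapply(seq_len(ncol(x)), function(j) {
    predict(smooth.spline(lambda, x[, j], df = 4), grid)$y
  })

  # Cumulative arc length along the fitted curve at each grid point.
  arc <- c(0, cumsum(sqrt(rowSums(diff(curve_pts)^2))))

  # Projection step (crude): assign each observation to its nearest
  # grid point and take the arc length there as its new location.
  nearest <- apply(x, 1, function(p) {
    which.min(colSums((t(curve_pts) - p)^2))
  })
  lambda <- arc[nearest]
}

# Squared distance of each observation from its projection on the curve.
sq_dist <- rowSums((x - curve_pts[nearest, ])^2)
sum(sq_dist)
```

Rescaling the locations to arc length after each pass mirrors the behaviour described for latent = FALSE; a fixed number of passes is used here instead of a convergence threshold.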

An object of class principal curve containing a list comprising

`s`	fitted values
`tag`	order of points along the curve
`lambda`	locations along the curve
`dist`	sum of squared distances of points from the curve
`c`	call to pcurve
`x`	data to which the curve was fitted
`df`	degrees of freedom for the smoothers used in the fit
`fit.list`	diagnostics for each variable, only included if fits = TRUE.
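
How some of these components relate to one another can be illustrated with toy values. The sketch below assumes that `dist` is the plain sum of squared differences between `x` and `s`, and that `tag` is the ordering of `lambda`; both are assumptions about the returned object rather than confirmed details, and the numbers are made up.

```r
# Toy stand-ins for the documented components (not output from pcurve).
x <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)   # data
s <- matrix(c(1, 2, 2, 4, 4, 6), nrow = 3)   # fitted values on the curve
lambda <- c(0.5, 0.1, 0.9)                   # locations along the curve

# Assumed definition of dist: sum of squared distances from the curve.
sq_dist <- rowSums((x - s)^2)
total <- sum(sq_dist)

# Assumed definition of tag: order of the points along the curve.
tag <- order(lambda)

# Fitted values arranged in curve order, e.g. for drawing the curve.
s[tag, ]
```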

R port by Chris Walsh cwalsh@unimelb.edu.au from S+ library by Glenn De'ath g.death@aims.gov.au. Original S code for principal curve analysis by Trevor Hastie hastie@stat.stanford.edu.

De'ath, G. 1999a Principal Curves: a new technique for indirect and direct gradient analysis. Ecology 80, 2237–2253.

De'ath, G. 1999b Extended dissimilarity: method of robust estimation of ecological distances with high beta diversity. Plant Ecology 144, 191–199.

Gittins, R. 1985 Canonical Analysis. A review with applications in ecology. Berlin: Springer-Verlag.

Hastie, T.J. and Tibshirani, R.J. 1990 Generalized additive models. London: Chapman and Hall.

Hastie, T.J. and Stuetzle, W. 1989 Principal Curves. Journal of the American Statistical Association 84, 502–516.

pcdiags.plt, vegdist, stepacross

# A simulated dataset with 4 response variables (taxa 1-4), n = 100.
# The response curve is Gaussian and the noise is Poisson.
data(sim4var)
sim4fit <- pcurve(sim4var, plot.init = FALSE, use.loc = TRUE)

# Limestone grassland community example worked by De'ath (1999a),
# from data in Gittins (1985).
data(soilspec)
species <- sqrt(soilspec[, 2:9])
envvar <- soilspec[, 10:12]

# Indirect gradient analysis
spec.fit <- pcurve(species, start = "mds.bc", plot.init = FALSE,
                   use.loc = TRUE)

# Direct gradient analysis
soilspec.fit <- pcurve(species, xcan = envvar,
                       start = "mds.bc", plot.init = FALSE,
                       fits = TRUE, prnt.fits = TRUE,
                       use.loc = TRUE)
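
A numeric `start` can also be supplied directly. The sketch below builds one from first principal component scores using base R only; the toy Poisson matrix is an assumption standing in for real data, and the final pcurve call is left commented out because it requires the pcurve package to be loaded.

```r
# Toy count matrix standing in for a community data set (assumption).
set.seed(1)
x <- matrix(rpois(100 * 4, lambda = 5), nrow = 100, ncol = 4)

# First principal component scores as a user-supplied starting configuration.
start_vec <- prcomp(x)$x[, 1]

# start must be numeric and of length dim(x)[1].
stopifnot(is.numeric(start_vec), length(start_vec) == dim(x)[1])

# With pcurve loaded, the vector could then be passed as:
# user.fit <- pcurve(x, start = start_vec, plot.init = FALSE)
```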