cfit: Fits a K-dimensional simplex in M dimensions
In HenrikBengtsson/sfit: Multidimensional Simplex Fitting

cfit	R Documentation

Fits a K-dimensional simplex in M dimensions

Description

Fits a K-dimensional simplex in M dimensions. A K-dimensional simplex is the K-dimensional generalization of a triangle.

Usage

## S3 method for class 'matrix'
cfit(y, k=ncol(y) + 1, dump=1, chopless=NULL, chopmore=NULL, maxiter=NULL, ...,
  retX=FALSE, cfit=getOption("cfit"), verbose=FALSE)

Arguments

`y`	Matrix or data frame of size IxN containing I rows of vectors in R^N.
`k`	The number of vertices of the fitted simplex. By default, the number of vertices is equal to the number of dimension (N) + 1.
`dump`	The output format.
`chopless, chopmore`	Lower and upper percentile thresholds at which extreme data points are assigned zero weights.
`maxiter`	"maximum number of REX steps". Default value is 60.
`...`	Named argument passed to the external 'cfit' program.
`retX`	If `TRUE`, an estimate of `X` is returned, otherwise not.
`cfit`	Shell command to call the 'cfit' executable.
`verbose`	If `TRUE`, verbose output is displayed, otherwise not.

Details

Let Y=(y_1, …, y_I) where y_i=(y_{i1},…,y_{iN}) is an observation in N dimensions. Let M=(μ_1,…,μ_K) be the K-dimensional simplex where mu_k is a vertex in N dimensions. Let X=(x_1,…,X_I) where x_i=(x_{i1},…,x_{iN}). The simplex fitting algorithm decompose Y into:

Y \approx MX

such that ∑_i x_{ik} = 1.

Value

Returns a named list structure elements:

`M`	IxN `matrix` where each rows is the coordinate for one of the vertices.
`X`	(optional) the IxN `matrix` X.

Author(s)

Algorithm and C code/binary by Pratyaksha J. Wirapati. R wrapper by Henrik Bengtsson.

References

[1] P. Wirapati, & T. Speed, Fitting polyhedrial cones and simplices to multivariate data points, Walter and Eliza Hall Institute of Medical Research, December 30, 2001.
[2] P. Wirapati and T. Speed, An algorithm to fit a simplex to a set of multidimensional points, Walter and Eliza Hall Institute of Medical Research, January 15, 2002.

Examples

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Simulate data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
N <- 1000

# Simulate genotypes
g <- sample(c("AA", "AB", "AB", "BB"), size=N, replace=TRUE)

# Simulate concentrations of allele A and allele B
X <- matrix(rexp(N), nrow=N, ncol=2)
colnames(X) <- c("A", "B")
X[g == "AA", "B"] <- 0
X[g == "BB", "A"] <- 0
X[g == "AB",] <- X[g == "AB",] / 2

# Transform noisy X
xi <- matrix(rnorm(2*N, mean=0, sd=0.05), ncol=2)
a0 <- c(0,0)+0.3
A <- matrix(c(0.9, 0.1, 0.1, 0.8), nrow=2, byrow=TRUE)
A <- apply(A, MARGIN=2, FUN=function(u) u / sqrt(sum(u^2)))
Z <- t(a0 + A %*% t(X + xi))

# Add noise to Y
eps <- matrix(rnorm(2*N, mean=0, sd=0.05), ncol=2)
Y <- Z + eps

layout(matrix(1:4, ncol=2, byrow=TRUE))
par(mar=c(5,4,3,2)+0.1)
xlab <- "Allele A"
ylab <- "Allele B"
lim <- c(-0.5,8)
plot(X, xlab=xlab, ylab=ylab, xlim=lim, ylim=lim)
points(Z, col="blue")
points(Y, col="red")

legend("topright", pch=19, pt.cex=2, legend=c("X", "Z", "Y"),
       col=c("black", "blue", "red"), title="Variables:", bg="#eeeeee")


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Fit model
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
alpha <- c(0.10, 0.075, 0.05, 0.03, 0.01, 0.001)
fit <- cfit(Y, dump=2, alpha=alpha, q=2, Q=98)
Ms <- fit$M
col <- terrain.colors(length(Ms))
col[length(Ms)] <- "red"

plot(Y, cex=0.8, xlab=xlab, ylab=ylab, xlim=lim, ylim=lim, main="Y")

for (kk in seq_along(Ms)) {
  M <- Ms[[kk]]
  points(M, pch=19, cex=2.5, col=col[kk])
  lines(M, col=col[kk], lwd=2)
  text(M, cex=0.8, labels=kk)
}

legend("topright", pch=19, pt.cex=2, legend=c(alpha, "final"),
       col=col, title=expression(alpha), bg="#eeeeee")

apex <- which.min(apply(M, MARGIN=1, FUN=function(u) sum(u^2)))
a0hat <- M[apex,]
Ahat <- M[-apex,]
Ahat <- apply(Ahat, MARGIN=2, FUN=function(u) u / sqrt(sum(u^2)))
if (sum(Ahat[c(1,4)]^2) < sum(Ahat[c(2,3)]^2)) {
  Ahat <- matrix(Ahat[c(2,1,4,3)], nrow=2)
}
Ainv <- solve(Ahat)
Xhat <- t(Ainv %*% (t(Y) - a0hat))

cat("True A:\n")
print(A)

cat("Estimated A:\n")
print(Ahat)

plot(Xhat, cex=0.8, xlab=xlab, ylab=ylab, xlim=lim, ylim=lim, main=expression(hat(X)))
x1 <- par("usr")[2]
y1 <- par("usr")[4]
lines(x=c(0,x1), y=c(0,0), col="red", lwd=2)
lines(x=c(0,0), y=c(0,y1), col="red", lwd=2)
lines(x=c(0,x1), y=c(0,y1), col="blue", lwd=2)

plot(X[,1], Xhat[,1], cex=0.8, xlab=expression(X), ylab=expression(hat(X)), xlim=lim, ylim=lim)
points(X[,2], Xhat[,2], cex=0.8, col="red")
abline(a=0, b=1, lwd=2)

HenrikBengtsson/sfit documentation built on Nov. 12, 2022, 8:26 p.m.