Residual Sum of Squares and Explained Variance
Description
rss and evar are S4 generic functions that compute, respectively, the Residual Sum of Squares (RSS) and the explained variance achieved by a model.
The explained variance for a target V is computed as:
evar = 1 - \frac{RSS}{\sum_{i,j} V_{ij}^2}
Usage
Arguments
object 
an R object with a suitable

... 
extra arguments to allow extension, e.g.
passed to 
target 
target matrix 
Details
In the explained variance formula, RSS is the residual sum of squares between the target and its estimate.
The explained variance is useful for comparing the performance of different models and their ability to accurately reproduce the original target matrix. Note, however, that some models explicitly aim at minimizing the RSS (i.e. maximizing the explained variance), while others do not, so the comparison can favour the former.
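The two definitions can be checked by hand in base R. The following is only a minimal sketch: the helper names rss_manual and evar_manual are hypothetical and are not part of any package, and the 2 x 2 matrices are made-up illustration data.

```r
# Hypothetical helpers, written in base R only, that mirror the formulas:
#   RSS  = sum over i,j of (estimate_ij - target_ij)^2
#   evar = 1 - RSS / sum over i,j of target_ij^2
rss_manual <- function(estimate, target) {
  stopifnot(is.matrix(estimate), is.matrix(target),
            identical(dim(estimate), dim(target)))
  sum((estimate - target)^2)
}

evar_manual <- function(estimate, target) {
  1 - rss_manual(estimate, target) / sum(target^2)
}

# Made-up data: the estimate differs from the target in one entry only
target   <- matrix(c(1, 2, 3, 4), nrow = 2)
estimate <- matrix(c(1, 2, 3, 3), nrow = 2)

rss_manual(estimate, target)   # (3 - 4)^2 = 1
evar_manual(estimate, target)  # 1 - 1/30
```

A perfect estimate gives an RSS of 0 and an explained variance of 1; a poor estimate can push the explained variance below 0, since the RSS is not bounded by the sum of squared target entries.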
Value
a single numeric value
Methods
evar
signature(object = "ANY"): Default method for evar. It requires a suitable rss method to be defined for object, as it internally calls rss(object, target, ...).
rss
signature(object = "matrix"): Computes the RSS between a target matrix and its estimate object, which must be a matrix of the same dimensions as target.
The RSS between a target matrix V and its estimate v is computed as:
RSS = \sum_{i,j} (v_{ij} - V_{ij})^2
Internally, the computation is performed using an optimised C++ implementation that is light on memory usage.
rss
signature(object = "ANY"): Residual sum of squares between a given target matrix and a model that has a suitable fitted method. It is equivalent to rss(fitted(object), ...).
In the context of NMF, Hutchins et al. (2008) used the variation of the RSS in combination with the algorithm from Lee et al. (1999) to estimate the correct number of basis vectors. The optimal rank is chosen where the graph of the RSS first shows an inflexion point, i.e. using a scree-plot-type criterion. See section Rank estimation in nmf.
Note that this way of estimation may not be suitable for all models. Indeed, if the NMF optimisation problem is not based on the Frobenius norm, the RSS is not directly linked to the quality of approximation of the NMF model. However, it often still decreases as the rank increases.
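The rank-selection idea described above can be sketched by computing the RSS over a range of candidate ranks. This is only an illustration, not the procedure of Hutchins et al.: the rank range 2:5, the seed values and the random target are arbitrary assumptions, and a single fit per rank is used, whereas a real analysis would typically average over several runs.

```r
library(NMF)

set.seed(42)              # arbitrary seed, so the random target is reproducible
x <- rmatrix(20, 10)      # random target matrix

ranks <- 2:5              # illustrative range of candidate ranks
rss_by_rank <- sapply(ranks, function(r) {
  # 'lee' minimizes the RSS; the numeric seed fixes the random starting point
  fit <- nmf(x, r, 'lee', seed = 123456)
  rss(fit, x)
})
names(rss_by_rank) <- ranks
rss_by_rank
```

Plotting the result, e.g. with plot(ranks, rss_by_rank, type = "b"), gives the curve on which the inflexion point would be looked for.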
References
Hutchins LN, Murphy SM, Singh P and Graber JH (2008). "Position-dependent motif characterization using non-negative matrix factorization." _Bioinformatics (Oxford, England)_, *24*(23), pp. 2684-90. ISSN 1367-4811, <URL: http://dx.doi.org/10.1093/bioinformatics/btn526>, <URL: http://www.ncbi.nlm.nih.gov/pubmed/18852176>.
Lee DD and Seung HS (1999). "Learning the parts of objects by non-negative matrix factorization." _Nature_, *401*(6755), pp. 788-91. ISSN 0028-0836, <URL: http://dx.doi.org/10.1038/44565>, <URL: http://www.ncbi.nlm.nih.gov/pubmed/10548103>.
Examples
library(NMF)

#
# rss,matrix-method
#
# RSS between random matrices
x <- rmatrix(20, 10, max=50)
y <- rmatrix(20, 10, max=50)
rss(x, y)
# RSS between a matrix and a slightly perturbed copy of itself
rss(x, x + rmatrix(x, max=0.1))
#
# rss,ANY-method
#
# RSS between an NMF model and a target matrix
x <- rmatrix(20, 10)
y <- rnmf(3, x) # random compatible model
rss(y, x)
# fit a model with nmf(): one should do better
y2 <- nmf(x, 3) # default minimizes the KL-divergence
rss(y2, x)
y2 <- nmf(x, 3, 'lee') # 'lee' minimizes the RSS
rss(y2, x)
