This function returns the fitted data that the model was built on; conceptually, this is what we would get if we 'predicted' the model on the same data used to generate it. The details describe where this differs from calling predict directly.
getFitted(model)
model: the model for which to extract the fitted data
There is a subtle but significant point about getFitted: it cannot be assumed that predict called with the same data used to generate the model will give the same results as getFitted. In general, when the original data is dropped down through a classifier, it will return the original results. This makes sense: if this is a known data point, then why not return the known class? However, in order to estimate accuracy, most packages use some technique to produce a dataset that shows what the model would have returned if the data point had not been included in the original data. In all cases, getFitted returns this dataset: the one that can be used for error estimation. buildPredict was designed for classifying new data; what it returns when the original data is used depends on both the type of classifier and the implementation of the package.
Random Forest: both packages will estimate the accuracy based on data that falls out-of-bag (OOB); that is, when a data point is not directly represented in a tree, that data point can be used to estimate the accuracy of that tree. The technique is akin to a leave-one-out validation method.
Nearest Neighbour: the issue here is identical. The values returned by getFitted are estimated using a leave-one-out cross-validation approach for all packages. The predict function will return the original data. In the case of the FNN package, the nearest-neighbour index is included in the model results; the second nearest neighbour returned by predict matches the values returned by getFitted.
GBM: in this case, the values of getFitted match those returned by predict.
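The difference between resubstitution and a leave-one-out estimate can be illustrated with a minimal base R sketch of a 1-nearest-neighbour classifier. The toy data and the variable names (x, y, resub, loo) are assumptions made for illustration; this is not the code used by getFitted or by any of the packages mentioned.

```r
# Toy data: two numeric features and a class label (illustrative only).
x <- rbind(c(0, 0), c(0, 1), c(5, 5), c(5, 6), c(2.4, 2.4))
y <- factor(c("a", "a", "b", "b", "b"))

d <- as.matrix(dist(x))   # pairwise Euclidean distances

# Resubstitution: each point's nearest neighbour is itself (distance 0),
# so the "prediction" is simply the original label.
resub <- y[apply(d, 1, which.min)]

# Leave-one-out: exclude each point from its own neighbour search,
# so the label comes from what would otherwise be the second nearest neighbour.
diag(d) <- Inf
loo <- y[apply(d, 1, which.min)]

all(resub == y)   # resubstitution reproduces the training labels
mean(loo == y)    # leave-one-out accuracy, usable for error estimation
```

Here the resubstitution result always agrees with the known classes, whereas the leave-one-out result can disagree for points that sit closer to another class, which is what makes it useful for estimating error.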
a factor variable containing the fitted data
Note that there is a bug in FNN (and possibly the class package) that means the class reported, and the associated probability, are incorrect. As would be expected it is a bit subtle, but it goes as follows:
the data is correct when it is returned from the C function;
in cases where k > 1, i.e. we are looking for more than the first nearest neighbour, it uses table to figure out if any of the nearest neighbours are of the same class;
if so, it returns that class as the most likely class with the probability of its occurrence;
if not, it returns the first class in the list.
However, table sorts the input data, so the first class in the list is not necessarily the closest neighbour. Hence, in cases where there is no repetition of classes, that is, all the nearest k neighbours are of unique classes, the function returns the class that comes first in alphanumeric order, not the class of the nearest neighbour. I have written getFitted to return the class of the nearest neighbour, but it does not check for repeat classes in the k nearest neighbours. I was not able to do the same for the class package as the C function does not return the indices of nearest neighbours.
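The sorting behaviour of table can be seen in a few lines of base R. This is a simplified stand-in for the tie-breaking described above, not the actual FNN source; the class names and the variables nn_classes, from_table and from_nearest are made up for illustration.

```r
# Classes of the k = 3 nearest neighbours, ordered nearest first (made-up values).
nn_classes <- c("pine", "aspen", "birch")

votes <- table(nn_classes)   # table() orders its entries by sorted level
names(votes)                 # sorted order, not nearest-first

# With no repeated class every count is 1, so taking the first maximum
# of the table yields the first class in sorted order.
from_table <- names(votes)[which.max(votes)]

# Taking the class of the nearest neighbour directly avoids the problem.
from_nearest <- nn_classes[1]
```

In this sketch from_table and from_nearest differ, which mirrors the situation in which all k nearest neighbours belong to distinct classes.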
The nearest neighbour models in package:FNN and package:class do not enclose their results in a class; when NPEL.Classification builds objects of these types it wraps them in a class so they are recognizable by S3 methods, and attaches the formula and data. Hence, this function will not work on a model that was built directly using these packages.
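The general idea of wrapping a bare result in a class so that S3 dispatch can find a method is sketched below. The generic fitted_demo, the class name demo_knn and the list fields are hypothetical and chosen only for illustration; they are not the names used inside NPEL.Classification.

```r
# A bare result, standing in for what a package might return without a model class.
raw_fit <- factor(c("a", "b", "a"))

# Hypothetical generic and method, for illustration only.
fitted_demo <- function(model) UseMethod("fitted_demo")
fitted_demo.demo_knn <- function(model) model$fit

# Wrap the bare result so S3 dispatch can recognise it, and attach
# the formula and data alongside it.
wrapped <- structure(
  list(fit = raw_fit, formula = y ~ x, data = data.frame(x = 1:3)),
  class = "demo_knn"
)

fitted_demo(wrapped)   # dispatches to fitted_demo.demo_knn

# The unwrapped result has no matching method, so the call fails.
unwrapped_fails <- inherits(try(fitted_demo(raw_fit), silent = TRUE), "try-error")
```

The last line shows why an object built directly by the underlying package, without the wrapper class, cannot be passed to a method that relies on dispatch.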