Linear predictive models estimation based on the LIBLINEAR C/C++ Library.
Description
LiblineaR allows the estimation of predictive linear models for
classification and regression, such as L1- or L2-regularized logistic
regression, L1- or L2-regularized L2-loss support vector classification,
L2-regularized L1-loss support vector classification and multi-class support
vector classification. It also supports L2-regularized support vector regression
(with L1- or L2-loss). The estimation of the models is particularly fast
compared to other libraries. The implementation is based on the LIBLINEAR C/C++
library for machine learning.
Usage
Arguments
data 
an n x p data matrix. Each row stands for an example (sample, point) and each column stands for a dimension (feature, variable). A sparse matrix (from the SparseM package) will also work. 
target 
a response vector for prediction tasks with one value for
each of the n rows of 
type 

cost 
cost of constraints violation (default: 1). Rules the tradeoff
between regularization and correct classification on 
epsilon 
set tolerance of termination criterion for optimization.
If
The meaning of

svr_eps 
set tolerance margin (epsilon) in regression loss function of SVR. Not used for classification methods. 
bias 
if 
wi 
a named vector of weights for the different classes, used for asymmetric class sizes. Not all factor levels have to be supplied (default weight: 1). All components have to be named according to the corresponding class label. Not used in regression mode. 
cross 
if an integer value k>0 is specified, a k-fold cross-validation
on 
verbose 
if 
... 
for backwards compatibility, parameter 
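As an illustration of the wi argument described above, the following sketch builds a named weight vector for an imbalanced three-class subset of iris. The inverse-frequency weighting and the variable names (keep, counts) are assumptions made for this example, not something the package prescribes; type = 0 is used only because it appears in the Examples section. The model fit is skipped if LiblineaR is not installed.

```r
# Build an imbalanced three-class subset of iris (illustrative only):
# 50 setosa, 30 versicolor, 10 virginica.
data(iris)
keep <- c(1:50, 51:80, 101:110)
x <- scale(as.matrix(iris[keep, 1:4]))
y <- droplevels(iris$Species[keep])

# Inverse-frequency weights: rarer classes receive larger weights.
# This weighting scheme is an assumption chosen for the example.
counts <- table(y)
wi <- setNames(as.numeric(length(y) / (nlevels(y) * counts)), names(counts))
print(wi)

# Every component of wi is named after a class label, as the argument requires.
if (requireNamespace("LiblineaR", quietly = TRUE)) {
  m <- LiblineaR::LiblineaR(data = x, target = y, type = 0, cost = 1,
                            wi = wi, verbose = FALSE)
  print(m$ClassNames)
}
```

Classes left out of wi keep the default weight of 1, so it is enough to name only the classes that need a different weight.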
Details
For details on the implementation of LIBLINEAR, see the README file of the original C/C++ LIBLINEAR library at http://www.csie.ntu.edu.tw/~cjlin/liblinear.
Value
If cross > 0, the average accuracy (classification) or mean squared error (regression) computed over cross runs of cross-validation is returned.
Otherwise, an object of class "LiblineaR" containing the fitted model is returned, including:
TypeDetail 
A string describing the type of model fitted, as determined by 
Type 
An integer corresponding to 
W 
A matrix with the model weights. If 
Bias 
TRUE or FALSE, according to the value of 
ClassNames 
A vector containing the class names. This entry is not returned in case of regression models. 
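The two kinds of return value can be compared with a short sketch such as the one below. It assumes the LiblineaR package is installed (the code is skipped otherwise) and uses type = 0 and cost = 1 purely as example settings; the variable names cvAcc and m are illustrative.

```r
if (requireNamespace("LiblineaR", quietly = TRUE)) {
  library(LiblineaR)
  data(iris)
  x <- scale(as.matrix(iris[, 1:4]))
  y <- iris$Species

  # With cross > 0, a single number is returned (average accuracy here,
  # because the target is a factor), not a model object.
  cvAcc <- LiblineaR(data = x, target = y, type = 0, cost = 1, cross = 5)
  print(cvAcc)

  # With the default cross = 0, a fitted model of class "LiblineaR" is returned.
  m <- LiblineaR(data = x, target = y, type = 0, cost = 1, bias = TRUE)
  print(class(m))
  print(m$TypeDetail)   # description of the fitted model type
  print(m$Type)         # integer model type
  print(dim(m$W))       # dimensions of the weight matrix
  print(m$Bias)
  print(m$ClassNames)   # class labels (classification only)
}
```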
Note
Classification models usually perform better if each dimension of the data is first centered and scaled.
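A minimal sketch of this preprocessing, using base R only: the centering and scaling parameters are estimated on the training rows and then reused on the held-out rows, so that no information from the test set leaks into the transformation. The seed and the 100/50 split are arbitrary choices for the example.

```r
data(iris)
set.seed(1)                                   # arbitrary seed, for reproducibility
train <- sample(nrow(iris), 100)
xTrain <- as.matrix(iris[train, 1:4])
xTest  <- as.matrix(iris[-train, 1:4])

# Estimate column means and standard deviations on the training set only.
sTrain <- scale(xTrain, center = TRUE, scale = TRUE)

# Apply the training-set parameters to the test set.
sTest <- scale(xTest,
               center = attr(sTrain, "scaled:center"),
               scale  = attr(sTrain, "scaled:scale"))

# Training columns now have mean 0 and standard deviation 1;
# test columns are only approximately standardized.
print(round(colMeans(sTrain), 10))
print(apply(sTrain, 2, sd))
```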
Author(s)
Thibault Helleputte thibault.helleputte@dnalytics.com and
Pierre Gramme pierre.gramme@dnalytics.com.
Based on C/C++ code by Chih-Chung Chang and Chih-Jen Lin.
References

For more information on LIBLINEAR itself, refer to:
R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin.
LIBLINEAR: A Library for Large Linear Classification,
Journal of Machine Learning Research 9(2008), 1871-1874.
http://www.csie.ntu.edu.tw/~cjlin/liblinear
See Also
predict.LiblineaR, heuristicC
Examples
data(iris)
attach(iris)
x=iris[,1:4]
y=factor(iris[,5])
train=sample(1:dim(iris)[1],100)
xTrain=x[train,]
xTest=x[-train,]
yTrain=y[train]
yTest=y[-train]
# Center and scale data
s=scale(xTrain,center=TRUE,scale=TRUE)
# Find the best model type and cost parameter via 5-fold cross-validation
tryTypes=c(0:7)
tryCosts=c(1000,1,0.001)
bestCost=NA
bestAcc=0
bestType=NA
for(ty in tryTypes){
for(co in tryCosts){
acc=LiblineaR(data=s,target=yTrain,type=ty,cost=co,bias=TRUE,cross=5,verbose=FALSE)
cat("Results for C=",co," : ",acc," accuracy.\n",sep="")
if(acc>bestAcc){
bestCost=co
bestAcc=acc
bestType=ty
}
}
}
cat("Best model type is:",bestType,"\n")
cat("Best cost is:",bestCost,"\n")
cat("Best accuracy is:",bestAcc,"\n")
# Retrain best model with best cost value.
m=LiblineaR(data=s,target=yTrain,type=bestType,cost=bestCost,bias=TRUE,verbose=FALSE)
# Scale the test data
s2=scale(xTest,attr(s,"scaled:center"),attr(s,"scaled:scale"))
# Make prediction
pr=FALSE
if(bestType==0 || bestType==7) pr=TRUE
p=predict(m,s2,proba=pr,decisionValues=TRUE)
# Display confusion matrix
res=table(p$predictions,yTest)
print(res)
# Compute Balanced Classification Rate
BCR=mean(c(res[1,1]/sum(res[,1]),res[2,2]/sum(res[,2]),res[3,3]/sum(res[,3])))
print(BCR)
#############################################
# Example of the use of a sparse matrix:
if(require(SparseM)){
# Sparsifying the iris dataset:
iS=apply(iris[,1:4],2,function(a){a[a<quantile(a,probs=c(0.25))]=0;return(a)})
irisSparse<-as.matrix.csr(iS)
# Applying a similar methodology as above:
xTrain=irisSparse[train,]
xTest=irisSparse[-train,]
# Retrain best model with best cost value.
m=LiblineaR(data=xTrain,target=yTrain,type=bestType,cost=bestCost,bias=TRUE,verbose=FALSE)
# Make prediction
p=predict(m,xTest,proba=pr,decisionValues=TRUE)
# Display confusion matrix
res=table(p$predictions,yTest)
print(res)
}
#############################################
# Try regression instead, to predict sepal length on the basis of sepal width and petal length (columns 2:3):
xTrain=iris[c(1:25,51:75,101:125),2:3]
yTrain=iris[c(1:25,51:75,101:125),1]
xTest=iris[c(26:50,76:100,126:150),2:3]
yTest=iris[c(26:50,76:100,126:150),1]
# Center and scale data
s=scale(xTrain,center=TRUE,scale=TRUE)
# Estimate MSE in cross-validation on the training set
MSECross=LiblineaR(data = s, target = yTrain, type = 13, cross = 10, svr_eps=.01)
# Build the model
m=LiblineaR(data = s, target = yTrain, type = 13, cross=0, svr_eps=.01)
# Test it, after test data scaling:
s2=scale(xTest,attr(s,"scaled:center"),attr(s,"scaled:scale"))
pred=predict(m,s2)$predictions
MSETest=mean((yTest-pred)^2)
# Was MSE well estimated?
print(MSETest-MSECross)
# Distribution of errors
print(summary(yTest-pred))
