Classification with selected features and cross-validation. It supports 10 classification algorithms, feature selection by WEKA, cross-validation, and the leave-one-out test.
data |
a data frame containing the feature matrix and the class label. The last column is a vector of class labels consisting of "-1" or "+1"; the other columns are features. |
classifyMethod |
a string for the classification method. This must be one of the strings "libsvm", "svmlight", "NaiveBayes", "randomForest", "knn", "tree", "nnet", "rpart", "ctree", "ctreelibsvm", "bagging". |
cv |
an integer for the number of cross-validation folds, or the string "leave_one_out" for the jackknife test. |
features |
an integer vector giving the indices of the columns of interest in data, which will be used as features to build the classification model. |
evaluator |
a string for the feature selection method used by WEKA. This must be one of the strings "CfsSubsetEval", "ChiSquaredAttributeEval", "InfoGainAttributeEval", or "SVMAttributeEval". |
search |
a string for the search method used by WEKA. This must be one of the strings "BestFirst" or "Ranker". |
n |
an integer for the number of selected features. |
svm.kernel |
a string for the kernel function of the SVM. |
svm.scale |
a logical vector indicating the variables to be scaled. |
svm.path |
a character string giving the path to the SVMlight binaries (required if the path is not known to the OS). |
svm.options |
optional parameters passed to SVMlight (e.g. "-t 2 -g 0.1"). For further details see "How to use" on http://svmlight.joachims.org/. |
nnet.size |
number of units in the hidden layer. Can be zero if there are skip-layer units. |
nnet.rang |
Initial random weights on [-rang, rang]. Value about 0.5 unless the inputs are large, in which case it should be chosen so that rang * max(|x|) is about 1. |
nnet.decay |
parameter for weight decay. |
nnet.maxit |
maximum number of iterations. |
knn.k |
number of neighbours considered by the k-nearest-neighbour classifier (classifyMethod "knn"). |
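To make the expected layout of data and the meaning of cv concrete, here is a minimal base-R sketch. The toy values and the helper make_folds are hypothetical and are not part of BioSeqClass; the fold assignment shown is only one plausible way to partition rows, not necessarily what classify does internally.

```r
# Toy data in the layout described above: feature columns first,
# class label ("+1" / "-1") in the last column.
set.seed(1)
toy <- data.frame(
  f1 = c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0),
  f2 = c(0, 0, 1, 0, 1, 1, 0, 1, 0, 1),
  class = c(rep("+1", 5), rep("-1", 5)),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of BioSeqClass): assign each row to a test fold.
# cv is either an integer number of folds or the string "leave_one_out".
make_folds <- function(n_rows, cv) {
  if (identical(cv, "leave_one_out")) {
    return(seq_len(n_rows))              # every row is its own test fold
  }
  sample(rep_len(seq_len(cv), n_rows))   # shuffled, near-equal fold sizes
}

folds_cv5 <- make_folds(nrow(toy), 5)
folds_loo <- make_folds(nrow(toy), "leave_one_out")

# Rows used for training and testing when fold 1 is held out.
train_rows <- which(folds_cv5 != 1)
test_rows  <- which(folds_cv5 == 1)
```

With cv = 5 each row is tested exactly once across the five folds; with "leave_one_out" the number of folds equals the number of rows.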
classify employs feature selection methods from WEKA and diverse classification models from other R packages to perform classification.
"Cross Validation" is controlled by the parameter "cv";
"Feature Selection" is controlled by the parameters "features", "evaluator", "search", and "n";
"Classification Model Building" is controlled by the parameter "classifyMethod".
Parameter "evaluator" supports four feature selection methods provided by WEKA:
"CfsSubsetEval": Evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them.
"ChiSquaredAttributeEval": Evaluates the worth of an attribute by computing the value of the chi-squared statistic with respect to the class.
"InfoGainAttributeEval": Evaluates attributes individually by measuring information gain with respect to the class.
"SVMAttributeEval": Evaluates the worth of an attribute by using an SVM classifier. Attributes are ranked by the square of the weight assigned by the SVM. Attribute selection for multiclass problems is handled by ranking attributes for each class separately using a one-vs-all method and then "dealing" from the top of each pile to give a final ranking.
Parameter "search" supports two feature subset search methods provided by WEKA:
"BestFirst": Searches the space of attribute subsets by greedy hill climbing augmented with a backtracking facility. Setting the number of consecutive non-improving nodes allowed controls the level of backtracking done. Best first may start with the empty set of attributes and search forward, or start with the full set of attributes and search backward, or start at any point and search in both directions (by considering all possible single attribute additions and deletions at a given point).
"Ranker": Ranks attributes by their individual evaluations.
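The combination of a single-attribute evaluator with the "Ranker" search and the parameter "n" can be illustrated in base R. The sketch below is an assumption-laden illustration of the idea (score each feature by information gain, rank, keep the top n); it is not WEKA's implementation, and the helpers entropy and info_gain as well as the toy data are hypothetical.

```r
# Shannon entropy (in bits) of a discrete vector.
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]                      # guard against log2(0)
  -sum(p * log2(p))
}

# Information gain of a discrete feature with respect to the class:
# H(class) - sum over feature values v of P(v) * H(class | feature == v).
info_gain <- function(feature, class) {
  cond <- sapply(split(class, feature), entropy)
  weights <- table(feature) / length(feature)
  entropy(class) - sum(weights[names(cond)] * cond)
}

# Toy data: f1 separates the classes perfectly, f2 carries no information,
# f3 is partially informative. The class label is the last column.
toy <- data.frame(
  f1 = c(1, 1, 1, 1, 0, 0, 0, 0),
  f2 = c(1, 0, 1, 0, 1, 0, 1, 0),
  f3 = c(1, 1, 1, 0, 0, 0, 0, 1),
  class = c(rep("+1", 4), rep("-1", 4)),
  stringsAsFactors = FALSE
)

# Score every feature column individually, then rank and keep the top n.
scores <- sapply(toy[, -ncol(toy)], info_gain, class = toy[[ncol(toy)]])
ranked <- names(sort(scores, decreasing = TRUE))
n <- 2
selected <- ranked[seq_len(n)]
```

Here f1 receives the highest score and f2 the lowest, so selected contains f1 and f3. A subset evaluator such as "CfsSubsetEval" works differently: it scores whole subsets and is paired with a subset search such as "BestFirst".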
Parameter "classifyMethod" supports multiple classification models:
"libsvm": Employs classifyModelLIBSVM to perform Support Vector Machine classification with LibSVM. Package "e1071" is required.
"svmlight": Employs classifyModelSVMLIGHT to perform Support Vector Machine classification with SVMLight. Package "klaR" is required.
"NaiveBayes": Employs classifyModelNB to perform Naive Bayes classification. Package "klaR" is required.
"randomForest": Employs classifyModelRF to perform random forest classification. Package "randomForest" is required.
"knn": Employs classifyModelKNN to perform the k Nearest Neighbor algorithm. Package "class" is required.
"tree": Employs classifyModelTree to perform tree classification. Package "tree" is required.
"nnet": Employs classifyModelNNET to perform the neural net algorithm. Bundle "VR" is required.
"rpart": Employs classifyModelRPART to perform Recursive Partitioning and Regression Trees. Package "rpart" is required.
"ctree": Employs classifyModelCTREE to perform Conditional Inference Trees. Package "party" is required.
"ctreelibsvm": Employs classifyModelCTREELIBSVM to combine Conditional Inference Trees and Support Vector Machines for classification. For each node in the tree, one SVM model is constructed using the training data in that node. Test data are first assigned to one node of the tree and then classified with the corresponding SVM. Packages "party" and "e1071" are required.
"bagging": Employs classifyModelBAG to perform bagging for classification trees. Package "ipred" is required.
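The correspondence between classifyMethod strings and model functions listed above can be written down as a lookup table, which is handy for validating a method name before a long run. This is only a sketch: the function names come from the list above, while the helper resolve_model is hypothetical and is not part of BioSeqClass.

```r
# Mapping taken from the list above: classifyMethod string -> model function name.
model_functions <- c(
  libsvm       = "classifyModelLIBSVM",
  svmlight     = "classifyModelSVMLIGHT",
  NaiveBayes   = "classifyModelNB",
  randomForest = "classifyModelRF",
  knn          = "classifyModelKNN",
  tree         = "classifyModelTree",
  nnet         = "classifyModelNNET",
  rpart        = "classifyModelRPART",
  ctree        = "classifyModelCTREE",
  ctreelibsvm  = "classifyModelCTREELIBSVM",
  bagging      = "classifyModelBAG"
)

# Hypothetical helper: return the model function name for a method string,
# or stop with a message listing the accepted strings.
resolve_model <- function(classifyMethod) {
  if (!classifyMethod %in% names(model_functions)) {
    stop("classifyMethod must be one of: ",
         paste(names(model_functions), collapse = ", "))
  }
  model_functions[[classifyMethod]]
}

resolve_model("knn")
```

Exact matching is used on purpose, so that "ctree" and "ctreelibsvm" are never confused.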
Hong Li
## Load the package that provides classify() and featureBinary().
library(BioSeqClass)
## Read positive/negative sequences from files.
tmpfile1 = file.path(path.package("BioSeqClass"), "example", "acetylation_K.pos40.pep")
tmpfile2 = file.path(path.package("BioSeqClass"), "example", "acetylation_K.neg40.pep")
posSeq = as.matrix(read.csv(tmpfile1,header=FALSE,sep="\t",row.names=1))[,1]
negSeq = as.matrix(read.csv(tmpfile2,header=FALSE,sep="\t",row.names=1))[,1]
seq=c(posSeq,negSeq)
classLable=c(rep("+1",length(posSeq)),rep("-1",length(negSeq)) )
data = data.frame(featureBinary(seq),classLable)
## Use LibSVM and 5-fold cross-validation to classify.
LIBSVM_CV5 = classify(data,classifyMethod="libsvm",cv=5,
  svm.kernel="linear",svm.scale=FALSE)
## Feature selection is done by invoking the "CfsSubsetEval" method in WEKA.
FS_LIBSVM_CV5 = classify(data,classifyMethod="libsvm",cv=5,evaluator="CfsSubsetEval",
  search="BestFirst",svm.kernel="linear",svm.scale=FALSE)
if(interactive()){
KNN_CV5 = classify(data,classifyMethod="knn",cv=5,knn.k=1)
RF_CV5 = classify(data,classifyMethod="randomForest",cv=5)
TREE_CV5 = classify(data,classifyMethod="tree",cv=5)
NNET_CV5 = classify(data,classifyMethod="nnet",cv=5)
RPART_CV5 = classify(data,classifyMethod="rpart",cv=5,evaluator="")
CTREE_CV5 = classify(data,classifyMethod="ctree",cv=5,evaluator="")
BAG_CV5 = classify(data,classifyMethod="bagging",cv=5,evaluator="")
}