buildTrainModel: Classification models.


View source: R/AStrap.R

Description

This function builds various classification models, including support vector machine (SVM), random forests (RF), and adaptive boosting (AdaBoost).

Usage

buildTrainModel(ASdata, chooseNum = 1000, proTrain = 2/3, proTest = 1/3,
  ASlength = 0, classifier = "rf", use.all = FALSE)

Arguments

ASdata

A data frame containing the coordinates of the splice sites, the class label, and the sequence around the splice sites. The "type" column is a vector of class labels consisting of "AltA", "AltD", "ES", and "IR".

chooseNum

An integer giving the number of AS events drawn from each AS type to build the classification model.

proTrain

The proportion of the data assigned to the training set by random sampling.

proTest

The proportion of the data assigned to the testing set by random sampling.

ASlength

AS data are trimmed where the AS length is below the given threshold.

classifier

A string naming the classification method. This must be one of "svm", "rf", or "adaboost" (not case sensitive).

use.all

Whether to use the entire alternative splicing dataset to build the classification model (default: FALSE).
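
The following is a minimal sketch of how arguments such as chooseNum, proTrain, and classifier could be interpreted. It is not the AStrap source code: the toy data frame, the per-type sampling, and the use of match.arg() for case-insensitive matching are all assumptions made for illustration. Only the "type" column and its four labels come from the documentation above.

```r
# Illustrative sketch only; not the AStrap implementation.
set.seed(1)  # make the random sampling reproducible

# Toy stand-in for ASdata: only the "type" column mirrors the documented input.
toy <- data.frame(
  id   = seq_len(400),
  type = rep(c("AltA", "AltD", "ES", "IR"), each = 100),
  stringsAsFactors = FALSE
)

chooseNum  <- 30     # events drawn from each AS type
proTrain   <- 2/3    # share of the sampled events used for training
classifier <- "RF"   # any letter case

# Normalise the classifier name and reject unknown values.
classifier <- match.arg(tolower(classifier), c("svm", "rf", "adaboost"))

# Draw chooseNum rows per type (or every row if a type has fewer).
picked <- unlist(lapply(split(seq_len(nrow(toy)), toy$type), function(idx) {
  idx[sample.int(length(idx), min(chooseNum, length(idx)))]
}))
sampled <- toy[picked, ]

# Randomly assign proTrain of the sampled rows to training, the rest to testing.
n_train   <- round(proTrain * nrow(sampled))
train_idx <- sample.int(nrow(sampled), n_train)
trainset  <- sampled[train_idx, ]
testset   <- sampled[-train_idx, ]

table(sampled$type)
c(train = nrow(trainset), test = nrow(testset))
```

In this sketch the testing set is simply the complement of the training set, so proTest is implied by proTrain; whether buildTrainModel treats the two proportions independently is not stated here.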

Value

This function returns a fitted model with eight elements: trainset, testset, model, predict, accuracy, confusion, evaluate, and ROC. trainset is the training data set; testset is the testing data set; model is the fitted model; predict is the predicted classification results; accuracy is the prediction accuracy; confusion is the confusion matrix of the prediction; evaluate is the evaluation matrix of the classification, including precision, sp, recall, and f1; ROC is the ROC curve.
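
To make the accuracy, confusion, and evaluate elements more concrete, the sketch below derives comparable quantities from a small set of made-up labels using base R. It assumes that sp stands for specificity and that the per-class measures follow the usual one-vs-rest definitions; AStrap may compute them differently, and the labels here are invented for illustration.

```r
# Illustrative sketch only; AStrap may compute these values differently.
classes <- c("AltA", "AltD", "ES", "IR")

# Made-up true and predicted labels for eight AS events.
truth <- factor(c("AltA", "AltA", "AltD", "AltD", "ES", "ES", "IR", "IR"),
                levels = classes)
pred  <- factor(c("AltA", "AltD", "AltD", "AltD", "ES", "IR", "IR", "IR"),
                levels = classes)

# Confusion matrix: rows are predicted classes, columns are actual classes.
confusion <- table(predicted = pred, actual = truth)

# Overall accuracy: correctly classified events over all events.
accuracy <- sum(diag(confusion)) / sum(confusion)

# One-vs-rest counts for each class.
tp <- diag(confusion)
fp <- rowSums(confusion) - tp
fn <- colSums(confusion) - tp
tn <- sum(confusion) - tp - fp - fn

# Per-class measures; "sp" is assumed here to mean specificity.
evaluate <- data.frame(
  precision = tp / (tp + fp),
  sp        = tn / (tn + fp),
  recall    = tp / (tp + fn),
  f1        = 2 * tp / (2 * tp + fp + fn),
  row.names = classes
)

confusion
accuracy
evaluate
```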

Examples

##Loading example alternative splicing data
path <- system.file("extdata","sample_riceAS.txt",package = "AStrap")
rice_ASdata <- read.table(path, sep = "\t", header = TRUE, stringsAsFactors = FALSE)
head(rice_ASdata)

##Loading genome using the BSgenome package
library("BSgenome.Osativa.MSU.MSU7")
rice_ASdata <- extract_IsoSeq_ge(rice_ASdata, Osativa)
names(rice_ASdata)

##Classification model building based on random forest method
library(randomForest)
library(ROCR)
library(ggplot2)
model <- buildTrainModel(rice_ASdata, chooseNum = 100,
                       proTrain = 2/3, proTest = 1/3,ASlength =0,
                       classifier = "rf", use.all = FALSE)
##Performance evaluation
names(model)
model$evaluate
model$confusion
model$accuracy

##Or classification model building based on SVM method
library(e1071)
library(ROCR)
library(ggplot2)
model <- buildTrainModel(rice_ASdata, chooseNum = 100,
                       proTrain = 2/3, proTest = 1/3,ASlength =0,
                       classifier = "svm", use.all = FALSE)

##Or classification model building based on AdaBoost method
library(adabag)
library(ROCR)
library(ggplot2)
model <- buildTrainModel(rice_ASdata, chooseNum = 100,
                       proTrain = 2/3, proTest = 1/3,ASlength =0,
                       classifier = "adaboost", use.all = FALSE)

BMILAB/AStrap documentation built on Nov. 20, 2020, 4:03 p.m.