gkmsvm_classify | R Documentation |
Given support vectors SVs and corresponding coefficients alphas and a set of sequences, calculates the SVM scores for the sequences.
gkmsvm_classify(seqfile, svmfnprfx, outfile, L=10, K=6, maxnmm=3,
maxseqlen=10000, maxnumseq=1000000, useTgkm=1, alg=0, addRC=TRUE, usePseudocnt=FALSE,
batchSize=100000, wildcardLambda=1.0, wildcardMismatchM=2, alphabetFN="NULL",
svseqfile=NA, alphafile=NA)
seqfile |
input sequences file name (FASTA format) |
svmfnprfx |
SVM model file name prefix |
outfile |
output file name |
L |
word length, default=10 |
K |
number of informative columns, default=6 |
maxnmm |
maximum number of mismatches to consider, default=3 |
maxseqlen |
maximum sequence length in the sequence files, default=10000 |
maxnumseq |
maximum number of sequences in the sequence files, default=1000000 |
useTgkm |
filter type: 0(use full filter), 1(use truncated filter: this guarantees non-negative counts for all L-mers), 2(use h[m], gkm count vector), 3(wildcard), 4(mismatch), default=1 |
alg |
algorithm type: 0(auto), 1(XOR Hashtable), 2(tree), default=0 |
addRC |
adds reverse complement sequences, default=TRUE |
usePseudocnt |
adds a constant to count estimates, default=FALSE |
batchSize |
number of sequences to compute scores for in batch, default=100000 |
wildcardLambda |
lambda for wildcard kernel, default=1.0 |
wildcardMismatchM |
max mismatch for Mismatch kernel or wildcard kernel, default=2 |
alphabetFN |
alphabets file name; if not specified, the inputs are assumed to be DNA sequences |
svseqfile |
SVM support vectors sequence file name (not needed if svmfnprfx is provided) |
alphafile |
SVM support vectors weights file name (not needed if svmfnprfx is provided) |
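As a rough illustration of what "reverse complement" means for the addRC argument, the base R sketch below reverses a DNA string and swaps each base for its complement. The helper name reverse_complement is hypothetical and is not part of the package; the package's internal handling of reverse complements may differ.

```r
# Hypothetical helper (not part of gkmSVM): reverse complement of DNA strings.
reverse_complement <- function(seqs) {
  # Swap each base for its complement, preserving case.
  complemented <- chartr("ACGTacgt", "TGCAtgca", seqs)
  # Reverse the characters of each sequence.
  vapply(
    strsplit(complemented, ""),
    function(chars) paste(rev(chars), collapse = ""),
    character(1),
    USE.NAMES = FALSE
  )
}

rc_example <- reverse_complement("AACG")
print(rc_example)
```

With addRC=TRUE, a sequence and its reverse complement are both taken into account, which suits double-stranded DNA where the strand of a regulatory element is usually unknown.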
Classification using SVM: gkmsvm_classify can be used to score any set of sequences. Note that the same parameters (L, K, maxnmm) that were used in gkmsvm_kernel should be specified here for optimal classification.
gkmsvm_classify(testfn, svmfnprfx, outfn); #scores test sequences
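One way to follow the note about matching parameters is to define L, K and maxnmm once and pass the same list to both the kernel and the classification step. This is only a sketch: the names gkm_params, kernel_args, classify_args and inputs_ready are illustrative, the file names are taken from the Examples section, and the package calls run only if gkmSVM is installed and the input files exist.

```r
# Kernel parameters defined once; the values shown are the documented defaults.
gkm_params <- list(L = 10, K = 6, maxnmm = 3)

# File names as used in the Examples section of this page.
posfn     <- "test_positives.fa"
negfn     <- "test_negatives.fa"
testfn    <- "test_testset.fa"
kernelfn  <- "test_kernel.txt"
svmfnprfx <- "test_svmtrain"
outfn     <- "output.txt"

# Positional file arguments first, then the shared named parameters.
kernel_args   <- c(list(posfn, negfn, kernelfn), gkm_params)
classify_args <- c(list(testfn, svmfnprfx, outfn), gkm_params)

# Run the pipeline only when the package and the input files are available.
inputs_ready <- requireNamespace("gkmSVM", quietly = TRUE) &&
  all(file.exists(posfn, negfn, testfn))
if (inputs_ready) {
  do.call(gkmSVM::gkmsvm_kernel, kernel_args)              # computes kernel
  gkmSVM::gkmsvm_train(kernelfn, posfn, negfn, svmfnprfx)  # trains SVM
  do.call(gkmSVM::gkmsvm_classify, classify_args)          # scores test sequences
}
```

Keeping the parameters in one list means that changing L or K for training automatically changes it for scoring as well.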
Mahmoud Ghandi
#Input file names:
posfn= 'test_positives.fa' #positive set (FASTA format)
negfn= 'test_negatives.fa' #negative set (FASTA format)
testfn= 'test_testset.fa' #test set (FASTA format)
#Output file names:
kernelfn= 'test_kernel.txt' #kernel matrix
svmfnprfx= 'test_svmtrain' #SVM files
outfn = 'output.txt' #output scores for sequences in the test set
# gkmsvm_kernel(posfn, negfn, kernelfn); #computes kernel
# gkmsvm_train(kernelfn,posfn, negfn, svmfnprfx); #trains SVM
# gkmsvm_classify(testfn, svmfnprfx, outfn); #scores test sequences
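After scoring, the output file can be loaded back into R for inspection. The sketch below assumes the output is a tab-delimited text file with one sequence ID and one score per line; that layout is an assumption and should be checked against an actual output file. The helper read_gkmsvm_scores is hypothetical, and the demo file contains made-up values used only to show the call.

```r
# Hypothetical helper: read scores, assuming "<sequence id><TAB><score>" per line.
read_gkmsvm_scores <- function(path) {
  scores <- read.table(
    path,
    header = FALSE,
    sep = "\t",
    quote = "",
    comment.char = "",
    col.names = c("seq_id", "score"),
    stringsAsFactors = FALSE
  )
  # Highest-scoring sequences first.
  scores[order(scores$score, decreasing = TRUE), ]
}

# Made-up demo file standing in for a real output file such as outfn.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("seqA\t0.5", "seqB\t-1.25", "seqC\t2"), demo_file)
demo_scores <- read_gkmsvm_scores(demo_file)
print(demo_scores)
```

For a real run, read_gkmsvm_scores(outfn) would replace the demo file, provided the assumed layout holds.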