Description
Test a PU (positive-unlabeled) fit on a test data set.
Usage
v.pudms(
protein_dat,
py1 = NULL,
nhyperparam = 10,
nfolds = 5,
test_idx = 1:nfolds,
seed = round(runif(1, min = 1, max = 1000)),
order = 1,
refstate = NULL,
verbose = T,
nobs_thresh = 10,
lambda = 0,
pvalue = FALSE,
n_eff_prop = 1,
intercept = F,
maxit = 1000,
eps = 0.001,
inner_eps = 0.01,
initial_coef = NULL,
p.adjust.method = "BH",
tol = 1e-05,
nCores = 1,
full.fit = FALSE,
full.fit.pvalue = FALSE,
outfile = NULL
)
Arguments
protein_dat
input data: a data table with the columns sequence, labeled, unlabeled, and seqId.
py1
a numeric value, a numeric vector, or NULL; the prevalence of positives in the unlabeled data. If length(py1) > 1, the optimal py1 is chosen based on AUC values on a test data set. If NULL (the default), a sequence of nhyperparam py1 values ranging from 0.001 to 0.5, interpolated on a log scale, is considered.
nhyperparam
an integer giving the length of the py1 sequence when py1 is NULL.
nfolds
the number of subsamples (folds). In each split, (nfolds - 1)/nfolds of the data are used for training and the remaining 1/nfolds for testing.
test_idx
a vector of indices of the cross-validation folds for which models are fitted. The default is to fit a model for every fold.
seed
a seed number for reproducibility.
order
an integer; 1 = main effects, 2 = main effects + pairwise effects.
refstate
a character used as a common reference state for all positions. By default, the most frequent amino acid at each position is used as that position's reference state.
verbose
a logical value. The default is TRUE.
nobs_thresh
the minimum number of mutations required per position.
lambda
the L1 penalty.
pvalue
a logical value; if TRUE, p-values based on the asymptotic distribution are computed.
n_eff_prop
the proportion of an effective sample size.
intercept
a logical value; if TRUE, the estimated intercept is reported together with the other coefficients.
maxit
the maximum number of iterations.
eps
the convergence threshold for the outer loop.
inner_eps
the convergence threshold for the inner loop.
initial_coef
a vector giving the initial point from which the PUlasso algorithm starts.
p.adjust.method
the multiple-comparison adjustment method applied to p-values.
tol
NULL or a numeric value; if the estimated ROC curve is <= y + tol, the estimated ROC curve is deemed to be contained by the maximal curve. If NULL, tol is set, at each x value of the estimated ROC curve, to one standard deviation of the length(test_idx) fold-wise ROC curves. Note that the usage above lists tol = 1e-05 as the default.
nCores
the number of threads used for computation.
full.fit
a logical value; if TRUE, the model is additionally fitted on the full data set at the selected py1.
full.fit.pvalue
a logical value; if TRUE, p-values for the full fit are returned.
outfile
NULL or a string; if a string is provided, the output is exported to a file with that name in the working directory.
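When py1 is NULL, the py1 description states that nhyperparam values from 0.001 to 0.5, interpolated on a log scale, are considered. A minimal sketch of such a grid in base R, assuming even spacing in log space; make_py1_grid is a hypothetical helper, not a pudms function, and the package's exact interpolation may differ.

```r
# Hypothetical helper (not part of pudms): a py1 grid of length nhyperparam,
# evenly spaced on a log scale between `from` and `to`.
make_py1_grid <- function(nhyperparam = 10, from = 0.001, to = 0.5) {
  stopifnot(nhyperparam >= 2, from > 0, to > from)
  # Interpolate linearly in log space, then map back with exp()
  exp(seq(log(from), log(to), length.out = nhyperparam))
}

py1_grid <- make_py1_grid(10)
print(signif(py1_grid, 3))  # consecutive values share a constant ratio
```

Log spacing puts more candidates near small prevalences, where a fixed additive step would skip over most of the plausible range.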
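The nfolds, test_idx, and seed arguments describe a k-fold split in which each fold serves as the test set while the others are used for training. The sketch below shows one common way to build such a split in base R; assign_folds is a hypothetical helper, and pudms may assign folds differently.

```r
# Hypothetical helper (not part of pudms): randomly assign n rows to nfolds folds.
assign_folds <- function(n, nfolds = 5, seed = 1) {
  set.seed(seed)  # note: this also resets the global RNG stream
  # Balanced fold labels 1..nfolds, shuffled so membership is random
  sample(rep(seq_len(nfolds), length.out = n))
}

folds <- assign_folds(n = 103, nfolds = 5, seed = 42)
k <- 1                           # one fold index, in the spirit of test_idx
test_rows  <- which(folds == k)  # about 1/nfolds of the rows
train_rows <- which(folds != k)  # about (nfolds - 1)/nfolds of the rows
print(length(train_rows) / length(folds))
```

Reusing the same seed reproduces the same split, which is what makes results across py1 values comparable.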
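By default, refstate falls back to the most frequent amino acid at each position. A sketch of that rule, assuming the input is a plain character vector of equal-length sequences with each sequence counted once (the package may instead weight sequences by their labeled/unlabeled counts); most_frequent_residue is a hypothetical helper.

```r
# Hypothetical helper (not part of pudms): the most frequent residue at each
# position of equal-length sequences; ties are broken alphabetically because
# table() sorts its names and which.max() returns the first maximum.
most_frequent_residue <- function(seqs) {
  stopifnot(length(unique(nchar(seqs))) == 1)
  # One row per sequence, one column per position
  mat <- do.call(rbind, strsplit(seqs, ""))
  apply(mat, 2, function(col) names(which.max(table(col))))
}

most_frequent_residue(c("ACD", "ACE", "GCE"))  # "A" "C" "E"
```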
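The usage lists p.adjust.method = "BH", the code R uses for the Benjamini-Hochberg procedure. Assuming the value is passed on to stats::p.adjust (the name and default match p.adjust's method codes, but this is an assumption), the adjustment behaves as below; bh_manual is a hypothetical re-implementation included only to make the step-up rule explicit, and the p-values are illustrative.

```r
# Hypothetical re-implementation of the Benjamini-Hochberg step-up adjustment
bh_manual <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)          # largest p-value first (rank n)
  adj <- cummin(p[o] * n / rev(seq_len(n))) # p_(i) * n / i, running minimum
  pmin(1, adj)[order(o)]                    # cap at 1, restore input order
}

p_raw <- c(0.001, 0.01, 0.04, 0.2)          # illustrative p-values only
p_bh <- stats::p.adjust(p_raw, method = "BH")
print(p_bh)                                 # 0.004, 0.02, 0.0533..., 0.2
stopifnot(isTRUE(all.equal(p_bh, bh_manual(p_raw))))
```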
Value
A list containing:
v.dmsfit
all fits using the training/test splits
roc_curves
the average ROC curve at each py1
dmsfit
the pudms.fit using the full data set at the selected py1
folds
the test/training split information
py1
the sequence of py1 values used for searching
py1.opt
the selected py1 value, based on the predictive performance of the models
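py1.opt is described as the py1 value selected from the models' predictive performance. One simple stand-in for such a rule is to average the test AUC over folds and take the maximizing py1. This is an assumption for illustration only (the tol argument suggests the package compares ROC curves rather than a single AUC summary), and select_py1 and the AUC values are hypothetical.

```r
# Hypothetical helper (not a pudms function): choose py1 by mean test AUC.
select_py1 <- function(auc, py1) {
  # auc: one row per test fold, one column per candidate py1 value
  stopifnot(is.matrix(auc), ncol(auc) == length(py1))
  mean_auc <- colMeans(auc, na.rm = TRUE)
  py1[which.max(mean_auc)]   # first maximizer if there are ties
}

py1 <- c(0.01, 0.05, 0.1)
auc <- rbind(c(0.70, 0.78, 0.74),   # illustrative numbers, not real results
             c(0.72, 0.80, 0.75))
select_py1(auc, py1)  # 0.05 (column means 0.71, 0.79, 0.745)
```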