Quantitative Structure-Properties Relationship (QSPR) model construction. This class contains all the required functions to train linear and non-linear models, to produce bootstrap datasets for variance estimation, and to provide prediction capabilities over a matrix or vector of studied properties.
smis |
is a list of vectors of SMILES from which a regression model will be trained, or for which targeted properties will be predicted. |
prop |
is a list of vectors/matrices of available targeted physico-chemical properties for the training dataset. |
v_filterfunc |
defines the filtering function (NULL by default) to use in the computation of properties to filter. |
v_filtermin |
is a vector representing the expected minimal value for each filtered property. |
v_filtermax |
is a vector representing the expected maximal value for each filtered property. |
v_fnames |
is a vector, or a list of vectors, of fingerprints and/or physical descriptors types used as features for each regression model
(see |
v_scale |
sets (FALSE by default) the scaling of physical descriptors only (i.e. continuous features) - mean = 0, standard deviation = 1. |
v_func |
defines the analytic function (NULL by default), or a list of analytic functions, to use in the computation of a subsequent property, or properties respectively. A given function returns a new property computed analytically from a list of known properties in prop. This is particularly useful when data and regression models are available for some properties (e.g. A and B), but not for a targeted property of interest (e.g. A+B, A/B, etc.) for which constraints are defined via the set_target method. |
v_func_args |
is a vector, or a list of vectors, of integers that identifies which properties in prop are used for the computation of a subsequent property. For example, v_func=list(func1,func2), where func1 and func2 are previously defined functions, and prop=list(V1,M23), where V1 is a numerical vector and M23 is a two-column matrix, i.e. three properties in total. In this case, v_func_args=list(c(1,3),c(2)) means that func1 uses the 1st and 3rd properties located in prop, and func2 uses the 2nd only. The defined empirical functions therefore know where to find their inputs. |
kekulise |
enables (FALSE by default) electron checking and allows for parsing of incorrect SMILES (see |
model |
is the name of a regression model to be used (see |
params |
is a list of parameters to submit to a given regression model (see |
n_boot |
is the number of requested bootstrap datasets (1 by default) in the training process. It is used to estimate the means and standard deviations of subsequent non-Bayesian predictions. A higher number of bootstrap datasets makes this estimation more accurate, but there is a trade-off between accuracy and computation time that the user has to balance. To ease the bootstrap analysis, a parallelization capability is implemented (see parallelize). |
s_boot |
is the proportion of input data (0.85 by default), defined in ]0,1], used to construct bootstrap datasets. |
r_boot |
allows (FALSE by default) the sampling in a bootstrap analysis to be performed with replacement. |
parallelize |
enables (FALSE by default) the use of the full computational capability of the user's machine for a bootstrap analysis: N-1 cores are used, where N is the total number of cores available on the machine. |
v_propmin |
is a vector representing the expected minimal value for each targeted property. |
v_propmax |
is a vector representing the expected maximal value for each targeted property. |
temp |
is a vector/matrix of numerical values which sets the initial temperatures in the annealing process for the
sequential Monte-Carlo sampler (see |
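The interplay between prop, v_func and v_func_args described above can be illustrated with plain base R. The sketch below is not the package's internal code: it assumes that the entries of prop are bound column-wise into a single property matrix, and func1, func2, apply_funcs and all numerical values are hypothetical names and data introduced only for illustration.

```r
# Hypothetical inputs: V1 is a numeric vector, M23 a two-column matrix,
# so the combined property matrix has three columns (1, 2, 3).
V1  <- c(1.0, 2.0, 3.0)
M23 <- cbind(c(10, 20, 30), c(0.5, 0.25, 0.125))
prop <- list(V1, M23)

# Hypothetical analytic functions: each receives only its selected columns.
func1 <- function(p) p[, 1] / p[, 2]   # combines two properties
func2 <- function(p) 2 * p[, 1]        # transforms one property

v_func      <- list(func1, func2)
v_func_args <- list(c(1, 3), c(2))

# Assumption: the list entries of prop are bound column-wise.
Y <- do.call(cbind, prop)

# Apply each function to the columns tagged by its v_func_args entry.
apply_funcs <- function(Y, funcs, args) {
  mapply(function(f, a) f(Y[, a, drop = FALSE]), funcs, args)
}

# One column per derived property, one row per molecule.
derived <- apply_funcs(Y, v_func, v_func_args)
print(derived)
```

Here func1 sees columns 1 and 3 of the combined matrix and func2 sees column 2 only, which mirrors the v_func_args=list(c(1,3),c(2)) example.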
propndim
is the number of properties received as input data.
propmin
is a vector representing the expected minimal value for each targeted property.
propmax
is a vector representing the expected maximal value for each targeted property.
filtermin
is a vector representing the expected minimal value for each filtered property.
filtermax
is a vector representing the expected maximal value for each filtered property.
filterfunc
is a function to compute the properties to filter.
X
is the nxd matrix, with d features for n input SMILES, returned by get_descriptor.
Y
is a nxp matrix of p properties for n input SMILES.
fnames
is a list of vectors of fingerprints and/or physical descriptors types used as features in each regression model by get_descriptor.
mdesc
is a scalar or vector of means used for physical descriptors scaling, returned by get_descriptor.
sddesc
is a scalar or vector of standard deviations used for physical descriptors scaling, returned by get_descriptor.
scale
indicates (TRUE or FALSE) whether the physical descriptors only (i.e. continuous features) are scaled to mean = 0, standard deviation = 1.
func
defines the analytic function to use in the computation of a subsequent property.
func_args
is a vector of integers that identifies the columns of the property array prop used for the computation of a subsequent property.
trmodel
is the name of the used regression model for training and predictions.
trnboot
is the number of bootstrap datasets used for the training.
trndf
is the number of input SMILES, i.e. the number of degrees of freedom, available in the training of the regression process.
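The role of the scale, mdesc and sddesc fields can be pictured with a small base-R sketch of standardization. It is only an illustration of scaling to mean = 0 and standard deviation = 1, not the code used by get_descriptor; the descriptor names and values are hypothetical.

```r
# Hypothetical descriptor matrix: 4 molecules, 2 continuous descriptors.
X_train <- cbind(weight = c(46, 60, 74, 88),
                 logp   = c(-0.3, 0.1, 0.6, 1.2))

# Column means and standard deviations, analogous in role to mdesc and sddesc.
mdesc  <- colMeans(X_train)
sddesc <- apply(X_train, 2, sd)

# Standardize: subtract the column mean, then divide by the column sd.
X_scaled <- sweep(sweep(X_train, 2, mdesc, "-"), 2, sddesc, "/")

# New molecules are scaled with the stored training statistics,
# not with their own means and standard deviations.
X_new <- cbind(weight = c(58, 72), logp = c(0.0, 0.5))
X_new_scaled <- sweep(sweep(X_new, 2, mdesc, "-"), 2, sddesc, "/")
```

Storing the means and standard deviations alongside the model is what lets later predictions be placed on the same scale as the training features.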
get_features()
returns a list of nxd matrices X, each with d features for n input SMILES
get_props()
returns a list of nxp matrices Y, each with p properties for n input SMILES
init_env(smis = NULL, prop = matrix(0), v_filterfunc = NULL,
v_filtermin = NULL, v_filtermax = NULL, v_fnames = NULL,
v_scale = FALSE, v_func = NULL, v_func_args = NULL, kekulise = F)
initializes the QSPR predictor; implicitly called via the QSPRpred$new() method
iqspr_predict(smis = NULL, temp = c(1, 1))
predicts properties for input SMILES from a given regression model and evaluates the probability of reaching the targeted properties space
model_training(model = "linear_Bayes", params = NA, n_boot = 10,
s_boot = 0.85, r_boot = F, parallelize = F)
trains regression models, sets their parameters, and optionally requests a bootstrap approach and CPU parallelization
qspr_predict(smis = NULL)
predicts properties for input SMILES from a given regression model
set_target(v_propmin, v_propmax)
sets the targeted properties space in vectors propmin and propmax
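The meaning of the n_boot, s_boot and r_boot arguments of model_training can be sketched with base R alone. The example below uses stats::lm on synthetic data as a stand-in for the package's regression models, so it shows the resampling idea rather than what model_training actually does internally; the data, the variable names and the choice of lm are all assumptions.

```r
set.seed(1)

# Hypothetical training data: one feature, one property.
n <- 200
x <- runif(n)
y <- 2 * x + rnorm(n, sd = 0.1)
train   <- data.frame(x = x, y = y)
newdata <- data.frame(x = c(0.25, 0.75))

n_boot <- 10      # number of bootstrap datasets
s_boot <- 0.85    # proportion of the data drawn for each dataset
r_boot <- FALSE   # sample with (TRUE) or without (FALSE) replacement

# Fit one model per resampled dataset and collect its predictions.
boot_pred <- sapply(seq_len(n_boot), function(b) {
  idx <- sample(n, size = floor(s_boot * n), replace = r_boot)
  fit <- lm(y ~ x, data = train[idx, ])
  predict(fit, newdata = newdata)
})

# Rows are new observations, columns are bootstrap replicates.
pred_mean <- rowMeans(boot_pred)
pred_sd   <- apply(boot_pred, 1, sd)
```

Increasing n_boot stabilizes pred_mean and pred_sd at the cost of fitting more models, which is the accuracy versus computation time trade-off mentioned for n_boot.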
## Not run:
# Load pre-existing data
data(qspr.data)
# Define input SMILES
smis <- paste(qspr.data[,1])
# Define associated properties
prop <- qspr.data[,c(2,5)]
# Define training set
trainidx <- sample(1:nrow(qspr.data), 5000)
# Initialize the prediction environment
# and compute fingerprints/descriptors associated to input SMILES
qsprpred_env <- QSPRpred()
qsprpred_env$init_env(smis=smis[trainidx], prop=as.matrix(prop[trainidx,]), v_fnames="graph")
# Train a regression model with its associated parameters on
# 10 bootstrapped datasets, without CPU parallelization
qsprpred_env$model_training(model="elasticnet",params=list("alpha" = 0.5),n_boot=10,parallelize=FALSE)
# Predict properties for a test set
predictions <- qsprpred_env$qspr_predict(smis[-trainidx])
# Plot the results
par(mfrow=c(1,2))
plot(predictions[[1]][1,], prop[-trainidx,1], xlab="prediction", ylab="true")
segments(-100,-100,1000,1000,col=2,lwd=2)
plot(predictions[[1]][2,], prop[-trainidx,2], xlab="prediction", ylab="true")
segments(-100,-100,1000,1000,col=2,lwd=2)
# Set a targeted properties space
qsprpred_env$set_target(c(8,100),c(9,200))
# Predict properties for any input SMILES
# and their probability to be close to the targeted properties space
inv_pred <- qsprpred_env$iqspr_predict(smis = smis[-trainidx], temp=c(3,3))
# See vignette("tutorial", package = "iqspr") for further options and details.
## End(Not run)
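To make the notion of a probability of reaching a targeted properties space more concrete, the following base-R sketch computes the probability that predicted properties fall inside the box defined by propmin and propmax. It assumes independent Gaussian predictive distributions with known means and standard deviations, which is an illustrative assumption and not a description of how iqspr_predict computes its output; the temp argument is not modelled, and prob_in_target and all numerical values are hypothetical.

```r
# Probability that all properties fall inside [propmin, propmax],
# assuming independent Gaussian predictions (illustrative assumption).
prob_in_target <- function(pred_mean, pred_sd, propmin, propmax) {
  # Per-property probability of landing in its target interval.
  p <- pnorm(propmax, mean = pred_mean, sd = pred_sd) -
       pnorm(propmin, mean = pred_mean, sd = pred_sd)
  # Independence assumption: per-property probabilities multiply.
  prod(p)
}

# Same target box as in set_target(c(8,100), c(9,200)).
propmin <- c(8, 100)
propmax <- c(9, 200)

# A prediction centred in the target box versus one far outside it.
p_inside  <- prob_in_target(c(8.5, 150), c(0.25, 20), propmin, propmax)
p_outside <- prob_in_target(c(5.0, 400), c(0.25, 20), propmin, propmax)
```

Under these assumptions a molecule whose predicted properties sit well inside the box scores close to 1, and one far outside scores close to 0.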