
Quantitative Structure-Properties Relationship (QSPR) model construction. This class contains all the required functions to train linear and non-linear models, to produce bootstrap datasets for variance estimation, and to provide prediction capabilities over a matrix or vector of studied properties.

| Argument | Description |
| --- | --- |
| `smis` | is a list of vectors of SMILES from which a regression model will be trained, or for which targeted properties will be predicted. |
| `prop` | is a list of vectors/matrices of available targeted physico-chemical properties for the training dataset. |
| `v_filterfunc` | defines the filtering function (NULL by default) used to compute the properties to filter. |
| `v_filtermin` | is a vector representing the expected minimal value for each filtered property. |
| `v_filtermax` | is a vector representing the expected maximal value for each filtered property. |
| `v_fnames` | is a vector, or a list of vectors, of fingerprint and/or physical descriptor types used as features for each regression model (see |
| `v_scale` | sets (FALSE by default) the scaling of physical descriptors only (i.e. continuous features) to mean = 0 and standard deviation = 1. |
| `v_func` | defines the analytic function (NULL by default), or a list of analytic functions, used to compute a subsequent property, or properties respectively. A given function returns a new property computed analytically from a list of known properties in prop. This is particularly useful when data and regression models are available for some properties (e.g. A and B), but not for a targeted property of interest (e.g. A+B, A/B, etc.) for which constraints are defined via the |
| `v_func_args` | is a vector, or a list of vectors, of integers that tags the properties in prop used to compute a subsequent property. For example, v_func=list(func1,func2), where func1 and func2 are |
| `kekulise` | enables (FALSE by default) electron checking and allows for parsing of incorrect SMILES (see |
| `model` | is the name of a regression model to be used (see |
| `params` | is a list of parameters to submit to a given regression model (see |
| `n_boot` | is the number of requested bootstrap datasets (1 by default) in the training process. It is used to estimate the means and standard deviations of subsequent non-Bayesian predictions. A higher number of bootstrap datasets gives a more accurate estimate at the cost of longer computation time, and the user has to choose this trade-off. To ease the bootstrap analysis, a parallelization capability is implemented. |
| `s_boot` | is the proportion of input data (0.85 by default), defined in the interval (0, 1], used to construct bootstrap datasets. |
| `r_boot` | enables (FALSE by default) sampling with replacement in a bootstrap analysis. |
| `parallelize` | enables (FALSE by default) use of the full computational capability of the user's machine for a bootstrap analysis: N-1 cores are used, where N is the total number of cores available on the machine. |
| `v_propmin` | is a vector representing the expected minimal value for each targeted property. |
| `v_propmax` | is a vector representing the expected maximal value for each targeted property. |
| `temp` | is a vector/matrix of numerical values which sets the initial temperatures in the annealing process for the sequential Monte-Carlo sampler (see |
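The interplay of `n_boot`, `s_boot` and `r_boot` described above can be pictured with a small base-R sketch. It is not the package's implementation: the toy data, the `lm` model and the names `train`, `x_new`, `preds`, `pred_mean` and `pred_sd` are assumptions made for illustration only.

```r
set.seed(42)

# Toy data: one feature x and a noisy linear property y (hypothetical, not qspr.data).
n <- 200
x <- runif(n, 0, 10)
y <- 3 + 2 * x + rnorm(n, sd = 1)
train <- data.frame(x = x, y = y)
x_new <- data.frame(x = c(2, 5, 8))

n_boot <- 10      # number of resampled datasets
s_boot <- 0.85    # fraction of the training data drawn for each dataset
r_boot <- FALSE   # sample with replacement?

# Fit one model per resampled dataset and predict the new points.
preds <- sapply(seq_len(n_boot), function(b) {
  idx <- sample(n, size = floor(s_boot * n), replace = r_boot)
  fit <- lm(y ~ x, data = train[idx, ])
  predict(fit, newdata = x_new)
})

# Rows are new points, columns are resampled replicates.
pred_mean <- rowMeans(preds)
pred_sd   <- apply(preds, 1, sd)
```

With `r_boot = FALSE` each replicate is a subsample without replacement; setting it to `TRUE` gives the classical bootstrap draw. Increasing `n_boot` stabilizes `pred_mean` and `pred_sd` but multiplies the number of model fits.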

`propndim`

is the number of properties received as input data.

`propmin`

is a vector representing the expected minimal value for each targeted property.

`propmax`

is a vector representing the expected maximal value for each targeted property.

`filtermin`

is a vector representing the expected minimal value for each filtered property.

`filtermax`

is a vector representing the expected maximal value for each filtered property.

`filterfunc`

is a function to compute the properties to filter.

`X`

is the n × d matrix, with d features for n input SMILES, returned by `get_descriptor`.

`Y`

is an n × p matrix of p properties for n input SMILES.

`fnames`

is a list of vectors of fingerprint and/or physical descriptor types used as features in each regression model by `get_descriptor`.

`mdesc`

is a scalar or vector of means used for physical descriptors scaling, returned by `get_descriptor`.

`sddesc`

is a scalar or vector of standard deviations used for physical descriptors scaling, returned by `get_descriptor`.

`scale`

indicates (TRUE or FALSE) whether the physical descriptors only (i.e. continuous features) are scaled to mean = 0 and standard deviation = 1.

`func`

defines the analytic function to use in the computation of a subsequent property.

`func_args`

is a vector of integers that tags the used columns in the property array prop for the computation of a subsequent property.

`trmodel`

is the name of the used regression model for training and predictions.

`trnboot`

is the number of bootstrap datasets used for the training.

`trndf`

is the number of input SMILES, i.e. the number of degrees of freedom, available in the training of the regression process.
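The roles of the `scale`, `mdesc` and `sddesc` fields can be illustrated with a base-R sketch of standardizing continuous descriptors while leaving fingerprint bits untouched. This is only an assumption about how such scaling is typically done, not the code used by `get_descriptor`; the matrix `X`, the column selector `cont_cols` and the values in `x_new` are hypothetical.

```r
# Toy feature matrix: two binary fingerprint bits and two continuous descriptors.
# All names here (X, cont_cols, x_new) are hypothetical and not part of iqspr.
set.seed(1)
X <- cbind(bit1 = rbinom(6, 1, 0.5),
           bit2 = rbinom(6, 1, 0.5),
           logP = rnorm(6, mean = 2, sd = 1.5),
           mass = rnorm(6, mean = 300, sd = 40))
cont_cols <- c("logP", "mass")

# Means and standard deviations learned on the training data.
mdesc  <- colMeans(X[, cont_cols])
sddesc <- apply(X[, cont_cols], 2, sd)

# Scale only the continuous columns; fingerprint bits stay as they are.
X_scaled <- X
X_scaled[, cont_cols] <- sweep(sweep(X[, cont_cols], 2, mdesc, "-"), 2, sddesc, "/")

# New molecules are scaled with the stored training statistics, not their own.
x_new <- c(logP = 1.2, mass = 250)
x_new_scaled <- (x_new - mdesc) / sddesc
```

Storing the training means and standard deviations is what allows predictions on new SMILES to use the same feature scale as the fitted model.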

`get_features()`

returns a list of n × d matrices X, with d features for n input SMILES

`get_props()`

returns a list of n × p matrices Y of p properties for n input SMILES

`init_env(smis = NULL, prop = matrix(0), v_filterfunc = NULL, v_filtermin = NULL, v_filtermax = NULL, v_fnames = NULL, v_scale = FALSE, v_func = NULL, v_func_args = NULL, kekulise = F)`

initializes the QSPR predictor; called implicitly via the QSPRpred$new() method

`iqspr_predict(smis = NULL, temp = c(1, 1))`

predicts properties for input SMILES from a given regression model and evaluates the probability of reaching a targeted properties space

`model_training(model = "linear_Bayes", params = NA, n_boot = 10, s_boot = 0.85, r_boot = F, parallelize = F)`

trains regression models, with options to set their parameters, request a bootstrap approach and enable CPU parallelization

`qspr_predict(smis = NULL)`

predicts properties for input SMILES from a given regression model

`set_target(v_propmin, v_propmax)`

sets the targeted properties space in vectors propmin and propmax
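One simple way to picture "the probability of reaching a targeted properties space" is sketched below in base R. It assumes an independent Gaussian predictive distribution for each property, which is an assumption made for illustration and not necessarily what `iqspr_predict` computes; the helper `prob_in_target` and the predictive means and standard deviations are hypothetical.

```r
# Hypothetical helper: probability that every property falls in [propmin, propmax],
# assuming an independent Gaussian predictive distribution per property.
prob_in_target <- function(pred_mean, pred_sd, propmin, propmax) {
  p <- pnorm(propmax, mean = pred_mean, sd = pred_sd) -
       pnorm(propmin, mean = pred_mean, sd = pred_sd)
  prod(p)  # joint probability under the independence assumption
}

# Target region with the same bounds as set_target(c(8, 100), c(9, 200)).
propmin <- c(8, 100)
propmax <- c(9, 200)

# Made-up predictive means and standard deviations for one molecule.
p_hit <- prob_in_target(pred_mean = c(8.5, 150), pred_sd = c(0.5, 50),
                        propmin = propmin, propmax = propmax)
```

A molecule whose predicted properties sit well inside the bounds with small uncertainty gets a value close to 1, while predictions far from the region or with large uncertainty are pushed toward 0.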

```
## Not run:
library(iqspr)
# Load pre-existing data
data(qspr.data)
# Define input SMILES
smis <- paste(qspr.data[,1])
# Define associated properties
prop <- qspr.data[,c(2,5)]
# Define training set
trainidx <- sample(1:nrow(qspr.data), 5000)
# Initialize the prediction environment
# and compute fingerprints/descriptors associated to input SMILES
qsprpred_env <- QSPRpred()
qsprpred_env$init_env(smis=smis[trainidx], prop=as.matrix(prop[trainidx,]), v_fnames="graph")
# Train a regression model with associated parameters,
# number of bootstrapped datasets without CPUs parallelization
qsprpred_env$model_training(model="elasticnet",params=list("alpha" = 0.5),n_boot=10,parallelize=FALSE)
# Predict properties for a test set
predictions <- qsprpred_env$qspr_predict(smis[-trainidx])
# Plot the results
par(mfrow=c(1,2))
plot(predictions[[1]][1,], prop[-trainidx,1], xlab="prediction", ylab="true")
segments(-100,-100,1000,1000,col=2,lwd=2)
plot(predictions[[1]][2,], prop[-trainidx,2], xlab="prediction", ylab="true")
segments(-100,-100,1000,1000,col=2,lwd=2)
# Set a targeted properties space
qsprpred_env$set_target(c(8,100),c(9,200))
# Predict properties for any input SMILES
# and their probability to be close to the targeted properties space
# (iqspr_predict is the method that accepts the temp argument)
inv_pred <- qsprpred_env$iqspr_predict(smis = smis[-trainidx], temp=c(3,3))
# See vignette("tutorial", package = "iqspr") for further options and details.
## End(Not run)
```
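The example above does not exercise `v_func` and `v_func_args`. The base-R sketch below shows the idea of deriving a new property from tagged columns of a property matrix. How the package applies these functions internally is not documented here, so the calling convention used (one positional argument per tagged column) is an assumption, and `prop`, `func1`, `func_args` and `derived` are hypothetical names.

```r
# Toy property matrix with two known properties per molecule (columns 1 and 2).
# prop, func1 and func_args are hypothetical stand-ins, not iqspr internals.
prop <- cbind(A = c(1.0, 2.0, 4.0),
              B = c(0.5, 1.0, 2.0))

# Analytic function combining known properties into a new one (here A / B).
func1 <- function(a, b) a / b

# Integer tags selecting which columns of prop feed the function.
func_args <- c(1, 2)

# Pass the tagged columns, in order, as positional arguments to the function.
derived <- do.call(func1, lapply(func_args, function(j) prop[, j]))
```

Here `derived` holds the ratio A/B for each molecule, a property for which no regression model was trained directly.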
