SmcChem-class: SMILES generator


Description

SMILES generator based on a sequential Monte Carlo (SMC) sampler.

Arguments

smis

is a vector of initial SMILES strings from which the generation of novel SMILES starts.

v_engram

is a previously created ENgram object that encapsulates the SMILES grammar (see ENgram for more details).

v_m

is a positive scalar giving the order of the ENgram model used for generation.

v_qsprpred

is a QSPRpred object holding a previously trained regression model, used to predict the properties (here, physico-chemical properties) of the compounds described by newly created SMILES.

v_temp

is a numeric vector, of length equal to the number of properties, that parametrizes the annealing in the sequential Monte Carlo sampler.

v_decay

is a positive scalar giving the decay rate of v_temp above (temp_{i+1}=temp_{i}^decay).

v_ESSth

is a positive scalar giving the threshold at which the set of newly created SMILES is re-sampled (0.5 by default). This threshold limits the degeneracy in the set of newly created SMILES. A lower (higher) value allows more (less) degeneracy.

gentype

is the type of procedure used by the SMILES string generator. For a back-off procedure, use "ML" (the default); for a Kneser-Ney smoothing procedure, use "KN".

v_maxstock

is the maximum number of newly created SMILES kept in stock (2000 by default).

keeptrack

is a logical flag, TRUE by default. When TRUE, the mean of the predicted properties is tracked during generation, so that the latest newly created SMILES can be plotted and/or listed as the process runs. This is very useful for tuning the annealing parameters and for visualizing how quickly the sampler converges to the targeted region of the physico-chemical property space.

smidatabase

is a vector of known SMILES that the generated SMILES should not match. This is useful to avoid creating SMILES that are highly similar to existing and/or unwanted ones.
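
As a rough illustration of how v_temp, v_decay and v_ESSth interact, the sketch below applies the documented update temp_{i+1}=temp_{i}^decay and a standard effective-sample-size (ESS) check in base R. The helper names anneal_schedule, ess_fraction and needs_resampling are hypothetical and not part of iqspr. Comparing ESS divided by the number of particles against the threshold is an assumption about how v_ESSth is used; the package internals may differ.

```r
# Hypothetical helpers for illustration only; none of these are iqspr functions.

# Annealing schedule: temp_{i+1} = temp_i^decay, applied element-wise
# (one temperature per property).
anneal_schedule <- function(temp, decay, niter) {
  out <- matrix(NA_real_, nrow = niter + 1, ncol = length(temp))
  out[1, ] <- temp
  for (i in seq_len(niter)) {
    out[i + 1, ] <- out[i, ]^decay
  }
  out
}

# Effective sample size of a weight vector, as a fraction of the particle count.
# ESS = 1 / sum(w^2) for normalized weights w.
ess_fraction <- function(w) {
  w <- w / sum(w)
  1 / (length(w) * sum(w^2))
}

# Assumed rule: re-sample when the ESS fraction drops below the threshold.
needs_resampling <- function(w, threshold = 0.5) {
  ess_fraction(w) < threshold
}

# Two properties, starting temperature 3 each, decay 0.95, five iterations.
sched <- anneal_schedule(temp = c(3, 3), decay = 0.95, niter = 5)
print(sched)

# Evenly weighted particles versus a set dominated by a single particle.
uniform <- rep(1, 25)
skewed <- c(100, rep(1, 24))
print(needs_resampling(uniform))
print(needs_resampling(skewed))
```

With a decay below 1 and a starting temperature above 1, each temperature shrinks towards 1 as iterations proceed. Under the assumed rule, a lower threshold tolerates more uneven weights before re-sampling, which matches the description of v_ESSth above.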

Methods

get_hiscores(nsmi = 100, exsim = 0.8)

get chemical structures with high QSPR scores from the SmcChem object (same as get_hiscores function)

get_smiles()

get SMILES strings from the SmcChem object (same as get_smiles function)

initialize(smis = NULL, v_engram = NULL, v_m = NULL, v_qsprpred = NULL, v_temp = c(1, 1), v_decay = 0.95, v_ESSth = 0.5, gentype = "ML", v_maxstock = 2000, keeptrack = TRUE, smidatabase = NULL)

Initialize the SMC chemical generator with initial SMILES strings smis, ENgram class object v_engram and QSPRpred class object v_qsprpred

smcexec(niter, nsteps = 5, preorder = 0, nview = 0)

modify chemical structures with niter SMC updates

viewstr(idx)

view 2D structures of the SMILES strings at index idx (same as viewstr function)

Examples

## Not run:
#sample data
data(qspr.data)
idx <- sample(nrow(qspr.data), 5000)
smis <- paste(qspr.data[idx,1])
y <- qspr.data[idx,c(2,5)]

#learning a pattern of chemical strings
data(trainedSMI)
data(engram_5k)  #same as run => engram <- ENgram$new(trainedSMI, order=10)

#learning QSPR model
data(qsprpred_EG_5k)
#same as run => qsprpred <- QSPRpred$new(smis=smis, y=as.matrix(y), v_fpnames="graph")

#set target range
targ.min <- c(200,1.5)
targ.max <- c(350,2.5)
qsprpred_EG_5k$set_target(targ.min,targ.max)

#getting chemical strings from the Inverse-QSPR model
#v_temp takes one annealing temperature per target property (two here)
smchem <- SmcChem$new(smis = rep("c1ccccc1O", 25), v_qsprpred=qsprpred_EG_5k,
                     v_engram=engram_5k, v_temp=c(3, 3))

smchem$smcexec(niter=5, preorder=0, nview=4)
#if OpenBabel (>= 2.3.1) is installed, you can use reordering for better mixing as
#smchem$smcexec(niter=100, preorder=0.2, nview=4)
#see http://openbabel.org

#check
gensmis <- smchem$get_hiscores(nsmi=5, exsim=0.9)
pred <- qsprpred_EG_5k$qspr_predx(gensmis[,1])
## End(Not run)

iqspr documentation built on Aug. 1, 2017, 9:02 a.m.