eiInit: Initialize a compound database
In girke-lab/eiR: Accelerated similarity searching of small molecules

View source: R/core.R

eiInit

R Documentation

Initialize a compound database

Description

Takes the raw compound database in whatever format the given measure supports and creates a "data" directory.

Usage

	eiInit(inputs, dir=".", format="sdf", descriptorType="ap", append=FALSE,
		conn=defaultConn(dir, create=TRUE), updateByName=FALSE, cl=NULL, connSource=NULL,
		priorityFn=forestSizePriorities, skipPriorities=FALSE)

Arguments

`inputs`	Either the name of a file in `format` format, or an SDFset. This can also be a vector of filenames; if `cl` is also specified and your database supports it (SQLite does not), these files will be loaded in parallel on the cluster.
`dir`	The directory where the "data" directory lives. Defaults to the current directory.
`format`	The format of the data in `inputs`. Currently only "sdf" and "smiles" are supported.
`descriptorType`	The format of the descriptor. Currently supported values are "ap" for atom pair, and "fp" for fingerprint.
`append`	If true, the given compounds will be added to an existing database and the <data-dir>/Main.iddb file will be updated with the new compound id numbers. This should not normally be used directly; use `eiAdd` instead to add new compounds to a database.
`conn`	Database connection to use. If a connection is given, you must ensure that it has been initialized using the `initDb` function from ChemmineR before calling `eiInit`.
`updateByName`	If true, we assume that all compounds, both in the existing database and in the given dataset, have unique names. This function will then avoid re-adding existing, identical compounds, and will update an existing compound with a new definition if a new compound definition with an existing name is given. If false, duplicate compound names are allowed in the database, though duplicate definitions are not. Identical compounds will therefore not be re-added, but if a new version of an existing compound is added, it will not update the existing one; the modified compound is added as a completely new compound with a new compound id.
`cl`	A SNOW cluster can be given here to run this function in parallel.
`connSource`	A function returning a new database connection. Note that it is not sufficient to return a reference to an existing connection; it must be a distinct, new connection. This is needed for cluster operations that make use of the database, as each worker will need to create its own connection. If not given, certain parts of this function will not be parallelized. This function can also be used to set up the environment on the cluster worker nodes. For example, you might need to re-load libraries such as RSQLite.
`priorityFn`	This option takes a function that takes a list of compound ids and returns a data frame with the compound ids as the column 'compound_id', and their priority as the column 'priority'. There are two pre-defined functions in ChemmineR: 'randomPriorities', and 'forestSizePriorities' (default). When several compounds map to the same descriptor, then when some functions need to go from a descriptor to a compound, there is ambiguity about which compound to select. In that case, it will pick the compound with the highest priority.
`skipPriorities`	If this is true, then no priority values will be computed. See option `priorityFn` for an explanation of priorities.
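
The contract described for `priorityFn` (compound ids in, a data frame with columns 'compound_id' and 'priority' out) can be illustrated with a minimal sketch. The function name `idOrderPriorities` and its ranking rule are hypothetical and not part of eiR or ChemmineR. This page does not say whether a larger or a smaller number counts as the higher priority, so check the ChemmineR documentation before relying on a particular ordering.

```r
# Hypothetical priority function (not part of eiR or ChemmineR):
# assigns priority values in the order in which the ids arrive.
idOrderPriorities <- function(compoundIds) {
  ids <- unlist(compoundIds)          # accept either a list or a vector of ids
  data.frame(compound_id = ids,       # column names required by priorityFn
             priority    = seq_along(ids))
}

# Quick check of the returned shape with made-up compound ids
res <- idOrderPriorities(c(101, 205, 307))
print(res)
```

Such a function would presumably be passed as `eiInit(..., priorityFn=idOrderPriorities)`, in the same way the pre-defined `randomPriorities` is passed in the Examples section.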

Details

eiInit can take either an SDFset or a filename. SDF and SMILES are supported by default. It might complain if your SDF file does not follow the SDF specification. If this happens, you can create an SDFset with the `read.SDFset` function from ChemmineR and then use that instead of the filename.
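
The workaround above might look like the following sketch. To keep it self-contained, a few compounds from ChemmineR's `sdfsample` are first written to a temporary file that stands in for a user's SDF file; the file name `compounds.sdf` and the directory name are made up for illustration. Filtering with `validSDF` is an optional extra step from ChemmineR, not something this page requires.

```r
library(eiR)  # eiR depends on ChemmineR, which provides read.SDFset and sdfsample

# Stand-in for a user's SDF file: write a few sample compounds to a temp file.
data(sdfsample)
sdfPath <- file.path(tempdir(), "compounds.sdf")   # hypothetical file name
write.SDF(sdfsample[1:10], file = sdfPath)

# Parse the file ourselves instead of handing the filename to eiInit.
sdfset <- read.SDFset(sdfPath)
sdfset <- sdfset[validSDF(sdfset)]   # optional: drop entries ChemmineR flags as invalid

# Initialize the database from the SDFset rather than from the file.
workDir <- file.path(tempdir(), "init_from_sdfset")  # hypothetical directory name
dir.create(workDir, showWarnings = FALSE)
ids <- eiInit(sdfset, dir = workDir, priorityFn = randomPriorities)
```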

eiInit will create a folder called "data". Commands should always be executed in the folder containing this directory (i.e., the parent directory of "data"), or else the location of that directory should be specified with the `dir` option.

Value

A directory called "data" will have been created in `dir` (the current working directory by default). The generated compound ids of the given compounds are returned. These can be used to reference a compound or set of compounds in other functions, such as eiQuery.
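
As a small illustration of the return value and of where the "data" directory ends up, the sketch below keeps the returned ids in a variable and checks that "data" was created under the directory passed as `dir`. The variable names and the directory name `value_demo` are assumptions chosen for this example.

```r
library(eiR)  # eiR depends on ChemmineR, which provides sdfsample

data(sdfsample)
projectDir <- file.path(tempdir(), "value_demo")   # hypothetical directory name
dir.create(projectDir, showWarnings = FALSE)

# Keep the generated compound ids for use in later calls
compoundIds <- eiInit(sdfsample[1:20], dir = projectDir,
                      priorityFn = randomPriorities)

length(compoundIds)                          # number of ids returned
dir.exists(file.path(projectDir, "data"))    # "data" lives under projectDir
```

Later eiR calls that operate on this database would then be given the same `dir` value, or be run from within `projectDir`.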

Author(s)

Kevin Horan

Examples

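   library(eiR)                         # eiR depends on ChemmineR
   data(sdfsample)                      # sample SDFset shipped with ChemmineR
   dir <- file.path(tempdir(), "init")  # parent of the "data" directory
   dir.create(dir)
   # Returns the generated compound ids
   eiInit(sdfsample, dir=dir, priorityFn=randomPriorities)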

girke-lab/eiR documentation built on April 19, 2023, 12:52 p.m.
