eiInit: Initialize a compound database

View source: R/core.R

eiInitR Documentation

Initialize a compound database

Description

Takes the raw compound database in whatever format the given measure supports and creates a "data" directory.

Usage

	eiInit(inputs,dir=".",format="sdf",descriptorType="ap",append=FALSE,
	conn=defaultConn(dir,create=TRUE), updateByName = FALSE, cl = NULL, connSource = NULL,
	priorityFn = forestSizePriorities,skipPriorities=FALSE)

Arguments

inputs

Either a filename of a file in format format, or an SDFset. This can also be a vector of filenames and if cl is also specified and if you database supports it (SQLite does not), it will load these file in parallel on the cluster.

dir

The directory where the "data" directory lives. Defaults to the current directory.

format

The format of the data in inputs. Currently only "sdf" and "smiles" is supported.

descriptorType

The format of the descriptor. Currently supported values are "ap" for atom pair, and "fp" for fingerprint.

append

If true the given compounds will be added to an existing database and the <data-dir>/Main.iddb file will be updated with the new compound id numbers. This should not normally be used directly, use eiAdd instead to add new compounds to a database.

conn

Database connection to use. If a connection is given, you must ensure that it has been initialized using the initDb function from ChemmineR before calling eiInit.

updateByName

If true we make the assumption that all compounds, both in the existing database and the given dataset, have unique names. This function will then avoid re-adding existing, identical compounds, and will update existing compounds with a new definition if a new compound definition with an existing name is given.

If false, we allow duplicate compound names to exist in the database, though not duplicate definitions. So identical compounds will not be re-added, but if a new version of an existing compound is added it will not update the existing one, it will add the modified one as a completely new compound with a new compound id.

cl

A SNOW cluster can be given here to run this function in parallel.

connSource

A function returning a new database connection. Note that it is not sufficient to return a reference to an existing connection, it must be a distinct, new connection. This is needed for cluster operations that make use of the database as they will each need to create a new connection. If not given, certain parts of this function will not be parallelized.

This function can also be used to setup the environment on the cluster worker nodes. For example, you might need to re-load libraries like RSQLite and such.

priorityFn

This option takes a function that takes a list of compound ids and returns a data frame with the compound ids as the column 'compound_id', and their priority as the column 'priority'. There are two pre-defined functions in ChemmineR: 'randomPriorities', and 'forestSizePriorities' (default).

When several compounds map to the same descriptor, then when some functions need to go from a descriptor to a compound, there is ambiguity about which compound to select. In that case, it will pick the compound with the highest priority.

skipPriorities

If this is true, then no priority values will be computed. See option priorityFn for an explanation of priorities.

Details

EiInit can take either an SDFset, or a filename. SDF and SMILES is supported by default. It might complain if your SDF file does not follow the SDF specification. If this happens, you can create an SDFset with the read.SDFset command and then use that instead of the filename.

EiInit will create a folder called 'data'. Commands should always be executed in the folder containing this directory (ie, the parent directory of "data"), or else specify the location of that directory with the dir option.

Value

A directory called "data" will have been created in the current working directory. The generated compound ids of the given compounds will be returned. These can be used to reference a compound or set of compounds in other functions, such as eiQuery.

Author(s)

Kevin Horan

See Also

eiMakeDb eiPerformanceTest eiQuery

Examples

   data(sdfsample)
   dir=file.path(tempdir(),"init")
   dir.create(dir)
   eiInit(sdfsample,dir=dir,priorityFn=randomPriorities)

girke-lab/eiR documentation built on April 19, 2023, 12:52 p.m.