TDCOR: The TDCOR main function
In TDCor: Gene Network Inference from Time-Series Transcriptomic Data

Description Usage Arguments Details Value Note Author(s) References See Also Examples

This is the main function to run the TDCOR algorithm and reconstruct the gene network topology.

1	TDCOR(dataset,l_genes, TPI, DPI, ...)

`dataset`	Numerical matrix storing the non-log2 transcriptomic data (average of replicates). The rows of this matrix must be named by gene codes (e.g. the AGI gene code for Arabidopsis datasets). The columns must be organized in chronological order from the left to the right.
`l_genes`	A character vector containing the (AGI) gene codes of the genes one wishes to build the network with (gene codes -e.g. "AT5G26930"- by opposed to gene names -e.g."GATA23"- which are provided by the optional `l_names` argument).
`TPI`	A TPI database generated by `CalculateTPI` which contains some necessary statistical information for triangle motifs pruning. In particular it must have an entry for all the regulators included in the network analysis. The TPI database may also contain data for genes that are not included in `l_genes`.
`DPI`	A DPI database generated by `CalculateDPI` which contains some necessary statistical information for diamond motifs pruning. In particular it must have an entry for all the regulators included in the network analysis. The DPI database may also contain data for genes that are not included in `l_genes`.
`...`	Additional arguments to be passed to the TDCOR function (Some are necessary if `dataset` is not the LR dataset): `l_names`: A character vector containing the 'everyday names' of the genes included in the analysis (e.g. "LBD16") . These are the names by which genes will be refered as to in the final network table. Gene names must be unique; repeats of the same name are not allowed. If no `l_names` parameter is given, it is by default equal to `l_genes`. The i-th element of the vector `l_names` contains the 'everyday name' of the i-th gene in `l_genes`. `l_prior`: A numerical vector containing the prior information on the genes included in the analysis. By defining the `l_prior`, the user defines which genes are positive regulators, which are negative regulators and which can only be targets. The prior code is defined as follow: -1 for negative regulator; 0 for non-regulator (target only); 1 for positive regulator; 2 for genes that can act as both positive and negative regulators. If no `l_prior` parameter is provided, it is by default equal to a vector of 2s, meaning that all genes are regarded as being both potential activators and repressors. The i-th element of the vector `l_prior` contains the prior to associate to the i-th gene in `l_genes`. `times`: A numerical vector containing the successive times (in hours) at which the samples were collected to generate the time-series transcriptomic dataset.If no `times` parameter is given, it is by default equal to the `times` parameter used for the lateral root transcriptomic dataset. `n0`: An integer corresponding to the number of iterations to be performed in the external bootstrap loop. At the beginning of every iteration of this external loop, new random parameter values are sampled in the user-defined bootstrapping interval. If no `n0` parameter is given, it is by default equal to 1000. `n1` : An integer corresponding to the number of iterations to be performed in the internal bootstrap loop. In this loop parameter values are kept the same but the order of node analysis is randomized at each iteration. If no `n1` parameter is provided, it is by default equal to 10. `time_step`: A positive number corresponding to the time step (in hours) i.e. the temporal resolution at which the gene profiles are analysed. If no `time_step` parameter is provided, it is by default equal to 1 hour. `delayspan`: A positive number. It is the maximum time shift (in hours) which is analysed. It should be high enough for the time shift estimation process to be successful but argueably relatively small in comparison to the overall duration of the time series. (e.g. for the LR dataset which has data spanning over 54 hours, `delayspan` was set to 12 hours). `tol`: A strictly positive number corresponding to the tolerance threshold on the final score of time shift estimates. The score is a positive number that measures the "level of disagreement" between the four time shift estimators. A time shift estimate is regarded as meaningful if it scores lower than the `tol` threshold. If all four estimators agree on a certain value of time shift, the estimate obtains the best possible score, which is 0. Increasing `tol` make the time shift estimation process LESS stringent. For more information see `estimate.delay`. `delaymin`: A numerical vector containing one or two positive elements corresponding to the boundaries of the boostrapping interval for the minimum time shift above which putative interactions are regarded as possible. If no `delaymin` parameter is provided, it is by default equal to 0 hour. Gene pairs with time shift lower than or equal to `delaymin` are regarded as co-regulation and are therefore not included in the network. `delaymax`: A numerical vector containing one or two positive elements corresponding to the boundaries of the boostrapping interval for the maximum time shift above which putative interactions are regarded as indirect. If no `delaymax` parameter is provided, it is by default equal to 3 hours. Putative indirect interactions are included in the network only when the putative target is not predicted any direct regulator. `thr_cor`: A numerical vector containing one or two positive elements between 0 and 1 corresponding to the boundaries of the boostrapping interval for the threshold of Pearson's correlation. A gene pair is included in the preliminary network only if the correlation between the profiles (with the time shift correction) is higher than or equal to the `thr_cor` threshold. If no `thr_cor` parameter is provided, it is by default equal to [0.7;0.9]. Note that increasing `thr_cor` makes the correlation filter MORE stringent. `delay`: A positive number corresponding to the most likely time shift (in hours) one could expect between the profile of a regulator and the profile of its direct targets. This parameter enables one to generate the reference profiles of the ideal regulator when calculating the index of directness (ID). If no `delay` parameter is provided, it is by default equal to 3 hours. Note that similar parameters serving the same purpose are also used to calculate the triangle pruning index and the diamond pruning index. But TDCOR reads the value to use for calculating those indices directly from the TPI and DPI databases (for consistency reasons). `thr_ind1`: A numerical vector containing one or two positive elements corresponding to the boundaries of the boostrapping interval for the index of directness (ID) lower threshold. Gene pairs showing an ID below this threshold will be regarded as co-regulation and therefore eliminated from the network. If no `thr_ind1` parameter is provided, it is by default equal to 0.5. Reminder: For direct interaction one expects ID values around 1. For indirect interactions one expect values greater than 1. For co-regulated genes, ID should be smaller than 1. Note that increasing `thr_ind1` makes the ID-based "anti-coregulation filter" MORE stringent. `thr_ind2`: A numerical vector containing one or two positive elements corresponding to the boundaries of the boostrapping interval for the index of directness (ID) upper threshold. Putative interactions showing an ID above this threshold are regarded as indirect. If no `thr_ind2` parameter is provided, it is by default equal to 4.5. Reminder: For direct interaction one expects ID values around 1. For indirect interactions one expect values greater than 1. For co-regulated genes, ID should be smaller than 1. Note that increasing `thr_ind2` makes the ID-based filter against indirect interactions LESS stringent. `thr_overlap` : A numerical vector containing one or two positive elements smaller than 1. These correspond to the boundaries of the boostrapping interval for the index of overlap. If no `thr_overlap` parameter is provided, it is by default equal to [0.5,0.6]. Note that increasing `thr_overlap` makes the overlap filter MORE stringent. This filter aims at removing unlikely negative interactions where the putative regulator switches on too late to downregulate the putative target. Keep in mind that the filter is sensitive to the noise level in the data. It should only be used if the data has a very low level of noise. To inactivate the filter set the `thr_overlap` parameter to 0. `thrpTPI`: A numerical vector containing one or two positive numbers smaller or equal to 1 in increasing order. These correspond to the boundaries of the boostrapping interval for the probability threshold used in the triangle filter. If no `thrpTPI` parameter is provided, it is by default equal to [0.5,0.75]. Note that increasing `thrpTPI` makes the triangle filter LESS stringent. `thrpDPI`: A numerical vector containing one or two positive numbers smaller or equal to 1 in increasing order. These correspond to the boundaries of the boostrapping interval for the probability threshold used in the diamond filter. If no `thrpTPI` parameter is provided, it is by default equal to [0.8,0.9]. Note that increasing `thrpDPI` makes the diamond filter LESS stringent. `thr_isr`: A numerical vector containing one or two positive elements corresponding to the boundaries of the boostrapping interval for the threshold of the index of directness above which the gene is predicted to negatively self-regulate. Genes will be predicted to positively self-regulate if the index of directness is smaller than 1/`thr_isr`. If no `thr_isr` parameter is provided, it is by default equal to [3,6]. Note that increasing `thr_isr` makes the search for self-regulating genes MORE stringent. `search.EP`: A boolean to control whether Master-Regulator-Signal-Transducer (MRST) or signal Entry Point (EP) should be looked for or not. (If yes, set on TRUE which is the default value) `thr_bool_EP`: A number between 0 and 1 used as threshold to convert normalized expression profiles (values between 0 and 1) into boolean expression profiles (values equal to 0 or 1). If no `thr_bool_EP` parameter is provided, it is by default equal to 0.8. The conversion of the continuous profiles into boolean profiles is part of the process of MRST analysis. `MinTarNumber`: An integer. Minimum number of targets a regulator should have in order to be regarded as a potential MRST. If no `MinTarNumber` parameter is provided, it is by default equal to 5. Note that increasing `MinTarNumber` makes the search for MRST genes MORE stringent. `MinProp`: A number between 0 and 1. Minimum proportion of targets which are not at steady state at t=0 that a regulator should have in order to be regarded as a potential MRST. If no `MinProp` parameter is provided, it is by default equal to 0.75. Note that increasing `MinProp` makes the search for MRST genes MORE stringent. `MaxEPNumber`: An integer. Maximum number of MRST that can be predicted at each iteration. If no `MaxEPNumber` parameter is provided, it is by default equal to 1. `regmax`: An integer. Maximum number of regulators that a target may have. If no `regmax` parameter is provided, it is by default equal to 6. `outfile_name`: A string. Name of the file to print the network table in. By default it is "TDCor_output.txt".

The default values are certainly not the best values to work with. The TDCOR parameters have to be optimized by the user based on its own knowledge of the network, the quality of the data etc... Because TDCOR works by pruning interactions, it is probably easier (as a first go) to optimize the parameter values following the order of the filters.

Before starting inactivate all the filters using the less stringent parameter values possible or for the MRST filter by setting search.EP to FALSE. You should as well set the bootstrap parameters to a relatively low value (e.g. n0=100 and n1=1). Hence the runs will be quick and you will be able to rapidly assess whether the changes you made in the parameter values were a good thing.

Start by optimizing the parameters involved in time shifts estimation. That is to say, essentially delayspan, time_step, tol and delaymax. The latter (together with delaymin) is a biological parameters and the range of possible values is argueably limited. Though they ought to be adapted to the organism (e.g. in prokaryotes, the delays are extremely short since polysomes couple transcription and translation). Note that the estimate.delay function can be very helpful to optimize these various parameters thanks to the visual output. Use it with pairs of genes that have been shown to interact directly or indirectly in your system and for which the relationship in the dataset in clearly linear. For network reconstruction with TDCor, good time shift estimation is absolutely crucial. Once this is done, proceed with optimizing the threshold for correlation thr_cor and the thresholds on the index of directness (thr_ind1, thr_ind2). Then optimize the parameters of the triangle and diamond pruning filters (thrpTPI and thrpDPI). You may have to try a couple of different TPI and DPI databases (i.e. databases built with different input parameters). In particular increasing the noise level when generating these database enables one to decrease the stringency of the triangle and diamond filters, when increasing the thrpTPI and thrpDPI value is not sufficient. Subsequently fine-tune the parameters of the MRST filter (thr_bool_EP, MinTarNumber, MinProp, MaxEPNumber) if you want it on. Remember to set search.EP back to TRUE first. Next optimize thr_isr (self-regulation). Finally, restrict the number of maximum regulators if necessary (regmax).

The TDCOR main function returns a list containing 7 elements

`input`	A list containing the input parameters (as a reminder).
`intermediate`	A list containing three intermediate matrices. `mat_cor` is the matrix that stores the correlations, `mat_isr` stores the indices of self-regulations and `mat_overlap` contains the indices of overlap.
`network`	A matrix containing the network. The element [i,j] of this matrix contains the bootstrap value for the edge "gene j to gene i". The sign indicates the sign of the predicted interaction.
`ID`	A matrix containing the computed indices of directness (ID). The element [i,j] contains the ID for the edge "gene j to gene i".
`delay`	A matrix containing the computed time shifts. The element [i,j] of this matrix contains the estimated time shift between the profile of gene j and the profile of gene i.
`EP`	A vector containing the bootstrap values for the MRST predictions.
`predictions`	The edge predictions in the form of a table. The columns are organized in following order: Regulator name, Type of interaction (+ or-), Target name, Bootstrap, Index of Directness, Estimated time shift between the target and regulator profiles.

The table of predictions (without header) and the input parameters are printed at the end of the run in two separate text files located in the current R working directory (If you are not sure which directory this is, use the command getwd()).

For a parameter to be involved in the bootstrapping process, one must feed the function a vector containing two values as input. These two values are respectively the lower and upper boundaries of the bootstrapping interval. If one chooses not to use a parameter for bootstrapping, one can either feed the function an input vector containing twice the same value, or only one value.

Julien Lavenus jl.tdcor@gmail.com

Lavenus et al., 2015, The Plant Cell

See also CalculateDPI, CalculateTPI, UpdateDPI, UpdateTPI, TDCor-package.

## Not run: 
# Load the lateral root transcriptomic dataset
data(LR_dataset)

# Load the vectors of gene codes, gene names and prior
data(l_genes)
data(l_names)
data(l_prior)

# Load the vector of time points for the LR_dataset
data(times)

# Generate the DPI databases

DPI15=CalculateDPI(dataset=LR_dataset,l_genes=l_genes,l_prior=l_prior,
times=times,time_step=1,N=10000,ks_int=c(0.5,3),kd_int=c(0.5,3),delta_int=c(0.5,3),
noise=0.15,delay=3)

# Generate the TPI databases

TPI10=CalculateTPI(dataset=LR_dataset,l_genes=l_genes,
l_prior=l_prior,times=times,time_step=1,N=10000,ks_int=c(0.5,3),
kd_int=c(0.5,3),delta_int=c(0.5,3),noise=0.1,delay=3)

# Check/update if necessary the databases (Not necessary here though.
# This is just to illustrate how it would work.)

TPI10=UpdateTPI(TPI10,LR_dataset,l_genes,l_prior)
DPI15=UpdateDPI(DPI15,LR_dataset,l_genes,l_prior)


### Choose your TDCOR parameters ###
 
# Parameters for time shift estimatation 
# and filter on time shift value
ptime_step=1			
ptol=0.13			 
pdelayspan=12
pdelaymax=c(2.5,3.5)
pdelaymin=0

# Parameter of the correlation filter
pthr_cor=c(0.65,0.8)

# Parameters of the ID filter
pdelay=3		
pthr_ind1=0.65			
pthr_ind2=3.5

# Parameter of the overlap filter
pthr_overlap=c(0.4,0.6)

# Parameters of the triangle and diamond filters
pthrpTPI=c(0.55,0.8)
pthrpDPI=c(0.65,0.8)	
pTPI=TPI10			
pDPI=DPI15

# Parameter for identification of self-regulations				
pthr_isr=c(4,6)

# Parameters for MRST identification			
pMinTarNumber=5			
pMinProp=0.6

# Max number of regulators		
pregmax=5

# Bootstrap parameters
pn0=1000			
pn1=10

# Name of the file to print network in
poutfile_name="TDCor_output.txt"	

### Reconstruct the network ###

tdcor_out= TDCOR(dataset=LR_dataset,l_genes=l_genes,l_names=l_names,n0=pn0,n1=pn1,
l_prior=l_prior,thr_ind1=pthr_ind1,thr_ind2=pthr_ind2,regmax=pregmax,thr_cor=pthr_cor,
delayspan=pdelayspan,delaymax=pdelaymax,delaymin=pdelaymin,delay=pdelay,thrpTPI=pthrpTPI,
thrpDPI=pthrpDPI,TPI=pTPI,DPI=pDPI,thr_isr=pthr_isr,time_step=ptime_step,
thr_overlap=pthr_overlap,tol=ptol,MinProp=pMinProp,MinTarNumber=pMinTarNumber,
outfile_name=poutfile_name)


## End(Not run)