View source: R/EvoWeaver-class.R
predict.EvoWeaver | R Documentation |
This S3 method predicts pairwise functional associations between gene groups encoded in an EvoWeaver object.
By default it returns a data.frame of pairwise scores. With ReturnDataFrame=FALSE it instead returns EvoWeb objects, which are essentially adjacency matrices with some extra S3 methods to make printing cleaner.
## S3 method for class 'EvoWeaver'
predict(object, Method='Ensemble',
Subset=NULL, Processors=1L,
MySpeciesTree=SpeciesTree(object, Verbose=Verbose),
PretrainedModel="KEGG",
NoPrediction=FALSE,
ReturnDataFrame=TRUE,
Verbose=interactive(),
CombinePVal=TRUE, ...)
object |
An EvoWeaver object |
Method |
Method(s) to use for prediction. This can be a character vector with multiple entries for predicting using multiple methods. See 'Details' for more information. |
Subset |
Subset of data to predict on. This can either be a vector or a matrix. If a vector, prediction proceeds for all possible pairs of elements specified in the vector
(either by name, for a character vector, or by index, for a numeric vector). For example,
If a matrix, Subset is interpreted as a matrix of pairs, where each row of the matrix specifies a pair to evaluate. These can also be specified by name (character) or by index (numeric).
|
Processors |
Number of cores to use for methods that support multithreaded execution.
Setting to |
MySpeciesTree |
Phylogenetic tree of all genomes in the dataset. Required for |
PretrainedModel |
A pretrained model for use with ensemble predictions. The default
value is "KEGG". Has no effect if |
NoPrediction |
For If If |
ReturnDataFrame |
Logical indicating whether to return a data.frame (TRUE, the default) or EvoWeb objects (FALSE). |
Verbose |
Logical indicating whether to print progress bars and messages. Defaults to interactive(). |
CombinePVal |
Logical indicating whether to combine scores and p-values or to return them as separate values. Defaults to TRUE. |
... |
Additional parameters for other predictors and consistency with generic. |
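The two forms of Subset described above can be sketched in base R. This is only an illustration of the described semantics (a vector implies every pair among its elements, a matrix lists pairs row by row); the variable names are made up, and the use of combn() is an assumption for illustration, not EvoWeaver's internal code.

```r
# Vector form: every possible pair among the listed elements is evaluated
subset_vec <- c(1, 3, 5)

# combn() returns one pair per column; transpose so each row is one pair
pairs_from_vec <- t(combn(subset_vec, 2))

# Matrix form: each row names one pair explicitly
subset_mat <- rbind(c(1, 3),
                    c(1, 5),
                    c(3, 5))

print(pairs_from_vec)
```

Under this reading, passing subset_vec or subset_mat would request the same three pairs; the matrix form is useful when only some of the possible pairs are of interest.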
predict.EvoWeaver wraps several methods to create an easy interface for multiple prediction types. Method='Ensemble' is the default value, but each of the component analyses can also be accessed. Common arguments to Method include:
'Ensemble': Ensemble prediction combining individual coevolutionary predictors. See Note below.
'PhylogeneticProfiling': All Phylogenetic Profiling Algorithms used in the EvoWeaver manuscript.
'PhylogeneticStructure': All EvoWeaver Phylogenetic Structure Methods
'GeneOrganization': All EvoWeaver Gene Organization Methods
'SequenceLevel': All EvoWeaver Sequence Level Methods used in the EvoWeaver manuscript.
Additional information and references for each prediction algorithm can be found at the following pages:
EvoWeaver Phylogenetic Profiling Methods
EvoWeaver Phylogenetic Structure Methods
EvoWeaver Gene Organization Methods
EvoWeaver Sequence-Level Methods
The standard return type is a data.frame object with one column per predictor and an additional two columns specifying the genes in each pair. If ReturnDataFrame=FALSE, this returns an EvoWeb object. See EvoWeb for more information. Use of this parameter is discouraged.
By default, EvoWeaver weights scores by their p-value to correct for spurious correlations. The returned scores are raw_score*(1-p_value). If CombinePVal=FALSE, EvoWeaver will instead return the raw score and the p-value separately. The resulting data.frame will have one column for the raw score (denoted METHOD.score) and one column for the p-value (denoted METHOD.pval). **Note: p-values are recorded as (1-p)**. Not all methods support returning p-values separately from the score; in this case, only a METHOD.score column will be returned.
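The arithmetic behind the two settings can be sketched as follows. The scores and p-values are made-up numbers for illustration, and METHOD is the placeholder used above rather than a real column name.

```r
# Made-up raw scores and p-values for three gene pairs
raw_score <- c(0.80, 0.50, 0.90)
p_value   <- c(0.01, 0.40, 0.05)

# CombinePVal=TRUE: one weighted score per pair, raw_score*(1-p_value)
combined <- raw_score * (1 - p_value)

# CombinePVal=FALSE: two columns per method; the pval column stores 1-p
separate <- data.frame(METHOD.score = raw_score,
                       METHOD.pval  = 1 - p_value)

# Because the pval column already holds 1-p, multiplying the two columns
# reproduces the combined score
recombined <- separate$METHOD.score * separate$METHOD.pval

print(combined)
```

Because the p-value column holds 1-p, larger values in both columns indicate stronger evidence, and the original p-value is recovered as 1 - METHOD.pval.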
Different methods require different types of input. The constructor EvoWeaver will notify the user which methods are runnable with the given data. Method Ensemble automatically selects the methods that can be run with the given input data.
See EvoWeaver for more information on input data types.
Complete listing of all supported methods (asterisk denotes a method used in Ensemble, if possible):
'ExtantJaccard': Jaccard Index of Presence/Absence (P/A) profiles at extant leaves
'Hamming': Hamming similarity of P/A profiles
* 'GLMI': MI of G/L profiles
'PAPV': 1-p_value of P/A profiles
'ProfDCA': Direct Coupling Analysis of P/A profiles
'Behdenna': Analysis of Gain/Loss events following Behdenna et al. (2016)
'CorrGL': Correlation of ancestral Gain/Loss events
* 'GLDistance': Score-based method based on distance between inferred ancestral Gain/Loss events
* 'PAJaccard': Centered Jaccard distance of P/A profiles with conserved clades collapsed
* 'PAOverlap': Conservation of ancestral states based on P/A profiles
* 'RPMirrorTree': MirrorTree using Random Projection for dimensionality reduction
* 'RPContextTree': MirrorTree with Random Projection correcting for species tree and P/A conservation
* 'GeneDistance': Co-localization analysis
* 'MoransI': Co-localization analysis using Moran's I for phylogenetic correction and significance
* 'OrientationMI': Mutual Information of Gene Relative Orientation
* 'GeneVector': Correlation of distribution of sequence level residues following Zhao et al. (2022)
* 'SequenceInfo': Mutual information of sites in multiple sequence alignment
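To make the presence/absence terminology in the list above concrete, here is a small base R sketch of the textbook Jaccard index and Hamming similarity on two made-up P/A profiles. These are the generic definitions only; EvoWeaver's own implementations may differ (for example in centering, tree-aware corrections, or significance testing).

```r
# Two made-up presence/absence (P/A) profiles across six genomes
# (1 = gene group present in that genome, 0 = absent)
pa1 <- c(1, 1, 0, 1, 0, 0)
pa2 <- c(1, 0, 0, 1, 1, 0)

# Jaccard index: genomes where both are present, divided by
# genomes where at least one is present
jaccard <- sum(pa1 & pa2) / sum(pa1 | pa2)

# Hamming similarity: fraction of genomes where the two profiles agree
hamming_sim <- mean(pa1 == pa2)

print(c(jaccard = jaccard, hamming = hamming_sim))
```

Note that the Jaccard index ignores genomes where both gene groups are absent, whereas Hamming similarity counts shared absences as agreement.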
Returns a data.frame object where each row corresponds to a single prediction for a pair of gene groups. The first two columns contain the gene group identifiers for each pair, and the remaining columns contain each prediction.
If ReturnDataFrame=FALSE, the return type is a list of EvoWeb objects. See EvoWeb for more info.
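The relationship between the pairwise data.frame layout and an adjacency-matrix view can be sketched in base R. The column names (Gene1, Gene2, score) and values below are hypothetical and chosen only for illustration; they are not guaranteed to match the names EvoWeaver produces, and this is not how EvoWeb objects are constructed internally.

```r
# Hypothetical pairwise result: two identifier columns plus one score column
res <- data.frame(Gene1 = c("A", "A", "B"),
                  Gene2 = c("B", "C", "C"),
                  score = c(0.9, 0.2, 0.5),
                  stringsAsFactors = FALSE)

# Square matrix with one row and column per gene group
ids <- sort(unique(c(res$Gene1, res$Gene2)))
adj <- matrix(NA_real_, nrow = length(ids), ncol = length(ids),
              dimnames = list(ids, ids))

# A two-column character matrix indexes by row and column names;
# fill both triangles so the matrix is symmetric
adj[cbind(res$Gene1, res$Gene2)] <- res$score
adj[cbind(res$Gene2, res$Gene1)] <- res$score

print(adj)
```

The diagonal is left as NA here because the data.frame contains no self-pairs.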
EvoWeaver's publication used a random forest model from the randomForest package for prediction. The next release of EvoWeaver will include multiple new built-in ensemble methods, but in the interim users are advised to rely on randomForest or neuralnet. Planned algorithms are random forests and feed-forward neural networks. Feel free to contact me regarding other models you would like to see added.
If Processors is set to NULL, EvoWeaver will use one less core than is detected, or one core if detectCores() cannot detect the number of available cores. This is because of a recurring issue on my machine where the R session takes all available cores and is then locked out of forking processes, with the only solution being to restart the entire R session. This may be an issue specific to ARM Macs, but out of an abundance of caution I've made the default setting slightly slower but guaranteed to complete, rather than risk bricking a machine.
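The core-selection rule described in this note can be sketched as a small helper. The function name choose_cores is hypothetical, and this is an illustration of the stated rule under the assumption that detection uses parallel::detectCores(); it is not EvoWeaver's actual code.

```r
library(parallel)

# Hypothetical helper: one less core than detected, or one core when
# the number of cores cannot be determined
choose_cores <- function(detected = detectCores()) {
  # detectCores() returns NA when it cannot determine the core count
  if (is.na(detected)) return(1L)
  # never go below one core, even on a single-core machine
  max(1L, as.integer(detected) - 1L)
}

print(choose_cores())
```

Passing the detected count as an argument makes the rule easy to check for specific cases, such as an undetectable count or a single-core machine.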
If ReturnDataFrame=FALSE and CombinePVal=FALSE, the resulting EvoWeb objects will contain values of type 'complex'. For each value, the real part denotes the raw score, and the imaginary part denotes 1-p, with p the p-value.
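A value encoded this way can be unpacked with base R's Re() and Im(). The numbers below are made up for illustration.

```r
# Made-up raw score and p-value for a single gene pair
raw_score <- 0.8
p <- 0.05

# Encoding described above: real part = raw score, imaginary part = 1-p
val <- complex(real = raw_score, imaginary = 1 - p)

score_out <- Re(val)      # recover the raw score
p_out     <- 1 - Im(val)  # recover the p-value from the stored 1-p

print(c(score = score_out, p_value = p_out))
```

Re() and Im() are vectorized, so the same calls apply to an entire matrix of complex values.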
Aidan Lakshman ahl27@pitt.edu
EvoWeaver
EvoWeb
EvoWeaver Phylogenetic Profiling Predictors
EvoWeaver Phylogenetic Structure Predictors
EvoWeaver Gene Organization Predictors
EvoWeaver Sequence-Level Predictors
###############
## Prediction with built-in model and data
###############
set.seed(555L)
exData <- get(data("ExampleStreptomycesData"))
ew <- EvoWeaver(exData$Genes[1:50], MySpeciesTree=exData$Tree)
# Subset isn't necessary but is faster for a working example
evoweb1 <- predict(ew, Subset=1:2)
# print out results (a data.frame of pairwise scores by default)
if(interactive()) print(evoweb1)
###############
## Training own ensemble model
###############
datavals <- evoweb1[,-c(1,2,10)]
actual_values <- sample(c(0,1), nrow(datavals), replace=TRUE)
# This example just picks random numbers
# ***Do not do this for your own models***
# Make sure the actual values correspond to the right pairs!
datavals[,'y'] <- actual_values
myModel <- glm(y~., datavals[,-c(1,2)], family='binomial')
testEvoWeaverObject <- EvoWeaver(exData$Genes[51:60], MySpeciesTree=exData$Tree)
evoweb2 <- predict(testEvoWeaverObject,
                   PretrainedModel=myModel)
# Print result as a data.frame of pairwise scores
if(interactive()) print(evoweb2)