View source: R/EvoWeaver-class.R
predict.EvoWeaver | R Documentation |
This S3 method predicts pairwise functional associations between gene groups encoded in an EvoWeaver object.
By default this returns a data.frame of pairwise scores. With ReturnDataFrame=FALSE, it returns an object of type EvoWeb, which is essentially an adjacency matrix with some extra S3 methods to make printing cleaner.
## S3 method for class 'EvoWeaver'
predict(object, Method='Ensemble',
Subset=NULL,
MySpeciesTree=SpeciesTree(object, Verbose=Verbose),
PretrainedModel="KEGG",
NoPrediction=FALSE,
ReturnDataFrame=TRUE,
Verbose=interactive(),
CombinePVal=TRUE,
useDNA=FALSE,...)
object |
A EvoWeaver object |
Method |
Character; Method(s) to use for prediction. This can be a character vector with multiple entries for predicting using multiple methods. See 'Details' for more information. |
Subset |
Either a vector or a matrix. If a vector, prediction proceeds for all possible pairs of elements specified in the vector
(either by name, for character vector, or by index, for numeric vector). For example,
If a matrix, Subset is interpreted as a matrix of pairs, where each row of the matrix specifies a pair to evaluate. These can also be specified by name (character) or by index (numeric).
|
MySpeciesTree |
Object of class |
PretrainedModel |
A pretrained model for use with ensemble predictions. The default
value is Has no effect if |
NoPrediction |
Logical; determines if data should be returned prior to making prediction for If If |
ReturnDataFrame |
Logical; determines if the function should return a data.frame (TRUE) or EvoWeb object(s) (FALSE). |
Verbose |
Logical; Determines if status messages and progress bars should be displayed while running. |
CombinePVal |
Logical; Determines if scores and p-values should be combined or returned as separate values. |
useDNA |
Logical; Determines whether to interpret sequences as DNA or AA (only used for Sequence Level methods, see Details). |
... |
Additional parameters for other predictors and consistency with generic. |
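The two accepted shapes of Subset can be pictured with base R alone. The sketch below uses made-up gene group names and does not call EvoWeaver; whether pairs are enumerated internally in this way is an assumption, so treat it only as an illustration of the vector form (all pairs among the listed elements) versus the matrix form (one pair per row).

```r
# Hypothetical gene group names, for illustration only
groups <- c("geneA", "geneB", "geneC", "geneD")

# Vector form: every unordered pair among the listed elements
subset_vec <- c("geneA", "geneB", "geneC")
pairs_from_vec <- t(combn(subset_vec, 2))

# Matrix form: each row names exactly one pair to evaluate
subset_mat <- rbind(c("geneA", "geneD"),
                    c("geneB", "geneC"))

# The same pairs expressed by index rather than by name
subset_idx <- matrix(match(subset_mat, groups), ncol = 2)

print(pairs_from_vec)
print(subset_idx)
```

A three-element vector therefore implies three pairs, while the matrix form lets you request only the pairs of interest.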
predict.EvoWeaver wraps several methods to create an easy interface for multiple prediction types. Method='Ensemble' is the default value, but each of the component analyses can also be accessed. Common arguments to Method include:
'Ensemble': Ensemble prediction combining individual coevolutionary predictors. See Note below.
'PhylogeneticProfiling': All Phylogenetic Profiling Algorithms used in the EvoWeaver manuscript.
'PhylogeneticStructure': All EvoWeaver Phylogenetic Structure Methods
'GeneOrganization': All EvoWeaver Gene Organization Methods
'SequenceLevel': All EvoWeaver Sequence Level Methods used in the EvoWeaver manuscript.
Additional information and references for each prediction algorithm can be found at the following pages:
EvoWeaver Phylogenetic Profiling Methods
EvoWeaver Phylogenetic Structure Methods
EvoWeaver Gene Organization Methods
EvoWeaver Sequence Level Methods
The standard return type is a data.frame object with one column per predictor and an additional two columns specifying the genes in each pair. If ReturnDataFrame=FALSE, this returns an EvoWeb object. See EvoWeb for more information. Use of this parameter is discouraged.
By default, EvoWeaver weights scores by their p-value to correct for spurious correlations. The returned scores are raw_score*(1-p_value). If CombinePVal=FALSE, EvoWeaver will instead return the raw score and the p-value separately. The resulting data.frame will have one column for the raw score (denoted METHOD.score) and one column for the p-value (denoted METHOD.pval). Note: p-values are recorded as (1-p). Not all methods support returning p-values separately from the score; in this case, only a METHOD.score column will be returned.
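As a rough illustration of the weighting described above, the following base R sketch applies raw_score*(1-p_value) to made-up numbers and shows how the separate columns relate to the combined score. The values and the data.frame are invented for the example; only the formula and the METHOD.score / METHOD.pval naming come from this page.

```r
# Made-up values for one hypothetical method, illustration only
raw_score <- c(0.80, 0.50, -0.30)
p_value   <- c(0.01, 0.40, 0.90)

# CombinePVal=TRUE: a single column holding raw_score * (1 - p_value)
combined <- raw_score * (1 - p_value)

# CombinePVal=FALSE: separate columns; the p-value column stores 1 - p
separate <- data.frame(METHOD.score = raw_score,
                       METHOD.pval  = 1 - p_value)

# The combined score can be recovered from the separate columns
recombined <- separate$METHOD.score * separate$METHOD.pval
print(all.equal(combined, recombined))
```

Because the p-value column already stores 1-p, multiplying the two columns reproduces the combined score directly.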
Different methods require different types of input. The constructor EvoWeaver will notify the user which methods are runnable with the given data. Method Ensemble automatically selects the methods that can be run with the given input data.
See EvoWeaver for more information on input data types.
Complete listing of all supported methods (asterisk denotes a method used in Ensemble, if possible):
* 'GLMI': Mutual information (MI) of Gain/Loss (G/L) profiles
* 'GLDistance': Score-based method based on distance between inferred ancestral Gain/Loss events
* 'PAJaccard': Centered Jaccard distance of Presence/Absence (P/A) profiles with conserved clades collapsed
* 'PAOverlap': Conservation of ancestral states based on P/A profiles
* 'RPMirrorTree': MirrorTree using Random Projection for dimensionality reduction
* 'RPContextTree': MirrorTree with Random Projection correcting for species tree and P/A conservation
* 'GeneDistance': Co-localization analysis
* 'MoransI': Co-localization analysis using Moran's I for phylogenetic correction and significance
* 'OrientationMI': Mutual Information of Gene Relative Orientation
* 'GeneVector': Correlation of distribution of sequence level residues following Zhao et al. (2022)
* 'SequenceInfo': Mutual information of sites in multiple sequence alignment
'ExtantJaccard': Jaccard Index of P/A profiles at extant leaves
'Hamming': Hamming similarity of P/A profiles
'PAPV': 1-p_value of P/A profiles
'ProfDCA': Direct Coupling Analysis of P/A profiles
'Behdenna': Analysis of Gain/Loss events following Behdenna et al. (2016)
'CorrGL': Correlation of ancestral Gain/Loss events
If ReturnDataFrame=TRUE, returns a data.frame object where each row corresponds to a single prediction for a pair of gene groups. The first two columns contain the gene group identifiers for each pair, and the remaining columns contain each prediction.
If ReturnDataFrame=FALSE, the return type is a list of EvoWeb objects. See EvoWeb for more info.
If NumCores is set to NULL, EvoWeaver will use one less core than is detected, or one core if detectCores() cannot detect the number of available cores. This guards against a potential issue where the R session consumes all available cores and then loses the ability to fork processes, at which point the only solution is to restart the entire R session.
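The default described in this note can be sketched with parallel::detectCores(), which returns NA when the core count cannot be determined. The helper name choose_cores is hypothetical and not part of EvoWeaver, and clamping a single detected core to one (rather than zero) is an assumption made here so the sketch always returns a usable value.

```r
library(parallel)

# Hypothetical helper, not part of EvoWeaver: pick a default core count
choose_cores <- function(detected = detectCores()) {
  # detectCores() returns NA when the count cannot be determined;
  # fall back to a single core in that case (and when only one exists)
  if (is.na(detected) || detected < 2L) return(1L)
  # otherwise leave one core free for the rest of the system
  as.integer(detected) - 1L
}

print(choose_cores())
```

Leaving one core free is what keeps the session able to fork further processes.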
If ReturnDataFrame=FALSE and CombinePVal=FALSE, the resulting EvoWeb objects will contain values of type 'complex'. For each value, the real part denotes the raw score, and the imaginary part denotes 1-p, with p the p-value.
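A small base R sketch of this encoding, using invented numbers rather than real EvoWeaver output: each entry stores the raw score in its real part and 1-p in its imaginary part, so Re() and Im() recover the two components. The matrix here is a plain complex matrix standing in for an EvoWeb object.

```r
# Made-up scores and p-values for a 2 x 2 toy adjacency matrix
raw_score <- matrix(c(1.0, 0.8, 0.8, 1.0), nrow = 2)
p         <- matrix(c(0.00, 0.05, 0.05, 0.00), nrow = 2)

# Real part = raw score, imaginary part = 1 - p
encoded <- matrix(complex(real = raw_score, imaginary = 1 - p), nrow = 2)

# Recover each component from the complex values
scores_back <- Re(encoded)
p_back      <- 1 - Im(encoded)

print(scores_back)
print(p_back)
```

Multiplying Re() by Im() entrywise gives the same raw_score*(1-p) weighting that the combined output uses.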
Aidan Lakshman ahl27@pitt.edu
EvoWeaver
EvoWeb
EvoWeaver Phylogenetic Profiling Predictors
EvoWeaver Phylogenetic Structure Predictors
EvoWeaver Gene Organization Predictors
EvoWeaver Sequence Level Predictors
##############
## Prediction with built-in model and data
###############
set.seed(555L)
exData <- get(data("ExampleStreptomycesData"))
ew <- EvoWeaver(exData$Genes[1:50], MySpeciesTree=exData$Tree)
# Subset isn't necessary but is faster for a working example
evoweb1 <- predict(ew, Subset=1:2)
# print out results (a data.frame of pairwise scores by default)
if(interactive()) print(evoweb1)
###############
## Training own ensemble model
###############
datavals <- evoweb1[,-c(1,2,10)]
actual_values <- sample(c(0,1), nrow(datavals), replace=TRUE)
# This example just picks random numbers
# ***Do not do this for your own models***
# Make sure the actual values correspond to the right pairs!
datavals[,'y'] <- actual_values
myModel <- glm(y~., datavals[,-c(1,2)], family='binomial')
testEvoWeaverObject <- EvoWeaver(exData$Genes[51:60], MySpeciesTree=exData$Tree)
evoweb2 <- predict(testEvoWeaverObject,
                   PretrainedModel=myModel)
# Print result as a data.frame of pairwise scores
if(interactive()) print(evoweb2)