predict.EvoWeaver: Make predictions with EvoWeaver objects

View source: R/EvoWeaver-class.R

predict.EvoWeaver    R Documentation

Make predictions with EvoWeaver objects

Description

This S3 method predicts pairwise functional associations between gene groups encoded in an EvoWeaver object. It returns an object of class EvoWeb, which is essentially an adjacency matrix with some extra S3 methods to make printing cleaner.

Usage

## S3 method for class 'EvoWeaver'
predict(object, Method='Ensemble',
         Subset=NULL, Processors=1L,
         MySpeciesTree=SpeciesTree(object), 
         PretrainedModel=NULL,
         NoPrediction=FALSE,
         ReturnRawData=FALSE, Verbose=TRUE, ...)

Arguments

object

An EvoWeaver object.

Method

Method(s) to use for prediction. This can be a character vector with multiple entries for predicting using multiple methods. See 'Details' for more information.

Subset

Subset of data to predict on. This can either be a vector or an N x 2 matrix (one pair per row).

If a vector, prediction proceeds for all possible pairs of elements specified in the vector (either by name, for a character vector, or by index, for a numeric vector). For example, Subset=1:3 will predict for pairs (1,2), (1,3), (2,3).

If a matrix, Subset is interpreted as a matrix of pairs, where each row of the matrix specifies a pair to evaluate. These can also be specified by name (character) or by index (numeric).

Subset=rbind(c(1,2),c(1,3),c(2,3)) produces equivalent functionality to Subset=1:3.
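
As a small illustration of the equivalence above, the following base R sketch builds the same set of pairs both ways. It only constructs the pair matrices and does not call EvoWeaver; the variable names are illustrative.

```r
# Pairs written out by hand, one pair per row
pairs_explicit <- rbind(c(1, 2), c(1, 3), c(2, 3))

# All unordered pairs of 1:3, generated with combn() and transposed
# so that each row holds one pair
pairs_from_vector <- t(combn(1:3, 2))

# Both matrices list the same pairs in the same order
all(pairs_explicit == pairs_from_vector)
```

The same pattern, t(combn(ids, 2)), can serve as a starting point for a pair matrix that is then filtered down to the pairs of interest.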

Processors

Number of cores to use for methods that support multithreaded execution. Setting to NULL or a negative value will use the value of detectCores(), or one core if the number of available cores cannot be determined. See Note for more information.

MySpeciesTree

Phylogenetic tree of all genomes in the dataset. Required for Method=c('ContextTree', 'GainLoss', 'CorrGL', 'ColocMoran', 'Behdenna'). 'Behdenna' requires a rooted, bifurcating tree (other values of Method can handle arbitrary trees). Note that EvoWeaver can automatically infer a species tree if initialized with dendrogram objects.

PretrainedModel

A pretrained model for use with ensemble predictions. If unspecified when Method='Ensemble', the program will use built-in models (see BuiltInEnsembles). See the examples for how to train an ensemble method to pass to PretrainedModel.

Has no effect if Method != 'Ensemble'.

NoPrediction

For Method='Ensemble', should data be returned prior to making predictions?

If TRUE, this will instead return a data.frame object with predictions from each algorithm for each pair. This data.frame is typically used to train an ensemble model.

If FALSE, EvoWeaver will return predictions for each pair (using user model if provided or a built-in otherwise).

ReturnRawData

Internal parameter used for ensemble predictions. Should not be set by the user.

Verbose

Logical indicating whether to print progress bars and messages. Defaults to TRUE.

...

Additional parameters for other predictors and consistency with generic.

Details

predict.EvoWeaver wraps several methods to create an easy interface for multiple prediction types. Method='Ensemble' is the default value, but each of the component analyses can also be accessed. The following is a list of all algorithms implemented in EvoWeaver (* denotes algorithms used in the EvoWeaver publication):

  • 'Ensemble': Ensemble prediction combining individual coevolutionary predictors. See Note below.

  • * 'Jaccard': Jaccard distance of Presence/Absence (P/A) profiles

  • 'Hamming': Hamming distance of P/A profiles

  • * 'MutualInformation': MI of P/A profiles

  • * 'PAPV': 1-p_value of P/A profiles

  • 'ProfDCA': Direct Coupling Analysis of P/A profiles

  • 'Behdenna': Analysis of Gain/Loss events following Behdenna et al. (2016)

  • * 'CorrGL': Correlation of ancestral Gain/Loss events

  • * 'GainLoss': Score-based method based on distance between inferred ancestral Gain/Loss events

  • * 'MirrorTree': MirrorTree using Random Projection for dimensionality reduction

  • * 'ContextTree': MirrorTree with Random Projection correcting for species tree and P/A conservation

  • * 'Coloc': Co-localization analysis

  • * 'ColocMoran': Co-localization analysis using Moran's I for phylogenetic correction and significance

  • * 'TranscripMI': Mutual Information of Transcriptional Direction

  • * 'NVDT': Correlation of distribution of sequence level residues following Zhao et al. (2022)

  • * 'ResidueMI': Mutual information of sites in multiple sequence alignment
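
To make the presence/absence (P/A) profile terminology concrete, here is a minimal base R sketch of the textbook Jaccard and Hamming distances on two made-up profiles. The helper functions are hypothetical and are not part of EvoWeaver; the scores EvoWeaver reports for 'Jaccard' and 'Hamming' may be scaled or corrected differently.

```r
# Toy P/A profiles for two gene groups across eight genomes.
# 1 = group present in that genome, 0 = absent. Values are made up.
pa_a <- c(1, 1, 0, 1, 0, 0, 1, 0)
pa_b <- c(1, 0, 0, 1, 1, 0, 1, 0)

# Hypothetical helpers (not EvoWeaver functions): textbook definitions only.
jaccard_distance <- function(x, y) {
  both   <- sum(x == 1 & y == 1)  # genomes containing both groups
  either <- sum(x == 1 | y == 1)  # genomes containing at least one group
  if (either == 0) return(0)      # convention chosen here for two all-absent profiles
  1 - both / either
}

hamming_distance <- function(x, y) sum(x != y)  # number of genomes where profiles differ

jaccard_distance(pa_a, pa_b)  # 1 - 3/5 = 0.4
hamming_distance(pa_a, pa_b)  # 2
```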

The best performing individual predictors are c('CorrGL', 'GainLoss', 'MirrorTree', 'Jaccard'). Users interested in running quick analyses should use c('CorrGL', 'GainLoss', 'Jaccard').
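
A quick analysis with these three methods might look like the sketch below. It reuses the data set and calls from the Examples section of this page and assumes the SynExtend package is installed. The species tree is passed explicitly because 'CorrGL' and 'GainLoss' require one.

```r
library(SynExtend)

# Same example data as in the Examples section
exData <- get(data("ExampleStreptomycesData"))
ew <- EvoWeaver(exData$Genes[1:50])

# Run only the three quick predictors on a small subset of gene groups
quick <- predict(ew, Method=c('CorrGL', 'GainLoss', 'Jaccard'),
                 Subset=1:10, MySpeciesTree=exData$Tree)

# Print the result
quick
```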

Additional information and references for each prediction algorithm can be found at the following pages:

  • EvoWeaver Phylogenetic Profiling Methods

  • EvoWeaver Phylogenetic Structure Methods

  • EvoWeaver Gene Organization Methods

  • EvoWeaver Sequence-Level Methods

This returns an EvoWeb object, an S3 class that makes formatting and printing of results slightly nicer. See EvoWeb for more information.

Different methods require different types of input. The constructor EvoWeaver will notify the user which methods are runnable with the given data. Method='Ensemble' automatically selects the methods that can be run with the given input data.

See EvoWeaver for more information on input data types.

Value

Returns an EvoWeb object. See EvoWeb for more info.

Note

The ensemble method currently bundled with EvoWeaver is out of date. EvoWeaver's publication used a random forest model from the randomForest package for prediction. The next release of EvoWeaver will include multiple new built-in ensemble methods; the planned algorithms are random forests and feed-forward neural networks. In the interim, users are encouraged to train their own models with randomForest or neuralnet. Feel free to contact me regarding other models you would like to see added.

If Processors is set to NULL, EvoWeaver will use one fewer core than is detected, or one core if detectCores() cannot determine the number of available cores. This is because of a recurring issue on my machine where the R session takes all available cores and is then locked out of forking processes, and the only solution is to restart the entire R session. This may be an issue specific to ARM Macs, but out of an abundance of caution I've made the default setting slightly slower but guaranteed to complete rather than risk bricking a machine.
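
The fallback described in this note can be sketched in base R as follows. resolve_cores() is a hypothetical helper written for illustration only; it is not an EvoWeaver function and may not match the package's internal logic exactly.

```r
library(parallel)

# Hypothetical helper (not part of EvoWeaver): choose a core count
# following the rule described in the note above.
resolve_cores <- function(Processors, detected = detectCores()) {
  # An explicit positive request is used as given
  if (!is.null(Processors) && Processors >= 1) {
    return(as.integer(Processors))
  }
  # detectCores() returns NA when the core count cannot be determined
  if (is.na(detected)) {
    return(1L)
  }
  # Otherwise leave one core free, but never go below one
  max(1L, as.integer(detected) - 1L)
}

resolve_cores(4)                           # explicit request: 4 cores
resolve_cores(NULL, detected = 8L)         # one fewer than detected: 7 cores
resolve_cores(NULL, detected = NA_integer_) # unknown core count: 1 core
```

The detected argument exists only so the behavior can be checked without depending on the machine the code runs on.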

Author(s)

Aidan Lakshman ahl27@pitt.edu

See Also

EvoWeaver

EvoWeb

EvoWeaver Phylogenetic Profiling Predictors

EvoWeaver Phylogenetic Structure Predictors

EvoWeaver Gene Organization Predictors

EvoWeaver Sequence-Level Predictors

Examples

###############
## Prediction with built-in model and data
###############

exData <- get(data("ExampleStreptomycesData"))
ew <- EvoWeaver(exData$Genes[1:50])

# Subset isn't necessary but is faster for a working example
evoweb1 <- predict(ew, Subset=1:10, MySpeciesTree=exData$Tree)

# print out results as an adjacency matrix
evoweb1

###############
## Training own ensemble model
###############

datavals <- predict(ew, NoPrediction=TRUE)
                  
actual_values <- sample(c(0,1), nrow(datavals), replace=TRUE)
# This example just picks random numbers
# ***Do not do this for your own models***

# Make sure the actual values correspond to the right pairs! 
datavals[,'y'] <- actual_values                  
myModel <- glm(y~., datavals[,-c(1,2)], family='binomial')

testEvoWeaverObject <- EvoWeaver(exData$Genes[51:60])
evoweb2 <- predict(testEvoWeaverObject, 
                     PretrainedModel=myModel)
                     
# Print result as a matrix of pairwise scores
evoweb2

npcooley/SynExtend documentation built on May 2, 2024, 7:28 p.m.