library(aphylo)
knitr::knit_hooks$set(smallsize = function(before, options, envir) {
    if (before) {
        "\\footnotesize\n\n"
    } else {
        "\n\\normalsize\n\n"
    }
})

options(digits = 4)
knitr::opts_chunk$set(
  echo = FALSE, warning = FALSE, message = FALSE,
  out.width = ".7\\linewidth", fig.align = "center", fig.width = 7, fig.height = 5)

# Setting the seed and simulating the data
set.seed(2)
x <- raphylo(40, P = 2)

# Dropping some annotations
x <- rdrop_annotations(x, .5)

The problem

plot(x, main = "")

Last year's EAC

In brief

Notation {.t}

\begincols

\begincol{.25\linewidth} \includegraphics[height=.8\textheight]{fig/annotated-tree.pdf} \endcol

\begincol{.74\linewidth}

\begin{table}[tb]
\centering
\begin{tabular}{lm{.7\linewidth}}
\toprule
Symbol & Description \\
\midrule
$\phylo \equiv (\nodes, \edges)$ & Phylogenetic Tree. \\
$\parent{n}$ & Parent of node $n$. \\
$\offspring{n}$ & Offspring of node $n$. \\
$\Ann \equiv \{\ann{n}\}_{n\in\nodes}$ & True annotations. \\
$\AnnObs \equiv \{\annObs{n}\}_{n\in\nodes}$ & Experimental annotations. \\
$\aphylo \equiv (\phylo, \Ann)$ & Annotated Phylogenetic Tree. \\
$\aphyloObs \equiv (\phylo, \AnnObs)$ & Experimentally Annotated Phylogenetic Tree. \\
$\aphyloObs_n$ & Induced Experimentally Annotated Sub-tree of node $n$. \\
$\aphyloObs_n^c$ & Complement of $\aphyloObs_n$. \\
\bottomrule
\end{tabular}
\caption{Mathematical Notation\label{tab:notation}}
\end{table}

\endcol \endcols

Recap: Model {.t}

\begincols

\begincol{.475\linewidth}

\begin{enumerate} \item A probabilistic model of gene function evolution,

\item The probability that the root node has the function is $\pi$,

\item Conditional on the state of its parent, the probabilities that a node gains or loses a function are $(\gain,\loss)$,

\item \only<1>{Finally}\only<2->{\sout{Finally}}, at the leaf node, the probability that a node with no function is mislabeled as having the function is $\misszero$. Conversely, the probability that a node with a function is mislabeled as not having the function is $\missone$.

\only<2>{\item Finally, curators will report their discovery of function \emph{absent}/\emph{present} with probability $\reportzero/\reportone$.} \end{enumerate}

\endcol

\begincol{.475\linewidth}

\begin{table}[tb]
\centering
\begin{tabular}{lm{.7\linewidth}}
\toprule
Parameter & Probability \\
\midrule
$\pi$ & The root node has the function \\
$\gain$ & Gaining a function \\
$\loss$ & Losing a function \\
$\misszero$ & Mislabeling a 0 \\
$\missone$ & Mislabeling a 1 \\
\only<2>{$\reportzero$} & \only<2>{Propensity to report a 0} \\
\only<2>{$\reportone$} & \only<2>{Propensity to report a 1} \\
\bottomrule
\end{tabular}
\caption{Model parameters\label{tab:parameters}}
\end{table}

\endcol

\endcols
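
Read as conditional probabilities, the parameters above can be sketched as follows. This assumes a single binary function per node, and that $\ann{\cdot}$ can be applied to $\parent{n}$ to denote the parent's true state, which is an assumption about how the macros nest rather than something stated on the slide:

\begin{align*}
\Pr(\text{root node has the function}) &= \pi \\
\Pr\left(\ann{n} = 1 \mid \ann{\parent{n}} = 0\right) &= \gain \\
\Pr\left(\ann{n} = 0 \mid \ann{\parent{n}} = 1\right) &= \loss \\
\Pr\left(\annObs{n} = 1 \mid \ann{n} = 0\right) &= \misszero \quad \text{(leaf nodes only)} \\
\Pr\left(\annObs{n} = 0 \mid \ann{n} = 1\right) &= \missone \quad \text{(leaf nodes only)}
\end{align*}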

Changes from last year

From the formal (statistical) standpoint

By-products generated during the implementation

Recap: The aphylo R package

Features:

Some new features

Nice visualizations

plot_logLik(x ~ mu + psi + Pi)

ans <- aphylo_mcmc(x ~ mu + psi + Pi, priors = bprior())
plot(prediction_score(ans), main = "", which.fun=1L)

Flexible model specification

Automatic specification of the likelihood function, e.g.

\small

\normalsize

Flexible model specification

\footnotesize

ans

\normalsize

Simulation study

Using the entire Panther data set (~13,000 families), we applied our model's data-generating process to annotate trees.\pause

Four different scenarios:\pause

  1. Gold standard: Estimation of the model on fully annotated trees\pause

  2. Missing data: Estimation of the model with missing annotations [from 10% to 90% missingness]\pause

  3. Propensity to report (a): Same data as scenario 2, but we drop more observations with probabilities $\reportzero, \reportone$. Estimation does not include $\eta$.\pause

  4. Propensity to report (b): Same as scenario 3, but we include $\eta$.
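
The difference between the missing-data mechanisms in scenarios 2 and 3 can be illustrated with a minimal base-R sketch. It is not the code used in the study: `mask_annotations()` is a hypothetical helper, missing values are coded as `NA` purely for illustration, and `eta` is assumed to hold the probabilities that a 0 and a 1, respectively, get reported.

```r
# Hypothetical helper (not part of aphylo): mask 0/1 annotations.
#   ann       integer vector of 0/1 annotations
#   p_missing share of annotations dropped uniformly at random (scenario 2)
#   eta       c(eta0, eta1): assumed probability that a 0 / a 1 is reported
mask_annotations <- function(ann, p_missing, eta = c(1, 1)) {
  stopifnot(all(ann %in% c(0L, 1L)), p_missing >= 0, p_missing <= 1)
  n   <- length(ann)
  out <- ann

  # Scenario 2: drop a fixed share of annotations completely at random
  drop_random <- sample.int(n, round(n * p_missing))
  out[drop_random] <- NA

  # Scenario 3: an annotation is reported with a probability that
  # depends on its value, so the remaining data are no longer a random sample
  reported <- runif(n) < eta[ann + 1L]
  out[!reported] <- NA

  out
}

set.seed(2)
ann <- rbinom(1000, 1, 0.5)

s2 <- mask_annotations(ann, p_missing = 0.5)                    # scenario 2
s3 <- mask_annotations(ann, p_missing = 0.5, eta = c(0.2, 0.9)) # scenario 3

mean(is.na(s2))              # share of missing annotations in scenario 2
table(s3, useNA = "ifany")   # zeros are reported far less often than ones
```

With unequal values in `eta`, the observed zeros and ones are no longer representative of the true annotations, which is the situation a model without $\eta$ cannot account for.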

Gold standard: Bias (small trees) {.t}

\begin{figure}\centering \includegraphics[height=.8\textheight]{fig/01-gold-standard-bias_plots_tree-size=small.pdf} \end{figure}

Gold standard: Bias (mid-small trees) {.t}

\begin{figure}\centering \includegraphics[height=.8\textheight]{fig/01-gold-standard-bias_plots_tree-size=mid-small.pdf} \end{figure}

Gold standard: Bias (mid-large trees) {.t}

\begin{figure}\centering \includegraphics[height=.8\textheight]{fig/01-gold-standard-bias_plots_tree-size=mid-large.pdf} \end{figure}

Gold standard: Bias (large trees) {.t}

\begin{figure}\centering \includegraphics[height=.8\textheight]{fig/01-gold-standard-bias_plots_tree-size=large.pdf} \end{figure}

Gold standard: Prediction {.t}

\begin{figure} \includegraphics[height=.8\textheight]{fig/01-gold-standard-auc.pdf} \end{figure}

Gold standard: Convergence {.t}

\begin{figure} \includegraphics[height=.8\textheight]{fig/01-gold-standard-gelman.pdf} \end{figure}

Missing data: Bias (small trees) {.t}

\begin{figure}\centering \includegraphics[height=.8\textheight]{fig/02-missing-bias_plots_tree-size=small.pdf} \end{figure}

Missing data: Bias (mid-small trees) {.t}

\begin{figure}\centering \includegraphics[height=.8\textheight]{fig/02-missing-bias_plots_tree-size=mid-small.pdf} \end{figure}

Missing data: Bias (mid-large trees) {.t}

\begin{figure}\centering \includegraphics[height=.8\textheight]{fig/02-missing-bias_plots_tree-size=mid-large.pdf} \end{figure}

Missing data: Bias (large trees) {.t}

\begin{figure}\centering \includegraphics[height=.8\textheight]{fig/02-missing-bias_plots_tree-size=large.pdf} \end{figure}

Missing data: Prediction {.t}

\begin{figure} \includegraphics[height=.8\textheight]{fig/02-missing-auc.pdf} \end{figure}

Missing data: Convergence {.t}

\begin{figure} \includegraphics[height=.8\textheight]{fig/02-missing-gelman.pdf} \end{figure}

Does $\eta$ improve the model? Prediction

\begincols

\begincol{.49\linewidth}

\begin{figure}\centering \includegraphics[width=.65\textheight]{fig/03-pub-bias-auc.pdf} \caption{Misspecified model (does not include $\eta$)} \end{figure}

\endcol

\begincol{.49\linewidth}

\begin{figure}\centering \includegraphics[width=.65\textheight]{fig/04-full-model-auc.pdf} \caption{Correct specification (includes $\eta$)} \end{figure}

\endcol

\endcols

Status of the paper {.t}

\centering \includegraphics[width=1\linewidth]{pages.pdf}

Concluding remarks

A parsimonious model of gene functions: easy to apply on a large scale (we already ran some simulations using all 13,000 trees from PantherDB... and it took us less than 1 ~~week~~ hour with ~~10~~ 240 processors ~~only~~).\pause


\begin{center} \huge \color{USCCardinal}{\textbf{Thank you!}} \end{center}

\maketitle



USCbiostats/aphylo documentation built on Oct. 28, 2023, 7:22 a.m.