Home

/

GitHub

/

BPerezLamarque/HOME

/

README.md

README.md
In BPerezLamarque/HOME: Phylogenetic Models of HOst-Microbiota Evolution

HOME Tutorial - HOst-Microbiota Evolution

This document indicates how to use our model of HOst-Microbiota Evolution.

The first part presents how to simulate mock microbiota.

The second part explains how to perform an empirical application; it is illustrated with the great apes microbiota data.

The last part shows how to interpret the HTML outputs of HOME.

Citation: Perez‐Lamarque, B., & Morlon, H. (2019). Characterizing symbiont inheritance during host‐microbiota evolution: application to the great apes gut microbiota. Molecular Ecology Resources 19:1659-1671.

Contact: Benoît Perez-Lamarque, benoit.perez.lamarque@gmail.com

Installation\ Performing Simulations\ Running Empirical application:\ * Creating microbial alignments for each OTU\ * Example of empirical applications - great apes microbiota\ * Running HOME on the empirical application\ Interpretation of the Results:\ * Example 1: Results from a simulation with horizontal transmission\ * Example 2: Results from a simulation with strict vertical transmission\ * Example 3: Results from a simulation with environmental acquisition

Installation:

Our model is part of the R-package HOME and can be installed from GitHub using devtools:

library(devtools)
install_github("BPerezLamarque/HOME", dependencies = TRUE)

Alternatively, our model is also part on the R package RPANDA (Morlon et al., 2016) available from GitHub.

library(devtools)
install_github("hmorlon/PANDA",ref="Benoit", dependencies = TRUE)

Performing Simulations:

You can provide a host tree (e.g. an empirical tree) and simulate the evolution of a mock microbiota on it. Your tree must be binary, rooted and ultrametric. You can either directly provide a host tree in the function sim_microbiota, or you can have the host tree saved in your working directory (thus, its filename must be well-formated host_tree_"name".tre in a Newick format).

If you don't provide a host tree, the function sim_microbiota will randomly simulate a host tree (with a pure-birth process) according to the number of tips (i.e. number of host species) you have provided.


setwd("/my_working_directory/") # working directory where all the files and folders will be created. 


# name of your simulation
name <- "simulation_tree_1" 

# name of the different microbial OTU of your simulations (hereafter you will model the evolution of 6 OTUs on your host tree)
name_index <-  c("OTU1","OTU2","OTU3","OTU4","OTU5","OTU6")  

# simulated scenarii for each OTU: 
simul <- c(0,1,3,5,"indep","indep")
# each simulated OTU has its own evolutionary history:
# i) "0" simulates strict vertical transmission (i.e. O host-switch)
# ii) any positive integer simulates horizontal transmissions (with this given number of host-switches)
# iii) "indep" simulates environmental acquisition (i.e. independent evolutions - the simulated OTU evolved on an independent simulated tree) 


# Load the host tree to simulate the evolution of the OTUs on it 
host_tree <- read.tree("my_tree.tre")

# Alternatively, if you don't provide a host tree, you must add the number of tips of the simulated host tree
# n <- 20  


# proportion of variable nucleotides in the alignment
proportion_variant <- 0.1 

# simulated substitution rate 
simulated_mu <- 1  # simulated_mu=1 corresponds to on average 1 mutation per variable nucleotide

# total number of nucleotides in the alignment 
N <- 300  


# number of cores to run the analyses
nb_cores <- 1 # if you don't run in a multi-cores machine, the default value is 1

# seed used for simulations 
seed <- 1


sim_microbiota(name=name, name_index=name_index, simul=simul, 
provided_tree=host_tree, mu=simulated_mu, N=N, proportion_variant=proportion_variant, 
seed=seed, nb_cores=nb_cores)

Then, you can proceed to the parameters estimation by running HOME on the simulated host-microbiota data.


#  possible numbers of host-switches to test during the inference
lambda <- c(1:25) 

# number of trees (for Monte Carlo estimation of the number of switches)
nb_tree <- 10000 

# number of randomizations in the model selection testing independent evolutions (R parameter)
nb_random <- 10

# rarefactions on the number of trees
raref <- FALSE # if TRUE rarefactions on the number of trees will be performed


HOME_model(name=name, name_index=name_index, provided_tree=host_tree, nb_tree=nb_tree, lambda=lambda, 
empirical=FALSE, raref=raref, nb_random=nb_random, seed=seed, nb_cores=nb_cores)

Running Empirical applications:

In order to run HOME, you need first to make the microbial alignments for each OTU of your empirical microbiota. The first step consists in the usual clustering of the raw reads into OTUs. Thus, the second step makes the OTU alignments for running HOME on them (using our own bash script: for each core OTU, it picks the most abundant sequence for every host and align them). Different pipelines can be used to obtain these microbial alignments for each OTU.

At the end of these two steps, all OTU alignments must be stored in a specific folder (i.e. the working directory where you will run HOME) with the filenames formatted as alignment_"name"_"OTUXXX".fas, where "name" will be the name our your HOME run ("name"), and "OTUXXX" is the name of the specific OTU ("name_index").

Examples of bash pipelines to prepare alignments before running HOME are available in the following.

This pipeline clusters OTU given a fixed similarity threshold (97%, or any other value) by combining QIIME1 and UPARSE thanks to the BMP pipelines. The bash pipelines used for studying the microbiota of great apes (Perez-Lamarque & Morlon, 2019) is available here. Although, this pipeline has been designed for 454 sequencing technology, it can also be used on Illumina datasets.

This pipeline clusters reads into SWARM OTUs (Mahé et al., 2014; Mahé et al., 2015): OTU are defined without chosing an global clustering threshold. The step 1 of this pipeline is a direct application of Frédéric Mahé's metabarcoding pipeline using VSEARCH and SWARM. This pipeline is specifically designed for Illumina datasets and is available here.

This pipeline clusters reads into OTUs given a fixed similarity threshold (97%, or any other value). The step 1 of this pipeline is derived from Frédéric Mahé's metabarcoding pipeline using VSEARCH. This pipeline is specifically designed for Illumina datasets and is available here.

A one-to-one correspondance between host tips in the tree and the fasta header of each OTU alignment is mandatory. If multiple sequences correspond to a unique host, there are several solutions : (1) you can only keep one sequence (the most abundant, and assumes that the other sequences are likely to come from PCR/sequencing errors); but if you really want to keep these sequences in the analyses, you have to add new tips in the host tree: (2) either by adding new tips with zero branch lengths to match the number of sequences (similar to what we did in Perez-Lamarque & Morlon (2019) ; see here for more details and a script to add new tips with branch lengths close to zero ), or (3) alternatively by adding a coalescent tree shape at the tip of each species (to model population differentiation for each species that gives the multiple sequences per host).

Let's run HOME for 3 OTUs from the great apes microbiota. First, your working directory must contain the host tree saved with a filename host_tree_"name".tre (Newick format), and all the OTU alignments with the filenames alignment_"name"_"OTUXXX".fas (FASTA format). You can also directly provide the host tree in the function HOME_model.

For instance, in this empirical application (name="great_apes"), you must provide:

/my_working_directory/host_tree_great_apes.tre
/my_working_directory/alignment_great_apes_OTU0001.fas
/my_working_directory/alignment_great_apes_OTU0002.fas
/my_working_directory/alignment_great_apes_OTU0003.fas

You can directly download this example from RPANDA:


setwd("/my_working_directory/") # working directory where all the files and folders will be created. 


example_great_apes_microbiota(name="great_apes")


# name of the empirical application 
name <- "great_apes" 

# name of the different OTUs
name_OTU <-  c("OTU0001","OTU0002","OTU0003")


# possible numbers of host-switches to test during the inference
lambda <- c(1:25) 

# number of trees (for Monte Carlo estimation of the number of switches)
nb_tree <- 10000 

# number of randomizations in the model selection testing independent evolutions 
nb_random <- 10

# rarefactions on the number of trees
raref <- FALSE # if TRUE rarefactions on the number of trees are performed

# number of cores to run the analyses
nb_cores <- 1 # if you don't run in a multi-cores machine, the default value is 1

# seed used for simulations 
seed <- 1


provided_tree <- read.tree(paste0("host_tree_",name,".tre"))


HOME_model(name=name, name_index=name_OTU, provided_tree=provided_tree, nb_tree=nb_tree, lambda=lambda, 
empirical=TRUE, raref=raref, nb_random=nb_random,  seed=seed, nb_cores=nb_cores)

Interpretation of the Results:

The results of each HOME run are available in the folder "/my_working_directory/figures/". For each OTU, a HTML file summarizes all the results with tables and figures. Here, we provide 3 examples of results interpretations.

Example 1: Results from a simulated OTU with horizontal transmission

Parameters | Values ------------- | ------------- Number of symbiont-host association:| 20 Simulated substitution rate:| 1 Number of simulated switches:| 3 Seed for simulations:| 30 Sequences length: |300 Probability segregated sites:| 0.1 Number of unvaried sites:| 281 Number of strictly variant sites:| 19

Parameters | Values ------------- | ------------- Inferred substitution model:| K80 Transition/Transversion ratio:| 0.6558 Estimated substitution rate:| 0.8519 Estimated number of switches:| 3 Associated likelihood:| 137.151 p-value: Strict vertical transmission:| 6e-04 p-value: Independent evolutions (nb switches):| 0 p-value: Independent evolutions (subs. rate):| 0

The most likely number of host-switches:

ksi | mu | -log(Likelihood) ------------- | ------------- | ------------- 3 | 0.8519 | 137.151

Minus log likelihood as a function of the number of host-switches (the minimum of this plot corresponds to the estimated number of switches):

Estimated substitution rate as a function of the number of switches:

Likelihood ratio test testing the model of strict vertical transmission (ksi=0). Strict vertical transmission is rejected if p-value < 0.05.

Results of the likelihood ratio test. The grey curve correponds to the Chi2 distribution with df=1. The dark blue line (resp. light) stands for the 0.05 (resp. 0.01) p-value threshold and the dashed orange line is the observed LRT ratio.

Model selection on independent evolutions.

Test | p-values ------------- | ------------- Empirical ranking (ksi distribution)| 0 Empirical ranking (mu distribution)| 0

Representation of the estimated numbers of switches and the estimated substitution rates for the randomized alignments (in blue) and the empirical alignment (in orange). Independent evolutions can be rejected if the orange dot stands alone in the bottom left corner (i.e. rejected if p-values < 0.05).

Estimated rate matrix:

Rates | A |C |G |T ------------- | -------------| -------------| -------------| ------------- A | -1 | 0.17 | 0.66 | 0.17 | C | 0.17 | -1 | 0.17 |0.66 | G | 0.66 | 0.17 | -1 | 0.17 | T | 0.17 | 0.66 | 0.17 | -1 |

Nucleotide frequencies:

A |C |G |T ------------- | -------------| -------------| ------------- 0.25 | 0.25 | 0.25 | 0.25

Simulated switch(es): | 1 |2 |3 ------------- | -------------| -------------| ------------- Branch origin | 1 |36| 35 Branch arrival |35| 2 |28 Absolute position| 0.022 |0.089 |0.13

Host tree:

Example 2: Results from a simulated OTU with strict vertical transmission

Parameters | Values ------------- | ------------- Number of symbiont-host association:| 20 Simulated substitution rate:| 1 Number of simulated switches:| 0 Seed for simulations:| 30 Sequences length: |300 Probability segregated sites:| 0.1 Number of unvaried sites:| 279 Number of strictly variant sites:| 21

Parameters | Values ------------- | ------------- Inferred substitution model:| K80 Transition/Transversion ratio:| 0.70 Estimated substitution rate:| 0.70 Estimated number of switches:| 0 Associated likelihood:| 149.8 p-value: Strict vertical transmission:| 1 p-value: Independent evolutions (nb switches):| 0 p-value: Independent evolutions (subs. rate):| 0

CONCLUSION: STRICT VERTICAL TRANSMISSION.

The most likely number of host-switches:

ksi | mu | -log(Likelihood) ------------- | ------------- | ------------- 3 | 0.70 | 149.8

Minus log likelihood as a function of the number of switches (the minimum of this plot corresponds to the estimated number of switches):

Estimated substitution rate as a function of the number of switches:

Likelihood ratio test testing the model of strict vertical transmission (ksi=0). Strict vertical transmission is rejected if p-value < 0.05.

Results of the likelihood ratio test. The grey curve correponds to the Chi2 distribution with df=1. The dark blue line (resp. light) stands for the 0.05 (resp. 0.01) p-value threshold and the dashed orange line is the observed LRT ratio.

Model selection on independent evolutions.

Test | p-values ------------- | ------------- Empirical ranking (ksi distribution)| 0 Empirical ranking (mu distribution)| 0

Representation of the estimated numbers of switches and the estimated substitution rates for the randomized alignments (in blue) and the empirical alignment (in orange). Independent evolutions can be rejected if the orange dot stands alone in the bottom left corner (i.e. rejected if p-values < 0.05).

Estimated rate matrix:

Rates | A |C |G |T ------------- | -------------| -------------| -------------| ------------- A | -1 | 0.15 | 0.7 | 0.15 | C | 0.15 | -1 | 0.15 |0.7 | G | 0.7 | 0.15 | -1 | 0.15 | T | 0.15 | 0.7 | 0.15 | -1 |

Nucleotide frequencies:

A |C |G |T ------------- | -------------| -------------| ------------- 0.25 | 0.25 | 0.25 | 0.25

Example 3: Results from a simulated OTU with environnemental acquisition (independent evolution)

Parameters | Values ------------- | ------------- Number of symbiont-host association:| 20 Simulated substitution rate:| 1 Number of simulated switches:| independent Seed for simulations:| 30 Sequences length: |300 Probability segregated sites:| 0.1 Number of unvaried sites:| 278 Number of strictly variant sites:| 22

Parameters | Values ------------- | ------------- Inferred substitution model:| K80 Transition/Transversion ratio:| 0.54 Estimated substitution rate:| 3.3793 Estimated number of switches:| 25 Associated likelihood:| 270.3 p-value: Strict vertical transmission:| 0 p-value: Independent evolutions (nb switches):| 0.56 p-value: Independent evolutions (subs. rate):| 0.89

CONCLUSION:INDEPENDENT EVOLUTIONS (BASED ON THE NUMBER OF SWITCHES AND THE SUBSTITUTION RATE).

The most likely number of host-switches:

ksi | mu | -log(Likelihood) ------------- | ------------- | ------------- 25 | 3.38 | 270.3

Minus log likelihood as a function of the number of switches (the minimum of this plot corresponds to the estimated number of switches):

Estimated substitution rate as a function of the number of switches:

Likelihood ratio test testing the model of strict vertical transmission (ksi=0). Strict vertical transmission is rejected if p-value < 0.05.

Results of the likelihood ratio test. The grey curve correponds to the Chi2 distribution with df=1. The dark blue line (resp. light) stands for the 0.05 (resp. 0.01) p-value threshold and the dashed orange line is the observed LRT ratio.

Model selection on independent evolutions.

Test | p-values ------------- | ------------- Empirical ranking (ksi distribution)| 0.56 Empirical ranking (mu distribution)| 0.88

Representation of the estimated numbers of switches and the estimated substitution rates for the randomized alignments (in blue) and the empirical alignment (in orange). Independent evolutions can be rejected if the orange dot stands alone in the bottom left corner (i.e. rejected if p-values < 0.05).

Estimated rate matrix:

Rates | A |C |G |T ------------- | -------------| -------------| -------------| ------------- A | -1 | 0.23 | 0.54 | 0.23 | C | 0.23 | -1 | 0.23 |0.54 | G | 0.54 | 0.23 | -1 | 0.23 | T | 0.23 | 0.54 | 0.23 | -1 |

Nucleotide frequencies:

A |C |G |T ------------- | -------------| -------------| ------------- 0.25 | 0.25 | 0.25 | 0.25

BPerezLamarque/HOME documentation built on May 17, 2023, 7:02 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

BPerezLamarque/HOME Phylogenetic Models of HOst-Microbiota Evolution

README.md In BPerezLamarque/HOME: Phylogenetic Models of HOst-Microbiota Evolution

HOME Tutorial - HOst-Microbiota Evolution

Contents:

Installation:

Performing Simulations:

Parameters for simulations

Simulation of a microbiota

Running Empirical applications:

Creating alignments for each OTU

Pipeline 1: OTU clustering using QIIME1 and UPARSE (old)

Pipeline 2: OTU clustering using VSEARCH (SWARM OTUs) (new)

Pipeline 3: OTU clustering using VSEARCH (OTUs at 97%) (new)

Note: one-to-one correspondance

Example of empirical applications - great apes microbiota:

Running HOME on the empirical application

Parameters of the empirical application

HOME inference

Interpretation of the Results:

Example 1: Results from a simulated OTU with horizontal transmission

Description of the simulated data

Summary of the inferred scenario:

Host-switches inference:

Strict vertical transmission model:

Host-symbiont independent evolutions:

Estimated substitution model

Simulated switches:

Example 2: Results from a simulated OTU with strict vertical transmission

Description of the simulated data

Summary of the inferred scenario:

Host-switches inference:

Strict vertical transmission model:

Host-symbiont independent evolutions:

Estimated substitution model

Example 3: Results from a simulated OTU with environnemental acquisition (independent evolution)

Description of the simulated data

Summary of the inferred scenario:

Host-switches inference:

Strict vertical transmission model:

Host-symbiont independent evolutions:

Estimated substitution model

R Package Documentation

Browse R Packages

We want your feedback!

BPerezLamarque/HOME
Phylogenetic Models of HOst-Microbiota Evolution

README.md
In BPerezLamarque/HOME: Phylogenetic Models of HOst-Microbiota Evolution