options(width = 750) knitr::opts_chunk$set( comment = "#>", error = FALSE, tidy = FALSE)
The dN/dS ratio quantifies the mode and strength of selection acting on a pair of orthologous genes. This selection pressure can be quantified by comparing synonymous substitution rates (dS) that are assumed to be neutral with nonsynonymous substitution rates (dN), which are exposed to selection as they change the amino acid composition of a protein (Mugal et al., 2013).
The orthologr
package provides a function named dNdS()
to perform dNdS estimation on pairs of orthologous genes.
The dNdS()
function takes the CDS files of two organisms of interest (query_file
and subject_file
)
and computes the dNdS estimation values for orthologous gene pairs between these organisms.
Note: the following dNdS estimation methods are based on KaKs_Calculator 2:
"NG": Nei, M. and Gojobori, T. (1986)
"LWL": Li, W.H., et al. (1985)
"LPB": Li, W.H. (1993) and Pamilo, P. and Bianchi, N.O. (1993)
"MLWL" (Modified LWL), MLPB (Modified LPB): Tzeng, Y.H., et al. (2004)
"YN": Yang, Z. and Nielsen, R. (2000)
"MYN" (Modified YN): Zhang, Z., et al. (2006)
"GMYN": Wang, D.P., et al. Biology Direct. (2009)
"GY": Goldman, N. and Yang, Z. (1994)
"MS": (Model Selection): based on a set of candidate models, Posada, D. (2003)
"MA" (Model Averaging): based on a set of candidate models, Posada, D. (2003)
"ALL": All models toghether
It is assumed that when you choose one of these dNdS estimation methods you have
KaKs_Calculator 2 installed on your machine and it can be executed from the
default execution PATH
.
The following pipeline resembles an example dNdS estimation procedure:
1) Orthology Inference: e.g. BLAST reciprocal best hit (RBH)
2) Pairwise sequence alignment: e.g. clustalw for pairwise amino acid sequence alignments
3) Codon Alignment: e.g. pal2nal program
4) dNdS estimation: e.g. Yang, Z. and Nielsen, R. (2000) (YN)
Note: it is assumed that when using dNdS()
all corresponding multiple sequence alignment programs you
want to use are already installed on your machine and are executable via either
the default execution PATH or you specifically define the location of the executable program
via the aa_aln_path
or blast_path
argument that can be passed to dNdS()
. See the Sequence Alignments vignette
for details.
The following example shall illustrate a dNdS estimation process.
library(orthologr) # get a dNdS table using: # 1) reciprocal best hit for orthology inference (RBH) # 2) Needleman-Wunsch for pairwise amino acid alignments # 3) pal2nal for codon alignments # 4) Comeron for dNdS estimation # 5) single core processing 'comp_cores = 1' dNdS(query_file = system.file('seqs/ortho_thal_cds.fasta', package = 'orthologr'), subject_file = system.file('seqs/ortho_lyra_cds.fasta', package = 'orthologr'), ortho_detection = "RBH", aa_aln_type = "pairwise", aa_aln_tool = "NW", codon_aln_tool = "pal2nal", dnds_est.method = "Comeron", comp_cores = 1)
A tibble: 20 x 15 query_id subject_id dN dS dNdS perc_identity alig_length <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> 1 AT1G010... 333554|PA... 0.106 0.254 0.420 74.0 469 2 AT1G010... 470181|PA... 0.0402 0.104 0.388 91.1 246 3 AT1G010... 470180|PA... 0.0150 0.126 0.118 95.5 359 4 AT1G010... 333551|PA... 0.0135 0.116 0.116 92.0 1970 5 AT1G010... 909874|PA... 0 0.175 0 100 213 6 AT1G010... 470177|PA... 0.0449 0.113 0.397 89.5 648 7 AT1G010... 918864|PA... 0.0183 0.106 0.173 95.1 366 8 AT1G010... 909871|PA... 0.0340 0.106 0.322 90.3 300 9 AT1G010... 470171|PA... 0.00910 0.218 0.0417 96.8 434 10 AT1G011... 333544|PA... 0.0325 0.122 0.266 93.6 528 11 AT1G011... 918858|PA... 0.00307 0.133 0.0232 99.2 529 12 AT1G011... 470161|PA... 0.00567 0.131 0.0432 98.5 453 13 AT1G011... 918855|PA... 0.13 0.203 0.641 72.6 285 14 AT1G011... 918854|PA... 0.105 0.280 0.373 84.9 179 15 AT1G011... 311317|PA... 0 0.306 0 85.6 97 16 AT1G011... 909860|PA... 0.0297 0.176 0.168 92.6 310 17 AT1G011... 311315|PA... 0.0287 0.162 0.177 94.2 533 18 AT1G012... 470156|PA... 0.0190 0.168 0.114 95.8 238 19 AT1G012... 311313|PA... 0.0207 0.154 0.134 95.3 107 20 AT1G012... 470155|PA... 0.0157 0.153 0.102 96.7 1056 ... with 8 more variables: mismatches <dbl>, gap_openings <dbl>, q_start <dbl>, q_end <dbl>, s_start <dbl>, s_end <dbl>, evalue <dbl>, bit_score <dbl>
Some outputs include NA
values. To filter for NA
values or a specific dnds.threshold
,
you can use the filter_dNdS()
function. The filter_dNdS()
function takes the output data.table
returned by dNdS()
and filters the output by the following criteria:
1) all dN values having an NA value are omitted
2) all dS values having an NA value are omitted
3) all dNdS values >= the specified dnds.threshold
are omitted
library(orthologr) # get dNdS estimated for orthologous genes between A. thaliana and A. lyrata Ath_Aly_dnds <- dNdS(query_file = system.file('seqs/ortho_thal_cds.fasta', package = 'orthologr'), subject_file = system.file('seqs/ortho_lyra_cds.fasta', package = 'orthologr'), ortho_detection = "RBH", aa_aln_type = "pairwise", aa_aln_tool = "NW", codon_aln_tool = "pal2nal", dnds_est.method = "Comeron", comp_cores = 1) # filter for: # 1) all dN values having an NA value are omitted # 2) all dS values having an NA value are omitted # 3) all dNdS values >= 2 are omitted filter_dNdS(Ath_Aly_dnds, dnds.threshold = 2)
Filtering out NA values in dN or dS and all values with dNdS > 2 ... Initial input contains 20 rows. Filtering done. New output table contains 20 rows. A tibble: 20 x 15 query_id subject_id dN dS dNdS perc_identity alig_length <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> 1 AT1G010... 333554|PA... 0.106 0.254 0.420 74.0 469 2 AT1G010... 470181|PA... 0.0402 0.104 0.388 91.1 246 3 AT1G010... 470180|PA... 0.0150 0.126 0.118 95.5 359 4 AT1G010... 333551|PA... 0.0135 0.116 0.116 92.0 1970 5 AT1G010... 909874|PA... 0 0.175 0 100 213 6 AT1G010... 470177|PA... 0.0449 0.113 0.397 89.5 648 7 AT1G010... 918864|PA... 0.0183 0.106 0.173 95.1 366 8 AT1G010... 909871|PA... 0.0340 0.106 0.322 90.3 300 9 AT1G010... 470171|PA... 0.00910 0.218 0.0417 96.8 434 10 AT1G011... 333544|PA... 0.0325 0.122 0.266 93.6 528 11 AT1G011... 918858|PA... 0.00307 0.133 0.0232 99.2 529 12 AT1G011... 470161|PA... 0.00567 0.131 0.0432 98.5 453 13 AT1G011... 918855|PA... 0.13 0.203 0.641 72.6 285 14 AT1G011... 918854|PA... 0.105 0.280 0.373 84.9 179 15 AT1G011... 311317|PA... 0 0.306 0 85.6 97 16 AT1G011... 909860|PA... 0.0297 0.176 0.168 92.6 310 17 AT1G011... 311315|PA... 0.0287 0.162 0.177 94.2 533 18 AT1G012... 470156|PA... 0.0190 0.168 0.114 95.8 238 19 AT1G012... 311313|PA... 0.0207 0.154 0.134 95.3 107 20 AT1G012... 470155|PA... 0.0157 0.153 0.102 96.7 1056 ... with 8 more variables: mismatches <dbl>, gap_openings <dbl>, q_start <dbl>, q_end <dbl>, s_start <dbl>, s_end <dbl>, evalue <dbl>, bit_score <dbl>
The dNdS()
function can be used choosing the following options:
ortho_detection
: RBH
(BLAST best reciprocal hit), BH
(BLAST best reciprocal hit), PO
(ProteinOrtho), and OrthoMCL
(OrthoMCL)aa_aln_type
: multiple
or pairwise
aa_aln_tool
: clustalw
, t_coffee
, muscle
, clustalo
, mafft
, and NW
(in case aa_aln_type = "pairwise"
)codon_aln_tool
: pal2nal
dnds_est.method
: Li
, Comeron
, NG
, LWL
, LPB
, MLWL
, YN
, and MYN
Please see ?dNdS
for details.
In case your BLAST program, or multiple alignment program can not be executed from the default execution PATH
you can specify the aa_aln_path
or blast_path
arguments.
library(orthologr) # using the `aa_aln_path` or `blast_path` arguments dNdS( query_file = system.file('seqs/ortho_thal_cds.fasta', package = 'orthologr'), subject_file = system.file('seqs/ortho_lyra_cds.fasta', package = 'orthologr'), ortho_detection = "RBH", blast_path = "here/path/to/blastp", aa_aln_type = "multiple", aa_aln_tool = "clustalw", aa_aln_path = "here/path/to/clustalw", codon_aln_tool = "pal2nal", dnds_est.method = "Comeron", comp_cores = 1, clean_folders = TRUE)
Additional arguments can be passed to dNdS()
. This allows you to use more advanced
options of several interface programs.
To pass additional parameters to the interface programs, you can use the
blast_params
and aa_aln_params
arguments. The aa_aln_params
argument assumes
that when you chose e.g. aa_aln_tool = "mafft"
you will pass the corresponding
additional parameters in MAFFT notation.
library(orthologr) # get dNdS estimated for orthologous genes between A. thaliana and A. lyrata # using additional parameters: # get a dNdS table using: # 1) reciprocal best hit for orthology inference (RBH) # 2) multiple amino acid alignments using MAFFT # 3) pal2nal for codon alignments # 4) Comeron (1995) for dNdS estimation # 5) single core processing 'comp_cores = 1' Ath_Aly_dnds <- dNdS( query_file = system.file('seqs/ortho_thal_cds.fasta', package = 'orthologr'), subject_file = system.file('seqs/ortho_lyra_cds.fasta', package = 'orthologr'), ortho_detection = "RBH", blast_params = "-matrix BLOSUM80", aa_aln_type = "multiple", aa_aln_tool = "mafft", aa_aln_params = "--maxiterate 1 --clustalout", dnds_est.method = "Comeron", comp_cores = 1, clean_folders = TRUE, quiet = TRUE ) # filter for: # 1) all dN values having an NA value are omitted # 2) all dS values having an NA value are omitted # 3) all dNdS values >= 0.1 are omitted filter_dNdS(Ath_Aly_dnds, dnds.threshold = 0.1)
query_id subject_id dN dS dNdS 1 AT1G01050.1 909874|PACid:16 0.000000 0.1750 0.00000 2 AT1G01090.1 470171|PACid:16 0.009843 0.2150 0.04579 3 AT1G01120.1 918858|PACid:16 0.003072 0.1326 0.02317 4 AT1G01140.3 470161|PACid:16 0.005672 0.1312 0.04324 5 AT1G01170.2 311317|PACid:16 0.008750 0.2827 0.03095 6 AT1G01220.1 470155|PACid:16 0.015210 0.1533 0.09919
Here blast_params
and aa_aln_params
take an character string specifying the parameters
that shall be passed to BLAST and MAFFT. The notation of these parameters must follow the
command line call of the stand alone versions of BLAST and MAFFT: e.g. blast_params = "blast_params = -matrix BLOSUM80"
and aa_aln_params = "--maxiterate 1 --clustalout"
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.