read.vhica: Reads divergence and codon usage data files for the VHICA...

read.vhica    R Documentation

Reads divergence and codon usage data files for the VHICA method.

Description

The VHICA method relies on two sources of information: (i) the divergence between sequences, and (ii) the codon usage bias. This function reads two data files and creates an object of class vhica that can be further explored with plot.vhica and image.vhica. Input can be either (1) two vectors of FASTA file names (one for the reference genes, one for the putatively transferred genes), or (2) previously processed files containing codon usage bias and divergence data (see Details).

Usage

read.vhica(gene.fasta=NULL, target.fasta=NULL, 
	cb.filename=NULL, div.filename=NULL, 
	reference = "Gene", divergence = "dS", 
	CUB.method="ENC", div.method="LWL85", div.pairwise=TRUE, 
	div.max.lim=3, species.sep="_", gene.sep=".", family.sep=".", ...)

Arguments

gene.fasta

Sequence files (FASTA format) containing the aligned sequences (respecting the translation phase) for all species of the reference genes.

target.fasta

Sequence files (FASTA format) containing the aligned sequences of the putatively transferred genes.

cb.filename

File name for the codon usage bias data. If FASTA files are provided, this file will be created.

div.filename

File name for the divergence data. If FASTA files are provided, this file will be created.

reference

Name of the reference type in the codon usage file. Default is "Gene".

divergence

Name of the divergence column in the divergence file. Default is "dS".

CUB.method

Method to be used for Codon Usage Bias calculation (see CUB).

div.method

Method to be used for divergence calculation (see div).

div.pairwise

Whether divergence should be calculated from the whole alignment or between pairs of sequences (see div).

div.max.lim

Maximum divergence score. Divergence estimates much larger than 100% are likely to be problematic and should not be considered.

species.sep

Separator for species (or equivalent) labels in sequence names. Any character string following this separator will be disregarded – be careful about potential duplicates.

gene.sep

Separator for gene names from gene sequence files.

family.sep

Separator for target sequence sub-families.

...

Further parameters for the internal function .reference.regression.
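
As a rough illustration of the species.sep behavior described above, the following base R sketch keeps only the text before the separator and checks whether any labels collide. The sequence names and the helper species_label are hypothetical, and read.vhica may parse names differently internally.

```r
# Hypothetical sequence names: species label, separator, extra annotation
seq.names <- c("dmel_scaffold12", "dsim_contigA", "dana_x", "dmel_scaffold40")

# Keep only the text before the first occurrence of the separator
# (illustrative helper, not part of the vhica package)
species_label <- function(x, sep = "_") {
  vapply(strsplit(x, sep, fixed = TRUE), function(p) p[1], character(1),
         USE.NAMES = FALSE)
}

labels <- species_label(seq.names)

# Labels that occur more than once after truncation:
# the kind of duplicate the species.sep description warns about
dup <- unique(labels[duplicated(labels)])
```

Running such a check on the sequence names of each FASTA file before calling read.vhica is one way to spot labels that would become indistinguishable.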

Details

Details about CUB and divergence calculations can be found in CUB and div. If CUB and/or divergence need to be calculated by an external program, it is possible to provide them in the following format:

  • Codon usage bias. Example of data file:

            Type    sp1     sp2     sp3
    CG4231  Gene    42.3    51.1    47.2
    CG2214  Gene    47.2    44.9    53.2
    Pelem1  TE      36.2    47.0    44.4
    ...
    • Row names (or first column): sequence index

    • Type: whether the sequence is a reference (default: Gene) or a focal sequence (transposable element, ...)

    • Following columns: a measurement of codon bias (ENC, CBI...) for every species

  • Divergence. Example of data file:

    seq     dS      sp1     sp2
    CG4231  0.84    Dmel    Dsim
    CG4231  0.46    Dmel    Dana
    CG4231  0.58    Dsim    Dana
    CG2214  0.10    Dmel    Dsim
    ...
    • First column (or row names): sequence index

    • Second column: divergence measurement

    • Columns 3 and 4: the pair of species on which the divergence is calculated

    • Row names and column names are allowed but disregarded
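
The two layouts above can be produced from R with base functions. The sketch below is only an illustration: it copies the values from the examples above, writes tab-separated files (an assumption, since the delimiter is not specified here), and reads them back to check the layout. In a real data set the species labels would presumably need to agree between the two files.

```r
# Codon usage bias values copied from the example above
cbias <- data.frame(
  Type = c("Gene", "Gene", "TE"),
  sp1  = c(42.3, 47.2, 36.2),
  sp2  = c(51.1, 44.9, 47.0),
  sp3  = c(47.2, 53.2, 44.4),
  row.names = c("CG4231", "CG2214", "Pelem1")
)

# Divergence values copied from the example above:
# one row per sequence and species pair
divergence <- data.frame(
  seq = c("CG4231", "CG4231", "CG4231", "CG2214"),
  dS  = c(0.84, 0.46, 0.58, 0.10),
  sp1 = c("Dmel", "Dmel", "Dsim", "Dmel"),
  sp2 = c("Dsim", "Dana", "Dana", "Dsim")
)

cb.file  <- tempfile(fileext = ".txt")
div.file <- tempfile(fileext = ".txt")

# Row names are written as the sequence index of the codon usage file
write.table(cbias, cb.file, sep = "\t", quote = FALSE)
# The divergence file carries the sequence index in its first column
write.table(divergence, div.file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read both files back to verify the layout
cb.check  <- read.table(cb.file, header = TRUE)
div.check <- read.table(div.file, header = TRUE, stringsAsFactors = FALSE)
```

Files written this way follow the layouts described for cb.filename and div.filename.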

Value

The function returns an object of class vhica, a list containing:

  • cbias: A codon bias array

  • div: The divergence matrix

  • reg: The result of all pairwise regressions

  • reference: The reference option

  • target: The sequence type that is not the reference

  • divergence: The divergence option

  • family.sep: The character used to indicate TE sub-families

Author(s)

Implementation: Arnaud Le Rouzic
Scientists who designed the method: Gabriel Wallau, Aurelie Hua-Van, Arnaud Le Rouzic.

References

Gabriel Luz Wallau, Arnaud Le Rouzic, Pierre Capy, Elgion Loreto, Aurelie Hua-Van. VHICA: A new method to discriminate between vertical and horizontal transposon transfer: application to the mariner family within Drosophila. Molecular Biology and Evolution 33 (4), 1094-1109.

See Also

plot.vhica, image.vhica, CUB, div

Examples

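# Locate the example data files shipped with the package
file.cb <- system.file("extdata", "mini-cbias.txt", package="vhica")
file.div <- system.file("extdata", "mini-div.txt", package="vhica")
# The tree file is only used if the ape package is available
file.tree <- if(require("ape")) system.file("extdata", "phylo.nwk", package="vhica") else NULL
# Build the vhica object from the processed codon bias and divergence files
vc <- read.vhica(cb.filename=file.cb, div.filename=file.div)
# Explore the result with the plot and image methods
plot(vc, "dere", "dana")
image(vc, "mellifera:6", treefile=file.tree, skip.void=TRUE)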

vhica documentation built on March 31, 2023, 10:09 p.m.