s_index: S index (Strength of Selected Codon Usage)


View source: R/s_index.R

Description

The function s_index calculates the S index (strength of selected codon usage bias) for bacterial species based on Paul Sharp's method. The method takes the background mutation rate into account and focuses only on codons with universal translational advantages in all bacterial species. The S index can therefore be used to quantify the strength of translational selection and is comparable among different species.

Usage

s_index(high_cds_file = NULL, genomic_cds_file = NULL, gc3 = NULL)

Arguments

high_cds_file

a character vector giving the file path to the multifasta file of highly expressed genes

genomic_cds_file

a character vector giving the file path to the multifasta file of all coding sequences (CDS) in the genome

gc3

a numeric vector giving the genomic gc3 value, e.g., 0.5

Details

The function calculates the S index (strength of selected codon usage bias) for bacterial species based on Paul Sharp's method. The method takes the background mutation rate into account (in the program, the two arguments genomic_cds_file and gc3 are used to calculate the mutation rate) and focuses only on codons with universal translational advantages in all bacterial species (in the program, the argument high_cds_file is used to count these codons). The S index can therefore be used to quantify the strength of translational selection and is comparable among different species.

The argument high_cds_file must be specified with the input file path for the highly expressed genes. The file should be a multifasta file containing 40 highly expressed genes, including elongation factors Tu, Ts and G, 50S ribosomal proteins L1 to L6 and L9 to L20, and 30S ribosomal proteins S2 to S20. This file can be generated either by extracting these DNA sequences directly from a GenBank file or by parsing the output of a BLAST search. For four amino acids (Phe, Tyr, Ile and Asn), the C-ending codons are always preferred over the U-ending codons. Thus, only the codons for these four amino acids are taken into account in the analysis.
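To make the codon comparison above concrete, the following is a minimal base R sketch that counts the C-ending and U-ending codons for Phe, Tyr, Ile and Asn in a single in-frame coding sequence. The helper name count_cu_codons and the toy sequence are hypothetical and are not part of the sscu package; the sketch only illustrates the counting step and does not reproduce the S index calculation itself.

```r
# Hypothetical helper (not part of sscu): count C- and U-ending codons
# for Phe, Tyr, Ile and Asn in one in-frame coding sequence.
count_cu_codons <- function(dna) {
  # Work on DNA letters, so U-ending codons appear as T-ending codons
  dna <- chartr("U", "T", toupper(dna))

  # Split the sequence into consecutive codons, dropping any trailing bases
  n_codons <- nchar(dna) %/% 3
  starts <- seq(1, by = 3, length.out = n_codons)
  codons <- substring(dna, starts, starts + 2)

  # C-ending and U-ending codon for each of the four amino acids
  pairs <- list(
    Phe = c(C = "TTC", U = "TTT"),
    Tyr = c(C = "TAC", U = "TAT"),
    Ile = c(C = "ATC", U = "ATT"),
    Asn = c(C = "AAC", U = "AAT")
  )

  # One row per amino acid, one column per codon class
  t(sapply(pairs, function(p) {
    c(C_ending = sum(codons == p[["C"]]),
      U_ending = sum(codons == p[["U"]]))
  }))
}

# Toy sequence built codon by codon so the expected counts are easy to check
toy <- paste0(c("ATG", "TTC", "TTC", "TTT", "TAC", "ATC",
                "ATC", "ATT", "AAC", "AAT", "TAA"), collapse = "")
counts <- count_cu_codons(toy)
print(counts)
```

In a real analysis the counts would be accumulated over all sequences in the highly expressed gene file rather than over one toy sequence.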

The two arguments genomic_cds_file and gc3 are used to calculate the genomic mutation rate, and one of them must be specified. The genomic_cds_file should be a multifasta file containing all the coding sequences in the genome; the function uses it to calculate the genomic gc3 (GC content at third codon positions) and the mutation rate. If the gc3 value for the genome is already known, you can specify it directly in the argument gc3. If both genomic_cds_file and gc3 are specified, the function uses genomic_cds_file to calculate the mutation rate and ignores the gc3 argument.
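As an illustration of what a gc3 value represents, here is a minimal base R sketch that computes the GC content at third codon positions for a set of in-frame coding sequences. The helper name gc3_of_cds and the toy sequences are hypothetical; how sscu itself handles start codons, stop codons or ambiguous bases is not stated here, so this sketch simply counts every third position that holds A, C, G or T.

```r
# Hypothetical helper (not part of sscu): GC content at third codon positions
gc3_of_cds <- function(cds) {
  cds <- toupper(cds)

  # Collect the third base of every complete codon in every sequence
  third <- unlist(lapply(cds, function(s) {
    n_codons <- nchar(s) %/% 3
    pos <- seq(3, by = 3, length.out = n_codons)
    substring(s, pos, pos)
  }))

  # Fraction of unambiguous third positions that are G or C
  sum(third %in% c("G", "C")) / sum(third %in% c("A", "C", "G", "T"))
}

# Two toy coding sequences; third positions are G, C, A, A and G, C, G, A
toy_cds <- c("ATGGCCAAATAA", "ATGTTCGGGTGA")
gc3_value <- gc3_of_cds(toy_cds)
print(gc3_value)
```

A value obtained this way could be passed as the gc3 argument when the genome-wide CDS file is not at hand, with the caveat that it may differ slightly from the value the package would compute internally.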

Value

A numeric vector containing the S index is returned.

Author(s)

Yu Sun

References

Sharp PM, Bailes E, Grocock RJ, Peden JF, Sockett RE (2005). Variation in the strength of selected codon usage bias among bacteria. Nucleic Acids Research.

See Also

uco in the seqinr package for RSCU calculation

Examples

# ----------------------------------------------- #
#     Lactobacillus kunkeei example               #
# ----------------------------------------------- #

  # Load the example data included in the sscu package and
  # pass the two multifasta files to calculate the S index
  s_index(
    high_cds_file = system.file("sequences/L_kunkeei_highly.ffn", package = "sscu"),
    genomic_cds_file = system.file("sequences/L_kunkeei_genome_cds.ffn", package = "sscu")
  )

  # Alternatively, pass one multifasta file and the gc3 content to calculate the S index
  s_index(
    high_cds_file = system.file("sequences/L_kunkeei_highly.ffn", package = "sscu"),
    gc3 = 0.76
  )

  # To use your own data, specify the file paths of your input files as in these examples:
  # s_index(high_cds_file="/home/yu/Data/codon_usage/bee_endosymbionts/sharp_40_highly_dataset/Bin2.ffn",genomic_cds_file="/home/yu/Data/codon_usage/bee_endosymbionts/cds_filtered/Bin2.ffn")
  # s_index(high_cds_file="/home/yu/Data/codon_usage/bee_endosymbionts/sharp_40_highly_dataset/Bin2.ffn",gc3=0.76)

sscu documentation built on Nov. 8, 2020, 5:48 p.m.