mo_matching_score: Calculate the Matching Score for Microorganisms
In AMR: Antimicrobial Resistance Data Analysis

mo_matching_score

R Documentation

Calculate the Matching Score for Microorganisms

Description

This algorithm is used by as.mo() and all the mo_* functions to determine the most probable match of taxonomic records based on user input.

Usage

mo_matching_score(x, n)

Arguments

`x`	Any user input value(s).
`n`	A full taxonomic name, that exists in `microorganisms$fullname`.

Matching Score for Microorganisms

With ambiguous user input in as.mo() and all the mo_* functions, the returned results are chosen based on their matching score using mo_matching_score(). This matching score m, is calculated as:

m_{(x, n)} = \frac{l_{n} - 0.5 \cdot \min \begin{cases}l_{n} \\ \textrm{lev}(x, n)\end{cases}}{l_{n} \cdot p_{n} \cdot k_{n}}

where:

x is the user input;
n is a taxonomic name (genus, species, and subspecies);
l_n is the length of n;
lev is the Levenshtein distance function (counting any insertion as 1, and any deletion or substitution as 2) that is needed to change x into n;
p_n is the human pathogenic prevalence group of n, as described below;
k_n is the taxonomic kingdom of n, set as Bacteria = 1, Fungi = 1.25, Protozoa = 1.5, Chromista = 1.75, Archaea = 2, others = 3.

The grouping into human pathogenic prevalence p is based on recent work from Bartlett et al. (2022, \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1099/mic.0.001269")}) who extensively studied medical-scientific literature to categorise all bacterial species into these groups:

Established, if a taxonomic species has infected at least three persons in three or more references. These records have prevalence = 1.15 in the microorganisms data set;
Putative, if a taxonomic species has fewer than three known cases. These records have prevalence = 1.25 in the microorganisms data set.

Furthermore,

Genera from the World Health Organization's (WHO) Priority Pathogen List have prevalence = 1.0 in the microorganisms data set;
Any genus present in the established list also has prevalence = 1.15 in the microorganisms data set;
Any other genus present in the putative list has prevalence = 1.25 in the microorganisms data set;
Any other species or subspecies of which the genus is present in the two aforementioned groups, has prevalence = 1.5 in the microorganisms data set;
Any non-bacterial genus, species or subspecies of which the genus is present in the following list, has prevalence = 1.25 in the microorganisms data set: Absidia, Acanthamoeba, Acremonium, Actinomucor, Aedes, Alternaria, Amoeba, Ancylostoma, Angiostrongylus, Anisakis, Anopheles, Apophysomyces, Arthroderma, Aspergillus, Aureobasidium, Basidiobolus, Beauveria, Bipolaris, Blastobotrys, Blastocystis, Blastomyces, Candida, Capillaria, Chaetomium, Chilomastix, Chrysonilia, Chrysosporium, Cladophialophora, Cladosporium, Clavispora, Coccidioides, Cokeromyces, Conidiobolus, Coniochaeta, Contracaecum, Cordylobia, Cryptococcus, Cryptosporidium, Cunninghamella, Curvularia, Cyberlindnera, Debaryozyma, Demodex, Dermatobia, Dientamoeba, Diphyllobothrium, Dirofilaria, Echinostoma, Entamoeba, Enterobius, Epidermophyton, Exidia, Exophiala, Exserohilum, Fasciola, Fonsecaea, Fusarium, Geotrichum, Giardia, Graphium, Haloarcula, Halobacterium, Halococcus, Hansenula, Hendersonula, Heterophyes, Histomonas, Histoplasma, Hortaea, Hymenolepis, Hypomyces, Hysterothylacium, Kloeckera, Kluyveromyces, Kodamaea, Lacazia, Leishmania, Lichtheimia, Lodderomyces, Lomentospora, Madurella, Malassezia, Malbranchea, Metagonimus, Meyerozyma, Microsporidium, Microsporum, Millerozyma, Mortierella, Mucor, Mycocentrospora, Nannizzia, Necator, Nectria, Ochroconis, Oesophagostomum, Oidiodendron, Opisthorchis, Paecilomyces, Paracoccidioides, Pediculus, Penicillium, Phaeoacremonium, Phaeomoniella, Phialophora, Phlebotomus, Phoma, Pichia, Piedraia, Pithomyces, Pityrosporum, Pneumocystis, Pseudallescheria, Pseudoscopulariopsis, Pseudoterranova, Pulex, Purpureocillium, Quambalaria, Rhinocladiella, Rhizomucor, Rhizopus, Rhodotorula, Saccharomyces, Saksenaea, Saprochaete, Sarcoptes, Scedosporium, Schistosoma, Schizosaccharomyces, Scolecobasidium, Scopulariopsis, Scytalidium, Spirometra, Sporobolomyces, Sporopachydermia, Sporothrix, Sporotrichum, Stachybotrys, Strongyloides, Syncephalastrum, Syngamus, Taenia, Talaromyces, Teleomorph, Toxocara, Trichinella, Trichobilharzia, Trichoderma, Trichomonas, Trichophyton, Trichosporon, Trichostrongylus, Trichuris, Tritirachium, Trombicula, Trypanosoma, Tunga, Ulocladium, Ustilago, Verticillium, Wallemia, Wangiella, Wickerhamomyces, Wuchereria, Yarrowia, or Zygosaccharomyces;
All other records have prevalence = 2.0 in the microorganisms data set.

When calculating the matching score, all characters in x and n are ignored that are other than A-Z, a-z, 0-9, spaces and parentheses.

All matches are sorted descending on their matching score and for all user input values, the top match will be returned. This will lead to the effect that e.g., "E. coli" will return the microbial ID of Escherichia coli (m = 0.688, a highly prevalent microorganism found in humans) and not Entamoeba coli (m = 0.381, a less prevalent microorganism in humans), although the latter would alphabetically come first.

Download Our Reference Data

All reference data sets in the AMR package - including information on microorganisms, antimicrobials, and clinical breakpoints - are freely available for download in multiple formats: R, MS Excel, Apache Feather, Apache Parquet, SPSS, and Stata.

For maximum compatibility, we also provide machine-readable, tab-separated plain text files suitable for use in any software, including laboratory information systems.

Visit our website for direct download links, or explore the actual files in our GitHub repository.

Note

This algorithm was originally developed in 2018 and subsequently described in: Berends MS et al. (2022). AMR: An R Package for Working with Antimicrobial Resistance Data. Journal of Statistical Software, 104(3), 1-31; \Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v104.i03")}.

Later, the work of Bartlett A et al. about bacterial pathogens infecting humans (2022, \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1099/mic.0.001269")}) was incorporated, and optimalisations to the algorithm were made.

Examples

mo_reset_session()

as.mo("E. coli")
mo_uncertainties()

mo_matching_score(
  x = "E. coli",
  n = c("Escherichia coli", "Entamoeba coli")
)

AMR documentation built on June 8, 2025, 10:17 a.m.

AMR index

README.md AMR for Python Welcome to the `AMR` package

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

AMR
Antimicrobial Resistance Data Analysis

mo_matching_score: Calculate the Matching Score for Microorganisms
In AMR: Antimicrobial Resistance Data Analysis

Calculate the Matching Score for Microorganisms

Description

Usage

Arguments

Matching Score for Microorganisms

Download Our Reference Data

Note

Examples

Related to mo_matching_score in AMR...

R Package Documentation

Browse R Packages

We want your feedback!

AMR Antimicrobial Resistance Data Analysis

mo_matching_score: Calculate the Matching Score for Microorganisms In AMR: Antimicrobial Resistance Data Analysis

Calculate the Matching Score for Microorganisms

Description

Usage

Arguments

Matching Score for Microorganisms

Download Our Reference Data

Note

Examples

Related to mo_matching_score in AMR...

R Package Documentation

Browse R Packages

We want your feedback!

AMR
Antimicrobial Resistance Data Analysis

mo_matching_score: Calculate the Matching Score for Microorganisms
In AMR: Antimicrobial Resistance Data Analysis