read_biotyper_report: Importing Bruker MALDI Biotyper CSV report
In maldipickr: Dereplicate and Cherry-Pick Mass Spectrometry Spectra

read_biotyper_report

R Documentation

Importing Bruker MALDI Biotyper CSV report

Description

The header-less table exported by the Compass software of the Bruker MALDI Biotyper device is separated by semicolons and contains empty columns, which prevent a straightforward import in R. This function reads the report correctly into a tibble.
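To illustrate the import problem, here is a minimal base-R sketch. It is not the implementation used by read_biotyper_report. The two-line report is a hypothetical toy: its field order and the position of its empty columns are invented, and the column names are borrowed from Details purely for illustration. Empty fields are read as NA, and columns that are empty in every row are then dropped.

```r
# Hypothetical toy report, NOT the real Compass layout: header-less,
# semicolon-separated, with two columns (2nd and 4th) that are always empty.
raw <- c(
  "A1;;sample-1;;1;+++;Escherichia coli;562;abc123;2.31",
  "A2;;sample-2;;1;-;not reliable identification;;;1.20"
)
# Read every field as character and turn empty fields into NA
tbl <- utils::read.table(
  text = raw, sep = ";", header = FALSE, quote = "",
  colClasses = "character", na.strings = "", comment.char = ""
)
# Drop the columns that are empty (NA) in every row
keep <- vapply(tbl, function(col) !all(is.na(col)), logical(1))
tbl <- tbl[, keep, drop = FALSE]
# Illustrative names borrowed from the Details section
names(tbl) <- c("name", "sample_name", "hit_rank", "bruker_quality",
                "bruker_species", "bruker_taxid", "bruker_hash", "bruker_log")
tbl$hit_rank <- as.integer(tbl$hit_rank)
tbl$bruker_log <- as.numeric(tbl$bruker_log)
str(tbl)
```

Columns that are only partly empty (here the taxid and hash of the unidentified spot) are kept, with NA in the missing cells.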

Usage

read_biotyper_report(path, best_hits = TRUE, long_format = TRUE)

Arguments

`path`	Path to the semicolon-separated table
`best_hits`	A logical indicating whether to return only the best hits for each target analyzed
`long_format`	A logical indicating whether to return the table in the long format (many rows) or in the wide format (many columns) when all hits are returned. This option has no effect when `best_hits = TRUE`.

Details

The header-less table contains identification information for each target processed by the Biotyper device. Once imported by read_biotyper_report with the best_hits = TRUE option, the tibble contains the following eight columns:

name: a character indicating the name of the spot of the MALDI target (i.e., plate)
sample_name: the character string provided during the preparation of the MALDI target (i.e., plate)
hit_rank: an integer indicating the rank of the hit for the corresponding target and identification
bruker_quality: a character encoding the quality of the identification, with potentially multiple "+" symbols or a single "-"
bruker_species: the species name associated with the analyzed MALDI spectrum
bruker_taxid: the NCBI Taxonomy Identifier of the species name in the column bruker_species
bruker_hash: a hash produced by an undocumented checksum function, probably encoding the database entry
bruker_log: the log-score of the identification

When all hits are returned (best_hits = FALSE), the default output is the long format (long_format = TRUE): the columns above remain unchanged, but every hit now gets its own row, which increases the number of rows.
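As a sketch of how the long format relates to best_hits = TRUE, the hypothetical table below mimics a long-format result (only a subset of the columns, with invented values) and keeps the hit of rank 1 for each spot. Treating rank 1 as the best hit is an assumption made for illustration, not a description of the package internals.

```r
# Hypothetical long-format result: two hits for spot A1, one hit for spot A2
long_report <- data.frame(
  name = c("A1", "A1", "A2"),
  sample_name = c("sample-1", "sample-1", "sample-2"),
  hit_rank = c(1L, 2L, 1L),
  bruker_species = c("Species A", "Species B", "Species C"),
  bruker_log = c(2.31, 2.05, 1.20),
  stringsAsFactors = FALSE
)
# Keep only the top-ranked hit of each spot (assumes rank 1 = best hit)
best <- long_report[long_report$hit_rank == 1L, ]
# Count how many hits each spot contributes to the long format
hits_per_spot <- table(long_report$name)
best
hits_per_spot
```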

When all hits are returned (best_hits = FALSE) in the wide format (long_format = FALSE), the two columns name and sample_name remain unchanged, but the hit rank is moved into the names of the five columns prefixed by bruker_, creating a tibble of 52 columns (2 + 5 columns for each of the 10 hits):

bruker_01_quality
bruker_01_species
bruker_01_taxid
bruker_01_hash
bruker_01_log
bruker_02_quality
...
bruker_10_species
bruker_10_taxid
bruker_10_hash
bruker_10_log
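Assuming at most 10 hits per spot, as the listing above suggests, the full set of 52 wide-format column names can be reconstructed in base R. This only illustrates the naming scheme and does not call maldipickr.

```r
fields <- c("quality", "species", "taxid", "hash", "log")
ranks <- sprintf("%02d", 1:10)  # zero-padded ranks "01" ... "10"
# Each rank is repeated once per field; fields are recycled across ranks
wide_cols <- c(
  "name", "sample_name",
  paste0("bruker_", rep(ranks, each = length(fields)), "_", fields)
)
length(wide_cols)  # 52
head(wide_cols, 7)
```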

Value

A tibble of 8 columns (best_hits = TRUE, or best_hits = FALSE with long_format = TRUE) or 52 columns (best_hits = FALSE with long_format = FALSE). See Details for the description of the columns.

Note

A report that contains only spectra with no peaks found will return a tibble of 0 rows and a warning message.

Examples

# Get an example Bruker report
biotyper <- system.file("biotyper.csv", package = "maldipickr")
# Import the report as a tibble
report_tibble <- read_biotyper_report(biotyper)
# Display the tibble
report_tibble
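The other combinations of options can be tried on the same example file. This sketch uses only the arguments documented in Usage and is guarded with requireNamespace() so that it is skipped when maldipickr is not installed; the resulting dimensions depend on the example file and are not shown here.

```r
if (requireNamespace("maldipickr", quietly = TRUE)) {
  biotyper <- system.file("biotyper.csv", package = "maldipickr")
  # All hits, one row per hit (long format, the default)
  all_hits_long <- maldipickr::read_biotyper_report(biotyper, best_hits = FALSE)
  # All hits, one row per spot, hit rank encoded in the column names
  all_hits_wide <- maldipickr::read_biotyper_report(
    biotyper, best_hits = FALSE, long_format = FALSE
  )
  print(dim(all_hits_long))
  print(dim(all_hits_wide))
}
```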

maldipickr documentation built on Sept. 13, 2024, 1:12 a.m.