read_biotyper_report: Importing Bruker MALDI Biotyper CSV report

read_biotyper_report    R Documentation

Description

The header-less table exported by the Compass software of the Bruker MALDI Biotyper device is separated by semicolons and contains empty columns, which prevents an easy import into R. This function reads the report correctly as a tibble.
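To illustrate why a plain import is awkward, the sketch below reads a header-less, semicolon-separated file with base R and drops the columns that are entirely empty. The two toy lines are invented for illustration and are not the actual Compass export layout; this is also not the implementation of read_biotyper_report, only a minimal example of the problem it solves.

```r
# Toy lines (hypothetical, NOT the real Compass layout): no header,
# semicolon-separated, with two columns left empty.
toy_lines <- c(
  "A1;sample_1;;++;Species a;;2.31",
  "A2;sample_2;;-;Species b;;1.52"
)
toy_path <- tempfile(fileext = ".csv")
writeLines(toy_lines, toy_path)

# Without a header, base R falls back to generic names V1, V2, ...
raw <- read.table(
  toy_path,
  sep = ";", header = FALSE,
  na.strings = "", stringsAsFactors = FALSE
)

# Keep only the columns that hold at least one non-missing value
keep <- colSums(!is.na(raw)) > 0
cleaned <- raw[, keep, drop = FALSE]
names(cleaned)
```

The remaining columns still carry uninformative names, which is the renaming work the function takes care of.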

Usage

read_biotyper_report(path, best_hits = TRUE, long_format = TRUE)

Arguments

path

Path to the semi-colon separated table

best_hits

A logical indicating whether to return only the best hits for each target analyzed

long_format

A logical indicating whether the table is returned in the long format (many rows) or the wide format (many columns) when showing all the hits. This option has no effect when best_hits = TRUE.

Details

The header-less table contains identification information for each target processed by the Biotyper device. Once processed by read_biotyper_report with the best_hits = TRUE option, the following eight columns are available in the tibble:

  • name: a character indicating the name of the spot of the MALDI target (i.e., plate)

  • sample_name: the character string provided during the preparation of the MALDI target (i.e., plate)

  • hit_rank: an integer indicating the rank of the hit for the corresponding target and identification

  • bruker_quality: a character encoding the quality of the identification, with potentially multiple "+" symbols or only one "-"

  • bruker_species: the species name associated with the MALDI spectrum analyzed.

  • bruker_taxid: the NCBI Taxonomy Identifier of the species name in the column bruker_species

  • bruker_hash: a hash from an undocumented checksum function, probably encoding the database entry.

  • bruker_log: the log-score of the identification.

When all hits are returned (with best_hits = FALSE), the default output format is the long format (long_format = TRUE), meaning that the previous columns remain unchanged, but all hits are now returned, thus increasing the number of rows.

When all hits are returned (with best_hits = FALSE) using the wide format (long_format = FALSE), the two columns name and sample_name remain unchanged, but the five columns prefixed by bruker_ now include the hit rank in their names, creating a tibble of 52 columns:

  • bruker_01_quality

  • bruker_01_species

  • bruker_01_taxid

  • bruker_01_hash

  • bruker_01_log

  • bruker_02_quality

  • ...

  • bruker_10_species

  • bruker_10_taxid

  • bruker_10_hash

  • bruker_10_log
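The count of 52 follows from the naming scheme listed above: 2 identifier columns plus 5 variables for each of 10 hit ranks. A minimal base R sketch that rebuilds these names, assuming exactly 10 hits per target as the list implies (the object names are illustrative only):

```r
# Zero-padded hit ranks "01" to "10", as in the listed column names
hit_ranks <- sprintf("%02d", 1:10)
variables <- c("quality", "species", "taxid", "hash", "log")

# Combine every rank with every variable: bruker_<rank>_<variable>
bruker_columns <- paste(
  "bruker",
  rep(hit_ranks, each = length(variables)),
  rep(variables, times = length(hit_ranks)),
  sep = "_"
)

wide_columns <- c("name", "sample_name", bruker_columns)
length(wide_columns) # 2 + 5 * 10 = 52
```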

Value

A tibble of 8 columns (best_hits = TRUE, or best_hits = FALSE with long_format = TRUE) or 52 columns (best_hits = FALSE with long_format = FALSE). See Details for the description of the columns.

Note

A report that contains only spectra with no peaks found will return a tibble of 0 rows and a warning message.

See Also

read_many_biotyper_reports

Examples

# Get an example Bruker report
biotyper <- system.file("biotyper.csv", package = "maldipickr")
# Import the report as a tibble
report_tibble <- read_biotyper_report(biotyper)
# Display the tibble
report_tibble
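
The example above uses the default best_hits = TRUE. As a further sketch, the calls below request all hits in both output formats, using only the arguments documented in Usage and the example report shipped with the package. The object names are illustrative, and the code is guarded so that it is skipped when maldipickr is not installed.

```r
if (requireNamespace("maldipickr", quietly = TRUE)) {
  biotyper <- system.file("biotyper.csv", package = "maldipickr")

  # Default: only the best hit for each target
  best_only <- maldipickr::read_biotyper_report(biotyper)

  # All hits, long format (default when best_hits = FALSE)
  all_hits_long <- maldipickr::read_biotyper_report(
    biotyper,
    best_hits = FALSE
  )

  # All hits, wide format: hit rank embedded in the column names
  all_hits_wide <- maldipickr::read_biotyper_report(
    biotyper,
    best_hits = FALSE, long_format = FALSE
  )

  # Compare the shapes of the three tibbles
  print(dim(best_only))
  print(dim(all_hits_long))
  print(dim(all_hits_wide))
}
```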

maldipickr documentation built on Sept. 13, 2024, 1:12 a.m.