nmea

The goal of nmea is to read, check, and extract useful information from NMEA-formatted sentences. Real-world NMEA streams frequently contain embedded null characters and garbage bytes as a result of noisy serial communication, which can make useful information difficult to extract.
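For reference, an NMEA 0183 sentence is a single line of ASCII that starts with $, followed by comma-separated fields and (usually) a * plus a two-digit hexadecimal checksum computed as the XOR of all bytes between the $ and the *. A quick sketch in base R, using a widely circulated GPRMC example sentence (not output from this package):

# a typical sentence: $<talker + message type>,<fields ...>*<checksum>
sentence <- "$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A"

# the checksum is the XOR of the bytes between "$" and "*"
payload <- gsub("^\\$|\\*..$", "", sentence)
sprintf("%02X", Reduce(bitwXor, as.integer(charToRaw(payload))))
#> [1] "6A"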

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("paleolimbot/nmea")

Example

You probably want to start with read_nmea(), which extracts anything that looks like an NMEA sentence ($......\n with a maximum length of 82 characters) from a raw vector, file, or connection.

library(tidyverse)
library(nmea)

nmea_file <- system.file("extdata/depths.nmea", package = "nmea")
(sentences <- read_nmea(nmea_file))
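Because read_nmea() also accepts raw vectors and connections, you can parse sentences you already have in memory or that arrive over a live serial link. A minimal sketch, reusing the sample GPRMC sentence from above (the exact columns in the result depend on the package version):

# read from a raw vector rather than a file
bytes <- charToRaw("$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A\r\n")
read_nmea(bytes)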

You can inspect the NMEA sentences using nmea_meta() and related functions:

nmea_meta(sentences$sentence)
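One of those related functions, nmea_message_type() (used again below), is handy for a quick tabulation of what a file contains; a sketch:

# count how many sentences of each message type were read
table(nmea_message_type(sentences$sentence))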

If you'd like to know more about a particular sentence type, you can inspect nmea_fields, a table of field definitions sourced from the fantastic pynmea2 Python package.

nmea_fields %>% 
  filter(message_type %in% nmea_message_type(sentences$sentence))

To extract field values as columns, use nmea_extract(). This uses the information in nmea_fields to make some reasonable conversions and assign column names to the output.

(parsed <- nmea_extract(sentences$sentence))
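As the examples below show, the extracted columns are prefixed with the lowercase talker and message type (gprmc_timestamp, gprmc_lat, and so on), so you can pull out one sentence type by prefix:

# keep only the columns extracted from GPRMC sentences
parsed %>% 
  select(starts_with("gprmc"))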

Some conversions require information from more than one field (notably datetime, longitude, and latitude). The nmea package provides utility functions for these common cases:

more_parsed <- parsed %>% 
  mutate(
    date_time = nmea_date_time(gprmc_datestamp, gprmc_timestamp),
    longitude = nmea_longitude(gprmc_lon, gprmc_lon_dir),
    latitude = nmea_latitude(gprmc_lat, gprmc_lat_dir)
  )
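To see why these need two fields: NMEA encodes latitude as ddmm.mmmm plus a hemisphere letter, so 4807.038 with direction N means 48 degrees plus 7.038 minutes, or about 48.1173 degrees north (S and W flip the sign). A hand-rolled sketch of the same arithmetic, independent of nmea_latitude():

# "4807.038" + "N" -> decimal degrees: dd + mm.mmmm / 60 (negative for S)
lat_raw <- 4807.038
lat_dir <- "N"
degrees <- floor(lat_raw / 100)
minutes <- lat_raw - degrees * 100
(degrees + minutes / 60) * (if (lat_dir == "S") -1 else 1)
#> [1] 48.1173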

The order of sentences matters, and garbage serial connections often mangle the data in ways that make it difficult to partition the sentences into meaningful groups. I'd suggest liberal use of dplyr's grouped operations for this. As an example, here is how I would go about extracting meaningful output from the above data:

clean_ensemble <- more_parsed %>% 
  # each ensemble starts with a GPRMC sentence:
  # identify with cumsum()
  mutate(
    ensemble = cumsum(sentence_id == "GPRMC")
  ) %>% 
  # select the values we want: datetime, longitude, latitude,
  # and anything from the depth sounder
  select(
    sentence_id, checksum_valid, ensemble, 
    date_time, longitude, latitude,
    starts_with("sd")
  ) %>% 
  # fill all values down and select the last row of each ensemble
  group_by(ensemble) %>% 
  fill(everything(), .direction = "down") %>%
  # keep some summary values we might use to filter out garbage
  # ensembles
  mutate(
    n_sentence = n(),
    n_valid = sum(checksum_valid),
    sentence_ids = paste(sentence_id, collapse = ", ")
  ) %>% 
  slice(n()) %>% 
  ungroup() %>% 
  # these aren't meaningful anymore because all columns contain
  # summary values
  select(-sentence_id, -checksum_valid)

clean_ensemble
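From here, the n_sentence, n_valid, and sentence_ids columns give you handles for dropping suspect ensembles; for example, to keep only ensembles in which every checksum validated:

# discard any ensemble that contained a corrupted sentence
clean_ensemble %>% 
  filter(n_valid == n_sentence)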

