In MartinSchobben/point: Reading, Processing, and Analysing Raw Ion Count Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE
)

Reading Raw Ion Count Data

The read functions currently support only data generated by a Cameca NanoSIMS 50L. Raw ion count data and the accompanying metadata are extracted from text files with the extensions .is_txt, .chk_is and .stat, and collated into a single tibble. These files can usually be found in a single directory, which often constitutes the analysis of a series of spots.
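Before reading a directory, it can help to check which of the three file types it actually contains. The following is a minimal base R sketch, assuming a throwaway directory with made-up file names (spot_1.* and notes.md are purely illustrative and not part of the bundled example data).

```r
# Build a temporary directory that mimics the layout described above.
# All file names here are hypothetical.
dir_ic <- file.path(tempdir(), "example-analysis")
dir.create(dir_ic, showWarnings = FALSE)
invisible(file.create(
  file.path(dir_ic, c("spot_1.is_txt", "spot_1.chk_is", "spot_1.stat", "notes.md"))
))

# Keep only the three file types that hold ion counts and metadata
ic_files <- list.files(dir_ic, pattern = "\\.(is_txt|chk_is|stat)$")

# Strip everything up to the last dot to obtain the extension
# (tools::file_ext() is avoided because it does not handle underscores)
ic_ext <- sub("^.*\\.", "", ic_files)

# Tabulate the number of files per extension
table(ic_ext)
```

A spot whose .is_txt file is missing would show up here as an imbalance between the counts per extension.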

Nomenclature

Sample: sample of the true population
Analytical substrate: physical sample measured during SIMS analysis
Event: single event of an ion hitting the detector
Measurement: single count cycle $N_i$
Analysis: $n$-series of measurements $N_{(i)} = M_j$
Study: $m$-series of analyses $M_{(j)}$, constituting the different spots on the analytical substrate
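The hierarchy of measurement, analysis and study can be illustrated with simulated counts. This is only a sketch: the Poisson rate, the values of n and m, and the column names analysis, measurement and N are arbitrary choices for illustration and do not come from the package.

```r
set.seed(1)

n <- 5 # measurements per analysis
m <- 3 # analyses (spots) in the study

# One row per measurement N_i, labelled by the analysis j it belongs to
study <- data.frame(
  analysis = rep(seq_len(m), each = n),
  measurement = rep(seq_len(n), times = m),
  N = rpois(n * m, lambda = 1000) # simulated ion counts per cycle
)

# Each analysis M_j is the n-series of its measurements
M <- split(study$N, study$analysis)

# The study is the m-series of analyses
length(M)
lengths(M)
```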

library(point) # load package

The following packages are used in the examples that follow.

library(dplyr) # manipulating data
library(ggplot2) # plot
library(polyaAeppli) # Polya-Aeppli distribution
library(purrr) # functional programming
library(tidyr) # tidy data

Example dataset

One example dataset is bundled with this package: 2018-01-19-GLENDON.

The dataset was generated with the Cameca NanoSIMS 50L at the Department of Earth Sciences at Utrecht University. The suffix GLENDON stands for glendonite, an authigenic calcium carbonate seafloor precipitate. The excerpt included here contains an in-house reference (a belemnite rostrum) which was used to check repeatability/external reproducibility. Ion detection for 7 individual species was performed solely with electron multipliers (EM), with the main purpose of producing stable carbon isotope ratios (^13^C/^12^C).

The example directories can be accessed with the function point_example().

# Use point_example() to access the examples bundled with this package 
# If path is 'NULL', the example directories will be listed
point_example()
# Accessing the example directory 2018-01-19-GLENDON
point_example("2018-01-19-GLENDON")

Extracting raw ion count data and associated metadata

The function read_IC() takes a character string giving the path to the directory that holds the files. Setting the argument meta = TRUE (default) also extracts the associated metadata, which can be included as an attribute with hide = TRUE (default) or as additional columns with hide = FALSE. Extracting the metadata has the added bonus of providing consistency checks between metadata, which generate easily interpretable warnings.

(tb_rw <- read_IC(point_example("2018-01-19-GLENDON"), meta = TRUE))

This generates a tibble that includes:

file.nm: file name
t.nm: time increments of the measurements $t_i$
N.rw: the individual measurement counts $N_i$
species.nm: chemical species name
sample.nm: physical sample name
n.rw: the total number of measurements $n$
bl.nm: count block identifier

Warnings are used to signal that, for example, some metadata files have no associated data files with ion counts. These files are omitted with this combination of arguments in the call to read_IC().
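A first look at a tibble with this layout could be a total count per file and species. The sketch below uses a small hand-made data frame that only borrows the column names listed above. The values are invented and tb_toy is not output of read_IC().

```r
# Toy data with the same column names as the tibble described above
# (invented numbers, two files, two species, two time steps each)
tb_toy <- data.frame(
  file.nm = rep(c("spot_1", "spot_2"), each = 4),
  species.nm = rep(c("12C", "13C"), times = 4),
  t.nm = rep(rep(1:2, each = 2), times = 2),
  N.rw = c(1000, 11, 1020, 12, 980, 10, 990, 11)
)

# Sum the individual measurement counts per file and species
totals <- aggregate(N.rw ~ file.nm + species.nm, data = tb_toy, FUN = sum)
totals
```

The same summary could be written with dplyr's group_by() and summarise(). Base R is used here only to keep the sketch free of dependencies.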

The data is complemented with metadata of the associated analysis.

attr(tb_rw, "metadata")

num.mt: measurement order in case of multiple chemical species
mass.mt: mass measured
det.mt: number of the detector trolley
tc.mt: measurement time of a measurement blanked in seconds
rad.mt: radius of the mass spectrometer
sample.nm: the assigned sample name
data: date of the analysis
presput.mt: time allocated for presputtering of the analytical substrate in seconds
bl_num.mt: block number
meas_bl.mt: number of measurements per block
width_hor.mt: horizontal Secondary Ion Beam Centering in Volts
width_ver.mt: vertical Secondary Ion Beam Centering in Volts
prim_cur_start.mt: Primary Ion Beam current in pico Ampere at the beginning of the analysis
prim_cur_after.mt: Primary Ion Beam current in pico Ampere at the end of the analysis
rast_com.mt: raster dimensions in micrometer
blank_rast.mt: percentage of blanked raster
det_type.mt: the type of ion counting device; Electron Multiplier (EM) or Faraday Cup (FC)

In the case of EM usage for ion counting, the metadata is complemented with:

M_PHD.mt: the mean pulse height amplitude in Volts to approximate the peak height distribution (PHD)
SD_PHD.mt: the standard deviation of pulse height amplitude in Volts to approximate the PHD
EMHV.mt: EM High Voltage

In the case of FC usage for ion counting, the metadata is complemented with:

FC_start.mt: FC background count before data acquisition
FC_after.mt: FC background count after data acquisition
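The difference between carrying metadata as an attribute and as additional columns can be sketched in base R. This illustrates the general pattern only and is not the internal implementation of read_IC(). The data frames counts and meta are invented and reuse just a few of the column names from the lists above.

```r
# Invented count data and per-file metadata
counts <- data.frame(
  file.nm = c("spot_1", "spot_1", "spot_2"),
  N.rw = c(1000, 1020, 980)
)
meta <- data.frame(
  file.nm = c("spot_1", "spot_2"),
  det_type.mt = c("EM", "EM")
)

# Variant 1: metadata travels along as an attribute; the columns are unchanged
hidden <- counts
attr(hidden, "metadata") <- meta
names(hidden)
attr(hidden, "metadata")

# Variant 2: metadata is joined to every measurement as additional columns
visible <- merge(counts, meta, by = "file.nm")
names(visible)
```

The attribute variant keeps the data compact, whereas the column variant makes the metadata directly available for filtering and grouping.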

Extracting metadata for machine performance assessment

Alternatively, one can extract only the metadata of an analysis to, e.g., assess machine performance over a sequence of analyses. For example, one can assess the Peak Height Distribution (PHD) over a series of analyses.

tb_mt <- read_meta(point_example("2018-01-19-GLENDON"))

# The Polya-Aeppli distribution is used as a model to approximate the PHD
# install.packages("polyaAeppli")
tb_mt <- drop_na(tb_mt[[2]], M_PHD.mt, SD_PHD.mt) %>% 
  distinct(file.nm, .keep_all = TRUE) %>% 
  filter(num.mt == 1) %>%  # most high intensity counts
  mutate(PHD = map(M_PHD.mt,  ~as.integer(seq(0, 700, length.out = 100)))) %>%
  unnest(cols = c(PHD)) %>% 
  mutate(
    lambda = (2 * M_PHD.mt^2) / (SD_PHD.mt^2 + M_PHD.mt),
    prob = (SD_PHD.mt^2 - M_PHD.mt) / (SD_PHD.mt^2 + M_PHD.mt),
    prob = if_else(prob < 0 | prob >= 1 , NA_real_, prob),
  ) %>% 
  drop_na(lambda, prob) %>% 
  mutate(
    density = dPolyaAeppli(PHD, lambda = lambda, prob = prob),
    Y = pPolyaAeppli(50, lambda = lambda, prob = prob, lower.tail = FALSE)
  ) 

# Plot of PHD over analysis sequence
ggplot(tb_mt, aes(x = PHD, y = density)) +
  geom_line() +
  geom_text(
    aes(
      x = 500, 
      y = max(density) * 0.8, 
      label = paste("Y = ", sprintf("%0.1f", Y))
      ),
    check_overlap = TRUE
    ) +
  facet_wrap(vars(file.nm), scales = "free") +
  theme_classic()

The compound Polya-Aeppli probability density function can approximate the peak height distribution [@Dietz1970; @Dietz1978]. The package polyaAeppli [@Burden2014], together with the discriminator threshold value (usually 50 V), enables calculating the EM Yield ($Y$) as the probability that a pulse exceeds the threshold (Fig. \@ref(fig:PHDexample)). More on this topic can be found in the vignette IC-process.
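The expressions for lambda and prob in the chunk above correspond to matching the mean and variance of a Polya-Aeppli distribution, mean = lambda / (1 - prob) and variance = lambda (1 + prob) / (1 - prob)^2, to the reported PHD mean and standard deviation. The sketch below checks this round trip in base R. The helper pa_moments() is hypothetical (it is not part of point or polyaAeppli), and the input values are illustrative rather than taken from the example dataset.

```r
# Method-of-moments parameters of a Polya-Aeppli distribution.
# pa_moments() is a hypothetical helper written for this illustration.
pa_moments <- function(mean_phd, sd_phd) {
  var_phd <- sd_phd^2
  prob <- (var_phd - mean_phd) / (var_phd + mean_phd)
  lambda <- 2 * mean_phd^2 / (var_phd + mean_phd)
  # prob must lie in [0, 1); otherwise these moments cannot be matched
  invalid <- prob < 0 | prob >= 1
  prob[invalid] <- NA_real_
  lambda[invalid] <- NA_real_
  data.frame(lambda = lambda, prob = prob)
}

# Illustrative mean and standard deviation of the pulse heights
pa <- pa_moments(mean_phd = 220, sd_phd = 60)

# Round trip: mean and SD implied by the estimated parameters
implied_mean <- pa$lambda / (1 - pa$prob)
implied_sd <- sqrt(pa$lambda * (1 + pa$prob) / (1 - pa$prob)^2)
c(mean = implied_mean, sd = implied_sd)

# A variance smaller than the mean gives prob < 0, which is flagged as NA
pa_moments(mean_phd = 100, sd_phd = 5)
```

Flagging invalid parameter combinations as NA mirrors the if_else() and drop_na() steps in the chunk above, which remove analyses where the model cannot be fitted from the two moments.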