
ProcessHplc Package

This package processes the raw CSV output that the ChemStation macro Export3D.mac generates for HPLC pigment runs.

For each sample, the macro places a DAD1.CSV file in that sample's ChemStation directory. The file contains the absorbance values for all wavelengths examined, at all time points during the run.

This package processes DAD1.CSV files to i) integrate the desired peaks, ii) correctly identify the integrated peaks, iii) quantify the concentration of identified pigments [not yet implemented] and iv) produce a database containing all of this information for each sample, along with all absorbance data from the HPLC run.

The package is built mainly on Shiny apps, so that the user can work with the functions and process their data interactively.

Golden rules of HPLC data processing

1. Peak Integration

The user must decide on a minimum peak height to integrate (i.e. all peaks that reach this height are selected). This is done interactively when the integrate.peaks() function is called. The trick is to set the threshold low enough to select all peaks of interest, but not so low that every tiny bump on the chromatogram is integrated. There is usually a trade-off: when the threshold is low enough to capture all peaks of interest, some features that are not true peaks are integrated as well.
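The trade-off can be illustrated with a minimal base R sketch on a synthetic trace. The helper find_peaks_above() and the trace itself are hypothetical and only illustrate the idea of a minimum-height threshold; this is not the peak detection that integrate.peaks() actually uses.

```r
# Hypothetical helper (not part of ProcessHplc): return the indices of
# local maxima whose height is at or above min_height.
find_peaks_above <- function(signal, min_height) {
  n <- length(signal)
  if (n < 3) return(integer(0))
  interior <- 2:(n - 1)
  is_max <- signal[interior] > signal[interior - 1] &
    signal[interior] >= signal[interior + 1]
  interior[is_max & signal[interior] >= min_height]
}

# Synthetic trace: two peaks of interest (heights 50 and 8)
# plus one small bump (height 1.5) that is not a real peak.
time <- seq(0, 30, by = 0.05)
trace <- 50 * exp(-(time - 10)^2 / 0.1) +
  8 * exp(-(time - 18)^2 / 0.1) +
  1.5 * exp(-(time - 25)^2 / 0.1)

peaks_high <- find_peaks_above(trace, min_height = 10) # misses the height-8 peak
peaks_mid  <- find_peaks_above(trace, min_height = 5)  # both peaks of interest
peaks_low  <- find_peaks_above(trace, min_height = 1)  # also picks up the bump
```

In this toy case a middle threshold separates peaks from noise cleanly; on real chromatograms the small peaks of interest and the noise often overlap in height, which is where the trade-off comes from.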

2. Peak Identification

A. Identification must take into consideration BOTH the peak absorbance spectra (and thus match to library) AND the peak retention time:

Many pigments have extremely similar absorbance spectra, so they can only be reliably differentiated by combining the spectrum with the retention time in the HPLC column, i.e. the time during the run at which the pigment appears on the chromatogram. It is very important to rely on both attributes to determine the peak ID. The id.peaks() function has two sliders that set thresholds to help with identification, but these will not produce the correct ID for every pigment (hence the need for a user-driven pigment ID step). After working with a few samples you will get an idea of the order in which pigments should appear on the chromatogram. To aid in this, green points are shown along the x-axis (hover the mouse over them to see which pigment they represent) as a rough guide to which peak should be where. These points are derived from a single mixed pigment standard, so their retention times will differ from those of your sample peaks (they are usually slightly later than the peaks in your sample, particularly early in the chromatogram), but they should work as a guide nonetheless. The important point here is DON'T BELIEVE A PEAK ID IF THE RETENTION TIME IS WAY OFF.
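As a rough illustration of this rule, the sketch below rejects a strong spectral match whose reference retention time is far from the observed peak. The helper rt_consistent(), the one-minute tolerance, the pigment names and all numbers are made up for the example; they are not taken from the package or its library.

```r
# Hypothetical check (not part of ProcessHplc): a spectral match is only
# plausible if the retention time is also close to the reference value.
rt_consistent <- function(peak_rt, reference_rt, tolerance_min = 1) {
  abs(peak_rt - reference_rt) <= tolerance_min
}

# Illustrative candidates for one peak: pigment_A has the better spectral
# fit, but its reference retention time is several minutes away.
candidates <- data.frame(
  pigment      = c("pigment_A", "pigment_B"),
  r_squared    = c(0.99, 0.97),
  reference_rt = c(12.4, 21.0)
)
peak_rt <- 20.6

candidates$plausible <- rt_consistent(peak_rt, candidates$reference_rt)
```

Here pigment_A would be discarded despite its higher R² value, because its retention time is way off.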

B. There are numerous peaks we cannot identify:

Linked to the above, all chromatograms will contain several peaks (typically small and with poor absorbance spectra) that cannot be identified. These 'peaks' may be artefacts of bad separation on the column, isomers of known pigments, pigment breakdown products, or genuinely distinct pigments that i) are not in our library, so we cannot identify them yet, or ii) are unknown generally. In some cases these peaks will be almost perfect spectral matches to a known pigment in our library, but appear on the chromatogram at completely the wrong time (and out of order), and thus are not the pigment they have been spectrally matched to. If an unidentified peak recurs across several samples, a literature search can often provide an estimate of its ID.

C. Different pigments are assessed at different wavelengths:

The functions in this package focus on the wavelengths 223 nm, 431 nm and 451 nm, given the pigments of interest. Vitamin E [internal extraction standard] is assessed at 223 nm, chlorophylls at 431 nm, and all other pigments at 451 nm. This allows peak areas to be assessed at the lambda max of the respective pigment group when calculating pigment concentrations. There are therefore three wavelengths at which peaks are identified, BUT at 223 nm it is only important that the vitamin E peak is correctly identified, and at 431 nm it is only important that the chlorophyll peaks are correctly identified, whereas at 451 nm all other peaks must be identified correctly. Hence most of the work will be done at 451 nm.
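The wavelength assignment described above can be written as a small lookup. This is only a sketch: the function assessment_wavelength() and the group labels "vitamin_e" and "chlorophyll" are hypothetical names chosen for the example, not identifiers from the package.

```r
# Hypothetical lookup (not part of ProcessHplc): wavelength in nm at which
# each pigment group is assessed. Anything that is not vitamin E or a
# chlorophyll falls back to 451 nm.
assessment_wavelength <- function(pigment_group) {
  lookup <- c(vitamin_e = 223, chlorophyll = 431)
  wl <- unname(lookup[pigment_group])
  wl[is.na(wl)] <- 451 # all other pigments
  wl
}

wavelengths_used <- assessment_wavelength(c("vitamin_e", "chlorophyll", "carotenoid"))
```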

Package Functions

Loading the package ProcessHplc should load (and install if not already present) all dependencies. This process is not always smooth and some packages (particularly 'alsace') may need manual installation prior to using the ProcessHplc package.

Required packages include: magrittr, dplyr, dbplyr, RSQLite, alsace, Peaks, pracma, ggplot2, gridExtra, shiny, plotly, knitr, rmarkdown
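Before loading ProcessHplc it can help to check which of these dependencies are already installed. The snippet below is a minimal sketch using base R only; the note about Bioconductor in the comment is an assumption about where 'alsace' is distributed and should be checked against the package's current source.

```r
# Dependencies listed above.
required <- c("magrittr", "dplyr", "dbplyr", "RSQLite", "alsace", "Peaks",
              "pracma", "ggplot2", "gridExtra", "shiny", "plotly",
              "knitr", "rmarkdown")

# requireNamespace() returns FALSE (instead of failing) for packages that
# are not installed.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
}

# Assumption: 'alsace' is distributed through Bioconductor, so if it is
# missing, something like BiocManager::install("alsace") may be needed
# rather than install.packages().
```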

The ProcessHplc package has 4 functions:

  1. integrate.peaks()
  2. id.peaks()
  3. view.database()
  4. calculate.concs()

1. integrate.peaks()

The purpose of this function is to decide a threshold of minimum peak height to integrate. As noted previously, this is a trade-off between integrating peaks of interest and avoiding noise.

The function requires the name of the sample CSV file (a character string giving the file name if it is in the working directory, or the path to the file) and the name of a blank CSV file from the same run as your sample (again a character string giving the file name or path). If no blank file is provided, a blank file built into the package is used for blanking, but this does not produce good results.

The function launches an interactive HTML window showing the main chromatogram at 451 nm, with a slider to change the minimum peak height threshold. As the slider is moved, the set of integrated peaks (and thus the peaks labelled on the plot) updates accordingly. NB, this plot is interactive, i.e. it can be zoomed etc. if required.

When you have decided on the desired threshold, click the Finished button.
The function then writes a database, named after the sample, to your current working directory. This database contains numerous tables (see the view.database() function for how to navigate them if desired).

2. id.peaks()

The purpose of this function is to decide on the identity of the peaks integrated by the integrate.peaks() function.
This function requires the name of the sample CSV file (a character string giving the file name if it is in the working directory, or the path to the file). NB. although we provide the name of the sample CSV file, the function actually draws on the database produced by integrate.peaks(), so do not move the database between calling the two functions.

The function launches an interactive HTML window. In the top left there is a drop-down menu to choose between the 223 nm, 431 nm and 451 nm chromatogram plots. These plots (top right panel) show i) the chromatogram at the respective wavelength, ii) orange points marking peaks whose identity is 'known' (see below); hover over these points to show the pigment name, iii) green points showing the rough reference position of known pigments; hover over these points to show the pigment name.

In the second row, two sliders set the thresholds used for automatic peak identification (fit_threshold and fit_norm_threshold). fit_threshold refers to the R² value of a linear regression between your peak and an entry in the pigment library. It is bounded between 0 and 1. By changing this slider, you change the R² value at which you accept a match between your peak and a given entry in the library. fit_norm_threshold incorporates both the spectral match and a weighted match based on comparing the retention times of your peak and the pigment library. It is not bounded. As you lower the thresholds on these two sliders, you should see more peaks gain orange points on your chromatogram, indicating that they have been 'automatically identified'. Two important points: i) if no peaks are identified at these thresholds, an orange line is shown in the upper chromatogram, ii) the identification of these peaks MUST be checked, as automatic identification is typically only ~70% correct, hence the user interface. The automation is provided to speed up user peak identification and should not be relied on as a definitive identification tool.
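To make the fit_threshold idea concrete, here is a minimal sketch that computes the R² of a linear regression between a peak spectrum and two library spectra, then keeps only the matches at or above a threshold. The helper spectral_r2(), the synthetic Gaussian spectra and the 0.9 threshold are assumptions for illustration; the package's own matching code may differ, and fit_norm_threshold is not reproduced here because its weighting is not specified in this document.

```r
# Hypothetical helper (not part of ProcessHplc): R^2 of a linear
# regression of the peak spectrum on one library spectrum.
spectral_r2 <- function(peak_spectrum, library_spectrum) {
  summary(lm(peak_spectrum ~ library_spectrum))$r.squared
}

# Synthetic spectra on a common wavelength grid (nm).
wavelengths <- seq(350, 700, by = 2)
library_a <- exp(-(wavelengths - 450)^2 / 800)
library_b <- exp(-(wavelengths - 620)^2 / 800)

# The peak is a scaled, offset copy of library_a with a small deterministic
# ripple, so it matches library_a closely but not perfectly.
peak <- 3 * library_a + 0.05 + 0.01 * sin(wavelengths / 10)

fits <- c(a = spectral_r2(peak, library_a),
          b = spectral_r2(peak, library_b))

# Accept only library entries whose fit reaches the threshold.
fit_threshold <- 0.9
accepted <- names(fits)[fits >= fit_threshold]
```

Because R² is insensitive to scaling and offset, a peak can match a library spectrum well regardless of its absolute height, which is also why retention time is needed as a second criterion.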

The third row gives you the opportunity to check and amend your peak IDs. It shows the absorbance spectra for the peak selected (left panel drop down menu), including the absorbance spectra for the best match (223 nm), or top three best matches (431nm and 451nm) to your peak from the library. The Current Peak ID will be displayed in the left panel.
The user should examine the absorbance spectrum of their peak (blue line) and compare it to the matches shown in the plot (NB the plot is interactive for i) zooming, ii) selecting/deselecting different lines by clicking on their legend entry, iii) hovering over the best-match traces to view their pigment ID and fit thresholds).
To amend the ID of a peak, i) click the Amend Peak ID? button, ii) select from the top 3 best matches from the library, unknown, or other (which will show a list of all pigments in the library), iii) confirm ID amendment by clicking Confirm Peak Amendment box.
To show the absorbance spectra of another pigment on the right hand plot, click Add trace button and select the pigment from the dropdown menu that appears.

The user should use this function to i) check that the vitamin E peak is identified correctly in the 223 nm view [if an internal extraction standard was used], ii) check that chlorophyll peaks are correctly identified in the 431 nm view [don't worry about the ID of any other peaks at this wavelength], iii) check that all other peaks are correctly identified in the 451 nm view. The majority of the work will be in the 451 nm view. If the identity of a peak is not known, it must be set to 'unknown' if it is not already.

When you are happy with all peak identification, select the Finished button in the top left panel.

This function updates the database that was initially created by integrate.peaks(). In particular, the database will contain tables called integrated.223, integrated.431 and integrated.451, each with a 'final_id' column that reflects the identifications established during the interactive session.

3. view.database()

The purpose of this function is to quickly show the contents of databases produced by the integrate.peaks() and id.peaks() functions. It takes the database name as a character string and opens a simple interactive HTML window for navigating the database. Note that this can also easily be done within R using functions from the dbplyr and RSQLite packages.
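For readers who prefer to inspect a database directly in R, the following is a minimal sketch using the DBI interface that RSQLite implements. It builds a small in-memory stand-in so that it runs on its own; with a real database you would pass the path of the file written by integrate.peaks() instead of ":memory:". The 'peak' column and the row contents are invented for the example; only the table name integrated.451 and the final_id column come from the description above.

```r
library(DBI)

# In-memory stand-in for a database written by integrate.peaks() and
# id.peaks(); replace ":memory:" with the path to a real database file.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Illustrative table contents (hypothetical apart from 'final_id').
dbWriteTable(con, "integrated.451",
             data.frame(peak = 1:2,
                        final_id = c("unknown", "pigment_A")))

tables <- dbListTables(con)                     # names of all tables
peaks_451 <- dbReadTable(con, "integrated.451") # one table as a data frame

dbDisconnect(con)
```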

4. calculate.concs() NOT YET IMPLEMENTED

This function will calculate the concentration of pigments integrated and identified using the functions integrate.peaks() and id.peaks() by taking i) a database with integration information, ii) standard curve coefficients for all pigments, iii) filtration and extraction volumes for each sample.
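Since the function is not yet available, the sketch below only illustrates how the three inputs could combine. It assumes a linear standard curve (area = slope * concentration + intercept) and the units noted in the comments. The function pigment_conc(), its arguments and the numbers are hypothetical, and no correction for the internal extraction standard is included.

```r
# Hypothetical helper (not the package's calculate.concs()).
# Assumes a linear standard curve: area = slope * conc + intercept.
pigment_conc <- function(peak_area, slope, intercept = 0,
                         extraction_vol_ml, filtration_vol_l) {
  # Concentration in the extract, e.g. ug per mL of extract.
  conc_in_extract <- (peak_area - intercept) / slope
  # Scale by extract volume and normalise to the volume filtered,
  # e.g. ug per L of sample filtered.
  conc_in_extract * extraction_vol_ml / filtration_vol_l
}

# Illustrative numbers only: 1200 / 400 = 3 per mL of extract,
# times 3 mL of extract, divided by 0.5 L filtered.
example_conc <- pigment_conc(peak_area = 1200, slope = 400,
                             extraction_vol_ml = 3, filtration_vol_l = 0.5)
```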

This function is not yet implemented in this package, but will be once I have converted my original script into a Shiny app.



chrisjw18/ProcessHplc documentation built on May 28, 2019, 6:11 p.m.