getFullData: *** Get / Import Spectral Data ***

View source: R/prep_importData.r

getFullData R Documentation

Description

If everything is left at the defaults, the function first tries to load an R-object containing previously imported spectral data. If no such object is found, it imports the spectral data from a file in the rawdata folder, fuses these data (if slType is not NULL) with the class-header provided in the sampleLists/sl_in folder, and saves the resulting dataset. It is also possible to use a user-defined custom function to import data from a file in any format that contains the NIR spectra as well as all class and numerical variables. In that case it is still possible to fuse additional variables provided in a file in sampleLists/sl_in with the imported data.
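The load-or-import behaviour described above can be sketched in base R. This is only an illustration: load_or_import, the counter environment, fake_import and the use of saveRDS/readRDS are assumptions made for this sketch and do not show how aquap2 stores its datasets.

```r
# Hypothetical helper, not part of aquap2: load a cached object if allowed
# and present, otherwise import from raw data and optionally save the result.
load_or_import <- function(cache_file, import_fun, ttl = TRUE, stf = TRUE) {
  if (ttl && file.exists(cache_file)) {
    return(readRDS(cache_file))      # previously imported dataset found
  }
  dataset <- import_fun()            # fall back to the raw-data import
  if (stf) {
    saveRDS(dataset, cache_file)     # keep it for the next call
  }
  dataset
}

# A stand-in for the raw-data import that counts how often it is called.
counter <- new.env()
counter$n <- 0
fake_import <- function() {
  counter$n <- counter$n + 1
  data.frame(sample = 1:3)
}

cache  <- tempfile(fileext = ".rds")
first  <- load_or_import(cache, fake_import)               # imports and saves
second <- load_or_import(cache, fake_import)               # loads the saved copy
third  <- load_or_import(cache, fake_import, ttl = FALSE)  # forces a re-import
```

In this sketch the raw-data import runs twice: once because no saved object exists yet, and once because ttl = FALSE bypasses the saved copy.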

Usage

getFullData(
  md = getmd(),
  filetype = "def",
  slType = "def",
  trhLog = "def",
  multiplyRows = "def",
  ttl = TRUE,
  stf = TRUE,
  naString = "NA",
  dol = "def",
  sh = NULL,
  remDC = getstn()$imp_remDoubleCols,
  rawOnlyNIR = FALSE
)

gfd(
  md = getmd(),
  filetype = "def",
  slType = "def",
  trhLog = "def",
  multiplyRows = "def",
  ttl = TRUE,
  stf = TRUE,
  naString = "NA",
  dol = "def",
  sh = NULL,
  remDC = getstn()$imp_remDoubleCols,
  rawOnlyNIR = FALSE
)

Arguments

md

List. The object with the metadata of the experiment. The default is to get the metadata file via getmd.

filetype

Character. The type of the spectral raw data file. If a value other than "def" is provided, it overrides the value of "filetype" in the metadata file. Possible values are:

  • def: Gets the default value from the settings.r file (variable 'imp_specFileType').

  • vision_NSAS.da: Import from the .da file generated by the Vision-software from a Foss-XDS spectroscope.

  • tabDelim.txt: Import any tab delimited text file that contains only the NIR spectra and *no* additional columns like e.g. time, temperature etc, and that has 1 character in front of the wavelengths in the column names of the NIR spectra.

  • Pirouette.pir: Import spectra *and* any class- or numerical variable directly from a .pir file. Those column-names in the .pir file that match the standard-column names (printStdColnames) as defined in the settings.r file will be assigned to those columns automatically.

  • custom@yourFile.R: It is possible to use a custom import function to import **any** type of spectra. Use custom@yourFile.R, with yourFile.R being an .R-file located in the path specified in the .Renviron file. Please refer to custom_import for further information.

  • xls: Import raw spectra from an xlsx file. Please see the section Import from xlsx below for further details.

  • YunosatoDatFile.dat: Import raw spectra from a .dat file as styled by the Yunosato Aquaphotomics Lab, Japan. Please see the section Import from Yunosato .dat below for further information.

  • MicroNIR.csv: A comma separated value file (csv) as generated by the MicroNIR software from VIAVI, containing spectra acquired with a MicroNIR device. Please see the section on importing from MicroNIR below for further information.
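As an illustration of the layout expected for tabDelim.txt, the following base R sketch writes and re-reads a small file that holds only NIR columns, each with one character in front of the wavelength. The leading "X", the file name and all values are made up for this example.

```r
# Two spectra at three wavelengths; column names are "X" + wavelength.
spectra <- data.frame(
  X1300 = c(0.51, 0.49),
  X1302 = c(0.52, 0.50),
  X1304 = c(0.55, 0.53)
)
path <- tempfile(fileext = ".txt")
write.table(spectra, path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back and strip the single leading character to get the wavelengths.
back <- read.delim(path, na.strings = "NA")
wavelengths <- as.numeric(substr(colnames(back), 2, nchar(colnames(back))))
```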

slType

Character. The type of sample-list file in the sampleLists/sl_in folder. Possible values are:

  • def: Gets the value from the metadata file (variable sampleListType). (variable 'imp_sampleListType')

  • NULL: By providing 'NULL' to the argument slType you indicate that no sample-list should be imported to create the header. This is the case if you use a custom function to import your spectral data and all the necessary class and numerical variables are already defined in the same file that holds the spectral data. Please refer to custom_import for further information on the requirements for this custom import function. A custom function can also be used to import spectral data while additional variables are imported from a sample-list file; to do so, provide one of the values listed below.

  • xls: an Excel file ending in '.xlsx'

trhLog

If data from a temperature and relative humidity logger should be imported and aligned to a timestamp in the dataset. Possible values are:

  • def: Gets the value from the variable tempHumLog from the metadata file.

  • FALSE: No data from a logger-file will be imported.

  • ESPEC: Import data from a tab. delim .txt file generated by an 'ESPEC' logger. (This is included for historical reasons.)

  • HOBO: Import data from a file as generated by HOBOware data loggers. When providing HOBO to the argument trhLog, both .xls and .csv files as created by the HOBOware export software can be read in. Having both .xls and .csv files present results in an error. Please see details for the structure of the HOBOware files below.

  • custom@yourFile.R: You can provide your own import function for importing data from any logger, with yourFile.R being an .R-file located in the settings-home folder as specified in the .Renviron file. Please refer to custom_TRH for further information.

multiplyRows

Character or Logical. If the rows in the sample list should be multiplied by the number of consecutive scans as specified in the variable nrConScans in the metadata of the experiment.

  • def: If the argument multiplyRows in the function getFullData is left at def, the value (TRUE or FALSE or auto) from the variable multiplyRows from the **metadata file** is used.

  • auto: Checks if there is an error column in the sample list. If neither an error column nor a column for consecutive scans is present, or if only the error column is present, the rows in the sample list will be multiplied by the number of consecutive scans as given in the metadata. The values in the error column (if any) will be used to correct the number of consecutive scans for each respective sample. If there is no error column but a column for consecutive scans is present in the sample list, it will **not** be multiplied.

  • FALSE: The sample list will be left as it is. In that case it is the user's responsibility to provide a sample list with the rows correctly multiplied to match the number of consecutive scans in the dataset.

  • TRUE: For multiplying every row in the sample list by the number of consecutive scans as specified in nrConScans in the metadata of the experiment. If values are given in the error column in the sample list, the consecutive scans for each sample will be corrected by this number.

Please also refer to exportSampleList and the explanation to the argument multiplyRows therein.
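The row multiplication can be sketched in base R as follows. The column names (SampleNr, Group, error, conSNr) are hypothetical, and the sketch assumes that a value in the error column is simply added to nrConScans for that sample; the actual convention used by aquap2 may differ.

```r
# A sample list with one row per sample (hypothetical column names).
sample_list <- data.frame(
  SampleNr = 1:3,
  Group    = c("ctrl", "ctrl", "treat"),
  error    = c(0, -1, 0)   # assumed meaning: sample 2 has one scan fewer
)
nrConScans <- 3            # consecutive scans as given in the metadata

# Correct the scan count per sample, then repeat each row that many times.
scans_per_sample <- nrConScans + sample_list$error
row_index  <- rep(seq_len(nrow(sample_list)), times = scans_per_sample)
multiplied <- sample_list[row_index, c("SampleNr", "Group")]
multiplied$conSNr <- sequence(scans_per_sample)  # 1..n within each sample
rownames(multiplied) <- NULL
```

With these inputs the multiplied list has 3 + 2 + 3 = 8 rows, which would have to match the number of spectra in the raw data file.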

ttl

Logical, 'try to load'. If a possibly existing R-data file should be loaded. The experiment name is extracted from the provided metadata (argument 'md'), and if a file with the same name as the experiment is found in the folder 'R-data', it is loaded. If there is no such file, the spectra and class variables are imported from the raw data, and the whole dataset is saved if argument 'stf' is TRUE. Providing 'FALSE' to argument 'ttl' always imports the spectra from the raw data.

stf

Logical, 'save to file'. If the final dataset should be saved to the 'R-data' folder after import from the raw-data file. Defaults to 'TRUE'.

naString

Character. What to use as 'NA'. Applies only when 'filetype' is tabDelim.txt.

dol

Detect outliers. If outliers should be detected using the flags provided by RSimca. If left at the default "def", the value from the settings.r file will be used (parameter imp_flagOutliers). If dol evaluates to TRUE, an additional column flagging the outliers as detected in the scope of the complete dataset will be added to the dataset.

sh

Character length one. Manual path to settings home. Can and should be left at the default NULL.

remDC

Logical. Takes its factory-fresh default value TRUE from the key imp_remDoubleCols in the settings.R file. remDC defines if columns with identical names should be removed automatically at the time of data import. Double columns can arise from the same column being present in the rawdata file (e.g. as possible in the case when importing from filetype YunosatoDatFile.dat) AND in the sample list file. If remDC is set to FALSE, importing double columns will throw an error and the import will be stopped.

rawOnlyNIR

Logical. If class- and numerical variables that got possibly imported from within a raw data file should be discarded. Defaults to FALSE. Set to TRUE to only import the NIR (and a possible timestamp) from the rawdata file.
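The effect of the argument remDC on columns with identical names can be sketched in base R. The helper fuse_names and all column names are hypothetical, and which of the two copies is kept is an assumption of this sketch, not documented behaviour.

```r
# Column names coming from the sample list and from the raw data file;
# "C_Group" is present in both (all names are made up).
sl_cols  <- c("Y_SampleNr", "C_Group")
raw_cols <- c("Timestamp", "C_Group", "X1300", "X1302")
all_cols <- c(sl_cols, raw_cols)

# Hypothetical helper: drop later duplicates, or stop if remDC is FALSE.
fuse_names <- function(all_cols, remDC = TRUE) {
  dup <- duplicated(all_cols)
  if (any(dup) && !remDC) {
    stop("Double columns found: ", paste(unique(all_cols[dup]), collapse = ", "))
  }
  all_cols[!dup]
}

kept   <- fuse_names(all_cols)                                   # duplicates removed
failed <- try(fuse_names(all_cols, remDC = FALSE), silent = TRUE) # import would stop
```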

Details

The experiment name is extracted from the metadata provided in the first argument. If 'ttl' is TRUE, the dataset file with this name is first looked for in the 'R-data' folder and loaded if it is there. If the file cannot be found (or if 'ttl' is FALSE), the spectral file with the same name as the experiment (plus its specific ending) is imported from the rawdata folder. The sample list (which is used to create the header) must be in the sampleLists/sl_in folder and must be named with the experiment name, followed by "-in" and then the file extension. To be recognized as such, the standard columns have to be named with the standard column names as defined in the settings.r file (see printStdColnames). If you use a custom function and provide all the class and numerical variables together with the spectral data, set argument 'slType' to NULL. If you import from a .pir file and have all the class and numerical variables inside the .pir file, set argument 'slType' to NULL. If the dataset is the result of the fusion of other datasets (see mergeDatasets), the slot 'mergeInfo' will contain further information.
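The naming scheme described above can be sketched in base R. The helper expected_files is hypothetical, the raw-data and sample-list extensions are passed in as assumptions, and the extension of the saved dataset in 'R-data' is left out because it is not stated here.

```r
# Hypothetical helper: derive the file locations from the experiment name.
expected_files <- function(expName, rawExt, slExt = ".xlsx") {
  list(
    rdata      = file.path("R-data", expName),
    rawdata    = file.path("rawdata", paste0(expName, rawExt)),
    sampleList = file.path("sampleLists", "sl_in", paste0(expName, "-in", slExt))
  )
}

# For an experiment called "bar" with spectra in a .da file:
f <- expected_files("bar", rawExt = ".da")
```

For this example the sample list would be expected at sampleLists/sl_in/bar-in.xlsx and the spectra at rawdata/bar.da.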

Value

An object of class 'aquap_data' containing a data frame and six slots:

  • dataframe Consists of 'header', 'colRep' and 'NIR'.

  • metadata A list with the metadata of the experiment

  • anproc Possibly a list with an analysis procedure

  • mergeInfo Possibly an object of class 'aquap_mergeLabels'

  • calcVarInfo Possibly a list containing information on calculated variables (see generateMergeLabels), if the dataset is the result of merging other datasets.

  • ncpwl Numeric length one, the number of characters before the wavelength in the column names of the NIR spectra.

  • version A length one character noting the version of the dataset.

Note

The strict regime for the filenames (see Details) may at first seem a bit complicated, but it has proved to be good practice for ensuring a strict and conscious handling of the files.

Import from xlsx

For the raw spectra to be imported from an xlsx file, a few prerequisites have to be fulfilled. It is recommended to look at the file structure of xlsx files generated via export_ap2_ToXlsx and use that as a template.

  • At least two worksheets are required to be in the xlsx file: One contains the data, the other some metadata describing the data.

  • Data Worksheet: The worksheet containing the data can either contain only NIR spectra, or class and numerical variables (what is called the 'header') **and** NIR spectra. (Compare export_ap2_ToXlsx). The data worksheet's name should either end in _data, or it has to be the first worksheet.

  • _meta Worksheet: The name of the worksheet containing the metadata must end in _meta. There can only be one worksheet ending in _meta in the file. In this worksheet, there has to be one row with three columns. The names of the columns have to be ncol_header, rownamesAsFirstColumn and ncpwl.

  • First column in _meta: Provide an integer denoting the number of columns in the header. Provide 0 (zero) if the data only contain NIR spectra.

  • Second column in _meta: Logical, denotes whether there are rownames in the data. Set to TRUE or FALSE.

  • Third column in _meta: Provide an integer denoting the number of characters in front of the wavelength-number. Set to 0 (zero) if there are no characters in front of the wavelengths.

  • Timestamps: Should there be timestamps in the xlsx file, their column name has to be Timestamp, and the format has to be POSIXct in order to be recognized correctly. If these requirements can not be met, it is advised to write a custom import function to import from xlsx files. Please see custom_import for further information.

If there are class and numerical variables present in the xlsx file **and** variables from a sample list are imported as well (so slType is **not** NULL), the sample list must contain a column denoting the sample number. In this case, the sample number and the number of consecutive scans get imported from the sample list file, and it will result in an error to have those variables in the xlsx file as well. Generally, it is not possible to have two variables with the same name. Please look at the files generated via export_ap2_ToXlsx as a reference.
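The two required worksheets can be sketched as base R data frames. The sheet names myExp_data and myExp_meta, the variable names and all values are made up; writing the list to an actual xlsx file would need an additional package and is not shown.

```r
# Header (class and numerical variables) and NIR spectra, made-up values.
hdr <- data.frame(Group = c("ctrl", "treat"), Temp = c(22.1, 22.4))
nir <- data.frame(X1300 = c(0.51, 0.49), X1302 = c(0.52, 0.50))

sheets <- list(
  # data worksheet: name ends in "_data"; header columns first, then NIR
  myExp_data = cbind(hdr, nir),
  # meta worksheet: name ends in "_meta"; exactly one row, three columns
  myExp_meta = data.frame(
    ncol_header           = ncol(hdr),  # 2 header columns (0 if only NIR)
    rownamesAsFirstColumn = FALSE,      # no rownames in the data sheet
    ncpwl                 = 1L          # one character ("X") before wavelengths
  )
)

# Only one worksheet may end in "_meta".
n_meta <- sum(grepl("_meta$", names(sheets)))
```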

Import from Yunosato.dat

It is possible to have all or some of the class- and numeric variables in the .dat file. Whatever is present will be read out, and if an additional sample list is demanded to be imported (parameter slType != NULL) it will be combined. The tab-separated .dat file by the Yunosato Aquaphotomics lab is styled as follows:

  • The first row starts with #D and contains the dimension in columns x rows (e.g. 25x30)

  • The second row starts with #C and contains the column names, with a w preceding the wavelengths, a * preceding the class variables, and a $ preceding the numeric variables. Please consider the standard column names, see printStdColnames.

  • The following rows all start with #S and contain the data, and in the first column there is a string. This string is structured via _, and in its last element there is a timestamp in the format "YYYYMMDDHHMMSS", and in its second last element there are the consecutive scans. All previous elements stay as they are and are used as base for rownames and provided as an extra class variable.
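How the column prefixes and the string in the first column could be decoded is sketched below in base R. The helper parse_sample_string, the example string and the choice of UTC as time zone are assumptions made for illustration only.

```r
# Classify columns by their first character (w, * or $).
col_names <- c("*Group", "$Temp", "w1300", "w1302")
prefix_map <- c("*" = "class", "$" = "numeric", "w" = "wavelength")
col_kind <- unname(prefix_map[substr(col_names, 1, 1)])

# Hypothetical helper: split the string at "_" and pick it apart from the end.
parse_sample_string <- function(x) {
  parts <- strsplit(x, "_", fixed = TRUE)[[1]]
  n <- length(parts)
  list(
    base      = paste(parts[seq_len(n - 2)], collapse = "_"),  # rowname base
    conScan   = as.integer(parts[n - 1]),                      # consecutive scan
    timestamp = as.POSIXct(parts[n], format = "%Y%m%d%H%M%S", tz = "UTC")
  )
}

p <- parse_sample_string("milk_day1_3_20240329073300")
```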

Import from MicroNIR.csv

Designed to read the .csv file as produced by the MicroNIR software from VIAVI. The number of consecutive scans is taken from the name assigned by the MicroNIR software (someSample-1.sam). Decide on one of the following options when providing user input at the sample-ID input in the MicroNIR's GUI:

  • Only Numbers: These numbers have to be unique, and they will be used as the sample number. No sampleID will be produced. In case of a mistake, i.e. a repeated number at a later measurement, just enter a character at some later measurement so that all sampleIDs are forced to be treated as character. The sample numbers will then be auto-generated.

  • Character: Provide any character as sampleID. It should be unique for each sample. In case of a mistake, i.e. a second instance of a sampleID, all instances of this sampleID will be renamed by appending #n with n being the number of the instance, starting with 1 with the first. The consecutive scans will be renumbered to always range from 1 to n for each sample instance.
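The renaming of repeated sampleIDs and the extraction of the scan number from the sample name can be sketched in base R. The IDs are made up and the vector holds one entry per measured sample instance; this is an illustration of the rule above, not the package's own code.

```r
# One sampleID per measured sample instance; "milkA" was entered twice.
ids <- c("milkA", "milkB", "milkA", "milkC")

# Append #n to every instance of a repeated ID, counting from 1.
is_repeated <- ids %in% ids[duplicated(ids)]
instance    <- ave(seq_along(ids), ids, FUN = seq_along)
renamed     <- ifelse(is_repeated, paste0(ids, "#", instance), ids)

# Consecutive scan number taken from a name such as "someSample-1.sam".
scan_nr <- as.integer(sub(".*-([0-9]+)\\.sam$", "\\1", "someSample-1.sam"))
```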

The device temperature, the notes, and the time as given in the MicroNIR file will be imported as well. The DateTime format on your computer determines the format of the timestamp in the MicroNIR file. The input format that aquap2 uses to read this time can be changed via the global settings file (parameter imp_timeFormat_microNir). The instrument's serial number will be stored in the slot instrument in the resulting R-object.

Details on HOBOware file structure

The HOBOware logger file is structured as follows:

  • first row contains a title, second row the column names

  • first column contains rownumber

  • second column contains the timestamp in the format day/month/Year Hour:Minutes:Seconds (24h format, day and month in 2 digits, year in 4 digits). Please note that the time format for importing from HOBOware data loggers can be specified in the global settings file at the key imp_timeFormat_HOBOware.

  • third column contains temperature data

  • fourth column contains relative humidity data

By creating or formatting your own temperature and rel. humidity data export like this, it is possible to use the built-in HOBO import function for importing temperature and rel. humidity data. Please also note the possibility to create a custom import function for the temp. and rel. hum. data, see the input option custom@yourFile.R at the parameter trhLog and custom_TRH.
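A logger file with the structure listed above can be mimicked and read in base R as follows. The title, the column titles, the values and the choice of UTC as time zone are made up for this sketch; only the column order and the timestamp format follow the description.

```r
# Title row, column-name row, then row number, timestamp, temperature, rel. humidity.
hobo_text <- paste(
  "Plot Title: logger1",
  "Nr,Date Time,Temp,RH",
  "1,29/03/2024 07:33:00,22.5,41.2",
  "2,29/03/2024 07:34:00,22.6,41.0",
  sep = "\n"
)

# Skip the title row; the second row is consumed as the header and renamed.
trh <- read.csv(text = hobo_text, skip = 1, header = TRUE,
                col.names = c("row", "Timestamp", "Temp", "RelHum"),
                stringsAsFactors = FALSE)

# day/month/Year Hour:Minutes:Seconds, 24h format
trh$Timestamp <- as.POSIXct(trh$Timestamp, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
```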

See Also

readSpectra, readHeader, aquap_data-methods

Other Core functions: exportSampleList(), gdmm(), plot,aquap_cube,missing-method, plot,aquap_data,missing-method

Examples

## Not run: 
 md <- getmd()
 fd <- getFullData(md)
 fd <- getFullData() # the same as above
 fd <- gfd(getmd(expName="OtherName")) # to override the experiment name specified in 
 # the metadata.r file and load the dataset called 'OtherName' instead. (see ?getmd)
 fd <- gfd(md=getmd("foo.r")) # loads metadata from file 'foo.r'
 fd <- getFullData(filetype="custom@myFunc.r", slType="xls")
 # This would use a custom function to read in the raw spectra, and read in 
 # the class- and numerical variables from an Excel file.
 ## 
 md <- getmd()
 md$meta$expName <- "bar"
 fd <- getFullData(md) # load a rawdata-file called "bar"

## End(Not run)

bpollner/aquap2 documentation built on March 29, 2024, 7:33 a.m.