aquap2: Multivariate Data Analysis Tools for R including Aquaphotomics Methods

getFullData

R Documentation

* Get / Import Spectral Data *

Description

If everything is left at the defaults, the function first tries to load an R-object containing previously imported spectral data. If no such object is found, it imports spectral data from a file in the rawdata-folder, fuses these data (if slType is not NULL) with the class-header provided in the sampleLists/sl_in folder, and saves the resulting dataset. It is also possible to use a user-defined custom function to import data from a file in any format, containing the NIR-spectra as well as all the class- and numerical variables. In the latter case it is still possible to fuse additional variables provided in a file in sampleLists/sl_in with the imported data.

Usage

getFullData(
  md = getmd(),
  filetype = "def",
  slType = "def",
  trhLog = "def",
  multiplyRows = "def",
  ttl = TRUE,
  stf = TRUE,
  naString = "NA",
  dol = "def",
  sh = NULL,
  remDC = getstn()$imp_remDoubleCols,
  rawOnlyNIR = FALSE
)

gfd(
  md = getmd(),
  filetype = "def",
  slType = "def",
  trhLog = "def",
  multiplyRows = "def",
  ttl = TRUE,
  stf = TRUE,
  naString = "NA",
  dol = "def",
  sh = NULL,
  remDC = getstn()$imp_remDoubleCols,
  rawOnlyNIR = FALSE
)

Arguments

`md`	List. The object with the metadata of the experiment. The default is to get the metadata file via `getmd`.
`filetype`	Character. The type of the spectral raw data file. If a value other than "def" is provided, it overrides the value of "filetype" in the metadata file. Possible values are:
	`def`: Gets the default value from the settings.r file (variable 'imp_specFileType').
	`vision_NSAS.da`: Import from the .da file generated by the Vision-software from a Foss-XDS spectroscope.
	`tabDelim.txt`: Import any tab delimited text file that contains only the NIR spectra and no additional columns (such as time or temperature), and that has one character in front of the wavelengths in the column names of the NIR spectra.
	`Pirouette.pir`: Import spectra and any class- or numerical variable directly from a .pir file. Those column-names in the .pir file that match the standard-column names (`printStdColnames`) as defined in the settings.r file will be assigned to those columns automatically.
	`custom@yourFile.R`: It is possible to use a custom import function to import any type of spectra. Use `custom@yourFile.R`, with `yourFile.R` being an .R-file located in the path specified in the .Renviron file. Please refer to `custom_import` for further information.
	`xls`: Import raw spectra from an xlsx file. Please see the section `Import from xlsx` below for further details.
	`YunosatoDatFile.dat`: Import raw spectra from a `.dat` file as styled by the Yunosato Aquaphotomics Lab, Japan. Please see the section `Import from Yunosato.dat` below for further information.
	`MicroNIR.csv`: A comma separated value file (csv) as generated by the MicroNIR software, containing spectra acquired with a MicroNIR device. Please see the section `Import from MicroNIR.csv` below for further information.
`slType`	Character. The type of sample-list file in the sampleLists/sl_in folder. Possible values are:
	`def`: Gets the value from the metadata file (variable `sampleListType`). (variable 'imp_sampleListType')
	`NULL`: By providing 'NULL' to the argument `slType` you indicate that no sample-list should be imported to create the header. This would be the case if you use a custom function to import your spectral data and all the necessary class- and numerical variables are already defined in the same file that holds the spectral data. Please refer to `custom_import` for further information on the requirements for this custom import function. A custom function can also be used to import spectral data and at the same time import additional variables from a sample-list file, by providing one of the values listed below.
	`xls`: An Excel file ending in '.xlsx'.
`trhLog`	If data from a temperature and rel. humidity logger should be imported and aligned to a timestamp in the dataset. Possible values are:
	`def`: Gets the value from the variable `tempHumLog` from the metadata file.
	`FALSE`: No data from a logger-file will be imported.
	`ESPEC`: Import data from a tab delimited .txt file generated by an 'ESPEC' logger. (This is included for historical reasons.)
	`HOBO`: Import data from a file as generated by HOBOware data loggers. When providing `HOBO` to the argument `trhLog`, both .xls and .csv files as created by the HOBOware export software can be read in. Having both .xls and .csv files present results in an error. Please see the details on the structure of the HOBOware files below.
	`custom@yourFile.R`: You can provide your own import function for importing data from any logger, with `yourFile.R` being an .R-file located in the settings-home folder as specified in the .Renviron file. Please refer to `custom_TRH` for further information.
`multiplyRows`	Character or Logical. If the rows in the sample list should be multiplied by the number of consecutive scans as specified in the variable `nrConScans` in the metadata of the experiment. Possible values are:
	`def`: If the argument `multiplyRows` in the function `getFullData` is left at `def`, the value (`TRUE`, `FALSE` or `auto`) from the variable `multiplyRows` from the metadata file is used.
	`auto`: Checks if there is an error column in the sample list. If neither an error column nor a column for consecutive scans is present, or if only the error column is present, the rows in the sample list will be multiplied by the number of consecutive scans as given in the metadata. The values in the error column (if any) will be used to correct the number of consecutive scans for each respective sample. If there is no error column, but a column for consecutive scans is present in the sample list, it will not be multiplied.
	`FALSE`: The sample list will be left as it is. In that case it is the user's responsibility to provide a sample list with the rows correctly multiplied to match the number of consecutive scans in the dataset.
	`TRUE`: Multiplies every row in the sample list by the number of consecutive scans as specified in `nrConScans` in the metadata of the experiment. If values are given in the error column in the sample list, the consecutive scans for each sample will be corrected by this number. Please also refer to `exportSampleList` and the explanation of the argument `multiplyRows` therein.
`ttl`	Logical, 'try to load'. If a possibly existing r-data file should be loaded. From the provided metadata (argument 'md') the experiment name is extracted, and if a file having the same name as the experiment name is found in folder 'R-data' it is loaded. If there is no such file, the spectra and class variables are imported from raw-data, and the whole dataset is saved if argument 'stf' is TRUE. In other words, providing 'FALSE' to argument 'ttl' always imports the spectra from the raw-data.
`stf`	Logical, 'save to file'. If the final dataset should be saved to the 'R-data' folder after import from the raw-data file. Defaults to 'TRUE'.
`naString`	Character. What to use as 'NA'. Applies only when 'filetype' is `tabDelim.txt`.
`dol`	Detect outliers. If outliers should be detected using the flags provided by `RSimca`. If left at the default "def", the value from the settings.r file will be used (parameter `imp_flagOutliers`). If `dol` evaluates to TRUE, an additional column flagging the outliers as detected in the scope of the complete dataset will be added to the dataset.
`sh`	Character length one. Manual path to settings home. Can and should be left at the default `NULL`.
`remDC`	Logical. Takes its factory-fresh default value `TRUE` from the key `imp_remDoubleCols` in the settings.R file. `remDC` defines if columns with identical names should be removed automatically at the time of data import. Double columns can arise from the same column being present in the rawdata file (e.g. as possible in the case when importing from filetype `YunosatoDatFile.dat`) AND in the sample list file. If `remDC` is set to `FALSE`, importing double columns will throw an error and the import will be stopped.
`rawOnlyNIR`	Logical. If class- and numerical variables that got possibly imported from within a raw data file should be discarded. Defaults to `FALSE`. Set to `TRUE` to only import the NIR (and a possible timestamp) from the rawdata file.
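
As a rough illustration of what the argument `multiplyRows` describes, the following sketch repeats the rows of a small sample list by the number of consecutive scans. It is not the code used inside aquap2: the column names `sampleNr`, `group`, `error` and `conScan` are hypothetical, and treating the error value as an offset added to `nrConScans` is an assumption made only for this example.

```r
# Hypothetical sample list; the column names are illustrative, not aquap2 standard names.
sl <- data.frame(
  sampleNr = 1:3,
  group    = c("ctrl", "ctrl", "treat"),
  error    = c(0, -1, 1)  # ASSUMPTION: offset to the number of scans of that sample
)
nrConScans <- 3  # number of consecutive scans, as it would come from the metadata

# Plain multiplication: every row is repeated nrConScans times.
idx_plain <- rep(seq_len(nrow(sl)), each = nrConScans)
sl_plain  <- sl[idx_plain, ]

# With error correction (assumed semantics): scans per sample = nrConScans + error.
scans_per_sample <- nrConScans + sl$error
idx_corr <- rep(seq_len(nrow(sl)), times = scans_per_sample)
sl_corr  <- sl[idx_corr, ]
sl_corr$conScan <- sequence(scans_per_sample)  # running scan number within each sample
rownames(sl_corr) <- NULL
```

With `multiplyRows = FALSE`, a sample list of the shape of `sl_corr` is what the user would have to provide by hand.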

Details

From the metadata, provided in the first argument, the experiment name is extracted, and (if 'ttl' is TRUE) the dataset-file having this name is first looked for in the 'R-data' folder and, if present, loaded. If the file could not be found (or if 'ttl' is FALSE), the spectral file having the same name as the experiment name (plus its specific ending) is imported from the rawdata-folder. The sample list (which is used to create the header) must be in the sampleLists/sl_in folder and must be named with the experiment name, followed by "-in" and then the file extension. To be recognized as such, the standard columns have to be named with the standard column names as defined in the settings.r file (see printStdColnames). If you use a custom function and provide all the class- and numerical variables together with the spectral data, set argument 'slType' to NULL. If you import from a .pir file and have all the class- and numerical variables inside the .pir file, set argument 'slType' to NULL. If the dataset is the result of the fusion of other datasets (see mergeDatasets), the slot 'mergeInfo' will contain further information.
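
The naming scheme described above can be sketched as a small helper that builds the locations expected for a given experiment name. The function `expected_files` is hypothetical and not part of aquap2. The folder names are taken from the text above, while the file extensions are placeholders that depend on `filetype` and `slType`.

```r
# Hypothetical helper, NOT part of aquap2: only illustrates the naming scheme.
expected_files <- function(expName, rawExt = ".txt", slExt = ".xlsx") {
  c(
    # previously saved dataset: same name as the experiment, in folder 'R-data'
    rdata      = file.path("R-data", expName),
    # raw spectra: experiment name plus the ending specific to the file type
    rawdata    = file.path("rawdata", paste0(expName, rawExt)),
    # sample list: experiment name, then "-in", then the file extension
    sampleList = file.path("sampleLists", "sl_in", paste0(expName, "-in", slExt))
  )
}

expected_files("bar")
```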

Value

An object of class 'aquap_data' containing a data frame and six slots:

dataframe Consists of 'header', 'colRep' and 'NIR'.
metadata A list with the metadata of the experiment
anproc Possibly a list with an analysis procedure
mergeInfo Possibly an object of class 'aquap_mergeLabels' (generateMergeLabels), if the dataset is the result of merging other datasets.
calcVarInfo Possibly a list containing information on calculated variables.
ncpwl Numeric length one, the number of characters before the wavelength in the column names of the NIR spectra.
version A length one character noting the version of the dataset.

Note

The strict regime with the filenames (see Details) may at first seem a bit complicated, but it has proved to be good practice to ensure a strict and conscious handling of the files.

Import from xlsx

For the raw spectra to be imported from an xlsx file, a few prerequisites have to be fulfilled. It is recommended to look at the file structure of xlsx files generated via export_ap2_ToXlsx and use that as a template.

At least two worksheets are required to be in the xlsx file: One contains the data, the other some metadata describing the data.
Data Worksheet: The worksheet containing the data can either contain only NIR spectra, or class and numerical variables (what is called the 'header') **and** NIR spectra. (Compare export_ap2_ToXlsx). The data worksheet's name should either end in _data, or it has to be the first worksheet.
_meta Worksheet: The name of the worksheet containing the metadata must end in _meta. There can only be one worksheet ending in _meta in the file. In this worksheet, there has to be one row with three columns. The names of the columns have to be ncol_header, rownamesAsFirstColumn and ncpwl.
First column in _meta: Provide an integer denoting the number of columns in the header. Provide 0 (zero) if the data only contain NIR spectra.
Second column in _meta: Logical, denotes whether there are rownames in the data. Set to TRUE or FALSE.
Third column in _meta: Provide an integer denoting the number of characters in front of the wavelength-number. Set to 0 (zero) if there are no characters in front of the wavelengths.
Timestamps: Should there be timestamps in the xlsx file, their column name has to be Timestamp, and the format has to be POSIXct in order to be recognized correctly. If these requirements cannot be met, it is advised to write a custom import function to import from xlsx files. Please see custom_import for further information.

If there are class and numerical variables present in the xlsx file **and** variables from a sample list are imported as well (so slType is **not** NULL), the sample list must contain a column denoting the sample number. In this case, the sample number and the number of consecutive scans get imported from the sample list file, and it will result in an error to have those variables in the xlsx file as well. Generally, it is not possible to have two variables with the same name. Please look at the files generated via export_ap2_ToXlsx as a reference.
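
A minimal sketch of the two worksheets described in this section, built as plain data frames. The worksheet names `myExp_data` and `myExp_meta`, the wavelength columns and all values are made up for illustration. Files generated via export_ap2_ToXlsx remain the authoritative template.

```r
# Data worksheet: here only NIR spectra, with one character ("X") in front of
# each wavelength in the column names. Values are made up.
exp_data <- data.frame(X1300 = c(0.51, 0.52), X1302 = c(0.53, 0.54))

# _meta worksheet: exactly one row with the three required columns.
exp_meta <- data.frame(
  ncol_header           = 0L,     # no header columns, the data contain only NIR spectra
  rownamesAsFirstColumn = FALSE,  # the first column of the data does not hold rownames
  ncpwl                 = 1L      # one character before the wavelength ("X")
)

# One list element per worksheet; the names follow the _data / _meta convention.
sheets <- list(myExp_data = exp_data, myExp_meta = exp_meta)

# Writing the file needs an xlsx writer, e.g. (assuming package 'openxlsx' is installed):
# openxlsx::write.xlsx(sheets, "myExp.xlsx")
```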

Import from Yunosato.dat

It is possible to have all or some of the class- and numeric variables in the .dat file. Whatever is present will be read out, and if an additional sample list is demanded to be imported (parameter slType != NULL) it will be combined. The tab-separated .dat file by the Yunosato Aquaphotomics lab is styled as follows:

The first row starts with #D and contains the dimension in columns x rows (e.g. 25x30)
The second row starts with #C and contains the column names, with a w preceding the wavelengths, a * preceding the class variables, and a $ preceding the numeric variables. Please consider the standard column names, see printStdColnames.
The following rows all start with #S and contain the data, and in the first column there is a string. This string is structured via _, and in its last element there is a timestamp in the format "YYYYMMDDHHMMSS", and in its second last element there are the consecutive scans. All previous elements stay as they are and are used as base for rownames and provided as an extra class variable.
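
The structure of the string in the first column can be illustrated with a short parser. This is only a sketch: the function `parse_dat_id` and the example string are hypothetical, and the string is assumed to be passed in without the leading #S marker. It is not the parser used by aquap2.

```r
# Hypothetical helper, NOT part of aquap2: splits one ID string of a #S row.
parse_dat_id <- function(id) {
  parts <- strsplit(id, "_", fixed = TRUE)[[1]]
  n <- length(parts)
  stopifnot(n >= 3)  # at least: base, consecutive scan, timestamp
  list(
    # all elements before the last two stay as they are
    base      = paste(parts[seq_len(n - 2)], collapse = "_"),
    # second last element: the consecutive scan
    conScan   = as.integer(parts[n - 1]),
    # last element: timestamp in the format YYYYMMDDHHMMSS (time zone assumed to be UTC)
    timestamp = as.POSIXct(parts[n], format = "%Y%m%d%H%M%S", tz = "UTC")
  )
}

res <- parse_dat_id("milk_A_3_20240629172100")
```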

Import from MicroNIR.csv

Designed to read the .csv file as produced by the MicroNIR software from VIAVI. The number of consecutive scans is taken from the name assigned by the MicroNIR software (someSample-1.sam). Choose one of the following options when entering the sample ID in the MicroNIR's GUI:

Only Numbers: These numbers have to be unique, and they will be used as the sample number. No sampleID will be produced. In case of a mistake, i.e. a repeated number at a later measurement, enter a character at one of the following measurements so that all sampleIDs are forced to be treated as character. Then the sample numbers will be auto-generated.
Character: Provide any character as sampleID. It should be unique for each sample. In case of a mistake, i.e. a second instance of a sampleID, all instances of this sampleID will be renamed by appending #n, with n being the number of the instance, starting with 1 for the first. The consecutive scans will be renumbered to always range from 1 to n for each sample instance.

The device temperature will be imported, as well as the notes and the time as given in the MicroNIR file. The DateTime format on your computer determines the format of the timestamp in the MicroNIR file. aquap2's input format to read this time can be changed via the global settings file (parameter imp_timeFormat_microNir). The instrument's serial number will be stored in the slot instrument in the resulting R-object.

Details on HOBOware file structure

The HOBOware logger file is structured as follows:

first row contains a title, second row the column names
first column contains rownumber
second column contains the timestamp in the format day/month/Year Hour:Minutes:Seconds (24h format, day and month in 2 digits, year in 4 digits). Please note that the time format for importing from HOBOware data loggers can be specified in the global settings file at the key imp_timeFormat_HOBOware.
third column contains temperature data
fourth column contains relative humidity data

By creating or formatting your own temperature and rel. humidity data export like this, it is possible to use the built-in HOBO import function for importing temperature and rel. humidity data. Please also note the possibility to create a custom import function for the temp. and rel.hum. data, see the input option custom@yourFile.R at the parameter trhLog and custom_TRH.
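
To make the layout above concrete, the following sketch reads a small inline example with base R. The title line, the column names and the values are made up, and the time format string is only an assumption matching the day/month/Year description above. The format actually used by aquap2 is the one set at the key imp_timeFormat_HOBOware.

```r
# Made-up logger export: title in row 1, column names in row 2, then the data.
hobo_txt <- 'Plot Title: demo
"#","Date Time","Temp","RH"
1,29/06/2024 17:21:00,22.5,45.1
2,29/06/2024 17:22:00,22.6,44.9'

# Skip the title row; the second row is consumed as header and then renamed.
trh <- read.csv(text = hobo_txt, skip = 1, header = TRUE,
                col.names = c("rowNr", "Timestamp", "Temp", "RelHum"))

# day/month/Year Hour:Minutes:Seconds in 24h format (time zone assumed to be UTC)
trh$Timestamp <- as.POSIXct(trh$Timestamp, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
```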

Examples

## Not run: 
 md <- getmd()
 fd <- getFullData(md)
 fd <- getFullData() # the same as above
 fd <- gfd(getmd(expName="OtherName")) # to override the experiment name specified in 
 # the metadata.r file and load the dataset called 'OtherName' instead. (see ?getmd)
 fd <- gfd(md=getmd("foo.r")) # loads metadata from file 'foo.r'
 fd <- getFullData(filetype="custom@myFunc.r", slType="xls")
 # This would use a custom function to read in the raw spectra, and read in 
 # the class- and numerical variables from an Excel file.
 ## 
 md <- getmd()
 md$meta$expName <- "bar"
 fd <- getFullData(md) # load a rawdata-file called "bar"

## End(Not run)

bpollner/aquap2 documentation built on June 29, 2024, 5:21 p.m.
