file: Convert RI files from text to binary format and vice versa

text2binR Documentation

Convert RI files from text to binary format and vice versa

Description

This function converts a list of RI files (also known as peak list files) in text or binary format to binary or text format.

Usage

    text2bin(in.files, out.files=NULL, columns=NULL)

    bin2text(in.files, out.files=NULL)

Arguments

in.files

A character vector of file paths to the input RI files.

out.files

A character vector of file paths. If NULL, the input file extensions will be changed accordingly ("txt" to "dat" and vice versa).

columns

Either a numeric vector with the positions of the columns for SPECTRUM, RETENTION_TIME_INDEX, and RETENTION_TIME; or a character vector with the respective names of those columns. If NULL, then default (hard-coded) values will be used. If an integer vector, the column position must start at zero. In addition, the column names can be set by a global option (see details below).

Details

These functions transform a list of RI files from and to binary to text representation. The format of the input files is detected dynamically and an error will be issued on invalid files.

Transforming a binary file to text might be useful if you need to inspect what a RI file looks in the inside (for example, you need to check that the peak detection was correct). On the other hand, a text file to binary is highly recommended as it is faster to parse than a text file.

For text files, the order of the columns is important (see option columns above). The first entry is the spectrum list, followed by the retention time index and the retention time. If the column names are other than SPECTRUM, RETENTION_TIME_INDEX, and RETENTION_TIME, use the respective column names or the column names positions starting at zero (first column is zero, second is one, and so on).

Many functions relay on those column names and having to pass them as arguments on each function is tedious, so the global option TS_RI_columns can be set at the beginning, for example:

        # using column names
        options(TS_RI_columns=c('spec_column', 'RI_column', 'RT_column'))

        # using column indices (zero-based!)
        options(TS_RI_columns=c(1, 2, 0))
    

where "spec_column", "RI_column", and "RT_columns" are the names of the spectrum, retention index and retention time columns.

This command is useful if your RI files were generated by another software. However, it is highly recommended to simple convert those custom RI files into TargetSearch's binary format and do not worry about column names.

Value

A character vector of the created files paths or invisible.

File Format

The so-called RI files contain lists of m/z peaks detected for every ion trace measured in the samples. Historically, the file format was a simple tab-delimited text file in the format described below. Note that the column order could differ and additional columns could be present, but they are ignored.

RETENTION_TIME SPECTRUM RETENTION_TIME_INDEX
212.46 250:26 256:26 316:27 221029.7
212.51 114:46 162:30 251:27 221081.3
212.56 319:25 221132.9
212.61 95:38 108:30 262:32 266:27 292:25 221184.5

The retention time is usually represented in seconds, while the retention time in arbitrary units, which depends on the retention time correction standard method (in the table above it is in milliseconds, but other units can be used).

The spectrum column is represented by pairs of m/z and raw intensity (peak height), similarly as the representation of a metabolite library (see ImportLibrary). Thus each pair correspond to a peak of the respective ion trace.

The disadvantage of using text files is they are slow to parse, so a binary format was created which represents the peak data as binary vectors so they are fast to parse. These files contain the extension dat.

Note

Beware that the respective tsSample object may need to be updated by using the method fileFormat.

Author(s)

Alvaro Cuadros-Inostroza

See Also

ImportSamples, tsSample, RIcorrect

Examples

    require(TargetSearchData)
    # take three example files from package TargetSearchData
    in.files <- tsd_rifiles()[1:3]
    # out files to current directory
    out.files <- sub(".txt", ".dat", basename(in.files))
    # convert to binary format
    res <- text2bin(in.files, out.files)
    stopifnot(res == out.files)

    # convert back to text
    res <- bin2text(out.files)
    stopifnot(res == basename(in.files))

    # Demonstrate how to use the `columns` option
    # make dummy RI file with arbitrary column names and save it
    tmp <- data.frame(RT=c(101.5,102.5), SPEC=c('12:100 23:100', '114:46 162:30'), RI=c(300, 400) + .75)

    # file must be tab-delimited, unquoted strings and no row names
    RI_test <- tempfile(fileext=".txt")
    write.table(tmp, file=RI_test, sep="\t", quote=FALSE, row.names=FALSE)

    # convert this text file to binary format
    ## wrong! It fails because of invalid columns
    # text2bin(RI_test)

    # correct! The columns are correct
    text2bin(RI_test, columns=c('SPEC', 'RI', 'RT'))

    # same example but using integers (not recommended)
    text2bin(RI_test, columns=c(1, 2, 0)) # note they start from zero.

    # Alternative, set a global option (so it can be used in a session)
    opt <- options(TS_RI_columns=c('SPEC', 'RI', 'RT'))
    text2bin(RI_test)

    # or using integers (again, not recommended)
    options(TS_RI_columns=c(1, 2, 0))
    text2bin(RI_test)

    # unset options
    options(opt)

acinostroza/TargetSearch documentation built on April 3, 2024, 8:09 p.m.