rankEN: Rank compounds via the Elastic Net path


View source: R/RankEN.R

Description

Returns identifying information for the compounds, ordered by the point along the Elastic Net path at which each compound's regression coefficient first becomes nonzero.

Usage

rankEN(msObj, bioact, region_ms = NULL, region_bio = NULL, lambda,
  pos_only = TRUE, ncomp = NULL)

Arguments

msObj

An object of class msDat containing mass spectrometry abundance data and identifying information. Note that this includes objects created by the functions binMS, filterMS, and msDat.

bioact

Either a numeric vector or matrix, or a data frame providing bioactivity data. If a numeric vector, then it is assumed that each entry corresponds to a particular fraction. If the data is 2-dimensional, then it is assumed that each column corresponds to a particular fraction, and that each row corresponds to a particular bioactivity replicate.
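To make these shape conventions concrete, here is a minimal base-R sketch of how such input might be reduced to one value per fraction, assuming replicates are simply averaged column by column as described under Details. The helper name avg_bioact is hypothetical and is not part of PepSAVIms.

```r
# Hypothetical helper (not part of PepSAVIms): reduce bioactivity data to
# one value per fraction.
avg_bioact <- function(bioact) {
  if (is.null(dim(bioact))) {
    # Numeric vector: each entry already corresponds to one fraction
    return(as.numeric(bioact))
  }
  # Matrix or data frame: rows are replicates, columns are fractions,
  # so average the within-fraction replicates column by column
  colMeans(as.matrix(bioact))
}

# Two replicates measured on three fractions
bio <- rbind(c(1, 2, 3),
             c(3, 4, 5))
avg_bioact(bio)
# [1] 2 3 4
```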

region_ms

Either NULL, or a vector either of mode character or mode numeric providing information specifying which fractions from the mass spectrometry abundance data are to be included in the data analysis. If NULL, then it is assumed that the entirety of the mass spectrometry abundance data encapsulated in the argument to msObj is to be included in the analysis. If numeric, then the entries should provide the indices for the region of interest in the mass spectrometry data (i.e. the indices of the columns corresponding to the appropriate fractions in the data). If character, then the entries should uniquely specify the region of interest through partial string matching (i.e. the names of the columns corresponding to the appropriate fractions in the data). The methods dim, dimnames, and colnamesMS can be used as interfaces to the mass spectrometry data encapsulated in msObj.
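The three accepted forms of a region specification (NULL, numeric, character) can be illustrated with a short sketch that converts each one into column indices. This is an assumption-laden illustration: the helper select_region is hypothetical, and it treats "partial string matching" as substring matching, which may not be the exact rule the package uses. The same logic applies to region_bio.

```r
# Hypothetical helper (not part of PepSAVIms): turn a region specification
# into column indices. Assumes "partial string matching" means substring
# matching; the package's exact rule may differ.
select_region <- function(col_names, region = NULL) {
  # NULL: use every fraction
  if (is.null(region)) return(seq_along(col_names))
  # Numeric: entries are already column indices
  if (is.numeric(region)) return(as.integer(region))
  # Character: each entry must pick out exactly one column name
  vapply(region, function(pattern) {
    hits <- grep(pattern, col_names, fixed = TRUE)
    if (length(hits) != 1L) {
      stop("'", pattern, "' does not match exactly one column")
    }
    hits
  }, integer(1), USE.NAMES = FALSE)
}

fractions <- c("frac_20", "frac_21", "frac_22", "frac_23")
select_region(fractions, c("_21", "_22"))  # columns 2 and 3
# select_region(fractions, "_2") would stop with an error, because "_2"
# matches all four column names and so does not specify a unique column.
```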

region_bio

Either NULL, or a vector either of mode character or mode numeric providing information specifying which fractions from the bioactivity data are to be included in the data analysis. If NULL, then it is assumed that the entirety of the bioactivity data provided as the argument to bioact is to be included in the analysis. If numeric, then the entries should provide the indices for the region of interest in the bioactivity data (i.e. the indices of the columns corresponding to the appropriate fractions in the data). If character, then the entries should uniquely specify the region of interest through partial string matching (i.e. the names of the columns corresponding to the appropriate fractions in the data).

lambda

A single nonnegative numeric value giving the weight λ on the quadratic (ridge) penalty term of the elastic net model. The elastic net fits the least squares model with penalty function

$$\gamma \lVert \beta \rVert_1 + \lambda \lVert \beta \rVert_2^2$$

where β is the vector of regression coefficients and γ, λ ≥ 0. Holding λ fixed at the supplied value, rankEN constructs a list of candidate compounds by tracking the entrance of compounds into the elastic net model as γ is decreased from ∞ to 0.
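As a worked illustration of the penalty above, the following sketch evaluates the penalized least-squares objective for a given β. It is written in the naive form shown here; the function name enet_objective is hypothetical, and the elasticnet package may additionally standardize the predictors and rescale the coefficients, which this sketch ignores.

```r
# Penalized least-squares objective in the (naive) form written above.
# Assumption: the elasticnet package may also standardize X and rescale
# the coefficients, which this sketch ignores.
enet_objective <- function(X, y, beta, gamma, lambda) {
  rss <- sum((y - X %*% beta)^2)    # least-squares loss
  l1  <- gamma * sum(abs(beta))     # gamma * ||beta||_1
  l2  <- lambda * sum(beta^2)       # lambda * ||beta||_2^2
  rss + l1 + l2
}

X <- diag(2)                        # two observations, two predictors
y <- c(1, 2)
enet_objective(X, y, beta = c(1, 2), gamma = 0.5, lambda = 0.1)
# [1] 2
```

For γ large enough the L1 term dominates and the minimizer is β = 0; as γ shrinks toward 0, coefficients typically become nonzero one after another, and that order of entry is what rankEN records.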

pos_only

Either TRUE or FALSE; specifies whether the list of candidate compounds that the algorithm produces should include only those compounds that are positively correlated with bioactivity levels, or all compounds. The correlation is calculated using only observations from the region of interest, and when bioactivity replicates are present, the within-fraction replicates are averaged prior to calculation.
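The correlation filter can be sketched in a few lines of base R: average the bioactivity replicates within each fraction, correlate each compound's abundances with the averaged bioactivity, and keep compounds with a positive correlation. The helper compound_cor is hypothetical, and it assumes that rows of the abundance matrix are compounds and columns are fractions (as described for region_ms) and that Pearson correlation is used.

```r
# Hypothetical helper (not part of PepSAVIms): correlation of each compound
# with bioactivity over the region of interest. Assumes rows of ms_mat are
# compounds and columns are fractions, and uses Pearson correlation.
compound_cor <- function(ms_mat, bioact) {
  # Average within-fraction replicates (rows) when bioact is 2-dimensional
  bio <- if (is.null(dim(bioact))) as.numeric(bioact) else colMeans(as.matrix(bioact))
  # One correlation per compound (row)
  apply(ms_mat, 1, function(abund) cor(abund, bio))
}

ms_mat <- rbind(c(1, 2, 3),    # abundance rises across the fractions
                c(3, 2, 1))    # abundance falls across the fractions
bio_reps <- rbind(c(1, 2, 3),
                  c(1, 2, 5))  # replicate averages: 1, 2, 4
r <- compound_cor(ms_mat, bio_reps)
which(r > 0)                   # compounds that pos_only = TRUE would keep
# [1] 1
```

A compound whose abundance is constant over the region has zero variance, so cor returns NA (with a warning) for it; how rankEN itself handles that case is not stated here.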

ncomp

Either NULL, or a numeric value no less than 1 specifying the maximum number of candidate compounds that the function should report. When NULL, this is taken to mean that all compounds that enter the model should be reported, possibly after removing compounds nonpositively correlated with bioactivity levels, as specified by pos_only.

Details

rankEN prepares the data by extracting the region of interest from the mass spectrometry abundance data and from the bioactivity data. If bioactivity replicates are present, then the within-fraction replicates are averaged. Once the data have been converted into the appropriate form, an elastic net model is fitted by invoking the enet function from the elasticnet package, and an ordered list of candidate compounds is constructed such that compounds are ranked by the order in which they first enter the model. The list may then be filtered and/or pruned before being returned to the user, as determined by the arguments to pos_only and ncomp.
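The ranking, filtering, and pruning steps can be sketched independently of the model fit. The sketch below assumes a coefficient-path matrix with one row per step along the path (γ decreasing) and one column per compound; that layout, and the function name rank_by_entry, are assumptions for illustration rather than the package's internal code.

```r
# Hypothetical sketch (not PepSAVIms code): rank compounds by their first
# entry into the model along a coefficient path. Assumes `path` has one row
# per step of the path (gamma decreasing) and one column per compound.
rank_by_entry <- function(path, comp_cor, pos_only = TRUE, ncomp = NULL) {
  # Step at which each coefficient first becomes nonzero (NA if never)
  first_step <- apply(path != 0, 2, function(nz) {
    if (any(nz)) which(nz)[1] else NA_integer_
  })
  entered <- which(!is.na(first_step))
  # Order compounds that entered by entry step (ties keep column order)
  ranked <- entered[order(first_step[entered])]
  # pos_only: drop compounds nonpositively correlated with bioactivity
  if (pos_only) ranked <- ranked[comp_cor[ranked] > 0]
  # ncomp: keep at most the first ncomp candidates
  if (!is.null(ncomp)) ranked <- head(ranked, ncomp)
  ranked
}

path <- rbind(c(0.0, 0.0, 0, 0.0),
              c(0.0, 0.5, 0, 0.0),
              c(0.2, 0.7, 0, 0.0),
              c(0.3, 0.8, 0, -0.1))
comp_cor <- c(0.4, 0.9, 0.1, -0.3)
rank_by_entry(path, comp_cor)                     # [1] 2 1
rank_by_entry(path, comp_cor, pos_only = FALSE)   # [1] 2 1 4
```

Compound 3 never enters the model, so it is excluded even though it is positively correlated; compound 4 enters last but is dropped when pos_only = TRUE. Tracking the first nonzero step also handles coefficients that later leave the model and re-enter.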

Value

Returns an object of class rankEN. This object is a list with the elements described below. The class is equipped with print, summary, and extract_ranked methods.

mtoz

A vector providing the mass-to-charge values of the candidate compounds, such that the k-th element of the vector provides the mass-to-charge value of the k-th compound to enter the elastic net model, possibly after removing compounds nonpositively correlated with bioactivity levels.

charge

A vector providing the charge state of the candidate compounds, such that the k-th element of the vector provides the charge state of the k-th compound to enter the elastic net model, possibly after removing compounds nonpositively correlated with bioactivity levels.

comp_cor

A vector providing the correlation between each of the candidate compounds and the bioactivity levels, such that the k-th element of the vector provides the correlation between the k-th compound to enter the elastic net model and the bioactivity levels, possibly after removing compounds nonpositively correlated with bioactivity levels.

enet_fit

The fitted model object produced by rankEN's internal invocation of the enet function from the elasticnet package.

summ_info

A list containing information related to the data used to fit the elastic net model; used by the summary function.
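Because the k-th entries of mtoz, charge, and comp_cor all refer to the k-th ranked compound, the three vectors line up row by row. The sketch below shows that alignment by indexing per-compound information with a ranked index vector. It is a hypothetical illustration using toy values, not the package's extract_ranked, whose exact output columns are not described here.

```r
# Hypothetical sketch (not the package's extract_ranked): the k-th entry of
# mtoz, charge and comp_cor all refer to the k-th ranked compound, so they
# can be gathered row-wise into a table.
ranked_table <- function(ranked, mtoz_all, charge_all, cor_all) {
  data.frame(rank     = seq_along(ranked),
             mtoz     = mtoz_all[ranked],
             charge   = charge_all[ranked],
             comp_cor = cor_all[ranked])
}

# Toy values: compound 2 entered the model first, then compound 1
ranked_table(ranked     = c(2, 1),
             mtoz_all   = c(100.1, 200.2, 300.3),
             charge_all = c(1, 2, 3),
             cor_all    = c(0.4, 0.9, 0.1))
```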

Examples

# Load mass spectrometry data
data(mass_spec)

# Convert mass_spec from a data.frame to an msDat object
ms <- msDat(mass_spec = mass_spec,
            mtoz = "m/z",
            charge = "Charge",
            ms_inten = c(paste0("_", 11:43), "_47"))

# Load growth inhibition bioactivity data.  Each element in bioact is a
# stand-alone dataset for a species of virus or bacteria.
data(bioact)

# Perform the candidate ranking procedure with fractions 21-24 as the region
# of interest.  Note that it is not advisable to calculate the elastic net
# estimates with 30,799 candidate compounds on 4 data points!

## Not run: 

    rank_out <- rankEN(msObj = ms,
                       bioact = bioact$ec,
                       region_ms = paste0("_", 21:24),
                       region_bio = paste0("_", 21:24),
                       lambda = 0.001,
                       pos_only = TRUE,
                       ncomp = NULL)

    # print and summary methods
    rank_out
    summary(rank_out)

    # Extract ranked compounds as a data.frame
    ranked_candidates <- extract_ranked(rank_out)


## End(Not run)

dpritchLibre/PepSAVIms documentation built on Oct. 1, 2017, 4:14 a.m.