Sel.Features: Gene (Feature) Selection.

Description

Sel.Features selects the most discriminative genes (features) among the given ones.

Usage

Sel.Features(ES, Y, K = "Min", Verbose = FALSE)

Arguments

ES

gene (feature) matrix of dimension P by N, where P is the number of genes and N is the number of samples (observations).

Y

a vector of length N giving the class label of each sample.

K

the number of genes to be selected. The default, "Min", gives the minimum subset of genes that correctly classifies the maximum number of the given tissue samples (observations). Alternatively, K should be a positive integer.

Verbose

logical. If TRUE, more information about the selected genes is returned.
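The argument shapes described above can be checked before calling the function. The following is a minimal sketch on made-up data; `check_inputs()` is a hypothetical helper written for this illustration and is not part of propOverlap, and the two-class check is an assumption drawn from the Details section.

```r
# Toy data: P = 4 genes (rows) by N = 6 samples (columns); the values are made up.
set.seed(1)
ES <- matrix(rnorm(4 * 6), nrow = 4, ncol = 6,
             dimnames = list(paste("gene", 1:4), paste("sample", 1:6)))
Y  <- c(1, 1, 1, 2, 2, 2)   # one class label per sample (column of ES)

# check_inputs() is a hypothetical helper, not a propOverlap function.
check_inputs <- function(ES, Y, K = "Min") {
  stopifnot(is.matrix(ES), is.numeric(ES))   # genes in rows, samples in columns
  stopifnot(length(Y) == ncol(ES))           # Y must have length N
  stopifnot(length(unique(Y)) == 2)          # assumed: binary classification
  # K is either the string "Min" or a single positive integer
  k_ok <- identical(K, "Min") ||
    (is.numeric(K) && length(K) == 1 && K >= 1 && K == round(K))
  stopifnot(k_ok)
  invisible(TRUE)
}

check_inputs(ES, Y)          # default K = "Min"
check_inputs(ES, Y, K = 2)   # a positive integer K
```

A check like this catches the most common mistake, passing a samples-by-genes matrix, because the length of Y then no longer matches the number of columns.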

Details

Sel.Features selects the most relevant genes (features) in high-dimensional binary classification problems. Discriminative genes are identified by analyzing the overlap between their expression values across the two classes, using the “POS” technique. For each gene, a proportional overlapping score (POS) measures the degree of overlap while avoiding the effect of outliers. Each gene is also described by a robust mask that represents its discriminative power. The masks, together with the gene scores, are used to produce the selected subset of genes.
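To make the idea of overlap scores and masks concrete, here is a deliberately simplified sketch. It is not the POS formula or the selection procedure implemented in propOverlap: the Tukey-fence intervals, the length-ratio score, and the greedy covering loop are all assumptions chosen for illustration, and `toy_overlap()` and `toy_min_subset()` are hypothetical names.

```r
# Simplified illustration only: NOT the POS formula used by propOverlap.
# Assumed robust interval per class: Tukey fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR).
fence <- function(v) {
  q <- quantile(v, c(0.25, 0.75), names = FALSE)
  c(q[1] - 1.5 * diff(q), q[2] + 1.5 * diff(q))
}

# toy_overlap() is hypothetical: returns an overlap score and a sample mask for one gene.
toy_overlap <- function(x, y) {
  cls <- sort(unique(y))
  stopifnot(length(cls) == 2)
  iv1 <- fence(x[y == cls[1]])
  iv2 <- fence(x[y == cls[2]])
  lo <- max(iv1[1], iv2[1])               # overlap region = intersection of intervals
  hi <- min(iv1[2], iv2[2])
  total <- max(iv1[2], iv2[2]) - min(iv1[1], iv2[1])
  score <- if (hi > lo && total > 0) (hi - lo) / total else 0
  in1 <- x >= iv1[1] & x <= iv1[2]
  in2 <- x >= iv2[1] & x <= iv2[2]
  own   <- ifelse(y == cls[1], in1, in2)  # sample lies in its own class interval
  other <- ifelse(y == cls[1], in2, in1)  # sample lies in the other class interval
  list(score = score, mask = own & !other)
}

# toy_min_subset() is hypothetical: visit genes from lowest to highest score and
# keep a gene only if its mask covers at least one sample not yet covered.
toy_min_subset <- function(ES, y) {
  res    <- lapply(seq_len(nrow(ES)), function(i) toy_overlap(ES[i, ], y))
  scores <- vapply(res, function(r) r$score, numeric(1))
  masks  <- do.call(rbind, lapply(res, function(r) r$mask))
  covered  <- rep(FALSE, ncol(ES))
  selected <- integer(0)
  for (i in order(scores)) {
    if (any(masks[i, ] & !covered)) {
      selected <- c(selected, i)
      covered  <- covered | masks[i, ]
    }
  }
  list(selected = selected, covered = which(covered))
}

y   <- rep(1:2, each = 4)
ES  <- rbind(c(1, 2, 3, 4, 10, 11, 12, 13),   # well separated between classes
             c(1, 2, 3, 4,  3,  4,  5,  6))   # heavily overlapping
out <- toy_min_subset(ES, y)
out$selected
out$covered
```

In this toy data the first gene has no overlap between its class intervals, so it alone covers every sample and the second gene adds nothing. The real method uses its own definitions of intervals, scores, and masks; see the reference below.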

Value

If K is specified as ‘Min’ (the default), a list containing the following components is returned:

Features

A matrix of the indices of selected genes with their POS measures. See POS.

Covered.Obs

A vector showing the indices of the observations that have been covered by the returned minimum subset of genes.

If K is specified as a positive integer, a list containing the following components is returned:

Features

A vector of the indices of the selected genes.

nMin.Features

The number of genes included in the minimum subset.

Measures

Available only when Verbose is TRUE. An object of class “data.frame” with three columns: the indices of the selected genes; their POS measures (see POS); and a status reporting the basis on which each gene was selected. The status is “Min.Set” if the gene is a member of the selected minimum subset, 1 if the gene has a low POS score and its relative dominant class is the first class, or 2 if the gene has a low POS score and its relative dominant class is the second class (see RDC).
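As a rough sketch of how the Measures component might be used, the block below builds a mock data frame from the values printed in the example output on this page and splits the genes by status. The column names follow that printed output; storing Status as character is an assumption, since the actual column type is not stated here.

```r
# Mock of the Measures component, copied from the example output on this page.
# In practice it would come from Sel.Features(..., K = 10, Verbose = TRUE)$Measures.
Measures <- data.frame(
  Features = c(4847, 15, 760, 38, 1092, 48, 1798, 92, 1882, 100),
  Pos      = rep(0, 10),
  Status   = c("Min.Set", "Min.Set", "1", "2", "1", "2", "1", "2", "1", "2"),
  stringsAsFactors = FALSE   # assumption: Status handled as character
)

# Genes that form the minimum subset
in_min  <- Measures$Status == "Min.Set"
min_set <- Measures$Features[in_min]

# Remaining genes, grouped by their relative dominant class (1 or 2)
by_class <- split(Measures$Features[!in_min], Measures$Status[!in_min])

min_set
by_class
```

Grouping by status makes it easy to see whether the genes added beyond the minimum subset are balanced between the two classes.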

Note

Verbose is only needed when K is specified as a positive integer. If K is set to “Min” (the default), all information is returned automatically.

Author(s)

Osama Mahmoud ofamah@essex.ac.uk

References

Mahmoud O., Harrison A., Perperoglou A., Gul A., Khan Z., Metodiev M. and Lausen B. (2014) A feature selection method for classification within functional genomics experiments based on the proportional overlapping score. BMC Bioinformatics, 15:274.

See Also

POS for calculating the proportional overlapping scores and RDC for assigning the relative dominant class.

Examples

library(propOverlap)
data(leukaemia)
GenesExpression <- leukaemia[1:7129, ]  # features matrix: genes in rows, samples in columns
Class           <- leukaemia[7130, ]    # class label of each observation

## select the minimum subset of features
Selection <- Sel.Features(GenesExpression, Class)
attributes(Selection)
(Candidates <- Selection$Features)               # the selected features with their POS measures
(Covered.observations <- Selection$Covered.Obs)  # the observations covered by the selection

## select a specific number of features
Selection.k <- Sel.Features(GenesExpression, Class, K = 10, Verbose = TRUE)
Selection.k$Features        # indices of the selected features
Selection.k$nMin.Features   # size of the minimum subset of genes
Selection.k$Measures        # information on the selected features

Example output

$names
[1] "Features"    "Covered.Obs"

          Feature Pos
gene 4847    4847   0
gene 15        15   0
 sample 1  sample 2  sample 3  sample 4  sample 5  sample 6  sample 7  sample 8 
        1         2         3         4         5         6         7         8 
 sample 9 sample 10 sample 11 sample 12 sample 13 sample 14 sample 15 sample 16 
        9        10        11        12        13        14        15        16 
sample 17 sample 18 sample 19 sample 20 sample 21 sample 22 sample 23 sample 24 
       17        18        19        20        21        22        23        24 
sample 25 sample 26 sample 27 sample 28 sample 29 sample 30 sample 31 sample 32 
       25        26        27        28        29        30        31        32 
sample 33 sample 34 sample 35 sample 36 sample 37 sample 38 sample 39 sample 40 
       33        34        35        36        37        38        39        40 
sample 41 sample 42 sample 43 sample 44 sample 45 sample 46 sample 47 sample 48 
       41        42        43        44        45        46        47        48 
sample 49 sample 50 sample 51 sample 52 sample 53 sample 54 sample 55 sample 56 
       49        50        51        52        53        54        55        56 
sample 57 sample 58 sample 59 sample 60 sample 61 sample 62 sample 63 sample 64 
       57        58        59        60        61        62        63        64 
sample 65 sample 66 sample 67 sample 68 sample 69 sample 70 sample 71 sample 72 
       65        66        67        68        69        70        71        72 
 [1] 4847   15  760   38 1092   48 1798   92 1882  100
[1] 2
          Features Pos  Status
gene 4847     4847   0 Min.Set
gene 15         15   0 Min.Set
gene 760       760   0       1
gene 38         38   0       2
gene 1092     1092   0       1
gene 48         48   0       2
gene 1798     1798   0       1
gene 92         92   0       2
gene 1882     1882   0       1
gene 100       100   0       2

propOverlap documentation built on May 1, 2019, 10:55 p.m.