PSOL_InitialNegativeSelection: Initial negative set selection for building machine...

Description

This function selects an initial negative set using the machine learning (ML)-based positive-only sample learning (PSOL) algorithm. The PSOL algorithm was previously applied to predict genomic loci encoding functional non-coding RNAs (Wang et al., 2006). We have employed this algorithm to identify stress-related candidate genes in Arabidopsis based on stress microarray datasets (Ma and Wang, 2013).

Usage

PSOL_InitialNegativeSelection(featureMatrix, positives, unlabels, 
                              negNum = length(positives), cpus = 1, PSOLResDic )

Arguments

featureMatrix

a numeric matrix recording the features for all samples.

positives

a character vector recording the positive samples.

unlabels

a character vector recording unlabeled samples.

negNum

an integer specifying the number of negative samples to be selected.

cpus

an integer specifying the number of CPUs to be used for parallel computing.

PSOLResDic

a character string specifying the directory in which the PSOL results are stored.
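
The following is a minimal sketch of how the inputs relate to one another, not code from the package. It assumes that samples are the rows of featureMatrix and are identified by row names, as the use of rownames(featureMat) in the Examples section suggests. The toy matrix and the names toyMat, toyPositives and toyUnlabels are hypothetical.

```r
# Toy data only: 10 hypothetical samples (rows) described by 4 features (columns).
set.seed(1)
toyMat <- matrix(rnorm(40), nrow = 10, ncol = 4,
                 dimnames = list(paste0("gene", 1:10),
                                 paste0("feature", 1:4)))

# Positive samples are given by name; here three arbitrary rows.
toyPositives <- c("gene1", "gene2", "gene3")

# Unlabeled samples: every remaining row of the feature matrix.
toyUnlabels <- setdiff(rownames(toyMat), toyPositives)

# Basic consistency checks before any call to the selection function.
stopifnot(is.numeric(toyMat))
stopifnot(all(toyPositives %in% rownames(toyMat)))
stopifnot(length(intersect(toyPositives, toyUnlabels)) == 0)

# The documented default for negNum is the number of positive samples.
negNum <- length(toyPositives)
```

Checking the inputs in this way before a long parallel run can catch mismatched sample names early.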

Value

A list containing three components:

positives

a character vector including the input positive samples.

negatives

a character vector recording the selected negative samples.

unlabels

a character vector recording the unlabeled samples.
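
As an illustration of the return structure described above, the sketch below checks a list of that shape. checkPSOLResult is a hypothetical helper, not part of mlDNA, and mockRes is a hand-made stand-in rather than real output of the function. This page does not state whether unlabels still contains the selected negatives, so the helper does not test that.

```r
# Hypothetical helper (not part of mlDNA): validates a list shaped like the
# documented return value of PSOL_InitialNegativeSelection.
checkPSOLResult <- function(res) {
  stopifnot(is.list(res))
  stopifnot(all(c("positives", "negatives", "unlabels") %in% names(res)))
  stopifnot(is.character(res$positives),
            is.character(res$negatives),
            is.character(res$unlabels))
  # TRUE when no sample is labeled both positive and negative.
  length(intersect(res$positives, res$negatives)) == 0
}

# Hand-made mock with made-up sample names, for illustration only.
mockRes <- list(positives = c("gene1", "gene2"),
                negatives = c("gene5", "gene7"),
                unlabels  = c("gene3", "gene4", "gene6"))

isConsistent <- checkPSOLResult(mockRes)
```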

Author(s)

Chuang Ma and Xiangfeng Wang.

References

[1] Chunlin Wang, Chris Ding, Richard F. Meraz and Stephen R. Holbrook. PSoL: a positive sample only learning algorithm for finding non-coding RNA genes. Bioinformatics, 2006, 22(21): 2590-2596.

[2] Chuang Ma and Xiangfeng Wang. Machine learning-based differential network analysis: a case study of stress-responsive transcriptomes in Arabidopsis thaliana. 2013 (Submitted).

Examples

## Not run: 

   ##generate expression feature matrix
   sampleVec1 <- c(1, 2, 3, 4, 5, 6)
   sampleVec2 <- c(1, 2, 3, 4, 5, 6)
   featureMat <- expFeatureMatrix( 
                   expMat1 = ControlExpMat, sampleVec1 = sampleVec1, 
                   expMat2 = SaltExpMat, sampleVec2 = sampleVec2, 
                   logTransformed = TRUE, base = 2,
                   features = c("zscore", "foldchange", 
                                 "cv","expression"))

   ##positive samples
   positiveSamples <- as.character(sampleData$KnownSaltGenes)
   ##unlabeled samples
   unlabelSamples <- setdiff( rownames(featureMat), positiveSamples )
  
   ##selecting an initial set of negative samples
   ##for building ML-based classification model
   ##suppose the PSOL results will be stored in:
   PSOLResDic <- "/home/wanglab/mlDNA/PSOL/"
   res <- PSOL_InitialNegativeSelection(featureMatrix = featureMat, 
                                        positives = positiveSamples, 
                                        unlabels = unlabelSamples, 
                                        negNum = length(positiveSamples), 
                                        cpus = 6, PSOLResDic = PSOLResDic )

   ##initial negative samples extracted from the unlabeled samples with the PSOL algorithm
   negatives <- res$negatives


## End(Not run)

mlDNA documentation built on May 2, 2019, 2:15 p.m.