CreatePartition: Creates a partition (groups) of variables


View source: R/CreatePartition.R

Description

Creates a partition (groups) of variables from nominal (factor), character or numeric input.

Usage

CreatePartition(vec,varnamesdata=NULL,subset=NULL,grsize=NULL,
                decreasing=TRUE,uniform=FALSE,ngroup=10,mingr=25)

Arguments

vec

Factor, numeric vector or character vector.

subset

Character vector. Names of the variables (features) that correspond to the values in vec. Use this to create a partition on a subset of all variables. Requires varnamesdata.

varnamesdata

Character vector. Names of the variables (features). Only relevant when vec is a character vector OR when subset is specified.

grsize

Numeric. Size of the groups. Only relevant when vec is a numeric vector and uniform=TRUE.

decreasing

Boolean. If TRUE, vec is sorted in decreasing order.

uniform

Boolean. If TRUE, the group sizes are made as equal as possible.

ngroup

Numeric. Number of groups to create. Only relevant when vec is a numeric vector and grsize is NOT specified.

mingr

Numeric. Minimum group size. Only relevant when vec is a numeric vector and uniform=FALSE.
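The following base-R sketch shows how grsize, ngroup and decreasing interact when vec is numeric and uniform=TRUE. The helper uniform_partition_sketch is hypothetical and is not the GRridge code. Two details are assumptions: grsize is derived as floor(length(vec)/ngroup) when it is not given, and leftover variables are put into the last group.

```r
# Hypothetical sketch, NOT the GRridge implementation of CreatePartition.
# Assumptions: grsize defaults to floor(n / ngroup); leftover variables
# (when n is not a multiple of grsize) are appended to the last group.
uniform_partition_sketch <- function(vec, grsize = NULL, ngroup = 10,
                                     decreasing = TRUE) {
  n <- length(vec)
  if (is.null(grsize)) grsize <- floor(n / ngroup)
  stopifnot(grsize >= 1, grsize <= n)
  # Indices of the variables, most relevant first
  ord <- order(vec, decreasing = decreasing)
  ngr <- floor(n / grsize)
  # Group label for each sorted position (1,1,...,2,2,...), capped at ngr
  labels <- pmin(ceiling(seq_len(n) / grsize), ngr)
  # One list element of variable indices per group
  unname(split(ord, labels))
}

# Usage: p-values, so small values are the most relevant
pv <- c(0.20, 0.01, 0.50, 0.03, 0.90, 0.04)
part <- uniform_partition_sketch(pv, grsize = 2, decreasing = FALSE)
part  # groups of indices: 2 4 | 6 1 | 3 5
```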

Details

A convenience function to create partitions of variables from external information stored in vec. How the groups are formed depends on the type of vec.

If vec is a factor, the levels of the factor define the groups.

If vec is a character vector, varnamesdata needs to be specified (vec is assumed to be a subset of varnamesdata, e.g. a published gene list). In this case a partition of two groups is created: one with the variables of varnamesdata that also appear in vec, and one with those that do not.

If vec is a numeric vector, vec is sorted (see decreasing) and each group contains the variables corresponding to grsize consecutive sorted values. Alternatively, the group size is determined automatically from ngroup. If uniform=FALSE, a group with rank r has approximate size mingr*(r^f), where f>1 is determined such that the total number of groups equals ngroup. Such unequal group sizes allow the use of fewer groups (and hence faster computations) while still maintaining a good ‘resolution’ for the extreme values in vec.

Choosing decreasing: if smaller values of vec mean ‘less relevant’ (e.g. test statistics, absolute regression coefficients), use decreasing=TRUE. If larger values mean ‘less relevant’, as for p-values, use decreasing=FALSE.

If subset is defined, varnamesdata should be specified as well. The partition is then applied only to variables that appear in both subset and varnamesdata.
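The unequal group sizes for uniform=FALSE follow the formula mingr*(r^f) in the Details section. The minimal base-R sketch below solves for f with uniroot(). It rests on two assumptions that do not come from the GRridge source: f is chosen so that the target sizes sum to the number of variables, and any rounding error is absorbed by the last group. The helper name unequal_group_sizes is hypothetical.

```r
# Hypothetical sketch, NOT the GRridge implementation of CreatePartition.
# Assumption: f solves mingr * sum_{r=1}^{ngroup} r^f == n, and the
# rounding error is absorbed by the last (least relevant) group.
unequal_group_sizes <- function(n, ngroup = 10, mingr = 25) {
  # A root with f >= 0 exists only if n exceeds mingr * ngroup
  stopifnot(ngroup >= 2, n > mingr * ngroup)
  r <- seq_len(ngroup)
  excess <- function(f) mingr * sum(r^f) - n
  # excess() increases in f, so extend the search interval upward if needed
  f <- uniroot(excess, lower = 0, upper = 10, extendInt = "upX")$root
  sizes <- round(mingr * r^f)
  sizes[ngroup] <- sizes[ngroup] + (n - sum(sizes))
  list(f = f, sizes = sizes)
}

# Usage: 1000 variables, 10 groups, smallest group of 10 variables
res <- unequal_group_sizes(1000, ngroup = 10, mingr = 10)
res$sizes  # group sizes grow with the rank r
```

Under this sketch, the smallest group has exactly mingr variables, and the largest groups hold the least relevant values of vec.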

Value

A list whose components contain the indices of the variables belonging to each of the groups.
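For factor and character input, the returned list of indices can be mimicked in a few lines of base R. The helpers below are hypothetical sketches, not the GRridge code. In particular, the order of the two groups for character input (matched variables first) is an assumption.

```r
# Hypothetical sketches, NOT the GRridge implementation of CreatePartition.

# Factor input: one group per level, holding the indices of the
# variables that have that level
factor_partition_sketch <- function(vec) {
  lapply(levels(vec), function(lev) which(vec == lev))
}

# Character input: two groups, the variables of varnamesdata that appear
# in vec and those that do not (assumed order: matched group first)
character_partition_sketch <- function(vec, varnamesdata) {
  inlist <- varnamesdata %in% vec
  list(which(inlist), which(!inlist))
}

# Usage on six variables
genes6 <- paste("Gene", 1:6)
sig <- paste("Gene", c(1, 3, 5))
character_partition_sketch(sig, genes6)  # indices 1 3 5 | 2 4 6

types <- factor(paste("Type", rep(1:2, 3)))
factor_partition_sketch(types)           # indices 1 3 5 | 2 4 6
```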

Author(s)

Mark A. van de Wiel

See Also

For gene sets (overlapping groups), see matchGeneSets. For a further example on a real-life data set, see grridge.

Examples

#SOME EXAMPLES ON A SMALL NUMBER OF VARIABLES

#EXAMPLE 1: partition based on known gene signature
genset <- sapply(1:100,function(x) paste("Gene",x))
signature <- sapply(seq(1,100,by=2),function(x) paste("Gene",x))
SignaturePartition <- CreatePartition(signature,varnamesdata=genset)

#EXAMPLE 2: partition based on factor variable
Genetype <- factor(sapply(rep(1:4,25),function(x) paste("Type",x)))
TypePartition <- CreatePartition(Genetype)

#EXAMPLE 3: partition based on continuous variable, e.g. p-value
pvals <- rbeta(100,1,4)

#Create a partition of 10 equally sized groups, ordered by increasing p-value.
PvPartition <- CreatePartition(pvals, decreasing=FALSE,uniform=TRUE,ngroup=10)

#Alternatively, create a partition of 5 unequally sized groups,
#with a minimum group size of 10. Group size
#increases as p-values become less relevant (larger).
#Recommended when the number of variables is large.
PvPartition2 <- CreatePartition(pvals, decreasing=FALSE,uniform=FALSE,ngroup=5,mingr=10)

#EXAMPLE 4: partition based on subset of variables,
#e.g. p-values only available for 50 genes. 
genset <- sapply(1:100,function(x) paste("Gene",x))

subsetgenes <- sort(sapply(sample(1:100,50),function(x) paste("Gene",x)))

pvals50 <- rbeta(50,1,6)

#Returns the partition for the subset, expressed as indices of
#the variables in the entire genset. Variables not
#present in subset will receive group-penalty = 1 for this partition.

PvPartitionSubset <- CreatePartition(pvals50, varnamesdata = genset,subset = subsetgenes,
                                     decreasing=FALSE,uniform=TRUE, ngroup=5)

#EXAMPLE 5: COMBINING PARTITIONS

#Combine partitions into one list with named components.
#This can be used as input for the grridge() function.
#NOTE: if only one partition is used, it can be passed to grridge() directly.

MyPart <- list(signature=SignaturePartition, type = TypePartition,
               pval = PvPartition, pvalsubset=PvPartitionSubset)
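Example 4 states that the subset partition is returned as indices into the entire genset. One way to picture this mapping is to translate positions within subset into positions within varnamesdata using match(). Names absent from varnamesdata are dropped, in line with the Details section. The helper map_subset_partition is a hypothetical sketch, not the GRridge code.

```r
# Hypothetical sketch, NOT the GRridge implementation of CreatePartition.
# part_sub: list of groups holding positions within `subset`.
# Returns the same groups as positions within `varnamesdata`.
map_subset_partition <- function(part_sub, subset, varnamesdata) {
  pos <- match(subset, varnamesdata)  # NA for names not in varnamesdata
  lapply(part_sub, function(idx) {
    p <- pos[idx]
    p[!is.na(p)]                      # keep variables present in both
  })
}

# Usage: 10 genes in total, a partition computed on 4 of them
all_genes <- paste("Gene", 1:10)
sub_genes <- paste("Gene", c(2, 5, 7, 9))
part_sub <- list(c(1, 3), c(2, 4))
map_subset_partition(part_sub, sub_genes, all_genes)  # indices 2 7 | 5 9
```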

markvdwiel/GRridge documentation built on May 21, 2019, 12:25 p.m.