entities_attribute_stats: presents basic statistics on the number of entities per...

View source: R/entities_attribute_stats.R

Description

Plots the density distribution of the number of entities per attribute and shows how many attributes are proposed to be ignored (and how many will be kept).

Usage

entities_attribute_stats(entity.attribute
        , min.entities.per.attr = NULL
        , entity.space.name = "Yeast genes"
        , attribute.space.name = "Gene Ontology"
        , plot.saveRDS.file=NULL)

Arguments

entity.attribute

a data frame or matrix with 2 columns. The first column is assumed to hold 'entities' such as gene names or gene IDs; the second column holds 'attributes' of those entities (for example, the Gene Ontology ID 'GO:0007260', which is 'tyrosine phosphorylation of STAT protein').

min.entities.per.attr

a number: the minimum number of entities per attribute accepted. If NULL (the default), a threshold is proposed (see Value).

entity.space.name

a string shown on the plot that describes what the entities represent

attribute.space.name

a string shown on the plot that describes what the attributes represent

plot.saveRDS.file

if not NULL, must be a string giving a file location where the plot will be saved as an RDS object. The plot can then be retrieved at any time using the readRDS function.
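The save-and-restore round trip behind plot.saveRDS.file can be sketched with base R alone. This is only an illustration: the file path is a temporary file and `toy.object` is a made-up stand-in, since the class of the plot object that the function actually stores is not stated here.

```r
# Hypothetical stand-in for the path you would pass as plot.saveRDS.file
rds.file <- tempfile(fileext = ".rds")

# Made-up object standing in for the saved plot (the real class is not documented here)
toy.object <- list(title = "entities per attribute", threshold = 3)

# saveRDS() serializes a single R object to a file ...
saveRDS(toy.object, file = rds.file)

# ... and readRDS() restores it later, possibly in another session
restored <- readRDS(rds.file)
identical(restored, toy.object)
```

With the package loaded, the same readRDS call would presumably be applied to whatever path was given as plot.saveRDS.file.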

Details

Attributes that appear only once, or in just a very few entities, do not bring additional information. In general there are many such 'non-informative' attributes, so it is useful to know the proportion of attributes that will still be kept if a minimum number of entities per attribute is imposed.
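As a rough illustration of that proportion, the sketch below counts entities per attribute on a made-up two-column table and applies an arbitrary threshold. The data, the column names, and the threshold of 2 are assumptions for the example; this is not the package's internal code.

```r
# Toy entity-attribute table (made-up data, same two-column layout as entity.attribute)
entity.attribute <- data.frame(
  entity    = c("g1", "g2", "g3", "g1", "g2", "g4", "g5"),
  attribute = c("A",  "A",  "A",  "B",  "B",  "C",  "D"),
  stringsAsFactors = FALSE
)

# Drop duplicated pairs, then count the entities annotated with each attribute
pairs  <- unique(entity.attribute)
tab    <- table(pairs$attribute)
attrs  <- names(tab)
counts <- as.vector(tab)

# Illustrative threshold: keep attributes annotating at least this many entities
min.entities <- 2
kept <- attrs[counts >= min.entities]

# Share of attributes that survive the filter
proportion.kept <- length(kept) / length(attrs)
```

Here attributes C and D each describe a single entity, so half of the attributes fall below the threshold.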

Value

a number: either the input value of min.entities.per.attr or, if min.entities.per.attr is NULL, a proposed min.entities.per.attr threshold. The assumption is that attributes characterizing just one entity are the most frequent. The proposed threshold is the minimum number of entities per attribute whose frequency matches 1/3 of that maximum frequency.
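One possible reading of this rule is sketched below on made-up counts. It assumes that "matches 1/3" means "has dropped to at most one third of the maximum frequency" and works on raw frequencies; the package may instead work on the estimated density, so treat this as an interpretation rather than the actual implementation.

```r
# Made-up data: 9 attributes with 1 entity, 5 with 2, 3 with 3, 1 with 4
entities.per.attr <- c(rep(1, 9), rep(2, 5), rep(3, 3), rep(4, 1))

# How many attributes have each number of entities
freq   <- table(entities.per.attr)
sizes  <- as.integer(names(freq))
counts <- as.vector(freq)

# Maximum frequency, assumed to occur for attributes with a single entity
max.freq <- max(counts)

# Assumed rule: smallest number of entities per attribute whose frequency
# is no more than one third of the maximum frequency
proposed.threshold <- min(sizes[counts <= max.freq / 3])
```

With these counts the maximum frequency is 9, so the first size whose frequency is at most 3 is selected.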

Author(s)

Adrian Pasculescu

References

Gibbons, F.D. and Roth, F.P. (2002) Judging the Quality of Gene Expression-Based Clustering Methods Using Gene Annotation. Genome Research, vol. 12, pp. 1574-1581.

Examples

data(Yeast.GO.assocs)
min.entities.per.attr  <- entities_attribute_stats(entity.attribute= Yeast.GO.assocs
                                                  , min.entities.per.attr=NULL
                                                  , entity.space.name='Yeast genes'
                                                  , attribute.space.name='Gene Ontology') 

ClusterJudge documentation built on March 11, 2021, 2 a.m.