nfca: Numerical Formal Concept Analysis for Systematic Clustering

Description

The R function nfca() implements numerical Formal Concept Analysis (nFCA), a modern unsupervised learning tool for analyzing general numerical data developed in [3]. nfca() produces two nFCA graphs, an H-graph and an I-graph, that reveal the systematic, hierarchical clustering and inherent structure of the data.

Usage

nfca(data, type = 0, method = 'hist', choice = 1, n = 30, alpha = 0.05)    

Arguments

data

The input numerical data. It can be a correlation matrix, a distance matrix, or a general data matrix of n x p dimensions, where n is the number of subjects and p is the dimension of variables. If the input is a general data matrix, nfca is performed based on the correlation matrix computed from the data.

type

The type of input data. The default type=0 represents a correlation matrix, while type=1 represents a distance matrix and type=2 represents a general data matrix.

method

The method of choosing the sequence of thresholds used to scale FCA when building nFCA. The default method='hist' implements the ‘histogram’ method, which is applicable to all data types (type=0, 1, or 2). method='CI' uses the ‘confidence interval’ method, which is currently applicable only when the input is a correlation matrix, i.e. type=0.

choice

How thresholds are selected for the ‘histogram’ method. The default choice=1 selects the thresholds automatically, while choice=0 lets the user select them manually from histograms shown on the screen.

n

The sample size used to compute the correlation matrix when the input data type is 0, i.e. a correlation matrix. Required only if the threshold selection method is ‘confidence interval’. The default value is 30.

alpha

The significance level used for the ‘confidence interval’ method. The default value is 0.05.
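
To illustrate how n and alpha could enter a confidence interval for a correlation, here is a minimal sketch based on the standard Fisher z-transform. This is an assumption made for illustration: the documentation above does not state which interval nfca() computes internally, and fisher_ci() is a hypothetical helper that is not part of the nFCA package.

```r
# Hypothetical helper (NOT part of nFCA): Fisher z-transform confidence
# interval for a correlation r estimated from n observations.
# Assumption: a standard large-sample interval; nfca() may differ.
fisher_ci <- function(r, n = 30, alpha = 0.05) {
  stopifnot(n > 3, all(abs(r) < 1))
  z  <- atanh(r)               # Fisher z-transform of the correlation
  se <- 1 / sqrt(n - 3)        # approximate standard error of z
  q  <- qnorm(1 - alpha / 2)   # two-sided standard normal quantile
  cbind(r = r,
        lower = tanh(z - q * se),
        upper = tanh(z + q * se))
}

# Three correlations of the kind found in a correlation matrix
ci <- fisher_ci(c(0.95, 0.45, 0.10), n = 30, alpha = 0.05)
print(round(ci, 3))
```

Under this sketch, with n = 30 and alpha = 0.05, the interval for r = 0.10 contains zero while the interval for r = 0.45 does not. A larger n or a larger alpha narrows every interval.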

Details

Numerical Formal Concept Analysis (nFCA) combines the merits of statistics, formal concept analysis (FCA), and a graph visualization tool (Graphviz) to analyze the clustering and inherent structure of data. Its output is a pair of nFCA graphs, the H-graph and the I-graph. The H-graph maps the systematic relations of hierarchical clusters. The I-graph is a directed acyclic graph (DAG) that complements the H-graph by revealing inherent structures and connections from one member to the relevant member of another cluster.

The nFCA package includes our main R code and a supporting Ruby program that implements the faster concept analysis algorithm developed by Dr. Zhang's team (Troy et al. 2007). If needed, Ruby can be downloaded from https://www.ruby-lang.org.

The two nFCA output files, Hgraph.dot and Igraph.dot, can be visualized using Graphviz, a standard and powerful graph visualization tool available at http://www.graphviz.org/. We have tested selected versions of Graphviz: versions 2.26, 2.30, and 2.38 work with this package on Mac OS Lion and Windows. Do not use version 2.28, which has a known bug. For further instructions on using Graphviz with nFCA, see the ‘Value’ section below; for detailed installation instructions and example figures from Graphviz, see http://sr2c.case.edu/nfca.

Value

Hgraph.dot

A dot file containing the systematic clustering result.

Igraph.dot

A dot file containing the inherent structure information.

To visualize Hgraph.dot and Igraph.dot in Graphviz, choose ‘fdp’ as the layout engine for the H-graph and ‘neato’ for the I-graph. These selections can be made in the GUI versions of Graphviz on Windows or Mac. On a Mac running MacPorts, or on a Linux machine with Graphviz installed, use the following commands to generate the graphics outside R:

fdp -Tpng Hgraph.dot -o Hgraph.png
neato -Tpng Igraph.dot -o Igraph.png
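
The same two commands can also be issued from an R session. The sketch below wraps them in a hypothetical helper, render_dot(), which is not part of the nFCA package. It assumes the Graphviz executables are on the PATH and that the dot files are in the current working directory, and it only uses base R functions (Sys.which(), system2(), shQuote()).

```r
# Hypothetical helper (NOT part of nFCA): render a .dot file to PNG with a
# Graphviz layout engine, provided that engine is found on the PATH.
render_dot <- function(dot_file, engine = c("fdp", "neato"),
                       out_file = sub("\\.dot$", ".png", dot_file)) {
  engine <- match.arg(engine)
  exe <- unname(Sys.which(engine))   # "" if the engine is not installed
  if (!nzchar(exe)) {
    warning("Graphviz engine '", engine, "' not found on the PATH")
    return(invisible(NULL))
  }
  if (!file.exists(dot_file)) {
    warning("File not found: ", dot_file)
    return(invisible(NULL))
  }
  # Equivalent to: <engine> -Tpng <dot_file> -o <out_file>
  status <- system2(exe, c("-Tpng", shQuote(dot_file), "-o", shQuote(out_file)))
  if (status != 0) warning(engine, " exited with status ", status)
  invisible(out_file)
}

# After nfca() has written its output files to the working directory:
render_dot("Hgraph.dot", engine = "fdp")    # H-graph: fdp layout
render_dot("Igraph.dot", engine = "neato")  # I-graph: neato layout
```

If an engine or a file is missing, the helper issues a warning and returns NULL rather than stopping, so it is safe to call before checking the Graphviz installation.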

Author(s)

Junheng Ma, Jiayang Sun and Guo-Qiang Zhang

References

Troy, A. D., Zhang, G.-Q. and Tian, Y. (2007) Faster Concept Analysis. Proceedings of the 15th International Conference on Conceptual Structures (ICCS 2007), 4604, 206–219.

Ma, J. (2010) Contributions to Numerical Formal Concept Analysis, Bayesian Predictive Inference, and Sample Size Determination. PhD thesis, Case Western Reserve University.
http://rave.ohiolink.edu/etdc/view?acc_num=case1285341426

Ma, J., Sun, J. and Zhang, G.-Q. (2014) Numerical Formal Concept Analysis (nFCA): a New Systematic Clustering Technique. Under review.

Examples

# View a built-in correlation matrix: nfca_example
data("nfca_example", package = "nFCA")
nfca_example
     
# 1. using the default 'histogram' method and choosing thresholds
# automatically
nfca(data = nfca_example)

# 2. using the 'confidence interval' method with sample size 30 and
# choosing thresholds automatically
nfca(data = nfca_example, method = "CI")

# The output files Hgraph.dot and Igraph.dot from #1 and #2 can
# be visualized as H- and I-graphs in Graphviz. In this example,
# the I-graphs from both 'histogram' and 'confidence interval' 
# methods are identical, while the two H-graphs are consistent with
# each other.

Example output

      H    W    S    D   HF   HM   WF   WM   H1   W1    O
1  1.00 0.95 0.90 0.90 0.85 0.85 0.80 0.80 0.40 0.30 0.10
2  0.95 1.00 0.88 0.88 0.70 0.70 0.87 0.87 0.20 0.45 0.15
3  0.90 0.88 1.00 0.82 0.80 0.80 0.76 0.76 0.10 0.10 0.02
4  0.90 0.88 0.82 1.00 0.82 0.82 0.81 0.81 0.10 0.10 0.05
5  0.85 0.70 0.80 0.82 1.00 0.95 0.65 0.65 0.08 0.06 0.01
6  0.85 0.70 0.80 0.82 0.95 1.00 0.60 0.60 0.06 0.07 0.01
7  0.80 0.87 0.76 0.81 0.65 0.60 1.00 0.95 0.03 0.05 0.01
8  0.80 0.87 0.76 0.81 0.65 0.60 0.95 1.00 0.01 0.02 0.01
9  0.40 0.20 0.10 0.10 0.08 0.06 0.03 0.01 1.00 0.02 0.01
10 0.30 0.45 0.10 0.10 0.06 0.07 0.05 0.02 0.02 1.00 0.02
11 0.10 0.15 0.02 0.05 0.01 0.01 0.01 0.01 0.01 0.02 1.00

Hgraph.dot and Igraph.dot are generated in the current R working directory.
You can go outside of R and use Graphviz to visualize high quality H- and I-graphs.

If you can not visualize the high quality H and I graphs using Graphviz,
here is the digitalized presentation of these graphs.

Explanation of the digitized H-graph:

Each '---' is one depth further into the hierarchical center from the
outer most boundary, e.g.
--- 1-depth into the center from the outside,
--- --- 2-depth into the center,
if not starting with '---', it is in the most outside layer.

Style at the same depth:

--- Cluster#
    A member, its relationship to other members (relation strength value)
    A member of the same cluster

Actual Presentation:

Digital presentation of clustering results from H-graph:

O
---cluster5
   H1
   W1
--- ---cluster4
--- --- ---cluster1
           H-->W(0.95)
           H-->S(0.9)
           H-->D(0.9)
--- --- ---cluster2
           HF<->HM(0.95)
--- --- ---cluster3
           WF<->WM(0.95)

Digital inherent structure results from I-graph:

H ----> W (0.95)
HF <----> HM (0.95)
WF <----> WM (0.95)
H ----> S (0.9)
H ----> D (0.9)
W ----> WF (0.87)
W ----> WM (0.87)
H ----> HF (0.85)
H ----> HM (0.85)
W ----> W1 (0.45)
H ----> H1 (0.4)
W ----> O (0.15)


Hgraph.dot and Igraph.dot are generated in the current R working directory.
You can go outside of R and use Graphviz to visualize high quality H- and I-graphs.

If you can not visualize the high quality H and I graphs using Graphviz,
here is the digitalized presentation of these graphs.

Explanation of the digitized H-graph:

Each '---' is one depth further into the hierarchical center from the
outer most boundary, e.g.
--- 1-depth into the center from the outside,
--- --- 2-depth into the center,
if not starting with '---', it is in the most outside layer.

Style at the same depth:

--- Cluster#
    A member, its relationship to other members (relation strength value)
    A member of the same cluster

Actual Presentation:

Digital presentation of clustering results from H-graph:

---cluster5
   H1
   W1
   O
--- ---cluster4
--- --- ---cluster1
           H-->W(0.95)
           H-->S(0.9)
           H-->D(0.9)
--- --- ---cluster2
           HF<->HM(0.95)
--- --- ---cluster3
           WF<->WM(0.95)

Digital inherent structure results from I-graph:

H ----> W (0.95)
HF <----> HM (0.95)
WF <----> WM (0.95)
H ----> S (0.9)
H ----> D (0.9)
W ----> WF (0.87)
W ----> WM (0.87)
H ----> HF (0.85)
H ----> HM (0.85)
W ----> W1 (0.45)
H ----> H1 (0.4)
W ----> O (0.15)

nFCA documentation built on May 2, 2019, 9:42 a.m.