inst/shiny/doc/README_HC.md

User guide

Description

Exploratory Analysis of Cybergeo Keywords

Methodology

Vertex and edge attributes

The vertices are described by two variables: frequency and degree. The frequency is the number of articles citing the keyword. The degree is the total degree of the node in the network, that is, the number of edges linking this keyword to the others (there is no distinction between in- and out-degree because the network is undirected). The two variables are distinct but correlated.
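As an illustration of the two vertex variables, here is a minimal Python sketch. It assumes each article is available as a set of keywords; the `articles` data and the `vertex_attributes` helper are hypothetical and are not part of the package.

```python
from collections import Counter
from itertools import combinations

# Hypothetical corpus: each article is represented by its set of keywords.
articles = [
    {"city", "network", "model"},
    {"city", "network"},
    {"city", "landscape"},
]


def vertex_attributes(articles):
    """Return (frequency, degree) dictionaries keyed by keyword."""
    frequency = Counter()
    neighbours = {}
    for keywords in articles:
        # Frequency: one count per article citing the keyword.
        frequency.update(keywords)
        for keyword in keywords:
            neighbours.setdefault(keyword, set())
        # Two keywords are linked when at least one article cites both.
        for a, b in combinations(sorted(keywords), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Degree: number of distinct keywords linked to this keyword.
    degree = {keyword: len(linked) for keyword, linked in neighbours.items()}
    return dict(frequency), degree


frequency, degree = vertex_attributes(articles)
print(frequency)
print(degree)
```

In this toy corpus, "model" and "network" have the same degree but different frequencies, which shows how the two variables can diverge while remaining correlated.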

The edges are described by two variables: observed weight and relative residual. For two given keywords, the observed weight is the number of articles citing both keywords. The relative residual is the ratio between the observed weight and the expected weight of the edge. For a given edge, the expected weight is based on the probability that this edge exists given the degrees of its two nodes. This probability is computed as the union of two dependent probabilities.

The probability of drawing a vertex $i$ equals $\frac{w_i}{w}$, where $w_i$ is the degree of vertex $i$ (weighted degree) and $w$ is the half sum of weights.

Then the probability of drawing a vertex $j$ distinct from $i$ equals $\frac{w_j}{w - w_i}$.

The probability of existence of an edge between $i$ and $j$ is:

$$P_{i \to j} = \frac{w_i}{w} \times \frac{w_j}{w - w_i}$$

The probability of existence of an edge between $j$ and $i$ is:

$$P_{j \to i} = \frac{w_j}{w} \times \frac{w_i}{w - w_j}$$

The probability of existence of an undirected edge is the union of both probabilities:

$$P_{i \leftrightarrow j} = P_{i \to j} + P_{j \to i}$$

Finally, the expected weight is:

$$w_e = w \left( \frac{P_{i \leftrightarrow j}}{2} \right)$$
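The derivation above can be sketched in a few lines of Python. This is only an illustration of the formulas as written here, not the package's implementation. The function names are hypothetical, and the weighted degrees and the total $w$ are passed in explicitly as made-up values.

```python
def edge_probability(w_i, w_j, w):
    """Probability of drawing vertex i, then a distinct vertex j."""
    return (w_i / w) * (w_j / (w - w_i))


def expected_weight(w_i, w_j, w):
    """Expected weight of the undirected edge between i and j."""
    # Union of the two directed probabilities, as in the text.
    p_undirected = edge_probability(w_i, w_j, w) + edge_probability(w_j, w_i, w)
    return w * p_undirected / 2


def relative_residual(observed, w_i, w_j, w):
    """Ratio between the observed weight and the expected weight."""
    return observed / expected_weight(w_i, w_j, w)


# Hypothetical values: weighted degrees of two keywords and the total w.
w_i, w_j, w = 2.0, 5.0, 10.0
expected = expected_weight(w_i, w_j, w)
residual = relative_residual(3, w_i, w_j, w)
print(expected, residual)
```

With these made-up values the expected weight is about 1.625, so an observed weight of 3 gives a relative residual of about 1.85. A residual above 1 means the two keywords are cited together more often than their degrees alone would suggest.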

Community detection algorithm

The communities are detected with the Louvain algorithm, a greedy heuristic that searches for a partition of the network maximising modularity. See Blondel et al. 2008.
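To make the optimised quantity concrete, the sketch below evaluates the modularity of a given partition of a weighted, undirected network. It does not implement the Louvain algorithm itself. The `modularity` helper, the toy edge list and the community labels are all hypothetical.

```python
def modularity(edges, communities):
    """Modularity of a partition of a weighted, undirected graph.

    edges: dict mapping (u, v) pairs to weights, each edge listed once.
    communities: dict mapping each vertex to a community label.
    """
    total = sum(edges.values())  # total edge weight
    internal = {}  # weight of the edges inside each community
    strength = {}  # summed weighted degree of each community
    for (u, v), weight in edges.items():
        cu, cv = communities[u], communities[v]
        strength[cu] = strength.get(cu, 0.0) + weight
        strength[cv] = strength.get(cv, 0.0) + weight
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + weight
    # Share of internal weight minus the share expected from degrees alone.
    return sum(
        internal.get(c, 0.0) / total - (strength[c] / (2 * total)) ** 2
        for c in strength
    )


# Toy network: two triangles joined by a single edge.
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {vertex: 0 for vertex in two_groups}
print(modularity(edges, two_groups), modularity(edges, one_group))
```

Splitting the toy network into its two triangles gives a higher modularity than keeping all vertices in one community. Louvain looks for this kind of improvement by moving vertices between communities.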



Geographie-cites/corpusminer-package documentation built on Dec. 3, 2020, 5:33 a.m.