This package includes the RGraph libraries, the C libraries for complex network analysis developed by Roger Guimera. Some executables, built from the libraries, are also included.
For the libraries to compile, you will first need to install:
1) The GNU Scientific Library (GSL)
2) The libtool package
In a Unix-like system, you can install the RGraph libraries and the
executables by uncompressing the tarball (tar -xzvf rgraph-version.tar.gz)
and running the usual commands from the rgraph-version directory:
cd rgraph-version
./autogen.sh # Only needed if you are building from the github source code
./configure
On a Mac, if ./configure fails with an error saying that it couldn't find
the GSL libraries, run the ./configure command like this instead:
LDFLAGS="-L/usr/local/lib" CPPFLAGS="-I/usr/local/include" ./configure
make
[make install]
(In a Windows system, you will first need to install some sort of "Unix emulation." I have successfully compiled the libraries using either Cygwin or MinGW. See below for Windows installation steps).
This will install the libraries in your_default_lib_directory/rgraph and the executables in your_default_bin_directory. To install in a different directory run
./configure --prefix=path_to_install_directory
instead of just ./configure. For other configure options run:
./configure -h
You can uninstall the whole thing by running make uninstall from the installation directory.
You can also test that everything is working by running make check from the installation directory.
1) First of all, download and install MinGW. During the installation, when it prompts you for the packages to install, select gcc, msys and mingw base. The other default options are OK.
2) Download the GNU Scientific Library (GSL). In my installation, I've used version 1.15.
3) Launch the MinGW console (Programs -> MinGW -> MinGW Shell or C:\MinGW\msys\1.0\msys).
4) Unzip the contents of the downloaded GSL file under your msys home, which is at C:\MinGW\msys\1.0\home\user\ (it's important to perform step 3 first, or you won't have the home directory).
5) In your msys console, cd into the gsl-1.15 folder and type the following:
./configure --prefix=/MinGW #path of MinGW installation
make
make install
All these steps may take a while.
6) Untar the contents of rgraph under your msys home and type the following:
./autogen.sh
./configure
make
[make install]
7) To check that it's working, use the make check command or try to run any of the executables generated by the make command (for example ./netcarto/netcarto).
librgraph is the library itself. You can use it to build your own network analysis programs. Sorry, as of now no documentation is available, but you may want to take a look at the header files and try to figure things out.
Given a network, the program netcarto identifies modules ---i.e. densely connected groups of nodes in the network--- and classifies nodes according to their roles, as defined in Guimera (2005).
In case you use the results of the program in a publication, please cite the following papers:
Guimera, R. & Amaral, L.A.N., Functional cartography of complex metabolic networks, Nature 433, 895-900 (2005).
Guimera, R. & Amaral, L.A.N., Cartography of complex networks: modules and universal roles, J. Stat. Mech.-Theory Exp., art. no. P02001 (2005).
Important note about the new implementation
In fall 2015 we added a new, equivalent implementation of the
simulated annealing algorithm based on adjacency arrays. This new
implementation is faster and can treat weighted and unweighted graphs
seamlessly. However, it has not been tested as extensively yet. If
correctness is crucial, we encourage you to verify your results with
the previous implementation, accessible with the netcarto-legacy
command. Please report any bugs or unexpected behavior to us; it will
be greatly appreciated.
Input parameters
The synopsis of the command is:
Usage:
netcarto [-f FILE] [-o FILE] [-s SEED] [-i ITER] [-c COOL] [-wmr]
netcarto [-f FILE] [-o FILE] [-s SEED] [-i ITER] [-c COOL] [-wmr] -b [-t]
netcarto [-f FILE] [-o FILE] [-p FILE] [-w]
netcarto [-f FILE] [-o FILE] [-p FILE] [-w] -b [-t]
netcarto -h
Arguments:
-f FILE: Input network file name (default: '-', standard input),
-o FILE: Output file name (default: '-', standard output),
-s SEED: Random number generator seed (positive integer, default 1111),
-i ITER: Iteration factor (recommended 1.0, default 1.0),
-c COOL: Cooling factor (recommended 0.950-0.995, default 0.97),
-p FILE: Partition file name to load and compute modularity and roles onto,
-w : Read edge weights from the input's third column and uses the weighted modularity,
-b : Use bipartite modularity,
-r : Compute modularity roles,
-t : [with -b only] Find modules for the second column (default: first),
-h : Display this synopsis.
- Seed for the random number generator (`-s`): Must be a positive
  integer. Since the module identification algorithm is stochastic,
  different runs will yield, in general, slightly different
  modules. Two runs with the same seed, though, should give the exact
  same results.
- Name of the network file (`-f`): Name of the file that contains the
  network. The file must be a list of links with the format:
```
n1 n2
n3 n4
. .
. .
. .
```
This represents a network with a link between nodes n1 and n2,
another between nodes n3 and n4, and so on. Nodes must be separated
by spaces.
If you use the weighted definition of modularity (with the -w flag),
the file must contain an additional third column giving the weight of
each link:
n1 m1 w1
n2 m2 w2
. . .
. . .
. . .
- Iteration factor (`-i`): At each temperature of the simulated annealing
(SA), the program performs fN^2 individual-node updates (involving
the movement of a single node from one module to another) and fN
collective updates (involving the merging of two modules and the
split of a module). The number "f" is the iteration factor. Large
values of f (1 or larger) will result, in general, in better results
(higher modularities) and longer execution times. The recommended
range for f is [0.1, 1], although smaller values may be needed for
large and/or dense networks. Note, also, that a minimum number of
iterations is imposed at each temperature, so that when f is very
small, the minimum number will be used instead of fN^2 or fN.
- Cooling factor (`-c`): After the desired number of updates is done at a
  certain temperature T, the system is cooled down to a new
  temperature T'=cT, where c is the cooling factor. The cooling factor
  must be strictly larger than 0 and strictly smaller than 1. In
  general, values close to one will result in better results and
  longer execution times. Recommended values of the cooling factor c
  are [0.990, 0.999], although smaller values (0.95 or even 0.9) may
  be needed for large and/or dense networks.
- Compute modularity roles (`-r`): If this flag is specified, the
  program will compute, for each node, the *connectivity* (within-module
  z-score of edge weights) and the *participation coefficient* (evenness
  of linked modules). These two values determine the modularity
  role of the node. Nodes with low connectivity (<2.5) are
  classified as ultra-peripheral (R1), peripheral (R2),
  connector (R3) or kinless (R4) nodes, in order of increasing
  participation coefficient. Nodes with high connectivity are
  classified as peripheral (R5), connector (R6) or kinless (R7)
  hubs. Note that with the `-b` flag (denoting bipartite networks),
  these roles are computed on the projected graph.
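
To get a feel for how the iteration and cooling factors trade off, here is a minimal Python sketch (it is not taken from the RGraph source). It computes the fN^2 and fN update counts described above, and the number of T'=cT cooling steps needed to go from one temperature to another. The function names, the `min_updates` floor and any temperatures you pass in are hypothetical: the actual minimum number of iterations and the start and stop temperatures used by netcarto are not documented here.

```python
import math


def updates_per_temperature(n_nodes, f, min_updates=1):
    """Return (individual, collective) update counts at one temperature.

    Follows the description above: f*N^2 individual-node moves and f*N
    merge/split moves. `min_updates` is a hypothetical floor standing in
    for the undocumented minimum that netcarto imposes.
    """
    individual = max(min_updates, int(f * n_nodes ** 2))
    collective = max(min_updates, int(f * n_nodes))
    return individual, collective


def cooling_steps(t_start, t_end, c):
    """Number of T' = c*T steps needed to cool from t_start to at most t_end."""
    if not 0.0 < c < 1.0:
        raise ValueError("cooling factor must satisfy 0 < c < 1")
    if not 0.0 < t_end <= t_start:
        raise ValueError("need 0 < t_end <= t_start")
    return math.ceil(math.log(t_end / t_start) / math.log(c))
```

Multiplying the two results gives a rough idea of the total work for a given network size: moving c closer to 1 increases the number of temperatures, while increasing f increases the work done at each of them.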
Netcarto **can** treat bipartite graphs in a different way if you use
the `-b` flag. It will produce a partition of the nodes on one side
according to their shared neighbors. Please refer to (and cite) these articles for
more information (unweighted and weighted formulas, respectively):
> Guimera, R., Sales-Pardo, M. & Amaral, L.A.N., Module
> identification in bipartite and directed networks, Phys. Rev. E 76,
> 036102 (2007)
> Stouffer, D.B., Sales-Pardo, M., Sirer, M.I. & Bascompte J.,
> Evolutionary conservation of species' roles in food webs, Science
> 335, 1489-1492 (2012).
- Bipartite `-b`: This flag specifies that the input graph is
  bipartite. The two components of the bipartite network must be in
  different columns. If the same name is used in both columns, it will
  spawn two nodes (one in each component).
- Invert `-t`: If this flag is specified, the program will identify
  modules in the second column of the input file (instead of the first).
**Program output**
After entering these parameters, the algorithm will start to identify
the modules in the network. As the SA proceeds, the program displays
three columns (in the standard error stream), which indicate the
temperature, the modularity at that temperature, and the stopping
criterion (current streak of steps without significant increase in
modularity), respectively. This provides you with a fast way to check
if the process is too slow or, conversely, if it is fast and the
accuracy can be increased. If you want to hide this information, you
can redirect the error stream:
bipartmod_cl -f network.dat 2> /dev/null
Then comes the main program output (in the standard output or in a file
if you used the `-o` option). Two versions are possible depending on
the options you used.
By default, the program outputs the modularity value (with and without
the diagonal term) and then the modules in a *compact format*. Each
module is output as a single line, and node labels are separated by
tabs. This format is the one used as input by the `-p` option.
# Modularity: 0.469592 # Modularity (with diagonal): 0.419790 Actor_11 Actor_5 Actor_17 Actor_6 Actor_7 Actor_22 Actor_12 Actor_14 Actor_13 Actor_20 Mr_Hi Actor_2 Actor_8 Actor_3 Actor_18 Actor_4 Actor_28 Actor_24 Actor_26 Actor_25 Actor_29 Actor_32 Actor_9 Actor_31 John_A Actor_10 Actor_30 Actor_27 Actor_16 Actor_19 Actor_23 Actor_21 Actor_15 Actor_33
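
As an illustration of the compact format, the following Python sketch reads such an output back into a list of modules. It is a hypothetical helper, not part of RGraph, and it assumes that the modularity header lines always start with `#` (as in the sample above) and that node labels never contain tabs.

```python
def parse_partition(text):
    """Parse netcarto's compact output into a list of modules.

    Lines starting with '#' (the modularity header) are skipped; every
    other non-empty line is one module whose node labels are separated
    by tabs.
    """
    modules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        modules.append(line.split("\t"))
    return modules
```

Each element of the returned list is one module, given as a list of node labels.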
If modularity roles were computed (`-r` flag), the program displays a
*tabular output*. Each line corresponds to a node, with values
separated by tabs. The fields are: label, module id, role,
participation coefficient (P) and within-module degree (z). Note that
for bipartite networks (`-b` flag), the last three values are computed
on the projected network.
```
Mynode        1  R3  0.6500    -1.440
Another_node  1  R2  0.277778  -2.445675
```
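
The tabular output can be loaded and cross-checked with a few lines of Python. The sketch below is hypothetical and not part of RGraph. The z threshold of 2.5 comes from the description of the `-r` flag; the participation-coefficient boundaries are the ones published in Guimera & Amaral (2005), and whether netcarto applies exactly the same boundaries is an assumption. The parser also assumes that labels contain no whitespace and that module ids are integers, as in the sample.

```python
from typing import NamedTuple


class NodeRow(NamedTuple):
    label: str
    module: int           # assumed to be an integer id, as in the sample
    role: str
    participation: float  # participation coefficient P
    z_score: float        # within-module degree z


def parse_roles_table(text):
    """Parse the tabular (-r) output: label, module id, role, P, z."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        label, module, role, p, z = line.split()
        rows.append(NodeRow(label, int(module), role, float(p), float(z)))
    return rows


def classify_role(z, p):
    """Role from (z, P), using the boundaries of Guimera & Amaral (2005)."""
    if z < 2.5:  # non-hubs
        if p <= 0.05:
            return "R1"
        if p <= 0.62:
            return "R2"
        if p <= 0.80:
            return "R3"
        return "R4"
    # hubs
    if p <= 0.30:
        return "R5"
    if p <= 0.75:
        return "R6"
    return "R7"
```

Applied to the two sample rows above, `classify_role` agrees with the role column printed by the program.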
The original implementation of netcarto is still accessible through
the netcarto-legacy executable. The command line options are almost
the same as those of the current netcarto program (use -h for
details). You can also get an interactive version if you start it
without arguments.
This implementation offers the additional feature of computing the
modularity of randomizations of the original network (option -r).
This test is necessary to establish whether the modular
structure of the original network is significant or not. Calculation
of the modularity for each random network will take approximately the
same time as for the original network. Please refer to (and cite) this
article about this feature:
Guimera, R., Sales-Pardo, M. & Amaral, L.A.N., Modularity from fluctuations in random graphs and complex networks, Phys. Rev. E 70, art. no. 025101 (2004).
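
One common way to use the randomization results is to express the modularity of the original network as a z-score relative to the randomized networks. The Python sketch below shows this arithmetic only; it is an assumption about how you might summarize the comparison, not something the program computes for you, and the function name is hypothetical.

```python
def modularity_z_score(m_original, m_random_mean, m_random_std):
    """Standard deviations separating the original modularity from the
    average modularity of the randomized networks."""
    if m_random_std <= 0:
        raise ValueError("standard deviation must be positive")
    return (m_original - m_random_mean) / m_random_std
```

A value close to zero means the original network is no more modular than its randomizations; the threshold you require for significance is your own choice.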
The program outputs the following files:
- network.net: a Pajek file containing the giant component of the
  network (for information on Pajek, visit
  http://vlado.fmf.uni-lj.si/pub/networks/pajek/).
- modules.clu: a Pajek partition containing the modules as identified
  by the algorithm.
- roles.clu: a Pajek partition containing the roles as identified
  by the algorithm.
- modules.dat: A text file containing some basic information about the
  modules (can be edited with any text editor such as Notepad, or
  imported in Excel as a csv file). The format of the file is as
  follows. Each line corresponds to a different module. The first
  number is an ID number for the module, mostly irrelevant. The second
  is the number of nodes in the module. The third is the total number
  of links in the module, the fourth the number of within-module
  links, and the fifth the number of links from this module to other
  modules (of course, the third column must be equal to the sum of the
  fourth and fifth columns). Then there is a "---" and the next
  columns correspond to the list of nodes in the module. The last line
  of the file contains the value of the modularity for this partition.
- roles.dat: A text file containing some basic information about the
  roles (can be edited with any text editor such as Notepad, or
  imported in Excel as a csv file). The format of the file is as
  follows. Each line corresponds to a different role. The first number
  is the role number, as defined in [1, 2]. The second is the number
  of nodes with that role. The third is the total number of links of
  nodes with that role, the fourth the number of within-role links,
  and the fifth the number of links from this role to other roles (of
  course, the third column must be equal to the sum of the fourth and
  fifth columns). Then there is a "---" and the next columns
  correspond to the list of nodes with that role.
- node_prop.dat: A text file with four columns. The first one is the
  number of the node. The second is the degree (number of links) of
  the node. The third is the participation coefficient as defined in
  [1, 2]. The fourth one is the within-module relative degree, as
  defined in [1, 2].
- randomized_mod.dat: the average modularity of the randomized
  networks, and the standard deviation of the modularity of the
  randomized networks.
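
The line layout described for modules.dat (and, with roles instead of modules, for roles.dat) can be read with a short Python helper. This is a hypothetical sketch, not part of RGraph. It assumes that the five leading fields are integers and whitespace-separated, and it only handles lines that contain the "---" separator, since the exact layout of the final modularity line is not described here.

```python
def parse_module_line(line):
    """Parse one 'id n_nodes total within between --- node ...' line."""
    head, sep, tail = line.partition("---")
    if not sep:
        raise ValueError("not a module line (no '---' separator)")
    module_id, n_nodes, total, within, between = (int(x) for x in head.split())
    # Consistency check stated in the format description:
    # the third column equals the sum of the fourth and fifth.
    if total != within + between:
        raise ValueError("total links != within + between")
    return {
        "id": module_id,
        "n_nodes": n_nodes,
        "total_links": total,
        "within_links": within,
        "between_links": between,
        "nodes": tail.split(),
    }
```

When reading a whole file, skip any line for which the helper raises `ValueError`, such as the last line holding the modularity value.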
Given a network observation, the programs in reliability:
1) reliability_links: evaluate the reliability of links
2) reliability_reconstruct: reconstruct the network
In case you use the results of the program in a publication, please cite the following papers:
Input parameters
The programs take two arguments:
n1 n2
n3 n4
. .
. .
. .
This represents a network with a link between nodes n1 and n2, another between nodes n3 and n4, and so on. Nodes must be separated by spaces.
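
If your network lives in another program, a link list in this format is easy to generate. The Python sketch below is a hypothetical helper, not part of RGraph. It writes one link per line with space-separated fields, and it rejects labels containing whitespace because whitespace is the field separator. An optional third element per link is written as a third column, which according to the netcarto section is only meaningful with netcarto's `-w` flag.

```python
def format_link_list(links):
    """Format (n1, n2) or (n1, n2, weight) tuples as a link-list string."""
    lines = []
    for link in links:
        if len(link) not in (2, 3):
            raise ValueError("each link must be (n1, n2) or (n1, n2, weight)")
        for label in link[:2]:
            if any(ch.isspace() for ch in str(label)):
                raise ValueError(f"node label contains whitespace: {label!r}")
        lines.append(" ".join(str(field) for field in link))
    return "\n".join(lines) + "\n"
```

Write the returned string to a file and pass that file name to the program.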
Program output
The "links" program generates two files: missing.dat and spurious.dat. Each of these files has the format:
score12 n1 n2
score13 n1 n3
...
missing.dat contains all scores for links that are not observed in the network. High scores in missing.dat correspond to links that are likely to be missing.
spurious.dat contains all scores for links that are observed in the network. Low scores in spurious.dat correspond to links that are likely to be spurious.
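
To pick out the most interesting candidates from these files, you can sort them by score. The Python sketch below is a hypothetical helper, not part of RGraph. It assumes each data line has exactly three whitespace-separated fields (score, n1, n2) and silently skips any line that does not.

```python
def read_scores(text):
    """Parse 'score n1 n2' lines into (score, n1, n2) tuples."""
    scores = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        score, n1, n2 = parts
        scores.append((float(score), n1, n2))
    return scores


def most_likely_missing(scores, top=10):
    """Highest scores first: use on the contents of missing.dat."""
    return sorted(scores, key=lambda row: row[0], reverse=True)[:top]


def most_likely_spurious(scores, top=10):
    """Lowest scores first: use on the contents of spurious.dat."""
    return sorted(scores, key=lambda row: row[0])[:top]
```

For example, `most_likely_missing(read_scores(open("missing.dat").read()), top=5)` would list the five unobserved links with the highest scores.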
The "reconstruct" program returns a file net_reconstructed.dat with the reconstructed network.
Additionally, a few utility programs are also compiled and installed.
countlinks netA: count the number of links in a network.
netcompare netA netB: compare two networks.
netprop netA: print a number of properties of a network.
netrandomize netA: randomize an undirected unweighted network and print result to standard output.
roger.guimera@urv.cat