Initialisation and organisation code to use snowfall.
sfInit( parallel=NULL, cpus=NULL, type=NULL, socketHosts=NULL, restore=NULL,
        slaveOutfile=NULL, nostart=FALSE, useRscript=FALSE )
sfStop( nostop=FALSE )
sfParallel()
sfIsRunning()
sfCpus()
sfNodes()
sfGetCluster()
sfType()
sfSession()
sfSocketHosts()
sfSetMaxCPUs( number=32 )
parallel: Logical determining parallel or sequential execution. If not set, values from the commandline are taken.
cpus: Numerical amount of CPUs requested for the cluster. If not set, values from the commandline are taken.
nostart: Logical determining whether the basic cluster setup should be skipped. Needed for nested use of snowfall and for usage in packages.
type: Type of cluster. Can be 'SOCK', 'MPI', 'PVM' or 'NWS'. Default is 'SOCK'.
socketHosts: Host list for socket clusters. Only needed in socket mode (SOCK) and when using more than one machine (if you use only your local machine (localhost), no list is needed).
restore: Globally set the restore behavior in the call
slaveOutfile: Write R slave output to this file. Default: no output file.
useRscript: Change startup behavior (snow>0.3 needed): use shell scripts or R-script for startup (R-scripts being the new variant, but not working with sfCluster).
nostop: Same as nostart, but for ending.
number: Maximum amount of CPUs usable.
sfInit initialises the usage of the snowfall functions and, if running in parallel mode, sets up the cluster and snow. If you are using the sfCluster management tool, call sfInit without arguments. If sfInit is called with arguments, these overwrite the sfCluster settings. If running in parallel, sfInit sets up the cluster by calling makeCluster from snow. If used with sfCluster, the initialisation also includes the management of lockfiles. If this function is called more than once and the current cluster is still running, sfStop is called automatically.
Note that you should call sfInit before using any other function from snowfall, with the only exception being sfSetMaxCPUs. If you do not call sfInit first, then on calling any snowfall function sfInit is called without any parameters, which is equal to sequential mode in snowfall-only mode or to the settings from sfCluster if used with sfCluster.
This also means that you cannot check from within your own program whether sfInit was called, as any call to a snowfall function will initialise again. Therefore the function sfIsRunning gives you a logical indicating whether a cluster is running. Please note: this will not call sfInit, and it also returns true if a previously running cluster was stopped via sfStop in the meantime.
If you use snowfall in a package, the argument nostart is very handy if the main program uses snowfall as well. If set, the cluster setup is skipped and both parts (package and main program) use the same cluster.
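The following is a minimal sketch of that pattern, not code from the snowfall sources. The function name pkg_square_all is hypothetical and stands in for a function exported by your own package; it assumes the main program has already called sfInit, and it runs in sequential mode so that no worker processes are needed.

```r
library(snowfall)

# Hypothetical package function (the name is an assumption for illustration).
# nostart=TRUE asks sfInit to skip the cluster setup, nostop=TRUE asks sfStop
# to leave the caller's cluster alone.
pkg_square_all <- function(x) {
  sfInit(nostart = TRUE)
  on.exit(sfStop(nostop = TRUE))
  unlist(sfLapply(x, function(v) v^2))
}

# Main program: owns the setup (sequential here, to keep the sketch light).
sfInit(parallel = FALSE)
res <- pkg_square_all(1:5)
sfStop()
```

With this split, only the main program decides between sequential and parallel execution; the package code stays the same in both cases.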
If you call sfInit more than once in a program without calling sfStop, the running cluster is stopped automatically. If your R environment does not provide the required libraries, sfInit automatically switches to sequential mode (with a warning). Required libraries for parallel usage are snow and, depending on the argument type, the libraries for the cluster mode (none for socket clusters, Rmpi for MPI clusters, rpvm for PVM clusters and nws for NetWorkSpaces).
If using Socket or NetWorkSpaces, socketHosts can be used to specify the hosts on which you want your workers to run. Basically this is a list, where any entry can be a plain character string with an IP address or hostname (depending on your DNS settings). For truly heterogeneous clusters, paths can also be set per host. Please refer to the corresponding snow documentation for details.
If you do not give a socket host list, a list with the required amount of CPUs on your local machine (localhost) is used. This is the easiest way to use parallel computing on a single machine.
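As a small sketch of the single-machine case (assuming snow can start two socket workers on localhost, which depends on your local setup), sfInit is called without socketHosts and the hosts actually in use are read back afterwards:

```r
library(snowfall)

# No socketHosts given: both workers are expected to run on localhost.
sfInit(parallel = TRUE, cpus = 2, type = "SOCK")

hosts   <- sfSocketHosts()                     # hosts currently in use
squares <- sfSapply(1:4, function(i) i^2)      # work spread over the workers

sfStop()
```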
Note that there is a limit on the CPUs used in one program (which can be configured on package installation). The current limit is 32 CPUs. If you need a higher amount of CPUs, call sfSetMaxCPUs before the first call to sfInit. The limit is set to prevent inadvertent requests by single users from affecting the cluster as a whole.
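The ordering matters here, so a brief sketch may help. The value 64 is an arbitrary assumption chosen for illustration, and the sketch initialises in sequential mode only to stay lightweight; in real use the following sfInit would request the larger cluster.

```r
library(snowfall)

# Raise the per-program CPU limit first (64 is an illustrative value) ...
sfSetMaxCPUs(number = 64)

# ... and only then initialise snowfall.
sfInit(parallel = FALSE)
n_cpus <- sfCpus()
sfStop()
```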
Use slaveOutfile to define a file to which the log files are written. The file location must be available on all nodes. Beware of taking a location on a shared network drive! Under *nix systems, directories such as /var/tmp are most likely not shared between the different machines. The default is no output file.
If you are using sfCluster, this argument has no meaning, as the slave logs are always created in a location of sfCluster's choice (depending on its configuration).
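A minimal sketch of collecting worker output on a single machine follows. The file name is a hypothetical choice, and tempdir() is only suitable because all workers run on localhost in this sketch; on a multi-machine cluster the location would have to exist on every node.

```r
library(snowfall)

# Hypothetical log location, valid on this machine only.
log_file <- file.path(tempdir(), "snowfall-workers.log")

sfInit(parallel = TRUE, cpus = 2, slaveOutfile = log_file)

# Output produced on the workers is meant to end up in the log file
# instead of the master console.
res <- sfLapply(1:2, function(i) {
  cat("worker task", i, "\n")
  i
})

sfStop()
```

After the run, the log can be inspected with readLines(log_file).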
sfStop stops the cluster. If running in parallel mode, the LAM/MPI cluster is shut down.
sfParallel, sfCpus and sfSession grant access to the internal state of the currently used cluster. All three can be configured via the commandline, and especially with sfCluster as well, but arguments given to sfInit always overwrite values from the commandline. The commandline options are --parallel (an option without a value; if missing, sequential mode is forced), --cpus=X (for nodes, where X is a numerical value) and --session=X (with X a string).
sfParallel returns a logical indicating whether the program is running in parallel/cluster mode or sequentially on a single processor.
sfCpus returns the size of the cluster in CPUs (equal to the number of usable CPUs). In sequential mode sfCpus returns one. sfNodes is a deprecated function similar to sfCpus.
sfSession returns a string with the session identification. It is mainly important if used with sfCluster.
sfGetCluster returns the snow cluster handler. Use it for calling snow functions directly.
sfType returns the type of the current cluster backend (if any is used). The value can be SOCK, MPI, PVM or NWS for parallel modes or "- sequential -" for sequential execution.
sfSocketHosts returns the list of hosts currently used for socket clusters. It returns an empty list if snowfall is not running in socket mode (that is, sfType() != 'SOCK').
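The accessors described above can be combined into a quick status report. This is only a sketch; it runs in sequential mode so it needs no workers, and for that reason it does not query sfSocketHosts.

```r
library(snowfall)

sfInit(parallel = FALSE)

# Collect the current snowfall state in one list for inspection.
state <- list(
  parallel = sfParallel(),   # logical: parallel or sequential execution
  cpus     = sfCpus(),       # number of usable CPUs
  type     = sfType()        # cluster backend or the sequential marker
)
str(state)

sfStop()
```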
sfSetMaxCPUs allows you to set a higher maximum CPU count for this program. If you need higher limits, call it with the new maximum amount before calling sfInit.
See snow documentation for details on commands:
## Not run: 
# Run program in plain sequential mode.
sfInit( parallel=FALSE )
stopifnot( sfParallel() == FALSE )
sfStop()

# Run in parallel mode overwriting probably given values on
# commandline.
# Executes via Socket-cluster with 4 worker processes on
# localhost.
# This is probably the best way to use parallel computing
# on a single machine, like a notebook, if you are not
# using sfCluster.
# Uses Socketcluster (Default) - which can also be stated
# using type="SOCK".
sfInit( parallel=TRUE, cpus=4 )
stopifnot( sfCpus() == 4 )
stopifnot( sfParallel() == TRUE )
sfStop()

# Run parallel mode (socket) with 4 workers on 3 specific machines.
sfInit( parallel=TRUE, cpus=4, type="SOCK",
        socketHosts=c( "biom7", "biom7", "biom11", "biom12" ) )
stopifnot( sfCpus() == 4 )
stopifnot( sfParallel() == TRUE )
sfStop()

# Hook into MPI cluster.
# Note: you can use any kind MPI cluster Rmpi supports.
sfInit( parallel=TRUE, cpus=4, type="MPI" )
sfStop()

# Hook into PVM cluster.
sfInit( parallel=TRUE, cpus=4, type="PVM" )
sfStop()

# Run in sfCluster-mode: settings are taken from commandline:
# Runmode (sequential or parallel), amount of nodes and hosts which
# are used.
sfInit()

# Session-ID from sfCluster (or XXXXXXXX as default)
session <- sfSession()

# Calling a snow function: cluster handler needed.
parLapply( sfGetCluster(), 1:10, exp )

# Same using snowfall wrapper, no handler needed.
sfLapply( 1:10, exp )

sfStop()
## End(Not run)