monitor_cluster_resources | R Documentation |
To-do: add better docs
The data comes from the Linux command ps (specifically, "ps -u <username> -o pcpu,rss,size,state,time,cmd").
For the exact meaning of each column, run man ps in a UNIX terminal.
See Details for more information about the data frame being saved.
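To give a feel for the raw text that such a ps call produces, here is a minimal sketch that parses a few lines of ps-style output into a data frame. The sample lines and the helper parse_ps_lines() are invented for illustration and are not part of this package; the real function may parse its input differently.

```r
# Hypothetical sample of `ps -o pcpu,rss,size,state,time,cmd` output.
# All values are invented for illustration.
sample_output <- c(
  "%CPU   RSS  SIZE S     TIME CMD",
  " 0.0  3520   912 S 00:00:00 -bash",
  "99.5 81234 90112 R 00:12:41 /usr/lib/R/bin/exec/R --no-save"
)

# Hypothetical helper: split each line into five whitespace-separated
# fields plus the rest of the line, since CMD may itself contain spaces.
parse_ps_lines <- function(lines) {
  body <- trimws(lines[-1])  # drop the header row
  pattern <- "^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(.*)$"
  parts <- regmatches(body, regexec(pattern, body))
  # Each match holds the full line followed by the six captured groups
  fields <- do.call(rbind, parts)[, -1, drop = FALSE]
  df <- as.data.frame(fields, stringsAsFactors = FALSE)
  names(df) <- c("%CPU", "RSS", "SIZE", "S", "TIME", "CMD")
  df
}

ps_df <- parse_ps_lines(sample_output)
print(ps_df)
```

Every column comes back as character, which is consistent with the as.numeric() calls in the example at the end of this page.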
monitor_cluster_resources(username_or_command, login_node, node_list,
save_path, sleeping_time, total_checks, ..., stop_file = NULL)
username_or_command |
The username you're using to log in to the remote server or, if you supply |
login_node |
the name of the gateway node (e.g. 'zach@remote_back_up_server.server.com'). Should NOT be the same as the node you're using to run the other tasks. |
node_list |
a list of the nodes you want to monitor |
save_path |
the filename you want to save all this information to (on the remote server). If NULL, it returns the future of the data frame it would normally save. Choosing this option will overwrite the current |
sleeping_time |
time between checks in seconds |
total_checks |
total number of checks |
... |
additional arguments supplied to |
stop_file |
The path of a file that, if present on the node, causes the monitoring to end and return early. A totally hacky way of communicating with the monitoring functions. Wholesome people should not bother with this parameter. |
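The interplay of sleeping_time, total_checks and stop_file can be pictured as a polling loop. The sketch below is only an assumption about how such a loop might look; run_checks() and check_fun are hypothetical names, and the package's actual implementation may differ.

```r
# Hypothetical helper, NOT the package's implementation: run up to
# `total_checks` checks, pausing `sleeping_time` seconds between them,
# and stop early if `stop_file` exists.
run_checks <- function(check_fun, sleeping_time, total_checks, stop_file = NULL) {
  results <- list()
  for (i in seq_len(total_checks)) {
    if (!is.null(stop_file) && file.exists(stop_file)) {
      break  # someone asked us to stop
    }
    results[[length(results) + 1]] <- check_fun(i)
    Sys.sleep(sleeping_time)
  }
  results
}

# Simulate a stop request arriving during the second check
stop_file <- tempfile("stop_monitoring_")
checks <- run_checks(
  check_fun = function(i) {
    if (i == 2) file.create(stop_file)
    i
  },
  sleeping_time = 0,
  total_checks = 5,
  stop_file = stop_file
)
n_completed <- length(checks)
unlink(stop_file)
```

In this sketch the file is only checked at the start of each iteration, so a stop request takes effect after the current check and its sleep have finished.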
Each row is a process at a given time on a given node.
Columns:
%CPU
is the percentage of CPU being used by the process (can exceed 100 when a process runs on more than one core)
RSS
is the resident set size, i.e. the non-swapped physical memory used by the process, reported by ps in kilobytes
SIZE
is, per man ps, a very rough estimate of the swap space the process would need if it dirtied all of its writable pages and was then swapped out
S
is the state of the process. Basically "S" means sleeping and "R" means running
TIME
is the cumulative CPU time of the process. Basically how long it's been "active." (Processes, unlike grad students, sleep a lot)
PID
is the ID of the process.
CMD
is an extended form of the command/name of the process. All R processes have been renamed "R"
SampleTime
is the time at which the process was sampled
Nodename
is the name of the node
PIDofMonitor
is the process ID of the monitoring process itself. You can use this to filter out the resources being used by this process.
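As a rough illustration of working with these columns, the sketch below builds a toy data frame, drops the monitor's own process using PIDofMonitor, and converts RSS and TIME to friendlier units. All values are invented, and the column types (character for the ps fields, numeric for the PIDs) are assumptions; cpu_time_to_seconds() is a hypothetical helper, not part of the package.

```r
# Toy data with some of the documented columns (values invented).
usage <- data.frame(
  `%CPU` = c("101.0", "0.3", "250.0"),
  RSS = c("2048000", "10240", "4096000"),
  TIME = c("00:10:00", "00:00:01", "1-02:03:04"),
  PID = c(101, 202, 303),
  CMD = c("R", "R", "R"),
  Nodename = c("node1", "node1", "node2"),
  PIDofMonitor = c(202, 202, 999),
  check.names = FALSE,       # keep the "%CPU" column name as-is
  stringsAsFactors = FALSE
)

# Drop rows that belong to the monitoring process itself
jobs <- usage[usage$PID != usage$PIDofMonitor, ]

# RSS is reported in kilobytes (assuming 1024-byte units); convert to GiB
jobs$RSS_GiB <- as.numeric(jobs$RSS) / 1024^2

# Hypothetical helper: convert ps-style "[DD-]HH:MM:SS" CPU time to seconds
cpu_time_to_seconds <- function(x) {
  vapply(strsplit(x, "[-:]"), function(p) {
    p <- as.numeric(p)
    p <- c(rep(0, 4 - length(p)), p)   # pad a missing day field with 0
    sum(p * c(86400, 3600, 60, 1))
  }, numeric(1))
}
jobs$cpu_seconds <- cpu_time_to_seconds(jobs$TIME)

# Total memory per node
mem_by_node <- aggregate(RSS_GiB ~ Nodename, data = jobs, FUN = sum)
print(mem_by_node)
```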
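## Not run:
library(future)   # plan(), %<-%, futureOf(), resolved()
library(dplyr)    # %>%, group_by(), filter(), summarise()
library(ggplot2)  # ggplot(), aes(), geom_line()

# nodes_to_monitor is assumed to be defined already (see node_list)
monitor_cluster_resources("zach",
                          "zach@remote_backup_server.com",
                          nodes_to_monitor,
                          save_path = "/u/zach/bb_maker_resources.RDS",
                          sleeping_time = 10,
                          total_checks = 6)

# Wait for it to complete before using another connection to 'remote_backup_server.com'
plan(remote, workers = "zach@remote_backup_server.com")
df %<-% readRDS("/u/zach/bb_maker_resources.RDS")
resolved(futureOf(df))  # check whether the remote read has finished

# Plot total memory used by R processes per node over time
df %>%
  group_by(Nodename, SampleTime) %>%
  filter(CMD == "R") %>%
  summarise(RSS = sum(as.numeric(RSS)),
            CPU = sum(as.numeric(`%CPU`))) %>%
  filter(RSS > 2e+06) %>%
  ggplot(aes(x = SampleTime, y = RSS, color = Nodename)) +
  geom_line()
## End(Not run)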