Copy files during particular times of day, recording metadata along the way, via the function `wait_copy`.
Imagine someone you work with has a hard drive that you need data from, but that hard drive is only accessible over the network, mounted via SAMBA. There are potentially duplicate files with the same base name, files with the same file name but different contents, and the file path encodes some meta-information about the sample. In addition, the file names contain odd characters (spaces, colons, etc.) that make them a pain to work with from the command line on Linux, so you'd prefer they weren't there.
The files are small, so even copying over the network is fast, but if you copy too many too quickly during the day, you'll get complaints from the people local to the drive about hitting this shared resource too often.
So ideally, you want to copy the files only during certain hours, wait a little bit between each copy operation, check for duplicates (by file name and MD5 hash), strip special characters from the file names, and note where each file originated.
`waitcopy` provides these capabilities. Given a file to copy and a location to copy it to, `wait_copy`:

- only copies during a defined time window, waiting a set amount of time between copies
- checks for duplicates by file name and MD5 hash
- strips special characters from the file name
- records metadata about each file, including where it originated
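To illustrate the duplicate-detection idea, here is a minimal base-R sketch using `tools::md5sum`: two files with identical contents hash to the same value even when their names differ. This is only an illustration of the concept, not the package's internal code.

```r
# Two files with identical contents, one with different contents.
f1 <- tempfile(fileext = ".raw")
f2 <- tempfile(fileext = ".raw")
f3 <- tempfile(fileext = ".raw")
writeLines("same content", f1)
writeLines("same content", f2)
writeLines("different content", f3)

# MD5 hashes identify files whose contents match, regardless of name.
hashes <- tools::md5sum(c(f1, f2, f3))
unname(hashes[f1] == hashes[f2])  # TRUE, a duplicate
unname(hashes[f1] == hashes[f3])  # FALSE
```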
```r
# install.packages("devtools")
devtools::install_github("MoseleyBioinformaticsLab/waitcopy")
```
Let's imagine that we want to copy a set of files during a set time window, and one of the files is duplicated (but we don't know that before we start).
```r
library(waitcopy)
library(lubridate)
# files are in the extdata directory of waitcopy
testloc <- system.file("extdata", "set1", package = "waitcopy")
file_list <- dir(testloc, pattern = "raw", full.names = TRUE)
file_list
```
We will set up a temporary directory to copy them to:
```r
temp_dir <- tempfile(pattern = "copyfiles-test-1")
dir.create(temp_dir)
dir(temp_dir)
```
And then let's set things up to start copying 20 seconds from now.
```r
curr_time <- waitcopy:::get_now_in_local()
curr_today <- waitcopy:::get_today_in_local()
now_minus_today <- difftime(curr_time, curr_today, units = "s")
beg_time <- seconds(now_minus_today + 20)
end_time <- seconds(now_minus_today + 3600)
beg_time
end_time
```
And now let's copy! The start time is in the near future, so we set the `wait_check` parameter to a low value of only 10 seconds; normally it is set to 30 minutes (1800 seconds), on the assumption that the copy window is in the far future.
```r
wait_copy(file_list, temp_dir,
          json_meta = file.path(temp_dir, "all_meta.json"),
          start_time = beg_time,
          stop_time = end_time,
          wait_check = 10,
          pause_file = 0)
```
Let's look at how many files were copied, and at the contents of the JSON metadata.
```r
copied_files <- dir(temp_dir)
copied_files
meta_json <- jsonlite::fromJSON(file.path(temp_dir, "all_meta.json"),
                                simplifyVector = FALSE)
jsonlite::toJSON(meta_json, auto_unbox = TRUE, pretty = TRUE)
```
If you just want to copy between 8pm and 6am every day, for as long as it takes:
```r
wait_copy(file_list, temp_dir,
          json_meta = file.path(temp_dir, "all_meta.json"))
```
What if the best time to copy files is from 10am until 1pm (13:00)?
```r
wait_copy(file_list, temp_dir,
          json_meta = file.path(temp_dir, "all_meta.json"),
          start_time = hours(10),
          stop_time = hours(13))
```
If you don't want to restrict copying to a time window at all, turn the time limit off with `time_limit = FALSE`:

```r
wait_copy(file_list, temp_dir,
          json_meta = file.path(temp_dir, "all_meta.json"),
          time_limit = FALSE)
```
If you want the function to give up after a certain number of time checks, change the `n_check` parameter. To stop checking after 3 tries:
```r
wait_copy(file_list, temp_dir,
          json_meta = file.path(temp_dir, "all_meta.json"),
          n_check = 3)
```
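Conceptually, the time-checking behavior looks something like the sketch below: compare the current time (in seconds past local midnight) against the window, sleep `wait_check` seconds between checks, and give up after `n_check` tries. This is a simplified illustration of the idea, not the package's actual implementation.

```r
# Simplified sketch of waiting for a copy window (assumed behavior,
# not waitcopy's internal code).
wait_for_window <- function(start_s, stop_s, wait_check = 1800, n_check = Inf) {
  n_tried <- 0
  repeat {
    # seconds elapsed since local midnight
    midnight <- as.POSIXct(format(Sys.time(), "%Y-%m-%d"))
    now_s <- as.numeric(difftime(Sys.time(), midnight, units = "secs"))
    if (now_s >= start_s && now_s <= stop_s) {
      return(TRUE)   # inside the window, OK to copy
    }
    n_tried <- n_tried + 1
    if (n_tried >= n_check) {
      return(FALSE)  # gave up after n_check time checks
    }
    Sys.sleep(wait_check)
  }
}

wait_for_window(0, 86400)                                   # whole day: TRUE
wait_for_window(86401, 86500, wait_check = 0, n_check = 1)  # unreachable window: FALSE
```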
An alternative way to handle nasty file names is to use the `make.names` function:
```r
wait_copy(file_list, temp_dir,
          json_meta = file.path(temp_dir, "all_meta.json"),
          clean_file_fun = make.names)
```
Note that the cleaning function is only applied to the `basename` of the file path, i.e. the actual file name after the leading directories are removed.
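You could also write your own cleaning function. For example, this sketch (the function name `clean_name` is made up here) replaces runs of spaces, colons, and other punctuation with single underscores; any function that takes and returns a character vector should work in the same way.

```r
# Sketch: replace runs of characters other than letters, digits,
# "." and "_" with a single underscore.
clean_name <- function(x) {
  gsub("[^[:alnum:]._]+", "_", x)
}

clean_name("my sample: 12 (run 2).raw")
# -> "my_sample_12_run_2_.raw"
```

This could then be passed as `clean_file_fun = clean_name`. For comparison, `make.names` on the same input replaces each offending character with a dot instead.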