Copy files during particular times of day, recording metadata along the way, via the function `wait_copy`.
Imagine someone you work with has a hard drive that you need data from, but that hard drive is only accessible over the network, mounted via SAMBA. There are potentially duplicate files with the same base name, files with the same file name but different contents, and the file path encodes some meta-information about the sample. In addition, the file names contain odd characters (spaces, colons, etc.) that make them a pain to work with from the command line on Linux, so you'd prefer they weren't there.
The files are small, so even copying over the network is fast, but if you copy too many too quickly during the day, you'll get complaints from the people local to the drive about hitting this shared resource too often.
So ideally, you want to copy the files only during certain hours, wait a little bit between each copy operation, check for duplicates (by file name and MD5 hash), strip special characters from the file names, and note where each file originated.
`waitcopy` provides these capabilities. Given a file to copy and a location to copy it to, `wait_copy`:

- only copies during a defined time window, waiting a set amount of time between copies
- checks for duplicates by file name and MD5 hash
- strips special characters from the file name
- records metadata about each file, including where it originated
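To illustrate the duplicate-detection idea, here is a minimal base-R sketch using `tools::md5sum`: two files with identical contents hash to the same value even when their names differ. This is only an illustration of the concept, not the package's internal code.

```r
# Two files with identical contents, one with different contents.
f1 <- tempfile(fileext = ".raw")
f2 <- tempfile(fileext = ".raw")
f3 <- tempfile(fileext = ".raw")
writeLines("same content", f1)
writeLines("same content", f2)
writeLines("different content", f3)

# MD5 hashes identify files whose contents match, regardless of name.
hashes <- tools::md5sum(c(f1, f2, f3))
unname(hashes[f1] == hashes[f2])  # TRUE, a duplicate
unname(hashes[f1] == hashes[f3])  # FALSE
```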
```r
# install.packages("devtools")
devtools::install_github("MoseleyBioinformaticsLab/waitcopy")
```
Let's imagine that we want to copy a set of files during a set time window, and one of the files is duplicated (but we don't know that before we start).
```r
library(waitcopy)
library(lubridate)
# files are in the extdata directory of waitcopy
testloc <- system.file("extdata", "set1", package = "waitcopy")
file_list <- dir(testloc, pattern = "raw", full.names = TRUE)
file_list
```
We will set up a temporary directory to copy them to:
```r
temp_dir <- tempfile(pattern = "copyfiles-test-1")
dir.create(temp_dir)
dir(temp_dir)
```
And then let's set things up to start copying 20 seconds from now.
```r
curr_time <- waitcopy:::get_now_in_local()
curr_today <- waitcopy:::get_today_in_local()
now_minus_today <- difftime(curr_time, curr_today, units = "s")
beg_time <- seconds(now_minus_today + 20)
end_time <- seconds(now_minus_today + 3600)
beg_time
end_time
```
And now let's copy! The start time is in the near future, so we set the `wait_check` parameter to a low value of only 10 seconds; normally it is set to 30 minutes (1800 seconds), on the assumption that the copy window is in the far future.
```r
wait_copy(file_list, temp_dir,
          json_meta = file.path(temp_dir, "all_meta.json"),
          start_time = beg_time,
          stop_time = end_time,
          wait_check = 10,
          pause_file = 0)
```
Let's look at how many files were copied, and at the contents of the JSON metadata.
```r
copied_files <- dir(temp_dir)
copied_files
meta_json <- jsonlite::fromJSON(file.path(temp_dir, "all_meta.json"),
                                simplifyVector = FALSE)
jsonlite::toJSON(meta_json, auto_unbox = TRUE, pretty = TRUE)
```
If you just want to copy between 8pm and 6am every day, for as long as it takes:
```r
wait_copy(file_list, temp_dir,
          json_meta = file.path(temp_dir, "all_meta.json"))
```
What if the best time to copy files is from 10am until 1pm (13:00)?
```r
wait_copy(file_list, temp_dir,
          json_meta = file.path(temp_dir, "all_meta.json"),
          start_time = hours(10),
          stop_time = hours(13))
```
If you don't want to restrict copying to a time window at all, turn the time limit off with `time_limit = FALSE`:

```r
wait_copy(file_list, temp_dir,
          json_meta = file.path(temp_dir, "all_meta.json"),
          time_limit = FALSE)
```
If you want the function to give up after a certain number of time checks, change the `n_check` parameter. To stop checking after 3 tries:
```r
wait_copy(file_list, temp_dir,
          json_meta = file.path(temp_dir, "all_meta.json"),
          n_check = 3)
```
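Conceptually, the time-checking behavior looks something like the sketch below: compare the current time (in seconds past local midnight) against the window, sleep `wait_check` seconds between checks, and give up after `n_check` tries. This is a simplified illustration of the idea, not the package's actual implementation.

```r
# Simplified sketch of waiting for a copy window (assumed behavior,
# not waitcopy's internal code).
wait_for_window <- function(start_s, stop_s, wait_check = 1800, n_check = Inf) {
  n_tried <- 0
  repeat {
    # seconds elapsed since local midnight
    midnight <- as.POSIXct(format(Sys.time(), "%Y-%m-%d"))
    now_s <- as.numeric(difftime(Sys.time(), midnight, units = "secs"))
    if (now_s >= start_s && now_s <= stop_s) {
      return(TRUE)   # inside the window, OK to copy
    }
    n_tried <- n_tried + 1
    if (n_tried >= n_check) {
      return(FALSE)  # gave up after n_check time checks
    }
    Sys.sleep(wait_check)
  }
}

wait_for_window(0, 86400)                                   # whole day: TRUE
wait_for_window(86401, 86500, wait_check = 0, n_check = 1)  # unreachable window: FALSE
```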
An alternative way to handle nasty file names is to use the `make.names` function:
```r
wait_copy(file_list, temp_dir,
          json_meta = file.path(temp_dir, "all_meta.json"),
          clean_file_fun = make.names)
```
Note that the cleaning function is only applied to the `basename` of the file path, i.e. the actual file name after the leading directories are removed.
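You could also write your own cleaning function. For example, this sketch (the function name `clean_name` is made up here) replaces runs of spaces, colons, and other punctuation with single underscores; any function that takes and returns a character vector should work in the same way.

```r
# Sketch: replace runs of characters other than letters, digits,
# "." and "_" with a single underscore.
clean_name <- function(x) {
  gsub("[^[:alnum:]._]+", "_", x)
}

clean_name("my sample: 12 (run 2).raw")
# -> "my_sample_12_run_2_.raw"
```

This could then be passed as `clean_file_fun = clean_name`. For comparison, `make.names` on the same input replaces each offending character with a dot instead.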