Potential future features of the track package

Share:

Description

Potential future features of the track package, in some vague order of feasibility and priority ('easy', 'medium' and 'hard' are an estimate of design and coding difficulty):

better handling of writing files on changes to objects:

(medium) with cachePolicy="tltPurge", changed objects are only written to file at the end of a top level task. However, with cachePolicy="none", objects are written to file on each change – is better control over this needed?

untracked variables in the summary:

(easy) would this be useful? wouldn't need to cache these, mark with an asterisk in a special column? Compute these each time track.summary is called.

option writeToDisk:

is this redundant option with cache and cachePolicy?

other default tracked environment:

(easy) would it be useful to allow an environment other than the global environment to be the default tracking environment? This could be implemented by using options("tracked.environment") as the default environment for all the tracking functions (rather than the currently hardcoded pos=1)

DONE better cleanup:

(easy) provide an integrated quiting function that saves all tracked vars and history before quitting (and maybe also saves untracked vars in an RData file)

caching rules:

(hard) allow rule-based decisions for caching, e.g., only cache objects under a certain size, or only cache objects of certain classes, or enforce a limit on memory for caching tracked variables, and flush out least-recently used variables

record file read/writes:

(easy) record each time a file is read or written in the summary. Could be useful for smarter caching.

auto-trust in rebuild:

(easy) when rebuilding an active tracking environment, base decision whether to use summary row from file or environment on which has more recent dates in it. (whole dataframe, or row by row?)

smarter reading of filemap.txt:

(medium) check the mod time on filemap.txt when getting the filemap obj, and if the file on disk appears to have changed, reread it instead of just getting it from memory. This would allow working together better with other sessions that are simultaneously using this tracking dir. Don't know how much it would slow things down – do some timings. Note that to make this work in a fool-proof manner would require locks.

investigate double-get:

doing subset-replacement (e.g., X[2] <- ...) retrieves X twice (see example below)

DONE readonly mode:

(hard) to allow linking tracking dirs that might be in use by other R processes – would require not recording gets – this would require adding a new env on the search path and tracking it

DONE autoflush:

(hard) automatic flushing of variables that haven't been used frequently (triggered automaticall when memory runs low?) – this is why the summary records fetches as well as writes

safer restart:

(medium) check that we will be able to restart before doing the stop (check for masked variables or other potential clobber problems)

safe saves:

(hard) write files in a safe way so that the original file is not removed until the new file is written – not sure if this is necessary, because objects are in memory, and can be rewritten if there is a failure

DONE autotrack:

(hard) automatically track new variables? (would require hooks in base-R that get called when a new var is created)

Example of the "double-get" when assigning a subset (using the example from the help page for makeActiveBinding). Note that it works correctly, but retrieving the object twice seems unneccessary and could be slow with very large objects.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
> f <- local( {
+     x <- 1
+     function(v) {
+        if (missing(v))
+            cat("get\n")
+        else {
+            cat("set\n")
+            x <<- v
+        }
+        x
+     }
+ })
> makeActiveBinding("X", f, .GlobalEnv)
NULL
> bindingIsActive("X", .GlobalEnv)
[1] TRUE
> X
get
[1] 1
> X <- 2
set
> X
get
[1] 2
>
> X[1]
get
[1] 2
> X[2] <- 1 # 'X' is fetched twice
get
get
set
> X
get
[1] 2 1
>

See Also

Overview and design of the track package.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
# Example (transcript shown above) of how subset-assignment
# results in two retrievals when the object is an active binding.
f <- local( {
    x <- 1
    function(v) {
       if (missing(v)) {
           cat("get\n")
       } else {
           cat("set\n")
           x <<- v
       }
       x
    }
})
makeActiveBinding("X", f, .GlobalEnv)
bindingIsActive("X", .GlobalEnv)
X
X <- 2
X
X[1]
X[2] <- 1 # 'X' is fetched twice
X