See git repo issues.
This section contains developer notes for things that are unresolved, need fixing, or hare-brained ideas for features. Read at your own risk.
Need to figure out if we can if we can disable capture without then messing up the display of the test. Might not be possible? This is indeed a problem as we loose the message data stream which we do display once we get to the failed test. The conditions should still be there.
Need to try a test that does this.
Also need to figure out how to get rid of "<unitizer sink test>"
.
We need some option to set the progress mode:
The last two could default to TRUE for interactive. Should probably be S4.
unitizer_progress <- function(level=TRUE, transcript=!interactive()) {
structure(c(level=level, transcript=transcript), class='utz_transcript')
}
One weirdness is that "transcript" and "progress" feel somewhat separate, but the trick is the overwrite vs newline. Maybe there is a transcript mode modifier. This makes most sense. Maybe in the future we provide more control at the transcript level.
Challenge is that we eval all the unitizers, and then browse all of them, but if we just want to re-eval one we don't have a good way to interrupt?
So what behavior do we want? Ideally when we exit with R, we want to return to that exact unitizer and continue on from there in the original order.
Why do we allow only one bookmark? The RR option effectively requires multiple ones. Actually, maybe it's okay if we only care about returning to the specific point of a single re-eval, which is likely what we want. Does RR even make sense outside of the top level "select-a-unitizer" level?
Currently the bookmarked status would allow us to pick which unitizers to review
after reloading (browse.which). The problem is that a bookmark is complicated
(see unitizerBookmarkOrNULL
), and it's presence allow is used to determine
whether a unitizer is bookmarked, but we would need to add a semantic to the
bookmark that's just "start from the beginning" if we add bookmarks to all
subsequent that are not explicitly bookmarked. I guess we could do this,
although if feels hacky.
Astonishing we got this far without running into it more frequently, but the matching of calls on deparse suffers from several issues, particularly instability with different locales, etc, but could also be affected by numerical precision.
Storing the parsed data is out of the question due to file size (unitizer.R is a ~750 line file):
> x <- parse('../unitizer/R/unitizer.R')
> f.lang <- tempfile()
> f.char <- tempfile()
> saveRDS(x, f.lang)
> saveRDS(deparse(x), f.char)
> file.size(f.lang)
[1] 76612
> file.size(f.char)
[1] 5431
Parse/deparse cycle on load is technically feasible from a computing cost perspective, but it is not guaranteed to work as some locales will accept random bytes as valid (because they are) so they are parsed directly into those bytes, but then attempting to parse in another locale fails.
We could try to record the locale and iconv, or even try to use RDS3 which appears to translate, but really we probably will just document for now. We could add a warning too?
Works on winbuilder. Most likely explanation is that the CRAN machines when
given a random file name in normalizePath(blah, mustWork=FALSE)
produce a full
normalized path for blah
irrespective of whether blah
exists or not.
So fix here is to change normalize_path to test for file, and if not, return the unchanged file (and hope for the best?). Indeed we confirm that non-existent files on windows are still normalized to the working directory.
Additional complexity that on windows (sometimes?) we can expect drive letters, so it's not always clear what is an absolute path.
We got it working on 3.5, but there are still failures on 3.3. Should go back and see if we can figure out what's going on.
Also, do we want to backport the eq-assign fix (probably not)?
Right now a total mess; need a unified way of addressing color usage in:
diffobj
calls from within unitizercomment
coloringbrowse
menu coloring (failed/removed/etc)Use pager? Seems like almost definitely want to do this for elements like [B]rowse, etc. Have to think about how it impacts flow if the pager is external like in RStudio.
All meta-text should be grey... But what about large portions of text such as those we get in [B]rowse? Maybe the headers and all that in gray, but the contents in white? What about the prompts? Will have to play around.
Color in code comments? Could do so pretty easily
One of the key objectives is differentiating unitizer
meta output from normal
code output. Will need to provide a solution for terminals without color
support, perhaps something like prepending >>
to every meta line.
Need some way of mitigating the amount of output put out by unitizer
. Right
now with the addition of the new diff it's pretty jarring to see so much red and
yellow and blue. Some options:
value
error, squash all.equal
and rely instead on diffobj
output only; one issue here is that we might conceal some errors that
all.equal
would otherwise show; partly relies on making sure diffobj
shows
the all.equal
output when there are no visible differences.value
error is that we need to handle
those cases where the user provides their own comparison function with its own
output preferences. The auto-output of diffObj
might conflict with this.diff_state
business; what's the best way to handle it?Still somewhat conflicted about what the return values should be for a unitizer.
Need to be able to address:
At a high level we would want, for each unitizerObjectList, how many tests
passed / failed / added / removed etc, for each unitizer. This is comparable
to what we show during browsing. This info is generated by
summmary.unitizerObjectList in unitizer_browse
, with the post eval but
pre browse unitizers.
Additionally, we need the user selections on what happens to each test. We
need summaries of that, which we can get from the @change
slot of the post-
browse unitizer, but we also want the full listing as it is shown when pressing
B
at the unitizer
prompt. This information is fully contained in the
untizerBrowse
object and displayed by show.unitizerBrowse
.
So we should be able to modify unitizer_browse
to return the additional
info (show(y)
from outside review_prompt
works fine). The main issue seems
to be making sure there is a reasonable interface to access all this data.
So, to summarize, we should return a unitizerObjectList
that will contain the
post eval but pre-browse unitizers
, combined in some way with the post-browse
unitizerBrowse
objects, with mechanisms for extracting the data in easy to
manipulate structures (e.g. matrices).
Right now we're just returning the data.frame.
run_ls
on new tests is super inefficient and is likely the next
obvious point to optimize, but would take a little work in an area of the code
that is a bit finickyall.equal
adds a little overhead, but not a huge ammountappend
also adds some overheadTurns out stdout and stderr capture is a very small portion of the storage space taken up in a unitizer.
Looks like the majority of storage is actually taken by values, calls, and environments. We partially addressed values by no longer storing ignored tests. We probably can't do anything about the environments.
For calls, we're planning on storing the deparsed calls rather than the calls themselves. This should save a lot of room, at the cost of some computation time since the deparse/parse cycle is about 150us per call, which is less than ideal.
Q
? Right now causes an unexpected exit
with loss of all work.browse
source code, and then Q from browser() env
causes an exit with loss of history; this is likely purely internal and not
something a user would run into, but might be worth addressing to simplify
debugging. This seems to happen if we hit more than one browser() while
evaluating.parent.env<-
, what do we do about the note in R docs
stating this might be removed in the future? EDIT: most likely we should not be
affected as devs seem mostly concerned about changing the search path at
run time, which we do not do.library
be ignored? How do we reconcile that with the addition of the
new clean path functions? In theory library
should be returning the same
thing when using the clean paths.options(warn=1)
due to weird stuff that happens when trying
to run in mode 0 while capturing output.warning
or stop
?as.list
of
expression, or some such? Or just compare calls with identical
? alternate
solutions seems potentially expensive)There is fairly extensive use of S4 internally. In some cases it definitely make sense, in other potentially less so. In particular, we've leaned towards making functionst that use S4 objects methods, even if there is only one possible dispatch to contend with. We also haven't been super consistent, as in some cases we do not use S4 methods. Need to rationalize all of this at some point.
Rationalizing side effects.
Major types:
Types that are treated differently
Simplified implementation:
Different meaning for each of the types of things.
Unfortunately reproducible state can cause issues with packages that cannot be unloaded (e.g. data.table) and also likely to be a little less robust since we are doing a lot of borderline stuff to make it happen. Questions then:
Additionally, would be nice to be able to change just one value of reproducible state instead of having to do the entire settings string. Perhaps the best interface is a combination of either a single string value for a preset, and the full named integer vector. For example:
An alternative is to create a new S4 class that represents all the state values,
and control the population through the constructor. We can also extend that
class to provide other pre-set options, so we might have stateSettings
and
stateSettingsSafe
or some such. The main problem with this implementation
is that it would require an S4 object being stored as an option, which is
undesirable. Perhaps can accept either character value or S4 object?
Then we would have five state values, the traditional four plus par.env?
Is it worth the extra complexity for the argument in order to eliminate one single argument? Maybe, the best reason to do this is that then all state related aspects of evaluation are controlled through the one argument, which makes more sense. In particular, this allows us to have the "safe" mode also reasonably control par.env.
Shouldn't worry too much about 3. since translation is a secondary consideration. The reasonable thing would be to run in some lower level of reproducibility, though annoyingly any time we re-run the tests we would have to use those exact settings. This is somewhat mitigated if we
Offer two options for each setting:
There is some interplay between search path and options, particularly because many options are often set up when a path is first loaded. Perhaps this means that you can only have zero-set options if you also have zero set search path.
Additionally this means that the search path trimming should happen with unloading namespaces as well so that when they are re-attached onload scripts get run again (and define options as they often do)?
Shimming library/attach/detach likely to be a major obstacle to CRAN approval,
and also introduces complexity and instability as new R versions are released.
Simplest solution is to just add library/attach/detach functions in a "shims"
environment, but not as robust as tracing the actual functions. In particular,
things can fail if there is a compound expression that uses base::library
,
either via package namespace or directly. We can detect the direct calls via
parsing and issue warnings, but this is limited since people can get creative.
Note that even shimming base::library
, etc, not fullproof since some packages
may be calling internals directly, or some base R functions may do that too?
(seems unlikely though).
The Mersenne Seed is way too large, need to use Marsaglia Multi Carry or some such.
It is viable to store all the non-function options in each unitizer (2-3KB expected per unitizer). But there are a few issues:
Details
editor
)options(error=NULL)
is technically not setPossibilities
Should we make a distinction between options being changed by user vs. by tests? The easiest thing to do would be that.
And how do options like working directory fit in all this? Do we want to change it to a directory that may not exist? Should it be test-writer responsibility to make sure they don't have any relative path references?
Some of these global settings are almost certainly going to be different even when comparing the creation run to the R CMD check run. I guess we could force them to be the same, but what does that even mean for stuff like working directory?
So again, two broad categories:
Go the whole hog, track all options and store them for reference tests, need a lighter weight mechanism for storing the options. Store only deltas and recompute them as needed? Would need getOldState functions that would just pull the requested entry for most states, but for options would recompute them from early data. The main issue remains that there options are that need to be different when run in different environments, such as:
etc. Can we realistically have a list of options not to change? If we get it wrong bad things would happen. What if packages introduce options that are system specific? Then they should get loaded on package attach, and will be different
There is an additional complication with options which is how to handle the
default options that get set by the .onLoad
hooks. In order for everything
to work well, we need to fully unload a namespace so that next time
library
is called with the same package, the .onLoad
hooks are kicked off.
Unfortunately, some packages (cough, data.table) cannot be unloaded (and more
generally R docs warn full unloads are not really well supported).
We can't just run .onLoad
as in a normal .onLoad
the package environment
is unlocked and .onLoad
hook scripts expect to be able to modify bindings in
the namespace.
So, what is the most reasonable compromise for this? If some namespaces are not unloaded, then we have to make sure the options set by those namespaces are not undone at any point. So suppose we start with a blank slate, then:
library(data.table)
We can prevent the unloading of the data.table
namespace easily enough, but
we cannot undo any options without risking undoing data.table
options. In
fact given that the data.table
namespace may have been loaded even before
we kick off unitizer
, we can't even see what options were added by
data.table
.
It seems that if one of the no-unload namespaces is loaded at any time then we must set the options tracking to 0 if. Actually, there are three types of situations:
For 1. and 2., we can run in modes 0 and 1. For 3. we can run in 0:2, but since we don't unload the namespace that only works the first time since the second time we run the namespace will already be loaded... Not super useful. For 4. we can only run in mode 0.
Hmm, even for 2. running in mode 1 we might have a problem since we would still be trying to undo the pre-loads, which would undo the options, but not the no-unload namespaces and dump us back to the normal R prompt in that state.
Let's say we run into one of these namespaces in an incompatible mode; what do we do?
Another possible work-around is to force a namespace load even before the initial state recording and then run in mode 1 so that we don't undo that namespace load.
Gah... and WTF do we do about all the potentially unloadable namespaces that we can no longer unload because an unloadable namespace imports them? This greatly complicates the problem.
More and more it seems like we need to narrowly define the acceptable modes, which seem to be 0, or 1 iff the namespace was already loaded by the time we hit pre-loads.
So do we have an additional "unitizer.load.namespace.if.not.loaded" list? And does that make "unitizer.keep.namespace" redundant or do we still need both? Probably still need both. So need to check that namespace was already pre-loaded against keep.namespace list.
unitizer
and make sure the results make sense (can we
do this without crapping al)description
for unitizer_sectionAdd the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.