Automatic Documentation for (and confection of) R packages


Description

The development of this package was driven by the idea that it is bad practice to separate code and documentation. Code and documentation should live in the same place so that both can easily be modified together. We have followed this principle since the mid-1990s using a Perl script (a very convenient language for this kind of task). We now share it with the community in the more convenient form of an R package; recent developments in R make this quite easy.
To use documair, the code defining each object (either a function or a variable) must be enclosed within a series of tagged comments; we propose default tags, but their values can be adapted as you wish (by modifying the documair0$tag$v list). From these tagged comments, documair automatically writes the Rd files and gathers them with a few other files to produce the complete package, up to the tar.gz archive. For some specific objects, the user can write the Rd file by hand. All the necessary files must be gathered in a single directory, in which the user must place the following mandatory files (in what follows, pkg designates the name of the package):

  1. A pkg.DESCRIPTION file: the standard text DESCRIPTION file to be associated with the package. The NAMESPACE file is created automatically from the exported objects and the presence of C and Fortran files.

  2. A pkg.package.Rd file: a text file giving, in Rd syntax, the general description of the package to appear in the documentation. documair may slightly supplement this file with some additional information.

  3. Any number of foo.code.r files containing the documented code of the objects. Each file can include more than one object. The code.r extension can be changed in the pkg.which.txt file.
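As a minimal sketch of the expected layout, the hypothetical helper check_documair_dir() below (not part of documair) checks, in base R, that a source directory contains the two mandatory pkg files and at least one code file with the default code.r extension. The package name mypkg and the file utils.code.r are made up for the illustration.

```r
# Hypothetical helper, not part of documair: checks that 'dir' holds the
# mandatory files for package 'pkg' (pkg.DESCRIPTION, pkg.package.Rd and
# at least one code file ending in '.<code_ext>').
check_documair_dir <- function(dir, pkg, code_ext = "code.r") {
  files <- list.files(dir)
  required <- paste0(pkg, c(".DESCRIPTION", ".package.Rd"))
  missing <- setdiff(required, files)
  code_files <- files[endsWith(files, paste0(".", code_ext))]
  list(ok = length(missing) == 0 && length(code_files) > 0,
       missing = missing,
       code_files = code_files)
}

# Usage on a throw-away directory
src <- file.path(tempdir(), "mypkg_src")
dir.create(src, showWarnings = FALSE)
invisible(file.create(file.path(src, c("mypkg.DESCRIPTION",
                                       "mypkg.package.Rd",
                                       "utils.code.r"))))
res <- check_documair_dir(src, "mypkg")
res$ok          # TRUE
res$code_files  # "utils.code.r"
```

A real documair directory may also contain test, data, Rd, C and Fortran files; this sketch deliberately ignores them.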

Additional optional files can also be included:

  • foo.test.r files containing scripts to test the functions. The test.r extension can be changed in the pkg.which.txt file.

  • pkg.foo.rda files containing data sets. These binary files must be loadable with the load function, and the associated pkg.foo.Rd documentation files must be provided.

  • object(s).Rd files the user wants to write by hand. For exported objects (and only for them), documair uses these files instead of the ones generated from the tagged code. By default, all objects are exported, but some can be declared hidden in the pkg.which.txt file (see below).

  • C functions must be stored in individual files named after the function, with the extension .c; likewise for Fortran functions, with the extension .f.

  • An optional pkg.which.txt file allows the user to override the standard use of files and functions, so that different versions of the package can be prepared from the same set of code files. It makes it possible to hide or expose functions in the package as well as to define sets of aliased objects. For details, see the example below.

  • Every additional file the author wants to share with the other users of the package.

The complete content of this directory will be copied into the inst directory of the package tree. Once this directory is prepared, the user successively calls the documair functions prepare8pkg and compile8pkg.

The name documair stands for 'documentation for R', 'air' being pronounced like 'R' in French.

Other R packages for generating Rd documentation files and building packages from comments inserted in R code are available. Two outstanding examples are roxygen and inlinedocs. The former is based on header comments, uses powerful parsers and proposes interesting analyses such as the call tree of the set of functions. The latter is very light, using simple tagging in logical places of the code. documair lies between these two: its tags are also placed within the code, but they are more numerous and varied, which gives more possibilities than inlinedocs.

Documenting an object

Here is an example of a simple masked (that is, hidden) function of documair.

#<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
r.bd <- function(n1,n2)
#TITLE sequence of increasing numbers
#DESCRIPTION
# This function returns {n1:n2} when {n1<=n2} and
# {numeric(0)} otherwise.
# Quite useful when some insertion must be done within
# a sequence
#DETAILS
#KEYWORDS iteration
#INPUTS
#{n1} <<first integer>>
#{n2} <<second integer>>
#[INPUTS]
#VALUE
# {n1:n2} if {n1<=n2}
# else {numeric(0)}.
#EXAMPLE
# xx <- 1:5;
# for (ii in 1:6) { print(c(xx[bd(1,ii-1)],10,xx[bd(ii,5)]));}
#REFERENCE
#SEE ALSO bf
#CALLING
#COMMENT
#FUTURE
#AUTHOR J.-B. Denis
#CREATED 11_01_12
#REVISED 11_05_21
#--------------------------------------------
{
if (n1 <= n2) {return(n1:n2);}
numeric(0);
}
#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
One can notice the different tags used to structure the information provided in the comment lines:

  • #<<<<<<<...<<<<<<<< to open the object,

  • #TITLE to specify the title of the object,

  • #DESCRIPTION to describe the object,

  • ...,

  • #-------...-------- to close the documentation part

  • ...,

  • #>>>>>>>...>>>>>>>> to close the object.

These five tags are the only compulsory ones (note: the #DESCRIPTION tag can be omitted for the child objects of a set of aliased objects).
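To make the tag structure concrete, here is a minimal R sketch of how such comment lines could be grouped by tag. parse_doc_tags() is hypothetical and is not documair's own parser; it assumes a fixed list of tags matching those of the example above, skips code lines and anything before the first tag, and stops at the #---- line closing the documentation part.

```r
# Hypothetical sketch, not documair's parser: groups the documentation
# comments of one object by tag. Lines before the first tag are skipped,
# and parsing stops at the '#-----' line closing the documentation part.
parse_doc_tags <- function(lines,
                           tags = c("TITLE", "DESCRIPTION", "DETAILS",
                                    "KEYWORDS", "INPUTS", "[INPUTS]",
                                    "VALUE", "EXAMPLE", "REFERENCE",
                                    "SEE ALSO", "CALLING", "COMMENT",
                                    "FUTURE", "AUTHOR", "CREATED",
                                    "REVISED")) {
  out <- list()
  current <- NULL
  for (ln in lines) {
    if (grepl("^#-{5,}$", ln)) break        # end of the documentation part
    if (!startsWith(ln, "#")) next          # code line, e.g. the function header
    body <- substring(ln, 2)                 # drop the leading '#'
    hit <- tags[startsWith(body, tags)]
    if (length(hit) > 0) {
      current <- hit[which.max(nchar(hit))]  # longest matching tag wins
      rest <- trimws(substring(body, nchar(current) + 1))
      out[[current]] <- if (nzchar(rest)) rest else character(0)
    } else if (!is.null(current)) {
      out[[current]] <- c(out[[current]], trimws(body))  # continuation line
    }
  }
  out
}

doc <- c("#<<<<<<<<<<<<<<<<<<<<",
         "r.bd <- function(n1,n2)",
         "#TITLE sequence of increasing numbers",
         "#DESCRIPTION",
         "# This function returns {n1:n2} when {n1<=n2} and",
         "# {numeric(0)} otherwise.",
         "#INPUTS",
         "#{n1} <<first integer>>",
         "#{n2} <<second integer>>",
         "#[INPUTS]",
         "#AUTHOR J.-B. Denis",
         "#--------------------",
         "{")
fields <- parse_doc_tags(doc)
fields$INPUTS  # "{n1} <<first integer>>" "{n2} <<second integer>>"
```

Splitting the {n1} <<...>> argument lines further, or turning the fields into Rd sections, is left out of this sketch.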

Monitoring documair

Introduction

documair can build different packages from the same set of files by simply selecting which files/objects are to be compiled and which ones are to be exposed to or hidden from the end user. This is done through the pkg.which.txt text file. An example of such a file (with comments) is documair.which.txt in the documair package. If you generate different packages from the same functions, it is recommended to change the package name; this is why pkg.which.txt lets you indicate DESCRIPTION and PRESENTATION files other than the standard ones deduced from the package name.

Syntax of pkg.which.txt

documair was used to document itself. Below, as an example, is a possible documair.which.txt file. The comments in the file should be sufficient to understand the different possibilities:

#
# created on 14_01_28
# last modified on 14_05_27
#
# + the order of the items is irrelevant
# but they are exploited in that order
# + according to an option, the existence
# of the specified files and objects is
# checked or not.
# + '_ALL_' means all occurrences
#
# specifying the description file
<<DESCRIPTION>> documair.DESCRIPTION
#
# the Rd file to describe the package
<<PRESENTATION>> documair.package.Rd
#
# specifying the extension for the code files
<<C.EXTE>> code.r
#
# specifying the extension for the test files
<<T.EXTE>> test.r
#
# specifying hidden code files
# (in the example, all objects are hidden)
<<HIDDEN.F>> _ALL_
#
# specifying exported code files and
# modifying previous prescriptions
<<EXPORTED.F>> user.code.r
#
# specifying exported code files containing alias sets
<<ALIASED.F>> exterieur.code.r
#
# specifying hidden objects
# (in the example no specific object should be hidden
# so the item here is commented out with a '#'.)
#<<HIDDEN.O>>
#
# specifying exported objects
# (in the example an object belonging to a hidden file
# (documair0) is exported)
<<EXPORTED.O>> documair0
#
# specifying objects for which
# the content will be displayed on the
# screen during the process.
# (the same can be done at the level of the
# files with '<<DISPLAY.F>>'.)
<<DISPLAY.O>> documair0 display8tags
#
# specifying keywords from the components
# (here 'compile' must be interpreted as
# 'compilation', 'pkg' as 'package',...
# be aware that the two words must be joined
# with '=' and that a word must not contain
# any blank. Also, the special word '_NO_'
# means that this component must not appear
# as a keyword.)
<<KEYWORDS>>/=/U
compile=compilation
pkg=package
prepare=preparation
documair=_NO_
#
# end of the documair.which.txt file
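The line-oriented structure of this file can be read with a few lines of base R. parse_which() below is a hypothetical sketch, not documair's own reader: it assumes that each item starts with a <<TAG>> at the beginning of a line, that lines starting with # are comments, and that untagged lines (such as the keyword pairs under <<KEYWORDS>>) continue the previous item.

```r
# Hypothetical sketch, not documair's reader: turns the lines of a
# pkg.which.txt file into a named list with one element per <<TAG>> item.
parse_which <- function(lines) {
  items <- list()
  current <- NULL
  for (ln in trimws(lines)) {
    if (!nzchar(ln) || startsWith(ln, "#")) next     # blank or comment line
    m <- regmatches(ln, regexec("^<<([[:upper:].]+)>>(.*)$", ln))[[1]]
    if (length(m) == 3) {
      current <- m[2]                                # e.g. "HIDDEN.F"
      # values after the tag, split on blanks (for <<KEYWORDS>>/=/U this
      # keeps "/=/U" as the first value)
      items[[current]] <- strsplit(trimws(m[3]), "[[:space:]]+")[[1]]
    } else if (!is.null(current)) {
      items[[current]] <- c(items[[current]], ln)    # continuation line
    }
  }
  items
}

which_lines <- c("<<HIDDEN.F>> _ALL_", "<<EXPORTED.O>> documair0")
parse_which(which_lines)$EXPORTED.O   # "documair0"
```

In practice the lines would come from readLines() on the actual pkg.which.txt file; the order of the resulting list follows the file, in line with the remark that items are exploited in order.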

Aliasing

documair accepts aliasing of sets of objects, but some rules must be followed:

  1. Objects sharing the same alias must be placed in a single file, and only those objects should be present in this file.

  2. The parent object must come first in this file.

  3. The first alias name of the parent object must be the common alias.

  4. The name of the file must be declared as containing aliased objects in the (in that case mandatory) pkg.which.txt file after the tag <<ALIASED.F>>.

The alias Rd file can either be provided by the user or composed by documair from the documenting tags. When the Rd file is hand-written, it must be named after the first object (the parent alias). When the documentation is built by documair, the description of an argument shared by several objects is taken from the first object that uses it, following the order within the file.
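The rule for shared arguments (the first object using an argument wins, in file order) amounts to a simple merge. In this hypothetical sketch, the argument descriptions of each object are represented as a named character vector; merge_arg_docs() and this representation are assumptions made for illustration, not documair internals.

```r
# Hypothetical sketch: 'objs' holds the objects of one alias file in file
# order (parent first); each element is a named character vector mapping
# argument names to their descriptions. The first description met wins.
merge_arg_docs <- function(objs) {
  Reduce(function(acc, x) c(acc, x[setdiff(names(x), names(acc))]),
         objs, character(0))
}

parent <- c(x = "the data", n = "sample size")
child  <- c(x = "another description of x", k = "number of groups")
merge_arg_docs(list(parent, child))
# keeps x and n from 'parent' and adds k from 'child'
```

Because the parent comes first in the file, its descriptions always take precedence over those of its children.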

Examples

To get more insight into the flexibility of documair, the reader can look at documair itself, since all the necessary files are gathered in its inst directory. In the same directory, the script make.r prepares four examples of building package variations based on documair objects, the first one being documair itself. The second one (named documair1) builds the package without a pkg.which.txt file, meaning that all objects are exported. The fourth example (documair3) shows how to use C and Fortran functions, which may not work in some configurations. Due to minor inconsistencies, some examples generate warnings.

Errors with documair

Currently, documair is quite sensitive to errors in the input files! Some errors are detected, but the messages are not always very clear; others are not detected at all. For instance, the double # in ##{argument} << explanation...>> when describing the arguments of a function causes an unexplained error. documair can also easily be thrown off by a mismatched parenthesis... To help the user locate the mistake, it is suggested to set the check argument of prepare8pkg to TRUE and to introduce the line '<<DISPLAY.O>> _ALL_' in the pkg.which.txt file used. This way, the interpreted content of each object of the package will be displayed on the screen during the process, with a pause giving an opportunity to check that everything seems consistent. It is strongly advised:

  • to use only standard ASCII characters, even in the comments.

  • to test the preparation of a tar.gz with documair regularly during the development of a package, rather than once at the end, in order to detect the origins of potential issues more easily.

  • to avoid function names containing more than one dot (.): documair treats such a name as an S3 method, but only the last dot is taken into account.

  • to be aware that the opening tag documair0$tags$v$deb$v, which is
    "#<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<",
    must be reproduced exactly, as must the closing tag documair0$tags$v$fin$v, which is
    "#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>".
    Of course, you can modify both as you wish.
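The last-dot rule can be illustrated with two regular expressions. split_s3_name() is a hypothetical helper showing how a name such as print.my.object would be read under that rule; it is not documair's actual code.

```r
# Hypothetical illustration of the "last dot only" rule described above:
# 'print.my.object' is read as generic 'print.my' and class 'object'.
split_s3_name <- function(name) {
  if (!grepl(".", name, fixed = TRUE)) return(NULL)  # no dot: not a method
  c(generic = sub("\\.[^.]*$", "", name),   # everything before the last dot
    class   = sub("^.*\\.", "", name))      # everything after the last dot
}

split_s3_name("print.default")    # generic "print", class "default"
split_s3_name("print.my.object")  # generic "print.my", class "object"
```

The second call shows why names with several dots are discouraged: the intended split may well have been print / my.object.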

Naming conventions

Remembering object names in a given package is not always an easy task, the first reason being their number. To ease this, we propose following some naming conventions; when they are followed, documair proposes keywords deduced from the names.

  • Names are composed of name components separated by digits. For instance, print8object has print and object as components. The number of components can be one, two, or more.

  • When components are nouns, they can be singular or plural, and the difference matters.

  • The separating digits have a meaning:

    • [0] pkg0 (object proposed by the package pkg) /0 ~ Object/

    • [1] res4objA1objB (and; here 'res' obtained from 'objA' and 'objB') /1 = one = an ~ and/

    • [2] a2b (conversion from object a to object b; even if the conversion is not one-to-one, a reverse function b2a is supposed to exist) / 2 ~ to/

    • [3] series3fun (function fun belongs to the family series) /3 ~ \in/

    • [4] trait4object (extract some characteristic from an object) /4 = Four ~ From/

    • [5] <free for the moment>

    • [6] split8text6tag (split a text object [with | by means of] tags) /6 = sIx ~~ wIth/

    • [7] image7path (image path) /upper bar of 7 is similar to a hyphen used to join two nouns or an adjective and a noun./

    • [8] action8object (make an action on an (or several) object(s)) /8 ~ a/

    • [9] empty7object9 (is it an empty object?: question mark) /9 ~~ ?/
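Here is a minimal sketch of how keywords could be deduced from such names, assuming that components are simply the pieces between digits (the meaning of each digit is ignored here) and that a keyword table like the <<KEYWORDS>> item of pkg.which.txt is available. Both functions are hypothetical; documair's own deduction may differ.

```r
# Hypothetical sketch of the naming convention: components are the pieces
# between digits (the meaning of each digit is ignored here).
name_components <- function(name) {
  comps <- strsplit(name, "[0-9]")[[1]]
  comps[nzchar(comps)]             # drop empty pieces, e.g. between two digits
}

# 'table' plays the role of the <<KEYWORDS>> item of pkg.which.txt:
# components found in it are renamed, and those mapped to '_NO_' dropped.
deduce_keywords <- function(name, table = character(0)) {
  comps <- name_components(name)
  out <- comps
  hit <- comps %in% names(table)
  out[hit] <- table[comps[hit]]
  unique(out[out != "_NO_"])
}

kw <- c(compile = "compilation", pkg = "package", documair = "_NO_")
name_components("res4objA1objB")      # "res" "objA" "objB"
deduce_keywords("compile8pkg", kw)    # "compilation" "package"
deduce_keywords("documair0", kw)      # character(0)
```

Note that strsplit() drops the empty piece produced by a trailing digit, so empty7object9 yields just empty and object.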

Projected evolution of documair

  • Make it possible to define operator functions such as %T%.

  • Allow the user to avoid having too many *.code.r files. For that, the code files could first be split into elementary sub-files with a splitting tag like #<<<--->>>. This would be convenient when there are numerous aliased object sets since, currently, each identically aliased set of objects must be in a distinct file.

  • Extract the main steps of the algorithm used in a function by collecting some tag contents within the code (introduced as special titles), and add them either as a special section or as a new paragraph in the details section.

  • Allow the inclusion of files for repeated pieces of code, at least at one level.

  • Allow the introduction of enumerations in the comments.

  • As an option, check and enforce typographical rules, such as an initial capital letter and a final period in argument descriptions..., recording the changes in a separate file so that the user can check them.

  • Improve the way aliased objects are documented, allowing collective fields like titles...

Acknowledgment

The authors want to thank Annie Bouvier and Caroline Bidot for their useful support in solving some technical difficulties we encountered while developing the package.

Additional Information

  • This package was built with /documair/ package (version 0.6-0) on 14_09_15

  • There are 65 objects in total: 11 are exported and the following 54 are masked: analyse8description, code7objects4text6tags, components9, extract8object, make8rd, parse8code, rrrbc, rrrbd, rrrbelong9, rrrbf, rrrdipa, rrrdisplay8k, rrrerreur, rrrexplore8list, rrrfidi9, rrrfile2list, rrrfile2text, rrrfilter8text, rrrform3crop, rrrform3display, rrrform3justify, rrrform3parag, rrrform3repeat, rrrform3title, rrrform3titre, rrrget8comp7list, rrrinterv7belonging, rrrlist2text, rrrlist4text, rrrnow, rrrobject9, rrrparse8text, rrrpause, rrrplaces4text6tags, rrrrbsa0, rrrrbsa7list9, rrrtext2file, rrrtext2list, rrrtext2vma, rrrtext3acceptance, rrrtext3brackets, rrrtext3ij2n, rrrtext3interval, rrrtext3n2ij, rrrtext3places8brackets, rrrtext3places8word, rrrtext3preparation, rrrtext3replace, rrrtext3stext, rrrtext3translate, rrrtexts4text, rrrvma2text, rrrvoid9, write8lyse.

  • There was 1 C file.

  • There was 1 Fortran file.

  • They were provided through 8 code files.

Author(s)

Jean-Baptiste Denis (MIAj - Inra - Jouy-en-Josas),
Régis Pouillot and
Kiên Kiêu (MIAj - Inra - Jouy-en-Josas)