[GitHub issues]: https://github.com/biocro/biocro/issues "Doxygen documentation" {target="_blank"}
"Code dependencies are the devil" {target="_blank"} "Our Software Dependency Problem" {target="_blank"} "Standard Library Compatibility" {target="_blank"}
[Boost FAQ]: https://www.boost.org/users/faq.html "The Boost FAQ" {target="_blank"} [CRAN policies]: https://cran.r-project.org/web/packages/policies.html "CRAN Repository Policy" {target="_blank"} [renv]: https://rstudio.github.io/renv/ "The renv package" {target="_blank"} "Docker and Reproducibility" {target="_blank"} "Reproducible analysis and Research Transparency" {target="_blank"} "Advanced R (1e) style guide" {target="_blank"}
[Solar Position module]: r params$biocro_root
/src/module_library/solar_position_michalsky.h "Source code for the Solar Position module" {target="_blank"}
[Solar Position module rendering]: https://biocro.github.io/BioCro-documentation/latest/doxygen/doxygen_docs_modules_public_members_only/classstandard_b_m_l_1_1solarpositionmichalsky.html#details "Documentation of the Solar Position module" {target="_blank"}
[dimensions]: https://en.wikipedia.org/wiki/International_System_of_Quantities#Dimensional_expression_of_derived_quantities "Dimensional expression of derived quantities (Wikipedia)" {target="blank"}
[doxygen manual]: https://www.doxygen.nl/manual/docblocks.html "Doxygen manual" {target="_blank"}
[physical quantity]: https://en.wikipedia.org/wiki/Physical_quantity "Wikipedia's entry on Physical Quantities" {target="_blank"}
[coherent units]: https://en.wikipedia.org/wiki/Coherence%28units_of_measurement%29 "Wikipedia's entry on Coherent Units" {target="_blank"}
[src/parameters.h]: r params$biocro_root
/src/parameters.h "The BioCro parameters.h file" {target="_blank"}
[unit tests for modules]: https://biocro.github.io/BioCro-documentation/latest/pkgdown/articles/an_introduction_to_biocro.pdf "Intro to BioCro vignette" {target="_blank"}
[C++ Core guidelines]: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines "C++ Core Guidelines" {target="_blank"}
[aspects of coding and design]: https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#S-naming "C++ Core Guidelines' naming and layout suggestions" {target="_blank"}
[Google C++ style guide]: https://google.github.io/styleguide/cppguide.html#Formatting "Google's guidelines for formatting C++ code" {target="_blank"}
[Strategies to Speedup R Code]: https://datascienceplus.com/strategies-to-speedup-r-code/ "datascience+ article on speeding up R code" {target="_blank"}
[Why loops are slow in R]: https://privefl.github.io/blog/why-loops-are-slow-in-r/ "Florian Privé's article on the pitfalls of R loops" {target="_blank"}
[Lochocki et al. (2022)]: https://doi.org/10.1093/insilicoplants/diac003 "Lochocki et al. (2022)" {target="_blank"}
[git-flow]: https://nvie.com/posts/a-successful-git-branching-model/ "git-flow" {target="_blank"}
[^public-API]: As of this writing, no C++ API for BioCro has been defined: there is no document that makes clear what publicly accessible portions of the framework and standard library are guaranteed to remain stable and available to be programmed against and which portions are subject to change.
[^reproducibility]: Irrespective of what dependencies BioCro now has or are added to it, a researcher who is concerned about reproducability should consider making a containerized version of BioCro. See, for example, the chapter Docker and Reproducibility in the document for the workshop Reproducible analysis and Research Transparency. The BioCro maintainers don't provide containerized versions of BioCro as we think this is a task better left to the individual researcher.
[^biocro_dependencies]: BioCro's strong dependencies are the R framework, the C++ compiler used in installing the BioCro package, the C++ Standard Library, the and the Boost C++ Library.
As for R package dependencies, the BioCro R package depends only
upon packages in the R standard library (stats and utils) for its basic installation and functioning. BioCro does use other, non-standard R packages for building the documentation and for testing, but these are not essential to a fully-functioning BioCro installation.
For further reading on the benefits and pitfalls of using
dependencies, see, for example, Russ Cox's article Our Software Dependency Problem and Bill Sourour's article Code dependencies are the devil.
[^compressing_vignettes]: If a package contains large vignette files,
R CMD check
may produce an error about overly large documentation. In this
case, it can be helpful to specify --compact-vignettes=both
when calling
R CMD build
. Previously this was an important issue for BioCro, but it has
been mitigated by designating most vignettes as "web only."
By making changes without discussing it with the group, you risk spending time working on a solution that others may not accept. The members of the group also have diverse backgrounds and likely can give valuable design insights.
In general, BioCro development follows the [git-flow][] branching model, where
there are two permanent branches (main
and develop
) and three types of
temporary branches (hotfixes, features, and releases).
For most contributors, it is only necessary to know that most changes should
be accomplished through feature branches, which are branched off develop
and
merged back into develop
via a pull request that must be approved by one or
more other developers. The remainder of this section discusses our system in
more detail.
Beyond the basic description of git-flow, we have a few additional rules and clarifications specific to BioCro development:
Any merge into main
or develop
must be done via a pull request. This
requirement is enforced with GitHub branch protection rules and cannot be
bypassed.
All pull requests into main
and develop
require approval before merging.
This requirement is not enforced using GitHub branch protection rules, but
please do not unilaterally merge a pull request without any form of approval
from another developer. (There are two exceptions to this rule related to
hotfix and release branches, described below.)
The BioCro R package and repository follow semantic versioning of the form
major.minor.patch
. Hotfix branches should typically only increment the
third (patch
) number. Most release branches will increment the second
(minor
) number, and very rarely the first (major
) number. Feature
branches do not directly change the version number.
Hotfix and release branch names should be formatted like type-version
,
where type
is either hotfix
or release
, and version is the new version
number. For example, hotfix-v3.0.3
would be a hotfix branch that
changes the version number to 3.0.3
. Feature branch names should reflect
their purpose, and can be anything other than master
, develop
,
hotfix-*
, or release-*
.
Whenever the package version changes, a description of the related changes
should be added to the changelog in NEWS.md
as a new section titled with
the new version number. Such version changes will happen via release and
hotfix branches. Release branches will generally include changes made during
the course of several feature branches, so they may require a significant
amount of documentation in NEWS.md
. To make it easier to prepare a list of
these changes, each feature branch should include a description of its own
changes in an UNRELEASED
section of NEWS.md
. For more information about
updating the changelog, please see the comment at the top of NEWS.md
.
The following is a short description of BioCro's implementation of the git-flow branching model:
The main
branch always contains the latest stable release of the BioCro
GitHub repository. In fact, a new tag and release is made any time changes
are merged into main
, and such changes should always be accompanied by an
increment to the BioCro R package version number.
The develop
branch contains bug fixes and new features that have been
completed but may not have been released yet.
Hotfix branches are for urgent changes that warrant immediate incorporation
into main
; typically these are bug fixes. A hotfix branch should be
branched off main
. When ready, it should first merged into main
via an
approved pull request. Then, a second pull request should be made to merge
into develop
. If there are no merge conflicts or test failures, this
second request can be merged without any additional approval. When both
merges are complete, the hotfix branch should be deleted.
Feature branches are for less urgent changes that do not require immediate
release; for example, the addition of a new R function to the package
namespace. A feature branch should be branched off develop
and merged
back into develop
via an approved pull request. Before merging, NEWS.md
should be updated with a description of the changes under the # UNRELEASED
section (see NEWS.md
for more details). When the merge into develop
is
complete, the feature branch should be deleted.
When sufficiently many changes have accumulated in develop
to justify a
new version of the package, a release branch is used to move the changes
from develop
to main
. Release branches should not include substantive
changes; rather, a release branch is primarily used to increment the package
version and to ensure an up-to-date changelog in NEWS.md
; see the
"New Release" chapter for more details. A release branch
should be branched off develop
. When it's ready, it should first merged
into develop
via an approved pull request. Then, a second pull request
should be made to merge into main
. (If the first PR target is main
, it
will be difficult to discern the minor changes in the release branch from
the other changes in develop
.) If there are no merge conflicts or test
failures, this second request can be merged without any additional approval.
When both merges are complete, the release branch should be deleted.
We have several conventions when making and reviewing pull requests that are designed to make the process as efficient as possible, and to minimize any issues introduced by new code. These have been used in many PRs and have been found to be helpful.
Single ownership of each PR: If multiple people are making changes to the same code, there is a potential for conflicting changes to occur. While git provides a way to resolve such conflicts, it is better to avoid causing them in the first place. A simple way to prevent most of these conflicts is to treat each branch and PR as having a single "owner" who is in charge of making all commits. There are a few consequences of this approach:
Pull request reviewers should suggest changes, not commit changes directly to the PR branch. If a reviewers wishes to contribute specific code changes, it is better to make them in a new branch and create a secondary PR to merge the changes into the original PR.
Pull request owners should either implement the suggested changes or give a short explanation for why they have chosen not to.
Once one or more reviewers have indicated their approval, it is the responsibility of the pull request owner to finalize the PR by merging it and deleting the branch.
Submitted code should work: It takes time and effort to produce a
thoughtful review of a pull request, so requests to review a PR should
generally only be made for working code. In a BioCro context, "working" code
means that R CMD check
does not produce any errors and that all the
vignettes can be built. See the "Running R CMD check" chapter
for more information. While this does not guarantee that the new code achieves
its goals, is fully documented, or follows the coding style guidelines, it is
nevertheless a minimum requirement for any code to be incorporated into
BioCro. The following is a thorough workflow that should ensure submitted code
is working:
Run the regression tests locally. The tests don't take much time to run, so this is a good way to get fast feedback.
Instructions for running the regression tests locally can be found in the "Running the testthat Tests" chapter.
When all the regression tests are passing, run R CMD check
locally. This
takes longer than just running the regression tests, but it will also make
sure that all code in the examples (and some of the vignettes) is able to
run, which sometimes turns up more issues.
Instructions for running R CMD check
locally can be found in
"Running R CMD check locally" section.
Note that R CMD check
will stop checking the examples once it encounters
an error, so its list of problematic examples is not exhaustive. If there
is an issue with an example, fix it and rerun R CMD check
again because
there may be additional errors in the examples.
When R CMD check
is passing locally, build all the vignettes. Because
of space limits imposed by CRAN, some of our vignettes are designated as
"web only." Such vignettes are stored in the vignettes/web_only
directoty
and are not checked by R CMD check
. They can be built manually on a
case-by-case basis from within R by using tools::build_vignette
, or all
at once by running pkgdown::build_site()
in an R session whose working
directory is set to the root directory of the BioCro repository. See the
"Building package vignettes" section for more
information.
When R CMD check
is passing locally and the vignettes can all be built
locally, make a PR, but don't assign any reviewers. Making the PR will
cause the workflow tests to run, which includes running R CMD check
on
several operating systems and versions of R, and building all the vignettes
as part of the package documentation. This step takes the most time, but is
important because sometimes an issue will appear that didn't occur locally
on your own operating system or version of R.
When the online checks have passed, then assign reviewers to begin the review process.
Doing this will ensure that there are no basic issues when reviewers begin to look at the code. It may seem like a hassle, but it saves time in the long run. Even when the exact workflow above is not followed, the online tests can still be extremely helpful to a PR owner or reviewer:
It is possible to see whether the checks have passed by looking at the list of current PRs, where a red X next to a PR name indicates failure and a green check mark indicates success.
When the tests have failed, more details can be found in the "Checks" tab of the PR web site. This will include the exact error messages, which are the starting point for troubleshooting.
Of course, the conventions above are not hard rules, and there may be situations where it is acceptable, or even necessary, to deviate from them. The following is a (non-exhaustive) list of situations where these conventions could be modified:
A PR owner may give clear permission to another developer to make direct code changes. This may be in response to a reviewer asking to make changes.
Sometimes the online tests will fail due to an issue with the testing setup
itself. In that case it might not be necessary for them to pass. However, in
this scenario, it is important for at least one developer to ensure that
R CMD check
passes locally.
Although it is ideal to only submit working code, sometimes this is not
possible for a variety of reasons. If you are working on new code and wish to
discuss it but there are R CMD check
errors, this can be accomplished via a
PR, a draft PR, a GitHub issue, or a GitHub discussion. The best approach may
vary with the particular situation. Regardless of which approach is taken, it
is essential to acknowledge the R CMD check
issue and be clear about the
type of feedback or discussion you are requesting.
Sometimes a working PR may begin failing R CMD check
after the author has
made code changes in response to reviewer feedback. This is normal; the new
problems and their solutions will simply become part of the PR discussion.
From time to time, someone will propose making a large change to the organizational structure of BioCro or to one of its central components. Here we also consider any modification that influences the way users or developers interact with BioCro to be "large." Large changes must be carefully considered and discussed before they are implemented. When considering such proposals, a number of key points should be kept in mind:
A friendly user experience makes BioCro accessible to a wide range of users who each have the potential to contribute to broader scientific understanding through modeling. Thus, it is important to minimize any barriers that may prevent some scientists from using it.
BioCro is developed and maintained by a small team, most of whom have numerous other responsibilities. Thus, it is important to carefully consider the implementation and maintenance costs that may be associated with any proposed change, carefully weighing these against the perceived benefits.
Changes to BioCro R packages should be consistent with [CRAN policies].
Distributing BioCro via CRAN is a key part of keeping it accessible to all users with maximum ease.
This has two implications:
The core C++ library should maintain a consistent interface. Any changes to the public API should be carefully considered and carefully delineated.[^public-API]
It should be and should remain relatively painless to obtain and use this library without having to have a full R installation.
With BioCro, a premium is put on keeping it relatively self-contained; needless dependencies should be avoided.
There are a number of reasons for this:
Limiting dependencies will make it less likely BioCro will break for reasons beyond our control.
Limiting dependencies reduces security risks. (On this point see, for example, Russ Cox's discussion of the 2017 Equifax fiasco in his article Our Software Dependency Problem.)
Limiting dependencies makes reproducibilty easier.
If one wishes to replicate results of a BioCro simulation, the task is more difficult if one has to worry not only about what platform, what version of R, and what version of BioCro were used to obtain the original results, but if, on top of that, one has to worry about the versions of other R packages depended upon.[^reproducibility]
Limiting dependencies means that fewer steps are required to install BioCro.
BioCro has few dependencies, and all things being equal, we would like to keep it that way.[^biocro_dependencies]
If you propose a large modification to BioCro, please be prepared to discuss the following questions:
How will time costs change for maintainers, developers, and users?
Will there be more or fewer opportunities for BioCro to break due to changes in its dependencies?
Which BioCro features will be added or lost?
When releasing a new version of BioCro, it is essential to make sure that
NEWS.md
is up to date and that the new version is acceptable to CRAN. This can
be achieved through the following steps:
Choose the next version number; our conventions for semantic versioning are
described in NEWS.md
.
Make a new release branch whose name is formatted as release-vX.Y.X
, where
X.Y.Z
is the new version number, as described in the
"Branching Structure" chapter.
Update the DESCRIPTION
file with the new release date (set to today's
date), the new version number, and any necessary changes to the author list.
Change the UNRELEASED
header in NEWS.md
to the new version number, and
check the contents of this section to ensure it is clear and complete. It may
be helpful to look through the list of completed pull requests on GitHub to
check for any important changes that may have been missed.
Run R CMD check
to see any NOTEs that are reported; see the
"Running R CMD check chapter" for instructions. Check the
contents of cran-comments.md
to make sure it accurately reflects the
R CMD check
notes.
Make a pull request, requesting to merge the new branch into develop
. In
general, follow the guidelines in the
"Pull Request Etiquette" chapter. However, there is one
extra requirement for a release branch: the branch should not be merged until
the new version has been accepted by CRAN. Sometimes CRAN may have issues
with a new release, and it is better to address them before finalizing the
release. (Otherwise, there may be several releases in quick succession with
only minor or trivial changes between them; for example, BioCro versions
3.1.1 and 3.1.2.) See the "Submitting to CRAN" chapter
for instructions.
When the new version is on CRAN, merge the release branch into develop
and
then main
.
The package maintainer is responsible for submitting to CRAN, and the process consists of the following steps:
Build the package with R CMD build
[^compressing_vignettes] using the
current release version.
Test the resulting .tar.gz
file using R CMD check --as-cran
using the
current release version and the current development version on Linux and
Windows. GitHub actions run this for R current Windows and Linux, and R
devel Linux, so only R devel Windows needs to be checked locally.
Attach the resulting .tar.gz
file to the form here:
https://cran.r-project.org/submit.html.
Paste the contents of cran-comments.md
in the form's comments box.
Submit, cross fingers, and wait. If any issues are found by CRAN, address
them and try again. If the checks fail only from permissible NOTES, such as
using C++11, reply to the email indicating the justification, for example,
we use a library that uses C++11. You can restate what is in
cran-comments.md
.
BioCro's online testing system should catch most issues before reaching this
point, but sometimes CRAN starts enforcing rules that are not clearly explained
anywhere or not checked by R CMD check
. The most up-to-date description of
CRAN's requirements can be obtained from the following official and
semi-official sources, which each offer a different perspective on CRAN
submission:
https://cran.r-project.org/web/packages/policies.html
https://r-pkgs.org/release.html#release
https://github.com/DavisVaughan/extrachecks
(Most of what is discussed here pertains specifically to code for the BioCro C++ library.)
Include citations to sources for equations and parameters used in the code. The citation should be sufficient to locate the article and relevant information within it. Include a table or figure reference if appropriate.
Use [Doxygen][doxygen manual]-style comment syntax for high-level
documentation of functions and classes. We use the Javadoc style of
comment block in our code. (See the documentation of the Solar
Position module [solar_position_michalsky
][Solar Position module]
for an example of a Doxygen-style comment, and then look at [the way
this is rendered in the generated documentation][Solar Position
module rendering].)
Include reasoning and justification for the model, including assumptions that determine when use of the model is appropriate. These descriptions should be succinct.
double yield = 10 // Mg /
ha
should be read as meaning "the yield was 10 Mg / ha". Using
[dimensions][] instead of units is acceptable if the code is written
with the expectation that [coherent units][] are used.The following example shows how to indicate units in a number of
different contexts. Note that, as in LaTeX, ^
is used to indicate
a superscript, so that m^2
indicates square meters.
```c++ // In function signatures double ball_berry(double assimilation, // mol / m^2 / s double atmospheric_co2_concentration, // mol / mol double atmospheric_relative_humidity, // Pa / Pa double beta0, // mol / m^2 / s double beta1) // dimensionless from [mol / m^2 / s] / [mol / m^2 / s] // In assignments double leaf_temperature = air_temperature - delta_t; // K. // In return statements return assimilation_rate; // micromoles / m^2 / s. // In tables const std::map<SoilType, soilText_str> soil_parameters = { // d = dimensionless // d d d J / kg d J s / m^3 d d d Mg / m^3 // silt clay sand air_entry b Ks satur fieldc wiltp bulk_density { SoilType::sand, { 0.05, 0.03, 0.92, -0.7, 1.7, 5.8e-3, 0.87, 0.09, 0.03, 1.60 } }, { SoilType::loamy_sand, { 0.12, 0.07, 0.81, -0.9, 2.1, 1.7e-3, 0.72, 0.13, 0.06, 1.55 } }, }; ```
If you would like to include other details, include the units in the same way, and include details following the units so that the variables are still read like regular text. For example, write
c++
✓ return gswmol * 1000; // mmol / m^2 / s. Convert from mol to mmol.
not
c++
✗ return gswmol * 1000; // Converting from mol / m^2 / s to mmol / m^2 / s.
Note that in a case such as this, the units apply to the entire
value (gswmol * 1000
) and not merely to the variable (gswmol
).
Use SI conventions for units and dimensions, including
capitalization. Specifically, use "degrees C
", not "C
", to
indicate °C.
Use full names when symbols are not available:
micromoles / m^2
, not umol / m^2
degrees C
, not *C
.
Use dimensionless
for dimensionless quantities, and include how
the dimensions have canceled if that is informative.
Use ^
to indicate exponentiation: m^2
, not m2
.
Prefer an asterisk to indicate multiplication; but indicating
multiplication by juxtaposing units with exactly one space between
them is acceptable. Prefer exactly one space on each side of the
asterisk: kg * m / s
or kg m / s
.
Either a solidus ("/") or negative exponent is acceptable to indicate division, but ensure that the solidus is used correctly if used multiple times. Prefer exactly one space on each side of the solidus.
When adding models that require new parameters, document the parameters in the parameter table in [src/parameters.h][]. Please keep the table well formatted.
If you are working on a model with undocumented parameters, it would be nice if you added them to the table as you work through the issue.
Do not use C-style arrays. Use an appropriate data type from the standard library instead.
Use cmath, not math.h, for common mathematical functions.
Be careful with using-directives (e.g. using namespace std
) in
a global scope; do not use them in global scope in a header file.
Try to make using-declarations (e.g. using std::string
) as
local as possible. Type aliases (e.g. using string_vector =
std::vector<std::string>
) are perfectly acceptable in the global
scope of a header file.
Strongly prefer the [coherent][coherent units] set of SI units. Doing so reduces code complexity remarkably as no conversions are necessary. Yes, no one publishes values with these units, but do the conversion in one place, the manuscript, instead of dozens of times in the code, constantly having to look up units for variables, and then spending hours debugging silly, difficult-to-find errors. The coherent set of SI units consists of all the units without prefixes, except that kg is the coherent unit of mass, not g.
Do not copy and paste code, changing only small parts. Choose a design that eliminates the duplication.
Make an effort to write unit tests. (See the vignette [An Introduction to BioCro for Those Who Want to Add Models][unit tests for modules] for information about writing unit tests—specifically, writing unit tests for new modules.)
Do not mix sweeping formatting changes with behavior changes. Large formatting changes should be a separate commit, containing only formatting changes, and the commit comment should indicate that only formatting was changed. This way, code that changes program behavior won't be obscured by a mass of changes to the formatting of the code.
The Standard C++ Foundation's [C++ Core Guidelines][] have useful advice about [aspects of coding and design][].
(Again, except in a few instances, this pertains specifically to C++ code.)
The most important aspect of formatting is that the code is easy to understand. Below are unenforced preferences.
Prefer underscores_in_identifiers
not CamelCaseInIdentifiers
and, in R, not dots.in.identifiers
. Prefer lowercase-only
identifiers. An exception may be made for commonly-recognized names
used in a small scope, for example,
c++
I = V / R;
F = m * a;
E = m * (c * c);
Avoid unnecessary parentheses. For example, use "a * b / c
"
instead of (a * b) / c
"a * (b /
c)
"(a / b) * c
instead of (the equivalent) a / b * c
is acceptable.
Parentheses may also be used to group portions of a formula that are
commonly considered as a sub-unit, where they provide some semantic
value (see the previous bullet point). Consider naming parts of a
complicated expression in order to break it down into simpler ones.
For example,
c++
x = (-b + sqrt(b * b - 4 * a * c)) / (2 * a);
may be rewritten in three lines as
c++
num = -b + sqrt(b * b - 4 * a * c);
denom = 2 * a;
x = num / denom;
Note that in C++, unlike in R, return statements do not require parentheses around the returned expression.
Restrict the line length of paragraph-like comments to 80 characters, excepting a compelling reason to do otherwise. Lines in sections that are not paragraph-like could be somewhat longer if it facilitates presenting material in a more readable format. In the following snippet from the module library documentation, for example, we have allowed slighly-longer lines in order to be able to maintain one line per interval:
c++
/*
* However, this definition is flexible. For example, for our soybean model
* (soybean_development_rate_calculator.h) we define the intervals as follows:
* -1 <= DVI < 0 : Sowing to Emergence
* 0 <= DVI < 1 : Emergence to R1 (Flowering) is broken into three stages.
* 0 <= DVI < 0.333 : Emergence to V0 (Cotyledon stage)
* 0.333 <= DVI < 0.667 : V0 (Cotyledon stage) to R0 (End of Floral Induction)
* 0.667 <= DVI < 1 : R0 (End of Floral Induction) to R1 (Flowering)
* 1 <= DVI < 2 : R1 (Flowering) to R7 (Maturity)
*/
As for the code lines themselves, we point to the following advice from the Linux kernel project:1
The preferred limit on the length of a single line is 80 columns.
Statements longer than 80 columns should be broken into sensible chunks, unless exceeding 80 columns significantly increases readability and does not hide information.
Do not include trailing whitespace, i.e., whitespace characters preceding newline characters.
Each file should end with a newline character (i.e. a terminal endline).
Use spaces rather than tab characters.
In general, formatting preferences should follow something similar to the [Google C++ style guide][], except in cases where the code has been formatted in a more readable way, such as when aligning parts in a table.
The Standard C++ Foundation guidelines offer some advice about [formatting conventions][aspects of coding and design] that are informative, particularly regarding the use of code comments.
For tools to help with formatting code, see the "Formatting Tools" section.
Prefer to use the double-bracket operator (list[['element']]
) rather than
the dollar-sign operator (list$element
) when accessing the elements of a
list. The $
operator uses partial matching, whereas [[
, by default, does
not. (However, it can be specified: list[['element', exact = FALSE]]
.)
Avoiding partial matching by using [[
gives us more confidence that errors
won't occur.
While there is no inherent performance difference between a for
loop and an apply-type function such as apply
or lapply
(the
apply functions actually use for
loops in their source code), it
is nevertheless possible to write a "bad" for
loop that runs
slowly. Common culprits include a failure to pre-allocate memory or
a poor choice in assignment method. If a for
loop seems to run
slowly, consider replacing it with an apply-type function or
tweaking the assignment method (e.g. replacing append
with
[
). Many guides for optimizing loop performance are available
online, such as [Strategies to Speedup R Code][] and [Why loops are
slow in R][].
Generally our R code follows the Advanced R (1e) Style Guide with the exception of indentation, where we use 4 spaces rather than 2.
Put a space after control statements (such as if
or for
), but do not
put a space after other function names (such as function
, return
, or
sqrt
). For example:
```R myfun <- function(x) { y <- if (x == 1) { 2 } else { sqrt(x) }
return(y)
} ```
(Yes, in R, function
is a function.)
1 The Linux kernel project recently changed the default length for code lines from 80 to 100 characters with the following commit comment:
Yes, staying withing 80 columns is certainly still preferred. But it's not the hard limit that the checkpatch warnings imply, and other concerns can most certainly dominate.
Increase the default limit to 100 characters. Not because 100 characters is some hard limit either, but that's certainly a "what are you doing" kind of value and less likely to be about the occasional slightly longer lines. ↩
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.