README.md
In anjanadevanand/toyWGN: A Toy Stochastic Weather Generator

Naming: Can contain letters, numbers and periods. It must start with a letter and cannot end with a period. Hadley recommends against using periods to avoid confusion with the names of S3 methods. He also recommends against using a mix of uppercase and lowercase letters - since it makes the package name hard to remember and type.

Directory Structure inside the top-level directory package_name/

R/ - contains code in *.R files

tests/

data/ - contains .rda files that are to be distributed with the package. May contain files in .R or *.csv formats as well.

man/ - contains *.Rd files for documentation of package, functions and data. These files may be written manually or created using roxygen2.

inst/ - can contain any files. Typically contains copyrights and citation files. The contents of this directory will be recursively copied to the top-level directory after installation. So the filenames in this directory should be distinct from the original contents of the top-level directory.

src/ - contains *.cpp files

Other directories: demo/, exec/, po/, tools/, vignettes/

Note: README.md appears to be a valid top-level directory component in a bundled source package

Write .Rd files in the man/ directory of the package roxygen2: can be used to create *.Rd files in the man/ directory for documentation. These files are used for documentation of the package, functions, and data. roxygen2 turns specifically formatted comments into .Rd files. It can also manage NAMESPACE and the Collate field in DESCRIPTION.

Note1: The subscripts and superscripts in equations are not rendered correctly using roxygen2, need to look into this. There does not appear to be a fix for this. Subscripts and superscripts will be rendered correctly in the pdf version of the document, but not the html R helpfiles.

Note2: Add NULL after @import or @importFrom roxygen comments, if these comments are used at the beginning of the file, separate from the comment block for a function.

*yet to look into vignettes

Data may be stored in four directories in a package:

1. data/ Directory to store data that is to be released as part of the package. Data files should not typically exceed 1 MB in size for CRAN release; must be optimally compressed. Wickham recommends that the data be stored in RData files containing a single object with name same as the filename.

Can use the command usethis::use_data(data1, data2) to create data1.rda and data2.rda files in the data directory.

Other types of data files this directory may contain as per the CRAN manual: "plain R code (.R or .r), tables (.tab, .txt, or .csv, see ?data for the file formats). Note that R code should be if possible “self-sufficient” and not make use of extra functionality provided by the package, so that the data file can also be used without having to load the package or its namespace: it should run as silently as possible and not change the search() path by attaching packages or other environments"

2. R/sysdata.rda To store data that is not available to the users of the package. This is the place to keep data that the functions in the package need. The function objects are created at installation, and these files are not required thereafter.

3. inst/extdata Directory to store raw data. Raw data may be included as part of a package for examples or vignettes. Are the raw data files typically in JSON/YAML formats?

The contents of inst/ are recursively copied to the top-level directory after installation. To find a file in inst/ from the code use system.file() function.

Example: system.file( “extdata”, “mydata.csv”, package = “mypackage”). Note: the inst/ directory is omitted from the path specified (arg1) in the system.file function call.

4. tests/ It is okay to keep small data files for tests in this directory.

Notes on where to keep data typically hardcoded in a package:

If the default data should not be available as data files to the users - they may have to be stored as sysdata.rda in the R/ directory.

If the data may be available to the users: The data can be created using a .R file in data/ folder. Potential advanatge of this method over keeping the data inline with the code - ease of editing default parameters used by the functions in one single file in a central location. The .R file would also be more readable than *.RData files in the data/ directory or sysdata.rda files in the R/ directory. The disadvantage could be a dissociation of data from the code that uses it [Reference link 3]

Reference links: 1. https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Data-in-packages 2. https://stackoverflow.com/questions/47499671/how-to-store-frequently-used-data-or-parameters-within-an-r-package 3. https://community.rstudio.com/t/creating-global-object-vs-storing-it-as-sysdata-rda/3022 4. https://owi.usgs.gov/R/training-curriculum/r-package-dev/mechanics/

Examples from other packages

Separate package for a huge dataset that can be used by multiple other packages: https://github.com/hadley/nycflights13 Package that contains R code to generate data in the data/ directory: https://github.com/cran/stacomiR/blob/master/inst/config/generate_data.R. Note: This is different from keeping an *.R code directly in the data/ directory.

Package wide global variables stored in an separate environment?

https://stackoverflow.com/questions/12598242/global-variables-in-packages-in-r

https://www.r-bloggers.com/global-variables-in-r-packages/

https://www.r-bloggers.com/package-wide-variablescache-in-r-packages/

Parameters of the WGN are stored in a separate environment in this package.

Custom conditions are used to output metadata of errors and provide more useful error messages for function arguments. I have used rlang::abort() to signal errors of the same style and store additional metadata about the error (Advanced R, Ch 8.5). The advantage of rlang::abort() over base::stop() is the ability to contain metadata.

Note: More complicated wrappers for conditionhandlers are mentioned in Chapter 8 of the Advanced R book. These applications are:

To set the default failure value in case of an error

To set both success and failure values while evaluating code

To convert warning messages to errors and stop execution

To record conditions for later investigation

To implement a logging system based on conditions

I can revisit these if they are useful for implementation in foreSIGHT.

Effects of condition system on coding syle (custom condition system vs. default stop messages):

Output error messages of same style in the package (error_bad_argument, error_infeasible_argument, error_argument_dimension)

When custom conditions are in place, the code is more uniform to read.

The condition system leads you to classify potential errors into groups (eg: type-mismatch, dimension mismatch, infeasible range), so that each group may be handled with a common type of custom error message. This also serves as a reminder to include all the potential error checks while writing new functions.

Additional Discussion Points

Where should you signal errors that occur due to user input? - At the time of input. If the parameter input is separate from the function call that uses the parameters, a checker function may be used to signal errors and warnings immediately at the time of input. - At the top level function call. - At the level of internal function call. Here the user may need to go through multiple iterations to fix the same error in multiple instances.

Condition system for internal functions that do not interact with user inputs?

Formatting of error messages? Tidyverse style guide recommends bulleting error messages, and providing information about multiple issues in a single pass.

Notes on Conditions and Condition Handling

Conditions

Base R functions: stop("Error"), warning("warning message"), message("informative message") rlang equivalent to stop("Error") : rlang::abort("Error"), which can store additional metadata about the error. Tidyverse style guide includes recommendations for writing informative error and warning messages. See Point 6 below for a Summary.

Ignore Conditions

Base R functions: try(), suppressWarnings(), suppressMessages()

Handle Conditions

Conditionhandlers are used to supplement or temporarily override the default behaviour in case of a condition (error, warning, message) Condition object: List with 2 elements, - message : a character vector containing text to display. Can be accessed using conditionMessage(cnd) - call : call that triggered the condition. Can be accessed using conditionCall(cnd)

Base R functions to handle a condition object: tryCatch(), withCallingHandlers() Custom conditions can be created using rlang::abort() to store additional information about the error to use for debugging.

Helpful GitHub training presentation

https://github.com/IMMM-SFA/git-training/blob/master/materials/Presentation.pptx

Version Numbering

major.minor.patch https://semver.org/#faq https://blog.codeship.com/best-practices-when-versioning-a-release/

Function names: - Tidyverse syle guide recommends using snake_case for all objects - using verbs for function names and nouns for other objects. - Google style guide recommends using BigCamelCase for function names to distinguish them from other objects. - Since foreSIGHT currently uses smallCamelCase for function names, it may be better to stick with it.

Internal functions: - Google style guide recommends starting internal functions with a period (.). - Tidyverse style guide does not mention a different naming convention for internal functions. They recommend identifying internal functions using specifc roxygen tags in the documentation @keywords internal and @noRd.

Error messages: - If cause of the problem is clear - use "must" in the error statement. (eg: Error: 'n' must be a numeric vector, not a character vector.) - If you cannot state what was expected - use "can't". (eg: Error: Can't coerce 'x' to a vector.) - Do your best to reveal lcoation, name and/or content of the troublesome component. eg: Error: Each result must be a single integer: * Result 1 is a character vector. - If the source is unclear, avoid pointing the user in the wrong direction. - If there are multiple issues, prefer to use a bulleted list. If the list is too long, truncate to show only a first few. - Provide hints only for common errors. End the hint with a question mark.

Discussion Points

Strategy to implement expect_equal() type of unit tests for functions with stochastic return values.

Notes on Unit Testing

Formal automated testing using the testthat package. Test files are to be placed in the test/testthat directory. testhat package has to be added to the Suggests field in description. Use devtools::test() to run the tests. - A file groups together multiple related tests - A test groups together multiple expectations. A test is created using test_that(). - An expectation is the simplest form of testing. It describes the expected result of a computation. An example test:

test_that("functionality description", {
  expect_equal(fun(x1), value1)
  expect_equal(fun(x2), value2)
  expect_equal(fun(x3), value3)
  })

Commonly used expectations: expect_equal(), expect_identical(), expect_match(), expect_output(), expect_warning(), expect_error(), expect_is(), expect_true(), expect_false(), expect_equal_to_reference()

To have functions that behave differently for different objects, with the same external interface. eq: print(as.Date("2020/01/01")) behaves differently from print("2020/01/01")

Base Objects: These objects come from S, and were developed before object oriented systems were thought of in R.
"Object Oriented" (OO) Objects
- OO objects typically have a class attribute that specify their object type.

Note: Use sloop::otype() or sloop::s3_class() to identify the types of objects in R.

All objects in R (Base and OO) have a base type. Base types are not part of the OO system, because functions that behave differently for different base objects are written primarily in C code that uses switch statements. Examples of base types: NULL, logical, integer, double, character, list etc.

R's first and simplest OO system
Very flexible system, so the key to successful use is to apply the constraints yourself
Note: Look at the vctrs package page if using S3.

S3 Object

Base object with atleast a class attribute

R is not strict about classes. The class attribute of any object can be set using class(x) <- "class_name" . The S3 object behaves differently from its underlying base type when it's passed to a generic function .
The class name can be any string. When creating a new class, Hadley strongly reccommends adding the package name in the class name, so that it does not clash with classes in other packages.
Since S3 doesn't provide a format definition of a class, there is no built-in check to ensure that all objects of the same class have the same structure (i.e., the same base type and attributes). So this uniformity has be enforced by the creator of the class. Note: Hadley recommends providing three functions when creating new classes to ensure that objects of the same class are uniform.
1. constructor (new_myclass()): Function that creates new objects of the correct structure.
2. validator (validate_myclass()): Function that performs checks to ensure that the object has correct values.
3. helper(myclass()): Function to provide convenient way for users to create objects of the new class.

Functions that Act on S3 Objects

Generic --- Finds the right implementation depending on the type of object that is passed into it. - The implementation is called a method. An S3 method should not be called directly. You should rely on the generic to find the method using the object passed into it. - The process of finding the right method is called method dispatch. Note: Use sloop::s3_dispatch() to see the process of method dispatch. Eg: ```

sloop::s3_dispatch(print(as.Date("2020/01/01"))) => print.Date * print.default **S3 Method** --- The implementation for a specific class of object. - Functions with a special naming scheme, **generic.class()** - S3 generics and methods can be identified using `sloop::ftype()`. Eg: sloop::ftype(print) [1] "S3" "generic" sloop::ftype(print.Date) [1] "S3" "method"

**How do Generics and Methods Work?**

S3 generic performs method dispatch (i.e., finds the method for a specific class) based on the first argument passed to the generic (typically). The generic calls `UseMethod()` to find the method required. 
> * `UseMethod()` uses `paste0("generic", ".", c(class(x), "default")` to search for potential methods. (This is why S3 methods have to be named `generic.class()`)

A generic function looks like this:

generic <- function(x, ...) { UseMethod("generic") }

_Note1:_ Advanced R warns against doing any computation inside a generic. The generic should only call `UseMethod()`.  
_Note2:_ A Method must have the same arguments as it's generic, unless the generic has `...` argument. In this case, the method can have a superset of arguments.  

Each generic has a default method `generic.default()`. This gets called if the specific class in question does not have a method defined for it.  

**More Compilcated Aspects of Method Dispatch**  

**1. Inheritance**  

S3 classes can share behaviour by having a character **_vector_** as the class. The first element of the class vector is a **_subclass_** of the second vector

class(as.POSIXlt(as.Date("2020/01/01"))) [1] "POSIXlt" "POSIXt"

To create a subclass
> 1. the parent constructor needs to have `...` and `class` arguments    
Recommendations while creating a sub-class
> 1. base type of subclass and superclass should be the same  
> 2. attributes of the subclass should be a superset of the attributes of the superclass

`NextMethod()` : Function to call the method that would have been called if the method for a class did not exist. The function is useful to define a new method on internal generics that preserves the class and subclass of the object it returns. (For example, `` `[` `` is an internal generic, and `NextMethod()` may be used to define subsetting behaviour that preserves the class of the subset returned).  

**2. S3 Dispath on Base Objects**  

Base objects have **implicit class** asccociated with them. These can be seen using `sloop::s3_class()`.

sloop::s3_class(1:10) [1] "integer" "numeric"

S3 generics dispath on the implicit class of base objects when these functions are called on base objects.

**3. Internal Generics**  

Functions like `` `[` ``, `sum()`, `length()` . These generic do not dispatch to methods unless a class attribute has been set (i.e., they perform method dispatch only for S3 objects)

**4. Group Generics**  

The term referes to generic functions that are grouped together. There are four group generics:  
**`Math`:** `abs()`, `log()`, `sin()`, `cos()`, etc.  
**`Ops`:** `+`, `-`, `==`, `<`, `>`, `&`, `|`, etc.  
**`Summary`:** `sum()`, `min()`, `max()`, `all()`, `any()` etc.  
**`Complex`:** `Im()`, `Re()` etc.  

If a S3 method is defined on the group generic for a class (eg: `Math.class()`), this method gets called when any of the functions within the group are called on an S3 object of that class (eg: `abs(S3 obj of class)` or `log(S3 obj of class)`)

**5. Double Dispatch**  

For generics in the `Ops` group (`+`, `-`), S3 dispatch finds a method by dispatching on both the arguments to the generic. To preserve commutative property of the operators.  

### R6

- Used encapsulated OOP. So, **methods belong to objects** not generics. The function call would be `object$method()`.  
- R6 objects can be modified in place. There is an R6 library in R to create and use R6 objects. `R6Class()` is the function used to create both the class and it's methods.  
> ```
> # What does modify in place do?
> # Example from Python
> x = [1, 2]
> y = x
> y[1] = 10     # would change x also
> print(y)
> print(x)
>  
> Output:
> [1, 10]
> [1, 10]
> ```

#### R6 Class Definition and Use

**Definition of a an R6 Class Object using `R6Class()` Function**

MyClass <- R6Class(class = "MyClass", inherit = ParentClass, # If Applicable public = list(), # Methods are functions, fields are everything else (typically variables) private = list(), # Methods and fields accessible only from inside the class active = list()) # active bindings to calculate **Creating R6 Class Objects** x <- MyClass$new()

# Accessing fields and methods within the R6 Class object x$ x$

# Method Chaining x$$ ``` The methods of the object would operate on some fields of the object modifying it in-place

#### Important Methods in the R6Class() function call

$initialize to define the intialization of instance of the class. Overrides the default $new()
$print
$clone to create a copy of the R6 object instead of modifying it in place

#### Effects of Reference Sematics

A function call may modify it's R6 inputs in addition to modifying the return value. Note: Advanced R reccommends that a function may either return a value or modify it's R6 inputs, but not both
Its possible to define a $finalize method that deletes the private fields of an R6 object when it is deleted
If an R6 class is the default value of a field, it may be shared across all instances of an object if not defined properly.

## S4

Similar to S3, but stricter
There are specialized functions for creating classes (setClass()), generics (setGeneric()), and methods (setMethod())
S4 supportes multiple inheritance (multiple parents), and multiple dispatch (dispatch on the class of multiple arguments)

Contain named components called slots. These are accessed using @, similar to $ for lists.

S4 class definition

setClass("myclass", 
   contains = "parentclass"
   slots = c(
     name1 = "character",
     name2 = "numeric"
   )
   prototype = list(
     name1 = NA_character_
     name2 = NA_real_
   )
)

An object of the class can be constructed using new S4_obj <- new("myclass", name1 = "a", name2 = 1) The setValidity() method may be used to impose additional constraints and assess the validity of new objects created. new() automatically validates the object using the specifications set in setValidity when creating new S4 objects.

Functions for the S4 class may be defined using setGeneric() and setMethod()

setGeneric("myGeneric", function(x, ....) standardGeneric("myGeneric"), signature = "x")
setMethod("myGeneric", "myclass", function(x, ...) {
   # method implementation
})

The second argument to setMethod() is the signature. The signature can include multiple arguments to perform double dispatch. So, double dispatch need not be included as a special case as in S3.

S4 methods defined for a class typically include accessor functions to modify the values of its slots, and show function to define how the object prints.

S4 Method Dispatch

Uses multiple dispatch and multiple inheritance - making it much more complicated that S3 method dispatch.

Multiple Inheritance - Define fewer methods to work on related classes. Have to ensure that methods exists for the terminal nodes (the end parent classes), and that there is no ambiguity in choosing the method for classes inheriting from multiple parent classes. - S4 uses pseudo-class ANY (similar to S3 default), that is applied when the method for a class is not defined.

Multiple Dispatch - Dispatch on two or more arguments.

It is possible to use S3 or base objects in the slots of S4 objects. It is also possible to convert existing S3 methods to S4 methods.

Hadley recommends using the S3 object system if there isn't a strong case to use S4 or R6. Advantage is that it is simpler, and there are known approaches to overcome shortcomings.

S4 vs. S3

Main difference is that S4 is stricter.
S4 requires more upfront design, it may be suitable for larger projects and teams.
Good fit for a complex system of interelated objects - possible to reduce code duplication through careful implementation of methods (eg: Matrix Package - 102 classes, 21 generic functions, 1993 methods).
Main challenge with S4 - too complex, lacks documentation.

R6 vs. S3

Very different from S3 and S4 because it is built on encapsulated objects. R6 objects have reference sematics (modify in place), and this has consequences.
- R6 method lives in the local namespace of an object. So it does not interact with other functions of the same name in other objects or packages. Need not be too careful with function names and the function can use different arguments suited for each object (since the arguments do not have to pass through a common generic).
- Reference sematics allow functions to simultaneously return a value and modify an object. Modifiy and return can be useful for specific cases.
- Easier to implement method chaining

anjanadevanand/toyWGN documentation built on March 9, 2020, 6:57 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

anjanadevanand/toyWGN
A Toy Stochastic Weather Generator

README.md
In anjanadevanand/toyWGN: A Toy Stochastic Weather Generator

Toy package to implement R functionality

Considerations for creating an R-Package

1. Basic elements of setting up a package, including naming, directory structure and so forth.

2. Include detailed documentation. What forms of documentation should accompany the code (e.g. help files, readme files, vignettes, comments)? Can you implement roxygen and rmarkdown?

Notes on where to keep data typically hardcoded in a package:

Examples from other packages

Package wide global variables stored in an separate environment?

4. Develop a condition system that traps errors and provides useful warnings. What does this do not only to code robustness, but also to your coding style?

Effects of condition system on coding syle (custom condition system vs. default stop messages):

Additional Discussion Points

Notes on Conditions and Condition Handling

Conditions

Ignore Conditions

Handle Conditions

5. Use GitHub to help with version control (Sam can get you set up with a private account). Also develop an opinion on version numbering for your own package (e.g. when do you change version numbers, etc)

Helpful GitHub training presentation

Version Numbering

7. Implement unit testing in the package throughout, using the Test That package. What does unit testing do to your coding style? Does it change how you implement functions in R?

Discussion Points

Notes on Unit Testing

8. Implement the three different object systems (S3, S4, R6). What is the difference between them? When should you use one over the other?

Notes on the Object Systems in R (from Advanced R)

Why an object system?

Two Types of Objects in R

S3

S3 Object

Functions that Act on S3 Objects

S4 Objects

S4 Method Dispatch

Tradeoffs

S4 vs. S3

R6 vs. S3

9. Read The Pragmatic Programmer. What does implementation of DRY (pg 30) mean for your particular code? What is decoupling? How to you create orthogonality in code? What are some of the key messages in the ‘test to code’ chapter? Why ‘refactor’?

11. Implement plotting using the default R plots, as well as ggplot. What is the difference in approach? Do you have a preference? Read the ggplot book and form a view in terms of why this is becoming so popular but also why some people do not like it.

12. Implement some of the core algorithms in C++ using the RCpp package. There is a chapter on doing this and some basic C++ commands in the Advanced R book.

R Package Documentation

Browse R Packages

We want your feedback!

anjanadevanand/toyWGN A Toy Stochastic Weather Generator

README.md In anjanadevanand/toyWGN: A Toy Stochastic Weather Generator

Toy package to implement R functionality

Considerations for creating an R-Package

1. Basic elements of setting up a package, including naming, directory structure and so forth.

2. Include detailed documentation. What forms of documentation should accompany the code (e.g. help files, readme files, vignettes, comments)? Can you implement roxygen and rmarkdown?

Notes on where to keep data typically hardcoded in a package:

Examples from other packages

Package wide global variables stored in an separate environment?

4. Develop a condition system that traps errors and provides useful warnings. What does this do not only to code robustness, but also to your coding style?

Effects of condition system on coding syle (custom condition system vs. default stop messages):

Additional Discussion Points

Notes on Conditions and Condition Handling

Conditions

Ignore Conditions

Handle Conditions

5. Use GitHub to help with version control (Sam can get you set up with a private account). Also develop an opinion on version numbering for your own package (e.g. when do you change version numbers, etc)

Helpful GitHub training presentation

Version Numbering

7. Implement unit testing in the package throughout, using the Test That package. What does unit testing do to your coding style? Does it change how you implement functions in R?

Discussion Points

Notes on Unit Testing

8. Implement the three different object systems (S3, S4, R6). What is the difference between them? When should you use one over the other?

Notes on the Object Systems in R (from Advanced R)

Why an object system?

Two Types of Objects in R

S3

S3 Object

Functions that Act on S3 Objects

S4 Objects

S4 Method Dispatch

Tradeoffs

S4 vs. S3

R6 vs. S3

9. Read The Pragmatic Programmer. What does implementation of DRY (pg 30) mean for your particular code? What is decoupling? How to you create orthogonality in code? What are some of the key messages in the ‘test to code’ chapter? Why ‘refactor’?

11. Implement plotting using the default R plots, as well as ggplot. What is the difference in approach? Do you have a preference? Read the ggplot book and form a view in terms of why this is becoming so popular but also why some people do not like it.

12. Implement some of the core algorithms in C++ using the RCpp package. There is a chapter on doing this and some basic C++ commands in the Advanced R book.

R Package Documentation

Browse R Packages

We want your feedback!

anjanadevanand/toyWGN
A Toy Stochastic Weather Generator

README.md
In anjanadevanand/toyWGN: A Toy Stochastic Weather Generator