dev-notes.md

Developer's Note

Also Check CRAN Comments

Dependability

If a user builds a continuous workflow of data wrangling starting with some functions from tidycells package or includes tidycells in their package, then proper precautions should be taken as tidycells functions are heuristic-based. One can face problem like column collated_2 has been renamed to collated_3 etc.

The package has two main functions which may raise some dependability issues in future. These functions are tidycells::analyze_cells and tidycells::collate_columns (which are based on heuristics and internal statistical logic). The main cause for potential output variation across different releases (future CRAN releases) of this package might be due to changes in tidycells::analyze_cells. Since tidycells::read_cells is dependent on these functions, it will also be affected equally.

The package has been developed observing certain types of oddly structured data available to the developer (me). However, if any user has any issues in automatic understanding of the underlying structure, it is expected that the same will be attempted to address in a future release (provided the user inform me the issue). This may be referred to as the "Heuristic Maturation" process for these two functions. As and when the tests written for these functions requires a modification (which means the output column name and other major details has been changed) I'll bump the (version convention ..) version in the release. If only is changed since a user last used this package, then possibly you can depend on this package without any worry.

After first CRAN release when next version will be released in CRAN hopefully I'll be able to provide you with a compatibility checker function to work out and find potential tendency issues.

Don't worry I'll try my best not to break your code intentionally. This is just a message to you so that you can build careful dependency on this package.

Contributing

You are most welcome to contribute to this project in any means.

Apart from opening an issue in Github (preferably with reprex), you can also contribute mainly to the "Heuristic Maturation" process for tidycells::analyze_cells and tidycells::collate_columns. If any issue is specific to a data which you would like to share with me, there is a friendly function to mask your data using tidycells:::mask_data (this function is not exported to avoid possible name conflicts).

FAQs

1) Why there are two code coverages?

The package contains optional functionality which are written as shiny widgets. These are given to the user as visual_* series of functions. Limited tests for these are developed and tested in a few testing environments. These tests are based on shinytest package. The covr package is not yet (at least the CRAN version) capable to track code coverages in shinytest [Ref: r-lib/covr #277]. Also note that shinytest is not yet taking widget-based functions [Ref: rstudio/shinytest #157] (at least the CRAN version). That is why a set of functions is introduced to run tests for shiny.

On these grounds, codecov Codecov test coverage is used to give full coverage (without any restrictions) (ideally this should increase provided the support for covr is introduced in shinytest (and covr). Also "brush input fractional mismatch" (mentioned below) issue gets some resolution) . While the coveralls Coveralls test coverage shows the coverage excluding shiny_*.R and visual_*.R files (showing the coverage for only main functionality)

2) Where the shiny modules are tested?

The shiny tests are only carried out in selected testing environments because of several difficulties. The difficulties are listed below.

Difficulties during shiny test:

Given these difficulties, the shiny tests are tested in the following environments (and in all Windows environment listed below).

| Test Environment | OS | R Version | Screenshot Tested | |------------------|-----------------------------------------|---------------------------------------------|-------------------| | Local | Windows 10 x86 Build 9200 | R version 3.6.1 (2019-07-05) | yes | | Local | Windows 10 x64 Build 17134 | R version 3.6.0 (2019-04-26) | yes | | AppVeyor | Windows Server 2012 R2 x64 (build 9600) | R version 3.6.1 Patched (2019-07-24 r76894) | no | | AppVeyor | Windows Server 2012 R2 x64 (build 9600) | R version 3.6.1 (2019-07-05) | no | | AppVeyor | Windows Server 2012 R2 x64 (build 9600) | R version 3.5.3 (2019-03-11) | no |

Note: the screen-shots are also tested (apart from JSON test).

Current State of Development and Way Forward

Check trackable version here.

R-hub other builds

See other successful builds in CRAN Comments

See whole build matrix below

Note : Neither of these errors (or notes) are attributable to the package as they failed because of induced system dependency or optional package dependency.

All build summary

| Package | Version | Submit Date | Where | OS Type | OS Description | R Version | R Version Tag | Platform | State | |-----------|---------|-------------|------------|---------|-----------------------------------------|------------------------------------------------------------------------|-------------------------------------------------------------------|------------------------------------|-----------| | tidycells | 0.2.2 | 2020-01-06 | RHub | Windows | Windows Server 2008 R2 SP1 | R version 3.6.2 (2019-12-12) | R-release 32/64 bit | x86_64-w64-mingw32 (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | RHub | Windows | Windows Server 2008 R2 SP1 | R version 3.6.2 Patched (2019-12-12 r77564) | R-patched 32/64 bit | x86_64-w64-mingw32 (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | RHub | Windows | Windows Server 2008 R2 SP1 | R version 3.5.3 (2019-03-11) | R-oldrel 32/64 bit | x86_64-w64-mingw32 (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | RHub | Windows | Windows Server 2012 | R version 4.0.0 Under development (Testing Rtools) (2019-09-30 r77236) | R-devel Rtools4.0 32/64 bit | x86_64-w64-mingw32 (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | RHub | Windows | Windows Server 2008 R2 SP1 | R Under development (unstable) (2019-11-08 r77393) | R-devel 32/64 bit | x86_64-w64-mingw32 (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | RHub | Linux | Ubuntu Linux 16.04 LTS | R Under development (unstable) (2020-01-03 r77629) | R-devel with rchk | | PREPERROR | | tidycells | 0.2.2 | 2020-01-06 | RHub | Linux | Ubuntu Linux 16.04 LTS | | R-release GCC | | PREPERROR | | tidycells | 0.2.2 | 2020-01-06 | RHub | Linux | Ubuntu Linux 16.04 LTS | R Under development (unstable) (2020-01-03 r77629) | R-devel GCC | | PREPERROR | | tidycells | 0.2.2 | 2020-01-06 | RHub | Solaris | Oracle Solaris 10 x86 32 bit | R version 3.6.0 (2019-04-26) | R-patched | i386-pc-solaris2.10 (32-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | RHub | macOS | macOS 10.11 El Capitan | R version 3.6.2 (2019-12-12) | R-release | x86_64-apple-darwin15.6.0 (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | RHub | Linux | Debian Linux | R Under development (unstable) (2018-06-20 r74924) | R-devel GCC ASAN/UBSAN | | PREPERROR | | tidycells | 0.2.2 | 2020-01-06 | RHub | Linux | CentOS 6 with Redhat Developer Toolset | R version 3.5.2 (2018-12-20) | R from EPEL | x86_64-redhat-linux-gnu (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | RHub | Linux | CentOS 6 | | stock R from EPEL | | PREPERROR | | tidycells | 0.2.2 | 2020-01-06 | RHub | Linux | Fedora Linux | R Under development (unstable) (2020-01-03 r77629) | R-devel GCC | x86_64-pc-linux-gnu (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | RHub | Linux | Fedora Linux | R Under development (unstable) (2020-01-03 r77629) | R-devel clang gfortran | x86_64-pc-linux-gnu (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | RHub | Linux | Debian Linux | | R-release GCC | | PREPERROR | | tidycells | 0.2.2 | 2020-01-06 | RHub | Linux | Debian Linux | | R-patched GCC | | PREPERROR | | tidycells | 0.2.2 | 2020-01-06 | RHub | Linux | Debian Linux | R Under development (unstable) (2020-01-03 r77629) | R-devel GCC no long double | | PREPERROR | | tidycells | 0.2.2 | 2020-01-06 | RHub | Linux | Debian Linux | R Under development (unstable) (2020-01-03 r77629) | R-devel GCC | | PREPERROR | | tidycells | 0.2.2 | 2020-01-06 | RHub | Linux | Debian Linux | R Under development (unstable) (2019-08-18 r77026) | R-devel clang ISO-8859-15 locale | | PREPERROR | | tidycells | 0.2.2 | 2020-01-06 | AppVeyor | Windows | Windows Server 2012 R2 x64 (build 9600) | R version 3.6.2 (2019-12-12) | R_VERSION=release, R_ARCH=x64 | x86_64-w64-mingw32 (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | AppVeyor | Windows | Windows Server 2012 R2 x64 (build 9600) | R Under development (unstable) (2020-01-03 r77629) | R_VERSION=devel | x86_64-w64-mingw32 (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | AppVeyor | Windows | Windows Server 2012 R2 x64 (build 9600) | R version 3.6.2 Patched (2020-01-03 r77629) | R_VERSION=patched | x86_64-w64-mingw32 (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | AppVeyor | Windows | Windows Server 2012 R2 x64 (build 9600) | R version 3.5.3 (2019-03-11) | R_VERSION=oldrel, RTOOLS_VERSION=33, CRAN=http://cran.rstudio.com | x86_64-w64-mingw32 (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | Travis | linux | Ubuntu 16.04.6 LTS | R version 3.5.3 (2017-01-27) | R: oldrel | x86_64-pc-linux-gnu (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | Travis | linux | Ubuntu 16.04.6 LTS | R version 3.6.1 (2017-01-27) | R: release | x86_64-pc-linux-gnu (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | Travis | osx | macOS High Sierra 10.13.6 | R version 3.6.2 (2019-12-12) | R: release | x86_64-apple-darwin15.6.0 (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | Travis | linux | Ubuntu 16.04.6 LTS | R Under development (unstable) (2020-01-03 r77628) | R: devel | x86_64-pc-linux-gnu (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | WinBuilder | Windows | | R version 3.5.3 (2019-03-11) | | x86_64-w64-mingw32 (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | WinBuilder | Windows | | R version 3.6.2 (2019-12-12) | | x86_64-w64-mingw32 (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | WinBuilder | Windows | | R Under development (unstable) (2020-01-03 r77629) | | x86_64-w64-mingw32 (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | Local | Windows | Windows 10 x64 (build 17134) | R version 3.6.1 (2019-07-05) | | x86_64-w64-mingw32/x64 (64-bit) | OK | | tidycells | 0.2.2 | 2020-01-06 | Local | Windows | Windows 10 x64 (build 17134) | R version 3.6.2 (2019-12-12) | | x86_64-w64-mingw32/x64 (64-bit) | OK |

Dependency Explained

To install tidycells (with bare minimum functionality) you need do following two things

Rest packages are optional and can be installed as per your requirements.

Note the package has "Induced System Dependency" which is causing to break the code sometime in R-hub. Below table describes the same (note that below table is an indicative list and may not be complete.).

| Package | Type | Reason | Implied Critical Dependency | Induced System Dependency | |--------------|----------|-------------------------------------------------------------------------|-------------------------------|--------------------------------------| | Imports | | | | | | dplyr | Imports | core | | | | ggplot2 | Imports | plots | Rcpp | | | graphics | Imports | base | | | | magrittr | Imports | core | | | | methods | Imports | base | | | | purrr | Imports | core | | | | rlang | Imports | core | | | | stats | Imports | tidycells::collate_columns --> tidycells:::similarity_score | | | | stringr | Imports | core | | | | tibble | Imports | core | | | | tidyr | Imports | core | | | | unpivotr | Imports | core | xml2 | libxml2 | | utils | Imports | base | | | | Suggests | | | | | | cli | Suggests | nice prints | | | | covr | Suggests | code coverage | | | | docxtractr | Suggests | read doc and docx | | LibreOffice (Suggested Dependency) | | DT | Suggests | for visual_traceback plots | | | | knitr | Suggests | vignettes | | | | miniUI | Suggests | for visual_ functions | | | | plotly | Suggests | optional interactive ggplot2 in visual_ functions | httr, openssl | openssl / libssl | | readr | Suggests | read csv | | | | readxl | Suggests | read xls | | | | rmarkdown | Suggests | vignettes | | | | rstudioapi | Suggests | object selector in Rstudio | | | | shiny | Suggests | for visual_* functions | httr, openssl | openssl / libssl | | shinytest | Suggests | shiny module tests | | | | stringdist | Suggests | tidycells::collate_columns --> tidycells:::similarity_score (Enhance) | | | | tabulizer | Suggests | read pdf | rJava | Java | | testthat | Suggests | tests | | | | tidyxl | Suggests | read xlsx | | | | xlsx | Suggests | read xls (prefered option) | rJava | Java | | XML | Suggests | read html like files | | |



r-rudra/tidycells documentation built on July 19, 2022, 5:10 a.m.