dfcheck

dfcheck is an R package; the name stands for Data Frame Check. The aim of this package is to check data.frames for likely user mistakes, such as misspellings, extra spaces, or cloned columns.
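To make these three kinds of mistakes concrete, here is a minimal sketch in base R. It does not use or reproduce the dfcheck API: the helper names (`has_extra_spaces`, `near_duplicates`) and the sample data are hypothetical, and the edit-distance threshold is an assumption chosen for illustration.

```r
# Illustration only: base R, hypothetical helper names, not the dfcheck API.

# A small data frame containing the kinds of mistakes described above.
df <- data.frame(
  sex   = c("male", "male ", "female", "femal"),
  age   = c(34, 51, 28, 45),
  age_2 = c(34, 51, 28, 45),
  stringsAsFactors = FALSE
)

# Extra spaces: character columns with leading or trailing whitespace.
has_extra_spaces <- function(x) {
  is.character(x) && any(x != trimws(x), na.rm = TRUE)
}
space_cols <- names(df)[vapply(df, has_extra_spaces, logical(1))]

# Cloned columns: columns whose content is an exact copy of an earlier one.
cloned_cols <- names(df)[duplicated(as.list(df))]

# Misspellings: pairs of distinct values within a small edit distance.
near_duplicates <- function(x, max_dist = 1) {
  values <- unique(trimws(x))
  d <- utils::adist(values)
  idx <- which(d > 0 & d <= max_dist & upper.tri(d), arr.ind = TRUE)
  data.frame(
    a = values[idx[, 1]],
    b = values[idx[, 2]],
    stringsAsFactors = FALSE
  )
}
typos <- near_duplicates(df$sex)

print(space_cols)   # "sex"
print(cloned_cols)  # "age_2"
print(typos)        # one row: "female" / "femal"
```

An edit distance of 1 flags only very close spellings; a larger `max_dist` would catch more typos at the cost of more false positives.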

Installation

Stable version

dfcheck is under active development, so there is no real stable version for the moment.

The master branch holds a version that always builds, so you can safely use it. Nevertheless, this branch does not include the latest features.

To install it from GitHub, install the devtools package and then type this line in R:

devtools::install_github(repo = "jomuller/dfcheck", ref = "master")

Development version

To install the development version from GitHub, install the devtools package and then type this line in R:

devtools::install_github(repo = "jomuller/dfcheck", ref = "dev")

Participate

Bug tracking

You are welcome to open issues on the issue page in GitHub.

Code, documentation

You are welcome to fork the repository to improve the code and the documentation. We try to use test-driven development with testthat and to keep the code well documented with roxygen, so functionalities are added relatively slowly.
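As a rough idea of what such a contribution might look like, the sketch below pairs a roxygen-documented function with testthat tests. The function `find_cloned_columns` is hypothetical and is not claimed to exist in dfcheck; the block assumes the testthat package is installed.

```r
library(testthat)

#' Find cloned columns in a data.frame
#'
#' Hypothetical example function, not part of the dfcheck API.
#'
#' @param df A data.frame.
#' @return A character vector with the names of the columns whose content
#'   is an exact copy of an earlier column.
find_cloned_columns <- function(df) {
  names(df)[duplicated(as.list(df))]
}

# In test-driven development, tests like these are written first.
test_that("an exact copy of a column is reported", {
  df <- data.frame(a = 1:2, b = 1:2, c = c("x", "y"), stringsAsFactors = FALSE)
  expect_equal(find_cloned_columns(df), "b")
})

test_that("a data.frame without clones reports nothing", {
  df <- data.frame(a = 1:2, c = c("x", "y"), stringsAsFactors = FALSE)
  expect_equal(find_cloned_columns(df), character(0))
})
```

In an actual package, the function would normally live under `R/` and the tests under `tests/testthat/`, following the usual R package layout.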

Release plan

Follow the milestones of this GitHub project to see the release plan.

Motivation

The dfcheck package was created to speed up the boring but important step of checking the databases that users send us during methodology consultations. Most of the time, we receive Excel files in the XLSX format. We open them using the openxlsx package, hoping to run some analyses on the data. But during the analysis, we always detect the same errors.

The problem is that we need a minimum of structure in the tables to be able to feed them to our statistical software. We previously tried giving our users guidelines for delivering clean data that could be processed directly using, for example, vartors. This improves the quality of the tables, but the remaining errors are more insidious and cost us a lot of time to check and correct.

The main aim of the dfcheck package is to detect and report as many potential errors as possible before performing the statistical analysis.



jomuller/dfcheck documentation built on May 19, 2019, 7:26 p.m.