checkDataIntegrity: Conveniently checking data integrity

Description Usage Arguments Value Author(s) Examples

View source: R/checkDataIntegrity.R

Description

This function is designed to make it easy to perform some data integrity checks, specifically checking for values that are impossible or unrealistic. These values can then be replaced by another value, or the offending cases can be deleted from the dataframe.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
checkDataIntegrity(x,
                   dat,
                   newValue = NA,
                   removeCases = FALSE,
                   validValueSuffix = "_validValue",
                   newValueSuffix = "_newValue",
                   totalVarName = "numberOfInvalidValues",
                   append = TRUE,
                   replace = TRUE,
                   silent = FALSE,
                   rmarkdownOutput=FALSE,
                   callingSelf = FALSE)

Arguments

x

This can be either a vector or a list. If it is a vector, it should have two elements, the first one being a regular expression matching one or more variables in the dataframe specified in dat, and second one being the condition the matching variables have to satisfy. If it is a list, it should be a list of such vectors. The conditions should start with a Comparison operator followed by a value (e.g. "<30" or ">=0).

dat

The dataframe containing the variables of which we should check the integrity.

newValue

The new value to be assigned to cases not satisfying the specified conditions.

removeCases

Whether to delete cases that do not satisfy the criterion from the dataframe (if FALSE, they're not deleted, but the offending value is replaced by newValue).

validValueSuffix

Suffix to append to variable names when creating variable names for new variables that contain TRUE and FALSE to specify for each original variable whether its value satisfied the specified criterion.

newValueSuffix

If replace is FALSE, original values are not replaced, but instead new variables are created where the offending values have been replaced. This suffix is appended to each original variable name to create the new variable name.

totalVarName

This is the name of a variable that contains, for each case, the total number of invalid values among all variables checked.

append

Whether to append the columns to the dataframe, or only return the new columns.

replace

Whether to replace the offending values with the value specified in newValue or whether to create new columns (see newValueSuffix).

silent

Whether to display the log, or only set it as attribute of the returned dataframe.

rmarkdownOutput

Whether to format the log so that it's ready to be included in RMarkdown reports.

callingSelf

For internal use; whether the function calls itself.

Value

The dataframe with the corrections, and the log stored in attribute checkDataIntegrity_log.

Author(s)

Gjalt-Jorn Peters

Maintainer: Gjalt-Jorn Peters <gjalt-jorn@userfriendlyscience.com>

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
### Default behavior: return dataframe with
### offending values replaced by NA

checkDataIntegrity(c('mpg', '<30'),
                   mtcars);

### Check two conditions, and instead of returning the
### dataframe with the results appended, only return the
### columns indicating which cases 'pass', what the new
### values would be, and how many invalid values were
### found for each case (to easily remove cases that
### provided many invalid values)

checkDataIntegrity(list(c('mpg', '<30'),
                        c('gear', '<5')),
                   mtcars,
                   append=FALSE);

Example output

Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
Registered S3 methods overwritten by 'lme4':
  method                          from
  cooks.distance.influence.merMod car 
  influence.merMod                car 
  dfbeta.influence.merMod         car 
  dfbetas.influence.merMod        car 
Matching cases to criterion '<30' for all variables matching regular expression 'mpg'.
In 4 rows, for variable 'mpg', replacing values that do not satisfy criterion 'mpg<30' with 'NA'.
Registered S3 methods overwritten by 'ufs':
  method                     from               
  grid.draw.ggProportionPlot userfriendlyscience
  pander.associationMatrix   userfriendlyscience
  pander.dataShape           userfriendlyscience
  pander.descr               userfriendlyscience
  pander.normalityAssessment userfriendlyscience
  print.CramersV             userfriendlyscience
  print.associationMatrix    userfriendlyscience
  print.confIntOmegaSq       userfriendlyscience
  print.confIntV             userfriendlyscience
  print.dataShape            userfriendlyscience
  print.descr                userfriendlyscience
  print.ggProportionPlot     userfriendlyscience
  print.meanConfInt          userfriendlyscience
  print.multiVarFreq         userfriendlyscience
  print.normalityAssessment  userfriendlyscience
  print.regrInfluential      userfriendlyscience
  print.scaleDiagnosis       userfriendlyscience
  print.scaleStructure       userfriendlyscience
  print.scatterMatrix        userfriendlyscience
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128              NA   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic           NA   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla        NA   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa          NA   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
                    mpg_validValue numberOfInvalidValues
Mazda RX4                     TRUE                     0
Mazda RX4 Wag                 TRUE                     0
Datsun 710                    TRUE                     0
Hornet 4 Drive                TRUE                     0
Hornet Sportabout             TRUE                     0
Valiant                       TRUE                     0
Duster 360                    TRUE                     0
Merc 240D                     TRUE                     0
Merc 230                      TRUE                     0
Merc 280                      TRUE                     0
Merc 280C                     TRUE                     0
Merc 450SE                    TRUE                     0
Merc 450SL                    TRUE                     0
Merc 450SLC                   TRUE                     0
Cadillac Fleetwood            TRUE                     0
Lincoln Continental           TRUE                     0
Chrysler Imperial             TRUE                     0
Fiat 128                     FALSE                     1
Honda Civic                  FALSE                     1
Toyota Corolla               FALSE                     1
Toyota Corona                 TRUE                     0
Dodge Challenger              TRUE                     0
AMC Javelin                   TRUE                     0
Camaro Z28                    TRUE                     0
Pontiac Firebird              TRUE                     0
Fiat X1-9                     TRUE                     0
Porsche 914-2                 TRUE                     0
Lotus Europa                 FALSE                     1
Ford Pantera L                TRUE                     0
Ferrari Dino                  TRUE                     0
Maserati Bora                 TRUE                     0
Volvo 142E                    TRUE                     0
Matching cases to criterion '<30' for all variables matching regular expression 'mpg'.
In 4 rows, for variable 'mpg', replacing values that do not satisfy criterion 'mpg<30' with 'NA'.
Matching cases to criterion '<5' for all variables matching regular expression 'gear'.
In 5 rows, for variable 'gear', replacing values that do not satisfy criterion 'gear<5' with 'NA'.
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128              NA   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic           NA   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla        NA   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1   NA    2
Lotus Europa          NA   4  95.1 113 3.77 1.513 16.90  1  1   NA    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1   NA    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1   NA    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1   NA    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
                    mpg_validValue numberOfInvalidValues gear_validValue
Mazda RX4                     TRUE                     0            TRUE
Mazda RX4 Wag                 TRUE                     0            TRUE
Datsun 710                    TRUE                     0            TRUE
Hornet 4 Drive                TRUE                     0            TRUE
Hornet Sportabout             TRUE                     0            TRUE
Valiant                       TRUE                     0            TRUE
Duster 360                    TRUE                     0            TRUE
Merc 240D                     TRUE                     0            TRUE
Merc 230                      TRUE                     0            TRUE
Merc 280                      TRUE                     0            TRUE
Merc 280C                     TRUE                     0            TRUE
Merc 450SE                    TRUE                     0            TRUE
Merc 450SL                    TRUE                     0            TRUE
Merc 450SLC                   TRUE                     0            TRUE
Cadillac Fleetwood            TRUE                     0            TRUE
Lincoln Continental           TRUE                     0            TRUE
Chrysler Imperial             TRUE                     0            TRUE
Fiat 128                     FALSE                     1            TRUE
Honda Civic                  FALSE                     1            TRUE
Toyota Corolla               FALSE                     1            TRUE
Toyota Corona                 TRUE                     0            TRUE
Dodge Challenger              TRUE                     0            TRUE
AMC Javelin                   TRUE                     0            TRUE
Camaro Z28                    TRUE                     0            TRUE
Pontiac Firebird              TRUE                     0            TRUE
Fiat X1-9                     TRUE                     0            TRUE
Porsche 914-2                 TRUE                     1           FALSE
Lotus Europa                 FALSE                     2           FALSE
Ford Pantera L                TRUE                     1           FALSE
Ferrari Dino                  TRUE                     1           FALSE
Maserati Bora                 TRUE                     1           FALSE
Volvo 142E                    TRUE                     0            TRUE

userfriendlyscience documentation built on May 2, 2019, 1:09 p.m.