Description Usage Arguments Value Author(s) Examples
View source: R/checkDataIntegrity.R
This function is designed to make it easy to perform some data integrity checks, specifically checking for values that are impossible or unrealistic. These values can then be replaced by another value, or the offending cases can be deleted from the dataframe.
1 2 3 4 5 6 7 8 9 10 11 12 |
x |
This can be either a vector or a list. If it is a vector, it should have two elements, the first one being a regular expression matching one or more variables in the dataframe specified in |
dat |
The dataframe containing the variables of which we should check the integrity. |
newValue |
The new value to be assigned to cases not satisfying the specified conditions. |
removeCases |
Whether to delete cases that do not satisfy the criterion from the dataframe (if |
validValueSuffix |
Suffix to append to variable names when creating variable names for new variables that contain TRUE and FALSE to specify for each original variable whether its value satisfied the specified criterion. |
newValueSuffix |
If |
totalVarName |
This is the name of a variable that contains, for each case, the total number of invalid values among all variables checked. |
append |
Whether to append the columns to the dataframe, or only return the new columns. |
replace |
Whether to replace the offending values with the value specified in |
silent |
Whether to display the log, or only set it as attribute of the returned dataframe. |
rmarkdownOutput |
Whether to format the log so that it's ready to be included in RMarkdown reports. |
callingSelf |
For internal use; whether the function calls itself. |
The dataframe with the corrections, and the log stored in attribute checkDataIntegrity_log
.
Gjalt-Jorn Peters
Maintainer: Gjalt-Jorn Peters <gjalt-jorn@userfriendlyscience.com>
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | ### Default behavior: return dataframe with
### offending values replaced by NA
checkDataIntegrity(c('mpg', '<30'),
mtcars);
### Check two conditions, and instead of returning the
### dataframe with the results appended, only return the
### columns indicating which cases 'pass', what the new
### values would be, and how many invalid values were
### found for each case (to easily remove cases that
### provided many invalid values)
checkDataIntegrity(list(c('mpg', '<30'),
c('gear', '<5')),
mtcars,
append=FALSE);
|
Registered S3 method overwritten by 'GGally':
method from
+.gg ggplot2
Registered S3 methods overwritten by 'lme4':
method from
cooks.distance.influence.merMod car
influence.merMod car
dfbeta.influence.merMod car
dfbetas.influence.merMod car
Matching cases to criterion '<30' for all variables matching regular expression 'mpg'.
In 4 rows, for variable 'mpg', replacing values that do not satisfy criterion 'mpg<30' with 'NA'.
Registered S3 methods overwritten by 'ufs':
method from
grid.draw.ggProportionPlot userfriendlyscience
pander.associationMatrix userfriendlyscience
pander.dataShape userfriendlyscience
pander.descr userfriendlyscience
pander.normalityAssessment userfriendlyscience
print.CramersV userfriendlyscience
print.associationMatrix userfriendlyscience
print.confIntOmegaSq userfriendlyscience
print.confIntV userfriendlyscience
print.dataShape userfriendlyscience
print.descr userfriendlyscience
print.ggProportionPlot userfriendlyscience
print.meanConfInt userfriendlyscience
print.multiVarFreq userfriendlyscience
print.normalityAssessment userfriendlyscience
print.regrInfluential userfriendlyscience
print.scaleDiagnosis userfriendlyscience
print.scaleStructure userfriendlyscience
print.scatterMatrix userfriendlyscience
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 NA 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic NA 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla NA 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
Lotus Europa NA 4 95.1 113 3.77 1.513 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
mpg_validValue numberOfInvalidValues
Mazda RX4 TRUE 0
Mazda RX4 Wag TRUE 0
Datsun 710 TRUE 0
Hornet 4 Drive TRUE 0
Hornet Sportabout TRUE 0
Valiant TRUE 0
Duster 360 TRUE 0
Merc 240D TRUE 0
Merc 230 TRUE 0
Merc 280 TRUE 0
Merc 280C TRUE 0
Merc 450SE TRUE 0
Merc 450SL TRUE 0
Merc 450SLC TRUE 0
Cadillac Fleetwood TRUE 0
Lincoln Continental TRUE 0
Chrysler Imperial TRUE 0
Fiat 128 FALSE 1
Honda Civic FALSE 1
Toyota Corolla FALSE 1
Toyota Corona TRUE 0
Dodge Challenger TRUE 0
AMC Javelin TRUE 0
Camaro Z28 TRUE 0
Pontiac Firebird TRUE 0
Fiat X1-9 TRUE 0
Porsche 914-2 TRUE 0
Lotus Europa FALSE 1
Ford Pantera L TRUE 0
Ferrari Dino TRUE 0
Maserati Bora TRUE 0
Volvo 142E TRUE 0
Matching cases to criterion '<30' for all variables matching regular expression 'mpg'.
In 4 rows, for variable 'mpg', replacing values that do not satisfy criterion 'mpg<30' with 'NA'.
Matching cases to criterion '<5' for all variables matching regular expression 'gear'.
In 5 rows, for variable 'gear', replacing values that do not satisfy criterion 'gear<5' with 'NA'.
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Fiat 128 NA 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Honda Civic NA 4 75.7 52 4.93 1.615 18.52 1 1 4 2
Toyota Corolla NA 4 71.1 65 4.22 1.835 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 NA 2
Lotus Europa NA 4 95.1 113 3.77 1.513 16.90 1 1 NA 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 NA 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 NA 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 NA 8
Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
mpg_validValue numberOfInvalidValues gear_validValue
Mazda RX4 TRUE 0 TRUE
Mazda RX4 Wag TRUE 0 TRUE
Datsun 710 TRUE 0 TRUE
Hornet 4 Drive TRUE 0 TRUE
Hornet Sportabout TRUE 0 TRUE
Valiant TRUE 0 TRUE
Duster 360 TRUE 0 TRUE
Merc 240D TRUE 0 TRUE
Merc 230 TRUE 0 TRUE
Merc 280 TRUE 0 TRUE
Merc 280C TRUE 0 TRUE
Merc 450SE TRUE 0 TRUE
Merc 450SL TRUE 0 TRUE
Merc 450SLC TRUE 0 TRUE
Cadillac Fleetwood TRUE 0 TRUE
Lincoln Continental TRUE 0 TRUE
Chrysler Imperial TRUE 0 TRUE
Fiat 128 FALSE 1 TRUE
Honda Civic FALSE 1 TRUE
Toyota Corolla FALSE 1 TRUE
Toyota Corona TRUE 0 TRUE
Dodge Challenger TRUE 0 TRUE
AMC Javelin TRUE 0 TRUE
Camaro Z28 TRUE 0 TRUE
Pontiac Firebird TRUE 0 TRUE
Fiat X1-9 TRUE 0 TRUE
Porsche 914-2 TRUE 1 FALSE
Lotus Europa FALSE 2 FALSE
Ford Pantera L TRUE 1 FALSE
Ferrari Dino TRUE 1 FALSE
Maserati Bora TRUE 1 FALSE
Volvo 142E TRUE 0 TRUE
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.