checkAllUnique: Function to check all values of a vector or data frame/tibble...

View source: R/checkAllUnique.R

Description

Function to check all values of a vector or data frame/tibble column are unique

Usage

checkAllUnique(
  dat,
  errIfNA = TRUE,
  allowJustOneNA = TRUE,
  allowMultipleColumns = FALSE
)

Arguments

dat

vector, single dimensional list or columns of data frame or tibble to test to see if values/rows are all unique

errIfNA

logical: defaults to TRUE to disallow any NA values

allowJustOneNA

logical: defaults to TRUE, so if errIfNA is FALSE but dat contains more than one NA, the function returns FALSE

allowMultipleColumns

logical: defaults to FALSE; set to TRUE if you really do want to check all the rows of a multi-column dat

Value

logical: TRUE if all unique (see examples)
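
The way errIfNA and allowJustOneNA interact can be sketched for a plain vector as below. This is only an illustration inferred from the argument descriptions and the examples on this page: sketchAllUnique() is a hypothetical name, it is not the CECPfuns source, and it ignores data frame input and the warnings the real function gives.

```r
# Hypothetical helper, NOT the CECPfuns implementation: it only mirrors the
# documented behaviour of errIfNA and allowJustOneNA for a plain vector.
sketchAllUnique <- function(x, errIfNA = TRUE, allowJustOneNA = TRUE) {
  nNA <- sum(is.na(x))
  if (errIfNA && nNA > 0) {
    return(FALSE) # any NA at all fails under the default
  }
  if (allowJustOneNA && nNA > 1) {
    return(FALSE) # more than one NA fails unless explicitly allowed
  }
  # compare only the non-missing values
  anyDuplicated(x[!is.na(x)]) == 0
}

sketchAllUnique(c(letters, NA))                      # FALSE
sketchAllUnique(c(letters, NA), errIfNA = FALSE)     # TRUE
sketchAllUnique(c(NA, letters, NA), errIfNA = FALSE) # FALSE
sketchAllUnique(c(NA, letters, NA), errIfNA = FALSE, allowJustOneNA = FALSE) # TRUE
```

The order of the two tests matters in this sketch: errIfNA is checked first, so allowJustOneNA only has any effect once errIfNA has been set to FALSE.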

Background

This is a utility function to check that all the values of a vector, of a single column of a data frame or tibble, or of a single dimensional list are unique. It can also check that the rows across multiple columns of a data frame or tibble are all unique.
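
For comparison, the same two kinds of check can be sketched in base R with anyDuplicated(), which returns 0 when there are no duplicates. This is a minimal sketch with made-up data (the object names ids and dat and their values are assumptions for illustration), not a description of how checkAllUnique() works internally.

```r
# Made-up ID values: "P02" appears twice
ids <- c("P01", "P02", "P03", "P02")
idsUnique <- anyDuplicated(ids) == 0
idsUnique # FALSE

# Made-up two-column data: participantID repeats, but never within a service
dat <- data.frame(
  serviceID     = c("A", "A", "B"),
  participantID = c(1, 2, 1)
)
# anyDuplicated() has a data frame method that compares whole rows
rowsUnique <- anyDuplicated(dat) == 0
rowsUnique # TRUE
```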

This is important where you expect a single row of data for each participant, say, and so you expect the values of participantID in a data set to be unique. It will cripple things later if you don't identify duplicated ID values. Sometimes it's not just a single variable that should be unique. A typical example is when you have data from multiple services and ID values for participants from each service: the same ID value may then appear more than once, but each occurrence should come from a different service. Mind you, once I have established that rows are unique across serviceID and participantID I usually paste the two variables together with something like:

data$superID = paste0(data$serviceID, ":", data$participantID)

or a tidyverse way of doing it:

data %>%
  mutate(superID = str_c(serviceID, ":", participantID))

Making sure things are unique also prevents all sorts of sometimes very confusing error messages if you want to pivot data depending on an ID variable whose values ought to be unique but aren't. Checking for this is covered on my Wisdom! Rblog page.
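
When a check like this fails, the next step is usually to find the offending rows before pivoting. A minimal base R sketch, assuming made-up data in which one participant has two rows in service "B" (the object names dat, key and dupRows and the score column are hypothetical):

```r
# Made-up data: serviceID "B", participantID 2 appears twice (rows 4 and 5)
dat <- data.frame(
  serviceID     = c("A", "A", "B", "B", "B"),
  participantID = c(1, 2, 1, 2, 2),
  score         = c(10, 12, 9, 14, 15)
)

# Only these columns define what should be unique
key <- dat[, c("serviceID", "participantID")]

# duplicated() flags second and later copies; adding fromLast = TRUE
# flags the first copy too, so every row in a clash is marked
dupRows <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Show the rows that need sorting out
dat[dupRows, ]
```

Flagging from both ends shows all the rows involved in a clash, which is usually what you need when deciding which one to keep.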

History/development log

Started before 5.iv.21. Tweaked 11.iv.21 to fix error in examples.

Author(s)

Chris Evans

See Also

Other data checking functions: getNNA(), getNOK(), isOneToOne()

Examples

## Not run: 
### letters gets us the 26 lower case letters of the English alphabet so should test OK
checkAllUnique(letters)
# [1] TRUE
### good!

### the checking is case sensitive:
checkAllUnique(c("A", letters), errIfNA = FALSE, allowJustOneNA = FALSE)
# [1] TRUE
### but ...
checkAllUnique(c("a", letters), errIfNA = FALSE, allowJustOneNA = FALSE)
# [1] FALSE
### both good!

### by default checkAllUnique doesn't allow any NA values,
### generally sensible for my data: ID codes or table indices
checkAllUnique(c(letters, NA))
# [1] FALSE
### good!

### but we can override that:
checkAllUnique(c(letters, NA), errIfNA = FALSE)
# [1] TRUE
### good!

### but generally I wouldn't want multiple NA values
### in the typical situations I'd be using
### checkAllUnique() so I have forced you to allow
### that explicitly if you really want it ...
checkAllUnique(c(NA, letters, NA), errIfNA = FALSE)
# [1] FALSE
### good!

### but you _can_ override that if you need to ...
checkAllUnique(c(NA, letters, NA), errIfNA = FALSE, allowJustOneNA = FALSE)
# [1] TRUE
### good!

### by default checkAllUnique expects vector input but it can handle data
### in multiple columns in a data frame or tibble
tmpDat <- as.data.frame(matrix(1:10, ncol = 2))
tmpDat
#   V1 V2
# 1  1  6
# 2  2  7
# 3  3  8
# 4  4  9
# 5  5 10
checkAllUnique(tmpDat[, 1])
# [1] TRUE
checkAllUnique(tmpDat[, 2])
# [1] TRUE
### but remember column indexing a tibble returns a tibble not a vector
library(dplyr) # for as_tibble(), pull() and %>%
tmpTib <- as_tibble(tmpDat)
tmpTib
# # A tibble: 5 x 2
#      V1    V2
#   <int> <int>
# 1     1     6
# 2     2     7
# 3     3     8
# 4     4     9
# 5     5    10
checkAllUnique(tmpTib[, 1])
# Error in checkAllUnique(tmpTib[, 1]) :
#   You have input an object of length one to checkAllUnique(): makes no sense!
### whoops, I remember, must pull() to extract as vector
checkAllUnique(tmpTib %>% pull(1))
# [1] TRUE
checkAllUnique(tmpTib %>% pull(2))
# [1] TRUE

### you _can_ allow multiple columns
checkAllUnique(tmpDat, allowMultipleColumns = TRUE)
# [1] TRUE
# Warning message:
#   In checkAllUnique(tmpDat, allowMultipleColumns = TRUE) :
#   Input of dat to checkAllUnique was not a vector: be careful please!
checkAllUnique(tmpTib, allowMultipleColumns = TRUE)
# [1] TRUE
# Warning message:
#   In checkAllUnique(tmpTib, allowMultipleColumns = TRUE) :
#   Input of dat to checkAllUnique was not a vector: be careful please!

### but it is checking all content in all columns so if two rows are the
### same it will return FALSE:
tmpDat2 <- tmpDat
tmpDat2[2, ] <- tmpDat[1, ] # make row 2 same as row 1
tmpDat2
#   V1 V2
# 1  1  6
# 2  1  6
# 3  3  8
# 4  4  9
# 5  5 10
checkAllUnique(tmpDat2, allowMultipleColumns = TRUE)
# [1] FALSE
# Warning message:
#   In checkAllUnique(tmpDat2, allowMultipleColumns = TRUE) :
#   Input of dat to checkAllUnique was not a vector: be careful please!


### what about columns of different classes, e.g. numeric and character
tmpDatMixed <- cbind(tmpDat2, letters[1:5])
tmpDatMixed
#   V1 V2 letters[1:5]
# 1  1  6            a
# 2  1  6            b
# 3  3  8            c
# 4  4  9            d
# 5  5 10            e
checkAllUnique(tmpDatMixed, allowMultipleColumns = TRUE)
# [1] TRUE
# Warning message:
#   In checkAllUnique(tmpDatMixed, allowMultipleColumns = TRUE) :
#   Input of dat to checkAllUnique was not a vector: be careful please!
### that came out as TRUE because R "promoted" the numerics to character
### and because the two different letters in rows 1 and 2 of column 3
### broke the non-unique tie of rows 1 and 2 in columns 1 and 2

### what about having NA in a multicolumn input?
tmpDatNA <- tmpDat
tmpDatNA[2, 1] <- NA
tmpDatNA
#   V1 V2
# 1  1  6
# 2 NA  7
# 3  3  8
# 4  4  9
# 5  5 10
checkAllUnique(tmpDatNA, allowMultipleColumns = TRUE)
# [1] FALSE
# Warning message:
#   In checkAllUnique(tmpDatNA, allowMultipleColumns = TRUE) :
#   Input of dat to checkAllUnique was not a vector: be careful please!
### but
checkAllUnique(tmpDatNA, allowMultipleColumns = TRUE, errIfNA = FALSE)

## End(Not run)

cpsyctc/CECPfuns documentation built on Dec. 26, 2021, 1:19 p.m.