View source: R/messy_linelist.R
messy_linelist | R Documentation |
Take line list output from sim_linelist() and replace elements of the <data.frame> with missing values (e.g. NA), introduce spelling mistakes and inconsistencies, as well as coerce date types.
messy_linelist(linelist, ...)
linelist |
Line list |
... |
< Accepted arguments and their defaults are:
|
By default messy_linelist():
Sets 10% of values to missing, i.e. converts them to NA.
Introduces spelling mistakes in 10% of character columns.
Introduces inconsistency in the reporting of $sex.
Converts numeric columns (double & integer) to character.
Converts Date columns to character.
Converts 50% of integers to (English) words.
Duplicates 1% of rows.
Setting missing_value to something other than NA will likely cause type coercion in the line list <data.frame> columns, most likely to character.
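The coercion described here is ordinary base R behaviour when a value of a different type is assigned into an atomic vector. The following is a minimal sketch using a made-up numeric vector (the name age and its values are illustrative, not taken from sim_linelist() output):

```r
# A hypothetical numeric column, standing in for a line list column
age <- c(34, 51, 27, 8)

# A numeric placeholder keeps the column numeric
age_numeric_placeholder <- age
age_numeric_placeholder[2] <- -99
class(age_numeric_placeholder)

# A string placeholder coerces every element of the column to character
age_string_placeholder <- age
age_string_placeholder[2] <- "unknown"
class(age_string_placeholder)
```

Whether a given column changes type therefore depends on both the column's original type and the type of the chosen missing_value.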
When setting sex_as_numeric to TRUE, male is set to 0 and female to 1. Only one of inconsistent_sex or sex_as_numeric can be TRUE, otherwise the function will error.
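The encoding and the mutual-exclusivity rule can be sketched in base R. The helper sex_to_numeric() below is hypothetical and is not the package's implementation; it also assumes that sex is recorded as "m" and "f", which this page does not state.

```r
# Hypothetical helper, not part of the package: mimics the documented rules
sex_to_numeric <- function(sex,
                           inconsistent_sex = FALSE,
                           sex_as_numeric = TRUE) {
  # Both options cannot be switched on together
  if (inconsistent_sex && sex_as_numeric) {
    stop("Only one of `inconsistent_sex` or `sex_as_numeric` can be TRUE.")
  }
  if (!sex_as_numeric) {
    return(sex)
  }
  # male -> 0, female -> 1; NA stays NA
  unname(c(m = 0, f = 1)[sex])
}

sex <- c("m", "f", "f", NA, "m")
encoded <- sex_to_numeric(sex)
encoded

# Requesting both options raises an error, which is captured here
both <- tryCatch(
  sex_to_numeric(sex, inconsistent_sex = TRUE, sex_as_numeric = TRUE),
  error = function(e) e
)
inherits(both, "error")
```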
If numeric_as_char = TRUE and sex_as_numeric = TRUE then the sex encoded as 0 or 1 is converted to character. If prop_spelling_mistakes > 0 and numeric_as_char = TRUE the columns that are converted from numeric to character do not have spelling mistakes introduced, because they are numeric characters stored as character strings. If prop_spelling_mistakes > 0 and date_as_char = TRUE spelling mistakes are not introduced into dates.
The Date columns can be converted into an inconsistent format by setting inconsistent_dates = TRUE. This requires date_as_char = TRUE; if the latter is FALSE the function will error.
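One way to see why inconsistent formats need character columns: a Date vector has a single internal representation, so mixed formats can only exist once the dates are stored as strings. A minimal base R sketch follows; the dates and format strings are made up for illustration and are not necessarily the formats messy_linelist() uses.

```r
# Hypothetical dates standing in for a Date column of a line list
dates <- as.Date(c("2023-01-05", "2023-02-17", "2023-03-30"))

# Illustrative formats only; the formats used by the function are not stated here
formats <- c("%Y-%m-%d", "%d/%m/%Y", "%Y.%m.%d")

# Format each date with a different pattern; the result has to be character
mixed <- vapply(
  seq_along(dates),
  function(i) format(dates[i], format = formats[i]),
  character(1)
)
mixed
class(mixed)
```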
If numeric_as_char = FALSE and prop_int_as_word > 0 then the integer columns are converted to character strings (either character numbers or words) but the other numeric columns are not coerced. Spelling mistakes are not introduced into integers converted to words when prop_spelling_mistakes > 0 and prop_int_as_word > 0.
Rows are duplicated after other messy modifications so the duplicated row contains identical messy elements.
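Because duplication happens last, a duplicated row is an exact copy of an already-messy row, so exact-match tools can find it. The sketch below uses a made-up data frame in base R; the column names, values and sampling step are illustrative and are not the function's actual implementation.

```r
set.seed(1)

# A hypothetical, already-messy data frame
messy <- data.frame(
  id = c("1", "two", "3", "4"),
  sex = c("M", "female", NA, "m"),
  stringsAsFactors = FALSE
)

# Pick one row at random and append an exact copy of it
dup_idx <- sample(nrow(messy), size = 1)
with_dups <- rbind(messy, messy[dup_idx, ])

# duplicated() flags the appended copy because it matches an earlier row
duplicated(with_dups)

# unique() drops the copy again
deduped <- unique(with_dups)
nrow(deduped)
```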
A messy line list <data.frame>.
linelist <- sim_linelist()
messy_linelist <- messy_linelist(linelist)
# increasing proportion of missingness to 30% with a missing value of -99
messy_linelist <- messy_linelist(
  linelist,
  prop_missing = 0.3,
  missing_value = -99
)
# increasing proportion of spelling mistakes to 50%
messy_linelist <- messy_linelist(linelist, prop_spelling_mistakes = 0.5)
# encode `$sex` as `numeric`
messy_linelist <- messy_linelist(
  linelist,
  sex_as_numeric = TRUE,
  inconsistent_sex = FALSE
)
# inconsistently formatted dates
messy_linelist <- messy_linelist(linelist, inconsistent_dates = TRUE)