View source: R/gen_gold_standard.R
add_variable
Description

add_variable adds a column containing a new variable to a dataset. The new variable is generated according to rules that mimic real-world data. The following types of variable are supported:
nhsid: each row is assigned a randomly generated 10-digit identification number (NHS number) whose check digit follows the Modulus 11 algorithm;
dob: if age_dependency is TRUE and the dataset contains a variable called 'age', the dob is generated from the value of age and end_date. If age_dependency is FALSE, the dob is randomly generated between start_date and end_date;
address: a random UK address sampled from a set of 30,000 UK addresses (see gen_address);
firstname: a first name randomly sampled from the database selected by country:
If country is 'uk' and gender_dependency and age_dependency are both TRUE, each first name is sampled according to the gender and age of the corresponding individual in the dataset. The UK first-name database was extracted from ONS data containing first names and their frequencies in England and Wales from 1996 to 2018.
If country is 'us' and gender_dependency and race_dependency are both TRUE, each first name is sampled according to the gender and ethnicity of the corresponding individual in the dataset. The US first-name database was extracted from randomNamesData.
Current ethnicity codes are: 1 American Indian or Native Alaskan, 2 Asian or Pacific Islander, 3 Black (not Hispanic), 4 Hispanic, 5 White (not Hispanic) and 6 Middle-Eastern, Arabic;
lastname: a last name randomly sampled from the database selected by country:
If country is 'uk', last names are sampled from a last-name database extracted from ONS data.
If country is 'us' and race_dependency is TRUE, each last name is sampled according to the individual's ethnicity. The US last-name database was extracted from randomNamesData.
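The nhsid rule can be illustrated with a short sketch. It assumes the standard NHS number scheme: the first nine digits are weighted 10 down to 2, the weighted sum is taken modulo 11, the check digit is 11 minus the remainder (with 11 mapped to 0), and a result of 10 means the digits cannot form a valid number and must be redrawn. The helpers nhs_check_digit(), gen_nhs_number_sketch() and is_valid_nhs_number() are hypothetical and not part of this package; add_variable may differ in details such as which leading digits it allows.

```r
# Hypothetical helpers (not part of this package) illustrating the
# Modulus 11 check digit used by 10-digit NHS numbers.

# Check digit for nine leading digits; returns 10 when the digits
# cannot form a valid number.
nhs_check_digit <- function(first9) {
  remainder <- sum(first9 * 10:2) %% 11   # weights 10, 9, ..., 2
  check <- 11 - remainder
  if (check == 11) 0 else check
}

# Draw random nine-digit prefixes until one yields a usable check digit.
gen_nhs_number_sketch <- function() {
  repeat {
    first9 <- sample(0:9, 9, replace = TRUE)
    check <- nhs_check_digit(first9)
    if (check != 10) {
      return(paste0(c(first9, check), collapse = ""))
    }
  }
}

# Validate a 10-character string against the same rule.
is_valid_nhs_number <- function(x) {
  digits <- suppressWarnings(as.integer(strsplit(x, "")[[1]]))
  if (length(digits) != 10 || anyNA(digits)) return(FALSE)
  check <- nhs_check_digit(digits[1:9])
  check != 10 && check == digits[10]
}

set.seed(2020)
gen_nhs_number_sketch()
```

For example, is_valid_nhs_number("9434765919") returns TRUE: the weighted sum of the first nine digits is 299, 299 mod 11 is 2, and 11 - 2 = 9 matches the last digit.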
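The frequency-weighted sampling described for firstname can be sketched as follows. Each individual's name is drawn from the stratum matching their gender and birth year, with probability proportional to the recorded count. The table name_freq holds made-up toy values, and its column names, the gender codes and the helper sample_firstnames_sketch() are hypothetical; the package's real lookup tables and the way it maps age to birth year are not documented here.

```r
# Toy frequency table (made-up values, hypothetical column names).
name_freq <- data.frame(
  name   = c("name_a", "name_b", "name_c", "name_d", "name_e", "name_f"),
  gender = c("F", "F", "M", "M", "F", "M"),
  year   = c(2000, 2000, 2000, 2000, 2010, 2010),
  count  = c(50, 10, 40, 20, 30, 30),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one name per individual, drawn from the rows that
# match that individual's gender and birth year, weighted by count.
sample_firstnames_sketch <- function(gender, birth_year, freq) {
  stopifnot(length(gender) == length(birth_year))
  vapply(seq_along(gender), function(i) {
    pool <- freq[freq$gender == gender[i] & freq$year == birth_year[i], ]
    if (nrow(pool) == 0) return(NA_character_)  # no matching stratum
    pool$name[sample.int(nrow(pool), 1, prob = pool$count)]
  }, character(1))
}

set.seed(1)
sample_firstnames_sketch(c("F", "M", "F"), c(2000, 2000, 2010), name_freq)
```

Weighting by count rather than sampling uniformly is what keeps common names common in the generated data.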
Usage

add_variable(
  dataset,
  type,
  country = "uk",
  start_date = "1900-01-01",
  end_date = "2020-01-01",
  age_dependency = FALSE,
  gender_dependency = FALSE,
  race_dependency = FALSE
)
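Arguments

dataset: A data frame to which the new variable is added.
type: A string giving the type of variable to add: 'nhsid', 'dob', 'address', 'firstname' or 'lastname'.
country: A string, either 'uk' or 'us', that selects the name databases used for 'firstname' and 'lastname'. Defaults to 'uk'.
start_date: A date in 'YYYY-MM-DD' form giving the earliest date for a randomly generated dob. Defaults to '1900-01-01'.
end_date: A date in 'YYYY-MM-DD' form giving the latest date for a randomly generated dob, and the reference date from which dob is derived when age_dependency is TRUE. Defaults to '2020-01-01'.
age_dependency: A logical. If TRUE, dob and UK first names depend on the age variable in the dataset. Defaults to FALSE.
gender_dependency: A logical. If TRUE, first names depend on each individual's gender. Defaults to FALSE.
race_dependency: A logical. If TRUE, US first and last names depend on each individual's ethnicity. Defaults to FALSE.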
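start_date, end_date and age_dependency control the dob rule described above. The sketch below shows one plausible reading of it: without ages, a date is drawn uniformly from every day between start_date and end_date; with ages, a date is drawn from the one-year window in which the person would have exactly that many completed years on end_date. The helpers shift_years() and sketch_dob() are hypothetical, and the package's exact method is not documented here.

```r
# Hypothetical helper: move each date back by a whole number of years.
# A 29 Feb that lands in a non-leap year rolls over to 1 Mar.
shift_years <- function(date, years) {
  lt <- as.POSIXlt(rep(date, length.out = length(years)))
  lt$year <- lt$year - as.integer(years)
  as.Date(lt)
}

# Hypothetical sketch of the two dob rules (start_date is unused when
# ages are supplied).
sketch_dob <- function(n = length(age), start_date = "1900-01-01",
                       end_date = "2020-01-01", age = NULL) {
  start <- as.Date(start_date)
  end <- as.Date(end_date)
  if (is.null(age)) {
    # age_dependency = FALSE: uniform over every day in [start, end]
    span <- as.integer(end - start)
    return(start + sample.int(span + 1L, n, replace = TRUE) - 1L)
  }
  # age_dependency = TRUE: any day such that the person has exactly
  # `age` completed years on end_date
  latest <- shift_years(end, age)
  earliest <- shift_years(end, age + 1) + 1
  width <- as.integer(latest - earliest) + 1L
  earliest + floor(runif(length(age)) * width)
}

set.seed(1)
sketch_dob(3, start_date = "1990-01-01", end_date = "2000-01-01")
sketch_dob(age = c(30, 45), end_date = "2015-03-02")
```

Drawing within the birthday window, rather than subtracting age * 365 days, keeps each generated dob consistent with the age column on end_date.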
Value

A data frame: the input dataset with the newly generated variable added as an extra column.
Examples

tmp1 <- add_variable(adult[1:100, ], "nhsid")
tmp2 <- add_variable(adult[1:100, ], "dob", end_date = "2015-03-02", age_dependency = TRUE)
tmp3 <- add_variable(adult[1:100, ], "address")
tmp4 <- add_variable(adult[1:100, ], "firstname", country = "uk", age_dependency = TRUE,
                     gender_dependency = TRUE)
tmp5 <- add_variable(adult[1:100, ], "lastname", country = "uk")
tmp6 <- add_variable(adult[1:100, ], "firstname", country = "us", gender_dependency = TRUE,
                     race_dependency = TRUE)
tmp7 <- add_variable(adult[1:100, ], "lastname", country = "us", race_dependency = TRUE)