View source: R/gen_gold_standard.R
add_variable adds a column containing a new variable to a dataset. The new
variable is generated according to realistic rules. Several types of
variable are supported:
nhsid: each row is assigned a 10-digit ID that is randomly generated following the Modulus 11 algorithm;
dob: if age_dependency is TRUE and the dataset contains a variable called 'age',
the dob is generated from the value of age and
end_date. If age_dependency is FALSE, the dob is randomly
generated between start_date and end_date;
address: a random UK address sampled from 30,000 UK addresses, see gen_address;
firstname: a first name randomly sampled from the selected database:
If country is 'uk' and gender_dependency and age_dependency
are both TRUE, each first name is sampled based
on the gender and age of the individual in the dataset. The uk
first name database was extracted from ONS data and contains first names and their frequencies
in England and Wales from 1996 to 2018.
If country is 'us' and gender_dependency and race_dependency
are both TRUE, each first name is sampled based
on the gender and ethnicity of the individual in the dataset. The us
first name database was extracted from randomNamesData.
Current ethnicity codes are: 1 American Indian or Native Alaskan, 2 Asian or Pacific Islander,
3 Black (not Hispanic), 4 Hispanic, 5 White (not Hispanic) and 6 Middle-Eastern, Arabic.
lastname: a last name randomly sampled from the selected database:
If country is 'uk', each last name is sampled
from a last name database extracted
from ONS data.
If country is 'us' and race_dependency is TRUE, each
last name is sampled based on the individual's ethnicity.
The us last name database was extracted from randomNamesData.
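The Modulus 11 rule mentioned for nhsid can be illustrated with a minimal sketch. This is not the package's own implementation. It assumes the standard NHS number check: the first nine digits are weighted 10 down to 2, and the tenth digit is 11 minus the weighted sum modulo 11, with 11 mapped to 0 and 10 treated as invalid. The helper names nhs_check_digit, random_nhs_id and is_valid_nhs_id are hypothetical.

```r
# Hypothetical helper: Modulus 11 check digit for nine leading digits.
# Returns NA when the check value is 10, which means the prefix cannot be used.
nhs_check_digit <- function(first_nine) {
  stopifnot(length(first_nine) == 9, all(first_nine %in% 0:9))
  weights <- 10:2
  remainder <- sum(first_nine * weights) %% 11
  check <- 11 - remainder
  if (check == 11) return(0L)
  if (check == 10) return(NA_integer_)
  as.integer(check)
}

# Hypothetical helper: draw random nine-digit prefixes until one has a
# usable check digit, then return the full 10-digit ID as a string.
random_nhs_id <- function() {
  repeat {
    prefix <- sample(0:9, 9, replace = TRUE)
    check <- nhs_check_digit(prefix)
    if (!is.na(check)) {
      return(paste0(c(prefix, check), collapse = ""))
    }
  }
}

# Hypothetical helper: verify that a 10-character digit string passes the check.
is_valid_nhs_id <- function(id) {
  digits <- suppressWarnings(as.integer(strsplit(id, "")[[1]]))
  if (length(digits) != 10 || anyNA(digits)) return(FALSE)
  check <- nhs_check_digit(digits[1:9])
  !is.na(check) && check == digits[10]
}
```

Note that this sketch does not guarantee that IDs are distinct across rows; a caller wanting distinct IDs would need to redraw duplicates.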
add_variable(
  dataset,
  type,
  country = "uk",
  start_date = "1900-01-01",
  end_date = "2020-01-01",
  age_dependency = FALSE,
  gender_dependency = FALSE,
  race_dependency = FALSE
)
dataset: A data frame of the dataset.
type: A string giving the type of variable to add: 'nhsid', 'dob', 'address', 'firstname' or 'lastname'.
country: A string variable with a default of 'uk'. It can be either 'uk' or 'us'.
start_date: A Date variable with a default of '1900-01-01'.
end_date: A Date variable with a default of '2020-01-01'.
age_dependency: A logical variable with a default of FALSE.
gender_dependency: A logical variable with a default of FALSE.
race_dependency: A logical variable with a default of FALSE.
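How age_dependency and end_date might interact for dob can be sketched as follows. The source does not state the exact rule, so this is an assumption: a person aged `age` on end_date is taken to be born in the one-year window ending `age` years before end_date, and a date is drawn uniformly from that window. The helpers years_before and dob_from_age are hypothetical and are not part of the package.

```r
# Hypothetical helper: the date n whole years before `date`.
years_before <- function(date, n) {
  seq(date, by = paste0("-", n, " years"), length.out = 2)[2]
}

# Hypothetical helper: sample one date of birth per age so that each
# person has the given age (in completed years) on end_date.
dob_from_age <- function(age, end_date = "2020-01-01") {
  end_date <- as.Date(end_date)
  dobs <- lapply(age, function(a) {
    latest <- years_before(end_date, a)            # birthday falls on end_date
    earliest <- years_before(end_date, a + 1) + 1  # one day short of age a + 1
    n_days <- as.integer(latest - earliest) + 1L
    earliest + sample.int(n_days, 1) - 1L
  })
  do.call(c, dobs)
}

dob_from_age(c(18, 30, 65), end_date = "2015-03-02")
```

With age_dependency set to FALSE, the analogous draw would simply be uniform between start_date and end_date.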
A data frame of the dataset with a new generated variable.
tmp1 <- add_variable(adult[1:100,], "nhsid")
tmp2 <- add_variable(adult[1:100,], "dob", end_date = "2015-03-02", age_dependency = TRUE)
tmp3 <- add_variable(adult[1:100,], "address")
tmp4 <- add_variable(adult[1:100,], "firstname", country = "uk", age_dependency = TRUE,
                     gender_dependency = TRUE)
tmp5 <- add_variable(adult[1:100,], "lastname", country = "uk")
tmp6 <- add_variable(adult[1:100,], "firstname", country = "us", gender_dependency = TRUE,
                     race_dependency = TRUE)
tmp7 <- add_variable(adult[1:100,], "lastname", country = "us", race_dependency = TRUE)
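The frequency-based, gender-dependent sampling described for firstname can be pictured with a small sketch. It assumes a lookup table with one row per name, a gender column and a count column. The table below is a toy with made-up names and counts, not ONS or randomNamesData figures, and sample_firstname is a hypothetical helper rather than the package's code.

```r
# Toy frequency table; names and counts are illustrative only.
name_freq <- data.frame(
  firstname = c("Olivia", "Amelia", "Oliver", "Jack"),
  gender = c("female", "female", "male", "male"),
  count = c(50, 30, 60, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each gender value, draw one first name from the
# matching rows, with probability proportional to the name's count.
sample_firstname <- function(gender, freq_table) {
  vapply(gender, function(g) {
    pool <- freq_table[freq_table$gender == g, ]
    sample(pool$firstname, 1, prob = pool$count)
  }, character(1), USE.NAMES = FALSE)
}

sample_firstname(c("female", "male", "male"), name_freq)
```

Conditioning on age or ethnicity as well would amount to filtering the table on an additional column before sampling.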