generate_demo_data | R Documentation |
This function generates a demo dataset with a specified number of subjects, features,
and desired number of clusters, ensuring that the generated clusters are not too far apart
and have some degree of overlap to simulate real-world data.
The generated dataset includes demographic information (outcome, age, and gender),
as well as numeric features with a specified probability of missing values.
generate_demo_data(
  n_subjects = 1000,
  n_features = 200,
  missing_prob = 0.1,
  desired_number_clusters = 3,
  cluster_overlap_sd = 15
)
n_subjects | Integer. The number of subjects (rows) to generate. Defaults to 1000. |
n_features | Integer. The number of features (columns) to generate. Defaults to 200. |
missing_prob | Numeric. The probability of introducing missing values (NA) in the feature columns. Defaults to 0.1. |
desired_number_clusters | Integer. The approximate number of clusters to generate in the feature space. Defaults to 3. |
cluster_overlap_sd | Numeric. The standard deviation used to control cluster overlap; larger values give more overlap. Defaults to 15. |
The function generates n_features numeric columns based on Gaussian clusters
with some overlap between clusters to simulate more realistic data. Missing values are
introduced in each feature column based on missing_prob.
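The idea described above can be illustrated with a minimal sketch. This is not the package's implementation: the helper name sketch_cluster_features, the center_spacing argument, the evenly spaced cluster centers, the cell-wise independent missingness, and the feature_1, feature_2, ... column names are all assumptions made for illustration.

```r
# Hypothetical helper (not part of the package): draws Gaussian features
# around per-cluster centers, then blanks out cells at random.
sketch_cluster_features <- function(n_subjects = 100, n_features = 5,
                                    missing_prob = 0.1,
                                    n_clusters = 3,
                                    cluster_overlap_sd = 15,
                                    center_spacing = 20) {
  # Assign each subject to one cluster at random.
  cluster <- sample(seq_len(n_clusters), n_subjects, replace = TRUE)

  # Assumption: cluster centers are evenly spaced, center_spacing apart.
  centers <- (seq_len(n_clusters) - 1) * center_spacing

  # Draw values around each subject's cluster center. The mean vector is
  # recycled down the rows, so every value in row i uses the center of
  # cluster[i]. A larger cluster_overlap_sd makes the clusters overlap more.
  values <- matrix(
    rnorm(n_subjects * n_features,
          mean = centers[cluster],
          sd = cluster_overlap_sd),
    nrow = n_subjects, ncol = n_features
  )

  # Assumption: each cell is set to NA independently with probability
  # missing_prob.
  values[runif(length(values)) < missing_prob] <- NA

  features <- as.data.frame(values)
  names(features) <- paste0("feature_", seq_len(n_features))
  features
}

set.seed(42)
demo <- sketch_cluster_features(n_subjects = 200, n_features = 4)
dim(demo)                     # 200 rows, 4 columns
mean(is.na(as.matrix(demo)))  # should be close to missing_prob
```

With center_spacing = 20 and cluster_overlap_sd = 15, neighboring clusters sit only a little over one standard deviation apart, so they overlap substantially; lowering cluster_overlap_sd separates them more cleanly.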
A data frame containing the generated demo dataset, with columns:
outcome: A categorical variable with values "low" or "high".
age: A numeric variable representing the age of the subject (range 18-90).
gender: A categorical variable with values "male" or "female".
Feature X: Numeric feature columns with random values and some missing data.
# Generate a demo dataset with 1000 subjects, 200 features, and 3 clusters
demo_data <- generate_demo_data(n_subjects = 1000, n_features = 200,
                                desired_number_clusters = 3,
                                cluster_overlap_sd = 15, missing_prob = 0.1)
# View the first few rows of the dataset
head(demo_data)