YAML schema structure

knitr::opts_chunk$set(
  collapse = TRUE,     # Merge code and output into a single block
  comment = "#>",      # Prefix for printed output lines
  eval = TRUE,         # Evaluate code chunks
  echo = TRUE,         # Show code
  message = TRUE,      # Show messages
  warning = TRUE,      # Show warnings
  fig.width = 8,       # Default plot width (inches)
  fig.height = 6,      # Default plot height (inches)
  dpi = 200,           # Plot resolution
  fig.align = "center" # Figure alignment
)
library(DataFakeR)
set.seed(123)                # Make simulated data reproducible
options(tibble.width = Inf)  # Print all tibble columns

To generate fake data, you need to provide a schema description in YAML format. The structure of the configuration file reflects the structure of a relational database. This lets the package detect dependencies both within and between tables, so that the simulated data preserves the assumptions of the original database.

The schema can be sourced automatically from a database (see: Structure from DB), or you can write it yourself.

The configuration file should have the following structure:

public                                                       - schema name
└── tables:                                                  - tables list
    ├── table_a:                                             - name of the table
    │   ├── ...                                              - additional table-wise parameters
    │   ├── check_constraints:                               - list of table check constraints
    │   │   └── constraint_name:                             - unique name of the constraint
    │   │       ├── column: column_a2                        - column attached to the constraint (can be empty)
    │   │       └── expression: !expr column_a2 == column_a1 - R expression describing constraint
    │   ├── columns:                                         - list of table columns
    │   │   ├── column_a1:                                   - name of the column
    │   │   │   ├── type: char(8)                            - column type (required; a valid R class or SQL type)
    │   │   │   ├── not_null: true                           |
    │   │   │   ├── unique: true                             | standard column parameters (optional)
    │   │   │   └── ...                                      - extra column parameters
    │   │   └── column_a2:
    │   │       └── type: numeric(4, 2)
    │   └── primary_key:                                     - list of primary keys
    │       └── pk_name:                                     - primary key unique name
    │           └── columns:                                 - array of primary key columns
    │               └── - column_a1                          - column name treated as primary key
    └── table_b:
        ├── columns:
        │   ├── column_b1:
        │   │   └── type: char(8)
        │   └── column_b2:
        │       └── type: boolean
        └── foreign_keys:                                    - list of foreign keys
            └── fk_name:                                     - unique foreign key name
                ├── columns:                                 - array of foreign key columns
                │   └── - column_b1                          - name of the table column that is a foreign key
                └── references:                              - definition of the foreign key reference
                    ├── columns:                             - array of referenced columns
                    │   └── - column_a1                      - name of the referenced column
                    └── table: table_a                       - name of the referenced table
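Written out as an actual YAML file, the tree above corresponds roughly to the sketch below. It mirrors only the keys shown in the tree: the table-wise and extra column parameters elided as `...` are left out rather than guessed. The `!expr` tag marks the constraint expression as R code rather than a plain string.

```yaml
public:                          # schema name
  tables:
    table_a:
      check_constraints:
        constraint_name:
          column: column_a2      # column attached to the constraint (can be empty)
          expression: !expr column_a2 == column_a1   # R expression
      columns:
        column_a1:
          type: char(8)          # required
          not_null: true         # optional standard parameter
          unique: true           # optional standard parameter
        column_a2:
          type: numeric(4, 2)
      primary_key:
        pk_name:
          columns:
            - column_a1
    table_b:
      columns:
        column_b1:
          type: char(8)
        column_b2:
          type: boolean
      foreign_keys:
        fk_name:
          columns:
            - column_b1          # foreign key column in table_b
          references:
            columns:
              - column_a1        # referenced column in table_a
            table: table_a
```

Because `table_b.column_b1` references `table_a.column_a1`, the foreign key block is where the inter-table dependency mentioned earlier is declared.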

Schema assumptions and limitations (possibly relaxed in future releases):




DataFakeR documentation built on Feb. 16, 2023, 7:38 p.m.