This document describes the structural contract of the tidyped class in
visPedigree 1.8.0. It is intended for maintenance and extension work.
tidyped is an S3 class layered on top of data.table.
Expected class vector:
c("tidyped", "data.table", "data.frame")
The class is created through new_tidyped() (internal constructor) and checked
with is_tidyped().
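tidyped is designed to be:

- safe: the integer pedigree indices (IndNum, SireNum, DamNum) are always aligned with row order, so C++ routines can index directly without translation;
- fast: repeated tracing can reuse an already validated tidyped;
- compatible with data.table: in-place modification via := and set() preserves class and metadata without copying;
- explicit on failure: incomplete row subsets are downgraded to a plain data.table with a warning.

The single most important structural rule in visPedigree: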
IndNum[i] must equal i for every row.
This means SireNum and DamNum are direct row pointers: the sire of
individual i lives at row SireNum[i], and 0L encodes a missing parent.
Every C++ function in visPedigree — inbreeding coefficients, relationship matrices, BFS tracing, topological sorting — relies on this invariant. If it breaks, C++ will read wrong parents.
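As a minimal base-R sketch of this rule (the toy pedigree and variable names are illustrative assumptions, not package code), the integer columns can be derived with match() so that a parent lookup becomes plain row indexing:

```r
# Toy pedigree (assumed data), ordered so parents come before offspring.
ped <- data.frame(
  Ind  = c("A", "B", "C", "D"),
  Sire = c(NA,  NA,  "A", "A"),
  Dam  = c(NA,  NA,  "B", "C"),
  stringsAsFactors = FALSE
)

# IndNum is the row number; parents are located by their position in Ind.
# nomatch = 0L turns a missing (NA) parent into the 0L sentinel.
ped$IndNum  <- seq_len(nrow(ped))
ped$SireNum <- match(ped$Sire, ped$Ind, nomatch = 0L)
ped$DamNum  <- match(ped$Dam,  ped$Ind, nomatch = 0L)

# Row-pointer lookup: the dam of individual 4 ("D") lives at row DamNum[4].
dam_of_d <- ped$Ind[ped$DamNum[4]]
dam_of_d  # "C"
```

If the rows were reordered without recomputing these columns, the same lookup would silently return a different individual, which is why the rule matters.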
This invariant is enforced at three levels:
- tidyped(): builds indices from scratch during construction.
- [.tidyped: rebuilds indices in-place after valid row subsets.
- ensure_tidyped() / ensure_complete_tidyped(): detect and repair stale indices when class was accidentally dropped.

These four columns define a valid tidyped:
| Column | Type | Description |
|--------|-----------|--------------------------------------|
| Ind | character | Unique individual ID |
| Sire | character | Sire ID, NA for unknown |
| Dam | character | Dam ID, NA for unknown |
| Sex | character | "male", "female", or "unknown" |
Checked by validate_tidyped().
| Column | Type | Description |
|-----------|---------|-------------------------------------|
| IndNum | integer | Row index (== row number, see §3) |
| SireNum | integer | Row index of sire, 0L for missing |
| DamNum | integer | Row index of dam, 0L for missing |
These exist whenever tidyped() is called with addnum = TRUE (default).
They are the interface between R and C++.
| Column | Description |
|--------------|----------------------------------------------|
| Gen | Generation number |
| Family | Family group code |
| FamilySize | Number of offspring in the family |
| Cand | TRUE for candidate individuals |
| f | Inbreeding coefficient (added by inbreed()) |
All data columns use PascalCase (Ind, SireNum, MeanF, ECG),
matching the core column style.
Pedigree-level metadata is stored in a single attribute:
attr(x, "ped_meta")
Built by build_ped_meta(), accessed by pedmeta().
| Field | Type | Description |
|--------------------|-----------|-----------------------------------------|
| selfing | logical | Whether self-fertilization mode was used |
| bisexual_parents | character | IDs appearing as both sire and dam |
| genmethod | character | "top" or "bottom" generation numbering |
No other pedigree-level attributes should be added outside ped_meta.
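The single-attribute layout can be illustrated in base R. This is only a sketch: a plain data.frame stands in for a tidyped, and the list is attached by hand rather than through build_ped_meta(), whose signature is not shown in this document.

```r
# A plain data.frame stands in for a tidyped (assumption for illustration).
x <- data.frame(Ind = c("A", "B"), stringsAsFactors = FALSE)

# All pedigree-level metadata lives in one named list under "ped_meta".
attr(x, "ped_meta") <- list(
  selfing          = FALSE,
  bisexual_parents = character(0),
  genmethod        = "top"
)

# Read a field back through the single attribute.
meta <- attr(x, "ped_meta")
meta$genmethod  # "top"
```

Keeping everything in one list means a method only has to carry a single attribute across to preserve the metadata.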
The following invariants must hold for a valid tidyped:
- Every non-NA Sire and Dam appears in Ind.
- SireNum and DamNum use 0L for missing parents, valid row indices otherwise.

Invariants 1–5 are established by tidyped() and guarded by [.tidyped.
Invariant 6 is a development convention.
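The index-related invariants are simple to test mechanically. The helper below is hypothetical (check_index_invariants() is not a visPedigree function) and only sketches, on a plain data.frame, what such a check might look like:

```r
# Hypothetical helper: TRUE when the integer index columns are consistent.
check_index_invariants <- function(ped) {
  n <- nrow(ped)
  ind_ok  <- identical(as.integer(ped$IndNum), seq_len(n))
  sire_ok <- all(ped$SireNum >= 0L & ped$SireNum <= n)
  dam_ok  <- all(ped$DamNum  >= 0L & ped$DamNum  <= n)
  # 0L must coincide exactly with a missing (NA) parent ID.
  na_ok <- all((ped$SireNum == 0L) == is.na(ped$Sire)) &&
           all((ped$DamNum  == 0L) == is.na(ped$Dam))
  ind_ok && sire_ok && dam_ok && na_ok
}

good <- data.frame(
  Ind = c("A", "B", "C"), Sire = c(NA, NA, "A"), Dam = c(NA, NA, "B"),
  IndNum = 1:3, SireNum = c(0L, 0L, 1L), DamNum = c(0L, 0L, 2L),
  stringsAsFactors = FALSE
)
stale <- good[c(2, 3), ]  # plain row subset: IndNum is now 2, 3

check_index_invariants(good)   # TRUE
check_index_invariants(stale)  # FALSE
```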
tidyped() currently has two distinct tracing paths:
- Raw-input path (data.frame / data.table): uses igraph for loop detection, candidate tracing, and topological sorting before integer indices are finalized.
- Fast path (tidyped + cand): skips graph rebuilding and uses C++ for candidate tracing and topological sorting on existing integer pedigree indices.

tidyped(raw_input)

When the input is a raw data.frame or data.table:
1. validate_and_prepare_ped(): normalize IDs, detect duplicates and bisexual parents, inject missing founders.
2. Loop detection: is_dag(); which_loop() and shortest_paths() are used only on the error path to report informative loop diagnostics.
3. Candidate tracing: if cand is supplied, igraph neighborhood search is used on the raw-input path.
4. Topological sorting: topo_sort() on the raw-input path.
5. Generation assignment: C++ (cpp_assign_generations_top / cpp_assign_generations_bottom) using the pedigree implied by the sorted rows.
6. Build the integer indices: IndNum, SireNum, DamNum.
7. new_tidyped() + attach ped_meta.

tidyped(tp, cand = ids)

When the input is already a tidyped and cand is supplied:
- new_tidyped() + ped_meta.

The fast path is the preferred workflow for repeated local tracing from a previously validated master pedigree:
tp_master <- tidyped(raw_ped)
tp_local <- tidyped(tp_master, cand = ids, trace = "up", tracegen = 3)
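To illustrate why existing integer indices make upward tracing cheap, here is a base-R sketch of a breadth-first ancestor search limited to a number of generations. trace_up() and the toy index vectors are hypothetical. The package performs this step in C++, and this sketch does not claim to reproduce its exact behaviour.

```r
# Hypothetical helper: rows reachable from `cand` within `tracegen` generations up.
trace_up <- function(sire_num, dam_num, cand, tracegen) {
  keep <- cand
  frontier <- cand
  for (g in seq_len(tracegen)) {
    parents <- c(sire_num[frontier], dam_num[frontier])
    # Drop missing parents (0L) and rows that were already visited.
    parents <- setdiff(parents[parents > 0L], keep)
    if (length(parents) == 0L) break
    keep <- c(keep, parents)
    frontier <- parents
  }
  sort(keep)
}

# Toy indices (assumed): rows 1 and 2 are founders, 3 = 1 x 2, 4 = 1 x 3, 5 = 4 x 3.
sire_num <- c(0L, 0L, 1L, 1L, 4L)
dam_num  <- c(0L, 0L, 2L, 3L, 3L)

trace_up(sire_num, dam_num, cand = 5L, tracegen = 1L)  # 3 4 5
trace_up(sire_num, dam_num, cand = 5L, tracegen = 3L)  # 1 2 3 4 5
```

No ID matching or graph construction is needed: each step is a vector lookup on row numbers.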
new_tidyped(): internal constructor

new_tidyped() attaches the "tidyped" class via setattr() (no copy) and
clears data.table's invisible flag via x[]. It does not attach
ped_meta — that is the caller's responsibility. It should only be called when
the caller has already ensured structural validity.
Analysis functions must guard their inputs. visPedigree provides three guard levels, chosen based on what each function needs.
validate_tidyped(): visualization guard

- Recovers a dropped class in the same way as ensure_tidyped().
- Checks that Ind, Sire, Dam, Sex exist.
- Used by: visped(), plot.tidyped(), summary.tidyped().

ensure_tidyped(): structure-light guard

- Already a tidyped: returns as-is.
- Class dropped but the core columns (Ind, Sire, Dam, Sex, Gen, IndNum, SireNum, DamNum) are present: rebuilds IndNum if stale, restores class, emits a message.
- Used by: pedsubpop(), splitped(), pedne(method = "demographic"), pedstats(ecg = FALSE, genint = FALSE), pedfclass() (when f column already exists).

ensure_complete_tidyped(): complete-pedigree guard

- Does everything ensure_tidyped() does, plus:
- require_complete_pedigree(): verifies that every non-NA Sire/Dam is present in Ind. Stops with an error if not.
- Used by: inbreed(), pedecg(), pedgenint(), pedrel(), pedne(method = "inbreeding" | "coancestry"), pedcontrib(), pedancestry(), pedfclass() (when f must be computed), pedpartial(), pediv(), pedmat(), pedhalflife().

| Guard | Recovers class? | Requires completeness? | When to use |
|-----------------------------|:---------------:|:----------------------:|-------------------------------|
| validate_tidyped() | yes | no | Visualization |
| ensure_tidyped() | yes | no | Summaries on existing columns |
| ensure_complete_tidyped() | yes | yes | Pedigree recursion in C++ |
Some functions are conditionally guarded: they use ensure_tidyped() by
default but escalate to ensure_complete_tidyped() when a parameter triggers
pedigree recursion (for example pedstats(ecg = TRUE),
pedne(method = "coancestry")).
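The conditional escalation can be sketched in base R. Both functions below are hypothetical stand-ins (is_complete() for the completeness test, toy_stats() for an analysis function with an ecg switch). They show the pattern only, not the package's implementation.

```r
# Hypothetical completeness check: every non-NA Sire/Dam must appear in Ind.
is_complete <- function(ped) {
  parents <- c(ped$Sire, ped$Dam)
  all(parents[!is.na(parents)] %in% ped$Ind)
}

# Hypothetical analysis function: the strict guard applies only when
# the requested option needs pedigree recursion.
toy_stats <- function(ped, ecg = FALSE) {
  if (ecg && !is_complete(ped)) {
    stop("pedigree is incomplete: some parents are missing from Ind")
  }
  list(n = nrow(ped), recursion = ecg)
}

# "B" is referenced as a dam but has no row of its own.
ped <- data.frame(
  Ind = c("A", "C"), Sire = c(NA, "A"), Dam = c(NA, "B"),
  stringsAsFactors = FALSE
)

light <- toy_stats(ped)                          # light guard: passes
res <- try(toy_stats(ped, ecg = TRUE), silent = TRUE)
inherits(res, "try-error")                       # TRUE: strict guard stops
```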
[.tidyped is the key protection layer.
:= operations

Modify-by-reference is passed through safely. Class and metadata are preserved
via setattr(). No copy occurs.
If the selection removes core pedigree columns, the result is returned as a
plain data.table without warning.
After row subsetting, [.tidyped checks pedigree completeness:
- Complete subset (every referenced parent is still present): IndNum, SireNum, DamNum are rebuilt in-place, class and ped_meta are preserved.
- Incomplete subset: the result is downgraded to a plain data.table with a warning guiding the user to tidyped(tp, cand = ids, trace = "up").

This downgrade is deliberate. It prevents stale integer indices from reaching C++ routines.
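The row-subsetting rule can be mimicked on a plain data.frame. In this sketch the class name "toyped" and the function subset_ped() are hypothetical stand-ins for tidyped and [.tidyped. The real method works on data.table and also carries ped_meta, which is omitted here.

```r
# Hypothetical stand-in for the row-subsetting rule.
subset_ped <- function(ped, rows) {
  out <- ped[rows, , drop = FALSE]
  rownames(out) <- NULL
  parents <- c(out$Sire, out$Dam)
  complete <- all(parents[!is.na(parents)] %in% out$Ind)
  if (complete) {
    # Complete subset: rebuild the indices against the new row order, keep class.
    out$IndNum  <- seq_len(nrow(out))
    out$SireNum <- match(out$Sire, out$Ind, nomatch = 0L)
    out$DamNum  <- match(out$Dam,  out$Ind, nomatch = 0L)
    class(out) <- c("toyped", "data.frame")
  } else {
    # Incomplete subset: drop the pedigree class so stale indices are not trusted.
    class(out) <- "data.frame"
    warning("incomplete subset: returning a plain table")
  }
  out
}

ped <- data.frame(
  Ind = c("X", "A", "B", "C"), Sire = c(NA, NA, NA, "A"), Dam = c(NA, NA, NA, "B"),
  IndNum = 1:4, SireNum = c(0L, 0L, 0L, 2L), DamNum = c(0L, 0L, 0L, 3L),
  stringsAsFactors = FALSE
)
class(ped) <- c("toyped", "data.frame")

ok  <- subset_ped(ped, 2:4)                         # parents of "C" are kept
bad <- suppressWarnings(subset_ped(ped, c(1, 4)))   # parents of "C" are dropped

ok$SireNum               # 0 0 1
inherits(bad, "toyped")  # FALSE
```

In the complete case the sire of "C" moves from row 2 to row 1, and the rebuilt SireNum follows it.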
visPedigree delegates heavy pedigree recursion to C++ and uses igraph where a graph object is still the simplest representation.
| Task | C++ function |
|-------------------------------|------------------------------------------------------|
| Ancestry / descendant tracing | cpp_trace_ancestors, cpp_trace_descendants |
| Topological sorting | cpp_topo_order |
| Generation assignment | cpp_assign_generations_top, cpp_assign_generations_bottom |
| Inbreeding coefficients | cpp_calculate_inbreeding (Meuwissen & Luo) |
| Relationship matrices | cpp_addmat, cpp_dommat, cpp_aamat, cpp_ainv |
All C++ functions consume SireNum / DamNum integer vectors and assume the
head invariant (§3).
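To make the dependence on integer indices concrete, the following base-R sketch performs a single forward pass over SireNum / DamNum to assign generation numbers. It is an illustration under stated assumptions: assign_gen_top() is hypothetical, rows are assumed to be sorted so that parents precede offspring, and the rule used (one more than the later-generation parent, founders at 1) is one common definition that may differ from what cpp_assign_generations_top actually computes.

```r
# Hypothetical R analogue of a top-down generation pass (not the package's C++ code).
# Assumes every parent row precedes its offspring rows.
assign_gen_top <- function(sire_num, dam_num) {
  n <- length(sire_num)
  gen <- integer(n)
  for (i in seq_len(n)) {
    s <- sire_num[i]
    d <- dam_num[i]
    # 0L marks a missing parent, which contributes generation 0.
    gs <- if (s > 0L) gen[s] else 0L
    gd <- if (d > 0L) gen[d] else 0L
    gen[i] <- max(gs, gd) + 1L
  }
  gen
}

# Toy indices (assumed): rows 1 and 2 are founders, 3 = 1 x 2, 4 = 1 x 3, 5 = 4 x 3.
sire_num <- c(0L, 0L, 1L, 1L, 4L)
dam_num  <- c(0L, 0L, 2L, 3L, 3L)

assign_gen_top(sire_num, dam_num)  # 1 1 2 3 4
```

Because gen[s] and gen[d] are read by position, the result is only meaningful when SireNum and DamNum really are row indices.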
| Task | Where | igraph functions |
|------------------------|-------------------------------|------------------------------------------------------|
| Pedigree visualization | visped() pipeline | graph_from_data_frame, layout_with_sugiyama, plot.igraph |
| Connected components | splitped() | graph_from_edgelist, components |
| Loop detection | tidyped() raw-input path | graph_from_edgelist, is_dag |
| Loop diagnosis | tidyped() error path | which_loop, shortest_paths, neighbors, components |
| Candidate tracing | tidyped() raw-input path | neighborhood |
| Topological sorting | tidyped() raw-input path | topo_sort |
igraph is not used in the core numerical pedigree analysis routines such as
inbreed(), pedmat(), pedecg(), or pedrel(), but it is still part of
the preprocessing and visualization stack.
When extending the class, follow these rules.
Prefer adding fields to ped_meta instead of scattering new standalone
attributes.
If a column can be rebuilt from pedigree structure, prefer derivation over storing opaque cached state.
data.table semantics

Use :=, set(), and setattr() carefully. Avoid patterns that trigger full
copies unless unavoidable.
Any future method that subsets rows must preserve the current rule:
valid complete subset -> tidyped; incomplete subset -> plain data.table.
Any feature using IndNum, SireNum, or DamNum should document whether it
requires:
- 0L encoding for missing parents.

| Function | Returns |
|---------------------------|-----------------------------------|
| is_tidyped(x) | TRUE if class is present |
| is_complete_pedigree(x) | TRUE if all Sire/Dam are in Ind |
| pedmeta(x) | The ped_meta named list |
| has_inbreeding(x) | TRUE if f column exists |
| has_candidates(x) | TRUE if Cand column exists |
Future extensions should prefer helper functions over direct attribute access.
Before merging a structural change to tidyped, check:
- Is the class vector still c("tidyped", "data.table", "data.frame")?
- Is IndNum == row index preserved after every code path?
- Are ped_meta fields preserved correctly?
- Does [.tidyped still handle := without copy issues?
- Does tidyped(tp_master, cand = ...) match the full path result?
- After setorder() or merge(), are indices rebuilt before reaching C++?

For large pedigrees, the intended usage pattern is:
# build one validated master pedigree
tp_master <- tidyped(raw_ped)

# reuse it for repeated local tracing (fast path)
tp_local <- tidyped(tp_master, cand = ids, trace = "up", tracegen = 3)

# modify analysis columns in place
tp_master[, phenotype := pheno]

# split only when disconnected components matter
parts <- splitped(tp_master)
This keeps workflows explicit, fast, and safe.