6. tidyped Class Structure and Extension Notes

This document describes the structural contract of the tidyped class in visPedigree 1.8.0. It is intended for maintenance and extension work.

1. Class identity

tidyped is an S3 class layered on top of data.table.

Expected class vector:

c("tidyped", "data.table", "data.frame")

The class is created through new_tidyped() (internal constructor) and checked with is_tidyped().

2. Core design goals

tidyped is designed to be:

  1. safe for C++: integer pedigree indices (IndNum, SireNum, DamNum) are always aligned with row order, so C++ routines can index directly without translation;
  2. fast for large pedigrees: the fast path skips redundant validation when the input is already a tidyped;
  3. compatible with data.table: in-place modification via := and set() preserves class and metadata without copying;
  4. explicit about structural degradation: row subsets that break pedigree completeness are downgraded to plain data.table with a warning.

3. The head invariant: IndNum == row index

The single most important structural rule in visPedigree:

IndNum[i] must equal i for every row.

This means SireNum and DamNum are direct row pointers: the sire of individual i lives at row SireNum[i], and 0L encodes a missing parent.

Every C++ function in visPedigree — inbreeding coefficients, relationship matrices, BFS tracing, topological sorting — relies on this invariant. If it breaks, C++ will read wrong parents.
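
To make the row-pointer idea concrete, here is a minimal base-R sketch on a toy pedigree. The data and the helper name parent_ids() are illustrative assumptions and are not part of visPedigree; the sketch only shows why direct indexing works when IndNum equals the row number.

```r
# Toy pedigree that satisfies the invariant: IndNum[i] == i.
ped <- data.frame(
  Ind     = c("A", "B", "C", "D"),
  IndNum  = 1:4,
  SireNum = c(0L, 0L, 1L, 3L),  # 0L encodes a missing sire
  DamNum  = c(0L, 0L, 2L, 2L),  # 0L encodes a missing dam
  stringsAsFactors = FALSE
)

# Check the invariant before treating SireNum / DamNum as row pointers.
stopifnot(identical(ped$IndNum, seq_len(nrow(ped))))

# Hypothetical helper: parent IDs of row i by direct row indexing.
parent_ids <- function(ped, i) {
  s <- ped$SireNum[i]
  d <- ped$DamNum[i]
  c(Sire = if (s > 0L) ped$Ind[s] else NA_character_,
    Dam  = if (d > 0L) ped$Ind[d] else NA_character_)
}

parent_ids(ped, 4L)  # sire "C", dam "B"
parent_ids(ped, 1L)  # both NA: row 1 is a founder
```

No ID lookup is needed at any point: the integer columns alone identify the parent rows, which is the property the C++ routines depend on.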

This invariant is enforced at three levels:

4. Column contract

4.1 Minimal structural columns

These four columns define a valid tidyped:

| Column | Type      | Description                    |
|--------|-----------|--------------------------------|
| Ind    | character | Unique individual ID           |
| Sire   | character | Sire ID, NA for unknown        |
| Dam    | character | Dam ID, NA for unknown         |
| Sex    | character | "male", "female", or "unknown" |

Checked by validate_tidyped().

4.2 Integer pedigree columns

| Column  | Type    | Description                       |
|---------|---------|-----------------------------------|
| IndNum  | integer | Row index (== row number, see §3) |
| SireNum | integer | Row index of sire, 0L for missing |
| DamNum  | integer | Row index of dam, 0L for missing  |

These exist whenever tidyped() is called with addnum = TRUE (default). They are the interface between R and C++.
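
As a rough illustration of how such columns can be derived, the sketch below builds them from the character IDs with base R's match(). This is an assumption about one reasonable approach, not a description of how tidyped() does it internally.

```r
ped <- data.frame(
  Ind  = c("A", "B", "C", "D"),
  Sire = c(NA, NA, "A", "C"),
  Dam  = c(NA, NA, "B", "B"),
  stringsAsFactors = FALSE
)

# IndNum is simply the row number.
ped$IndNum <- seq_len(nrow(ped))

# match() returns the row position of each parent ID in Ind.
# nomatch = 0L maps NA (unknown parent) to 0L.
ped$SireNum <- match(ped$Sire, ped$Ind, nomatch = 0L)
ped$DamNum  <- match(ped$Dam,  ped$Ind, nomatch = 0L)

ped[, c("Ind", "IndNum", "SireNum", "DamNum")]
```

Note that a parent ID that is absent from Ind would also map to 0L here and silently look like an unknown parent, which is one reason pedigree completeness has to be established before the integer columns are trusted.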

4.3 Other common columns

| Column     | Description                                 |
|------------|---------------------------------------------|
| Gen        | Generation number                           |
| Family     | Family group code                           |
| FamilySize | Number of offspring in the family           |
| Cand       | TRUE for candidate individuals              |
| f          | Inbreeding coefficient (added by inbreed()) |

4.4 Column naming convention

All data columns use PascalCase (Ind, SireNum, MeanF, ECG), matching the core column style. Note that the inbreeding column f listed in §4.3 is lowercase.

5. Metadata layer

Pedigree-level metadata is stored in a single attribute:

attr(x, "ped_meta")

Built by build_ped_meta(), accessed by pedmeta().

| Field            | Type      | Description                              |
|------------------|-----------|------------------------------------------|
| selfing          | logical   | Whether self-fertilization mode was used |
| bisexual_parents | character | IDs appearing as both sire and dam       |
| genmethod        | character | "top" or "bottom" generation numbering   |

No other pedigree-level attributes should be added outside ped_meta.
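
The single-container convention can be pictured with plain base R. In the sketch below the field names follow the table above, the values are made up, and get_meta() is a hypothetical stand-in for pedmeta(), not its actual implementation.

```r
ped <- data.frame(
  Ind  = c("A", "B", "C"),
  Sire = c(NA, NA, "A"),
  Dam  = c(NA, NA, "B"),
  stringsAsFactors = FALSE
)

# All pedigree-level metadata lives in one named list under one attribute.
attr(ped, "ped_meta") <- list(
  selfing          = FALSE,
  bisexual_parents = character(0),
  genmethod        = "top"
)

# Hypothetical accessor: callers read fields through a helper,
# not by reaching for scattered attributes.
get_meta <- function(x) attr(x, "ped_meta")

get_meta(ped)$genmethod
names(get_meta(ped))
```

Keeping everything in one list means that code which copies or restores metadata only has to handle a single attribute.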

6. Structural invariants

The following invariants must hold for a valid tidyped:

  1. IndNum == row index (see §3).
  2. Ind is unique — no duplicate individual IDs.
  3. Completeness — every non-NA Sire and Dam appears in Ind.
  4. Acyclicity — no individual is its own ancestor.
  5. SireNum / DamNum consistency: 0L for missing parents, valid row indices otherwise.
  6. ped_meta is the sole metadata container — no scattered attributes.

Invariants 1–5 are established by tidyped() and guarded by [.tidyped. Invariant 6 is a development convention.
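
For maintenance work it can help to see invariants 1, 2, 3 and 5 written as executable checks. The function below is a base-R sketch under the column contract of §4; check_invariants() is a hypothetical name, and acyclicity (invariant 4) is left out because it needs a graph traversal.

```r
check_invariants <- function(ped) {
  n <- nrow(ped)
  parents <- c(ped$Sire, ped$Dam)
  parents <- parents[!is.na(parents)]

  # Pointers must be integers in 0..n.
  in_range <- function(v) is.integer(v) && all(v >= 0L & v <= n)

  # 0L must pair with an NA parent ID; any other pointer must land
  # on the row whose Ind equals the parent ID.
  ptr_ok <- function(num, id) {
    has <- num > 0L
    isTRUE(all(is.na(id[!has])) && all(ped$Ind[num[has]] == id[has]))
  }

  c(
    indnum_is_row   = identical(ped$IndNum, seq_len(n)),
    ind_unique      = anyDuplicated(ped$Ind) == 0L,
    complete        = all(parents %in% ped$Ind),
    nums_consistent = in_range(ped$SireNum) && in_range(ped$DamNum) &&
      ptr_ok(ped$SireNum, ped$Sire) && ptr_ok(ped$DamNum, ped$Dam)
  )
}

ped <- data.frame(
  Ind     = c("A", "B", "C", "D"),
  Sire    = c(NA, NA, "A", "C"),
  Dam     = c(NA, NA, "B", "B"),
  IndNum  = 1:4,
  SireNum = c(0L, 0L, 1L, 3L),
  DamNum  = c(0L, 0L, 2L, 2L),
  stringsAsFactors = FALSE
)

check_invariants(ped)  # all four checks pass

# Reordering rows without rebuilding the integer columns leaves stale pointers:
# indnum_is_row and nums_consistent both fail.
shuffled <- ped[c(2, 1, 3, 4), ]
check_invariants(shuffled)
```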

7. Constructor pipeline

tidyped() currently has two distinct tracing paths:

7.1 Full path: tidyped(raw_input)

When the input is a raw data.frame or data.table:

  1. validate_and_prepare_ped() — normalize IDs, detect duplicates and bisexual parents, inject missing founders.
  2. Loop detection — igraph builds a directed graph and checks is_dag(); which_loop() and shortest_paths() are used only on the error path to report informative loop diagnostics.
  3. Candidate tracing — if cand is supplied, igraph neighborhood search is used on the raw-input path.
  4. Topological sort — igraph topo_sort() on the raw-input path.
  5. Generation assignment — C++ (cpp_assign_generations_top / cpp_assign_generations_bottom) using the pedigree implied by the sorted rows.
  6. Sex inference — resolve unknowns from parental roles.
  7. Build integer indices — IndNum, SireNum, DamNum.
  8. new_tidyped() + attach ped_meta.
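
Steps 2 and 4 can be pictured with a small base-R stand-in. The sketch below uses Kahn's algorithm, which is a swapped-in technique chosen for illustration and not the igraph-based code the package uses; topo_order_sketch() is a hypothetical name. It returns a parents-before-offspring row order and stops with an error when a loop makes that impossible.

```r
topo_order_sketch <- function(ind, sire, dam) {
  n <- length(ind)
  s <- match(sire, ind, nomatch = 0L)
  d <- match(dam,  ind, nomatch = 0L)

  # Number of known parents of each individual that are not yet placed.
  pending <- (s > 0L) + (d > 0L)
  placed  <- logical(n)
  ord     <- integer(0)

  repeat {
    ready <- which(!placed & pending == 0L)
    if (length(ready) == 0L) break
    ord <- c(ord, ready)
    placed[ready] <- TRUE
    # Each newly placed parent releases one pending slot per offspring link.
    for (p in ready) {
      pending <- pending - (s == p) - (d == p)
    }
  }

  if (length(ord) < n) {
    # Unresolved rows are loop members and any of their descendants.
    stop("pedigree loop detected; unresolved individuals: ",
         paste(ind[!placed], collapse = ", "))
  }
  ord
}

ind  <- c("D", "C", "A", "B")
sire <- c("C", "A", NA, NA)
dam  <- c("B", "B", NA, NA)

ind[topo_order_sketch(ind, sire, dam)]  # founders first, then C, then D
```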

7.2 Fast path: tidyped(tp, cand = ids)

When the input is already a tidyped and cand is supplied:

The fast path is the preferred workflow for repeated local tracing from a previously validated master pedigree:

tp_master <- tidyped(raw_ped)
tp_local  <- tidyped(tp_master, cand = ids, trace = "up", tracegen = 3)

7.3 new_tidyped() — internal constructor

new_tidyped() attaches the "tidyped" class via setattr() (no copy) and clears data.table's invisible flag via x[]. It does not attach ped_meta — that is the caller's responsibility. It should only be called when the caller has already ensured structural validity.
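
A constructor with this behavior could look roughly like the sketch below. It assumes the data.table package is installed; new_tidyped_sketch() is a hypothetical name and the body is a guess at the general shape, not the package's actual source.

```r
library(data.table)

new_tidyped_sketch <- function(x) {
  stopifnot(is.data.table(x))
  # setattr() changes the class attribute by reference: columns are not copied.
  setattr(x, "class", c("tidyped", setdiff(class(x), "tidyped")))
  # x[] resets data.table's internal "do not print" flag.
  x[]
}

ped <- data.table(
  Ind  = c("A", "B", "C"),
  Sire = c(NA, NA, "A"),
  Dam  = c(NA, NA, "B"),
  Sex  = c("male", "female", "unknown")
)

tp <- new_tidyped_sketch(ped)
class(tp)

# Because the class was set by reference, the input object carries it too.
class(ped)
```

The sketch performs no validation and attaches no ped_meta, mirroring the division of responsibility described above: the caller must have established structural validity first.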

8. Three-tier guard system

Analysis functions must guard their inputs. visPedigree provides three guard levels, chosen based on what each function needs.

8.1 validate_tidyped() — visualization guard

8.2 ensure_tidyped() — structure-light guard

8.3 ensure_complete_tidyped() — complete-pedigree guard

8.4 Choosing the right guard

| Guard                     | Recovers class? | Requires completeness? | When to use                   |
|---------------------------|:---------------:|:----------------------:|-------------------------------|
| validate_tidyped()        | yes             | no                     | Visualization                 |
| ensure_tidyped()          | yes             | no                     | Summaries on existing columns |
| ensure_complete_tidyped() | yes             | yes                    | Pedigree recursion in C++     |

Some functions are conditionally guarded: they use ensure_tidyped() by default but escalate to ensure_complete_tidyped() when a parameter triggers pedigree recursion (for example pedstats(ecg = TRUE), pedne(method = "coancestry")).
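
The conditional pattern can be sketched in base R as follows. Every name here (guard_light(), guard_complete(), ped_size_sketch(), the recurse argument) is hypothetical and only stands in for the real guards; the function body is a placeholder for an actual analysis.

```r
is_complete <- function(ped) {
  parents <- c(ped$Sire, ped$Dam)
  all(parents[!is.na(parents)] %in% ped$Ind)
}

# Light guard: only requires the columns to exist.
guard_light <- function(ped) {
  stopifnot(all(c("Ind", "Sire", "Dam") %in% names(ped)))
  ped
}

# Strict guard: additionally requires every known parent to be present.
guard_complete <- function(ped) {
  ped <- guard_light(ped)
  if (!is_complete(ped)) {
    stop("pedigree is incomplete: some parents are missing from Ind")
  }
  ped
}

# The guard level escalates only when the caller asks for recursion.
ped_size_sketch <- function(ped, recurse = FALSE) {
  ped <- if (recurse) guard_complete(ped) else guard_light(ped)
  nrow(ped)  # placeholder for the real computation
}

incomplete <- data.frame(
  Ind  = c("C", "D"),
  Sire = c("A", "C"),
  Dam  = c("B", NA),
  stringsAsFactors = FALSE
)

ped_size_sketch(incomplete)                    # light guard is enough
try(ped_size_sketch(incomplete, recurse = TRUE))  # strict guard rejects it
```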

9. Safe subsetting contract

[.tidyped is the key protection layer.

9.1 := operations

Modify-by-reference is passed through safely. Class and metadata are preserved via setattr(). No copy occurs.

9.2 Column-only selections

If the selection removes core pedigree columns, the result is returned as a plain data.table without warning.

9.3 Row subsets

After row subsetting, [.tidyped checks pedigree completeness:

This downgrade is deliberate. It prevents stale integer indices from reaching C++ routines.
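
The rule can be illustrated with a base-R stand-in for the subsetting method. In this sketch the class name tidyped_sketch and the function subset_rows_sketch() are hypothetical, and a plain data.frame plays the role of data.table; only the decision logic is the point.

```r
subset_rows_sketch <- function(ped, rows) {
  out <- as.data.frame(ped)[rows, , drop = FALSE]
  rownames(out) <- NULL

  parents  <- c(out$Sire, out$Dam)
  complete <- all(parents[!is.na(parents)] %in% out$Ind)

  if (!complete) {
    # Downgrade: drop the pedigree class so stale indices cannot be trusted.
    warning("subset is not a complete pedigree; returning a plain table")
    class(out) <- "data.frame"
    return(out)
  }

  # Still complete: rebuild the integer columns for the new row order.
  out$IndNum  <- seq_len(nrow(out))
  out$SireNum <- match(out$Sire, out$Ind, nomatch = 0L)
  out$DamNum  <- match(out$Dam,  out$Ind, nomatch = 0L)
  class(out) <- c("tidyped_sketch", "data.frame")
  out
}

ped <- data.frame(
  Ind     = c("A", "B", "C", "D"),
  Sire    = c(NA, NA, "A", "C"),
  Dam     = c(NA, NA, "B", "B"),
  IndNum  = 1:4,
  SireNum = c(0L, 0L, 1L, 3L),
  DamNum  = c(0L, 0L, 2L, 2L),
  stringsAsFactors = FALSE
)
class(ped) <- c("tidyped_sketch", "data.frame")

kept    <- subset_rows_sketch(ped, c(2, 1, 3))                    # parents of C are present
dropped <- suppressWarnings(subset_rows_sketch(ped, c(1, 2, 4)))  # sire of D is missing

class(kept)
class(dropped)
```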

10. Computational boundaries: C++ vs igraph

visPedigree delegates heavy pedigree recursion to C++ and uses igraph where a graph object is still the simplest representation.

10.1 C++ — core computation path

| Task                          | C++ function                                              |
|-------------------------------|-----------------------------------------------------------|
| Ancestry / descendant tracing | cpp_trace_ancestors, cpp_trace_descendants                |
| Topological sorting           | cpp_topo_order                                            |
| Generation assignment         | cpp_assign_generations_top, cpp_assign_generations_bottom |
| Inbreeding coefficients       | cpp_calculate_inbreeding (Meuwissen & Luo)                |
| Relationship matrices         | cpp_addmat, cpp_dommat, cpp_aamat, cpp_ainv               |

All C++ functions consume SireNum / DamNum integer vectors and assume the head invariant (§3).
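
To show what consuming the integer vectors looks like, here is a breadth-first ancestor trace written in base R. It is an illustrative sketch, not a port of cpp_trace_ancestors; the function name and the max_gen argument are assumptions.

```r
trace_ancestors_sketch <- function(sire_num, dam_num, start, max_gen = Inf) {
  seen <- logical(length(sire_num))
  seen[start] <- TRUE
  frontier <- start
  gen <- 0

  while (length(frontier) > 0L && gen < max_gen) {
    # Parents are read by direct indexing; 0L (missing) is filtered out.
    parents  <- c(sire_num[frontier], dam_num[frontier])
    parents  <- unique(parents[parents > 0L])
    frontier <- parents[!seen[parents]]
    seen[frontier] <- TRUE
    gen <- gen + 1
  }

  which(seen)  # row indices of the start rows plus their traced ancestors
}

sire_num <- c(0L, 0L, 1L, 3L)
dam_num  <- c(0L, 0L, 2L, 2L)

trace_ancestors_sketch(sire_num, dam_num, start = 4L)               # rows 1 to 4
trace_ancestors_sketch(sire_num, dam_num, start = 4L, max_gen = 1)  # rows 2 to 4
```

The function never sees an ID string. If IndNum did not equal the row number, the same code would run without error and return the wrong rows, which is why the invariant has to be guaranteed upstream.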

10.2 igraph — graph-specific tasks

| Task                   | Where                    | igraph functions                                         |
|------------------------|--------------------------|----------------------------------------------------------|
| Pedigree visualization | visped() pipeline        | graph_from_data_frame, layout_with_sugiyama, plot.igraph |
| Connected components   | splitped()               | graph_from_edgelist, components                          |
| Loop detection         | tidyped() raw-input path | graph_from_edgelist, is_dag                              |
| Loop diagnosis         | tidyped() error path     | which_loop, shortest_paths, neighbors, components        |
| Candidate tracing      | tidyped() raw-input path | neighborhood                                             |
| Topological sorting    | tidyped() raw-input path | topo_sort                                                |

igraph is not used in the core numerical pedigree analysis routines such as inbreed(), pedmat(), pedecg(), or pedrel(), but it is still part of the preprocessing and visualization stack.

11. Extension rules

When extending the class, follow these rules.

11.1 Do not add new pedigree-level attributes

Prefer adding fields to ped_meta instead of scattering new standalone attributes.

11.2 Keep computed state derivable

If a column can be rebuilt from pedigree structure, prefer derivation over storing opaque cached state.

11.3 Preserve data.table semantics

Use :=, set(), and setattr() carefully. Avoid patterns that trigger full copies unless unavoidable.

11.4 Respect downgrade semantics

Any future method that subsets rows must preserve the current rule:

valid complete subset -> tidyped; incomplete subset -> plain data.table.

11.5 Document C++ assumptions

Any feature using IndNum, SireNum, or DamNum should document whether it requires:

12. User-facing inspection helpers

| Function                | Returns                         |
|-------------------------|---------------------------------|
| is_tidyped(x)           | TRUE if class is present        |
| is_complete_pedigree(x) | TRUE if all Sire/Dam are in Ind |
| pedmeta(x)              | The ped_meta named list         |
| has_inbreeding(x)       | TRUE if f column exists         |
| has_candidates(x)       | TRUE if Cand column exists      |

Future extensions should prefer helper functions over direct attribute access.

13. Maintenance checklist

Before merging a structural change to tidyped, check:

  1. Does class identity remain c("tidyped", "data.table", "data.frame")?
  2. Is the head invariant IndNum == row index preserved after every code path?
  3. Are ped_meta fields preserved correctly?
  4. Does [.tidyped still handle := without copy issues?
  5. Do incomplete row subsets still downgrade with warning?
  6. Are integer pedigree columns rebuilt whenever a subset remains valid?
  7. Does tidyped(tp_master, cand = ...) match the full path result?
  8. After setorder() or merge(), are indices rebuilt before reaching C++?
  9. Do package tests and vignettes build cleanly?

14. Recommended workflow

For large pedigrees, the intended usage pattern is:

# build one validated master pedigree
tp_master <- tidyped(raw_ped)

# reuse it for repeated local tracing (fast path)
tp_local <- tidyped(tp_master, cand = ids, trace = "up", tracegen = 3)

# modify analysis columns in place
tp_master[, phenotype := pheno]

# split only when disconnected components matter
parts <- splitped(tp_master)

This keeps workflows explicit, fast, and safe.




visPedigree documentation built on March 30, 2026, 9:07 a.m.