The pedigree routines came out of a simple need -- to quickly draw a pedigree structure on the screen, within R, that was ``good enough'' to help with debugging the actual routines of interest, which were those for fitting mixed effects Cox models to large family data. As such, the routines had compactness and automation as primary goals; complete annotation (monozygous twins, multiple types of affected status) and most certainly elegance were not on the list. Other software could do that much better.
It therefore came as a major surprise when these routines proved useful to others. Through their constant feedback, application to more complex pedigrees, and ongoing requests for one more feature, the routine has become what it is today. This routine is still not suitable for really large pedigrees, nor for heavily inbred ones such as in animal studies, and will likely not evolve in that way. The authors' fondest hope is that others will pick up the project.
The [[pedigree]] function is the first step, creating an object of class [[pedigree]].
It accepts the following input:
\begin{description}
\item[id] A numeric or character vector of subject identifiers.
\item[dadid] The identifier of the father.
\item[momid] The identifier of the mother.
\item[sex] The gender of the individual. This can be a numeric variable
with codes of 1=male, 2=female, 3=unknown, 4=terminated, or NA=unknown.
A character or factor variable can also be supplied containing
the above; the string may be truncated and of arbitrary case. A sex
value of 0=male 1=female is also accepted.
\item[status] Optional, a numeric variable with 0 = censored and 1 = dead.
\item[relationship] Optional, a matrix or data frame with three columns.
The first two contain the identifier values of the subject pairs, and
the third the code for their relationship: 1 = monozygotic twin, 2 = dizygotic twin,
3 = twin of unknown zygosity, 4 = spouse.
\item[famid] Optional, a numeric or character vector of family identifiers.
\end{description}
The [[famid]] variable is placed last as it was a later addition to the
code; thus prior invocations of the function that use positional
arguments will not be affected.
If present, this allows a set of pedigrees to be generated, one per
family. The resultant structure will be an object of class
[[pedigreeList]].
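
As an illustration of the sex coding described in the list above, here is a minimal sketch in base R. The function name [[normalize_sex]] is hypothetical. Treating any numeric vector that contains a 0 as the 0/1 coding, mapping NA to unknown, and stopping on any other code are assumptions of this sketch, not a description of the package's own code.

```r
# Hypothetical helper: map the accepted sex codings onto one 4-level factor.
normalize_sex <- function(sex) {
  labels <- c("male", "female", "unknown", "terminated")
  if (is.factor(sex)) sex <- as.character(sex)
  if (is.character(sex)) {
    # charmatch() does partial matching, so "f", "Fem" and "FEMALE" all give 2;
    # an ambiguous string gives 0 and is rejected below
    code <- charmatch(tolower(sex), labels)
  } else {
    code <- sex
    # assumption: a 0 anywhere means the 0 = male, 1 = female coding
    if (any(code == 0, na.rm = TRUE)) code <- code + 1
  }
  code[is.na(code)] <- 3                      # NA is treated as unknown
  if (any(code < 1 | code > 4)) stop("invalid sex code")
  factor(code, levels = 1:4, labels = labels)
}

sex_num  <- normalize_sex(c(1, 2, NA, 4))
sex_chr  <- normalize_sex(c("M", "fem", "Unk"))
sex_zero <- normalize_sex(c(0, 1, 1))
```

All three calls return a factor with the same four levels, so later code only has to deal with one representation.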
Note that a factor variable is not listed as one of the choices for the subject identifier. This is on purpose. Factors were designed to accommodate character strings whose values came from a limited class -- things like race or gender -- and are not appropriate for a subject identifier. All of their special properties as compared to a character variable turn out to be backwards for this case, in particular a memory of the original level set when subscripting is done.
However, due to the awful decision early on in S to automatically turn every
character into a factor (unless you stood at the door with a club to
head the package off), most users have become accustomed to
using factors for every character variable.
(I encourage you to set the global option stringsAsFactors=FALSE to turn
off autoconversion; it will measurably improve your R experience.)
Therefore, to avoid unnecessary hassle for our users
the code will accept a factor as input for the id variables, but
the final structure does not retain it.
Gender and relation do become factors. Status follows the pattern of the
survival routines and remains an integer.
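
The ``memory of the original level set'' mentioned above is easy to see in a few lines of base R. This is only an illustration with made-up identifiers; the conversion through [[as.character]] is one plausible way to drop the factor, assumed here for the sketch.

```r
# A factor identifier, as a user might pass it in
id <- factor(c("joe", "charlie", "fred"))

# Subscripting keeps the full level set, even for a single element
one <- id[1]
levels(one)        # still "charlie" "fred" "joe"

# Converting to character discards the level set entirely
id_chr <- as.character(id)
```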
We will describe the code in a set of blocks.
The code starts out with some checks on the input data:
are all the vectors the same length, are the codes legal, etc. It checks that the ids are non-missing,
and that sex takes one of the expected codes 1--4 for male/female/unknown/terminated.
Create the variables describing a missing father and/or mother, which is what we expect both for people at the top of the pedigree and for marry-ins, \emph{before} adding in the family id information. It is easier to do it first. If there are multiple families in the pedigree, make a working set of identifiers that are of the form `family/subject'. Family identifiers can be factor, character, or numeric.
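
The ordering described above (flag missing parents first, then build the `family/subject' keys) can be sketched as follows. The variable names, the use of 0 as the missing-parent code, and the helper [[make_key]] are all assumptions made for this illustration.

```r
# Two families that reuse the same subject numbers
famid <- c("A", "A", "A", "B", "B", "B")
id    <- c(1, 2, 3, 1, 2, 3)
dadid <- c(0, 0, 1, 0, 0, 1)
momid <- c(0, 0, 2, 0, 0, 2)
missid <- 0   # assumed code for "no parent recorded"

# Flag missing parents BEFORE pasting on the family id; afterwards the
# missing code would be buried inside strings such as "A/0"
nofather <- is.na(dadid) | dadid == missid
nomother <- is.na(momid) | momid == missid

# Working identifiers of the form family/subject (hypothetical helper)
make_key <- function(x) paste(famid, x, sep = "/")
key     <- make_key(id)
dad_key <- ifelse(nofather, NA_character_, make_key(dadid))
mom_key <- ifelse(nomother, NA_character_, make_key(momid))
```

With the family prefix in place, subject 1 of family A and subject 1 of family B no longer collide.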
Next check that any mother or father identifiers are found in the identifier list, and are of the right sex. Subjects who don't have a mother or father are founders. For those people both of the parents should be missing.
Now, paste the parts together into a basic pedigree. The fields for father and mother are not the identifiers of the parents, but their row number in the structure.
The final structure will be in the order of the original data, and all the components except [[relation]] will have the same number of rows as the original data.
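
The parent checks and the conversion from parent identifiers to row numbers can be sketched with [[match]]. This is a minimal sketch on a toy family; the variable names and the choice of 0 for ``no parent'' are assumptions, not taken from the package code.

```r
id    <- c("g1", "g2", "c1", "c2")
dadid <- c(NA, NA, "g1", "g1")
momid <- c(NA, NA, "g2", "g2")
sex   <- c(1, 2, 1, 2)          # 1 = male, 2 = female

# Row number of each parent; 0 when there is no parent
findex <- match(dadid, id, nomatch = 0)
mindex <- match(momid, id, nomatch = 0)

# A parent id that was supplied but not found in the id list is an error
if (any(!is.na(dadid) & findex == 0)) stop("father id not found")
if (any(!is.na(momid) & mindex == 0)) stop("mother id not found")

# Parents must be of the right sex (zero indices are dropped by R)
stopifnot(all(sex[findex] == 1), all(sex[mindex] == 2))

# Founders have no parents at all; one parent without the other is not allowed
founder <- findex == 0 & mindex == 0
stopifnot(all((findex == 0) == (mindex == 0)))
```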
Subscripting of a pedigree list extracts one or more families from the
list. We treat character subscripts in the same way that dimnames on
a matrix are used. Factors are a problem though: assume that we
have a vector x with names ``joe'',
``charlie'', ``fred''; then
[[x['joe']]] is the first element of the vector, but
[[temp <- factor(c('joe', 'charlie', 'fred')); z <- temp[1]; x[z] ]] will
be the third element, since the levels are sorted alphabetically and
`joe' carries the integer code 3.
R is implicitly using as.numeric on factors when they are a subscript;
this caught an early version of the code when an element of a data
frame was used to index the pedigree: characters are turned into factors
when bundled into a data frame.
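
The factor subscripting trap can be reproduced directly in base R. The values 10, 20, 30 are made up for the illustration; converting the factor with [[as.character]] before subscripting is shown as one way around the problem.

```r
x <- c(joe = 10, charlie = 20, fred = 30)
by_name <- x["joe"]             # character subscript: matched against names

temp <- factor(c("joe", "charlie", "fred"))
z <- temp[1]                    # prints as joe, but its integer code is 3
by_factor <- x[z]               # R subscripts with the integer code

# Converting to character first restores lookup by name
by_char <- x[as.character(z)]
```

Here [[by_factor]] is the element named fred, not joe, because the levels are stored in sorted order.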
Note:
\begin{enumerate}
\item What should we do if the family id is numeric: when the user says [4], do they mean the fourth family in the list or family `4'? It is the user's responsibility to say ['4'] in this case.
\item In a normal vector, invalid subscripts give an NA, e.g. (1:3)[6], but since there is no such object as an ``NA pedigree'', we emit an error for this.
\item The [[drop]] argument has no meaning for pedigrees, but must be a defined argument of any subscript method; we simply ignore it.
\item Updating the father/mother is a minor nuisance; since they are integer indices to rows, they must be recreated after selection. Ditto for the relationship matrix.
\end{enumerate}

For a pedigree, the subscript operator extracts a subset of individuals. We disallow selections that retain only one of a subject's parents, since they cause plotting trouble later. Relations are worth keeping only if both parties in the relation were selected.
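
Recreating the parent indices after a selection can be sketched on a plain list. The function [[subset_ped]] and the list layout are hypothetical stand-ins for the real pedigree object; the sketch covers only the error on invalid subscripts, the one-parent rule, and the re-indexing.

```r
# Hypothetical subset of a pedigree-like list; keep = row numbers to retain
subset_ped <- function(ped, keep) {
  if (anyNA(keep) || any(keep < 1 | keep > length(ped$id)))
    stop("subscript out of range")       # there is no "NA pedigree"
  has_dad <- ped$findex[keep] %in% keep  # a 0 index is never in keep
  has_mom <- ped$mindex[keep] %in% keep
  if (any(has_dad != has_mom))
    stop("a subject cannot retain only one parent")
  list(id     = ped$id[keep],
       # old row numbers -> positions in the new, smaller structure
       findex = match(ped$findex[keep], keep, nomatch = 0),
       mindex = match(ped$mindex[keep], keep, nomatch = 0))
}

ped <- list(id     = c("g1", "g2", "c1", "c2"),
            findex = c(0, 0, 1, 1),
            mindex = c(0, 0, 2, 2))
sub <- subset_ped(ped, c(2, 1, 4))       # mother, father, second child
```

In [[sub]] the father now sits in row 2 and the mother in row 1, and the child's indices have been updated to match.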
Convert the pedigree to a data.frame so that it is easy to view when removing or trimming individuals with their various indicators. The [[relation]] and [[hints]] elements of the pedigree object are not easy to put in a data.frame with one entry per subject. The following items do have one entry per subject, so they are put in the returned data.frame: id, findex, mindex, sex, affected, status. The findex and mindex values are converted to the actual ids of the parents rather than the row indices. The method can be used as as.data.frame(ped) or data.frame(ped); the NAMESPACE file must declare it as an S3 method.

A variant that keeps the [[findex]] and [[mindex]] vectors as row indices, instead of replacing them with the ids of the parents, is useful for checking the pedigree object. It is not currently included in the package.
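
The step of turning row indices back into parent ids can be sketched as below. The function [[ped_to_df]], the toy list, and the use of NA for a missing parent are assumptions of this sketch; only a few of the per-subject elements are carried along.

```r
ped <- list(id     = c("g1", "g2", "c1", "c2"),
            findex = c(0L, 0L, 1L, 1L),
            mindex = c(0L, 0L, 2L, 2L),
            sex    = factor(c("male", "female", "male", "female")))

# Hypothetical conversion: one row per subject, parents given by id
ped_to_df <- function(ped) {
  n <- length(ped$id)
  dadid <- rep(NA_character_, n)
  momid <- rep(NA_character_, n)
  # zero indices are dropped on the right, so the lengths line up
  dadid[ped$findex > 0] <- ped$id[ped$findex]
  momid[ped$mindex > 0] <- ped$id[ped$mindex]
  data.frame(id = ped$id, dadid = dadid, momid = momid, sex = ped$sex,
             stringsAsFactors = FALSE)
}

df <- ped_to_df(ped)
```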
It usually doesn't make sense to print a pedigree, since the id is just a repeat of the input data and the family connections are pointers. Thus we create a simple summary.