Prune pedigree

Description

prune removes noninformative individuals from a pedigree. This process is usually called trimming or pruning. Individuals are removed if they do not provide any ancestral ties between other individuals. It is possible to add some additional criteria. See details.

Usage

1
prune(x, id, father, mother, unknown=NA, testAdd=NULL, verbose=FALSE)

Arguments

x

data.frame, pedigree data

id

character, individuals's identification column name

father

character, father's identification column name

mother

character, mother's identification column name

unknown

value(s) used for representing unknown parent in x

testAdd

logical, additional criteria; see details

verbose

logical, print some more info

Details

NOTE: this function does not yet work with Pedigree class.

There are always some individuals in the pedigree that jut out. Usually this are older individuals without known ancestors, founders. If such individuals have only one (first) descendant and no phenotype/genotype data, then they do not give us any additional information and can be safely removed from the pedigree. This process resembles cutting/pruning the branches of a tree.

By default prune iteratively removes individuals from the pedigree (from top to bottom) if:

  • they are founders, have both ancestors i.e. father and mother unknown and

  • have only one or no (first) descendants i.e. children

If there is a need to take into account availability of say phenotype/genotype data or any other information, argument testAdd can be used. Value of this argument must be logical and with length equal to number of rows in the pedigree. The easiest way to achieve this is to merge any data to the pedigree and then to perform a test, which will return logical values. Note that value of TRUE in testAdd means to remove an individual - this function is removing individuals! To keep an individual without known parents and one or no children, value of testAdd must be FALSE for that particular individual. Take a look at the examples.

There are various conventions on representing unknown/missing ancestors, say 0. R's default is to use NA. If other values than NA are present, argument unknown can be used to convert unknown/missing values to NA.

It is assumed that pedigree is in extended form i.e. that each father and mother has each own record as an individual. Otherwise error is returned with information on which parents do not appear as individuals.

prune does not only remove lines for pruned individuals but also removes them from father and mother columns.

Pruning is done from top to bottom of the pedigree i.e. from oldest individuals towards younger ones. Take for example the following part of the pedigree in example section:

1
2
3
4
5
6
7
8
9
     0   7
     |   |
     -----
       |
  10   8
   |   |
   -----
     |
     9

Individual 7 is not removed since it has two (first) descendants i.e. 8 and 5 (not shown here). Consecutively, individuals 8 and 9 are also not removed from the pedigree. Individual 10 is removed, since it has only one descendant. Why should individuals 8 and 9 and therefore also 7 stay in the pedigree? Current behaviour is reasonable if pedigree is built in such a way that first individuals with some phenotype or genotype data are gathered and then their pedigree is being built. Say, individual 9 has pehnotype/genotype data and its pedigree is build and there is therefore no need to remove such an individual. However, if pedigree is not built in such a way, then prunPedigree function can not prune all noninformative individuals. Argument testAdd can not help with this issue, since basic tests (founder and one or no first descendants) and testAdd are combined with &.

Value

prune returns a data.frame with possibly fewer individuals. Read also the details.

Author(s)

Gregor Gorjanc

See Also

Pedigree

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
  ## Pedigree example
  x <- data.frame(oseba=c(1,  9, 11, 2, 3, 10, 8, 12, 13,  4, 5, 6, 7, 14, 15, 16, 17),
                    oce=c(2, 10, 12, 5, 5,  0, 7,  0,  0,  0, 7, 0, 0,  0,  0,  0,  0),
                   mama=c(3,  8, 13, 0, 4,  0, 0,  0,  0, 14, 6, 0, 0, 15, 16, 17,  0),
                   spol=c(2,  2,  2, 1, 2,  1, 2,  1,  2,  2, 1, 2, 1,  1,  1,  1,  1),
             generacija=c(1,  1,  1, 2, 2,  2, 2,  2,  2,  3, 3, 4, 4,  5,  6,  7,  8),
                   last=c(2, NA,  8, 4, 1,  6,NA, NA, NA, NA,NA,NA,NA, NA, NA, NA, NA))

  ## Default case
  prune(x=x, id="oseba", father="oce", mother="mama", unknown=0)

  ## Use of additional test i.e. do not remove individual if it has
  ## known value for "last"
  prune(x=x, id="oseba", father="oce", mother="mama", unknown=0,
                testAdd=is.na(x$last))

  ## Use of other data
  y <- data.frame(oseba=c( 11,  15, 16),
                  last2=c(8.5, 7.5, NA))

  x <- merge(x=x, y=y, all.x=TRUE)
  prune(x=x, id="oseba", father="oce", mother="mama", unknown=0,
                testAdd=is.na(x$last2))