match_dataphy: Match data and phylogeny based on model formula



Description

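Combines a phylogeny and a dataset, ensuring that tips in the phylogeny match the data and that observations with missing values are removed. This function uses the variables provided in the 'formula' argument to remove rows with missing values in those variables and to drop species that are not present in both the data and the phylogeny.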

Used internally in samp_phylm, samp_phyglm, clade_phylm, clade_phyglm, intra_phylm, intra_phyglm, tree_phylm, tree_phyglm and all functions analysing interactions. Users can also call this function directly to combine a phylogeny and a dataset.

Usage

match_dataphy(formula, data, phy, verbose = TRUE, ...)

Arguments

formula

The model formula

data

Data frame containing species traits with row names matching tips in phy.

phy

A phylogeny (class 'phylo' or 'multiPhylo')

verbose

Print the number of species that match data and phylogeny, together with any warnings. We highly recommend using the default (verbose = TRUE), but warnings and messages can be silenced for advanced use.

...

Further arguments to be passed to match_dataphy

Details

This function uses all variables provided in the 'formula' to match data and phylogeny. To avoid cropping the full dataset, 'match_dataphy' searches for NA values only in the variables provided by 'formula'. Rows with missing values in other variables, not included in 'formula', are not removed from the data. If no species names are provided as row names in the dataset but the number of rows in the dataset is the same as the number of tips in the phylogeny, the function assumes that the dataset and the phylogeny are in the same order.

This ensures consistency between data and phylogeny only for the variables that are being used in the model (set by 'formula').
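The matching logic described above can be illustrated with a minimal sketch. It is not the package's actual implementation: `sketch_match` is a hypothetical helper written for this example, it assumes the data have species names as row names, and it only handles a single 'phylo' tree. It relies on `drop.tip` and `rtree` from the ape package.

```r
library(ape)

# Hypothetical helper (not part of sensiPhy): a simplified version of the
# matching steps, assuming species names are stored as row names of 'data'.
sketch_match <- function(formula, data, phy) {
  # Only variables named in the formula are checked for missing values
  vars <- all.vars(formula)
  all_species <- union(phy$tip.label, rownames(data))

  # Remove rows with NA in any formula variable
  keep_rows <- complete.cases(data[, vars, drop = FALSE])
  data <- data[keep_rows, , drop = FALSE]

  # Keep only species found in both the data and the phylogeny
  shared <- intersect(phy$tip.label, rownames(data))
  phy <- drop.tip(phy, setdiff(phy$tip.label, shared))

  # Order the rows to follow the tip labels of the cropped tree
  data <- data[phy$tip.label, , drop = FALSE]

  list(data = data, phy = phy, dropped = setdiff(all_species, shared))
}

# Toy data: 'sp_b' has NA in the response, 'sp_f' is absent from the tree,
# and 'sp_a' has NA only in 'z', which is not part of the formula.
set.seed(1)
tree <- rtree(5, tip.label = c("sp_a", "sp_b", "sp_c", "sp_d", "sp_e"))
traits <- data.frame(
  y = c(1.2, NA, 3.1, 4.0),
  x = c(0.5, 0.7, 0.9, 1.1),
  z = c(NA, 2, 3, 4),
  row.names = c("sp_a", "sp_b", "sp_c", "sp_f")
)

res <- sketch_match(y ~ x, traits, tree)
res$data     # rows for sp_a and sp_c; the NA in 'z' is preserved
res$dropped  # species removed from the phylogeny or the data
```

In this toy case the row for 'sp_a' is kept even though 'z' is missing, because 'z' does not appear in the formula.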

If phy is a 'multiPhylo' object, all phylogenies are cropped to match the data, but the row order of the dataset matches only the first tree provided. The returned phylogeny is also a 'multiPhylo' object.
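As a rough illustration of the multi-tree case, the sketch below crops every tree in a set to the species present in a toy dataset and then orders the rows after the first tree only. The object names (`trees`, `traits`, `cropped`) are made up for this example, and the code uses ape directly rather than reproducing how sensiPhy does it internally.

```r
library(ape)

set.seed(42)
labels <- paste0("sp_", letters[1:6])

# Three random trees on the same six species, stored as a 'multiPhylo' object
trees <- lapply(1:3, function(i) rtree(6, tip.label = labels))
class(trees) <- "multiPhylo"

# Toy dataset covering only four of the six species
traits <- data.frame(
  y = c(1, 2, 3, 4),
  x = c(4, 3, 2, 1),
  row.names = c("sp_a", "sp_b", "sp_c", "sp_d")
)

# Crop every tree to the species present in the data
to_drop <- setdiff(labels, rownames(traits))
cropped <- lapply(trees, drop.tip, tip = to_drop)
class(cropped) <- "multiPhylo"

# Row order follows the tip order of the first tree only
traits <- traits[cropped[[1]]$tip.label, , drop = FALSE]
```

Because tip order can differ between trees, code that needs the data aligned with a later tree would have to reorder the rows again for that tree.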

Value

The function match_dataphy returns a list with the following components:

data: Cropped dataset matching phylogeny

phy: Cropped phylogeny matching data

dropped: Species dropped from phylogeny and removed from data.

Note

If tips are removed from the phylogeny and data or if rows containing missing values are removed from data, a message will be printed with the details. Further, the final number of species that match data and phy will always be reported by a message.

Author(s)

Caterina Penone & Gustavo Paterno

References

This function is largely inspired by the function comparative.data in the caper package:

David Orme, Rob Freckleton, Gavin Thomas, Thomas Petzoldt, Susanne Fritz, Nick Isaac and Will Pearse (2013). caper: Comparative Analyses of Phylogenetics and Evolution in R. R package version 0.5.2. http://CRAN.R-project.org/package=caper

Examples

# Load package and data:
library(sensiPhy)
data(alien)
head(alien$data)
# Match data and phy based on model formula:
comp.data <- match_dataphy(gestaLen ~ homeRange, data = alien$data, alien$phy[[1]])
# Check data:
head(comp.data$data)
# Check phy:
comp.data$phy
# See species dropped from phy or data:
comp.data$dropped
# Example2:
# Match data and phy based on model formula:
comp.data2 <- match_dataphy(gestaLen ~ adultMass, data = alien$data, alien$phy)
# Check data (missing data on variables not included in the formula are preserved)
head(comp.data2$data)
# Check phy:
comp.data2$phy
# See species dropped from phy or data:
comp.data2$dropped

Example output

Loading required package: ape
Loading required package: phylolm
Loading required package: ggplot2
                                    family adultMass gestaLen homeRange
Tachyglossus_aculeatus      Tachyglossidae  4020.767   28.375 0.9991117
Ornithorhynchus_anatinus Ornithorhynchidae  1458.208   15.000 0.1120000
Ondatra_zibethicus              Cricetidae  1135.014   27.100 0.0044500
Mesocricetus_auratus            Cricetidae    97.125   15.500        NA
Castor_canadensis               Castoridae 18085.634  110.000        NA
Myocastor_coypus             Myocastoridae  6135.768  131.737 0.0376000
                            SD_mass  SD_gesta   SD_range
Tachyglossus_aculeatus   1218.29240  4.199500 0.79224190
Ornithorhynchus_anatinus  180.81779  2.160000 0.04300000
Ondatra_zibethicus        388.17479  3.333300 0.00000000
Mesocricetus_auratus       12.52913  0.496000         NA
Castor_canadensis        2875.61581 13.090000         NA
Myocastor_coypus          546.08335  3.425162 0.01567712
Used dataset has  49  species that match data and phylogeny
Warning messages:
1: In match_dataphy(gestaLen ~ homeRange, data = alien$data, alien$phy[[1]]) :
  NA's in response or predictor, rows with NA's were removed
2: In match_dataphy(gestaLen ~ homeRange, data = alien$data, alien$phy[[1]]) :
  Some phylo tips do not match species in data (this can be due to NA removal) species were dropped from phylogeny or data
                                    family adultMass gestaLen  homeRange
Tachyglossus_aculeatus      Tachyglossidae  4020.767   28.375 0.99911167
Ornithorhynchus_anatinus Ornithorhynchidae  1458.208   15.000 0.11200000
Ondatra_zibethicus              Cricetidae  1135.014   27.100 0.00445000
Myocastor_coypus             Myocastoridae  6135.768  131.737 0.03760000
Marmota_monax                    Sciuridae  3747.182   31.600 0.03335818
Tamiasciurus_hudsonicus          Sciuridae   209.452   35.724 0.01173571
                            SD_mass SD_gesta    SD_range
Tachyglossus_aculeatus   1218.29240 4.199500 0.792241901
Ornithorhynchus_anatinus  180.81779 2.160000 0.043000000
Ondatra_zibethicus        388.17479 3.333300 0.000000000
Myocastor_coypus          546.08335 3.425162 0.015677117
Marmota_monax             528.35266 2.275200 0.042975197
Tamiasciurus_hudsonicus    23.66808 3.072264 0.007209057

Phylogenetic tree with 49 tips and 48 internal nodes.

Tip labels:
	Tachyglossus_aculeatus, Ornithorhynchus_anatinus, Ondatra_zibethicus, Myocastor_coypus, Marmota_monax, Tamiasciurus_hudsonicus, ...

Rooted; includes branch lengths.
 [1] "Mesocricetus_auratus"     "Castor_canadensis"       
 [3] "Hystrix_brachyura"        "Chinchilla_lanigera"     
 [5] "Marmota_bobak"            "Tamias_townsendii"       
 [7] "Atlantoxerus_getulus"     "Sciurus_niger"           
 [9] "Sciurus_aureogaster"      "Oryctolagus_cuniculus"   
[11] "Macaca_arctoides"         "Macaca_mulatta"          
[13] "Macaca_fascicularis"      "Ovis_ammon"              
[15] "Ovis_aries"               "Hemitragus_jemlahicus"   
[17] "Capra_ibex"               "Rupicapra_rupicapra"     
[19] "Ovibos_moschatus"         "Gazella_subgutturosa"    
[21] "Saiga_tatarica"           "Bubalus_bubalis"         
[23] "Tragelaphus_strepsiceros" "Capreolus_capreolus"     
[25] "Rangifer_tarandus"        "Rusa_timorensis"         
[27] "Cervus_elaphus"           "Rusa_unicolor"           
[29] "Camelus_bactrianus"       "Equus_hemionus"          
[31] "Mustela_sibirica"         "Mustela_lutreola"        
[33] "Neovison_vison"           "Nasua_nasua"             
[35] "Lycalopex_griseus"        "Felis_catus"             
[37] "Pseudocheirus_peregrinus" "Bettongia_lesueur"       
[39] "Macropus_eugenii"         "Macropus_parma"          
[41] "Petrogale_lateralis"      "Petrogale_penicillata"   
[43] "Thylogale_billardierii"   "Potorous_tridactylus"    
[45] "Lasiorhinus_latifrons"   
Used dataset has  84  species that match data and phylogeny
Warning messages:
1: In match_dataphy(gestaLen ~ adultMass, data = alien$data, alien$phy) :
  NA's in response or predictor, rows with NA's were removed
2: In match_dataphy(gestaLen ~ adultMass, data = alien$data, alien$phy) :
  Some phylo tips do not match species in data (this can be due to NA removal) species were dropped from phylogeny or data
                                    family adultMass gestaLen homeRange
Tachyglossus_aculeatus      Tachyglossidae  4020.767   28.375 0.9991117
Ornithorhynchus_anatinus Ornithorhynchidae  1458.208   15.000 0.1120000
Ondatra_zibethicus              Cricetidae  1135.014   27.100 0.0044500
Mesocricetus_auratus            Cricetidae    97.125   15.500        NA
Castor_canadensis               Castoridae 18085.634  110.000        NA
Myocastor_coypus             Myocastoridae  6135.768  131.737 0.0376000
                            SD_mass  SD_gesta   SD_range
Tachyglossus_aculeatus   1218.29240  4.199500 0.79224190
Ornithorhynchus_anatinus  180.81779  2.160000 0.04300000
Ondatra_zibethicus        388.17479  3.333300 0.00000000
Mesocricetus_auratus       12.52913  0.496000         NA
Castor_canadensis        2875.61581 13.090000         NA
Myocastor_coypus          546.08335  3.425162 0.01567712
101 phylogenetic trees
 [1] "Chinchilla_lanigera"      "Marmota_bobak"           
 [3] "Tamias_townsendii"        "Atlantoxerus_getulus"    
 [5] "Sciurus_niger"            "Sciurus_aureogaster"     
 [7] "Lycalopex_griseus"        "Felis_catus"             
 [9] "Pseudocheirus_peregrinus" "Petrogale_lateralis"     

sensiPhy documentation built on April 14, 2020, 7:15 p.m.