README.md



  library(envReport)
  library(envImport)
  library(magrittr)

envImport

The goal of envImport is to obtain environmental data from disparate data sources for a geographic area of interest, and to make those data seamlessly usable.

Installation

You can install the development version of envImport from GitHub with:

# install.packages("devtools")
devtools::install_github("Acanthiza/envImport")

Supported data sources

data_name = 'data source'. Data sources are (usually) obvious sources of data. Examples are the Global Biodiversity Information Facility (GBIF), the Atlas of Living Australia (ALA) and the Terrestrial Ecosystem Research Network (TERN). There are 11 data sources currently supported (also see envImport::data_map):

Four of these sources are publicly available (GBIF, ALA, HAVPlot and TERN).

General workflow

data_map

The data_map (see table below) provides a mapping from original data sources to the desired columns in the assembled data set.

Table: Data map of desired columns in the assembled data (col) and the names of the corresponding columns in each original data source. Where a name listed for a data source does not match a column in that source's original data, the get_x function has usually created a new column to better meet the requirements of the final combined data set.

|col |gbif |tern |galah |havplot |
|:--------------|:----------------------------------------|:-----------------------------|:-----------------------------|:------------------------------------------------------------------------------------|
|data_name |gbif |tern |galah |havplot |
|epsg |4326 |4326 |4326 |4326 |
|site |gbifID |site_unique |locationID |plotName |
|date |eventDate |visit_start_date |eventDate |obsStartDate |
|lat |decimalLatitude |latitude |decimalLatitude |decimalLatitude |
|long |decimalLongitude |longitude |decimalLongitude |decimalLongitude |
|original_name |scientificName |species |scientificName |scientificName |
|common |NA |NA |vernacularName |NA |
|nsx |NA |NA |organismID |NA |
|occ_derivation |occurrenceStatus |NA |occurrenceStatus |abundanceValue |
|quantity |organismQuantity |NA |organismQuantity |abundanceValue |
|survey_nr |NA |NA |NA |NA |
|survey |NA |NA |datasetName |projectID |
|ind |NA |NA |NA |NA |
|rel_metres |coordinateUncertaintyInMeters |NA |coordinateUncertaintyInMeters |coordinateUncertaintyInMetres |
|sens |NA |NA |NA |NA |
|lifeform |NA |lifeform |NA |NA |
|lifespan |NA |NA |NA |NA |
|cover |NA |cover |NA |cover |
|cover_code |NA |NA |NA |NA |
|height |NA |height |NA |NA |
|quad_metres |NA |quad_metres |NA |quad_metres |
|epbc_status |NA |NA |NA |NA |
|npw_status |NA |NA |NA |NA |
|method |samplingProtocol |NA |samplingProtocol |abundanceMethod |
|obs |recordedBy |observer_veg |recordedBy |individualName |
|denatured |informationWithheld |NA |generalisationInMetres |NA |
|kingdom |kingdom |kingdom |kingdom |kingdom |
|desc |Global biodiversity information facility |Terrestrial ecosystem network |Atlas of Living Australia |Harmonised Australian Vegetation Plot dataset |
|data_name_use |GBIF |TERN |ALA |HAVPlot |
|url |https://www.gbif.org/ |https://www.tern.org.au/ |https://www.ala.org.au/ |https://researchdata.edu.au/harmonised-australian-vegetation-dataset-havplot/1950860 |
|order |11 |4 |10 |3 |
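
As a rough illustration of how a mapping like the data map above can be used, the sketch below renames the columns of some raw records to the desired names. It is not envImport's own code: `data_map_excerpt` copies only a few rows of the `col` and `gbif` columns from the table, and `raw` is made-up data standing in for records returned by GBIF.

```r
# Excerpt of the data map above: desired column (col) and original GBIF column
data_map_excerpt <- data.frame(
  col  = c("site", "date", "lat", "long", "original_name"),
  gbif = c("gbifID", "eventDate", "decimalLatitude",
           "decimalLongitude", "scientificName")
)

# Hypothetical raw records (made-up values, for illustration only)
raw <- data.frame(
  gbifID = c("1", "2"),
  eventDate = c("2020-01-01", "2021-06-15"),
  decimalLatitude = c(-35.0, -34.5),
  decimalLongitude = c(138.6, 139.1),
  scientificName = c("Acanthiza pusilla", "Acanthiza lineata"),
  basisOfRecord = c("HUMAN_OBSERVATION", "HUMAN_OBSERVATION")
)

# Keep only the mapped columns, in data map order
mapped <- raw[, data_map_excerpt$gbif, drop = FALSE]

# Rename them to the desired column names
names(mapped) <- data_map_excerpt$col

names(mapped)
```

Columns not listed in the mapping (here `basisOfRecord`) are dropped, and columns mapped to NA in the full data map would need separate handling.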

get_x

get_x functions get data from the data source x. Because getting data can be slow, results are always saved to disk; when run again, the functions load from the saved file by default. Where available, get_x functions use the R packages and functions provided by the data source (e.g. TERN provides ausplotsR). The first arguments to get_x functions are always:

envImport only includes get_x functions for publicly available data sources.

Within get_x functions the following steps are taken:

get_x functions can be run from get_data.
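
The save-then-reload behaviour described above for get_x functions can be sketched as follows. This is a minimal illustration of the pattern only, not envImport's implementation: `get_cached`, its arguments `save_file`, `fetch` and `force_new`, and the stand-in `slow_query` are all hypothetical names.

```r
# Hypothetical helper (not part of envImport): run `fetch` once, save the
# result to `save_file`, and load from that file on later calls
get_cached <- function(save_file, fetch, force_new = FALSE) {
  if (file.exists(save_file) && !force_new) {
    return(readRDS(save_file))
  }
  res <- fetch()
  dir.create(dirname(save_file), recursive = TRUE, showWarnings = FALSE)
  saveRDS(res, save_file)
  res
}

# Stand-in for a slow query against a data source; counts how often it runs
n_queries <- 0
slow_query <- function() {
  n_queries <<- n_queries + 1
  data.frame(site = c("a", "b"), lat = c(-35.1, -35.2), long = c(138.5, 138.6))
}

save_file <- file.path(tempfile("demo_source_"), "demo_source.rds")

first  <- get_cached(save_file, slow_query)  # runs the query, writes the file
second <- get_cached(save_file, slow_query)  # loads from the saved file
```

Passing `force_new = TRUE` in this sketch would rerun the query and overwrite the saved file.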

Combine

No specific functions are provided for combining data. The following are possible (assuming 'files' is a vector of file names resulting from get_x):

rio::import is possibly more robust to differences in schema when importing files (based on observation - needs testing).
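
One possible way to combine files is sketched below, assuming the rio and dplyr packages are installed. The two csv files are made-up stand-ins for files written by get_x functions (the actual file format may differ), and they deliberately have different columns to show how dplyr::bind_rows fills the gaps with NA.

```r
# Made-up stand-ins for files saved by get_x functions
out_dir <- tempfile("get_x_demo_")
dir.create(out_dir)

write.csv(
  data.frame(site = c("a", "b"), lat = c(-35.1, -35.2), long = c(138.5, 138.6)),
  file.path(out_dir, "gbif.csv"),
  row.names = FALSE
)
write.csv(
  data.frame(site = "c", lat = -34.9, long = 138.7, cover = 0.2),
  file.path(out_dir, "tern.csv"),
  row.names = FALSE
)

# 'files' is a vector of file names, as assumed in the text above
files <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)

# Import each file, then stack the results; columns missing from a file
# (here 'cover' in gbif.csv) are filled with NA
combined <- dplyr::bind_rows(lapply(files, rio::import))
```

bind_rows will raise an error if the same column has incompatible types in different files, so column types may need to be harmonised before combining.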

Cleaning

envImport does not clean data. Any combined dataset is likely to contain all sorts of duplication and other spurious records. For help cleaning data, see, for example:



Acanthiza/envImport documentation built on Aug. 14, 2024, 8:18 a.m.