ed_process | R Documentation |
This function takes raw data downloaded from EIDITH and puts it through
various preprocessing and cleaning steps. In general there is no need to
call this function directly - it is called by both ed_db_download()
and the direct download functions.
ed_process(dat, endpt)
dat |
The data as exported from EIDITH and imported via the |
endpt |
The name of the API URL endpoint: one of "Event", "Animal", "Specimen", "Test", or "TestIDSpecimenID" (for test-specimen cross referencing). Note that these differ from the names of the tables stored locally (which are lowercase and plural). |
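Because endpt is case-sensitive and differs from the local table names, it can help to check the value before passing it on. The following is a minimal sketch only: check_endpt and valid_endpts are hypothetical names introduced here for illustration and are not part of the package.

```r
# Hypothetical helper, NOT part of the eidith package: it only checks a
# value against the five endpoint names listed in the argument description.
valid_endpts <- c("Event", "Animal", "Specimen", "Test", "TestIDSpecimenID")

check_endpt <- function(endpt) {
  # Require a single string that exactly matches one of the documented names
  if (!is.character(endpt) || length(endpt) != 1 || !(endpt %in% valid_endpts)) {
    stop("endpt must be one of: ", paste(valid_endpts, collapse = ", "))
  }
  endpt
}

check_endpt("Event")       # passes and returns the name unchanged
# check_endpt("events")    # would stop with an error: local table names are not endpoints
```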
Steps taken to clean the data include:
Converting variable names from camelCase to snake_case to make it easy to distinguish between raw and cleaned data.
Converting some variable names to clearer ones: all _id variables are numeric primary keys; other identifiers now go by _id_name.
Where there are multiple _id_name-type columns that are very similar except for a small set of cases, we drop all but one for ease of use. These can be retrieved from raw data if needed.
Dropping columns that are entirely blank
Dropping redundant columns
Cleaning up whitespace and capitalization variability
Re-arranging table order to put the most pertinent information first.
Normalizing all animal taxonomic information to match the ITIS database.
Coercing some free-form entries (e.g. specimen_type) to a standard set of categories
Converting yes/no fields to TRUE/FALSE
Fixing spelling errors
Extracting common TRUE/FALSE variables from free-form text of viral interpretation (Genbank numbers and whether virus is known).
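A few of the steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's actual implementation: the toy data frame, its column names, and the helpers to_snake, is_blank, and yes_no_to_logical are all invented for this example.

```r
# Toy stand-in for raw data; column names and values are invented.
raw <- data.frame(
  eventID      = c(1, 2, 3),
  siteName     = c("  Forest edge ", "market", "Market  "),
  humanContact = c("yes", "No", "YES"),
  notes        = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# camelCase -> snake_case: put "_" between a lowercase letter or digit and
# the uppercase letter that follows it, then lowercase the whole name.
to_snake <- function(x) {
  tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", x))
}

# A column counts as blank if every value is NA or an empty/whitespace string.
is_blank <- function(col) {
  all(is.na(col) | trimws(as.character(col)) == "")
}

# Map yes/no text (any capitalization) to TRUE/FALSE; anything else becomes NA.
yes_no_to_logical <- function(x) {
  x <- tolower(trimws(x))
  ifelse(x == "yes", TRUE, ifelse(x == "no", FALSE, NA))
}

clean <- raw
names(clean) <- to_snake(names(clean))

# Drop columns that are entirely blank
clean <- clean[, !vapply(clean, is_blank, logical(1)), drop = FALSE]

# Clean up whitespace and capitalization variability
clean$site_name <- tolower(trimws(clean$site_name))

# Convert a yes/no field to TRUE/FALSE
clean$human_contact <- yes_no_to_logical(clean$human_contact)

str(clean)
```

The real function also handles taxonomy normalization, spelling fixes, and text extraction, which depend on reference data and are not sketched here.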