tycho2: Get data from Tycho 2.0 database

Description

Calls the Tycho 2.0 database using the Tycho 2.0 web API.

Usage

tycho2(path = "", params = NULL, queryterms = NULL, apikey = NULL,
  baseurl = "https://www.tycho.pitt.edu/api/", fixdates = NULL,
  start = NULL, end = NULL)

Arguments

path

string (optional). Must be either "query" to perform data queries, or the name of one of the Tycho 2.0 database fields to retrieve a listing of that field's values.

params

list (optional). A list of query terms in the form list(var1=value1,var2=value2,...)

queryterms

character vector (optional). Vector of query terms passed as strings in the form c("var1[operator]value1", "var2[operator]value2", ...). Dates must be passed this way using the >= and <= operators (e.g. queryterms=c("PeriodStartDate>=2000-01-01")).

apikey

string (required). Your Project Tycho API key. This can also be passed with params or queryterms.

baseurl

string. Defaults to "https://www.tycho.pitt.edu/api/".

fixdates

"cdc", "iso", or NULL. If fixdates="cdc", PeriodStartDate and PeriodEndDate are rounded to the nearest CDC epidemiological week ("epiweek") start and end days (Sunday - Saturday), respectively. If fixdates="iso", PeriodStartDate and PeriodEndDate are rounded to the nearest ISO week start and end days (Monday - Sunday), respectively. CDC epiweeks are used for US data reporting. Elsewhere, epiweeks are synonymous with ISO weeks. Rounding is done with round2wday. This parameter may be necessary because some entries in the Tycho 2.0 database have incorrect dates that may be off by one day from the actual epiweek start or end dates. Default = NULL

start

Date, POSIXct, POSIXlt, or character string in "YYYY-MM-DD" format. The start date. If present, overrides "PeriodStartDate" passed to queryterms. Default = NULL

end

Date, POSIXct, POSIXlt, or character string in "YYYY-MM-DD" format. The end date. If present, overrides "PeriodEndDate" passed to queryterms. Default = NULL
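
The rounding that fixdates describes can be illustrated with a small base R sketch. This is not the package's round2wday, whose signature is not shown here; round_to_wday below is a hypothetical stand-in that assumes "nearest" means at most three days in either direction.

```r
# Hypothetical helper (not the package's round2wday): round a date to the
# nearest occurrence of a weekday. wday follows as.POSIXlt()$wday:
# 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
round_to_wday <- function(date, wday) {
  date <- as.Date(date)
  current <- as.POSIXlt(date)$wday
  # Days forward to the target weekday, in the range 0..6
  forward <- (wday - current) %% 7
  # Move forward if the target is at most 3 days ahead, otherwise move back
  shift <- ifelse(forward <= 3, forward, forward - 7)
  date + shift
}

# CDC-style rounding: start dates to Sunday (0), end dates to Saturday (6)
round_to_wday("2000-01-03", 0)  # a Monday, rounds back to "2000-01-02"
round_to_wday("2000-01-07", 6)  # a Friday, rounds forward to "2000-01-08"
```

For ISO-style rounding under the same assumption, start dates would use wday = 1 (Monday) and end dates wday = 0 (Sunday).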

Details

Project Tycho, a repository for global health data

Project Tycho is a repository for global health data in a standardized format compliant with FAIR (Findable, Accessible, Interoperable, and Reusable) guidelines.

Version 2.0 of the database currently contains:

Project Tycho 2.0 datasets are represented in a standard format registered with FAIRsharing (bsg-s000718) and include standard SNOMED-CT codes for reported conditions, ISO 3166 codes for countries and first administrative level subdivisions, and NCBI TaxonID numbers for pathogens.

Precompiled datasets with DOIs are also available for download directly from Project Tycho.

See https://www.tycho.pitt.edu/dataset/api/ for complete documentation of the API.

tycho2()

tycho2 calls apicall with the base URL "https://www.tycho.pitt.edu/api/".

If path is the name of a data field in the Tycho 2.0 database, tycho2 returns a dataframe of possible values for that field, with additional information. If path is "query", tycho2 returns a dataframe of case counts with associated variables for the query terms specified. See https://www.tycho.pitt.edu/dataset/api/ for more details.

Queries are built from a list of key-value pairs passed to the params argument, and/or a character vector of query terms (conditions) passed to the queryterms argument.

A Project Tycho account and an API key are required to access the database. The API key can be retrieved from your Tycho account. It can be set with the apikey argument, or passed to params or queryterms. Any combination of queryterms, params and apikey can be used.

tycho2() automatically replaces spaces with %20 in the final URL.
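
As a rough illustration of how key-value pairs and query terms could end up in a single URL with spaces encoded, here is a minimal sketch. build_query_url is a hypothetical helper written for this example; it is not the package's internal code and it omits the API key.

```r
# Hypothetical helper, not part of the package: assemble a query URL from a
# named list of key-value pairs and a character vector of condition strings.
build_query_url <- function(baseurl, path, params = NULL, queryterms = NULL) {
  pairs <- character(0)
  if (length(params) > 0) {
    pairs <- paste0(names(params), "=", unlist(params))
  }
  query <- paste(c(pairs, queryterms), collapse = "&")
  url <- paste0(baseurl, path, "?", query)
  # Replace spaces with %20, as tycho2() is documented to do
  gsub(" ", "%20", url, fixed = TRUE)
}

url <- build_query_url(
  "https://www.tycho.pitt.edu/api/", "query",
  params = list(ConditionName = "Scarlet fever", Admin1ISO = "US-CA"),
  queryterms = "PeriodStartDate>=2000-01-01"
)
print(url)
```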

To pull large datasets, tycho2() repeatedly calls the API to retrieve partial datasets in chunks of 5000 records until all the requested data has been received, then outputs a single large dataframe. Therefore, the limit and offset query parameters described in the API do not need to be specified. tycho2() handles these parameters invisibly.
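
The chunked retrieval described above follows a common limit/offset paging pattern, sketched below. This is an assumption about the general shape of such a loop, not the package's actual implementation: fetch_all and fetch_page are hypothetical names, and fetch_page stands in for a single API request so the example runs offline.

```r
# Hypothetical sketch of limit/offset paging. fetch_page stands in for one
# API call returning at most `limit` rows, starting after row `offset`.
fetch_all <- function(fetch_page, limit = 5000) {
  chunks <- list()
  offset <- 0
  repeat {
    chunk <- fetch_page(limit = limit, offset = offset)
    if (nrow(chunk) > 0) chunks[[length(chunks) + 1]] <- chunk
    # A short or empty page means there is nothing left to request
    if (nrow(chunk) < limit) break
    offset <- offset + limit
  }
  if (length(chunks) == 0) return(data.frame())
  # Combine all chunks into one data frame
  do.call(rbind, chunks)
}

# Stand-in data source with 12 rows, read 5 rows at a time
fake <- data.frame(id = 1:12)
fake_page <- function(limit, offset) {
  rows <- seq_len(nrow(fake))
  fake[rows > offset & rows <= offset + limit, , drop = FALSE]
}
result <- fetch_all(fake_page, limit = 5)
nrow(result)  # 12
```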

To avoid errors, date ranges should be specified in YYYY-MM-DD format using PeriodStartDate and PeriodEndDate query parameters with the >= and <= operators. The use of >= and <= requires passing dates using the "queryterms" argument.
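
A short sketch of building such date terms programmatically follows. date_terms is a hypothetical helper for illustration only; it formats any value accepted by as.Date() as YYYY-MM-DD and attaches the >= and <= operators.

```r
# Hypothetical helper: turn a start and/or end date into query terms that
# use the >= and <= operators with YYYY-MM-DD dates.
date_terms <- function(start = NULL, end = NULL) {
  terms <- character(0)
  if (!is.null(start)) {
    start_txt <- format(as.Date(start), "%Y-%m-%d")
    terms <- c(terms, paste0("PeriodStartDate>=", start_txt))
  }
  if (!is.null(end)) {
    end_txt <- format(as.Date(end), "%Y-%m-%d")
    terms <- c(terms, paste0("PeriodEndDate<=", end_txt))
  }
  terms
}

queryterms <- c(
  "ConditionName=Measles",
  date_terms(start = "2000-01-01", end = as.Date("2010-12-31"))
)
print(queryterms)
```

The resulting vector can be passed to the queryterms argument; alternatively, the start and end arguments of tycho2() accept dates directly.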

Although the Tycho 2.0 database can be queried directly by passing a manually assembled API call URL to read.csv, as below...

read.csv('https://www.tycho.pitt.edu/api/query?CountryISO=US&ConditionName=Gonorrhea&apikey=YOURAPIKEY')

...use of tycho2 allows queries to be assembled more flexibly and programmatically.

Accessing the Project Tycho API using tycho2 requires an API key, which can be retrieved from your Project Tycho account. You must have a Project Tycho account to receive an API key.

The Project Tycho 2.0 database and API are by Wilbert van Panhuis (Principal Investigator), Donald Burke (Principal Investigator), and Anne Cross (Database Programmer). Project Tycho is published under a Creative Commons Attribution 4.0 International Public License.

Value

dataframe with the following possible columns:

$ConditionName

factor. Name of reported condition as listed in SNOMED-CT

$ConditionSNOMED

factor. SNOMED-CT code for reported condition

$PathogenName

factor. NCBI Taxonomy organism name for pathogen causing reported condition

$PathogenTaxonID

factor. NCBI Taxonomy identifier for pathogen causing reported condition

$Fatalities

logical. Counts of reported condition ($CountValue) represent fatalities

$CountryName

factor. ISO 3166 English Short Name of country

$CountryISO

factor. ISO 3166 2-letter code for country

$Admin1Name

factor. ISO 3166-2 Name of first administrative subdivision (such as US state)

$Admin1ISO

factor. ISO 3166-2 code for first administrative subdivision

$Admin2Name

factor. Geonames Placename of second order administrative division

$CityName

factor. Geonames Name of populated place

$PeriodStartDate

Date, format: YYYY-MM-DD. Start date of time interval for which a count was reported

$PeriodEndDate

Date, format: YYYY-MM-DD. End date of time interval for which a count was reported

$PartOfCumulativeCountSeries

logical. Count is part of a series of cumulative counts (instead of being part of a series of fixed time interval counts)

$AgeRange

Ordered factor. Age range in years for which a count was reported e.g. "0-18". Max age = 130

$Subpopulation

factor. "Civilian", "Military", or "None specified"

$PlaceOfAcquisition

factor. "Domestic", "Abroad", or NA

$DiagnosisCertainty

factor. SNOMED-CT Qualifier for certainty of diagnosis for a count condition: "Definite", "Equivocal", "Possible diagnosis", "Probable diagnosis", or NA

$SourceName

factor. Name of the source (system, database, institution) from which counts were obtained by the Project Tycho team

$CountValue

integer. The count value.

Variables described in detail here: https://www.tycho.pitt.edu/dataformat/ProjectTychoCustomCompiledDataFormat.pdf
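
As an illustration of working with a result in this shape, the sketch below totals counts by year. The rows are made up for the example and are not real Tycho data. Dropping rows flagged by PartOfCumulativeCountSeries before summing is an assumption made here to avoid adding cumulative counts on top of interval counts.

```r
# Made-up rows using a few of the columns listed above; not real Tycho data
cases <- data.frame(
  PeriodStartDate = as.Date(c("2000-01-02", "2000-01-09", "2001-01-07")),
  PeriodEndDate   = as.Date(c("2000-01-08", "2000-01-15", "2001-01-13")),
  PartOfCumulativeCountSeries = c(FALSE, FALSE, FALSE),
  CountValue      = c(3L, 5L, 2L)
)

# Keep fixed-interval counts only, then total by calendar year of start date
fixed <- cases[!cases$PartOfCumulativeCountSeries, ]
fixed$Year <- as.integer(format(fixed$PeriodStartDate, "%Y"))
yearly <- aggregate(CountValue ~ Year, data = fixed, FUN = sum)
print(yearly)
```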

Examples

## Not run: 
# Note: retrieve your API key from your Project Tycho account

# List of conditions showing "ConditionName", "ConditionSNOMED"

TYCHOKEY <- 'some1long2alphanumeric3string'
conditions <- tycho2("condition", apikey = TYCHOKEY)

# All cases of scarlet fever in California

params <- list(ConditionName = "Scarlet fever", Admin1ISO = "US-CA")
Scarlet <- tycho2("query", params = params, apikey = TYCHOKEY)

# All measles cases in California from 2000 to 2010

queryterms <- c(
  "ConditionName=Measles",
  "Admin1ISO=US-CA",
  "PeriodStartDate>=2000-01-01",
  "PeriodEndDate<=2010-01-01"
)
Measles_CA_2000_2010 <- tycho2("query", queryterms = queryterms, apikey = TYCHOKEY)

## End(Not run)

allopole/tycho2 documentation built on Dec. 26, 2019, 2:48 a.m.