odns
provides a base for exploring and obtaining data available through the Scottish Health and Social Care Open Data platform. The package provides a wrapper for the underlying CKAN) API and simplifies the process of accessing the available data with R, allowing users to quickly explore the available data and start using it without having to write complex queries.
CKAN and by extension this package refers to packages and resources.
The term package refers to a dataset, a collection of resources. A resource, contains the data itself.
Example of structure;
. ├── package_1 │ ├── resource_1 │ ├── resource_2 │ └── resource_3 | └── package_2 ├── resource_1 └── resource_2
To view all the packages available we can use all_packages()
.
#' view all available packages > all_packages() # package_name package_id # covid-19-vaccination-in-scotland 6dbdd466-… # enhanced-surveillance-of-covid-19-in-scotland 3c5231ee-… # hospital-onset-covid-19-cases-in-scotland d67b13ef-… # weekly-covid-19-statistical-data-in-scotland 524b42b4-… # covid-19-in-scotland b318bddf-… # … with 85 more rows
If we wish to search for packages by name, or a part of a name, we can use the contains
argument of all_packages()
. For example if we want to find packages relating to populations;
> all_packages(contains = "population") # package_name package_id # population-estimates 7f010430-6ce1-4813-b25c-f7f335bdc4dc # standard-populations 4dd86111-7326-48c4-8763-8cc4aa190c3e # population-projections 9e00b589-817e-45e6-b615-46c935bbace0 # gp-practice-populations e3300e98-cdd2-4f4e-a24e-06ee14fcc66c # scottish-index-of-multiple-deprivation 78d41fa9-1a62-4f7b-9edb-3e8522a93378
If we prefer to see more detail, including the names of resources within packages, we can use all_resources()
with the package_contains
argument.
#' view all resources within packages whose names contain "population" > all_resources(package_contains = "population") # resource_name resource_id package_name package_id url last_modified # Data Zone (2011) Pop… c505f490-c… population-… 7f010430-… http… 2021-10-11T1… # Intermediate Zone (2… 93df4c88-f… population-… 7f010430-… http… 2021-10-11T1… # Council Area (2019) … 09ebfefb-3… population-… 7f010430-… http… 2021-07-06T0… # Health and Social Ca… c3a393ce-2… population-… 7f010430-… http… 2021-07-06T0… # Health Board (2019) … 27a72cc8-d… population-… 7f010430-… http… 2021-07-06T0… # … with 53 more rows
The resource_contains
argument of all_resources()
can also be used to further narrow the results of a query.
#' view all resources whose names contain "population" > all_resources(resource_contains = "european") # resource_name resource_id package_name package_id url last_modified # Population mortality… ec2af2be-8… hospital-st… c88a5231-… http… 2022-05-10T0… # GP Practice Populati… 2c701f90-c… gp-practice… e3300e98-… http… 2022-05-10T0… # GP Practice Populati… d07debcf-7… gp-practice… e3300e98-… http… 2022-02-07T1… # GP Practice Populati… 4a3c438b-2… gp-practice… e3300e98-… http… 2021-11-02T1… # GP Practice Populati… 0779e100-1… gp-practice… e3300e98-… http… 2022-02-17T1… # … with 45 more rows
When passing search strings they are case insensitive. The example above of all_resources(resource_contains = "european")
would return resources contained 'EUROPEAN', 'European', or european.
Lets say that we are interested in 'standard-populations' resource 'European Standard Population'. We can view the metadata for the package and the resource.
#' view metadata for a package using a valid package name > package_metadata(package = "standard-populations") # $nhs_language # [1] "English" # # $license_title # [1] "UK Open Government Licence (OGL)" # # $maintainer # [1] "" # # $version # [1] "" # #... #' view metadata for a package using a valid package id > package_metadata(package = "4dd86111-7326-48c4-8763-8cc4aa190c3e") # $nhs_language # [1] "English" # # $license_title # [1] "UK Open Government Licence (OGL)" # # $maintainer # [1] "" # # $version # [1] "" # #... #' view metadata for a resource using a valid resource id > resource_metadata(resource="edee9731-daf7-4e0d-b525-e4c1469b8f69") # $cache_last_updated # named list() # # $cache_url # named list() # # $mimetype_inner # named list() # # $hash # [1] "" # # $description # [1] "Different countries across Europe have different population ... # # $format # [1] "CSV" #...
The package and resource metadata contains useful information such as the time it was last modified, the publisher, a description, and notes.
It is also possible to view the data items available within a resource along with their types. This information is particularly useful when putting together a query where we want to return only a subset of data.
> resource_data_items(resource="edee9731-daf7-4e0d-b525-e4c1469b8f69") # id type # 1 _id int # 2 AgeGroup text # 3 EuropeanStandardPopulation numeric
We can import all of the resources within a package using get_resource()
. This is often the quickest and simplest way to import data where all resources within one or more packages are required.
#' get all resources in a package > get_resource(package = "4dd86111-7326-48c4-8763-8cc4aa190c3e") #' get the first 10 rows of each resource in a package > get_resource(package = "4dd86111-7326-48c4-8763-8cc4aa190c3e", limit = 10L) #' both package IDs and names can be used > get_resource(package = "standard-populations", limit = 10L) #' multiple packages can be specified returning all resources under each > get_resource(package = c("standard-populations", "population-projections")
We can also use the resource
argument of get_resource()
to import a specific resource within a package. This is often the simplest way to get a single resource in its entirety.
#' get specific resources > get_resource( resource = c("European Standard Population", "9e00b589-817e-45e6-b615-46c935bbace0"), limit = 5L ) #' get a specific resource, if it exists within a specified package > get_resource( package = "standard-populations", resource = "European Standard Population" )
get_resource()
always returns a list, even when only one resource is being imported. Where multiple resources have been imported, each resource is its own list element.
Where more granular control is desired over the data imported, the get_data()
function can be used. get_data()
allows us to import selected fields and to filter the data. If we want to import the fields 'AgeGroup' and 'EuropeanStandardPopulation' from the 'European Standard Population' resource we can achieve this with get_data()
.
# import specified fields from a resource > get_data( resource = "edee9731-daf7-4e0d-b525-e4c1469b8f69", fields = c("AgeGroup", "EuropeanStandardPopulation") ) # AgeGroup EuropeanStandardPopulati… # 0-4 years 5000 # 5-9 years 5500 # 10-14 years 5500 # 15-19 years 5500 # 20-24 years 6000 # … with 14 more rows
The where
argument of get_data()
can be used to exact further control over the data we import. If we only want to retrieve rows from 'European Standard Population' where the age group is 45-49 years, we can write a SQL style where query to achieve this.
```
get_data( resource = "edee9731-daf7-4e0d-b525-e4c1469b8f69", fields = c("AgeGroup", "EuropeanStandardPopulation"), where = "\"AgeGroup\" = \'45-49 years\'" )
```
The where
argument of get_data()
requires specific formatting to allow for compatibility with the CKAN API. Field names must be double quoted "
, non-numeric values must be single quoted '
, and both single and double quotes must be delimited, for example; where = "\"AgeGroup\" = \'45-49 years\\'"
.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.