fhircrackr: Download FHIR resources

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#"
)


hook_output = knitr::knit_hooks$get('output')
knitr::knit_hooks$set(output = function(x, options) {
    if (!is.null(n <- options$out.lines)){
        if (any(nchar(x) > n)){
            index <- seq(1,nchar(x),n)
            x = substring(x, index, c(index[2:length(index)]-1, nchar(x)))
        } 
        x = paste(x, collapse = '\n# ')
    }
    hook_output(x, options)
})

# hook_warning = knitr::knit_hooks$get('warning')
# knitr::knit_hooks$set(warning = function(x, options) {
#     n <- 90
#     x = knitr:::split_lines(x)
#     # any lines wider than n should be wrapped
#     if (any(nchar(x) > n)) x = strwrap(x, width = n)
#     x = paste(x, collapse = '\n ')
#   hook_warning(x, options)
# })

This vignette covers all topics concerned with downloading resources from a server in some depth. If you are interested in a quick overview, please have a look at the fhircrackr:intro vignette.

Before running any of the following code, you need to load the fhircrackr package:

library(fhircrackr)

FHIR search requests

To download data from a FHIR server, you need to specify which resources you want to get with a FHIR search request. You can define your search request as a simple string that you provide to fhir_search(). In that case, however, neither the spelling of the resource type is checked nor is any URL encoding done for you. If you are comfortable with this, you can skip the following paragraphs, because the first part of this vignette introduces the basics of FHIR search and some functions for building valid FHIR search requests with fhircrackr.

A FHIR search request will mostly have the form [base]/[type]?parameter(s), where [base] is the base URL to the FHIR server you are trying to access, [type] refers to the type of resource you are looking for and parameter(s) characterize specific properties those resources should have. The function fhir_url() offers a solution to bring those three components together correctly, taking care of proper formatting.

In the simplest case, fhir_url() takes only the base url and the resource type you are looking for like this:

fhir_url(url = "http://hapi.fhir.org/baseR4", resource = "Patient")

Internally, the function fhir_resource_type() is called to check the type you provided against the list of all currently available resource types, which can be found at https://hl7.org/FHIR/resourcelist.html. Case errors are corrected automatically, and the function throws a warning if the resource type doesn't match the list at hl7.org:

fhir_resource_type(string = "Patient") #correct

fhir_resource_type(string = "medicationstatement") #fixed

fhir_resource_type(string = "medicationstatement", fix_capitalization = FALSE) #not fixed

fhir_resource_type(string = "Hospital") #an unknown resource type, a warning is issued
# Warning:
# In fhir_resource_type("Hospital") :
#   You gave "Hospital" as the resource type.
# This doesn't match any of the resource types defined under
# https://hl7.org/FHIR/resourcelist.html.
# If you are sure the resource type is correct anyway, you can ignore this warning.

Besides telling the server which resource type to give back, the resource type also determines the kinds of search parameters that are allowed. Search parameters are used to further qualify the resources you want to download, e.g. by restricting the search result to Patient resources of female patients only.

You can add several parameters to the search request. If you don't give any parameters, the search will return all resources of the specified type from the server (unless the download is explicitly limited by the argument max_bundles of fhir_search()). Search parameters generally come in the form key = value. There are also a number of resource independent parameters that can be found under https://www.hl7.org/fhir/search.html#Summary. These parameters usually have a _ at the beginning. "_sort" = "status", for example, sorts the results by their status, and "_include" = "Observation:patient" includes the linked Patient resources in a search for Observation resources.

Apart from the resource independent parameters, there are also resource dependent parameters referring to elements specific to that resource type. These parameters come without a _ and you can find a list of them at the end of every resource page, e.g. at https://www.hl7.org/fhir/patient.html#search for the Patient resource. An example of such a parameter would be "birthdate" = "lt2000-01-01" for patients born before the year 2000 or "gender" = "female" to get female patients only.

You can add search parameters to your request via a named list or a named character vector:

request <- fhir_url(
    url        = "http://hapi.fhir.org/baseR4",
    resource   = "Patient",
    parameters = list(
        "birthdate" = "lt2000-01-01",
        "code"      = "http://loinc.org|1751-1"))

request

As you can see, fhir_url() performs automatic url encoding and the | is transformed to %7C.
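To illustrate what the encoding step does, here is a minimal base R sketch that assembles a request of the form [base]/[type]?parameter(s) by hand. The helper build_request() is hypothetical and not part of fhircrackr. It percent-encodes every reserved character in the parameter values with utils::URLencode(), which may be stricter than what fhir_url() does internally, so the two results are not guaranteed to be identical.

```r
# Hypothetical helper (not part of fhircrackr): glue base URL, resource type
# and URL-encoded search parameters together.
build_request <- function(base, type, parameters = list()) {
    base <- sub("/+$", "", base)  # drop trailing slashes from the base URL
    if (length(parameters) == 0) {
        return(paste0(base, "/", type))
    }
    # encode the values only; the keys are plain parameter names
    values <- vapply(
        parameters,
        function(v) utils::URLencode(as.character(v), reserved = TRUE),
        character(1))
    paste0(base, "/", type, "?",
           paste0(names(parameters), "=", values, collapse = "&"))
}

request_by_hand <- build_request(
    base       = "http://hapi.fhir.org/baseR4",
    type       = "Patient",
    parameters = list(
        "birthdate" = "lt2000-01-01",
        "code"      = "http://loinc.org|1751-1"))

request_by_hand
```

In this sketch the | in the code value ends up as %7C, just as in the request built by fhir_url() above.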

Accessing the current request

Whenever you call fhir_url() or fhir_search(), the corresponding FHIR search request will be saved implicitly and can be accessed with fhir_current_request().

If you call fhir_search() without providing an explicit request, the function will automatically call fhir_current_request().

Download FHIR resources from a server

To download resources from a server, you use the function fhir_search() and provide a FHIR search request.

Basic request

We will start with a very simple example and use fhir_search() to download Patient resources from a public HAPI server:

request <- fhir_url(url = "https://hapi.fhir.org/baseR4", resource = "Patient")

patient_bundles <- fhir_search(request = request, max_bundles = 2, verbose = 0)
patient_bundles <- fhir_unserialize(bundles = patient_bundles)

In general, a FHIR search request returns a bundle of the resources you requested. If there are a lot of resources matching your request, the search result isn't returned in one big bundle but distributed over several of them, also called pages, the size of which is determined by the FHIR server. If the argument max_bundles is not set, its default Inf will be applied. fhir_search() will then return all available bundles/pages, meaning all resources matching your request. If you set it to 2 as in the example above, the download will stop after the second bundle. Note that in this case, the result may not contain all the resources from the server matching your request, but it can be useful to first look at the first couple of search results before you download all of them.
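The paging behaviour described above can be pictured with a small base R toy. Nothing in this sketch talks to a FHIR server: pages is a made-up in-memory stand-in for a paged search result, and download_pages() is a hypothetical function that only mimics the role of max_bundles.

```r
# Toy "server": every page holds some resources and a pointer to the next page.
# The last page has no next page.
pages <- list(
    list(entries = 1:20,  next_page = 2),
    list(entries = 21:40, next_page = 3),
    list(entries = 41:45, next_page = NULL))

# Hypothetical stand-in for the paging loop: follow the next pointers until
# there is no next page or max_bundles pages have been collected.
download_pages <- function(pages, max_bundles = Inf) {
    bundles <- list()
    current <- 1
    while (!is.null(current) && length(bundles) < max_bundles) {
        page <- pages[[current]]
        bundles[[length(bundles) + 1]] <- page$entries
        current <- page$next_page
    }
    bundles
}

all_bundles <- download_pages(pages)                   # follows all pages
first_two   <- download_pages(pages, max_bundles = 2)  # stops after page 2

length(all_bundles)
sum(lengths(first_two))
```

With max_bundles = 2 the toy download stops after 40 of the 45 resources, which is the incompleteness the paragraph above warns about.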

If you want to connect to a FHIR server that uses basic authentication, you can supply the arguments username and password. If the server uses some bearer token authentication, you can provide the token in the argument token. See below for more information on authentication.

Because servers can sometimes be hard to reach, fhir_search() will make five attempts to connect to the server before it gives up. With the argument delay_between_attempts you can control the number of attempts as well as the time interval between them.
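The general idea of retrying with a delay can be sketched in base R as follows. This is not the code fhir_search() uses; retry() and flaky() are hypothetical, and the sketch assumes that a numeric vector of delays determines both the waiting times and, through its length, the number of attempts. How the real function maps delays to attempts may differ in detail.

```r
# Hypothetical retry helper: one attempt per element of delays, waiting
# delays[i] seconds after a failed attempt i (no wait after the last one).
retry <- function(action, delays = c(0.01, 0.02, 0.04)) {
    for (i in seq_along(delays)) {
        result <- tryCatch(action(), error = function(e) e)
        if (!inherits(result, "error")) {
            return(result)
        }
        message("Attempt ", i, " failed: ", conditionMessage(result))
        if (i < length(delays)) {
            Sys.sleep(delays[i])
        }
    }
    stop("All ", length(delays), " attempts failed")
}

# Stand-in for an unreliable server: fails twice, then succeeds.
calls <- 0
flaky <- function() {
    calls <<- calls + 1
    if (calls < 3) stop("server not reachable")
    "bundle"
}

result <- retry(flaky)
result
```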

As you can see in the next block of code, fhir_search() returns an object of class fhir_bundle_list where each element represents one bundle of resources, so a list of two in our case:

patient_bundles
# An object of class "fhir_bundle_list"
# [[1]]
# A fhir_bundle_xml object
# No. of entries : 20
# Self Link: http://hapi.fhir.org/baseR4/Patient
# Next Link: http://hapi.fhir.org/baseR4?_getpages=ce958386-53d0-4042-888c-cad53bf5d5a1 ...
# 
# {xml_node}
# <Bundle>
#  [1] <id value="ce958386-53d0-4042-888c-cad53bf5d5a1"/>
#  [2] <meta>\n  <lastUpdated value="2021-05-10T12:12:43.317+00:00"/>\n</meta>
#  [3] <type value="searchset"/>
#  [4] <link>\n  <relation value="self"/>\n  <url value="http://hapi.fhir.org/b ...
#  [5] <link>\n  <relation value="next"/>\n  <url value="http://hapi.fhir.org/b ...
#  [6] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837602"/ ...
#  [7] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/example-r ...
#  [8] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837624"/ ...
#  [9] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837626"/ ...
# [10] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837631"/ ...
# [11] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837716"/ ...
# [12] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837720"/ ...
# [13] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837714"/ ...
# [14] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837721"/ ...
# [15] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837722"/ ...
# [16] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837723"/ ...
# [17] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837724"/ ...
# [18] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/cfsb16116 ...
# [19] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837736"/ ...
# [20] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837737"/ ...
# ...
# 
# [[2]]
# A fhir_bundle_xml object
# No. of entries : 20
# Self Link: http://hapi.fhir.org/baseR4?_getpages=ce958386-53d0-4042-888c-cad53bf5d5a1 ...
# Next Link: http://hapi.fhir.org/baseR4?_getpages=ce958386-53d0-4042-888c-cad53bf5d5a1 ...
# 
# {xml_node}
# <Bundle>
#  [1] <id value="ce958386-53d0-4042-888c-cad53bf5d5a1"/>
#  [2] <meta>\n  <lastUpdated value="2021-05-10T12:12:43.317+00:00"/>\n</meta>
#  [3] <type value="searchset"/>
#  [4] <link>\n  <relation value="self"/>\n  <url value="http://hapi.fhir.org/b ...
#  [5] <link>\n  <relation value="next"/>\n  <url value="http://hapi.fhir.org/b ...
#  [6] <link>\n  <relation value="previous"/>\n  <url value="http://hapi.fhir.o ...
#  [7] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837760"/ ...
#  [8] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837766"/ ...
#  [9] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837768"/ ...
# [10] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837781"/ ...
# [11] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837783"/ ...
# [12] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837784"/ ...
# [13] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837787"/ ...
# [14] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837788"/ ...
# [15] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837789"/ ...
# [16] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837790"/ ...
# [17] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837791"/ ...
# [18] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837792"/ ...
# [19] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837793"/ ...
# [20] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837794"/ ...
# ...

If for some reason you cannot connect to a FHIR server at the moment but want to explore the bundles anyway, the package provides an example list of bundles containing Patient resources. See ?patient_bundles for how to use it.

More than one resource type

In many cases, you will want to download different types of FHIR resources belonging together. For example you might want to download all MedicationStatement resources with the snomed code 429374003 and also download the Patient resources these MedicationStatements refer to. The FHIR search request to do this can be built like this:

request <- fhir_url(
    url        = "https://hapi.fhir.org/baseR4/",
    resource   = "MedicationStatement",
    parameters = list(
        "code"     = "http://snomed.info/ct|429374003",
        "_include" = "MedicationStatement:subject"))

Then you provide the request to fhir_search():

medication_bundles <- fhir_search(request = request, max_bundles = 3)
medication_bundles <- fhir_unserialize(bundles = medication_bundles)

These bundles now contain two types of resources, MedicationStatement resources as well as Patient resources. If you want to have a look at the bundles, it is not very useful to print them to the console. Instead, just save them as xml-files to a directory of your choice and look at the resources there:

fhir_save(bundles = medication_bundles, directory = "MyProject/medicationBundles")

If you want to have a look at a bundle like this but don't have access to a FHIR server at the moment, check out ?medication_bundles.

Authentication

If your FHIR server is protected with some kind of bearer token authentication, fhir_search() lets you provide the token as a string or as an object of class Token from the httr package. You can use fhir_authenticate() to create a token generated by an OAuth2/OpenID Connect process. See ?fhir_authenticate for more information on that topic.

Search via POST

The default behaviour of fhir_search() is to send the FHIR search request as a GET request to the server. In some special cases, however, it can be useful to use the POST-based search defined in the FHIR specification instead. This is mostly the case when the URL of your FHIR search request gets long enough to exceed the allowed URL length. A common scenario for this would be a request querying an explicit list of identifiers. Let's say, for example, that you are looking for the following list of patient identifiers:

ids <- c("72622884-0a09-4ea9-9a91-685bce3b0fe3", 
         "2ca48b68-a641-4be7-a39d-9ffe2691a29a", 
         "8bcdd92d-5f96-4e07-9f6a-e22a3591ee30",
         "2067558f-c9ed-489a-9c2f-7387bb3426a2", 
         "5077b4b0-07c9-4d03-b9ec-1f9f218f8239")

You can use them comma separated in the value of the identifier search parameter like this:

id_strings <- paste(ids, collapse = ",")

But this string would make the FHIR search request URL very long, especially if it is combined with additional other search parameters.

In a search via POST, the search parameters (everything that would usually follow the resource type after the ?) can be transferred to a body of type application/x-www-form-urlencoded and sent via POST. A body of this kind can be created the same way the parameters are usually given to the parameters argument of fhir_url(), i.e. as a named list or a named character vector:

#note the list()-expression
body <- fhir_body(content = list(
    "identifier"  = id_strings,
    "_revinclude" = "Observation:patient"))

The body will then automatically be assigned the content type application/x-www-form-urlencoded. If you provide a body like this in fhir_search(), the url in request should only contain the base URL and the resource type. The function will automatically amend it with the suffix _search and perform a POST:

url <- fhir_url(url = "https://hapi.fhir.org/baseR4/", resource = "Patient")

bundles <- fhir_search(request = url, body = body)
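To make the difference between the two request styles concrete, the following base R sketch builds the GET URL and the corresponding POST target and body by hand. It is only an illustration with shortened, made-up identifiers; fhir_body() and fhir_search() do this work for you, and their exact encoding may differ from what utils::URLencode() produces here.

```r
# Shortened, made-up identifiers for illustration only
ids  <- c("a-1", "b-2", "c-3")
base <- "https://hapi.fhir.org/baseR4"

params <- list(
    "identifier"  = paste(ids, collapse = ","),
    "_revinclude" = "Observation:patient")

# key=value pairs joined by &, values percent-encoded
encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
query   <- paste0(names(params), "=", encoded, collapse = "&")

# GET: the parameters are part of the URL, which grows with every identifier
get_url <- paste0(base, "/Patient?", query)

# POST: the URL stays short, the parameters travel in the request body
post_url  <- paste0(base, "/Patient/_search")
post_body <- query

nchar(get_url)
nchar(post_url)
post_body
```

The POST URL has a fixed length no matter how many identifiers you query, which is why this variant avoids URL length limits.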

Deal with HTTP Errors

fhir_search() internally sends a GET or POST request to the server. If anything goes wrong, e.g. because your request wasn't valid or the server caused an error, the result of your request will be an HTTP error. fhir_search() will print the error code along with some suggestions for the most common errors to the console.

To get more detailed information on the error response, you can either call fhir_recent_http_error() to print more information into the console or you can pass a string with a file name to the argument log_errors. This will write a log with error information to the specified file:

medication_bundles <- fhir_search(
    request     = request,
    max_bundles = 3,
    log_errors  = "myErrorFile")

Save the downloaded bundles

There are two ways of saving the FHIR bundles you downloaded: Either you save them as R objects, or you write them to an xml file. This is possible while downloading the bundles or after all bundles have been downloaded. The following section covers saving after downloading. See the Dealing with large data sets section for how to save bundles during downloading.

Save bundles as R objects

If you want to save the list of downloaded bundles as an .rda or .RData file, you can't just use R's save() or save.image() on it, because this will break the external pointers in the xml objects representing your bundles. Instead, you have to serialize the bundles before saving and unserialize them after loading. For single xml objects the package xml2 provides serialization functions. For convenience, however, fhircrackr provides the functions fhir_serialize() and fhir_unserialize() that can be used directly on the bundles returned by fhir_search():

#serialize bundles
serialized_bundles <- fhir_serialize(bundles = patient_bundles)

#have a look at them
head(serialized_bundles[[1]])

#create temporary directory for saving
temp_dir <- tempdir()

#save
save(serialized_bundles, file = paste0(temp_dir, "/bundles.rda"))

If you load this bundle again, you have to unserialize it before you can work with it:

#load bundles
load(paste0(temp_dir, "/bundles.rda"))
#unserialize
bundles <- fhir_unserialize(bundles = serialized_bundles)

#have a look
bundles
# An object of class "fhir_bundle_list"
# [[1]]
# A fhir_bundle_xml object
# No. of entries : 20
# Self Link: http://hapi.fhir.org/baseR4/Patient
# Next Link: http://hapi.fhir.org/baseR4?_getpages=ce958386-53d0-4042-888c-cad53bf5d5a1 ...
# 
# {xml_node}
# <Bundle>
#  [1] <id value="ce958386-53d0-4042-888c-cad53bf5d5a1"/>
#  [2] <meta>\n  <lastUpdated value="2021-05-10T12:12:43.317+00:00"/>\n</meta>
#  [3] <type value="searchset"/>
#  [4] <link>\n  <relation value="self"/>\n  <url value="http://hapi.fhir.org/b ...
#  [5] <link>\n  <relation value="next"/>\n  <url value="http://hapi.fhir.org/b ...
#  [6] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837602"/ ...
#  [7] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/example-r ...
#  [8] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837624"/ ...
#  [9] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837626"/ ...
# [10] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837631"/ ...
# [11] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837716"/ ...
# [12] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837720"/ ...
# [13] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837714"/ ...
# [14] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837721"/ ...
# [15] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837722"/ ...
# [16] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837723"/ ...
# [17] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837724"/ ...
# [18] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/cfsb16116 ...
# [19] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837736"/ ...
# [20] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837737"/ ...
# ...
# 
# [[2]]
# A fhir_bundle_xml object
# No. of entries : 20
# Self Link: http://hapi.fhir.org/baseR4?_getpages=ce958386-53d0-4042-888c-cad53bf5d5a1 ...
# Next Link: http://hapi.fhir.org/baseR4?_getpages=ce958386-53d0-4042-888c-cad53bf5d5a1 ...
# 
# {xml_node}
# <Bundle>
#  [1] <id value="ce958386-53d0-4042-888c-cad53bf5d5a1"/>
#  [2] <meta>\n  <lastUpdated value="2021-05-10T12:12:43.317+00:00"/>\n</meta>
#  [3] <type value="searchset"/>
#  [4] <link>\n  <relation value="self"/>\n  <url value="http://hapi.fhir.org/b ...
#  [5] <link>\n  <relation value="next"/>\n  <url value="http://hapi.fhir.org/b ...
#  [6] <link>\n  <relation value="previous"/>\n  <url value="http://hapi.fhir.o ...
#  [7] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837760"/ ...
#  [8] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837766"/ ...
#  [9] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837768"/ ...
# [10] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837781"/ ...
# [11] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837783"/ ...
# [12] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837784"/ ...
# [13] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837787"/ ...
# [14] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837788"/ ...
# [15] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837789"/ ...
# [16] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837790"/ ...
# [17] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837791"/ ...
# [18] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837792"/ ...
# [19] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837793"/ ...
# [20] <entry>\n  <fullUrl value="http://hapi.fhir.org/baseR4/Patient/1837794"/ ...
# ...

After unserialization, the pointers are restored and you can continue to work with the bundles. Note that the example bundles medication_bundles and patient_bundles that are provided with the fhircrackr package are also provided in their serialized form and have to be unserialized as described on their help page.

Save and load bundles as xml files

If you want to store the bundles in xml files instead of R objects, you can use the functions fhir_save() and fhir_load(). fhir_save() takes a list of bundles in form of xml objects (as returned by fhir_search()) and writes them into the directory specified in the argument directory. Each bundle is saved as a separate xml-file named by its index, e.g. 3.xml for the third downloaded bundle. If the folder defined in directory doesn't exist, it is created in the current working directory.

#save bundles as xml files
fhir_save(bundles = patient_bundles, directory = temp_dir)

To read bundles saved with fhir_save() back into R, you can use fhir_load():

bundles <- fhir_load(directory = temp_dir)

fhir_load() takes the name of the directory (or path to it) as its only argument. All xml-files in this directory will be read into R and returned as a list of bundles in xml format just as returned by fhir_search().
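If you ever need to read such a directory of index-named files yourself, keep in mind that an alphabetical file listing does not follow the download order once there are ten or more files. The following base R sketch is not how fhir_save() or fhir_load() are implemented; it uses made-up bundle strings and plain text files only to show the naming scheme and a numeric sort of the file names.

```r
# Made-up stand-ins for 11 downloaded bundles
bundle_strings <- sprintf("<Bundle><id value=\"%d\"/></Bundle>", 1:11)

# Write each bundle to <index>.xml in a fresh temporary directory
bundle_dir <- tempfile("bundles_")
dir.create(bundle_dir)
for (i in seq_along(bundle_strings)) {
    writeLines(bundle_strings[i], file.path(bundle_dir, paste0(i, ".xml")))
}

# List the xml files and order them by their numeric index
files <- list.files(bundle_dir, pattern = "\\.xml$")
files <- files[order(as.numeric(sub("\\.xml$", "", files)))]

# Read the files back in download order
loaded <- vapply(
    file.path(bundle_dir, files),
    function(f) paste(readLines(f), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE)

files
```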

Dealing with large data sets

If you need to download a particularly large data set from a FHIR server, this can lead to challenges in two areas: computation time and memory usage. Downloading the FHIR bundles is time consuming because in most FHIR server implementations neither the paging mechanism leading from one bundle to the next nor the execution of complex search queries is optimized for speed. Keeping a lot of bundles in the working memory is memory consuming because the xml structures contain a lot of overhead that is only removed once the relevant bits of information have been transferred into a table.

There are several options to alleviate these problems, a couple of which will be shown in the following.

Saving memory

1. Download trimmed resources using _elements

If you know you are just going to need a few elements from each resource, you can restrict the downloaded resources to those elements, which will result in much smaller resources and thus much smaller bundles. The following example downloads the first bundle of Patient resources that are trimmed down to the name, gender and birthDate elements, which are specified in the _elements parameter. _elements takes a comma separated list of base level elements for the resource and will make sure that the downloaded resources only contain those elements plus the mandatory elements id and meta. The _count parameter in the examples restricts the number of resources in the bundle to 2, which is just done to make the result more printable for this vignette.

request <- fhir_url(url = "http://hapi.fhir.org/baseR4", 
                    resource = "Patient",
                    parameters = c("_elements" = "name,gender,birthDate",
                                   "_count"= "2"))


bundles <- fhir_search(request, max_bundles = 1)

cat(toString(bundles[[1]]))
# <Bundle>
#   <id value="8e0db3ce-817b-48cd-ba3e-1a0d20f64366"/>
#   <meta>
#     <lastUpdated value="2022-03-31T08:48:55.934+00:00"/>
#   </meta>
#   <type value="searchset"/>
#   <link>
#     <relation value="self"/>
#     <url value="http://hapi.fhir.org/baseR4/Patient?_count=2&amp;_elements=name%2Cgender%2CbirthDate"/>
#   </link>
#   <link>
#     <relation value="next"/>
#     <url value="http://hapi.fhir.org/baseR4?_getpages=8e0db3ce-817b-48cd-ba3e-1a0d20f64366&amp;_getpagesoffset=2&amp;_count=2&amp;_pretty=true&amp;_bundletype=searchset&amp;_elements=birthDate,gender,name"/>
#   </link>
#   <entry>
#     <fullUrl value="http://hapi.fhir.org/baseR4/Patient/2564886"/>
#     <resource>
#       <Patient>
#         <id value="2564886"/>
#         <meta>
#           <versionId value="1"/>
#           <lastUpdated value="2021-09-28T01:04:35.774+00:00"/>
#           <source value="#rxRuwftRVG3erMwy"/>
#           <tag>
#             <system value="http://terminology.hl7.org/CodeSystem/v3-ObservationValue"/>
#             <code value="SUBSETTED"/>
#             <display value="Resource encoded in summary mode"/>
#           </tag>
#         </meta>
#         <name>
#           <text value="반영훈 사원"/>
#           <family value="반"/>
#           <given value="영훈"/>
#           <prefix value="사원"/>
#         </name>
#         <gender value="male"/>
#         <birthDate value="1992-01-12"/>
#       </Patient>
#     </resource>
#     <search>
#       <mode value="match"/>
#     </search>
#   </entry>
#   <entry>
#     <fullUrl value="http://hapi.fhir.org/baseR4/Patient/2564911"/>
#     <resource>
#       <Patient>
#         <id value="2564911"/>
#         <meta>
#           <versionId value="1"/>
#           <lastUpdated value="2021-09-28T01:12:59.207+00:00"/>
#           <source value="#rmWF4JDz6p1WVwzl"/>
#           <security>
#             <system value="http://terminology.hl7.org/CodeSystem/v2-0203"/>
#             <code value="RM"/>
#           </security>
#           <tag>
#             <system value="http://terminology.hl7.org/CodeSystem/v2-0203sbtest05"/>
#             <code value="SBTest05m"/>
#           </tag>
#           <tag>
#             <system value="http://terminology.hl7.org/CodeSystem/v3-ObservationValue"/>
#             <code value="SUBSETTED"/>
#             <display value="Resource encoded in summary mode"/>
#           </tag>
#         </meta>
#         <name>
#           <use value="usual"/>
#           <text value="human name"/>
#           <family value="Jonathan"/>
#           <given value="token_sort_test_data05"/>
#         </name>
#         <gender value="male"/>
#         <birthDate value="2021-09-01"/>
#       </Patient>
#     </resource>
#     <search>
#       <mode value="match"/>
#     </search>
#   </entry>
# </Bundle>

As you can see, the resulting Bundle is much smaller than it would be if the full resources were downloaded.

2. Batch process bundles by saving them to hard drive

You can save working memory by writing the bundles to your hard drive during the download instead of keeping them all in the working memory of your R session at once. If you pass the name of a directory to the argument save_to_disc in your call to fhir_search(), the bundles will not be combined into a bundle list that is returned when the download is done, but will instead be saved one by one as xml-files to that directory. If the directory you specified doesn't exist yet, fhir_search() will create it for you. This way, the R session only has to keep one bundle at a time in the working memory. You can later load the bundles using fhir_load() and crack them one after another:

request <- fhir_url(url = "http://hapi.fhir.org/baseR4", resource = "Patient")

fhir_search(
    request      = request,
    max_bundles  = 10,
    save_to_disc = "MyProject/downloadedBundles"
    )

bundles <- fhir_load(directory = "MyProject/downloadedBundles")

3. Batch process bundles by downloading them piece by piece

Alternatively, you can also use fhir_next_bundle_url(). This function returns the url to the next bundle from your most recent call to fhir_search():

assign(x = "last_next_link", value = fhir_url( "http://hapi.fhir.org/baseR4?_getpages=0be4d713-a4db-4c27-b384-b772deabcbc4&_getpagesoffset=200&_count=20&_pretty=true&_bundletype=searchset"), envir = fhircrackr:::fhircrackr_env)

To get a better overview, we can split this very long link along the &:

strsplit(fhir_next_bundle_url(), "&")

You can see two interesting numbers: _count=20 tells you that the queried hapi server has a default bundle size of 20. _getpagesoffset=200 tells you that the bundle referred to in this link starts after resource no. 200, which makes sense since the fhir_search() request above downloaded 10 bundles with 20 resources each, i.e. 200 resources. If you use this link in a new call to fhir_search(), the download will start from this bundle (i.e. the 11th bundle with resources 201-220) and will go on to the following bundles from there.
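The arithmetic in this paragraph can be reproduced with a few lines of base R. The sketch below parses the example link from above into a named vector of parameters; it assumes a link of exactly this HAPI-style shape, and other servers may structure their next links differently.

```r
next_link <- paste0(
    "http://hapi.fhir.org/baseR4?_getpages=0be4d713-a4db-4c27-b384-b772deabcbc4",
    "&_getpagesoffset=200&_count=20&_pretty=true&_bundletype=searchset")

# Keep everything after the ?, then split into key=value pairs
query <- sub("^[^?]*\\?", "", next_link)
pairs <- strsplit(strsplit(query, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)

# Named character vector: names are the keys, elements are the values
params <- stats::setNames(
    vapply(pairs, `[`, character(1), 2),
    vapply(pairs, `[`, character(1), 1))

offset <- as.integer(params[["_getpagesoffset"]])  # resources already skipped
count  <- as.integer(params[["_count"]])           # resources per bundle

# Index of the bundle this link points to
next_bundle_index <- offset / count + 1
next_bundle_index
```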

When there is no next bundle (because all available resources have been downloaded), fhir_next_bundle_url() returns NULL.

If a download with fhir_search() is interrupted due to a server error somewhere in between, you can use fhir_next_bundle_url() to see where the download was interrupted.

You can also use this function to avoid memory issues. The following block of code utilizes fhir_next_bundle_url() to download all available Observation resources in small batches of 10 bundles that are immediately cracked and saved before the next batch of bundles is downloaded. Note that this example can be very time consuming if there are a lot of resources on the server. To limit the number of iterations uncomment the if statement at the end of the while loop:

#Starting fhir search request
url <- fhir_url(
    url        = "http://hapi.fhir.org/baseR4",
    resource   = "Observation",
    parameters = list("_count" = "500"))

count <- 0

table_description <- fhir_table_description(resource = "Observation")

while(!is.null(url)){

    #load 10 bundles
    bundles <- fhir_search(request = url, max_bundles = 10) 

    #crack bundles
    tables <- fhir_crack(bundles = bundles, design = table_description)

    #save cracked bundle to RData-file (can be exchanged by other data type)
    save(tables, file = paste0(tempdir(), "/table_", count, ".RData"))

    #retrieve starting point for next 10 bundles
    url <- fhir_next_bundle_url()

    count <- count + 1
    # if(count >= 20) {break}
}
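Once all batches have been written to disk, you will usually want to combine them again. The following base R sketch shows one way to do that. It assumes that every file contains a single data.frame called tables, as in the loop above, and it first creates three small made-up files so that it runs without a FHIR server; read_one() is a hypothetical helper.

```r
# Create three small example files that mimic the output of the loop above
out_dir <- tempfile("tables_")
dir.create(out_dir)
for (count in 0:2) {
    tables <- data.frame(id = paste0("obs-", count * 2 + 1:2), batch = count)
    save(tables, file = file.path(out_dir, paste0("table_", count, ".RData")))
}

# Hypothetical helper: load one file into its own environment and
# return the object called "tables" from it
read_one <- function(file) {
    env <- new.env()
    load(file, envir = env)
    env$tables
}

files <- list.files(out_dir, pattern = "^table_[0-9]+\\.RData$", full.names = TRUE)

# Stack the tables of all batches into one data.frame
all_tables <- do.call(rbind, lapply(files, read_one))
nrow(all_tables)
```

Loading each file into its own environment avoids overwriting an object called tables that may already exist in your workspace.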

Saving download time

In most cases the bottleneck in your analysis will be the download time from the server, because most FHIR servers are optimized for handling a lot of simultaneous small requests instead of a single big one. You can gain time by splitting up your request into chunks and sending them to the server in parallel using a parallelized version of lapply(), but there are a couple of issues to keep in mind.

Operating system

The easiest to use version of parallelization is the function parallel::mclapply(), which uses forking to process list elements from a lapply() call in parallel. As Windows doesn't support forking, this solution can only be used on macOS or Linux operating systems. If you want to achieve similar results on a Windows machine, you can either run fhircrackr in an R installation/RStudio Server that you set up in WSL2 (see here for an installation guide) or you can try out the Windows mclapply hack written by Nathan vanHoudnos.
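If the same script has to run on different operating systems, one simple option is to choose the number of cores depending on the platform. This is a minimal sketch with a toy function instead of a real download; it relies on the fact that parallel::mclapply() runs sequentially when mc.cores is 1, so on Windows you get no speed-up but also no error.

```r
# Use forking only where it is available
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Toy chunks and a toy function standing in for one download per chunk
chunks <- list(c("a", "b"), c("c", "d"), c("e"))

res <- parallel::mclapply(
    X        = chunks,
    FUN      = function(x) toupper(x),
    mc.cores = n_cores)

res
```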

Breaking pointers

The xml objects that represent the FHIR bundles contain external pointers that will break when they are exported to/from a cluster. This means that objects of type fhir_bundle or fhir_bundle_list always have to be serialized using fhir_serialize() when they are downloaded in parallel.

Splitting up requests

Splitting up a FHIR request isn't always trivial. We'll show you two scenarios where you can split up a request into smaller chunks.

a) You have a list of resource ids or a list of identifiers (e.g. patient identifiers) for which you intend to download the corresponding resources. This is the simplest case, because you just have to split the vector of ids into smaller chunks and then send one FHIR search request per chunk. You can do that with fhir_search(), but there is also a convenience function for exactly this use case called fhir_get_resources_by_ids(). The following minimal example of course only works if the ids defined here are actually found on the server:

# define list of Patient resource ids
ids <- c("4b7736c3-c005-4383-bf7c-99710811efd9", "bef39d3a-62bb-48c0-83ff-3bb70b51d831",
         "f371ed2f-5cb0-4093-a491-9df6e6bfcdf2", "277c4631-955e-4b52-bd40-78ddcde333b1",
         "72173a13-d32f-4489-a7b4-dfc301df087f", "4a97acec-028e-4b45-a72f-2b7e08cf80ba")

#split into smaller chunks of 2
id_list <- split(ids, ceiling(seq_along(ids)/2))

#Define function that downloads one chunk of patients and serializes the result
extract_and_serialize <- function(x){
                            b <- fhir_get_resources_by_ids(base_url = "http://hapi.fhir.org/baseR4",
                                                           resource = "Patient",
                                                           ids = x)
                            fhir_serialize(b)
}

#Download using 2 cores on linux:
bundles_serialized <- parallel::mclapply(
    X = id_list,
    FUN = extract_and_serialize,
    mc.cores = 2
)

#Unserialize the resulting list and create one fhir_bundle_list object from it
bundles_unserialized <- lapply(bundles_serialized, fhir_unserialize)
result <- fhir_bundle_list(unlist(bundles_unserialized, recursive = FALSE))

b) You have a request that downloads multiple resource types, like "http://hapi.fhir.org/baseR4/Encounter?_include=Encounter:patient", which downloads all Encounters as well as the Patient resources the Encounter is referencing. This type of request will often take a lot of time and can (depending on your system) be sped up if you only load the encounters in a first step, extract the ids of the referenced Patient resources and download those in parallel in a second step:

#Download all Encounters
encounter_bundles <- fhir_search(request = "http://hapi.fhir.org/baseR4/Encounter")

#Flatten
encounter_table <- fhir_crack(
    bundles = encounter_bundles,
    design = fhir_table_description(resource = "Encounter")
)

#Extract Patient ids (each id only once, as several Encounters can reference the same Patient)
pat_ids <- unique(sub("Patient/", "", encounter_table$subject.reference))

#Split into chunks of 20
pat_id_list <- split(pat_ids, ceiling(seq_along(pat_ids)/20))

#Define function that downloads one chunk and serializes the result
extract_and_serialize <- function(x){
                            b <- fhir_get_resources_by_ids(base_url = "http://hapi.fhir.org/baseR4",
                                                           resource = "Patient",
                                                           ids = x)
                            fhir_serialize(b)
}

#Download using 4 cores on linux:
bundles_serialized <- parallel::mclapply(
    X = pat_id_list,
    FUN = extract_and_serialize,
    mc.cores = 4
)

#Unserialize the resulting list and create one fhir_bundle_list object from it
bundles_unserialized <- lapply(bundles_serialized, fhir_unserialize)
result <- fhir_bundle_list(unlist(bundles_unserialized, recursive = FALSE))
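
Both scenarios rely on the same idiom for cutting a vector of ids into chunks. To make it easier to see what split() produces, here is a small base R sketch. The helper name chunk_vector() is made up for this illustration and is not part of the fhircrackr:

```r
# Split a vector into consecutive chunks of at most chunk_size elements
chunk_vector <- function(x, chunk_size) {
    stopifnot(chunk_size >= 1)
    unname(split(x, ceiling(seq_along(x) / chunk_size)))
}

# Five toy ids in chunks of two: the last chunk holds the remainder
chunks <- chunk_vector(c("a", "b", "c", "d", "e"), chunk_size = 2)
lengths(chunks)
# [1] 2 2 1
```

Smaller chunks mean more requests that can run in parallel, but also more overhead per request, so the best chunk size depends on your server.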

Download random samples from a server

Sometimes it can be useful to download a random sample of resources from a server. The fhircrackr offers a function fhir_sample_resources() which takes a base URL, a resource type and (optionally) some FHIR search parameters and returns a random sample of the matching resources with a given size. For example, you could download 10 random Patient resources of all female patients born before 1960 like this:

bundle <- fhir_sample_resources(
    base_url    = "http://hapi.fhir.org/baseR4",
    resource    = "Patient",
    parameters  = c(gender = "female", birthdate = "lt1960-01-01"),
    sample_size = 10
)

# Use the example bundle shipped with the package instead of the live download
bundle <- fhir_unserialize(fhircrackr:::female_pat_bundle)

This request may take some time because, in a first step, the resource (aka logical) IDs of all resources matching the request (i.e. all Patient resources of females born before 1960) are downloaded. This is necessary because the sampling is done on this vector of resource IDs.

The following code shows that the result indeed consists of 10 Patient resources representing female patients born before 1960. If you want to know more about how to extract information from resources like this, please see the vignette on flattening resources.

pat <- fhir_table_description(resource = "Patient",
                              cols = c("id", "gender", "birthDate"))

fhir_crack(bundles = bundle, design = pat)

Internally fhir_sample_resources() performs the following steps:

1) Extract the logical IDs of all resources matching the resource type and search parameters given in resource and parameters with the function fhir_get_resource_ids(). This function uses the _elements parameter of FHIR Search to avoid downloading all resources in full. You can also use it as a standalone function, see ?fhir_get_resource_ids.

2) Draw a random sample (without replacement) from the vector of IDs created in 1).

3) Download the resources belonging to the sampled IDs using fhir_get_resources_by_ids().
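
The sampling step in the middle is plain base R. The following sketch illustrates it with a hard-coded, hypothetical vector of logical IDs standing in for the result of step 1), and it leaves the actual download of step 3) as a comment:

```r
# Step 1 (assumed already done): logical IDs matching the search, here made up
all_ids <- c("id-01", "id-02", "id-03", "id-04", "id-05")

# Step 2: draw a random sample without replacement
set.seed(42)  # only to make this illustration reproducible
sampled_ids <- sample(all_ids, size = 3, replace = FALSE)

# Step 3 would download the resources belonging to sampled_ids,
# e.g. by passing them as ids to fhir_get_resources_by_ids()
```

Because the sample is drawn without replacement, the sample size cannot exceed the number of IDs found in step 1).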

If you want to sample resources based on an element other than the logical ID, e.g. based on an identifier value or on a reference, you can use the function fhir_sample_resources_by_ids(), provided you have a vector of identifiers/references you want to sample from. Note that in this case the number of resources actually returned won't necessarily match the number in sample_size because, as opposed to the logical ID, an identifier or reference doesn't have to be unique for each resource.
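
The effect of non-unique identifiers can be illustrated without a server. In the following base R sketch, the table resources and its columns are entirely hypothetical. It only mimics the situation where two resources share the same identifier value:

```r
# Hypothetical server content: resources r1 and r2 share the identifier "A"
resources <- data.frame(
    id         = c("r1", "r2", "r3", "r4"),
    identifier = c("A", "A", "B", "C"),
    stringsAsFactors = FALSE
)

# Sample two identifier values without replacement
sampled <- sample(unique(resources$identifier), size = 2)

# All resources carrying one of the sampled identifiers are returned
matched <- resources[resources$identifier %in% sampled, ]
nrow(matched)
```

Depending on whether "A" is drawn, two or three resources match, although only two identifiers were sampled.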

Download Capability Statement

The capability statement documents a set of capabilities (behaviors) of a FHIR Server for a particular version of FHIR. You can download this statement using the function fhir_capability_statement():

cap <- fhir_capability_statement(url = "http://hapi.fhir.org/baseR4")

fhir_capability_statement() takes the base URL of a FHIR server and returns a list of three data frames containing all information from the capability statement of this server. The first one is called Meta and contains some general server information. The second is called Rest and contains information on the operations the server implements. The third is called Resources and gives information on the resource types and associated parameters the server supports. This information can be useful to determine, for example, which FHIR search parameters are implemented in your FHIR server.

A note on HTML in resources

FHIR resources can contain a considerable amount of HTML code (e.g. in a narrative object), which is often created by the server, for example to provide a human-readable summary of the resource. This data is usually not the target of structured statistical analysis, so in the default setting fhir_search() will remove the HTML parts immediately after download to reduce memory usage (on a HAPI server typically by around 30%, see fhir_rm_div()). The memory gain is paid for with a runtime increase of 10%-20%. The HTML removal can be disabled by setting rm_tag = NULL to increase speed at the cost of increased memory usage.
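
To get a feeling for what is removed, the following base R sketch strips the div element from a hand-written, hypothetical Patient resource using a regular expression. This is only an illustration and not how fhir_search() or fhir_rm_div() work internally. The simple pattern used here would also not handle nested div elements correctly:

```r
# Hypothetical, minimal Patient resource with a narrative div
resource_xml <- paste0(
    '<Patient><id value="1"/><text><status value="generated"/>',
    '<div xmlns="http://www.w3.org/1999/xhtml"><p>Jane Doe, female</p></div>',
    '</text><gender value="female"/></Patient>'
)

# Remove the div element including its content (non-greedy match)
stripped <- gsub("<div[^>]*>.*?</div>", "", resource_xml, perl = TRUE)

# Compare the size of the resource before and after removal
c(before = nchar(resource_xml), after = nchar(stripped))
```

The structured elements such as gender are untouched. Only the human-readable narrative is dropped.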

Next steps

To learn about how fhircrackr allows you to convert the downloaded FHIR resources into data.frames/data.tables, see the vignette on flattening FHIR resources.



fhircrackr documentation built on Nov. 19, 2022, 1:07 a.m.