Most of the functions in this package are general function and can be used with any data set and use data masking (basically one can refer to column names without using quotation marks) but the defaults are the names from RIS-PACS data.
There are some rules to the prifixes used: The prefixes dog_
and
cat_
, are the general functions that can be used with any dataset.
Functions that start with dog_
are for preproccesing a data frame and
data manipulation, e.g. creating new columns, grouped counts etc. and
functions that start with cat_
are plotting functions. A few functions
are desinged only for RIS-PACS data and they have the prefix ris_
.
Since this package is higly dependent on the tidyverse packages, I recomment installing the following packages from CRAN first:
install.packages("tidyverse")
install.packages("lubridate")
install.packages("devtools")
When you have these packages then install this package with the following code:
devtools::install_github("davidbaniadam/rispacs")
The purpose of this README (originaly a vignette) is to summarize how one should work with RIS-PACS data and restrict the analysis to different areas within Radiology or Clinical Physiology and Nuclear Medicine. We will also see some of the functions in this package.
RIS data is in long format i.e. one Accession number (“the bar code of a scanning”) can have multiple rows, and each row contains only one billing code. RIS can be compared to “procedurertabel” in DRG or LPR. This form is optimal if one wants to count billing codes. Bellow is an example:
library(rispacs)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
ris_test
#> # A tibble: 4 x 4
#> Accession_number Billing_code Procedure_priority Patient_RK
#> <chr> <chr> <chr> <chr>
#> 1 REGH1 SKS1 A 1
#> 2 REGH2 SKS2 A 1
#> 3 REGH2 SKS3 A 1
#> 4 REGH2 SKS4 A 1
We se that every accession_number
can have multiple rows, because
every billing_code
has one row. With real data we might have NA
in
e.g. Procedure_priority
. We should see if it is a real missing value
or caused by the structure of the data.
The structure of PACS data (for the above RIS example) is as follows:
pacs_test
#> # A tibble: 2 x 3
#> Accession_number Study_date_time_PACS Source
#> <chr> <dttm> <chr>
#> 1 REGH1 2018-01-01 15:00:00 CT1
#> 2 REGH2 2018-01-01 15:20:00 CT1
PACS registration are from the machines and not manually registered.
This is why the timestamps from PACS are more accurate than the
timestamps in RIS. We can join RIS with PACS by accession_number
. The
most apporpriate join is a LEFT JOIN
with RIS being the LEFT
table.
As with any join, care most be taken:
NULL
in RIS does not make sense.So in order to to join RIS and PACS we need to:
NA
.The PACS data set should only contain one row per Accession number, although this is not always the case and a few Accesssion numbers migth have two rows in PACS (very rare). In this scenario we only keep the row with the newest timestamp, since keeping both will lead to dublicates when we join by Accession number. We could examine why we have a case like this, but for now we are forced to only keep one row and I choose the row with the latest Study_date_time_PACS.
One can either choose to subset the data for the analysis with RIS and then join with PACS or join RIS with PACS and then subset the data.
RIS-PACS data contains 5 areas of radiology:
And also Clinical Physiology and Nuclear Medicine.
Bellow is a list of the importatant columns that we know exist in RIS and PACS: RIS:
patient_status_1
procedure_name
billing_code
billing_code_modifier
accession_number
procedure_priority
latest_ris_status
preliminary_report_date_time
first_final_report_date_time
order_business_key
medical_record_number
PACS:
Accession_number
Study_date_time_PACS
Actual_Room_number
Source
Modality_type
We subset the data with RIS data and there is not difference if we have joined with PACS or not.
The best way to subset the data to the area we want to examine, is to
use billing_code
. E.g. if we want to look at CT we can use a filter on
all Billing codes starting with UXC. The next is billing_code_modifier
which includes information that contains supplementary information for
the billing_code
, such as contrast, left side or right side etc. For
CT and MR we are only interested in the rows where
billing_code_modifier starts with UXZ.
With ultrasound and X-ray things are more complicated since we might
have billing codes for a accession number that start with “K”. The
easiest way to subset the data in this case is to use a subquery in
SQL
.
Bellow we will create a cookbooks on how to subset for different areas.
An ultrasound examinatin has other billing_codes than UXU, e.g. billing_codes that start with K. So we need to subset for ultrasound like this:
SELECT *
FROM RIS
WHERE accession_number IN (SELECT accession_number FROM RIS WHERE billing_code LIKE 'UXU%')
So this query keeps all the rows where the accession number has had a billing_code that started with UXU.
To make RIS data ready for analysis for CT we must subset RIS as follows:
SELECT *
FROM RIS
WHERE billing_code LIKE 'UXC%' AND billing_code_modifier LIKE 'UXZ%'
Basically we exclude codes TUL from billing_code_modifier
as they will
create dublicates for billing_code
.
MR is as CT:
SELECT *
FROM RIS
WHERE billing_code LIKE 'UXM%' AND billing_code_modifier LIKE 'UXZ%'
The main steps in preparing RIS-PACS data are the following:
Include: Harcode missing values with informative text.
pacs_clean <- pacs_test %>%
arrange(Accession_number, Study_date_time_PACS) %>%
distinct(Accession_number, .keep_all = TRUE) # Nothing happens here because we don't have duplicates.
pacs_clean
#> # A tibble: 2 x 3
#> Accession_number Study_date_time_PACS Source
#> <chr> <dttm> <chr>
#> 1 REGH1 2018-01-01 15:00:00 CT1
#> 2 REGH2 2018-01-01 15:20:00 CT1
We are now ready to join the tables:
rispacs_left <- left_join(ris_test, pacs_clean, by = "Accession_number") %>%
arrange(Accession_number, Procedure_priority)
rispacs_left
#> # A tibble: 4 x 6
#> Accession_number Billing_code Procedure_prior… Patient_RK
#> <chr> <chr> <chr> <chr>
#> 1 REGH1 SKS1 A 1
#> 2 REGH2 SKS2 A 1
#> 3 REGH2 SKS3 A 1
#> 4 REGH2 SKS4 A 1
#> # … with 2 more variables: Study_date_time_PACS <dttm>, Source <chr>
We can create a new definition af a “scan” based on Accession number
.
If the same person has had a new Accession number within a specified
time period (e.g. 30 minuttes) we can combine those Accession numbers.
As an example person X gets a Accession number at 11:05 and a new
Accession number at 11:25. Since the two Accession numbers are within 30
minuttes from eachother we can say they comprise of one scan, which I
call as Accession number2. To give an analogy with a supermarket: An
Accession number is an item with a barcode and “a scan” is when we are
inline; usually we only buy one item but we can buy more that one item.
For this we use the function dog_combine_accession_numbers()
on
rispacs_left
# Create new def for a scan data is still in long format.
a <- dog_combine_accession_numbers(rispacs_left, CPR = Patient_RK)
a
#> # A tibble: 4 x 10
#> Accession_number Accession_numbe… Study_date_time_PA… Patient_RK diff
#> <chr> <chr> <dttm> <chr> <dbl>
#> 1 REGH1 REGH1 2018-01-01 15:00:00 1 0
#> 2 REGH2 REGH1 2018-01-01 15:20:00 1 1
#> 3 REGH2 REGH1 2018-01-01 15:20:00 1 0
#> 4 REGH2 REGH1 2018-01-01 15:20:00 1 0
#> # … with 5 more variables: diff2 <dbl>, diff3 <dbl>, Billing_code <chr>,
#> # Procedure_priority <chr>, Source <chr>
# Spread the column whihc causes data to be in long format.
b <- dog_spread_SKS(a, Billing_code = Billing_code, Accession_number = Accession_number2, col_name = SKS)
b
#> # A tibble: 1 x 2
#> Accession_number2 SKS
#> <chr> <chr>
#> 1 REGH1 SKS1/SKS2/SKS3/SKS4
# Join a and b to transfer rest on columns to the b data frame. I.e. this should contain all the
# original rows but the data is now one row for every accession number.
one_row <- left_join(a, b, by = "Accession_number2") %>%
select(Accession_number2, SKS, Procedure_priority, Patient_RK,
Study_date_time_PACS, Source) %>%
distinct(Accession_number2, .keep_all = TRUE) %>%
mutate(Procedure_priority = if_else(is.na(Procedure_priority), "Priority missing", Procedure_priority),
Source = if_else(is.na(Source), "Scanner ID missing", Source))
one_row
#> # A tibble: 1 x 6
#> Accession_numbe… SKS Procedure_prior… Patient_RK Study_date_time_PA…
#> <chr> <chr> <chr> <chr> <dttm>
#> 1 REGH1 SKS1… A 1 2018-01-01 15:00:00
#> # … with 1 more variable: Source <chr>
rm(a, b)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.