knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
immunogenetr is a comprehensive toolkit for clinical HLA informatics, built on tidyverse principles. It uses the genotype list string (GL string, https://glstring.org/) as its core data structure for storing and computing HLA genotype data.
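This vignette walks through the main workflows: converting tabular typing data to and from GL strings, calculating mismatches and match grades, comparing a recipient against multiple donors, manipulating allele strings and column names, and reading HML files.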
library(immunogenetr)
library(dplyr)
Clinical HLA data is typically stored in a tabular format, with each allele in
its own column. immunogenetr includes the HLA_typing_1 dataset as an example:
# HLA_typing_1 contains typing for 10 individuals across all classical HLA loci.
head(HLA_typing_1, 3)
The HLA_columns_to_GLstring() function converts these columns into a single
GL string per individual. When used inside mutate(), pass . as the first
argument to reference the working data frame:
HLA_typing_GL <- HLA_typing_1 %>%
  # Convert all typing columns (A1 through DPB1_2) into a GL string.
  mutate(
    GL_string = HLA_columns_to_GLstring(., HLA_typing_columns = A1:DPB1_2),
    .after = patient
  ) %>%
  # Keep only patient ID and the new GL string column.
  select(patient, GL_string)

# View the GL strings.
(HLA_typing_GL)
Each GL string encodes the full genotype: alleles within a gene copy are
separated by / (ambiguity), gene copies by +, and loci by ^.
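To make the delimiter hierarchy concrete, here is a minimal base R sketch that splits a GL string by hand. It is only an illustration and not how immunogenetr parses GL strings; the example string and the variable names are made up for this sketch.

```r
# A hypothetical two-locus GL string with one ambiguous HLA-A gene copy.
gl <- "HLA-A*02:01/HLA-A*02:02+HLA-A*03:01^HLA-B*07:02+HLA-B*44:02"

# Split on "^" to get one element per locus.
loci <- strsplit(gl, "^", fixed = TRUE)[[1]]

# Split each locus on "+" to get its gene copies.
gene_copies <- strsplit(loci, "+", fixed = TRUE)

# Split each gene copy on "/" to get the possible (ambiguous) alleles.
alleles <- lapply(gene_copies, strsplit, split = "/", fixed = TRUE)

length(loci)        # number of loci
gene_copies[[1]]    # the two HLA-A gene copies
alleles[[1]][[1]]   # the ambiguous alleles of the first HLA-A copy
```

Splitting from the outermost delimiter inward mirrors the nesting of the format: loci contain gene copies, and gene copies contain allele ambiguities.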
To go the other direction, GLstring_genes() splits a GL string back into
separate columns by locus:
# Take the first patient's GL string and split it into locus columns.
# Note: GLstring_genes and GLstring_genes_expanded use pivot_longer on all
# columns, so only pass the GL string column (no other data types).
single_patient <- HLA_typing_GL[1, "GL_string", drop = FALSE]
GLstring_genes(single_patient, "GL_string")
For a fully expanded view with one allele per row, use
GLstring_genes_expanded():
GLstring_genes_expanded(single_patient, "GL_string")
The mismatch functions are the core of immunogenetr. They all take a recipient GL string, a donor GL string, one or more loci, and a direction.
Let's set up a recipient/donor pair:
# Patient 7 is the recipient, patient 9 is the donor.
recip_gl <- HLA_typing_GL %>%
  filter(patient == 7) %>%
  pull(GL_string)
donor_gl <- HLA_typing_GL %>%
  filter(patient == 9) %>%
  pull(GL_string)
HLA_mismatch_logical() reports whether a mismatch exists:
# Check if there is an HLA-A mismatch in the graft-vs-host direction.
HLA_mismatch_logical(recip_gl, donor_gl, "HLA-A", direction = "GvH")

# Check host-vs-graft direction.
HLA_mismatch_logical(recip_gl, donor_gl, "HLA-A", direction = "HvG")
HLA_mismatch_number() counts mismatches:
# Count bidirectional mismatches across several loci at once.
HLA_mismatch_number(
  recip_gl, donor_gl,
  c("HLA-A", "HLA-B", "HLA-C", "HLA-DRB1"),
  direction = "bidirectional"
)
HLA_mismatched_alleles() identifies the specific mismatched alleles:
# Identify the specific mismatched alleles in the HvG direction.
HLA_mismatched_alleles(recip_gl, donor_gl, "HLA-A", direction = "HvG")
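The direction argument is easier to reason about with a toy example. The sketch below uses base R set operations on made-up, unambiguous HLA-A genotypes; it is a simplification for illustration, not the package's implementation, and it ignores homozygosity and allele ambiguity.

```r
# Hypothetical unambiguous HLA-A genotypes for a recipient and a donor.
recip_a <- c("HLA-A*02:01", "HLA-A*03:01")
donor_a <- c("HLA-A*02:01", "HLA-A*24:02")

# Graft-vs-host: recipient alleles that the donor does not carry.
gvh <- setdiff(recip_a, donor_a)

# Host-vs-graft: donor alleles that the recipient does not carry.
hvg <- setdiff(donor_a, recip_a)

gvh
hvg
```

In this toy pair there is one mismatch in each direction, so the two directions give different alleles even though the counts agree.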
HLA_match_number() counts matches:
# Count the number of matches (complement of mismatches).
HLA_match_number(
  recip_gl, donor_gl,
  c("HLA-A", "HLA-B", "HLA-C", "HLA-DRB1"),
  direction = "bidirectional"
)
The HLA_match_summary_HCT() function provides standard match grades used in
hematopoietic cell transplantation:
# X-of-8 matching (A, B, C, DRB1 bidirectional).
HLA_match_summary_HCT(recip_gl, donor_gl,
  direction = "bidirectional", match_grade = "Xof8"
)

# X-of-10 matching (adds DQB1).
HLA_match_summary_HCT(recip_gl, donor_gl,
  direction = "bidirectional", match_grade = "Xof10"
)
A common workflow is comparing one recipient against multiple potential donors:
# Patient 3 is the recipient; compare against all 10 donors.
recipient <- HLA_typing_GL %>%
  filter(patient == 3) %>%
  select(GL_string) %>%
  rename(GL_string_recip = GL_string)

donors <- HLA_typing_GL %>%
  rename(GL_string_donor = GL_string, donor = patient) %>%
  # Cross-join to pair recipient with each donor.
  cross_join(recipient) %>%
  # Calculate 8/8 match grade for each pair.
  mutate(
    match_8of8 = HLA_match_summary_HCT(
      GL_string_recip, GL_string_donor,
      direction = "bidirectional", match_grade = "Xof8"
    ),
    .after = donor
  ) %>%
  # Sort best matches first.
  arrange(desc(match_8of8))

donors %>% select(donor, match_8of8)
HLA_truncate() reduces allele resolution to a specified number of fields:
# Truncate a four-field allele to two fields.
HLA_truncate("HLA-A*02:01:01:01", fields = 2)

# Works on full GL strings too.
HLA_truncate(
  "HLA-A*02:01:01:01+HLA-A*03:01:01:02^HLA-B*07:02:01:01+HLA-B*44:02:01:01",
  fields = 2
)
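As a rough illustration of what field truncation means, the base R sketch below keeps the first colon-separated fields of a single allele. The helper truncate_allele() is hypothetical and is not part of immunogenetr; it handles one allele at a time and does not deal with GL string delimiters or expression suffixes.

```r
# Hypothetical helper: keep the first `fields` colon-separated fields of one allele.
truncate_allele <- function(allele, fields = 2) {
  parts <- strsplit(allele, ":", fixed = TRUE)[[1]]
  # Never ask for more fields than the allele actually has.
  keep <- seq_len(min(fields, length(parts)))
  paste(parts[keep], collapse = ":")
}

truncate_allele("HLA-A*02:01:01:01", fields = 2)
truncate_allele("HLA-A*02:01", fields = 3)
```

The second call shows that an allele already shorter than the requested resolution is returned unchanged by this sketch.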
HLA_prefix_remove() and HLA_prefix_add() manage the HLA- and locus
prefixes:
# Remove all prefixes to get just the allele fields.
HLA_prefix_remove("HLA-A*02:01")

# Keep the locus designation but remove "HLA-".
HLA_prefix_remove("HLA-A*02:01", keep_locus = TRUE)

# Add the full prefix back.
HLA_prefix_add("02:01", "HLA-A*")

# "HLA-" is added by default.
HLA_prefix_add("A*02:01")
GLstring_regex() creates regex patterns that accurately search within GL
strings, preventing partial matches across field boundaries:
gl <- "HLA-A*02:01:01+HLA-A*68:01^HLA-B*07:01+HLA-B*15:01"

# A two-field search correctly matches the three-field allele.
pattern <- GLstring_regex("HLA-A*02:01")
stringr::str_detect(gl, pattern)

# But won't falsely match a longer allele number.
stringr::str_detect("HLA-A*02:149:01", GLstring_regex("HLA-A*02:14"))
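To see why a plain substring search is not enough, the base R sketch below contrasts a naive search with a boundary-aware pattern. The helper allele_pattern() is hypothetical: it shows one way to enforce a field boundary (a literal match that must not be followed by another digit) and makes no claim about the pattern GLstring_regex() actually generates.

```r
# Hypothetical helper: match the allele literally (\Q...\E), but reject the
# match when the last field continues with another digit.
allele_pattern <- function(allele) {
  paste0("\\Q", allele, "\\E(?![0-9])")
}

gl <- "HLA-A*02:01:01+HLA-A*68:01^HLA-B*07:01+HLA-B*15:01"

# The two-field allele is found inside the three-field allele.
hit_two_field <- grepl(allele_pattern("HLA-A*02:01"), gl, perl = TRUE)

# A naive substring search wrongly finds 02:14 inside 02:149.
naive_hit <- grepl("HLA-A*02:14", "HLA-A*02:149:01", fixed = TRUE)

# The boundary-aware pattern does not.
bounded_hit <- grepl(allele_pattern("HLA-A*02:14"), "HLA-A*02:149:01", perl = TRUE)

c(hit_two_field, naive_hit, bounded_hit)  # TRUE TRUE FALSE
```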
When working in the tidyverse, column names with dashes and asterisks are
inconvenient. HLA_column_repair() converts between WHO-standard (HLA-A*)
and tidyverse-friendly (HLA_A) formats:
# GLstring_genes returns tidyverse-friendly names by default.
repaired <- GLstring_genes(single_patient, "GL_string")
names(repaired)

# Convert back to WHO format with asterisks.
who_names <- HLA_column_repair(repaired, format = "WHO", asterisk = TRUE)
names(who_names)
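The renaming itself amounts to simple string substitution. The base R sketch below converts a few made-up column names between the two styles described above; it assumes the WHO-style names look exactly like "HLA-A*" and is not the logic HLA_column_repair() uses internally.

```r
# Hypothetical WHO-style column names.
who <- c("HLA-A*", "HLA-B*", "HLA-DRB1*")

# WHO -> tidyverse-friendly: drop the trailing asterisk, replace "-" with "_".
tidy <- gsub("-", "_", sub("*", "", who, fixed = TRUE), fixed = TRUE)

# Tidyverse-friendly -> WHO: restore the dash and append the asterisk.
back <- paste0(sub("_", "-", tidy, fixed = TRUE), "*")

tidy
back
```

Names such as HLA_A can be used unquoted in dplyr verbs, whereas HLA-A* would need backticks.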
The read_HML() function extracts GL strings from HML (HLA Markup Language)
files, which are a standard format for reporting HLA typing results from
next-generation sequencing:
# immunogenetr ships with two example HML files.
hml_path <- system.file("extdata", "HML_1.hml", package = "immunogenetr")
hml_result <- read_HML(hml_path)
hml_result
This library is intended for research use. Any application making use of this package in a clinical setting will need to be independently validated according to local regulations.