Loading templates

Let's try to load the sequences from the following FASTA file containing germline cDNA from human immunoglobulin heavy chains :

fasta.file <- system.file("extdata", "IMGT_data", "templates", 
        "Homo_sapiens_IGH_functional_exon.fasta", package = "openPrimeR")

Please use an editor to get acquainted with the format of the file and then take a look at the template sequences using read_templates().

# Load the template sequences from 'fasta.file' and view the results
fasta.file <- system.file("extdata", "IMGT_data", "templates", 
        "Homo_sapiens_IGH_functional_exon.fasta", package = "openPrimeR")
# Load the template sequences from 'fasta.file'
fasta.file <- system.file("extdata", "IMGT_data", "templates", 
        "Homo_sapiens_IGH_functional_exon.fasta", package = "openPrimeR")
template.df <- read_templates(fasta.file)
asS3(template.df)

As you can see, the templates were successfully loaded; Please explore the loaded templates. You can change columns by using the top right and left arrows. Comparing the loaded templates with the original FASTA file, we can see that the metadata contained in the headers of the FASTA file were not annotated by read_templates().

Annotating templates with FASTA metadata

To store the metadata correctly, you can use the hdr.structure and delim arguments of read_templates(), where hdr.structure provides a character vector of identifiers for information contained in the header and delim is a single character that is used in the header of the FASTA file to delimit individual metadata. When supplying the keyword 'GROUP' in the hdr.structure vector, template groups are annotated with their groups, which is relevant for visualizing properties of the templates later. Now, it's your turn: Try to load the the template sequences with their annotated IGHV groups:

# Load the template sequences such that their groups are annotated correctly
fasta.file <- system.file("extdata", "IMGT_data", "templates", 
        "Homo_sapiens_IGH_functional_exon.fasta", package = "openPrimeR")
hdr.structure <- NULL # adjust according to the FASTA header
delim <- NULL # adjust according to the FASTA header
template.df <- read_templates(fasta.file, hdr.structure = hdr.structure, 
                               delim = delim)
# Load the template sequences such that their groups are annotated correctly
fasta.file <- system.file("extdata", "IMGT_data", "templates", 
        "Homo_sapiens_IGH_functional_exon.fasta", package = "openPrimeR")
hdr.structure <- c("ACCESSION", "GROUP", "SPECIES", "FUNCTION")
delim <- "|"
template.df <- read_templates(fasta.file, hdr.structure = hdr.structure, 
                                delim = "|")

Note that, if not specified otherwise via the id.column argument, the first entry of hdr.structure is used as the template identifier. We can verify that the template groups are available now via:

fasta.file <- system.file("extdata", "IMGT_data", "templates", 
        "Homo_sapiens_IGH_functional_exon.fasta", package = "openPrimeR")
hdr.structure <- c("ACCESSION", "GROUP", "SPECIES", "FUNCTION")
delim <- "|"
template.df <- read_templates(fasta.file, hdr.structure = hdr.structure, 
                                delim = "|")
print(template.df$Group)

Defining the primer binding regions

Next, we will move on to defining the binding region of the primers in the templates. Since we deal with immunological templates, we want to annotate the leader region for each sequence individually. For this purpose, we will use a FASTA file containing the leader sequences corresponding to the templates:

leader.fasta <- system.file("extdata", "IMGT_data", "templates", 
        "Homo_sapiens_IGH_functional_leader.fasta", package = "openPrimeR")

Again, you may want to take a look at the structure of the file before continuing. The entries in the file containing the individual binding regions should match those in the template FASTA file and the provided regions should be subsequences of the loaded sequences.

We can define the binding region in the templates using assign_binding_regions(); since we only care about binding of the forward primers, we will only adjust the forward binding region.

# Assign the forward binding regions from 'leader.fasta' to 'template.df':
leader.fasta <- system.file("extdata", "IMGT_data", "templates", 
        "Homo_sapiens_IGH_functional_leader.fasta", package = "openPrimeR")
# Assign the forward binding regions from 'leader.fasta' to 'template.df':
leader.fasta <- system.file("extdata", "IMGT_data", "templates", 
        "Homo_sapiens_IGH_functional_leader.fasta", package = "openPrimeR")
template.df <- assign_binding_regions(template.df, fw = leader.fasta, rev = NULL)

We can verify that the binding regions for forward primers were annotated successfully in the following way:

fasta.file <- system.file("extdata", "IMGT_data", "templates", 
        "Homo_sapiens_IGH_functional_exon.fasta", package = "openPrimeR")
hdr.structure <- c("ACCESSION", "GROUP", "SPECIES", "FUNCTION")
delim <- "|"
template.df <- read_templates(fasta.file, hdr.structure = hdr.structure, 
                                delim = "|")
leader.fasta <- system.file("extdata", "IMGT_data", "templates", 
        "Homo_sapiens_IGH_functional_leader.fasta", package = "openPrimeR")
template.df <- assign_binding_regions(template.df, fw = leader.fasta, rev = NULL)
utils::head(template.df$Allowed_fw)
cbind(utils::head(template.df$Allowed_Start_fw), utils::head(template.df$Allowed_End_fw))

where Allowed_fw contains the sequence of the leader region (the region before the exon) and Allowed_Start_fw and Allowed_End_fw provide the interval of allowed binding positions in the templates for forward primers.

Adjusting the template-specific binding regions

As a final step, we need to modify the binding region for forward primer such that the binding region includes the first position of the exon, which lies outside the leader. Hence, we will extend the current binding region by one position. For this purpose, adjust_binding_regions can be used. This function requires a modified binding range, relative to the previously annotated binding region. The relative positions are defined such that position 0 is the first position after the end of the annotated binding region. For example, assigning the interval [0, 30] would allow binding within the first 30 positions of the exon. Next please extend the binding regions annotation of template.df by one position via adjust_binding_regions():

# Extend the binding regions of 'template.df' by one position:
# Extend the binding regions of 'template.df' by one position:
template.df <- adjust_binding_regions(template.df, 
        c(-max(template.df$Allowed_End_fw), 0), NULL)
# Verify the new annotation:
head(cbind(template.df$Allowed_Start_fw, template.df$Allowed_End_fw))

Note that we have chosen the maximum relative position for the start of the binding region to not affect any change, while we have extended the end of the binding region to the 0-th position, where 0 indicates the start of the exon.



matdoering/openPrimeR documentation built on Feb. 11, 2024, 9:22 p.m.