imgt_tcr_segment_prep: Prepare a data frame with TCR gene segment reference data...

imgt_tcr_segment_prep R Documentation

Prepare a data frame with TCR gene segment reference data from IMGT

Description

Immunoglobulin (IG) reference data from IMGT do not come in a format that is convenient to process in R. For T cell receptor (TCR) gene segments, this function uses data from IMGT (fasta files and one manually prepared table) to create a data frame that can subsequently be used to align TCR sequences from scRNAseq or other sources. All necessary files (human or mouse) are included in this package as of Oct-2021, but they may be downloaded manually from IMGT in case there are major updates. The included files can be retrieved with file.copy(list.files(system.file("extdata", "IMGT_ref", package = "igsc"), full.names = TRUE), 'path to your folder', recursive = TRUE). Here full.names = TRUE makes list.files return complete paths rather than bare names, and recursive = TRUE lets file.copy copy the species subfolders; the target folder must already exist. These files demonstrate the required file names and formats in case you want to provide updated data from IMGT.

Usage

imgt_tcr_segment_prep(path, organism = "human", mc = F)

Arguments

path

path to a folder with all necessary files from IMGT; if not provided, the human or mouse data included in this package (downloaded roughly Oct-2021) will be used

organism

if no path is provided, data will be taken from this package; either "human" or "mouse"

mc

logical; whether to use multicore processing (mclapply from the parallel package) for pairwise alignment of TCR segments

Details

To skip this function and obtain its output immediately, ready-made data frames are available with imgt_ref <- readRDS(system.file("extdata", "IMGT_ref/human/hs.rds", package = "igsc")) or imgt_ref <- readRDS(system.file("extdata", "IMGT_ref/mouse/mm.rds", package = "igsc")).
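The following is a minimal sketch of loading the ready-made human reference defensively. It relies only on the path given above and on base R; the variable name ref_file is chosen here for illustration. system.file returns an empty string when the package or file cannot be found, so checking for that avoids a less informative error from readRDS.

```r
# Locate the bundled human reference; system.file() returns "" when the
# package or the file is not available.
ref_file <- system.file("extdata", "IMGT_ref/human/hs.rds", package = "igsc")

if (nzchar(ref_file)) {
  imgt_ref <- readRDS(ref_file)
  # Inspect the structure before relying on specific columns
  str(imgt_ref, max.level = 1)
} else {
  message("igsc reference file not found; is the package installed?")
}
```

For mouse, the same pattern should apply with "IMGT_ref/mouse/mm.rds".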

Sources and how to prepare the data yourself.

Data for the xlsx files are from:
http://www.imgt.org/IMGTrepertoire/Proteins/proteinDisplays.php?species=human&latin=Homo%20sapiens&group=TRAV
http://www.imgt.org/IMGTrepertoire/Proteins/proteinDisplays.php?species=human&latin=Homo%20sapiens&group=TRBV
http://www.imgt.org/IMGTrepertoire/Proteins/proteinDisplays.php?species=house%20mouse&latin=Mus%20musculus&group=TRAV
http://www.imgt.org/IMGTrepertoire/Proteins/proteinDisplays.php?species=house%20mouse&latin=Mus%20musculus&group=TRBV

Fasta files are made from the data found here: http://www.imgt.org/vquest/refseqh.html. Leader sequences are from the "L-PART1+L-PART2" artificially spliced sets, nucleotides (F+ORF+all P). The others are from the "L-PART1+V-EXON" artificially spliced sets and the Constant gene artificially spliced exons sets. The fasta-formatted sequences from that page have to be copied manually and saved as .fasta files in a folder. The path to this folder is then passed as the path argument.
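Because the fasta files are copied by hand, a quick sanity check of the folder before passing it as path may be useful. The sketch below uses base R only; check_fasta_folder is a hypothetical helper that is not part of igsc, and the demo file name and sequences are made up. The file names that the function actually expects should be taken from the files bundled with the package.

```r
# Hypothetical helper (not part of igsc): list the .fasta files in a folder
# and count, per file, the header lines (those starting with ">").
check_fasta_folder <- function(path) {
  files <- list.files(path, pattern = "\\.fasta$", full.names = TRUE)
  n_headers <- vapply(files, function(f) {
    sum(startsWith(readLines(f, warn = FALSE), ">"))
  }, integer(1))
  data.frame(file = basename(files), n_sequences = unname(n_headers))
}

# Demo on a temporary folder holding one made-up fasta file
demo_dir <- file.path(tempdir(), "imgt_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c(">seq1", "ATGC", ">seq2", "GGCC"),
           file.path(demo_dir, "demo.fasta"))
fasta_check <- check_fasta_folder(demo_dir)
print(fasta_check)
```

A file reporting zero sequences was most likely saved without its ">" header lines and should be copied from IMGT again.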

Value

a data frame

Examples

## Not run: 
imgt_df <- imgt_tcr_segment_prep()
openxlsx::write.xlsx(imgt_df, "imgt_ref_df.xlsx")
saveRDS(imgt_df, "imgt_ref_df.rds")

## End(Not run)

Close-your-eyes/igsc documentation built on Jan. 28, 2024, 10:28 p.m.