imgt_tcr_segment_prep: Prepare a data frame with TCR gene segment reference data...
In Close-your-eyes/igsc: Immunoglobolin sequences from scRNAseq

imgt_tcr_segment_prep

R Documentation

Prepare a data frame with TCR gene segment reference data from IMGT

Description

Immunoglobulin (IG) reference data from IMGT do not come in a handy format for processing in R. For T cell receptor (TCR) gene segments, this function uses data from IMGT (fasta files and one manually prepared table) to create a data frame that can subsequently be used to align TCR sequences from scRNAseq or other sources. All necessary files (human or mouse) are included in this package (as of Oct-2021) but may be downloaded manually from IMGT in case there are major updates. The included files can be retrieved with file.copy(list.files(system.file("extdata", "IMGT_ref", package = "igsc"), full.names = TRUE), "path/to/your/folder", recursive = TRUE). These files demonstrate the required file names and formats in case you want to provide updated data from IMGT.
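Copying the bundled reference files can be sketched as follows. This is a minimal sketch, not part of the package: the destination folder under tempdir() is a hypothetical placeholder, and the code assumes the bundled files sit in sub-folders of IMGT_ref, which is why recursive = TRUE is passed to file.copy().

```r
# Locate the reference files bundled with igsc; system.file() returns ""
# when the package or the folder cannot be found.
ref_dir <- system.file("extdata", "IMGT_ref", package = "igsc")

# Hypothetical destination; replace with your own folder.
target_dir <- file.path(tempdir(), "IMGT_ref_copy")
dir.create(target_dir, showWarnings = FALSE, recursive = TRUE)

if (nzchar(ref_dir)) {
  # full.names = TRUE yields full paths that file.copy() can resolve;
  # recursive = TRUE also copies sub-folders together with their contents.
  copied <- file.copy(
    from = list.files(ref_dir, full.names = TRUE),
    to = target_dir,
    recursive = TRUE
  )
  print(list.files(target_dir, recursive = TRUE))
} else {
  message("igsc reference files not found; is the package installed?")
}
```

The copied files can then serve as a template for the file names and formats expected in a custom folder.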

Usage

imgt_tcr_segment_prep(path, organism = "human", mc = F)

Arguments

`path`	path to a folder with all necessary files from IMGT; if not provided, the human or mouse data bundled with this package (downloaded roughly Oct-2021) will be used
`organism`	which bundled data set to use when no path is provided, either "human" or "mouse"
`mc`	logical; whether to use multicore processing (mclapply from the parallel package) for pairwise alignment of TCR segments
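A call that sets the arguments explicitly might look like the sketch below. It assumes the igsc package is installed and exports imgt_tcr_segment_prep; the variable names use_mc and imgt_mm are illustrative only. Because mclapply relies on forking, which is not available on Windows, the sketch enables mc only on other platforms.

```r
# mclapply() cannot run in parallel on Windows, so enable mc only elsewhere.
use_mc <- .Platform$OS.type != "windows"

if (requireNamespace("igsc", quietly = TRUE)) {
  # No path is given, so the mouse reference data bundled with igsc is used.
  imgt_mm <- igsc::imgt_tcr_segment_prep(organism = "mouse", mc = use_mc)
  str(imgt_mm, max.level = 1)
} else {
  message("Package 'igsc' is not installed; skipping the example call.")
}
```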

Details

To skip this function and obtain its output immediately, ready-made data frames are available with imgt_ref <- readRDS(system.file("extdata", "IMGT_ref/human/hs.rds", package = "igsc")) or imgt_ref <- readRDS(system.file("extdata", "IMGT_ref/mouse/mm.rds", package = "igsc")).
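Loading a ready-made data frame with a guard against a missing installation could look like this. It is a sketch only: the structure of the returned data frame is not described here, so the code merely prints its dimensions and column names.

```r
# system.file() returns "" if igsc or the file cannot be found.
rds_path <- system.file("extdata", "IMGT_ref/human/hs.rds", package = "igsc")

if (nzchar(rds_path)) {
  imgt_ref <- readRDS(rds_path)
  # Inspect the size and columns without assuming a particular layout.
  print(dim(imgt_ref))
  print(names(imgt_ref))
} else {
  message("hs.rds not found; is the igsc package installed?")
}
```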

Sources and how to prepare the data yourself:

Data for the xlsx files are from:
http://www.imgt.org/IMGTrepertoire/Proteins/proteinDisplays.php?species=human&latin=Homo%20sapiens&group=TRAV
http://www.imgt.org/IMGTrepertoire/Proteins/proteinDisplays.php?species=human&latin=Homo%20sapiens&group=TRBV
http://www.imgt.org/IMGTrepertoire/Proteins/proteinDisplays.php?species=house%20mouse&latin=Mus%20musculus&group=TRAV
http://www.imgt.org/IMGTrepertoire/Proteins/proteinDisplays.php?species=house%20mouse&latin=Mus%20musculus&group=TRBV

Fasta files are made from the data found at http://www.imgt.org/vquest/refseqh.html. Leader sequences are from the "L-PART1+L-PART2" artificially spliced sets, nucleotides (F+ORF+all P). The others are from the "L-PART1+V-EXON" artificially spliced sets and the "Constant gene artificially spliced exons sets". The fasta-formatted sequences found there have to be copied manually and saved as .fasta files in a folder. This folder then becomes the path argument.
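Before passing a self-prepared folder as path, it may help to check that its file names match those of the bundled reference folder. The helper below, missing_ref_files, is hypothetical and not part of igsc; the demonstration uses made-up file names in temporary folders so that it runs without the package.

```r
# Hypothetical helper: bundled file names that are absent from a custom folder.
missing_ref_files <- function(bundled_dir, custom_dir) {
  setdiff(list.files(bundled_dir), list.files(custom_dir))
}

# Self-contained demonstration with two temporary folders.
bundled_demo <- file.path(tempdir(), "bundled_demo")
custom_demo <- file.path(tempdir(), "custom_demo")
dir.create(bundled_demo, showWarnings = FALSE)
dir.create(custom_demo, showWarnings = FALSE)

# Made-up file names; the real names are those shipped with the package.
file.create(file.path(bundled_demo, c("a.fasta", "b.fasta")))
file.create(file.path(custom_demo, "a.fasta"))

demo_missing <- missing_ref_files(bundled_demo, custom_demo)
print(demo_missing)
```

With the package installed, bundled_dir would presumably be system.file("extdata", "IMGT_ref", "human", package = "igsc"). That folder also holds the precomputed hs.rds, so that name may be reported even when all input files are present.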

Value

A data frame with TCR gene segment reference data that can be used to align TCR sequences.

Examples

## Not run: 
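# Build the reference data frame from the bundled human data (no path given)
imgt_df <- imgt_tcr_segment_prep()
# Save the result as a spreadsheet and as an RDS file for later use
openxlsx::write.xlsx(imgt_df, "imgt_ref_df.xlsx")
saveRDS(imgt_df, "imgt_ref_df.rds")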

## End(Not run)

Close-your-eyes/igsc documentation built on Jan. 28, 2024, 10:28 p.m.