repLoad: Load immune repertoire files into the R workspace


View source: R/io.R

Description

The repLoad function loads repertoire files into the R workspace in the immunarch format, ready for immediate analysis. repLoad automatically detects the format of your files, so all you need to do is provide the path to them.

See "Details" for more information on supported formats. See "Examples" to get started right away.

Usage

repLoad(.path, .format = NA, .mode = "paired", .coding = TRUE)

Arguments

.path

A character string specifying the path to the input data. Input data can be one of the following:

- a single repertoire file. In this case repLoad returns an R data.frame;

- a vector of paths to repertoire files. The result is the same as for a folder without a metadata file, described in the next item;

- a path to the folder with repertoire files and, if available, the metadata file "metadata.txt". If the metadata file is present, repLoad returns a list with two elements, "data" and "meta". "data" is another list containing the repertoires as R data frames. "meta" is a data frame with the metadata. If the metadata file "metadata.txt" is not present, repLoad creates dummy metadata containing the sample names and returns a list with the same two elements, "data" and "meta". If the input data has multiple chains or cell types stored in the same file (as in 10xGenomics repertoire files, for example), such repertoire files are split into separate R data frames, each containing only one chain type and one cell type. The metadata will then have additional columns specifying the cell and chain types of the different samples.
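The three input forms above can be sketched as follows. This is a minimal illustration under stated assumptions: the folder and the file names "sample_A.tsv" and "sample_B.tsv" are hypothetical placeholders created only to show how the paths are built, so the repLoad calls are left commented out and should be pointed at real repertoire files.

```r
# Hypothetical folder with two empty placeholder files (illustration only).
repertoire_dir <- file.path(tempdir(), "repertoires")
dir.create(repertoire_dir, showWarnings = FALSE)
file.create(file.path(repertoire_dir, c("sample_A.tsv", "sample_B.tsv")))

# Collect the full paths of the files in the folder as a character vector.
paths <- list.files(repertoire_dir, pattern = "\\.tsv$", full.names = TRUE)

# With real repertoire files in place, each input form is passed as .path:
# immdata <- immunarch::repLoad(paths[1])        # a single repertoire file
# immdata <- immunarch::repLoad(paths)           # a vector of paths
# immdata <- immunarch::repLoad(repertoire_dir)  # a folder, optionally with metadata.txt
```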

.format

A character string specifying what format to use. Do NOT use it. See "Details" for more information on supported formats.

Leave NA (which is default) if you want 'immunarch' to detect formats automatically.

.mode

Either "single" for single chain data or "paired" for paired chain data.

Currently "single" works for every format, and "paired" works only for 10X Genomics data.

By default, 10X Genomics data will be loaded as paired chain data, and other files will be loaded as single chain data.

.coding

A logical value. Pass TRUE to get coding-only clonotypes (the default). Pass FALSE to get all clonotypes.

Details

The metadata has to be a tab-delimited file whose first column is named "Sample". It can have any number of additional columns with arbitrary names. The first column should contain the base names, without extensions, of the files in your folder. Example:

Sample       Sex  Age  Status
immunoseq_1  M    1    C
immunoseq_2  M    2    C
immunoseq_3  F    3    A
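A minimal sketch of preparing such a file with base R. The file names and metadata values are hypothetical and serve only to illustrate the layout; the file is written to a temporary directory here, whereas in practice it would go into the folder that holds the repertoire files.

```r
# Hypothetical repertoire file names; only their base names matter here.
repertoire_files <- c("immunoseq_1.txt", "immunoseq_2.txt", "immunoseq_3.txt")

# The "Sample" column holds the base names of the files without extensions.
metadata <- data.frame(
  Sample = tools::file_path_sans_ext(basename(repertoire_files)),
  Sex    = c("M", "M", "F"),
  Age    = c(1, 2, 3),
  Status = c("C", "C", "A"),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file named exactly "metadata.txt".
metadata_path <- file.path(tempdir(), "metadata.txt")
write.table(metadata, metadata_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to check that the first column is "Sample".
check <- read.delim(metadata_path, stringsAsFactors = FALSE)
print(names(check))
```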

repLoad has the ".format" argument that sets the format for input repertoire files. Immunarch detects the file format automatically, and the argument is kept only for compatibility purposes. It will soon be removed. Do not pass it, or your code will stop working!

Currently, Immunarch supports the following formats:

- "immunoseq" - ImmunoSEQ of any version. http://www.adaptivebiotech.com/immunoseq

- "mitcr" - MiTCR. https://github.com/milaboratory/mitcr

- "mixcr" - MiXCR (the "all" files) of any version. https://github.com/milaboratory/mixcr

- "migec" - MiGEC. http://migec.readthedocs.io/en/latest/

- "migmap" - For parsing IgBLAST results postprocessed with MigMap. https://github.com/mikessh/migmap

- "tcr" - tcR, our previous package. https://imminfo.github.io/tcr/

- "vdjtools" - VDJtools of any version. http://vdjtools-doc.readthedocs.io/en/latest/

- "imgt" - IMGT HighV-QUEST. http://www.imgt.org/HighV-QUEST/

- "airr" - adaptive immune receptor repertoire (AIRR) data format. http://docs.airr-community.org/en/latest/datarep/overview.html

- "10x" - 10xGenomics clonotype annotation tables. https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/output/annotation

- "archer" - ArcherDX clonotype tables. https://archerdx.com/

Value

A list with two named elements:

- "data" is a list of input samples;

- "meta" is a data frame with sample metadata.
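A minimal sketch of working with a result of this shape. The list below is a toy stand-in built by hand, not actual repLoad output: the sample names, sequences and counts are made up, and real repertoire data frames contain many more columns than the two shown.

```r
# Toy stand-in for the list returned by repLoad (illustration only).
immdata <- list(
  data = list(
    immunoseq_1 = data.frame(Clones = c(10, 5), CDR3.aa = c("CASSLGTDTQYF", "CASSPGQGNEQFF")),
    immunoseq_2 = data.frame(Clones = c(7, 2), CDR3.aa = c("CASSLGTDTQYF", "CASRDRGYEQYF")),
    immunoseq_3 = data.frame(Clones = c(4, 1), CDR3.aa = c("CASSPGQGNEQFF", "CASSQETQYF"))
  ),
  meta = data.frame(
    Sample = c("immunoseq_1", "immunoseq_2", "immunoseq_3"),
    Status = c("C", "C", "A"),
    stringsAsFactors = FALSE
  )
)

# Access one repertoire by its sample name.
first_sample <- immdata$data[["immunoseq_1"]]

# Use the metadata to select the repertoires of one group of samples.
control_samples <- immdata$meta$Sample[immdata$meta$Status == "C"]
control_data <- immdata$data[control_samples]
print(names(control_data))
```

Because the names of "data" match the "Sample" column of "meta", the metadata can be used directly to index the list of repertoires.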

See Also

immunr_data_format for the immunarch data format; repSave for saving files; repOverlap, geneUsage and repDiversity for getting started with basic immune repertoire statistics.

Examples

# To load the data from a single file (note that you don't need to specify the data format):
file_path <- paste0(system.file(package = "immunarch"), "/extdata/io/Sample1.tsv.gz")
immdata <- repLoad(file_path)

# Suppose you have the following structure in your folder:
# >_ ls
# immunoseq1.txt
# immunoseq2.txt
# immunoseq3.txt
# metadata.txt

# To load the whole folder with every file in it, type:
file_path <- paste0(system.file(package = "immunarch"), "/extdata/io/")
immdata <- repLoad(file_path)
print(names(immdata))

# We recommend creating a metadata file named exactly "metadata.txt" in the folder.

# In that case, when you load your data you will see:
# > immdata <- repLoad("path/to/your/folder/")
# > names(immdata)
# [1] "data" "meta"

# If you do not have "metadata.txt", you will see the same output,
# but your metadata will be almost empty:
# > immdata <- repLoad("path/to/your/folder/")
# > names(immdata)
# [1] "data" "meta"

abrown435/immunarch-test documentation built on July 29, 2020, 12:04 a.m.