ID-translation    R Documentation

Translate study identifiers from barcode to UUID and vice versa

Description

These functions take a character vector of identifiers and use the GDC API to translate TCGA barcodes to Universally Unique Identifiers (UUIDs) and vice versa. Because these relationships are not one-to-one, a data.frame is returned for all inputs. The UUID to TCGA barcode translation only applies to file and case UUIDs. Two-way UUID translation is available from 'file_id' to 'case_id' and vice versa. Please double-check any results before using these features for analysis. Case / submitter identifiers are translated by default; see the from_type argument for details. All identifiers are converted to lower case.
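Because the mappings are not one-to-one and identifiers come back in lower case, it can help to normalize the case of the inputs and group the results per input before using them. The following is a minimal sketch in base R that makes no API calls. The data.frame `res` is a toy stand-in for a translation result: its column names (`id`, `barcode`) and the barcode values are made up for illustration and are not the columns these functions actually return.

```r
# Inputs as a user might supply them: the first UUID is in upper case
input_ids <- c("B4BCE3FF-7FDC-4849-880B-56F2B348CEAC",
               "5ca9fa79-53bc-4e91-82cd-5715038ee23e")

# Toy stand-in for a translation result (hypothetical columns, made-up barcodes)
res <- data.frame(
    id = c("b4bce3ff-7fdc-4849-880b-56f2b348ceac",
           "b4bce3ff-7fdc-4849-880b-56f2b348ceac",
           "5ca9fa79-53bc-4e91-82cd-5715038ee23e"),
    barcode = c("TCGA-XX-0001-01A", "TCGA-XX-0001-10A", "TCGA-XX-0002-01A"),
    stringsAsFactors = FALSE
)

# Lower-case the inputs so they match the returned identifiers, then
# collect the (possibly several) barcodes that belong to each input
by_input <- split(res$barcode, factor(res$id, levels = tolower(input_ids)))

# Number of barcodes found per input, in the order the inputs were given
lengths(by_input)
```

Grouping with a factor whose levels are the lower-cased inputs keeps the input order and makes inputs without any match show up with length zero.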

Usage

UUIDtoBarcode(id_vector, from_type = c("case_id", "file_id", "aliquot_ids"))

UUIDtoUUID(id_vector, to_type = c("case_id", "file_id"))

barcodeToUUID(barcodes)

filenameToBarcode(filenames, slides = FALSE)

UUIDhistory(id, endpoint = .HISTORY_ENDPOINT)

Arguments

id_vector

character() A vector of UUIDs corresponding to either files or cases (default assumes case_ids)

from_type

character(1) Either case_id or file_id indicating the type of id_vector entered (default "case_id"). The Usage signature also lists "aliquot_ids" as an accepted value.

to_type

character(1) The desired UUID type to obtain, either "case_id" (default) or "file_id"

barcodes

character() A vector of TCGA barcodes

filenames

character() A vector of file names usually obtained from a GenomicDataCommons query

slides

logical(1L) DEPRECATED: Whether the provided file names correspond to slides, typically with an .svs extension. Note: The barcodes returned correspond 1:1 with the filename inputs. Always triple-check the output against the Genomic Data Commons Data Portal by searching for the file name and comparing the associated "Entity ID" with the submitter_id given by the function.

id

character(1) A UUID whose history of versions is sought

endpoint

character(1) Generally a constant pertaining to the location of the history API endpoint. This argument rarely needs to change.

Details

Based on the file UUID supplied, the appropriate entity_id (TCGA barcode) is returned. In previous versions of the package, the 'end_point' parameter required the user to specify what type of barcode was needed. This is no longer supported, as entity_id returns the appropriate one.

When providing slide file names, the function will only work if all the provided files are slide files with an .svs extension.
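Since slide translation expects every input to be an .svs file, a quick check before calling the function can catch mixed inputs early. This is a minimal sketch in base R; `all_svs` is a hypothetical helper, not part of the package, and the file names are made up. The match is case-sensitive here, which is an assumption about how strict the check should be.

```r
# Hypothetical helper (not part of the package): TRUE only when there is at
# least one file name and every file name ends in ".svs" (case-sensitive)
all_svs <- function(filenames) {
    stopifnot(is.character(filenames))
    length(filenames) > 0L && all(grepl("\\.svs$", filenames))
}

# Made-up file names for illustration
mixed <- c("example_slide_1.svs", "example_segments.txt")

# Keep only the slide files before passing them on
only_slides <- mixed[grepl("\\.svs$", mixed)]

all_svs(mixed)        # the vector mixes slide and non-slide files
all_svs(only_slides)  # the filtered vector holds slide files only
```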

Value

Generally, a data.frame of identifier mappings.

UUIDhistory: A data.frame containing a list of associated UUIDs for the given input along with file_change status, data_release versions, etc.
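The note under the slides argument says that filenameToBarcode returns barcodes 1:1 with the file name inputs. A minimal sketch of checking that property on a returned data.frame follows. `check_one_to_one` is a hypothetical helper, not part of the package; the column name "file_name" is an assumption about the result, and the file names and barcodes are toy values.

```r
# Hypothetical helper (not part of the package): TRUE when `res` has exactly
# one row per input file name. The column name "file_name" is an assumption.
check_one_to_one <- function(filenames, res, name_col = "file_name") {
    stopifnot(
        is.character(filenames),
        is.data.frame(res),
        name_col %in% names(res)
    )
    nrow(res) == length(filenames) &&
        anyDuplicated(res[[name_col]]) == 0L &&
        setequal(res[[name_col]], filenames)
}

# Toy data with made-up file names and barcodes, for illustration only
files_in <- c("example_a.seg.txt", "example_b.seg.txt")
res_ok <- data.frame(
    file_name = files_in,
    barcode = c("TCGA-XX-0001-01A", "TCGA-XX-0002-01A"),
    stringsAsFactors = FALSE
)
res_dup <- rbind(res_ok, res_ok[1, ])  # first input now appears twice

check_one_to_one(files_in, res_ok)   # one row per input
check_one_to_one(files_in, res_dup)  # row count no longer matches the inputs
```

A check like this only confirms the shape of the result; it does not replace comparing the barcodes against the Data Portal.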

Author(s)

Sean Davis, M. Ramos

Examples

## Translate UUIDs >> TCGA Barcode

uuids <- c("b4bce3ff-7fdc-4849-880b-56f2b348ceac",
    "5ca9fa79-53bc-4e91-82cd-5715038ee23e",
    "b7c3e5ad-4ffc-4fc4-acbf-1dfcbd2e5382")

UUIDtoBarcode(uuids, from_type = "file_id")

UUIDtoBarcode("ae55b2d3-62a1-419e-9f9a-5ddfac356db4", from_type = "case_id")

UUIDtoBarcode("d85d8a17-8aea-49d3-8a03-8f13141c163b", "aliquot_ids")

## Translate file UUIDs >> case UUIDs

uuids <- c("b4bce3ff-7fdc-4849-880b-56f2b348ceac",
    "5ca9fa79-53bc-4e91-82cd-5715038ee23e",
    "b7c3e5ad-4ffc-4fc4-acbf-1dfcbd2e5382")

UUIDtoUUID(uuids)

## Translate TCGA Barcode >> UUIDs

fullBarcodes <- c("TCGA-B0-5117-11A-01D-1421-08",
    "TCGA-B0-5094-11A-01D-1421-08",
    "TCGA-E9-A295-10A-01D-A16D-09")

sample_ids <- TCGAbarcode(fullBarcodes, sample = TRUE)

barcodeToUUID(sample_ids)

participant_ids <- c("TCGA-CK-4948", "TCGA-D1-A17N",
    "TCGA-4V-A9QX", "TCGA-4V-A9QM")

barcodeToUUID(participant_ids)

library(GenomicDataCommons)

### Query CNV data and get file names

cnv <- files() |>
    filter(
        ~ cases.project.project_id == "TCGA-COAD" &
        data_category == "Copy Number Variation" &
        data_type == "Copy Number Segment"
    ) |>
    results(size = 6)

filenameToBarcode(cnv$file_name)

### Query slides data and get file names

slides <- files() |>
    filter(
        ~ cases.project.project_id == "TCGA-BRCA" &
        cases.samples.sample_type == "Primary Tumor" &
        data_type == "Slide Image" &
        experimental_strategy == "Diagnostic Slide"
    ) |>
    results(size = 3)

filenameToBarcode(slides$file_name, slides = TRUE)

## Get the version history of a BAM file in TCGA-KIRC
UUIDhistory("0001801b-54b0-4551-8d7a-d66fb59429bf")


waldronlab/TCGAmisc documentation built on Dec. 19, 2024, 2:10 p.m.