SeqId | R Documentation |
The SeqId is the cornerstone used to uniquely identify SomaLogic analytes.
SeqIds follow the format <Pool>-<Clone>_<Version>; for example, "1234-56_7" can be represented as:
Pool | Clone | Version |
1234 | 56 | 7 |
See Details below for the definition of each sub-unit.
The <Pool>-<Clone> combination is sufficient to uniquely identify a
specific analyte and therefore versions are no longer provided (though
they may be present in legacy ADATs).
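As a rough illustration of the format above, the following base R sketch splits a SeqId into its sub-units. The helper name `split_seqid()` is hypothetical and not part of the package; it assumes its input is already a clean SeqId string, with or without a version.

```r
# Hypothetical helper (not a package function): split clean SeqIds
# of the form <Pool>-<Clone>[_<Version>] into their sub-units.
split_seqid <- function(x) {
  parts <- strsplit(x, "[-_]")
  # p[1:3] pads a missing version with NA
  out <- t(vapply(parts, function(p) p[1:3], character(3)))
  colnames(out) <- c("Pool", "Clone", "Version")
  data.frame(SeqId = x, out, stringsAsFactors = FALSE)
}

res <- split_seqid(c("1234-56_7", "1234-56"))
res
```

The second element has no version, so its Version entry is NA, which mirrors the statement that versions are optional.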
The tools below enable users to extract, test, identify, compare,
and manipulate SeqIds across assay runs and/or versions.
getSeqId(x, trim.version = FALSE)
regexSeqId()
locateSeqId(x, trailing = TRUE)
seqid2apt(x)
apt2seqid(x)
is.apt(x)
is.SeqId(x)
matchSeqIds(x, y, order.by.x = TRUE)
getSeqIdMatches(x, y, show = FALSE)
x |
Character. A vector of strings, usually analyte/feature column
names, |
trim.version |
Logical. Whether to remove the version number, i.e. "1234-56_7" -> "1234-56". Primarily for legacy ADATs. |
trailing |
Logical. Should the regular expression explicitly specify
trailing |
y |
Character. A second vector of |
order.by.x |
Logical. Order the returned character string by
the |
show |
Logical. Return the data frame visibly? |
Pool: | ties back to the original well during SELEX |
Clone: | ties to the specific sequence within a pool |
Version: | refers to custom modifications (optional/defunct) |
AptName: a SeqId combined with a string, usually a GeneId- or seq.-prefix,
for convenient, human-readable manipulation from within R.
getSeqId(): a character vector of SeqIds captured from a string.
regexSeqId(): a regular expression (regex) string pre-defined to match the SomaLogic SeqId pattern.
locateSeqId(): a data frame containing the start and stop integer positions for SeqId matches at each value of x.
seqid2apt(): a character vector with the seq.* prefix, i.e. the inverse of getSeqId().
apt2seqid(): a character vector of SeqIds. is.SeqId() will return TRUE for all elements.
is.apt(), is.SeqId(): Logical. TRUE or FALSE.
matchSeqIds(): a character vector of the values in y corresponding to the intersect of x and y. If no matches are found, character(0).
getSeqIdMatches(): an n x 2 data frame, where n is the length of the intersect of the matching SeqIds. The data frame is named by the passed arguments, x and y.
getSeqId(): extracts/captures the SeqId match from an analyte column identifier,
i.e. column name of an ADAT loaded with read_adat(). Assumes the SeqId pattern
occurs at the end of the string, which will be true in the vast majority of cases.
For edge cases, see the trailing argument to locateSeqId().
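To make the "end of the string" assumption concrete, here is a minimal base R sketch of trailing extraction. The pattern and the helper `extract_seqid()` are assumptions made for illustration only; they are not the package's own regular expression or implementation.

```r
# Assumed pattern (illustrative only): 4-5 digit pool, 1-3 digit clone,
# optional 1-3 digit version, anchored at the end of the string.
seqid_pattern <- "[0-9]{4,5}[-.][0-9]{1,3}([._][0-9]{1,3})?$"

# Hypothetical helper: capture the trailing match and normalise the
# separators to the <Pool>-<Clone>_<Version> form.
extract_seqid <- function(x) {
  m <- regexpr(seqid_pattern, x)
  out <- rep(NA_character_, length(x))
  out[as.integer(m) > 0L] <- regmatches(x, m)
  out <- sub("^([0-9]+)[-.]([0-9]+)", "\\1-\\2", out)  # pool-clone separator
  sub("[._]([0-9]+)$", "_\\1", out)                    # version separator
}

ids <- extract_seqid(c("ABDC.3948.48.2", "3948-88", "PlateId"))
ids
```

Strings without a trailing match, such as "PlateId", come back as NA in this sketch.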
regexSeqId(): generates a pre-formatted regular expression for matching SeqIds.
Note the trailing match, which is most commonly required, but locateSeqId()
offers an alternative to match anywhere in a string.
Used internally in many utility functions.
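The difference between a trailing match and a match anywhere in the string can be sketched with base R `grepl()`. Both patterns below are assumptions for illustration and may differ from what regexSeqId() actually returns.

```r
# Assumed patterns (illustrative only)
anywhere <- "[0-9]{4,5}[-.][0-9]{1,3}([._][0-9]{1,3})?"
trailing <- paste0(anywhere, "$")   # same pattern, anchored at the end

x <- c("seq.4959.2", "4959-2.ABC")

trailing_hits <- grepl(trailing, x)  # second string ends in letters
anywhere_hits <- grepl(anywhere, x)  # both strings contain the pattern
```

Here the second string contains a SeqId-like pattern but does not end with it, so only the unanchored pattern matches it.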
locateSeqId(): generates a data frame of the positional SeqId matches.
Specifically designed to facilitate SeqId extraction via substr().
Similar to stringr::str_locate().
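The pairing of positional matches with substr() can be sketched in base R as follows. The helper `locate_pattern()` and the pattern are hypothetical stand-ins, not the package's implementation.

```r
# Hypothetical helper: start/stop positions of the first match per element
locate_pattern <- function(x, pattern) {
  m <- regexpr(pattern, x)
  start <- as.integer(m)
  stop_pos <- start + attr(m, "match.length") - 1L
  no_match <- start < 0L
  start[no_match] <- NA_integer_
  stop_pos[no_match] <- NA_integer_
  data.frame(start = start, stop = stop_pos)
}

# Assumed trailing pattern (illustrative only)
pattern <- "[0-9]{4,5}[-.][0-9]{1,3}([._][0-9]{1,3})?$"
x <- c("ABDC.3948.48.2", "PlateId")

pos <- locate_pattern(x, pattern)
matched <- substr(x[1], pos$start[1], pos$stop[1])
```

Elements with no match get NA positions, so they can be filtered out before calling substr().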
seqid2apt(): converts a SeqId into anonymous-AptName format, i.e.
1234-56 -> seq.1234.56. Version numbers (1234-56_ver) are always trimmed when present.
apt2seqid(): converts an anonymous-AptName into SeqId format, i.e.
seq.1234.56 -> 1234-56. Version numbers (seq.1234.56.ver) are always trimmed when present.
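The two conversions described above can be approximated in base R as shown below. The helpers `to_apt()` and `to_seqid()` are hypothetical and only mimic the documented behaviour (including version trimming); they assume every input contains a SeqId.

```r
# Hypothetical stand-in for the SeqId -> anonymous-AptName direction
to_apt <- function(seqid) {
  core <- sub("_[0-9]+$", "", seqid)              # trim the version
  paste0("seq.", sub("-", ".", core, fixed = TRUE))
}

# Hypothetical stand-in for the AptName -> SeqId direction;
# assumes every element contains a <Pool>.<Clone> pattern
to_seqid <- function(apt) {
  core <- regmatches(apt, regexpr("[0-9]{4,5}\\.[0-9]{1,3}", apt))
  sub(".", "-", core, fixed = TRUE)               # version is never captured
}

apts <- to_apt(c("1234-56_7", "1234-56"))
seqs <- to_seqid(c("seq.1234.56", "ABC.1234.56.7"))
```

Because the version is dropped in both directions, a round trip through these helpers always lands on the <Pool>-<Clone> form.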
is.apt(): regular expression match to determine if a string contains
a SeqId, and thus is probably an AptName format string. Both
legacy EntrezGeneSymbol-SeqId combinations and the newer
so-called "anonymous-AptName" formats (seq.1234.45) are matched.
is.SeqId(): tests for SeqId format, i.e. values returned from getSeqId()
will always return TRUE.
matchSeqIds(): matches two character vectors on the basis of their
intersecting SeqIds. Note that elements in y that do not contain a SeqId
are silently dropped.
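The matching logic can be sketched in base R as follows. Both helpers are hypothetical; the sketch assumes a match is decided on the <Pool>-<Clone> key alone and returns values from the second vector, which may not mirror the package in every detail.

```r
# Hypothetical helper: normalised <Pool>-<Clone> key, NA when absent
seqid_key <- function(v) {
  m <- regexpr("[0-9]{4,5}[-.][0-9]{1,3}", v)
  key <- rep(NA_character_, length(v))
  key[as.integer(m) > 0L] <- sub(".", "-", regmatches(v, m), fixed = TRUE)
  key
}

# Hypothetical stand-in for matching on intersecting keys
match_by_seqid <- function(x, y, order_by_x = TRUE) {
  kx <- seqid_key(x)
  ky <- seqid_key(y)
  kx <- kx[!is.na(kx)]
  if (order_by_x) {
    idx <- match(kx, ky)                    # positions in y, in x order
  } else {
    idx <- which(!is.na(ky) & ky %in% kx)   # positions in y, in y order
  }
  y[idx[!is.na(idx)]]
}

x <- c("seq.4554.56", "seq.3714.49", "PlateId")
y <- c("Group", "3714-49", "Assay", "4554-56")

by_x <- match_by_seqid(x, y)
by_y <- match_by_seqid(x, y, order_by_x = FALSE)
```

Elements such as "Group" and "Assay" carry no key, so they never appear in the result of this sketch.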
getSeqIdMatches(): matches two character vectors on the basis of their intersecting SeqIds
only (irrespective of the GeneID-prefix). This produces a two-column
data frame which can then be used to map between the two sets.
The final order of the matches/rows is by the input
corresponding to the first argument (x).
By default the data frame is invisibly returned to
avoid dumping excess output to the console (see the show argument).
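A two-column mapping of this kind can be sketched in base R with `match()`. The helpers `pool_clone_key()` and `pair_by_seqid()` are hypothetical; the sketch returns its data frame visibly and does not reproduce the show argument or the argument-based column naming.

```r
# Hypothetical helper: normalised <Pool>-<Clone> key, NA when absent
pool_clone_key <- function(v) {
  m <- regexpr("[0-9]{4,5}[-.][0-9]{1,3}", v)
  key <- rep(NA_character_, length(v))
  key[as.integer(m) > 0L] <- sub(".", "-", regmatches(v, m), fixed = TRUE)
  key
}

# Hypothetical stand-in: pair elements of x and y that share a key,
# with rows ordered by the first argument
pair_by_seqid <- function(x, y) {
  kx <- pool_clone_key(x)
  idx <- match(kx, pool_clone_key(y))
  keep <- !is.na(kx) & !is.na(idx)   # guard against NA keys matching each other
  data.frame(x = x[keep], y = y[idx[keep]], stringsAsFactors = FALSE)
}

a <- c("AGR2.4959.2", "seq.3714.49", "PlateId")
b <- c("3714-49", "Group", "4959-2")
pairs <- pair_by_seqid(a, b)
pairs
```

The prefixes ("AGR2.", "seq.") play no role in the pairing; only the shared key does.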
Stu Field
intersect()
x <- c("ABDC.3948.48.2", "3948.88",
"3948.48.2", "3948-48_2", "3948.48.2",
"3948-48_2", "3948-88",
"My.Favorite.Apt.3948.88.9")
tibble::tibble(orig = x,
SeqId = getSeqId(x),
SeqId_trim = getSeqId(x, TRUE),
AptName = seqid2apt(SeqId))
# Logical Matching
is.apt("AGR2.4959.2") # TRUE
is.apt("seq.4959.2") # TRUE
is.apt("4959-2") # TRUE
is.apt("AGR2") # FALSE
# SeqId Matching
x <- c("seq.4554.56", "seq.3714.49", "PlateId")
y <- c("Group", "3714-49", "Assay", "4554-56")
matchSeqIds(x, y)
matchSeqIds(x, y, order.by.x = FALSE)
# vector of features
feats <- getAnalytes(example_data)
match_df <- getSeqIdMatches(feats[1:100], feats[90:500]) # 11 overlapping
match_df
a <- utils::head(feats, 15)
b <- withr::with_seed(99, sample(getSeqId(a))) # => SeqId & shuffle
(getSeqIdMatches(a, b)) # sorted by first vector "a"