xlt2qq: xlt2qq

Description

xlt2qq() converts a vector of 128-bit numbers (hexlets), given as plain hexadecimal numbers, UUIDs, IPv6 addresses, or MD5 hashes, to QQIDs.

Usage

xlt2qq(xlt)

Arguments

xlt

(character) a vector of UUIDs, MD5 hashes, IPv6 addresses, or, more generally, 32-digit hexadecimal numbers

Value

(character) a vector of QQIDs

128-bit numbers and QQIDs

UUIDs, IPv6 addresses and MD5 hashes are specially formatted 128-bit numbers, referred to here as hexlets. Randomly chosen 128-bit numbers have a collision probability that is small enough to make them useful as (practically) unique identifiers in applications where centralized management of IDs is not feasible or not desirable. However, since they are long strings of numerals and letters without overt semantic content, they are hard to distinguish by eye. This creates difficulties when developing or debugging with structured data, or when curating ID-tagged information. The qqid package provides tools to convert the leading 20 bits of a 128-bit number to two "Q-words", and the remaining 108 bits to a string of 18 Base64 encoded characters. The Q-words (the letter Q evokes the word "cue", i.e. a hint or mnemonic) define a unique and invertible mapping to the 2^10 integers in the range (0, 1023). Thus two Q-words can encode 20 bits, or 5 hexadecimal digits:

.
.          [0-9a-f]    [0-9a-f]    [0-9a-f]    [0-9a-f]    [0-9a-f]
.  hex:  |--0x[1]--| |--0x[2]--| |--0x[3]--| |--0x[4]--| |--0x[5]--|
.  bit:  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
.        |----------int[1]-----------| |----------int[2]-----------|
.  int:            (0, 1023)                      (0,1023)
.    Q:      (aims, ..., zone)     .       (aims, ..., zone)     .   Base64...
.
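
The arithmetic behind the diagram can be sketched in base R. This is a minimal illustration only, not the package's internal code; hexHead is a hypothetical example value, and how the resulting integers index the Q-word vector (for instance int + 1 in a 1-based R vector) is an assumption not confirmed by this page.

```r
# Split the five leading hex digits of a hexlet into two 10-bit integers.
# Base R only; hexHead is a hypothetical example value.
hexHead <- "0c460"                    # first five hex digits (20 bits)
n    <- strtoi(hexHead, base = 16L)   # 20-bit integer in (0, 1048575)
int1 <- n %/% 1024L                   # high 10 bits, in (0, 1023)
int2 <- n %%  1024L                   # low 10 bits,  in (0, 1023)
c(int1, int2)                         # 49 96

# The split is invertible: recombining the integers restores the digits.
sprintf("%05x", int1 * 1024L + int2) == hexHead   # TRUE
```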
  

Process

Input strings are first converted to plain hexadecimal strings: a leading "0x" is deleted, the "-" and ":" separators of UUIDs and IPv6 addresses respectively are deleted, and all letters are converted to lower case. It is an error if the result is not exactly a 32-digit hexadecimal string matching "[0-9a-f]{32}". The first five hexadecimal digits are interpreted as two 10-bit numbers, which are mapped as indices into the 1024-element Q-word vector. The QQID has two Q-words as its head, representing digits 1:5 of the input, and 18 Base64 encoded characters, representing digits 6:32 of the input, as its tail. Since the mapping is fully reversible, QQIDs have exactly the same statistical properties as the input. For details on the QQID format see is.QQID().
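
The normalization step described above could look roughly like the following. normalizeHexlet() is a hypothetical helper written for illustration; it follows the order of operations stated in the text but is not taken from the package source, and unlike xlt2qq() it makes no attempt to handle NA.

```r
# Hypothetical helper: reduce UUID, IPv6, MD5 or "0x"-prefixed input to a
# plain lower-case 32-digit hex string, or stop with an error.
normalizeHexlet <- function(x) {
  h <- sub("^0x", "", x)           # delete a leading "0x"
  h <- gsub("[-:]", "", h)         # delete UUID and IPv6 separators
  h <- tolower(h)                  # convert all letters to lower case
  ok <- grepl("^[0-9a-f]{32}$", h) # exactly 32 hex digits?
  if (!all(ok)) {
    stop("not a 32-digit hexadecimal number: ",
         paste(x[!ok], collapse = ", "))
  }
  h
}

# The same 128-bit number in three notations
ids <- c("0c460ed3-b015-adc2-ab4a-01e093364f1f",
         "0C46:0ED3:B015:ADC2:AB4A:01E0:9336:4F1F",
         "0x0c460ed3b015adc2ab4a01e093364f1f")
normalizeHexlet(ids)               # three identical 32-digit strings
```

Anchoring the pattern with "^" and "$" makes the check strict: a string with 33 hex digits, or one containing stray characters, is rejected rather than partially matched.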

Input formats

A hexlet comprises 16 octets and is written in the hexadecimal numeral convention. A canonical MD5 hash is such a string of 32 hexadecimal characters. To improve readability, separators are inserted into UUIDs: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", where "x" is a hexadecimal digit. A canonical expanded IPv6 address has the form "xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx", where "x" is a hexadecimal digit. Conventions exist to omit leading zeros in IPv6 addresses, but such shortened addresses are treated as an error; it is up to the user to expand them correctly before processing. There are many representations of hexadecimal numbers; most commonly they carry the prefix "0x", which xlt2qq() deletes. xlt2qq() converts all letters to lowercase on input.
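
Since shortened IPv6 addresses must be expanded by the user, a helper along the following lines may be useful. expandIPv6() is a hypothetical function, not part of qqid; it is a sketch that assumes a single address containing only hexadecimal groups, with omitted leading zeros and at most one "::". It does not cover embedded IPv4 notation or zone identifiers.

```r
# Hypothetical helper: expand one shortened IPv6 address to the canonical
# form of eight groups of four hex digits.
expandIPv6 <- function(addr) {
  stopifnot(is.character(addr), length(addr) == 1L)
  groups <- function(s) {
    if (nzchar(s)) strsplit(s, ":", fixed = TRUE)[[1]] else character(0)
  }
  pos <- as.integer(regexpr("::", addr, fixed = TRUE))
  if (pos > 0L) {                       # "::" stands for a run of zero groups
    left  <- groups(substr(addr, 1L, pos - 1L))
    right <- groups(substr(addr, pos + 2L, nchar(addr)))
    nFill <- 8L - length(left) - length(right)
    stopifnot(nFill >= 1L)
    g <- c(left, rep("0", nFill), right)
  } else {
    g <- groups(addr)
  }
  stopifnot(length(g) == 8L, all(grepl("^[0-9A-Fa-f]{1,4}$", g)))
  g <- paste0(strrep("0", 4L - nchar(g)), g)   # restore leading zeros
  paste(g, collapse = ":")
}

expandIPv6("2001:db8::ff00:42:8329")
# "2001:0db8:0000:0000:0000:ff00:0042:8329"
```

The expanded string can then be passed to xlt2qq() like any other canonical IPv6 address.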

Endianness

The qqid package uses its own functions to convert to and from bits, and is not affected by big-endian vs. little-endian processor architecture or variant byte order. All numbers are interpreted to have their lowest order digits on the right.
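
One architecture-independent way to obtain bits with the lowest-order digit on the right is plain integer arithmetic on each hexadecimal digit. The function below, hex2bits(), is a hypothetical illustration of that idea and makes no claim to match the package's own conversion functions.

```r
# Hypothetical helper: convert a hex string to a vector of 0/1 integers,
# highest-order bit first, using only integer division and modulo.
hex2bits <- function(h) {
  digits <- strtoi(strsplit(h, "")[[1]], base = 16L)  # one integer per digit
  # For each digit, test the 8, 4, 2 and 1 place in that order.
  bits <- vapply(digits,
                 function(d) (d %/% c(8L, 4L, 2L, 1L)) %% 2L,
                 integer(4))
  as.vector(bits)   # column-wise flattening keeps the digit order
}

b <- hex2bits("0c460")
b
# 0 0 0 0 1 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0

# Reading the two 10-bit halves as numbers, highest-order bit on the left
c(sum(b[1:10] * 2^(9:0)), sum(b[11:20] * 2^(9:0)))   # 49 96
```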

Author(s)

(c) 2019 Boris Steipe, licensed under MIT (see file LICENSE in this package).

See Also

qq2uu() to convert a vector of QQIDs to UUIDs.

Examples

# Convert three example hexlets and one NA to the corresponding QQIDs
xlt2qq( c(xltIDexample(c(1, 3, 5)), NA) )

# A random hex-string is converted into a valid QQID
(x <- paste0(sample(c(0:9, letters[1:6]), 32, replace=TRUE), collapse=""))
(x <- xlt2qq(x))
is.QQID(x)                    # TRUE

# forward and back again
myID <- "0c460ed3-b015-adc2-ab4a-01e093364f1f"
myID == qq2uu(xlt2qq(myID))   # TRUE

# Confirm that the example hexlets are converted correctly
xlt2qq( xltIDexample(1:5) ) == QQIDexample(1:5)  # TRUE TRUE TRUE TRUE TRUE

qqid documentation built on May 2, 2019, 12:19 p.m.