algebra | R Documentation |
Various “relational algebra”-like operations.
relation_projection(x, margin = NULL) relation_selection(x, subset) relation_cartesian(x, y, ...) relation_complement(x, y) relation_intersection(x, y, ...) relation_union(x, y, ...) relation_symdiff(x, y) relation_division(x, y) relation_remainder(x, y) relation_join(x, y, ...) relation_semijoin(x, y, ...) relation_antijoin(x, y, ...)
x, y |
Relation objects. |
margin |
Either a character vector of domain names, or an integer vector of domain indices. |
subset |
Expression resulting in a logical vector of length equal to the number of tuples in the graph. |
... |
Relation objects for |
These functions provide functionality similar to the corresponding operations defined in relational algebra theory as introduced by Codd (1970). Note, however, that domains in database relations, unlike the concept of relations we use here, are unordered. In fact, a database relation (“table”) is defined as a set of elements called “tuples”, where the “tuple” components are named, but unordered. So in fact, a “tuple” in this sense is a set of mappings from the attribute names into the union of the attribute domains.
The projection of a relation on a specified margin (i.e., a vector of domain names or indices) is the relation obtained when all tuples are restricted to this margin. As a consequence, duplicate tuples are removed.
The selection of a relation is the relation obtained by taking a subset of the relation graph, defined by some logical expression.
The Cartesian product of two relations is obtained by basically building the Cartesian product of all graph elements, but combining the resulting pairs into single tuples.
The union of two relations simply combines the graph elements of both relations; the complement of two relations R and S removes the tuples of S from R.
The intersection (symmetric difference) of two relations is the relation with all tuples they have (do not have) in common.
The division of relation R by relation S is the reversed Cartesian product. The result is a relation with the domain unique to R and containing the maximum number of tuples which, multiplied by S, are contained in R. The remainder of this operation is the complement of R and the division of R by S. Note that for both operations, the domain of S must be contained in the domain of R.
The (natural) join of two relations is their Cartesian product,
restricted to the subset where the elements of the common attributes
do match. The left/right/full outer join of two relations
R and S is
the union of R/S/R and S,
and the inner join of R and S.
The implementation uses merge()
, and so the
left/right/full outer joins are obtained by setting
all.x
/all.y
/all
to TRUE
in
relation_join()
.
The domains to be matched are specified using by
.
The left (right) semijoin of two relations R and S is the join of these, projected to the attributes of R (S). Thus, it yields all tuples of R (S) participating in the join of R and S.
The left (right) antijoin of two relations R and S is the complement of R (S) and the join of both, projected to the attributes of R (S). Thus, it yields all tuples of R (S) not participating in the join of R and S.
The operators %><%
, %=><%
, %><=%
,
%=><=%
, %|><%
, %><|%
, %|><|%
,
%|>%
, %<|%
, and %U%
can be used for the
Cartesian product, left outer join, right outer join, full outer join,
left semi-join, right semi-join, join, left antijoin, right antijoin,
and union, respectively.
E. F. Codd (1970), A relational model of data for large shared data banks. Communications of the ACM, 13/6, 377–387. doi: 10.1145/362384.362685.
relation()
## projection Person <- data.frame(Name = c("Harry", "Sally", "George", "Helena", "Peter"), Age = c(34, 28, 29, 54, 34), Weight = c(80, 64, 70, 54, 80), stringsAsFactors = FALSE) Person <- as.relation(Person) relation_table(Person) relation_table(relation_projection(Person, c("Age", "Weight"))) ## selection relation_table(R1 <- relation_selection(Person, Age < 29)) relation_table(R2 <- relation_selection(Person, Age >= 34)) relation_table(R3 <- relation_selection(Person, Age == Weight)) ## union relation_table(R1 %U% R2) ## works only for the same domains: relation_table(R2 | R3) ## complement relation_table(Person - R2) ## intersection relation_table(relation_intersection(R2, R3)) ## works only for the same domains: relation_table(R2 & R3) ## symmetric difference relation_table(relation_symdiff(R2, R3)) ## Cartesian product Employee <- data.frame(Name = c("Harry", "Sally", "George", "Harriet", "John"), EmpId = c(3415, 2241, 3401, 2202, 3999), DeptName = c("Finance", "Sales", "Finance", "Sales", "N.N."), stringsAsFactors = FALSE) Employee <- as.relation(Employee) relation_table(Employee) Dept <- data.frame(DeptName = c("Finance", "Sales", "Production"), Manager = c("George", "Harriet", "Charles"), stringsAsFactors = FALSE) Dept <- as.relation(Dept) relation_table(Dept) relation_table(Employee %><% Dept) ## Natural join relation_table(Employee %|><|% Dept) ## left (outer) join relation_table(Employee %=><% Dept) ## right (outer) join relation_table(Employee %><=% Dept) ## full outer join relation_table(Employee %=><=% Dept) ## antijoin relation_table(Employee %|>% Dept) relation_table(Employee %<|% Dept) ## semijoin relation_table(Employee %|><% Dept) relation_table(Employee %><|% Dept) ## division Completed <- data.frame(Student = c("Fred", "Fred", "Fred", "Eugene", "Eugene", "Sara", "Sara"), Task = c("Database1", "Database2", "Compiler1", "Database1", "Compiler1", "Database1", "Database2"), stringsAsFactors = FALSE) Completed <- as.relation(Completed) relation_table(Completed) DBProject <- data.frame(Task = c("Database1", "Database2"), stringsAsFactors = FALSE) DBProject <- as.relation(DBProject) relation_table(DBProject) relation_table(Completed %/% DBProject) ## division remainder relation_table(Completed %% DBProject)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.