The query is constructed on 3 different levels (nested queries):
Level 1: construct simple counts for co-occurence of values in both columns
Level 2: aggregate coocurences, drop the values
Level 3: calculate the statistics of co-occurences
1 2 | sql.entity.relation.generic(query1, query1.colname, query1.countname = "cnt",
query2, query2.colname, query2.countname = "cnt")
|
The purpose of Level 1 query is to perform simple counting of distinct values in both columns resulting in the following table:
eone the value from combined list of non-NULL values from column1 and column2
aone.cnt number of occurences of the eone
in column1 or NULL
in case of no occurences
atwo.cnt number of occurences of the eone
in column2 or NULL
in case of no occurences
The purpose of Level 2 query is to perform simple processing of Level 1 query and dropping actual dictionary values, which results in the following table (described only in part concerning column1 - the part concerning column2 is symmetrical) consisting of 8 columns:
aone_cnt number of occurences of the value in column1, 0 or more
aone_cnt_ind occurence indicator of the value in column1, 0 or 1
aone_cnt_nullable number of occurences of the value in column1 as provided by Level 1 query (exact copy of aone.cnt
)
aone_cnt_both_present serving the purpose of a correlator of the counts from both columns:
it is NULL
when atwo.cnt is NULL, otherwise it is aone.cnt
The purpose of Level 3 query is to aggregate results of Level 2 query into single row of the following structure:
distvals_cnt
: number of distinct non-NULL values in column1 and column2 (combined)
f_distentities_frac
: fraction of (combined) distinct values in column1 (first)
s_distentities_frac
: fraction of (combined) distinct values in column2 (second)
distentities_match_frac
: fraction of matching values (1+ – 1+ relationships)
f_mincnt
, f_avgcnt
, f_maxcnt
, f_sdevcnt
:
min
, mean
, max
and sd
of distinct value counts in column1 (first)
s_mincnt
, s_avgcnt
, s_maxcnt
, f_sdevcnt
:
min
, mean
, max
and sd
of distinct value counts in column2 (second)
f_sumcnt
: number of non-NULL rows in column1
s_sumcnt
: number of non-NULL rows in column2
f_avgcnt_when_present
, f_sdevcnt_when_present
:
mean
and sd
of distinct value counts in column1 (first) when they are present
s_avgcnt_when_present
, s_sdevcnt_when_present
:
mean
and sd
of distinct value counts in column2 (second) when they are present
f_mincnt_both_present
, f_avgcnt_both_present
, f_maxcnt_both_present
, f_sdevcnt_both_present
:
min
, mean
, max
and sd
of of distinct value counts in column1 (first) when there are matching rows in column2
s_mincnt_both_present
, s_avgcnt_both_present
, s_maxcnt_both_present
, s_sdevcnt_both_present
:
min
, mean
, max
and sd
of of distinct value counts in column2 (second) when there are matching rows in column1
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.