find_similar: Count pairwise matches between columns of two data frames

View source: R/find_similar.R

find_similar  R Documentation

Description

Identifies columns in two data frames that might contain the same data. The results are only meaningful if the rows of the two data frames correspond to each other in some way, i.e. they are sorted appropriately.

Usage

find_similar(df1, df2 = NULL)

Arguments

df1, df2

Two data frames with the same number of rows. If df2 is not supplied (the default is NULL), only columns within df1 are compared with each other.
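
The two calling patterns described above might look like the following sketch. The toy data frames and their column names are made up for illustration, and the code assumes that the edwards package is installed (for example from the jedwards24/edwards GitHub repository) and that it exports find_similar(). The call is guarded so the script still runs without the package. No output is shown because the exact column names of the returned table are not stated on this page.

```r
# Toy data (hypothetical): rows are assumed to line up one-to-one
# between df1 and df2, which is what find_similar() relies on.
df1 <- data.frame(
  id = 1:5,
  a  = c(0, 2, 0, 4, NA),
  b  = c("x", "y", "z", "x", "y")
)
df2 <- data.frame(
  key = 1:5,
  a2  = c(0, 2, 1, 4, 5)
)

# Only call the function if the package is available.
if (requireNamespace("edwards", quietly = TRUE)) {
  # Compare columns of df1 against columns of df2.
  res_between <- edwards::find_similar(df1, df2)
  print(res_between)

  # With df2 left at its default, compare columns within df1 only.
  res_within <- edwards::find_similar(df1)
  print(res_within)
}
```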

Details

The returned table summarises the results with one row for each pair of columns whose classes match. For each pair it gives counts of: matches, pairs where both values are zero, pairs where one or both values are NA, and differences. The proportion of non-zero matches is also given. This is the number of non-zero matches divided by the number of element pairs that contain no NA and are not both zero. Excluding matches where both values are zero makes it easier to see genuinely similar columns in data that contains many zeroes or missing values.
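
To make the counts concrete, here is a minimal base R sketch that tallies them for a single pair of numeric vectors. It is not the package's implementation: count_pair() and its output names are hypothetical, and it assumes the four categories are mutually exclusive (a both-zero pair is counted separately from the non-zero matches), which the description above does not state explicitly.

```r
# Hypothetical helper, not part of the edwards package.
# Tallies the categories described above for two numeric vectors.
count_pair <- function(x, y) {
  stopifnot(length(x) == length(y))

  has_na        <- is.na(x) | is.na(y)           # one or both values NA
  both_zero     <- !has_na & x == 0 & y == 0     # both values are zero
  nonzero_match <- !has_na & x == y & !both_zero # equal, and not both zero
  differ        <- !has_na & x != y              # both present but unequal

  # Denominator: pairs with no NA that are not both zero.
  denom <- sum(!has_na & !both_zero)

  c(
    match_nonzero      = sum(nonzero_match),
    both_zero          = sum(both_zero),
    na                 = sum(has_na),
    diff               = sum(differ),
    prop_match_nonzero = if (denom > 0) sum(nonzero_match) / denom else NA_real_
  )
}

x <- c(1, 0, 3, NA, 5, 0)
y <- c(1, 0, 4, 2, 5, 7)
res <- count_pair(x, y)
print(res)
```

In this example there are two non-zero matches, one both-zero pair, one pair with an NA and two differences, so the proportion of non-zero matches is 2 / 4 = 0.5. Had the both-zero pair been counted as an ordinary match, the proportion would be 3 / 5, which shows how zero-heavy columns could otherwise look more similar than they are.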


jedwards24/edwards documentation built on Sept. 2, 2023, 8:16 a.m.