validateFile: validateFile


View source: R/DupChecker.R

Description

The function calculates the MD5 fingerprint of each file in the table and then checks whether any two files have the same fingerprint. Files with the same fingerprint are treated as duplicates. The function returns a table containing all duplicated files and their datasets.
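The duplicate check described above can be sketched in base R with `tools::md5sum()`. This illustrates the idea under stated assumptions and is not DupChecker's own code; the helper `find_duplicates_sketch()` and the dataset names `datasetA`/`datasetB` are hypothetical.

```r
# Hypothetical sketch of MD5-based duplicate detection (not DupChecker's code).
find_duplicates_sketch <- function(fileTable) {
  # One MD5 fingerprint per file; md5sum() returns a vector named by path
  fileTable$md5 <- unname(tools::md5sum(fileTable$file))
  # Fingerprints that occur more than once mark duplicated content
  dup_md5 <- unique(fileTable$md5[duplicated(fileTable$md5)])
  # Return every row whose fingerprint is shared with another file
  fileTable[fileTable$md5 %in% dup_md5, , drop = FALSE]
}

# Usage: three small files in a temporary directory, two with identical content
tmp <- tempfile("dupsketch")
dir.create(tmp)
f1 <- file.path(tmp, "a.txt"); writeLines("same content", f1)
f2 <- file.path(tmp, "b.txt"); writeLines("same content", f2)
f3 <- file.path(tmp, "c.txt"); writeLines("different", f3)
ft <- data.frame(dataset = c("datasetA", "datasetB", "datasetB"),
                 file = c(f1, f2, f3),
                 stringsAsFactors = FALSE)
dups <- find_duplicates_sketch(ft)
print(dups[, c("dataset", "md5")])
```

Grouping by the fingerprint rather than comparing file contents pairwise means each file is read only once.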

Usage

validateFile(fileTable, saveMd5File = TRUE)

Arguments

fileTable

a table with columns "dataset" and "file"; the "file" column should contain the full path of each file.

saveMd5File

logical; whether the calculated MD5 fingerprints should be saved to a local file.
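Before calling `validateFile()`, a table can be checked against the requirements above along these lines. `check_file_table()` is a hypothetical helper, not part of DupChecker; it assumes the table is a data frame and uses base R's `normalizePath()` to turn relative paths into full paths.

```r
# Hypothetical helper (not part of DupChecker): checks that a data frame has
# the "dataset" and "file" columns and that "file" holds full paths.
check_file_table <- function(fileTable) {
  required <- c("dataset", "file")
  missing_cols <- setdiff(required, colnames(fileTable))
  if (length(missing_cols) > 0) {
    stop("fileTable is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Expand relative paths to full paths; mustWork = TRUE raises an error
  # if any listed file does not exist
  fileTable$file <- normalizePath(fileTable$file, mustWork = TRUE)
  fileTable
}

# Usage with a single temporary file
tmpf <- tempfile()
writeLines("x", tmpf)
checked <- check_file_table(data.frame(dataset = "d1", file = tmpf,
                                       stringsAsFactors = FALSE))
print(checked)
```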

Value

A list containing two tables. The first table has three columns: "dataset", "file" and "md5". The second is the duplication table, in which each row corresponds to an MD5 fingerprint, each column corresponds to a dataset, and each cell holds the corresponding filename.
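To illustrate the layout of the duplication table, the following sketch reshapes a dataset/file/md5 table so that each row is a duplicated MD5 fingerprint and each column a dataset. `build_dup_table_sketch()`, the file paths and the shortened MD5 values are hypothetical, and joining several filenames in one cell with `;` is an assumption about how such cases might be shown.

```r
# Hypothetical sketch (not DupChecker's implementation): one row per duplicated
# MD5 fingerprint, one column per dataset, each cell holding file name(s).
build_dup_table_sketch <- function(md5Table) {
  # Keep only fingerprints shared by more than one file
  counts <- table(md5Table$md5)
  dup <- md5Table[md5Table$md5 %in% names(counts)[counts > 1], , drop = FALSE]
  md5s <- sort(unique(dup$md5))
  datasets <- sort(unique(dup$dataset))
  out <- matrix("", nrow = length(md5s), ncol = length(datasets),
                dimnames = list(md5s, datasets))
  for (i in seq_len(nrow(dup))) {
    r <- dup$md5[i]
    ds <- dup$dataset[i]
    fname <- basename(dup$file[i])
    # If one dataset holds several copies, join their names with ";"
    out[r, ds] <- if (out[r, ds] == "") fname else paste(out[r, ds], fname, sep = ";")
  }
  out
}

# Usage with made-up paths and shortened placeholder fingerprints
mt <- data.frame(dataset = c("datasetA", "datasetB", "datasetB"),
                 file = c("/data/A/x.CEL", "/data/B/y.CEL", "/data/B/z.CEL"),
                 md5 = c("aaa", "aaa", "bbb"),
                 stringsAsFactors = FALSE)
dt <- build_dup_table_sketch(mt)
print(dt)
```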

Examples

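# Build the dataset/file table expected by validateFile()
rootDir <- paste0(dirname(tempdir()), "/DupChecker")
datafile <- buildFileTable(rootDir = rootDir)
if (nrow(datafile) > 0) {
  # Compute MD5 fingerprints and look for duplicated files
  result <- validateFile(datafile)
  if (result$hasdup) {
    # Save the duplication table to the current working directory
    duptable <- result$duptable
    write.csv(duptable, file = "duptable.csv")
  }
}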


DupChecker documentation built on April 28, 2020, 6:46 p.m.