knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(qfmtchk)
library(dplyr)

Disclaimer

Tests and experiments in coming up with format files are described.

SQL Program

The SQL export program can be used as one possible source of information. The following call produces a first table of columns and positions. The SQL program can be read internally using the following statement.

s_sql_export_prg <- system.file("extdata", "zws_gal_pgb.sql", package = "qfmtchk")
s_sql_routine <- "ExportGAL"
s_exp_pattern <- "PA_EXP.sFormat"
vec_prg <- qfmtchk:::read_sql_prg(ps_sql_prg_path    = s_sql_export_prg,
                                  ps_sql_exp_routine = s_sql_routine,
                                  ps_exp_pattern     = s_exp_pattern)
head(vec_prg)

The function extract_gal_fmt() can be used to directly extract the format information.

tbl_gal_fmt <- extract_gal_fmt(ps_sql_prg_path = s_sql_export_prg)
tbl_gal_fmt

Improvements

Based on the code some manual improvements have to be made. The following list shows all improvements.

  1. Row 3 contains the blood content of the animal
tbl_gal_fmt$ColName[3] <- "BlutanteilAnimal"
  1. Rows 4-6 are a duplicate of rows 7-9
tbl_gal_fmt <- tbl_gal_fmt %>% slice(-(4:6))

After the deletion of the rows, the start and the end positions must be re-computed. This can be done using the function compute_positions().

tbl_gal_fmt <- qfmtchk:::compute_positions(ptbl_fmt = tbl_gal_fmt)
  1. Row 6 corresponds to the blood content of the sire
tbl_gal_fmt$ColName[6] <- "BlutanteilVater"
  1. Row 9 corresponds to duration of pregancy
tbl_gal_fmt$ColName[9] <- "Traechtigkeitsdauer"

The above steps can also be done with the following two function calls

tbl_gal_fmt <- delete_row(ptbl_imp_fmt = tbl_gal_fmt, pvec_del_row_idx = c(4:6))
tbl_gal_fmt <- rename_variable(ptbl_imp_fmt = tbl_gal_fmt,
                            ptbl_col_rename = tibble::tibble(RowIndex = c(3,6,9),
                                                             NewName=c("BlutanteilAnimal", 
                                                                       "BlutanteilVater", 
                                                                       "Traechtigkeitsdauer")))

Check

knitr::kable(tbl_gal_fmt)

Output

Write an output-fmt file.

s_gal_fmt_file <- "gal_check.fmt"
if (file.exists(s_gal_fmt_file)) unlink(s_gal_fmt_file)
n_nr_fmt_rec <- nrow(tbl_gal_fmt)
# start with first line outside of loop
for (idx in 1:n_nr_fmt_rec){
  cat("# ", tbl_gal_fmt$ColName[idx], "\n", sep = "", file = s_gal_fmt_file, append = TRUE)
  cat("[", tbl_gal_fmt$StartPos[idx], "-", tbl_gal_fmt$EndPos[idx], "]\n", sep = "", file = s_gal_fmt_file, append = TRUE)
  cat("data_required=True \n\n", sep = "", file = s_gal_fmt_file, append = TRUE)
}

The same output can be produced by the following function call.

output_fmt(ps_fmt_outfile = s_gal_fmt_file, ptbl_fmt = tbl_gal_fmt)

The first few lines of the produced file is

cat(head(readLines(s_gal_fmt_file), n=10L), sep = "\n")

Conclusion

The above shown tests and experiments show that it is difficult to extract format information from the data export programs. Further experiments with extracting format information from analysis import programs are required.

cat(rmdhelp::get_latest_change(), "\n")


fbzwsqualitasag/qfmtchk documentation built on Dec. 20, 2021, 7:46 a.m.