Current version: 0.0.8
The goal of DAMATO
is to automatically assess the MS spreadsheets
recording in situ data by:
The present built-in spreadsheet format only supports the lab of Prof. Yunmei Li in Nanjing Normal University. If you want to customize the spreadsheet format, you can consult Prof. Li for more details via bishun1994@foxmail.com and cc to liyunmei@njnu.edu.cn.
You can install the released version of DAMATO from My Github with:
install.packages("devtools") # install devtools require
devtools::install_github('bishun945/DAMATO')
run data(DAMATO::dataset_format_1)
to see more details about one of
supported formats.
If you are getting start to data collection, use generage function like this:
library(DAMATO)
Generate_ref_spread()
Then a MS spreadsheet file will be created in fold spreads
in your
working directory. Just add your records to this file. Once you finish
your collection, use Check functions
to make sure the format is okay.
fn = file.choose() # select your spreadsheet file
res_1 = Check_sheet_name(fn)
res_2 = Check_sample_id(res_1)
res_3 = Check_base_info(res_2)
res_4 = Check_geo_location(res_3)
The above Check functions
are merged to one main function which could
be more convenient for users. But it is okay if you want to check a
specific item, such as sheet name
or sample id
.
It is better to use %>%
for these successional functions. res =
Check_sheet_name(fn) %>% Check_sample_id()
res = Check_format(fn)
When all format stuff are passed, the more important thing is to check
the quality of data set you collect. Use Check_param_quality()
and
Check_spectra_quality()
to inspect (interactively to some extent) the
quality and flag them by qualtiy_flag()
.
res_5 = Check_param_quality(res_4) # coming soon
res_6 = Check_spectra_quality(res_5) # coming soon
res_7 = Quatliy_flag(res_6) # coming soon
The first thing for DAMATO is creating a formatted spread if you have to type your paper records in PC. Try run the following function:
Generate_ref_spread()
Then you will get a supper cool, formatted, and version controlled
spread in your current working directory named spread
.
Now let us type our valuable data in it.
Once you have a spread need to be check, please use Check_*
functions
to check their format. Basically, if you have typed your data in the
generated spread by running Generate_ref_spread
, the format would be
pretty well.
fn = "/foo/bar/foobar.xlsx"
tmp <- Check_sheet_name(fn, excel_option = "readxl")
The result is shown as follows:
These warning messages mean the colnames in [base] sheet has some carriage returns or newlines. But this does not matter for the follow-up process. Thus, the number of warnings is 1.
fn = "/foo/bar/foobar.xlsx"
tmp <- Check_sheet_name(fn, excel_option = "readxl") %>%
Check_sample_id()
These messages mean that the first columns of head have been replaced by
SampleID
for matching the [base] sheet. However, in the [apc] sheet, several samples (masked) have been found in error SampleID format. Check the raw excel file and try to re-submit! The final message ofSample ID status:
presentsError
which means there is unable to do the follow-up process.
fn = "/foo/bar/foobar_revised.xlsx"
tmp <- Check_sheet_name(fn, excel_option = "readxl") %>%
Check_sample_id()
Once you have revised those SampleID with inappropriate format, the check function will return the
Pass
information which means you can do next step now.
fn = "/foo/bar/foobar_revised.xlsx"
tmp <- Check_sheet_name(fn, excel_option = "readxl") %>%
Check_sample_id() %>%
Check_base_info()
> Okay, in this check
result, you gonna find some interesting features that if colnames of the
[base] sheet to be checked cannot match the designed format, i.e.,
dataset_format = 1
, then Check_base_info
would return warnings and
errors indicating which colnames could not be found or have possible
matchings (by means of the fuzzy matching).
It is recommended that you could manually inspect your spread and check
the colnames in the [base] sheet following the guides/suggestions from
Try to find the matched pattern (just guess) ...
. If there are some
required colnames but do not exsit in the current spread, just
add/insert an empty column with those colnames.
And don’t remember those are just guess ;)
fn = "/foo/bar/foobar_revised_2.xlsx"
tmp <- Check_sheet_name(fn, excel_option = "readxl") %>%
Check_sample_id() %>%
Check_base_info()
Boom! After your second revision, the format in this spread has been passed now.
Check_*
functionsOnce you finish all format check functions, i.e., Check_sheet_name
,
Check_sample_id
, and Check_base_info
, the final result of those
functions will return a list that includes many elements that could be
further used by upcoming functions or features as well as your own
decision.
The Status_*
in that list all present Pass
which means the list has
been checked by the three corresponding functions.
The data.frame of this list, namely dt
, is restored data from the
spread in the filename fn
but with formatted data.frame.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.