state_transition: Parse reduced input text array and write outputs depending on...
In langfob/docaids: Package For Helping Document Local Variables In R


Description

This function takes an input array of strings, each one a line from the input file, and parses it into chunks corresponding to functions and variables. As it encounters each chunk, it writes out a properly formatted roxygen subsection block that can be pasted into the function's documentation. It also writes a short header for the section. Each variable is written as its own subsection.
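The exact header text and layout that state_transition() writes are not reproduced in this page. As a minimal sketch, assuming roxygen's `@section` tag and the Rd `\subsection{}{}` macro, the hypothetical helpers below show what a pasteable block with one subsection per variable could look like. The helper names and the section title "Local Variables" are illustrative assumptions, not the package's output.

```r
# Hypothetical helpers: the section title and layout are assumptions,
# not the package's actual output format.
format_var_subsection <- function(var_name, desc_lines) {
  # One Rd \subsection{name}{description} per variable, as roxygen lines.
  c(paste0("#' \\subsection{", var_name, "}{"),
    paste0("#' ", desc_lines),
    "#' }")
}

format_local_vars_section <- function(vars) {
  # vars: named list; names are variable names, values are character
  # vectors holding the description lines for that variable.
  header <- c("#' @section Local Variables:", "#'")
  body <- unlist(lapply(names(vars), function(v) {
    format_var_subsection(v, vars[[v]])
  }), use.names = FALSE)
  c(header, body)
}

out <- format_local_vars_section(list(
  x = "Numeric vector of input values.",
  n = c("Number of entries in x,", "computed once up front.")
))
cat(out, sep = "\n")
```

Keeping each variable in its own subsection means a single variable's entry can be deleted from the pasted block without disturbing the others.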

Usage

state_transition(all_data, doc_line_numbers)

Arguments

`all_data`	Vector of character strings, each entry containing the text of the corresponding line in the original input file.
`doc_line_numbers`	Integer vector of line numbers that indexes the reduced set of lines of interest in the original data: each entry is the position in all_data of one line to be parsed. For example, if lines 1 and 2 of the original input are irrelevant and line 3 is the first useful line for parsing, then doc_line_numbers[1] = 3.
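A small sketch can make the relationship between the two arguments concrete. How the package actually selects the lines of interest is not described here, so the filter below (keep nonblank lines that do not start with `#`) is purely an illustrative assumption.

```r
# Toy input: lines 1 and 2 are irrelevant; line 3 is the first useful line.
all_data <- c("# some preamble",
              "",
              ">>>> START doc_vars_in_this_func >>>>",
              "my_func <- function(x, n)",
              "x  Numeric input vector.",
              "<<<< END doc_vars_in_this_func <<<<")

# Assumption for illustration only: the lines of interest are the nonblank
# lines that do not start with "#". The package may select them differently.
doc_line_numbers <- which(nzchar(all_data) & !startsWith(all_data, "#"))

doc_line_numbers[1]              # 3
all_data[doc_line_numbers[1]]    # ">>>> START doc_vars_in_this_func >>>>"
```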

Value

Nothing is returned; the function is called for its side effect of writing the formatted output.

Assumptions

Operator begins START and END lines: Each relevant section of the input file begins with a line containing ">>>> START doc_vars_in_this_func >>>>" and ends with a line containing "<<<< END doc_vars_in_this_func <<<<". The number of "greater than" or "less than" characters on the line does not matter. All that matters is that the line begins with an operator, so that the tokenizer returns "operator" as the line's first token and thereby flags it as a block boundary.
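The package recognizes these lines through its tokenizer. As a rough stand-in, assuming a plain regular expression is acceptable for illustration, the hypothetical helpers below accept any number of leading `>` or `<` characters, matching the rule that only the leading operator matters.

```r
# Hypothetical regex stand-ins for the tokenizer check: the line must begin
# with one or more ">" (START) or "<" (END) characters; the count is ignored.
is_block_start <- function(line) grepl("^>+[[:space:]]*START", line)
is_block_end   <- function(line) grepl("^<+[[:space:]]*END", line)

is_block_start(">>>> START doc_vars_in_this_func >>>>")  # TRUE
is_block_start("> START doc_vars_in_this_func >")        # TRUE
is_block_end("<<<< END doc_vars_in_this_func <<<<")      # TRUE
is_block_start("    START")                              # FALSE: leading whitespace
```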

Function declaration in output

For each function block that is parsed, the function's name and argument list are also written to the output, even though they are not meant to be pasted into the function's documentation (that information is already there). They are included only to make it easier to tell which function a variable list belongs to when cutting it out of a large file of outputs.
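The marker used in the real output is not specified on this page. As one possible layout, assuming ordinary `#` comments are used so the declaration is easy to spot and easy to leave out of the pasted roxygen block, a hypothetical `format_func_header()` might look like this:

```r
# Hypothetical: echo the function declaration as ordinary "#" comments
# (not "#'" roxygen lines) so it marks the block without being pasted.
format_func_header <- function(decl_lines) {
  rule <- strrep("#", 60)
  c(rule, paste0("# ", decl_lines), rule)
}

hdr <- format_func_header(c("my_func <- function(x,",
                            "                    n = 10)"))
cat(hdr, sep = "\n")
```

Using plain `#` rather than `#'` means that, even if these lines were pasted by accident, roxygen would ignore them.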

Parameter declarations in output

The output also includes descriptions of variables that are part of the function's parameter list, since the output covers every variable known in the function. These variables should already appear in the @param entries of the function's documentation, so they can be deleted from the output. They are included here for two reasons. First, this code is quick and dirty, and parsing the function's argument list to find the overlapping variables would take more code. Second, the information written out here can help in building the @param and @return sections of the function's documentation before those entries are removed from the generated sections.
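For readers who want to drop the overlapping variables themselves, a minimal sketch follows. It assumes the documented function is loaded in the session, so base R's `formals()` can supply the argument names; `drop_param_vars()` and the named-list layout of `var_descs` are hypothetical and not part of the package.

```r
# Sketch of the overlap check the package skips: drop any documented
# variable that is also a formal argument of the function.
# drop_param_vars() and the var_descs layout are hypothetical.
drop_param_vars <- function(var_descs, fn) {
  arg_names <- names(formals(fn))   # NULL for a function with no arguments
  var_descs[setdiff(names(var_descs), arg_names)]
}

my_func <- function(x, n = 10) {
  total <- sum(x)
  total / n
}

var_descs <- list(x = "Input vector.", n = "Divisor.", total = "Sum of x.")
names(drop_param_vars(var_descs, my_func))   # "total"
```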

States Used In The Parsing

state__start_of_file: Before parsing begins.
state__block_start: Sitting on a ">>> START..." line at the head of a new function block.
state__func_first_line_of_decl: Sitting on the first line of a function declaration. Starts with a symbol, not whitespace. Immediately follows the block start line with no intervening lines.
state__func_decl_cont_line: Sitting on a continuation line of a multi-line function declaration. Line starts with whitespace.
state__var_first_line_of_desc: Sitting on the first line of the description of a variable. Starts with a symbol, not whitespace. Immediately follows the end of a function declaration or of the previous variable's description, with no intervening lines.
state__var_desc_cont_line: Sitting on a continuation line of a multi-line description of a variable. Line starts with whitespace.
state__finished: Found end of file, ready to do final cleanup.
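The states above, together with the legal line starts, imply a transition table. The sketch below is a hypothetical reconstruction of that table, not the package's code. One simplifying assumption is flagged in the comments: because the state list has no separate end-of-block state, every operator line (START or END) is treated as a block boundary.

```r
# Hypothetical transition function for the states listed above.
# Token types: "symbol", "whitespace", "operator", "EOF".
# Simplifying assumption: every operator line (START or END) counts as a
# block boundary, because the state list has no separate end-of-block state.
next_state <- function(state, token) {
  if (token == "EOF") return("state__finished")
  if (token == "operator") return("state__block_start")
  new_state <- switch(state,
    state__block_start =
      if (token == "symbol") "state__func_first_line_of_decl" else NA,
    # Function declaration: whitespace continues it, a symbol ends it.
    state__func_first_line_of_decl = ,
    state__func_decl_cont_line =
      if (token == "whitespace") "state__func_decl_cont_line" else "state__var_first_line_of_desc",
    # Variable description: whitespace continues it, a symbol starts the next one.
    state__var_first_line_of_desc = ,
    state__var_desc_cont_line =
      if (token == "whitespace") "state__var_desc_cont_line" else "state__var_first_line_of_desc",
    NA)  # anything else, e.g. a symbol before the first START line
  if (is.na(new_state)) stop("illegal token '", token, "' in ", state)
  new_state
}

tokens <- c("operator", "symbol", "whitespace", "symbol",
            "whitespace", "symbol", "operator", "EOF")
state <- "state__start_of_file"
visited <- character(0)
for (tok in tokens) {
  state <- next_state(state, tok)
  visited <- c(visited, state)
}
print(visited)
```

Raising an error on an unexpected token, rather than silently skipping the line, makes malformed input (for example, a blank or indented line right after a START line) easy to locate.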

Legal Line Starts In The Parsing

symbol: An R function name or variable name with no preceding whitespace.
whitespace: Spaces and/or tabs.
operator: An R operator; in this case the only ones that should occur are the ">" and "<" characters that begin the block START and END lines.
EOF: End of file; not returned by the tokenizer, but set in this function and the functions it calls.
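The package's own tokenizer is not shown here. As a minimal stand-in, assuming a line can be classified from its first character alone, the hypothetical `first_token_type()` below returns the three token types above; symbols are simplified to names beginning with a letter or ".". Consistent with the note on EOF, the classifier never produces "EOF"; the hypothetical caller `line_types()` appends it once the lines of interest run out.

```r
# Hypothetical stand-in for the tokenizer: classify each line by how it
# starts. Symbols are simplified to names beginning with a letter or ".".
first_token_type <- function(line) {
  if (grepl("^[[:blank:]]", line)) return("whitespace")
  if (grepl("^[A-Za-z.]", line)) return("symbol")
  if (grepl("^[<>]", line)) return("operator")
  stop("unexpected line start: ", line)
}

# EOF is never produced by the classifier; the caller appends it once the
# lines of interest run out.
line_types <- function(all_data, doc_line_numbers) {
  types <- vapply(all_data[doc_line_numbers], first_token_type,
                  character(1), USE.NAMES = FALSE)
  c(types, "EOF")
}

all_data <- c(">>>> START doc_vars_in_this_func >>>>",
              "my_func <- function(x,",
              "                    n)",
              "x  Input vector.",
              "   Must be numeric.",
              "<<<< END doc_vars_in_this_func <<<<")
types <- line_types(all_data, seq_along(all_data))
print(types)
```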