internal/classes-re-cells.md

title: "Classes and methods related to cells"

Reference

Spreadsheet Implementation Technology: Basics and Extensions Peter Sestoft https://mitpress.mit.edu/books/spreadsheet-implementation-technology

Informal intro to cell references

1.3 Cell reference formats, p. 3

| | A1 | R1C1 | |-----|:-----|:-----------| | abs | $B$4 | R4C2 | | rel | B4 | R[4]C[2] |

Two main points:

Two subtleties the table above glosses over:

By default, an A1-formatted cell reference is relative. Use a dollar sign $ to indicate absolute.

By default, an R1C1-formatted cell reference is absolute. Relative references are indicated by putting the offset inside square brackets. Except when the offset if zero, when it is left out. So RC is how to say "this cell" -- not R[0]C[0].

We like R1C1 format, especially for relative references, because it is invariant when you copy a formula. And it is just more sane, in every possible sense, actually.

Meaning of cell references assuming found in cell B2

| A1 | R1C1 | Meaning | |:-----|:-------------|:----------------------------| | A1 | R[-1]C[-1] | rel: prev row, prev col | | A2 | RC[-1] | rel: this row, prev col | | B1 | R[-1]C | rel: prev row, this col | | B2 | RC | rel: this cell | | C3 | R[1]C[1] | rel: next row, next col | | $A$1 | R1C1 | abs: row 1, col 1 or A | | $A2 | RC1 | mixed: this row, col 1 or A | | C$3 | R3C[1] | mixed: row 3, next col |

Important concepts in Sestoft's Corecalc spreadsheet implementation

From Peter Sestoft's book Spreadsheet Implementation. Section 2.1.1, p28-->.

A subset of the core concepts that seem immediately relevant to us:

Cell references

The contents of a cell must be either:

How do we go from the string representation of a formula (concrete syntax) to the internal representation (abstract syntax)? Formula text is first scanned or tokenized, then it is parsed into an abstract syntax tree (AST). Sestoft presents a full grammar for cell contents (p34) but here I just focus on the cell references.

Here's what cell references look like in the string representation:

[WORKBOOK.xlsx]'WORKSHEET NAME'!A1

Note: Sestoft never mentions the possibility of the filename appearing at the front, but that's definitely possible in Excel.

Here's the relevant subset of the Sestoft cell contents grammar:

Expr ::=
  Raref
| Raref : Raref

Sheetref ::=
  Name ! Raref
| Name ! Raref : Raref

Raref ::=
  Column Row
| $ Column Row
| Row $ Column
| $ Column $ Row
| R Offset C Offset

Offset ::=
  <empty>
| Uint
| [ Int ]

where Column is a column name A, B, ...; Row is a row number 1, 2, ...; Uint is a non-negative integer; and Int is an integer.

Sestoft doesn't allow 3-D cell references, e.g. Sheet1:Sheet12!B2, and I don't plan to deal with them either.

Sestoft doesn't allow for full-column references, e.g. B:B, but I think I need to. Not sure why he doesn't mention full-row references.

Sestoft describes full-on scanning/tokenization + parsing for string representations of cell contents (p33); this returns an AST for each cell as an object of class Cell. In cellranger we only deal with cell references and I'll just use regular expressions.

Revisiting classes relevant to cellranger:

Methods

The Addr method for a RARef returns a CellAddr. English: single cell reference goes in, (row, column) pair comes out. FYI, the result of this is always used in a context where sheet info is available separately, e.g. from the CellRef object that contained the associated RARef.

The constructor for a CellAddr takes the row and column of the host cell and a RARef and returns absolute cell address that the RARef points to.

The ToString method for CellAddr take an absolute cell address and returns a typical string representation. Looks like Sestoft always returns in A1 format in this case. Why?

The Show method for a RARef generates one of the usual string representations of a cell reference. Sestoft's Show takes row and column (so he Shows only after applying Addr method, I guess?) and a format (A1 vs R1C1) as input.

A CellRef is a valid Expr in Sestoft's Corecalc. When you Eval it, the cell address gets resolved (row, column, sheet) and the formula in that cell is evaluated. We obviously won't do all of that in cellranger (i.e. the formula retrieval and evaluation), but should do the cell resolution part. I think we need to a function that takes a host cell and a (potentially relative) cell reference and gives back another (absolute) cell reference. File and sheet reference (or lack thereof) should propagate.

The Show method for a CellRef builds up its output from the the Show method for a RARef and the sheet reference, if such exists.

Mixed cell area reference

Example given in 1.4 Formulas, functions, and arrays p. 5

Consider formula =SUM(A$1:A1) in the absolute cell B1.

I should run through what happens when you move/copy that formula into B2 or B3 or column C. Come back to this.



jennybc/cellranger documentation built on May 19, 2019, 4:04 a.m.