Trace Features
In exporting outputs like TEI, we will need to know which portions of text were changed and how. In the context of a generic export architecture, we prefer a loose coupling between the snapshot model and the export pipeline.
So, in the export pipeline we avoid borrowing logic from the inner workings of the snapshot model. Rather, we want a text with all the annotations required to build our TEI with this additional information.
To this end trace features were introduced in the snapshot core. These are operation features which get automatically injected by each operation to trace its effect on the text being built.
So, for instance, a replacement operation will mark all the characters it is going to replace with a specific trace feature, and all the characters replacing them with another one. The first feature tells us that those nodes were selected for deletion; the second one that other nodes were added in their place.
Trace features are clearly distinguished from those you are free to add to any operations. All the trace features have these properties:
- their name starts with `$`, a prefix reserved to trace features.
- their value is composite. It always includes the operation ID and the input and output version tags, plus, for the features requiring it, the ordinal number of the node in its segment. These components are separated by spaces.
- there can be multiple trace features of the same type. This happens when we have branching, so that e.g. the same node can be the input of two different operations belonging to different branches.
- they are not copied into the next version. So, the lifespan of each trace feature is limited: whenever a new operation is executed, it does not inherit trace features from the previous result.
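As a minimal sketch of the last property, the snippet below filters out trace features when carrying a node's features over to the next version. The list-of-pairs representation and the function names are assumptions made for illustration only, not the snapshot core's API; a list is used rather than a dictionary because several trace features of the same type can coexist.

```python
TRACE_PREFIX = "$"  # prefix reserved to trace features, as stated above


def is_trace_feature(name: str) -> bool:
    """Tell trace features apart from the features freely added to operations."""
    return name.startswith(TRACE_PREFIX)


def inherit_features(features: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the features carried into the next version.

    Hypothetical helper: it keeps everything except trace features,
    whose lifespan is limited to a single version.
    """
    return [(name, value) for name, value in features if not is_trace_feature(name)]


# Illustrative node features (name, value); the representation is assumed.
node_features = [
    ("opid", "REP_CRIED"),
    ("$seg-out", "REP_CRIED v0:v1 1"),
    ("$seg-in", "REP_SWANS v1:v2 1"),
]
print(inherit_features(node_features))
```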
Types
Currently there are five types of trace features:
- `$seg-in`: input segment (= sequence of contiguous nodes) selected by the operation. Value is `OPID TAGIN:TAGOUT N`, where `OPID` is the operation ID, `TAGIN` the input version tag, `TAGOUT` the output version tag, and `N` the ordinal number of the node in the segment captured by the operation.
- `$seg-out`: output segment affected by the operation. Value is the same as `$seg-in`.
- `$seg2-in`: same as `$seg-in`, for the second segment in a swap operation.
- `$seg2-out`: same as `$seg-out`, for the second segment in a swap operation.
- `$anchor`: marks a single anchor node, used as a reference for add or move operations. The value is like that of segments, except for the final `N`, which would not make sense for an anchor. By definition, only a single node can be used as anchor, so there is no need to specify its relative position in a segment.
Of course, segments are contiguous in a specific version only. Operations (except for the annotate operation) alter the order of the nodes, and versions are just their output. That’s why to ease later processing it is convenient to store the relative position of each node in a segment for every version.
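To make the value format concrete, here is a small Python sketch that parses a trace feature value of the form `OPID TAGIN:TAGOUT N`. The names `TraceValue` and `parse_trace_value` are hypothetical and are not part of the snapshot core; the sketch only assumes the format described above, with the ordinal missing for `$anchor`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceValue:
    op_id: str              # operation ID
    tag_in: str             # input version tag
    tag_out: str            # output version tag
    ordinal: Optional[int]  # node position in the segment; None for $anchor


def parse_trace_value(value: str) -> TraceValue:
    """Parse 'OPID TAGIN:TAGOUT N' (segments) or 'OPID TAGIN:TAGOUT' (anchor)."""
    parts = value.split()
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected trace value: {value!r}")
    tag_in, sep, tag_out = parts[1].partition(":")
    if not sep:
        raise ValueError(f"missing version tags in: {value!r}")
    ordinal = int(parts[2]) if len(parts) == 3 else None
    return TraceValue(parts[0], tag_in, tag_out, ordinal)


print(parse_trace_value("REP_CRIED v0:v1 3"))
print(parse_trace_value("INS_HAVE v2:v3"))
```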
👉 Operations inject trace features as follows:
- replace:
  - input: the segment to be replaced.
  - output: the new segment nodes which replaced the old one.
- delete:
  - input: the segment to be deleted.
  - output: nothing. The delete has no output segment by definition. So, the deleted node, once detached from the version text, will just retain its input segment feature. Anyway, all deleted nodes have a standard `del` feature whose value is equal to that of trace features for segments. The `del` feature is not a trace feature because it must persist forever once attached to a node: once a node is deleted, it will never come back in a sequence. In fact, together with `opid`, these features mark the entrance and exit of a node, as versions define new sequences.
- add before, add after:
  - input: the anchor node gets an anchor feature.
  - output: the added segment nodes.
- move before, move after:
  - input: the segment to be moved; also, the anchor node gets an anchor feature.
  - output: the added segment nodes.
- swap:
  - input: the segments to be swapped: one in `$seg-in` and another in `$seg2-in`.
  - output: the swapped segments: one in `$seg-out` and another in `$seg2-out`.
- annotate:
  - input: the segment to annotate.
  - output: the segment annotated. This is equal to the input segment.
As for deletion, remember that in the chain structure no node is ever removed from the set (just as on a sheet of paper you can strike through a word, but the word is still there, taking the space originally assigned to it). So, even deleted nodes are still part of it; they are just no longer included in the sequences representing a specific combination of nodes resulting in a text version. That’s what the stroke in our example means. Once nodes get out of a sequence, they will never come back in any other one. We might add new nodes equal to the old ones; but they will be represented as such: new nodes, which get added to the set. That’s consistent with the underlying process this model represents: in most cases, it’s not possible to physically remove a word. If you mark it as deleted, e.g. with a stroke, you might later reintroduce that word by writing it again somewhere else, and that is exactly what “duplicate” nodes represent in the model.
Thanks to these features, at each version we can see all the nodes affected by the operation which generated it, and connect them to the previous or next versions.
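The injection rules listed above can be summarized as a lookup table. The following Python sketch is only an illustrative restatement of that list: the operation keys (`"add"` standing for add before/after, `"move"` for move before/after) and the function name are assumptions, not identifiers from the snapshot core.

```python
# Hypothetical summary table: trace features injected per operation type,
# on the input version ("input") and on the output version ("output").
INJECTED_TRACE_FEATURES = {
    "replace": {"input": ["$seg-in"], "output": ["$seg-out"]},
    "delete": {"input": ["$seg-in"], "output": []},  # no output segment
    "add": {"input": ["$anchor"], "output": ["$seg-out"]},
    "move": {"input": ["$seg-in", "$anchor"], "output": ["$seg-out"]},
    "swap": {
        "input": ["$seg-in", "$seg2-in"],
        "output": ["$seg-out", "$seg2-out"],
    },
    "annotate": {"input": ["$seg-in"], "output": ["$seg-out"]},
}


def expected_features(operation: str, side: str) -> list[str]:
    """Return the trace feature names expected for an operation on one side."""
    return list(INJECTED_TRACE_FEATURES[operation][side])


print(expected_features("swap", "output"))
print(expected_features("delete", "output"))
```

A consumer such as a renderer could use a table like this to know which features to look for in a version once it knows the type of the operation that produced it.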
Simple Example
For instance, consider this mock autograph with numbers, where I added an ordinal number to each operation to make it easier to read it:
The text versions in this autograph are:
- v0 one FIVE six ten three four zero
- v1 one two FIVE six ten three four zero
- v2 one two Five six ten three four zero
- v3 one two five six ten three four zero
- v4 one two five six three four zero (alpha)
- v5 one two three four five six zero
- v6 zeroone two three four five six
- v7 zero one two three four five six (beta)
These versions are generated by these operations:
- insert “two” before “FIVE”.
- replace FIVE with the corresponding title-case word “Five”.
- replace this with the full lowercased word, “five”.
- remove “ten”. This version 4 is labeled as a staged version, named alpha, i.e. a stage during the text transformation which happens to be considered as a waypoint along the path towards the final state of the text, accumulating the effects of all the operations up to this point.
- swap “three four” with “five six”.
- move “zero” from the tail to the head.
- insert a space to separate these words. Once we get to this final version 7, we have another staged version, named beta.
Now, let us focus on staged version alpha (v4) and look at the corresponding trace features:
We have an anchor before “FIVE” which is the reference point for the insertion of “two”; what gets inserted is found in the next version 1 as an output segment (“two”).
Then, “FIVE” is selected as the input segment for the next operation, a replacement, whose output segment is the title-cased word.
The same happens to this “Five”, which gets lowercased by another replacement: so, title-case “Five” is the input segment, and in the next version lowercase “five” is the output segment.
Finally, we select “ten” as the input segment of a delete operation. Note that there is no output segment for it; in other words, the output segment is empty. Then we have “five” selected as a portion of the text involved in the next operation, the swap, which is past version alpha.
We could go on, but that’s the point: trace features allow us to track the effect of editing operations in text without having to execute them again from another context, which is loosely coupled to the snapshot model.
Let us look at this list of trace features, from bottom to top, i.e. from version 4 back to version 0, which is the base text. First, we can see that the “ten” input segment was removed; the lowercase “five” segment replaced a title-case “Five”; in turn, this replaced an uppercase “FIVE”. Before it, “two” was inserted.
So, trace features, coupled with the type and metadata of each operation (operation identifiers are found in the feature value), are in most cases all that a renderer needs to build its output.
🚀 You can inspect trace features using the developer’s demo at https://gve-demo.fusi-soft.com. Just click `Snapshots`, pick the “digits” preset from the list, run operations, and switch to the `Steps` tab. This contains a row for each output (version). You can use the `n-features` column controls to inspect node features, including trace features, for each version up to that corresponding to its row.
Limerick Example
Let us now consider a more realistic example, like our limerick example. We can use a screenshot of the base text UI to show its characters with their numeric IDs:
The snapshot operations are:
- replace “cried” with “said” (in this sample I’ll use `REP_CRIED` as its ID): `v1`;
- replace “swans” with “crows” (`REP_SWANS`): `v2`;
- insert “have” + space before “all” (for metrical reasons; `INS_HAVE`): `v3`, staged as `alpha`;
- swap verses 3-4 (`SWAP`): `v4`;
- replace “crows” with “owls” (`REP_CROWS`): `v5`, staged as `beta`.
When using trace features, we get (I replace the alphanumeric operation IDs with symbolic names to enhance readability):
- v0 (base text):
  - `$seg-in`: for the input segment `cried` of the first replace operation (`REP_CRIED`). Its 5 nodes (40-44) have values like `REP_CRIED v0:v1 1` (from `1` to `5`).
- v1 (output of `REP_CRIED`, replace “cried” with “said”):
  - `$seg-out`: for the output segment `said` of the first replace operation (`REP_CRIED`). Its 4 nodes (151-154) have values like `REP_CRIED v0:v1 1` (from `1` to `4`). The same nodes also carry a standard `opid` feature with the ID of the operation which added them to the chain. As expected, `opid` features get inherited from version to version: once a node has been added, it stays in the chain forever.
  - `$seg-in`: for the input segment `swans` of `REP_SWANS`. Its 5 nodes (99-103) have values like `REP_SWANS v1:v2 1` (from `1` to `5`).
- v2 (output of `REP_SWANS`, replace “swans” with “crows”):
  - `$seg-out`: for the output segment `crows` of `REP_SWANS`. Its 5 nodes (155-159) have values like `REP_SWANS v1:v2 1` (from `1` to `5`). The same nodes also carry a standard `opid` feature.
  - `$anchor`: on node 116 (`a`), with value `INS_HAVE v2:v3`. It defines the anchor working as a reference for the insertion operation. There is no input segment here, i.e. it is empty, because we are going to add new nodes for `have` before `all`.
- v3 (output of `INS_HAVE`, insert “have” + space before “all”):
  - `$seg-out`: for the inserted segment `have` + space of `INS_HAVE`. Its 5 nodes (160-164) have values like `INS_HAVE v2:v3 1` (from `1` to `5`). These nodes also carry the standard `opid` and `reason` features.
  - `$seg-in`: for the first input segment `four larks and a wren,↓` of `SWAP`. Its 23 nodes (72-94) have values like `SWAP v3:v4 1` (from `1` to `23`).
  - `$seg2-in`: for the second input segment `two crows and a hen↓` of `SWAP`. Its 21 nodes (95-98, 155-159, 104-115) have values like `SWAP v3:v4 1` (from `1` to `21`).
- v4 (output of `SWAP`, swap `four larks and a wren,↓` with `two crows and a hen↓`):
  - `$seg-out`: for the swapped segment `four larks and a wren,↓` of `SWAP`. Its 23 nodes (72-94) have values like `SWAP v3:v4 1` (from `1` to `23`).
  - `$seg2-out`: for the second output segment `two crows and a hen↓` of `SWAP`. Its 21 nodes (95-98, 155-159, 104-115) have values like `SWAP v3:v4 1` (from `1` to `21`).
  - `$seg-in`: for the input segment `crows` of `REP_CROWS`. Its 5 nodes (155-159) have values like `REP_CROWS v4:v5 1` (from `1` to `5`).
- v5 (output of `REP_CROWS`, replace “crows” with “owls”):
  - `$seg-out`: for the output segment `owls` of `REP_CROWS`. Its 4 nodes (165-168) have values like `REP_CROWS v4:v5 1` (from `1` to `4`).
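The second swap segment in v3 shows why the ordinal number matters: its node IDs (95-98, 155-159, 104-115) are not in ascending order, so sorting by ID would scramble the text. The sketch below rebuilds a segment from the ordinals stored in the trace values. The node store (a dictionary from node ID to a list of feature pairs) and the function name are assumptions made for illustration; the assignment of ordinals to IDs follows the order in which the node ranges are listed above.

```python
def segment_node_ids(nodes, feature_name, op_id):
    """Return the IDs of the nodes in a segment, ordered by trace ordinal.

    nodes: dict mapping node ID -> list of (feature name, feature value).
    """
    hits = []
    for node_id, features in nodes.items():
        for name, value in features:
            if name != feature_name:
                continue
            parts = value.split()  # OPID TAGIN:TAGOUT N
            if len(parts) == 3 and parts[0] == op_id:
                hits.append((int(parts[2]), node_id))
    return [node_id for _, node_id in sorted(hits)]


# Second input segment of SWAP in v3, in text order.
ids_in_text_order = list(range(95, 99)) + list(range(155, 160)) + list(range(104, 116))
nodes = {
    node_id: [("$seg2-in", f"SWAP v3:v4 {n}")]
    for n, node_id in enumerate(ids_in_text_order, start=1)
}

ordered = segment_node_ids(nodes, "$seg2-in", "SWAP")
print(ordered == ids_in_text_order)  # ordinals restore the text order
print(sorted(nodes) == ordered)      # sorting by node ID does not
```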
So, at each version we can look at the trace features to see which segments were affected by the previous operation (in `$seg-out` and `$seg2-out`), and which will be affected by the next one (in `$seg-in` and `$seg2-in`). In the following table, I list each segment defined for all the versions:
ver | previous (out) | next (in) |
---|---|---|
v0 | | cried |
v1 | said | swans |
v2 | crows | ⚓ a(ll) |
v3 | have_ | four larks and a wren,↓ / two crows and a hen↓ |
v4 | two crows and a hen↓ / four larks and a wren,↓ | crows |
v5 | owls | |
As an example, consider how these features would help in later processing like rendering. For instance, by reading backwards we can pick the output segment of each version and find the corresponding input segment (=the input segment with the same operation ID) in its previous version (which is not necessarily equal to the current version - 1, because we might have branching); we then repeat this until we get to the start of the transformation:
- v5 `owls` is from `crows` (`REP_CROWS` v4>v5 beta);
- v4 `two crows...` and `four larks...` are from `four larks...` and `two crows...` (here we have segment pairs as that’s a swap: `SWAP` v3>v4);
- v3 `have_` was inserted before `all` (`INS_HAVE` v2>v3 alpha);
- v2 `crows` is from `swans` (`REP_SWANS` v1>v2);
- v1 `said` is from `cried` (`REP_CRIED` v0>v1).
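The backward reading above can be sketched in a few lines of Python. The records below restate the limerick's trace features as `(version, feature, operation ID, text)` tuples; this flat representation and the function name are assumptions made for illustration. Inputs and outputs are paired by operation ID rather than by version number, so the approach does not rely on the previous version being "current minus one". The sketch assumes the records are listed in chronological order.

```python
from collections import defaultdict

# (version, trace feature, operation ID, segment text), in chronological order.
RECORDS = [
    ("v0", "$seg-in", "REP_CRIED", "cried"),
    ("v1", "$seg-out", "REP_CRIED", "said"),
    ("v1", "$seg-in", "REP_SWANS", "swans"),
    ("v2", "$seg-out", "REP_SWANS", "crows"),
    ("v2", "$anchor", "INS_HAVE", "a"),
    ("v3", "$seg-out", "INS_HAVE", "have "),
    ("v3", "$seg-in", "SWAP", "four larks and a wren,\n"),
    ("v3", "$seg2-in", "SWAP", "two crows and a hen\n"),
    ("v4", "$seg-out", "SWAP", "four larks and a wren,\n"),
    ("v4", "$seg2-out", "SWAP", "two crows and a hen\n"),
    ("v4", "$seg-in", "REP_CROWS", "crows"),
    ("v5", "$seg-out", "REP_CROWS", "owls"),
]


def walk_back(records):
    """Pair input and output segments by operation ID, newest operation first."""
    ops = defaultdict(lambda: {"in": [], "out": []})
    for _version, feature, op_id, text in records:
        # $anchor is treated as an input-side feature.
        side = "out" if feature.endswith("-out") else "in"
        ops[op_id][side].append(text)
    return [
        (op_id, sides["in"], sides["out"])
        for op_id, sides in reversed(list(ops.items()))
    ]


for op_id, sources, results in walk_back(RECORDS):
    print(f"{op_id}: {results!r} <- {sources!r}")
```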
If we were to represent the final staged version beta as a simple text with notes about its transformations, we could do as follows:
- determine the versions range: every staged version starts from the first version past the previous staged version, and ends with itself. So, for beta we start from the first version past alpha, which corresponds to v4, and end with v5, which corresponds to beta. If we wanted version alpha instead, we would start from the base text (as there is no previous staged version) and end with v3.
- collect all the output segments in that range, i.e.:
  - v5: `owls`;
  - v4: `two crows and a hen↓` / `four larks and a wren,↓`.
- flatten these segments into a single line, getting:
[1:there was an old man with a beard,
who said: "It is just as I feared!]
[2:two ][3:owls][4: and a hen,]
[5:four larks and a wren,]
[6:have all built their nests in my beard!"]
Here we have 6 segments:
- `there was an old man... I feared`: this had no changes.
- `two_`: this was part of the first segment of the swap operation.
- `owls`: this has been replaced from `crows`.
- `_and a hen`: this was part of the swap operation.
- `four larks and a wren,`: this was the second segment of the swap operation.
- `have all...beard!`: this had no changes.
That’s a trivial output, but it shows how trace features can ease such processes, especially useful in rendition tasks.
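The first two steps of this procedure (determining the versions range of a staged version and collecting its output segments) could be sketched as follows. The data restates the limerick example; the function names are hypothetical, and the sketch assumes a linear history with no branching.

```python
VERSIONS = ["v0", "v1", "v2", "v3", "v4", "v5"]
STAGED = {"alpha": "v3", "beta": "v5"}  # staged version name -> version tag
OUTPUT_SEGMENTS = {
    "v1": ["said"],
    "v2": ["crows"],
    "v3": ["have "],
    "v4": ["two crows and a hen\n", "four larks and a wren,\n"],
    "v5": ["owls"],
}


def staged_range(name):
    """Versions from the first one past the previous staged version up to `name`."""
    end = VERSIONS.index(STAGED[name])
    earlier = [
        VERSIONS.index(tag) for tag in STAGED.values() if VERSIONS.index(tag) < end
    ]
    # With no previous staged version, start from the base text.
    start = max(earlier) + 1 if earlier else 0
    return VERSIONS[start:end + 1]


def collect_outputs(name):
    """Output segments in the range of a staged version, newest version first."""
    return {tag: OUTPUT_SEGMENTS.get(tag, []) for tag in reversed(staged_range(name))}


print(staged_range("beta"))
print(staged_range("alpha"))
print(collect_outputs("beta"))
```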
Of course, the more the changes, the more the fragmentation; that’s the price to pay for a lossy, flattened representation of a more structured model. For instance, if we had no `alpha` staged version, our versions range would include all the operations, which would result in these collected segments:
- v5: `owls`;
- v4: `two crows and a hen↓` / `four larks and a wren,↓`;
- v3: `have_`;
- v2: `crows`;
- v1: `said`.
By projecting them into the final text as a flat linear sequence with no nesting, we would get this segmentation:
[1:there was an old man with a beard,
who ][2:said][3:: "It is just as I feared!]
[4:two ][5:owls][6: and a hen,]
[7:four larks and a wren,]
[8:have ][9:all built their nests in my beard!"]
where each segment could be annotated like this:
- `there was... who_`: no changes.
- `said` from `cried`.
- `: ... feared!`: no changes.
- `two_` is part of the first segment in swap.
- `owls` from `crows`.
- `_and a hen` is part of the first segment in swap.
- `four larks and a wren,` is the second segment of the swap operation.
- `have_` was inserted before `all`.
- `all... beard!`: no changes.
Additionally, we could leverage all the standard features attached to operations (e.g. source, ink color, reason, etc.) for richer notes.
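One possible way to obtain such a flat segmentation from nested segments is to cut the text at every segment boundary and label each resulting piece with the segments covering it. The Python sketch below applies this to a single verse of the final text; it is an illustrative technique under that assumption, not the renderer's actual algorithm, and the span tuples and function name are hypothetical.

```python
def flatten(text, spans):
    """Split text at every span boundary.

    spans: list of (start, end, label), possibly nested or overlapping.
    Returns a list of (piece, labels), where labels are those of the spans
    fully covering the piece, in the order the spans were given.
    """
    cuts = {0, len(text)}
    for start, end, _label in spans:
        cuts.update((start, end))
    points = sorted(cuts)
    pieces = []
    for a, b in zip(points, points[1:]):
        labels = [label for start, end, label in spans if start <= a and b <= end]
        pieces.append((text[a:b], labels))
    return pieces


verse = "two owls and a hen,"
owls_start = verse.index("owls")
spans = [
    (0, len(verse), "SWAP"),                               # whole verse was swapped
    (owls_start, owls_start + len("owls"), "REP_CROWS"),   # "owls" replaced "crows"
]

for piece, labels in flatten(verse, spans):
    print(repr(piece), labels)
```

Each piece keeps the labels of all the operations that touched it, so the nesting lost by flattening can still be reported in notes.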