Trace Features
In exporting outputs like TEI, we will need to know which portions of text were changed and how. In the context of a generic export architecture, we prefer a loose coupling between the snapshot model and the export pipeline.
So, in the export pipeline we avoid borrowing logic from the inner workings of the snapshot model. Rather, we want a text with all the annotations required to build our TEI with this additional information.
To this end trace features were introduced in the snapshot core. These are operation features which get automatically injected by each operation to trace its effect on the text being built.
So, for instance a replacement operation will mark with all the characters it is going to replace with a specific trace feature, and all the characters replacing them with another one. The first feature will tell us that those nodes were collected for deletion; the second one that other nodes were added to their place.
Trace features are clearly distinguished from those you are free to add to any operations. All the trace features have these properties:
- their name starts with
$, a prefix reserved to trace features. - their value is composite. It always includes operation ID, input and output version, and possibly the segment ordinal number for the features requiring it. All these components are separated by space.
- there can be multiple trace features of the same type. This happens when we have branching, so that e.g. the same node can be the input of two different operations belonging to different branches.
- they are not copied into the next version. So, the lifespan of each trace feature is limited: whenever a new operation is executed, it does not inherit trace features from the previous result.
Thanks to these features, at each version we can see all the nodes affected by the operation which generated it, and connect them to the previous or next versions.
Types
Currently there are these types of trace features:
$seg-in: input segment (=sequence of contiguous nodes) selected by the operation. Value isOPID TAGIN:TAGOUT NwhereOPIDis the operation ID,TAGINthe input version tag,TAGOUTthe output version tag, andNthe ordinal number of the node in the segment captured by the operation.$seg-out: output segment affected by the operation. Value is the same as$seg-in.$seg2-in: same asseg-in, for the second segment in a swap operation.$seg2-out: same asseg-out, for the second segment in a swap operation.$anchor: marks a single anchor node, used as a reference for add or move operations. The value is like that of segments, except for the finalNwhich would not make sense for an anchor. By definition, only a single node can be used as anchor, so there is no need to specify its relative position in a segment.$left-anchor,$right-anchor: anchor nodes set in the node before a deleted segment, and in the node after it (when they are present). Value isOPID TAGIN:TAGOUT IDLISTwhereIDLISTis the list of the IDs of the deleted nodes.
Of course, segments are contiguous in a specific version only. Operations (except for the annotate operation) alter the order of the nodes, and versions are just their output. That’s why to ease later processing it is convenient to store the relative position of each node in a segment for every version.
Replace
- input: the segment to be replaced.
- output: the new segment nodes which replaced the old one.
💡 Example: 2=B in replace R with B in ARZDC (v0) > ABZDC (v1):
v0:$seg-inforR
v1:opidforB$seg-outforB
Delete
- input: the segment to be deleted.
- output: left/right anchors. The delete operation has no output segment by definition; so, the deleted node, once detached from the version text, will just retain its input segment feature. Anyway, all deleted nodes have a standard
delfeature, whose value is equal to that of the trace features for segments. Thedelfeature is not a trace feature because it must persist forever once attached to a node: once a node is deleted, it will never come back in a sequence. In fact, together withopid, these feature mark the entrance and exit of a node, as versions define new sequences.
💡 Example: 3- = delete Z in ARZDC (v0) > ARDC (v1):
v0:$seg-inforZ
v1:$left-anchorforR$right-anchorforD
As for deletion, remember that in the chain structure no node is ever removed from the set (just like in a sheet of paper you can put a stroke on a word, but the word still is there, taking the space originally assigned to it). So, even deleted nodes are still part of it; only, they are no longer included in sequences representing a specific combination of nodes resulting in a text version. That’s what the stroke of our example means. Once nodes get out of a sequence, they will never come back in any other one. We might add new nodes equal to the old ones; but they will be represented as such – new nodes, which get added to the set. That’s consistent with the underlying process this model represents: in most cases, it’s not possible to physically remove a word. If you mark it as deleted, like e.g. with a stroke, you might later reintroduce that word by writing it again somewhere else, and that’s right what is represented by “duplicate” nodes in the model.
Add Before/After
- input: the anchor node gets an anchor feature.
- output: the added segment nodes.
💡 Example: 3+[Y = add Y before Z in ARZDC (v0) > ARYZDC (v1):
v0:$anchorforZ
v1:$opidforY$seg-outforY
Move Before/After
- input: the segment to be moved; also, the anchor node gets an anchor feature.
- output: the added segment nodes.
💡 Example: 3>]5 = move Z after C in ARZDC (v0) > ARDCZ (v1):
v0:$seg-inforZ
v1:$anchorforC
Swap
- input: the segments to be swapped: one in
seg-inand another inseg2-in. - output: the swapped segments: one in
seg-outand another inseg2-out.
💡 Example: 1x2<>4x2 = swap AR with DC in ARZDC (v0) > DCZAR (v1):
v0:$seg-inforAR$seg2-inforDC
v1:$seg2-outforAR$seg2-outforDC
Annotate
- input: the segment to annotate.
- output: the segment annotated. This is equal to the input segment.
💡 Example: 3: [note=sample] = annotate Z in ARZDC (v0) > unchanged (v1):
v0:$seg-inforZ
v1:note=sample$seg-outforZ
Simple Example
For instance, consider this mock autograph with numbers, where I added an ordinal number to each operation to make it easier to read it:

The text versions in this autograph are:
- v0 one FIVE six ten three four zero
- v1 one two FIVE six ten three four zero
- v2 one two Five six ten three four zero
- v3 one two five six ten three four zero
- v4 one two five six three four zero (alpha)
- v5 one two three four five six zero
- v6 zeroone two three four five six
- v7 zero one two three four five six (beta)
These versions are generated by these operations:
- insert “two” before “FIVE”.
- replace FIVE with the corresponding title-case word “Five”.
- replace this with the full lowercased word, “five”.
- remove “ten”. This version 4 is labeled as a staged version, named alpha, i.e. a stage during the text transformation which happens to be considered as a waypoint along the path towards the final state of the text, accumulating the effects of all the operations up to this point.
- swap “three four” with “five six”.
- move “zero” from the tail to the head.
- insert a space to separate these words. Once we get to this final version 7, we have another staged version, named beta.
Now, let us focus on staged version alpha (v4) and look at the corresponding trace features:

We have an anchor before “FIVE” which is the reference point for the insertion of “two”; what gets inserted is found in the next version 1 as an output segment (“two”).
Then, “FIVE” is selected as the input segment for the next operation, a replacement, whose output segment is the title-cased word.
The same happens to this “Five”, which gets lowercased by another replacement: so, title-case “Five” is the input segment, and in the next version lowercase “five” is the output segment.
Finally, we select “ten” as the input segment of a delete operation. Note that there’s no output segment for it; or in other words, the output segment is zero. Then we have “five” selected as a portion of the text involved in the next operation, the swap, which is past version alpha.
We could go on, but that’s the point: trace features allow us to track the effect of editing operations in text without having to execute them again from another context, which is loosely coupled to the snapshot model.
Let us look at this list of trace features, from bottom to top, i.e. from version 4 back to version 0, which is the base text. First, we can see that the “ten” input segment was removed; the lowercase “five” segment replaced a title-case “Five”; in turn, this replaced an uppercase “FIVE”. Before it, “two” was inserted.
So, trace features coupled with the type and metadata of each operation (operation identifiers are found in the feature value) in most cases are all what a renderer needs to build its output.
🚀 You can inspect trace features using the developer’s demo at https://gve-demo.fusi-soft.com. Just click
Snapshots, pick the “digits” preset from the list, run operations, and switch to theStepstab. This contains a row for each output (version). You can use then-featurescolumn controls to inspect node features, including trace features, for each version up to that corresponding to its row.
Limerick Example
Let us now consider a more realistic example, like our limerick example. We can use a screenshot of the base text UI to show its characters with their numeric IDs:

The snapshot operations are:
- replace “cried” with “said” (in this sample I’ll use
REP_CRIEDas its ID):v1; - replace “swans” with “crows” (
REP_SWANS):v2; - insert “have” + space before “all” (for metrical reasons;
INS_HAVE):v3, staged asalpha; - swap verses 3-4 (
SWAP):v4; - replace “crows” with “owls” (
REP_CROWS):v5, staged asbeta.
When using trace features, we get (I replace the alphanumeric operation IDs with symbolic names to enhance readability):
- v0 (base text):
$seg-in: for the input segmentcriedof the first replace operation (REP_CRIED). Its 5 nodes (40-44) have values likeREP_CRIED v0:v1 1(from1to5).
- v1 (output of
REP_CRIED, replace “cried” with “said”):$seg-outfor the output segmentsaidof the first replace operation (REP_CRIED). Its 4 nodes (151-154) have values likeREP_CRIED v0:v1 1(from1to4). The same nodes also carry a standardopidfeature with the ID of the operation which added them to the chain. As expected,opidfeatures get inherited from version to version: once a node has been added, it stays in the chain forever.$seg-in: for the input segmentswansofREP_SWANS. Its 5 nodes (99-103) have values likeREP_SWANS v1:v2 1(from1to5).
- v2 (output of
REP_SWANS, replace “swans” with “crows”):$seg-out: for the output segmentcrowsofREP_SWANS. Its 5 nodes (155-159) have values likeREP_SWANS v1:v2 1(from1to5). The same nodes also carry a standardopidfeature.$anchor: for node 116 (a) with valueINS_HAVE v2:v3defines the anchor working as a reference for the insertion operation. There is no input segment here, i.e. it’s zero, because we are going to add new nodes forhavebeforeall.
- v3 (output of
INS_HAVE, insert “have” + space before “all”):$seg-out: for the inserted segmenthave+ space ofINS_HAVE. Its 5 nodes (160-164) have values likeINS_HAVE v2:v3 1(from1to5). These nodes also carry the standard features foropidandreason.$seg-in: for the first input segmentfour larks and a wren,↓ofSWAP. Its 23 nodes (72-94) have values likeSWAP v3:v4 1(from1to23).$seg2-in:two crows and a hen↓, for the second input segment ofSWAP. Its 21 nodes (95-98, 155-159, 104-115) have values likeSWAP v3:v4 1(from1to21).
- v4 (output of
SWAP, swapfour larks and a wren,↓withtwo crows and a hen↓):$seg-out: for the swapped segmentfour larks and a wren,↓ofSWAP. Its 23 nodes (72-94) have values likeSWAP v3:v4 1(from1to23).$seg2-out:two crows and a hen↓, for the second input segment ofSWAP. Its 21 nodes (95-98, 155-159, 104-115) have values likeSWAP v3:v4 1(from1to21).$seg-in: for the input segmentcrowsofREP_CROWS. Its 5 nodes (155-159) have values like$seg-in:="REP_CROWS v4:v5 1(from1to5).
- v5 (output of
REP_CROWS, replace “crows” with “owls”):$seg-out: for the output segmentowlsofREP_CROWS. Its 4 nodes (165-168) have values like$seg-out:="01ed346709 v4:v5 1(from1to4).
So, at each version we can look at the trace features to see which segments were affected by the previous operation (in seg-out and seg2-out), and which will be affected by the next one (in seg-in and seg2-in). In the following table, I list each segment defined for all the versions:
| ver | previous (out) | next (in) |
|---|---|---|
| v0 | cried | |
| v1 | said | swans |
| v2 | crows | ⚓ a(ll) |
| v3 | have_ | four larks and a wren,↓ / two crows and a hen↓ |
| v4 | two crows and a hen↓ / four larks and a wren,↓ | crows |
| v5 | owls |
As an example, consider how these features would help in later processing like rendering. For instance, by reading backwards we can pick the output segment of each version and find the corresponding input segment (=the input segment with the same operation ID) in its previous version (which is not necessarily equal to the current version - 1, because we might have branching); we then repeat this until we get to the start of the transformation:
- v5
owlsis fromcrows(REP_CROWSv4>v5 beta); - v4
two crows...andfour larks...and are fromfour larks...andtwo crows...(here we have segments pairs as that’s a swap:SWAPv3>v4); - v3
have_was inserted beforeall(INS_HAVEv2>v3 alpha); - v2
crowsis fromswans(REP_SWANSv1>v2); - v1
saidis fromcried(REP_CRIEDv0>v1).
If we were to represent the final staged version beta as a simple text with notes about its transformations, we could do as follows:
- determine the versions range: every staged version starts from the first version past the previous staged version, and ends with itself. So, for beta we start from the first version past alpha, which corresponds to v4, and ends with v5, which corresponds to beta. Should we rather want version alpha, we would start from the base text (as there is no previous staged version) and end with v3.
- collect all the output segments in that range, i.e.:
- v5:
owls; - v4:
two crows and a hen↓/four larks and a wren,↓.
- v5:
- flatten these segments into a single line, getting:
[1:there was an old man with a beard,
who said: "It is just as I feared!]
[2:two ][3:owls][4: and a hen,]
[5:four larks and a wren,]
[6:have all built their nests in my beard!"]
Here we have 6 segments:
there was an old man... I feared: this had no changes.two_: this was part of the first segment of the swap operation.owls: this has been replaced fromcrows._and a hen: this was part of the swap operation.four larks and a wren,: this was the second segment of the swap operation.have all...beard!: this had no changes.
That’s a trivial output, but it shows how trace features can ease such processes, especially useful in rendition tasks.
Of course, the more the changes, the more the fragmentation; that’s the price to pay for a lossy, flattened representation of a more structured model. For instance, if we had no alpha staged version, our versions range would include all the operations, which would result into these collected segments:
- v5:
owls; - v4:
two crows and a hen↓/four larks and a wren,↓. - v3:
have_ - v2:
crows - v1:
said
By projecting them into the final text as a flat linear sequence with no nesting, we would get this segmentation:
[1:there was an old man with a beard,
who ][2:said][3:: "It is just as I feared!]
[4:two ][5:owls][6: and a hen,]
[7:four larks and a wren,]
[8:have ][9:all built their nests in my beard!"]
where each segment could be annotated like this:
there was... who_: no changes.saidfromcried.: ... feared!: no change.two_is part of the first segment in swap.owlsfromcrows._and a henis part of the first segment in swap.four larks and a wren,is the second segment of the swap operation.have_was inserted beforeall.all... beard!: no changes.
Additionally, we could leverage all the standard features attached to operations (e.g. source, ink color, reason, etc.) for richer notes.