Entities Hierarchy
The snapshot is the lowest-level entity in our hierarchy. Being closest to the material supports, and representing a highly complex multi-version text, it is also the most complex one. Yet, above it there are other entities we want to represent in our data.
⚠️ This document reflects an early stage of the modeling of those entities, which provides the base for a Cadmus-based editor.
Overview
The diagram below summarizes the main entities in this project:

Starting from the material side (top of the diagram), everything begins with our carriers. A carrier is defined as “the concrete, physical object that contains, among other things, the textual witnesses and is then digitally reproduced and described in the edition”. In the diagram we are representing two notebook carriers, named H55 and H54.
These notebooks contain many texts. In the diagram we focus on a single epigram, which happens to be found in both these notebooks. Each corresponds to a version of the “same” epigram, but even before that, at the lowest level of our model hierarchy there are the snapshots: each is a highly structured and computable digital model representing the textual situation specific to that carrier. The “textual situation” is defined as “the transmitted sum of all variants of textual versions on a text carrier”. The snapshot is not just a container of variations of a text, but a compact model capable of generating all of them with their metadata, via operations which represent the scholars’ interpretation of the annotations chaotically scattered on the carrier. As such, it can also be used to animate the history of the making of this specific text across all its supposed stages, as hinted by all the annotations on the carrier. So, when we are interested in getting into the details of autograph texts, a carrier will usually contain multiple snapshots, each referring to a specific text.
At a higher level, scholarly critical judgement usually ends up defining a final epigram version from each snapshot.
Let us provide a less abstract example: the picture below was taken from a spreadsheet collecting all the epigram versions from all the carriers:

As you can see, each column starting from C corresponds to a carrier (H55, M1, M2, H54, etc.). All the epigram versions present in various carriers are aligned on the same row: for instance, at row 14 what can be considered the same epigram is attested by two different epigram versions, one in H55 and another in H54, both with incipit “In dem engsten der”.
Now, the very fact that we are aligning the same texts in the same row, and assigning them a shared numeric identifier, implies that we are recognizing them as instances of the same class, the abstract epigram. The epigram is defined as “all textual versions that are ‘related to each other through textual identity and distinguishable through textual variance’ (GGA 224)”. In our model, the “epigram” is thus an abstraction, which is conveniently used to hold metadata shared among all the epigram versions: it corresponds to the row in the spreadsheet table, which is identified by its epigram identifier. “Real” epigrams (=epigram versions) are in columns; but they are all lined up in the same row because we consider all these texts as alternatives of a single unit. This unit is our abstraction, the primary reason for lining them up in the same row; so everything we say about the row is meant to apply to all the columns in it.
So, looking again at our diagram, each snapshot is connected to an epigram version, which is the text critically reconstructed as the final stage of the creative process detailed by the snapshot. This is the second (mid) layer in our diagram.
In turn, multiple epigram versions are instances of an abstraction, the epigram, which appears in yet another layer (bottom). In our diagram example, the two epigram versions from snapshots in carriers H55 and H54 are linked to a single epigram. It should be noted that in this project, due to the peculiar nature of Venetian Epigrams, the epigram has no text at all: it is just an abstract container for metadata shared among epigram versions, each arising from a different carrier.
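The row/column alignment described above can be sketched in Python. This is only an illustration, not the project's data model: the class name `EpigramVersion`, its fields, the function `group_by_epigram`, and the use of the spreadsheet row number 14 as an epigram identifier are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EpigramVersion:
    """A 'real' epigram: the text reconstructed from one snapshot (hypothetical model)."""
    epigram_id: str   # hypothetical shared identifier (the spreadsheet row)
    carrier_id: str   # the carrier it comes from (the spreadsheet column)
    incipit: str


def group_by_epigram(versions):
    """Return {epigram_id: {carrier_id: incipit}}, i.e. one row per abstract epigram."""
    rows = defaultdict(dict)
    for version in versions:
        rows[version.epigram_id][version.carrier_id] = version.incipit
    return dict(rows)


versions = [
    EpigramVersion("14", "H55", "In dem engsten der"),
    EpigramVersion("14", "H54", "In dem engsten der"),
]
rows = group_by_epigram(versions)
```

Here the abstract epigram has no text of its own: it is just the key under which its versions are grouped.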
Finally, on this abstract layer we find a collection. This is a more specialized IT term suited to the entity represented in the digital model, and essentially corresponding to what is defined as an order or sequence (“the arrangement of all textual witnesses as found on a textual carrier”1). The term “collection” here is used in a purely IT sense, meaning any number of items belonging to an ordered set. What corresponds to this collection may vary: it might just be an idea of the author for organizing some epigrams, derived from what he writes about them elsewhere; or it might be what emerges from marks added to the notebooks (e.g. numbers) for each epigram, hinting at some plan for building an ordered collection of them; or something material, like the physical sequence in which they appear. Whatever the specific nature, we adopt a single, more abstract model for them.
⚙️ A collection contains texts in a given order. The order is represented by a sequence. From a purely IT standpoint, a collection is an abstract data type that groups multiple data elements, which can be of the same or different types. That’s the most generic, umbrella term and can include sets, lists, arrays, etc. A “set” is a specialization which differs because its elements are all unique. Among collections, some are ordered: these are “sequence”, “list”, and “array”. A sequence can be finite or infinite (e.g. the sequence of natural numbers). Lists and arrays are both ordered, but lists are dynamically resized, while arrays are static but more memory-efficient. So, “collection” is more abstract: it’s a conceptual container that doesn’t enforce rules about order, uniqueness, or mutability. It’s like saying “a vehicle” without specifying whether it’s a car, bike, or spaceship. By contrast:
- a set is a collection with constraints: no duplicates, no order.
- a sequence is a collection with structure: order matters, duplicates allowed.
- a list is a sequence with flexibility (dynamic size).
- an array is a sequence with rigidity (fixed size).
In this generic model, a “collection” is not necessarily ordered; or it might be ordered in multiple ways. From the point of view of a modular architecture, this means separating these two notions: the collection is the overarching entity, and its items can be left unordered, or ordered in just one way, or in multiple ways. This implies that the model for an ordering assigned to the items of a collection is instantiated once for each different order, zero or more times. This ensures we have a consistent model without redundancies (i.e. repetitions), and it is also the practical reason why the collection is a Cadmus item, while sequences are parts of that item, even if the details of this implementation are still to be defined. So, we can say that those items belong to a collection, and say this only once; then, we can add that this collection can be ordered in zero or more different sequences. In any case, this can be regarded as a modeling detail. We can adopt a synecdoche and name these collections “sequences”: from the user’s point of view, nothing changes. Yet, internally we distinguish the collection in the technical sense from the ways we can arrange the items in it.
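A minimal sketch of this separation, in Python, might look as follows. It assumes a hypothetical `Collection` class where membership is stated once and each ordering is stored as a separate named sequence; the names, the identifiers `ep1`…`ep3`, and the validation rule are illustrative assumptions, since the actual implementation is still to be defined.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Membership is stated once; each ordering is a separate named sequence."""
    members: set[str]
    sequences: dict[str, list[str]] = field(default_factory=dict)

    def add_sequence(self, name: str, order: list[str]) -> None:
        # Assumption: a sequence may only arrange items belonging to the collection.
        unknown = [m for m in order if m not in self.members]
        if unknown:
            raise ValueError(f"not in collection: {unknown}")
        self.sequences[name] = list(order)


# A collection can have zero, one, or several orderings of the same items.
col = Collection(members={"ep1", "ep2", "ep3"})
col.add_sequence("physical", ["ep1", "ep2", "ep3"])
col.add_sequence("author-numbering", ["ep3", "ep1", "ep2"])
```

Adding a new ordering never touches the membership, which is the point of keeping the two notions apart.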
Cadmus for VEdition
Given that our modeling is highly complex, yet necessarily open to changes, especially at this early stage, and that we need a quick and effective infrastructure to build our snapshot on, we are going to adopt Cadmus to represent all our entities in a single database, with a uniform data architecture.
Essentially, you can think of Cadmus records (called items) as boxes, where you can put any number and type of objects (called parts). Each of these objects has its own self-contained model, and is typically designed for reuse, so that you can put the same type of objects in many different boxes. For instance, an object representing a structured date (with all the nuances for years, months, days, centuries, termini ante and post, etc.) can be put into any box representing an item which requires a date. The data model is thus dynamic and built by composition, and so is the web-based editor corresponding to it: the model of a box is just its content, which varies whenever a new object is put into it or an existing one is removed from it. This modularity allows for highly structured and scalable dynamic models, which fit many different scenarios, including text with all its annotations, whatever their complexity.
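The box metaphor can be illustrated with a small Python sketch. It is not the Cadmus API: the classes `Item` and `Part`, their fields, and the `model` method are hypothetical names used only to show that an item's model is nothing more than the set of parts it currently contains.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Part:
    """A self-contained, reusable object (hypothetical shape)."""
    type_id: str                                   # e.g. "categories", "note"
    role: str = ""                                 # optional role, e.g. "support"
    data: dict[str, Any] = field(default_factory=dict)


@dataclass
class Item:
    """A box holding any number and type of parts."""
    title: str
    parts: list[Part] = field(default_factory=list)

    def model(self) -> list[str]:
        """The item's model is just the list of part types it contains."""
        return sorted(
            p.type_id + (":" + p.role if p.role else "") for p in self.parts
        )


carrier = Item("H55", [Part("categories", "support"), Part("note", "hist")])
# Putting a new object into the box changes the model of the box.
carrier.parts.append(Part("measurements"))
```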
In fact, text in Cadmus is just an object, like any other datum, and so are its annotations. For instance, if you have a text with a critical apparatus, a comment, and paleographic annotations, you might have a box with an object for the plain text; an object for the apparatus annotations; another one for the comments; and yet another one for paleographic annotations. Each of these objects has its own model, so that annotating a text essentially means linking an object of any type to a specific portion of it.
This produces a sort of layered annotation system, where each layer contains a set of annotations belonging to a specific knowledge domain, and thus having its own model. For highly complex or highly frequent annotations this has many benefits:
- it allows for a highly scalable scenario, where you can add as many annotations (layers) as you want, without affecting the existing text and its other annotations. You just add another annotation to the layer object, or a new layer object for annotations belonging to a different knowledge domain, without having to change either the text or its existing annotations. This is not true for annotation systems like XML, where a single tree-based structure holds all the metadata attached to portions of the text; there, adding many heterogeneous structures on top of it means struggling to tackle a complex game of interlocking pieces to build a tree with all the required tags woven together. Additionally, this is not always practical or even possible, and eventually ends up hitting the barrier of overlap, which is not allowed in XML. The typical solution in this case is standoff, which in fact is one of the typical outputs of Cadmus when exporting a subset of its data into TEI; yet, it is precisely its complexity that calls for generating it automatically.
- it allows designing the model of each annotation without constraints from the physical model. You can define a highly structured object for each type of annotation, without caring about having to interlock its parts with those of other models into a single, predefined structure. Also, instead of just attaching “flat” tags (as element names or attribute name/value pairs) to a portion of text, you can attach a fully structured object of any depth, where each property is either a scalar value or yet another object, without limits.
- it allows abstracting from a specific physical model (e.g. XML) thus producing an easy user experience, requiring no IT skills to create digital content; you just have to fill a web form.
- it allows using the same abstract source model to generate multiple outputs, whether it is a TEI document (in one or more different schemas), an RDF graph, etc.
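The layered approach can be sketched in Python as follows. This is an illustration under stated assumptions, not how Cadmus stores layers: the `Fragment` class, the token-index coordinates, and the sample payloads are hypothetical. The example shows two layers whose ranges partially overlap, which would not fit a single XML tree without standoff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One annotation linked to a span of the base text (hypothetical shape)."""
    start: int      # index of the first token covered
    end: int        # index of the last token covered (inclusive)
    payload: dict   # structured object of any depth


def annotations_at(layers, token_index):
    """Return {layer name: [payloads]} for the fragments covering a token."""
    hits = {}
    for name, fragments in layers.items():
        found = [f.payload for f in fragments if f.start <= token_index <= f.end]
        if found:
            hits[name] = found
    return hits


tokens = "In dem engsten der".split()
layers = {
    # Tokens 0-2 and tokens 2-3 overlap on token 2 without nesting.
    "apparatus": [Fragment(0, 2, {"note": "hypothetical variant"})],
    "comment": [Fragment(2, 3, {"text": "hypothetical comment"})],
}
```

Adding a further layer means adding one more key to `layers`; the text and the existing layers stay untouched.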
GVE Parts
These parts are specific to the GVE project.
Snapshot Part
See code for more details.
- snapshot (Snapshot):
  - size (Size): size in pixels:
    - width (double)
    - height (double)
  - style (string): snapshot CSS style.
  - defs (string): optional SVG defs element code.
  - image (SnapshotImage): background image:
    - url* (string)
    - canvas (Rectangle):
      - x (double)
      - y (double)
      - width (double)
      - height (double)
    - opacity (double)
  - text (CharChainNode[]):
    - id (int)
    - index (int)
    - label (string)
    - data (char)
    - sourceTag (string)
    - features (Feature[]):
      - name* (string)
      - value (string)
      - setPolicy* (int)
  - textStyle (string): CSS style for base text layer.
  - textOptions (SvgBaseTextOptions):
    - lineHeightOffset (double)
    - charSpacingOffset (double)
    - spcWidthOffset (double)
    - offset (Point):
      - x (double)
      - y (double)
    - minLineHeights (dictionary of doubles keyed by shorts)
  - operations (CharChainOperationSource[]):
    - rank (short)
    - groupId (string)
    - features (OperationFeature[]):
      - name (string)
      - value (string)
      - setPolicy (FeatureSetPolicy)
      - isNegated (bool)
      - isGlobal (bool)
      - isShortLived (bool)
    - sources (OperationSource[]):
      - id* (string)
      - type (string)
      - rank (short)
      - note (string)
    - diplomatics (OperationDiplomatics):
      - g (string): SVG for the graphical representation of the operation.
      - isNewTextHidden (bool)
      - features (Feature[]):
        - name (string)
        - value (string)
        - setPolicy (FeatureSetPolicy)
      - elementFeatures (dictionary with key=string and value=list of Feature’s)
    - id* (string)
    - type* (OperationType)
    - inputTag (string)
    - outputTag (string)
    - atAsIndex (bool)
    - at (int)
    - run (int)
    - toAsIndex (bool)
    - to (int)
    - toRun (int)
    - value (string): the primary value argument for this operation.
  - opStyle (string): CSS style for the operation layer.
  - timelines (dictionary with string keys and AnimationTimeline value)
Hands Part
- hands (GveHand[]):
  - eid (string 📚 gve-hand-eids): the thesaurus is used when you do not need to link the owner ID and have just a closed list, like here.
  - ownerId (AssertedCompositeId)
  - tag (string 📚 gve-hand-tags)
  - tool (string 📚 gve-hand-tools)
  - color (string 📚 gve-hand-colors)
  - notes (dictionary of strings)
Thesauri:
- 📚 gve-hand-eids:
  - schlegel=Schlegel
  - schiller=Schiller
  - geist=Geist
  - riemer=Riemer
  - unknown=unknown
Items
Here we list the Cadmus items with their parts, as defined for the GVE editor in the API backend profile. The references part is mostly used for Zotero-based bibliography.
Flags
- complete: the item is complete
- revised: the item has been revised
- undisclosed: the item is not (or not yet) meant for publishing
- lost (for carriers and possibly others)

We can use a flag for lost because this allows browsing and filtering carrier items according to these types at a glance (lost being something no longer existing, this is a capital distinction to be made), and because this is a single, binary feature for which a categories part would be too much. Also, a lost flag might possibly apply to other items, too.
In a publishing flow, where data move from the backend database (edited with Cadmus) to some frontend presentation, there will be rules to determine when an item in the database is to be published: e.g. the item must be complete and not be undisclosed.
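As a sketch of such a rule, the following Python snippet models the four flags as bit values and encodes the example rule given above (complete and not undisclosed). The enum name `ItemFlags`, the numeric values, and the function `is_publishable` are assumptions for illustration; the actual values are those defined in the project's flags thesaurus.

```python
from enum import Flag


class ItemFlags(Flag):
    """Item flags as combinable bits (numeric values are hypothetical)."""
    COMPLETE = 1
    REVISED = 2
    UNDISCLOSED = 4
    LOST = 8


def is_publishable(flags: ItemFlags) -> bool:
    """Example rule from the text: the item is complete and not undisclosed."""
    return ItemFlags.COMPLETE in flags and ItemFlags.UNDISCLOSED not in flags


ready = ItemFlags.COMPLETE | ItemFlags.REVISED
held_back = ItemFlags.COMPLETE | ItemFlags.UNDISCLOSED
```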
Snapshot Item
- identity:
- material:
  - categories:support: branches for:
    - format (quarto)
    - materials (loose materials, folded materials, bound materials…)
    - paper type
    - paper colors
- content:
  - categories:content: branches for:
    - copy type (rough, clean)
    - language (German, Italian, Latin, Ancient Greek)
    - authorship (autograph, allograph)
    - numbering (pagination, foliation)
    - margins (cropped, torn)
    - writing material (ink1, ink2, pencil…)
  - categories:lang: languages (German, Italian, Latin, Ancient Greek)
  - snapshot (GVE)
  - comment with topic categories.
- history:
- editorial:
The proposed page rotation feature (right or left) is a property of the whole support, so it will not be encoded as an operation feature. If this is only applied to snapshots, we could add it to the snapshot metadata or provide a specific categories part for the support; anyway, it’s easier to just add the rotation to the entries of the categories:support part, and use it for snapshots only. This will keep all the support properties in the same set, and avoid a full part for just a couple of entries.
Categories thesauri:
- 📚 categories_content:
  - copy type:
    - rough (Rohfassung)
    - clean (Reinschrift)
    - print (Druck): we could put this here because it can’t be a flag, as it applies only to carriers; and in a sense it could be aligned to these other types which tend to be mutually exclusive.
  - language:
    - German (Deutsch)
    - Italian (Italienisch)
    - Latin (Lateinisch)
    - Ancient Greek (Altgriechisch)
  - authorship:
    - autograph (Autograph)
    - allograph (Allograph)
  - numbering:
    - pagination (Paginierung)
    - foliation (Folierung)
  - margins (Ränder):
    - cropped (beschnitten)
    - torn (gerissen)
  - writing material (Schreibmaterial):
    - ink 1 (Tinte 1)
    - ink 2 (Tinte 2)
    - pencil (Bleistift)
    - red chalk (Rötel)
- 📚 categories_support:
  - format (Format):
    - quarto (Quarto)
  - loose materials (Loses Material):
    - sheet (Blatt)
    - cut-out/clipping (Blattausschnitt)
  - folded materials (Gefaltetes Material)
  - bound materials (Gebundenes Material):
    - notebook cover (Heft mit Umschlag)
    - notebook bound (Heft)
    - book page (Buchseite)
  - rotation (Drehung): added from snapshot diplomatic properties:
    - right (Rechtsdrehung)
    - left (Linksdrehung)
Carrier Item
- flags: lost.
- identity:
- material:
  - categories:support
  - measurements; default size mm.
  - preservation states
- content:
  - categories:content
  - categories:text: branches for:
    - manuscripts (epigram, epigram collection, letter…)
    - prints (literary magazine, edition volume)
  - comment with topic categories.
- history:
  - chronotopes for both origin and provenance (use tags).
  - note:hist
- editorial:
Categories thesauri:
📚 categories_text:
- epigram (Epigramm)
- epigram collection (Epigrammsammlung)
- letter (Brief)
- travel journal (Reisetagebuch)
- working notebook (Arbeitsheft)
- index (Index)
- note (Notat)
- letter recipients list (Liste Briefempfänger)
- calculations (Berechnungen)
- sketches (Skizzen)
- itinerary (Reiseplan)
- scientific descriptions (Naturwissenschaftliche Beschreibung)
- list of words (Wörterlisten)
- remarks on epigram meter (Bemerkungen zur Metrik)
- print (Druck):
  - literary magazine (Literaturzeitschrift)
  - edition volume (Editionsband)
Epigram Version Item and Lost Lines Item
For epigram version the group ID is the epigram’s EID. It is still to be determined whether we need the epigram version item or not.
- identity:
- material:
- content:
  - categories:text
  - token-based text
  - apparatus layer
  - comment layer with topic categories.
  - hands (GVE)
- history:
  - chronotopes
  - note:hist
- editorial:
Epigram Item
- identity:
- content:
  - comment with topic categories.
- editorial:
Collection Item
- identity:
- content:
  - categories:seq
  - links:seq 🔗 version
  - comment with topic categories.
- editorial:
Parts Matrix
| part | snapshot | carrier | version | lines | epigram | collection |
|---|---|---|---|---|---|---|
| categories | content support lang | content support text | text | text | seq | |
| chronotopes | X | X | X | |||
| comment | X | X | X | X | ||
| external IDs | X | X | X | X | ||
| hands (GVE) | X | X | ||||
| links | X | X auth | X auth | seq | ||
| measurements | X | |||||
| metadata | X | X | X | X | X | X |
| note | X | X hist | X hist | X hist | X | X |
| references | X | X | X | X | X | X |
| shelfmarks | X | X | ||||
| snapshot | X | |||||
| states | X | X | X | |||
| text | X | X | ||||
| apparatus= | X | X | ||||
| comment= | X | X |
Thesauri List
This list currently excludes text-related parts as it is not yet defined whether they will be required.
- categories (carrier, topic, seq):
  - 📚 categories
- comment:
  - 📚 comment-tags
  - 📚 doc-reference-types
  - 📚 doc-reference-tags
  - 📚 comment-id-scopes
  - 📚 comment-id-tags
  - 📚 assertion-tags
  - 📚 comment-categories
  - 📚 comment-keyword-languages
- dates:
  - 📚 doc-reference-types
  - 📚 doc-reference-tags
- events:
  - 📚 event-types
  - 📚 event-tags
  - 📚 chronotope-tags
  - 📚 assertion-tags
  - 📚 doc-reference-types
  - 📚 doc-reference-tags
  - 📚 event-relations
  - 📚 pin-link-scopes
  - 📚 pin-link-tags
  - 📚 assertion-tags
- external IDs:
  - 📚 external-id-tags
  - 📚 external-id-scopes
  - 📚 assertion-tags
  - 📚 doc-reference-types
  - 📚 doc-reference-tags
- flags (txt):
  - 📚 flags
- hands (GVE):
  - 📚 gve-hand-tags
  - 📚 gve-hand-tools
  - 📚 gve-hand-colors
- links (default, auth):
  - 📚 pin-link-scopes
  - 📚 pin-link-tags
  - 📚 pin-link-assertion-tags
  - 📚 pin-link-docref-types
  - 📚 pin-link-docref-tags
- measurements:
  - 📚 physical-size-set-names
  - 📚 physical-size-dim-tags
  - 📚 physical-size-units
- metadata:
  - 📚 metadata-types
  - 📚 metadata-names (eid, author…)
- note (default, hist):
  - 📚 note-tags
- references:
  - 📚 doc-reference-types
  - 📚 doc-reference-tags
- shelfmarks:
  - 📚 cod-shelfmark-tags
  - 📚 cod-shelfmark-libraries
- snapshot:
  - 📚 snapshot-feat-names
  - 📚 snapshot-feat-values
  - 📚 snapshot-efeat-names
  - 📚 snapshot-efeat-values
  - 📚 snapshot-dfeat-names
  - 📚 snapshot-dfeat-values
- states:
  - 📚 physical-states
  - 📚 physical-state-features
  - 📚 physical-state-reporters
In the snapshot part, thesauri for names and values follow a specific convention:
- names are entries where id=entry ID and value=entry label, as usual in all thesauri. For instance, id=clr and value=color.
- values are composite-like entries where id=name-id + : + value-id and value=label. For instance, id=clr:r and value=red. So these entries represent closed sets of values for specific feature names (=the keys in the features map).
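To make the convention concrete, here is a minimal Python sketch that groups the composite `values` entries under their feature names. The function name `allowed_values` and the plain-dictionary representation of thesauri are assumptions for this example, not the editor's actual code.

```python
def allowed_values(names, values):
    """Group 'name-id:value-id' entries under their feature name.

    names:  {name-id: label}, e.g. {"clr": "color"}
    values: {"name-id:value-id": label}, e.g. {"clr:r": "red"}
    Returns {name-id: {value-id: label}}; an empty dict means an open set.
    """
    result = {name_id: {} for name_id in names}
    for entry_id, label in values.items():
        # Split only on the first colon: the part before it is the feature name.
        name_id, _, value_id = entry_id.partition(":")
        if name_id not in result:
            raise KeyError(f"value entry {entry_id!r} refers to an unknown name")
        result[name_id][value_id] = label
    return result


closed_sets = allowed_values({"clr": "color"}, {"clr:r": "red"})
```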
Operation Features
As for the snapshot features, we can provide one or more of these definitions:
- a set of feature definitions for operations metadata.
- a set of feature definitions for operations diplomatic metadata. Here you put everything that refers to color, shape, etc.
- a set of feature definitions for operation visual elements metadata. If not using SVG descriptions these are not required. The set refers to the SVG elements which build up the graphical representation of a feature, when there is one. For instance, say you are graphically representing a composite stroke where for some reason one red segment crosses one black segment: in this case, you would graphically represent this with 2 SVG line elements, 1 per stroke. Each of these lines could have any number of features attached, like the ink color for that specific line. So, the element feature definitions would be used here.
By definitions I mean that you can not only provide a list of features, but also, for each feature in the list, either leave its values as an open set, or close them by defining a list of allowed values for that feature.
For instance, say we want to have 2 features, one for color and another for size. Of course, this is totally unrealistic; it’s just a fake example providing both an open and a closed set. So, we would define 2 features, each having an ID (in English by convention) and a label (in English, German, or any other language we want), e.g.:
- ID=clr, label=color
- ID=sz, label=size

Incidentally, in the real world we do not usually adopt abbreviated IDs like “clr” or “sz”; we use the full name (see thesauri naming conventions), “color” and “size”, unless this is too long. Here I’m just using these abbreviations so we can easily differentiate between IDs and values in the example.
Now, say that the size is free, as we want to enter a free measurement here; while the color is limited to a set including only red, green, blue. This means that we will include a “dictionary” of available colors for the feature with the ID=clr, while providing nothing for the other one (sz):
- clr:
  - ID=r, label=red
  - ID=g, label=green
  - ID=b, label=blue
So, once we have these definitions say for the diplomatic features, the diplomatic features UI will behave as follows:
- the list of features is a closed set, a dropdown list with only “color” and “size”;
- when you pick “color”, the value is a dropdown list too, with only “red”, “green”, “blue”;
- when you pick “size”, the value is a textbox where you are free to type.
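The behavior listed above can be sketched in Python as a small decision function. This is an illustrative assumption about the logic, not the editor's actual UI code: the function `value_editor` and the tuple it returns are hypothetical.

```python
def value_editor(feature_id, names, closed_values):
    """Choose the value control for a feature.

    Returns ("dropdown", [labels]) when the feature has a closed set of values,
    or ("textbox", None) when its values are an open set.
    """
    if feature_id not in names:
        # The list of features itself is a closed set.
        raise KeyError(f"unknown feature: {feature_id!r}")
    options = closed_values.get(feature_id)
    if options:
        return ("dropdown", list(options.values()))
    return ("textbox", None)


names = {"clr": "color", "sz": "size"}
closed_values = {"clr": {"r": "red", "g": "green", "b": "blue"}}

color_control = value_editor("clr", names, closed_values)
size_control = value_editor("sz", names, closed_values)
```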
Here we represent these features using this convention:
- diplomatic features:
  - names:
    - clr=color
    - sz=size
  - values:
    - clr:r=red
    - clr:g=green
    - clr:b=blue
This represents a single definition set, in a specific language; just replicate this structure to cover more languages.
- diplomatic features:
  - names:
    - epigram-nr=epigram number
    - page-nr=page number
    - position=position
    - shape=shape
    - scope=scope
    - color=color
  - values:
    - position:baseline=baseline
    - position:above=interlinear above
    - position:below=interlinear below
    - position:outer-top=outside at top
    - position:outer-bottom=outside at bottom
    - position:outer-left=outside at left
    - position:outer-right=outside at right
    - shape:line-hrz=horizontal straight line
    - shape:line-ne=diagonal up line
    - shape:line-se=diagonal down line
    - shape:line-vrt=vertical straight line
    - shape:line-hrz-2=horizontal double straight line
    - shape:scrape=scraped off
    - shape:erasure=erased
    - shape:underline=underline
    - shape:underline-dot=dotted underline
    - shape:underline-2=double underline
    - shape:scribble=scribble
    - shape:dot=dot
    - shape:check=check mark
    - shape:circle-dot=circle with dot
    - shape:curve=curved line
    - shape:curly-90=curly brace rotated down
    - shape:nb=notabene
    - shape:cross=cross
    - scope:comma=comma
    - scope:question-upper=upper part question mark
    - scope:exclam-upper=upper part exclamation mark
    - scope:period=period
    - scope:letter=letter
    - scope:word=word
    - scope:line=line
    - scope:line-2=two lines
    - scope:line-3=three lines
    - scope:epigram=entire epigram
    - color:black=black
    - color:dark-brown=dark brown
    - color:orange=orange
    - color:light-orange=light orange
    - color:red=red
Notes:
- epigram and page numbers are open sets (you enter the number), so they have no entries in values.
- the outer position might also be more granular, e.g. using nw, n, ne, instead of just top, and the like for all the other “directions”.
- I am not sure about “illegible” for shape. This is the shape of the sign used to represent a correction.
- I am not sure about “upper” in upper part question/exclamation: does it refer to the position or to the sign?
1. Strictly speaking, for a collection it is not necessary that the witnesses belong to a single textual carrier. In theory, one could even envisage a case where someone is planning a collection by picking texts from different carriers. The definition quoted here anyway refers to the most typical case.