Mock Example Dump 1
Counts of token spans and other spans, grouped by document:
select 'tok' as type,count(s.id), d.author
from span s
inner join "document" d ON s.document_id = d.id
where s."type"='tok'
group by d.author
union
select 'x' as type, count(s.id), d.author
from span s
inner join "document" d ON s.document_id = d.id
where s."type"<>'tok'
group by d.author
order by type, author
type | count | author |
---|---|---|
tok | 74 | Catullus |
tok | 109 | Horatius |
x | 23 | Catullus |
x | 33 | Horatius |
Catullus
The unique div
inside the TEI text element is reported below:
<div type="poem" n="84">
<head>ad Arrium</head>
<lg met="eleg" n="1">
<l met="6da^" n="1"><quote>chommoda</quote> dicebat, si quando commoda vellet</l>
<l met="pent" n="2">dicere, et insidias <persName>Arrius</persName> <quote>hinsidias</quote>,</l>
</lg>
<lg met="eleg" n="2">
<l met="6da^" n="3">et tum mirifice sperabat se esse locutum,</l>
<l met="pent" n="4">cum quantum poterat dixerat <quote>hinsidias</quote>.</l>
</lg>
<lg met="eleg" n="3">
<l met="6da^" n="5">credo, sic mater, sic liber avunculus eius</l>
<l met="pent" n="6">sic maternus avus dixerat atque avia.</l>
</lg>
<lg met="eleg" n="4">
<l met="6da^" n="7">hoc misso in <geogName>Syriam</geogName> requierant omnibus
aures:</l>
<l met="pent" n="8">audibant eadem haec leniter et leviter,</l>
</lg>
<lg met="eleg" n="5">
<l met="6da^" n="9">nec sibi postilla metuebant talia verba,</l>
<l met="pent" n="10">cum subito affertur nuntius horribilis,</l>
</lg>
<lg met="eleg" n="6">
<l met="6da^" n="11"><geogName>Ionios</geogName> fluctus, postquam illuc <persName>Arrius</persName> isset,</l>
<l met="pent" n="12">iam non <geogName>Ionios</geogName> esse sed <quote><geogName>Hionios</geogName></quote>.</l>
</lg>
</div>
In the following text we represent just the tokens prefixed by their ordinal:
1ad 2Arrium
3chommoda 4dicebat, 5si 6quando 7commoda 8vellet
9dicere, 10et 11insidias 12Arrius 13hinsidias,
14et 15tum 16mirifice 17sperabat 18se 19esse 20locutum,
21cum 22quantum 23poterat 24dixerat 25hinsidias.
26credo, 27sic 28mater, 29sic 30liber 31avunculus 32eius
33sic 34maternus 35avus 36dixerat 37atque 38avia.
39hoc 40misso 41in 42Syriam 43requierant 44omnibus 45aures:
46audibant 47eadem 48haec 49leniter 50et 51leviter,
52nec 53sibi 54postilla 55metuebant 56talia 57verba,
58cum 59subito 60affertur 61nuntius 62horribilis,
63Ionios 64fluctus, 65postquam 66illuc 67Arrius 68isset,
69iam 70non 71Ionios 72esse 73sed 74Hionios.
So here we have (numbers refer to tokens ordinals):
- 1 poem (the
div
; 1-74). - 6 strophes (
lg
): 3-13, 14-25, 26-38, 39-51, 52-62, 63-74. - 12 verses (
l
): 3-8, 9-13, 14-20, 21-25, 26-32, 33-38, 39-45, 46-51, 52-57, 58-62, 63-68, 69-74. - 74 tokens.
- 4 sentences:
- “ad Arrium” (1-2).
- vv.1-4 “chommoda… hinsidias”: 3-25.
- vv.5-6 “credo… avia”: 26-38.
- vv.7-12 “hoc misso… Hionios”: 39-74.
The only text outside metrical structures is the title in head
.
Tokens
select id, p1, value, lemma, pos,
index, length,
lemma_id, word_id
from span
where type='tok' and document_id=1
order by p1;
id | p1 | value | lemma | pos | index | length | lemma_id | word_id |
---|---|---|---|---|---|---|---|---|
1 | 1 | ad | ad | ADP | 1019 | 2 | 1 | 1 |
2 | 2 | arrium | arrium | NOUN | 1022 | 6 | 4 | 4 |
3 | 3 | chommoda | chommodus | NOUN | 1123 | 8 | 20 | 20 |
4 | 4 | dicebat | dico | VERB | 1140 | 8 | 29 | 33 |
5 | 5 | si | si | SCONJ | 1149 | 2 | 122 | 136 |
6 | 6 | quando | quando | SCONJ | 1152 | 6 | 108 | 121 |
7 | 7 | commoda | commodus | ADJ | 1159 | 7 | 23 | 23 |
8 | 8 | vellet | volo | VERB | 1167 | 6 | 149 | 159 |
9 | 9 | dicere | dico | VERB | 1219 | 7 | 29 | 34 |
10 | 10 | et | et | CCONJ | 1227 | 2 | 36 | 45 |
11 | 11 | insidias | insidius | NOUN | 1230 | 8 | 56 | 65 |
12 | 12 | arrius | arrius | NOUN | 1249 | 6 | 5 | 6 |
13 | 13 | hinsidias | hinsidius | NOUN | 1274 | 9 | 47 | 56 |
14 | 14 | et | et | CCONJ | 1400 | 2 | 36 | 45 |
15 | 15 | tum | tum | ADV | 1403 | 3 | 139 | 153 |
16 | 16 | mirifice | mirifice | ADV | 1407 | 8 | 79 | 92 |
17 | 17 | sperabat | spero | VERB | 1416 | 8 | 127 | 142 |
18 | 18 | se | se | PRON | 1425 | 2 | 120 | 133 |
19 | 19 | esse | sum | AUX | 1428 | 4 | 130 | 44 |
20 | 20 | locutum | loquor | VERB | 1433 | 8 | 72 | 80 |
21 | 21 | cum | cum | SCONJ | 1487 | 3 | 27 | 28 |
22 | 22 | quantum | quantum | ADV | 1491 | 7 | 109 | 122 |
23 | 23 | poterat | possum | VERB | 1499 | 7 | 99 | 115 |
24 | 24 | dixerat | dico | VERB | 1507 | 7 | 29 | 36 |
25 | 25 | hinsidias | hinsidia | NOUN | 1522 | 9 | 46 | 55 |
26 | 26 | credo | credo | VERB | 1648 | 6 | 25 | 25 |
27 | 27 | sic | sic | ADV | 1655 | 3 | 123 | 138 |
28 | 28 | mater | mater | NOUN | 1659 | 6 | 75 | 85 |
29 | 29 | sic | sic | ADV | 1666 | 3 | 123 | 138 |
30 | 30 | liber | liber | ADJ | 1670 | 5 | 69 | 78 |
31 | 31 | avunculus | avunculus | NOUN | 1676 | 9 | 10 | 11 |
32 | 32 | eius | is | PRON | 1686 | 4 | 61 | 41 |
33 | 33 | sic | sic | ADV | 1736 | 3 | 123 | 138 |
34 | 34 | maternus | maternus | ADJ | 1740 | 8 | 76 | 86 |
35 | 35 | avus | avus | NOUN | 1749 | 4 | 11 | 12 |
36 | 36 | dixerat | dico | VERB | 1754 | 7 | 29 | 36 |
37 | 37 | atque | atque | CCONJ | 1762 | 5 | 6 | 7 |
38 | 38 | avia | avia | NOUN | 1768 | 5 | 9 | 10 |
39 | 39 | hoc | hic | DET | 1881 | 3 | 44 | 58 |
40 | 40 | misso | mitto | VERB | 1885 | 5 | 80 | 93 |
41 | 41 | in | in | ADP | 1891 | 2 | 55 | 64 |
42 | 42 | syriam | syrius | NOUN | 1904 | 6 | 131 | 144 |
43 | 43 | requierant | requiero | VERB | 1922 | 10 | 114 | 127 |
44 | 44 | omnibus | omnis | DET | 1933 | 7 | 91 | 104 |
45 | 45 | aures | auris | NOUN | 1966 | 5 | 8 | 9 |
46 | 46 | audibant | audibo | VERB | 2017 | 8 | 7 | 8 |
47 | 47 | eadem | idem | DET | 2026 | 5 | 52 | 39 |
48 | 48 | haec | hic | DET | 2032 | 4 | 44 | 53 |
49 | 49 | leniter | leniter | ADV | 2037 | 7 | 65 | 74 |
50 | 50 | et | et | CCONJ | 2045 | 2 | 36 | 45 |
51 | 51 | leviter | leviter | ADV | 2048 | 8 | 67 | 77 |
52 | 52 | nec | nec | CCONJ | 2164 | 3 | 84 | 97 |
53 | 53 | sibi | se | PRON | 2168 | 4 | 120 | 137 |
54 | 54 | postilla | postilla | NOUN | 2173 | 8 | 101 | 112 |
55 | 55 | metuebant | metuo | VERB | 2182 | 9 | 77 | 89 |
56 | 56 | talia | talis | DET | 2192 | 5 | 132 | 145 |
57 | 57 | verba | verba | NOUN | 2198 | 6 | 145 | 160 |
58 | 58 | cum | cum | SCONJ | 2251 | 3 | 27 | 28 |
59 | 59 | subito | subito | ADV | 2255 | 6 | 129 | 143 |
60 | 60 | affertur | affero | VERB | 2262 | 8 | 3 | 3 |
61 | 61 | nuntius | nunte | ADV | 2271 | 7 | 90 | 103 |
62 | 62 | horribilis | horribilis | ADJ | 2279 | 11 | 49 | 59 |
63 | 63 | ionios | ionius | NOUN | 2409 | 6 | 59 | 69 |
64 | 64 | fluctus | fluctus | VERB | 2427 | 8 | 40 | 48 |
65 | 65 | postquam | postquam | SCONJ | 2436 | 8 | 102 | 113 |
66 | 66 | illuc | illuc | ADV | 2445 | 5 | 53 | 62 |
67 | 67 | arrius | arrius | ADV | 2461 | 6 | 5 | 5 |
68 | 68 | isset | issum | VERB | 2479 | 6 | 62 | 71 |
69 | 69 | iam | iam | ADV | 2532 | 3 | 51 | 61 |
70 | 70 | non | non | PART | 2536 | 3 | 87 | 100 |
71 | 71 | ionios | ionius | ADJ | 2550 | 6 | 59 | 68 |
72 | 72 | esse | sum | AUX | 2568 | 4 | 130 | 44 |
73 | 73 | sed | sed | CCONJ | 2573 | 3 | 121 | 134 |
74 | 74 | hionios | hionius | ADJ | 2594 | 7 | 48 | 57 |
As you can see P1 is the ordinal token position. P2 is always equal to P1 for tokens, so it’s not reported here. Lemma, word ID and lemma ID have been added by postprocessing spans. POS is the result of a UDPipe Latin tagger, while index and length are the character-based position of the portion of text corresponding to each token in the source text.
Some of the POS tagger results are unreliable (
requiero
instead ofrequiesco
,chommoda
and other H-forms -really not existing- vs.commoda
and its oscillation between noun and adjective, etc.), but in most cases it is correct.
Structures
select id, type, p1, p2, text, index, length
from span
where type<>'tok' and document_id=1
order by p1;
id | type | p1 | p2 | text | index | length |
---|---|---|---|---|---|---|
94 | snt | 1 | 2 | ad Arrium | 847 | 181 |
75 | div | 1 | 74 | ad Arrium chommoda dicebat, si quando commoda vellet dicere…illuc Arrius isset, iam non Ionios esse sed Hionios. | 971 | 995 |
82 | l | 3 | 8 | chommoda dicebat, si quando commoda vellet | 1096 | 42 |
76 | lg | 3 | 13 | chommoda dicebat, si quando commoda vellet dicere, et insidias Arrius hinsidias, | 1053 | 138 |
95 | snt | 3 | 25 | chommoda dicebat, si quando commoda vellet dicere…se esse locutum, cum quantum poterat dixerat hinsidias | 1123 | 417 |
83 | l | 9 | 13 | dicere, et insidias Arrius hinsidias, | 1199 | 37 |
77 | lg | 14 | 25 | et tum mirifice sperabat se esse locutum, cum quantum poterat dixerat hinsidias. | 1337 | 138 |
84 | l | 14 | 20 | et tum mirifice sperabat se esse locutum, | 1380 | 41 |
85 | l | 21 | 25 | cum quantum poterat dixerat hinsidias. | 1467 | 38 |
86 | l | 26 | 32 | credo, sic mater, sic liber avunculus eius | 1628 | 42 |
78 | lg | 26 | 38 | credo, sic mater, sic liber avunculus eius sic maternus avus dixerat atque avia. | 1585 | 138 |
96 | snt | 26 | 38 | credo, sic mater, sic liber avunculus eius sic maternus avus dixerat atque avia. | 1648 | 125 |
87 | l | 33 | 38 | sic maternus avus dixerat atque avia. | 1716 | 37 |
88 | l | 39 | 45 | hoc misso in Syriam requierant omnibus aures: | 1861 | 69 |
97 | snt | 39 | 74 | hoc misso in Syriam requierant omnibus aures…postquam illuc Arrius isset, iam non Ionios esse sed Hionios | 1881 | 741 |
79 | lg | 39 | 51 | hoc misso in Syriam requierant omnibus aures: audibant eadem haec leniter et leviter, | 1818 | 167 |
89 | l | 46 | 51 | audibant eadem haec leniter et leviter, | 1998 | 39 |
80 | lg | 52 | 62 | nec sibi postilla metuebant talia verba, cum subito affertur nuntius horribilis, | 2102 | 138 |
90 | l | 52 | 57 | nec sibi postilla metuebant talia verba, | 2145 | 40 |
91 | l | 58 | 62 | cum subito affertur nuntius horribilis, | 2231 | 39 |
81 | lg | 63 | 74 | Ionios fluctus, postquam illuc Arrius isset, iam non Ionios esse sed Hionios. | 2336 | 135 |
92 | l | 63 | 68 | Ionios fluctus, postquam illuc Arrius isset, | 2379 | 44 |
93 | l | 69 | 74 | iam non Ionios esse sed Hionios. | 2512 | 32 |
Note that structures have no value, but they have a text, used as a human-friendly label and consisting in the first and last portions of its source text, or in the full text when it’s short enough.