Corpus Data Model
The GospeLib corpus is a collection of JSON files that represent scripture texts, lexicons, topical guides, and more. All files are validated by Pydantic models before being ingested into FalkorDB. This guide explains the seven schema families and their structure.
Design Principles
- Every file declares its own type: the `schema` field is required, so no heuristic detection is needed.
- All cross-references are structured objects: there are no human-formatted reference strings, and every scripture reference is a `PassageRef` object.
- `bookId` is always a canonical slug, e.g. `"gen"`, `"1-ne"`, `"dc"`. Integers and display titles are never used as identifiers.
- Optional fields are absent, not null: consumers test with `!== undefined`, not `!== null`.
- camelCase throughout: all JSON keys use camelCase.
- Controlled enumerations: fields like `pos`, `corpus`, and `language` use defined enums.
- Verse enrichments are additive and co-resident: a single `Verse` can carry `witnesses`, `words`, and `notes` simultaneously.
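Two of these principles (a required `schema` field and no explicit nulls) can be checked mechanically before model validation runs. The following is a minimal plain-Python sketch, not the project's Pydantic code; `check_corpus_file` and `find_nulls` are hypothetical names, and the set of schema names is copied from the schema families listed in this guide.

```python
import json
from typing import Any

# Schema names copied from the Schema Families table (hypothetical constant name).
KNOWN_SCHEMAS = {
    "scripture-text", "lexicon", "topical-guide", "bible-dictionary",
    "scripture-index", "verse-commentary", "scholarly-commentary",
}


def find_nulls(value: Any, path: str = "$") -> list[str]:
    """Return the JSON path of every explicit null; optional fields must be absent instead."""
    if value is None:
        return [path]
    if isinstance(value, dict):
        return [p for k, v in value.items() for p in find_nulls(v, f"{path}.{k}")]
    if isinstance(value, list):
        return [p for i, v in enumerate(value) for p in find_nulls(v, f"{path}[{i}]")]
    return []


def check_corpus_file(raw: str) -> str:
    """Return the declared schema, or raise ValueError if a principle is violated."""
    doc = json.loads(raw)
    if not isinstance(doc, dict):
        raise ValueError("top level must be a JSON object")
    schema = doc.get("schema")  # the file's declared type; never guessed from content
    if schema not in KNOWN_SCHEMAS:
        raise ValueError(f"missing or unknown schema: {schema!r}")
    nulls = find_nulls(doc)
    if nulls:
        raise ValueError(f"explicit nulls (omit the key instead): {nulls}")
    return schema
```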
Schema Families
| Schema | File Pattern | Purpose |
|---|---|---|
| `scripture-text` | `corpus/{bookId}.json` | Primary scripture text with optional witnesses, interlinear words, and notes |
| `lexicon` | `lexicon/{range}.json` | Hebrew/Greek/Aramaic lexicon entries keyed by Strong's number |
| `topical-guide` | `tg/{letter}.json` | Topical Guide entries with passage citations |
| `bible-dictionary` | `bd/{letter}.json` | Bible Dictionary articles |
| `scripture-index` | `index/{letter}.json` | Triple Combination index entries |
| `verse-commentary` | `commentary/{commentaryId}/{bookId}.json` | Verse-level commentary and footnotes |
| `scholarly-commentary` | `scholarly/{commentaryId}.json` | Scholarly commentary (Clarke, BYU, etc.) |
Shared Types
Three types appear across multiple schemas:
PassageRef
The canonical representation of a scripture reference:
{ "bookId": "gen", "chapter": 1, "verse": 1 }
{ "bookId": "dc", "chapter": 84, "verse": 26, "verseEnd": 27 }
{ "bookId": "num", "chapter": 16, "chapterEnd": 18 }
- `verseEnd` requires `verse` to be present and must be ≥ `verse`.
- `chapterEnd` must be ≥ `chapter`.
- An absent `verse` means a chapter-level reference.
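The three rules map naturally onto constructor checks. This is a minimal stdlib sketch that assumes a `dataclass` can stand in for the real Pydantic model, whose definition is not shown in this guide. Field names deliberately mirror the camelCase JSON keys so a decoded object can be passed in with `**`, and the hypothetical `to_json` method drops unset fields so optional values stay absent rather than null.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PassageRef:
    # camelCase on purpose: matches the JSON keys one-to-one.
    bookId: str
    chapter: int
    verse: Optional[int] = None        # absent => chapter-level reference
    verseEnd: Optional[int] = None
    chapterEnd: Optional[int] = None

    def __post_init__(self) -> None:
        if self.verseEnd is not None:
            if self.verse is None:
                raise ValueError("verseEnd requires verse")
            if self.verseEnd < self.verse:
                raise ValueError("verseEnd must be >= verse")
        if self.chapterEnd is not None and self.chapterEnd < self.chapter:
            raise ValueError("chapterEnd must be >= chapter")

    @property
    def is_chapter_level(self) -> bool:
        return self.verse is None

    def to_json(self) -> dict:
        # Omit unset optional fields instead of emitting null.
        return {k: v for k, v in vars(self).items() if v is not None}
```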
SeeAlsoLink
A typed, discriminated-union link to a related resource:
{ "type": "topic", "topicId": "tg:angels", "title": "Angels" }
{ "type": "article", "articleId": "bd:angels", "title": "Angels" }
{ "type": "passage", "ref": { "bookId": "gen", "chapter": 1, "verse": 1 } }
{ "type": "person", "personId": "person:aaron.1", "name": "Aaron" }
{ "type": "place", "placeId": "place:ammonihah", "name": "Ammonihah" }
Every element has an explicit `type` field; there is no positional inference.
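Because every link carries `type`, a consumer can dispatch on that field alone. The sketch below is illustrative: `link_target` and the `TARGET_KEY` table are hypothetical, derived only from the five example shapes above. Rendering a passage link as a dotted ID borrows the passage ID convention from File Naming; the chapter-level form (e.g. `gen.1`) is an assumption.

```python
from typing import Any

# Key holding the link target for each discriminator value (from the examples above).
TARGET_KEY = {
    "topic": "topicId",
    "article": "articleId",
    "person": "personId",
    "place": "placeId",
    "passage": "ref",
}


def link_target(link: dict[str, Any]) -> str:
    """Return a string id for a SeeAlsoLink, dispatching only on its `type` field."""
    kind = link["type"]
    if kind not in TARGET_KEY:
        raise ValueError(f"unknown SeeAlsoLink type: {kind!r}")
    target = link[TARGET_KEY[kind]]
    if kind == "passage":
        # Hypothetical rendering of a PassageRef as {bookId}.{chapter}[.{verse}].
        parts = [target["bookId"], str(target["chapter"])]
        if "verse" in target:
            parts.append(str(target["verse"]))
        return ".".join(parts)
    return target
```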
NoteAnchor
Locates a footnote's attachment point within a verse:
{ "wordIndex": 13, "charOffset": 67, "word": "Egypt," }
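The guide does not spell out how `wordIndex` and `charOffset` are counted. Assuming both are zero-based, with `wordIndex` counting whitespace-separated tokens and `charOffset` pointing at the first character of `word`, an anchor can be checked for internal consistency against its verse text. This is a hypothetical sketch (`anchor_matches` is an illustrative name) that holds only under those assumptions; verify them against real data before relying on it.

```python
from typing import Any


def anchor_matches(text: str, anchor: dict[str, Any]) -> bool:
    """Check a NoteAnchor against verse text.

    Assumes (hypothetically) zero-based charOffset at the word's first character
    and zero-based wordIndex over whitespace-separated tokens.
    """
    word = anchor["word"]
    start = anchor["charOffset"]
    # The anchored word must appear verbatim at the offset...
    if text[start:start + len(word)] != word:
        return False
    # ...and must begin a token, not sit in the middle of one.
    if start > 0 and not text[start - 1].isspace():
        return False
    # The number of tokens before the offset must equal wordIndex.
    return len(text[:start].split()) == anchor["wordIndex"]
```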
Scripture Text (scripture-text)
The primary schema for any biblical, deuterocanonical, or pseudepigraphical text. Files live at corpus/{bookId}.json.
Structure
ScriptureTextFile
├── schema: "scripture-text"
├── bookId, title, abbreviation, corpus, language
├── introduction? (Markdown)
└── chapters[]
└── Chapter
├── chapter (number)
├── heading? (string)
└── verses[]
└── Verse
├── verse, text
├── sourceText?, sourceTranslit?
├── witnesses?[] (manuscript evidence)
├── words?[] (interlinear alignment)
└── notes?[] (scholarly footnotes)
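A consumer that needs every verse in a book can walk `chapters[].verses[]` directly. This is a hypothetical helper over the decoded JSON (`iter_verses` and `enrichment_summary` are illustrative names). It builds passage IDs using the `{bookId}.{chapter}.{verse}` convention from File Naming and reports which optional enrichments a verse carries, which may be several at once.

```python
from typing import Any, Iterator


def iter_verses(book: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (passageId, verse) pairs from a decoded scripture-text file."""
    if book.get("schema") != "scripture-text":
        raise ValueError("not a scripture-text file")
    for chapter in book["chapters"]:
        for verse in chapter["verses"]:
            # Passage IDs follow {bookId}.{chapter}.{verse}.
            pid = f'{book["bookId"]}.{chapter["chapter"]}.{verse["verse"]}'
            yield pid, verse


def enrichment_summary(verse: dict[str, Any]) -> list[str]:
    """List the optional enrichments present on a verse (they may co-occur)."""
    return [k for k in ("witnesses", "words", "notes") if k in verse]
```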
Witness
Manuscript evidence in a specific language:
{
"language": "aramaic",
"script": "hebrew",
"text": "…]חנך לבח֯ירין …",
"witness": "4QEn^a",
"edition": "Milik 1976",
"hasLacunae": true,
"isPartial": true
}
WordAlignment
A single word token aligned between source language and English:
{ "order": 0, "gloss": "In the beginning", "strongs": "H7225", "token": "בְּרֵאשִׁ֖ית" }
Strong's numbers are always normalized to a prefix letter followed by a four-digit, zero-padded number (e.g., `H0430`, `G0056`).
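A normalizer for this format might look like the sketch below. The accepted input spellings (`H430`, `g56`, `H 0430`), the restriction to `H` and `G` prefixes, and the `normalize_strongs` name are assumptions for illustration; only the output format comes from this guide.

```python
import re

# Hypothetical accepted inputs: optional whitespace, H or G (any case), 1-4 digits
# after optional leading zeros.
_STRONGS = re.compile(r"^\s*([HG])\s*0*(\d{1,4})\s*$", re.IGNORECASE)


def normalize_strongs(raw: str) -> str:
    """Normalize a Strong's number to letter + 4-digit zero-padded form."""
    m = _STRONGS.match(raw)
    if not m or int(m.group(2)) == 0:
        raise ValueError(f"not a Strong's number: {raw!r}")
    letter, number = m.group(1).upper(), int(m.group(2))
    return f"{letter}{number:04d}"
```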
Lexicon (lexicon)
Hebrew, Greek, and Aramaic lexicon entries keyed by Strong's number. Files live at lexicon/{range}.json.
Structure
LexiconFile
├── schema: "lexicon"
├── language, range: { from, to }
└── entries: { [strongs]: LexiconEntry }
└── LexiconEntry
├── strongs, original, translit, pronunciation
├── pos (controlled enum), posRaw
├── glosses[] (English equivalents)
├── definition: { short, senses[] }
├── derivation: { description, roots[] }
├── occurrences?, translations?[]
└── related?[] (Strong's cross-refs)
The derivation.roots array is pre-extracted for graph traversal — consumers can follow etymology chains without regex parsing.
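Because `roots` is already extracted, following an etymology chain is a plain graph walk. This sketch assumes each `roots` item is a normalized Strong's number string and that `entries` holds the merged `entries` maps of whichever lexicon files are loaded; `etymology_closure` is a hypothetical name. It uses breadth-first search with a visited set, so cycles in the data cannot cause an infinite loop.

```python
from collections import deque
from typing import Any


def etymology_closure(entries: dict[str, dict[str, Any]], start: str) -> list[str]:
    """Return Strong's numbers reachable from `start` via derivation.roots, in BFS order.

    Roots that point outside the loaded entries are included but not expanded.
    """
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        entry = entries.get(queue.popleft())
        if entry is None:
            continue  # root not present in the loaded lexicon files
        for root in entry.get("derivation", {}).get("roots", []):
            if root not in seen:
                seen.add(root)
                order.append(root)
                queue.append(root)
    return order
```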
Topical Guide (topical-guide)
Topical Guide entries organized by letter. Files live at tg/{letter}.json.
Each entry has a topicId (e.g., "tg:angels"), a title, optional seeAlso links, and an array of passage citations.
Bible Dictionary (bible-dictionary)
Bible Dictionary articles organized by letter. Files live at bd/{letter}.json.
Each entry has an articleId, title, content (Markdown body), and optional seeAlso links.
Scripture Index (scripture-index)
Triple Combination index entries. Files live at index/{letter}.json.
Entries use a discriminated union for different link types (passage, topic, article, person, place).
Verse Commentary (verse-commentary)
Verse-level commentary from specific sources. Files live at commentary/{commentaryId}/{bookId}.json.
Each file contains notes anchored to specific verses using NoteAnchor.
Scholarly Commentary (scholarly-commentary)
Full scholarly commentary works (Clarke, BYU, etc.). Files live at scholarly/{commentaryId}.json.
Each file is organized into sections with headings and body content.
Controlled Vocabularies
| Enum | Values |
|---|---|
| CorpusType | ot, nt, bom, dc, pgp, pseudepigrapha, apocrypha, deuterocanonical |
| WitnessLanguage | hebrew, greek, aramaic, ethiopic, latin, syriac, coptic |
| ScriptType | hebrew, greek, ethiopic, latin, syriac, coptic |
| PartOfSpeech | noun.masculine, noun.feminine, verb, adjective, adverb, etc. |
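The closed vocabularies can be mirrored as string enums so that an out-of-vocabulary value fails at lookup time. This sketch copies the values from the table; `PartOfSpeech` is omitted because its value list is abbreviated here, and `parse_witness` is a hypothetical helper. Note that `ScriptType` has no `aramaic` member, which is consistent with the Witness example, where an Aramaic text is recorded in Hebrew script.

```python
from enum import Enum


class CorpusType(str, Enum):
    OT = "ot"
    NT = "nt"
    BOM = "bom"
    DC = "dc"
    PGP = "pgp"
    PSEUDEPIGRAPHA = "pseudepigrapha"
    APOCRYPHA = "apocrypha"
    DEUTEROCANONICAL = "deuterocanonical"


class WitnessLanguage(str, Enum):
    HEBREW = "hebrew"
    GREEK = "greek"
    ARAMAIC = "aramaic"
    ETHIOPIC = "ethiopic"
    LATIN = "latin"
    SYRIAC = "syriac"
    COPTIC = "coptic"


class ScriptType(str, Enum):
    HEBREW = "hebrew"
    GREEK = "greek"
    ETHIOPIC = "ethiopic"
    LATIN = "latin"
    SYRIAC = "syriac"
    COPTIC = "coptic"


def parse_witness(language: str, script: str) -> tuple[WitnessLanguage, ScriptType]:
    # Lookup by value raises ValueError for anything outside the vocabulary.
    return WitnessLanguage(language), ScriptType(script)
```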
File Naming
- Book IDs use canonical kebab-case slugs: `gen`, `1-ne`, `dc`, `w-of-m`
- Passage IDs follow `{bookId}.{chapter}.{verse}`: `gen.1.1`, `1-ne.3.7`
- Lexicon files use Strong's ranges: `H0001-H1000.json`
- Topical Guide and Bible Dictionary files use single letters: `a.json`, `b.json`
The data/book_registry.json file contains the complete mapping of book IDs to titles, abbreviations, corpus types, and chapter counts.
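The naming conventions can be sketched as small helpers. The slug pattern (lowercase letters and digits joined by single hyphens) is inferred from the examples, and `lexicon_file_for` assumes every lexicon file spans a fixed 1000-number range like `H0001-H1000.json`; in practice the authoritative bounds are each file's `range: { from, to }`. All three function names are hypothetical.

```python
import re

# Inferred slug shape: "gen", "1-ne", "w-of-m" (assumption, not a published rule).
_PASSAGE_ID = re.compile(r"^([a-z0-9]+(?:-[a-z0-9]+)*)\.(\d+)\.(\d+)$")


def format_passage_id(book_id: str, chapter: int, verse: int) -> str:
    return f"{book_id}.{chapter}.{verse}"


def parse_passage_id(pid: str) -> tuple[str, int, int]:
    m = _PASSAGE_ID.match(pid)
    if not m:
        raise ValueError(f"malformed passage id: {pid!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def lexicon_file_for(strongs: str, width: int = 1000) -> str:
    """Hypothetical: map a normalized Strong's number to a fixed-width range file."""
    letter, n = strongs[0], int(strongs[1:])
    lo = (n - 1) // width * width + 1
    return f"lexicon/{letter}{lo:04d}-{letter}{lo + width - 1:04d}.json"
```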