Corpus Data Model

The GospeLib corpus is a collection of JSON files representing scripture texts, lexicons, topical guides, dictionaries, indexes, and commentaries. Every file is validated against a Pydantic model before it is ingested into FalkorDB. This guide describes the seven schema families and how each one is structured.

Design Principles

Every file declares its own type — The schema field is required. No heuristic detection needed.
All cross-references are structured objects — No human-formatted reference strings. Every reference is a PassageRef object.
bookId is always a canonical slug — e.g., "gen", "1-ne", "dc". No integers or display titles as identifiers.
Optional fields are absent, not null — Consumers test with !== undefined, not !== null.
camelCase throughout — All JSON keys use camelCase.
Controlled enumerations — Fields like pos, corpus, language use defined enums.
Verse enrichments are additive and co-resident — A single Verse can carry witnesses, words, and notes simultaneously.
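
The "absent, not null" rule can be illustrated with a small Python sketch. The helper names `drop_absent` and `has_field` are hypothetical and not part of GospeLib; the point is only that optional keys are omitted on output and tested by presence on input.

```python
from typing import Any


def drop_absent(fields: dict[str, Any]) -> dict[str, Any]:
    """Return a copy without None-valued keys, so optional fields are omitted
    from the JSON instead of being serialized as null. (Hypothetical helper.)"""
    return {key: value for key, value in fields.items() if value is not None}


def has_field(obj: dict[str, Any], key: str) -> bool:
    """Presence test: the Python analogue of `obj[key] !== undefined`."""
    return key in obj


ref = drop_absent({"bookId": "gen", "chapter": 1, "verse": None})
print(ref)                      # {'bookId': 'gen', 'chapter': 1}
print(has_field(ref, "verse"))  # False
```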

Schema Families

Schema	File Pattern	Purpose
`scripture-text`	`corpus/{bookId}.json`	Primary scripture text with optional witnesses, interlinear words, and notes
`lexicon`	`lexicon/{range}.json`	Hebrew/Greek/Aramaic lexicon entries keyed by Strong's number
`topical-guide`	`tg/{letter}.json`	Topical Guide entries with passage citations
`bible-dictionary`	`bd/{letter}.json`	Bible Dictionary articles
`scripture-index`	`index/{letter}.json`	Triple Combination index entries
`verse-commentary`	`commentary/{commentaryId}/{bookId}.json`	Verse-level commentary and footnotes
`scholarly-commentary`	`scholarly/{commentaryId}.json`	Scholarly commentary (Clarke, BYU, etc.)

Shared Types

Three types appear across multiple schemas:

PassageRef

The canonical representation of a scripture reference:

{ "bookId": "gen", "chapter": 1, "verse": 1 }
{ "bookId": "dc", "chapter": 84, "verse": 26, "verseEnd": 27 }
{ "bookId": "num", "chapter": 16, "chapterEnd": 18 }

verseEnd requires verse to be present and must be ≥ verse
chapterEnd must be ≥ chapter
Absent verse means a chapter-level reference
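
The three rules above could be enforced roughly as follows. This is a minimal stdlib sketch, not the project's actual Pydantic model (which is not shown here); the class layout and the `is_chapter_level` property are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PassageRef:
    """Illustrative stand-in for the real model; field names follow the JSON keys."""
    bookId: str
    chapter: int
    verse: Optional[int] = None
    verseEnd: Optional[int] = None
    chapterEnd: Optional[int] = None

    def __post_init__(self) -> None:
        if self.verseEnd is not None:
            # verseEnd requires verse and must not precede it
            if self.verse is None:
                raise ValueError("verseEnd requires verse to be present")
            if self.verseEnd < self.verse:
                raise ValueError("verseEnd must be >= verse")
        if self.chapterEnd is not None and self.chapterEnd < self.chapter:
            raise ValueError("chapterEnd must be >= chapter")

    @property
    def is_chapter_level(self) -> bool:
        """An absent verse means the reference targets a whole chapter."""
        return self.verse is None


print(PassageRef("dc", 84, verse=26, verseEnd=27))
print(PassageRef("num", 16, chapterEnd=18).is_chapter_level)  # True
```

Constructing `PassageRef("gen", 1, verseEnd=3)` raises `ValueError` in this sketch because `verse` is missing.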

SeeAlsoLink

A typed, discriminated-union link to a related resource:

{ "type": "topic",   "topicId": "tg:angels",    "title": "Angels" }
{ "type": "article", "articleId": "bd:angels",   "title": "Angels" }
{ "type": "passage", "ref": { "bookId": "gen", "chapter": 1, "verse": 1 } }
{ "type": "person",  "personId": "person:aaron.1", "name": "Aaron" }
{ "type": "place",   "placeId": "place:ammonihah", "name": "Ammonihah" }

Every element has an explicit type field — no positional inference.
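
Because each link carries its own `type`, a consumer can dispatch on that field alone. The sketch below is one possible way to do so; the function `link_target` is hypothetical, and rendering a chapter-level passage as `bookId.chapter` is an assumption (only the verse-level form `bookId.chapter.verse` is documented under File Naming).

```python
from typing import Any

# Identifier key for each non-passage link type, taken from the examples above.
ID_KEY_BY_TYPE = {
    "topic": "topicId",
    "article": "articleId",
    "person": "personId",
    "place": "placeId",
}


def link_target(link: dict[str, Any]) -> str:
    """Return a string identifying what a SeeAlsoLink points at."""
    link_type = link["type"]
    if link_type == "passage":
        ref = link["ref"]
        target = f'{ref["bookId"]}.{ref["chapter"]}'
        if "verse" in ref:  # optional fields are absent, never null
            target += f'.{ref["verse"]}'
        return target
    if link_type in ID_KEY_BY_TYPE:
        return link[ID_KEY_BY_TYPE[link_type]]
    raise ValueError(f"unknown SeeAlsoLink type: {link_type!r}")


print(link_target({"type": "topic", "topicId": "tg:angels", "title": "Angels"}))
# tg:angels
print(link_target({"type": "passage", "ref": {"bookId": "gen", "chapter": 1, "verse": 1}}))
# gen.1.1
```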

NoteAnchor

Locates a footnote's attachment point within a verse:

{ "wordIndex": 13, "charOffset": 67, "word": "Egypt," }

Scripture Text (`scripture-text`)

The primary schema for scripture text in any corpus listed under CorpusType, including biblical, deuterocanonical, and pseudepigraphical texts. Files live at corpus/{bookId}.json.

Structure

ScriptureTextFile
├── schema: "scripture-text"
├── bookId, title, abbreviation, corpus, language
├── introduction? (Markdown)
└── chapters[]
    └── Chapter
        ├── chapter (number)
        ├── heading? (string)
        └── verses[]
            └── Verse
                ├── verse, text
                ├── sourceText?, sourceTranslit?
                ├── witnesses?[]     (manuscript evidence)
                ├── words?[]         (interlinear alignment)
                └── notes?[]         (scholarly footnotes)
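
The tree above can be walked with two nested loops. The following sketch assumes a file has already been parsed into a plain dict; `iter_verses` is a hypothetical helper, and the sample document is a trimmed placeholder rather than real corpus data.

```python
from typing import Any, Iterator

ENRICHMENT_KEYS = ("witnesses", "words", "notes")


def iter_verses(doc: dict[str, Any]) -> Iterator[tuple[str, list[str]]]:
    """Yield (passageId, enrichments present) for every verse in a file."""
    if doc.get("schema") != "scripture-text":
        raise ValueError("not a scripture-text file")
    for chapter in doc["chapters"]:
        for verse in chapter["verses"]:
            passage_id = f'{doc["bookId"]}.{chapter["chapter"]}.{verse["verse"]}'
            # Enrichments are co-resident: any combination of keys may appear.
            present = [key for key in ENRICHMENT_KEYS if key in verse]
            yield passage_id, present


sample = {  # placeholder content for illustration only
    "schema": "scripture-text",
    "bookId": "gen",
    "chapters": [
        {
            "chapter": 1,
            "verses": [
                {"verse": 1, "text": "(verse text)",
                 "words": [{"order": 0, "gloss": "In the beginning", "strongs": "H7225"}]},
                {"verse": 2, "text": "(verse text)"},
            ],
        }
    ],
}

print(list(iter_verses(sample)))
# [('gen.1.1', ['words']), ('gen.1.2', [])]
```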

Witness

Manuscript evidence in a specific language:

{
  "language": "aramaic",
  "script": "hebrew",
  "text": "…]חנך לבח֯ירין …",
  "witness": "4QEn^a",
  "edition": "Milik 1976",
  "hasLacunae": true,
  "isPartial": true
}

WordAlignment

A single word token aligned between source language and English:

{ "order": 0, "gloss": "In the beginning", "strongs": "H7225", "token": "בְּרֵאשִׁ֖ית" }

Strong's numbers are always normalized to a language letter followed by four zero-padded digits (e.g., H0430, G0056).
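
A normalizer for this format might look like the sketch below. The function name `normalize_strongs` and the set of accepted input spellings are assumptions; the source specifies only the output form. Numbers with letter suffixes, if the corpus uses any, are not handled.

```python
import re

# Language letter (H or G, any case), optional leading zeros, then 1 to 4 digits.
_STRONGS = re.compile(r"^([HhGg])0*([0-9]{1,4})$")


def normalize_strongs(raw: str) -> str:
    """Normalize e.g. 'H430' or 'g56' to the 'H0430' / 'G0056' form."""
    match = _STRONGS.match(raw.strip())
    if match is None:
        raise ValueError(f"not a recognizable Strong's number: {raw!r}")
    prefix, digits = match.groups()
    return f"{prefix.upper()}{int(digits):04d}"


print(normalize_strongs("H430"))   # H0430
print(normalize_strongs("g56"))    # G0056
print(normalize_strongs("H7225"))  # H7225
```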

Lexicon (`lexicon`)

Hebrew, Greek, and Aramaic lexicon entries keyed by Strong's number. Files live at lexicon/{range}.json.

Structure

LexiconFile
├── schema: "lexicon"
├── language, range: { from, to }
└── entries: { [strongs]: LexiconEntry }
    └── LexiconEntry
        ├── strongs, original, translit, pronunciation
        ├── pos (controlled enum), posRaw
        ├── glosses[] (English equivalents)
        ├── definition: { short, senses[] }
        ├── derivation: { description, roots[] }
        ├── occurrences?, translations?[]
        └── related?[] (Strong's cross-refs)

The derivation.roots array is pre-extracted for graph traversal — consumers can follow etymology chains without regex parsing.
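
As a sketch of that traversal, the function below follows roots breadth-first over an in-memory `entries` mapping. It assumes each item in `derivation.roots` is a normalized Strong's number string, which the source implies but does not state; `etymology_chain` is a hypothetical name, and the sample entries use invented placeholder numbers, not real lexicon data.

```python
from collections import deque
from typing import Any


def etymology_chain(entries: dict[str, dict[str, Any]], start: str) -> list[str]:
    """Return Strong's numbers reachable from `start` via derivation.roots,
    in breadth-first order, beginning with `start` itself."""
    seen = [start]
    queue = deque([start])
    while queue:
        entry = entries.get(queue.popleft())
        if entry is None:
            continue  # root may live in a different range file
        for root in entry.get("derivation", {}).get("roots", []):
            if root not in seen:  # guards against cycles
                seen.append(root)
                queue.append(root)
    return seen


# Invented placeholder entries for illustration only.
entries = {
    "H9001": {"derivation": {"roots": ["H9002"]}},
    "H9002": {"derivation": {"roots": ["H9003", "H9001"]}},
}
print(etymology_chain(entries, "H9001"))
# ['H9001', 'H9002', 'H9003']
```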

Topical Guide (`topical-guide`)

Topical Guide entries organized by letter. Files live at tg/{letter}.json.

Each entry has a topicId (e.g., "tg:angels"), a title, optional seeAlso links, and an array of passage citations.

Bible Dictionary (`bible-dictionary`)

Bible Dictionary articles organized by letter. Files live at bd/{letter}.json.

Each entry has an articleId, title, content (Markdown body), and optional seeAlso links.

Scripture Index (`scripture-index`)

Triple Combination index entries. Files live at index/{letter}.json.

Entries use a discriminated union for different link types (passage, topic, article, person, place).

Verse Commentary (`verse-commentary`)

Verse-level commentary from specific sources. Files live at commentary/{commentaryId}/{bookId}.json.

Each file contains notes anchored to specific verses using NoteAnchor.

Scholarly Commentary (`scholarly-commentary`)

Full scholarly commentary works (Clarke, BYU, etc.). Files live at scholarly/{commentaryId}.json.

Each work is organized into sections, each with a heading and body content.

Controlled Vocabularies

Enum	Values
CorpusType	`ot`, `nt`, `bom`, `dc`, `pgp`, `pseudepigrapha`, `apocrypha`, `deuterocanonical`
WitnessLanguage	`hebrew`, `greek`, `aramaic`, `ethiopic`, `latin`, `syriac`, `coptic`
ScriptType	`hebrew`, `greek`, `ethiopic`, `latin`, `syriac`, `coptic`
PartOfSpeech	`noun.masculine`, `noun.feminine`, `verb`, `adjective`, `adverb`, etc.

File Naming

Book IDs use canonical kebab-case slugs: gen, 1-ne, dc, w-of-m
Passage IDs follow {bookId}.{chapter}.{verse}: gen.1.1, 1-ne.3.7
Lexicon files use Strong's ranges: H0001-H1000.json
Topical Guide and Bible Dictionary use single letters: a.json, b.json
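
The naming rules above translate into a few small helpers. This is an illustrative sketch: the function names are hypothetical, and the slug pattern is inferred from the four example IDs rather than taken from a published rule.

```python
import re

# Lowercase alphanumeric segments joined by single hyphens (inferred pattern).
_SLUG = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def corpus_path(book_id: str) -> str:
    """Build the scripture-text file path for a book ID."""
    if not _SLUG.match(book_id):
        raise ValueError(f"not a kebab-case slug: {book_id!r}")
    return f"corpus/{book_id}.json"


def passage_id(book_id: str, chapter: int, verse: int) -> str:
    """Format a passage ID as {bookId}.{chapter}.{verse}."""
    return f"{book_id}.{chapter}.{verse}"


def parse_passage_id(pid: str) -> tuple[str, int, int]:
    """Split a passage ID from the right, so the book slug stays intact."""
    book_id, chapter, verse = pid.rsplit(".", 2)
    return book_id, int(chapter), int(verse)


print(corpus_path("w-of-m"))         # corpus/w-of-m.json
print(passage_id("1-ne", 3, 7))      # 1-ne.3.7
print(parse_passage_id("gen.1.1"))   # ('gen', 1, 1)
```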

Tip: The data/book_registry.json file contains the complete mapping of book IDs to titles, abbreviations, corpus types, and chapter counts.
