Corpus Data Model

The GospeLib corpus is a collection of JSON files that represent scripture texts, lexicons, topical guides, and more. All files are validated by Pydantic models before being ingested into FalkorDB. This guide explains the seven schema families and their structure.

Design Principles

  1. Every file declares its own type — The schema field is required. No heuristic detection needed.
  2. All cross-references are structured objects — No human-formatted reference strings. Every reference is a PassageRef object.
  3. bookId is always a canonical slug — e.g., "gen", "1-ne", "dc". No integers or display titles as identifiers.
  4. Optional fields are absent, not null — Consumers test with !== undefined, not !== null.
  5. camelCase throughout — All JSON keys use camelCase.
  6. Controlled enumerations — Fields like pos, corpus, language use defined enums.
  7. Verse enrichments are additive and co-resident — A single Verse can carry witnesses, words, and notes simultaneously.
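Principle 4 becomes a key-presence test on the consumer side. The following is a minimal Python sketch, assuming a verse has already been parsed into a plain dict; `verse_notes` is a hypothetical helper and not part of GospeLib.

```python
import json


def verse_notes(verse: dict) -> list:
    """Return the verse's notes, treating an absent key as 'no notes'."""
    # Optional fields are omitted rather than set to null, so test for key
    # presence (the Python analogue of `!== undefined`), not for None.
    if "notes" not in verse:
        return []
    return verse["notes"]


# A verse with no enrichments: the `notes` key is simply missing.
verse = json.loads(
    '{"verse": 1, "text": "In the beginning God created the heaven and the earth."}'
)
print(verse_notes(verse))
```

A `verse.get("notes") is None` check would also pass here, but it hides the difference between an absent key and an explicit null, which the corpus is designed never to emit.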

Schema Families

| Schema | File Pattern | Purpose |
| --- | --- | --- |
| scripture-text | corpus/{bookId}.json | Primary scripture text with optional witnesses, interlinear words, and notes |
| lexicon | lexicon/{range}.json | Hebrew/Greek/Aramaic lexicon entries keyed by Strong's number |
| topical-guide | tg/{letter}.json | Topical Guide entries with passage citations |
| bible-dictionary | bd/{letter}.json | Bible Dictionary articles |
| scripture-index | index/{letter}.json | Triple Combination index entries |
| verse-commentary | commentary/{commentaryId}/{bookId}.json | Verse-level commentary and footnotes |
| scholarly-commentary | scholarly/{commentaryId}.json | Scholarly commentary (Clarke, BYU, etc.) |
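Because every file declares its family in the required schema field, a loader can branch on that field alone. The sketch below uses only the standard library and copies the schema names and file patterns from the table above; `schema_of` and `SCHEMA_PATTERNS` are hypothetical names, and the real pipeline validates with Pydantic models instead.

```python
import json

# Schema identifiers and file patterns, copied from the table above.
SCHEMA_PATTERNS = {
    "scripture-text": "corpus/{bookId}.json",
    "lexicon": "lexicon/{range}.json",
    "topical-guide": "tg/{letter}.json",
    "bible-dictionary": "bd/{letter}.json",
    "scripture-index": "index/{letter}.json",
    "verse-commentary": "commentary/{commentaryId}/{bookId}.json",
    "scholarly-commentary": "scholarly/{commentaryId}.json",
}


def schema_of(raw: str) -> str:
    """Read the required `schema` field and reject unknown families."""
    data = json.loads(raw)
    schema = data.get("schema")
    if schema not in SCHEMA_PATTERNS:
        raise ValueError(f"unknown or missing schema: {schema!r}")
    return schema


print(schema_of('{"schema": "lexicon"}'))
```

No path sniffing or content heuristics are involved: a file without a recognized schema value is rejected outright.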

Shared Types

Three types appear across multiple schemas:

PassageRef

The canonical representation of a scripture reference:

{ "bookId": "gen", "chapter": 1, "verse": 1 }
{ "bookId": "dc", "chapter": 84, "verse": 26, "verseEnd": 27 }
{ "bookId": "num", "chapter": 16, "chapterEnd": 18 }
  • verseEnd requires verse to be present and must be ≥ verse
  • chapterEnd must be ≥ chapter
  • Absent verse means a chapter-level reference
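The three rules above can be sketched as a small validating type. This is a standard-library stand-in for illustration, not the project's actual Pydantic model. Field names mirror the JSON keys, `None` stands for an absent key, and `is_chapter_level` is a hypothetical convenience property.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PassageRef:
    bookId: str
    chapter: int
    verse: Optional[int] = None       # None here means "key absent"
    verseEnd: Optional[int] = None
    chapterEnd: Optional[int] = None

    def __post_init__(self) -> None:
        # verseEnd requires verse to be present and must be >= verse.
        if self.verseEnd is not None:
            if self.verse is None:
                raise ValueError("verseEnd requires verse")
            if self.verseEnd < self.verse:
                raise ValueError("verseEnd must be >= verse")
        # chapterEnd must be >= chapter.
        if self.chapterEnd is not None and self.chapterEnd < self.chapter:
            raise ValueError("chapterEnd must be >= chapter")

    @property
    def is_chapter_level(self) -> bool:
        # An absent verse means the reference covers whole chapters.
        return self.verse is None


ref = PassageRef(**{"bookId": "dc", "chapter": 84, "verse": 26, "verseEnd": 27})
print(ref.is_chapter_level)
```

Constructing `PassageRef("gen", 1, verseEnd=3)` raises `ValueError`, because a verse range needs a starting verse.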

A typed, discriminated-union link to a related resource:

{ "type": "topic", "topicId": "tg:angels", "title": "Angels" }
{ "type": "article", "articleId": "bd:angels", "title": "Angels" }
{ "type": "passage", "ref": { "bookId": "gen", "chapter": 1, "verse": 1 } }
{ "type": "person", "personId": "person:aaron.1", "name": "Aaron" }
{ "type": "place", "placeId": "place:ammonihah", "name": "Ammonihah" }

Every element has an explicit type field — no positional inference.
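A consumer can therefore branch on type and read only the keys that variant carries. The sketch below assumes links arrive as plain dicts shaped like the examples above. `link_target` is a hypothetical helper, and rendering a chapter-level passage as `{bookId}.{chapter}` is an assumption, since only the verse-level passage ID format is documented.

```python
# Key holding the identifier for each non-passage link type,
# taken from the example objects above.
ID_KEYS = {
    "topic": "topicId",
    "article": "articleId",
    "person": "personId",
    "place": "placeId",
}


def link_target(link: dict) -> str:
    """Return a string identifying what a link points to, keyed on `type`."""
    kind = link["type"]
    if kind in ID_KEYS:
        return link[ID_KEYS[kind]]
    if kind == "passage":
        ref = link["ref"]
        parts = [ref["bookId"], str(ref["chapter"])]
        # Assumption: chapter-level references simply omit the verse part.
        if "verse" in ref:
            parts.append(str(ref["verse"]))
        return ".".join(parts)
    raise ValueError(f"unknown link type: {kind!r}")


print(link_target({"type": "person", "personId": "person:aaron.1", "name": "Aaron"}))
```

An unrecognized type raises instead of falling through, which keeps a newly added link variant from being silently ignored.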

NoteAnchor

Locates a footnote's attachment point within a verse:

{ "wordIndex": 13, "charOffset": 67, "word": "Egypt," }

Scripture Text (scripture-text)

The primary schema for any biblical, deuterocanonical, or pseudepigraphical text. Files live at corpus/{bookId}.json.

Structure

ScriptureTextFile
├── schema: "scripture-text"
├── bookId, title, abbreviation, corpus, language
├── introduction? (Markdown)
└── chapters[]
    └── Chapter
        ├── chapter (number)
        ├── heading? (string)
        └── verses[]
            └── Verse
                ├── verse, text
                ├── sourceText?, sourceTranslit?
                ├── witnesses?[] (manuscript evidence)
                ├── words?[] (interlinear alignment)
                └── notes?[] (scholarly footnotes)
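Walking this structure takes two nested loops. The sketch below assumes a file already parsed into a dict; the sample is trimmed and hypothetical, with top-level metadata other than schema and bookId left out for brevity. `iter_verses` and `enrichments_present` are illustrative names, not GospeLib functions.

```python
from typing import Iterator, List, Tuple

# The three optional, co-resident verse enrichments.
ENRICHMENTS = ("witnesses", "words", "notes")


def iter_verses(book: dict) -> Iterator[Tuple[int, int, dict]]:
    """Yield (chapter number, verse number, verse object) in file order."""
    for chapter in book["chapters"]:
        for verse in chapter["verses"]:
            yield chapter["chapter"], verse["verse"], verse


def enrichments_present(verse: dict) -> List[str]:
    """List which optional enrichments this verse carries."""
    return [key for key in ENRICHMENTS if key in verse]


# Trimmed, hypothetical sample file.
book = {
    "schema": "scripture-text",
    "bookId": "gen",
    "chapters": [
        {
            "chapter": 1,
            "verses": [
                {
                    "verse": 1,
                    "text": "In the beginning God created the heaven and the earth.",
                    "words": [
                        {"order": 0, "gloss": "In the beginning", "strongs": "H7225"}
                    ],
                },
                {"verse": 2, "text": "(text omitted)"},
            ],
        }
    ],
}

for chapter_no, verse_no, verse in iter_verses(book):
    print(chapter_no, verse_no, enrichments_present(verse))
```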

Witness

Manuscript evidence in a specific language:

{
  "language": "aramaic",
  "script": "hebrew",
  "text": "…]חנך לבח֯ירין …",
  "witness": "4QEn^a",
  "edition": "Milik 1976",
  "hasLacunae": true,
  "isPartial": true
}

WordAlignment

A single word token aligned between source language and English:

{ "order": 0, "gloss": "In the beginning", "strongs": "H7225", "token": "בְּרֵאשִׁ֖ית" }

Strong's numbers are always normalized to a prefix letter followed by four zero-padded digits (e.g., H0430, G0056).
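Raw inputs such as "H430" or "g56" can be brought into this form with a small helper. This is a sketch rather than the project's own normalizer: `normalize_strongs` is a hypothetical name, and accepting only the H and G prefixes is an assumption based on the examples above.

```python
import re

# Prefix letter, optional leading zeros, then one to four significant digits.
_STRONGS = re.compile(r"^([HG])0*(\d{1,4})$", re.IGNORECASE)


def normalize_strongs(raw: str) -> str:
    """Normalize a Strong's number to letter + 4-digit zero-padded form."""
    match = _STRONGS.match(raw.strip())
    if match is None:
        raise ValueError(f"not a Strong's number: {raw!r}")
    letter, digits = match.groups()
    return f"{letter.upper()}{int(digits):04d}"


print(normalize_strongs("H430"))  # prints H0430
print(normalize_strongs("g56"))   # prints G0056
```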

Lexicon (lexicon)

Hebrew, Greek, and Aramaic lexicon entries keyed by Strong's number. Files live at lexicon/{range}.json.

Structure

LexiconFile
├── schema: "lexicon"
├── language, range: { from, to }
└── entries: { [strongs]: LexiconEntry }
    └── LexiconEntry
        ├── strongs, original, translit, pronunciation
        ├── pos (controlled enum), posRaw
        ├── glosses[] (English equivalents)
        ├── definition: { short, senses[] }
        ├── derivation: { description, roots[] }
        ├── occurrences?, translations?[]
        └── related?[] (Strong's cross-refs)

The derivation.roots array is pre-extracted for graph traversal — consumers can follow etymology chains without regex parsing.
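A traversal over roots might look like the sketch below. It assumes each roots array holds normalized Strong's numbers as strings, which the guide does not state explicitly, and it walks only the entries of a single loaded file. The entries are toy data with made-up numbers, not real lexicon content, and `etymology_chain` is a hypothetical helper.

```python
from typing import Dict, List


def etymology_chain(entries: Dict[str, dict], strongs: str) -> List[str]:
    """Depth-first walk over derivation.roots, returning visited numbers in order."""
    seen: List[str] = []
    stack = [strongs]
    while stack:
        current = stack.pop()
        # Skip repeats (guards against cycles) and roots stored in other files.
        if current in seen or current not in entries:
            continue
        seen.append(current)
        roots = entries[current].get("derivation", {}).get("roots", [])
        # Reverse so the first listed root is visited first.
        stack.extend(reversed(roots))
    return seen


# Toy entries with made-up Strong's numbers (assumption: roots are ID strings).
entries = {
    "H9003": {"derivation": {"description": "from H9002", "roots": ["H9002"]}},
    "H9002": {"derivation": {"description": "from H9001", "roots": ["H9001"]}},
    "H9001": {"derivation": {"description": "a primitive root", "roots": []}},
}

print(etymology_chain(entries, "H9003"))
```

Following roots across range files would need a lookup spanning several lexicon files, or the graph database itself; that is outside this sketch.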

Topical Guide (topical-guide)

Topical Guide entries organized by letter. Files live at tg/{letter}.json.

Each entry has a topicId (e.g., "tg:angels"), a title, optional seeAlso links, and an array of passage citations.

Bible Dictionary (bible-dictionary)

Bible Dictionary articles organized by letter. Files live at bd/{letter}.json.

Each entry has an articleId, title, content (Markdown body), and optional seeAlso links.

Scripture Index (scripture-index)

Triple Combination index entries. Files live at index/{letter}.json.

Entries use a discriminated union for different link types (passage, topic, article, person, place).

Verse Commentary (verse-commentary)

Verse-level commentary from specific sources. Files live at commentary/{commentaryId}/{bookId}.json.

Each file contains notes anchored to specific verses using NoteAnchor.

Scholarly Commentary (scholarly-commentary)

Full scholarly commentary works (Clarke, BYU, etc.). Files live at scholarly/{commentaryId}.json.

Each work is organized into sections with headings and body content.

Controlled Vocabularies

| Enum | Values |
| --- | --- |
| CorpusType | ot, nt, bom, dc, pgp, pseudepigrapha, apocrypha, deuterocanonical |
| WitnessLanguage | hebrew, greek, aramaic, ethiopic, latin, syriac, coptic |
| ScriptType | hebrew, greek, ethiopic, latin, syriac, coptic |
| PartOfSpeech | noun.masculine, noun.feminine, verb, adjective, adverb, etc. |

File Naming

  • Book IDs use canonical kebab-case slugs: gen, 1-ne, dc, w-of-m
  • Passage IDs follow {bookId}.{chapter}.{verse}: gen.1.1, 1-ne.3.7
  • Lexicon files use Strong's ranges: H0001-H1000.json
  • Topical Guide and Bible Dictionary use single letters: a.json, b.json
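
The passage ID convention can be sketched as a pair of helpers. `passage_id` and `parse_passage_id` are hypothetical names. Splitting from the right is a defensive choice, although kebab-case book slugs should never contain dots.

```python
from typing import Tuple


def passage_id(book_id: str, chapter: int, verse: int) -> str:
    """Build a passage ID of the form {bookId}.{chapter}.{verse}."""
    return f"{book_id}.{chapter}.{verse}"


def parse_passage_id(pid: str) -> Tuple[str, int, int]:
    """Split a passage ID back into (bookId, chapter, verse)."""
    # rsplit keeps the book slug intact; a malformed ID raises ValueError.
    book_id, chapter, verse = pid.rsplit(".", 2)
    return book_id, int(chapter), int(verse)


print(passage_id("1-ne", 3, 7))     # prints 1-ne.3.7
print(parse_passage_id("gen.1.1"))  # prints ('gen', 1, 1)
```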

Tip: The data/book_registry.json file contains the complete mapping of book IDs to titles, abbreviations, corpus types, and chapter counts.