Recommended Adoption Plan
A three-phase plan for integrating external data sources, ordered by criticality and dependency.
Phase 1 — Critical (Immediate)
Fill the most critical gaps with the highest-quality, most permissively licensed sources.
| Source | Datasets | Gaps Filled | License | Complexity |
|---|---|---|---|---|
| STEPBible-Data | TAHOT, TAGNT | Morphological tags + source tokens (Hebrew & Greek) | CC BY 4.0 | Medium — TSV parsing + enrichment of existing interlinear nodes |
| lxx-swete | Full Swete LXX | Septuagint text | Public domain (likely) | Low — simple TSV, new Translation in FalkorDB |
| STEPBible-Data | TVTMS | Versification mapping | CC BY 4.0 | Medium — new edge type in graph |
| Clear-Bible MACULA (optional) | macula-greek, macula-hebrew | Enhanced morphology + syntax trees + semantic roles | CC BY 4.0 (partial) | Medium — XML/TSV parsing + new :SyntaxNode graph structure |
Note on MACULA: MACULA goes beyond STEPBible's morphology by adding syntax trees, semantic roles, and coreference, but its semantic domain data requires legal review (the MARBLE/UBS terms are "used with permission"). The core morphology and syntax data is CC BY 4.0 and can be adopted independently. If legal review clears the semantic domain data, MACULA could become the primary morphological source, with STEPBible as the fallback.
Prerequisites: None — all data is publicly available on GitHub.
Estimated new ingest pipeline stages: 2 (morphology enrichment, LXX ingestion). Versification can be a sub-stage of morphology enrichment.
Impact: Resolves the two Critical gaps (morphology, source tokens) plus one High gap (Septuagint). Turns the interlinear feature from a placeholder into a fully functional scholarly tool.
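The morphology enrichment stage could look roughly like the sketch below: parse TSV rows, then hand them to a single parameterised Cypher statement that updates existing interlinear nodes. This is a sketch under stated assumptions, not the actual pipeline. The three-column layout (`ref`, `token`, `morph`), the `:InterlinearWord` label, the property names, and the sample tag values are all hypothetical. The real TAHOT/TAGNT files have their own headers and columns, which need to be checked first.

```python
import csv
import io

# Hypothetical three-column layout with placeholder tag values.
# The real TAHOT/TAGNT files use their own header rows and column order.
SAMPLE_TSV = (
    "ref\ttoken\tmorph\n"
    "Gen.1.1#01\tבראשית\tTAG-A\n"
    "Gen.1.1#02\tברא\tTAG-B\n"
    "\torphan\tTAG-C\n"  # row without a reference, skipped below
)

# Hypothetical node label and property names for the existing interlinear nodes.
ENRICH_QUERY = (
    "UNWIND $rows AS row "
    "MATCH (w:InterlinearWord {ref: row.ref}) "
    "SET w.source_token = row.token, w.morph = row.morph"
)


def parse_morphology_rows(tsv_text):
    """Turn TSV text into a list of dicts suitable as the $rows parameter."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = []
    for record in reader:
        ref = (record.get("ref") or "").strip()
        if not ref:
            continue  # cannot match a node without a reference
        rows.append(
            {
                "ref": ref,
                "token": (record.get("token") or "").strip(),
                "morph": (record.get("morph") or "").strip(),
            }
        )
    return rows


rows = parse_morphology_rows(SAMPLE_TSV)
print(len(rows), "rows ready for enrichment")
```

The resulting `rows` list would be passed as the `rows` parameter to whichever client call executes `ENRICH_QUERY`. Using `MATCH` rather than `MERGE` keeps the stage a pure enrichment: it updates nodes that already exist and never creates new ones.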
Phase 2 — High Priority
Important enhancements requiring moderate integration work.
| Source | Datasets | Gaps Filled | License | Complexity |
|---|---|---|---|---|
| scrollmapper | cross_references table | Cross-references (~340K) | MIT | Low — SQLite/JSON, new edge type |
| scrollmapper | Vulgate translation tables | Vulgate (5 variants) | MIT | Low — fits existing translation pipeline |
| STEPBible-Data | TIPNR | Person names + place geocoding | CC BY 4.0 | Medium — new node types :Person, :Place |
| MorphGNT | SBLGNT morphology | Cross-validation of Greek NT morphology | CC BY-SA 3.0 | Low — enrichment data, check share-alike |
| OpenScriptures | morphhb | Cross-validation of Hebrew OT morphology | CC BY 4.0 | Low — enrichment data |
Prerequisites: Phase 1 morphology enrichment stage completed (provides the integration pattern). FalkorDB schema extended for :Person and :Place nodes.
Estimated new ingest pipeline stages: 2 (cross-references, people/places).
Impact: Resolves cross-references (High), Vulgate (Medium), and people/places (Medium). Adds significant navigational and scholarly depth.
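The cross-validation role of MorphGNT and morphhb can be illustrated with a small comparison routine. The sketch below assumes both sources have already been normalised to a shared token reference scheme and a shared tag vocabulary. That normalisation is a separate, non-trivial step not shown here. The function name, reference format, and sample tags are illustrative only.

```python
def compare_morphology(primary, secondary):
    """Compare two {token_ref: tag} mappings.

    Returns (agreement_count, disagreements, missing), where disagreements
    maps a token reference to (primary_tag, secondary_tag) and missing lists
    references present in primary but absent from secondary.
    """
    agreement_count = 0
    disagreements = {}
    missing = []
    for ref, tag in primary.items():
        other = secondary.get(ref)
        if other is None:
            missing.append(ref)
        elif other == tag:
            agreement_count += 1
        else:
            disagreements[ref] = (tag, other)
    return agreement_count, disagreements, missing


# Illustrative values only; not taken from either dataset.
primary_tags = {
    "John.1.1#01": "PREP",
    "John.1.1#02": "N-DSF",
    "John.1.1#03": "V-IAI-3S",
}
secondary_tags = {
    "John.1.1#01": "PREP",
    "John.1.1#02": "N-NSF",
}

agreed, differing, absent = compare_morphology(primary_tags, secondary_tags)
print(agreed, differing, absent)
```

Disagreements are reported rather than resolved automatically, so the primary source stays authoritative and the secondary source only flags tokens for review.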
Phase 3 — Enhancement
Nice-to-have additions for future consideration.
| Source | Datasets | Gaps Filled | License | Complexity |
|---|---|---|---|---|
| ebible.org | Curated translations (20–30) | Additional translations | Per-translation | Medium — USFM parsing infrastructure |
| berean.bible | BSB + Translation Tables | Modern translation + interlinear alignment | Free | Medium — XLSX/TSV parsing |
| viz.bible | Events database | Timeline features | Request-based | Medium — new :Event node type |
| biblicalhumanities | Dodson Lexicon | Greek lexicon supplement | CC0 | Low — merge with existing entries |
| scrollmapper | Additional translations | Multilingual translations | MIT | Low — fits existing pipeline |
| unfoldingWord | usfm-js, wordMAP | USFM tooling for future ingestion | CC BY-SA 4.0 | Low — tooling dependency |
| ETCBC/dss | Text-Fabric DSS corpus | Dead Sea Scrolls text | MIT | High — Text-Fabric ETL + new node types |
| SEDRA IV + Sefaria | REST API + bulk export | Aramaic lexicon (composite) | Apache 2.0 / CC BY-NC | Medium — API consumption + license review |
| CrossWire SWORD | Commentary modules | Public-domain commentaries (~10) | Public domain | Medium — SWORD→OSIS→JSON pipeline |
| Clear-Bible MACULA | Syntax trees + semantic roles | Discourse analysis | CC BY 4.0 (partial) | Medium — XML parsing + new graph structure |
Prerequisites: Phase 2 complete. USFM parsing infrastructure built (needed for ebible.org and unfoldingWord sources).
Impact: Broadens translation coverage and adds supplementary scholarly tools. The multilingual translations support internationalization, while the events, Dead Sea Scrolls, commentary, and syntax data enable advanced features.
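The prerequisites stated for each phase can be written down as a dependency graph and ordered mechanically. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The stage names are hypothetical labels for the pipeline stages and prerequisites described above, and the exact edges are one reading of those prerequisites rather than a fixed design.

```python
from graphlib import TopologicalSorter

# Maps each stage to the stages that must finish before it starts.
# Stage names are hypothetical; edges follow the stated prerequisites.
STAGE_PREREQS = {
    # Phase 1: no prerequisites
    "morphology_enrichment": set(),
    "lxx_ingestion": set(),
    # Phase 2: needs the Phase 1 morphology enrichment pattern
    "cross_references": {"morphology_enrichment"},
    "people_places": {"morphology_enrichment"},
    # Phase 3: needs Phase 2 complete plus USFM parsing infrastructure
    "usfm_parsing": set(),
    "ebible_translations": {"usfm_parsing", "cross_references", "people_places"},
}


def stage_order(prereqs):
    """Return one valid execution order; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(prereqs).static_order())


order = stage_order(STAGE_PREREQS)
print(order)
```

Several orders can satisfy the same prerequisites, so the result should be treated as one valid schedule, not the only one.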