STEPBible-Data
- Repository: tyndale/STEPBible-Data
- Maintainer: Tyndale House, Cambridge (academic institution)
- License: CC BY 4.0 — allows commercial and non-commercial use, modification, and redistribution with attribution. Fully compatible with open-source distribution.
- Suitability Score: ⭐⭐⭐⭐⭐ (5/5)
Coverage
Format: Tab-separated values (TSV) files. Each dataset is a single large file or a set of files with a consistent column structure, and header rows describe the columns, so the files are straightforward to parse with Python.
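As a minimal sketch of what that parsing could look like, the helper below reads tab-separated text with Python's standard `csv` module. The function name `iter_tsv_rows`, the column names (`ref`, `word`, `strongs`, `morph`), the `#` comment convention, and the sample rows are all assumptions made for illustration; the real files may carry longer explanatory preambles and different columns, so the skipping logic would need to be adapted per dataset.

```python
import csv
import io
from typing import Iterator


def iter_tsv_rows(text: str) -> Iterator[dict[str, str]]:
    """Yield one dict per data row, keyed by the names in the header row."""
    # Assumption: blank lines and lines starting with '#' carry no data.
    lines = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    reader = csv.DictReader(
        io.StringIO("\n".join(lines)),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    for row in reader:
        # Short rows yield None values; extra cells land under a None key.
        yield {
            key: (value or "").strip()
            for key, value in row.items()
            if key is not None
        }


# Hypothetical sample, not copied from a real STEPBible file.
SAMPLE = (
    "# illustrative preamble line\n"
    "ref\tword\tstrongs\tmorph\n"
    "\n"
    "Gen.1.1#01\tבְּרֵאשִׁית\tH7225\tHR/Ncfsa\n"
    "Gen.1.1#02\tבָּרָא\tH1254\tHVqp3ms\n"
)

rows = list(iter_tsv_rows(SAMPLE))
print(rows[0]["ref"], rows[0]["strongs"])
```

Yielding plain dictionaries keeps the reader independent of any one dataset, so the same function could feed different downstream stages.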
Datasets
| Dataset ID | Name | Content |
|---|---|---|
| TAHOT | Tyndale Amalgamated Hebrew OT | Hebrew OT with Strong's numbers + morphological tags per word |
| TAGNT | Tyndale Amalgamated Greek NT | Greek NT with Strong's numbers + morphological tags per word |
| TTESV | Tyndale Translation–ESV | ESV translation aligned word-by-word to Hebrew/Greek |
| TBESH | Tyndale Brief Hebrew Lexicon | ~8,700 entries, BDB-derived, brief glosses |
| TBESG | Tyndale Brief Greek Lexicon | ~5,500 entries, LSJ-derived, brief glosses |
| TFLSJ | Tyndale Full LSJ Greek Lexicon | Full Liddell-Scott-Jones Greek lexicon entries |
| TIPNR | Tyndale Individuated Proper Names with Ref | ~3,000 unique individuals + ~1,000 places; birth/death, family trees, geocoding, verse references |
| TVTMS | Tyndale Versification Traditions Mapping | Maps verse IDs across English, Hebrew, Greek, Latin, Syriac, and other traditions |
| TEHMC | Tyndale Expansion of Hebrew Morphology Codes | Expands each Hebrew morphology code used in the tagged text into a full grammatical description |
| TEGMC | Tyndale Expansion of Greek Morphology Codes | Expands each Greek morphology code used in the tagged text into a full grammatical description |
Coming Soon
| Dataset ID | Name | Content |
|---|---|---|
| TAGOT | Tyndale Amalgamated Greek OT | LXX with Strong's + morphological tags (Septuagint tagged!) |
| TFBDB | Tyndale Full BDB Hebrew Lexicon | Full Brown-Driver-Briggs Hebrew lexicon |
| TOTMM / TNTMM | Tyndale OT/NT Morphological Manuscripts | Morphological analysis per manuscript tradition |
| TBCWG | Tyndale Brief Contextual Word Glosses | Context-sensitive translation glosses |
Quality
High. Tyndale House is a respected academic institution. Data undergoes scholarly review. TAHOT and TAGNT are amalgamated from multiple academic sources with cross-verification.
Gaps Filled
- ✅ Morphological tags (Hebrew via TAHOT, Greek via TAGNT) — Critical gap
- ✅ Source tokens (embedded in TAHOT/TAGNT word-level data) — Critical gap
- ✅ Person names database (TIPNR — ~3,000 individuals + ~1,000 places)
- ✅ Place names / geocoding (TIPNR includes coordinates)
- ✅ Versification mapping (TVTMS — multi-tradition)
- ✅ Extended lexicons (TBESH, TBESG, TFLSJ)
- 🔜 Septuagint tagged (TAGOT — coming)
- 🔜 Full BDB Hebrew lexicon (TFBDB — coming)
Integration Notes
- TSV parsing is trivial in Python: add a `StepBibleParser` class to the ingest pipeline
- TAHOT/TAGNT data maps directly to enriching existing `:InterlinearWord` nodes with `morphology`, `pos`, and `parsing` properties
- TIPNR would create new `:Person` and `:Place` node types in FalkorDB with relationship edges to `:Passage` nodes
- TVTMS would create `:VersificationMapping` edges between `:Passage` nodes across translations
- Strong's numbers in STEPBible data align with GospeLib's existing Strong's-keyed lexicon entries
- Requires new ingest pipeline stages, or extensions of existing ones, for morphology enrichment, people/places, and versification
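The morphology-enrichment note above could be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline's actual design: the `ref` match key, the `WordEnrichment` record, the `enrich` method, the `POS_BY_PREFIX` lookup, and the rule for splitting a tag into `pos` and `parsing` are all hypothetical. The query is only built as a string with parameters; nothing here talks to FalkorDB.

```python
from dataclasses import asdict, dataclass

# Assumption: words are matched on a `ref` property; the real key may differ.
ENRICH_QUERY = (
    "MATCH (w:InterlinearWord {ref: $ref}) "
    "SET w.morphology = $morphology, w.pos = $pos, w.parsing = $parsing"
)

# Hypothetical, deliberately tiny lookup from a tag prefix to a part of speech.
POS_BY_PREFIX = {"N": "noun", "V": "verb", "A": "adjective"}


@dataclass(frozen=True)
class WordEnrichment:
    """Values to set on one :InterlinearWord node."""
    ref: str
    morphology: str
    pos: str
    parsing: str

    def params(self) -> dict[str, str]:
        # Parameter names mirror the $placeholders in ENRICH_QUERY.
        return asdict(self)


class StepBibleParser:
    """Turns already-parsed TSV rows into enrichment records."""

    def enrich(self, row: dict[str, str]) -> WordEnrichment:
        tag = row["morph"]
        # Illustrative rule: the text before the first '-' names the part of
        # speech, and the remainder is kept as the parsing detail.
        prefix, _, detail = tag.partition("-")
        return WordEnrichment(
            ref=row["ref"],
            morphology=tag,
            pos=POS_BY_PREFIX.get(prefix, "unknown"),
            parsing=detail,
        )


parser = StepBibleParser()
record = parser.enrich({"ref": "Jhn.1.1#03", "morph": "V-IAI-3S"})
print(ENRICH_QUERY)
print(record.params())
```

Keeping the query text separate from the per-word parameters means one statement can be reused for every row, which suits a batch ingest stage.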