Clear-Bible MACULA

  • Repository: Clear-Bible/macula-greek and Clear-Bible/macula-hebrew
  • Maintainer: Clear Bible, Inc. (published by Biblica, Inc.)
  • License: CC BY 4.0 at the top level. The MARBLE/UBS semantic domain data is "used with permission" and needs legal review before redistribution; the SIL glosses carry a separate custom license.
  • Suitability Score: ⭐⭐⭐⭐⭐ (5/5), subject to the licensing caveats for semantic domain data

Coverage

Format: XML (3 variants: nodes, lowfat, TEI) + TSV flat export. Per-book files.

  • macula-greek: Full NT based on Nestle1904 + SBLGNT. Syntax trees with roles, Strong's numbers, Louw-Nida/SDBH semantic domains, semantic frames, participant referents, English + Mandarin glosses, word senses.
  • macula-hebrew: Full OT based on Westminster Leningrad Codex. Same depth of analysis as macula-greek. Updated Feb 2026.
  • Fields per word: morphology, lemma, Strong's, part of speech, syntax role, semantic domain, word sense, gloss, unique xml:id, USFM ref.
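
As a rough illustration of how the per-word fields could be read from the TSV flat export, the sketch below uses Python's standard csv module. The column names (xml:id, ref, lemma, strong, pos, role, gloss) and the two sample rows are assumptions modeled on the field list above, not the export's confirmed header; check the real header row before relying on them.

```python
import csv
import io
from collections import defaultdict

# Illustrative sample only: column names and values are assumptions,
# not copied from the actual MACULA TSV export.
SAMPLE_TSV = (
    "xml:id\tref\ttext\tlemma\tstrong\tpos\trole\tgloss\n"
    "n40001001001\tMAT 1:1!1\tΒίβλος\tβίβλος\t976\tnoun\t\tbook\n"
    "n40001001002\tMAT 1:1!2\tγενέσεως\tγένεσις\t1078\tnoun\t\tgenealogy\n"
)


def read_words(tsv_file):
    """Yield one dict per word row from a tab-separated file object."""
    # QUOTE_NONE keeps any literal quote characters in glosses intact.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


def index_by_lemma(rows):
    """Map each lemma to the list of word IDs where it occurs."""
    index = defaultdict(list)
    for row in rows:
        index[row["lemma"]].append(row["xml:id"])
    return dict(index)


words = list(read_words(io.StringIO(SAMPLE_TSV)))
lemma_index = index_by_lemma(words)
```

For real per-book files, the same read_words function could be given a file opened with encoding="utf-8" and newline="" in place of the in-memory sample.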

Quality

Very high. This is the richest open biblical linguistic dataset available, exceeding STEPBible in depth (syntax trees, semantic roles, coreference). Actively maintained.

Gaps Filled

  • Morphological tags (Hebrew + Greek) — comprehensive per-word analysis with syntax trees
  • Source tokens — individual word forms with lemmas and unique IDs
  • Syntax / discourse analysis (Gap #14) — full syntax trees with semantic roles
  • 🔶 Pericope divisions (SBLGNT subdivisions included)

Integration Notes

  • XML/TSV parsing is straightforward in Python. The tree structure maps naturally onto a FalkorDB graph.
  • Could supersede or complement STEPBible for morphology
  • The mixed licensing requires legal review: the morphology and syntax data is CC BY 4.0, but the semantic domain labels may carry restrictions
  • Unique word IDs enable precise alignment with other datasets
  • Per-book file structure aligns with existing ingest pipeline patterns
  • Would create new :SyntaxNode tree structures in FalkorDB linked to :InterlinearWord nodes
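
The tree-to-graph mapping described in these notes might look roughly like the sketch below. It walks an XML syntax tree with the standard library and produces plain node and edge records, plus a parameterized Cypher string per edge. The element names (sentence, wg, w), the role and lemma attributes, the generated syn-N identifiers, and the HAS_CHILD relationship name are all assumptions for illustration. Only the :SyntaxNode and :InterlinearWord labels and the xml:id attribute come from the notes above. No FalkorDB client calls are shown.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: element and attribute names are assumptions.
SAMPLE_XML = """
<sentence>
  <wg role="cl">
    <wg role="s">
      <w xml:id="w1" lemma="lemma-a">tok1</w>
    </wg>
    <w xml:id="w2" role="v" lemma="lemma-b">tok2</w>
  </wg>
</sentence>
"""


def tree_to_graph(root):
    """Flatten a syntax tree into node dicts and (parent, rel, child) edges."""
    nodes, edges = [], []
    counter = 0

    def visit(elem, parent_id):
        nonlocal counter
        if elem.tag == "w":
            # Word leaves reuse the dataset's unique word ID.
            node_id = elem.get(XML_ID)
            nodes.append({"id": node_id, "label": "InterlinearWord",
                          "lemma": elem.get("lemma"), "role": elem.get("role")})
        else:
            # Interior nodes get a synthetic ID (hypothetical scheme).
            counter += 1
            node_id = f"syn-{counter}"
            nodes.append({"id": node_id, "label": "SyntaxNode",
                          "role": elem.get("role")})
        if parent_id is not None:
            edges.append((parent_id, "HAS_CHILD", node_id))
        for child in elem:
            visit(child, node_id)

    visit(root, None)
    return nodes, edges


def edge_to_cypher(edge):
    """Return a parameterized Cypher statement and its parameters for one edge."""
    parent_id, rel, child_id = edge
    query = (f"MATCH (p {{id: $parent}}), (c {{id: $child}}) "
             f"MERGE (p)-[:{rel}]->(c)")
    return query, {"parent": parent_id, "child": child_id}


nodes, edges = tree_to_graph(ET.fromstring(SAMPLE_XML))
statements = [edge_to_cypher(e) for e in edges]
```

Keeping the traversal separate from query generation means the same node and edge records could feed a bulk loader or a different graph store; in a real pipeline the synthetic IDs would also need a per-book or per-sentence prefix to stay unique.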