About This Project
This is the first digital vedabase 100% verified against scanned photographs of the original printed books.
Every book was compared word-by-word against 68 scanned PDFs of the first editions published during Srila Prabhupada's lifetime. Where the digital text differed, it was corrected. The scans are the sole authority.
Original scan (left) vs. verified digital text (right)
Verify Our Work
Independent verification — the same original scans are available for your review.
Complete Diff Summary — All Books
| Book | Diffs Found | Action | Status |
|---|---|---|---|
| Bhagavad-gita As It Is | 0 | — | Identical |
| Srimad-Bhagavatam (30 vols) | 807 | 635 patched | Corrected |
| Sri Caitanya-caritamrta (17 vols) | 446 | 295 patched | Corrected |
| Teachings of Lord Caitanya | 2,312 | Full replacement | Corrected |
| KRSNA Book | 3 | Surgical patch | Corrected |
| Nectar of Devotion | 5 | Full replacement | Corrected |
| Nectar of Instruction | 0 | — | Identical |
| Sri Isopanisad | 0 | — | Identical |
| Easy Journey to Other Planets | 326 | Full replacement | Corrected |
| Teachings of Lord Kapila | 121 | Full replacement | Corrected |
| Teachings of Queen Kunti | 1 | Surgical patch | Corrected |
| Transcendental Teachings of Prahlada | 121 | Full replacement | Corrected |
| Science of Self Realization | 11 | Full replacement | Corrected |
| Beyond Birth and Death | 6 | Full replacement | Corrected |
| Perfection of Yoga | 3 | Full replacement | Corrected |
| On the Way to Krsna | 0 | — | Identical |
| Perfect Questions Perfect Answers | 0 | — | Identical |
| Krsna Consciousness Topmost Yoga | 0 | — | Identical |
| Krsna Reservoir of Pleasure | 0 | — | Identical |
| Raja-vidya | 0 | — | Identical |
| Elevation to Krsna Consciousness | 0 | — | Identical |
| TOTAL | 4,077 | 66 volumes verified | |
Note: 4 posthumous compilations (A Second Chance, Life Comes from Life, Light of the Bhagavata, Path of Perfection) have no original edition to compare.
Full Book Replacements (8 Books)
| Book | Diffs | Source | Edition |
|---|---|---|---|
| Teachings of Lord Caitanya | 2,312 | 1968 first edition PDF | 1968 |
| Easy Journey to Other Planets | 326 | Scan PDF | 1972 Macmillan |
| Teachings of Lord Kapila | 121 | Scan PDF | Original |
| Transcendental Teachings of Prahlada | 121 | Scan PDF | Original |
| Science of Self Realization | 11 | Scan PDF | 1977 |
| Beyond Birth and Death | 6 | Archive.org OCR | 1974 |
| Nectar of Devotion | 5 | Scan PDF | 1970 ISKCON Press |
| Perfection of Yoga | 3 | Scan PDF | 1972 |
Surgical Patching
| Work | Diffs Applied | Notes |
|---|---|---|
| Sri Caitanya-caritamrta | 295 | 446 total diffs, 95% clean |
| Srimad-Bhagavatam | 635 | 807 total diffs, 95% clean |
| KRSNA Book | 3 | Verified against 1970 scan |
| Teachings of Queen Kunti | 1 | Verified against original |
Zero-Diff Volumes (16 Confirmed Identical)
Scan Files Used (68 PDFs)
| Book | Scan File | Edition |
|---|---|---|
| BG | 1972_Bhagavad_gita-As_It_Is-Macmillan.pdf | 1972 |
| BBD | Beyond_Birth_and_Death-1974.pdf | 1974 |
| EJOP | Easy-Journey-to-Other-Planets-1972.pdf | 1972 |
| EKC | 1973_Elevation_to_Krsna_Consciousness.pdf | 1973 |
| ISO | Sri-Isopanisad-1969.pdf | 1969 |
| KCTY | KRSNA_Consciousness-Topmost_Yoga-1970.pdf | 1970 |
| KB | KRSNA_Book_Vol.1-2_1970.pdf | 1970 |
| KRP | KRSNA-Reservoir-of-Pleasure-1970.pdf | 1970 |
| NOD | Nectar_of_Devotion-1970.pdf | 1970 |
| NOI | Nectar_of_Instruction-1976.pdf | 1976 |
| OWK | On_the_Way_to_Krsna-1973.pdf | 1973 |
| PQPA | Perfect_Questions-1977.pdf | 1977 |
| POY | 1972_Perfection_of_Yoga.pdf | 1972 |
| RVIDYA | 1973_Raja-Vidya.pdf | 1973 |
| SSR | Science-of-Self-Realization-1977.pdf | 1977 |
| TLC | Teachings_of_Lord_Chaitanya-1968.pdf | 1968 |
| TLK | Teachings_of_Lord_Kapila-SCAN.pdf | orig. |
| TQK | Teachings_of_Queen_Kunti-SCAN.pdf | orig. |
| TTP | Transcendental_Teachings_Prahlad-SCAN.pdf | orig. |
| CC | adi1-3.pdf, mad1-9.pdf, ant1-5.pdf (17 files) | 1975 |
| SB | SB1.1.pdf through SB10.3.pdf (30 files) | 1972–1977 |
To eliminate human error and guarantee absolute precision, this version of the Vedabase uses a hybrid process combining advanced automation with rigorous manual verification, taking the original printed books as the sole authority.
Mechanisms to Eliminate Human Error
Tools & Technologies Used
difflib and SequenceMatcher libraries to identify precise text differencesArchitecture Pipeline
┌─────────────────────────────────────────────────────────────────────────────┐
│ VEDABASE CORRECTION PIPELINE │
└─────────────────────────────────────────────────────────────────────────────┘
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ SCAN PDFs │ │ VEDABASE │ │ OUTPUT │
│ (68 files) │ │ (current) │ │ (corrected) │
└──────┬───────┘ └──────┬───────┘ └──────▲───────┘
│ │ │
▼ ▼ │
┌──────────────────────────────────────┐ │
│ PyMuPDF TEXT EXTRACTION │ │
│ • Unicode IAST preservation │ │
│ • Page-by-page processing │ │
│ • Header/footer removal │ │
└──────────────────┬───────────────────┘ │
│ │
▼ │
┌──────────────────────────────────────┐ │
│ NORMALIZATION LAYER │ │
│ • Smart quote → ASCII │ │
│ • Hyphenated line-break repair │ │
│ • Diacritic-aware matching │ │
└──────────────────┬───────────────────┘ │
│ │
▼ │
┌──────────────────────────────────────┐ │
│ PARAGRAPH ALIGNMENT (Jaccard) │ │
│ • Trigram similarity scoring │ │
│ • Best-match paragraph linking │ │
│ • Orphan detection │ │
└──────────────────┬───────────────────┘ │
│ │
▼ │
┌──────────────────────────────────────┐ │
│ DIFF GENERATION (difflib) │ │
│ • SequenceMatcher comparison │ │
│ • Line-level diff extraction │ │
└──────────────────┬───────────────────┘ │
│ │
▼ │
┌──────────────────────────────────────┐ │
│ 5-LAYER NOISE FILTER │────────┤
│ • OCR character confusion │ │
│ • Diacritic normalization │ │
│ • Punctuation variants │ │
│ • Whitespace artifacts │ │
│ • Alignment false positives │ │
└──────────────────┬───────────────────┘ │
│ │
▼ │
┌──────────────────────────────────────┐ │
│ MANUAL VERIFICATION │ │
│ • Scan-by-scan confirmation │ │
│ • Semantic change flagging │────────┘
│ • Apply corrections │
└──────────────────────────────────────┘
Types of Changes Detected
| Category | Description | Example | Action |
|---|---|---|---|
| Style | Punctuation, capitalization, formatting | "Krsna." → "Kṛṣṇa," | Restore original |
| Transliteration | IAST diacritic changes, spelling variants | "Krsna" → "Krishna" | Restore original |
| Semantic | Word changes that alter meaning | "planet" → "planets" | Restore original |
| Additions | Text added after publication | New paragraphs, sentences | Remove addition |
| Deletions | Original text removed | Missing phrases, paragraphs | Restore deleted text |
Scripts & Code
compare.py — Main comparison pipeline (2,500+ lines)
def normalize_for_comparison(text: str) -> str:
# Fix hyphenated line breaks
text = re.sub(r'(\w)[\-\u00ad]\s*\n\s*(\w)', r'\1\2', text)
# Strip diacritics
text = strip_diacritics(text)
# Remove page headers/footers from scans
text = re.sub(r'\d+\s+Bhagavad-g\w*\s+As\s+It\s+Is', '', text)
# Normalize quotes/dashes
text = text.replace('\u201c', '"').replace('\u201d', '"')
text = text.replace('\u2018', "'").replace('\u2019', "'")
return text.strip()
is_noise() — 5-layer OCR noise filter
def is_noise(orig: str, veda: str) -> bool:
"""Filter OCR errors, diacritics, transliteration variants.
Returns True if difference is noise, False if real edit."""
o = strip_diacritics(orig.lower().strip())
v = strip_diacritics(veda.lower().strip())
# After diacritics normalization, same?
if o == v: return True
# OCR zero/O confusion
if o.replace('0','o') == v.replace('0','o'): return True
# Only alpha chars — same?
o_alpha = re.sub(r'[^a-z]', '', o)
v_alpha = re.sub(r'[^a-z]', '', v)
if o_alpha == v_alpha: return True
# Alignment false positive check
if len(o_alpha) > 25 and len(v_alpha) > 25:
ratio = difflib.SequenceMatcher(None, o_alpha, v_alpha).ratio()
if ratio < 0.25: return True
return False
strip_diacritics.py — IAST diacritic handling
DIACRITIC_MAP = str.maketrans({
'ā': 'a', 'Ā': 'A', 'ī': 'i', 'Ī': 'I',
'ū': 'u', 'Ū': 'U', 'ṛ': 'r', 'Ṛ': 'R',
'ṁ': 'm', 'ṃ': 'm', 'ṅ': 'n', 'ñ': 'n',
'ṇ': 'n', 'ś': 's', 'ṣ': 's', 'ḥ': 'h',
'ṭ': 't', 'ḍ': 'd',
})
def strip_diacritics(text):
text = text.translate(DIACRITIC_MAP)
# NFD normalization for remaining combining chars
normalized = unicodedata.normalize('NFD', text)
return ''.join(c for c in normalized
if unicodedata.category(c) != 'Mn')