Vesuvius Challenge
In 79 CE, Mount Vesuvius buried the Villa of the Papyri at Herculaneum under 20 meters of pyroclastic flow. Inside was the only intact library from the classical world — some 1,800 papyrus scrolls, carbonized but preserved in the absence of oxygen. For 275 years after their 18th-century rediscovery, the scrolls were considered unreadable without physical unrolling that would destroy them.
The Vesuvius Challenge, launched in March 2023 by computer scientist Brent Seales, entrepreneur Nat Friedman, and Daniel Gross, changed everything. By releasing high-resolution CT scans and an AI-based ink detection model, the challenge invited the world to decode the scrolls without ever touching them — virtual unwrapping.
In February 2024, three students — Luke Farritor, Youssef Nader, and Julian Schilliger — claimed the $700,000 Grand Prize, deciphering over 2,000 Greek characters. The first word read was ΠΟΡΦΥΡΑ ("purple"). The text is believed to be Philodemus's philosophical writing on pleasure and music.
In February 2025, Oxford's Bodleian scroll PHerc. 172 was imaged at the Diamond Light Source synchrotron, yielding more recoverable text than any previously scanned scroll. The first word decoded: διατροπή — "disgust."
Linear A
Linear A is the undeciphered writing system of the Minoan civilization, used across Crete and the Aegean from roughly 1800 to 1450 BCE. It is the direct ancestor of Linear B — the Mycenaean Greek script cracked by Michael Ventris in 1952 — but the underlying Minoan language remains completely unknown and unrelated to any other language family.
Approximately 1,500 inscriptions survive, mostly administrative tablets from palatial sites like Haghia Triada, Zakros, and Akrotiri on Santorini. The signs can be phonetically read using Linear B correspondences, but the words produced are meaningless because the language itself is unknown — a tantalizing half-reading.
The core obstacle is the small corpus. Statistical and ML approaches require thousands of examples to extract patterns; Linear A's 1,500 short administrative texts offer too little signal. Without a bilingual text — a Minoan Rosetta Stone — purely computational approaches face a fundamental ceiling.
Recent work by researchers including Brent Davis and Silvia Ferrara has mapped sign functions and proposed structural analyses. ML models trained on related Bronze Age scripts have attempted phonetic mapping, but no breakthrough has emerged. Linear A remains one of the last great puzzles of the ancient Mediterranean.
Proto-Sinaitic
Proto-Sinaitic is the oldest known alphabetic writing system — the direct ancestor of every modern alphabet on Earth, from Latin and Greek to Arabic and Hebrew. Developed by Semitic workers in Egyptian turquoise mines at Serabit el-Khadim in the Sinai around 1900–1800 BCE, it borrowed Egyptian hieroglyphic forms but redeployed them as consonantal sound signs for a Semitic language.
The principle — the acrophonic principle — is elegant: the sign for "ox" (ʾaleph in Semitic) represents the sound /ʾ/; the sign for "house" (bayt) represents /b/. This is the origin of our letters A (inverted ox head) and B (floor plan of a house).
Roughly 40 short inscriptions survive, mostly from Sinai but with related examples at sites across the Levant. The readings are contested — scholars agree on the basic phonetic values but disagree sharply on word identification, language affiliation, and content. Recent work by Thomas Schneider, Brian Colless, and others has proposed new readings that challenge the 20th-century consensus.
The tiny corpus and lack of longform text means ML approaches are largely inapplicable here. The frontier is classical philological and epigraphic work, now accelerated by digital imaging — RTI (Reflectance Transformation Imaging) and photogrammetric 3D models revealing previously invisible signs on deteriorated surfaces.
Indus Valley Script
The Indus Valley script is the most widely studied undeciphered writing system on Earth. Used by the Harappan civilization — which at its height encompassed more territory than Mesopotamia and Egypt combined — it appears on roughly 4,000 objects, primarily small stamp seals and pottery sherds. The inscriptions are frustratingly short, averaging just 5 signs.
The language beneath the script is unknown and unattested. The major candidates are an ancestor of the Dravidian language family (Tamil, Telugu, Kannada) or an undocumented language isolate. A minority view holds the "script" is not language at all but a logo-administrative system of pure symbols — though most scholars reject this.
The 417 distinct signs suggest a logosyllabic or logo-consonantal system. Statistical analyses by Rajesh Rao (2009) found that Indus sign sequences have conditional entropy values consistent with linguistic systems — arguing against the "non-linguistic" hypothesis. This work sparked renewed ML interest in the script.
Recent deep learning approaches, including work by Nisha Yadav and teams at TIFR, have mapped sign distributions and proposed syntactic rules. The 2021 South Asian inscription from Khirbat Hamra Ifdan in Jordan, referencing trade goods, hints at Indus-Mesopotamian contact — but no bilingual anchor has emerged.
Maya Hieroglyphs
Maya hieroglyphic writing is the only pre-Columbian writing system substantially deciphered — and the decipherment story is one of the great intellectual dramas of the 20th century. From complete mystery in the 1950s to reading dynastic histories, political betrayals, and philosophical texts by the 1990s, the breakthrough took 40 years and required integrating art history, linguistics, epigraphy, and anthropology in ways that no single discipline could achieve alone.
The key insight came from Soviet scholar Yuri Knorozov in 1952: the Maya script is logosyllabic, combining logograms (word signs) and syllabograms (sound signs), similar in structure to Egyptian or Sumerian cuneiform. Combined with Tatiana Proskouriakoff's 1959 discovery that inscriptions record historical events — not astronomy or prophecy — the script yielded its secrets rapidly.
Today, roughly 85–90% of surviving glyphs can be read. Scholars like David Stuart, Stephen Houston, and Simon Martin continue to refine readings. The corpus spans stelae at Tikal, Palenque, Copan, and Yaxchilán; the four surviving codices (Dresden, Madrid, Paris, Grolier); and vast quantities of ceramic vessel texts. Each newly excavated site adds inscriptions — Palenque's Temple of the Inscriptions alone contains more text than all Egyptian royal inscriptions from Ramesses II.
The remaining 10–15% includes signs still poorly understood, regional scribal variants, and the challenge of the 1,400+ sign inventory — making this the AI frontier: not decipherment from scratch, but computational assistance with the still-opaque remainder, automated glyph segmentation, and cross-corpus pattern analysis at scale.