AI × Ancient Scripts

The Decipherment Frontier

Five ancient writing systems. Machine learning meets epigraphy. From Herculaneum scrolls to Maya glyphs — tracking the live edge of human knowledge recovery.

𐝂𐝄𐝀𐝋 𓃀𓈖𓌀𓅓 𐦲𐦳𐦴𐦵 𛀀𛀁𛀂𛀃 ΠΟΡΦΥΡΑ
Undeciphered
Partially decoded
Substantially decoded
Active challenge
Herculaneum Papyri

Vesuvius Challenge

🔴 Live Competition Ancient Greek 79 CE burial
Scroll decoded
~5%
of PHerc.Paris.4 · 2024 Grand Prize

In 79 CE, Mount Vesuvius buried the Villa of the Papyri at Herculaneum under 20 meters of pyroclastic flow. Inside was the only intact library from the classical world — some 1,800 papyrus scrolls, carbonized but preserved in the absence of oxygen. For 275 years after their 18th-century rediscovery, the scrolls were considered unreadable without physical unrolling that would destroy them.

The Vesuvius Challenge, launched in March 2023 by computer scientist Brent Seales, entrepreneur Nat Friedman, and Daniel Gross, opened the problem to outside researchers. By releasing high-resolution CT scans and an AI-based ink detection model, the challenge invited the world to read the scrolls without ever touching them, an approach known as virtual unwrapping.

In February 2024, three students (Luke Farritor, Youssef Nader, and Julian Schilliger) claimed the $700,000 Grand Prize for deciphering over 2,000 Greek characters. The first word, read by Farritor in October 2023, was ΠΟΡΦΥΡΑ ("purple"). The text is believed to be a philosophical work by Philodemus on pleasure and music.

In February 2025, Oxford's Bodleian scroll PHerc. 172 was imaged at the Diamond Light Source synchrotron, yielding more recoverable text than any previously scanned scroll. The first word decoded: διατροπή — "disgust."

Competition Milestones
First ink detection on unopened scroll 2023
First word: ΠΟΡΦΥΡΑ ("purple") — Farritor Oct 2023
Grand Prize: 2,000+ characters decoded · $700k Feb 2024
Oxford PHerc. 172 scanned · "disgust" decoded Feb 2025
Stage 2: 90% of four scrolls — active 2025–
All 300 scanned scrolls readable future
Library size
~1,800 papyrus scrolls recovered. Unexcavated sections of the villa may contain thousands more.
Content so far
Epicurean philosophy by Philodemus. Scholars hope to find lost works of Aristotle, Sappho, and Sophocles.
Technology
X-ray CT scanning at synchrotron facilities, TimeSformer-based ML models for ink detection, virtual surface unwrapping.
Prize pool
$1.7M+ awarded to date. Active prizes for segmentation, ink detection, and full scroll reading.
AI Developments
🤖
TimeSformer model dominates Grand Prize
Video transformer architecture applied to scroll cross-sections outperforms classical CNNs for carbon ink detection on carbonized papyrus.
2024
🔬
ThaumatoAnakalyptor: automated segmentation
A new auto-segmentation pipeline replaces manual surface tracing, which cost about $100/cm² in labor, with fully automated tracing, removing a critical bottleneck.
2025
📜
Egyptian scroll crossover
Nader reports Herculaneum-trained models producing promising results on Egyptian papyri from Berlin — unexpected generalization across ancient collections.
2025
Minoan Civilization · Aegean Bronze Age

Linear A

Undeciphered ~1,500 inscriptions 1800–1450 BCE
Deciphered
0%
language unknown · signs partially mapped

Linear A is the undeciphered writing system of the Minoan civilization, used across Crete and the Aegean from roughly 1800 to 1450 BCE. It is the direct ancestor of Linear B, the Mycenaean Greek script cracked by Michael Ventris in 1952, but the underlying Minoan language remains unknown and has not been convincingly linked to any known language family.

Approximately 1,500 inscriptions survive, mostly administrative tablets from palatial sites like Haghia Triada, Zakros, and Akrotiri on Santorini. The signs can be phonetically read using Linear B correspondences, but the words produced are meaningless because the language itself is unknown — a tantalizing half-reading.

The core obstacle is the small corpus. Statistical and ML approaches need large amounts of text to extract reliable patterns, and Linear A's roughly 1,500 inscriptions are mostly short administrative records that offer too little signal. Without a bilingual text (a Minoan Rosetta Stone), purely computational approaches face a fundamental ceiling.

Recent work by researchers including Brent Davis and Silvia Ferrara has mapped sign functions and proposed structural analyses. ML models trained on related Bronze Age scripts have attempted phonetic mapping, but no breakthrough has emerged. Linear A remains one of the last great puzzles of the ancient Mediterranean.

Linear A signs (Haghia Triada tablet HT 31)
𐝀 𐝁 𐝂 𐝃 𐝄 𐝅 𐝆 𐝇
Administrative record — content readable phonetically but semantically opaque
Corpus size
~1,500 inscriptions. Mostly short administrative tallies. No long texts. No bilingual anchor.
Key sites
Haghia Triada (largest archive), Zakros, Knossos, Akrotiri (Thera). Akrotiri was buried by the Thera eruption (c. 1600 BCE); most Cretan palatial sites were destroyed around 1450 BCE.
Relationship to Linear B
Linear B borrowed ~80% of Linear A's signs. Ventris's 1952 decipherment of Linear B as Mycenaean Greek made Linear A phonetically readable but semantically silent.
AI challenge
Corpus too small for unsupervised learning. Approaches using cross-script transfer from Linear B show structural insights but no semantic breakthrough.
AI Approaches
🧬
Cross-script transfer learning
Models trained on Linear B, Cypro-Minoan, and Egyptian hieroglyphs applied to Linear A sign classification. Identifies functional categories but cannot assign meaning.
📊
DĀMOS database + computational analysis
University of Oslo's DĀMOS project provides a machine-readable Linear B corpus, a baseline against which Linear A sign distributions can be compared. Statistical distribution analyses suggest that Linear A tablets have an administrative structure similar to Linear B.
🔗
The Rosetta Stone problem
Without a bilingual text, decipherment requires identifying a known language beneath the signs. Proposals for Luwian, Semitic, and Proto-Greek substrates all remain unverified.
Sinai Peninsula · Levant · c. 1900–1500 BCE

Proto-Sinaitic

Partially Decoded Earliest alphabet ~40 inscriptions
Deciphered
~40%
contested readings · ancestor of all alphabets

Proto-Sinaitic is the oldest known alphabetic writing system and the ancestor of nearly every alphabet in use today, from Latin and Greek to Arabic and Hebrew. Developed by Semitic-speaking workers in Egyptian turquoise mines at Serabit el-Khadim in the Sinai around 1900–1800 BCE, it borrowed Egyptian hieroglyphic forms but redeployed them as consonantal sound signs for a Semitic language.

The principle — the acrophonic principle — is elegant: the sign for "ox" (ʾaleph in Semitic) represents the sound /ʾ/; the sign for "house" (bayt) represents /b/. This is the origin of our letters A (inverted ox head) and B (floor plan of a house).
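The acrophonic principle can be sketched in a few lines of Python. This is a minimal illustration only: the `SIGN_NAMES` table and the `acrophonic_value` function are hypothetical names, and the table holds just the two examples given in the text rather than any published sign list.

```python
# Hypothetical table: pictured object -> Semitic name of that object.
# Only the two examples given in the text are included.
SIGN_NAMES = {
    "ox": "ʾaleph",
    "house": "bayt",
}


def acrophonic_value(semitic_name: str) -> str:
    """Return the initial consonant of the name, which the sign stands for."""
    return semitic_name[0]


# Derive each sign's sound value from the first sound of its name.
SOUND_VALUES = {obj: acrophonic_value(name) for obj, name in SIGN_NAMES.items()}

for obj, sound in SOUND_VALUES.items():
    print(f"sign '{obj}' ({SIGN_NAMES[obj]}) -> /{sound}/")
```

Taking the first character works here because both names begin with a single-character consonant; a fuller transliteration scheme would need to handle consonants written with more than one character.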

Roughly 40 short inscriptions survive, mostly from Sinai but with related examples at sites across the Levant. The readings are contested — scholars agree on the basic phonetic values but disagree sharply on word identification, language affiliation, and content. Recent work by Thomas Schneider, Brian Colless, and others has proposed new readings that challenge the 20th-century consensus.

The tiny corpus and lack of longform text mean ML approaches are largely inapplicable here. The frontier is classical philological and epigraphic work, now accelerated by digital imaging: RTI (Reflectance Transformation Imaging) and photogrammetric 3D models reveal previously invisible signs on deteriorated surfaces.

Discovery
Found by Flinders Petrie at Serabit el-Khadim, Sinai, in 1904. First decipherment proposals by Alan Gardiner in 1916.
Descendants
Proto-Canaanite → Phoenician → Greek, Latin, Cyrillic, Arabic, Hebrew, Ethiopic — essentially every alphabet in use today.
Key inscription
Sphinx inscription from Serabit reads (possibly) "to Baalat" — a Semitic goddess. The most agreed-upon reading in the corpus.
AI role
RTI and photogrammetric imaging recover signs invisible to the naked eye. ML is used for sign classification and comparison across the 40-inscription corpus.
Current Research
📷
RTI + photogrammetry revival
New digital imaging at Serabit el-Khadim and the British Museum is recovering previously illegible signs, effectively expanding the corpus for the first time in decades.
🔤
Wadi el-Hol inscriptions
John and Deborah Darnell's discovery in 1993 of early alphabetic inscriptions in Upper Egypt pushes the script's origin possibly to 1900 BCE, earlier than the Sinai examples.
Indus Valley Civilization · 2600–1900 BCE

Indus Valley Script

Undeciphered ~4,000 inscriptions 417 distinct signs
Deciphered
0%
language unknown · largest undeciphered corpus

The Indus Valley script is the most widely studied undeciphered writing system on Earth. Used by the Harappan civilization — which at its height encompassed more territory than Mesopotamia and Egypt combined — it appears on roughly 4,000 objects, primarily small stamp seals and pottery sherds. The inscriptions are frustratingly short, averaging just 5 signs.

The language beneath the script is unknown and unattested. The major candidates are an ancestor of the Dravidian language family (Tamil, Telugu, Kannada) or an undocumented language isolate. A minority view holds the "script" is not language at all but a logo-administrative system of pure symbols — though most scholars reject this.

The 417 distinct signs suggest a logosyllabic or logo-consonantal system. Statistical analyses by Rajesh Rao (2009) found that Indus sign sequences have conditional entropy values consistent with linguistic systems — arguing against the "non-linguistic" hypothesis. This work sparked renewed ML interest in the script.
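The idea behind a conditional entropy measure can be sketched as follows. This is a simplified illustration, not the procedure from Rao's study: it uses plain bigram counts with no smoothing, and the toy sequences and the function name `conditional_entropy` are assumptions made for the example.

```python
from collections import Counter
from math import log2


def conditional_entropy(sequences):
    """Estimate H(next sign | previous sign) in bits from adjacent sign pairs."""
    pair_counts = Counter()
    prev_counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1

    total = sum(pair_counts.values())
    if total == 0:
        return 0.0  # no adjacent pairs, so nothing to measure

    entropy = 0.0
    for (prev, _nxt), count in pair_counts.items():
        p_pair = count / total                    # P(prev, next)
        p_next_given_prev = count / prev_counts[prev]  # P(next | prev)
        entropy -= p_pair * log2(p_next_given_prev)
    return entropy


# Toy data (invented): a rigid system where each sign fixes the next one,
# and a looser one where "A" is followed by "B" or "C" equally often.
RIGID = [["A", "B", "C"]] * 4
FLEXIBLE = [["A", "B"], ["A", "C"]]

print("rigid:", conditional_entropy(RIGID))
print("flexible:", conditional_entropy(FLEXIBLE))
```

The rigid toy corpus scores zero bits because the next sign is always predictable, while the flexible one scores one bit. The argument summarized above is that natural languages fall between fully rigid and fully random orderings, and that Indus sequences fall in that same range.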

Recent deep learning approaches, including work by Nisha Yadav and teams at TIFR, have mapped sign distributions and proposed syntactic rules. The 2021 South Asian inscription from Khirbat Hamra Ifdan in Jordan, referencing trade goods, hints at Indus-Mesopotamian contact — but no bilingual anchor has emerged.

Corpus
~4,000 inscriptions. Average length: 5 signs. Longest known: 26 signs. All short — no "Rosetta Stone" equivalent exists.
Major sites
Mohenjo-daro, Harappa, Dholavira — the three largest Harappan cities. Dholavira's signboard inscription (10 large signs) is the most monumental.
Language candidates
Proto-Dravidian (majority view among scholars), Munda language ancestor, or an isolate. No consensus. No living descendant language confirmed.
AI frontier
Rajesh Rao's entropy analysis (2009). TIFR computational studies. Most recently, transformer models trained on Dravidian scripts for cross-script feature comparison.
AI Approaches
📈
Entropy analysis supports linguistic nature
Rao et al. (2009) in Science showed Indus sign conditional entropy matches known languages and differs from non-linguistic systems — strongest statistical argument for it being language.
🔗
Dravidian computer model (Mahadevan)
Iravatham Mahadevan's concordance of 3,000+ inscriptions forms the primary computational dataset. Tamil-Brahmi comparisons identify possible loanword correspondences.
🧠
LLM attempts (2023–2024)
Multiple teams have fine-tuned LLMs on proposed Dravidian phonetic mappings. Results remain speculative — without a bilingual text, any "reading" is unfalsifiable.
Mesoamerica · 300 BCE – 1697 CE

Maya Hieroglyphs

85–90% Decoded 🔴 AI Frontier Logosyllabic
Deciphered
85–90%
reading dynastic histories · AI segmentation active

Maya hieroglyphic writing is the only pre-Columbian writing system substantially deciphered — and the decipherment story is one of the great intellectual dramas of the 20th century. From complete mystery in the 1950s to reading dynastic histories, political betrayals, and philosophical texts by the 1990s, the breakthrough took 40 years and required integrating art history, linguistics, epigraphy, and anthropology in ways that no single discipline could achieve alone.

The key insight came from Soviet scholar Yuri Knorozov in 1952: the Maya script is logosyllabic, combining logograms (word signs) and syllabograms (sound signs), similar in structure to Egyptian hieroglyphs or Sumerian cuneiform. Combined with Tatiana Proskouriakoff's 1959 discovery that inscriptions record historical events, not astronomy or prophecy, this insight let the script yield its secrets rapidly.

Today, roughly 85–90% of surviving glyphs can be read. Scholars like David Stuart, Stephen Houston, and Simon Martin continue to refine readings. The corpus spans stelae at Tikal, Palenque, Copan, and Yaxchilán; the four surviving codices (Dresden, Madrid, Paris, Grolier); and vast quantities of ceramic vessel texts. Each newly excavated site adds inscriptions — Palenque's Temple of the Inscriptions alone contains more text than all Egyptian royal inscriptions from Ramesses II.

The remaining 10–15% includes signs still poorly understood, regional scribal variants, and the challenge of the 1,400+ sign inventory — making this the AI frontier: not decipherment from scratch, but computational assistance with the still-opaque remainder, automated glyph segmentation, and cross-corpus pattern analysis at scale.

Active AI Projects
🎨
WVU Artful Algorithms Project
West Virginia University collaboration between Art History and Computer Science. Fine-tuning SAM (Segment Anything Model) on the Justin Kerr MayaVase database — 5,000+ rollout photographs of Maya ceramic vessels with expert-annotated glyph blocks.
2024 — ongoing
🏛️
TWKM / IDIOM Digital Corpus
Bonn-Bochum Maya epigraphy project building a TEI/XML machine-readable corpus of Classic Maya texts. The IDIOM research environment hosts a catalog of 1,400+ signs, transliterations, and a dynamic RDF database incorporating Mathews's 40-year Maya History Project dataset.
2024 — ongoing · classicmayan.org
🔤
Unicode encoding proposal
UC Berkeley Script Encoding Initiative working to encode Maya hieroglyphs in Unicode (tentative range U+15500–U+159FF). As of 2024 still under development — the logosyllabic flexibility (infixes inside signs) challenges standard Unicode encoding models.
2016 grant — 2024 in progress
📐
Living epigraphy: new glyphs invented
Yucatec epigraphers like Eduardo Puga are creating new Maya glyphs for modern concepts — laptop, giraffe, democracy — using authentic acrophonic and phonetic conventions. The script is being extended by living practitioners for the first time in 500 years.
2024
Script type
Logosyllabic: combines logograms (word signs) with syllabograms (CV syllables). Similar structure to Egyptian hieroglyphs and Sumerian cuneiform.
Corpus size
Thousands of monumental inscriptions. Four surviving codices. Vast ceramic vessel corpus. New texts emerge with every excavation season.
Key decipherment
Knorozov (1952) proves logosyllabic structure. Proskouriakoff (1959) proves historical content. Schele, Houston, Stuart (1970s–90s) crack phonetic readings at scale.
What remains
~10–15% of signs still uncertain. Regional variants. The full Dresden Codex astronomical tables. New LiDAR discoveries at sites like Aguada Fénix contain unread texts.
mayadecipherment.com
David Stuart's active blog — the primary venue for new glyph readings. New posts appear regularly with fresh decipherments from fieldwork and museum collections.
The "Next Frontier" Framing
Logosyllabic structure proven (Knorozov) 1952
Historical content confirmed (Proskouriakoff) 1959
~85% readable: dynastic histories recovered 1990s
MayaVase database digitized (Kerr / Dumbarton Oaks) 2022
AI glyph segmentation (WVU, TWKM) — active 2024–
Unicode encoding proposal — active 2024–
Full corpus machine-readable · last 10–15% cracked future
Elam · Proto-historic Iran · c. 2300–1880 BCE

Linear Elamite

Partially deciphered ~40 inscriptions (OCLEI) c. 2300–1880 BCE
Deciphered
~38%
sound values partially established · language partially known

Linear Elamite is a partially deciphered writing system used in the ancient kingdom of Elam in what is now southwestern Iran, during the late third and early second millennia BCE. It is found on stone monuments, silver beakers, and administrative objects from sites including Susa, Persepolis, and Shahdad — about 40 inscriptions in total, making up the Online Corpus of Linear Elamite Inscriptions (OCLEI).

The script was long considered completely undeciphered. A significant breakthrough came in 2022 when François Desset and colleagues published a major decipherment proposal in Zeitschrift für Assyriologie, assigning sound values to a substantial portion of the sign inventory by cross-referencing bilingual royal inscriptions where the same ruler's name appears in both Linear Elamite and Akkadian cuneiform. This cracked open the phonological layer of the script. The underlying language — Elamite — is itself only partially understood, though it is attested in cuneiform across 2,000 years of history.

The relationship between Linear Elamite and the earlier Proto-Elamite script (~3200–2900 BCE, ~10,000 tablets, still completely undeciphered) is debated. They may share a common ancestor or one may derive from the other, but no direct continuity has been established. Linear Elamite appears abruptly, fully formed, during the Shimashki Dynasty period.

The GEAS tool (Alice Kober Gesellschaft für die Entzifferung antiker Schriftsysteme, University of Zurich) provides the most comprehensive publicly accessible Linear Elamite corpus browser, built on Desset's OCLEI dataset. It includes a Unicode character picker, dynamic syllabary, RegEx corpus search, sign frequency analysis, and per-inscription photographic documentation.
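A regular-expression search over sign sequences, of the kind the corpus browser offers, might look roughly like the sketch below. It does not reproduce the GEAS tool's interface or data format: the hyphen-separated transliteration convention, the inscription IDs, the sign values, and the function names are all invented for illustration.

```python
import re

# Hypothetical mini-corpus: inscription IDs and sign values are invented
# placeholders, written as hyphen-separated sign transliterations.
CORPUS = {
    "T1": "ta-ki-ma-ru",
    "T2": "ki-ma-ta",
    "T3": "ru-ta-ki",
}

ANY_SIGN = "*"  # wildcard standing for exactly one sign


def sequence_regex(signs):
    """Build a regex matching the given signs as whole, adjacent signs."""
    parts = [r"[^-]+" if s == ANY_SIGN else re.escape(s) for s in signs]
    # Lookarounds keep a match from starting or ending in the middle of a sign.
    return re.compile(r"(?<![^-])" + "-".join(parts) + r"(?![^-])")


def find_inscriptions(corpus, signs):
    """Return the sorted IDs of inscriptions containing the sign sequence."""
    rx = sequence_regex(signs)
    return sorted(key for key, text in corpus.items() if rx.search(text))


KI_MA = find_inscriptions(CORPUS, ["ki", "ma"])
TA_THEN_ANY = find_inscriptions(CORPUS, ["ta", ANY_SIGN])

print("ki-ma:", KI_MA)
print("ta + any sign:", TA_THEN_ANY)
```

Anchoring on sign boundaries matters: without the lookarounds, a search for a short sign value could match inside a longer one and inflate the hit count.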

Linear Elamite — selected signs (OCLEI)
𐬹 𐬺 𐬻 𐬼 𐬽 𐬾
Syllabic signs — phonetically assigned via Desset et al. (2022) bilingual decipherment
Live Tool · GEAS / OCLEI (Zurich)
Elamicon — Online Corpus of Linear Elamite Inscriptions
Unicode character picker
Searchable inscription corpus (OCLEI)
RegEx sign-sequence search
Dynamic syllabary (customisable)
Sign frequency analysis
Per-inscription photos & drawings
Sandbox & reliterator
GEAS-Liberation font download
Corpus (OCLEI)
~40 inscriptions on stone monuments, silver beakers, copper tablets, and administrative objects. Primarily from Susa, Persepolis, and Shahdad. The GEAS corpus is more inclusive than print editions, flagging suspected forgeries rather than excluding them.
2022 Decipherment
Desset, Tabibzadeh, Kervran, Basello & Marchesi (2022) in Zeitschrift für Assyriologie. Cross-referenced bilingual royal names in Linear Elamite and Akkadian cuneiform to establish ~38 sound values. The underlying Elamite language is independently attested in cuneiform, providing a partial phonological and lexical anchor.
Relation to Proto-Elamite
Proto-Elamite (~3200–2900 BCE, ~10,000 tablets) is the world's second largest undeciphered corpus after the Indus script. It remains completely undeciphered. The relationship to Linear Elamite is unresolved — possible ancestor, independent invention, or indirect descendant.
GEAS & Alice Kober Society
Named after Alice Kober (1906–1950), whose combinatorial analysis of Linear B was crucial to Ventris's 1952 decipherment. The society publishes corpora for Linear Elamite, Byblos Script, Deir Alla Script, Raetic, Lepontic, Etruscan, and Elder Futhark Runes under open standards.
Other scripts in GEAS
The same tool hosts: Byblos Script (Lebanon, undeciphered), Deir Alla Script (Jordan, undeciphered), Raetic (Alpine, partially read), Lepontic (Cisalpine Gaul, partially read), Etruscan (largely read), Elder Futhark Runes (largely read).
Research Approaches
🔑
Bilingual decipherment (Desset et al. 2022)
Royal titulary inscriptions name the same rulers in both Linear Elamite and Akkadian cuneiform, enabling phonetic assignments for ~38 signs. The Elamite language's known phonology from cuneiform provides a crucial independent check on proposed readings.
📊
GEAS dynamic syllabary & RegEx corpus search
The OCLEI tool enables scholars to test custom syllabary groupings against the full corpus in real time — a methodological innovation designed to break the circularity problem inherent in any decipherment attempt. Sign-sequence searches with regular expressions replace manual collation of tablets.
🧬
Cross-script comparison (Proto-Elamite)
If Linear Elamite derives from Proto-Elamite, established Linear Elamite sound values could theoretically anchor computational analysis of the far larger Proto-Elamite corpus. The CDLI at UCLA holds the largest digitised Proto-Elamite dataset. The GEAS reliterator can convert CDLI transliterations into Linear Elamite sign representations.
🔬
Silver beakers & administrative context
Several of the most legible Linear Elamite inscriptions appear on silver beakers — luxury objects likely produced for royal gift exchange. The administrative context of other objects (brick stamps, foundation deposits) provides functional parallels to the well-understood cuneiform administrative corpus from Susa.
Kyprianos Database · CoMaF / Würzburg · DFG

Coptic Magical Papyri

Late Roman · Byzantine · Islamic Egypt c. 4th–12th century CE 🔴 Actively updated

The Kyprianos Magical Text Database is the primary open-access resource for the corpus of Coptic-language magical manuscripts from late antique and early Islamic Egypt, roughly the 4th to 12th centuries CE. These texts — written on papyrus, parchment, ostraca, and amulets — document non-institutional ritual practices: spells for healing, protection, love, separation, divination, and apotropaic purposes. They are exceptional sources for the intimate lives of individuals who lived below the level of formal literary culture.

The database is produced by the Corpus of Coptic Magical Formularies (CoMaF) project at the Julius Maximilian University of Würzburg, funded by the German Research Foundation (DFG) and led by Markéta Preininger and Korshi Dosoo. It succeeds the earlier five-year Coptic Magical Papyri project (2018–2023) and will ultimately include transcriptions and translations of every surviving spell, with manuscript metadata, images, scribal hand identification, and archive groupings across Coptic, Greek, Demotic, and Arabic magical texts.

Each manuscript record details material type, dimensions, dating, dialect, provenance, holding institution, shelfmark, edition history, and condition. The searchable interface allows filtering by script, date range, material, content type, and manuscript status.

Corpus
Magical texts in Coptic, Greek, Demotic, and Arabic on papyrus, parchment, ostraca, wooden tablets, and metal amulets. Primarily from the Fayum, Oxyrhynchus, and Upper Egypt. The database flags suspected forgeries rather than excluding them.
Content types
Healing and fever amulets, love spells, separation formularies, divination texts, protective invocations, apotropaic amulets with Psalm 90/91, and formularies combining Christian, Jewish, Gnostic, and pagan ritual elements.
Project lineage
Builds on Franziska Naether's Trismegistos Magic (362 Coptic magic texts). The current CoMaF project (2024–2027) expands the corpus and is a partner of the DFG Centre for Advanced Studies MagEIA.
Named after
Kyprianos of Antioch (3rd century CE) — a convert from sorcery to Christianity whose name became synonymous with magical power in the late antique tradition. Invoked frequently in Coptic magical texts alongside angels, archangels, and Biblical figures.
Live Database · Kyprianos / CoMaF · Würzburg
Manuscript Search — Kyprianos Magical Text Database
Manuscript-level records
Searchable by date, dialect, material
Holding institution & shelfmark
Edition & publication history
Content type classification
Forgery flags included
Glottolog · Language Endangerment

Living Languages at Risk

🔴 Active monitoring Glottolog 5.x CC-BY 4.0 AES scale
Languages tracked
47
curated for archaeological relevance

Language death and script extinction are the same process at different time scales. The 47 languages tracked here were selected because they either produced archaeological writing systems, are spoken by communities whose ancestors built the sites in our gazetteer, or represent language families central to the decipherment challenges covered on this page.

The Agglomerated Endangerment Scale (AES) aggregates data from UNESCO, the Catalogue of Endangered Languages, and Ethnologue. AES 3–5 languages are actively losing speakers; AES 6 (dormant/extinct) languages survive only in texts — precisely the condition that produces undeciphered scripts.

The connection is direct: Linear A became undecipherable when Minoan became AES 6. The Indus Valley script has no anchor language partly because the Harappan language went fully dormant. By contrast, Maya hieroglyphics remained ~85% readable because Mayan languages — Yucatec, Ch'ol, K'iche' — survived into the present with millions of speakers, providing the living phonological system that made decipherment possible.

Itzaj Maya, a Yucatecan language of Guatemala, now has fewer than 10 fluent speakers. When it reaches AES 6, one more thread connecting us to the ancient texts breaks permanently.

AES Endangerment Scale
AES 1: Safe 2 tracked
AES 2: Vulnerable 16 tracked
AES 3: Definitely Endangered 7 tracked
AES 4: Severely Endangered 8 tracked
AES 5: Critically Endangered 4 tracked
AES 6: Dormant/Extinct 10 tracked
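The tally above can be checked with a short script. The counts are copied from the list; the variable names are illustrative, and the grouping of AES 3–5 as "losing speakers" follows the description given earlier in this section.

```python
# Counts per AES level, copied from the list above: level -> (label, tracked).
AES_COUNTS = {
    1: ("Safe", 2),
    2: ("Vulnerable", 16),
    3: ("Definitely Endangered", 7),
    4: ("Severely Endangered", 8),
    5: ("Critically Endangered", 4),
    6: ("Dormant/Extinct", 10),
}

TOTAL = sum(n for _label, n in AES_COUNTS.values())
# AES 3-5 are the levels described as actively losing speakers.
LOSING_SPEAKERS = sum(n for level, (_label, n) in AES_COUNTS.items() if 3 <= level <= 5)
DORMANT = AES_COUNTS[6][1]

for level, (label, n) in AES_COUNTS.items():
    print(f"AES {level}: {label:<22} {n:>2}  {n / TOTAL:6.1%}")
print(f"Total tracked: {TOTAL}")
print(f"AES 3-5 (losing speakers): {LOSING_SPEAKERS}")
print(f"AES 6 (dormant/extinct): {DORMANT}")
```

The levels sum to the 47 tracked languages, with 19 in AES 3–5 and 10 already dormant or extinct.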
Scripts connected to living languages
🗣
Yucatec Maya AES 2
Most spoken Maya language; active epigraphy community
🗣
K'iche' AES 2
Largest Maya language; Popol Vuh language
🗣
Ch'ol AES 2
Closest living relative to Classic Maya inscription language
🗣
Mocho' AES 5
Critically endangered Mayan language
🗣
Tamil AES 1
Oldest attested Dravidian language; its family is a proposed source of the Indus script's language
🗣
Brahui AES 3
Isolated Dravidian language in Balochistan — near Indus Valley heartland
🗣
Mandaic AES 5
Liturgical language of Mandaean Gnostics; descended from Eastern Aramaic
🗣
Classical Mandaic AES 6
Liturgical language; dormant
🗣
Minoan (unclassified) AES 6
Language of Linear A script; completely undeciphered; extinct ~1450 BCE
🗣
Mycenaean Greek AES 6
Earliest attested Greek; Linear B script
🗣
Coptic AES 6
Last stage of ancient Egyptian; liturgical language only
🗣
Rapanui AES 4
Easter Island language; connected to undeciphered Rongorongo script
🗣
Urartian AES 6
Language of Kingdom of Urartu; related to Hurrian; extinct ~600 BCE
🗣
Luwian AES 6
Hieroglyphic Luwian script; Bronze Age Anatolia; active decipherment research
🗣
Hittite AES 6
Oldest attested Indo-European language; cuneiform script; Hattusa tablets
🗣
Jurchen AES 6
Language of Jin dynasty; own script; ancestor of Manchu
Why language death matters for archaeology
Living speakers provide phonological anchors for decipherment. Every Mayan language speaker keeps a pathway to the Classic inscriptions open.
The Maya connection
Ch'ol Maya, spoken by ~220,000 people in Chiapas, is the closest living relative to the language of Classic Maya inscriptions. Its survival directly enables ongoing decipherment.
Burushaski — the Harappan candidate
The language isolate spoken in Pakistan's Karakoram valleys has been proposed as a Harappan language descendant. ~100,000 speakers, AES 2. One of the few possible living clues to the Indus script.
Rapanui and Rongorongo
Easter Island's ~3,700 Rapanui speakers are the only community that may have ancestral connection to the undeciphered Rongorongo script. AES 4 — severely endangered.
Critically endangered · fewer than 10 speakers
Itzaj: Critically endangered; fewer than 10 fluent speakers
Mocho': Critically endangered Mayan language
Puruborá: Critically endangered Amazonian language
Mandaic: Liturgical language of Mandaean Gnostics; descended from Eastern Aramaic
Source: Glottolog Agglomerated Endangerment Scale · CC-BY 4.0
Language Map

All 47 tracked languages mapped by their Glottolog coordinates. Color-coded by AES endangerment level — green for safe/vulnerable, orange for severely endangered, red for critically endangered, grey for dormant/extinct. Click any marker for language details, family, speaker count, and Glottolog link.

Safe / Vulnerable
Definitely Endangered
Severely Endangered
Critically Endangered
Dormant / Extinct
All 47 tracked languages
Dzongkha
Sino-Tibetan · Bhutan
Safe
Tamil
Dravidian · India
Safe
Oldest attested Dravidian language; its family is a proposed source of the Indus script's language
Avar
Northeast Caucasian · Russia
Vulnerable
Ayacucho Quechua
Quechuan · Peru
Vulnerable
Aymara
Aymaran · Bolivia
Vulnerable
Co-official language of Bolivia; Tiwanaku civilization language
Burushaski
Isolate · Pakistan
Vulnerable
Language isolate in Karakoram; proposed links to Harappan language
Central Nahuatl
Uto-Aztecan · Mexico
Vulnerable
Ch'ol
Mayan · Mexico
Vulnerable
Closest living relative to Classic Maya inscription language
Cusco Quechua
Quechuan · Peru
Vulnerable
Language of Inca Empire; still spoken in Andes
Hadza
Isolate · Tanzania
Vulnerable
Language isolate with click consonants
K'iche'
Mayan · Guatemala
Vulnerable
Largest Maya language; Popol Vuh language
Lezgian
Northeast Caucasian · Russia
Vulnerable
Mam
Mayan · Guatemala
Vulnerable
Navajo
Athabaskan-Eyak-Tlingit · USA
Vulnerable
Largest Native North American language by speakers
Southern Quechua
Quechuan · Peru
Vulnerable
Tzeltal
Mayan · Mexico
Vulnerable
Tzotzil
Mayan · Mexico
Vulnerable
Yucatec Maya
Mayan · Mexico
Vulnerable
Most spoken Maya language; active epigraphy community
Brahui
Dravidian · Pakistan
Definitely Endangered
Isolated Dravidian language in Balochistan — near Indus Valley heartland
Choctaw
Muskogean · USA
Definitely Endangered
Hopi
Uto-Aztecan · USA
Definitely Endangered
Language of Pueblo Southwest; distinct script tradition
Kurukh
Dravidian · India
Definitely Endangered
Newari
Sino-Tibetan · Nepal
Definitely Endangered
Language of Kathmandu Valley; ancient Buddhist manuscripts
Sandawe
Isolate · Tanzania
Definitely Endangered
Language isolate with click consonants
Tojolabal
Mayan · Mexico
Definitely Endangered
Assyrian Neo-Aramaic
Afro-Asiatic · Iraq
Severely Endangered
Descendant of ancient Aramaic; endangered due to displacement
Cherokee
Iroquoian · USA
Severely Endangered
Own syllabary invented by Sequoyah 1820
Cheyenne
Algic · USA
Severely Endangered
Plains language; fewer than 2000 speakers
Hawaiian
Austronesian · USA
Severely Endangered
Major language revitalization effort underway
Lacandon
Mayan · Mexico
Severely Endangered
Last speakers in Lacandón jungle
Malto
Dravidian · India
Severely Endangered
Endangered Northern Dravidian language
Rapanui
Austronesian · Chile
Severely Endangered
Easter Island language; connected to undeciphered Rongorongo script
Surayt
Afro-Asiatic · Turkey
Severely Endangered
Tur Abdin Aramaic dialect, severely endangered
Itzaj
Mayan · Guatemala
Critically Endangered
Critically endangered; fewer than 10 fluent speakers
Mandaic
Afro-Asiatic · Iraq
Critically Endangered
Liturgical language of Mandaean Gnostics; descended from Eastern Aramaic
Mocho'
Mayan · Mexico
Critically Endangered
Critically endangered Mayan language
Puruborá
Puruborán · Brazil
Critically Endangered
Critically endangered Amazonian language
Classical Chinese
Sino-Tibetan · China
Dormant/Extinct
Literary language of Chinese civilization; connects to CHGIS data
Classical Mandaic
Afro-Asiatic · Iraq
Dormant/Extinct
Liturgical language; dormant
Classical Nahuatl
Uto-Aztecan · Mexico
Dormant/Extinct
Aztec empire language; extensive colonial-era texts
Coptic
Afro-Asiatic · Egypt
Dormant/Extinct
Last stage of ancient Egyptian; liturgical language only
Hittite
Indo-European · Turkey
Dormant/Extinct
Oldest attested Indo-European language; cuneiform script; Hattusa tablets
Jurchen
Tungusic · China
Dormant/Extinct
Language of Jin dynasty; own script; ancestor of Manchu
Luwian
Indo-European · Turkey
Dormant/Extinct
Hieroglyphic Luwian script; Bronze Age Anatolia; active decipherment research
Minoan (unclassified)
Unclassified · Greece
Dormant/Extinct
Language of Linear A script; completely undeciphered; extinct ~1450 BCE
Mycenaean Greek
Indo-European · Greece
Dormant/Extinct
Earliest attested Greek; Linear B script
Urartian
Hurro-Urartian · Turkey
Dormant/Extinct
Language of Kingdom of Urartu; related to Hurrian; extinct ~600 BCE
World Atlas of Language Structures · Max Planck Institute

WALS Online

2,662 Languages 192 Typological Features CC-BY 4.0
Coverage
~14%
of world's ~7,000 languages · deep typological data

The World Atlas of Language Structures (WALS) is the gold-standard typological database for comparative linguistics, edited by Matthew Dryer and Martin Haspelmath and published by the Max Planck Institute for Evolutionary Anthropology. It covers 2,662 languages across 192 typological features — from consonant inventories and tone systems to word order, case marking, and evidentiality.

Each language entry records its genealogical classification (family → genus), macroarea, ISO 639-3 code, and geographic coordinates. Features are documented at the level of individual chapters authored by specialists, making WALS a structured reference work as well as a queryable database.

For archAIology's scope, WALS is particularly valuable for mapping the typological context of ancient scripts. The distribution of typological features across the languages of the ancient Near East, Aegean, and South Asia helps constrain hypotheses about which language families may lie behind the Indus Valley, Linear Elamite, or Proto-Sinaitic inscriptions.

Feature chapters
192 typological features organized into 11 domains: Phonology, Morphology, Nominal Categories, Nominal Syntax, Verbal Categories, Word Order, Simple Clauses, Complex Sentences, Lexicon, Sign Languages, Other.
Language sample
Balanced 100- and 200-language samples for cross-linguistic generalization. Stratified by genealogy and geography to avoid Eurasian overrepresentation.
For ancient scripts
Hittite, Luwian, Urartian, Classical Nahuatl, Mycenaean Greek, and Coptic all have WALS entries — providing the typological fingerprint for decipherment hypotheses.
Data access
Full dataset available as CLDF (Cross-Linguistic Data Formats) on GitHub. Python-accessible via pycldf. CC-BY 4.0.
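As a rough sketch of what working with a CLDF export involves, the snippet below joins a language table to a value table using only Python's standard library. The column names (ID, Name, Macroarea, Language_ID, Parameter_ID, Value) follow general CLDF conventions; the inline rows are illustrative placeholders rather than real WALS values, and the helper names are hypothetical. In practice pycldf would load the real dataset from its metadata file instead.

```python
import csv
import io

# Illustrative stand-ins for a CLDF LanguageTable and ValueTable.
# These rows are placeholders, NOT real WALS data.
LANGUAGES_CSV = """ID,Name,Macroarea
lg1,Example Language A,Eurasia
lg2,Example Language B,Africa
"""

VALUES_CSV = """ID,Language_ID,Parameter_ID,Value
81A-lg1,lg1,81A,1
81A-lg2,lg2,81A,2
1A-lg1,lg1,1A,3
"""


def read_table(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def values_for_feature(languages, values, parameter_id):
    """Map language name -> coded value for one feature (Parameter_ID)."""
    names = {row["ID"]: row["Name"] for row in languages}
    return {
        names[row["Language_ID"]]: row["Value"]
        for row in values
        if row["Parameter_ID"] == parameter_id
    }


languages = read_table(LANGUAGES_CSV)
values = read_table(VALUES_CSV)
word_order = values_for_feature(languages, values, "81A")
print(word_order)
```

The same join works on the real languages.csv and values.csv files once they are read from disk; only the two inline strings would need to be swapped for file handles.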
Live database
WALS Language Browser
Phonology
Consonant inventories
Features 1–19 · tone, click, nasals
Word Order
SOV / SVO / VSO
Features 81–97 · order typology
Nominal Categories
Case, gender, number
Features 28–51 · inflectional morphology
Evidentiality
Grammaticalized source
Features 77–78 · rare in ancient scripts
Lexicon
Color, body, kinship terms
Features 129–138 · semantic universals
Sign Languages
Visual-gestural systems
Features 139–140 · cross-modal comparison
ORACC · University of Pennsylvania Museum · Steve Tinney

Cuneify

Live Tool Sumerian · Akkadian · 10+ languages CC-BY-SA 3.0
Script coverage
Full
ATF transliteration → Unicode cuneiform

Cuneiform is the world's oldest writing system, invented by the Sumerians around 3200 BCE and used continuously for over three millennia across Mesopotamia, Anatolia, and the Levant. Written by pressing a reed stylus into wet clay, it encodes Sumerian, Akkadian, Elamite, Hittite, Luwian, Hurrian, Urartian, and a dozen other languages of the ancient Near East.

Cuneify is a transliteration converter developed by Steve Tinney at the Open Richly Annotated Cuneiform Corpus (ORACC). It takes standard Assyriological transliteration — the romanized notation scholars use to transcribe cuneiform signs — and outputs the corresponding Unicode cuneiform characters. Type lugal and get 𒈗; type an-na and get 𒀭𒈾.

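The tool accepts both Unicode and ASCII transliteration conventions: sz for š, and digits for subscript numerals (e2 = e₂). Results open in the panel below. Modern browsers usually render the signs without an extra font install, provided the operating system ships a font that covers the cuneiform block.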
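The lookup behind the lugal and an-na examples can be sketched as a small dictionary-based converter. This is a toy illustration, not Cuneify's implementation: the SIGNS table and the cuneify function are hypothetical names, and the table holds only the three readings quoted above, whereas the real tool draws on a full sign list.

```python
# Tiny illustrative sign table: only the readings quoted in the text.
# A real converter would load a complete sign list instead.
SIGNS = {
    "lugal": "𒈗",
    "an": "𒀭",
    "na": "𒈾",
}


def cuneify(transliteration):
    """Convert sign readings to cuneiform.

    Words are separated by whitespace and signs within a word by hyphens.
    Readings missing from SIGNS are kept in square brackets so that gaps
    in the table stay visible instead of being silently dropped.
    """
    words = []
    for word in transliteration.split():
        signs = [SIGNS.get(reading, f"[{reading}]") for reading in word.split("-")]
        words.append("".join(signs))
    return " ".join(words)


print(cuneify("lugal"))
print(cuneify("an-na"))
```

Keeping unknown readings in brackets is a design choice for this sketch: it makes an incomplete table obvious in the output.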

Input format (ATF)
Separate signs with hyphens (lu-gal), put determinatives in braces ({d}en-lil2), and write logograms in uppercase (DINGIR). Use digits for subscripts: u3 = u₃.
Languages supported
Sumerian, Old/Middle/Neo-Babylonian, Old/Middle/Neo-Assyrian, Hittite, Elamite, Hurrian, Urartian, Ugaritic. Sign repertoire follows the ORACC/CDLI standard sign list.
Unicode block
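The main Cuneiform block occupies U+12000–U+123FF, followed by Cuneiform Numbers and Punctuation at U+12400–U+1247F and Early Dynastic Cuneiform at U+12480–U+1254F, for 1,234 encoded characters across the three blocks. Modern browsers render them without a separate font install when the operating system supplies a covering font.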
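A minimal sketch of checking whether a string falls inside the cuneiform code point ranges, for example to validate converter output before display. The function names are hypothetical. The first two ranges are the ones given above; the third, Early Dynastic Cuneiform, is an additional block in the Unicode standard included here for completeness.

```python
# Inclusive code point ranges for the Unicode cuneiform blocks.
BLOCKS = [
    ("Cuneiform", 0x12000, 0x123FF),
    ("Cuneiform Numbers and Punctuation", 0x12400, 0x1247F),
    ("Early Dynastic Cuneiform", 0x12480, 0x1254F),
]


def cuneiform_block(char):
    """Return the block name for a single character, or None if outside."""
    code_point = ord(char)
    for name, start, end in BLOCKS:
        if start <= code_point <= end:
            return name
    return None


def is_cuneiform(text):
    """True if text has at least one non-space character and all of its
    non-space characters fall within the cuneiform blocks."""
    chars = [c for c in text if not c.isspace()]
    return bool(chars) and all(cuneiform_block(c) is not None for c in chars)


print(cuneiform_block("𒈗"))
print(is_cuneiform("𒀭𒈾"))
```

Note that a range check only says a code point belongs to a block; it does not guarantee the code point is assigned or that a font can draw it.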
Live converter · ORACC CGI
Transliteration → 𒀭𒈾 Cuneiform
Try an example
Signs & separators
Separate signs with - (hyphen) or spaces.
Compound words: e2-gal (palace), lu2-kal-la
Subscript digits
Use numbers instead of subscripts:
e2 = e₂ · u3 = u₃ · en-lil2 = Enlil
Special characters
sz = š · s, = ṣ · t, = ṭ
j or ng = ŋ (velar nasal)
Determinatives
Semantic classifiers in braces:
{d} = divine · {m} = male name
{ki} = place · {gesz} = wood
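The ASCII conventions listed above can be sketched as a small normalizer that turns ASCII transliteration into its Unicode form. The function name and replacement table are hypothetical, the table covers only the substitutions named here (sz, s-comma, t-comma, j, and digit subscripts), and the "ng" alternative for ŋ is deliberately left out because it is ambiguous with an ordinary n followed by g.

```python
import re

# Map ASCII digits to Unicode subscript digits.
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")

# ASCII spellings from the conventions above, with uppercase variants
# for logograms.
REPLACEMENTS = [
    ("sz", "š"), ("SZ", "Š"),
    ("s,", "ṣ"), ("S,", "Ṣ"),
    ("t,", "ṭ"), ("T,", "Ṭ"),
    ("j", "ŋ"), ("J", "Ŋ"),
]


def ascii_to_unicode_atf(text):
    """Convert ASCII transliteration to Unicode transliteration.

    Intended for transliteration strings only: in running prose the
    's,' and 't,' rules would also match ordinary commas.
    """
    # Digits directly after a letter are sign indices, so make them
    # subscripts. This runs first, while all letters are still ASCII.
    text = re.sub(
        r"(?<=[A-Za-z])\d+",
        lambda match: match.group().translate(SUBSCRIPTS),
        text,
    )
    for ascii_form, unicode_form in REPLACEMENTS:
        text = text.replace(ascii_form, unicode_form)
    return text


print(ascii_to_unicode_atf("{d}en-lil2"))
print(ascii_to_unicode_atf("{gesz}"))
```

Digits that do not follow a letter, such as a leading count, are left untouched by this sketch.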
Tools & Ecosystem

Linguistic Research Tools

Annotation
ELAN (MPI Nijmegen)
Time-aligned multi-tier annotation of audio/video. Gold standard for endangered language documentation. Exports to JSON, FLEx, Praat TextGrid. Free/open source.
Lexicon & Morphology
FieldWorks Language Explorer (FLEx)
SIL tool for lexicon, morphology, and interlinear text. Exports .flextext for web display via LingView pipeline.
Corpus Query
KorAP / Khepri
Corpus query platforms for large-scale linguistic analysis. KorAP handles billion-word corpora with complex morphosyntactic queries.
Archive
ELAR (SOAS London)
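Endangered Languages Archive. Hosted at SOAS London until 2021, now at the Berlin-Brandenburg Academy of Sciences and Humanities. 1,000+ endangered language deposits. Open access metadata; audio/video for researchers.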
Archive
ELDP / ELDP Corpus
Endangered Languages Documentation Programme. Funds and archives field documentation projects worldwide.
Computational
pyglottolog
Python API for Glottolog data. Access 25,000+ languoid records, classifications, coordinates, and endangerment status programmatically.
Social Analysis
ConvoKit (Cornell)
Python toolkit for conversational structure analysis. Built for modern conversational corpora, but potentially adaptable to discourse patterns in ancient texts such as dialogues and letters.
Reference
Glottolog 5.x
Comprehensive language family classification and bibliography for 25,000+ languoids. CC-BY 4.0. Used for our language map layer.
Decipherment
TWKM / IDIOM
Bonn-Bochum Maya epigraphy platform. Digital corpus of Classic Maya inscriptions with 1,400+ sign catalog. classicmayan.org
Decipherment · Corpus
GEAS / Elamicon (OCLEI)
Alice Kober Society (Zurich). Unicode corpus browser for Linear Elamite, Byblos, Deir Alla, Raetic, Lepontic, Etruscan, Elder Futhark. RegEx search, dynamic syllabary, sign frequency analysis. Libre software. center-for-decipherment.ch ↗
Decipherment
Vesuvius Challenge
scrollprize.org — live ML competition to read Herculaneum scrolls. $1.7M+ in prizes. Open CT scan data available for research.
Standards
Unicode Consortium
Encoding proposals for Maya (U+15500), Rongorongo, and other scripts not yet in the standard, deciphered or not. Enables digital interchange of ancient texts.
Corpus
MayaVase Database
Justin Kerr's rollout photographs of Maya ceramic vessels. 5,000+ images at Dumbarton Oaks. Used for WVU Artful Algorithms AI training.