AI & ML interests
Historical Media Analysis and Enrichment
Impresso - Media Monitoring of the Past is an interdisciplinary research project using machine learning to transform how historical media are processed, enriched, explored, and studied across modalities, languages, time periods, and national borders.
We develop the 🚀 Impresso Web App and the 🔬 Impresso Datalab, providing access to a large multilingual corpus of historical newspapers and radio broadcasts.
🤖 Models and 📚 datasets
- 🤖 Impresso models for historical multilingual documents, including language identification, OCR quality assessment, topic inference, NER, and NEL.
- 📚 Impresso datasets curated from digitized historical media sources for ML development and evaluation. Upcoming releases include NER and NEL benchmarks from the HIPE evaluation campaign, an image type classification dataset, and more.
🏛️ Partners and funding
Impresso gratefully acknowledges the continued support of its cultural heritage partners, as well as funding from the SNSF (Grant Nos. CRSII5_173719 and CRSII5_213585) and the FNR (Grant No. 17498891).
Spaces (13)
Impresso Topic Explorer
Topic model aggregate exploration for the Impresso corpus
Ocrqa Exploration
OCR Quality Exploration on Impresso Corpus
Ad Classification Exploration
Explore yearly ad and non-ad distributions in Impresso