This research area focuses on harvesting, linking, and analyzing individual-level data from digitized (scanned) structured handwritten historical documents to study the relationship between health and long-run development. In collaboration with the Center for Big Data Analytics and Digitization (BDAD), research in the area covers:
- AI-based methods and tools for automatic identification and segmentation of tables and table structures in (possibly degraded) historical documents.
- Character and digit recognition, with a special focus on handwritten names, dates, ages, mortality counts, birth weights, and income.
- Linkage of transcribed historical data on individuals in Scandinavia to administrative registers.
- Natural language processing using word embeddings and deep learning methods, with a special focus on named entity recognition.
- Statistical theory for estimation and inference when samples are collected using AI-based automated transcription methods.
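
To make the linkage item above more concrete, here is a minimal sketch of how transcribed records might be matched to register entries when handwritten names contain transcription errors. It is an illustration only, not the method used in this research area: the field names (`id`, `name`, `birth_year`), the function names, the exact-match blocking on birth year, and the similarity threshold are all assumptions, and the records are invented toy data.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Return a similarity score in [0, 1] for two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(transcribed, register, threshold=0.85):
    """Link each transcribed record to its most similar register entry.

    Assumption: candidates must share the same birth year (a simple
    blocking rule), and a link is kept only if the name similarity
    reaches the hypothetical threshold.
    """
    links = []
    for t in transcribed:
        best, best_score = None, threshold
        for r in register:
            if t["birth_year"] != r["birth_year"]:
                continue  # blocking: skip candidates with a different birth year
            score = name_similarity(t["name"], r["name"])
            if score >= best_score:
                best, best_score = r, score
        if best is not None:
            links.append((t["id"], best["id"], round(best_score, 3)))
    return links


# Invented toy records, for illustration only.
transcribed = [
    {"id": "t1", "name": "Anders Jensn", "birth_year": 1901},  # misspelled name
    {"id": "t2", "name": "Karen Holm", "birth_year": 1899},
]
register = [
    {"id": "r1", "name": "Anders Jensen", "birth_year": 1901},
    {"id": "r2", "name": "Karen Holm", "birth_year": 1905},
]
links = link_records(transcribed, register)
```

In this sketch the misspelled name still links to its register entry, while the second record is left unlinked because no candidate shares its birth year. A real linkage pipeline would typically tolerate errors in dates as well and weigh several fields jointly.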