Collecting and Harvesting Big Data

This research area focuses on harvesting, linking, and analyzing individual-level data from digitized (scanned), structured, handwritten historical documents to study the relationship between health and long-run development. In collaboration with the Center for Big Data Analytics and Digitization (BDAD), research in this area covers:

  • AI-based methods and tools for automatic identification and segmentation of tables and table structures in (possibly degraded) historical documents.
  • Character and digit recognition with a special focus on handwritten names, dates, ages, mortality counts, birth weights, and income.
  • Linkage of transcribed historical data on individuals in Scandinavia to administrative registers.
  • Natural language processing using word embeddings and deep learning methods with a special focus on named entity recognition.
  • Statistical theory for estimation and inference when samples are collected using AI-based automated transcription methods.
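
To make the linkage item above concrete, the following is a minimal sketch of how transcribed records might be matched to register entries when handwritten names are transcribed imperfectly. It is not the method used in this research area. The record fields (`id`, `name`, `birth_year`), the function names, and the 0.85 threshold are all hypothetical, and Python's standard-library `difflib.SequenceMatcher` stands in for whatever similarity measure a real linkage pipeline would use.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] for two names (case-insensitive)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def link_records(transcribed, register, threshold=0.85):
    """Link each transcribed record to its best-matching register record.

    Candidates must share the same birth year (a simple blocking rule,
    assumed here for illustration); among those, the register record with
    the highest name similarity is kept if it reaches the threshold.
    """
    links = []
    for t in transcribed:
        best_id, best_score = None, 0.0
        for r in register:
            if r["birth_year"] != t["birth_year"]:
                continue  # blocking: compare only within the same birth year
            score = name_similarity(t["name"], r["name"])
            if score > best_score:
                best_id, best_score = r["id"], score
        if best_id is not None and best_score >= threshold:
            links.append((t["id"], best_id, best_score))
    return links


# Hypothetical example data: one spelling variant and one non-match.
transcribed = [
    {"id": "t1", "name": "Jens Pedersen", "birth_year": 1890},
    {"id": "t2", "name": "Anna Holm", "birth_year": 1902},
]
register = [
    {"id": "r1", "name": "Jens Petersen", "birth_year": 1890},
    {"id": "r2", "name": "Karen Holm", "birth_year": 1902},
]

links = link_records(transcribed, register)
```

In this toy example, the spelling variant "Pedersen"/"Petersen" clears the threshold, while "Anna Holm" is left unlinked because no register name is similar enough. A real application would typically block on more fields and account for linkage uncertainty in downstream inference.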