Collecting and Harvesting Big Data

This research area focuses on harvesting, linking and analyzing individual-level data from digitized (scanned), structured, handwritten historical documents in order to study the relationship between health and long-run development. In collaboration with the Center for Big Data Analytics and Digitization (BDAD), research in this area concentrates on:

  • AI-based methods and tools for the automatic identification and segmentation of tables and table structures in (possibly) degraded historical documents.
  • Character and digit recognition, with a special focus on handwritten names, dates, ages, mortality counts, birth weights and income.
  • Linkage of transcribed historical data on individuals in Scandinavia to administrative registers.
  • Natural language processing using word embeddings and deep learning methods, with a special focus on named entity recognition.
  • Statistical theory for estimation and inference when samples are collected using AI-based automated transcription methods.

Please address queries to the researcher responsible for this area: Christian Møller Dahl

Other HEDG researchers who have worked or are working on this topic are:
Peter Sandholt Jensen

PhD students:
Emil Nørmark Sørensen
Christian Emil Westermann
Simon Friis Wittrock
Torben Johansen