Principles and tools for valid quantitative data – defend your hypothesis not your data

Course description
Quantitative empirical research must be based on valid data. Validity rests on principles for securing proper metadata and an appropriate composition of variables in relation to the research topic under study (the conceptual model), and on the methods applied from data definition to analysis. The course mixes discussions with practical exercises in securing quality-assured and validated data. The exercises use EpiData, Inkscape and Stata. The principles are general to quality assurance of quantitative empirical data and can be applied regardless of the specific software and database you use.

Elements contained in the course are:

  • Considerations in the creation of data structures from a conceptual model
  • How to create data structures and documentation at project and variable level
  • How to clean raw data from scratch for analysis with appropriate documentation, including principles for determining the number of observations with sufficient information and the level of missing data
  • How to combine official documentation with informal help found by searching the internet, in particular for “community software” and Stata

After the course it is expected that the participants:

  • Understand principles and standards for handling empirical data from scratch to analysis.
  • Are introduced to general programming principles applied in reproducible data management.
  • Can apply principles of data validation (double entry, visual verification, completeness and conformity to data definitions) with EpiData software.
  • Can create a conceptual model of their own study with Inkscape.
  • Have gained basic experience in preparing a documented, analysis-ready dataset from scratch in terms of appropriate metadata, number of observations and handling of missing data.
  • Are able to install open-source software (EpiData and Inkscape).
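
To give a feel for the validation principles named above (double entry, completeness and conformity to data definitions), here is a minimal sketch in Python. It is not how EpiData implements these checks; the records, field names and allowed ranges are all hypothetical and chosen only for illustration.

```python
# Hypothetical example: the same paper forms keyed in twice, by record id.
first_entry = {
    1: {"age": 34, "sex": "F"},
    2: {"age": 51, "sex": "M"},
    3: {"age": None, "sex": "F"},   # age not recorded on the form
}
second_entry = {
    1: {"age": 34, "sex": "F"},
    2: {"age": 510, "sex": "M"},    # extra keystroke in the second entry
    3: {"age": None, "sex": "F"},
}

# Assumed data definition: a rule per variable saying which values are legal.
definitions = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "sex": lambda v: v in {"F", "M"},
}

def double_entry_mismatches(a, b):
    """Return (record id, field, value in a, value in b) where entries differ."""
    diffs = []
    for rec_id in sorted(a.keys() & b.keys()):
        for field in sorted(a[rec_id]):
            if a[rec_id][field] != b[rec_id].get(field):
                diffs.append((rec_id, field, a[rec_id][field], b[rec_id].get(field)))
    return diffs

def missing_fraction(records, field):
    """Completeness: share of records where the field is missing (None)."""
    values = [rec[field] for rec in records.values()]
    return sum(v is None for v in values) / len(values)

def nonconforming(records, rules):
    """Conformity: non-missing values that violate the data definition."""
    return [
        (rec_id, field, rec[field])
        for rec_id, rec in sorted(records.items())
        for field, ok in rules.items()
        if rec[field] is not None and not ok(rec[field])
    ]

mismatches = double_entry_mismatches(first_entry, second_entry)
age_missing = missing_fraction(first_entry, "age")
violations = nonconforming(second_entry, definitions)
```

In this sketch the double-entry comparison flags record 2, the completeness check reports how much of `age` is missing, and the conformity check catches the out-of-range value independently of the comparison. Each flagged record would then be resolved against the original form rather than corrected by guesswork.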

Participants are expected to spend time between course days reading scientific papers on data quality and completing exercises and solutions. Bring your own computer (Mac/Windows/Linux) and make sure you have administrator rights to install software (or know who to contact for installation).

For course approval, participants must:

  • Create a conceptual model for their own project (in SVG format).
  • Create a publication-ready graph (vector graphic) for a given scientific journal, based on a “raw” analytic graph.
  • Document key metadata elements from their own study using elements from the Dublin Core standard (typical bibliographic descriptors).
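
As an illustration of the last requirement, the sketch below lists the 15 elements of the Dublin Core Metadata Element Set and checks a study record against them. The study record, its values and the helper function are hypothetical placeholders, not a format required by the course.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# Hypothetical study record; every value is a placeholder.
study_metadata = {
    "title": "Example cohort study (placeholder)",
    "creator": "A. Researcher (placeholder)",
    "date": "2019-01-01",
    "format": "text/csv",
    "language": "en",
    "shoe_size": "42",  # deliberately not a Dublin Core element
}

def check_dublin_core(record):
    """Return (keys that are not Dublin Core elements, elements not yet documented)."""
    unknown = sorted(k for k in record if k not in DUBLIN_CORE_ELEMENTS)
    missing = [e for e in DUBLIN_CORE_ELEMENTS if e not in record]
    return unknown, missing

unknown, missing = check_dublin_core(study_metadata)
print("Not Dublin Core:", unknown)
print("Not yet documented:", missing)
```

All Dublin Core elements are optional, so the second list is a reminder of what could still be documented rather than a list of errors.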

Course literature (examples):
Danish Code of Conduct for Research Integrity (chapter 2).
Paulsen A, Overgaard S, Lauritsen JM. Quality of data entry using single entry, double entry and automated forms processing: an example based on a study of patient-reported outcomes. PLoS One. 2012;7(4):e35087.
Ohmann C, Canham S, Demotes J, Chêne G, Lauritsen J, et al. Raising standards in clinical research: The impact of the ECRIN data centre certification programme, 2011–2016. Contemporary Clinical Trials Communications. 2017;5:153-159.
Rieder HL, Lauritsen JM. Quality assurance of data: ensuring that numbers reflect operational definitions and contain real measurements. Int J Tuberc Lung Dis. 2011 Mar;15(3):296-304.

Course leader: Jens Lauritsen
Number of participants: 12

ECTS credits: 2

Course fee
The course is free of charge for PhD students enrolled at universities that have joined the "Open market agreement".


Last Updated 26.02.2019