# Advanced Biostatistical methods in Health Sciences - a bootcamp course

The appropriate and efficient statistical analysis of data collected in the health sciences often requires advanced techniques that are covered only at an elementary level in introductory statistics courses.

This course is aimed at researchers who need an overview of appropriate analytical methods and discussions with statisticians in order to solve their problems.

This course offers instruction in several statistical topics useful in the health sciences. When registering for the course, participants are asked to identify the main statistical topics relevant to their project.

The course will be organized as a 'boot camp' where the participants gather for two days at a venue for fruitful and intensive interaction between health science researchers and statisticians.

## List of potential topics

Please remember to send your choice of up to three topics and a brief one-page description of your PhD project (see registration).

1. Analysis of repeated and longitudinal measurements
When you collect the same type of measurement on a patient repeatedly over time (longitudinal data), you face at least two statistical problems: how to account for the correlation induced by the repeated measurements on the same experimental subject, and how to model the course over time in clinically relevant terms. The theory of mixed models and generalized estimating equations offers a rich class of solutions to these problems, for both normally distributed and categorical responses (e.g. binary observations).

2. Study design: Randomized clinical trials and observational studies
We discuss several aspects of study design, where the main distinction is between randomized clinical trials (RCTs) and observational studies. RCTs are considered the most appropriate design for analyzing the effectiveness of a medical intervention because they allow a causal interpretation. Nevertheless, several aspects, such as randomization, have to be carried out properly to obtain these benefits from an RCT. We discuss these aspects and possible limitations of RCTs. For observational studies we review the basic concepts of cohort and cross-sectional designs. We discuss the selection of informative covariates and when to adjust for confounding factors, and we propose methods to reduce the influence of unmeasured confounding.

3. Statistical genetics
Over the last decades genetic data (often high-dimensional), such as genotype data or methylation data, have become more and more common. A wide range of statistical methods (and software) has been applied and/or developed to handle, explore and draw inferences from these types of data. Examples are methods to test and predict in genome-wide association studies (GWAS), to analyze rare variants, depict biological pathways, investigate population structure, impute missing values or perform linkage analysis in family studies. Because of the high dimensionality of the data, the topic of multiple testing deserves special attention.

4. Survival analysis
In survival analysis one analyses times to the occurrence of some event, such as the time to remission of cancer since the end of the treatment period. An important feature of such data is the occurrence of censored observation times, where a patient leaves the study before the event of interest has been observed. We will discuss non-model-based (non-parametric) and model-based approaches to analyzing such time-to-event data.

5. Constructing and validating scales from questionnaires
Factor analyses and structural equation models (SEM) are useful tools for exploring and validating new scales describing underlying dimensions or latent variables among large sets of variables or questions from a questionnaire. When using established questionnaire scores or scales (e.g. SF-36, EORTC QLQ-C30) in new populations, Cronbach’s alpha can provide a crude validation of the scale in the population, while SEM offers a more comprehensive validation.

6. Meta analysis
Meta analysis aims at combining and summarizing the evidence of medical effects already reported in published studies. It uses methods of fixed-effects or random-effects regression modelling. Meta analysis is becoming an important ingredient for presenting existing knowledge in the planning phase of new studies.

7. Causal modelling in medicine
Clinical observational studies do not possess the same causal strength for treatment effects as randomized clinical trials. This is mainly due to the impossibility of controlling the confounder distribution in the treatment groups. Another problem is time-dependent treatment adjustment, which is often affected by previous response. Methods like inverse probability weighting or g-estimation try to account for this imbalance to estimate a causally interpretable effect.

8. Artificial intelligence methods
Tasks like the automatic, data-driven classification of the actual or future health status of patients often need a huge amount of data collected per patient to arrive at reliable results. The amount, complexity and diversity of information (images, health records) require extensions of classical statistical model building, such as neural nets or penalized regression approaches. We will provide an example of such a modelling approach.

## Course Schedule:

Wednesday, 22 January 2020

| Time | Activity |
| --- | --- |
| 9:00 - 9:30 | Breakfast |
| 9:30 - 9:45 | Welcome |
| 9:45 - 10:45 | Flash presentation of participants and projects: participants give a short oral presentation of themselves and their projects |
| 10:45 - 11:00 | Coffee break |
| 11:00 - 11:45 | Lecture 1: Longitudinal data |
| 12:00 - 13:00 | Lunch (sandwich rolls); optional walk to Lerbjerg from 12:30 |
| 13:00 - 13:45 | Practical 1: Analysis of a longitudinal dataset |
| 14:00 - 16:30 | Work on projects in smaller groups; coffee/tea and cake/fruit will be available from 15:00 |
| 18:00 - 19:00 | Dinner (pizza) |
| 19:00 - 21:00 | Further discussion about some projects |

Thursday, 23 January 2020

| Time | Activity |
| --- | --- |
| 8:30 - 9:00 | Breakfast |
| 9:00 - 9:15 | Recap of the previous day |
| 9:15 - 10:00 | Lecture 2: Study design: Randomized clinical trials and observational studies |
| 10:00 - 10:15 | Coffee break |
| 10:15 - 11:00 | Lecture 3: Evaluation of predictive modelling |
| 11:15 - 12:00 | Work on projects in smaller groups |
| 12:00 - 13:00 | Lunch (sandwich rolls); another optional walk |
| 13:00 - 14:45 | Work on projects in smaller groups; coffee/tea and cake/fruit will be available from 14:00 |
| 15:00 - 16:00 | Discussion and evaluation |

## Course site

The course site is the Svanninge Bjerge Forsknings- og Feltstation (http://svanninge.sdu.dk/). There will be 16 twin bedrooms available. Participants are required to share rooms.

## Course costs

The course is included in the course programme of the SDU PhD school and is free for PhD students. Researchers from SUND Odense and SUND Region of Southern Denmark have to pay 600 DKK and other researchers 5263 DKK.

You will receive an invoice, which has to be paid by 15 December 2019 at the latest.

## Registration and preparations for the course:

Please send an e-mail with the following content to Tina Ludvig-Nymark (tludvig-nymark@health.sdu.dk) by 30 November 2019:

1. Indicate your workplace and position (PhD student or not).
2. Choose up to three topics from the list that you would like to be discussed at the course.
3. Indicate whether you will stay at the venue overnight and whether you have special requirements, e.g. food allergies or a vegetarian diet.
4. Fill out the linked document with a brief one-page description of your project. The document will be used to decide on the themes of the lectures and to present your project to the other participants. Please attach this document to your e-mail.

After sending the registration e-mail you will receive a confirmation e-mail.
