Data Science & Statistics

Our group combines expertise in different aspects of computer science (data mining, machine learning, optimization, artificial intelligence), statistics (extreme value theory, Bayesian inference, multivariate analysis), and bioinformatics (analysis of biological networks and large-scale biomedical data).

“Data science” is a rapidly growing field concerned with the theory and practice of learning from data. We can interpret the name “data science” in two ways:

  1. The science of data. This would be a scientific field that explores how to manage, analyse, or use data (or information). It could be seen as a subset of computer science/informatics and translates literally to “datalogi” in Danish (although “datalogi” means computer science and thus also covers areas that are not of particular interest in data science, such as theoretical computer science or operating systems).
  2. Science from data. This interpretation relates to the process of learning, to the methods used to create knowledge from data, or to the methodology of deriving valid insights from data. In this sense it could be seen as a variant of statistics, but it also relates to the theory of science and to the theory of learning (as studied in machine learning or, more generally, in artificial intelligence). This interpretation also aligns with the so-called “4th paradigm”: the transformation of many academic fields towards sciences based more strongly on the (semi-)automated analysis of (big) data (examples are bioinformatics, computational biomedicine, and cheminformatics) or towards new ways of doing research in other disciplines (e.g., digital humanities, computational history).

In our group, we bridge computer science and statistics and subscribe to both interpretations of “data science”. In our research we develop and evaluate methods for data analysis (data mining, machine learning, statistics, operations research, analytics), and we strive to improve how we understand data and gain insights from it (visualization techniques, optimization). We also connect to various application areas to put learning from data into practice and to create knowledge and value from data in collaboration with partners in other academic fields, in companies, and in the public sector.

Topics of research

  • data mining
  • machine learning
  • optimization
  • extreme value theory
  • Bayesian inference
  • multivariate data analysis
  • bioinformatics 

Funding and external collaborations/partners

Odense Kommune

We collaborate on traffic analysis, modelling and simulation, traffic light control, and bus line planning.

Urban traffic is a complex system. It involves the individual decisions of participants (pedestrians, cyclists, and drivers of cars, trucks, and public transport vehicles) as well as the influential decisions of city planners on the layout of streets, regulation systems (e.g., traffic lights), and routes for public transport, and temporarily disruptive decisions such as the planning of construction sites.

One of the main applications in the analysis of urban traffic data is the detection of anomalies (outliers) in the traffic flow using adapted outlier detection techniques.
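As a rough illustration of what flagging an anomaly in traffic flow can look like, the sketch below applies a generic textbook rule (the modified z-score based on the median absolute deviation) to a short series of vehicle counts. It is not the adapted technique used in the collaboration; the function name, the threshold of 3.5, and the counts are all hypothetical.

```python
from statistics import median

def mad_outliers(counts, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    Generic median/MAD rule, used here only as an illustration.
    """
    med = median(counts)
    # Median absolute deviation: a robust estimate of spread.
    mad = median(abs(x - med) for x in counts)
    if mad == 0:
        return []  # no spread, so nothing can be flagged by this rule
    # 0.6745 rescales the MAD to be comparable to a standard deviation.
    return [i for i, x in enumerate(counts)
            if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical vehicle counts per interval at one intersection.
counts = [120, 118, 125, 122, 119, 121, 310, 123, 117, 120]
print(mad_outliers(counts))  # prints [6]
```

A median-based rule is used here because a single extreme count would inflate a mean and standard deviation and could mask itself.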

By building a mathematical model of the traffic system in a city, we can also evaluate simulations or what-if scenarios (e.g., to study the impact of closing or changing roads), and we can optimize traffic light control or the layout of bus routes.

Aviation Cloud

We collaborate on the design of efficient algorithms for the optimization of flight routes.

The number of passengers in the airline industry has doubled in the last 12 years, and the prognosis is that this trend will continue. This raises important security and pollution issues. A large number of optimization problems have to be solved to make air traffic possible. Among them, flight routes must be carefully planned to satisfy the restrictions imposed by national and international traffic control institutions and to minimize the costs determined by fuel consumption.

In the EU alone, there are 16,000 restrictions to be taken into account when calculating the cheapest routes, and any saving in fuel consumption directly reduces CO2 emissions.
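To make the idea of a cheapest route under restrictions concrete, the following toy sketch runs Dijkstra's algorithm on a small graph and skips forbidden segments. This is a textbook method shown for illustration only, not the algorithms developed in the project; the waypoint names, costs, and the `forbidden` parameter are hypothetical.

```python
import heapq

def cheapest_route(graph, source, target, forbidden=frozenset()):
    """Dijkstra's algorithm that ignores edges listed in `forbidden`.

    graph: dict mapping node -> {neighbour: non-negative cost}.
    Returns (cost, path), or None if the target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, cost in graph.get(u, {}).items():
            if (u, v) in forbidden:
                continue  # restricted segment: not allowed on any route
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical waypoints with fuel costs on each directed segment.
graph = {
    "A": {"B": 4.0, "C": 2.0},
    "B": {"D": 5.0},
    "C": {"B": 1.0, "D": 8.0},
    "D": {},
}
print(cheapest_route(graph, "A", "D"))                # (8.0, ['A', 'C', 'B', 'D'])
print(cheapest_route(graph, "A", "D", {("C", "B")}))  # (9.0, ['A', 'B', 'D'])
```

Real restrictions are typically far richer than a list of closed segments (for example, conditions that depend on time or on other parts of the route), which is part of what makes the actual problem hard.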

An industrial PhD project was part of this collaboration. It yielded new, fast algorithms to be implemented in the software that the company sells to private pilots.

DONG Energy (now Ørsted)

We optimized the long-term planning of electricity and heat production.

The energy sector in Europe faces huge challenges due to the need to drastically reduce CO2 emissions by 2035. To phase out fossil fuel as the primary source for the production of heat and electricity, power utility companies are currently undertaking a large number of investments, and they need decision-support tools.

To this end, we studied the long-term unit commitment problem with heating constraints to model the Danish energy market and to predict its development under different strategic choices by energy producers. We also developed a model to decide which biomass contracts to buy, taking future uncertainty into account by including a number of scenarios for future demand and prices.
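The scenario idea can be sketched in a few lines: each scenario carries a probability, a demand, and a spot price, and a contract volume is chosen to minimize the expected total cost. This is a deliberately tiny toy with made-up numbers and hypothetical function names, not the model developed with the company, which would also involve many more constraints.

```python
def expected_cost(q, contract_price, scenarios):
    """Expected cost of contracting volume q up front.

    scenarios: list of (probability, demand, spot_price) tuples.
    Any demand not covered by the contract is bought at the spot price.
    """
    return sum(p * (contract_price * q + spot * max(demand - q, 0))
               for p, demand, spot in scenarios)

def best_contract(candidates, contract_price, scenarios):
    """Pick the candidate volume with the lowest expected cost."""
    return min(candidates,
               key=lambda q: expected_cost(q, contract_price, scenarios))

# Hypothetical scenarios: (probability, demand, spot price).
scenarios = [(0.5, 100, 12), (0.3, 150, 15), (0.2, 200, 20)]
candidates = [0, 50, 100, 150, 200]
print(best_contract(candidates, 10, scenarios))  # prints 100
```

In this toy instance, contracting for the low-demand scenario and covering the rest on the spot market is cheapest in expectation; different probabilities or prices would shift the choice.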

Finally, we developed an algorithm for the maintenance scheduling of production plants and for routing the tugs and barges that distribute biomass and fossil fuels within Danish territory.

Members of group

  • Arthur Zimek
  • Fernando Colchero
  • Hans Christian Petersen
  • Jing Qin
  • Marco Chiarandini
  • Peter Schneider-Kamp
  • Richard Röttger
  • Yuri Goegebeur

Contact person  

Arthur Zimek
Website: imada.sdu.dk/~zimek
E-mail: zimek@imada.sdu.dk

Publications

B. Jørgensen & H. C. Petersen (2012): Efficient estimation for incomplete multivariate data. Journal of Statistical Planning and Inference, vol. 142: 1215-1224.

F. Colchero, R. Rau, O. R. Jones, J. Barthold, D. A. Conde, A. Lenart, L. Nemeth, A. Scheuerlein, J. Schoeley, C. Torres, V. Zarulli, J. Altmann, D. K. Brockman, A. M. Bronikowski, L. M. Fedigan, A. Pusey, T. S. Stoinski, K. B. Strier, A. Baudisch, S. C. Alberts, J. W. Vaupel (2016): The emergence of longevous populations. PNAS 113(48): E7681-E7690 (Recipient of the Cozzarelli Prize, awarded by the Editorial Board of PNAS in 2016)

G. O. Campos, A. Zimek, J. Sander, R. J. G. B. Campello, B. Micenková, E. Schubert, I. Assent, M. E. Houle (2016): On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min Knowl Disc 30(4): 891-927 (ACM Computing Reviews: 21st Annual Best of Computing: Notable Article)

M. Chiarandini, R. Fagerberg, S. Gualandi (2017): Handling preferences in student-project allocation. Annals of Operations Research.

M. Escobar-Bach, Y. Goegebeur, A. Guillou (2018): Local robust estimation of the Pickands dependence function. Annals of Statistics.

Unofficial group website

dss.sdu.dk
