Career Profile

My passion is to make advances in healthcare using data science and machine learning. I lead a group of 5 machine learning associates in NIBR, NX, and I am one of the leads of the “AI and Tools” work package of the IMI - BigPicture consortium, which aims to advance the state of AI in histopathology.

In my career I have worked on a wide range of data science problems in the healthcare field. This experience ranges from work on clinical study protocols and the development of predictive algorithms for kidney disease to the statistical analysis of experiments in early non-clinical research with microarray, NGS and other biomarker data.


Lead of Data Science - Machine Learning group

2018 - Present
NX - NIBR, Novartis, Basel

Lead of a team of currently 5 associates dedicated to machine learning in the Scientific Data Analysis (SDA) group in NIBR Informatics (NX). The group focuses on the application of machine learning in cheminformatics and of deep learning models to histopathology data. Promoted to Associate Director in April 2021.

Data Scientist in Scientific Data Analysis group

2013 - 2018
NX - NIBR, Novartis, Basel
  • Bioinformatics analysis of RNA-Seq and microarray data.
  • Statistical analyses of various experiments, with a focus on complicated experimental designs or multiple-endpoint questions.
  • Machine learning on accelerometer data to develop an algorithm for prediction of gait speed.
  • Deep learning on histopathology images to detect tissue types/lesions.
  • Promoted to Senior Investigator I from Investigator III in Sept. 2017.

Expert Statistician in Companion Diagnostics

2012 - 2013
Novartis Oncology, Basel

“Molecular Diagnostics (MDx)” was merged into Oncology as the new “Companion Diagnostics (CDx)” group. As part of the merger, promoted to “Expert Statistician”. Job responsibilities included:

  • Statistical data analysis for the “Nephrotest” project to develop a diagnostic for detecting Acute Kidney Injury.
  • Biomarker statistician for the Oncology programs Exjade, SOM230 and GP2019.

Recognized with the Passion, Quality and Speed Award in 2013 for work on the SOM230 project.

Senior Statistician in Molecular Diagnostics (MDx)

2010 - 2012
Novartis, Basel

Main responsibility was the analysis of biomarker data to develop diagnostic algorithms, including writing study plans, analyzing clinical trial data, and presenting results internally and to regulatory agencies. Additional duties included supervising interns and reviewing results and study plans from other groups.

Recognized by MDx for performing duties above and beyond expectations on 3 different projects.




PhD in Statistics

2004 - 2009
Stanford University

In my studies at Stanford I focused on topics in statistical learning under my advisor Prof. Dr. Rob Tibshirani. My thesis investigated the estimation of sparse binary Markov networks as well as penalized regression techniques such as the Fused Lasso and resulted in several publications.

MA in Statistics

2002 - 2003
University of Michigan, Ann Arbor

During my studies at the University of Ulm, I participated in a Fulbright Exchange program and studied for one year at the University of Michigan in Ann Arbor, finishing the program with a Master’s Degree in Statistics at the top of my class. The courses included Probability Theory, Theoretical Statistics as well as Applied Statistics.

Diploma (Master equivalent) in Mathematics and Economics

1999 - 2004
University of Ulm, Germany

After high school I studied Mathematics and Economics at the University of Ulm towards a Diplom (equivalent to a Master's degree). My main focus was probability theory, statistics and finance, and I completed my Diploma thesis under my advisor Prof. Dr. Rüdiger Kiesel.


A selection of recent publications. For a full list, please visit my homepage.

  • A deep learning-based model of normal histology
    Hoefling H*, Sing T, *Hossain I, Boisclair J, Doelemeyer A, Flandre T, et al. (* contr. equally)
    Toxicol. Pathology, SAGE Publications 2021.
  • Biomarker-Based Classification and Localization of Renal Lesions Using Learned Representations of Histology—A Machine Learning Approach to Histopathology
    Freyre CAC, Spiegel S, Gubser Keller C, Vandemeulebroecke M, Hoefling H, Dubost V, et al.
    Toxicol. Pathology, SAGE Publications 2021.
  • Continuous Digital Monitoring of Walking Speed in Frail Elderly Patients: Noninterventional Validation Study and Longitudinal Clinical Trial
    Mueller A, Hoefling HA, Muaremi A, Praestgaard J, Walsh LC, Bunte O, et al.
    JMIR mHealth and uHealth. 2019;7:e15191.
  • Validity of accelerometry in step detection and gait speed measurement in orthogeriatric patients
    Keppler AM, Nuritidinow T, Mueller A, Hoefling H, Schieker M, Clay I, et al.
    PLOS ONE. 2019;14:e0221732.
  • Reproducible Research for large scale data analysis
    Hoefling H, Rossini A.
    Implementing Reproducible Research. New York: Chapman and Hall/CRC; 2014. p. 219–40.

Open-source software projects

Writing software to help other data scientists with their work has always been an important part of what drives my research. Some of the packages that I have developed for the R programming language are listed below.

  • hdf5r - HDF5 is a data model, library and file format for storing and managing large amounts of data. The hdf5r package provides a nearly feature-complete, object-oriented wrapper for the 'HDF5' API.
  • flsa - Implements a path algorithm for the Fused Lasso Signal Approximator. For more details see the help files or the article by Hoefling (2009).
  • neariso - Implements a path algorithm for Near-Isotonic Regression.

Skills & Proficiency

R (Statistics, Package programming)

Python (Packages, Machine Learning)

Linux (CLI, makefiles, compile from source)

Microsoft Office