Holger's Notes

Projects, articles and posts

Introduction

My name is Holger Hoefling and I am a Statistician, Machine Learner and Bioinformatician. For the last 9 years I have worked for Novartis in various roles, currently as the Lead of the machine learning group in Scientific Data Analysis, NX, NIBR.

In my career I have worked on a wide range of data science problems in the healthcare field. This experience ranges from work on clinical study protocols and the development of predictive algorithms for kidney disease to the statistical analysis of experiments in early non-clinical research with microarray, NGS and other biomarker data.

More recently I have been the leader of a small team of machine learners working on applications in cheminformatics as well as deep learning on histopathology images.

Publications

In December 2019, we published the article “A deep learning model of normal histology” on bioRxiv.

In this article we introduce deep learning models characterizing the diversity of normal histology. We show that the models can recognize a wide array of tissue types in rat with high accuracy, and that the embeddings they learn also reveal sub-clusters of structures within tissues that the models were not explicitly trained to distinguish. These embeddings can also be used to classify tissues in species other than rat with minimal retraining.

Please also have a look at my other publications.

hdf5r

An R package for reading and writing HDF5 data, implementing most of the functionality of the C-API in R. It also has a convenient high-level interface that lets users work with HDF5 datasets much as they would with regular arrays in R.

Potential places of interest are:

Other interests

When I am not programming for fun, I enjoy hiking, cycling and archery.

Latest Posts

Reinforcement learning Nanodegree finished

I just finished the Reinforcement Learning Nanodegree on Udacity (Certificate). Reinforcement Learning is a cutting-edge research field in which important advances using Deep Learning are being made very quickly. Some experts even believe that it could be the path to true artificial intelligence.

flsa version 1.5.2 released

Today I pushed a maintenance release of the flsa package. The changes are all under the hood and relate to violations of the C++ One Definition Rule (ODR), for which newer compilers have stricter checks. Furthermore, some warnings from gcc have been fixed as well.

Cluster computation in Python

Setting

In scientific computing, it regularly happens that the computational power provided by a single laptop or workstation is not enough to perform the required calculations in an acceptable timeframe. In that case it is necessary to use more resources, such as those provided by high-performance computing (HPC) environments. In this post, I will assume that the HPC cluster is run by a scheduler like Grid Engine or Slurm. However, the only assumption needed for the approach to be applicable is that an “array” job can be started from the command line. Even on a single compute node, the same technique can be used together with the GNU parallel program on Linux (although there are other options that are usually more appropriate).
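To make the idea of an array job concrete, here is a minimal sketch of how a Python script might work out which part of the workload belongs to it. It is an illustration under stated assumptions, not the approach described in the rest of the post: the helper names `get_task_index` and `chunk_for_task` are hypothetical, the task IDs are assumed to start at 1, and the round-robin split is just one simple way of dividing the work. The environment variables `SLURM_ARRAY_TASK_ID` (Slurm) and `SGE_TASK_ID` (Grid Engine) are the ones those schedulers set for array tasks.

```python
import os


def get_task_index(env=None):
    """Return the 0-based index of this array task, or None outside an array job.

    Assumes the array job was submitted with task IDs starting at 1.
    """
    env = os.environ if env is None else env
    # Slurm sets SLURM_ARRAY_TASK_ID; Grid Engine sets SGE_TASK_ID
    # (which holds the string "undefined" for non-array jobs).
    for name in ("SLURM_ARRAY_TASK_ID", "SGE_TASK_ID"):
        value = env.get(name)
        if value is not None and value.isdigit():
            return int(value) - 1
    return None


def chunk_for_task(items, task_index, n_tasks):
    """Return the items handled by one task, using a round-robin split."""
    return items[task_index::n_tasks]


if __name__ == "__main__":
    n_tasks = 4                  # hypothetical number of array tasks
    work = list(range(10))       # hypothetical list of work items

    index = get_task_index()
    if index is None:
        index = 0                # not in an array job: behave like the first task

    for item in chunk_for_task(work, index, n_tasks):
        print(f"task {index} processes item {item}")
```

With a submission script that calls this file, such a job could be started with something like `sbatch --array=1-4 script.sh` on Slurm or `qsub -t 1-4 script.sh` on Grid Engine, where `script.sh` is a placeholder name. Each task then runs the same code on a different slice of the work.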