Kilian Haefeli

Hi, I am Kilian. I am a Machine Learning Engineer and Researcher studying at ETH Zurich.

I am interested in Large Neural Networks, their training and generalization dynamics, and how to scale them.

I previously interned at Aleph Alpha and worked on Diffusion Models and Graph Neural Networks.

In another life I was a founding engineer at airica, which we sold to Logitech.

I occasionally write about stuff: Blog

Email  /  GitHub  /  Google Scholar  /  LinkedIn  /  CV

Currently, I am researching the emergence of in-context learning and how it fits into our broader understanding of generalization.

Efficient Neural Representation Learning for Star-Convex Boundaries

Kilian Haefeli, 2022
website

A neural representation model for predicting the temporal evolution of a phase boundary generated during 3D printing.
We use a direct star-convex parameterization of the phase boundary instead of learning the underlying temperature field. The parameterization at a single time step is learned by a Graph Neural Network, and its evolution over time by an RNN.

Diffusion Models for Graphs Benefit From Discrete State Spaces

Kilian Haefeli, Karolis Martinkus, Nathanaël Perraudin, Roger Wattenhofer
Learning on Graphs Conference and NeurIPS 2022 GLFrontiers Workshop, 2021
arXiv / code / website

A diffusion model for graphs that uses discrete Bernoulli perturbations of edge connections. This approach preserves sparsity and allows sampling with far fewer steps, resulting in new state-of-the-art graph generation.


Working with fast-paced, sharp-minded people has been one of my greatest experiences.

Aleph Alpha

2023-10-01 / 2024-01-14
LLM Engineer Intern
website

Working on Retrieval Augmented Generation for a chat application.

2022-06-01 / 2022-10-01
Junior Data Scientist
website

Built Recurrent Neural Nets for time-series prediction of CO2 concentration and room occupancy.

2020-05-01 / 2022-09-01
Co-Founder & Junior Data Scientist
website

Together with my friends Lukas Limacher and Vassilis Kalofolias, I started an IoT company specializing in models for predicting meeting room occupancy.
The company was acquired by Logitech in 2022.


I also study.

University of Toronto Exchange


Attending UofT ECEE as a Graduate Exchange Student, learning about Information Theory, Statistical Learning Theory and Parallel Systems.

ETH Zurich Masters, EECS


Attending the EECS master's program, focusing on optimization and the theory of Neural Networks as well as systems for Transformers.

ETH Zurich Bachelors, EECS


Coursework focused on Systems, Algorithms and Machine Learning, with a special focus on generative models.

Other Projects

Flash Attention in C CUDA

website

A CUDA C implementation of Flash Attention without using any libraries such as cuBLAS or CUTLASS. In ~300 lines of code, this kernel is faster and more memory efficient than the standard PyTorch attention module.

Attention in C CUDA

website

A CUDA C implementation of the Attention operator. Multiple increasingly optimized versions of the Matrix Transpose, Matmul and Softmax kernels are provided.

Optimizable DeepPoly

website

An optimizable neural DeepPoly verifier implemented as torch Modules, compatible with any optimizer setup that PyTorch has to offer. It serves to efficiently and tightly verify the adversarial robustness of neural networks.

Design and source code from Jon Barron's website