Odunayo Ogundepo

Blog

Thoughts on NLP, multilingual systems, evaluation, and building AI systems that work in the real world.

An Easier Way to Set Up flash-attn

A practical fix for the common flash-attn setup failure mode: install the exact wheel that matches your Python, PyTorch, CUDA, and architecture instead of hoping a generic pip install lines up.
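As a rough illustration of what "matching" means here, the sketch below prints the environment facts a prebuilt wheel would need to agree with. The helper name `wheel_tags` is hypothetical and not taken from the post; it assumes PyTorch may or may not be installed and only reads version attributes.

```python
import platform
import sys


def wheel_tags():
    """Collect the environment facts a prebuilt wheel has to match (hypothetical helper)."""
    tags = {
        # CPython ABI tag as it appears in wheel filenames, e.g. "cp311"
        "python": f"cp{sys.version_info.major}{sys.version_info.minor}",
        # CPU architecture, e.g. "x86_64" or "aarch64"
        "arch": platform.machine(),
        "os": platform.system().lower(),
    }
    try:
        import torch  # optional: only available if PyTorch is installed

        tags["torch"] = torch.__version__
        tags["cuda"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        tags["torch"] = None
        tags["cuda"] = None
    return tags


if __name__ == "__main__":
    for key, value in wheel_tags().items():
        print(f"{key}: {value}")
```

Comparing these values against the tags in a wheel's filename is one way to check compatibility before installing, rather than finding out from a failed build.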

A Simple Utility for Context Caching with Hugging Face

Generation caches in Transformers are not the same thing as reusable context caches. This post shows a practical utility for reusing a fixed system prompt across independent requests without leaking prior conversation history.

Built from scratch by me and Claude :)