I'm Vishnu - currently building AI at CourtCorrect. Previously ML at Vodafone, contributor at Cohere. I write about LLMs, RAG, agents, and the parts of model serving nobody warned you about.
A quick tour through FlashAttention, paged KV-cache, speculative decoding and friends - what each one actually changes.
Notes from wiring an LLM to messy real-world tabular data - schema inference, tool design, and the failure modes you only see in production.
How Monzo scaled from 4M to 8M customers with a tiny infra team - the talks that mattered for ML platform teams.