Nils has spent 4,000+ hours with LLMs

from Dec. 2023 till

I build production AI systems that make messy operational work measurable, faster, and easier to trust.

Hours gathered during thesis, research, implementation, problem solving, writing code, removing code, and reviewing code.

Current Role: Junior ML Engineer

REWE Group — one of Europe's largest retail groups. Production AI and enablement.

Focus: Controllable LLM Systems

Practical AI systems that fit real workflows and stay measurable.

Published: ACM CHI 2025

Peer-reviewed work on context-aware emotion analysis with LLMs.

Timeline.

Portrait of Nils Klüwer

I build LLM systems that stay boring, measurable, and controllable — because that's where they actually create business value. Research background, production mindset.

Based in Cologne. Currently at REWE Group. Working across production AI, evaluation, and applied LLM systems.

2025 — today

Junior ML Engineer, REWE Group

Production AI applications, agentic engineering, evaluation, and enablement.

2024 — 2025

Working Student, Research & Innovation Engineering

Agentic systems, evaluation, and proof-of-concept architectures.

2022 — 2025

M.Sc. Business Informatics, TU Wien

LLM systems, evaluation, and context-aware emotion analysis.

2018 — 2022

B.Sc. Business Informatics, WWU Münster

Foundation in business systems, software, and applied AI.

Production AI, Evaluation & Enablement.

Cluster 01

Agentic Engineering

Let AI handle the boring parts of software delivery.

  • Set up AI assistants for repetitive coding, review, and deployment tasks
  • Help teams ship faster and spend less time on mechanical work
  • Design patterns that stay reliable in real production environments
  • Turn codebases into places where both humans and AI can work effectively

Cluster 02

GenAI Applications

Build real products powered by language models.

  • Production AI products live across internal workflows and user-facing experiences
  • Measure whether an AI feature is actually good enough to ship
  • Own the full path from idea to deployed product
  • Work with text, images, audio, and combinations of them
  • Build AI features that fit real business workflows, not demos

Cluster 03

Enablement

Help teams actually adopt AI, not just experiment.

  • Advise on where AI creates value and where it does not yet
  • Help colleagues use AI tools in daily work with clear guardrails
  • Translate hype into concrete, usable operating habits
  • Focus on adoption that survives beyond one-off pilots

Cloud-native stack for building and running LLM applications at production scale, with a strong bias toward evaluation and observability.

Capabilities

  • Production LLM application development
  • Evaluation pipelines and quality measurement
  • Agentic workflows and AI-assisted engineering
  • Cloud deployment and data pipelines

Understand where LLMs actually create value. Prefer simple, controllable usage over unnecessary complexity. Maintain and improve systems with a mix of human control and agentic harnesses. Make quality evaluable and quantifiable through custom evaluations.

Packaging → shop-ready labels.

A production system at REWE Group that turns product packaging photos into structured master data — so the online shop can legally match what customers see in store, and new products reach the shelf faster.

Context

The master data team came to us: manual label transcription wasn't keeping up. Consumer-protection law requires the online shop to match the physical packaging (country of origin, allergens, nutrition). Trend products need to be listed fast. And supplier data is fragmented: there's no universal API for "what is printed on this package". Photos and videos are the closest thing to one.

What I Built

Two tools, shipped by a team of two in six months:

  • Extraction pipeline — photos/videos in, structured labels out. Reviewers approve before anything reaches the master data system.
  • Goldhamster — a custom evaluation tool. Gold dataset, scoring dashboard, in-place gold editing, re-scoring without re-running the pipeline. Makes quality measurable and the path to higher accuracy concrete.
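
The review gate described above can be sketched in a few lines of Python. This is a minimal illustration, not the production code: the class `ExtractedLabels`, its field names, and the functions `approve` and `export_to_master_data` are all hypothetical, chosen only to show the idea that nothing leaves the pipeline without reviewer approval.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class ReviewStatus(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ExtractedLabels:
    """Hypothetical structured output for one product (field names are assumptions)."""
    product_id: str
    country_of_origin: Optional[str] = None
    allergens: List[str] = field(default_factory=list)
    nutrition_per_100g: Dict[str, float] = field(default_factory=dict)
    status: ReviewStatus = ReviewStatus.PENDING  # every record starts unreviewed


def approve(record: ExtractedLabels) -> ExtractedLabels:
    """Mark a record as approved by a human reviewer."""
    record.status = ReviewStatus.APPROVED
    return record


def export_to_master_data(records: List[ExtractedLabels]) -> List[ExtractedLabels]:
    """Gate: only approved records are passed on; pending and rejected ones stay behind."""
    return [r for r in records if r.status is ReviewStatus.APPROVED]


# Usage: two extracted records, only one of them reviewed and approved.
milk = ExtractedLabels("p1", country_of_origin="Germany", allergens=["milk"])
bar = ExtractedLabels("p2", allergens=["nuts", "soy"])
approve(milk)
exported = export_to_master_data([milk, bar])
print([r.product_id for r in exported])
```

Keeping the status on the record itself, with `PENDING` as the default, means a record that skips review can never be exported by accident.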

Outcome

  • ~96% label-level accuracy on real photos from stores and logistics centres — not curated benchmark images
  • 25-75x cheaper than market-available solutions (ask me for the unit economics)
  • Under one minute end-to-end, from upload to extracted labels
  • Happy stakeholder, measurable quality, clear path toward full automation
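
As a rough illustration of what "label-level accuracy" and "re-scoring without re-running the pipeline" can mean, here is a minimal Python sketch. It is not Goldhamster itself: the functions `normalise` and `label_accuracy`, the dictionary layout, and the exact-match rule after normalisation are assumptions made for the example.

```python
def normalise(value):
    """Lowercase and collapse whitespace so trivial formatting differences do not count as errors."""
    if value is None:
        return None
    return " ".join(str(value).lower().split())


def label_accuracy(predictions, gold):
    """Share of gold labels whose stored prediction matches after normalisation.

    Both arguments map product id -> {label name -> value}. A label that is
    missing from the predictions counts as wrong.
    """
    total = 0
    correct = 0
    for product_id, gold_labels in gold.items():
        predicted = predictions.get(product_id, {})
        for name, expected in gold_labels.items():
            total += 1
            if normalise(predicted.get(name)) == normalise(expected):
                correct += 1
    return correct / total if total else 0.0


# Stored pipeline output (hypothetical data); the model is not called again below.
predictions = {
    "p1": {"country_of_origin": "Germany", "allergens": "milk, soy"},
    "p2": {"country_of_origin": "Spain"},
}
gold = {
    "p1": {"country_of_origin": "germany", "allergens": "milk, soy"},
    "p2": {"country_of_origin": "Italy", "allergens": "nuts"},
}

before = label_accuracy(predictions, gold)  # 2 of 4 labels match

# A reviewer corrects a wrong gold entry in place, then re-scores the stored predictions.
gold["p2"]["country_of_origin"] = "Spain"
after = label_accuracy(predictions, gold)  # 3 of 4 labels match

print(before, after)
```

Because scoring only reads stored predictions, fixing the gold dataset and re-scoring costs no additional model calls.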

What I Learned

The hard part wasn't the model. The stakeholders own the problem; I own the solution. The real work is the translation layer between them — explaining error modes, accuracy trade-offs, and why "96% with a good review UI" beats "100% someday". Good engineering here means knowing which decisions are mine to make and which ones aren't.

Tech Stack

  • GCP
  • Snowflake
  • Gemini (vision)
  • Structured Output
  • LLM-as-Judge
  • Goldhamster (custom eval)
  • Agentic-coded UI

More

Architecture details, evaluation methodology, and lessons from production are best discussed in conversation.

Reach out on LinkedIn

Research on Context-Aware Emotion Analysis.

The research thread is the same one I use in production: make LLM behaviour legible, testable, and grounded in something more robust than intuition. That mindset shaped how I learned to build measurable systems.

Publication · ACM CHI EA 2025

Context over Categories: Implementing the Theory of Constructed Emotion with LLM-Guided User Analysis

Author: Nils Kluewer
Where: Yokohama, Japan
Publisher: ACM Digital Library

A context-aware LLM pipeline based on Lisa Feldman Barrett's theory of constructed emotion.

Read publication

Master Thesis · TU Wien

Context over categories: implementing the theory of constructed emotion

Author: Nils Kluewer
Where: TU Wien, Vienna
Publisher: TU Wien Repositum

Full operationalization via the "context sphere", a user-specific construct for nuanced emotion analysis.

Read thesis

Let's talk.

Best reached via LinkedIn. Happy to discuss applied AI, agentic engineering, evaluation, or research collaborations.