AI Engineer · Cologne, Germany

Nils has spent 4,000+ hours with LLMs

from Dec. 2023 till

I build production AI systems that make messy operational work measurable, faster, and easier to trust.

Hours gathered during thesis, research, implementation, problem solving, writing code, removing code, and reviewing code.

Message me on LinkedIn · GitHub · See work

Current Role AI Engineer

REWE Group — one of Europe's largest retail groups. Production AI and enablement.

Focus Controllable LLM Systems

Practical AI systems that fit real workflows and stay measurable.

Published ACM CHI 2025

Peer-reviewed work on context-aware emotion analysis with LLMs.

About

Timeline.

I build LLM systems that stay boring, measurable, and controllable — because that's where they actually create business value. Research background, production mindset.

Based in Cologne. Currently at REWE Group. Working across production AI, evaluation, and applied LLM systems.

06/2026 — today

AI Engineer, REWE Group

Agentic engineering for production systems, evaluation of LLM systems, and enablement.

05/2025 — 06/2026

Junior ML Engineer, REWE Group

Production AI applications, agentic engineering, evaluation, and enablement.

2024 — 2025

Working Student, Research & Innovation Engineering

Agentic systems, evaluation, and proof-of-concept architectures.

2022 — 2025

M.Sc. Business Informatics, TU Wien

LLM systems, evaluation, and context-aware emotion analysis.

2018 — 2022

B.Sc. Business Informatics, WWU Münster

Foundation in business systems, software, and applied AI.

Work

Production AI, Evaluation & Enablement.

Cluster 01

Agentic Engineering

Let AI handle the boring parts of software delivery.

Set up AI assistants for repetitive coding, review, and deployment tasks
Help teams ship faster and spend less time on mechanical work
Design patterns that stay reliable in real production environments
Turn codebases into places where both humans and AI can work effectively

Cluster 02

GenAI Applications

Build real products powered by language models.

Run production AI products across internal workflows and user-facing experiences
Measure whether an AI feature is actually good enough to ship
Own the full path from idea to deployed product
Work with text, images, audio, and combinations of them
Build AI features that fit real business workflows, not demos

Cluster 03

Enablement

Help teams actually adopt AI, not just experiment.

Advise on where AI creates value and where it does not yet
Help colleagues use AI tools in daily work with clear guardrails
Translate hype into concrete, usable operating habits
Focus on adoption that survives beyond one-off pilots

Tech Stack

Cloud-native stack for building and running LLM applications at production scale, with a strong bias toward evaluation and observability.

Capabilities

Production LLM application development
Evaluation pipelines and quality measurement
Agentic workflows and AI-assisted engineering
Cloud deployment and data pipelines

Personal Focus

Understand where LLMs actually create value. Prefer simple, controllable usage over unnecessary complexity. Maintain and improve systems with a mix of human control and agentic harnesses. Make quality evaluable and quantifiable through custom evaluations.

Featured Case Study

Packaging → shop-ready labels.

A production system at REWE Group that turns product packaging photos into structured master data — so the online shop can legally match what customers see in store, and new products reach the shelf faster.

Context

The master data team came to us: manual label transcription wasn't keeping up. Consumer-protection law requires the online shop to match the physical packaging (country of origin, allergens, nutrition). Trend products need to be listed fast. And supplier data is fragmented — there's no universal API for "what is printed on this package". Photos and videos are.

What I Built

Two tools, shipped by a team of two in six months:

Extraction pipeline — photos/videos in, structured labels out. Reviewers approve before anything reaches the master data system.
Goldhamster — a custom evaluation tool. Gold dataset, scoring dashboard, in-place gold editing, re-scoring without re-running the pipeline. Makes quality measurable and the path to higher accuracy concrete.
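
The evaluation idea described above, scoring stored pipeline output against a gold dataset so that a gold edit only needs a re-score, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual Goldhamster implementation: the names `score_labels` and `normalize`, the fields `origin` and `allergens`, the exact-match comparison, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelScore:
    correct: int
    total: int

    @property
    def accuracy(self) -> float:
        # Avoid division by zero for an empty gold dataset.
        return self.correct / self.total if self.total else 0.0


def normalize(value: str) -> str:
    """Compare labels case-insensitively, ignoring extra whitespace (assumed rule)."""
    return " ".join(value.split()).casefold()


def score_labels(gold: dict[str, dict[str, str]],
                 predictions: dict[str, dict[str, str]]) -> LabelScore:
    """Count how many gold labels the stored predictions reproduce."""
    correct = total = 0
    for product_id, gold_labels in gold.items():
        predicted = predictions.get(product_id, {})
        for field, expected in gold_labels.items():
            total += 1
            if normalize(predicted.get(field, "")) == normalize(expected):
                correct += 1
    return LabelScore(correct, total)


# Hypothetical stored pipeline output; in practice this would be loaded from storage.
predictions = {
    "p1": {"origin": "Germany", "allergens": "milk, soy"},
    "p2": {"origin": "italy ", "allergens": "none"},
}
# Hypothetical gold dataset with one wrong entry (p2 allergens).
gold = {
    "p1": {"origin": "Germany", "allergens": "Milk, Soy"},
    "p2": {"origin": "Italy", "allergens": "Nuts"},
}

before = score_labels(gold, predictions)

# In-place gold edit: a reviewer corrects the wrong gold entry...
gold["p2"]["allergens"] = "None"
# ...and the score is recomputed from the same stored predictions,
# without calling the extraction pipeline again.
after = score_labels(gold, predictions)

print(f"{before.accuracy:.2f} -> {after.accuracy:.2f}")  # prints "0.75 -> 1.00"
```

Keeping predictions separate from scoring is the design point here: fixing a gold label changes the score immediately, while the expensive extraction step only has to run when the pipeline itself changes.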

Outcome

~96% label-level accuracy on real photos from stores and logistics centres — not curated benchmark images
25-75x cheaper than market-available solutions (ask me for the unit economics)
Under one minute end-to-end, from upload to extracted labels
Happy stakeholder, measurable quality, clear path toward full automation

What I Learned

The hard part wasn't the model. The stakeholders own the problem; I own the solution. The real work is the translation layer between them — explaining error modes, accuracy trade-offs, and why "96% with a good review UI" beats "100% someday". Good engineering here means knowing which decisions are mine to make and which ones aren't.

Tech Stack

GCP
Snowflake
Gemini (vision)
Structured Output
LLM-as-Judge
Goldhamster (custom eval)
Agentic-coded UI

Architecture details, evaluation methodology, and lessons from production are best discussed in conversation.

Reach out on LinkedIn

Research

Research on Context-Aware Emotion Analysis.

The research thread is the same one I use in production: make LLM behaviour legible, testable, and grounded in something more robust than intuition. That mindset shaped how I learned to build measurable systems.

Publication · ACM CHI EA 2025

Context over Categories: Implementing the Theory of Constructed Emotion with LLM-Guided User Analysis

Author Nils Kluewer

Where Yokohama, Japan

Publisher ACM Digital Library

A context-aware LLM pipeline based on Lisa Feldman Barrett's theory of constructed emotion.

Read publication

Master Thesis · TU Wien

Context over categories: implementing the theory of constructed emotion

Author Nils Kluewer

Where TU Wien, Vienna

Publisher TU Wien Repositum

Full operationalization via the "context sphere", a user-specific construct for nuanced emotion analysis.

Read thesis

Speaking

Talks on agentic engineering in practice.

Sharing what actually works when LLMs and agentic systems meet real organizations: beyond the hype, grounded in day-to-day engineering practice.

Talk · Rheinwerk »Coding mit KI« 2026

FOMO, Hype und Alltag – KI-Engineering bei der REWE Group

Speaker Nils Klüwer, REWE Group

Where Online conference

When June 23, 2026

How agentic engineering is adopted in practice at REWE Group: learnings from real projects instead of rigid rules. Announced on the official speaker page.

View talk page

Contact

Let's talk.

Best reached via LinkedIn. Happy to discuss applied AI, agentic engineering, evaluation, or research collaborations.

GitHub github.com/nilskluewer
Project Repository LLM-as-LisaFeldmanBarrett
LinkedIn linkedin.com/in/nilsklue
Publication ACM Digital Library