About

I’m the Head of Agentic AI and a lead research engineer at Dynamo AI (YC 22). Currently I focus on building AgentWarden: a product that detects agentic risk vectors, provides guardrails and tooling to reduce that risk, and offers an observability tool with the intelligence to flag when things are going wrong. I have spent 2 years at Dynamo working on (synthetic) data flywheels, evaluations, and training (SFT / RL), all focused on creating efficient and aligned custom guardrailing and judge models. What makes it hard (and thus fun) is that the objectives are subjective, under-specified in natural language, and require iterative human-model alignment through extensive evals.

Before joining Dynamo, I worked on the RL for Combinatorial Optimization and Code Generation teams at Qualcomm AI Research in Amsterdam. I studied Artificial Intelligence at the University of Amsterdam, specializing in Reinforcement Learning, and spent 9 months at the Amsterdam Machine Learning Lab with Prof. Herke van Hoof.

Projects I am most proud of:

Built Dynamo’s output guardrail offering and team from the ground up into a mature, high-demand product. I touched every part of the stack: working with PMs to define evaluation sets, setting up annotation procedures and feedback loops, synthetic data generation, training, and post-training interventions for greater customizability and more efficient inference. The product is used by a few Fortune 500 companies (1, 2, 3) to safeguard their AI deployments. I also led 2 top-tier ML publications in inference-time alignment (1) and medical LLM evaluation (2). See publications for details.
Together with my team at Qualcomm, we achieved SOTA on the Abstraction and Reasoning Challenge (ARC) with a ~220M-parameter language model by combining hindsight relabeling of erroneous programs with learning from prioritized hindsight replay (ICML ’24 paper). Despite it being more or less a dead end, I am also proud of our attempt to use MCTS as a neurally-guided search decoding method for language models, providing a natural curriculum for learning to write simple programs in a zero-human-data regime (ICML ’24 workshop paper).
Demonstrated that (hierarchical) RL can mitigate congestion in power grids up to 6x more efficiently than a physics-based simulator, and that hierarchical policies can outperform non-hierarchical ones. Wrote a paper about it.

Outside of work, I love endurance sports and the science behind achieving peak human performance. I swim, bike, and run, and like middle-distance training (70.3 IM) the most. I have a sub-10 Ironman race under my belt and want to do a sub-9 at some point. I lack time for other sports, but I do enjoy them: despite failing at learning to surf, I am not giving up :)

Sometimes I write about stuff; you can read it here: /posts/.

Contact: if you’d like to chat about AI, go for a bike ride, or grab coffee, send me a DM on X / LinkedIn / Strava.

Publications

Shallow Robustness, Deep Vulnerabilities: Multi-Turn Evaluation of Medical LLMs

B. Manczak, E. Lin, F. Eiras, J. O'Neill, V. Mugunthan

NeurIPS 2025 Workshop

Created MedQA-Followup, a multi-turn evaluation benchmark that stress-tests frontier LLMs in the medical domain under realistic conditions with heterogeneous retrieved context. Uncovered a critical vulnerability: accuracy degrades from 91.2% to 13.5% across conversation turns. Models that pass standard single-turn safety benchmarks fail catastrophically in multi-turn dialogue, exposing a blind spot in current LLM evaluation practices.

PrimeGuard: Safe and Helpful LLMs through Tuning-Free Routing

B. Manczak, E. Zemour, E. Lin, V. Mugunthan

ICML 2024 Workshop

Built a tuning-free inference-time guardrail that routes LLM requests through structured control flow with task-specific instructions. Increased safe response rate from 61% to 97%, reduced jailbreak attack success from 100% to 8%, and matched helpfulness scores of alignment-tuned models, all without fine-tuning. Released safe-eval, a red-team benchmark for systematic guardrail evaluation.

CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay

N. Butt*, B. Manczak*, A. Wiggers, C. Rainone, D.W. Zhang, M. Defferrard, T. Cohen

ICML 2024

Developed a post-training self-improvement loop for code-generating LMs: the model samples programs, relabels failed attempts via hindsight, and trains on a prioritized replay buffer. A ~220M parameter model achieved SOTA on the Abstraction and Reasoning Corpus (15% of evaluation tasks solved), outperforming all prior neural and symbolic baselines. First neuro-symbolic method to scale to the full ARC evaluation set.

Towards Self-Improving Language Models for Code Generation

M. Defferrard, C. Rainone, D.W. Zhang, B. Manczak, N. Butt, T. Cohen

ICLR 2024 Workshop

Trained code generation models from scratch via expert iteration with zero human-written code. The model uses neurally-guided search to find solutions, then trains on its own discoveries. Systematically characterized how search procedure, problem difficulty, and training-vs-search compute allocation affect the rate of self-improvement.

Hierarchical Reinforcement Learning for Power Network Topology Control

B. Manczak, J. Viebahn, H. van Hoof

arXiv 2023

Designed a 3-level hierarchical RL agent for power grid topology control operating over a combinatorial action space. Decomposed the problem into temporal abstraction layers (when to act, where to act, how to reconfigure). The fully learned hierarchy outperformed flat RL and greedy baselines on the most complex grid scenarios.

Towards Transparent and Explainable Attention Models

K.J. Kubara, B. Manczak, B. Dolicki, K. Sawicz

ML Reproducibility Challenge 2021

Reproduced and extended experiments on attention mechanism interpretability. Validated key claims about when attention weights do and do not faithfully explain model predictions through systematic replication and ablation analysis.

Updates

March 2026 - Launched my Substack with the first post: Spec Implemented, PR Not Done: Part 1.
December 2025 - Promoted to Head of Agentic AI at Dynamo, overseeing all of our agentic products.
December 2025 - Presented our paper at NeurIPS in San Diego.
November 2025 - Personal: took part in and crashed the Ironman 70.3 World Championship.
November 2025 - Introducing AgentWarden, a product where I am a lead research engineer.
