Wallaby Newsletter: Agentic AI and Product Updates

Hello,

Runtime data is becoming a competitive advantage. For developers, it shortens feedback loops. For AI coding agents, it can turn a long, token-burning debugging session into a focused fix.

In this newsletter: recent numbers on agent token usage, how Wallaby's AI tools help, the Wallaby for Python release, Quokka v3, and Console Ninja's role in runtime debugging.

AI agents need runtime data

Agents are getting better at editing code. The expensive part is what happens next: inspect, run, misunderstand, search, rerun, paste another stack trace, repeat. Recent research and reports show where that cost comes from:

The same agentic coding task can vary by up to 30x in token usage across runs. More tokens did not reliably mean better accuracy. Read the study.
In one multi-agent software workflow study, the Code Review stage used 59.4% of tokens. Initial coding used 8.6%. Input tokens made up 53.9% overall. Read the paper.
In one controlled A/B report, an agent given live project context used 42% fewer tokens, ran 27% faster, and made 64% fewer tool calls. Read the report.
Research into agent-written tests found that many act as runtime probes: value-revealing print statements outnumbered assertions in the analyzed test artifacts. Read the paper summary.

The pattern is consistent: the cost is in the loop, not the first edit. When agents lack runtime data, they spend tokens rediscovering state. When they can inspect the right signal, they can move faster and verify the fix sooner.
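To make the "runtime probe" finding above concrete, here is a minimal Python sketch contrasting a probe-style test with an assertion-style test. The function `apply_discount` and both test names are hypothetical, invented purely for illustration; they do not come from the cited research or from any Wallaby API.

```python
# Hypothetical function under test (illustrative only, not from a real project).
def apply_discount(price, percent):
    """Return the price after applying a percentage discount."""
    return round(price * (1 - percent / 100), 2)


def probe_style_test():
    # Runtime probe: prints the value so an agent can read it from terminal output.
    # It cannot fail, so it reveals state but verifies nothing.
    result = apply_discount(80.0, 25)
    print("result =", result)
    return result


def assertion_style_test():
    # Assertion: encodes the expected value, so a regression fails the test.
    assert apply_discount(80.0, 25) == 60.0


if __name__ == "__main__":
    probe_style_test()
    assertion_style_test()
```

The probe exists only because the agent has no other way to see the value. When the runtime value is available directly, the agent can go straight to writing the assertion.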

Wallaby gives agents those signals directly: test status, failing test details, runtime values, coverage, execution paths, and logs.

What users report: Wallaby's AI integration often reduces token usage by around 50% and resolves issues around 2-3x faster than workflows without Wallaby runtime context.

Don't take our word for it. Try the same task with and without Wallaby runtime context and compare turns, tool calls, tokens, and whether the fix is actually verified.
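If you want to record that comparison, a small script along these lines may help. It is only a sketch: the `RunMetrics` fields mirror the four things listed above, the helper names are our own, and the numbers in the usage section are made-up placeholders, not measurements. Collect the real values from whatever usage reporting your agent provides.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Metrics you record by hand for one agent run (hypothetical structure)."""
    turns: int
    tool_calls: int
    tokens: int
    fix_verified: bool


def percent_change(baseline, candidate):
    """Percent change relative to the baseline; negative means a reduction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return round((candidate - baseline) / baseline * 100, 1)


def compare(baseline, candidate):
    """Summarize how the candidate run differs from the baseline run."""
    return {
        "turns_pct": percent_change(baseline.turns, candidate.turns),
        "tool_calls_pct": percent_change(baseline.tool_calls, candidate.tool_calls),
        "tokens_pct": percent_change(baseline.tokens, candidate.tokens),
        "baseline_verified": baseline.fix_verified,
        "candidate_verified": candidate.fix_verified,
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only; replace with your own runs.
    without_context = RunMetrics(turns=20, tool_calls=50, tokens=100_000, fix_verified=False)
    with_context = RunMetrics(turns=10, tool_calls=25, tokens=50_000, fix_verified=True)
    for key, value in compare(without_context, with_context).items():
        print(f"{key}: {value}")
```

Because token usage on the same task can vary widely between runs, repeating each variant several times and comparing averages should give a fairer picture than a single pair of runs.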

Wallaby CLI + AI tools

Use Wallaby CLI when your agent needs to launch Wallaby itself. Claude Code, Codex, GitHub Copilot, JetBrains Junie/AI Assistant, OpenCode, Pi, Cursor, Cline, Devin, and other agents can run Wallaby locally, in worktrees, in containers, or on remote machines whenever they need test and runtime feedback. Read more about Wallaby CLI.

Use Wallaby AI tools and MCP when Wallaby and your agent are running side by side. In the same local context, your agent can ask the Wallaby extension for accurate test results, runtime values, and coverage data on demand. Less guesswork, fewer wasted tokens, and faster convergence on working code.

Wallaby for Python

Wallaby for Python is now available. It brings Wallaby's instant test feedback, inline errors, coverage visualization, runtime values, and AI-agent tooling to Python projects.

Wallaby for Python supports pytest and unittest, works with Python 3.8 through 3.14, and is available for VS Code editors. If you or your team work across JavaScript/TypeScript and Python, you can now use the same fast feedback loop across both ecosystems.

For AI agents: Wallaby for Python exposes the same MCP/AI runtime context for Python tests, so agents can inspect failing tests, coverage, and runtime values instead of guessing from source and raw terminal output.

Try Wallaby for Python →

Quokka v3

Quokka v3 is now available with a redesigned output experience that makes runtime values easier to navigate, inspect, compare, and understand.

The new list/details workflow, richer value exploration, output while code is running, integrated diffs, and runtime value diagrams make Quokka better for exploratory coding and debugging.

Console Ninja

Console Ninja completes the runtime picture for app debugging. It gives your AI agent browser and server logs and runtime errors from your running application, which is especially useful when the bug lies outside a unit test.

Learn more about Console Ninja PRO →

Use fewer tokens. Get better fixes.

Agentic coding is powerful, but every blind loop costs time and money. Wallaby and Console Ninja give agents verified runtime facts, so they spend less time guessing and more time converging on a working fix.

That is the edge: not a bigger prompt, but better feedback. Fewer wasted tokens, faster verified fixes, and a workflow your agent can actually reason from.

Now is a good time to renew, add the missing pieces, or try the workflow on a real problem in your own codebase. If you're already using Wallaby or Console Ninja with an AI coding agent, reply and tell us what works, what doesn't, and what data your agent still struggles to get.

Thanks for reading!

Regards, Simon McEnlly