How It Works

Zubbl uses a dual learning loop to continuously improve your AI agents without any code changes: one path learns from recorded runs, and the other analyzes failures.

The Learning Loop

graph TB
    A[Your Agent] -->|wrap once| B[Zubbl SDK]
    B -->|1. Record| C[Extract 3 Steps Automatically]
    C -->|2. Learn| D[Build Patterns by Category]
    D -->|3. Score| E[Rank by Success Rate]
    E -->|4. Policy| F[Build Recommendations]
    F -->|5. Inject| G[~30 Token Strategy Prefix]
    G -->|Next Run| A

    C -->|On Failure| H[Reflexion: Analyze Why]
    H -->|Partial Credit| D

What Happens When You Run `wrap()`

Step 1: Record (Zero Code Changes)

import zubbl  # my_agent below is your existing agent callable
smart_agent = zubbl.wrap(my_agent)  # One line. Never change again.
result = smart_agent("Write a sorting function")

Zubbl automatically extracts 3 steps from every call:

- Understand task: detects the task type (generate, debug, optimize, explain)
- Execute: detects the tools used from the output (code_generator, debugger, etc.)
- Validate: checks output quality
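To make the three recorded steps concrete, here is a minimal sketch of a wrapper that records one trajectory per call. It is not Zubbl's implementation: `wrap_with_recording`, `Trajectory`, the keyword table, and both detection heuristics are hypothetical names and rules chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical keyword table; Zubbl's real detection rules are not documented here.
TASK_KEYWORDS = {
    "debug": ("fix", "debug", "error"),
    "optimize": ("optimize", "faster", "speed up"),
    "explain": ("explain", "why", "describe"),
    "generate": ("write", "create", "generate"),
}


@dataclass
class Trajectory:
    task: str
    task_type: str    # step 1: understand task
    tools: List[str]  # step 2: execute
    valid: bool       # step 3: validate


def detect_task_type(task: str) -> str:
    """Return the first task type whose keywords appear in the task text."""
    lowered = task.lower()
    for task_type, keywords in TASK_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return task_type
    return "generate"  # assumed fallback


def detect_tools(output: str) -> List[str]:
    """Illustrative heuristic: output with a function definition implies code generation."""
    return ["code_generator"] if "def " in output else []


def wrap_with_recording(agent: Callable[[str], str], log: List[Trajectory]) -> Callable[[str], str]:
    """Wrap an agent so every call appends a Trajectory to `log`."""
    def wrapped(task: str) -> str:
        output = agent(task)
        log.append(Trajectory(
            task=task,
            task_type=detect_task_type(task),
            tools=detect_tools(output),
            valid=bool(output.strip()),  # assumed quality check: non-empty output
        ))
        return output
    return wrapped


log: List[Trajectory] = []
smart_agent = wrap_with_recording(lambda task: "def sort_items(xs):\n    return sorted(xs)", log)
smart_agent("Write a sorting function")
print(log[0])
```

The caller's code only changes at the wrap site, which is the property the one-line `zubbl.wrap(my_agent)` call relies on.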

Step 2: Extract Patterns

Similar tasks accumulate patterns under the same category. "Write a function to sort" and "Write a function to reverse" both go under writing.
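One way to picture this grouping is a lookup from the task's leading verb to a category. The sketch below assumes that rule purely for illustration; the verb map and the `categorize` helper are hypothetical, and Zubbl may categorize tasks differently.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical verb-to-category map, for illustration only.
CATEGORY_BY_VERB = {
    "write": "writing",
    "create": "writing",
    "fix": "debugging",
    "debug": "debugging",
}


def categorize(task: str) -> str:
    """Pick a category from the first word of the task."""
    words = task.split()
    first_word = words[0].lower() if words else ""
    return CATEGORY_BY_VERB.get(first_word, "other")


patterns: Dict[str, List[str]] = defaultdict(list)
for task in ("Write a function to sort", "Write a function to reverse"):
    patterns[categorize(task)].append(task)

print(dict(patterns))
```

Both example tasks from the paragraph above end up in the same `writing` bucket, so later steps can compare them against each other.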

Step 3: Score (Contrastive)

Patterns are ranked by success rate. Successful approaches score higher than failed ones. Failed trajectories still contribute — steps that worked before the failure get partial credit.
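The partial-credit idea can be sketched as a per-step score. The weighting below is an assumption rather than Zubbl's actual formula: `PARTIAL_CREDIT = 0.5` is an arbitrary illustrative value, and `Run`, `score_steps`, and `rank` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

PARTIAL_CREDIT = 0.5  # assumed weight for steps that ran before a failure


@dataclass
class Run:
    steps: List[str]
    failed_at: Optional[int] = None  # index of the failing step, None on success


def score_steps(runs: List[Run]) -> Dict[str, float]:
    """Average credit per step: 1.0 in a successful run, partial credit before a failure."""
    credit: Dict[str, float] = defaultdict(float)
    attempts: Dict[str, int] = defaultdict(int)
    for run in runs:
        for index, step in enumerate(run.steps):
            if run.failed_at is not None and index > run.failed_at:
                break  # steps after the failure never ran
            attempts[step] += 1
            if run.failed_at is None:
                credit[step] += 1.0
            elif index < run.failed_at:
                credit[step] += PARTIAL_CREDIT
            # the failing step itself earns no credit
    return {step: credit[step] / attempts[step] for step in attempts}


def rank(scores: Dict[str, float]) -> List[str]:
    """Order steps from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


runs = [
    Run(["planner", "code_generator", "quality_checker"]),
    Run(["planner", "debugger", "quality_checker"], failed_at=1),
]
scores = score_steps(runs)
print(rank(scores))
```

In this sketch a step that only ever appears at the point of failure scores zero, while a step that preceded the failure keeps some credit instead of being discarded with the whole run.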

Step 4: Build Policy

Once a category has accumulated two or more patterns, Zubbl builds a policy with actionable recommendations:

writing → planner, code_generator, quality_checker
debugging → planner, debugger, quality_checker
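A minimal sketch of this step, under stated assumptions: a category needs at least two patterns, and the recommendation is the most common tool sequence among its successful patterns. The selection rule, the `Pattern` type, and `build_policy` are hypothetical; only the two-pattern threshold comes from the text above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

MIN_PATTERNS = 2  # threshold taken from the text: a policy needs 2+ patterns


@dataclass
class Pattern:
    tools: Tuple[str, ...]
    success: bool


def build_policy(patterns_by_category: Dict[str, List[Pattern]]) -> Dict[str, List[str]]:
    """Map each eligible category to its most frequent successful tool sequence."""
    policy: Dict[str, List[str]] = {}
    for category, patterns in patterns_by_category.items():
        if len(patterns) < MIN_PATTERNS:
            continue  # not enough evidence yet
        successful = [pattern.tools for pattern in patterns if pattern.success]
        if not successful:
            continue  # nothing to recommend
        best_sequence, _count = Counter(successful).most_common(1)[0]
        policy[category] = list(best_sequence)
    return policy


observed = {
    "writing": [
        Pattern(("planner", "code_generator", "quality_checker"), True),
        Pattern(("planner", "code_generator", "quality_checker"), True),
    ],
    "debugging": [
        Pattern(("planner", "debugger", "quality_checker"), True),
    ],
}
policy = build_policy(observed)
print(policy)
```

Here `debugging` has only one pattern, so it gets no policy entry until a second pattern arrives.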

Step 5: Inject (~30 Tokens)

Next time the agent runs a similar task, the learned strategy is prepended:

[Strategy: planner→understand, code_generator→generate, checker→validate]
Write a function to sort a list

~30 tokens of overhead. The agent gets better without anyone changing any code.
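The injection itself amounts to string concatenation. The following sketch reproduces the prefix format shown above; `strategy_prefix` and `inject` are hypothetical helper names, not part of the Zubbl SDK.

```python
from typing import List, Tuple


def strategy_prefix(recommendations: List[Tuple[str, str]]) -> str:
    """Format (tool, action) pairs as a single bracketed strategy line."""
    pairs = ", ".join(f"{tool}→{action}" for tool, action in recommendations)
    return f"[Strategy: {pairs}]"


def inject(task: str, recommendations: List[Tuple[str, str]]) -> str:
    """Prepend the strategy line to the task; leave the task untouched if nothing was learned."""
    if not recommendations:
        return task
    return f"{strategy_prefix(recommendations)}\n{task}"


prompt = inject(
    "Write a function to sort a list",
    [("planner", "understand"), ("code_generator", "generate"), ("checker", "validate")],
)
print(prompt)
```

Because the prefix is a single short line, the overhead stays small and the original task text reaches the agent unchanged.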

On Failure: Reflexion

When an agent fails:

1. An LLM analyzes why it failed.
2. The reflection is stored as a negative pattern.
3. Steps that worked before the failure get partial credit.
4. The next run avoids the same mistake.
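The failure path above can be sketched as follows. The LLM call is replaced by a caller-supplied `analyze` function so the example runs offline; `NegativePattern`, `reflect_on_failure`, and `steps_to_avoid` are hypothetical names, and the prompt wording is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class NegativePattern:
    category: str
    failed_step: str
    reflection: str


def reflect_on_failure(
    category: str,
    steps: List[str],
    failed_at: int,
    analyze: Callable[[str], str],  # stand-in for the LLM call
    store: List[NegativePattern],
) -> List[str]:
    """Store a reflection on the failing step and return the steps that earn partial credit."""
    failed_step = steps[failed_at]
    prompt = (
        f"Task category: {category}. Steps run: {', '.join(steps[:failed_at + 1])}. "
        f"The step '{failed_step}' failed. Explain why."
    )
    store.append(NegativePattern(category, failed_step, analyze(prompt)))
    return steps[:failed_at]  # everything that worked before the failure


def steps_to_avoid(category: str, store: List[NegativePattern]) -> Set[str]:
    """Collect the steps that previously failed in this category."""
    return {pattern.failed_step for pattern in store if pattern.category == category}


store: List[NegativePattern] = []
credited = reflect_on_failure(
    "debugging",
    ["planner", "debugger", "quality_checker"],
    failed_at=1,
    analyze=lambda prompt: "stub reflection",  # replace with a real LLM client
    store=store,
)
print(credited, steps_to_avoid("debugging", store))
```

Passing the analyzer in as a function keeps the sketch independent of any particular LLM provider.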

Benchmark Results

| Metric | Result |
| --- | --- |
| Tasks to reach confidence 1.0 | 8 per category |
| Actions learned per category | 3 |
| Token overhead | ~30 tokens |
| Code changes required | Zero |
| Success rate | 100% |

Dashboard Feedback (No Code Changes)

1. The agent runs and its trajectory is recorded automatically.
2. In the dashboard, open the trajectory, rate it, and add a comment.
3. The feedback feeds into the learning loop.
4. The next run improves, with no code changes.