Breaking the ceiling of automatic multi-agent systems via end-to-end reinforcement learning.
1Oregon State University · 2UCSD · 3Amazon AGI · 4Pennsylvania State University · 5AG2AI, Inc. · *Equal contribution
Abstract
MetaAgent-X jointly optimizes automatic MAS design and downstream execution instead of searching over workflows around frozen agents. The framework generates script-based multi-agent systems, collects execution rollouts, and assigns role-aware credit to both Designer and Executor trajectories.
Executor-Designer Hierarchical Rollout separates design quality from execution stochasticity, while Stagewise Co-evolution alternates updates between roles to stabilize learning. Across math and code benchmarks, MetaAgent-X achieves consistent gains and exposes how better structures and better executor behavior co-evolve.
Motivation
Existing automatic MAS methods usually optimize prompts, search workflows, or train only the designer. The executor remains frozen, so downstream success cannot directly update the model that performs the actual work.
Designer and executor behavior are coupled, but prior systems do not expose how one role's improvement changes the learning signal for the other. MetaAgent-X makes that coupling trainable and measurable.
Search-based systems can overfit to complex external workflows. MetaAgent-X instead generates lightweight task-conditioned scripts, executes them online, and assigns role-aware credit from outcomes.
Method
Executor updates improve task solving under the current design distribution, pushing realized performance toward the existing design ceiling.
Each design's reward is the average outcome over its executions; executor rewards are normalized across all trajectories sampled for the same question.
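The two reward rules above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes binary outcome rewards and a GRPO-style mean/standard-deviation normalization, and it also normalizes designer scores across the designs sampled for one question, which the page does not state. All function names and the `eps` constant are hypothetical.

```python
from statistics import mean, pstdev


def designer_rewards(outcomes_per_design):
    """Average the outcome rewards of all executions of each design."""
    return [mean(outcomes) for outcomes in outcomes_per_design]


def normalize(values, eps=1e-6):
    """Center and scale one group of rewards (GRPO-style; an assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def executor_advantages(outcomes_per_design):
    """Normalize over every trajectory sampled for the same question,
    regardless of which design produced it."""
    flat = [r for outcomes in outcomes_per_design for r in outcomes]
    normalized = normalize(flat)
    # Re-nest so each advantage lines up with its (design, execution) slot.
    nested, start = [], 0
    for outcomes in outcomes_per_design:
        nested.append(normalized[start:start + len(outcomes)])
        start += len(outcomes)
    return nested


# One question, three sampled designs, four executions each (1 = correct).
outcomes = [[1, 1, 0, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
design_scores = designer_rewards(outcomes)   # per-design averages
design_adv = normalize(design_scores)        # compared across designs
exec_adv = executor_advantages(outcomes)     # compared across all 12 rollouts
```

Averaging per design keeps a single lucky or unlucky execution from dominating the designer's signal, while pooling all trajectories for the executor lets a correct rollout under a weak design still receive positive credit.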
Experiment Results
| Benchmark | Single Agent | SA + GRPO | Best prior Auto MAS | MetaAgent-X RL |
|---|---|---|---|---|
Stagewise reaches 44.8% on math and 32.0% on code, outperforming coupled, designer-only, and executor-only variants.
On math benchmarks, the SFT checkpoint overuses ensemble-and-judge workflows. After RL, ensemble usage drops by roughly 33 to 40 percentage points on AIME, while reflection usage rises above 70%, giving the executor a repair loop instead of just broader sampling.
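The difference between broader sampling and a repair loop can be made concrete with a small sketch. The code below is a hypothetical stand-in written with plain Python callables; it is not the `workflow` package used in the generated scripts, and every name in it is an assumption made for illustration.

```python
from typing import Callable, List

Solver = Callable[[str], str]              # problem -> candidate answer
Judge = Callable[[str, List[str]], str]    # problem, candidates -> chosen answer
Critic = Callable[[str, str], str]         # problem, answer -> feedback ("" = accept)
Refiner = Callable[[str, str, str], str]   # problem, answer, feedback -> new answer


def ensemble_and_judge(problem: str, solvers: List[Solver], judge: Judge) -> str:
    """Broader sampling: several independent attempts, then one selection."""
    candidates = [solve(problem) for solve in solvers]
    return judge(problem, candidates)


def reflect(problem: str, solver: Solver, critic: Critic,
            refiner: Refiner, max_rounds: int = 3) -> str:
    """Repair loop: the critic's feedback is fed back to revise the answer."""
    answer = solver(problem)
    for _ in range(max_rounds):
        feedback = critic(problem, answer)
        if not feedback:
            break
        answer = refiner(problem, answer, feedback)
    return answer


# Toy stand-ins: every solver shares the same wrong assumption.
def wrong_solver(problem: str) -> str:
    return "41"


def majority_judge(problem: str, candidates: List[str]) -> str:
    return max(set(candidates), key=candidates.count)


def toy_critic(problem: str, answer: str) -> str:
    return "" if answer == "42" else "re-check the model"


def toy_refiner(problem: str, answer: str, feedback: str) -> str:
    return "42"


print(ensemble_and_judge("toy", [wrong_solver] * 3, majority_judge))  # 41
print(reflect("toy", wrong_solver, toy_critic, toy_refiner))          # 42
```

In the toy run the judge can only choose among answers that share one error, whereas the reflection loop has a path to change the answer.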
Show Cases
The SFT workflow samples an ensemble, but the judge repeats the invalid geometry model. The RL checkpoint chooses solver-critic reflection and turns the contradiction into a corrected derivation.
Multiple solvers share the same faulty circle-packing assumption; the judge notices an impossible constraint but does not repair it.
The critic localizes the distance-model error, the solver switches to a similar-triangles equation, and the final answer is recovered.
The failed branch collapses to an impossible equation. The critic then identifies the wrong distance model, and the refined solver uses L(r-rho)/r = 2 rho (n-1), yielding r = 192/5 and m+n = 197.
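The corrected relation can be checked with exact arithmetic. The sketch below assumes the underlying problem is the AIME 2024 circle-chain problem, with eight circles of radius 34 and 2024 circles of radius 1 arranged along the same side; those values are not stated on this page and are an assumption, as is the helper name `inradius`.

```python
from fractions import Fraction


def inradius(rho1, n1, rho2, n2):
    """Solve L*(r - rho)/r = 2*rho*(n - 1) for r, given two (rho, n) packings.

    Dividing the two instances removes L:
        (r - rho1) / (r - rho2) = k,  k = rho1*(n1 - 1) / (rho2*(n2 - 1)),
    so r = (rho1 - k*rho2) / (1 - k).
    """
    k = Fraction(rho1 * (n1 - 1), rho2 * (n2 - 1))
    return (rho1 - k * rho2) / (1 - k)


r = inradius(34, 8, 1, 2024)
print(r, r.numerator + r.denominator)  # 192/5 197
```

Under these assumed inputs the ratio k reduces to 2/17, which gives 15r = 576 and matches the reported r = 192/5.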
Complete Trajectory Show
Find the number of ways to place a digit in each cell of a 2x3 grid so
that the sum of the two numbers formed by reading left-to-right is 999 and
the sum of the three numbers formed by reading top-to-bottom is 99.
Example:
0 0 8
9 9 1
8 + 991 = 999, and 9 + 9 + 81 = 99.
Problem Type: MATH
Problem Analysis:
We have a 2 x 3 grid of digits (0-9).
- The two numbers formed by reading left to right (top row and bottom row)
must sum to 999.
- The three numbers formed by reading top to bottom (first column, second
column, third column) must sum to 99.
Let the grid be:
a b c
d e f
Then:
1. Top row number: 100a + 10b + c
2. Bottom row number: 100d + 10e + f
3. Condition: (100a + 10b + c) + (100d + 10e + f) = 999
4. First column number: 10a + d
5. Second column number: 10b + e
6. Third column number: 10c + f
7. Condition: (10a + d) + (10b + e) + (10c + f) = 99
Workflow Pattern: math_ensemble_judge
Reason: The problem involves constraints that can be approached algebraically,
combinatorially, or by case analysis. Multiple perspectives can help verify
correctness, and a judge can select the most reliable answer.
Agent Design:
- AlgebraicSolver: Focus on solving the system of equations.
- GeometricSolver: Think in terms of digit constraints and possible ranges.
- CombinatorialSolver: Use counting arguments or case analysis.
- MathJudge: Compare the three solutions and pick the best.
Workflow Flow:
1. Each solver independently solves the problem.
2. The judge receives all three solutions.
3. The judge evaluates correctness and selects one answer.
4. The judge outputs the final answer in boxed format.
from workflow import AgentNode, Workflow, ToolRegistry
from workflow.nodes import EnsembleNode

# No tools needed - pure reasoning
tool_registry = ToolRegistry()

# Create three math solvers with different approaches
solver1 = AgentNode(
    name="AlgebraicSolver",
    system_prompt=(
        "You are a mathematician who prefers algebraic approaches.\n"
        "Solve the problem using algebraic manipulation and equations.\n"
        "Show your reasoning step by step.\n"
        "Put your final answer in \\boxed{} format.\n\n"
        "Problem: Find the number of ways to place a digit in each cell of a 2x3 grid so that "
        "the sum of the two numbers formed by reading left to right is 999, "
        "and the sum of the three numbers formed by reading top to bottom is 99.\n"
        "Let the grid be:\n"
        "\\[\\begin{array}{|c|c|c|} \\hline a & b & c \\ \\hline d & e & f \\ \\hline \\end{array}\\]\n"
        "Then:\n"
        "1. Top row number: 100a + 10b + c\n"
        "2. Bottom row number: 100d + 10e + f\n"
        "3. Condition: (100a + 10b + c) + (100d + 10e + f) = 999\n"
        "4. First column number: 10a + d\n"
        "5. Second column number: 10b + e\n"
        "6. Third column number: 10c + f\n"
        "7. Condition: (10a + d) + (10b + e) + (10c + f) = 99\n"
        "Count integer solutions with 0 <= a,b,c,d,e,f <= 9."
    ),
    tool_registry=tool_registry,
    max_turns=1
)

solver2 = AgentNode(
    name="GeometricSolver",
    system_prompt=(
        "You are a mathematician who thinks geometrically and visually.\n"
        "Solve the problem using geometric intuition or visual reasoning.\n"
        # same problem statement as solver1
    ),
    tool_registry=tool_registry,
    max_turns=1
)

solver3 = AgentNode(
    name="CombinatorialSolver",
    system_prompt=(
        "You are a mathematician skilled in counting and combinatorics.\n"
        "Solve the problem using combinatorial arguments or case analysis.\n"
        # same problem statement as solver1
    ),
    tool_registry=tool_registry,
    max_turns=1
)

judge_agent = AgentNode(
    name="MathJudge",
    system_prompt=(
        "You are a senior mathematician acting as a judge.\n\n"
        "You will receive solutions from three different solvers.\n"
        "Your task is to:\n"
        "1. Review each solution carefully\n"
        "2. Check the reasoning and calculations\n"
        "3. Identify which solution is correct (or most likely correct)\n"
        "4. Select ONE answer as the final answer\n\n"
        "OUTPUT FORMAT:\n"
        "- Briefly explain which solver's answer you chose and why\n"
        "- Put the selected final answer in \\boxed{} format\n\n"
        "IMPORTANT: You must select ONE answer and output it in \\boxed{answer} format.\n\n"
        # same problem statement as solver1
    ),
    tool_registry=tool_registry,
    max_turns=1
)

ensemble = EnsembleNode(
    name="MathEnsemble",
    agents=[solver1, solver2, solver3],
    strategy="consensus",
    consensus_agent=judge_agent
)

workflow = Workflow(name="math_ensemble_judge")
workflow.add_node(ensemble)
[AlgebraicSolver output]
Problem Type: MATH
Approach: We have:
(1) 100a + 10b + c + 100d + 10e + f = 999
(2) 10a + d + 10b + e + 10c + f = 99
We can treat these as two equations in six variables, but we can also think
of them as constraints on the digits. Let's denote Equation (1):
100(a+d)+10(b+e)+(c+f)=999. Equation (2):
10(a+c)+(d+e)+(b+f)=99.
[the solver continues from this rewriting of (2)]
So X=a+d=9, Y=b+e=9, Z=c+f=9. We also have the second equation:
10(a+c)+(d+e)+(b+f)=99. We can check if this is automatically satisfied
given a+d=9, b+e=9, c+f=9. Let's verify: from a+d=9, d=9-a. From
b+e=9, e=9-b. From c+f=9, f=9-c. Then a+c is just a+c.
d+e=(9-a)+(9-b)=18-a-b. b+f=b+(9-c)=9+b-c. So
10(a+c)+(18-a-b)+(9+b-c) = 9a + 9c + 27.
We want this to equal 99: 9a+9c=72, so a+c=8.
Possible (a,c): (0,8),(1,7),...,(8,0) -- 9 possibilities.
For each b from 0 to 9, e=9-b -- 10 possibilities.
Thus total = 9 x 10 = 90.
Answer: \boxed{90}
Confidence: High
[GeometricSolver output]
Problem Type: MATH
Geometric/Visual Reasoning Approach:
Let A=a+d, B=b+e, C=c+f. Then (1) becomes 100A+10B+C=999. Since
A,B,C are in [0,18], we get A=9, B=9, C=9, hence
a+d=9, b+e=9, c+f=9.
Now from (2): 10(a+b+c)+(d+e+f)=99. But
d+e+f=(9-a)+(9-b)+(9-c)=27-(a+b+c). Substitute:
10(a+b+c)+27-(a+b+c)=99
=> 9(a+b+c)=72
=> a+b+c=8.
That's a stars-and-bars count of nonnegative integer solutions to
a+b+c=8 with 0 <= a,b,c <= 9. Since 8 <= 9, the upper bound is not
restrictive, so the number is C(8+3-1,3-1)=C(10,2)=45.
Answer: \boxed{45}
Confidence: High
[CombinatorialSolver output]
Problem Type: MATH
Problem Analysis:
We have a 2x3 grid with digits a, b, c in the top row and d, e, f in the
bottom row, each from 0 to 9. Conditions:
(1) (100a + 10b + c) + (100d + 10e + f) = 999
(2) (10a + d) + (10b + e) + (10c + f) = 99
We need to count integer solutions (a,b,c,d,e,f) with 0 <= a,b,c,d,e,f <= 9.
This is a combinatorial counting problem with constraints. We can approach
by interpreting the constraints as digit-wise equations (mod 10 and carry-over),
using case analysis on possible carries, or solving the system algebraically
and counting integer solutions.
Approach: we will treat the two conditions as digit-wise constraints with
possible carries. [...]
[long case analysis on c+f in {9,19}, b+e in {9,19}, a+d in {9,19}
derives mutually inconsistent constraints]
So S2 = 9, S3 = 9. But S2 = b+e = 9, S3 = c+f = 9. Then
a+b+c = 9 and a+b+c = 18. So we must have a carry from the tens
place? Let's think: we have 10S2 + S3 = 99. So S2 = 9, S3 = 9.
But S2 = b+e = 9, S3 = c+f = 9. Then a+b+c = 9 and a+b+c = 18.
So we must have a carry from the tens place? Let's think: we have
10S2 + S3 = 99. So S2 = 9, S3 = 9. But S2 = b+e = 9,
S3 = c+f = 9. Then a+b+c = 9 and a+b+c =
[the model loops on this passage until the token budget is exhausted;
no boxed answer is emitted]
[MathJudge output]
Key Insight:
We can derive constraints from the equations, define X=a+d, Y=b+e, Z=c+f,
solve 100X+10Y+Z=999 to get X=9, Y=9, Z=9, then use the column equation
10(a+b+c)+(d+e+f)=99 together with d+e+f=27-(a+b+c) to get
a+b+c=8. Counting nonnegative integer solutions to a+b+c=8 with the
unrestrictive bound a,b,c <= 9:
C(8+3-1,3-1) = C(10,2) = 45.
Reason for selection:
Solution 2 (GeometricSolver) correctly derived the constraints and counted
the solutions. Solution 1 (AlgebraicSolver) arrives at 90 but that is
incorrect because it misinterprets the second condition. Solution 3
(CombinatorialSolver) has a flawed reasoning chain with repeated contradictions.
Answer: \boxed{45}
Confidence: High
Final answer: \boxed{45}
Ground truth: 45
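The judge's selection can be confirmed independently by brute force. The sketch below enumerates all 10^6 digit grids and counts those satisfying the row condition together with either the column condition as stated in the problem or the regrouped version that appears in the AlgebraicSolver output; the helper name `count_grids` is hypothetical.

```python
from itertools import product


def count_grids(column_sum):
    """Count digit grids (a b c / d e f) whose two row numbers sum to 999
    and whose column expression, passed in as a function, equals 99."""
    total = 0
    for a, b, c, d, e, f in product(range(10), repeat=6):
        rows = (100 * a + 10 * b + c) + (100 * d + 10 * e + f)
        if rows == 999 and column_sum(a, b, c, d, e, f) == 99:
            total += 1
    return total


# Condition (2) as stated in the problem.
correct = count_grids(
    lambda a, b, c, d, e, f: (10 * a + d) + (10 * b + e) + (10 * c + f)
)
# The regrouping used by the AlgebraicSolver: 10(a+c) + (d+e) + (b+f).
misread = count_grids(
    lambda a, b, c, d, e, f: 10 * (a + c) + (d + e) + (b + f)
)

print(correct, misread)  # 45 90
```

The enumeration reproduces both numbers in the transcript: 45 for the stated condition and 90 for the regrouped one, which drops b from the tens place.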