RouteMoA: Dynamic Routing without Pre-Inference Boosts Efficient Mixture-of-Agents

Jize Wang
Han Wu
Zhiyuan You
Yiming Song
Yijun Wang
Zifei Shan
Yining Li
Songyang Zhang
Xinyi Le
Cailian Chen
Xinping Guan
Dacheng Tao
Published on 1/26/2026
Cross-asset
AI
LLM
Multi-Agent

RouteMoA is a dynamic routing framework for Mixture-of-Agents (MoA) that significantly improves efficiency by avoiding pre-inference. It uses a lightweight scorer (86M parameters) to predict coarse-grained performance scores for each LLM from the query alone, narrowing the candidate pool to a few high-potential models without running any LLM inference. A mixture of judges then refines these scores using self-assessment (confidence scores from active models) and cross-assessment (evaluation by the best model from the previous layer), exploiting posterior knowledge from outputs that already exist, so the refinement adds no inference cost. Finally, a model ranking mechanism selects models by balancing performance, cost, and latency, and an early-stopping criterion ends the process once a score threshold is reached.
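The narrowing step can be pictured as a plain top-k selection over the scorer's per-model predictions. The sketch below assumes the scorer has already produced one score per model for the current query; the function name, model names, and score values are hypothetical placeholders, not the paper's code. No LLM is called at this stage, which is the point of avoiding pre-inference.

```python
from typing import Dict, List


def narrow_candidates(predicted: Dict[str, float], k: int = 3) -> List[str]:
    """Keep the k models with the highest scorer-predicted scores.

    `predicted` maps a (hypothetical) model name to the coarse score the
    lightweight scorer assigned for the current query. Only these numbers
    are used; no candidate LLM is run to make the selection.
    """
    ranked = sorted(predicted.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical scores for a single query.
scores = {"model_a": 0.91, "model_b": 0.42, "model_c": 0.77, "model_d": 0.65, "model_e": 0.18}
print(narrow_candidates(scores, k=3))  # ['model_a', 'model_c', 'model_d']
```

Fixed top-k is one simple choice; a score threshold would serve the same role of shrinking the pool before any expensive call.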

Experiments on both small (5 LLMs) and large (15 LLMs) model pools show that RouteMoA matches or surpasses MoA and SMoA in accuracy while drastically reducing cost and latency. On the large pool, it achieves 78.6% average accuracy (vs. MoA's 71.3%) with 89.8% lower cost and 63.6% lower latency. On out-of-distribution tasks (AGIEval-Gaokao), it outperforms SMoA in accuracy while reducing cost by 11.5% and latency by 24.7%. The scorer achieves high hit rates (Top-1: 90.7%, Top-3: 97.9%), and ablation studies confirm the effectiveness of the mixture of judges. RouteMoA provides a practical and scalable solution for multi-agent collaboration, enabling efficient use of diverse LLMs without prohibitive computational overhead.

Highlights

  • Proposes RouteMoA, a dynamic routing framework for Mixture-of-Agents that avoids pre-inference by using a lightweight scorer to predict model performance from the query.
  • Introduces a mixture of judges that refines scores via self- and cross-assessment using existing outputs, correcting scorer errors without additional inference.
  • Achieves 89.8% lower cost and 63.6% lower latency than MoA on a large-scale 15-model pool, while improving average accuracy by 10.2% relative to MoA (78.6% vs. 71.3%).
  • Demonstrates strong out-of-distribution generalization on AGIEval-Gaokao, outperforming SMoA in accuracy while reducing cost and latency.
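The 10.2% accuracy improvement over MoA reads as a relative gain rather than a gap in percentage points: the reported large-pool averages (78.6% for RouteMoA, 71.3% for MoA) differ by 7.3 points. A quick check using only those reported numbers:

```python
def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# Average accuracies reported for the 15-LLM pool.
routemoa_acc, moa_acc = 78.6, 71.3
print(f"absolute gain: {routemoa_acc - moa_acc:.1f} points")        # absolute gain: 7.3 points
print(f"relative gain: {relative_gain(routemoa_acc, moa_acc):.1f}%")  # relative gain: 10.2%
```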

Methods

  • SLM-based Scorer: A lightweight model (mDeBERTa-v3-base) trained with a dual contrastive loss to predict coarse-grained performance scores for each LLM based on the query.
  • Mixture of Judges: Combines scorer predictions with self-assessment (confidence scores from active models) and cross-assessment (evaluation by the best model from the previous layer) to refine scores.
  • Model Ranking: Selects top-k models by balancing performance, cost, and latency, with an early-stopping mechanism based on a score threshold.
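The judge-refinement and ranking steps can be illustrated with a small sketch. The summary does not give the fusion formula, the cost/latency trade-off, or the threshold value, so everything below is an assumption: judge scores are averaged and blended with the scorer score by a hypothetical weight `alpha`, ranking uses a linear utility with hypothetical penalties `lam_cost` and `lam_lat` on normalized cost and latency, and early stopping fires when any refined score reaches a hypothetical `threshold`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    scorer: float                      # coarse score from the query-only scorer
    cost: float                        # normalized cost in [0, 1] (assumed scale)
    latency: float                     # normalized latency in [0, 1] (assumed scale)
    self_conf: Optional[float] = None  # self-assessment, only if the model already answered
    cross: Optional[float] = None      # cross-assessment by the previous layer's best model


def fused_score(c: Candidate, alpha: float = 0.5) -> float:
    """Hypothetical fusion: blend the scorer score with the mean of available judge scores."""
    judges = [s for s in (c.self_conf, c.cross) if s is not None]
    if not judges:
        return c.scorer  # no existing output to judge, so keep the scorer's estimate
    return (1 - alpha) * c.scorer + alpha * sum(judges) / len(judges)


def rank(cands: List[Candidate], k: int = 2, lam_cost: float = 0.2, lam_lat: float = 0.1) -> List[str]:
    """Pick the k names with the best refined score minus cost and latency penalties."""
    utility = {c.name: fused_score(c) - lam_cost * c.cost - lam_lat * c.latency for c in cands}
    return sorted(utility, key=lambda name: utility[name], reverse=True)[:k]


def should_stop(cands: List[Candidate], threshold: float = 0.9) -> bool:
    """Early stop once some candidate's refined score reaches the threshold."""
    return max(fused_score(c) for c in cands) >= threshold


pool = [
    Candidate("model_a", scorer=0.80, cost=0.9, latency=0.8, self_conf=0.95, cross=0.90),
    Candidate("model_b", scorer=0.70, cost=0.2, latency=0.3),
    Candidate("model_c", scorer=0.75, cost=0.5, latency=0.4, self_conf=0.40, cross=0.50),
]
print(rank(pool, k=2))                   # ['model_b', 'model_a']
print(should_stop(pool, threshold=0.9))  # False
```

In this toy pool the cheaper `model_b` outranks the stronger but costlier `model_a`, and the judges pull `model_c` below its scorer estimate. That is the kind of correction the mixture of judges is described as providing.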

Results

  • On a large-scale pool (15 LLMs), RouteMoA achieves 78.6% average accuracy vs. MoA's 71.3% and SMoA's 69.7%, with 89.8% lower cost and 63.6% lower latency than MoA.
  • On a small-scale pool (5 LLMs), RouteMoA reduces cost by 81.4% and latency by 38.7% compared to MoA, while achieving the highest average accuracy (83.1%).
  • On out-of-distribution tasks (AGIEval-Gaokao), RouteMoA achieves 54.62% accuracy vs. SMoA's 52.92%, with 11.5% lower cost and 24.7% lower latency.
  • The scorer achieves 90.7% Top-1 Hit Rate and 97.9% Top-3 Hit Rate, effectively narrowing candidates to high-potential models.
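The summary does not define the hit rate precisely. One plausible reading, assumed here, is that a query counts as a hit when at least one of the scorer's top-k models actually answers it correctly. The sketch computes that metric on hypothetical toy data. It does not reproduce the paper's evaluation.

```python
from typing import Dict, List, Set


def top_k_hit_rate(predicted: List[Dict[str, float]], correct: List[Set[str]], k: int) -> float:
    """Fraction of queries where any of the scorer's top-k models is in the correct set.

    predicted[i] maps model name -> scorer score for query i;
    correct[i] is the set of models that answered query i correctly.
    """
    hits = 0
    for scores, good in zip(predicted, correct):
        top_k = sorted(scores, key=lambda m: scores[m], reverse=True)[:k]
        if any(m in good for m in top_k):
            hits += 1
    return hits / len(predicted)


# Toy data: three queries over three hypothetical models.
predicted = [
    {"model_a": 0.9, "model_b": 0.5, "model_c": 0.1},
    {"model_a": 0.2, "model_b": 0.8, "model_c": 0.6},
    {"model_a": 0.7, "model_b": 0.3, "model_c": 0.4},
]
correct = [{"model_a"}, {"model_c"}, {"model_b"}]
for k in (1, 2, 3):
    print(k, top_k_hit_rate(predicted, correct, k))
```

Under this definition the hit rate can only grow with k, which matches the reported Top-3 rate exceeding the Top-1 rate.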