AI cluster debugging

ClusterScope

Scenario


Operator summary

Baseline comparison

Agent council

Scaling curve

Step breakdown

Root-cause hypotheses

GPU and fabric view

Experiment matrix

Run Scenario GPUs Efficiency GPU util Step p95 Top finding