Perplexity Gap Dashboard

This dashboard combines the completed Marin perplexity-gap reports. This refresh adds the 32B all-available diagnostic run and the MCQA/FEVER, long-context, LIMA, AA-proxy, prompt-format sensitivity, and code-interpretation slices; earlier 8B runs are kept for reference. A positive gap means Marin is worse than the comparator; a negative gap means Marin is better.
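To make the sign convention concrete, here is a minimal sketch. It assumes that each gap is Marin's mean per-token negative log-likelihood minus the comparator's, measured on the same slice. The dashboard does not spell out its exact loss unit, and the helper names `perplexity_gap`, `perplexity_ratio`, and `describe` are hypothetical.

```python
import math


def perplexity_gap(marin_nll: float, comparator_nll: float) -> float:
    """Hypothetical helper: gap = Marin loss minus comparator loss.

    Assumes both inputs are mean negative log-likelihoods (nats per token)
    on the same slice. Lower loss is better, so a positive gap means
    Marin is worse and a negative gap means Marin is better.
    """
    return marin_nll - comparator_nll


def perplexity_ratio(gap: float) -> float:
    """Perplexity is exp(mean NLL), so Marin's perplexity divided by the
    comparator's perplexity equals exp(gap) under the assumption above."""
    return math.exp(gap)


def describe(gap: float) -> str:
    """Map a gap to the dashboard's legend labels."""
    if gap > 0:
        return "Comparator better"
    if gap < 0:
        return "Marin better"
    return "tie"


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any report.
    g = perplexity_gap(2.10, 2.00)
    print(describe(g), round(perplexity_ratio(g), 3))
```

Working in log space keeps gaps additive across slices. A gap of 0.1 nats corresponds to Marin's perplexity being about 10.5% higher than the comparator's.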
Generated 2026-05-30 15:36 UTC
Data source: 32B diagnostic, MCQA/FEVER, long-context, LIMA, AA-proxy, prompt-format sensitivity, and code-interpretation summaries, served from local artifacts

Marin better
Comparator better