April 2026

Comparing Opus 4.7 and GPT-5.5 on Excel

Two generation jumps, two different stories. GPT-5.5 is a clear step up from 5.4 — more accurate and faster at every level, and now the leading model for hard spreadsheet work. Opus 4.7 is a closer call against 4.6: noticeably better on long, complex tasks when pushed to think hard, but a hair worse on easy tasks and subtly different in feel. On everyday work, every top model lands within 1.8% of the others. Here's the data.

Nico Christie

April 23, 2026

Opus 4.7 and GPT-5.5 are both live in Shortcut. We ran them against GPT-5.4 on two internal benchmarks. v23 is 40+ classic spreadsheet jobs that would take a person 1–4 hours. v25 is 40+ of our hardest — work that would take a person days. Each model ran at three reasoning effort levels (Medium, High, Max) and was scored against a human-graded rubric. Times reported below are total end-to-end — the model's thinking plus every spreadsheet action it takes to finish the job.

A note on speed. Shortcut offers Opus 4.6 and 4.7 in both standard mode and a fast mode that Pro and Teams users can turn on. Fast mode runs about 2.5× faster at the same accuracy. The chart below shows both — solid lines for standard, dotted lines for fast mode, thin connectors showing the shift. GPT-5.5 has one speed; Opus has two.

1) Highest accuracy

GPT-5.5 · 73.9%

Top score on v25 at Max — tasks that would take a person days. Edges Opus 4.7 Max (72.4%) and GPT-5.4 Max (70.0%).

2) Best formatting

Opus 4.6 · best in class

GPT-5.5 closes most of GPT-5.4's gap, but Opus 4.6 output still looks like it came off an MD's desk. Why 4.6 remains the Shortcut default (for now).

3) Fastest*

GPT-5.5 · 12.1 min

Median time on v25 at Max — about half of Opus 4.7 Max standard (23.3 min). Fast mode brings Opus 4.7 to ~9.3 min, right there with GPT-5.5.

*Opus fast mode isn't available to every Shortcut user, but is meaningfully faster when enabled.

1) Accuracy and speed, together

Speed and accuracy are most useful read together. In standard mode, GPT-5.5 sits at the top-left of the curve on v25 — higher accuracy and less time at every effort level. Opus 4.7 trades time for accuracy; at High and Max it thinks longer to get there. The generational lift on the GPT side is striking: GPT-5.4 at Max scores about what GPT-5.5 scores at Medium.

The dotted lines change the picture. With fast mode enabled, the same Opus accuracy arrives in about 40% of the time. Opus 4.7's curve shifts sharply left — nearly meeting GPT-5.5 at Max effort and overtaking it at Medium. That's the experience Pro and Teams users get when they turn fast mode on.
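The 2.5× speedup and the "about 40% of the time" figure are the same ratio seen two ways. A minimal sketch of the arithmetic, using the v25 Max medians reported in this post:

```python
# Fast mode runs ~2.5x faster, i.e. the same work takes 1/2.5 = 40% of the time.
SPEEDUP = 2.5

def fast_time(standard_minutes: float) -> float:
    """Estimated fast-mode runtime for a given standard-mode runtime."""
    return standard_minutes / SPEEDUP

# Opus 4.7 Max on v25: 23.3 min standard -> ~9.3 min in fast mode,
# right alongside GPT-5.5's 12.1 min.
print(round(fast_time(23.3), 1))  # 9.3
```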

[Chart: Shortcut Eval v25 — our hardest spreadsheet tasks. Accuracy vs. median time to finish (min) for GPT-5.5, Opus 4.7, Opus 4.6, and GPT-5.4, with Opus fast mode shown as dotted lines.]

[Chart: Shortcut Eval v23 — everyday spreadsheet tasks. Model scores at medium effort: GPT-5.5 79.4%, Opus 4.6 79.1%, Opus 4.7 78.7%, GPT-5.4 77.6%.]

[Chart: Differentiation is now on long, complex spreadsheets. Accuracy by benchmark difficulty (v23 everyday / v25 hardest): GPT-5.5 79.4% / 69.8%, Opus 4.7 78.7% / 65.3%, Opus 4.6 79.1% / 63.0%, GPT-5.4 77.6% / 62.4%.]

Easy vs. hard

Differences appear in harder tasks.

On everyday tasks, the top models finish within 1.8% of each other. The differences only show up on the hardest, longest work — on v25 the spread widens 4.1×, to 7.4%.

v23 · everyday

1.8%

essentially tied

v25 · hardest

7.4%

4.1× wider
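Those spread figures follow directly from the medium-effort scores reported in this post; a quick sanity check:

```python
# Medium-effort accuracy by model (percent), as reported in this post.
v23 = {"GPT-5.5": 79.4, "Opus 4.6": 79.1, "Opus 4.7": 78.7, "GPT-5.4": 77.6}  # everyday
v25 = {"GPT-5.5": 69.8, "Opus 4.7": 65.3, "Opus 4.6": 63.0, "GPT-5.4": 62.4}  # hardest

def spread(scores: dict[str, float]) -> float:
    """Gap between the best and worst model, in percentage points."""
    return round(max(scores.values()) - min(scores.values()), 1)

print(spread(v23))                            # 1.8
print(spread(v25))                            # 7.4
print(round(spread(v25) / spread(v23), 1))    # 4.1
```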

2) Formatting — GPT closes the gap, Opus keeps the crown

Formatting was GPT-5.4's weak spot: correct workbooks that looked rough. GPT-5.5 closes most of that gap — clean headers, consistent number formats, reasonable structure. It still doesn't hit Opus 4.6's bar. Opus 4.6 output looks like it came off an MD's desk. GPT-5.5 is solid; Opus 4.6 is best-in-class. Same prompt, same task:

Same prompt, same task — formatting from scratch

[Image: Opus 4.6 formatting example — taste pick]
[Image: GPT-5.5 formatting example — strong, but not Opus-level]
[Image: GPT-5.4 formatting example 1 — reference]
[Image: GPT-5.4 formatting example 2 — reference]

3) Two generation jumps, two stories

Same benchmarks, very different generation jumps.

Opus 4.6 → 4.7

A toss-up

4.7 dips slightly on easy tasks (78.7% vs 79.1%) but pulls ahead on hard ones — v25 Medium climbs from 63.0% to 65.3%. Better for long, demanding runs. Opus 4.6 is still the Shortcut default.

GPT-5.4 → 5.5

A clean sweep

Every effort level on v25 picks up 4–7 points of accuracy and gets faster — Max drops from 18.4 min to 12.1 min. The leading model on the chart. Still lacks Opus-level formatting taste; feels tight and mechanical where Opus is more considered.

Our recommendation

We're entering a world where frontier models are harder to tell apart on raw intelligence. Preferences will increasingly come down to look, feel, and taste. That's already the story in coding, where the choice between Claude Code and Codex usually comes down to the kind of agent you like collaborating with. GPT-5.5 is genuinely a bit more intelligent on the very hardest end-to-end tasks, but it's terser and less of a collaborator on quick, punchy touches; Opus feels more like working with a thoughtful partner.

Give GPT-5.5, Opus 4.7, and Opus 4.6 a diligent try and choose your favorite.

  • GPT-5.5 — fire and forget on the hardest work, especially on existing models with clear formatting to maintain. 73.9% on v25 in about 12 minutes.
  • Opus 4.6 — the Shortcut default for everyday work. Best-in-class formatting, clear step-by-step reasoning, the cleanest day-to-day feel.
  • Opus 4.7 — worth getting a feel for. Some people love it, some don't, and there's no obvious intelligence difference over 4.6.

You can switch models on any conversation from the chat input.

Try It Today

Opus 4.7 and GPT-5.5 are live for every Shortcut user. Pick the right model and effort for the task in front of you.