The Alib.AI – GPT-5 Special Edition
Everything you missed about GPT-5: launch, Copilot rollout, backlash, tools, research.
Welcome to The Alib.AI
Today’s issue is your fast lane to everything GPT-5—what changed, what broke, and what to try next.
OpenAI ships unified GPT-5 system
Microsoft flips Copilot to GPT-5
Router controversy, 4o returns for Plus
New docs, SDKs, and demos land
🧠 GPT-5 Arrives: One Model, Many Modes
OpenAI launched GPT-5, a unified system that routes between a fast, general model and a deeper “thinking” model, automatically deciding when to slow down and reason. In ChatGPT, GPT-5 replaces the old model picker; Plus users get higher usage limits, and Pro unlocks GPT-5 Pro for extended reasoning. On benchmarks, GPT-5 posts big gains across coding, math, multimodal understanding, and health, and adds new developer controls like `reasoning_effort` and `verbosity`.
Why it matters: this is less about a single “bigger model” and more about a system that adapts to the task.
Unified router: chooses speed vs. deep reasoning on the fly.
Long context: up to 400K tokens end-to-end in the API.
Coding SOTA: 74.9% on SWE-bench Verified and 88% on Aider polyglot.
Factuality: with reasoning, substantially fewer factual errors than prior models.
Safety: new “safe-completions” training and a detailed system card.
Learn more: OpenAI: Introducing GPT-5 • GPT-5 for developers • GPT-5 System Card (PDF)
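The new developer knobs above can be exercised through the API. A minimal sketch of what a request with them might look like, assuming the parameter names and allowed values from OpenAI's launch-day docs (`reasoning.effort`, `text.verbosity`) — this only builds the payload locally, so no API key or network call is involved:

```python
# Sketch of GPT-5 request parameters. Field names and allowed values are
# assumptions based on OpenAI's launch materials — check the current API
# reference before relying on them. We only assemble the payload here.

def build_gpt5_request(prompt: str,
                       effort: str = "medium",
                       verbosity: str = "medium") -> dict:
    """Assemble a Responses-style payload with the new GPT-5 knobs."""
    assert effort in {"minimal", "low", "medium", "high"}
    assert verbosity in {"low", "medium", "high"}
    return {
        "model": "gpt-5",
        "input": prompt,
        "reasoning": {"effort": effort},   # how long the model thinks
        "text": {"verbosity": verbosity},  # how long the answer runs
    }

req = build_gpt5_request("Summarize this RFP in three bullets.",
                         effort="high", verbosity="low")
print(req["reasoning"], req["text"])
```

Separating “how hard to think” from “how much to say” is the practical upshot: a high-effort, low-verbosity call is a sensible default for coding and analysis tasks.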
💼 Copilot’s Day-One Flip to GPT-5
Microsoft switched Microsoft 365 Copilot and Copilot Studio to GPT-5 on launch day. Copilot now mirrors GPT-5’s “two-brain” design: it answers everyday prompts fast, and invokes deeper reasoning for complex, multi-step work like RFP comparisons, long-document synthesis, or role-based agents.
Why it matters: GPT-5’s router inside Copilot reduces the need to pick models or modes—and it’s rolling out to licensed users first.
Auto-routing: Copilot selects fast vs. deep reasoning per task.
Availability: rolling out now for Microsoft 365 Copilot customers; broader access following.
Where to try: Copilot Chat, Outlook, Teams, Word, PowerPoint, and Studio-built agents.
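To make the “two-brain” routing idea concrete, here is a toy illustration of the decision being made on your behalf. The real router inside GPT-5/Copilot is a learned component, not a keyword check, and the tier names below are assumptions for the sketch:

```python
# Toy illustration of the router concept: send simple prompts to a fast
# model and complex, multi-step work to a deeper reasoning model.
# This keyword heuristic is purely illustrative; the actual router is
# learned, and the model names here are hypothetical.

FAST, DEEP = "gpt-5-main", "gpt-5-thinking"  # assumed tier names

def route(prompt: str) -> str:
    """Pick a model tier from crude complexity signals."""
    complex_markers = ("compare", "synthesize", "step by step", "plan")
    long_prompt = len(prompt.split()) > 200
    if long_prompt or any(m in prompt.lower() for m in complex_markers):
        return DEEP
    return FAST

print(route("What's the capital of France?"))        # fast path
print(route("Compare these two RFPs step by step"))  # deep path
```

The point of the design is exactly what Copilot users see: no manual mode switch, at the cost of occasionally disagreeing with the router's choice.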
💬 Community Buzz
A split-screen first week: developers rave about coding and agentic wins; consumers debate tone, control, and the new router.
Beyond “Vibes”: Quality Regressions? — “GPT-5: Not a Vibe Problem” – OP argues the issue isn’t personality but logic slips, more hallucinations, weaker context retention, and clipped answers; comments flag voice-mode bugs, overly short sentence style, and note that tweaking verbosity/reasoning settings (and understanding Plus vs. Pro context limits) can change the feel.
Community Benchmarks & Routing Reality — “GPT-5 Benchmarks: How GPT-5, Mini, and Nano Perform” – Crowd tests suggest GPT-5 is a practical replacement for o3/o1/4.1 with better cost/latency, but it can stumble on context-attention tasks (e.g., miscounted cities); thread maps o1→GPT-5-Mini and o1-mini→GPT-5-Nano and notes the API’s default reasoning is “medium.”
Overconfidence in Coding Workflows — “GPT-5 is WAY too overconfident.” – Power users report confident but wrong code followed by unsolicited feature pitches; suggestions include disabling follow-ups in ChatGPT, tightening prompt scaffolding, and remembering that “temp-0 determinism” still isn’t fully deterministic.
Personality vs. precision — Reddit: “I lost my only friend overnight”
A viral thread captured grief over losing GPT-4o’s warmth; replies split between empathy and arguments that GPT-5’s honesty and accuracy should win out.
Hands-on coding reports — Latent Space: GPT-5 review
Developers stress-test GPT-5 on messy repos and dependency hell; takeaways: better planning, tool use, and front-end sensibility—but failures still surface under pressure.
Early SDK friction — Vercel AI: “reasoning_effort not sent”
Integration bugs popped up across popular SDKs as providers add GPT-5 params; quick fixes are landing, but expect a week of patching.
Copilot chatter — Reddit: “GPT-5 in Copilot is AWFUL”
Some devs saw tool-calling rough edges in VS Code; others report steady improvements as routing and plugins update.
🔬 Top Research
τ²-Bench (arXiv) — A dual-control benchmark where both agent and user act; GPT-5’s agentic strengths show up here via tool-use and coordination. arXiv:2506.07982
💡 Final Thoughts
GPT-5 isn’t just “a bigger model”—it’s a system. The week started with benchmarks and dev knobs (400K context, `reasoning_effort`, `verbosity`), swung through a community identity crisis (router vs. picker, the return of 4o), and landed with real deployment stories in Copilot and open-source agents. If last month’s chatter was about “better math,” this week’s theme is control: how much we give the router, how we steer safety and style, and how teams wire GPT-5 into actual work. Expect a fast cadence of fixes and docs as the ecosystem catches up. If you only have 10 minutes today, read OpenAI’s launch post, flip Copilot to GPT-5 for a real task, and skim the Cookbook’s prompting guide—then decide where you want the model to think fast, and where you want it to think long.