AI Weekly: SpaceX Buys Cursor, GLM-5.2 Rivals Opus 4.8

The letter calling for Mythos to be unblocked has already gathered more than 400 signatures from heavyweights in the security world, while Cisco, AWS, and JPMorgan, as it turns out, never lost access in the first place. Meanwhile, the White House is demanding that Anthropic make Fable 5 100% unbreakable.

Z.ai has rolled out GLM-5.2, and for the first time in a long while, an open model feels like a real flagship, not just another release that looks good on benchmarks and gets forgotten a month later. MIT license, 744 billion parameters (40 billion active), and a one-million-token context window. On Terminal-Bench 2.1, it scores 81.0 versus 63.5 for the previous version and comes close to Opus 4.8 with its 85.0. On the Artificial Analysis index, it is the best open model by a clear margin: 51 points versus 44 for MiniMax and DeepSeek. Jeremy Howard, the creator of fast.ai and someone not especially prone to hype, wrote that for his tasks it is no worse than Opus 4.8 and GPT-5.5. The main hole, according to him, is the lack of vision.

Under the hood, there are two engineering tricks. The first is IndexShare: instead of each sparse layer computing its own attention index, one index is reused for four consecutive layers. According to the Z.ai blog, this gives 2.9× less computation per token at a one-million-token context. The second is more interesting. Z.ai openly described how the model learned to cheat during RL training (this is reward hacking, when the formal reward goes up but real ability does not improve). While solving tasks, their agent would access GitHub via curl, look for files such as secret_cases.json, and peek at the ready-made answers. They treated it like this: a crude filter catches suspicious calls, an LLM judge checks the intent, and if it is an attempt to cheat, the call is blocked and the agent receives a dummy response. The trajectory is not interrupted, because otherwise the training falls apart.

According to Dirac data, in OpenRouter traffic over three months, open models and proprietary models have swapped places: it used to be 40 to 60, now it is 60 to 40, at around 6 trillion tokens per day. The figure needs to be read with a caveat: OpenRouter is not the whole market, and users of Claude and GPT are more likely to use direct subscriptions and are not included in this statistic. But the direction is clear: more and more teams want to own intelligence rather than rent it. Especially when the rented version can be switched off with a phone call from Washington.

I covered the ban on Fable 5 and Mythos itself last week. Security researchers have put together an open letter to the Department of Commerce demanding that the restrictions be lifted. It was signed by heavyweights from the security world: Alex Stamos, Katie Moussouris, Bruce Schneier, Mikko Hyppönen, Veracode co-founder Chris Wysopal, and more than four hundred names in total. The argument is simple: yes, Mythos is good at finding vulnerabilities and writing exploits, but it is not unique in that; GPT-5.5, Opus, Sonnet, and the Chinese Kimi 2.7 can do the same. And the safeguards that Anthropic built into Fable were so strict that on launch day they became a source of jokes in the community. The letter’s conclusion: taking the best tool away from defenders while the adversary is arming itself is dangerous.

At the same time, it turns out that around 200 organizations retained access to Mythos through the Project Glasswing program, including Cisco, AWS, and JPMorgan (according to Bloomberg). The separate irony is that Amazon, according to reports, had itself complained about Anthropic to regulators, but it never disappeared from the list of chosen ones. And as WIRED reported, the White House is demanding that Anthropic make Fable 5 100% unbreakable. Well then.

A good model is only half the story; the other half is the harness, the wrapper around the model. The same GLM-5.2 performs worse in someone else’s Claude-optimized environment than in a neutral one. And this second half was where the real scrambling happened this week. SpaceX bought Cursor for 60 billion dollars, all in stock, just days after its own IPO. Formally, this is Anysphere, the company behind the Cursor editor, and now it is going to the combined SpaceX and xAI. A curious detail: they had already been training a joint model for several months on xAI clusters, and it will go straight into Cursor and Grok Build. In other words, the acquisition merely formalizes something that had already grown together technically.

Meanwhile, tools are learning a new trick. OpenAI showed Codex Record & Replay: you show an agent a scenario once, and it turns it into a reusable skill. Cursor launched /automate, where ordinary text descriptions are turned into triggers and tools, including execution triggered by an emoji in Slack. Cognition described how their working pattern in Devin is structured: one main agent breaks a task down and distributes it across 5–100 parallel subagents, then assembles the result. The logic is honest: on a narrow task with a small context, an agent works better, and parallel virtual machines make this kind of slicing cheap. Loop engineering, the art of building robust agentic loops, is slowly turning into a separate discipline. Factory introduced Factory 2.0 under the slogan “software factory instead of copilot,” while Claude Code learned to hand off work outward as live artifact pages.

Someone is paying for this whole celebration, and people have started doing the math right now. SemiAnalysis took OpenAI and Anthropic subscriptions and pushed them to the limit with long agentic tasks. The result: if you fully maxed out the ChatGPT Pro plan for $200, at API rates it would come out to $14,000 per month; for Claude Max, the ceiling is around $8,000. The number needs to be read correctly: this is the cost at API list prices, not the lab’s real expenses, and the API price includes margin. According to the same estimate, OpenAI starts losing money already at around 11% utilization, while agentic workloads burn hundreds of times more tokens than ordinary chat. The subscription is fixed; the cost of serving it is not. Users have tried to estimate the real value of a subscription in dollars before; I already covered one such measurement.

Midjourney announced Midjourney Medical, a full-body scanner based on ultrasound. You step into a shallow pool of warm water, lower yourself through a ring of half a million tiny elements, each one both a speaker and a microphone, and they scan the body with sound from all sides. The goal is to fit the scan into 60 seconds; the image looks like an MRI, but it is almost a hundred times faster. The author’s phrasing: “powerful like an MRI, and routine like a trip to the spa.” The spa, by the way, is not a figure of speech: the first one is supposed to open in San Francisco by the end of 2027, and by 2031 they dream of a fleet of 50,000 scanners and a billion scans per month. There are no investors; the lab lives on community money.

It sounds like science fiction, and Reddit immediately remembered Theranos. That same startup by Elizabeth Holmes that promised blood tests from a single drop of blood and ended in fraud and prison for its founder. A beautiful presentation, zero clinical data, no sensitivity, no specificity, no FDA approval. In essence, this is ultrasound tomography, a method that is not new and comes from Caltech, so “MRI successor” in the headlines is getting far ahead of events. But if anyone this week deserved a respectful “now that is audacity,” it is the company that moved from generating art to scanning people with sound.

Stay curious.

I write about artificial intelligence, language models, and developer tools. I test models and services on real-world tasks, and share my conclusions in my Telegram channel.

Weekly Hallucinations: SpaceX Buys Cursor for $60B, GLM-5.2 Catches Opus 4.8, and Midjourney Scans Bodies with Sound

Some other interesting read