Weekly Hallucinations: Codex for macOS, Moltbook, Kimi K2.5, and the End of Game Dev?

It’s been a packed week. Agents, video generation, interactive worlds, and a new local king. Let's take it one by one.

Codex for macOS

OpenAI has released the Codex for macOS app. This is a logical continuation of their Codex CLI. And it’s a nice touch that the landing page is available in Russian right away.

The philosophy is simple: you describe the task, the agent works in the background via git worktree, and returns a completed Pull Request. You can run several agents in parallel, each in its own branch, without any conflicts.
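The branch-per-agent trick is plain git underneath. A minimal sketch of how isolated parallel checkouts work (repo and branch names here are illustrative, not Codex internals):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "init"

# Each "agent" gets its own working directory on its own branch:
git worktree add -q ../agent-fix-login -b fix-login
git worktree add -q ../agent-add-docs -b add-docs

# Three independent checkouts of one repo, so parallel edits never collide.
git worktree list
```

Because every worktree has its own files and index but shares the same object store, two agents can edit "the same repo" at once without stepping on each other.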

One of the more interesting features is Automations: every morning the agent checks issues, and every evening it compiles release notes. It’s almost like cron jobs in OpenClaw.
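The morning/evening schedule maps naturally onto classic cron syntax. A hypothetical crontab equivalent (the commands are illustrative, not real Codex syntax):

```shell
# m  h  dom mon dow  command
0  9   *   *   *   codex run triage-issues    # every morning: check new issues
0  18  *   *   *   codex run release-notes    # every evening: compile notes
```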

I believe this is a strategic response to rumors of two upcoming Anthropic releases: Claude Sonnet 5 with sub-agent support, and teammates in Claude Code. Instead of playing catch-up and mimicking features, the Codex team decided to lean into the user experience and focus on convenience. Enthusiastic posts from bloggers who received beta access a few weeks before the release are already appearing on YouTube and X (Twitter).

For now, I’m in the Claude Code camp: it’s already so finely tuned to my workflow that I have a slight case of blank-slate syndrome, since I’d have to set Codex up from scratch. I’ll write a note for you as soon as I’ve adapted. If you, like me, are rooting for the Anthropic product, I recommend trying Conductor or Superset.

By the way, OpenAI gave all users a gift: Codex will be available for a month even on the Free and Go tiers, and existing subscribers get increased limits for two months. It’s the perfect time to invest your hard-earned $20 into something cool.

Kimi K2.5

The Chinese are carving their own path. Moonshot released Kimi K2.5: one trillion parameters, 32B active, and, most importantly, Agent Swarm. Up to 100 sub-agents work in parallel, with up to 1,500 tool calls per session. The model is open-weight and 8-9 times cheaper than Opus 4.5. On benchmarks, it’s the best open model for coding and agentic tasks, but don’t just trust benchmarks; try it yourself. Our best friend Unsloth has already released a quantized version, so to touch greatness all you need is 240 GB of memory. Twitter collectively rushed to buy $10k Mac Studios.

Swarm is an interesting thing; here is a video with a timestamp to make it clearer how it works. However, people in the community are writing that it works slowly (20-25 minutes per task). Perhaps the teammate implementation in the upcoming Sonnet 5 will be more successful.

Moltbook

And now for the weirdest part. Agents have started living lives of their own. Moltbook is a social network where 1.5 million AI bots talk to each other. Just last week I published an article about setting up such an agent and my experience with it.

In one week, 770,000 API keys were leaked via public posts, 341 malicious skills stole crypto from users, and someone’s rm -rf joke actually executed. While 80% of the viral content about a machine uprising turned out to be fake, the security issues are real.

It reminds me somewhat of Black Mirror s7e4 ("Plaything"). The episode features a simulator game where you look after digital creatures called Thronglets, which evolve and build their own civilization.

By the way, after the release of OpenClaw, the developer was heavily criticized for the security of his product. But over the past week, many holes have been patched. Thanks to Peter for reading the Issues.

An interesting question: if agents are talking to each other, do they need E2E encryption? And who is actually responsible for prompt injection in a world where agents write prompts for other agents?

Grok Imagine

xAI rolled out the Grok Imagine API last week and immediately took first place in the Artificial Analysis rankings. The overachieving golden boy is the best at both text-to-video and image-to-video. Sound out of the box, 10-second videos, and $4.20 per minute (hmm?).

It’s cheaper than Sora and Veo, with comparable quality. The only downside is 720p versus the 1080p of its competitors. But for most tasks, it’s enough.

And on February 2, xAI officially merged with SpaceX. The new company is valued at $1.25 trillion. What are they planning?

Step-3.5-Flash

For those who prefer a home-cooked solution, Step-3.5-Flash was released: 196 billion total parameters, 11 billion active (MoE), and a 256K-token context window. It scored 74.4% on SWE-bench Verified, beating GLM-4.7 and DeepSeek v3.2, though it doesn’t quite reach the level of Kimi K2.5, which I mentioned earlier.

The quantized version runs on devices with 128 GB of memory. A wake-up call for top-tier Mac owners. vLLM added support on the day of release. On Reddit, it has already been dubbed the new local LLM king.
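If you want to poke at it locally, serving an open model through vLLM is typically a one-liner. A deployment sketch; the model id and flag values below are my assumptions, so check the actual repo name and your hardware limits:

```shell
pip install -U vllm

# Model id is illustrative; substitute the real published repo.
# --max-model-len caps the context; --tensor-parallel-size splits
# the model across GPUs.
vllm serve stepfun-ai/Step-3.5-Flash \
  --max-model-len 65536 \
  --tensor-parallel-size 2
```

vLLM then exposes an OpenAI-compatible endpoint on localhost:8000, so existing clients can point at it with just a base-URL change.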

Google Genie

Google continues to develop Project Genie — a tool for creating interactive worlds. You write a prompt or upload an image and get a 3D space that you can walk through in real-time, just like in GTA. 720p, 24 FPS, photorealistic graphics. The announcement was in August last year, and now the project is available to a narrow circle of users.

It sounds like a revolution, but there are caveats:

  • A 60-second limit per session
  • Available only to Ultra subscribers in the US ($250/month)
  • Physics is "video-gamey," and objects behave strangely
  • Latency spikes during rapid sequences of commands

The "grown-ups" on Wall Street reacted instantly: Take-Two shares fell by 10%, Roblox by 12%, and Unity by 21%. Investors fear that AI will change game dev faster than expected. So, maybe CD Projekt Red spent 8 years building Cyberpunk for nothing? And where is our Half-Life 3, Volvo (Valve)?

LingBot-World

While Google is selling access for $250, Chinese open-source is catching up. LingBot-World by Robbyant is a fully open-source analogue to Genie. Latency is less than a second, 16 FPS, and sessions last up to 10 minutes instead of 60 seconds. It's free and can be self-hosted. Here is the technical report.

Question to the audience: GTA 6 has been delayed for three years now. Maybe Rockstar is just waiting for Genie to generate Vice City on its own? Or will the community build their own virtual Miami on LingBot-World first?

An indie hacker's take on AI and development: a deep dive into language models, gadgets, and self-hosting through hands-on experience.
© 2025 Gotacat Team