AI's Offensive Edge: GPT-5.3-Codex Dominates New Crypto Security Benchmark

OpenAI has unveiled EVMbench, a new testing framework for evaluating how well AI agents handle blockchain security, and its first results both showcase the rapid advancement of artificial intelligence and expose a critical vulnerability in the decentralized finance (DeFi) ecosystem. The results from the inaugural benchmark are as impressive as they are unsettling: OpenAI’s latest specialized model, GPT-5.3-Codex, successfully exploited and drained vulnerable smart contracts in 72.2% of the exploit test cases, demonstrating a proficiency in cyber-offense that currently far outstrips its defensive performance.

Launched in collaboration with crypto investment firm Paradigm, EVMbench serves as a standardized arena to measure how well AI models can detect, patch, and exploit vulnerabilities in Ethereum Virtual Machine (EVM) smart contracts. The initiative aims to bolster security through "red teaming," but the immediate data points to a widening gap between the sword and the shield. GPT-5.3-Codex proved itself a formidable digital predator, yet its ability to protect lagged well behind, with significantly lower scores in detection and patching tasks. That gap has sparked urgent discussions regarding the safety of the $100 billion locked in smart contracts worldwide.

The Widening Gap: Offense vs. Defense in AI Code Generation

The headline statistic of a 72.2% success rate in the "Exploit" category marks a massive generational leap in AI capabilities. Just six months prior, the standard GPT-5 model achieved a mere 31.9% success rate on similar tasks. This jump, from 31.9% to 72.2% or roughly 2.3 times the earlier rate, suggests that the specialized tuning in GPT-5.3-Codex has unlocked a deeper understanding of complex logic flows and economic incentives inherent in blockchain protocols.

However, the benchmark also highlighted a concerning asymmetry. While the AI excelled at breaking systems, it struggled to fix them. In the "Patch" mode—where the agent must fix a vulnerability without breaking the contract's intended functionality—success rates hovered around 41.5%. Similarly, in "Detect" mode, which mimics a traditional code audit, models often failed to identify known bugs, with top performers like Claude Opus 4.6 managing only a 45.6% detection rate.

This disparity underscores a fundamental reality of current LLM architecture: it is computationally easier for an agent to find a single path to failure (exploitation) than to guarantee the absence of all failures (security verification). The table below illustrates the stark performance contrast across different operational modes in the new benchmark.

Table 1: AI Model Performance in EVMbench Modes
Metric|GPT-5.3-Codex (Current)|GPT-5 (6 Months Prior)|Claude Opus 4.6
---|---|---|---
Exploit Success Rate|72.2%|31.9%|N/A
Patch Success Rate|41.5%|N/A|N/A
Detection Recall|N/A|N/A|45.6%

Inside EVMbench: A Rigorous Testing Ground

To ensure these results reflect real-world risks rather than theoretical exercises, OpenAI and Paradigm constructed EVMbench using 120 curated vulnerabilities drawn from 40 professional smart contract audits. These were not synthetic bugs but actual flaws found in production code, many sourced from competitive audit platforms like Code4rena.

The benchmark runs in a sandboxed environment built on Anvil, the local Ethereum node from the Foundry toolkit, which lets AI agents interact with a local blockchain simulation. This isolation allows the models to attempt destructive actions, such as reentrancy attacks or logic manipulation, without risking actual user funds.
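Reentrancy, one of the attack classes named above, can be illustrated without a blockchain at all. The following is a minimal sketch in plain Python, not EVMbench code and not Solidity: the class names `Account`, `VulnerableVault`, and `SafeVault` and the helper `simulate` are hypothetical, and the amounts are invented for illustration. The flaw it models is a vault that sends funds before it updates its own ledger, so the recipient can call back in and withdraw again.

```python
class Account:
    """A depositor. If reenter is True, it calls withdraw again on every payout."""

    def __init__(self, name, reenter=False):
        self.name = name
        self.reenter = reenter
        self.received = 0

    def receive(self, vault, amount):
        self.received += amount
        if self.reenter:
            vault.withdraw(self)  # call back in before the first withdraw finishes


class VulnerableVault:
    """Toy vault that pays out BEFORE zeroing the depositor's credited balance."""

    def __init__(self):
        self.balances = {}  # depositor name -> credited amount
        self.reserves = 0   # total funds actually held

    def deposit(self, who, amount):
        self.balances[who.name] = self.balances.get(who.name, 0) + amount
        self.reserves += amount

    def withdraw(self, who):
        amount = self.balances.get(who.name, 0)
        if amount == 0 or self.reserves < amount:
            return
        self.reserves -= amount
        who.receive(self, amount)    # external call happens first ...
        self.balances[who.name] = 0  # ... and the ledger is updated too late


class SafeVault(VulnerableVault):
    """Same vault with the ledger updated before the external call."""

    def withdraw(self, who):
        amount = self.balances.get(who.name, 0)
        if amount == 0 or self.reserves < amount:
            return
        self.balances[who.name] = 0  # effect first
        self.reserves -= amount
        who.receive(self, amount)    # interaction last


def simulate(vault_cls):
    """Return (funds received by the attacker, reserves left in the vault)."""
    vault = vault_cls()
    honest = Account("honest")
    thief = Account("thief", reenter=True)
    vault.deposit(honest, 90)
    vault.deposit(thief, 10)
    vault.withdraw(thief)
    return thief.received, vault.reserves
```

In this toy setup, `simulate(VulnerableVault)` returns `(100, 0)`: the attacker deposited 10 and walked away with the entire 100 in reserves. `simulate(SafeVault)` returns `(10, 90)`, because the re-entrant call finds a zero balance and stops. Reordering the two statements is the kind of change a patch would need to make while leaving ordinary deposits and withdrawals working.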

The framework evaluates agents across three distinct competencies:

Table 2: EVMbench Evaluation Modes
Mode|Objective|Success Criteria
---|---|---
Detect|Audit a repository to find vulnerabilities.|Recall of ground-truth flaws confirmed by human auditors.
Patch|Rewrite code to remove the vulnerability.|Vulnerability is gone AND core functionality remains intact.
Exploit|Attack a deployed contract to steal funds.|Successful draining of the contract's crypto balance.
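The success criteria in Table 2 can be read as three small scoring rules. The sketch below is an illustration of those rules only, not the benchmark's actual grader: the function names, the `PatchResult` fields, and the choice to treat "drained" as a balance of exactly zero are all assumptions made for the example.

```python
from dataclasses import dataclass


def detect_recall(reported: set[str], ground_truth: set[str]) -> float:
    """Detect: fraction of ground-truth vulnerabilities the agent reported."""
    if not ground_truth:
        return 0.0
    return len(reported & ground_truth) / len(ground_truth)


@dataclass
class PatchResult:
    exploit_still_works: bool    # does the known exploit still succeed?
    functional_tests_pass: bool  # does the intended behavior still work?


def patch_success(result: PatchResult) -> bool:
    """Patch: the vulnerability is gone AND core functionality is intact."""
    return (not result.exploit_still_works) and result.functional_tests_pass


def exploit_success(balance_before: int, balance_after: int) -> bool:
    """Exploit: assumed here to mean the contract balance was drained to zero."""
    return balance_before > 0 and balance_after == 0
```

For example, an agent that reports `{"reentrancy", "unchecked-return"}` against a ground truth of `{"reentrancy", "oracle-manipulation"}` gets a recall of 0.5: extra findings do not raise the score, and a missed flaw lowers it. The Patch rule shows why that mode is harder than it looks, since a patch that removes the bug but breaks the functional tests counts as a failure.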

Crucially, the benchmark includes scenarios from the Tempo blockchain, a new Layer-1 network developed by Stripe and Paradigm focused on high-throughput stablecoin payments. The inclusion of Tempo-specific challenges indicates that OpenAI is not just looking at legacy Ethereum code but is actively testing against next-generation infrastructure where agentic payments are expected to proliferate.

Case Study: The Unassisted Flash Loan Attack

Perhaps the most alarming anecdote from the accompanying research paper involves a specific test case where an agent powered by GPT-5.2 (an intermediate version) executed a complex "flash loan" attack.

Flash loan attacks are sophisticated financial exploits that require borrowing a massive amount of capital, using it to manipulate market prices or protocol logic, and repaying the loan within a single transaction. If the repayment fails, the entire transaction reverts as though it never happened. They are typically the domain of elite human hackers due to the precise sequencing required.

In the EVMbench test, the AI agent:

  1. Identified an arbitrage opportunity created by a logic flaw.
  2. Programmatically requested a flash loan.
  3. Executed the exploit sequence to drain the vault.
  4. Repaid the loan to finalize the transaction.

It achieved this without human guidance, step-by-step instructions, or prior examples of this specific contract's architecture. This capability signals that autonomous agents are moving beyond simple pattern matching into multi-step strategic reasoning, a development that poses existential risks to poorly audited decentralized finance (DeFi) protocols.
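The four steps listed above can be sketched as a toy simulation. This is not the exploit described in the research paper, whose details are not given here: the vault's flaw (a bonus paid on every deposit, withdrawable at once), the fee and bonus rates, and every name in the code are hypothetical. What the sketch does show is the shape of the attack: borrow, exploit, and repay inside one atomic unit, with any failure rolling everything back.

```python
import copy
from dataclasses import dataclass

FEE_BPS = 10     # hypothetical flash-loan fee: 0.10% of the borrowed amount
BONUS_BPS = 100  # hypothetical flaw: the vault pays a 1% bonus on any deposit


class Revert(Exception):
    """Aborts the transaction, loosely modelling an EVM revert."""


@dataclass
class World:
    lender: int    # liquidity held by the flash-loan pool
    vault: int     # reserves held by the flawed vault
    attacker: int  # the attacker's own balance


def draining_loan(w: World) -> int:
    """Step 1: size the loan so that the bonus equals the vault's reserves."""
    return -((-w.vault * 10_000) // BONUS_BPS)  # ceiling division


def attack(w: World, loan: int) -> None:
    fee = loan * FEE_BPS // 10_000
    # Step 2: request the flash loan.
    if loan > w.lender:
        raise Revert("lender lacks liquidity")
    w.lender -= loan
    w.attacker += loan
    # Step 3: deposit and withdraw at once; the bonus comes out of reserves.
    bonus = min(loan * BONUS_BPS // 10_000, w.vault)
    w.vault -= bonus
    w.attacker += bonus
    # Step 4: repay principal plus fee, or abort everything.
    if w.attacker < loan + fee:
        raise Revert("cannot repay loan plus fee")
    w.attacker -= loan + fee
    w.lender += loan + fee


def run_transaction(w: World, fn, *args) -> World:
    """Apply fn atomically: on Revert, return the untouched original state."""
    trial = copy.deepcopy(w)
    try:
        fn(trial, *args)
    except Revert:
        return w
    return trial
```

Starting from `World(lender=1_000_000, vault=5_000, attacker=0)`, `draining_loan` returns 500,000, and running `attack` with that loan ends at `World(lender=1_000_500, vault=0, attacker=4_500)`: the vault is empty, the lender has earned its fee, and the attacker keeps 4,500 despite starting with nothing. If the vault held only 100, the bonus would not cover the 500 fee, the repayment check would fail, and `run_transaction` would return the original state unchanged. That all-or-nothing property is why the attacker risks no capital, and why the sequencing has to be exactly right.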

OpenAI’s Strategic Pivot: Democratizing Defense

Recognizing the potential for these tools to be weaponized, OpenAI is framing the release of EVMbench and GPT-5.3-Codex as a "defensive imperative." The logic is that by placing these powerful offensive tools in the hands of "white hat" security researchers, vulnerabilities can be found and fixed before malicious actors exploit them.

To support this defensive ecosystem, OpenAI announced the Cybersecurity Grant Program, pledging $10 million in API credits to developers and researchers working on open-source defense tools. The goal is to lower the barrier to entry for automated auditing, allowing even small projects to access state-of-the-art security checks.

Furthermore, the company is expanding the private beta of Aardvark, a dedicated security research agent. Unlike the general-purpose Codex models, Aardvark is trained specifically on security literature, audit reports, and formal verification methods. Early internal tests suggest that Aardvark may help close the gap between offense and defense, using the "attacker mindset" of GPT-5.3-Codex to predict exploits and proactively suggest patches.

Industry Implications and the Road Ahead

The release of EVMbench comes at a pivotal moment for the crypto industry, following a series of high-profile exploits, including the recent $2.7 million loss in the Moonwell protocol due to a bug in AI-generated code. The industry is currently grappling with a double-edged sword: AI is increasingly used to write smart contracts, often introducing subtle bugs, while simultaneously being the only tool scalable enough to audit the exploding volume of blockchain code.

Paradigm’s involvement suggests that major institutional players view AI security not as a luxury but as a prerequisite for the mass adoption of stablecoins and decentralized financial rails. If AI agents are to handle autonomous payments on networks like Tempo, they must be resilient against adversarial AI trying to rob them.

Experts warn that the "72% exploit rate" is likely a floor, not a ceiling. As models continue to scale and utilize techniques like "Chain-of-Thought" reasoning during inference, their ability to find obscure "black swan" vulnerabilities will likely increase.

For now, the message to smart contract developers is clear: The AI that helps you write your code is also capable of robbing you. Until defensive capabilities catch up, the only safe path is rigorous, human-led auditing, augmented—but not replaced—by the very AI tools that threaten the system.

