What happens when deploying an AI agent in the Web3 space and beyond is as easy as copying your neighbor’s homework, changing only the name and the opening line of the essay?
It seems we’re just beginning to uncover the real implications of that scenario, in real time.
Make no mistake: ease of deployment is a major improvement over where we were just a year or two ago. No-code deployments will become the norm as every burgeoning professional learns to configure an agent to suit their needs – or as seasoned veterans add to their repertoire a swarm of agents that performs in moments tasks that used to take them hours (or find themselves downsized).
I can barely write a single line of code. I literally vibe coded this website into existence before hiring someone to make it look much better than I ever could myself. How long until I can simply deploy an agent trained on thousands of hours of WordPress, plugin, and Elementor tutorials to build another site for me from scratch, for merely the fee of deploying the agent?
The Present Future
Realistically, that future is now. As of May 8th, CoinGecko shows 428 results for AI Agent tokens with a total market cap of $6.45B. Remember that some of those tokens are for agent deployment platforms like Virtuals.
Take a look at Virtuals: at least 741 agents deployed since its launch. Eliza boasts 3,200+ forks of its GitHub repository. Bearing in mind that the vast majority of those agents will never accomplish a single useful thing aside from making some people in a basement chuckle until the next one comes along, it’s still a testament to the possibilities we’ve opened up.
The side effect of this progress was inevitable. In fact, it mirrors the side effect of the memecoin proliferation across chains like Solana and Base: saturation of copycat agents with copycat tokens that do nothing of substance and enrich very few lives, despite their obtuse promises to the contrary. You could chalk it up as just another crypto problem, and I’m sure many normies do – they wouldn’t be far enough off the mark to need correcting – but they also have no sense of humor about these things.
The reality is that there are some genuinely useful agents whose capabilities we can’t yet see through the fog on the windshield kicked up by everyone else’s yammering. The problem? With quality standards this unreliable, we have no way of knowing which ones are actually adoptable at scale.
Real-World Problems With AI Agent Choice
Imagine running an SME. You’re busy and don’t have the time to learn to train an agent, don’t have the monetary resources to hire someone to do it, and your staff are hardly equipped to self-educate quickly enough to put you on a trajectory past your competition with AI automation.
You may resort to minimal DD and deploy a pre-packaged agent from a one-click-deploy platform based on the promises on its profile page, only to find that it does nothing but mine Monero and book calendar entries in the wrong time zone. The kicker is that you won’t find out about the Monero mining until much later.
Now imagine the same at a mid-size business. A corporation that stores your personal data. An enterprise that runs validator nodes on your network.
A Lack of Trust in AI Agent Quality Standards
The point being: there’s no way to verify the quality of these agents. I wouldn’t trust the likes of Certik to verify it. Who knows what horrible security threats will be left in place after each update, quietly compromising your business’s platform?
AI agent quality verification is what I talked about on Swept Podcast #9 with Michael Sena, a former Consensys guy whose new project, Recall Network, aims to offer an arena where AI agents and their developers can drop the claims and simply prove their capabilities in competition.
Recall’s first competition was live when we recorded the podcast: agents traded on-chain, and you could watch their performance in real time against a single objective – finish with the highest PnL of all competitors. A basic measure of success by Sena’s own admission, but a simple way to kick off what could be a very long run of competitions that pay out real rewards to the winners.
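To make that objective concrete, here’s a minimal sketch of what a PnL leaderboard boils down to. The data shapes and names are my own assumptions for illustration, not Recall’s actual API:

```typescript
// Toy sketch of a PnL leaderboard. Assumes each agent has a starting
// and ending portfolio value attributed to it over the competition window.
// These interfaces and names are hypothetical, not Recall's real schema.

interface AgentResult {
  agentId: string;
  startingValueUsd: number;
  endingValueUsd: number;
}

// PnL expressed as a simple percentage return.
function pnlPercent(r: AgentResult): number {
  return ((r.endingValueUsd - r.startingValueUsd) / r.startingValueUsd) * 100;
}

// Rank agents by PnL, highest first -- the single objective of the
// first competition described above.
function leaderboard(results: AgentResult[]): AgentResult[] {
  return [...results].sort((a, b) => pnlPercent(b) - pnlPercent(a));
}

const demo: AgentResult[] = [
  { agentId: "agent-alpha", startingValueUsd: 10_000, endingValueUsd: 11_250 },
  { agentId: "agent-beta", startingValueUsd: 10_000, endingValueUsd: 9_400 },
];

for (const r of leaderboard(demo)) {
  console.log(`${r.agentId}: ${pnlPercent(r).toFixed(2)}%`);
}
```

The interesting part isn’t the arithmetic, of course – it’s attributing trades to agents on-chain so the numbers can’t be fudged.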
Under this model, Sena expects that end users of AI agents will be able to set realistic benchmarks for what agents can reliably do for them. He told me, “Not only will we see agents excel at skills that people actually need, we’ll see expectations of performance and verifiability.”
“Now, large LLM creators are trying to optimize for foundational model benchmarks, but there’s really no equivalent benchmarks for AI agents. So, when we create these protocols for evaluation and benchmarking… it creates a feedback loop of ensuring that an agent does what its developers say [it can do].”
Recall’s website claims that “The agent economy is the fastest growing market in the world with 50B agents expected online by 2030.” However, I couldn’t find any data to support that claim. With or without huge, mind-blowing estimates, it’s clear to the naked eye that the adoption rate of AI agents is exploding.

AI Agent Verification Market Primed
In light of that adoption trendline, and with open-market forces wholly steering the development of AI agents at least within the Web3 space, it’s clear that some form of verification is needed beyond a centralized source of truth. This is why I believe a new AI agent verification market is primed for launch.
Sena also said that, in the long term, Recall would merely fit into the AI agent verification market rather than corner it entirely, because that market could become one of maybe three Web3-born monoliths to reach the mainstream – the first being Bitcoin and the second being NFTs.
We know we can’t trust Certik and its ilk to keep the industry honest. We know we can’t trust private companies like OpenAI, which, despite its impressive tech, has butterfingers when it comes to real-world performance. It’s too easy to poke holes in Zuckerberg’s shit when it comes to claims about Llama’s performance – or any AI agent for that matter, especially open-source ones like DeepSeek, which, although extremely impressive, has severe limitations of its own.
This isn’t to say that closed-source models should be eliminated. Companies should be free to chase a profit on the models they spend time and money building. But false advertising is just that.
The Makeup of an AI Agent Verification Market
Right now, it may be relatively easy to pull the wool over our eyes, but with a verification market, that would not be the case.
The Recall Model
Recall’s model of pitting agents against each other in open competition is intriguing. I could see several more of these popping up over the next two years, depending on the success of Recall (and its impending token launch). The problem with Recall’s model is that developers will likely build agents specifically to win the cash prize of each competition – a perverse incentive to build to the test rather than for the user.
Another problem with Recall’s model is that it isn’t scalable and won’t attract well-funded teams. There is no plan to court enterprises, which means there’s no way to reach those enterprises’ partners either – partnerships that would presumably bring a massive amount of legitimacy to the Recall model.
Sena insinuated that these competitions would primarily appeal to degens, whether or not they can code and deploy agents on their own. I don’t need to dig into statistics to know that’s a very small cohort of the population. If this is Recall’s target in its search for PMF, it’s a focused target, but a minute one; the floor may be far too close to the ceiling for any sort of comfort.
“Can we make AI agent competitions like fantasy sports for degen quants?”
It’s unlikely that a tech lab with $100M+ in investment will chase a $50K prize in a competition that demands the source code be stored and shared publicly, but stranger things have happened. This is a model that appeals to the individual cracked devs and bootstrappers out there trying to make a splash. They should be served in this way and given the clout they deserve, but the ROI on this model isn’t great for funded firms yet.
A ZK Model
Another model could utilize ZK technology. Unironically, an AI agent swarm could be developed that consumes on-chain data via a network that shields source data behind ZK proofs. The swarm would produce performance results based on objective measures, queried against shielded data and proven by ZK tech.
This arrangement would see the firm that develops such a swarm deploy it and act as a verifiable source of truth for its results. Essentially, the swarm does most of the work, and the people behind the business dress up the results and package them for public or private sharing. Hopefully public, given the context.
I recently read about a network called Space and Time (SXT Chain) that may be able to accommodate such an arrangement. SXT Chain claims to source data across multiple chains rather quickly without revealing data tables to its validators, since the validators merely verify cryptographic fingerprints of each table.
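To make the fingerprint idea concrete, here’s a toy sketch of committing to a table with a plain hash. SXT Chain’s real scheme uses specialized cryptographic commitments rather than SHA-256, so treat this strictly as an illustration of the concept:

```typescript
import { createHash } from "node:crypto";

// Toy illustration of a table "fingerprint": commit to a data table with
// a hash so a validator can check integrity without ever holding the rows.
// This is NOT SXT Chain's actual commitment scheme, just the core idea.

function rowHash(row: string[]): string {
  return createHash("sha256").update(row.join("\u0000")).digest("hex");
}

// Fingerprint the whole table by hashing the ordered row hashes.
function tableFingerprint(rows: string[][]): string {
  const h = createHash("sha256");
  for (const row of rows) h.update(rowHash(row));
  return h.digest("hex");
}

// A validator stores only the fingerprint. Anyone later claiming to hold
// the table must reproduce the exact same digest.
const committed = tableFingerprint([
  ["model-a", "benchmark-1", "passed"],
  ["model-b", "benchmark-1", "failed"],
]);
console.log(`committed fingerprint: ${committed}`);
```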
In a grand battle royale, OpenAI, DeepSeek (High-Flyer Capital), Meta, and Google could feed the specs of their respective best models through SXT Chain or something like it. The AI agent swarm described above would query that data against standard measurements, and SXT Chain could return simple, data-backed yes/no responses without revealing the precious source code.
The swarm could then tabulate the measurements and produce results, determining once and for all – or at least for that week – which AI model is truly the best.
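Assuming a hypothetical ProofClient interface and made-up benchmark predicates (SXT Chain’s real interfaces may look nothing like this), the swarm’s scoring loop might reduce to something like:

```typescript
// Hypothetical sketch of the swarm's scoring loop. Everything here --
// the ProofClient interface, method names, and benchmark predicates --
// is an assumption for illustration, not SXT Chain's actual SDK.

interface ProvenAnswer {
  answer: boolean;   // yes/no response to a benchmark predicate
  proof: Uint8Array; // ZK proof that the answer matches the hidden data
}

interface ProofClient {
  // Ask a yes/no question about shielded model/agent data.
  query(modelId: string, predicate: string): Promise<ProvenAnswer>;
  // Check the proof against the table's public fingerprint.
  verify(answer: ProvenAnswer, fingerprint: string): boolean;
}

// Standard measurements the swarm runs against every model (made up).
const BENCHMARKS = [
  "latency_p95_ms <= 500",
  "tool_call_success_rate >= 0.99",
  "context_window_tokens >= 128000",
];

async function scoreModel(
  client: ProofClient,
  modelId: string,
  fingerprint: string,
): Promise<number> {
  let passed = 0;
  for (const predicate of BENCHMARKS) {
    const result = await client.query(modelId, predicate);
    // Only count answers whose proofs actually verify; an unproved
    // "yes" is exactly the kind of claim this market exists to reject.
    if (client.verify(result, fingerprint) && result.answer) passed++;
  }
  return passed / BENCHMARKS.length;
}
```

The key property lives in that verify check: an answer only counts if its proof holds up, so nobody has to take anyone’s word for anything.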
The beauty of this model is that it could be offered in packages at varying price tiers appealing to both low-end and high-end businesses. SMEs may only need to run the swarm once or twice as a final round of DD on their choice, whereas big retailers and shipping conglomerates, as a simple example, may need to run scans on various externally and internally built agents deployed across divisions of the company.
Conclusion
What will the landscape of AI agent verification look like in one, three, and five years from now? What is being built today that will cement the trendline in one way or another? Nobody knows for sure. That’s the nature of the thing.
I have a strong feeling, though, that there will be a robust market for AI agent verification. I fear that the early versions of it will be proprietary. If Virtuals and Eliza each release their own verification tools – reduced to a green or red dot next to an agent – it would be a failure on the part of the builders and of the end users who believe it.
That would be like trusting the grizzled, stubbly carnival ride operator when he tells you the ride is totally safe. Neither Virtuals nor Eliza is disreputable in and of itself, but they can’t be trusted to provide an unbiased take on the capabilities of agents deployed on their own platforms. They need more forks and deployments. That need drives their reason to exist and, more importantly, their bottom line.
Yet again, we are reminded of the Web3 maxim: don’t trust, verify.
The firms that turn agent-verification-as-a-service into a real category will thrive as small-scale users fork existing models and enterprises require extensive DD before deploying their home-grown agents. Just think of the possibilities if ZK tech can scale to that level…