Researchers reveal flaws in AI agent benchmarking


As agents using artificial intelligence have worked their way into the mainstream for everything from customer service to fixing software code, it's increasingly important to determine which ones are best for a given application. It also matters which criteria, besides functionality, to weigh when selecting an agent. That's where benchmarking comes in.

Benchmarks don't reflect real-world applications

However, a new research paper, AI Agents That Matter, points out that current agent evaluation and benchmarking processes have a number of shortcomings that limit their usefulness in real-world applications. The authors, five Princeton University researchers, note that these shortcomings encourage the development of agents that do well on benchmarks but not in practice, and they propose ways to address them.

"The North Star of this field is to build assistants like Siri or Alexa and get them to actually work: handle complex tasks, accurately interpret users' requests, and perform reliably," said a blog post about the paper by two of its authors, Sayash Kapoor and Arvind Narayanan. "But this is far from a reality, and even the research direction is fairly new."
