Blogroll

ADeLe: Predicting and explaining AI performance across tasks

Microsoft Research - Wed, 04/01/2026 - 18:00
At a glance
  • AI benchmarks report performance on specific tasks but provide limited insight into underlying capabilities; ADeLe evaluates models by scoring both tasks and models across 18 core abilities, enabling direct comparison between task demands and model capabilities.
  • Using these ability scores, the method predicts performance on new tasks with ~88% accuracy, including for models such as GPT-4o and Llama-3.1.
  • It builds ability profiles and identifies where models are likely to succeed or fail, highlighting strengths and limitations across tasks.
  • By linking outcomes to task demands, ADeLe explains differences in performance, showing how it changes as task complexity increases.

AI benchmarks report how large language models (LLMs) perform on specific tasks but provide little insight into the underlying capabilities that drive that performance. They do not explain failures or reliably predict outcomes on new tasks. To address this, Microsoft researchers, in collaboration with Princeton University and Universitat Politècnica de València, introduce ADeLe (AI Evaluation with Demand Levels). It is a method that characterizes both models and tasks using a broad set of capabilities, such as reasoning and domain knowledge, so that performance on new tasks can be predicted and linked to a model's specific strengths and weaknesses.

In a paper published in Nature, “General Scales Unlock AI Evaluation with Explanatory and Predictive Power,” the team describes how ADeLe moves beyond aggregate benchmark scores. Rather than treating evaluation as a collection of isolated tests, it represents both benchmarks and LLMs using the same set of capability scores. These scores can then be used to estimate how a model will perform on tasks it has not encountered before. The research was supported by Microsoft’s Accelerating Foundation Models Research (AFMR) grant program.

ADeLe-based evaluation

ADeLe scores tasks across 18 core abilities, such as attention, reasoning, and domain knowledge. For each ability, it assigns the task a value from 0 to 5 based on how much the task requires that ability. For example, a basic arithmetic problem might score low on quantitative reasoning, while an Olympiad-level proof would score much higher.

Evaluating a model across many such tasks produces an ability profile: a structured view of where the model performs well and where it breaks down. Comparing this profile to the demands of a new task makes it possible to identify the specific gaps that lead to failure. The process is illustrated in Figure 1.
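To make that comparison concrete, here is a minimal Python sketch, assuming both the model's ability profile and the task's demand profile are stored as dictionaries keyed by ability name. The ability names and numbers are hypothetical, and the simple "demand exceeds ability" rule is an illustration, not ADeLe's published procedure.

```python
from typing import Dict, List


def find_gaps(ability_profile: Dict[str, float],
              task_demands: Dict[str, float]) -> List[str]:
    """Return the abilities whose demand level (0-5) exceeds the model's ability score."""
    gaps = []
    for ability, demand in task_demands.items():
        # An ability missing from the profile is treated as 0 (no demonstrated ability).
        if demand > ability_profile.get(ability, 0.0):
            gaps.append(ability)
    return sorted(gaps)


# Hypothetical profile for one model and demands for one task (3 of the 18 abilities).
model_profile = {"quantitative_reasoning": 3.2, "domain_knowledge": 2.5, "attention": 4.1}
task_profile = {"quantitative_reasoning": 4.0, "domain_knowledge": 1.0, "attention": 2.0}

print(find_gaps(model_profile, task_profile))  # ['quantitative_reasoning']
```

Here the task's high quantitative-reasoning demand is the only ability that outstrips the model, so that is the gap flagged as the likely cause of failure.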

Figure 1. Top: (1) Model performance on the ADeLe benchmark and (2) the resulting ability profiles, showing each model’s strengths and limitations across core abilities. Bottom: (1) Application of 18 scoring criteria to each task and (2) the resulting task profiles, showing the abilities each task requires.

Evaluating ADeLe

Using ADeLe, the team evaluated a range of AI benchmarks and model behaviors to understand what current evaluations capture and what they miss. The results show that many widely used benchmarks provide an incomplete and sometimes misleading picture of model capabilities and that a more structured approach can clarify those gaps and help predict how models will behave in new settings.

ADeLe shows that many benchmarks do not isolate the abilities they are intended to measure or only cover a limited range of difficulty levels. For example, a test designed to evaluate logical reasoning may also depend heavily on specialized knowledge or metacognition. Others focus on a narrow range of difficulty, omitting both simpler and more complex cases. By scoring tasks based on the abilities they require, ADeLe makes these mismatches visible and provides a way to diagnose existing benchmarks and design better ones.
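As a rough illustration of this kind of diagnosis, the Python sketch below takes a benchmark's per-task demand annotations (hypothetical ability names and levels on the 0 to 5 scale) and reports which levels of the target ability are covered, which are missing, and which other abilities carry substantial demand on average. The `spill_threshold` cutoff of 2.0 is an arbitrary assumption, not a value from the paper.

```python
from statistics import mean
from typing import Dict, List


def diagnose_benchmark(tasks: List[Dict[str, int]], target: str,
                       spill_threshold: float = 2.0) -> dict:
    """Summarize which levels of `target` a benchmark covers and what else it demands."""
    target_levels = {t.get(target, 0) for t in tasks}
    other_abilities = sorted({a for t in tasks for a in t if a != target})
    # Average demand on each non-target ability across all tasks in the benchmark.
    spillover = {a: mean(t.get(a, 0) for t in tasks) for a in other_abilities}
    return {
        "covered_levels": sorted(target_levels),
        "missing_levels": [lv for lv in range(6) if lv not in target_levels],
        "confounds": [a for a, m in spillover.items() if m >= spill_threshold],
    }


# Hypothetical annotations for a three-task "logical reasoning" benchmark.
benchmark = [
    {"logical_reasoning": 2, "domain_knowledge": 3, "metacognition": 1},
    {"logical_reasoning": 3, "domain_knowledge": 4, "metacognition": 2},
    {"logical_reasoning": 3, "domain_knowledge": 2, "metacognition": 1},
]
report = diagnose_benchmark(benchmark, "logical_reasoning")
print(report)
# {'covered_levels': [2, 3], 'missing_levels': [0, 1, 4, 5], 'confounds': ['domain_knowledge']}
```

In this toy case the "reasoning" benchmark only spans levels 2 and 3 and leans heavily on domain knowledge, the two kinds of mismatch described above.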

Applying this framework to 15 LLMs, the team constructed ability profiles using 0–5 scores for each of 18 abilities. For each ability, the team measured how performance changes with task difficulty and used the difficulty level at which the model has a 50% chance of success as its ability score. Figure 2 illustrates these results as radial plots that show where the model performs well and where it breaks down.
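The ability-score definition above can be sketched as fitting a logistic curve of success against demand level and solving for the 50% point. This Python version assumes binary per-task outcomes and plain gradient ascent on the log-likelihood; the paper's actual curve-fitting procedure may differ, and the data below are made up.

```python
import math
from typing import List


def ability_score(levels: List[float], outcomes: List[int],
                  lr: float = 0.5, steps: int = 5000) -> float:
    """Demand level at which a fitted logistic curve gives P(success) = 0.5.

    Fits P(success | level) = sigmoid(a + b * (level - center)) by batch
    gradient ascent on the log-likelihood. Centering the levels keeps the
    optimization well conditioned. If outcomes are perfectly separable by
    level, the slope keeps growing and the result depends on `steps`.
    """
    n = len(levels)
    center = sum(levels) / n
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(levels, outcomes):
            z = x - center
            p = 1.0 / (1.0 + math.exp(-(a + b * z)))
            grad_a += y - p
            grad_b += (y - p) * z
        a += lr * grad_a / n
        b += lr * grad_b / n
    if b == 0.0:
        raise ValueError("success rate does not vary with demand level")
    # sigmoid(a + b*z) = 0.5 exactly where a + b*z = 0, i.e. z = -a/b.
    return center - a / b


# Hypothetical results on one ability: demand level of each task, 1 = success.
levels = [0, 1, 1, 2, 2, 3, 3, 4, 4, 5]
outcomes = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
print(round(ability_score(levels, outcomes), 2))  # 2.5
```

Repeating this per ability yields one 0 to 5 score for each of the 18 abilities, which is what the radial plots summarize.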

Figure 2. Ability profiles for 15 LLMs across 18 abilities. Left: OpenAI models. Middle: Llama models. Right: DeepSeek-R1 distilled models.

This analysis shows that models differ in their strengths and weaknesses across abilities. Newer models generally outperform older ones, but not consistently across all abilities. Performance on knowledge-heavy tasks depends strongly on model size and training, while reasoning-oriented models show clear gains on tasks requiring logic, learning, abstraction, and social inference. These patterns typically require multiple, separate analyses across different benchmarks and can still produce conflicting conclusions when task demands are not carefully controlled. ADeLe surfaces them within a single framework.

ADeLe also enables prediction. By comparing a model’s ability profile to the demands of a task, it can forecast whether the model will succeed, even on tasks that are unfamiliar. In experiments, this approach achieved approximately 88% accuracy for models like GPT-4o and Llama-3.1-405B, outperforming traditional methods. This makes it possible to both explain and anticipate potential failures before deployment, improving the reliability and predictability of AI model assessment.
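The paper's predictor is not reproduced here. As a deliberately simple stand-in, the Python sketch below treats each demanded ability as an independent logistic "hurdle", multiplies the per-ability pass probabilities, and predicts success when the product is at least 0.5. The slope value, ability names, and numbers are all hypothetical.

```python
import math
from typing import Dict


def success_probability(ability_profile: Dict[str, float],
                        task_demands: Dict[str, float],
                        slope: float = 2.0) -> float:
    """Naive estimate: product of per-ability logistic pass probabilities.

    Each ability contributes sigmoid(slope * (ability - demand)), which is 0.5
    when ability equals demand and approaches 1 as ability exceeds demand.
    """
    prob = 1.0
    for ability, demand in task_demands.items():
        margin = ability_profile.get(ability, 0.0) - demand
        prob *= 1.0 / (1.0 + math.exp(-slope * margin))
    return prob


def will_succeed(ability_profile: Dict[str, float],
                 task_demands: Dict[str, float],
                 threshold: float = 0.5) -> bool:
    """Binary forecast: success if the estimated probability reaches the threshold."""
    return success_probability(ability_profile, task_demands) >= threshold


# Hypothetical model profile and two hypothetical tasks.
model_profile = {"quantitative_reasoning": 3.2, "domain_knowledge": 2.5, "attention": 4.1}
hard_task = {"quantitative_reasoning": 4.0, "domain_knowledge": 1.0, "attention": 2.0}
easy_task = {"quantitative_reasoning": 1.0, "attention": 1.0}

print(will_succeed(model_profile, hard_task))  # False
print(will_succeed(model_profile, easy_task))  # True
```

Multiplying independent hurdles is only one design choice; it slightly penalizes tasks that list many low-demand abilities, and a learned predictor trained on demand and ability scores would avoid hand-picking the slope.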

Whether AI systems can truly reason is a central debate in the field. Some studies report strong reasoning performance, while others show that these systems break down at scale. These conflicting results reflect differences in task difficulty. ADeLe shows that benchmarks labeled as measuring “reasoning” vary in what they require, from basic problem-solving to tasks that combine advanced logic, abstraction, and domain knowledge. The same model can score above 90% on lower-demand tests and below 15% on more demanding ones, reflecting differences in task requirements rather than a change in capability.

Reasoning-oriented models like OpenAI’s o1 and GPT-5 show measurable gains over standard models, not only in logic and mathematics but also in interpreting user intent. However, performance declines as task demands increase. AI systems can reason, but only up to a point, and ADeLe identifies where that point is for each model.

Looking ahead

ADeLe is designed to evolve alongside advances in AI and can be extended to multimodal and embodied AI systems. It also has the potential to serve as a standardized framework for AI research, policymaking, and security auditing.

More broadly, it advances a more systematic approach to AI evaluation—one that explains system behavior and predicts performance. This work builds on earlier efforts, including Microsoft research on applying psychometrics to AI evaluation and recent work on Societal AI, emphasizing the importance of AI evaluation.

As general-purpose AI systems continue to outpace existing evaluation methods, approaches like ADeLe offer a path toward more rigorous and transparent assessment in real-world use. The research team is working to expand this effort with the broader research community. Additional experiments, benchmark annotations, and resources are available on GitHub.


The post ADeLe: Predicting and explaining AI performance across tasks appeared first on Microsoft Research.

Categories: Microsoft

Gmail's new Inbox Zero mode is now available, but only if you pay up

How-To Geek - Wed, 04/01/2026 - 18:00

Inbox Zero sounds like a dream for many, but in reality, it’s a daily challenge to wade through newsletters, bills, and important emails. Google is now taking aim at this frustration with a feature called AI Inbox, announced in January as part of the Gemini-powered updates to Gmail.

Categories: IT General, Technology

Your phone charger is slowly dying while plugged in—here's why

How-To Geek - Wed, 04/01/2026 - 18:00

People fall into two distinct camps when it comes to phone chargers: those who religiously unplug them when they're done, and the rest of us who leave them plugged in indefinitely. If you belong to the latter camp, allow me to offer a few compelling reasons why unplugging your charger is the smarter move.

Categories: IT General, Technology

Update your iPhone now: DarkSword exploits just got patched on iOS 18

How-To Geek - Wed, 04/01/2026 - 17:47

Apple is rolling out a new patch for old iOS versions to protect users from the widespread hacking exploit known as DarkSword. Apple is releasing this new update for iOS 18 users today—anyone on the latest iOS 26 release is already safe.

Categories: IT General, Technology

Spend $50 at Amazon on Easter candy, toys, and games to get $10 off

Mashable - Wed, 04/01/2026 - 17:46

SPEND $50 TO SAVE $10: Amazon's "Fill your Easter basket" event takes $10 off a purchase of $50 or more on select Easter candy, toys, and games.

Eligible Amazon "Fill your Easter basket" items at a glance:
  • Lindt Gold Bunny: $4.99 at Amazon
  • Hello Kitty Heart Pail: $11.99 at Amazon (save $8)
  • Crayola Scribble Scrubbie Jumbo Pet: $12.29 at Amazon (save $3.60)

Easter is coming up quickly. If you don't have Easter basket goodies sorted out yet, Amazon is here to save the day. Spend $50 on select Easter candy, toys, and games at Amazon to save $10 on your purchase. Maybe best of all, plenty of these items will arrive with overnight shipping, depending on your location. Since Easter is just four days away, this quick shipping is a major benefit.

The Amazon Easter Basket event includes great candy options from Lindt, Reese's, Sour Patch Kids, Nerds, and more. For non-sweet treats, snag fresh spring art supplies from Crayola. A few Squishmallows and Gund stuffed animals are included in the deal, too. Games include the perennial favorite Bananagrams, and even Catan is part of the sale, at $43.99 instead of its list price of $54.99.

It's best to hop (sorry) on this offer quickly since Amazon doesn't list a time or date when the deal ends. Browse the entire list of eligible items to pick your favorites for this year's Easter baskets.

Categories: IT General, Technology

Please stop giving ChatGPT your passwords, documents, and private photos

How-To Geek - Wed, 04/01/2026 - 17:45

AI chatbots such as ChatGPT, Claude, and Gemini allow you to upload files and photos or paste large chunks of text. You can then do things such as summarizing documents or turning your selfies into Studio Ghibli-style images. The danger is that you can end up uploading personal information that you really shouldn't be sharing.

Categories: IT General, Technology

Plex Media Server just ended support for old Windows systems

How-To Geek - Wed, 04/01/2026 - 17:32

Plex Media Server, the software known for turning any PC or NAS into a personal streaming service, is officially ending support and updates for 32-bit versions of Windows. The move follows Microsoft's end of support for Windows 10 in October 2025.

Categories: IT General, Technology

KitKat heist tracker lets candy lovers check if their KitKat is from the heist

Mashable - Wed, 04/01/2026 - 17:32

The problem with announcing any kind of news on April 1 is that absolutely nobody will believe you.

Case in point: On Wednesday morning, KitKat announced that customers could use a special online tracking tool to figure out if their purchased confectionery goods were part of the massive 12-ton KitKat heist that's gotten the internet's attention over the past few days.

The KitKat heist tracker was advertised on the official KitKat X account, and whoever runs the account is ardently insisting, both in the original post and in the replies, that this is real and not an April Fools' joke.

SEE ALSO: A 12-ton KitKat heist is breaking the internet

Taking a look at the tracker itself, it's hard to separate fiction from reality. It appears to be a pretty straightforward tracker with a text field for the 8-digit batch code on the back of each KitKat package. I don't personally have any KitKats on hand to test it, but I typed in a random 8-digit number and was told that it wasn't part of the stolen batch.

So, at the very least, the tracker is actually checking for something. It's just impossible to say what would happen if you happened to type in a "correct" batch code.

Whether or not the tracker is a hoax, the heist was very real. More than 400,000 KitKat bars were stolen from a delivery truck going between Italy and Poland, prompting plenty of The Fast and the Furious memes (and some genuine concerns for the public supply of KitKats ahead of the Easter holiday).

For what it's worth, KitKat maker Nestlé says there is no threat to the chocolatey supply chain at this time.

Categories: IT General, Technology

The Linux backup tool nobody talks about—and why it beats every official sync app

How-To Geek - Wed, 04/01/2026 - 17:31

Typically, when you're uploading a file to the cloud, you need to open your browser, log into your account, navigate to the right folder, click the upload button, locate your files or folders in the file picker window, and then hit upload. What if you could just type one line into a terminal, hit enter, and be done? The open-source tool rclone lets you do just that.

Categories: IT General, Technology

The Hisense 55-inch Canvas art TV is down to a new best price ever post-Spring Sale

Mashable - Wed, 04/01/2026 - 17:20

SAVE $400: As of April 1, the Hisense 55-inch Canvas QLED 4K TV is down to only $599.99 at Amazon. That's 40% off its list price and a new best price ever.

Hisense 55-inch S7N Canvas QLED 4K TV: $599.99 at Amazon (list price $999.99, save $400)

The Amazon Big Spring Sale delivered plenty of excellent TV deals, but budget-friendly brand Hisense isn't playing by the rules. While the brand (which is one of our favorites, BTW) did drop prices over the last week, it waited until the sale was officially over to give us its best prices ever on several TV models — including the coveted Canvas art TV.

As of April 1, the Hisense 55-inch Canvas QLED 4K TV is down to just $599.99 at Amazon. That's 40% or $400 off its list price and a new lowest price on record. For those curious, the same TV was $87.98 more during the Spring Sale.

The Canvas TV is an alternative to the popular Samsung The Frame TV for budget-conscious shoppers. Like The Frame, it transforms a basic black box into a stylish piece of artwork that hangs on your wall. Its matte finish allows it to blend seamlessly into a gallery wall with other non-tech wall hangings. Unlike The Frame, it uses Google TV's interface, which Mashable's Miller Kern (a Canvas TV owner) says is "way more intuitive and responsive than Samsung's."

Beyond doubling as artwork, the Hisense Canvas is a QLED TV, so it's noticeably brighter and more saturated than a basic LED TV. It'll look brilliant in any lighting conditions. It also offers a variable refresh rate of up to 144Hz (surprisingly good for gaming), real-time adaptive brightness and color temperature, and an ultra-slim wall mount that sits flush against the wall for a true framed-art look.

Categories: IT General, Technology

The Ryobi vault: 6 discontinued tools and products we want back

How-To Geek - Wed, 04/01/2026 - 17:16

If you're a Ryobi fan, you've probably built up a sizable collection of its brightly colored tools from Home Depot over the years. The company has released hundreds of tools and accessories, mainly under its 18V ONE+ line, and some are better than others. While there's often a good reason a product gets discontinued, here are a few I wish would come back.

Categories: IT General, Technology

I turned my old SATA SSD into an "abuse drive"—and it's the smartest storage hack I use

How-To Geek - Wed, 04/01/2026 - 17:00

Admittedly, SATA SSDs don't have many uses anymore; for many people, they're nearly obsolete. Given the choice, pretty much anyone will pick an NVMe drive, which has offered significantly faster speeds for years.

Categories: IT General, Technology

Apple TV's new #1 comedy show still isn't getting the hype it deserves

How-To Geek - Wed, 04/01/2026 - 16:45

When it comes to the best streaming services out there, one that I’ve always loved to talk about is Apple TV. There’s a lot you can do with an Apple TV subscription, but for me, I’m always checking out the television shows on it. Whether it’s Severance or For All Mankind, there are so many different series to check out.

Categories: IT General, Technology

Why vinyl doesn't actually sound better

How-To Geek - Wed, 04/01/2026 - 16:33

Vinyl sales are near all-time highs, and their popularity has revived a conversation that has raged for decades: why does vinyl sound better than digital? Where is the magic in vinyl that digital lacks? But there is no magic—vinyl isn’t better.

Categories: IT General, Technology

Amazon's Big Spring Sale is done and dusted, but these Apple deals are still live

Mashable - Wed, 04/01/2026 - 16:16

The best Amazon Big Spring Sale Apple deals still live:
  • Best AirPods deal: AirPods Pro 3, $199 (save $50)
  • Best AirTag deal: Apple AirTag (1st Gen, 4-Pack), $59.99 (save $39.01)
  • Best iPad deal: Apple iPad Air, 13-inch (M4, WiFi, 128GB), $749 (save $50)
  • Best MacBook deal: Apple MacBook Air, 13-inch (M4, 16GB RAM, 512GB SSD), $949 (save $250)
  • Best Apple Watch deal: Apple Watch Series 11 (GPS, 42mm), $329 (save $70)

Amazon's Big Spring Sale, in its third consecutive year, dropped an abundance of deals on us this past week. The sale is now officially over (as of March 31), but we're still keeping a watchful eye out to see if any of those deals are still hanging around.

Many of the best Apple deals have vanished; either stock has dried up or prices started creeping back up. But not all of them. Thankfully, a few deals are still holding strong.

Apple has kept us very busy so far in 2026. We've been hard at work testing the first-ever budget Neo MacBook, as well as the upgraded MacBook Airs and MacBook Pros, a new iPad Air, and the iPhone 17e. Next on our Apple to-do list: the AirPods Max 2.

SEE ALSO: Amazon's Spring Sale is almost over — these are the top 10 deals to shop before midnight

With so many new Apple products, we've seen a bunch of first-time discounts during the Amazon Spring Sale, as well as new record-low prices on last-gen models like the Apple Watch SE 2 and M4 MacBook Air. Most of our favorite deals on last-gen products have gone in and out of stock throughout the week at Amazon, but competitors still have them in stock.

If the model you want is no longer available, be sure to check at Best Buy and Walmart. These retailers are matching (or within a few dollars of) many of the best Spring Sale deals and tend to have more transparent pricing.

We're still tracking prices below on AirPods, MacBooks, and iPads post sale. Just a heads-up: if you see a deal crossed out below, it means it has either sold out or the price has gone back up.

Best AirPods deal: Apple AirPods Pro 3, $199 at Amazon and Walmart (list price $249, save $50)

Why we like it

We've always been fans of the AirPods Pro, but the third generation really takes them to new heights. Mashable's reviewer called them, "without a doubt, one of the best products of the year" in 2025.

The latest premium buds feature outstanding noise cancellation (twice as good as the Pro 2), a solid eight hours of battery life with ANC on, and new foam-infused tips that come in five sizes to find the perfect fit. Apple also brought the heart rate monitoring tech from the Powerbeats Pro 2 (our favorite earbuds for working out) and Fitness app compatibility to the new AirPods Pro 3, and introduced a live translation feature.

While we've seen the buds drop as low as $184 in the past, this Big Spring Sale discount is still worth grabbing, and it's still hanging on a day after the sale.

Best AirTag deal: Apple AirTag (1st Gen, 4-Pack), $59.99 at Walmart (list price $99, save $39.01)

Why we like it

This four-pack of Apple's Bluetooth trackers has gone in and out of stock at Amazon over the last week, but luckily, Walmart's stock has stayed consistent. Right now, get a 4-pack of 1st gen AirTags for just $59.99, the lowest price ever on the bundle. You wind up paying just $15 per AirTag. As of April 1, this pack of AirTags is out of stock at Amazon, but still available for $59.99 at Walmart and Best Buy.

Best iPad deal: Apple iPad Air, 13-inch (M4, WiFi, 128GB), $746.50 at Amazon (list price $799, save $52.50)

Why we like it

Just launched last month, the 13-inch M4 iPad Air is already on sale. The base configuration with WiFi connectivity and 128GB of storage is now just $749 (normally $799). Mashable's tech editor took the M4 iPad Air for a spin and found it pretty impressive — too impressive for the average user, even. "It delivers iterative updates that improve an already stellar tablet," he writes in his review.

Best Amazon Big Spring Sale MacBook deal: Apple MacBook Air, 13-inch (M4, 16GB RAM, 512GB SSD), $949 at Best Buy (list price $1,199, save $250)

Why we like it

Apple released its new MacBook Air with the M5 chip last month, but honestly, it's not all that different from the M4 Air. The main differences are the upgraded processor and a new wireless chip. You can save some money by grabbing the M4 model with 16GB RAM and 512GB of storage, which is currently down to just $949 at Best Buy. Amazon had the same deal at the start of the Big Spring Sale, but it has since gone out of stock multiple times and jumped up to $999.

Best Apple Watch deal: Apple Watch Series 11 (GPS, 42mm), $329 at Amazon (list price $399, save $70)

Why we like it

The Apple Watch Series 11 offers significant battery improvements over its predecessor. For that reason alone, it's worth the upgrade. It also features a tougher build with durable glass that's twice as resistant to scratches, 5G capability for quicker connectivity, and a Sleep Score and hypertension tool that can flag chronic high blood pressure. It's not a major upgrade (is anything in 2026?), but as Mashable's reviewer put it: "Buy it for the battery life." Now $70 cheaper, it's an even better value.

Categories: IT General, Technology

Samsung Galaxy has lost its way, and I don't know where to turn

How-To Geek - Wed, 04/01/2026 - 16:15

I've tested and reviewed dozens of smartphones over the years, but my daily driver has remained a Samsung since the Galaxy S4 launched in 2013. Sure, I've run with a Pixel or OnePlus for months at a time, but I always end up returning to Samsung.

Categories: IT General, Technology

Matter 1.5.1 gives your smart cameras an upgrade

How-To Geek - Wed, 04/01/2026 - 16:07

Smart cameras are only as good as how reliably you can access them, and they have remained one of Matter's trickiest challenges. The Connectivity Standards Alliance is addressing this with Matter 1.5.1, an incremental but meaningful update focused on refining device performance and improving flexibility across ecosystems. The latest version targets cameras and video doorbells, which joined the Matter ecosystem last November.

Categories: IT General, Technology

Stop blaming your ISP—6 hidden bottlenecks secretly killing your gigabit speeds

How-To Geek - Wed, 04/01/2026 - 16:00

A gigabit internet plan is more than enough for just about every household. But what do you do if, despite paying a lot of money for an expensive plan, your connection is still disappointing? Investigate, of course.

Categories: IT General, Technology

Krispy Kreme is celebrating NASA's Artemis II mission with a new space-themed doughnut — how to try

Mashable - Wed, 04/01/2026 - 15:54

We know there is an awful lot going on in the world right now, but we shouldn't forget that humans are actually heading back into space. That's a big deal.

To mark the launch of NASA’s Artemis II mission — the first crewed flight to the Moon in over 50 years — Krispy Kreme has dropped a limited-edition doughnut that looks out of this world. It's a classic Original Glazed doughnut that's dipped in blue vanilla-flavored icing and topped with OREO crunch and white nonpareil stars. It's finished with a cookies-and-cream buttercream dollop and a red icing chevron — a sweet nod to the NASA logo.


You can find these limited-edition doughnuts at participating shops nationwide, or you can order them for pickup or delivery through the Krispy Kreme app and website. The Artemis II doughnut is available through April 2, so don't miss out on this chance to celebrate this momentous occasion.

Categories: IT General, Technology

A24's Mother Mary trailer is worth it for the FKA twigs track

Mashable - Wed, 04/01/2026 - 15:50

A24's latest look at David Lowery's upcoming film Mother Mary gives us more of an idea of what to expect when you blend Anne Hathaway and FKA twigs into a pop star dream.

In the trailer, you can hear Hathaway's vocals on the film's second single, "My Mouth Is Lonely For You," written by the iconic English singer-songwriter; the first track, "Burial," was unveiled in March. It's a bit of a moment for pop star-related A24 films right now, with the Mother Mary soundtrack including tracks written not only by FKA twigs but also by Charli xcx and Jack Antonoff. The Mother Mary: Greatest Hits album arrives April 17, just before the film releases, and I'll personally be blasting it loud enough to either help or very much hinder the marketing team's work.

Of course, the visuals mean everything here too, as the trailer shows The Devil Wears Prada 2 headliner as eponymous pop star Mother Mary, who reconnects with her best friend, costume designer Sam Anselm (Michaela Coel), before returning to the stage after a hiatus. Hathaway and Coel lead a formidable cast including Hunter Schafer, Atheena Frizzell, Sian Clifford, Kaia Gerber, Jessica Brown Findlay, Isaura Barbé-Brown, and Alba Baptista.

Mother Mary hits cinemas April 24.

Categories: IT General, Technology