Blogroll

Samsung DeX changed how I buy phones: USB ports and processors matter way more than you think

How-To Geek - 1 hour 11 min ago

I use a Samsung Galaxy phone plugged into a monitor with DeX mode as my primary computer. This means when I shop for a new phone, I'm also shopping for a PC. The details I pay attention to are different from most. If you're interested in this same setup, here's what to look out for.

Categories: IT General, Technology

Your Google smart speakers can now understand more commands

How-To Geek - 1 hour 18 min ago

Smart homes are supposed to make life easier, but you might know that’s not always the case if you’ve yelled at a speaker that misunderstood a simple command. Following last month’s major update to Gemini for Home, Google is rolling out Google Home v4.2, focusing on fixing these small but frustrating experiences, along with some improvements to smart home controls.

Categories: IT General, Technology

10 tools every homelabber should try at least once

How-To Geek - 1 hour 30 min ago

Are you looking for fun (or unique) pieces of software to expand your homelab with? I’ve been on the hunt for new software lately, and found 10 tools that everyone should try at least once. In no particular order, here are tools that have changed (or will change) how I run my homelab.

Categories: IT General, Technology

How to watch the Artemis II launch, the first trip to the Moon in 53 years

How-To Geek - 1 hour 35 min ago

NASA is poised to return to the Moon over 53 years after Apollo 17, and this time you don't need a TV to follow along. The Artemis II mission is scheduled to launch from the Kennedy Space Center on April 1st as soon as 6:24PM ET—here's how to watch as astronauts make history.

Categories: IT General, Technology

The 2027 Kia Seltos just redefined budget luxury SUVs

How-To Geek - 1 hour 40 min ago

Affordable SUVs are no longer content with simply offering basic practicality. As buyer expectations rise, the line between mainstream value and entry-level luxury is becoming increasingly blurred. With its debut today at the New York International Auto Show, the all-new 2027 Kia Seltos makes a strong case that budget-friendly no longer has to mean bare-bones.

Categories: IT General, Technology

AirTags are the best Home Assistant accessory you've overlooked—here's 5 ways I'm using them

How-To Geek - 2 hours 11 min ago

You've likely been sitting on automation potential right in your pocket or attached to your keys. While Apple AirTags started out as a way to find lost luggage or misplaced wallets, they've quietly evolved into the coolest and most versatile Home Assistant accessory out there right now. They give you precise locations, and that data is easy to put to work in a smart home. Location tracking doesn't have to be as simple as we make it. Using just a cheap tag, you can add more features to your home without much effort.

Categories: IT General, Technology

This Subaru SUV hits 60 mph in under 5 seconds—and seats seven

How-To Geek - 2 hours 26 min ago

Subaru just pulled the wraps off its newest SUV, and it’s a pretty big deal for the brand. The all-electric 2027 Getaway is its first three-row model—and also its most powerful yet.

Categories: IT General, Technology

5 PC-building facts that sound like complete nonsense

How-To Geek - 2 hours 41 min ago

Building PCs is hardly anything new, but it's definitely an enthusiast thing, and that can create a lot of myths and misconceptions. You've probably heard myths, such as that liquid coolers are dangerous and can flood your entire PC, or that SSDs are less reliable than HDDs.

Categories: IT General, Technology

8 new shows and movies streaming on HBO Max in April

How-To Geek - 2 hours 56 min ago

I don’t know about you guys, but I’ll be spending a lot of time on HBO Max in April after seeing all the new shows and movies streaming this month.

Categories: IT General, Technology

Kia's compact EV3 electric SUV comes to the US with 320 miles of range

How-To Geek - 2 hours 57 min ago

Kia has introduced the US version of its EV3 crossover, and it's poised to deliver strong range and charging capabilities for a small electric SUV.

Categories: IT General, Technology

ADeLe: Predicting and explaining AI performance across tasks

Microsoft Research - 3 hours 10 min ago
At a glance
  • AI benchmarks report performance on specific tasks but provide limited insight into underlying capabilities; ADeLe evaluates models by scoring both tasks and models across 18 core abilities, enabling direct comparison between task demands and model capabilities.
  • Using these ability scores, the method predicts performance on new tasks with ~88% accuracy, including for models such as GPT-4o and Llama-3.1.
  • It builds ability profiles and identifies where models are likely to succeed or fail, highlighting strengths and limitations across tasks.
  • By linking outcomes to task demands, ADeLe explains differences in performance, showing how it changes as task complexity increases.

AI benchmarks report how large language models (LLMs) perform on specific tasks but provide little insight into the underlying capabilities that drive that performance. They do not explain failures or reliably predict outcomes on new tasks. To address this, Microsoft researchers, in collaboration with Princeton University and Universitat Politècnica de València, introduce ADeLe (AI Evaluation with Demand Levels), a method that characterizes both models and tasks using a broad set of capabilities, such as reasoning and domain knowledge, so performance on new tasks can be predicted and linked to specific strengths and weaknesses in a model.

In a paper published in Nature, “General Scales Unlock AI Evaluation with Explanatory and Predictive Power,” the team describes how ADeLe moves beyond aggregate benchmark scores. Rather than treating evaluation as a collection of isolated tests, it represents both benchmarks and LLMs using the same set of capability scores. These scores can then be used to estimate how a model will perform on tasks it has not encountered before. The research was supported by Microsoft’s Accelerating Foundation Models Research (AFMR) grant program.

ADeLe-based evaluation

ADeLe scores tasks across 18 core abilities, such as attention, reasoning, and domain knowledge, and assigns each task a value from 0 to 5 based on how much it requires each ability. For example, a basic arithmetic problem might score low on quantitative reasoning, but an Olympiad-level proof would score much higher.

Evaluating a model across many such tasks produces an ability profile—a structured view of where the model performs and where it breaks down. Comparing this profile to the demands of a new task makes it possible to identify the specific gaps that lead to failure. The process is illustrated in Figure 1.

Figure 1. Top: (1) Model performance on the ADeLe benchmark and (2) the resulting ability profiles, showing each model’s strengths and limitations across core abilities. Bottom: (1) Application of 18 scoring criteria to each task and (2) the resulting task profiles, showing the abilities each task requires.

Evaluating ADeLe

Using ADeLe, the team evaluated a range of AI benchmarks and model behaviors to understand what current evaluations capture and what they miss. The results show that many widely used benchmarks provide an incomplete and sometimes misleading picture of model capabilities and that a more structured approach can clarify those gaps and help predict how models will behave in new settings.

ADeLe shows that many benchmarks do not isolate the abilities they are intended to measure or only cover a limited range of difficulty levels. For example, a test designed to evaluate logical reasoning may also depend heavily on specialized knowledge or metacognition. Others focus on a narrow range of difficulty, omitting both simpler and more complex cases. By scoring tasks based on the abilities they require, ADeLe makes these mismatches visible and provides a way to diagnose existing benchmarks and design better ones.

Applying this framework to 15 LLMs, the team constructed ability profiles using 0–5 scores for each of 18 abilities. For each ability, the team measured how performance changes with task difficulty and used the difficulty level at which the model has a 50% chance of success as its ability score. Figure 2 illustrates these results as radial plots that show where the model performs well and where it breaks down.

Figure 2. Ability profiles for 15 LLMs across 18 abilities. Left: OpenAI models. Middle: Llama models. Right: DeepSeek-R1 distilled models.
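The 50% rule for ability scores described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the team's implementation: the function name `ability_score`, the input layout (one observed success rate per demand level 0 to 5), and the use of linear interpolation between adjacent levels are all hypothetical choices made for this sketch.

```python
from typing import Optional, Sequence


def ability_score(success_rate_by_level: Sequence[float]) -> Optional[float]:
    """Estimate the demand level at which success drops to 50%.

    success_rate_by_level[d] is the observed success rate on tasks whose
    demand level for one ability is d (d = 0, 1, ..., 5).
    ASSUMPTION: linear interpolation between adjacent levels; the article
    does not say how the 50% point is located.
    """
    rates = list(success_rate_by_level)
    if not rates:
        return None  # no data for this ability
    if rates[0] < 0.5:
        return 0.0  # already below 50% at the easiest level
    for level in range(len(rates) - 1):
        upper, lower = rates[level], rates[level + 1]
        if upper >= 0.5 > lower:
            # Fraction of the way from `level` to `level + 1` where the
            # interpolated success rate equals 0.5.
            return level + (upper - 0.5) / (upper - lower)
    return float(len(rates) - 1)  # never drops below 50%


# Hypothetical success rates for one ability at demand levels 0..5.
rates = [1.0, 0.9, 0.7, 0.3, 0.1, 0.0]
print(round(ability_score(rates), 2))
```

With these made-up rates, the curve crosses 50% halfway between levels 2 and 3, so the sketch places the ability score midway between them. Repeating this for each of the 18 abilities would give one profile of the kind plotted in Figure 2.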

This analysis shows that models differ in their strengths and weaknesses across abilities. Newer models generally outperform older ones, but not consistently across all abilities. Performance on knowledge-heavy tasks depends strongly on model size and training, while reasoning-oriented models show clear gains on tasks requiring logic, learning, abstraction, and social inference. These patterns typically require multiple, separate analyses across different benchmarks and can still produce conflicting conclusions when task demands are not carefully controlled. ADeLe surfaces them within a single framework.

ADeLe also enables prediction. By comparing a model’s ability profile to the demands of a task, it can forecast whether the model will succeed, even on tasks that are unfamiliar. In experiments, this approach achieved approximately 88% accuracy for models like GPT-4o and Llama-3.1-405B, outperforming traditional methods. This makes it possible to both explain and anticipate potential failures before deployment, improving the reliability and predictability of AI model assessment.
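To make the idea of comparing an ability profile to task demands concrete, here is a deliberately simple sketch. It is not the predictor behind the reported ~88% accuracy, whose details are not given in this post. It assumes a plain threshold rule (predict success only if every demand is within the model's ability score), and the ability names and numbers are hypothetical.

```python
from typing import Dict, List


def unmet_demands(ability_profile: Dict[str, float],
                  task_demands: Dict[str, float]) -> List[str]:
    """List the abilities where the task demands more than the model has.

    Both dicts map an ability name to a 0-5 score. An ability missing from
    the profile is treated as 0 (an assumption of this sketch).
    """
    return sorted(
        name
        for name, demand in task_demands.items()
        if demand > ability_profile.get(name, 0.0)
    )


def predict_success(ability_profile: Dict[str, float],
                    task_demands: Dict[str, float]) -> bool:
    """ASSUMED rule: succeed only if no demand exceeds the ability score."""
    return not unmet_demands(ability_profile, task_demands)


# Hypothetical profile and tasks, using three of the 18 abilities.
profile = {"attention": 3.5, "quantitative_reasoning": 2.5, "domain_knowledge": 3.0}
easy_task = {"attention": 1, "quantitative_reasoning": 2, "domain_knowledge": 1}
hard_task = {"attention": 2, "quantitative_reasoning": 5, "domain_knowledge": 4}

print(predict_success(profile, easy_task))
print(unmet_demands(profile, hard_task))
```

The second call shows the explanatory side of the approach: beyond a yes/no forecast, the list of unmet demands names the specific abilities that would be expected to cause a failure.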

Whether AI systems can truly reason is a central debate in the field. Some studies report strong reasoning performance, while others show that models break down at scale. These results reflect differences in task difficulty. ADeLe shows that benchmarks labeled as measuring “reasoning” vary in what they require, from basic problem-solving to tasks that combine the need for advanced logic, abstraction, and domain knowledge. The same model can score above 90% on lower-demand tests and below 15% on more demanding ones, reflecting differences in task requirements rather than a change in capability.

Reasoning-oriented models like OpenAI’s o1 and GPT-5 show measurable gains over standard models—not only in logic and mathematics but also with interpreting user intent. However, performance declines as task demands increase. AI systems can reason, but only up to a point, and ADeLe identifies where that point is for each model.

Looking ahead

ADeLe is designed to evolve alongside advances in AI and can be extended to multimodal and embodied AI systems. It also has the potential to serve as a standardized framework for AI research, policymaking, and security auditing.

More broadly, it advances a more systematic approach to AI evaluation—one that explains system behavior and predicts performance. This work builds on earlier efforts, including Microsoft research on applying psychometrics to AI evaluation and recent work on Societal AI, emphasizing the importance of AI evaluation.

As general-purpose AI systems continue to outpace existing evaluation methods, approaches like ADeLe offer a path toward more rigorous and transparent assessment in real-world use. The research team is working to expand this effort through a broader community. Additional experiments, benchmark annotations, and resources are available on GitHub.



Categories: Microsoft

Gmail's new Inbox Zero mode is now available, but only if you pay up

How-To Geek - 3 hours 11 min ago

Inbox Zero sounds like a dream for many. However, in reality, it’s a daily challenge to go through newsletters, bills, and important emails. Now, Google is taking aim at this frustration with a feature called AI Inbox, which was announced in January as part of Gemini-powered updates to Gmail.

Categories: IT General, Technology

Your phone charger is slowly dying while plugged in—here's why

How-To Geek - 3 hours 11 min ago

People fall into two distinct camps when it comes to phone chargers: those who religiously unplug them when they're done, and the rest of us who leave them plugged in indefinitely. If you belong to the latter camp, allow me to offer a few compelling reasons why unplugging your charger is the smarter move.

Categories: IT General, Technology

Update your iPhone now: DarkSword exploits just got patched on iOS 18

How-To Geek - 3 hours 24 min ago

Apple is rolling out a new patch for old iOS versions to protect users from the widespread hacking exploit known as DarkSword. Apple is releasing this new update for iOS 18 users today—anyone on the latest iOS 26 release is already safe.

Categories: IT General, Technology

Please stop giving ChatGPT your passwords, documents, and private photos

How-To Geek - 3 hours 26 min ago

AI chatbots such as ChatGPT, Claude, and Gemini allow you to upload files and photos or paste large chunks of text. You can then do things such as summarizing documents or turning your selfies into Studio Ghibli-style images. The danger is that you can end up uploading personal information that you really shouldn't be sharing.

Categories: IT General, Technology

Plex Media Server just ended support for old Windows systems

How-To Geek - 3 hours 39 min ago

Plex Media Server, the software known for turning any PC or NAS into a personal streaming service, is officially ending support and updates for 32-bit versions of Windows. Plex's decision comes in response to Microsoft's decision to end support for Windows 10 back in October 2025.

Categories: IT General, Technology

KitKat heist tracker lets candy lovers check if their KitKat is from the heist

Mashable - 3 hours 39 min ago

The problem with announcing any kind of news on April 1 is that absolutely nobody will believe you.

Case in point: On Wednesday morning, KitKat announced that customers could use a special online tracking tool to figure out if their purchased confectionery goods were part of the massive 12-ton KitKat heist that's gotten the internet's attention over the past few days.

The KitKat heist tracker was advertised on the official KitKat X account, and whoever runs the account is ardently insisting, both in the original post and in the replies, that this is real and not an April Fool's joke.


Taking a look at the tracker itself, it's hard to parse fiction from reality. It appears to be a pretty straightforward tracker with a text input for an 8-digit batch code on the back of each KitKat package. I don't personally have any KitKats on hand to test this out with, but I typed in a random 8-digit number and was told that it wasn't part of the stolen batch.

So, at the very least, the tracker is actually checking for something. It's just impossible to say what would happen if you happened to type in a "correct" batch code.

Whether or not the tracker is a hoax, the heist was very real. More than 400,000 KitKat bars were stolen from a delivery truck going between Italy and Poland, prompting plenty of The Fast and the Furious memes (and some genuine concerns for the public supply of KitKats ahead of the Easter holiday).

For what it's worth, the company, Nestle KitKat, says there is no threat to the chocolatey supply chain at this time.

Categories: IT General, Technology

The Linux backup tool nobody talks about—and why it beats every official sync app

How-To Geek - 3 hours 40 min ago

Typically, when you're uploading a file to the cloud, you need to open your browser, log into your account, navigate to the right folder, click the upload button, locate your files or folders in the file picker window, and then hit upload. What if you could just type one line into a terminal, hit enter, and be done? The open-source tool rclone lets you do just that.

Categories: IT General, Technology

The Hisense 55-inch Canvas art TV is down to a new best price ever post-Spring Sale

Mashable - 3 hours 51 min ago

SAVE $400: As of April 1, the Hisense 55-inch Canvas QLED 4K TV is down to only $599.99 at Amazon. That's 40% off its list price and a new best price ever.


The Amazon Big Spring Sale delivered plenty of excellent TV deals, but budget-friendly brand Hisense isn't playing by the rules. While the brand (which is one of our favorites, BTW) did drop prices over the last week, it waited until the sale was officially over to give us its best prices ever on several TV models — including the coveted Canvas art TV.

As of April 1, the Hisense 55-inch Canvas QLED 4K TV is down to just $599.99 at Amazon. That's 40% or $400 off its list price and a new lowest price on record. For those curious, the same TV was $87.98 more during the Spring Sale.

The Canvas TV is an alternative to the popular Samsung The Frame TV for budget-conscious shoppers. Like The Frame, it transforms a basic black box into a stylish piece of artwork that hangs on your wall. Its matte finish allows it to blend seamlessly into a gallery wall with other non-tech wall hangings. Unlike The Frame, it uses Google TV's interface, which Mashable's Miller Kern (a Canvas TV owner) says is "way more intuitive and responsive than Samsung's."

Beyond doubling as artwork, the Hisense Canvas is a QLED TV, so it's noticeably brighter and more saturated than a basic LED TV. It'll look brilliant in any lighting conditions. It also features a variable refresh rate up to 144Hz, which is surprisingly good for gaming, real-time adaptive brightness and color temperature, and an ultra-slim wall mount that lies flush against the wall for the true framed art look.

Categories: IT General, Technology

The Ryobi vault: 6 discontinued tools and products we want back

How-To Geek - 3 hours 55 min ago

If you're a Ryobi fan, you probably have a wide collection of its bright-colored tools you've slowly collected from Home Depot. Over the years, the company has released hundreds of tools and accessories, mainly under the 18V ONE+ tool line, and some of them are better than others. While there's a good reason many of its products end up discontinued, here are a few I wish could come back.

Categories: IT General, Technology