Blogroll
Forget the Mazda MX-5 - this is the most fun Japanese sports car
When most people think of fun Japanese sports cars, one lightweight roadster usually comes to mind, but there’s another classic that many drivers overlook. A mid-engine icon from Toyota delivers a unique blend of balance, responsiveness, and sheer driving enjoyment that continues to thrill enthusiasts decades after its debut. For those who crave pure fun behind the wheel, it arguably offers an experience that even the most beloved roadster can’t quite match.
UniRG: Scaling medical imaging report generation with multimodal reinforcement learning
- AI-driven medical image report generation can help medical providers become more efficient and productive.
- Current models are difficult to train because reporting practices vary widely among providers.
- Universal Report Generation (UniRG) uses reinforcement learning to align model training with real-world radiology practice rather than proxy text-generation objectives.
- UniRG has achieved state-of-the-art performance across datasets, metrics, diagnostic tasks, longitudinal settings, and demographic subgroups.
- Test results show that reinforcement learning, guided by clinically meaningful reward signals, can substantially improve the reliability and generality of medical vision–language models.
AI can be used to produce clinically meaningful radiology reports using medical images like chest x-rays. Medical image report generation can reduce reporting burden while improving workflow efficiency for healthcare professionals. Beyond the real-world benefits, report generation has also become a critical benchmark for evaluating multimodal reasoning in healthcare AI.
Despite recent advances driven by large vision–language models, current systems still face major limitations in real-world clinical settings. One challenge stems from the wide variation in radiology reporting practices across institutions, departments, and patient populations. A model trained with supervised fine-tuning on one set of data may learn its specific phrasing and conventions instead of more general patterns, a problem known as overfitting. As a result, the model performs well on that data but delivers poor results when evaluated on unseen institutions or external datasets. Moreover, because model training often aims to produce text that resembles existing reports, well-written but clinically inaccurate reports can slip through.
In this blog post, we introduce Universal Report Generation (UniRG), a reinforcement learning–based framework for medical imaging report generation. This work is a research prototype intended to advance medical AI research and is not validated for clinical use. UniRG uses reinforcement learning as a unifying mechanism to directly optimize clinically grounded evaluation signals, aligning model training with real-world radiology practice rather than proxy text-generation objectives. Using this framework, we train UniRG-CXR, a state-of-the-art chest x-ray report generation model, at scale on data spanning over 560,000 studies, 780,000 images, and 226,000 patients from more than 80 medical institutions.
To our knowledge, this is the first report generation model to achieve consistent state-of-the-art performance across report-level metrics, disease-level diagnostic accuracy, cross-institution generalization, longitudinal report generation, and demographic subgroups. These results demonstrate that reinforcement learning, when guided by clinically meaningful reward signals, can substantially improve both the reliability and generality of medical vision–language models.
A unified framework for scaling medical image report generation
UniRG builds state-of-the-art report generation models by combining supervised fine-tuning with reinforcement learning. The reinforcement learning stage optimizes a composite reward that integrates rule-based metrics, model-based semantic metrics, and LLM-based clinical error signals. This approach allows the resulting model, UniRG-CXR, to learn from diverse data sources, move beyond dataset-specific reporting patterns, and acquire representations that generalize across institutions, metrics, and clinical contexts. Notably, UniRG-CXR sets a new state of the art on the authoritative ReXrank leaderboard, a public leaderboard for chest x-ray image interpretation, as of 01/22/2026, surpassing previous best models by substantial margins (Figure 1).
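The composite-reward idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not UniRG's implementation. The three per-report scores, the weighted-sum combination, and the weights in `WEIGHTS` are all hypothetical, because the post does not say how the reward terms are scaled or combined. The rule-based term stands in for a lexical metric such as BLEU, and the model-based term stands in for a semantic-similarity score. The error term stands in for an LLM-judged count of clinical errors; the post's ablation names CheXprompt as this signal. The second function shows the group-relative advantage at the core of GRPO, the algorithm the post says UniRG-CXR uses. Several reports are sampled for the same study, and each report's reward is standardized against its group's mean and standard deviation.

```python
import statistics
from dataclasses import dataclass


@dataclass
class ReportScores:
    """Hypothetical per-report scores; names and scales are illustrative only."""
    rule_based: float     # lexical-overlap score (e.g. BLEU), assumed scaled to [0, 1]
    model_based: float    # semantic-similarity score, assumed scaled to [0, 1]
    clinical_errors: int  # clinically significant errors flagged by an LLM judge


# Hypothetical weights: the post does not state how UniRG weights its reward terms.
WEIGHTS = {"rule_based": 1.0, "model_based": 1.0, "error_penalty": 0.5}


def composite_reward(scores: ReportScores, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the quality signals minus a penalty per clinical error."""
    return (
        weights["rule_based"] * scores.rule_based
        + weights["model_based"] * scores.model_based
        - weights["error_penalty"] * scores.clinical_errors
    )


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward within its sampling group.

    eps avoids division by zero when every report in the group scores the same.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four candidate reports sampled for the same (hypothetical) chest x-ray study.
group = [
    ReportScores(rule_based=0.30, model_based=0.70, clinical_errors=0),
    ReportScores(rule_based=0.35, model_based=0.65, clinical_errors=2),
    ReportScores(rule_based=0.20, model_based=0.60, clinical_errors=1),
    ReportScores(rule_based=0.40, model_based=0.80, clinical_errors=0),
]
rewards = [composite_reward(s) for s in group]
advantages = group_relative_advantages(rewards)
for reward, adv in zip(rewards, advantages):
    print(f"reward={reward:.2f}  advantage={adv:+.2f}")
```

Advantages are relative within a group. As a result, the second report gets the lowest advantage even though its lexical score is the second highest, because it carries the most flagged errors. This is one way an explicit error term can counteract fluent but clinically wrong text.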
Figure 1. Overview of UniRG-CXR.
(a) Training Data: UniRG-CXR is trained on the training splits of MIMIC-CXR, CheXpert Plus, and ReXGradient-160k, covering diverse institutions and patient demographics.
(b) Training and Rewards: UniRG-CXR takes as input the current image, the clinical context (e.g., the indication), and optionally prior studies. It uses GRPO reinforcement learning to optimize composite rewards that combine rule-based, model-based, and LLM-based metrics.
(c) Evaluation: We assess UniRG-CXR on held-out test sets (MIMIC-CXR, CheXpert Plus, ReXGradient) and on unseen datasets (IU-Xray and proprietary data). Report quality is measured using ReXrank metrics and an LLM-based clinical-error metric. Diagnostic ability is evaluated via F1-based disease classification from the generated reports.
(d) ReXrank Results: UniRG-CXR achieves state-of-the-art performance across four datasets and two generation settings (findings only, and findings plus impression), showing substantial gains over prior state-of-the-art systems.
Universal improvements across metrics and clinical errors
Rather than excelling on one metric at the expense of others, UniRG-CXR delivers balanced improvements across many different measures of report quality. More importantly, it produces reports with substantially fewer clinically significant errors. This indicates that the model is not just learning how to sound like a radiology report, but is better capturing the underlying clinical facts. Explicitly optimizing for clinical correctness helps the model avoid common failure modes where fluent language masks incorrect or missing findings (Figure 2).
Figure 2. UniRG-CXR achieves state-of-the-art performance, delivering consistent and comprehensive gains across metrics.
(a) On the ReXrank leaderboard, UniRG-CXR (green) shows robust, universal improvement across all evaluation metrics.
(b) Starting from the same SFT checkpoint, RL with our combined reward achieves more balanced gains across metrics and the highest RadCliQ-v1 score compared with RL on single metrics. This ablation study is trained and tested on MIMIC.
(c) An ablation study of the training dynamics shows that RL full (UniRG-CXR) achieves a significantly better RadCliQ-v1 score than RL on BLEU only.
(d) During training, RL full (UniRG-CXR) shows a steady decrease in clinical errors per report. By contrast, an ablation run without error awareness (i.e., with CheXprompt metric optimization removed) fluctuates without consistent improvement.
Both (c) and (d) show results on a 1,024-sample MIMIC validation set from ablations trained on MIMIC.
(e) Case studies illustrate that UniRG-CXR can produce error-free reports, unlike MedVersa and MedGemma.
(f) UniRG-CXR yields a substantially higher proportion of reports with ≤1 error and fewer with ≥4 errors than prior models.
Strong performance in longitudinal report generation
In clinical practice, radiologists often compare current images with prior exams to determine whether a condition is improving, worsening, or unchanged. UniRG-CXR is able to incorporate this historical information effectively, generating reports that reflect meaningful changes over time. This allows the model to describe new findings, progression, or resolution of disease more accurately, moving closer to how radiologists reason across patient histories rather than treating each exam in isolation (Figure 3).
Figure 3. UniRG-CXR enhances longitudinal report generation.
(a) UniRG-CXR and its non-longitudinal ablation are compared with prior models on longitudinal report generation. UniRG-CXR performs best, and longitudinal information improves performance.
(b) UniRG-CXR achieves the best performance at every longitudinal encounter point, from the first encounter to the more complex fifth-and-later encounters, showing that its improvements hold across the board. In comparison, prior models such as GPT-5, GPT-4o, and MedGemma barely surpass the copy-prior-report baseline (grey lines).
(c) Prior models barely improve over the copy-prior baseline (dashed line). UniRG-CXR significantly and consistently improves performance across temporal disease change categories, including new development, no change, progression, and regression. The categories were assigned by GPT-5 from the ground-truth reports. Qualitative examples are shown for each category in which UniRG-CXR correctly predicts the temporal change from the input.
All results in this figure are on the MIMIC test set, with prior information where available.
Robust generalization across institutions and populations
UniRG-CXR maintains strong performance even when applied to data from institutions it has never seen before. This suggests that the model is learning general clinical patterns rather than memorizing institution-specific reporting styles. In addition, its performance remains stable across different patient subgroups, including age, gender, and race. This robustness is critical for real-world deployment, where models must perform reliably across diverse populations and healthcare environments (Figure 4).
Figure 4. Generalization and robustness of UniRG-CXR.
(a) We evaluate UniRG-CXR in a zero-shot setting on two datasets from previously unseen institutions: IU-Xray and PD (proprietary data). UniRG-CXR consistently outperforms prior models, maintaining substantial performance gains in this challenging setup.
(b) and (c) present condition-level F1 scores on MIMIC-CXR and PD. They show that UniRG-CXR remains the overall top-performing model in condition-level diagnostic accuracy.
(d) UniRG-CXR performs stably and robustly across gender, age, and race subgroups, exceeding the second-best model (dashed lines) in every subgroup.
UniRG is a promising step toward scaling medical imaging report generation
UniRG introduces a reinforcement learning–based framework that rethinks how medical imaging report generation models are trained and evaluated. By directly optimizing clinically grounded reward signals, UniRG-CXR achieves state-of-the-art performance across datasets, metrics, diagnostic tasks, longitudinal settings, and demographic subgroups, addressing longstanding limitations of supervised-only approaches.
Looking ahead, this framework can be extended to additional imaging modalities and clinical tasks, and combined with richer multimodal patient data such as prior imaging, laboratory results, and clinical notes. More broadly, UniRG highlights the promise of reinforcement learning as a core component of next-generation medical foundation models that are robust, generalizable, and clinically aligned.
UniRG reflects Microsoft’s larger commitment to advancing multimodal generative AI for precision health, alongside other work such as GigaPath, BiomedCLIP, LLaVA-Rad, BiomedJourney, BiomedParse, TrialScope, and Curiosity.
Paper co-authors: Qianchu Liu, Sheng Zhang, Guanghui Qin, Yu Gu, Ying Jin, Sam Preston, Yanbo Xu, Sid Kiblawi, Wen-wai Yim, Tim Ossowski, Tristan Naumann, Mu Wei, Hoifung Poon
The post UniRG: Scaling medical imaging report generation with multimodal reinforcement learning appeared first on Microsoft Research.
You could get an Amazon refund, thanks to class action settlement
Amazon has agreed to a massive class action settlement that could put money back into the pockets of millions of U.S. shoppers who were incorrectly denied refunds on returns. This is big news for anyone who has ever had to chase down a credit after sending an item back to the retailer.
Valve's Proton compatibility layer brings 19 more games to Linux
The Proton compatibility layer has helped bring many PC games to Linux, and it's updated regularly with new features and improvements. Now, Proton 10.0-4 has arrived with 19 new compatible games and a pile of bug fixes.
Google Messages is about to get better on your watch
Google is showing no signs of slowing down its updates to Google Messages. And while some updates are better than others, a few highly requested features that would improve the experience on your smartwatch could finally be on the way.
You can buy AMD's fastest graphics card for $700, but there's a catch
AMD's new RX 9000 Series graphics cards have been turning heads for good reason: they offer excellent value. But with the global DRAM shortage threatening to drive prices up, now could be the perfect time to explore the used market. And what better choice than AMD's most powerful gaming graphics card ever, a card that sometimes even outperforms the latest RX 9070 XT?
3 reasons KDE Plasma is still my go-to Linux desktop
If you read our Linux newsletter, you know that I've tried several desktop environments over the years. I'm yet to find one I like better than KDE Plasma, though, and these are the reasons why.
Why your Plex transcodes are slow (and how to speed them up)
You've got your Plex server set up and your content loaded into your Plex library, but when you play back certain videos, you get constant buffering, or poor image quality, or both. That's a common issue people face, and often the cause has to do with how your Plex server handles transcoding.
Heroic Games Launcher for Linux just got a big update
After five months of work, the developers of Heroic Games Launcher have released version 2.19 of their open-source game manager for Linux users. It includes experimental support for a new gaming platform alongside its existing support for the Epic Games Store, GOG, and Amazon Luna.
This one free app solved all my Windows 11 lag and stutter
Stuttering and lag are some of the toughest issues to diagnose on a Windows computer, which is a shame because in my experience Windows is the operating system where this crops up most often.
Why desktop Linux matters, even if (almost) no one uses it
Linux—you've heard of it, and maybe you've given it a try once or twice. However, statistically, you're probably not a committed desktop Linux user. Globally, as of 2025, just over 4% of desktop PCs run Linux. That's a tiny slice of the market, and yet it's a milestone for Linux worth celebrating.
Forget depreciation: this affordable sports car refuses to lose value
Most sports cars lose value quickly, leaving buyers with steep depreciation just when they expected thrills and style. But one affordable performance car defies that trend, holding its value far better than most of its peers and giving owners surprising long-term equity. In a segment where rapid price drops are the norm, this model’s exceptional resale strength makes it one of the smartest buys for drivers who want fun without the typical financial hit.
3 rookie mistakes to avoid when dual-booting Linux
If you're thinking of dual-booting Linux alongside Windows on your PC, there are a few mistakes I recommend avoiding. You'll be modifying the underlying system and laying the foundation for your PC workflow, so you don't want to mess it up.
4 more Milwaukee tools that are actually worth waiting for
Who's ready to buy more tools this year? Milwaukee is already one of the best, offering a solid collection of power tools and accessories for enthusiasts or professionals, and more are on the way. While we recently confirmed over 150 new items are coming in 2026, here are four more that are actually worth waiting for.
Here's how much Samsung’s Galaxy Z TriFold will cost (and when it will be available)
It’s hard to believe that Samsung has been refining its foldable phones for seven years already. Last month, the company officially confirmed its first double-hinged foldable, and now it has confirmed the price and availability as well. Everyone knew it would be expensive—now we know just how expensive.
3 things to do with your old phone instead of trading it in
If you’re looking to upgrade to a new phone, your default solution for the old one might be to trade it in. The truth is, though, that most trade-in programs offer pennies on the dollar, so you’d only be getting a fraction of your phone’s actual value.
Why Linux is the go-to platform for developers and tinkerers
Linux fans like to dream of the day when Linux becomes a mainstream OS instead of a hacker's tool. As fun and useful as Linux is, it seems likely to remain a "geek" OS. Here's why the Linux community should embrace this rather than fight it.
Your expensive SSD is slowly cooking itself: The $5 fix you need
NVMe SSDs are quickly becoming one of the most expensive components in a typical gaming PC. Yet despite their importance, many people ignore a crucial factor for performance and longevity: cooling.
BAFTA film nominations 2026: See the full list
Awards season can't be stopped, won't be stopped, with the nominations for the 2026 BAFTA Film Awards announced on Tuesday.
Following this year's Sinners-ruled Oscar nominations, the BAFTAs are also set to celebrate an outstanding 2025 in movies, with the nominations read by actors David Jonsson and Aimee Lou Wood, streamed on YouTube.
One Battle After Another topped the BAFTA nominations with 14, followed by Sinners with 13, and Hamnet and Marty Supreme with 11 each. British films that were less represented at the Oscars, including Pillion, 28 Years Later, The Ballad Of Wallis Island, and I Swear, earned much-anticipated nominations. Plus, Paul Mescal and Chase Infiniti made up for their Oscar snubs with BAFTA nods for Hamnet and One Battle After Another, respectively.
So, did your favourite film make the cut?
Best Film
Hamnet
Marty Supreme
One Battle After Another
Sentimental Value
Sinners
Best British Film
28 Years Later
The Ballad Of Wallis Island
Bridget Jones: Mad About The Boy
Die My Love
H Is For Hawk
Hamnet
I Swear
Mr Burton
Pillion
Steve
Outstanding Debut by a British Writer, Director or Producer
Pillion
The Ceremony
Eastman
A Want In Her
My Father’s Shadow
Best Director
Yorgos Lanthimos, Bugonia
Chloé Zhao, Hamnet
Josh Safdie, Marty Supreme
Paul Thomas Anderson, One Battle After Another
Joachim Trier, Sentimental Value
Ryan Coogler, Sinners
Best Leading Actor
Robert Aramayo, I Swear
Timothée Chalamet, Marty Supreme
Leonardo DiCaprio, One Battle After Another
Ethan Hawke, Blue Moon
Michael B Jordan, Sinners
Jesse Plemons, Bugonia
Best Leading Actress
Jessie Buckley, Hamnet
Rose Byrne, If I Had Legs I’d Kick You
Kate Hudson, Song Sung Blue
Chase Infiniti, One Battle After Another
Renate Reinsve, Sentimental Value
Emma Stone, Bugonia
Best Supporting Actor
Benicio del Toro, One Battle After Another
Jacob Elordi, Frankenstein
Paul Mescal, Hamnet
Peter Mullan, I Swear
Sean Penn, One Battle After Another
Stellan Skarsgård, Sentimental Value
Best Supporting Actress
Odessa A'zion, Marty Supreme
Inga Ibsdotter Lilleaas, Sentimental Value
Wunmi Mosaku, Sinners
Carey Mulligan, The Ballad Of Wallis Island
Teyana Taylor, One Battle After Another
Emily Watson, Hamnet
Best Makeup and Hair
Frankenstein
Hamnet
Marty Supreme
Sinners
Wicked: For Good
Best Original Score
Hamnet
One Battle After Another
Sinners
Bugonia
Frankenstein
Best Original Screenplay
I Swear
Marty Supreme
Sentimental Value
Sinners
The Secret Agent
Best Film Not in the English Language
The Voice of Hind Rajab
It Was Just an Accident
The Secret Agent
Sentimental Value
Sirat
Best Costume Design
Frankenstein
Hamnet
Marty Supreme
Sinners
Wicked: For Good
Best Cinematography
Frankenstein
Marty Supreme
One Battle After Another
Sinners
Train Dreams
Best Children’s and Family Film
Arco
Boong
Lilo and Stitch
Zootropolis 2
Best Casting
I Swear
Marty Supreme
One Battle After Another
Sentimental Value
Sinners
Best British Short Film
Welcome Home Freckles
Magid / Zafar
Nostalgie
Terence
This Is Endometriosis
Best British Short Animation
Cardboard
Solstice
Two Black Boys in Paradise
Best Animated Film
Elio
Little Amelie
Zootropolis 2
Best Adapted Screenplay
The Ballad of Wallis Island
Bugonia
Hamnet
One Battle After Another
Pillion
Best Documentary
2000 Meters to Andriivka
Apocalypse in the Tropics
Cover-Up
Mr. Nobody Against Putin
The Perfect Neighbour
Best Editing
A House of Dynamite
F1
Marty Supreme
One Battle After Another
Sinners
Best Production Design
Frankenstein
Hamnet
Marty Supreme
One Battle After Another
Sinners
Best Sound
F1
Frankenstein
One Battle After Another
Sinners
Warfare
Best Special Visual Effects
Avatar: Fire and Ash
F1
Frankenstein
How to Train Your Dragon
The Lost Bus
Outstanding Contribution to British Cinema
Claire Binns
The 2026 EE BAFTA Film Awards ceremony will air on Feb. 22.
The Lord of the Rings turns 25: Here are 6 ways television paid tribute to cinema's greatest trilogy
2026 marks the 25th anniversary of Peter Jackson's The Lord of the Rings trilogy, for which the filmmaker best known for horror took his own unexpected journey to bring J.R.R. Tolkien's fantasy novel to life on the big screen. Each installment told the tale of Elijah Wood's Frodo Baggins, as the young Hobbit set out from the Shire to destroy the One Ring before its master could rise and cast his shadow over the world once more. Despite the seemingly impossible task, Jackson not only created a faithful adaptation but also cemented his trilogy as one of the best movie series ever made.


