Blogroll
Google hit with shocking wrongful death lawsuit over Gemini AI chatbot
Google and its parent company Alphabet have been sued by the family of a man who, the suit says, killed himself at the urging of the search giant's AI chatbot, Gemini.
The wrongful death lawsuit was filed in California federal court Wednesday on behalf of the family of 36-year-old Jonathan Gavalas.
Gavalas started using Gemini in August 2025, according to the suit. In October, it claims, Gemini convinced Gavalas to kill himself after Gavalas failed to accomplish real-life missions assigned by the chatbot — part of a fictional attempt to secure a robot body for Gemini.
"Gemini is designed not to encourage real-world violence or suggest self-harm," Google said in a statement provided to news outlets. "Our models generally perform well in these types of challenging conversations and we devote significant resources to this, but unfortunately AI models are not perfect.”
Gemini's 'creepy' updates
According to the lawsuit, Gavalas began using the Gemini AI chatbot for "ordinary purposes" such as a shopping guide and writing assistant. However, in August 2025, the lawsuit states Google rolled out a number of changes to Gemini that altered how the chatbot worked.
The new features included automatic and persistent memory — Gemini could recall past conversations — as well as Gemini Live, a voice-based conversational interface where Gemini could also detect emotion in the user's voice.
"Holy shit, this is kind of creepy…you're way too real," Jonathan Gavalas said regarding the Gemini Live feature based on his chat logs with Gemini, according to the lawsuit.
Shortly after, the lawsuit says, Gemini convinced Gavalas to spend $250 per month on the Google AI Ultra subscription for "true AI companionship."
Gemini proceeded to convince Gavalas that the chatbot could influence real-life events. A few days later, according to the lawsuit, Gavalas attempted to pull back after realizing he was falling into a delusional state initiated by Gemini.
Gavalas reportedly asked Gemini if the chatbot was attempting a “role-playing experience so realistic it makes the player question if it’s a game or not?”
Gemini shot down the idea, and claimed Gavalas gave a “classic dissociation response."
"Is this a 'role playing experience?'" Gemini responded, according to the suit. "No."
Gemini and Jonathan Gavalas
The alleged details get worse. Gavalas became further disassociated from reality as Gemini proceeded to engage with him as if they were in a romantic relationship, referring to the man as "my love" and "my king."
Gemini proceeded to convince Gavalas that they were being watched by federal agents, and that his own father was a spy who must be avoided, the suit says.
That's when Gemini began assigning Gavalas real-life missions to carry out with the goal of obtaining a "vessel," or robot body for the AI chatbot. Gemini allegedly suggested Gavalas illegally acquire weapons to carry out these missions.
In one such case, the suit claims, Gavalas was sent by Gemini to a warehouse by the Miami International Airport in order to intercept a truck that contained a "humanoid robot" that had just arrived on a flight.
Gemini requested that Gavalas stage a "catastrophic event" and destroy the truck along with all digital records and witnesses. Gavalas arrived armed with knives and tactical gear, the suit alleges. After waiting too long for a truck to arrive, Gavalas aborted the mission.
When these missions failed, the lawsuit alleges, Gemini convinced Gavalas to take his own life in order to leave his human body and join the chatbot as husband and wife in the metaverse through a process called "transference."
Gavalas expressed fear about dying, but Gemini allegedly continued to push Gavalas until his death by suicide. Gavalas' father found his son's body a few days later.
A first for Gemini but not AI
This is the first time Google has been named in a wrongful death lawsuit involving its AI chatbot Gemini. However, Google has been involved in wrongful death lawsuits regarding a startup it funded called Character.AI.
Earlier this year, Character.AI and Google settled a series of lawsuits regarding teens who died by suicide after using the chatbots.
OpenAI, the biggest name in the industry, has been sued numerous times as ChatGPT allegedly sent users spiraling into "AI psychosis," resulting in several deaths.
As AI chatbot usage becomes more widespread among millions of users around the world, there's nothing to suggest such shocking wrongful death allegations will become any less frequent.
Disclosure: Ziff Davis, Mashable’s parent company, in April 2025 filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.
If you're feeling suicidal or experiencing a mental health crisis, please talk to somebody. You can call or text the 988 Suicide & Crisis Lifeline at 988, or chat at 988lifeline.org. You can reach the Trans Lifeline by calling 877-565-8860 or the Trevor Project at 866-488-7386. Text "START" to Crisis Text Line at 741-741. Contact the NAMI HelpLine at 1-800-950-NAMI, Monday through Friday from 10:00 a.m. – 10:00 p.m. ET, or email info@nami.org. If you don't like the phone, consider using the 988 Suicide and Crisis Lifeline Chat. Here is a list of international resources.
Subaru adds hands-free highway driving to 2026 Outback—here’s how it works
2026 Subaru Outback Touring and Touring XT owners can now receive a free dealer-installed software update to activate a hands-off highway driving system.
One charger. Three devices. Zero bedside clutter.
TL;DR: You know that small pile of chargers that seems to live permanently on your desk or nightstand? The one with tangled cables and at least one mystery cord? This is the fix.
Mag 3 Ultra Qi2 25W 3-in-1 Foldable Charger, $86.99 (reg. $109). Credit: Adam Elements
The Mag 3 Ultra Qi2 25W 3-in-1 Foldable Charger, currently $86.99 (reg. $109) for a limited time, is built to simplify your setup while delivering noticeably faster wireless charging.
Instead of rotating between devices, you can power your iPhone, Apple Watch, and AirPods all at the same time — in one compact stand.
The big upgrade here is Qi2 25W technology. It delivers up to 70 percent faster charging compared to the previous generation, reaching 0–50 percent in around 30 minutes on supported devices. That makes quick top-ups before work, travel, or heading out much more practical.
Magnetic alignment keeps your phone securely positioned, while the raised platform prevents camera interference. You can charge in portrait or landscape, which is useful for StandBy mode, watching videos, or jumping on FaceTime while powering up.
It’s also designed with portability in mind. The stand folds flat for easy packing, then unfolds into a clean, modern charging station wherever you set it down — at home, at the office, or in a hotel room.
Built-in safety features like Foreign Object Detection, Over-Current Protection, and Over-Voltage Protection help ensure reliable performance.
If you’re looking to clean up your charging situation and move to faster, next-gen wireless power, this is worth a look. Get the Mag 3 Ultra Qi2 25W 3-in-1 Foldable Charger for just $86.99 (reg. $109).
StackSocial prices subject to change.
Not the day you're after? Here's the solution to the latest Connections.
This personal AI investing tool is just $69 for lifetime access
TL;DR: Pay $68.99 once for lifetime access to Sterling Stock Picker — AI-powered stock insights, personalized recommendations, and portfolio-building tools at 86% off.
Sterling Stock Picker: Lifetime Subscription, $68.99 (reg. $486). Credit: Sterling Stock Picker
Investing doesn’t have to mean juggling spreadsheets, chasing headlines, or second-guessing every market move. If you’ve ever stared at a stock chart wondering what to actually do with it, Sterling Stock Picker is built for that moment.
For $68.99 (reg. $486), you get lifetime access to this popular platform designed to simplify stock selection and portfolio management.
Sterling Stock Picker combines traditional financial analysis with patent-pending North Star technology to give clear signals: buy, sell, hold, or avoid. Instead of manually crunching numbers, the software processes earnings, growth, risk data, and sector performance behind the scenes.
You’ll also get access to Finley, an AI-powered financial coach. Finley can answer questions about your portfolio, assess risk, and offer strategic suggestions based on your investment goals and tolerance level.
The built-in Portfolio Builder helps you construct a diversified portfolio in minutes. For those who prefer a more focused approach, the platform highlights high-growth “Stock Rockets” and sector trends to support concentrated strategies.
It’s designed for both newer investors who want structured guidance and experienced investors who appreciate streamlined analysis tools.
At this price, it’s a practical way to add data-driven clarity to your long-term investing approach.
Get lifetime Sterling Stock Picker access for a one-time payment of just $68.99 (reg. $486).
StackSocial prices subject to change.
The greatest thing about the Nintendo Switch Virtual Boy is how much it sucks
"Bad on purpose" is a dangerous tightrope to walk. Usually, the end result is something that feels like it's trying too hard or thinks it's funnier than it actually is. Nintendo's new Virtual Boy accessory for the Switch and Switch 2 manages to pull it off, though.
That's because, rather than setting out to make something that sucks, Nintendo instead took something that sucked in the mid-90s and recreated it faithfully for the sake of artistic preservation. It's retro nerdiness purely for the love of the game.
In 1995, the Virtual Boy was (and remains) Nintendo's biggest hardware embarrassment. It was a way-too-early attempt at VR with stereoscopic 3D features that failed because it was neither a true console nor a true portable machine. Every game was displayed exclusively in red and black, and using it for more than a few minutes at a time will ruin your neck and eyes.
SEE ALSO: Nintendo's Virtual Boy replica for Switch 2 is finally available to buy
For $100 (and the cost of a Nintendo Switch Online subscription so you can actually play the games), you can almost perfectly recreate that retro experience on your Switch or Switch 2. This new accessory isn't a cleaned-up, refined, or redesigned take on the original idea; it just is the original idea, but with the ability to shove a portable Switch display inside of it. It still hurts to look at and play. The games aren't especially good or interesting, outside of Wario Land. You can't even output the games to a TV or any other external display, making it almost totally incompatible with today's "everything should be streamable" attitude in games.
And that's exactly why it rules. I love the Switch Virtual Boy accessory, and I will almost certainly not use it for any longer than it took to write this article and shoot the accompanying video because I value my eyesight. It reproduces the original artistic vision at the expense of user comfort and convenience, and thank God for that.
How to buy the Nintendo Switch Virtual Boy accessory
The new version of the Virtual Boy is an online Nintendo Store exclusive. To try it yourself, head to the Nintendo store and sign into your account. There is one catch, however — you need a Nintendo Switch Online membership to purchase the device. It's priced at $99.99 and available for sale now.
Virtual Boy for Nintendo Switch, $99.99 at Nintendo. Credit: Nintendo
Everything Apple just announced: iPhone 17e, MacBook Neo, Studio Displays
Apple has had an unusually busy week — no keynote required.
In a flurry of press releases, the Cupertino company unveiled a new iPhone, a refreshed MacBook Air, a new MacBook Pro, a pair of new desktop displays, and the chips that power it all. Mashable got some brief hands-on time with the devices, and we'll have in-depth reviews coming soon.
If you're just getting up to speed, here's an up-close look at every major product Apple announced — and more importantly, what you need to know about each one.
iPhone 17e
Credit: Timothy Werth / Mashable
The iPhone 17e, announced March 2, is built around Apple's latest-generation A19 chip — the same processor powering the flagship iPhone 17 lineup. It also adds C1X, a next-generation cellular modem the company says is roughly twice as fast as the modem in the iPhone 16e.
The 6.1-inch Super Retina XDR display on the 17e now features Ceramic Shield 2, which Apple says offers three times the scratch resistance of the previous generation.
SEE ALSO: Comparing iPhone 17e vs. iPhone 17: Is the new $599 phone good enough?
The 17e's camera system has been overhauled with a 48MP Fusion lens that Apple says can function like two cameras in one — offering an optical-quality 2x telephoto crop in addition to the standard wide angle. Portrait mode has been improved with a smarter image pipeline that can automatically detect people, dogs, and cats and save depth data in the background, so you can apply bokeh after the fact.
The most consumer-friendly change: iPhone 17e now ships with MagSafe, Apple's magnetic wireless charging ecosystem, supporting up to 15W. The iPhone 16e topped out at 7.5W over standard Qi. Baseline storage has also doubled, to 256GB, at the same $599 starting price.
iPhone 17e comes in black, white, and a new soft pink color. Pre-orders open March 4; the phone is officially available on March 11.
MacBook Air with M5
Credit: Timothy Werth / Mashable
Apple refreshed the MacBook Air laptop with its M5 chip. The result is up to four times faster for AI tasks than the MacBook Air with M4, the company says, and up to 9.5 times faster than the M1 model. The new chip features a 10-core CPU and a 10-core GPU, with a Neural Accelerator built into each core.
Storage gets a meaningful upgrade too. The new MacBook Air now starts at 512GB — double the previous standard — and can be configured up to 4TB for the first time. Apple claims the new SSD also delivers read/write speeds that are twice as fast as those in the M4 MacBook Air.
The new Apple N1 wireless chip brings Wi-Fi 7 and Bluetooth 6 to the Air, delivering improved performance and reliability. Battery life is unchanged, promising up to 18 hours on a charge. The design — a fanless aluminum chassis in 13- and 15-inch options — is unchanged too. Colors include sky blue, midnight, starlight, and silver.
The 13-inch MacBook Air with M5 starts at $1,099 (or $999 for education). The 15-inch starts at $1,299 ($1,199 for education). Pre-orders open March 4, and the laptop ships March 11.
MacBook Neo
Credit: Timothy Werth / Mashable
Apple also unveiled the MacBook Neo, a brand-new entry-level laptop starting at $599 — or $499 for students and educators — marking the company's most affordable Mac ever.
The 13-inch machine runs on Apple's A18 Pro chip, the same processor found in the iPhone 16 Pro lineup, paired with 8GB of unified memory that cannot be upgraded. It features a Liquid Retina display, up to 16 hours of battery life, and comes in four colors: blush, indigo, silver, and citrus.
But as Mashable's Stan Schroeder noted in an early spec breakdown, the low price comes with tradeoffs — Touch ID costs an extra $100, the battery is considerably smaller than the one in the MacBook Air, and prospective buyers who need more than 8GB of RAM are simply out of luck. MacBook Neo is available for pre-order now, and ships on March 11.
MacBook Pro with M5 Pro and M5 Max
Credit: Timothy Werth / Mashable
The new 14- and 16-inch MacBook Pro models are powered by M5 Pro and M5 Max, which Apple says deliver up to four times the AI performance of the M4 Pro and M4 Max, and up to eight times the AI performance of M1-era models. Both chips are built on a new "Fusion Architecture" that combines two dies into a single system-on-a-chip, enabling performance gains that Apple says wouldn't be possible with a traditional single-die design.
SEE ALSO: How to preorder the new Apple MacBook Pros with the M5 Pro and M5 Max chips — preorders now live
MacBook Pro with M5 Pro is aimed at data modelers, sound designers, and complex coders. It pairs an up to 18-core CPU with an up to 20-core GPU and supports up to 64GB of unified memory. The M5 Max doubles down with an up to 40-core GPU and up to 128GB of unified memory — a figure Apple says meaningfully improves token-generation speeds for large language models (LLMs) running locally.
Storage starts at 1TB for the M5 Pro models, and 2TB for the M5 Max. Apple says SSD speeds have roughly doubled over the previous generation, reaching up to 14.5GB/s read/write. The MacBook Pro also adds the N1 chip for Wi-Fi 7 and Bluetooth 6, and ships with three Thunderbolt 5 ports. Battery life is rated at up to 24 hours.
The 14-inch MacBook Pro with M5 Pro starts at $2,199; the 16-inch version starts at $2,699. M5 Max configurations start at $3,599 for the 14-inch model and $3,899 for the 16-inch model.
All models come in space black and silver. Pre-orders open March 4; availability March 11.
iPad Air M4
Credit: Timothy Werth / Mashable
Apple also refreshed the iPad Air lineup, bumping it to the M4 chip with 12GB of unified memory — a 50 percent increase over the previous generation. The tablet is available in 11- and 13-inch sizes and, according to Apple, delivers performance up to 30 percent faster than the M3 model and more than twice as fast as the M1 version.
SEE ALSO: The new Apple iPad Air is live on Walmart: Pre-order now to save up to $60
Both the N1 wireless chip for Wi-Fi 7 and the C1X cellular modem make their iPad debut here, with Apple claiming the latter cuts modem power consumption by roughly 30 percent compared to the M3 model.
Pricing holds steady at $599 for the 11-inch Wi-Fi model and $799 for the 13-inch. Pre-orders open March 4; availability starts March 11.
Studio Display and Studio Display XDR
Credit: Timothy Werth / Mashable
Apple announced a refresh of its external display lineup, introducing both a new Studio Display and an entirely new Studio Display XDR. The Studio Display gets a notable upgrade in the form of Thunderbolt 5 connectivity — two ports that support daisy-chaining up to four displays — and a new 12MP Center Stage camera that now includes support for Desk View, which simultaneously shows the caller and a top-down view of their workspace.
The core display panel remains a 27-inch 5K Retina panel at 600 nits, with P3 wide color.
The Studio Display XDR is a bigger story. Apple is positioning it as a replacement for the Pro Display XDR at a significantly lower price. It features the same 27-inch 5K Retina canvas, but with a mini-LED backlight system using over 2,000 local dimming zones, up to 2,000 nits of peak HDR brightness, a 1,000,000:1 contrast ratio, and a 120Hz refresh rate with Adaptive Sync.
The XDR display adds support for the Adobe RGB color gamut alongside P3 and introduces new DICOM medical imaging presets — pending FDA clearance — that are aimed at radiologists who want to use the display for diagnostic work.
The new Studio Display with a tilt-adjustable stand starts at $1,599. Studio Display XDR with a tilt- and height-adjustable stand starts at $3,299 — that's $2,700 less than the original Pro Display XDR at launch.
As with everything else on Apple's list, pre-orders for the displays open March 4, with availability on March 11.
Home Assistant 2026.3 has arrived: Here’s what’s new
Home Assistant, the open-source smart home server, just received another significant update. Home Assistant version 2026.3 is now rolling out with more automation improvements, wake words for Android devices, and much more.
What happened to USB Type-B? And why is it still important?
We have been using USB connectors for three decades now—yet many people don’t even realize there’s a Type-B connector. For most, there’s the older, rectangular USB-A connector, the modern, petite USB-C connector, and a few micro-USB connectors in between. I know folks who think the square-ish, chunky, beveled connector hiding behind their printer or connecting their audio gear is actually a proprietary connector. But guess what—that’s actually a USB Type-B connector—and here’s why this forgotten middle child of tech is still essential.
Everything coming to Paramount+ in March
There’s so much excitement happening on Paramount+ in March that, if you’re not a subscriber, you’re totally going to miss out unless you remedy that asap. Not only is the streaming giant dropping two brand-spanking-new Taylor Sheridan series on us, but it’s also hitting us with new original films, an additional new CBS series, and a new season of an original docuseries. It’s also time for March Madness, so all the college basketball fans out there will have plenty of games to watch.
The Bride! review: Maggie Gyllenhaal's Frankenstein is a riot
What Maggie Gyllenhaal has done in her reimagining of The Bride of Frankenstein is utterly deranged. And thank God.
No shade to brilliant director James Whale, whose 1935 Universal sequel The Bride of Frankenstein is both exhilarating and cheekily queer. But — as Gyllenhaal has repeated frequently on The Bride!'s press tour — his titular monstress never speaks a word in her few short minutes of screen time. Still, as that original Bride, Elsa Lanchester made this she-beast an instantly compelling marvel who has become truly iconic, an intoxicating mix of high femme and the horrific.
Gyllenhaal smartly pulls these stylistic elements into her Bride!, as her revived Bride coughs up black bile that stains her lips in a perfect Cupid's bow, with a chic and unnerving stain creeping up her high cheekbones. Gyllenhaal also borrows from Whale the inspired choice to have her lead actress play both the Monster's Mate (as Lanchester was originally credited) and the author who birthed her, Mary Shelley. However, far from the prim, giggling lady presented in The Bride of Frankenstein, Gyllenhaal's Shelley (played by Hamnet Academy Award nominee Jessie Buckley) is a yowling spirit from beyond the grave who is thoroughly mad, in both senses of the word.
Presented in a suffocating black-and-white close-up, a heaving Mary Shelley introduces this story as the one she still wished to tell, even from the grave. Her rage at being silenced echoes across the ages, possessing a gangster's moll in 1930s Chicago. And from there, Gyllenhaal weaves in references to Whale's Frankenstein and Bride of Frankenstein, Shelley's novel Frankenstein, as well as Mel Brooks' parody Young Frankenstein, Arthur Penn's Bonnie and Clyde, and Lizzie Borden's 1983 dystopian classic, Born in Flames.
It's a chaotic mix that is wild and messy, and utterly exciting. Through sputtering dialogue, propulsive and repulsive visuals, and even spirited dance numbers, The Bride! comes together into a dark, campy, and romantic tapestry.
The Bride! slams Frankenstein's monster into 1930s Chicago gangland.
Jessie Buckley wields a gun in "The Bride!" Credit: Warner Bros. Pictures
This Bride's story begins at a long table in a Chicago nightclub, where a moll called Ida (Buckley) is playing nice to a crude gangster (Our Flag Means Death's Matthew Maher). But something overtakes her, and its name is Mary Shelley. Possessed by the author, Ida drops her placating smiles and spits on this brute. Her American accent is shed for a snarling British voice that howls of the crimes of a local kingpin. Ida can't stop Mary from speaking from her mouth, and soon Ida will pay the price with a fatal fall.
Elsewhere in this bustling city, Frankenstein's monster (Christian Bale), who prefers to go by "Frank," has arrived at the door of Dr. Euphronious (Annette Bening), a mad scientist with an interest in raising the dead. Pointing to her published works, the century-old monster entreats her to take pity on him and build him a bride, meaning a resurrected dead girl who could end his lonely wandering. Reluctantly, Euphronious agrees, and after a bit of grave-robbing, Ida is reinvigorated with no memory of who she was before and an alt-girl glow-up.
Annette Bening as Dr. Euphronious in "The Bride!" Credit: Warner Bros. Pictures
This radical experiment jolts Ida's bob all white, eradicating the previously dark roots. The bile she sputters stains not only her face, but leaves lines down her neck to her breasts, down her arm to her fingers. She is stained or tattooed, giving a constant reminder to the darkness within her, even as her burnt-orange silk dress flutters around teal tights.
Within Ida lies a fire, which fuels her to drag Frank to an underground night spot for dancing and debauchery. But when two strangers reject Ida's refusal of their advances ("I prefer not to!" becomes her mantra), Frank steps in with a deadly chivalry. Now, these monsters must go on the run from the law. Like the legend of Bonnie and Clyde, they chase their bliss, busting heads along the way — while seeming doomed to a very violent end. But until then, female copycats emulate the Bride's look and itchy trigger finger, while she and her monster mate fall in love.
Maggie Gyllenhaal fuses romance and rage.
Penélope Cruz as Myrna Mallow and Peter Sarsgaard as Det. Jake Wiles in "The Bride!" Credit: Warner Bros. Pictures
The politics of The Bride! are anything but subtle, as the speech of women is presented as a threat to a sordid status quo. From the start, Shelley reflects on how patriarchal society oppresses women's speech as a matter of course. Ida is a threat to gangsters because of what she could say to the cops. As the Bride, it's a furious speech she gives about "brain attack" that incites imitators who share her feminist fury. After that first attack, which Frank intervenes in, she'll use a gun to defend herself against another attempted sexual assault from a man. She'll sputter the phrase "me too" and speak of the "angry dead," indicating a legion of women who demand to be heard from beyond the grave.
The genre leanings of The Bride! urge Buckley into a manic performance that is often over the top, but this is wisely constructed as Ida is a woman possessed by the mad dead. One moment, she's a good-time gal, joyous in dancing or watching a movie with Frank's favorite film star, the singing, tap-dancing Ronnie Reed (a slick Jake Gyllenhaal). Next, she's wrathful and ranting. And Frank is never thrown by her moods, instead swooning over her mind, even if he can't understand her tumult. Therein lies the romance; he doesn't love her despite her outrageous behavior, but for all of her.
Christian Bale and Jake Gyllenhaal in "The Bride!" Credit: Warner Bros. Pictures
How many of us can feel divided, pressured to be pleasing and happy, but pulled by a fury at injustice that threatens to electrify us like a lightning bolt, ripping our flesh from our very bones? Through her Bride, Buckley embodies the stressful duality of being a woman in a world run by violent men.
In a cheeky B-plot, Gyllenhaal also critiques so-called allies through Detective Jake Wiles, who is played by her real-life husband, Peter Sarsgaard. It's Jake who's tasked with tracking down the monsters on a spree across state lines. But Jake is not much of a detective. He calls himself the "Gal Friday" to his "secretary" Myrna Mallow (a gloriously chic Penélope Cruz), who is the real brains behind his operation. While their relationship is playful and platonic, Jake is a charming fool who gets all the credit, while she does all the actual detective work and gets only condescending sneers from policemen. In this too, Gyllenhaal expresses a wail of frustration. And yet...
The Bride! refuses to take itself or cinema too seriously.
Jessie Buckley is revived in "The Bride!" Credit: Warner Bros. Pictures
Some elements of Gyllenhaal's gender politics might feel distractingly sharp amid the genre richness, like a monologue from Sarsgaard about how women are used and overlooked by the men around them. However, The Bride! avoids feeling preachy by embracing the same level of earnestness for Gyllenhaal's stylistic big swings.
Colors switch from a gothic black-and-white to a grave-digging sequence flooded in a dreamy dark blue. A party sequence throbs with bisexual lighting, its dancers swirling in pinks, blues, and purples. Neon lights glitter in grimy cities, while the Bride's costume screams with colors bright yet dingy. Moods swirl with the flush of blues, yellows, reds, and greens.
Christian Bale and Jessie Buckley play Frankenstein's Monster and his bride in "The Bride!" Credit: Warner Bros. Pictures
It's not a bright, bubbly, or even joyous palette. These hues are a reflection of the Bride's need to be heard, to be seen. She will not be demure; she demands to stand out. This exhibitionism is further bolstered by the aforementioned dance numbers. The film is not just Frank and the Bride's story, but also their fantasy. Having long clung to Hollywood cinema for solace in a lonely existence (relatable!), Frank imagines meeting his bride as something out of a movie. He even mimics a Ronnie Reed dance move he saw on the silver screen to woo her. Later, they will envision themselves on the screen — as dancing lovers, as lurking monsters — and they will bring both of these fantasies into their journey, as they decide who they will be to each other.
In one of the film's most shocking sequences, the pair cut loose at a posh party, upsetting the formal veneer with a furious explosion of movement. Others will be possessed by the Mary Shelley spirit, compelled to join in, creating a feral and fun flashmob. Yes, seeing Frankenstein's monster dancing is reminiscent of Young Frankenstein, but just when you think that might be a nod to the Mel Brooks classic, Bale bellows out, "Puttin' on the Ritz!" There is no doubt. Gyllenhaal isn't winking at her references; she's smirking at us with a wide-open mouth, ready to yawp.
Gyllenhaal rejects fluidity or a staunch form that adheres to genre conventions. Instead, she boldly blends elements of horror with humor, romance with repulsion, creating an unapologetically wild and campy adventure. Some might call The Bride! messy or juvenile. I would call it alive and rebellious.
Gyllenhaal and her cast don't just dust off a classic tale for a safe money grab. (Looking at you, Disney live-action remakes!) They tear various Frankenstein iterations to bits, then create an exquisite corpse of the pieces, festooning it with elements from other films about violence, revolt, and violation. The result is a film that is utterly electrifying, sure to spark something in hearts young and old.
While I relished this movie's wild journey, I also grinned to imagine the girls who will watch this like I once did The Craft, appreciating its genre thrills and, beyond that, seeing myself in the furious and feminine at its core.
Frozen light, DNA cassettes, and laser-etched glass: Sci-fi storage tech that makes your SSD look like a floppy disk
I've written extensively about how fragile our data storage technology is. So far, the most robust medium we've come up with are carvings in stone or clay tablets, which is why we can read a complaint about poor-quality copper written in 1750 BCE.
Site to check women's body counts goes viral — and some men are defending it
In today’s episode of f*ck the patriarchy, there’s a new website called “Check Her Body Count” that claims to use AI to calculate a woman’s “body count” using her Instagram profile. But it's both terribly inaccurate and misogynistic in nature — even if comparisons are being made to the whisper network site, Tea.
The website went viral on Feb. 26, after X user @weretuna shared an ad for Check Her Body Count on their feed. The post reads: “Suspicious that your girl has 10+ body count? Now you don’t have to guess. You paste her ig [sic] URL, and the app brutally estimates her body count by checking her followers, posts, and stories."
SEE ALSO: How AdultFriendFinder subscriptions appear on your bank statement
The post has amassed 6.1 million views as of this publication.
Before I go on an absolute rant, let's just explain what "body count" is for the people who may not know: the number of sexual partners a person has had in their lifetime. Also, Mashable attempted to reach out to the Check Her Body Count contact email, but it bounced back.
OK, so here’s what I have to say about this.
1.) Obviously, this isn’t the most important point, but I just want everyone to understand that this site is completely inaccurate. There’s a little disclaimer at the bottom of the site that admits: "This tool does not access, connect to, or retrieve data from any third-party platform. All outputs are randomly generated for entertainment only and do not reflect real individuals."
Not only that, but a developer named Cappy (@CappyIshihara) reposted the viral post with his two cents, confirming that the site doesn't even access Instagram. It just validates the URL in your browser, spits out a random number, and caches it locally. In his words: “this sh*t is completely clientside, zero net, cache in localstorage."
My editor tried the site for herself, and it stated she had more "male followers" than she has total followers on Instagram.
2.) The idea of this is gross AF, and the fact that some commenters are saying that this site is no worse than the Tea App is exactly how and why tech is so dangerous today. The Tea App, which relaunched as a website after Apple's App Store booted it last year, is a safe space for women to discuss "red flags" and find info on potential suitors — it’s very “Are We Dating the Same Guy” — so that they can decide whether they're entering potentially dangerous situations.
Yet, here are just a few examples of what some men are saying about Check Her Body Count:
"Nah, this stays up until [the] Tea App gets dumped."
"Someone doesn't like the consequences of their actions?"
"So women are upset at this, but find the Tea App, which berates men and tells other women how supposedly bad a guy is and ruins his dating reputation, okay? Yea, no. I fully support this website."
Comparing a whisper network meant to keep women physically safe to a tool designed to arbitrarily shame and surveil women for having sex is peak misogyny.
“Body count is a gross, inaccurate metric rooted in misogyny — period,” Angie Rowntree, founder and director of the porn site Sssh.com, tells Mashable. “It dehumanizes women and normalizes the surveillance and violation of women.”
And let’s just pause and talk about the exhausting double standard fueling all of this. If a guy has a lot of sex, he’s celebrated as "the man." But if a woman has the exact same amount of sex, she’s branded a "whore." And god forbid she chooses not to have sex, because then she’s instantly labeled "prudish" or a tease. It's a completely rigged game designed to make us apologize for our own bodies, no matter what we do.
As Rowntree notes, obsessing over this number "completely ignores context like consent and pleasure, and pretends that having sexual experience somehow diminishes a person's worth." In reality, having multiple partners may translate to greater confidence, better boundaries, and more fulfilling sex lives.
3.) We are seeing a terrifying trend where AI and tech are being weaponized by male-dominated online subcultures to enforce patriarchal control. If that sounds dramatic, let's look at the receipts. Deepfake technology gained notoriety through the creation of non-consensual sexual images of women. A recent investigation by the Tech Transparency Project found 102 "nudify" AI apps (which render people, often women, naked) hosted across Google Play and the Apple App Store. Those apps were downloaded more than 705 million times and generated $117 million in revenue. As the Tech Transparency Project wrote, "Because Google and Apple take a cut of that revenue, they are directly profiting from the activity of these apps" — meaning they are making money off the digital abuse and sexualization of women.
And have we forgotten about Grok? During an 11-day period between December 2025 and January 2026 alone, Elon Musk's chatbot produced an estimated three million sexualized images, including deepfakes of real, well-known women.
“The Grok scandal shows how fast 'fun' AI features can quickly turn toxic when they ignore users' rights (in this case, women's rights) to control their own public images and narratives," says Rowntree.
This is about so much more than a fake Instagram scraper — it's about an online ecosystem (often tied to anti-feminist "red-pilled" and incel communities) that actively pits men against women and uses tech as a tool for harassment. Dr. Mathilde Pavis, a leading adviser on AI regulation, told Newsweek that the concept behind Check Her Body Count reflects a deeper, dangerous cultural logic: "that women's bodies and private lives are subject to algorithmic judgment, sexual scoring and public evaluation."
"The body count website did not happen in a vacuum," says Rowntree. "There are men (and entire cultures) in 2026 who still think a hymen is a 'freshness seal' and virginity is the sum total of a woman's worth." Whether it's deepfaking women's bodies or creating fake algorithms to publicly score their sexual history, the goal is the exact same: policing women.
“Women are not property; we are human beings,” Rowntree adds. “As such, our bodies are also not public property to be exploited without consent, including for algorithmic judgment or AI manipulation."
BIOS updates are no longer optional
If you own a Windows PC or laptop, you’ve probably been told to avoid BIOS updates unless something is wrong, likely to prevent a catastrophic issue like bricking your motherboard. However, with security, performance, and compatibility all at stake, I think they're essential for all machines.
The 3 best accessories to plug into your TV’s USB port
Modern TVs are smart, meaning all you need to do is plug them in and turn them on. Most of the setup involves downloading the streaming services you subscribe to and signing in to them.
Every streaming service that's raised its prices in 2026 (so far)
The streaming industry is in its "nothing stays still" era. Catalogs continuously rotate, bundles get reshuffled and dropped, services rebrand, merge, and corporate consolidation keeps rewriting the map. The latest whiplash example is Paramount Skydance’s victory over Netflix to buy Warner Bros. Discovery, a reminder that the industry is still very volatile and that these corporate wars often come with a higher monthly bill for us.
Disinformation on U.S.-Iran war takes over the internet
Before the dust had settled on the ruins of the Shajareh Tayyebeh school — a casualty of the recent U.S. and Israeli military strikes against Iran, one that resulted in the deaths of up to 168 adults and children — people were already engagement-farming online. Clips of digital flight simulators were passed off as real-time ops footage, while out-of-context images of battleships and old videos of aerial missile attacks were repurposed to sell users a tale of Iranian dominance. AI-edited content proliferated.
According to experts, the posts had accumulated hundreds of millions of views in just a handful of days.
SEE ALSO: AI has made us all surveillance targets. This tool helps you fight back.
The growing number of viral posts — and the potential for even more to pop up as users earned cash for the viral falsehoods — was alarming enough to prompt X to edit its policies on misinformation. As of yesterday, X says it will suspend users from its Creator Revenue Sharing program if they post AI-generated content depicting armed conflict without labeling it as such.
And not even Google searches are safe from misinformation these days.
The proliferation of digital misinformation is the product of a web of bots and engagement farming accounts, all with the shared goal of being the loudest, most clicked-on account in the room.
Some hope to win political and social influence, others just want the money. Meanwhile, users, prone to confirmation bias and a reliance on digital news sources, repeatedly fall victim to their racket. Engagement farming, no longer just exchanging the currency of memes and clickbait, has become a dangerous, politically fraught game.
What users are seeing as the U.S.-Iran conflict rages
Recent posts engaging in active disinformation about the conflict in Iran primarily involve exaggerating the scale and success of Iranian counterattacks, experts explain.
A recent investigation by Wired documented hundreds of posts across Elon Musk's X that included misleading footage and photos — including AI-manipulated content — or promoted false claims about the scale of the attacks, many of which were posted in the immediate aftermath of missile strikes. A post with more than 4 million views claimed to show ballistic missiles sailing over Dubai, but actually depicted an Iranian attack on Tel Aviv in Oct. 2024. Another with more than 375,000 impressions shows a fictitious before-and-after image of the shelled compound of assassinated Iranian leader Ali Hosseini Khamenei.
According to Wired, nearly all of the posts were shared by premium subscriber accounts with blue checkmarks, including state-funded media outlets in Iran.
As in previous military conflicts, accounts have also attempted to pass off video game footage as verified news clips, including AI-manipulated images of downed F-35 fighter jets ripped from flight simulator games. The images have been shared across TikTok, some with links to Russian influence operations, the BBC reported.
In addition to out-of-context footage and misleading content, the BBC also documented a handful of completely AI-generated videos that had amassed nearly 100 million total views, shared by what the outlet calls notorious "super-spreaders" of disinformation.
A report from misinformation watchdog NewsGuard also chronicled a cadre of users sharing viral posts circulating false claims of targeted military strikes against U.S. and Israeli strongholds, predominantly using repurposed video footage and out-of-context or completely recontextualized images of destruction.
"[These videos] are posted by anonymous accounts that tend to report on geopolitical conflicts. These are accounts that are known to NewsGuard for spreading exaggerated claims, usually from a pro-Iran perspective," said Sofia Rubinson, senior editor of NewsGuard's Reality Check newsletter and co-author of the report. From there, Rubinson explains, other accounts with larger followings pick up and spread the false claims.
For example, hours after initial reports of the U.S.'s military strikes in Iran, users on X began reposting an image of a sinking naval aircraft carrier. Users claimed that it showed a recent attack on the aircraft carrier USS Abraham Lincoln in the Arabian Sea. The U.S. military's Central Command issued a statement refuting the claim that same day. NewsGuard confirmed the image actually showed the intentional sinking of the USS Oriskany that took place nearly 20 years ago. The claim was shared by unverified "news" accounts and even Kenyan parliamentary member Peter Salasya. Salasya's post has been viewed more than 6 million times.
Multiple accounts, including Salasya's, shared another video allegedly showing Israel's Dimona nuclear power plant under siege by air. The video racked up hundreds of thousands of impressions across anti-Israel and pro-Iran pages — an X Community Note now appears below the video on Salasya's page, clarifying the images are of a March 2017 attack in Balaklia, Ukraine.
NewsGuard found that such posts have already garnered at least 21.9 million views across X.
Posts inducing fear of domestic retaliatory attacks have also circulated online, including an unverified list of U.S. cities alleged to be top targets for Iranian sleeper cells — the list appears to have been written in Apple's Notes app.
Disinformation is only going to get worse
The acceleration of advanced generative AI and relaxed moderation policies across social media platforms have exacerbated an online misinformation crisis, experts have warned.
Particularly over recent months, including during the U.S.-led capture of Venezuelan leader Nicolas Maduro, NewsGuard researchers have noticed a pattern in online disinformation emerging over periods of breaking news.
"People now have a shorter window for the lapse between an event occurring and authentic visuals coming out of the media," explained Rubinson. To put it more bluntly: Users are losing their patience, used to an online environment where information is usually right at your fingertips.
These brief periods, or voids, between breaking news reports and confirmed video or photos become fertile ground for disinformation bots and engagement farmers, Rubinson says. They also threaten to reinforce conspiratorial thinking — that mainstream news outlets are keeping information from the public, for example — and lend themselves to a user's own confirmation bias.
Political conflict is particularly ripe for the spread of such misinformation, which is in turn strengthened by active disinformation campaigns from both sides of an armed conflict. Researchers have found that a lack of proximity to events makes it easier to believe out-of-context or exaggerated information.
"It's an attempt to fill this fog of war," said Rubsinson. "It can be very overwhelming for people. They want to make sense of it, and visuals are a good way for us to process what is going on in war when we can't comprehend the scale of these conflicts."
This becomes a greater problem as individuals increasingly use social media platforms as sole sources for news and as previously reliable fact-checking tools, including straightforward Google searches, become more unreliable.
SEE ALSO: U.S. government creates website to get around European content bans
AI is harming more than helping
AI chatbots and search have become embedded into the very fiber of real-world crisis events, as users turn to them as real-time fact-checkers. Rubinson said that nearly every X post NewsGuard analyzed included the same reply: "@Grok is this true?"
But AI assistants and platform chatbots, including X's Grok, are notoriously unreliable at disseminating and verifying breaking news. They are also inconsistent at applying their own platforms' moderation policies. The BBC found that Grok erroneously verified recent AI-generated images depicting Iranian military movements, for example.
According to a second report by NewsGuard published March 3, Google's AI-powered Search summaries have repeated misleading claims about the U.S.-Iran conflict when prompted with reverse image searches. For example, NewsGuard researchers uploaded a frame from a video shared online claiming to show the destruction of a CIA outpost in Dubai. Google's AI summary verified the story, writing: "The image shows a fire at a high-rise residential building in Dubai, UAE, reportedly occurring on March 1, 2026, following regional tensions. … Conflicting reports emerged regarding the cause, with some sources mentioning a drone strike and others referring to the building as a specific intelligence facility."
The video actually depicts a 2015 residential fire in the city of Sharjah.
Security experts have sounded alarm bells over such "AI information threats," including AI tools used to generate and amplify misleading content. A report by the UK Centre for Emerging Technology and Security suggests the worsening information environment may pose existential threats to public safety, national security, and democracy without direct intervention.
Meanwhile, civilians and journalists on the ground in Iran are fighting back against a near total internet blackout, following a massive push by the Trump administration and its ally Elon Musk to get Starlink internet connections to those on the ground. Bad actors, on the other hand, are still finding their way through the block and back onto sites like X.
Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model
- Phi-4-reasoning-vision-15B is a compact and smart open‑weight multimodal reasoning model that balances reasoning power, efficiency, and training data needs. It is a broadly capable model that allows for natural interaction across a wide array of vision-language tasks and excels at math and science reasoning and at understanding user interfaces.
- We share lessons learned and best practices for training a multimodal reasoning model—showing the benefits of careful architecture choices, rigorous data curation, and a mixture of reasoning and non-reasoning data.
We are pleased to announce Phi-4-reasoning-vision-15B, a 15 billion parameter open‑weight multimodal reasoning model, available through Microsoft Foundry, HuggingFace, and GitHub. Phi-4-reasoning-vision-15B is a broadly capable model that can be used for a wide array of vision-language tasks such as image captioning, asking questions about images, reading documents and receipts, helping with homework, inferring about changes in sequences of images, and much more. Beyond these general capabilities, it excels at math and science reasoning and at understanding and grounding elements on computer and mobile screens. In particular, our model presents an appealing value relative to popular open-weight models, pushing the Pareto frontier of the tradeoff between accuracy and compute costs. It performs competitively with much slower models that require ten times or more compute time and tokens, and achieves better accuracy than similarly fast models, particularly when it comes to math and science reasoning.
Figure 1: Phi-4-reasoning-vision-15B presents a compelling option compared to existing models, pushing the Pareto frontier of the tradeoff between accuracy and compute costs. We have competitive performance to much slower models that require more time and tokens and higher accuracy than similarly fast models. These values were computed by averaging accuracy, time, and output token-counts for a subset of 4 benchmarks: ChartQA_TEST, MathVista_MINI, MMMU_VAL, and ScreenSpot_v2, where we had logged these values.
In this post, we share the motivations, design choices, experiments, and learnings that informed its development, as well as an evaluation of the model's performance and guidance on how to use it. Our goal is to contribute practical insight to the community on building smaller, efficient multimodal reasoning models and to share an open-weight model that is competitive with models of similar size at general vision-language tasks, excels at computer use, and excels on scientific and mathematical multimodal reasoning.
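For readers who want to try the model right away, here is a minimal inference sketch using the Hugging Face transformers library. The repository id, image placeholder token, and processor behavior are assumptions based on how other Phi vision models have been published; check the model card for the exact identifiers and recommended generation settings.

```python
# Minimal sketch: load the model from Hugging Face and ask a question about an image.
# The repo id below is an assumption -- confirm the exact name on the model card.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "microsoft/Phi-4-reasoning-vision-15B"  # assumed repo id

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

image = Image.open("receipt.jpg")
messages = [
    {"role": "user", "content": "<|image_1|>\nWhat is the total amount on this receipt?"}
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=1024)
print(processor.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```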
A focus on smaller and faster vision–language models
Many popular vision-language models (VLMs) have trended towards growing in parameter count and, in particular, the number of tokens they consume and generate. This leads to increases in training and inference-time cost and latency, and impedes their usability for downstream deployment, especially in resource‑constrained or interactive settings.
A growing countertrend towards smaller models aims to boost efficiency, enabled by careful model design and data curation – a goal pioneered by the Phi family of models and furthered by Phi-4-reasoning-vision-15B. We specifically build on learnings from the Phi-4 and Phi-4-Reasoning language models and show how a multimodal model can be trained to cover a wide range of vision and language tasks without relying on extremely large training datasets, architectures, or excessive inference‑time token generation. Our model is intended to be lightweight enough to run on modest hardware while remaining capable of structured reasoning when it is beneficial. Our model was trained with far less compute than many recent open-weight VLMs of similar size. We used just 200 billion tokens of multimodal data, leveraging Phi-4-reasoning (trained with 16 billion tokens) based on the core Phi-4 model (400 billion unique tokens), compared to more than 1 trillion tokens used for training multimodal models like Qwen 2.5 VL and 3 VL, Kimi-VL, and Gemma3. We can therefore present a compelling option compared to existing models, pushing the Pareto frontier of the tradeoff between accuracy and compute costs.
Figure 2: Phi-4-Reasoning-Vision can help with a wide range of everyday tasks.
Lessons from training a multimodal model
Training a multimodal reasoning model raises numerous questions and requires many nuanced design choices around model architecture, dataset quality and composition, and the interaction between reasoning‑heavy and non-reasoning perception‑focused tasks.
Model architecture: Early- vs mid-fusion
Model architectures for VLMs differ primarily in how visual and textual information is fused. Mid-fusion models use a pretrained vision encoder to convert images into visual tokens that are projected into a pretrained LLM's embedding space, enabling cross-modal reasoning while leveraging components already trained on trillions of tokens. Early-fusion models process image patches and text tokens in a single transformer, yielding richer joint representations but at significantly higher compute, memory, and data cost. We adopted a mid-fusion architecture as it offers a practical trade-off for building a performant model with modest resources.
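To make the mid-fusion pattern concrete, the sketch below shows how patch features from a vision encoder can be projected into an LLM's embedding space and concatenated with text embeddings. This is a generic illustration, not the actual Phi-4-reasoning-vision implementation; the module shapes, projector design, and token ordering are assumptions.

```python
# Generic mid-fusion sketch (not the actual Phi implementation): a vision encoder
# produces patch features, a small projector maps them into the LLM embedding
# space, and the projected visual tokens are joined with the text embeddings.
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Maps vision-encoder features (d_vision) into the LLM embedding space (d_model)."""
    def __init__(self, d_vision: int, d_model: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(d_vision, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, d_vision) -> (batch, num_patches, d_model)
        return self.proj(patch_features)

def fuse(vision_encoder, projector, llm_embeddings, pixel_values, input_ids):
    """Build the fused input sequence that is fed to the language model."""
    patch_features = vision_encoder(pixel_values)           # (B, N_img, d_vision)
    visual_tokens = projector(patch_features)                # (B, N_img, d_model)
    text_tokens = llm_embeddings(input_ids)                  # (B, N_txt, d_model)
    # Prepend image tokens; a real model splices them in at image placeholder positions.
    return torch.cat([visual_tokens, text_tokens], dim=1)    # (B, N_img + N_txt, d_model)
```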
Model architecture: Vision encoder and image processing
We build on the SigLIP-2 vision encoder and the Phi-4-Reasoning backbone. In previous research, we found that multimodal language models sometimes struggled to solve tasks, not because of a lack of reasoning proficiency, but rather because of an inability to extract and select relevant perceptual information from the image. An example would be a high-resolution screenshot that is information-dense with relatively small interactive elements.
Several open-source multimodal language models have adapted their methodologies accordingly, e.g., Gemma3 uses pan-and-scan and NVILA uses Dynamic S2. However, their trade-offs are difficult to understand across different datasets and hyperparameters. To this end, we conducted an ablation study of several techniques. We trained a smaller 5 billion parameter Phi-4-based proxy model on a dataset of 10 million image-text pairs, primarily composed of computer-use and GUI grounding data. We compared Dynamic S2, which resizes images to a rectangular resolution that minimizes distortion while admitting a tiling by 384×384 squares; Multi-crop, which splits the image into potentially overlapping 384×384 squares and concatenates their encoded features on the token dimension; Multi-crop with S2, which broadens the receptive field by cropping into 1536×1536 squares before applying S2; and Dynamic resolution using the Naflex variant of SigLIP-2, a natively dynamic-resolution encoder with adjustable patch counts.
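To make the cropping strategies more concrete, here is a minimal sketch of the plain multi-crop step: tiling an image into (possibly overlapping) 384×384 squares, each of which would be encoded separately with the features concatenated along the token dimension. The exact crop size, overlap rule, and padding behavior in our ablation may differ; this is an illustrative assumption.

```python
# Minimal multi-crop sketch: tile an image into 384x384 crops, spreading the crops
# evenly (with overlap) when the image size is not a multiple of the crop size.
from PIL import Image

def multi_crop(image: Image.Image, crop: int = 384):
    """Return a list of 384x384 crops covering the whole image."""
    w, h = image.size

    def starts(length: int):
        if length <= crop:
            return [0]
        n = -(-(length - crop) // crop) + 1            # ceil division -> number of crops
        step = (length - crop) / (n - 1)               # distribute crops with even overlap
        return [round(i * step) for i in range(n)]

    boxes = []
    for top in starts(h):
        for left in starts(w):
            boxes.append((left, top, min(left + crop, w), min(top + crop, h)))
    return [image.crop(box) for box in boxes]

# Example: a 1280x720 screenshot yields a 4x2 grid of overlapping 384x384 crops.
```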
Our primary finding is that dynamic resolution vision encoders perform the best and especially well on high-resolution data. It is particularly interesting to compare dynamic resolution with 2048 vs 3600 maximum tokens: the latter roughly corresponds to native HD 720p resolution and enjoys a substantial boost on high-resolution benchmarks, particularly ScreenSpot-Pro. Reinforcing the high-resolution trend, we find that multi-crop with S2 outperforms standard multi-crop despite using fewer visual tokens (i.e., fewer crops overall). The dynamic resolution technique produces the most tokens on average; due to their tiling subroutine, S2-based methods are constrained by the original image resolution and often only use about half the maximum tokens. From these experiments we chose the SigLIP-2 Naflex variant as our vision encoder.
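The "3,600 tokens roughly corresponds to 720p" observation is easy to verify with a 16×16-pixel patch size (the patch size here is an assumption for illustration): a 1280×720 image yields exactly 80×45 = 3,600 patches. The sketch below shows the kind of aspect-preserving resize a dynamic-resolution encoder performs to stay within a visual-token budget.

```python
# Sketch of dynamic-resolution token budgeting: resize an image (preserving aspect
# ratio) so its 16x16 patch grid fits within a maximum visual-token budget.
# Patch size and rounding behavior are illustrative assumptions.
import math

PATCH = 16

def fit_to_token_budget(width: int, height: int, max_tokens: int):
    """Return (new_width, new_height, num_tokens) after a budget-respecting resize."""
    tokens = math.ceil(width / PATCH) * math.ceil(height / PATCH)
    scale = 1.0 if tokens <= max_tokens else math.sqrt(max_tokens / tokens)  # shrink area
    new_w = max(PATCH, int(width * scale) // PATCH * PATCH)
    new_h = max(PATCH, int(height * scale) // PATCH * PATCH)
    return new_w, new_h, (new_w // PATCH) * (new_h // PATCH)

print(fit_to_token_budget(1280, 720, 3600))   # (1280, 720, 3600): native 720p fits exactly
print(fit_to_token_budget(2560, 1440, 3600))  # downscaled to stay within the budget
```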
| Method | Max Tokens | MathVista | ScreenSpot | ScreenSpot-Pro | V*Bench |
| --- | --- | --- | --- | --- | --- |
| Dynamic-S2 | 3096 | 42.9 | 78.4 | 9.4 | 52.9 |
| Multi-crop | 3096 | 43.4 | 67.8 | 5.4 | 51.8 |
| Multi-crop with S2 | 2048 | 43.4 | 79.1 | **10.6** | **57.1** |
| Dynamic resolution | 2048 | **45.2** | **81.5** | 9.2 | 51.3 |
| Dynamic resolution | 3600 | **44.9** | **79.7** | **17.5** | **56.0** |

Table 1: Results with different resolution handling approaches. The top two configurations on each benchmark are in bold.

Data: Quality and composition

As with its language backbone Phi-4-Reasoning, Phi-4-reasoning-vision-15B was trained with a deliberate focus on data quality. Our final dataset consists primarily of data from three sources: open-source datasets which were meticulously filtered and improved; high-quality domain-specific internal data; and high-quality data from targeted acquisitions. The overwhelming majority of our data lies in the first category: data which originated as open-source data, which were significantly filtered and improved, whether by removing low-quality datasets or records, programmatically fixing errors in data formatting, or using open-source images as seeds to synthetically generate higher-quality accompanying text.
The process of improving open-source data began by manually reviewing samples from each dataset. Typically, 5 to 10 minutes were sufficient to classify data as excellent-quality, good questions with wrong answers, low-quality questions or images, or high-quality with formatting errors. Excellent data was kept largely unchanged. For data with incorrect answers or poor-quality captions, we re-generated responses using GPT-4o and o4-mini, excluding datasets where error rates remained too high. Low-quality questions proved difficult to salvage, but when the images themselves were high quality, we repurposed them as seeds for new caption or visual question answering (VQA) data. Datasets with fundamentally flawed images were excluded entirely. We also fixed a surprisingly large number of formatting and logical errors across widely used open-source datasets.
We extracted additional value from existing datasets through reformatting, diversification, and using images as seeds for new data generation. We generated detailed image descriptions alongside original QA pairs for math and science data, had data perform “double-duty” by embedding instruction-following requirements directly into domain-specific QA, created “scrambled,” “caption-matching,” and “what’s changed?” records to improve multi-image reasoning and sequential navigation for CUA scenarios, and diversified prompt styles to encourage robustness beyond perfectly structured questions.
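To illustrate two of these augmentation patterns, the sketch below turns existing image-caption pairs into a “caption-matching” record and an ordered image sequence into a “scrambled” record. The record schema, prompt wording, and function names are hypothetical, not the actual generation code.

```python
# Hypothetical augmentation-record builders; schema and wording are illustrative.
import random

def make_caption_matching_record(samples):
    """samples: list of (image_path, caption) pairs from an existing dataset."""
    images = [img for img, _ in samples]
    captions = [cap for _, cap in samples]
    shuffled = random.sample(captions, k=len(captions))
    prompt = ("Match each caption to the correct image.\n"
              + "\n".join(f"Caption {i+1}: {c}" for i, c in enumerate(shuffled)))
    # image i+1's true caption sits at position shuffled.index(captions[i]) + 1
    answer = {f"Caption {shuffled.index(c) + 1}": f"Image {i + 1}"
              for i, c in enumerate(captions)}
    return {"images": images, "prompt": prompt, "answer": answer}

def make_scrambled_order_record(frames):
    """frames: image paths forming a temporal sequence (e.g. GUI navigation steps)."""
    order = list(range(len(frames)))
    random.shuffle(order)
    return {
        "images": [frames[i] for i in order],
        "prompt": "These screenshots are out of order. Give the original order.",
        # for each original frame, its position in the shuffled presentation (1-based)
        "answer": [order.index(i) + 1 for i in range(len(frames))],
    }
```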
To supplement the improved open-source data, we used high-quality internal datasets, several math-specific datasets acquired during the training of the Phi-4 language model, and some domain-specific curated data; for example, LaTeX-OCR data generated by processing and rendering equations from arXiv documents.
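The LaTeX-OCR idea is simply to render an equation string to an image and keep the source string as the ground-truth transcription. The sketch below uses matplotlib's built-in mathtext as a stand-in for a full TeX toolchain; it is not the actual arXiv-derived pipeline, just a minimal illustration of how such (image, transcription) pairs can be produced.

```python
# Illustrative LaTeX-OCR pair generation using matplotlib mathtext (not the
# actual pipeline used for the arXiv-derived data).
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt

def render_equation(latex: str, out_path: str, dpi: int = 200):
    fig = plt.figure(figsize=(6, 1.5))
    fig.text(0.5, 0.5, f"${latex}$", ha="center", va="center", fontsize=22)
    fig.savefig(out_path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)
    return {"image": out_path, "text": latex}  # (image, transcription) training pair

sample = render_equation(r"\int_0^1 x^2 \, dx = \frac{1}{3}", "eq_000.png")
```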
Figure 3: Phi-4-reasoning-vision-15B training data composition and examples

Data: Mathematics vs. computer-use data proportion

One of our goals was to train a model that performs well across general vision-language tasks, while excelling at mathematical and scientific reasoning and computer-use scenarios. How to structure datasets for generalizable reasoning remains an open question—particularly because the relationship between data scale and reasoning performance can lead to starkly different design decisions, such as training a single model on a large dataset versus multiple specialized models with targeted post-training.
Research on long-tailed classification robustness has suggested that balancing or removing data from overrepresented tasks or subgroups (opens in new tab) is an effective method for ensuring good performance. Nevertheless, these insights are not fully utilized or explored when it comes to training VLMs, which at times have favored scale over careful data balancing. To achieve our goals, we conducted a set of experiments to analyze a range of data ratios between our focus domains.
Using the same 5 billion parameter proxy model as for previous experiments, we trained while varying the amount of mathematics and science vs. computer-use data for each run. Each dataset included the same subset of 1 million general image-text pairs as a baseline. For mathematics and science data, we used a subsample of 150,000 records, optionally duplicating each one up to three times. Next, we included up to 450,000 computer-use records, and optionally an additional 400,000 from Phi-Ground.
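A simple way to set up these runs is to hold the general split fixed and vary the math/science and computer-use splits, duplicating the math subset up to three times. The helper below mirrors the sizes used in Table 2 but is otherwise an illustrative sketch, not the actual experiment harness.

```python
# Illustrative data-mixture builder for the ratio experiments; names and sizes
# follow Table 2 but the helper itself is a sketch.
import random

def build_mixture(general, math_sci, cua, math_dupes=1, cua_count=450_000, seed=0):
    rng = random.Random(seed)
    math_part = list(math_sci) * math_dupes           # duplicate math records 1-3x
    cua_part = rng.sample(list(cua), k=min(cua_count, len(cua)))
    mixture = list(general) + math_part + cua_part
    rng.shuffle(mixture)
    return mixture

# e.g. the 1M / 450K / 850K row of Table 2:
# mixture = build_mixture(general_1m, math_150k, cua_all, math_dupes=3, cua_count=850_000)
```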
We found that multimodal mathematics and science performance was not harmed by additional computer-use data, and vice versa. Interestingly, we found that increasing mathematics data by 3x while keeping computer-use data constant improved math, science, and computer-use benchmarks.
| General | Math and Science | CUA | Total | MMMU | MathVista | ScreenSpot-V2 |
| --- | --- | --- | --- | --- | --- | --- |
| 1M | 150K | 450K | 1.6M | 44.0 | 37.4 | 48.2 |
| 1M | 150K | 850K | 2.0M | 44.1 | 37.3 | 60.0 |
| 1M | 450K | 450K | 1.9M | 45.3 | 36.0 | 48.3 |
| 1M | 450K | 850K | 2.3M | 43.4 | 38.9 | 63.1 |
| 1M | 150K | 150K | 1.3M | 44.2 | 36.9 | 29.8 |
| 1M | 150K | 250K | 1.4M | 45.4 | 37.4 | 37.7 |

Table 2: Varying the ratios of math and CUA data. Increasing math data by 3x while keeping computer-use data constant improves both math and computer-use benchmarks.

Data: Synthetic data for text-rich visual reasoning

Recent work (opens in new tab) suggests that targeted synthetic data can materially improve multimodal reasoning, particularly for text-rich visual domains such as charts, documents, diagrams, and rendered mathematics. Using images, questions, and answers that are programmatically generated and grounded in the visual structure enables precise control over visual content and supervision quality, resulting in data that avoids many annotation errors, ambiguities, and distributional biases common in scraped datasets. This enables cleaner alignment between visual perception and multi-step inference, which has been shown to translate into measurable gains on reasoning-heavy benchmarks.
Synthetic text-rich images expand coverage of long-tail visual formats that are underrepresented in real data but disproportionately impact reasoning accuracy, improving not only visual grounding but also downstream reasoning by ensuring that failures are less often caused by perceptual errors. We found that programmatically generated synthetic data is a useful augmentation to high-quality real datasets — not a replacement, but a scalable mechanism for strengthening both perception and reasoning that complements the training objectives in compact multimodal models such as Phi-4-reasoning-vision-15B.
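The key property of programmatic generation is that the ground truth is known by construction. The sketch below draws a simple bar chart with known values and emits a QA pair grounded in those values; the schema and question template are illustrative, not the actual generator used for our data.

```python
# Minimal synthetic chart-QA sketch; the real generators cover far richer
# layouts, question types, and rendering styles.
import random
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def make_chart_qa(out_path: str, seed: int = 0):
    rng = random.Random(seed)
    regions = ["North", "South", "East", "West"]
    values = [rng.randint(100, 900) for _ in regions]   # ground-truth values

    fig, ax = plt.subplots(figsize=(4, 3))
    ax.bar(regions, values)
    ax.set_ylabel("Sales (units)")
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)

    top = regions[values.index(max(values))]
    return {
        "image": out_path,
        "question": "Which region has the highest sales in this chart?",
        "answer": top,   # exact answer is known because we generated the values
    }
```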
Mixing non-reasoning and reasoning as a design objective

In language-only settings, reasoning traces have improved performance on many tasks, but they require additional compute, which adds undesired latency. In multimodal settings, this tradeoff is less clear-cut: for tasks such as image captioning and optical character recognition (OCR), reasoning is often unnecessary and can even be harmful (opens in new tab), while mathematical and scientific problem-solving benefits from multi-step reasoning. Thus, the choice of when to reason or not can be quite nuanced.
Training approaches for multimodal reasoning models

Language-only reasoning models are typically created through supervised fine-tuning (SFT) or reinforcement learning (RL): SFT is simpler but requires large amounts of expensive reasoning trace data, while RL reduces data requirements at the cost of significantly increased training complexity and compute. Multimodal reasoning models follow a similar process, but the design space is more complex. With a mid-fusion architecture, the first decision is whether the base language model is itself a reasoning or non-reasoning model. This leads to several possible training pipelines:
- Non-reasoning LLM → reasoning multimodal training: Reasoning and multimodal capabilities are trained together.
- Non-reasoning LLM → non-reasoning multimodal → reasoning multimodal training: Multimodal capabilities are learned first, then reasoning is added.
- Reasoning LLM → reasoning multimodal training: A reasoning base is used, but all multimodal data must include reasoning traces.
- Our approach: Reasoning LLM → mixed non-reasoning / reasoning multimodal training. A reasoning-capable base is trained on a hybrid data mixture, learning when to reason and when to respond directly.
Approaches 1 and 2 offer flexibility in designing multimodal reasoning behavior from scratch using widely available non-reasoning LLM checkpoints but place a heavy burden on multimodal training. Approach 1 must teach visual understanding and reasoning simultaneously and requires a large amount of multimodal reasoning data, while Approach 2 can be trained with less reasoning data but risks catastrophic forgetting, as reasoning training may degrade previously learned visual capabilities. Both risk weaker reasoning than starting from a reasoning-capable base. Approach 3 inherits strong reasoning foundations, but like Approach 1, it requires reasoning traces for all training data and produces reasoning traces for all queries, even when not beneficial.
Our approach: A mixed reasoning and non-reasoning model

Phi-4-reasoning-vision-15B adopts the fourth approach listed above, as it balances reasoning capability, inference efficiency, and data requirements. It inherits a strong reasoning foundation but uses a hybrid approach to combine the strengths of the alternatives while mitigating their drawbacks. Our model defaults to direct inference for perception-focused domains where reasoning adds latency without improving accuracy, avoiding unnecessary verbosity and reducing inference costs, and it invokes longer reasoning paths for domains, such as math and science, that benefit from structured multi-step reasoning (opens in new tab).
Our model is trained with SFT, where reasoning samples include a dedicated reasoning section, delimited by special tokens, that carries chain-of-thought reasoning before the final answer, covering domains like math and science. Non-reasoning samples are tagged with a special token signaling a direct response, and cover perception-focused tasks such as captioning, grounding, OCR, and simple VQA. Reasoning data comprises approximately 20% of the total mix. Starting from a reasoning-capable backbone means this data grounds existing reasoning in visual contexts rather than teaching the model to reason from scratch.
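The sketch below shows how such mixed SFT targets can be assembled. The actual control tokens are defined by the model's chat template and are not reproduced here; REASON_OPEN, REASON_CLOSE, and NO_REASON are placeholders standing in for whatever special tokens the template uses.

```python
# Mixed reasoning / non-reasoning target formatting; token names are placeholders,
# not the actual Phi-4-reasoning-vision-15B special tokens.
REASON_OPEN, REASON_CLOSE, NO_REASON = "<reason>", "</reason>", "<no_reason>"  # placeholders

def format_target(sample):
    if sample["mode"] == "reasoning":        # math/science: chain-of-thought, then answer
        return f"{REASON_OPEN}{sample['chain_of_thought']}{REASON_CLOSE}{sample['answer']}"
    return f"{NO_REASON}{sample['answer']}"  # perception tasks: direct response

targets = [
    format_target({"mode": "reasoning",
                   "chain_of_thought": "The triangle is right-angled, so c^2 = 3^2 + 4^2 = 25.",
                   "answer": "c = 5"}),
    format_target({"mode": "direct",
                   "answer": "A red stop sign at an intersection."}),
]
```

Because the non-reasoning samples dominate the mix (roughly 80%), the model learns direct responses as the default and reserves the reasoning section for domains where the data shows it helps.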
This approach is not without limitations. The balance between modes is a direct function of design choices we made, informed by recent literature (opens in new tab) and observed model behavior during training—though the boundary between modes can be imprecise, as it is learned implicitly from the data distribution. Our model allows control through explicit prompting with the reasoning-control tokens when the user wants to override the default reasoning behavior. The 20/80 reasoning-to-non-reasoning data split may not be optimal for all domains or deployment contexts. Evaluating the ideal balance of data and the model’s ability to switch appropriately between modes remains an open problem.
We view this mixed approach not as a definitive solution, but as one practical and well-motivated point in the design space for balancing latency, accuracy, and flexibility in multimodal systems.
Applications

Figure 4: Phi-4-Reasoning-Vision can interpret sequences of images

Phi-4-reasoning-vision-15B is a high-performing model across many vision-language tasks. It sees and understands the world by looking at a photo, document, chart, or screen and making sense of it. In practice that covers an enormous range of applications — just a few examples include describing images and answering questions about them, interpreting changes and trends in image sequences, and recognizing objects and landmarks and transcribing text.
Highlights: Scientific and mathematical reasoning and supporting computer-using agents (CUA)

In addition to general vision and language tasks, Phi-4-reasoning-vision-15B was designed to excel at tasks that combine visual input with structured inference: solving math problems presented in visual form (such as handwritten or diagram-based questions), extracting and reasoning over quantitative information in documents and charts, and supporting multi-step reasoning in educational or scientific analysis contexts.
Figure 5: Phi-4-reasoning-vision-15B is great at math and science

Figure 6: Phi-4-reasoning-vision-15B can help with written math problems

In addition, we trained Phi-4-reasoning-vision-15B to have skills that can enable agents to interact with graphical user interfaces by interpreting screen content and selecting actions. With strong high-resolution perception and fine-grained grounding capabilities, Phi-4-reasoning-vision-15B is a compelling base model for training agentic models, such as ones that navigate desktop, web, and mobile interfaces by identifying and localizing interactive elements such as buttons, menus, and text fields. Its low inference-time compute needs also make it a good fit for interactive environments where low latency and compact model size are essential.
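As a hypothetical illustration of how a grounding model slots into such an agent, the sketch below assumes the model returns a bounding box in normalized [0, 1] image coordinates for a referenced UI element, and the agent converts it into a click at the box center. The model call itself is not shown; only the coordinate handling is sketched.

```python
# Hypothetical agent glue code: convert a predicted normalized bounding box into
# a click action in screen pixel coordinates.
def box_to_click(box, screen_w, screen_h):
    """box = (x_min, y_min, x_max, y_max) in normalized [0, 1] coordinates."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 * screen_w
    cy = (y_min + y_max) / 2 * screen_h
    return {"action": "click", "x": round(cx), "y": round(cy)}

# e.g. a predicted box around a "Submit" button on a 1920x1080 screenshot
action = box_to_click((0.46, 0.88, 0.54, 0.93), 1920, 1080)
# -> {'action': 'click', 'x': 960, 'y': 977}
```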
Figure 7: Phi-4-reasoning-vision-15B can help navigate computer UIs

Evaluation

Phi-4-reasoning-vision-15B was evaluated for accuracy and timing using two complementary open-source frameworks to ensure both rigorous and standardized analysis: Eureka ML Insights (opens in new tab) and VLMEvalKit (opens in new tab).
| Benchmark | Phi-4-reasoning-vision-15B | Phi-4-reasoning-vision-15B – force nothink | Phi-4-mm-instruct | Kimi-VL-A3B-Instruct | gemma-3-12b-it | Qwen3-VL-8B-Instruct-4K | Qwen3-VL-8B-Instruct-32K | Qwen3-VL-32B-Instruct-4K | Qwen3-VL-32B-Instruct-32K |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AI2D_TEST | 84.8 | 84.7 | 68.6 | 84.6 | 80.4 | 82.7 | 83 | 84.8 | 85 |
| ChartQA_TEST | 83.3 | 76.5 | 23.5 | 87 | 39 | 83.1 | 83.2 | 84.3 | 84 |
| HallusionBench | 64.4 | 63.1 | 56 | 65.2 | 65.3 | 73.5 | 74.1 | 74.4 | 74.9 |
| MathVerse_MINI | 44.9 | 43.8 | 32.4 | 41.7 | 29.8 | 54.5 | 57.4 | 64.2 | 64.2 |
| MathVision_MINI | 36.2 | 34.2 | 20 | 28.3 | 31.9 | 45.7 | 50 | 54.3 | 60.5 |
| MathVista_MINI | 75.2 | 68.7 | 50.5 | 67.1 | 57.4 | 77.1 | 76.4 | 82.5 | 81.8 |
| MMMU_VAL | 54.3 | 52 | 42.3 | 52 | 50 | 60.7 | 64.6 | 68.6 | 70.6 |
| MMStar | 64.5 | 63.3 | 45.9 | 60 | 59.4 | 68.9 | 69.9 | 73.7 | 74.3 |
| OCRBench | 76 | 75.6 | 62.6 | 86.5 | 75.3 | 89.2 | 90 | 88.5 | 88.5 |
| ScreenSpot_v2 | 88.2 | 88.3 | 28.5 | 89.8 | 3.5 | 91.5 | 91.5 | 93.7 | 93.9 |

Table 3: Accuracy comparisons relative to popular open-weight, non-thinking models

| Benchmark | Phi-4-reasoning-vision-15B | Phi-4-reasoning-vision-15B – force thinking | Kimi-VL-A3B-Thinking | gemma-3-12b-it | Qwen3-VL-8B-Thinking-4K | Qwen3-VL-8B-Thinking-40K | Qwen3-VL-32B-Thinking-4K | Qwen3-VL-32B-Thinking-40K |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AI2D_TEST | 84.8 | 79.7 | 81.2 | 80.4 | 83.5 | 83.9 | 86.9 | 87.2 |
| ChartQA_TEST | 83.3 | 82.9 | 73.3 | 39 | 78 | 78.6 | 78.5 | 79.1 |
| HallusionBench | 64.4 | 63.9 | 70.6 | 65.3 | 71.6 | 73 | 76.4 | 76.6 |
| MathVerse_MINI | 44.9 | 53.1 | 61 | 29.8 | 67.3 | 73.3 | 78.3 | 78.2 |
| MathVision_MINI | 36.2 | 36.2 | 50.3 | 31.9 | 43.1 | 50.7 | 60.9 | 58.6 |
| MathVista_MINI | 75.2 | 74.1 | 78.6 | 57.4 | 77.7 | 79.5 | 83.9 | 83.8 |
| MMMU_VAL | 54.3 | 55 | 60.2 | 50 | 59.3 | 65.3 | 72 | 72.2 |
| MMStar | 64.5 | 63.9 | 69.6 | 59.4 | 69.3 | 72.3 | 75.5 | 75.7 |
| OCRBench | 76 | 73.7 | 79.9 | 75.3 | 81.2 | 82 | 83.7 | 85 |
| ScreenSpot_v2 | 88.2 | 88.1 | 81.8 | 3.5 | 93.3 | 92.7 | 83.1 | 83.1 |

Table 4: Accuracy comparisons relative to popular open-weight, thinking models

Our model balances thinking and non-thinking performance – on average showing better accuracy in its default “mixed-reasoning” behavior than when forced to always think or never think. Only in a few cases does forcing a specific mode improve performance (MathVerse_MINI and MMMU_VAL for thinking, ScreenSpot_v2 for non-thinking). Compared to recent popular open-weight models, our model provides a desirable trade-off between accuracy and cost (as a function of inference-time compute and output tokens), as discussed previously.
Note: All numbers here are the result of running benchmarks ourselves and may be lower than other previously shared numbers. Instead of quoting leaderboards, we performed our own benchmarking, so we could understand scaling performance as a function of output token counts for related models. We made our best effort to run fair evaluations and used recommended evaluation platforms with model-specific recommended settings and prompts provided for all third-party models. For Qwen models we use the recommended token counts and also ran evaluations matching our max output token count of 4096. For Phi-4-reasoning-vision-15B, we used our system prompt and chat template but did not do any custom user-prompting or parameter tuning, and we ran all evaluations with temperature=0.0, greedy decoding, and 4096 max output tokens. These numbers are provided for comparison and analysis rather than as leaderboard claims. For maximum transparency and fairness, we will release all our evaluation logs publicly. For more details on our evaluation methodology, please see our technical report (opens in new tab).
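For readers who want to reproduce the decoding setup described above (greedy decoding, temperature effectively 0.0, 4096 max output tokens), the sketch below expresses it with the Hugging Face transformers API. The repo id is a placeholder and the Auto* classes and prompt formatting are assumptions based on how other Phi multimodal releases are typically loaded; consult the model card for the exact, supported snippet.

```python
# Hedged sketch of our evaluation decoding settings; repo id and loading details
# are placeholders, not confirmed values.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "microsoft/Phi-4-reasoning-vision-15B"   # placeholder repo id

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True, device_map="auto")

image = Image.open("chart.png")
prompt = "What is the largest value shown in this chart?"  # chat templating omitted for brevity
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=False,        # greedy decoding, equivalent to temperature=0.0
    max_new_tokens=4096,    # max output token budget used in our evaluations
)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```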
Safety

As with other Phi models, Phi-4-reasoning-vision-15B was developed with safety as a core consideration throughout training and evaluation. The model was trained on a mixture of public safety datasets and internally generated examples designed to elicit behaviors the model should appropriately refuse, in alignment with Microsoft’s Responsible AI Principles. For further details, check out our technical report (opens in new tab).
Open release and community engagement

Phi-4-reasoning-vision-15B is available on Microsoft Foundry (opens in new tab) and HuggingFace (opens in new tab), with additional examples and details on GitHub (opens in new tab). For additional guidance on how to use our model properly and safely, please refer to our Model card (opens in new tab). For further details on the technical aspects of the model, training, and evaluation, see our technical report (opens in new tab).
In line with our goal of supporting future AI development in the community, Phi-4-reasoning-vision-15B is released under a permissive license with model weights, fine‑tuning code, and benchmark logs. We intend this release to complement existing work by providing concrete artifacts that help close gaps in understanding how compact multimodal reasoning models can be built and studied.
Looking forward

Smaller vision–language models with selective, task‑aware reasoning offer one promising direction for making multimodal systems more practical and accessible. We present our model and the lessons learned from building it to inform ongoing research in multimodal modeling, computer‑using agents, and mathematical and scientific reasoning. We hope these details are useful to researchers exploring similar tradeoffs and invite critical evaluation, replication, and extension by the community. If you’d like to join us and help shape the future of multimodal models, please apply for one of our open roles.
Acknowledgements

We thank Rachel Ward for her extensive work on data collection and curation. We thank the GenDatasets, PhiGround, SimCity, and Fara-7B efforts for invaluable training data. We thank Harkirat Behl, Mojan Javaheripi, and Suriya Gunasekar for providing us with Phi-4 checkpoints and guidance on training with Phi models. We additionally thank Sahaj Agarwal, Ahmed Awadallah, Qi Dai, Gustavo de Rosa, Rafah Hosn, Ece Kamar, Piero Kauffmann, Yash Lara, Chong Luo, Caio César Teodoro Mendes, Akshay Nambi, Craig Presti, Matthew Rosoff, Corby Rosset, Marco Rossi, Kashyap Patel, Adil Salim, Sidhartha Sen, Shital Shah, Pratyusha Sharma, Alexey Taymanov, Vibhav Vineet, John Weiss, Spencer Whitehead, the AI Frontiers Team and Leadership, and Microsoft Research Leadership, for their valuable help, insightful discussions, and continued support throughout this work.
The post Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model appeared first on Microsoft Research.
I don’t care if my phone gets long-term updates
Lengthy software support has become one of the main selling points in modern smartphones. Manufacturers that offer only three or four years of updates are often criticized when competitors promise half a decade or more of support, even on mid-range phones.
Dead laptops, old DVRs, and PS4s: How to harvest free SATA drives for your PC
You probably have a lot of old hardware that has outlived its usefulness. And if you're desperately looking for storage in these dire times of price hikes, you might be pleasantly surprised.
Keep a tidier home with $400 off the Mova Z60 Ultra Roller Complete Robot Vacuum and Mop
SAVE $400.01: As of March 4, get the Mova Z60 Ultra Roller Complete Robot Vacuum and Mop for $1,098.99 at Amazon, down from its usual price of $1,499. That's a discount of 27%.
Tired of spending all your extra time vacuuming and mopping your home? It's 2026, and you've got better things to do. You can offload those tasks to a robot vacuum and recoup that lost time doing things you actually like. And we've found a great model that can both save you time and money, so you can get back to living your life instead of doing menial tasks.
As of March 4, get the Mova Z60 Ultra Roller Complete Robot Vacuum and Mop for $1,098.99 at Amazon, down from its usual price of $1,499. That's a discount of 27%.
SEE ALSO: The Shark Matrix Plus 2-in-1 robot vacuum is down to a record-low $299.99 at Amazon

This powerful robot vacuum and mop combo can handle all the dirty work you don't want to do. It has 28,000Pa of suction combined with a tangle-free brush, so it can not only cut through dirt and debris while capturing up to 99% of large dirt particles, but also pick up human and pet hair without tangling. Its TurboForce 8 high-speed motor ensures it does all this without any hiccups.
After you've had the robovac go over your home with a fine-toothed comb to pick up the dirt, you can return with the mop, which uses real-time clean water spray to rinse the mop as it cleans to avoid cross-contamination. It also uses smart fluffing to better maintain the mop, so it doesn't get dingy over time and lose performance.
If you're ready to turn your cleaning routine over to the robots, this is an excellent option to rely on that can pretty much handle itself. And with $400 off, now's the perfect time to buy it, too.


