Microsoft

Hunting speculative information leaks with Revizor

Microsoft Research - Thu, 04/13/2023 - 18:00

Spectre and Meltdown are two security vulnerabilities that affect the vast majority of CPUs in use today. CPUs, or central processing units, act as the brains of a computer, directing the functions of its other components. By targeting a feature of the CPU implementation that optimizes performance, attackers could access sensitive data previously considered inaccessible. 

For example, Spectre exploits speculative execution, an aggressive strategy for increasing processing speed by postponing certain security checks. But it turns out that before the CPU performs the security check, attackers might have already extracted secrets via so-called side channels. This vulnerability went undetected for years before it was discovered and mitigated in 2018. Security researchers warned that thieves could use it to target countless computers, phones, and mobile devices. Researchers began hunting for more vulnerabilities, and they continue to find them. But the process has been manual and progress slow. With no tools available to help them search, researchers had to analyze documentation, read through patents, and experiment with different CPU generations.

A group of researchers from Microsoft and academic partners began exploring a method for systematically finding and analyzing CPU vulnerabilities. This effort would produce a tool called Revizor (REV-izz-or), which automatically detects microarchitectural leakage in CPUs—with no prior knowledge about the internal CPU components. Revizor achieves this by differentiating between expected and unexpected information leaks on the CPU. 

The Revizor process begins by describing what is expected from the CPU in a so-called “leakage contract.” Revizor then searches the CPU to find any violations of this contract. It creates random programs, runs them on the CPU, records the information they expose, and compares the information with the contract. When it finds a mismatch that violates the contract, it reports it as a potential vulnerability. 
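
The loop described above can be sketched in a few lines of Python. This is a toy illustration only, not Revizor's code or API: the "CPU" is simulated, the contract (only architecturally executed load addresses may leak) is a simplified assumption, and every name below is hypothetical. The comparison used here is one common way to phrase a contract check: two inputs that the contract cannot tell apart should also produce identical hardware observations.

```python
import random

def contract_trace(program, inp):
    """Leakage the (assumed) contract permits: addresses of the loads
    that execute architecturally."""
    return tuple(addr(inp) for addr in program["arch_loads"])

def hardware_trace(program, inp):
    """Stand-in for a measurement on a real CPU. This simulated CPU also
    exposes the addresses of loads that run only speculatively."""
    loads = program["arch_loads"] + program["spec_loads"]
    return tuple(addr(inp) for addr in loads)

def find_violation(program, inputs):
    """Return two inputs with equal contract traces but different
    hardware traces, or None if no such pair is found."""
    seen = {}  # contract trace -> (first input seen, its hardware trace)
    for inp in inputs:
        c = contract_trace(program, inp)
        h = hardware_trace(program, inp)
        if c in seen and seen[c][1] != h:
            return seen[c][0], inp
        seen.setdefault(c, (inp, h))
    return None

def random_program(rng):
    """Toy random 'program': one public-dependent load, plus (half the
    time) a speculative load whose address depends on a secret."""
    program = {"arch_loads": [lambda i: i["public"] % 64], "spec_loads": []}
    if rng.random() < 0.5:
        program["spec_loads"].append(lambda i: i["secret"] % 64)
    return program

def random_inputs(rng, n=20):
    """Random inputs; few public values so that contract traces collide."""
    return [{"public": rng.randrange(4), "secret": rng.randrange(256)}
            for _ in range(n)]

def fuzz(rounds=100, seed=0):
    """Generate programs, run them, and report the first mismatch."""
    rng = random.Random(seed)
    for _ in range(rounds):
        program = random_program(rng)
        violation = find_violation(program, random_inputs(rng))
        if violation is not None:
            return program, violation
    return None
```

In this sketch a reported pair plays the role of a potential vulnerability: the two inputs differ only in data the contract says should stay hidden, yet the simulated hardware observations differ.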

Details were published in 2022 in the paper: Revizor: Testing Black-box CPUs against Speculation Contracts

To demonstrate Revizor’s effectiveness, the researchers tested a handful of commercial CPUs and found several known vulnerabilities, including Spectre, MDS, and LVI, as well as several previously unknown variants. 

However, the search was still slow, which hindered the discovery of entirely new classes of leaks. The team identified the root causes of the performance limitations, and proposed techniques to overcome them, improving the testing speed by up to two orders of magnitude. The improvements are described in a newly published paper: Hide and Seek with Spectres: Efficient discovery of speculative information leaks with random testing

These improvements supported a testing campaign of unprecedented depth on Intel and AMD CPUs. In the process, the researchers found two types of previously unknown speculative leaks (affecting string comparison and division) that had escaped previous analyses—both manual and automated. These results show that work which previously required persistent hacking and painstaking manual labor can now be automated and rapidly accelerated. 

The team began working with the Microsoft Security Response Center and hardware vendors, and together they continue to find vulnerabilities so they can be closed before they are discovered by hackers—thereby protecting customers from risk. 

Revizor is part of Project Venice, which investigates novel mechanisms for the secure sharing and partitioning of computing resources, together with techniques for specifying and rigorously validating their resilience to side-channel attacks. 

The post Hunting speculative information leaks with Revizor appeared first on Microsoft Research.

Categories: Microsoft

Research Focus: Week of April 10, 2023

Microsoft Research - Wed, 04/12/2023 - 18:43

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

In this article
  1. Snape: Reliable and Low-Cost Computing with Mixture of Spot and On-Demand VMs
  2. Embracing Noise: How can systems be designed and created with and for noise? 
  3. DOTE: Rethinking (Predictive) WAN Traffic Engineering 
  4. Predoctoral Research Assistant (contract) – Computational Social Science

    NEW RESEARCH
    Snape: Reliable and Low-Cost Computing with Mixture of Spot and On-Demand VMs

    To improve the utilization of computing resources, cloud providers often offer underutilized capacity at a discount, but with lower guarantees of availability. However, many customers hesitate to take full advantage of such offerings (such as spot virtual machines), even though they can provide scalability and lower costs for workloads that can handle interruptions.

    In a new paper, Snape: Reliable and Low-Cost Computing with Mixture of Spot and On-Demand VMs, researchers from Microsoft propose an intelligent framework to optimize customer cost while maintaining resource availability by dynamically mixing on-demand VMs with spot VMs. Snape is composed of a reliable model for predicting the eviction rate of spot VMs from the production trace and an intelligent constrained reinforcement learning (CRL) framework for learning the best mixture policy, given the predicted eviction rate and other service signals.

    This proactive design enables an online decision-making system for dynamically adjusting the mixture of on-demand and spot VMs and ensures that a more aggressive and cheaper policy is only adopted when the reliability is high (low predicted eviction rates of spot VMs). Experiments across different configurations show that Snape achieves 44% savings compared to the policy of using only on-demand VMs, and at the same time, maintains 99.96% availability, 2.77% higher than with a policy of using only spot VMs.
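
To make the idea of "a more aggressive policy only when predicted eviction rates are low" concrete, here is a minimal Python sketch. It deliberately replaces Snape's learned CRL policy with a simple hand-written rule, so it is not the paper's method. The function names, the risk budget, the cap on the spot share, and the prices in the test values are all hypothetical.

```python
def spot_fraction(predicted_eviction_rate, risk_budget=0.05, max_spot=0.9):
    """Largest share of spot VMs whose expected evicted share of the
    whole fleet stays within risk_budget (a hand-written stand-in for
    a learned mixture policy)."""
    if not 0.0 <= predicted_eviction_rate <= 1.0:
        raise ValueError("eviction rate must be in [0, 1]")
    if predicted_eviction_rate == 0.0:
        return max_spot
    return min(max_spot, risk_budget / predicted_eviction_rate)

def hourly_cost(total_vms, spot_share, on_demand_price, spot_price):
    """Cost of a fleet split between spot and on-demand VMs."""
    spot_vms = total_vms * spot_share
    return spot_vms * spot_price + (total_vms - spot_vms) * on_demand_price
```

With these assumed defaults, a low predicted eviction rate pushes the mixture to the spot cap, while a high rate shrinks the spot share so that the expected evicted capacity stays bounded.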

    Read the paper

    NEW RESEARCH
    Embracing Noise: How can systems be designed and created with and for noise?

    Noise—as a term used to describe data as not meaningful or useful to a system—is a helpful concept in fields like data science, machine learning, and AI. It can help make data manageable, for example by allowing “noisy” data points to be identified and removed so the data can be streamlined to fit a computational structure. But unlike computer systems, which operate with explicit definitions and discrete structures, people have varying boundaries and perceptions of what is meaningful. This presents choices that involve noise. For example, what specific input will we be expecting and what remaining potential input will be considered noise? What constitutes valid input, and what are the consequences of deciding that something is “invalid”? 

    In a new paper, Embracing Data Noise, Microsoft researcher Ida Larsen-Ledet examines the conceptualization, acceptance, and use of noise, including what may be gained from viewing seemingly undesirable output as noise with potential.

    When designing computing systems, removing or reducing noise can be the right choice – for example, in safety-critical environments. But noise shouldn’t be uncritically disregarded. If we look at noise in a nuanced way, we may be better able to apply it in useful ways.

    Read the paper

    NEW RESEARCH
    DOTE: Rethinking (Predictive) WAN Traffic Engineering

    Uncertainty about future network traffic trends presents a crucial real-world challenge for routing, especially over wide-area networks where bandwidth is expensive, and applications have stringent quality-of-service requirements. In a new paper, DOTE: Rethinking (Predictive) WAN Traffic Engineering, researchers from Microsoft Research teamed up with researchers from the Hebrew University and the Technion to explore a new design point for traffic engineering on wide-area networks (WANs): directly optimizing traffic flow on the WAN using only historical data. 

    The novel algorithmic framework of DOTE combines stochastic optimization and deep learning to identify appropriate routing using as input only historical traffic demands. Intrinsically, the technique picks up on patterns in traffic demands at the scale of large WANs, allowing it to identify high-quality routing without predicting future demands. The research shows this method provably converges to the global optimum in well-studied theoretical models and demonstrates the performance benefits through extensive analyses of empirical data from operational networks, including Microsoft’s backbone network.

    Read the paper

    OPPORTUNITY
    Predoctoral Research Assistant (contract) – Computational Social Science

    Microsoft Research New York City seeks a recent college graduate for a contingent Predoctoral Research Assistant position in computational social science (CSS). Our Predoctoral Research Assistant program is aimed at candidates seeking research experience prior to pursuing a PhD in fields related to CSS. 

    Our computational social science group is widely recognized as a leading center of CSS research. Our research lies at the intersection of computer science, statistics, and social sciences, and uses large-scale demographic, behavioral, and network data to investigate human activity and relationships. Apply by May 5 for a one-year assignment beginning in Summer 2023, with a possibility to extend to a total of 18 months. 

    Apply now

    The post Research Focus: Week of April 10, 2023 appeared first on Microsoft Research.

Categories: Microsoft

Building toward more autonomous and proactive cloud technologies with AI

Microsoft Research - Mon, 04/10/2023 - 18:00

In the first blog post in this series, Cloud Intelligence/AIOps – Infusing AI into Cloud Computing Systems, we presented a brief overview of Microsoft’s research on Cloud Intelligence/AIOps (AIOps), which innovates AI and machine learning (ML) technologies to help design, build, and operate complex cloud platforms and services effectively and efficiently at scale. As cloud computing platforms have continued to emerge as one of the most fundamental infrastructures of our world, both their scale and complexity have grown considerably. In our previous blog post, we discussed the three major pillars of AIOps research: AI for Systems, AI for Customers, and AI for DevOps, as well as the four major research areas that constitute the AIOps problem space: detection, diagnosis, prediction, and optimization. We also envisioned the AIOps research roadmap as building toward creating more autonomous, proactive, manageable, and comprehensive cloud platforms. 

Vision of AIOps Research

  • Autonomous: Fully automate the operation of cloud systems to minimize system downtime and reduce manual efforts.
  • Proactive: Predict future cloud status, support proactive decision-making, and prevent bad things from happening.
  • Manageable: Introduce the notion of tiered autonomy for infusing autonomous routine operations and deep human expertise.
  • Comprehensive: Span AIOps to the full cloud stack for global optimization/management and extend to multi-cloud environments.

Starting with this blog post, we will take a deeper dive into Microsoft’s vision for AIOps research and the ongoing efforts to realize that vision. This blog post will focus on how our researchers leveraged state-of-the-art AIOps research to help make cloud technologies more autonomous and proactive. We will discuss our work to make the cloud more manageable and comprehensive in future blog posts.

Autonomous cloud

Motivation

Cloud platforms require numerous actions and decisions every second to ensure that computing resources are properly managed and failures are promptly addressed. In practice, those actions and decisions are either generated by rule-based systems constructed upon expert knowledge or made manually by experienced engineers. Still, as cloud platforms continue to grow in both scale and complexity, it is apparent that such solutions will be insufficient for the future cloud system. On one hand, rigid rule-based systems, while being knowledge empowered, often involve huge numbers of rules and require frequent maintenance for better coverage and adaptability. Still, in practice, it is often unrealistic to keep such systems up to date as cloud systems expand in both size and complexity, and even more difficult to guarantee consistency and avoid conflicts between all the rules. On the other hand, engineering efforts are very time-consuming, prone to errors, and difficult to scale.

To break the constraints on the coverage and scalability of the existing solutions and improve the adaptability and manageability of the decision-making systems, cloud platforms must shift toward a more autonomous management paradigm. Instead of relying solely on expert knowledge, we need suitable AI/ML models to fuse operational data and expert knowledge together to enable efficient, reliable, and autonomous management decisions. Still, it will take many research and engineering efforts to overcome various barriers for developing and deploying autonomous solutions to cloud platforms.

Toward an autonomous cloud

In the journey toward an autonomous cloud, there are two major challenges. The first challenge lies in the heterogeneity of cloud data. In practice, cloud platforms deploy a huge number of monitors to collect data in various formats, including telemetry signals, machine-generated log files, and human input from engineers and users. The patterns and distributions of those data generally exhibit a high degree of diversity and are subject to change over time. To ensure that the adopted AIOps solutions can function autonomously in such an environment, it is essential to empower the management system with robust and extensible AI/ML models capable of learning useful information from heterogeneous data sources and drawing the right conclusions in various scenarios.

The complex interaction between different components and services presents another major challenge in deploying autonomous solutions. While it can be easy to implement autonomous features for one or a few components/services, how to construct end-to-end systems capable of automatically navigating the complex dependencies in cloud systems presents the true challenge for both researchers and engineers. To address this challenge, it is important to leverage both domain knowledge and data to optimize the automation paths in application scenarios. Researchers and engineers should also implement reliable decision-making algorithms in every decision stage to improve the efficiency and stability of the whole end-to-end decision-making process.

Over the past few years, Microsoft research groups have developed many new models and methods for overcoming those challenges and improving the level of automation in various cloud application scenarios across the AIOps problem spaces. Notable examples include:

  • Detection: Gandalf and ATAD for the early detection of problematic deployments; HALO for hierarchical fault localization; and Onion for detecting incident-indicating logs.
  • Diagnosis: SPINE and UniParser for log parsing; Logic and Warden for regression and incident diagnosis; and CONAN for batch failure diagnosis.
  • Prediction: TTMPred for predicting time to mitigate incidents; LCS for predicting the low-capacity status in cloud servers; and Eviction Prediction for predicting the eviction of spot virtual machines.
  • Optimization: MLPS for optimizing the reallocation of containers; and RESIN for the management of memory leaks in cloud infrastructure.

These solutions not only improve service efficiency and reduce management time through more autonomous design, but also deliver higher performance and reliability with fewer human errors. As an illustration of our work toward a more autonomous cloud, we will discuss our work on automatic safe deployment services below.

Exemplary scenario: Automatic safe deployment

In online services, the continuous integration and continuous deployment (CI/CD) of new patches and builds are critical for the timely delivery of bug fixes and feature updates. Because new deployments with undetected bugs or incompatibility issues can cause severe service outages and create significant customer impact, cloud platforms enforce strict safe-deployment procedures before releasing each new deployment to the production environments. Such procedures typically involve multi-stage testing and verification in a sequence of canary environments with increasing scopes. When a deployment-related anomaly is identified in one of these stages, the responsible deployment is rolled back for further diagnosis and fixing. Owing to the challenges of identifying deployment-related anomalies with heterogeneous patterns and managing a huge number of deployments, safe-deployment systems administered manually can be extremely costly and error prone.

To support automatic and reliable anomaly detection in safe deployment, we proposed a general methodology named ATAD for the effective detection of deployment-related anomalies in time-series signals. This method addresses the challenges of capturing changes with various patterns in time-series signals and the lack of labeled anomaly samples due to the heavy cost of labeling. Specifically, this method combines ideas from both transfer learning and active learning to make good use of the temporal information in the input signal and reduce the number of labeled samples required for model training. Our experiments have shown that ATAD can outperform other state-of-the-art anomaly detection approaches, even with only 1%-5% of labeled data.

At the same time, we collaborated with product teams in Azure to develop and deploy Gandalf, an end-to-end automatic safe deployment system that reduces deployment time and increases the accuracy of detecting bad deployments in Azure. As a data-driven system, Gandalf monitors a large array of information, including performance metrics, failure signals, and deployment records. It also detects anomalies in various patterns throughout the entire safe-deployment process. After detecting anomalies, Gandalf applies a vote-veto mechanism to reliably determine whether each detected anomaly is caused by a specific new deployment. Gandalf then automatically decides whether the relevant new deployment should be stopped for a fix or if it’s safe enough to proceed to the next stage. Since rolling out in Azure, Gandalf has been effective at helping to capture bad deployments, achieving more than 90% precision and nearly 100% recall in production over a period of 18 months.

Figure: Flow of Automatic Safe Deployment System

Proactive cloud

Motivation

Traditional decision-making in the cloud focuses on optimizing immediate resource usage and addressing emerging issues. While this reactive design is not unreasonable in a relatively static system, it can lead to short-sighted decisions in a dynamic environment. In cloud platforms, both the demand and utilization of computing resources are undergoing constant changes, including regular periodical patterns, unexpected spikes, and gradual shifts in both temporal and spatial dimensions. To improve the long-term efficiency and reliability of cloud platforms, it is critical to adopt a proactive design that takes the future status of the system into account in the decision-making process.

A proactive design leverages data-driven models to predict the future status of cloud platforms and enable downstream proactive decision-making. Conceptually, a typical proactive decision-making system consists of two modules: a prediction module and a decision-making module, as displayed in the following diagram.

In the prediction module, historical data are collected and processed for training and fine-tuning the prediction model for deployment. The deployed prediction model takes in the online data stream and generates prediction results in real time. In the decision-making module, both the current system status and the predicted system status, along with other information such as domain knowledge and past decision history, are considered for making decisions that balance both present and future benefits.
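The two-module structure can be sketched in a few lines. This is an illustrative toy under stated assumptions, not any production design: the trend-extrapolation predictor, the utilization limit of 0.8, and the action names are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Decision:
    action: str
    reason: str


def predict_next(history: list[float], window: int = 3) -> float:
    """Prediction module stand-in: extrapolate the average recent change."""
    recent = history[-(window + 1):]
    deltas = [b - a for a, b in zip(recent, recent[1:])]
    if not deltas:
        # Only one observation: assume the status stays where it is.
        return history[-1]
    return history[-1] + mean(deltas)


def decide(current: float, predicted: float, limit: float = 0.8) -> Decision:
    """Decision module stand-in: weigh current status, then predicted status."""
    if current > limit:
        return Decision("scale_out", "current utilization above limit")
    if predicted > limit:
        return Decision("scale_out", "predicted utilization above limit")
    return Decision("hold", "current and predicted utilization within limit")
```

With a utilization history of 0.5, 0.6, 0.7, 0.78, a purely reactive rule would hold because 0.78 is under the limit, while this sketch scales out because the extrapolated next value exceeds it.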

Toward proactive design

Proactive design, while creating new opportunities for improving the long-term efficiency and reliability of cloud systems, does expose the decision-making process to additional risks. On one hand, because of the inherent randomness in the daily operation of cloud platforms, proactive decisions are always subject to uncertainty risk from the stochastic elements in both running systems and their environments. On the other hand, the reliability of prediction models adds another layer of risk to proactive decisions. Therefore, to guarantee the performance of proactive design, engineers must put mechanisms in place to address those risks.

To manage uncertainty risk, engineers need to reformulate the decision-making in proactive design to account for the uncertainty elements. They can often use methodological frameworks, such as prediction+optimization and optimization under chance-constraints, to incorporate uncertainties into the target functions of optimization problems. Well-designed ML/AI models can also learn uncertainty from data to make proactive decisions more robust to uncertainty elements. As for risks associated with the prediction model, modules for improving data quality, including quality-aware feature engineering, robust data imputation, and data rebalancing, should be applied to reduce prediction errors. Engineers should also make continuous efforts to update prediction models and improve their robustness. Moreover, safeguarding mechanisms are essential to prevent decisions that may cause harm to the cloud system.
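To make the chance-constraint idea concrete, consider choosing the smallest capacity c such that the probability of demand exceeding c is at most some tolerance epsilon. The sketch below estimates that probability from past demand samples; it is a generic sample-based illustration, and the function name, the samples, and the capacity-planning framing are assumptions rather than a description of any specific Microsoft system.

```python
import math


def min_capacity(demand_samples: list[float], epsilon: float) -> float:
    """Smallest sampled capacity c with empirical P(demand > c) <= epsilon."""
    if not demand_samples:
        raise ValueError("need at least one demand sample")
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    ordered = sorted(demand_samples)
    n = len(ordered)
    # At most floor(epsilon * n) samples may exceed the chosen capacity.
    allowed_violations = math.floor(epsilon * n)
    return ordered[n - 1 - allowed_violations]
```

Setting epsilon to 0 recovers the worst-case choice (the maximum observed demand), while a larger epsilon trades a bounded violation rate for lower provisioned capacity.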

Microsoft’s AIOps research has pioneered the transition from reactive decision-making to proactive decision-making, especially in problem spaces of prediction and optimization. Our efforts not only lead to significant improvement in many application scenarios traditionally supported by reactive decision-making, but also create many new opportunities. Notable proactive design solutions include Narya and Nenya for hardware failure mitigation, UAHS and CAHS for the intelligent virtual machine provisioning, CUC for the predictive scheduling of workloads, and UCaC for bin packing optimization under chance constraints. In the discussion below, we will use hardware failure mitigation as an example to illustrate how proactive design can be applied in cloud scenarios.

Exemplary scenario: Proactive hardware failure mitigation

A key threat to cloud platforms is hardware failure, which can cause interruptions to the hosted services and significantly impact the customer experience. Traditionally, hardware failures are only resolved reactively after the failure occurs, which typically involves temporary interruptions of hosted virtual machines and the repair or replacement of impacted hardware. Such a solution provides limited help in reducing negative customer experiences.

Narya is a proactive disk-failure mitigation service capable of taking mitigation actions before failures occur. Specifically, Narya leverages ML models to predict potential disk failures, and then makes decisions accordingly. To control risks associated with uncertainty, Narya evaluates candidate mitigation actions based on the estimated impacts to customers and chooses actions with minimum impact. A feedback loop also exists for collecting follow-up assessments to improve prediction and decision modules.

Hardware failures in cloud systems are often highly interdependent. Therefore, to reduce the impact of prediction errors, Narya introduces a novel dependency-aware model to encode the dependency relationship between nodes to improve the failure prediction model. Narya also implements an adaptive approach that uses A/B testing and bandit modeling to improve the ability to estimate the impacts of actions. Several safeguarding mechanisms in different stages of Narya are also in place to eliminate the chance of taking unsafe mitigation actions. Implementation of Narya in Azure’s production environment has reduced the node hardware interruption rate for virtual machines by more than 26%.
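As a rough illustration of how bandit modeling can refine impact estimates over time, the sketch below uses a generic epsilon-greedy bandit. It is a textbook technique swapped in for illustration, not Narya's actual algorithm; the class name and any action names passed to it are hypothetical.

```python
import random


class EpsilonGreedyBandit:
    """Tracks the average observed impact of each action and prefers the lowest."""

    def __init__(self, actions: list[str], epsilon: float = 0.1, seed: int = 0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.actions}
        self.mean_impact = {a: 0.0 for a in self.actions}

    def select(self) -> str:
        # Try every action once, then explore with probability epsilon.
        untried = [a for a in self.actions if self.counts[a] == 0]
        if untried:
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        # Otherwise exploit: pick the action with the lowest estimated impact.
        return min(self.actions, key=lambda a: self.mean_impact[a])

    def update(self, action: str, impact: float) -> None:
        # Incremental mean of the observed customer impact for this action.
        self.counts[action] += 1
        n = self.counts[action]
        self.mean_impact[action] += (impact - self.mean_impact[action]) / n
```

In use, each mitigation would call select() to choose an action and update() with the measured impact afterward, so the estimates keep improving as follow-up assessments arrive.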

Our recent work, Nenya, is another example of proactive failure mitigation. Under a reinforcement learning framework, Nenya fuses prediction and decision-making modules into an end-to-end proactive decision-making system. It can weigh both mitigation costs and failure rates to better prioritize cost-effective mitigation actions against uncertainty. Moreover, the traditional failure mitigation method usually suffers from data imbalance issues; cases of failure form only a very small portion of all cases, most of which are healthy. Such data imbalance would introduce bias to both the prediction and decision-making process. To address this problem, Nenya adopts a cascading framework to ensure that mitigation decisions are not made with heavy costs. Experiments with Microsoft 365 data sets on database failure have shown that Nenya can reduce both mitigation costs and database failure rates compared with existing methods.

Future work

As management systems become more automated and proactive, it is important to pay special attention to both the safety of cloud systems and the responsibility to cloud customers. The autonomous and proactive decision system will depend heavily on advanced AI/ML models with little manual effort. How to ensure that the decisions made by those approaches are both safe and responsible is an essential question that future work should answer.

The autonomous and proactive cloud relies on the effective data usage and feedback loop across all stages in the management and operation of cloud platforms. On one hand, high-quality data on the status of cloud systems are needed to enable downstream autonomous and proactive decision-making systems. On the other hand, it is important to monitor and analyze the impact of each decision on the entire cloud platform in order to improve the management system. Such feedback loops can exist simultaneously for many related application scenarios. Therefore, to better support an autonomous and proactive cloud, a unified data plane responsible for the processing and feedback loop can take a central role in the whole system design and should be a key area of investment.

As such, the future of cloud relies not only on adopting more autonomous and proactive solutions, but also on improving the manageability of cloud systems and the comprehensive infusion of AIOps technologies over all stacks of cloud systems. In future blog posts, we will discuss how to work toward a more manageable and comprehensive cloud.

The post Building toward more autonomous and proactive cloud technologies with AI appeared first on Microsoft Research.

Categories: Microsoft

AI and the Future of Health

Microsoft Research - Thu, 03/30/2023 - 20:43

The emergence of increasingly capable large-scale AI models, such as the recently released GPT-4, is one of the most significant advances in computing in decades. These innovations are rapidly transforming every aspect of the value we get from technology, as demonstrated through Microsoft’s integration of GPT-4 into Bing, Edge, Microsoft 365, Power Platform, GitHub, and other offerings. More recently, Nuance has announced DAX Express, which uses a unique combination of conversational, ambient, and generative AI to automatically draft clinical notes after patient visits – helping to reduce care providers’ cognitive burdens and increase the joy of practicing medicine (whilst releasing time for care).

We are at an inflection point for the use of AI in healthcare – one of society’s most critical sectors. The significance of this moment is reflected in Peter Lee’s recent article in the New England Journal of Medicine on the potential future clinical applications of GPT-4. At Microsoft Research’s Health Futures organization, the multidisciplinary group dedicated to discovery in this space, we see this as the continuation of a journey, and a major milestone in the long process of innovating to help address the greatest challenges in healthcare.

In this blog, we will share some of our research team’s work to make healthcare more data-driven, predictive, and precise – ultimately, empowering every person on the planet to live a healthier future.

Enabling precision medicine and connected care

We are today at a unique moment in history where medicine, biology, and technology are converging on a large scale. This presents immense possibilities to revolutionize healthcare and the practice of medicine with the aid of trustworthy AI. While we embrace the potential of AI, we understand that the practice of medicine is an intricate balance of “art” and “science.” We recognize and honor the enduring physician-patient relationship, which is fundamental and timeless. Our diverse team comprises researchers, scientists, engineers, biotechnologists, designers, social scientists, strategists, healthcare experts, and medical professionals who collaborate globally and inclusively to reimagine and transform the lives of the patients and public we serve.

As we consider how technologies have shaped the practice of medicine over the centuries, from the individual to the ecosystem level, we are reminded that no technology exists in a vacuum. Our core understanding of biological systems is rapidly evolving, and with it, our understanding of what technologies are relevant and useful. Simultaneously, the use of technology across the health and life science industries, and the way healthcare is delivered, are also rapidly changing – reshaping our traditional healthcare delivery model from one of diagnosis and treatment, to one that prioritizes prevention and precise individualized care.

Recent advancements in machine learning and AI have fueled computational technologies that allow us to aggregate complex inputs from multiple data sources, with the potential to derive rich insights that rapidly expand our knowledge base and drive deeper discovery and faster innovation. At the same time, it remains an open question how to best use and regulate these technologies in real-world settings and at scale across healthcare and the life sciences. Nonetheless, we believe that we are on a path to delivering on the goal of precision medicine – a change in clinical practice which will be enabled by precision diagnostics, precision therapeutics, and connected care technologies.

To achieve this goal, we seek to collaborate with health and life sciences organizations with a similar appetite for transformation, complementary expertise, and a commitment to propel the change required. We are also engaged with the broader community in pursuing responsible and ethical use of AI in healthcare. Our diverse team has been successful in bridging the gap between the fields of medicine, biology and chemistry on one hand, and computing on the other. We act as “translators” between these fields, and through a process of ongoing collaboration and feedback, we have discovered new challenges and innovative solutions.

Below are some examples of our collaborative research approach:

Exploring diagnostic tools from new modalities

Multimodal foundation models for medicine: an example from radiology

The field of biomedicine involves a great deal of multimodal data, such as radiology images and text-based reports. Interpreting this data at scale is essential for improving care and accelerating research. Radiology reports often compare current and prior images to track changes in findings over time. This is crucial for decision making, but most AI models do not take into account this temporal structure. We are exploring a novel self-supervised framework that pre-trains vision-language models using pairs of reports and sequences of images. This includes handling missing or misaligned images and exploiting temporal information to learn more efficiently. Our approach, called BioViL-T, achieves state-of-the-art results on several downstream tasks, such as report generation, and interpreting disease progression by focusing on relevant image regions across time. BioViL-T is part of ongoing collaboration with our colleagues at Nuance to develop scalable and flexible AI solutions for radiology that can empower care providers and augment existing workflows.

Project InnerEye: Democratizing Medical Imaging AI

Project InnerEye is a research project that is exploring ways in which machine learning has the potential to assist clinicians in planning radiotherapy treatments so that they can spend more time with their patients. Project InnerEye has been working closely with the University of Cambridge and Cambridge University Hospitals NHS Foundation Trust to make progress on this problem through a deep research collaboration. To make our research as accessible as possible, we released the InnerEye Deep Learning Toolkit as open-source software. Cambridge University Hospitals NHS Foundation Trust and University Hospitals Birmingham NHS Trust led an NHS AI in Health and Care Award to evaluate how this technology could potentially save clinicians’ time, reduce the time between the scan and commencing treatment, and scale this to more NHS Trusts. Any clinical use of the InnerEye machine learning models remains subject to regulatory approval.

Immunomics: Decoding the Immune System to Diagnose Disease

The human immune system is an astonishing diagnostic engine, continuously adapting itself to detect any signal of disease in the body. Essentially, the state of the immune system tells a story about virtually everything affecting a person’s health. What if we could “read” this story? Our scientific understanding of human health would be fundamentally advanced. More importantly, this would provide a platform for a new generation of precise medical diagnostics and treatment options. We are partnering with Adaptive Biotechnologies to develop the machine learning and biotechnology tools that will allow us to realize this dream.

Fundamental advances towards new medicines and therapeutics

Protein Engineering

Several research groups are delving into the potential of machine learning to enhance our comprehension of proteins and their pivotal role in various biological processes. We are also using AI to design new proteins for therapeutics and industry. By applying machine learning to extract patterns from databases of sequences, structures, and properties, Microsoft hopes to train models that can make protein engineering by directed evolution more efficient, and directly generate proteins that will perform desired functions. The ability to generate computationally distinct yet viable protein structures holds tremendous promise for uncovering novel biological insights and developing targeted therapies for previously untreatable illnesses.

Investigating the Cancer Microenvironment through Ex Vivo Research

Microsoft is working on ways to identify specific characteristics of cancer cells and their surrounding microenvironments that might be targeted for treatment. By studying how cancer cells and their surroundings interact with each other, the team aims to create a more precise approach to cancer treatment that takes into account both genetic and non-genetic factors.

Accelerating biomedical research

Microsoft and the Broad Institute – combining their expertise in genomics, disease research, cloud computing and data analytics – are developing an open-source platform to accelerate biomedical research using scalable analytical tools. The platform is built on top of the Broad Institute’s Terra platform, providing a user-friendly interface for accessing and analyzing genomic data. Leveraging Microsoft’s Azure cloud computing services, the platform will enable secure storage and analysis of large datasets. Additionally, the platform will incorporate machine learning and other advanced analytical tools to help researchers gain insights into complex diseases and develop new treatments.

Advancing clinical interpretation and exploration through multimodal language models

In the quest for precision medicine and accelerating biomedical discovery, Microsoft is committed to advancing the state of the art in biomedical natural language processing (NLP). A crucial factor in future-facing, data-driven health systems is the accessibility and interpretability of multimodal health information. To meet this need, Microsoft has laid a solid foundation across multiple modalities in biomedical NLP building on our deep research assets in deep learning and biomedical machine reading.

One significant achievement is our development and application of large language models (LLMs) in biomedicine. Microsoft was among the first to create and assess the applicability of LLMs, such as PubMedBERT and BioGPT, which are highly effective in structuring biomedical data. However, to address the inherent limitations of LLMs, Microsoft is developing methods to teach them to fact-check themselves and provide fine-grained provenance. Additionally, Microsoft is exploring ways to facilitate efficient verification with humans in the loop.

Besides text, other modalities such as radiology images, digital pathology slides, and genomics contain valuable health information. Microsoft is developing multimodal learning and fusion methods that incorporate these modalities. These methods include predicting disease progression and drug response, with the ultimate goal of delivering safe and high-quality healthcare.

Observational data in biomedicine is often plagued by confounders, making it challenging to draw causal relationships. To overcome this obstacle, Microsoft is developing advanced causal methods that correct implicit biases and scale biomedical discovery. These methods will allow Microsoft to leverage real-world evidence and contribute to the creation of more effective healthcare delivery systems. For our end-to-end biomedical applications, we have made exciting progress in deep collaborations with Microsoft partners such as The Jackson Laboratory and Providence St. Joseph Health.

Empowering everyone to live a healthier future

Microsoft has pursued interdisciplinary research that enables people to reach the full potential of their health for many years, but we’ve never been more excited about the possibilities than we are today. The latest developments in AI have inspired us to accelerate our efforts across these and many other projects, and we look forward to even more innovation and collaboration in this new era.

The post AI and the Future of Health appeared first on Microsoft Research.

Categories: Microsoft

Research Focus: Week of March 27, 2023

Microsoft Research - Wed, 03/29/2023 - 18:00

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

In this article
  1. Bing's gendered translations tackle bias in machine translation
  2. Microsoft researcher honored by Women in AI Netherlands
  3. Recognizing women in technology

    NEWS: Bing’s gendered translations tackle bias in machine translation

    Machine translation (MT) models are designed to learn from large amounts of data collected from real-world sources. However, this training data may contain implicit biases which may be amplified by the model. One such example is the expression of gender, which can vary widely across different languages. In English, the word “lawyer” can refer to either a male or female individual, whereas in Spanish, “abogada” and “abogado” are used to refer to a female and male lawyer, respectively. As a result, MT models often assign arbitrary genders to animate entities in the translated output, even when the source text does not imply a specific gender.

    The Microsoft Translator team has released a feature on Bing Translator which will provide feminine and masculine translations for sentences that have gender neutral words such as “doctor” or “teacher” when translating from English to Spanish, French and Italian. Additionally, to support ongoing research and track progress towards reducing gender bias in MT, the team has published a technical paper outlining their evaluation methodology and test sets. These test sets comprise a linguistically diverse corpus of gender-ambiguous source sentences, along with multiple alternative target language translations.

    Read the blog | Download the data | Read the paper

    AWARD: Microsoft researcher honored by Women in AI Netherlands

    Rianne van den Berg, a Principal Researcher at Microsoft Research in Amsterdam, has won the AI Researcher award from Women in AI Netherlands.

    Rianne was recognized for her work in deep learning and physics. The award announcement noted her published work in journals such as Nature Physics and Physical Review Letters as well as at prominent AI conferences, such as NeurIPS, ICML and ICLR. The organization also cited Rianne’s dedication to diversity and inclusion.

    In her role on the AI4Science team at Microsoft Research, Rianne’s research focuses on the intersection between computational chemistry and deep learning, with an emphasis on modeling chemical reactions. Her prior research has spanned topics ranging from generative modeling and variational inference to source compression, graph-structured learning, and condensed-matter physics. She received her PhD in theoretical condensed-matter physics in 2016 at the University of Amsterdam, where she also worked as a postdoctoral researcher as part of the Amsterdam Machine Learning Lab (AMLAB).

    Rianne on LinkedIn | Rianne at Microsoft Research

    INTERVIEW: Recognizing women in technology

    Why are women underrepresented in STEM and AI and how can we close that gap? How is technology shaping society, from gender issues to creativity and collaboration?

    Microsoft Research Principal Researcher Cheng Zhang sat down to discuss these issues and more with the UK Chinese Women Connect Association, which recently recognized her as the Highly Commended awardee in the Chinese Women of the Year: Technology category.

    In the interview, Cheng talks about her career in technology research and why she came to Microsoft Research Cambridge, where she works with the Machine Intelligence group. The conversation covers the impact of AI, strategies for making an impact—especially at a very large company—and the value of learning from others. Catch a video replay of this fascinating interview.

    Video: Part 1 | Video: Part 2

    The post Research Focus: Week of March 27, 2023 appeared first on Microsoft Research.

Categories: Microsoft

AI Explainer: Foundation models ​and the next era of AI

Microsoft Research - Thu, 03/23/2023 - 19:00

The release of OpenAI’s GPT-4 is a significant advance that builds on several years of rapid innovation in foundation models. GPT-4, which was trained on the Microsoft Azure AI supercomputer, has exhibited significantly improved abilities across many dimensions, from summarizing lengthy documents, to answering complex questions about a wide range of topics and explaining the reasoning behind those answers, to telling jokes and writing code and poetry.

Microsoft Senior Principal Research Manager Ahmed H. Awadallah was among a group of researchers across the company who have worked in partnership with OpenAI over several months to evaluate this new model’s capabilities. In this video, recapped below, he tells the story of the technical innovations in recent years that have brought us to this moment: the surprising progress of GPT-4’s predecessor models, leading up to the capabilities demonstrated in ChatGPT, and the integration of the latest models into Bing.

In this article
  1. Introduction to foundation models [00:00-11:01]
  2. From GPT-3 to ChatGPT – a jump in generative capabilities [11:02-19:07]
  3. Everyday impact: Integrating foundation models and products [19:09-27:20]
  4. Transcript

    Introduction to foundation models [00:00-11:01]

    Over the last decade, AI has made significant progress on perception tasks like image recognition and language processing. More recently, the field is witnessing new advances in the form of generative AI, underpinned by a class of large-scale models known as foundation models. Foundation models are trained on massive amounts of data and are capable of performing a wide range of tasks. With a simple natural language prompt like “describe a scene of the sun rising over the beach,” generative AI models can output a detailed description or produce an image based on the generated description, which can then be animated or even turned into video. Many recent language models are not only good at generating text but also generating, explaining, and debugging code.

    Listen in at 1:37

    Three components have been driving these advances:

    • The transformer architecture: A popular choice across modalities, the transformer architecture is efficient, easy to scale and parallelize, and can model interdependence between different components in input and output data.
    • Scale: Growing model size and the use of increasingly large amounts of data have resulted in what are being termed “emerging capabilities.” When models reach a critical size, they begin displaying capabilities not previously present.
    • In-context learning: Showing potential on a range of applications, from text classification to translation and summarization, this new paradigm provides pre-trained models with instructions for new tasks or just a few examples instead of training or fine-tuning models on labeled data. Because no additional data or training is needed and prompts are provided in natural language, models can be applied right out of the box, and their use isn’t limited to people with developer experience.
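
    To make the in-context learning bullet concrete, here is a minimal sketch of how a zero-shot and a few-shot prompt might be assembled for sentiment classification. The function name `build_prompt`, the "Text:/Sentiment:" layout, and the sample sentences are assumptions made for illustration; the video does not prescribe a prompt format, and no model is called here.

```python
def build_prompt(instruction, examples, query):
    """Assemble a zero-shot or few-shot prompt as plain text.

    `examples` is a list of (text, label) pairs; an empty list gives a
    zero-shot prompt that contains only the instruction and the query.
    """
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The prompt ends with an unanswered item for the model to complete.
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical instruction and sample texts, used only for illustration.
instruction = "Classify the sentiment of each text as positive or negative."
zero_shot = build_prompt(instruction, [], "The battery died after a day.")
few_shot = build_prompt(
    instruction,
    [("I loved this film.", "positive"), ("The food was cold.", "negative")],
    "The battery died after a day.",
)
print(few_shot)
```

    The only difference between the two settings is whether labeled examples appear in the prompt; the model's weights are untouched in both cases.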

    From GPT-3 to ChatGPT – a jump in generative capabilities [11:02-19:07]

    With the November 2022 release of ChatGPT, a language model optimized for dialogue, we saw exciting developments in text generation. Compared with GPT-3, an earlier language model in the GPT family, ChatGPT not only provides longer, more thorough, and more structured responses to questions and instructions but can also produce answers in different styles, or tones, and tailor explanations to different audiences, like a child, a first-year college student, or someone with a PhD.

    Earlier language models such as GPT-3 were trained to predict the next word in a sentence using large amounts of text from the web with no direct human supervision. Several additional training approaches have helped fuel the improved performance of later models such as ChatGPT. These models are being trained on code in addition to text, which seems to be providing another opportunity to identify the structured relationship between different parts of the text. This is resulting in models that are better at following instructions and reasoning than models trained on text alone. Human-generated data is also contributing to better outputs. Instruction tuning adds the step of training models on prompts and responses created by a human, while model-generated responses ranked by a human are being employed to train a reward model that can be used to train the main model with reinforcement learning.
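
    The reward-model step above can be sketched roughly as follows: a human ranking of several responses is expanded into preferred/rejected pairs, and the reward model is penalized whenever it scores a rejected response above a preferred one. The pairwise logistic loss used here is a commonly used formulation and is an assumption; the source does not say which loss was used. The response names and scores are made up.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_responses, 2))


def pairwise_loss(score_preferred, score_rejected):
    """Logistic loss: small when the preferred response scores higher."""
    margin = score_preferred - score_rejected
    return math.log(1.0 + math.exp(-margin))


# Hypothetical ranking supplied by a human labeler, best first.
ranking = ["response_a", "response_b", "response_c"]
pairs = ranking_to_pairs(ranking)

# Hypothetical scores from a reward model under training.
scores = {"response_a": 2.0, "response_b": 0.5, "response_c": -1.0}
mean_loss = sum(pairwise_loss(scores[p], scores[r]) for p, r in pairs) / len(pairs)
print(pairs)
print(round(mean_loss, 4))
```

    Minimizing this loss pushes the reward model's scores to agree with the human ordering; the trained reward model can then supply the reward signal for reinforcement learning.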

    The fast-paced advancements demonstrated by these models have challenged one of the traditional methods used to measure progress: benchmarks. Improvements are happening so fast that benchmarks are becoming obsolete, with many solved or saturated as quickly as they come out.

    Everyday impact: Integrating foundation models and products [19:09-27:20]

    Foundation models are already appearing in products available today. For example, GitHub Copilot leverages OpenAI Codex to assist in writing code. The AI pair programmer has been shown to not only make developers feel more productive but to support them in actually getting more done. A GitHub study found participants using Copilot were 55 percent more productive than participants without access to Copilot.

    Combining language models optimized for dialogue with external knowledge sources and tools is another avenue for improved experiences. The new Bing, for instance, brings together these models and search. Years of research have yielded insight into the web search experience; much of it involves reviewing and synthesizing information across a variety of resources identified via multiple queries, which is time-consuming. The new Bing can do the heavy lifting for the searcher, working behind the scenes to make the necessary queries, collect results, synthesize the information, and present a single complete answer.

    Large language models and foundation models more broadly are not without their limitations, however. There are issues such as reliability, accuracy, staleness, and provenance that need to be explored. Additionally, each specific application of one of these models comes with its own challenges and opportunities. For example, in applying foundation models to web search, we need to rethink the overall user experience, including how people interact with search and how we improve, measure, and personalize the experience over time.

    Listen in at 27:48

    Transcript

    Introduction to foundation models [00:00–11:01]

    Hello, everyone. My name is Ahmed Awadallah. I am a researcher here at Microsoft Research. Today, I am going to be talking about foundation models and the impact they are having on the current era of AI.

    If we look back at the last five to 10 years, AI has been making significant impact on many perception tasks like image and object recognition, speech recognition, and most recently on language understanding tasks, where we have been seeing different AI models achieving superior performance and in many cases reaching performance equal to what a human annotator would do on the same task. Over the last couple of years, though, the frontier of AI has changed toward generative AI. 

    We have had quite good text generation models for some time. You could actually prompt a model with asking it to describe an imaginary scene, and it will produce a very good description of what you have asked it to do. And then we started making a lot of progress on image generation, as well. With models like DALL-E 2 and Imagen and even models coming out from such startups like Midjourney and Stability AI, we have been getting to a level of quality of image generation that we have never seen before. Inspired by that, there has been also a lot of work on animating the generated images or even generating videos from scratch. Another frontier for generative models has been code, and not only generating code based on text prompt but also explaining the code or in some cases even debugging the code. I was listening to this episode of the Morning Edition on NPR when it aired at the beginning of February where they were attempting to use a bunch of AI models for producing a schematic design of a rocket and also for coming up with some equations for the rocket design. And, of course, the hypothetical design would have crashed and burned, but I couldn’t help but think how exciting it is that AI has become so good that we are even attempting to measure its proficiency on a field as complex as rocket science.

    [2:11] If we look back, we will find that there are three main components that led to the current performance we are seeing from AI models: the transformer architecture, the scale, and in-context learning. Transformer in particular has been dominating the field of AI for the previous years. At the beginning, we started with natural language processing, and the architecture was very efficient that it took over the field of natural language processing within a very short amount of time. The transformer is a very efficient architecture that’s easy to scale, easy to parallelize, and relies on its heart at the attention mechanism, a technique that allows us to model interdependence between different components or different tokens in our input and output data. Transformers started off mostly in natural language processing, but slowly but surely, they made their way to pretty much any modality. So now we are seeing that models that are operating on images, on videos, on audio, and many other modalities are also using transformers. Five years later since their inception and transformers have surprisingly changed little compared to when they started despite so many attempts at producing better and more efficient variants of transformers, perhaps because of the gains were limited to certain use cases or perhaps because the gains did not persist at scale. Another potential reason is that maybe they made the architecture less universal, which has been one of its more—of its biggest advantages.
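
    As an illustration of the attention mechanism mentioned above, the following is a minimal sketch of scaled dot-product attention in plain Python. The talk does not go into these details, so treat this as the textbook formulation rather than a description of any particular model; the toy vectors are made up, and real transformers add learned projections and multiple attention heads.

```python
import math


def softmax(values):
    """Convert raw scores into weights that are positive and sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Score this token's query against every token's key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted mix of all value vectors.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Three toy token vectors (made-up numbers), used as queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(tokens, tokens, tokens):
    print([round(x, 3) for x in row])
```

    Each output row depends on every input token, which is the sense in which attention models interdependence between tokens; the rows can also be computed independently of one another, which is part of why the architecture parallelizes well.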

    [03:53] The next point is scale, and when we talk about scale, we really mean the amount of compute that’s being used to train the model, and that can be translated into either training bigger and bigger models with larger and larger number of parameters—and we have been seeing a steady increase of that over the previous years—but scale could also mean more data, using more data to train the model on larger and larger amounts of data. And we have seen different models over the previous few years taking different approaches in deciding how much data and how large the model is. But the consistent trend is that we have been scaling larger and larger and using more and more compute. Scale has also led to what is being called as “emerging capabilities.” And that’s one of the most interesting properties of scale that have been described over the previous year or so. By emerging capability, we mean that the model starts to show a certain ability that appears only when it reaches a critical size. Before that, the model is not demonstrating any of this ability at all. For example, let’s look at the figures here, and on the left-hand side, we see arithmetic. If we try to use language models to solve arithmetic word problems, up until a certain scale, they absolutely cannot solve the problem in any way, and they do not perform any better than random. But then at a certain critical point, we start seeing improved performance, and that performance just keeps getting better and better. And we have seen that at so many other tasks, as well, ranging from arithmetic to transliteration to multitask learning.

    [05:38] And perhaps one of the most exciting emerging capabilities of language models recently is their ability to in-context learn, which has been introducing a new paradigm for using these models. If we take a look back at how we have been practicing machine learning in general, with deep learning, you would start by choosing an architecture, a transformer or before that an RNN or CNN, and then you fully supervise train your model. You have a lot of labeled data, and you train your model based on that data. When we started getting into pre-trained models, we instead of training models from scratch, we actually start off with a pre-trained model and then fine-tune it still on a lot of fully supervised labeled data for the task at hand. But then with in-context learning, suddenly we can actually use the models out of the box. We can just use a pre-trained model and use a prompt in order to learn—in order to perform a new task without actually doing any learning. We can do that in zero-shot settings, meaning we do not provide any examples at all, just instructions or a description of what the task is, or in a few-shot setting, where we just provide a small handful number of examples to the model. For example, if we are interested in trying to do text classification, we can just—in this case sentiment analysis—we can just provide the text to the model and ask it to classify the text into either positive or negative. If the task is a little bit harder, we can provide few-shot samples, just a few examples of how do we want the model to classify things into, say, positive, negative, or neutral, and then ask the model to reason about a new piece of text, and it actually does pretty good at it. And it’s not only simple tasks like text classification. We can do translation or summarization and much more complex tasks with that paradigm. We can even try to do things like arithmetic where we try to give the model a word problem and ask it to come up with the answer. 

    On the example we are showing right now, we did give the model just one sample to show it how we would solve a problem and then ask it to solve another problem. But in that particular case, the model actually failed. It did produce an answer, but it was not the correct answer. But then came the idea of chain-of-thought prompts, where instead of just showing the model the input and the output, we can actually also show it the steps it can take in order to get to that output from that particular input. In that case, we are just solving the arithmetic word problem step by step and showing an example of that to the model. When we do that, the models are not only able to produce the correct answer, but they are also able to walk us step by step through how they produced that answer. That mechanism is referred to as chain-of-thought prompting, and it has been very prominently used in so many tasks and showing very superior performance on multiple tasks. It has been also used in many different ways, including in fine-tuning and training some of the models.

    The “pre-train and then fine-tune” paradigm has been the established paradigm for years, since maybe the inception of BERT and similar pre-trained language models. But now you would see that there’s an increased shift into using the models by prompting them instead of having to fine-tune them. That’s evident in a lot of practical usage of the models but even in the publications in the machine learning areas that have been using natural language processing tasks and switching into using prompting instead of using fine-tuning. In-context learning and prompting matter a lot because it’s actually changing the way we apply the models to new tasks. The ability of applying the models to new tasks out of the box without collecting additional data, without doing any additional training, is an amazing ability that increases the amount of tasks that can be applied—the models can be applied to and also reduces the amount of effort needed into building models with these tasks.
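
    The chain-of-thought idea described above can be sketched as a prompt whose single worked example includes the intermediate steps, not only the final answer. The function name, the "Q:/A:" layout, and both word problems are assumptions made for illustration; they are not the examples shown in the video, and no model is called here.

```python
def chain_of_thought_prompt(worked_example, new_question):
    """Build a one-shot prompt whose example includes its reasoning steps."""
    question, steps, answer = worked_example
    reasoning = " ".join(steps)
    return (
        f"Q: {question}\n"
        f"A: {reasoning} The answer is {answer}.\n\n"
        f"Q: {new_question}\n"
        "A:"
    )


# Made-up arithmetic word problems, used only for illustration.
example = (
    "A shelf holds 5 books. Two boxes of 3 books each are added. "
    "How many books are on the shelf?",
    ["Two boxes of 3 books is 2 * 3 = 6 books.", "5 + 6 = 11."],
    11,
)
prompt = chain_of_thought_prompt(
    example,
    "A jar has 12 marbles. 4 are removed and 7 are added. How many remain?",
)
print(prompt)
```

    A standard one-shot prompt would show only "The answer is 11."; including the steps is what invites the model to reason step by step on the new question.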

    [09:57] The performance has been also amazing by just providing only a few examples, and the tasks in this setting are being adapted to the model rather than the models being adapted to the tasks. If you think about the fine-tuning paradigm, what we did is that we already had the pre-trained model and we were fine-tuning it to adapt to the task. Now we are trying to frame the task in a way that’s more friendly to how the model is being trained so that the model can perform well on the task even without any fine-tuning. Finally, this allows the humans to interact with the models in their normal form of communication, in natural language. We can just give instructions describing the task that we want, and the model would perform the task. And that blurs the line between who is an ML user and who is an ML developer because now anyone can just prompt and describe different tasks to the language model and get the language model to do a large number of tasks without having to have any training or any development involved.

    From GPT-3 to ChatGPT—a jump in generative capabilities [11:02–19:07]

    [11:02] Now looking back at the last three months or so, we have been seeing the field changing quite a bit and a tremendous amount of excitement happening around the release of the ChatGPT model. And if we think about the ChatGPT model as a generative model, we would see that there has been other generative models out there from the GPT family and other models, as well, that have been doing a decent job at text generation. So you can take one of these models, in this case GPT-3, and prompt it to the question asking it to explain what the foundational language model means and it would give you a pretty decent answer. You can ask the same question to ChatGPT and you’ll find that it’s able to provide a much better answer. It’s longer; it’s more thorough; it’s more structured. You can ask it to style it in different ways. You can ask it to simplify it in different ways. And all of these are capabilities that the previous generation of the models could not really do. If we look at how ChatGPT is described, the description lists different things, but it’s mostly optimized for dialogue, allowing the humans to interact in natural language. It’s much better at following instructions and so on and so forth. If we look at step by step about how this actually was manifested in the training, we will see from the description that looking at base models that ChatGPT was built on and other models before ChatGPT, that language model training was following a self-supervised pre-training approach, where we have a lot of unsupervised language, web-scale language, that we are training the models on, and the models in this particular case are trained with an autoregressive next word prediction approach. So we are looking at an input context, which is a sentence or a part of a sentence, and trying to predict the next word. But then over the last year or so, we have been seeing a shift where models are being trained not just on text but also on code. 

    For example, GPT-3.5 models are trained on both text and code, and surprisingly, training the models on both text and code improves their performance on many tasks that have nothing to do with code. On the figure we see right now, we see different models being compared on—models that were trained with code and models that were not trained with code—and we are seeing that the models that were trained with both text and code show better performance at following task instructions, show better performance at reasoning, compared to similar models that were trained on text only. So the training on code seems to be grounding the models in different ways, allowing them to learn a little bit more about how to reason, about how to look at structured relation between different parts of the text.
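
    The autoregressive next-word prediction objective mentioned in this part of the talk can be illustrated with a deliberately tiny stand-in: a bigram counter that predicts the next word from the previous one and feeds each prediction back in as context. This is a toy under stated assumptions, not how GPT models are implemented; they use transformer networks over much longer contexts. The corpus sentence is made up.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def generate(following, start, max_words):
    """Autoregressive loop: each predicted word becomes the next context."""
    output = [start]
    for _ in range(max_words):
        candidates = following.get(output[-1])
        if not candidates:
            break  # No observed continuation for this word.
        output.append(candidates.most_common(1)[0][0])
    return " ".join(output)


# Made-up training text, used only for illustration.
corpus = "the model reads text and the model predicts the next word"
table = train_bigram(corpus)
print(generate(table, "the", 3))
```

    No labels are needed: the training signal comes from the text itself, which is what makes the approach self-supervised.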

    [13:59] The second main difference is the idea of instruction tuning, which has been—what you have been seeing becoming more and more popular over different models over the last year, maybe starting with InstructGPT that introduced the idea of training the models on human-generated data. And this is a departure from the traditional self-supervised approach, where we have been only training the model on unsupervised, free, unstructured text. Now there’s an additional step in the training process that actually trains the models on human-generated data. The human-generated data takes the format of prompt and the response, and it’s trying to teach the model to respond in a particular way given a prompt, and this step of instruction tuning has been actually helping the models get a lot better, especially in zero-shot performance. And we see here that the instruction-tuned models tend to perform a lot better than their non-instruction–tuned counterpart, especially in zero-shot settings. And the last step of the training process introduces yet another human-generated data. In this case, we actually have different responses generated by the model and we have a human providing preferences to all these responses so in a sense ranking responses and choosing which response is better than other responses. This data is used to train a reward model that can then be used to actually train the main model with reinforcement learning. And this approach further aligns the model into responding in certain ways that correspond to the way the human has been providing the feedback data. This notion of training the model with human feedback data is very interesting, and it’s creating a lot of traction with many people thinking about the best technique to train on human feedback data, the best form of human feedback to collect, to train the model on, and it would probably help us improve the models even further in the near future.

    [16:02] Now with all these advances we have been seeing, the pace of innovation and the acceleration of the advances have been moving so fast that it has been very challenging in so many ways, but perhaps one of the most profound ways it has been challenging with is the notion of benchmarking, that traditionally research in machine learning has been very dependent on using very solid benchmarks on measuring the progress of different approaches. But the pace of innovation has been really challenging that recently. To understand how fast the progress has been, let’s look at this data coming from Hypermind, a forecasting company that uses crowd forecasting and has been doing that—tracking some of the AI benchmarks recently. The first benchmark is the Massive Multitask Language Understanding benchmark, a large collection of language understanding tasks. In June of 2021, a forecast was made that in a year, by June 2022, we will get to around 57 percent performance on this task. But in reality, what happens is that by June 2022, we were at around 67 percent, and a couple of months later, we were at 75 percent, and we keep seeing more and more fast improvements after that. A second task is the MATH task, which is a collection of middle and high school math problems, and here the prediction was that in a year, we will get to around 13 percent. But in reality, we ended up going much more beyond that within one year, and we still see more and more advances happening at a faster-than-ever-expected pace. That rate of improvement is actually resulting in a lot of the benchmarks being saturated really fast.

    [17:51] If we look back at benchmarks like MNIST and Switchboard, it took the community 20-plus years in order to fully saturate these benchmarks. And that has been accelerating, accelerating to the point where now we see benchmarks being saturated in a year or less. In fact, many of the benchmarks are becoming obsolete to the point that only 66 percent of machine learning benchmarks have received more than three results at different time points, and many of them are solved or saturated soon after they are being released. And that actually motivated the community to come together with very large efforts to try to design benchmarks that are designed specifically to challenge large language models. In that particular case, with BIG-bench, more than 400 authors from over 100 institutions came together to create it. But even with such an elaborate effort, we are seeing very fast progress, and with large language models and chain-of-thought prompting that we discussed earlier, we are seeing that we are making very fast progress against the hardest tasks in BIG-bench, and in many of them, models are already performing better than humans right now.

    Everyday impact: Integrating foundation models and products [19:09–27:20]

    [19:09] The foundation models are not only getting better and better at benchmarks, but they are actually changing many products that we use every day. We mentioned code generation earlier, so let’s talk a little bit about Copilot. GitHub Copilot is a new experience that helps developers write code, and Copilot is very interesting in many perspectives. One is how fast it went from the model being created in research to how—to the point it made it as a product generally available in GitHub Copilot but also in how much user value it has been generating. This study that was done by the Copilot GitHub team was looking at quantifying the value these models were providing to developers. And in the first part of the study, they asked different questions to the developers, trying to assess how useful the models are, and we see that 88 percent of the participants reported that they feel like they are much more productive when using Copilot than before, and they reported many other positive implications on their productivity, as well. But perhaps even more interesting, the study did a controlled study where there were two groups of developers trying to solve the same set of tasks. A group of them had access to Copilot, and the other group did not, and interestingly, the group that had access to Copilot not only finished the tasks at a higher success rate but also at a much more efficient rate. Overall, they were 55 percent more productive. Fifty-five percent more productivity in a coding scenario is an amazing progress that a lot of people would have been very surprised to think about a model like Copilot performing so fast with such value.

    [21:10] Now beyond code generation and text generation, another frontier where these models are starting to shine is when we start connecting them with external knowledge sources and external tools. Language models that have been optimized for dialogue have amazing language capabilities; they do really good at understanding language, at following instructions. They also do really well at synthesizing and generating answers. They are also conversational in nature and do store knowledge from the training data that they were trained on. But they do have a lot of limitations around reliability, factualness, staleness, access to more recent information that was not part of the training data, provenance, and so on. And that’s why connecting these models to external knowledge sources and tools could be super exciting. Let’s talk about, for example, connecting language models to search as we have seen recently with the new Bing.

    [22:14] If we take a look back years ago, there was many, many studies studying web search, studying tasks that people try to complete in web search scenarios. And many of these tasks were deemed as complex search tasks, tasks that are not navigational, as in trying to go to a particular website, or that are not simple informational tasks where you are trying to look up a fact that you can quickly get with one query but more complex tasks that involve multiple queries. Maybe you are planning a travel, maybe you are trying to buy a product, and as part of your research process, there are multifaceted queries that you would like to look at. There has been a lot of research understanding user behavior with such tasks and how prevalent they are and how much time and effort people spend in order to perform them. And they typically involve spending a significant amount of time with the search engine, reading and synthesizing information from different sources with different queries. But with a new experience like the experience Bing is providing, we can actually take one of these queries and provide much more complex long queries to the search engine. And the search engine uses both search and the power of the language model to generate multiple queries, get the results of all of these queries, and synthesize a detailed answer back to the searcher. Not only that, but it can recommend additional searches and additional ways you could interact with the search engine in order to learn more. That has the potential of saving a lot of time and a lot of effort for many searchers in supporting these complex search tasks in a much better way. Not only that, but there are some of these complex search tasks that are multistep in nature, where I would start with one query and then follow up with another query based on the information I get from the first query. 
Imagine that I am doing this search before the Super Bowl where I am trying to understand some comparisons, stats, between the two quarterbacks that are going to face each other, and I start with that query. What the search engine did in that particular case is that it actually started with a query where it was trying to identify who are the two quarterbacks that are going to be playing in the Super Bowl. And if I have done that as a human, I would have done that. I would have identified the teams and the two quarterbacks, and then maybe I would follow up with another query where I would actually search for the stats of the two quarterbacks I am asking about, and get that and actually synthesize the information maybe from different results and then get to the answer I am looking for. But with the new Bing experience, I can just issue the query and all of that is happening in the background. Different search queries are being generated, submitted to the search engine, recent results are getting collected, and a single answer is being synthesized and displayed, making me as a searcher much more productive and much more efficient.
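    The multistep flow described above (a first query, follow-up queries derived from its results, then one synthesized answer) can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Bing works: the in-memory index, the function names `search`, `plan_followups`, and `answer`, and the placeholder quarterback names are all hypothetical, and the steps a language model would perform are replaced by simple stand-ins.

```python
# Hypothetical toy "search index"; a real system would call a search engine.
FAKE_INDEX = {
    "who are the starting quarterbacks": [
        "Quarterback A and Quarterback B are starting."
    ],
    "quarterback a season stats": ["Quarterback A: placeholder stats."],
    "quarterback b season stats": ["Quarterback B: placeholder stats."],
}


def search(query):
    """Stand-in for a web search call: look the query up in the toy index."""
    return FAKE_INDEX.get(query.lower(), [])


def plan_followups(first_results):
    """Stand-in for the language model deciding what to search for next."""
    followups = []
    for name in ("Quarterback A", "Quarterback B"):
        if any(name in snippet for snippet in first_results):
            followups.append(f"{name} season stats")
    return followups


def answer(first_query):
    """Issue the first query, then follow-ups, then merge all results."""
    issued = [first_query]
    snippets = search(first_query)
    for query in plan_followups(snippets):
        issued.append(query)
        snippets = snippets + search(query)
    # Stand-in for the model synthesizing a single answer from all results.
    return " ".join(snippets), issued


text, issued = answer("who are the starting quarterbacks")
print(issued)
print(text)
```

    The point of the sketch is the control flow: the searcher issues one request, and the follow-up queries and the merging of results happen in the background.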

    [25:21] The potential of large language models integrated with search and other tools is huge and can add much value to so many scenarios. But there are also a lot of challenges, opportunities, and limitations that need to be addressed. Reliability and safety are among them: making the models more accurate and thinking about trust, provenance, and bias. User experience and behavior is another: how the new experience, with new and different tasks, different user interfaces, or even different behavior models, would affect how users interact with the search engine. Search has been a very well-studied experience, and we have very good understanding of how users interact with the search engine and very reliable behavior models to predict that. Changing this experience will require a lot of additional study there. Personalization, including managing user preferences and search history, has also been a very well-studied field in web search, and with new experiences like that, we have so many opportunities for thinking again about things like personalization and user experience but also evaluation and what metrics mean. How do we measure user satisfaction? How do we understand good and bad abandonment? Good abandonment is when people are satisfied with the result without having to click on anything on the search result page, and bad abandonment is the opposite of that. There are also feedback loops, which have been playing a large part in improving search engines, and the question of how we can apply them to new experiences and new scenarios. So while integrating language models with an experience like search and other tools and experiences is very exciting, it’s actually also creating so many opportunities for new research problems or for revisiting previous search problems that we had a very good understanding of.

    Conclusion [27:21–28:37]

    [27:21] To conclude, we have been seeing incredible advances in AI over the past couple of years. The progress has been accelerating and outpacing expectations in so many ways, and the advances are not only in terms of academic benchmarks and publications, but we are also seeing an explosion of applications that are changing the products that we use every day. However, we are really much closer to the beginning of a new era with AI than we are to the end state of AI capabilities. There are so many opportunities, and we will probably see a lot more advances and even more accelerated progress over the coming months and years. And there are so many challenges that remain and many new opportunities that are arising because of the state of where these models are. It’s a very exciting time for AI, and we are really looking forward to seeing the advances that will happen moving forward and to the applications that will result from these advances and how they will affect every one of us with the products we use every day. Thank you so much.

    [END]


    The post AI Explainer: Foundation models and the next era of AI appeared first on Microsoft Research.

Categories: Microsoft

AI Explainer: Foundation models and the next era of AI

Microsoft Research - Thu, 03/23/2023 - 19:00

The release of OpenAI’s GPT-4 is a significant advance that builds on several years of rapid innovation in foundation models. GPT-4, which was trained on the Microsoft Azure AI supercomputer, has exhibited significantly improved abilities across many dimensions, from summarizing lengthy documents, to answering complex questions about a wide range of topics and explaining the reasoning behind those answers, to telling jokes and writing code and poetry.

Microsoft Senior Principal Research Manager Ahmed H. Awadallah was among a group of researchers across the company who have worked in partnership with OpenAI over several months to evaluate this new model’s capabilities. In this video, recapped below, he tells the story of the technical innovations in recent years that have brought us to this moment: the surprising progress of GPT-4’s predecessor models, leading up to the capabilities demonstrated in ChatGPT, and the integration of the latest models into Bing.

In this article
  1. Introduction to foundation models [00:00-11:01]
  2. From GPT-3 to ChatGPT – a jump in generative capabilities [11:02-19:07]
  3. Everyday impact: Integrating foundation models and products [19:09-27:20]
  4. Transcript

While watching this video, you can hover to see video chapter titles and jump directly to those you’re interested in.

    Introduction to foundation models [00:00-11:01]

    Over the last decade, AI has made significant progress on perception tasks like image recognition and language processing. More recently, the field is witnessing new advances in the form of generative AI, underpinned by a class of large-scale models known as foundation models. Foundation models are trained on massive amounts of data and are capable of performing a wide range of tasks. With a simple natural language prompt like “describe a scene of the sun rising over the beach,” generative AI models can output a detailed description or produce an image based on the generated description, which can then be animated or even turned into video. Many recent language models are not only good at generating text but also generating, explaining, and debugging code.

    Listen in at 1:37

    Three components have been driving these advances:

    • The transformer architecture: A popular choice across modalities, the transformer architecture is efficient, easy to scale and parallelize, and can model interdependence between different components in input and output data.
    • Scale: Growing model size and the use of increasingly large amounts of data have resulted in what is being termed as “emerging capabilities.” When models reach a critical size, they begin displaying capabilities not previously present.
    • In-context learning: Showing potential on a range of applications, from text classification to translation and summarization, this new paradigm for using models provides pre-trained models with instructions for new tasks or just a few examples instead of training or fine-tuning them on labeled data. Because no additional data or training is needed and prompts are provided in natural language, models can be applied right out of the box, and their use isn’t limited to people with developer experience.
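
    To make the in-context learning idea concrete, here is a minimal sketch of how zero-shot and few-shot prompts for a sentiment task might be assembled as plain text. The function names, the prompt layout ("Text:" / "Label:"), and the example sentences are assumptions chosen for illustration; the video does not prescribe a particular format, and no model is called here.

```python
def zero_shot_prompt(instruction, text):
    """Build a prompt that contains only an instruction and the input."""
    return f"{instruction}\n\nText: {text}\nLabel:"


def few_shot_prompt(instruction, examples, text):
    """Build a prompt that shows a handful of labeled examples first."""
    shots = "\n\n".join(f"Text: {t}\nLabel: {label}" for t, label in examples)
    return f"{instruction}\n\n{shots}\n\nText: {text}\nLabel:"


instruction = (
    "Classify the sentiment of the text as positive, negative, or neutral."
)
examples = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
    ("The package arrived on Tuesday.", "neutral"),
]

print(zero_shot_prompt(instruction, "I love this keyboard."))
print()
print(few_shot_prompt(instruction, examples, "I love this keyboard."))
```

    In both cases the model's weights stay untouched; only the text handed to the model changes, which is why no labeled training set or fine-tuning run is needed.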

    From GPT-3 to ChatGPT – a jump in generative capabilities [11:02-19:07]

    With the November 2022 release of ChatGPT, a language model optimized for dialogue, we saw exciting developments in text generation. Compared with GPT-3, an earlier language model in the GPT family, ChatGPT not only provides longer, more thorough, and more structured responses to questions and instructions but can also produce answers in different styles, or tones, and tailor explanations to different audiences, like a child, a first-year college student, or someone with a PhD.

    Earlier language models such as GPT-3 were trained to predict the next word in a sentence using large amounts of text from the web with no direct human supervision. Several additional training approaches have helped fuel the improved performance of later models such as ChatGPT. These models are being trained on code in addition to text, which seems to help them learn structured relationships between different parts of the text. This is resulting in models that are better at following instructions and reasoning than models trained on text alone. Human-generated data is also contributing to better outputs. Instruction tuning adds the step of training models on prompts and responses created by a human, while model-generated responses ranked by a human are being employed to train a reward model that can be used to train the main model with reinforcement learning.

    The fast-paced advancements demonstrated by these models have challenged one of the traditional methods used to measure progress: benchmarks. Improvements are happening so fast that benchmarks are becoming obsolete, with many solved or saturated as quickly as they come out.

    Everyday impact: Integrating foundation models and products [19:09-27:20]

    Foundation models are already appearing in products available today. For example, GitHub Copilot leverages OpenAI Codex to assist in writing code. The AI pair programmer has been shown to not only make developers feel more productive but to support them in actually getting more done. A GitHub study found participants using Copilot were 55 percent more productive than participants without access to Copilot.

    Combining language models optimized for dialogue with external knowledge sources and tools is another avenue for improved experiences. The new Bing, for instance, brings together these models and search. Years of research have yielded insight into the web search experience; much of it involves reviewing and synthesizing information across a variety of resources identified via multiple queries, which is time-consuming. The new Bing can do the heavy lifting for the searcher, working behind the scenes to make the necessary queries, collect results, synthesize the information, and present a single complete answer.

    Large language models and foundation models more broadly are not without their limitations, however. There are issues such as reliability, accuracy, staleness, and provenance that need to be explored. Additionally, each specific application of one of these models comes with its own challenges and opportunities. For example, in applying foundation models to web search, we need to rethink the overall user experience, including how people interact with search and how we improve, measure, and personalize the experience over time.

    Listen in at 27:48

    Transcript

    Introduction to foundation models [00:00–11:01]

    Hello, everyone. My name is Ahmed Awadallah. I am a researcher here at Microsoft Research. Today, I am going to be talking about foundation models and the impact they are having on the current era of AI.

    If we look back at the last five to 10 years, AI has been making significant impact on many perception tasks like image and object recognition, speech recognition, and most recently on language understanding tasks, where we have been seeing different AI models achieving superior performance and in many cases reaching performance equal to what a human annotator would do on the same task. Over the last couple of years, though, the frontier of AI has changed toward generative AI. 

    We have had quite good text generation models for some time. You could actually prompt a model with asking it to describe an imaginary scene, and it will produce a very good description of what you have asked it to do. And then we started making a lot of progress on image generation, as well. With models like DALL-E 2 and Imagen and even models coming out from such startups like Midjourney and Stability AI, we have been getting to a level of quality of image generation that we have never seen before. Inspired by that, there has been also a lot of work on animating the generated images or even generating videos from scratch. Another frontier for generative models has been code, and not only generating code based on text prompt but also explaining the code or in some cases even debugging the code. I was listening to this episode of the Morning Edition on NPR when it aired at the beginning of February where they were attempting to use a bunch of AI models for producing a schematic design of a rocket and also for coming up with some equations for the rocket design. And, of course, the hypothetical design would have crashed and burned, but I couldn’t help but think how exciting it is that AI has become so good that we are even attempting to measure its proficiency on a field as complex as rocket science.

    [2:11] If we look back, we will find that there are three main components that led to the current performance we are seeing from AI models: the transformer architecture, the scale, and in-context learning. The transformer in particular has been dominating the field of AI for the previous years. At the beginning, we started with natural language processing, and the architecture was so efficient that it took over the field of natural language processing within a very short amount of time. The transformer is a very efficient architecture that’s easy to scale, easy to parallelize, and relies at its heart on the attention mechanism, a technique that allows us to model interdependence between different components or different tokens in our input and output data. Transformers started off mostly in natural language processing, but slowly but surely, they made their way to pretty much any modality. So now we are seeing that models that are operating on images, on videos, on audio, and many other modalities are also using transformers. Five years after their inception, transformers have surprisingly changed little compared to when they started despite so many attempts at producing better and more efficient variants of transformers, perhaps because the gains were limited to certain use cases or perhaps because the gains did not persist at scale. Another potential reason is that maybe they made the architecture less universal, which has been one of its biggest advantages.
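
    As a rough illustration of the attention mechanism mentioned above, the following sketch computes scaled dot-product attention, the standard form used in transformers, over a few toy token vectors. It is a simplified assumption-laden example: real transformers first apply learned projections to obtain the queries, keys, and values and use many attention heads, all of which are omitted here, and the function names and toy vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query token to every key token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d token vectors
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

    Each row of weights says how much one token draws on every other token, which is the "interdependence between different tokens" the talk refers to. Because every row can be computed independently, the operation is easy to parallelize.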

    [03:53] The next point is scale, and when we talk about scale, we really mean the amount of compute that’s being used to train the model, and that can be translated into either training bigger and bigger models with larger and larger number of parameters—and we have been seeing a steady increase of that over the previous years—but scale could also mean more data, using more data to train the model on larger and larger amounts of data. And we have seen different models over the previous few years taking different approaches in deciding how much data and how large the model is. But the consistent trend is that we have been scaling larger and larger and using more and more compute. Scale has also led to what is being called as “emerging capabilities.” And that’s one of the most interesting properties of scale that have been described over the previous year or so. By emerging capability, we mean that the model starts to show a certain ability that appears only when it reaches a critical size. Before that, the model is not demonstrating any of this ability at all. For example, let’s look at the figures here, and on the left-hand side, we see arithmetic. If we try to use language models to solve arithmetic word problems, up until a certain scale, they absolutely cannot solve the problem in any way, and they do not perform any better than random. But then at a certain critical point, we start seeing improved performance, and that performance just keeps getting better and better. And we have seen that at so many other tasks, as well, ranging from arithmetic to transliteration to multitask learning.

    [05:38] And perhaps one of the most exciting emerging capabilities of language models recently is their ability to in-context learn, which has been introducing a new paradigm for using these models. If we take a look back at how we have been practicing machine learning in general, with deep learning, you would start by choosing an architecture, a transformer or before that an RNN or CNN, and then you fully supervise train your model. You have a lot of labeled data, and you train your model based on that data. When we started getting into pre-trained models, we instead of training models from scratch, we actually start off with a pre-trained model and then fine-tune it still on a lot of fully supervised labeled data for the task at hand. But then with in-context learning, suddenly we can actually use the models out of the box. We can just use a pre-trained model and use a prompt in order to learn—in order to perform a new task without actually doing any learning. We can do that in zero-shot settings, meaning we do not provide any examples at all, just instructions or a description of what the task is, or in a few-shot setting, where we just provide a small handful number of examples to the model. For example, if we are interested in trying to do text classification, we can just—in this case sentiment analysis—we can just provide the text to the model and ask it to classify the text into either positive or negative. If the task is a little bit harder, we can provide few-shot samples, just a few examples of how do we want the model to classify things into, say, positive, negative, or neutral, and then ask the model to reason about a new piece of text, and it actually does pretty good at it. And it’s not only simple tasks like text classification. We can do translation or summarization and much more complex tasks with that paradigm. We can even try to do things like arithmetic where we try to give the model a word problem and ask it to come up with the answer. 
On the example we are showing right now, we did give the model just one sample to show it how we would solve a problem and then ask it to solve another problem. But in that particular case, the model actually failed. It did produce an answer, but it was not the correct answer. But then came the idea of chain-of-thought prompts, where instead of just showing the model the input and the output, we can actually also show it the steps it can take in order to get to that output from that particular input. In that case, we are just solving the arithmetic word problem step by step and showing an example of that to the model. When we do that, the models are not only able to produce the correct answer, but they are also able to walk us step by step through how they produced that answer. That mechanism is referred to as chain-of-thought prompting, and it has been very prominently used in so many tasks, showing superior performance on multiple tasks. It has been also used in many different ways, including in fine-tuning and training some of the models. The “pre-train and then fine-tune” paradigm has been the established paradigm for years, since maybe the inception of BERT and similar pre-trained language models. But now you would see that there’s an increased shift into using the models by prompting them instead of having to fine-tune them. That’s evident in a lot of practical usage of the models but even in the publications in the machine learning areas that have been using natural language processing tasks and switching into using prompting instead of using fine-tuning. In-context learning and prompting matter a lot because they are actually changing the way we apply the models to new tasks. 
The ability of applying the models to new tasks out of the box without collecting additional data, without doing any additional training, is an amazing ability that increases the amount of tasks that can be applied—the models can be applied to and also reduces the amount of effort needed into building models with these tasks.
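
    The difference between a standard few-shot prompt and a chain-of-thought prompt can be shown with a small sketch. The word problems, the function names, and the "Q:" / "A:" layout below are illustrative assumptions rather than the example shown in the video; the only change between the two prompts is that the worked example in the second one spells out its intermediate steps.

```python
def standard_prompt(example_q, example_a, question):
    """Few-shot prompt that shows only the question and the final answer."""
    return (
        f"Q: {example_q}\nA: The answer is {example_a}.\n\n"
        f"Q: {question}\nA:"
    )


def chain_of_thought_prompt(example_q, steps, example_a, question):
    """Few-shot prompt whose worked example includes intermediate steps."""
    reasoning = " ".join(steps)
    return (
        f"Q: {example_q}\nA: {reasoning} The answer is {example_a}.\n\n"
        f"Q: {question}\nA:"
    )


example_q = (
    "A library has 120 books. It lends out 45 and then receives 30 new "
    "ones. How many books does it have now?"
)
steps = [
    "The library starts with 120 books.",
    "After lending 45, it has 120 - 45 = 75.",
    "After receiving 30, it has 75 + 30 = 105.",
]
question = (
    "A bakery made 48 muffins, sold 29, and then baked 12 more. "
    "How many muffins does it have now?"
)

print(standard_prompt(example_q, 105, question))
print()
print(chain_of_thought_prompt(example_q, steps, 105, question))
```

    With the second prompt, a model that continues the pattern is nudged to write out its own steps before giving the final answer.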

    [09:57] The performance has been also amazing by just providing only a few examples, and the tasks in this setting are being adapted to the model rather than the models being adapted to the tasks. If you think about the fine-tuning paradigm, what we did is that we already had the pre-trained model and we were fine-tuning it to adapt to the task. Now we are trying to frame the task in a way that’s more friendly to how the model is being trained so that the model can perform well on the task even without any fine-tuning. Finally, this allows the humans to interact with the models in their normal form of communication, in natural language. We can just give instructions describing the task that we want, and the model would perform the task. And that blurs the line between who is an ML user and who is an ML developer because now anyone can just prompt and describe different tasks to the language model and get the language model to do a large number of tasks without having to have any training or any development involved.

    From GPT-3 to ChatGPT—a jump in generative capabilities [11:02–19:07]

    [11:02] Now looking back at the last three months or so, we have been seeing the field changing quite a bit and a tremendous amount of excitement happening around the release of the ChatGPT model. And if we think about the ChatGPT model as a generative model, we would see that there has been other generative models out there from the GPT family and other models, as well, that have been doing a decent job at text generation. So you can take one of these models, in this case GPT-3, and prompt it to the question asking it to explain what the foundational language model means and it would give you a pretty decent answer. You can ask the same question to ChatGPT and you’ll find that it’s able to provide a much better answer. It’s longer; it’s more thorough; it’s more structured. You can ask it to style it in different ways. You can ask it to simplify it in different ways. And all of these are capabilities that the previous generation of the models could not really do. If we look at how ChatGPT is described, the description lists different things, but it’s mostly optimized for dialogue, allowing the humans to interact in natural language. It’s much better at following instructions and so on and so forth. If we look at step by step about how this actually was manifested in the training, we will see from the description that looking at base models that ChatGPT was built on and other models before ChatGPT, that language model training was following a self-supervised pre-training approach, where we have a lot of unsupervised language, web-scale language, that we are training the models on, and the models in this particular case are trained with an autoregressive next word prediction approach. So we are looking at an input context, which is a sentence or a part of a sentence, and trying to predict the next word. But then over the last year or so, we have been seeing a shift where models are being trained not just on text but also on code. 
For example, GPT-3.5 models are trained on both text and code, and surprisingly, training the models on both text and codes improves their performance on many tasks that has nothing to do with code. On the figure we see right now, we see different models being compared on—models that were trained with code and models that were not trained with code—and we are seeing that the models that were trained with both text and code show better performance at following task instructions, show better performance at reasoning, compared to similar models that were trained on text only. So the training on code seems to be grounding the models in different ways, allowing them to learn a little bit more about how to reason, about how to look at structured relation between different parts of the text.
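
    The autoregressive next-word prediction loop described above can be illustrated with a deliberately tiny stand-in. The sketch below uses bigram counts instead of a neural network, so it is not how GPT models are built; the corpus and function names are invented for illustration. What it does share with the real approach is the shape of the procedure: learn from raw text without labels, predict the next word from the context, and feed each prediction back in as new context.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent continuation of `word`, or None if unseen."""
    options = follows.get(word)
    if not options:
        return None
    return options.most_common(1)[0][0]


def generate(follows, start, max_words=5):
    """Autoregressive generation: each prediction becomes the next context."""
    words = [start]
    for _ in range(max_words):
        nxt = predict_next(follows, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)


corpus = [
    "the model predicts words",
    "the model predicts words well",
    "a model reads text",
]
follows = train_bigram(corpus)
print(generate(follows, "the"))
```

    A large language model replaces the count table with a transformer that conditions on a long context rather than a single previous word, but the generation loop has the same structure.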

    [13:59] The second main difference is the idea of instruction tuning, which has been—what you have been seeing becoming more and more popular over different models over the last year, maybe starting with InstructGPT that introduced the idea of training the models on human-generated data. And this is a departure from the traditional self-supervised approach, where we have been only training the model on unsupervised, free, unstructured text. Now there’s an additional step in the training process that actually trains the models on human-generated data. The human-generated data takes the format of prompt and the response, and it’s trying to teach the model to respond in a particular way given a prompt, and this step of instruction tuning has been actually helping the models get a lot better, especially in zero-shot performance. And we see here that the instruction-tuned models tend to perform a lot better than their non-instruction–tuned counterpart, especially in zero-shot settings. And the last step of the training process introduces yet another human-generated data. In this case, we actually have different responses generated by the model and we have a human providing preferences to all these responses so in a sense ranking responses and choosing which response is better than other responses. This data is used to train a reward model that can then be used to actually train the main model with reinforcement learning. And this approach further aligns the model into responding in certain ways that correspond to the way the human has been providing the feedback data. This notion of training the model with human feedback data is very interesting, and it’s creating a lot of traction with many people thinking about the best technique to train on human feedback data, the best form of human feedback to collect, to train the model on, and it would probably help us improve the models even further in the near future.
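
    One way to picture how human rankings can train a reward model is a pairwise preference loss. The talk does not state which objective is used, so the following is an assumption: it shows the pairwise logistic loss, a common choice for learning from comparisons, together with a helper that expands a ranking into pairs. The function names and the example reward scores are hypothetical.

```python
import math


def pairwise_preference_loss(reward_preferred, reward_rejected):
    """Compute -log(sigmoid(r_preferred - r_rejected)).

    The loss is small when the preferred response already scores higher
    and large when the reward model contradicts the human ranking.
    """
    margin = reward_preferred - reward_rejected
    # log(1 + exp(-margin)), written to avoid overflow for large |margin|
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs."""
    return [
        (ranked_responses[i], ranked_responses[j])
        for i in range(len(ranked_responses))
        for j in range(i + 1, len(ranked_responses))
    ]


pairs = ranking_to_pairs(["response A", "response B", "response C"])
print(pairs)
print(pairwise_preference_loss(2.0, 0.0))  # ranking respected: low loss
print(pairwise_preference_loss(0.0, 2.0))  # ranking violated: high loss
```

    Minimizing this loss over many human-labeled pairs pushes the reward model to score preferred responses higher, and that learned score is what the reinforcement learning step then optimizes against.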

    [16:02] Now with all these advances we have been seeing, the pace of innovation has been so fast that it has been challenging in many ways, and perhaps one of the most profound challenges is to the notion of benchmarking. Traditionally, research in machine learning has been very dependent on solid benchmarks for measuring the progress of different approaches, but the pace of innovation has been really challenging that recently. To understand how fast the progress has been, let’s look at this data coming from Hypermind, a forecasting company that uses crowd forecasting and has recently been tracking some of the AI benchmarks. The first benchmark is the Massive Multitask Language Understanding benchmark, a large collection of language understanding tasks. In June of 2021, a forecast was made that in a year, by June 2022, we would get to around 57 percent performance on this task. But in reality, by June 2022, we were at around 67 percent, and a couple of months later, we were at 75 percent, and we keep seeing more and more fast improvements after that. A second task is the MATH task, which is a collection of middle and high school math problems, and here the prediction was that in a year, we would get to around 13 percent. But in reality, we ended up going well beyond that within one year, and we still see more and more advances happening at a faster-than-expected pace. That rate of improvement is actually resulting in a lot of the benchmarks being saturated really fast.

    [17:51] If we look back at benchmarks like MNIST and Switchboard, it took the community 20-plus years to fully saturate these benchmarks. And that has been accelerating to the point where now we see benchmarks being saturated in a year or less. In fact, many of the benchmarks are becoming obsolete, to the point that only 66 percent of machine learning benchmarks have received more than three results at different time points, and many of them are solved or saturated soon after they are released. And that actually motivated the community to come together in very large efforts to design benchmarks specifically to challenge large language models. In one particular case, BIG-bench, more than 400 authors from over 100 institutions came together to create it. But even with such an elaborate effort, we are seeing very fast progress, and with large language models and the chain-of-thought prompting that we discussed earlier, we are making very fast progress against the hardest tasks in BIG-bench, and in many of them, models are already performing better than humans right now.

    Everyday impact: Integrating foundation models and products [19:09–27:20]

    [19:09] The foundation models are not only getting better and better at benchmarks, but they are actually changing many products that we use every day. We mentioned code generation earlier, so let’s talk a little bit about Copilot. GitHub Copilot is a new experience that helps developers write code, and Copilot is very interesting from many perspectives. One is how fast it went from the model being created in research to the point where it became a generally available product, and another is how much user value it has been generating. This study, which was done by the GitHub Copilot team, looked at quantifying the value these models were providing to developers. And in the first part of the study, they asked the developers different questions, trying to assess how useful the models are, and we see that 88 percent of the participants reported that they feel much more productive when using Copilot than before, and they reported many other positive effects on their productivity, as well. But perhaps even more interesting, the study included a controlled experiment where there were two groups of developers trying to solve the same set of tasks. One group had access to Copilot, and the other group did not, and interestingly, the group that had access to Copilot not only finished the tasks at a higher success rate but also much more efficiently. Overall, they were 55 percent more productive. Fifty-five percent more productivity in a coding scenario is amazing progress, and a lot of people would have been very surprised to see a model like Copilot deliver such value so fast.

    [21:10] Now beyond code generation and text generation, another frontier where these models are starting to shine is when we start connecting them with external knowledge sources and external tools. Language models that have been optimized for dialogue have amazing language capabilities; they do really well at understanding language and at following instructions. They also do really well at synthesizing and generating answers. They are also conversational in nature and do store knowledge from the data that they were trained on. But they do have a lot of limitations around reliability, factualness, staleness, access to more recent information that was not part of the training data, provenance, and so on. And that’s why connecting these models to external knowledge sources and tools could be super exciting. Let’s talk about, for example, connecting language models to search, as we have seen recently with the new Bing.

    [22:14] If we look back years ago, there were many, many studies of web search and of the tasks that people try to complete in web search scenarios. And many of these tasks were deemed complex search tasks: tasks that are not navigational, as in trying to go to a particular website, and that are not simple informational tasks, where you are trying to look up a fact that you can quickly get with one query, but more complex tasks that involve multiple queries. Maybe you are planning a trip, maybe you are trying to buy a product, and as part of your research process, there are multifaceted queries that you would like to look at. There has been a lot of research on understanding user behavior with such tasks, how prevalent they are, and how much time and effort people spend in order to perform them. And they typically involve spending a significant amount of time with the search engine, reading and synthesizing information from different sources with different queries. But with a new experience like the one Bing is providing, we can actually take one of these tasks and provide a much longer, more complex query to the search engine. And the search engine uses both search and the power of the language model to generate multiple queries, get the results of all of these queries, and synthesize a detailed answer back to the searcher. Not only that, but it can recommend additional searches and additional ways you could interact with the search engine in order to learn more. That has the potential to save a lot of time and effort for many searchers by supporting these complex search tasks in a much better way. Not only that, but some of these complex search tasks are multistep in nature, where I would start with one query and then follow up with another query based on the information I get from the first query.
    Imagine that I am doing this search before the Super Bowl, where I am trying to understand some comparisons, stats, between the two quarterbacks that are going to face each other, and I start with that query. What the search engine did in that particular case is that it actually started with a query that tried to identify the two quarterbacks who are going to be playing in the Super Bowl. And if I were doing this as a human, I would have done the same thing. I would have identified the teams and the two quarterbacks, and then maybe I would have followed up with another query to search for the stats of the two quarterbacks I am asking about, gotten that, synthesized the information, maybe from different results, and then arrived at the answer I am looking for. But with the new Bing experience, I can just issue the query, and all of that happens in the background. Different search queries are generated and submitted to the search engine, recent results are collected, and a single answer is synthesized and displayed, making me as a searcher much more productive and much more efficient.
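
    The multistep flow described above (generate queries, run them, follow up based on what came back, then synthesize one answer) can be sketched as a small loop. This is an illustration only and not how Bing is implemented; `answer_complex_query` and its three callables are hypothetical placeholders for a language model and a search backend.

```python
from typing import Callable, List


def answer_complex_query(
    question: str,
    generate_queries: Callable[[str, List[str]], List[str]],
    search: Callable[[str], List[str]],
    synthesize: Callable[[str, List[str]], str],
    max_steps: int = 3,
) -> str:
    """Answer a complex question by issuing several searches in rounds.

    All three callables are hypothetical stand-ins:
      generate_queries(question, evidence) -> next search queries (model)
      search(query)                        -> result snippets (search engine)
      synthesize(question, evidence)       -> final answer text (model)
    """
    evidence: List[str] = []
    for _ in range(max_steps):
        # Follow-up queries may depend on evidence from earlier rounds,
        # e.g. first find the two quarterbacks, then look up their stats.
        queries = generate_queries(question, evidence)
        if not queries:
            break  # nothing more to look up
        for query in queries:
            evidence.extend(search(query))
    return synthesize(question, evidence)
```

    Passing the accumulated evidence back into the query generator is what makes the flow multistep: the second round of queries can name entities that were only discovered in the first round.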

    [25:21] The potential of large language models integrated with search and other tools is huge and can add a great deal of value to so many scenarios. But there are also a lot of challenges, opportunities, and limitations that need to be addressed. Reliability and safety are among them: making the models more accurate and thinking about trust, provenance, and bias. User experience and behavior is another: how the new experience would affect the way users interact with the search engine, with new and different tasks, different user interfaces, or even different behavior models. Search has been a very well-studied experience, and we have a very good understanding of how users interact with the search engine and very reliable behavior models to predict that. Changing this experience will require a lot of additional study. Personalization and managing user preferences, search history, and so on has also been a very well-studied field in web search, and with new experiences like this, we have so many opportunities to think again about things like personalization and user experience but also about evaluation and what metrics mean. How do we measure user satisfaction? How do we understand good and bad abandonment? Good abandonment is when people are satisfied with the result without having to click on anything on the search result page, and bad abandonment is the opposite of that. There are also feedback loops, which have been playing a large part in improving search engines, and the question of how we can apply them to new experiences and new scenarios. So while integrating language models with an experience like search and other tools and experiences is very exciting, it’s actually also creating so many opportunities for new research problems or for revisiting previous search problems that we had a very good understanding of.
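
    The good and bad abandonment distinction mentioned above can be made concrete with a small metric sketch. This is a minimal example under stated assumptions: the `SearchSession` record and its `satisfied` label are hypothetical (in practice satisfaction has to be inferred or surveyed), and a session counts as abandoned when the user clicked nothing on the result page.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class SearchSession:
    clicked: bool    # did the user click anything on the result page?
    satisfied: bool  # hypothetical satisfaction label, e.g. from a survey


def abandonment_rates(sessions: Iterable[SearchSession]) -> Dict[str, float]:
    """Return the share of sessions with good and with bad abandonment.

    Good abandonment: no click, but the user was satisfied.
    Bad abandonment:  no click, and the user was not satisfied.
    """
    sessions = list(sessions)
    total = len(sessions)
    if total == 0:
        return {"good_abandonment": 0.0, "bad_abandonment": 0.0}
    good = sum(1 for s in sessions if not s.clicked and s.satisfied)
    bad = sum(1 for s in sessions if not s.clicked and not s.satisfied)
    return {"good_abandonment": good / total, "bad_abandonment": bad / total}
```

    An experience that answers directly on the page would be expected to shift sessions toward no-click outcomes, which is why a click-based metric alone cannot tell the two kinds of abandonment apart.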

    Conclusion [27:21–28:37]

    [27:21] To conclude, we have been seeing incredible advances in AI over the past couple of years. The progress has been accelerating and outpacing expectations in so many ways, and the advances are not only in terms of academic benchmarks and publications, but we are also seeing an explosion of applications that are changing the products that we use every day. However, we are really much closer to the beginning of a new era with AI than we are to the end state of AI capabilities. There are so many opportunities, and we will probably see a lot more advances and even more accelerated progress over the coming months and years. And there are so many challenges that remain and many new opportunities that are arising because of where these models are today. It’s a very exciting time for AI, and we are really looking forward to seeing the advances that will happen moving forward, the applications that will result from these advances, and how they will affect every one of us through the products we use every day. Thank you so much.

    [END]

    The post AI Explainer: Foundation models and the next era of AI appeared first on Microsoft Research.

Categories: Microsoft

SharePoint 2016: So where is “Open in file explorer” in the modern libraries?

The Mit's Blog - Thu, 07/07/2016 - 14:11
What is always enjoyable when a tool moves to a new version is discovering the new features and then ... the delicate hunt for the original functions ... not to mention the functions that are now gone ... With every migration, the tool ...
Categories: Microsoft , Technology

Office 2016: Goodbye to the Document Information Panel (DIP)

The Mit's Blog - Tue, 07/05/2016 - 17:59
It is really hard to keep up with all the new features and the other assorted information about SharePoint and Office 2016. It is easy to miss some of them ... in this case not a new feature but rather a disappearance, a feature deprecate...
Categories: Microsoft , Technology

Subversive-C: Abusing and Protecting Dynamic Message Dispatch

Microsoft Research Publications - Wed, 06/22/2016 - 09:00
The lower layers in the modern computing infrastructure are written in languages threatened by exploitation of memory management errors. Recently deployed exploit mitigations such as control-flow integrity (CFI) can prevent traditional return-oriented programming (ROP) exploits but are much less effective against newer techniques such as Counterfeit Object-Oriented Programming (COOP) that execute a chain of C++ virtual methods. Since these methods are valid control-flow targets, COOP attacks are hard to distinguish from benign computations. Code randomization is likewise ineffective against COOP. Until now, however, COOP attacks have been limited to vulnerable C++ applications which makes it unclear whether COOP is as general and portable a threat as ROP. This paper demonstrates the first COOP-style exploit for Objective-C, the predominant programming language on Apple’s OS X and iOS platforms. We also retrofit the Objective-C runtime with the first practical and efficient defense against our novel attack. Our defense is able to protect complex, real-world software such as iTunes without recompilation. Our performance experiments show that the overhead of our defense is low in practice.
Categories: Microsoft

Compositional Learning of Embeddings for Relation Paths in Knowledge Bases and Text

Microsoft Research Publications - Sat, 06/11/2016 - 09:00
Modeling relation paths has offered significant gains in embedding models for knowledge base (KB) completion. However, enumerating paths between two entities is very expensive, and existing approaches typically resort to approximation with a sampled subset. This problem is particularly acute when text is jointly modeled with KB relations and used to provide direct evidence for facts mentioned in it. In this paper, we propose the first exact dynamic programming algorithm which enables efficient incorporation of all relation paths of bounded length, while modeling both relation types and intermediate nodes in the compositional path representations. We conduct a theoretical analysis of the efficiency gain from the approach. Experiments on two datasets show that it addresses representational limitations in prior approaches and improves accuracy in KB completion.
Categories: Microsoft

A Gray Box Approach For High-Fidelity, High-Speed Time-Travel Debugging

Microsoft Research Publications - Wed, 06/08/2016 - 09:00
Time-travel debugging (TTD) lets developers step backward as well as forward through a program’s execution. TTD is a powerful mechanism for diagnosing bugs, but previous approaches suffer from poor performance due to checkpoint and logging overhead, or poor fidelity because important information like GUI state is not tracked. In this paper, we describe how to provide high-performance and high-fidelity TTD to programs written in managed languages. Previous high-performance debuggers treat components external to the program like the GUI as black boxes, but that is not sufficient for high-fidelity time-travel. Instead, we advocate for a gray-box approach that keeps these components live and in sync with the program during time-travel. The key insight is that managed runtime APIs expose most of the functionality required to do this; where it does not, we extend the runtime with a small number of non-intrusive interrogative interfaces. To demonstrate the power of our gray-box approach, we implement ReJS, a time-traveling debugger for web applications. ReJS imposes imperceptible tracing overhead, and its logs typically grow less than 1 KB/s. As a result, ReJS is performant enough to be deployed in the wild; real client machines can ship buggy execution traces across the wide area to developer-side machines for debugging.
Categories: Microsoft

FourQ on FPGA: New Hardware Speed Records for Elliptic Curve Cryptography over Large Prime Characteristic Fields

Microsoft Research Publications - Tue, 06/07/2016 - 09:00
We present fast and compact implementations of FourQ (ASIACRYPT 2015) on field-programmable gate arrays (FPGAs), and demonstrate, for the first time, the high efficiency of this new elliptic curve on reconfigurable hardware. By adapting FourQ's algorithms to hardware, we design FPGA-tailored architectures that are significantly faster than any other ECC alternative over large prime characteristic fields. For example, we show that our single-core and multi-core implementations can compute at a rate of 6389 and 64730 scalar multiplications per second, respectively, on a Xilinx Zynq-7020 FPGA, which represent factor-2.5 and 2 speedups in comparison with the corresponding variants of the fastest Curve25519 implementation on the same device. These results show the potential of deploying FourQ on hardware for high-performance and embedded security applications. All the presented implementations exhibit regular, constant-time execution, protecting against timing and simple side-channel attacks.
Categories: Microsoft

VisFlow: A Relational Platform for Efficient Large-Scale Video Analytics

Microsoft Research Publications - Tue, 06/07/2016 - 09:00
We describe VisFlow, a system that efficiently analyzes the feeds from many cameras. Ubiquitous camera deployments are widely used for security, traffic monitoring, and customer analytics. However, existing methods to analyze the video feeds in real-time or post-facto do not scale and are error-prone. Our key contributions are two-fold. Surveillance video is hard to analyze because it has low resolution, many objects per frame, varying light, etc. By leveraging the fixed perspective of surveillance cameras, we show that typical vision tasks can be performed with high accuracy. Next, to efficiently process many feeds, we use a relational dataflow system. We observe that (i) even vision queries that seem different have common parts (e.g., background subtraction and feature extraction), (ii) often neither camera-level nor frame-level parallelism leads to good executions, and (iii) the best execution plans vary with input size. By extending query optimization techniques, VisFlow computes efficient execution plans for vision queries, parallelizing as needed. Evaluation on traffic videos from a large city on complex vision queries shows manyfold improvements in accuracy, query completion time and resource usage relative to existing systems.
Categories: Microsoft