Microsoft Research


Renovating computer systems securely and progressively with APRON

Mon, 07/10/2023 - 18:00

This research paper was accepted at the 2023 USENIX Annual Technical Conference (ATC), which is dedicated to advancing the field of systems research.

Whether they’re personal computers or cloud instances, it’s crucial to ensure that the computer systems people use every day are reliable and secure. The validity of these systems is critical: if storage devices containing important executables and data become invalid, the entire system is affected. Numerous events can jeopardize the validity of computer systems or the data stored in them: malicious attacks such as ransomware, hardware or software errors that corrupt a system, and a lack of regular maintenance, such as patch installation, that leaves a system outdated. While the ideal scenario would be to create a flawless computer system that prevents such invalid states from occurring, achieving this perfection may prove challenging in practice.

Cyber-resilient system and recovery

A cyber-resilient system is a practical approach for addressing invalid system states. This resilient system effectively identifies suspicious state corruption or preservation by analyzing various internal and external signals. If it confirms any corruption, it recovers the system. In our previous work, which we presented at the 40th IEEE Symposium on Security and Privacy, we demonstrated the feasibility of unconditional system recovery using a very small hardware component. This component forcefully resets the entire system, making it execute trusted tiny code for system boot and recovery when no authenticated deferral request is present.

However, existing recovery mechanisms, including our previous work, primarily focus on when to recover a system rather than how. Consequently, these mechanisms overlook the efficiency and security issues that can arise during system recovery. Typically, these mechanisms incorporate a dedicated recovery environment responsible for executing the recovery task. Upon system reset, if the system is found to be invalid, as illustrated in Figure 1, the recovery environment is invoked. In this scenario, the recovery environment fully restores the system using a reference image downloaded from a reliable source or a separate location where it was securely stored.

Figure 1: System boot with a normal recovery.

Unfortunately, performing a full system recovery leads to prolonged system downtime because the recovery environment is incapable of supporting any other regular task expected from a computer system. In other words, the system remains unavailable during the recovery process. Moreover, choosing to download the reference image only serves to extend overall downtime. Although using the stored image slightly relieves this issue, it introduces security concerns, as the stored image might be outdated. One can argue that a full recovery can be circumvented by inspecting each file or data block for validity and selectively recovering only the affected ones. However, this delta recovery approach is lengthier than a full recovery due to the additional calculations required for determining differences and the inefficient utilization of modern, throughput-oriented block storage devices.

Secure and progressive system renovation

In our paper “APRON: Authenticated and Progressive System Image Renovation,” which we are presenting at the 2023 USENIX Annual Technical Conference (USENIX ATC 2023), we introduce APRON, a novel mechanism for securely renovating a computer system with minimal downtime. APRON differs from conventional recovery mechanisms in a crucial way: it does not fully recover the system within the recovery environment. Instead, it selectively addresses a small set of system components, or the data blocks containing them, that are necessary for booting and system recovery, including the operating system kernel and the APRON kernel module, as shown in Figure 2. Once these components are recovered, the system boots into a partially renovated state and can perform regular tasks, progressively recovering other invalid system components as needed.

Figure 2: System boot with APRON.

This design allows APRON to significantly decrease downtime during system recovery by up to 28 times, compared with a normal system recovery, when retrieving portions of the reference image from a remote storage server connected through a 1 Gbps link. In addition, APRON incorporates a background thread dedicated to renovating the remaining invalid system components that might be accessed in the future. This background thread operates with low priority to avoid disrupting important foreground tasks. Throughout both renovation activities, APRON incurs an average runtime overhead of only 9% across a range of real-world applications. Once the renovation process is complete, runtime overhead disappears. 

APRON’s differentiator lies in its unique approach: the APRON kernel module acts as an intermediary between application or kernel threads and the system storage device, allowing it to verify and recover each data block on demand, as shown in Figure 3. When a block is requested, APRON follows a straightforward process. If the requested block is valid, APRON promptly delivers it to the requester. If it is found to be invalid, APRON employs a reference image to fix the block before serving it to the requester.

Figure 3: System storage renovation with APRON.

To efficiently and securely verify arbitrary data blocks, APRON uses a Merkle hash tree, which cryptographically summarizes every data block of the reference image. APRON further cryptographically authenticates the Merkle tree’s root hash value so that a malicious actor cannot tamper with it. To further improve performance, APRON treats zero blocks (data blocks filled with zeros) as a special case and performs deduplication to avoid repeatedly retrieving equivalent blocks. We discuss the technical details of this process in our paper.
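To make the verification step concrete, here is a minimal Python sketch of checking a single block against a Merkle tree, under assumed details (SHA-256, a fixed block size, an in-memory tree). It illustrates the general technique rather than APRON's actual implementation, and it omits the authentication of the root hash.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for illustration

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_merkle_tree(blocks):
    """Build a Merkle tree bottom-up; returns a list of levels (leaves first)."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                # duplicate the last node on odd-sized levels
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def merkle_proof(levels, index):
    """Collect the sibling hashes needed to verify the block at `index`."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1               # the sibling is the neighbor in the pair
        proof.append((level[sibling], index % 2))
        index //= 2
    return proof

def verify_block(block, proof, root):
    """Recompute the path from the block up to the root and compare."""
    digest = h(block)
    for sibling, is_right in proof:
        digest = h(sibling + digest) if is_right else h(digest + sibling)
    return digest == root

# Usage: the reference image is split into fixed-size blocks; a requested block
# is served only if it verifies against the (authenticated) root hash.
blocks = [bytes([i]) * BLOCK_SIZE for i in range(8)]
levels = build_merkle_tree(blocks)
root = levels[-1][0]
assert verify_block(blocks[3], merkle_proof(levels, 3), root)
```

Because only the root hash and a short per-block proof need to be trusted, any single block can be checked without reading the whole reference image.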

Looking forward—extending APRON to container engines and hypervisors

APRON’s simple and widely applicable core design can easily apply to other use cases requiring efficient and secure image recovery or provisioning. We are currently exploring the possibility of implementing APRON within a container engine or hypervisor to realize an agentless APRON for container layers or virtual disk images. By extending APRON’s capabilities to these environments, we aim to provide an efficient and reliable image recovery and provisioning process without needing to modify container instances or add a guest operating system.


Research Focus: Week of July 3, 2023

Fri, 07/07/2023 - 22:00

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

In this article
  1. The Best of Both Worlds: Unlocking the Potential of Hybrid Work for Software Engineers
  2. Prompt Engineering: Improving our ability to communicate with LLMs
  3. Overwatch: Learning patterns in code edit sequences
  4. Qlib updates harness adaptive market dynamics modeling and reinforcement learning to address key challenges in financial markets

    NEW RESEARCH

    The Best of Both Worlds: Unlocking the Potential of Hybrid Work for Software Engineers

    The era of hybrid work has created new challenges and opportunities for developers. Their ability to choose where they work and the scheduling flexibility that comes with remote work can be offset by the loss of social interaction, reduced collaboration efficiency and difficulty separating work time from personal time. Companies must be equipped to maintain a successful and efficient hybrid workforce by accentuating the positive elements of hybrid work, while also addressing the challenges.

    In a new study: The Best of Both Worlds: Unlocking the Potential of Hybrid Work for Software Engineers, researchers from Microsoft aim to identify which form of work – whether fully in office, fully at home, or blended – yields the highest productivity and job satisfaction among developers. They analyzed more than 3,400 responses to surveys conducted across 28 companies in seven countries, in partnership with Vista Equity Partners, a leading global asset manager with experience investing in software, data, and technology-enabled organizations.

    The study found that developers face many of the same challenges found in other types of hybrid workplaces. The researchers provide recommendations for addressing these challenges and unlocking more productivity while improving employee satisfaction.

    Read the paper


    NEW INSIGHTS

    Prompt Engineering: Improving our ability to communicate with LLMs

    Pretrained natural language generation (NLG) models are powerful, but in the absence of contextual information, responses are necessarily generic. The prompt is the primary mechanism for access to NLG capabilities. It is an enormously effective and flexible tool, yet to produce the expected output, a prompt must convey information in the way the model expects. If the prompt is not accurate and precise, the model is left guessing. Prompt engineering aims to bring more context and specificity to generative AI models, providing enough information in the model instructions that the user gets the exact result they want.

    In a recent blog post: Prompt Engineering: Improving our Ability to Communicate with an LLM, researchers from Microsoft explain how they use retrieval augmented generation (RAG) to do knowledge grounding, use advanced prompt engineering to properly set context in the input to guide large language models (LLMs), implement a provenance check for responsible AI, and help users deploy scalable NLG service more safely, effectively, and efficiently.
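    As a toy illustration of the knowledge-grounding idea, the Python sketch below assembles a prompt from retrieved passages. The retriever, corpus, and prompt wording are invented placeholders, not the system described in the blog post.

```python
def retrieve(query: str, knowledge_base: dict[str, str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank passages by word overlap with the query."""
    query_terms = set(query.lower().split())
    scored = sorted(
        knowledge_base.items(),
        key=lambda item: len(query_terms & set(item[1].lower().split())),
        reverse=True,
    )
    return [text for _, text in scored[:top_k]]

def build_prompt(query: str, passages: list[str]) -> str:
    """Ground the model's answer in retrieved context and constrain the output."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Hypothetical two-document corpus for illustration only.
knowledge_base = {
    "doc1": "Qlib is an open-source AI platform for quantitative investment research.",
    "doc2": "Coyote is a tool for testing concurrent C# programs.",
}
query = "What is Qlib used for?"
prompt = build_prompt(query, retrieve(query, knowledge_base))
print(prompt)  # this prompt would then be sent to the LLM of choice
```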

    Read the article

    NEW RESOURCE

    Overwatch: Learning patterns in code edit sequences

    Integrated development environments (IDEs) provide tool support to automate many source code editing tasks. IDEs typically use only the spatial context, i.e., the location where the developer is editing, to generate candidate edit recommendations. However, spatial context alone is often not sufficient to confidently predict the developer’s next edit, and thus IDEs generate many suggestions at a location. Therefore, IDEs generally do not actively offer suggestions. The developer must click on a specific icon or menu and then select from a large list of potential suggestions. As a consequence, developers often miss the opportunity to use the tool support because they are not aware it exists or forget to use it. To better understand common patterns in developer behavior and produce better edit recommendations, tool builders can use the temporal context, i.e., the edits that a developer was recently performing.

    To enable edit recommendations based on temporal context, researchers from Microsoft created Overwatch, a novel technique for learning edit sequence patterns from traces of developers’ edits performed in an IDE. Their experiments show that Overwatch has 78% precision and that it not only completed edits when developers missed the opportunity to use the IDE tool support, but also predicted new edits that have no tool support in the IDE.

    UPDATED RESOURCE

    Qlib updates harness adaptive market dynamics modeling and reinforcement learning to address key challenges in financial markets

    Qlib is an open-source framework built by Microsoft Research that empowers research into AI technologies applicable to the financial industry. Qlib initially supported diverse machine learning modeling paradigms, including supervised learning. Now, a series of recent updates have added support for market dynamics modeling and reinforcement learning, enabling researchers and engineers to tap into more sophisticated learning methods for advanced trading system construction.

    These updates broaden Qlib’s capabilities and its value proposition for researchers and engineers, empowering them to explore ideas and implement effective quantitative trading strategies. The updates, available on GitHub, make Qlib the first platform to offer diverse learning paradigms aimed at helping researchers and engineers solve key financial market challenges.

    A significant update is the introduction of adaptive concept drift technology for modeling the dynamic nature of financial markets. This feature can help researchers and engineers invent and implement algorithms that can adapt to changes in market trends and behavior over time, which is crucial for maintaining a competitive advantage in trading strategies.

    Qlib’s support for reinforcement learning enables a new feature designed to model continuous investment decisions. This feature assists researchers and engineers in optimizing their trading strategies by learning from interactions with the environment to maximize some notion of cumulative reward.

    Download the code

    Related research:

    DDG-DA: Data Distribution Generation for Predictable Concept Drift Adaptation

    Universal Trading for Order Execution with Oracle Policy Distillation


Distributional Graphormer: Toward equilibrium distribution prediction for molecular systems

Fri, 07/07/2023 - 18:01

Structure prediction is a fundamental problem in molecular science because the structure of a molecule determines its properties and functions. In recent years, deep learning methods have made remarkable progress and impact on predicting molecular structures, especially for protein molecules. Deep learning methods, such as AlphaFold and RoseTTAFold, have achieved unprecedented accuracy in predicting the most probable structures for proteins from their amino acid sequences and have been hailed as a game changer in molecular science. However, these methods provide only a single snapshot of a protein structure, and structure prediction alone cannot tell the complete story of how a molecule works.

Proteins are not rigid objects; they are dynamic molecules that can adopt different structures with specific probabilities at equilibrium. Identifying these structures and their probabilities is essential in understanding protein properties and functions, how they interact with other proteins, and the statistical mechanics and thermodynamics of molecular systems. Traditional methods for obtaining these equilibrium distributions, such as molecular dynamics simulations or Monte Carlo sampling (which uses repeated random sampling from a distribution to achieve numerical statistical results), are often computationally expensive and may even become intractable for complex molecules. Therefore, there is a pressing need for novel computational approaches that can accurately and efficiently predict the equilibrium distributions of molecular structures from basic descriptors.

Figure 1. The goal of Distributional Graphormer (DiG). DiG takes the basic descriptor, D, of a molecular system, such as the amino acid sequence for a protein, as input to predict the structures and their probabilities following equilibrium distribution.

In this blog post, we introduce Distributional Graphormer (DiG), a new deep learning framework for predicting protein structures according to their equilibrium distribution. It aims to address this fundamental challenge and open new opportunities for molecular science. DiG is a significant advancement from single structure prediction to structure ensemble modeling with equilibrium distributions. Its distribution prediction capability bridges the gap between the microscopic structures and the macroscopic properties of molecular systems, which are governed by statistical mechanics and thermodynamics. Nevertheless, this is a tremendous challenge, as it requires modeling complex distributions in high-dimensional space to capture the probabilities of different molecular states.

DiG achieves a novel solution for distribution prediction through an advancement of our previous work, Graphormer, which is a general-purpose graph transformer that can effectively model molecular structures. Graphormer has shown excellent performance in molecular science research, demonstrated by applications in quantum chemistry and molecular dynamics simulations, as reported in our previous blog posts (see here and here for more details). Now, we have advanced Graphormer to create DiG, which has a new and powerful capability: using deep neural networks to directly predict target distribution from basic descriptors of molecules.


DiG tackles this challenging problem. It is based on the idea of simulated annealing, a classic method in thermodynamics and optimization, which has also motivated the recent development of diffusion models that achieved remarkable breakthroughs in AI-generated content (AIGC). Simulated annealing produces a complex distribution by gradually refining a simple distribution through the simulation of an annealing process, allowing it to explore and settle in the most probable states. DiG mimics this process in a deep learning framework for molecular systems. AIGC models are often based on the idea of diffusion models, which are inspired by statistical mechanics and thermodynamics.

DiG is also based on the idea of diffusion models, but we bring this idea back to thermodynamics research, creating a closed loop of inspiration and innovation. We imagine scientists someday will be able to use DiG like an AIGC model for drawing, inputting a simple description, such as an amino acid sequence, and then using DiG to quickly generate realistic and diverse protein structures that follow equilibrium distribution. This will greatly enhance scientists’ productivity and creativity, enabling novel discoveries and applications in fields such as drug design, materials science, and catalysis.

How does DiG work?

Figure 2. DiG’s design and backbone architecture.

DiG is based on the idea of diffusion by transforming a simple distribution to a complex distribution using Graphormer. The simple distribution can be a standard Gaussian, and the complex distribution can be the equilibrium distribution of molecular structures. The transformation is done step-by-step, where the whole process mimics the simulated annealing process.

DiG can be trained using different types of data or information. For example, DiG can use energy functions of molecular systems to guide transformation, and it can also use simulated structure data, such as molecular dynamics trajectories, to learn the distribution. More concretely, DiG can use energy functions of molecular systems to guide transformation by minimizing the discrepancy between the energy-based probabilities and the probabilities predicted by DiG. This approach can leverage the prior knowledge of the system and train DiG without stringent dependency on data. Alternatively, DiG can also use simulation data, such as molecular dynamics trajectories, to learn the distribution by maximizing the likelihood of the data under the DiG model.

DiG generalizes well across many molecular systems, comparable to deep learning-based structure prediction methods. This is because DiG inherits the advantages of advanced deep-learning architectures like Graphormer and applies them to the new and challenging task of distribution prediction. Once trained, DiG can generate molecular structures by reversing the transformation process, starting from a simple distribution and applying neural networks in reverse order. DiG can also provide the probability estimation for each generated structure by computing the change of probability along the transformation process. DiG is a flexible and general framework that can handle different types of molecular systems and descriptors.
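As a loose, one-dimensional analogy for this anneal-then-refine idea (not DiG's model, which uses Graphormer networks in a high-dimensional structure space), the Python sketch below uses annealed Langevin updates to carry samples from a simple Gaussian toward a multimodal equilibrium distribution defined by an assumed double-well energy.

```python
import numpy as np

# Toy 1-D illustration of the annealing/diffusion idea behind DiG (not DiG itself):
# start from a simple Gaussian and gradually refine samples toward a target
# equilibrium distribution p(x) proportional to exp(-E(x)).
# The double-well energy and the annealing schedule are illustrative assumptions.

def energy(x):
    return (x**2 - 1.0) ** 2          # double well: two "conformations" at x = +1 and x = -1

def grad_energy(x):
    return 4.0 * x * (x**2 - 1.0)

def annealed_langevin_sample(n_samples=5000, n_steps=300, seed=0):
    rng = np.random.default_rng(seed)
    x = 0.5 * rng.standard_normal(n_samples)             # simple starting distribution (narrow Gaussian)
    for step_size in np.geomspace(0.1, 1e-3, n_steps):   # annealing schedule
        noise = rng.standard_normal(n_samples)
        x = x - step_size * grad_energy(x) + np.sqrt(2.0 * step_size) * noise
    return x

samples = annealed_langevin_sample()
# Both wells should be populated, mimicking a multimodal equilibrium distribution.
print("fraction near x=+1:", np.mean(samples > 0))
print("fraction near x=-1:", np.mean(samples < 0))
```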

Results

We demonstrate DiG’s performance and potential through several molecular sampling tasks covering a broad range of molecular systems, such as proteins, protein-ligand complexes, and catalyst-adsorbate systems. Our results show that DiG not only generates realistic and diverse molecular structures with high efficiency and low computational costs, but it also provides estimations of state densities, which are crucial for computing macroscopic properties using statistical mechanics. Accordingly, DiG presents a significant advancement in statistically understanding microscopic molecules and predicting their macroscopic properties, creating many exciting research opportunities in molecular science.

One major application of DiG is to sample protein conformations, which are indispensable to understanding their properties and functions. Proteins are dynamic molecules that can adopt diverse structures with different probabilities at equilibrium, and these structures are often related to their biological functions and interactions with other molecules. However, predicting the equilibrium distribution of protein conformations is a long-standing and challenging problem due to the complex and high-dimensional energy landscape that governs probability distribution in the conformation space. In contrast to expensive and inefficient molecular dynamics simulations or Monte Carlo sampling methods, DiG generates diverse and functionally relevant protein structures from amino acid sequences at a high speed and a significantly reduced cost.

Figure 3. This illustration shows DiG’s performance when generating multiple conformations of proteins. On the left, DiG-generated structures of the main protease of the SARS-CoV-2 virus are projected onto a 2D space spanned by two TICA coordinates. On the right, structures generated by DiG (thin ribbons) are compared with experimentally determined structures (cylindrical figures) in each case.

DiG can generate multiple conformations from the same protein sequence. The left side of Figure 3 shows DiG-generated structures of the main protease of SARS-CoV-2 virus compared with MD simulations and AlphaFold prediction results. The contours (shown as lines) in the 2D space reveal three clusters sampled by extensive MD simulations. DiG generates highly similar structures in clusters II and III, while structures in cluster I are undersampled. In the right panel, DiG-generated structures are aligned to experimental structures for four proteins, each with two distinguishable conformations corresponding to unique functional states. In the upper left, the Adenylate kinase protein has open and closed states, both well sampled by DiG. Similarly, for the drug transport protein LmrP, DiG also generates structures for both states. Here, note that the closed state is experimentally determined (in the lower-right corner, with PDB ID 6t1z), while the other is the AlphaFold predicted model that is consistent with experimental data. In the case of human B-Raf kinase, the major structural difference is localized in the A-loop region and a nearby helix, which are well captured by DiG. The D-ribose binding protein has two separated domains, which can be packed into two distinct conformations. DiG perfectly generated the straight-up conformation, but it is less accurate in predicting the twisted conformation. Nonetheless, besides the straight-up conformation, DiG generated some conformations that appear to be intermediate states.

Another application of DiG is to sample catalyst-adsorbate systems, which are central to heterogeneous catalysis. Identifying active adsorption sites and stable adsorbate configurations is crucial for understanding and designing catalysts, but it is also quite challenging due to the complex surface-molecular interactions. Traditional methods, such as density functional theory (DFT) calculations and molecular dynamics simulations, are time-consuming and costly, especially for large and complex surfaces. DiG predicts adsorption sites and configurations, as well as their probabilities, from the substrate and adsorbate descriptors. DiG can handle various types of adsorbates, such as single atoms or molecules being adsorbed onto different types of substrates, such as metals or alloys.

Figure 4. Adsorption prediction results of single C, H, and O atoms on catalyst surfaces. The predicted probability distribution on catalyst surface is compared to the interaction energy between the adsorbate molecules and the catalyst in the middle and bottom rows.

Applying DiG, we predicted the adsorption sites for a variety of catalyst-adsorbate systems and compared these predicted probabilities with energies obtained from DFT calculations. We found that DiG could find all the stable adsorption sites and generate adsorbate configurations that are similar to the DFT results with high efficiency and at a low cost. DiG estimates the probabilities of different adsorption configurations, in good agreement with DFT energies.

Conclusion

In this blog, we introduced DiG, a deep learning framework that aims to predict the distribution of molecular structures. DiG is a significant advancement from single structure prediction toward ensemble modeling with equilibrium distributions, setting a cornerstone for connecting microscopic structures to macroscopic properties under deep learning frameworks.

DiG involves key ML innovations that lead to expressive generative models, which have been shown to have the capacity to sample multimodal distribution within a given class of molecules. We have demonstrated the flexibility of this approach on different classes of molecules (including proteins, etc.), and we have shown that individual structures generated in this way are chemically realistic. Consequently, DiG enables the development of ML systems that can sample equilibrium distributions of molecules given appropriate training data.

However, we acknowledge that considerably more research is needed to obtain efficient and reliable predictions of equilibrium distributions for arbitrary molecules. We hope that DiG inspires additional research and innovation in this direction, and we look forward to more exciting results and impact from DiG and other related methods in the future.


Breaking cross-modal boundaries in multimodal AI: Introducing CoDi, composable diffusion for any-to-any generation

Thu, 06/29/2023 - 18:00

Imagine an AI model that can seamlessly generate high-quality content across text, images, video, and audio, all at once. Such a model would more accurately capture the multimodal nature of the world and human comprehension, seamlessly consolidate information from a wide range of sources, and enable strong immersion in human-AI interactions. This could transform the way humans interact with computers on various tasks, including assistive technology, custom learning tools, ambient computing, and content generation.

In a recent paper: Any-to-Any Generation via Composable Diffusion, Microsoft Azure Cognitive Service Research and UNC NLP present CoDi, a novel generative model capable of processing and simultaneously generating content across multiple modalities. CoDi allows for the synergistic generation of high-quality and coherent outputs spanning various modalities, from assorted combinations of input modalities. CoDi is the latest work of Microsoft’s Project i-Code, which aims to develop integrative and composable multimodal AI. Through extensive experiments, the researchers demonstrate CoDi’s remarkable capabilities.

The challenge of multimodal generative AI

The powerful cross-modal models that have emerged in recent years are mostly capable of generating or processing just one single modality. These models often face limitations in real-world applications where multiple modalities coexist and interact. Chaining modality-specific generative models together in a multi-step generation setting can be cumbersome and slow.

Moreover, unimodal streams that are generated independently and stitched together in post-processing may not be consistent or aligned, for example, video and audio that need to stay synchronized.

To address these challenges, the researchers propose Composable Diffusion (CoDi), the first model capable of simultaneously processing and generating arbitrary combinations of modalities. CoDi employs a novel composable generation strategy that involves building a shared multimodal space by bridging alignment in the diffusion process, enabling the synchronized generation of intertwined modalities, such as temporally aligned video and audio.


The power of composable diffusion

Figure 1: CoDi can generate any combination of modalities from any mixture of input modalities.

Training a model to take any mixture of input modalities and flexibly generate any mixture of outputs presents significant computational and data requirements, as the number of combinations for the input and output modalities scales exponentially. And the scarcity of aligned training data for many groups of modalities makes it infeasible to train with all possible input-output combinations. To address these challenges, the researchers propose to build CoDi in a composable and integrative manner.

They start by training each individual modality-specific latent diffusion model (LDM) independently (these LDMs will be smoothly integrated later for joint generation). This approach ensures exceptional single-modality generation quality using widely available modality-specific training data. To allow CoDi to handle any mixture of inputs, input modalities like images, video, audio, and language are projected into the same semantic space. Consequently, the LDM of each modality can flexibly process any mixture of multimodal inputs. Multi-conditioning generation is achieved by conditioning the diffusers on a weighted sum of the input modalities’ representations.

One of CoDi’s most significant innovations is its ability to handle many-to-many generation strategies, simultaneously generating any mixture of output modalities. To achieve this, CoDi adds a cross-attention module to each diffuser, and an environment encoder to project the latent variable of different LDMs into a shared latent space.

By freezing the parameters of the LDM and training only the cross-attention parameters and the environment encoder, CoDi can seamlessly generate any group of modalities without training on all possible generation modality combinations, reducing the training objectives to a more manageable number.
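The numpy sketch below illustrates only the weighted-sum conditioning idea on stand-in encoders; the projection matrices, dimensions, and weights are assumptions for illustration, not CoDi's trained components.

```python
import numpy as np

# Toy sketch of multi-conditioning: each input modality is projected into a
# shared semantic space, and the diffuser is conditioned on a weighted sum of
# those embeddings. Projections, dimensions, and weights are illustrative.

SHARED_DIM = 64
rng = np.random.default_rng(0)

# Stand-ins for the per-modality encoders that map inputs to the shared space.
projections = {
    "text":  rng.standard_normal((128, SHARED_DIM)) / np.sqrt(128),
    "image": rng.standard_normal((256, SHARED_DIM)) / np.sqrt(256),
    "audio": rng.standard_normal((96, SHARED_DIM)) / np.sqrt(96),
}

def encode(modality: str, features: np.ndarray) -> np.ndarray:
    """Project raw modality features into the shared semantic space."""
    return features @ projections[modality]

def condition(inputs: dict[str, np.ndarray], weights: dict[str, float]) -> np.ndarray:
    """Weighted sum of shared-space embeddings used to condition a diffuser."""
    total = np.zeros(SHARED_DIM)
    for modality, features in inputs.items():
        total += weights.get(modality, 1.0) * encode(modality, features)
    return total

# Example: condition on a text prompt plus an image prompt, weighting text higher.
inputs = {"text": rng.standard_normal(128), "image": rng.standard_normal(256)}
c = condition(inputs, weights={"text": 0.7, "image": 0.3})
print(c.shape)  # (64,) -- a single conditioning vector for any mix of inputs
```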

Showcasing CoDi’s capabilities

The research demonstrates the novel capacity of joint generation of multiple modalities, such as synchronized video and audio, given separate text, audio, and image prompts. Specifically, in the example shown below, the input text prompt is “teddy bear on a skateboard, 4k, high resolution”, the input image prompt is a picture of Times Square, and the input audio prompt is rain. The generated video, shown in Figure 2, is a teddy bear skateboarding in the rain at Times Square. The generated audio contains the sounds of rain, skateboarding, and street noise, which are synchronized with the video. This shows that CoDi can consolidate information from multiple input modalities and generate coherent and aligned outputs.

Figure 2: The video shows an example of CoDi generating video + audio from text, image and audio input. The input modalities are listed vertically on the left side, including the text “Teddy bear on a skateboard, 4k”, a picture of Times Square, and the waveform of raining ambience. The output is a video with sound. In the video, a Teddy bear is skateboarding in the rain on the street of Times Square. One can also hear synchronized sound of skateboarding and rain.

In addition to its strong joint-modality generation quality, CoDi is also capable of single-to-single modality generation and multi-conditioning generation. It outperforms or matches the unimodal state of the art for single-modality synthesis.

CoDi generation examples

Potential real-world applications and looking forward

CoDi’s development unlocks numerous possibilities for real-world applications requiring multimodal integration. For example, in education, CoDi can generate dynamic, engaging materials catering to diverse learning styles, allowing learners to access information tailored to their preferences, while enhancing understanding and knowledge retention. CoDi can also support accessible experiences for people with disabilities, such as providing audio descriptions and visual cues for people who are deaf or hard of hearing.

Composable Diffusion marks a significant step towards more engaging and holistic human-computer interactions, establishing a solid foundation for future investigations in generative artificial intelligence.

Visit the CoDi project


Unlocking the future of computing: The Analog Iterative Machine’s lightning-fast approach to optimization 

Tue, 06/27/2023 - 15:00

Picture a world where computing is not limited by the binary confines of zeros and ones but is instead free to explore the vast possibilities of continuous-value data. Over the past three years, a team of Microsoft researchers has been developing a new kind of analog optical computer that uses photons and electrons to process continuous-value data, unlike today’s digital computers that use transistors to crunch through binary data. This innovative new machine has the potential to surpass state-of-the-art digital technology and transform computing in years to come.

The Analog Iterative Machine (AIM) is designed to solve difficult optimization problems, which form the foundation of many industries, such as finance, logistics, transportation, energy, healthcare, and manufacturing. However, traditional digital computers struggle to crack these problems in a timely, energy-efficient and cost-effective manner. This is because the number of possible combinations explodes exponentially as the problem size grows, making it a massive challenge for even the most powerful digital computers. The Traveling Salesman Problem is a classic example. Imagine trying to find the most efficient route for visiting a set of cities just once before returning to the starting point. With only five cities, there are 12 possible routes – but for a 61-city problem, the number of potential routes surpasses the number of atoms in the universe.

AIM addresses two simultaneous trends. First, it sidesteps the diminishing growth of computing capacity per dollar in digital chips – or the unraveling of Moore’s Law. Second, it overcomes the limitations of specialized machines designed for solving optimization problems. Despite over two decades of research and substantial industry investment, such unconventional hardware-based machines have a limited range of practical applications, because they can only address optimization problems with binary values. This painful realization within the optimization community has driven the team to develop AIM, with a design that combines mathematical insights with cutting-edge algorithmic and hardware advancements. The result? An analog optical computer that can solve a much wider range of real-world optimization problems while operating at the speed of light, offering potential speed and efficiency gains of about a hundred times.

Today, AIM is still a research project, but the cross-disciplinary team has recently assembled the world’s first opto-electronic hardware for mixed – continuous and binary – optimization problems. Though presently operating on a limited scale, the initial results are promising, and the team has started scaling up its efforts. This includes a research collaboration with the UK-based multinational bank Barclays to solve an optimization problem critical to the financial markets on the AIM computer. Separate engagements are aimed at gaining more experience in solving industry-specific optimization problems. In June 2023, the team launched an online service that provides an AIM simulator to allow partners to explore the opportunities created by this new kind of computer.

The technology 

Photons possess a remarkable property of not interacting with one another, which has underpinned the internet era by enabling large amounts of data to be transmitted over light across vast distances. However, photons do interact with the matter through which they propagate, allowing for linear operations such as addition and multiplication, which form the basis for optimization applications. For instance, when light falls on the camera sensor on our smartphones, it adds up the incoming photons and generates the equivalent amount of current. Additionally, data transmission over fiber which brings internet connectivity to homes and businesses relies on encoding zeroes and ones onto light by programmatically controlling its intensity. This scaling of light through light-matter interaction multiplies the light intensity by a specific value – multiplication in the optical domain. Beyond optical technologies for linear operations, various other electronic components prevalent in everyday technologies can perform non-linear operations that are also critical for efficient optimization algorithms.

Analog optical computing thus involves constructing a physical system using a combination of analog technologies – both optical and electronic – governed by equations that capture the required computation. This can be very efficient for specific application classes where linear and non-linear operations are dominant. In optimization problems, finding the optimal solution is akin to discovering a needle in an inconceivably vast haystack. The team has developed a new algorithm that is highly efficient at such needle-finding tasks. Crucially, the algorithm’s core operation involves performing hundreds of thousands or even millions of vector-matrix multiplications – the vectors represent the problem variables whose values need to be determined while the matrix encodes the problem itself. These multiplications are executed swiftly and with low energy consumption using commodity optical and electronic technologies, as shown in Figure 1.

Figure 1: Illustration of the AIM computer, which implements massively parallel vector-matrix multiplication using commodity optical technologies (in the back) and non-linearity applied using analog electronics (front). The vector is represented using an array of light sources, the matrix is embedded into the modulator array (shown in grayscale) and the result is collected into the camera sensor.

Figure 2: The second-generation AIM computer, with 48 variables, is a rack-mounted appliance.

Thanks to the miniaturization of all these components onto tiny centimeter-scale chips, the entire AIM computer fits into a small rack enclosure – as shown in Figure 2. As light travels incredibly fast – 5 nanoseconds per meter – each iteration within the AIM computer is significantly faster and consumes less electricity than running the same algorithm on a digital computer. Importantly, since the entire problem is embedded into the modulator matrix inside the computer itself, AIM does not require the problem to be transferred back and forth between storage and compute locations. And unlike synchronous digital computers, AIM’s operation is entirely asynchronous. These architectural choices circumvent key historical bottlenecks for digital computers. 

Finally, all technologies used in AIM are already prevalent in consumer products with existing manufacturing ecosystems, which paves the way for a viable computing platform, at full scale, if all the technical challenges can be tamed by the team.

The importance of optimization problems

Optimization problems are mathematical challenges that require finding the best possible solution from a set of feasible alternatives. The modern world relies heavily on efficient solutions to these problems – from managing electricity in our power grids and streamlining goods delivery across sea, air, and land, to optimizing internet traffic routing.

Effectively and efficiently solving optimization problems can significantly improve processes and outcomes across many other industries. Take finance, for example, where portfolio optimization involves selecting the ideal combination of assets to maximize returns while minimizing risks. In healthcare, optimizing patient scheduling can enhance resource allocation and minimize waiting times in hospitals.

For many larger problems, even the world’s biggest supercomputer would take years or even centuries to find the optimal solution to such problems. A common workaround is heuristic algorithms – problem-solving techniques that provide approximate solutions by employing shortcuts or “rules of thumb.” Although these algorithms might not guarantee the discovery of an optimal solution, they are the most practical and efficient methods for finding near-optimal solutions in reasonable timeframes. Now, imagine the immense impact of a computer that could deliver more optimal solutions in a significantly shorter timeframe for these critical problems. In some instances, solving these problems in real-time could create a domino effect of positive outcomes, revolutionizing entire workflows and industries.

QUMO: a world beyond QUBO

For years, researchers, both in industry and academia, have built impressive specialized machines to efficiently solve optimization problems using heuristic algorithms. This includes an array of custom hardware, such as field programmable gate arrays (FPGAs), quantum annealers, and electrical and optical parametric oscillator systems. However, all of them rely on mapping difficult optimization problems to the same binary representation, often referred to as Ising, Max-Cut or QUBO (quadratic unconstrained binary optimization). Unfortunately, none of these efforts have provided a practical alternative to conventional computers. This is because it is very hard to map real-world optimization problems at scale to the binary abstraction, a common theme in the team’s engagement with practitioners across industry and academia.

With AIM, the team has introduced a more expressive mathematical abstraction called QUMO (quadratic unconstrained mixed optimization), which can represent mixed – binary and continuous – variables and is compatible with hardware implementation, making it the “sweetspot” for many practical, heavily-constrained optimization problems. Discussions with industry experts indicate that scaling AIM to 10,000 variables would mean that most of the practical problems discussed earlier are within reach. A problem with 10,000 variables that can be directly mapped to the QUMO abstraction would require an AIM computer with 10,000 physical variables. In contrast, existing specialized machines would need to scale to beyond a million physical variables, well beyond the capabilities of the underlying hardware.

AIM also implements a novel and efficient algorithm for solving such QUMO problems that relies on an advanced form of gradient descent, a technique that is also popular in machine learning. The algorithm shows highly competitive performance and accuracy across various industrially inspired problem benchmarks. It even discovered new best-ever solutions to four problems. The first-generation AIM computer, built last year, solves QUMO optimization problems that are represented with an accuracy of up to 7 bits. The team, shown in Figure 3, has also shown good quantitative agreement between the simulated and the hardware version of the AIM computer to gain further confidence in the viability of these efficiency gains as the computer is scaled up. This paper gives more details about the AIM architecture, its implementation, evaluation and scaling roadmap.
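As a rough illustration of what a QUMO instance and a gradient-descent-style heuristic look like (not AIM's actual algorithm or hardware), the Python sketch below minimizes a small quadratic objective over mixed continuous and binary variables; the matrix-vector product in the inner loop is the kind of operation AIM executes optically.

```python
import numpy as np

# Minimal sketch of a gradient-descent heuristic for a QUMO-style objective
#   minimize f(x) = x^T Q x + b^T x,
# where some variables are continuous in [-1, 1] and others are binary {0, 1}
# (relaxed to [0, 1] during the descent). The matrix-vector product Q @ x is
# the core operation; everything else here is an illustrative assumption.

def solve_qumo(Q, b, binary_mask, steps=2000, lr=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n = len(b)
    x = rng.uniform(-1, 1, n)
    x[binary_mask] = rng.uniform(0, 1, binary_mask.sum())   # relaxed binaries
    for _ in range(steps):
        grad = (Q + Q.T) @ x + b                 # matrix-vector product dominates
        x -= lr * grad
        x[~binary_mask] = np.clip(x[~binary_mask], -1.0, 1.0)
        x[binary_mask] = np.clip(x[binary_mask], 0.0, 1.0)
    x[binary_mask] = np.round(x[binary_mask])    # snap relaxed binaries to {0, 1}
    return x

# Small random instance with 6 continuous and 4 binary variables.
rng = np.random.default_rng(1)
n = 10
Q = rng.standard_normal((n, n)) * 0.1 + np.eye(n)   # keep the toy problem well behaved
b = rng.standard_normal(n)
binary_mask = np.zeros(n, dtype=bool)
binary_mask[6:] = True
x = solve_qumo(Q, b, binary_mask)
print("solution:", np.round(x, 3), "objective:", x @ Q @ x + b @ x)
```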

Figure 3: AIM’s design involves innovation at the intersection of optical and analog hardware, mathematics and algorithms, and software and system architecture, which is typified in the cross-disciplinary nature of the team working hand-in-hand towards the mission of building a computer that solves practical problems. Photo of the AIM team – Front row (left to right): Doug Kelly, Jiaqi Chu, James Clegg, Babak Rahmani. Back row: Hitesh Ballani, George Mourgias-Alexandris, Daniel Cletheroe, Francesca Parmigiani, Lucinda Pickup, Grace Brennan, Ant Rowstron, Kirill Kalinin, Jonathan Westcott, Christos Gkantsidis. (Greg O’Shea and Jannes Gladrow do not appear in this photo.)

Rethinking optimization with QUMO: A more expressive way of reasoning for experts

AIM’s blueprint for co-designing unconventional hardware with an expressive abstraction and a new algorithm has the potential to spark a new era in optimization techniques, hardware platforms, and automated problem mapping procedures, utilizing the more expressive QUMO abstraction. This exciting journey has already begun, with promising results from mapping problems from diverse domains like finance and healthcare to AIM’s QUMO abstraction. Recent research has already shown that increased expressiveness with continuous variables can substantially expand the real-world business problems that can be tackled. However, to the team’s knowledge, AIM is the first and only hardware to natively support this abstraction.

As we venture into a new abstraction, we must also adopt new ways of thinking. It is crucial for the team to build a strong community to deeply investigate the benefits of embracing QUMO. We invite people who have previously been deterred by the limitations of binary solvers to consider the new opportunities offered by AIM’s QUMO abstraction. To facilitate this, we are releasing our AIM simulator as a service, allowing selected users to get first-hand experience. The initial users are the team’s collaborators at Princeton University and at Cambridge University. They have helped us identify several exciting problems where the AIM computer and its abstraction is a much more natural fit. We are also actively engaging with thought leaders from internal Microsoft divisions and external companies in sectors where optimization is crucial.

Together, we can drive innovation and unlock the true potential of analog optical computing for solving some of the most complex optimization problems across industries.

Ask us about AIM Services


Research Focus: Week of June 19, 2023

Fri, 06/23/2023 - 23:57

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

In this article
  1. Responsible AI Maturity Model
  2. FoundWright helps people re-find web content they previously discovered
  3. Trace-Guided Inductive Synthesis of Recursive Functional Programs
  4. Wait-Free Weak Reference Counting
  5. Disaggregating Stateful Network Functions
  6. Industrial-Strength Controlled Concurrency Testing for C# Programs with Coyote

    NEW RESOURCE

    Responsible AI Maturity Model

    As the use of AI continues to surge, new government regulations are expected. But the organizations that build and use AI technologies needn’t wait to devise best practices for developing and deploying AI systems responsibly. Many companies have adopted responsible AI (RAI) principles as a form of self-regulation. Yet, effectively translating these principles into practice is challenging.

    To help organizations identify their current and desired levels of RAI maturity, researchers at Microsoft have developed the Responsible AI Maturity Model (RAI MM). The RAI MM is a framework containing 24 empirically derived dimensions that are key to an organization’s RAI maturity, and a roadmap of maturity progression so organizations and teams can identify where they are and where they could go next.

    Derived from interviews and focus groups with over 90 RAI specialists and AI practitioners, the RAI MM can help organizations and teams navigate their RAI journey, even as RAI continues to evolve.

    Learn more


    NEW RESEARCH

    FoundWright helps people re-find web content they previously discovered

    Re-finding information is a common task—most online search requests involve re-finding information. However, this can be difficult when people struggle to express what they seek. People may forget exact details of the information they want to re-find, making it hard to craft a query to locate it. People may also struggle to recover information from web repositories, such as bookmarks or history, because these repositories do not capture enough information or offer an experience that supports ambiguous queries. As a result, people can feel overwhelmed and cognitively exhausted when faced with a re-finding task.

    A new paper from Microsoft researchers: FoundWright: A System to Help People Re-find Pages from Their Web-history, introduces a new system to address these problems. FoundWright leverages recent advances in language transformer models to expand people’s ability to express what they seek by defining concepts that can attract documents with semantically similar content. The researchers used FoundWright as a design probe to understand how people create and use concepts; how this expanded ability helps re-finding; and how people engage and collaborate with FoundWright’s machine learning support. The research reveals that this expanded way of expressing re-finding goals complements traditional searching and browsing. 

    Read the paper

    NEW RESEARCH

    Trace-Guided Inductive Synthesis of Recursive Functional Programs

    In recent years, researchers have made significant advances in synthesis of recursive functional programs, including progress in inductive synthesis of recursive programs from input-output examples. The latter problem, however, continues to pose several challenges.

    In a new paper: Trace-Guided Inductive Synthesis of Recursive Functional Programs, which received a distinguished paper award from the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2023), researchers from Microsoft and Purdue University propose a novel trace-guided approach to tackle the challenges of ambiguity and generalization in synthesis of recursive functional programs from examples. This approach augments the search space of programs with recursion traces consisting of sequences of recursive subcalls of programs. It is based on a new version space algebra (VSA) for succinct representation and efficient manipulation of pairs of recursion traces and programs that are consistent with each other. The researchers implement this approach in a tool called SyRup. Evaluating SyRup on benchmarks from prior work demonstrates that it not only requires fewer examples to achieve a certain success rate than existing synthesizers, but is also less sensitive to the quality of the examples. 

    These results indicate that utilizing recursion traces to differentiate satisfying programs with similar sizes is applicable to a wide range of tasks. 

    Read the paper

    NEW RESEARCH

    Wait-Free Weak Reference Counting

    Reference counting is a common approach to memory management. One challenge with reference counting is cycles that prevent objects from being deallocated. Systems such as the C++ and Rust standard libraries introduce two types of reference: strong and weak. A strong reference allows access to the object and prevents the object from being deallocated, while a weak reference only prevents deallocation. A weak reference can be upgraded to provide a strong reference if other strong references to the object exist. Hence, the upgrade operation is partial, and may fail dynamically. The classic implementation of this upgrade operation is not wait-free—it can take arbitrarily long to complete if there is contention on the reference count.
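    The Python toy below illustrates only the strong/weak semantics described above, including the upgrade that can fail once the object is gone. It uses a lock for simplicity, whereas the classic lock-free implementation retries a compare-and-swap loop on the strong count; the paper's contribution, making that upgrade wait-free, is not captured here.

```python
import threading

# Toy illustration of strong/weak reference semantics (not the paper's
# wait-free algorithm): a lock guards the counts here, whereas real
# implementations use atomic compare-and-swap and fetch-and-add.

class RefCounted:
    def __init__(self, payload):
        self.payload = payload
        self.strong = 1          # owning references; object freed when this hits 0
        self.weak = 1            # weak references (+1 held collectively by the strongs)
        self._lock = threading.Lock()

    def upgrade(self):
        """Try to turn a weak reference into a strong one; may fail."""
        with self._lock:
            if self.strong == 0:          # object already deallocated
                return False
            self.strong += 1
            return True

    def release_strong(self):
        with self._lock:
            self.strong -= 1
            if self.strong == 0:
                self.payload = None       # "deallocate" the object
                self.weak -= 1            # drop the strongs' shared weak count

obj = RefCounted("data")
assert obj.upgrade()          # succeeds while a strong reference exists
obj.release_strong()
obj.release_strong()
assert not obj.upgrade()      # fails once the object has been deallocated
```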

    In a new paper: Wait-Free Weak Reference Counting, researchers from Microsoft propose a wait-free algorithm for weak reference counting, which requires only the primitive wait-free atomic operations compare-and-swap and fetch-and-add. The paper includes a correctness proof of the algorithm using the Starling verification tool, a full implementation in C++, and a demonstration of the best- and worst-case performance using micro-benchmarks.

    The new algorithm is faster than the classic algorithm in the best case, but has an overhead in the worst case. The researchers present a more complex algorithm that effectively combines the classic algorithm and the wait-free algorithm, delivering much better performance in the worst case, while maintaining the benefits of the wait-free algorithm.

    Read the paper

    NEW RESEARCH

    Disaggregating Stateful Network Functions

    For security, isolation, metering, and other purposes, public clouds today implement complex network functions at every server. Today’s implementations, in software or on FPGAs and ASICs that are attached to each host, are becoming increasingly complex and costly, creating bottlenecks to scalability.

    In a new paper: Disaggregating Stateful Network Functions, researchers from Microsoft present a different design that disaggregates network function processing off the host and into shared resource pools by making novel use of appliances which tightly integrate general-purpose ARM cores with high-speed stateful match processing ASICs. When work is skewed across VMs, such disaggregation can offer better reliability and performance over the state of the art, at a lower per-server cost. The paper, which was published at the 2023 USENIX Symposium on Networked Systems Design and Implementation (NSDI), includes solutions to the consequent challenges and presents results from a production deployment at a large public cloud.

    Read the paper
    Presentation
    Azure public preview

    NEW RESEARCH

    Industrial-Strength Controlled Concurrency Testing for C# Programs with Coyote

    Testing programs with concurrency is challenging because their execution is non-deterministic, making bugs hard to find, reproduce, and debug. Non-determinism can cause flaky tests—which may pass or fail without any code changes—creating a significant engineering burden on development teams. As concurrency is essential for building modern multi-threaded or distributed systems, solutions are required to help developers test their concurrent code for correctness.

    Testing concurrent programs comes with two main challenges. First is the problem of reproducibility or control, while the second challenge is the state-space explosion problem. A concurrent program, even with a fixed test input, can have an enormous number of possible behaviors.

    In a new research paper: Industrial-Strength Controlled Concurrency Testing for C# Programs with Coyote, researchers from Microsoft describe the design and implementation of the open-source tool Coyote for testing concurrent programs written in the C# language. This research won a 2023 Best Software Science Paper award from The European Association of Software Science and Technology (EASST).

    Read the paper


DeepSpeed ZeRO++: A leap in speed for LLM and chat model training with 4X less communication

Thu, 06/22/2023 - 18:18

Figure 1: ZeRO++ project highlights. The top-left subfigure shows that ZeRO++ reduces communication volume by 4x compared with ZeRO stage 3. The top-right subfigure shows ZeRO++ performance on RLHF model training, where ZeRO++ achieves a 1.3x speedup for RLHF training and a 2x speedup for token generation.

Large AI models are transforming the digital world. Generative language models like Turing-NLG, ChatGPT, and GPT-4, powered by large language models (LLMs), are incredibly versatile, capable of performing tasks like summarization, coding, and translation. Similarly, large multimodal generative models like DALL·E, Microsoft Designer, and Bing Image Creator can generate art, architecture, videos, and other digital assets, empowering content creators, architects, and engineers to explore new frontiers of creative productivity.

However, training these large models requires considerable memory and computing resources across hundreds or even thousands of GPU devices. For instance, training the Megatron-Turing NLG 530B model utilized over 4,000 NVIDIA A100 GPUs. Efficiently leveraging these resources requires a complex system of optimizations to partition the models into pieces that fit into the memory of individual devices, and to efficiently parallelize the computing across these devices. At the same time, to make large model training easily accessible to the deep learning community, these optimizations must be easy to use.

The ZeRO family of optimizations from DeepSpeed offers a powerful solution to these challenges and has been widely used to train large and powerful deep learning models such as TNLG-17B, BLOOM-176B, MPT-7B, and Jurassic-1. Despite its transformative capabilities, there are critical scenarios where ZeRO incurs high data transfer overhead across GPUs, making it challenging to achieve high training efficiency. This happens specifically when a) training on a large number of GPUs relative to the global batch size, which results in a small per-GPU batch size, requiring frequent communication, or b) training on low-end clusters, where cross-node network bandwidth is limited, resulting in high communication latency. In these scenarios, ZeRO’s ability to offer accessible and efficient training is limited.

To address these limitations, we are releasing ZeRO++, a system of communication optimization strategies built on top of ZeRO to offer unmatched efficiency for large model training, regardless of batch size limitations or cross-device bandwidth constraints. ZeRO++ leverages quantization, in combination with data and communication remapping, to reduce total communication volume by 4x compared with ZeRO, without impacting model quality. This has two key implications:

  • ZeRO++ accelerates large model pre-training and fine-tuning
    • Small batch-size per GPU: Whether pre-training large models on thousands of GPUs or fine-tuning them on hundreds or even dozens of GPUs, when batch-size per GPU is small, ZeRO++ offers up to 2.2x higher throughput compared to ZeRO, directly reducing training time and cost.
    • Low-bandwidth clusters: ZeRO++ enables low-bandwidth clusters to achieve similar throughput as those with 4x higher bandwidth. Therefore, ZeRO++ makes efficient large model training accessible across a wider variety of clusters.
  • ZeRO++ accelerates ChatGPT-like model training with RLHF

    While ZeRO++ was designed primarily for training, its optimizations automatically also apply to ZeRO-Inference, as the communication overheads are common to training and inference with ZeRO. Consequently, ZeRO++ improves efficiency of workloads like reinforcement learning from human feedback (RLHF) used in training dialogue models, which combines both training and inference.

    Through integration with DeepSpeed-Chat, ZeRO++ can improve the generation phase of RLHF training by up to 2x and the reinforcement learning training phase by up to 1.3x compared to the original ZeRO.

Next, we’ll take a deeper dive into ZeRO and its communication overheads and discuss the key optimizations in ZeRO++ for addressing them. Then we’ll demonstrate the impact of ZeRO++ on training throughput for different model sizes, batch sizes, and bandwidth constraints. We’ll also discuss how ZeRO++ applies to DeepSpeed-Chat for accelerating the training of dialogue models using RLHF.

Deep dive into ZeRO++

Figure 2: ZeRO optimizer workflow

ZeRO is a memory efficient variation of data parallelism where model states are partitioned across all the GPUs, instead of being replicated, and reconstructed using gather/broadcast-based communication collectives on the fly during training. This allows ZeRO to effectively leverage the aggregate GPU memory and compute across all devices, while offering simplicity and ease-of-use of data-parallel training.

Assume the model size is M. During the forward pass, ZeRO conducts all-gather/broadcast operations to collect the parameters for each model layer right before it is needed (in total of size M). In the backward pass, ZeRO adopts a similar communication pattern for the parameters of each layer to compute its local gradients (in total of size M). In addition, ZeRO averages and partitions each local gradient immediately after it is computed using a reduce or reduce-scatter communication collective (in total of size M). In total, ZeRO has a communication volume of 3M, spread evenly across two all-gather/broadcast operations and one reduce-scatter/reduce operation.

To reduce these communication overheads, ZeRO++ has three sets of communication optimizations, targeting each of the above-mentioned three communication collectives, respectively:

Figure 3: Block-based quantization in qwZ. The figure shows block quantization has better data precision compared with basic quantization.

Quantized weight communication for ZeRO (qwZ)

First, to reduce parameter communication volume during all-gather, we adopt quantization on weights to shrink each model parameter on the fly from FP16 (two bytes) to INT8 (one byte) before communicating, and we dequantize the weights after the communication. However, naively quantizing the weights may reduce model training accuracy. To preserve decent model training precision, we adopt block-based quantization, which conducts independent quantization on each subset of model parameters. Since there was no existing high-performance implementation of block-based quantization, we implemented highly optimized quantization CUDA kernels from scratch that are 3x more accurate and 5x faster than basic quantization.
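
For intuition, here is a minimal NumPy sketch of the block-based quantization idea (illustrative only, not our optimized CUDA kernels): each block of weights gets its own INT8 scale, so a few outliers in one block do not degrade precision everywhere else. The block size and the symmetric scaling scheme are assumptions for the example.

```python
import numpy as np

def block_quantize(weights: np.ndarray, block_size: int = 128):
    """Quantize a weight vector to INT8 with one scale per block (illustrative)."""
    w = weights.astype(np.float32).ravel()
    pad = (-len(w)) % block_size
    w = np.pad(w, (0, pad))                        # pad so the blocks divide evenly
    blocks = w.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0                      # avoid dividing by zero for all-zero blocks
    q = np.clip(np.round(blocks / scales), -127, 127).astype(np.int8)
    return q, scales.astype(np.float16), pad

def block_dequantize(q: np.ndarray, scales: np.ndarray, pad: int) -> np.ndarray:
    """Recover approximate FP16 weights after communication."""
    w = (q.astype(np.float32) * scales.astype(np.float32)).ravel()
    return (w[:-pad] if pad else w).astype(np.float16)

# Toy check: per-block scales track the local dynamic range of the weights.
weights = np.random.randn(1000).astype(np.float16)
q, s, pad = block_quantize(weights)
recovered = block_dequantize(q, s, pad)
print(np.abs(weights.astype(np.float32) - recovered.astype(np.float32)).mean())
```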

Figure 4: Hierarchical weights partition in hpZ. The figure shows that hpZ holds secondary model partitions on each GPU, whereas ZeRO-3 holds only primary model partitions.

Hierarchical weight partition for ZeRO (hpZ)

Second, to reduce communication overhead of all-gather on weights during backward pass, we trade GPU memory for communication. More specifically, instead of spreading whole model weights across all the machines as in ZeRO, we maintain a full model copy within each machine. At the expense of higher memory overhead, this allows us to replace the expensive cross-machine all-gather/broadcast on weights with intra-machine all-gather/broadcast, which is substantially faster due to much higher intra-machine communication bandwidth.
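
As a conceptual sketch of the hpZ idea (not the DeepSpeed implementation), the snippet below builds an intra-node process group with torch.distributed and performs the weight all-gather only inside the node; it assumes torch.distributed is already initialized by the launcher and that the world size divides evenly into nodes of `gpus_per_node` GPUs.

```python
import torch
import torch.distributed as dist

def build_intra_node_group(gpus_per_node: int = 8):
    """Return the process group containing only the ranks on this rank's node."""
    world_size, rank = dist.get_world_size(), dist.get_rank()
    # Every rank must create every group, even the ones it does not belong to.
    groups = [
        dist.new_group(list(range(n * gpus_per_node, (n + 1) * gpus_per_node)))
        for n in range(world_size // gpus_per_node)
    ]
    return groups[rank // gpus_per_node]

def intra_node_all_gather(weight_shard: torch.Tensor, group) -> torch.Tensor:
    """Reassemble full weights from the secondary shards held within this node."""
    shards = [torch.empty_like(weight_shard) for _ in range(dist.get_world_size(group))]
    dist.all_gather(shards, weight_shard, group=group)   # intra-node links only
    return torch.cat(shards)
```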

Figure 5: End-to-end workflow of qgZ. This animated figure shows the whole workflow of the qgZ component: tensor slice reordering, intra-node quantization, intra-node all-to-all communication, intra-node dequantization, intra-node reduction, inter-node quantization, inter-node all-to-all communication, inter-node dequantization, and inter-node reduction.

Quantized gradient communication for ZeRO (qgZ)

Third, reducing the communication cost of gradients using reduce-scatter is even more challenging. Directly applying quantization to reduce communication volume is infeasible: even with block-based quantization, reducing gradients in low precision accumulates and amplifies the quantization error. To address this, we only quantize gradients before communication, but dequantize them to full precision before any reduction operation. To do this efficiently, we invented a novel all-to-all-based quantized gradient communication paradigm called qgZ, which is functionally equivalent to a compressed reduce-scatter collective operation.

qgZ is designed to solve two challenges: i) overcome the significant accuracy loss that would result from low-precision reduction if we were to simply implement reduce-scatter in INT4/INT8, and ii) avoid the accuracy degradation and significant latency overhead that would result from the long sequence of quantization and dequantization steps needed by a traditional ring- or tree-based approach to reduce-scatter, even if we did the reductions in full precision. Instead of using a ring- or tree-based reduce-scatter algorithm, qgZ is based on a novel hierarchical all-to-all approach.

There are three major steps in qgZ: i) gradient slice reordering, ii) intra-node communication and reduction, and iii) inter-node communication and reduction. First, before any communication happens, we slice the gradient and do tensor slice reordering to guarantee the final gradient placement (i.e., green chunks in Figure 5) is correct on each GPU at the end of the communication. Second, we quantize the reordered gradient slices, conduct all-to-all communication within each node, dequantize the received gradient slices from the all-to-all, and do local reductions. Third, we quantize the local reduced gradients again, conduct inter-node all-to-all communication, dequantize the received gradients again, and compute the final high-precision gradient reduction to get the results as green chunks in Figure 5.
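
The sketch below is a conceptual outline of these three steps with torch.distributed, not the fused and overlapped CUDA implementation: `quantize` and `dequantize` are hypothetical stand-ins for the block-based kernels above, the tensor slice reordering is omitted for clarity, and evenly divisible chunk sizes are assumed.

```python
import torch
import torch.distributed as dist

def qgz_reduce_scatter(grad, intra_group, inter_group, quantize, dequantize):
    """Hierarchical, quantized reduce-scatter in the spirit of qgZ (sketch only)."""
    n_gpus = dist.get_world_size(intra_group)    # GPUs per node
    n_nodes = dist.get_world_size(inter_group)   # number of nodes

    # Step 1: slice the gradient; in qgZ a reordering of these slices (omitted
    # here) guarantees the final partition lands on the correct GPU.
    slices = list(grad.chunk(n_gpus))

    # Step 2: intra-node phase -- quantize, all-to-all within the node,
    # dequantize the received slices, and reduce them locally.
    send = [quantize(s) for s in slices]
    recv = [torch.empty_like(t) for t in send]
    dist.all_to_all(recv, send, group=intra_group)
    local_sum = torch.stack([dequantize(r) for r in recv]).sum(dim=0)

    # Step 3: inter-node phase -- repeat the pattern across nodes on the
    # locally reduced slice, then do the final reduction in full precision.
    send = [quantize(s) for s in local_sum.chunk(n_nodes)]
    recv = [torch.empty_like(t) for t in send]
    dist.all_to_all(recv, send, group=inter_group)
    return torch.stack([dequantize(r) for r in recv]).sum(dim=0)
```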

The reason for this hierarchical approach is to reduce cross-node communication volume. More precisely, given N GPUs per node, a model size of M, and a quantization ratio of Z, a single-hop all-to-all would generate M*N/Z cross-node traffic. In comparison, with this hierarchical approach, we reduce the cross-node traffic of each GPU from M/Z to M/(Z*N). Thus, the total communication volume is reduced from M*N/Z to M*N/(Z*N) = M/Z. We further optimize the end-to-end latency of qgZ by overlapping intra-node and inter-node communication, as well as by fusing the CUDA kernels for (tensor slice reordering + intra-node quantization) and (intra-node dequantization + intra-node reduction + inter-node quantization).

Communication volume        Forward all-gather on weights    Backward all-gather on weights    Backward reduce-scatter on gradients    Total
ZeRO                        M                                M                                 M                                       3M
ZeRO++                      0.5M                             0                                 0.25M                                   0.75M

Communication volume reduction

By incorporating all three components above, we reduce the cross-node communication volume from 3M down to 0.75M. More specifically, we reduce forward all-gather/broadcast on model weights from M to 0.5M using qwZ. We eliminate the cross-node all-gather during backward propagation using hpZ, reducing the communication from M to 0. Finally, we reduce cross-node reduce-scatter communication during backward-pass from M to 0.25M using qgZ.
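
As a quick sanity check on these numbers, the back-of-the-envelope sketch below tallies cross-node traffic in units of the model size M; the 0.5 and 0.25 factors simply reflect INT8 and INT4 payloads relative to FP16, and hpZ removes the backward all-gather from cross-node links entirely.

```python
# Cross-node communication volume in units of the model size M.
def total_volume(fwd_allgather: float, bwd_allgather: float, grad_reduce_scatter: float) -> float:
    return fwd_allgather + bwd_allgather + grad_reduce_scatter

zero = total_volume(1.0, 1.0, 1.0)           # FP16 everywhere: 3M
zeropp = total_volume(
    0.5,    # qwZ: weights all-gathered in INT8 instead of FP16
    0.0,    # hpZ: backward all-gather stays within each node
    0.25,   # qgZ: gradients communicated in INT4
)
print(zero, zeropp)                          # 3.0 0.75
```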

ZeRO++ accelerates LLM training

Here we show our evaluation results for ZeRO++ with real-world LLM training scenarios on 384 NVIDIA V100 GPUs.

Figure 6: Throughput comparison of ZeRO++ vs. ZeRO with a 400 Gbps interconnect. The figure shows that ZeRO++ achieves up to a 1.56x speedup with 1k tokens per GPU and a 1.41x speedup with 2k tokens per GPU.

High efficiency with small batch per GPU

High-bandwidth cluster: As shown in Figure 6, we first show ZeRO++ throughput improvement over ZeRO for different model sizes and micro-batch sizes with 400 Gbps cross-node interconnects using 4x InfiniBand (IB) links, each running at 100 Gbps. With 1k tokens per GPU, ZeRO++ achieves a 28% to 36% throughput improvement over ZeRO-3. For 2k micro-batch sizes, ZeRO++ achieves a 24% to 29% throughput gain over ZeRO-3.

Figure 7: Throughput comparison of ZeRO++ vs. ZeRO with a 100 Gbps interconnect. The figure shows that ZeRO++ achieves a 2.21x speedup over ZeRO with 1k tokens per GPU and a 1.77x speedup with 2k tokens per GPU.

Low-bandwidth cluster: In low-bandwidth network environments, such as a 100 Gbps network, ZeRO++ performs significantly better than ZeRO-3. As shown in Figure 7, ZeRO++ achieves up to a 2.2x speedup in end-to-end throughput compared to ZeRO-3. On average, ZeRO++ achieves around a 2x speedup over the ZeRO-3 baseline.

Figure 8: ZeRO++ with a low-bandwidth interconnect achieves throughput similar to ZeRO with a high-bandwidth interconnect. The figure shows that for both 18B and 138B model sizes, ZeRO++ on a low-bandwidth network achieves throughput similar to ZeRO on a high-bandwidth interconnect.

Enabling efficiency equivalence between high- and low-bandwidth clusters

In addition, ZeRO++ can achieve comparable system throughput in a low-bandwidth cluster compared with ZeRO in a much higher-bandwidth setting. As shown in Figure 8, for both 18B and 138B models, ZeRO++ with a 200 Gbps cross-node link can reach TFLOPS similar to ZeRO-3 with an 800 Gbps cross-node link.

Given the excellent scalability of ZeRO++, we envision ZeRO++ as the next generation of ZeRO for training large AI models.

ZeRO++ for RLHF training with DeepSpeed-Chat

RLHF training background

ChatGPT-like models are powered by LLMs and fine-tuned using RLHF. RLHF consists of generation (inference) phases and training phases. During the generation phase, the actor model takes a partial conversation as input and generates responses using a sequence of forward passes. Then during the training phase, the critic model ranks the generated responses by quality, providing reinforcement signals for the actor model. The actor model is fine-tuned using these rankings, enabling it to generate more accurate and appropriate responses in subsequent iterations.

RLHF training brings a non-trivial amount of memory pressure as it utilizes four models (actor, reference, critic, reward). Low-rank adaptation (LoRA) is employed to address the memory pressure of RLHF. LoRA freezes the pretrained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, significantly reducing the number of trainable parameters. LoRA speeds up RLHF by reducing memory usage, allowing for larger batch sizes, and thus greatly improves throughput.
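
For readers unfamiliar with LoRA, here is a minimal PyTorch sketch of the idea: the pretrained linear weight is frozen and only a small low-rank update is trained. The rank, scaling, and initialization below are illustrative choices, not the settings used in DeepSpeed-Chat.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Linear layer with frozen pretrained weights plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)          # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the scaled low-rank correction B @ A applied to x.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale

layer = LoRALinear(nn.Linear(1024, 1024))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only the LoRA factors train
```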

DeepSpeed-Chat with ZeRO++ for RLHF training

Figure 9: ZeRO++ speedup in RLHF training. The left figure shows that ZeRO++ achieves a 1.26x speedup for RLHF step-1 training. The right figure shows that ZeRO++ achieves up to a 2.25x speedup in RLHF step-3 token generation.

RLHF with LoRA is a unique application for ZeRO++ since most model weights are frozen. This means ZeRO++ can keep these frozen weights quantized in INT4/8 instead of storing them in FP16 and quantizing them before each communication operation. The dequantization after communication is still done to get the weights ready for computation, but the dequantized weights are simply discarded after computation.  

Using ZeRO++ for RLHF training in this way reduces both memory usage and communication volume. This boosts training throughput by reducing communication as well as by enabling larger batch sizes due to reduced memory usage. During the generation phase, ZeRO++ uses hpZ to keep all weight communication within each node to utilize the higher intranode communication bandwidth with reduced communication volume, further improving the generation throughput.

ZeRO++ is integrated into DeepSpeed-Chat to power RLHF training of ChatGPT-like models. In Figure 9, we compare the RLHF generation throughput of ZeRO and ZeRO++ for 30B and 66B actor models on 32 V100 GPUs. The results show that ZeRO++ enables up to 2.25x better RLHF generation throughput than ZeRO. We also present the speedup for the training phase on 16 V100 GPUs, where ZeRO++ achieves 1.26x better throughput than ZeRO as a result of lower communication and the larger batch sizes enabled by ZeRO++.

Release: Try DeepSpeed ZeRO++ today

We are super excited to release DeepSpeed ZeRO++ and make it available for anyone in the AI community. To get started, please visit our GitHub page for LLM training. ZeRO++ for DeepSpeed-Chat will be released in the coming weeks.
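
As a rough illustration of what enabling ZeRO++ looks like in a training script, here is a minimal sketch. The configuration keys follow the ZeRO++ tutorial naming at the time of writing and should be checked against the current DeepSpeed documentation; the batch size and the hpZ partition size (typically the number of GPUs per node) are placeholder assumptions.

```python
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "zero_quantized_weights": True,     # qwZ: quantized weight all-gather
        "zero_hpz_partition_size": 8,       # hpZ: secondary partition size = GPUs per node
        "zero_quantized_gradients": True,   # qgZ: quantized gradient reduce-scatter
    },
}

# model comes from your own training script; initialization is shown for context only.
# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config)
```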

Get LLM training Read the technical paper

DeepSpeed-ZeRO++ is part of the DeepSpeed ecosystem. To learn more, please visit our website, where you’ll find detailed blog posts, tutorials, and helpful documentation.

For the latest DeepSpeed news, please follow us on social media:

On Twitter (English) On Twitter (Japanese) On Zhihu (Chinese)

DeepSpeed welcomes your contributions. We encourage you to report issues, contribute PRs, and join discussions on the DeepSpeed GitHub page. Please see our contributing guide for more details. We are open to collaborations with universities, research labs, and companies. For such requests (and other requests unsuitable for GitHub), please email deepspeed-info@microsoft.com directly.

Contributors

This project was made possible by the contributions of the following people from the DeepSpeed Team:

Guanhua Wang, Heyang Qin, Sam Ade Jacobs, Connor Holmes, Samyam Rajbhandari, Olatunji Ruwase, Ammar Ahmad Awan, Jeff Rasley, Michael Wyatt, Yuxiong He (team lead)

The post DeepSpeed ZeRO++: A leap in speed for LLM and chat model training with 4X less communication appeared first on Microsoft Research.

Categories: Microsoft

Microsoft at CVPR 2023: Pushing the boundaries of computer vision

Tue, 06/20/2023 - 18:19

In the vast realm of artificial intelligence, few fields have captivated our imagination and pushed the boundaries of possibility quite like computer vision. At the core of this domain of research and innovation lies the ambition to empower technologies for real-world vision-based systems, enabling machines to take in and respond to visual stimuli with unparalleled precision and sophistication. Through the combination of AI, deep learning, and vast amounts of data, computer vision has made great strides in recent years, catapulting us into an era in which the seemingly impossible becomes achievable.

The 2023 Computer Vision and Pattern Recognition Conference (CVPR), held June 18 through June 22, is a widely recognized event that brings together leading experts in the field of computer vision. It serves as a platform for showcasing some of the most compelling and innovative work in this domain.

The contributions presented by Microsoft researchers and their collaborators at this year’s CVPR cover a wide spectrum of research endeavors. From generative models and network pretraining to sign language understanding and neural video codecs, these cutting-edge advancements underscore the evolving capabilities of systems to analyze and extract valuable insights from visual data.

Here are some of the highlights (see below for a list of published papers and their authors): 

Uniting vision, language, and multi-modal encoding

The paper “Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks” lies at the intersection of vision, language, and multimodal pretraining. To learn from these different forms of data, we present a general-purpose foundational model that treats images as a “foreign language.” The data from different modalities are encoded with Multiway Transformers, a modular architecture that enables modality-specific encoding and deep fusion. The model is pretrained on images, text, and image-text pairs in a way that generalizes the masked language modeling approach to different modalities. By substantially scaling the model and data, we found that these advances in foundational architecture and pretraining lead to excellent transfer performance over a variety of vision and vision-language tasks, including object detection, semantic segmentation, image classification, visual reasoning, visual question answering, image captioning, and cross-modal image retrieval.

Scaling training data for large vision models

The strength of large language models stems from their ability to leverage unlabeled training data on a massive scale. By using this data, these models acquire a broad understanding of language, enhance their generalization abilities, and improve their performance across a wide range of language-related tasks. Inspired by this achievement, our research focuses on the possibilities of scaling training data for large vision models. In the paper “On Data Scaling in Masked Image Modeling,” we explore the effects of data scaling on large vision models that are pretrained through masked image modeling. Through extensive investigation, we discovered that masked image modeling in large vision models requires large-scale data for effective pretraining. However, unlike large language models, large vision models cannot benefit from more data in a non-overfitting scenario. These findings deepen our understanding of masked image modeling and may pave the way for future advancements in large-scale vision models.

Creating 3D avatars with a diffusion network

In the world of image generation, incredible strides have been made in transforming text descriptions into stunning visuals. The rise of DALL-E and diffusion models has brought these cutting-edge tools into the hands of everyday users. In the paper “RODIN: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion,” we expand on this innovation by introducing the power of diffusion to 3D avatar generation. To do this, it is necessary to transfer diffusion from 2D to 3D. However, transferring diffusion from 2D to 3D is a significant challenge due to the prohibitive memory and processing costs for producing high-quality results with rich details in 3D. We overcome this problem by proposing the roll-out diffusion network (RODIN), which unrolls a 3D neural radiance field into a single 2D feature plane and performs 3D-aware diffusion on it. Supported by other technical contributions, including latent conditioning to promote global coherence and hierarchical synthesis to further enhance details, RODIN significantly accelerates the otherwise tedious 3D modeling process and opens new opportunities for 3D artists.


Microsoft papers published at CVPR 2023 with their authors:

  1. 3D Human Mesh Estimation from Virtual Markers
    Xiaoxuan Ma, Peking University; Jiajun Su, Peking University; Chunyu Wang, Microsoft Research; Wentao Zhu, Peking University; Yizhou Wang, Peking University and National Engineering Research Center of Visual Technology
  2. 3D Line Mapping Revisited
    Shaohui Liu, ETH Zurich; Yifan Yu, ETH Zurich; Rémi Pautrat, ETH Zurich; Marc Pollefeys, ETH Zurich and Microsoft Research; Viktor Larsson, Lund University
  3. BlendFields: Few-Shot Example-Driven Facial Modeling
    Kacper Kania, Warsaw University of Technology; Stephan J. Garbin, Microsoft Research; Andrea Tagliasacchi, Simon Fraser University and Google Brain; Virginia Estellers, Microsoft Research; Kwang Moo Yi, University of British Columbia; Julien Valentin, Microsoft Research; Tomasz Trzciński, Jagiellonian University; Marek Kowalski, Microsoft Research
  4. CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning
    Yiting Cheng, Fudan University; Fangyun Wei, Microsoft Research; Jianmin Bao, Microsoft Research; Dong Chen, Microsoft Research; Wenqiang Zhang, Fudan University
  5. Deep Frequency Filtering for Domain Generalization
    Shiqi Lin, University of Science and Technology of China; Zhizheng Zhang, Microsoft Research; Zhipeng Huang, University of Science and Technology of China; Yan Lu, Microsoft Research; Cuiling Lan, Microsoft Research; Peng Chu, Microsoft; Quanzeng You, Microsoft; Jiang Wang, Microsoft; Zicheng Liu, Microsoft Research; Amey Parulkar, Microsoft; Viraj Navkal, Microsoft; Zhibo Chen, University of Science and Technology of China
  6. DeepLSD: Line Segment Detection and Refinement with Deep Image Gradients
    Rémi Pautrat, ETH Zurich; Daniel Barath, ETH Zurich; Viktor Larsson, Lund University; Martin R. Oswald, University of Amsterdam; Marc Pollefeys, ETH Zurich and Microsoft Research
  7. DETRs with Hybrid Matching
    Ding Jia, Peking University; Yuhui Yuan, Microsoft Research; Haodi He, Stanford University; Xiaopei Wu, Zhejiang University; Haojun Yu, Peking University; Weihong Lin, Microsoft Research; Lei Sun, Microsoft Research; Chao Zhang, Peking University; Han Hu, Microsoft Research
  8. EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention
    Xinyu Liu, Chinese University of Hong Kong; Houwen Peng, Microsoft Research; Ningxin Zheng, Microsoft Research; Yuqing Yang, Microsoft Research; Han Hu, Microsoft Research; Yixuan Yuan, Chinese University of Hong Kong
  9. Four-View Geometry with Unknown Radial Distortion
    Petr Hruby, Viktor Korotynskiy, Timothy Duff, Luke Oeding, Marc Pollefeys, ETH Zurich and Microsoft Research; Tomas Pajdla, Viktor Larsson, Lund University
  10. High-Fidelity and Freely Controllable Talking Head Video Generation
    Yue Gao, Microsoft Research; Yuan Zhou, Microsoft Research; Jinglu Wang, Microsoft Research; Xiao Li, Microsoft Research; Xiang Ming, Microsoft Research; Yan Lu, Microsoft Research
  11. Human Pose as Compositional Tokens
    Zigang Geng, University of Science and Technology of China and Microsoft Research; Chunyu Wang, Microsoft Research; Yixuan Wei, Tsinghua University and Microsoft Research; Ze Liu, University of Science and Technology of China and Microsoft Research; Houqiang Li, University of Science and Technology of China; Han Hu, Microsoft Research
  12. iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-training for Visual Recognition
    Yixuan Wei, Tsinghua University and Microsoft Research; Yue Cao, Microsoft Research; Zheng Zhang, Microsoft Research; Houwen Peng, Microsoft Research; Zhuliang Yao, Tsinghua University and Microsoft Research; Zhenda Xie, Tsinghua University and Microsoft Research; Han Hu, Microsoft Research; Baining Guo, Microsoft Research
  13. Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
    Wenhui Wang, Microsoft; Hangbo Bao, Microsoft; Li Dong, Microsoft Research; Johan Bjorck, Microsoft; Zhiliang Peng, Microsoft; Qiang Liu, Microsoft; Kriti Aggarwal, Microsoft Research; Owais Khan Mohammed, Microsoft; Saksham Singhal, Microsoft Research; Subhojit Som, Microsoft; Furu Wei, Microsoft Research
  14. Iterative Proposal Refinement for Weakly-Supervised Video Grounding
    Meng Cao, Peking University; Fangyun Wei, Microsoft Research; Can Xu, Microsoft Research; Xiubo Geng, Microsoft Research; Long Chen, Hong Kong University of Science and Technology; Can Zhang, Peking University; Yuexian Zou, Peking University; Tao Shen, Microsoft; Daxin Jiang, Microsoft Research
  15. LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction 
    Zhaoyun Jiang, Xi’an Jiaotong University; Jiaqi Guo, Microsoft Research; Shizhao Sun, Microsoft Research; Huayu Deng, Shanghai Jiaotong University; Zhongkai Wu, Beihang University; Vuksan Mijovic, Microsoft; Zijiang James Yang, Xi’an Jiaotong University; Jian-Guang Lou, Microsoft Research; Dongmei Zhang, Microsoft Research
  16. Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing
    Shruthi Bannur, Microsoft Research; Stephanie Hyland, Microsoft Research; Qianchu Liu, Fernando Pérez García, Microsoft Research; Maximilian Ilse, Microsoft Research; Daniel C. Castro, Microsoft Research; Benedikt Boecking, Harshita Sharma, Microsoft Research; Kenza Bouzid, Microsoft Research; Anja Thieme, Microsoft Research; Anton Schwaighofer, Microsoft Research; Maria Wetscherek, Matthew P. Lungren, Aditya Nori, Microsoft Research; Javier Alvarez-Valle, Microsoft Research; Ozan Oktay, Microsoft Research
  17. Look Before You Match: Instance Understanding Matters in Video Object Segmentation
    Junke Wang, Shanghai Collaborative Innovation Center on Intelligent Visual Computing; Dongdong Chen, Microsoft Research; Zuxuan Wu, Shanghai Collaborative Innovation Center on Intelligent Visual Computing; Chong Luo, Microsoft Research; Chuanxin Tang, Microsoft Research; Xiyang Dai, Microsoft Research; Yucheng Zhao, Microsoft Research; Yujia Xie, Microsoft Research; Lu Yuan, Microsoft Research; Yu-Gang Jiang, Shanghai Collaborative Innovation Center on Intelligent Visual Computing
  18. MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
    Xiaoyi Dong, University of Science and Technology of China; Jianmin Bao, Microsoft Research; Yinglin Zheng, Xiamen University; Ting Zhang, Microsoft Research; Dongdong Chen, Microsoft Research; Hao Yang, Microsoft Research; Ming Zeng, Xiamen University; Weiming Zhang, University of Science and Technology of China; Lu Yuan, Microsoft Research; Dong Chen, Microsoft Research; Fang Wen, Microsoft Research; Nenghai Yu, University of Science and Technology of China
  19. MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
    Bowen Zhang, USTC; Chenyang Qi, HKUST; Pan Zhang, USTC; Bo Zhang, Microsoft Research; HsiangTao Wu, Microsoft; Dong Chen, HKUST; Qifeng Chen, HKUST; Yong Wang, USTC; Fang Wen, Microsoft
  20. MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
    Ludan Ruan, Renmin University of China; Yiyang Ma, Peking University; Huan Yang, Microsoft Research; Huiguo He, Microsoft Research; Bei Liu, Microsoft Research; Jianlong Fu, Microsoft Research; Nicholas Jing Yuan, Microsoft Research; Qin Jin, Renmin University of China; Baining Guo, Microsoft Research
  21. Motion Information Propagation for Neural Video Compression
    Linfeng Qi, University of Science and Technology of China; Jiahao Li, Microsoft Research; Bin Li, Microsoft Research; Houqiang Li, University of Science and Technology of China; Yan Lu, Microsoft Research
  22. Natural Language-Assisted Sign Language Recognition
    Ronglai Zuo, Hong Kong University of Science and Technology; Fangyun Wei, Microsoft Research; Brian Mak, Hong Kong University of Science and Technology
  23. Neural Video Compression with Diverse Contexts
    Jiahao Li, Microsoft Research; Bin Li, Microsoft Research; Yan Lu, Microsoft Research
  24. On Data Scaling in Masked Image Modeling
    Zhenda Xie, Tsinghua University and Microsoft Research; Zheng Zhang, Microsoft Research; Yue Cao, Microsoft Research; Yutong Lin, Xi’an Jiaotong University and Microsoft Research; Yixuan Wei, Tsinghua University and Microsoft Research; Qi Dai, Microsoft Research; Han Hu, Microsoft Research
  25. Paint by Example: Exemplar-based Image Editing with Diffusion Models
    Binxin Yang, University of Science and Technology of China; Shuyang Gu, Microsoft Research; Bo Zhang, Microsoft Research; Ting Zhang, Microsoft Research; Xuejin Chen, University of Science and Technology of China; Xiaoyan Sun, University of Science and Technology of China; Dong Chen, Microsoft Research; Fang Wen, Microsoft Research
  26. ReCo: Region-Controlled Text-to-Image Generation
    Zhengyuan Yang, Microsoft Research; Jianfeng Wang, Microsoft; Zhe Gan, Microsoft; Linjie Li, Microsoft Research; Kevin Lin, Microsoft Research; Chenfei Wu, Microsoft Research; Nan Duan, Microsoft; Zicheng Liu, Microsoft Research; Ce Liu, Microsoft; Michael Zeng, Microsoft Research; Lijuan Wang, Microsoft Research
  27. ResFormer: Scaling ViTs with Multi-Resolution Training
    Rui Tian, Fudan University and Shanghai Collaborative Innovation Center of Intelligent Visual Computing; Zuxuan Wu, Fudan University and Shanghai Collaborative Innovation Center of Intelligent Visual Computing; Qi Dai, Microsoft Research; Han Hu, Microsoft Research; Yu Qiao, Shanghai AI Laboratory; Yu-Gang Jiang, Fudan University and Shanghai Collaborative Innovation Center of Intelligent Visual Computing
  28. Revealing the Dark Secrets of Masked Image Modeling
    Zhenda Xie, Tsinghua University and Microsoft Research; Zigang Geng, University of Science and Technology of China and Microsoft Research; Jingcheng Hu, Tsinghua University and Microsoft Research; Zheng Zhang, Microsoft Research; Han Hu, Microsoft Research; Yue Cao, Microsoft Research
  29. RODIN: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion
    Tengfei Wang, Hong Kong University of Science and Technology; Bo Zhang, Microsoft Research; Ting Zhang, Microsoft Research; Shuyang Gu, Microsoft Research; Jianmin Bao, Microsoft Research; Tadas Baltrusaitis, Microsoft Research; Jingjing Shen, Microsoft Research; Dong Chen, Microsoft Research; Fang Wen, Microsoft Research; Qifeng Chen, Hong Kong University of Science and Technology; Baining Guo, Microsoft Research
  30. SeqTrack: Sequence to Sequence Learning for Visual Object Tracking
    Xin Chen, Dalian University of Technology; Houwen Peng, Microsoft Research; Dong Wang, Dalian University of Technology; Huchuan Lu, Dalian University of Technology and Peng Cheng Laboratory; Han Hu, Microsoft Research
  31. Side Adapter Network for Open-Vocabulary Semantic Segmentation
    Mengde Xu, Huazhong University of Science and Technology and Microsoft Research; Zheng Zhang, Huazhong University of Science and Technology and Microsoft Research; Fangyun Wei, Microsoft Research; Han Hu, Microsoft Research; Xiang Bai, Huazhong University of Science and Technology
  32. Streaming Video Model
    Yucheng Zhao, University of Science and Technology of China; Chong Luo, Microsoft Research; Chuanxin Tang, Microsoft Research; Dongdong Chen, Microsoft Research; Noel Codella, Microsoft Research; Zheng-Jun Zha, University of Science and Technology of China
  33. Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
    Mingfang Zhang, University of Tokyo and Microsoft Research; Jinglu Wang, Microsoft Research; Xiao Li, Microsoft Research; Yifei Huang, University of Tokyo; Yoichi Sato, University of Tokyo; Yan Lu, Microsoft Research
  34. SVFormer: Semi-supervised Video Transformer for Action Recognition
    Zhen Xing, Fudan University and Shanghai Collaborative Innovation Center of Intelligent Visual Computing; Qi Dai, Microsoft Research; Han Hu, Microsoft Research; Jingjing Chen, Fudan University and Shanghai Collaborative Innovation Center of Intelligent Visual Computing; Zuxuan Wu, Fudan University and Shanghai Collaborative Innovation Center of Intelligent Visual Computing; Yu-Gang Jiang, Fudan University and Shanghai Collaborative Innovation Center of Intelligent Visual Computing
  35. TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models
    Sucheng Ren, Microsoft Research; Fangyun Wei, Microsoft Research; Zheng Zhang, Microsoft Research; Han Hu, Microsoft Research
  36. Two-shot Video Object Segmentation
    Kun Yan, Peking University; Xiao Li, Microsoft Research; Fangyun Wei, Microsoft Research; Jinglu Wang, Microsoft Research; Chenbin Zhang, Peking University; Ping Wang, Peking University; Yan Lu, Microsoft Research
  37. Unifying Layout Generation with a Decoupled Diffusion Model
    Mude Hui, Xi’an Jiaotong University; Zhizheng Zhang, Microsoft Research; Xiaoyi Zhang, Microsoft Research; Wenxuan Xie, Microsoft Research; Yuwang Wang, Tsinghua University; Yan Lu, Microsoft Research
  38. VideoTrack: Learning to Track Objects via Video Transformer
    Fei Xie, Shanghai Jiao Tong University; Lei Chu, Microsoft Research; Jiahao Li, Microsoft Research; Yan Lu, Microsoft Research; Chao Ma, Shanghai Jiao Tong University
  39. VolRecon: Volume Rendering of Signed Ray Distance Functions for Generalizable Multi-View Reconstruction
    Yufan Ren, EPFL; Fangjinhua Wang ETH Zurich; Tong Zhang, EPFL; Marc Pollefeys, ETH Zurich and Microsoft Research; Sabine Süsstrunk, EPFL
  40. X-Avatar: Expressive Human Avatars
    Kaiyue Shen, ETH Zurich; Chen Guo, ETH Zurich; Manuel Kaufmann, ETH Zurich; Juan Jose Zarate, ETH Zurich; Julien Valentin, Microsoft Research; Jie Song, ETH Zurich; Otmar Hilliges, ETH Zurich
  41. Unifying Vision, Text, and Layout for Universal Document Processing
    Zineng Tang, University of North Carolina (UNC) Chapel Hill; Ziyi Yang, Microsoft Research; Guoxin Wang, Microsoft Research; Yuwei Fang, Microsoft Research; Yang Liu, Microsoft Research; Chenguang Zhu, Microsoft Research; Michael Zeng, Microsoft Research; Cha Zhang, Microsoft Research; Mohit Bansal, University of North Carolina (UNC) Chapel Hill

The post Microsoft at CVPR 2023: Pushing the boundaries of computer vision appeared first on Microsoft Research.

Categories: Microsoft

Improving Subseasonal Forecasting with Machine Learning

Fri, 06/16/2023 - 22:30

This content was previously published by Nature Portfolio and Springer Nature Communities on Nature Portfolio Earth and Environment Community.

Improving our ability to forecast the weather and climate is of interest to all sectors of the economy and to government agencies from the local to the national level. Weather forecasts zero to ten days ahead and climate forecasts seasons to decades ahead are currently used operationally in decision-making, and the accuracy and reliability of these forecasts have improved consistently in recent decades (Troccoli, 2010). However, many critical applications – including water allocation, wildfire management, and drought and flood mitigation – require subseasonal forecasts with lead times in between these two extremes (Merryfield et al., 2020; White et al., 2017).

While short-term forecasting accuracy is largely sustained by physics-based dynamical models, these deterministic methods have limited subseasonal accuracy due to chaos (Lorenz, 1963). Indeed, subseasonal forecasting has long been considered a “predictability desert” due to its complex dependence on both local weather and global climate variables (Vitart et al., 2012). Recent studies, however, have highlighted important sources of predictability on subseasonal timescales, and the focus of several recent large-scale research efforts has been to advance the subseasonal capabilities of operational physics-based models (Vitart et al., 2017; Pegion et al., 2019; Lang et al., 2020). Our team has undertaken a parallel effort to demonstrate the value of machine learning methods in improving subseasonal forecasting.

The Subseasonal Climate Forecast Rodeo

To improve the accuracy of subseasonal forecasts, the U.S. Bureau of Reclamation (USBR) and the National Oceanic and Atmospheric Administration (NOAA) launched the Subseasonal Climate Forecast Rodeo, a yearlong real-time forecasting challenge in which participants aimed to skillfully predict temperature and precipitation in the western U.S. two-to-four weeks and four-to-six weeks in advance. Our team developed a machine learning approach to the Rodeo and a SubseasonalRodeo dataset for training and evaluating subseasonal forecasting systems.

Week 3-4 temperature forecasts and observations for February 5th, 2018. Upper left: Our Rodeo submission. Upper right: Realized temperature anomalies. Bottom left: Forecast of the U.S. operational dynamical model, Climate Forecasting System v2. Bottom right: A standard meteorological forecasting method used as a Rodeo baseline.


Our final Rodeo solution was an ensemble of two nonlinear regression models. The first integrates a diverse collection of meteorological measurements and dynamic model forecasts and prunes irrelevant predictors using a customized multitask model selection procedure. The second uses only historical measurements of the target variable (temperature or precipitation) and introduces multitask nearest neighbor features into a weighted local linear regression. Each model alone outperforms the debiased operational U.S. Climate Forecasting System version 2 (CFSv2), and, over 2011-2018, an ensemble of our regression models and debiased CFSv2 improves debiased CFSv2 skill by 40%-50% for temperature and 129%-169% for precipitation. See our write-up Improving Subseasonal Forecasting in the Western U.S. with Machine Learning for more details. While this work demonstrated the promise of machine learning models for subseasonal forecasting, it also highlighted the complementary strengths of physics- and learning-based approaches and the opportunity to combine those strengths to improve forecasting skill.

Adaptive Bias Correction (ABC)

To harness the complementary strengths of physics- and learning-based models, we next developed a hybrid dynamical-learning framework for improved subseasonal forecasting. In particular, we learn to adaptively correct the biases of dynamical models and apply our novel adaptive bias correction (ABC) to improve the skill of subseasonal temperature and precipitation forecasts.

At subseasonal lead times, weeks 3-4 and 5-6, ABC doubles or triples the forecasting skill of leading operational dynamical models from the U.S. (CFSv2) and Europe (ECMWF).

ABC is an ensemble of three new low-cost, high-accuracy machine learning models: Dynamical++, Climatology++, and Persistence++. Each model trains only on past temperature, precipitation, and forecast data and outputs corrections for future forecasts tailored to the site, target date, and dynamical model. Dynamical++ and Climatology++ learn site- and date-specific offsets for dynamical and climatological forecasts by minimizing forecasting error over adaptively-selected training periods. Persistence++ additionally accounts for recent weather trends by combining lagged observations, dynamical forecasts, and climatology to minimize historical forecasting error for each site.
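
To make the offset-learning idea concrete, here is a toy NumPy sketch in the spirit of Dynamical++ (not the released subseasonal_toolkit implementation): a per-site additive correction is estimated as the mean forecast error over a recent training window and then added to a new dynamical forecast. The window length and the synthetic data are assumptions for the example.

```python
import numpy as np

def learn_offsets(obs: np.ndarray, fcst: np.ndarray, window: int = 10) -> np.ndarray:
    """Per-site correction: mean of (observation - forecast) over the last `window` dates.
    Both arrays have shape (dates, sites)."""
    return (obs[-window:] - fcst[-window:]).mean(axis=0)

def corrected_forecast(new_fcst: np.ndarray, offsets: np.ndarray) -> np.ndarray:
    """Apply the learned site-specific offsets to a new dynamical forecast."""
    return new_fcst + offsets

# Toy data: 52 past dates x 100 sites, with a dynamical model that runs 1.5 units warm.
rng = np.random.default_rng(0)
obs = rng.normal(size=(52, 100))
fcst = obs + 1.5 + rng.normal(scale=0.3, size=(52, 100))
offsets = learn_offsets(obs[:-1], fcst[:-1], window=10)        # train on all but the last date
raw_err = np.abs(fcst[-1] - obs[-1]).mean()
abc_err = np.abs(corrected_forecast(fcst[-1], offsets) - obs[-1]).mean()
print(raw_err, abc_err)                                        # the corrected error is smaller
```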

ABC can be applied operationally as a computationally inexpensive enhancement to any dynamical model forecast, and we use this property to substantially reduce the forecasting errors of eight operational dynamical models, including the state-of-the-art ECMWF model.

ABC can be applied operationally as a computationally inexpensive enhancement to any dynamical model forecast.

A practical implication of these improvements for downstream decision-makers is an expanded geographic range for actionable skill, defined here as spatial skill above a given sufficiency threshold. For example, we vary the weeks 5-6 sufficiency threshold from 0 to 0.6 and find that ABC consistently boosts the number of locales with actionable skill over both raw and operationally-debiased CFSv2 and ECMWF.

ABC consistently boosts the number of locales with forecasting accuracy above a given skill threshold, an important property for operational decision-making in water allocation, wildfire management, and drought and flood mitigation. 

We couple these performance improvements with a practical workflow for explaining ABC skill gains using Cohort Shapley (Mase et al., 2019) and identifying higher-skill windows of opportunity (Mariotti et al., 2020) based on relevant climate variables.

Our “forecast of opportunity” workflow explains ABC skill gains in terms of relevant climate variables observable at forecast time.

To facilitate future deployment and development, we also release our model and workflow code through the subseasonal_toolkit Python package.

The SubseasonalClimateUSA dataset

To train and evaluate our contiguous US models, we developed a SubseasonalClimateUSA dataset housing a diverse collection of ground-truth measurements and model forecasts relevant to subseasonal timescales. The SubseasonalClimateUSA dataset is updated regularly and publicly accessible via the subseasonal_data package. In SubseasonalClimateUSA: A Dataset for Subseasonal Forecasting and Benchmarking, we used this dataset to benchmark ABC against operational dynamical models and seven state-of-the-art deep learning and machine learning methods from the literature. For each subseasonal forecasting task, ABC and its component models provided the best performance.

Percentage improvement in accuracy over operationally-debiased dynamical CFSv2 forecasts. ABC consistently outperforms standard meteorological baselines (Persistence and Climatology) and 7 state-of-the-art machine learning and deep learning methods from the literature.

Online learning with optimism and delay

To provide more flexible and adaptive model ensembling in the operational setting of real-time climate and weather forecasting, we developed three new optimistic online learning algorithms — AdaHedgeD, DORM, and DORM+ — that require no parameter tuning and have optimal regret guarantees under delayed feedback.

Each year, the PoolD online learning algorithms produce ensemble forecasts with accuracy comparable to the best individual model in hindsight, despite receiving only 26 observations per year.

Our open-source Python implementation, available via the PoolD library, provides simple strategies for combining the forecasts of different subseasonal forecasting models, adapting the weights of each model based on real-time performance. See our write-up Online Learning with Optimism and Delay for more details.
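
For intuition only (the PoolD algorithms AdaHedgeD, DORM, and DORM+ use more careful, tuning-free updates with regret guarantees under delay), here is a toy exponentially weighted ensembler that applies a loss only when its delayed feedback finally arrives; the fixed learning rate is an assumption.

```python
import numpy as np

class DelayedHedge:
    """Toy exponentially weighted forecaster that tolerates delayed feedback."""
    def __init__(self, n_models: int, lr: float = 0.5):
        self.log_w = np.zeros(n_models)
        self.lr = lr

    def weights(self) -> np.ndarray:
        w = np.exp(self.log_w - self.log_w.max())
        return w / w.sum()

    def predict(self, model_forecasts: np.ndarray) -> np.ndarray:
        # Ensemble forecast = current weights applied to the individual model forecasts.
        return self.weights() @ model_forecasts

    def update(self, delayed_losses: np.ndarray) -> None:
        # Called whenever the losses for an earlier forecast round become observable.
        self.log_w -= self.lr * delayed_losses

hedge = DelayedHedge(n_models=3)
forecasts = np.array([[1.0, 0.2], [0.8, 0.1], [1.4, 0.5]])   # 3 models x 2 grid points
print(hedge.predict(forecasts))
hedge.update(np.array([0.9, 0.4, 1.2]))   # feedback for a round issued weeks earlier
print(hedge.weights())                    # weight shifts toward the lower-loss models
```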

Looking forward

We’re excited to continue exploring machine learning applied to subseasonal forecasting on a global scale, and we hope that our open-source packages will facilitate future subseasonal development and benchmarking. If you have ideas for model or dataset development, please contribute to our open-source Python code or contact us!

The post Improving Subseasonal Forecasting with Machine Learning appeared first on Microsoft Research.

Categories: Microsoft

Accounting for past imaging studies: Enhancing radiology AI and reporting

Tue, 06/13/2023 - 20:00

The use of self-supervision from image-text pairs has been a key enabler in the development of scalable and flexible vision-language AI models in not only general domains but also in biomedical domains such as radiology. The goal in the radiology setting is to produce rich training signals without requiring manual labels so the models can learn to accurately recognize and locate findings in the images and relate them to content in radiology reports.

Radiologists use radiology reports to describe imaging findings and offer a clinical diagnosis or a range of possible diagnoses, all of which can be influenced by considering the findings on previous imaging studies. In fact, comparisons with previous images are crucial for radiologists to make informed decisions. These comparisons can provide valuable context for determining whether a condition is a new concern or, if preexisting, whether it is improving, deteriorating, or stable, and they can inform more appropriate treatment recommendations. Despite the importance of comparisons, current AI solutions for radiology often fall short in aligning images with report data because of the lack of access to prior scans. Current AI solutions also typically fail to account for the chronological progression of disease or imaging findings often present in biomedical datasets. This can lead to ambiguity in the model training process and can be risky in downstream applications such as automated report generation, where models may make up temporal content without access to past medical scans. In short, this limits the real-world applicability of such AI models to empower caregivers and augment existing workflows.

In our previous work, we demonstrated that multimodal self-supervised learning of radiology images and reports can yield significant performance improvement in downstream applications of machine learning models, such as detecting the presence of medical conditions and localizing these findings within the images. In our latest study, which is being presented at the 2023 IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), we propose BioViL-T, a self-supervised training framework that further increases the data efficiency of this learning paradigm by leveraging the temporal structure present in biomedical datasets. This approach enables the incorporation of temporal information and has the potential to perform complementary self-supervision without the need for additional data, resulting in improved predictive performance.

Our proposed approach can handle missing or spatially misaligned images and can potentially scale to process a large number of prior images. By leveraging the existing temporal structure available in datasets, BioViL-T achieves state-of-the-art results on several downstream benchmarks. We’ve made both our models and source code open source, allowing for a comprehensive exploration and validation of the results discussed in our study. We’ve also released a new multimodal temporal benchmark dataset, MS-CXR-T, to support further research into longitudinal modeling of medical images and text data.

Connecting the data points

Solving for the static case in vision-language processing—that is, learning with pairs of single images and captions—is a natural first step in advancing the field. So it’s not surprising that current biomedical vision-language processing work has largely focused on tasks that are dependent on features or abnormalities present at a single point in time—what is a patient’s current condition, and what is a likely diagnosis?—treating image-text pairs such as x-rays and corresponding reports in today’s datasets as independent data points. When prior imaging findings are referenced in reports, that information is often ignored or removed in the training process. Further, a lack of publicly available datasets containing longitudinal series of imaging examinations and reports has further challenged the incorporation of temporal information into medical imaging benchmarks.

Thanks to our early and close collaboration with practicing radiologists and our long-standing work with Nuance, a leading provider of AI solutions in the radiology space that was acquired by Microsoft in 2022, we’ve been able to better understand clinician workflow in the radiological imaging setting. That includes how radiology data is created, what its different components are, and how routinely radiologists refer to prior studies in the context of interpreting medical images. With these insights, we were able to identify temporal alignment of text across multiple images as a clinically significant research problem. Grounding, or associating, report information such as “pleural effusion has improved compared to previous study” with the imaging modality requires access to the prior imaging study. We were able to tackle this challenge without gathering additional data or annotations.

As an innovative solution, we leveraged the metadata from de-identified public datasets like MIMIC-CXR. This metadata preserves the original order and intervals of studies, allowing us to connect various images over time and observe disease progression. Developing more data efficient and smart solutions in the healthcare space, where data sources are scarce, is important if we want to develop meaningful AI solutions.

Figure 1: The proposed self-supervised training framework BioViL-T leverages pairs of radiology reports and sequences of medical images. The training scheme does not require manual expert labels and can scale to a large amount of radiology data to pretrain image and text models required for downstream clinical applications.

Addressing the challenges of longitudinal analysis

With current and prior images now available for comparison, the question became, how can a model reason about images coming from different time points? Radiological imaging, especially with planar techniques like radiographs, may show noticeable variation. This can be influenced by factors such as the patient’s posture during capture and the positioning of the device. Notably, these variations become more pronounced when images are taken with longer time gaps in between. To manage variations, current approaches to longitudinal analysis, largely used for fully supervised learning of image models only, require extensive preprocessing, such as image registration, a technique that attempts to align multiple images taken at different times from different viewpoints. In addition to better managing image variation, we wanted a framework that could be applied to cases in which prior images weren’t relevant or available and the task involved only one image.

We designed BioViL-T with these challenges in mind. Its main components are a multi-image encoder, consisting of both a vision transformer and a convolutional neural network (CNN), and a text encoder. As illustrated in Figure 1, in the multi-image encoder, each input image is first encoded with the CNN model to independently extract findings, such as opacities, present in each medical scan. Here, the CNN counteracts the large data demands of transformer-based architectures through its efficiency in extracting lower-level semantic features.

At the next stage, the features across time points are matched and compared in the vision transformer block, then aggregated into a single joint representation incorporating both current and historical radiological information. It’s important to note that the transformer architecture can adapt to either single- or multi-image scenarios, thereby better handling situations in which past images are unavailable, such as when there’s no relevant image history. Additionally, a cross-attention mechanism across image regions reduces the need for extensive preprocessing, addressing potential variations across images.

In the final stage, the multi-image encoder is jointly trained with the text encoder to match the image representations with their text counterparts using masked modeling and contrastive supervision techniques. To improve text representations and model supervision, we utilize the domain-specific text encoder CXR-BERT-general, which is pretrained on clinical text corpora and built on a clinical vocabulary.
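
To make this description concrete, here is a highly simplified PyTorch sketch of a multi-image encoder in the spirit of BioViL-T; the torchvision backbone, embedding sizes, temporal embeddings, and pooling are illustrative assumptions and not the released model.

```python
from typing import Optional

import torch
import torch.nn as nn
from torchvision.models import resnet50

class MultiImageEncoder(nn.Module):
    """CNN features per scan, then a transformer fuses current and prior images."""
    def __init__(self, dim: int = 512, heads: int = 8, layers: int = 2):
        super().__init__()
        cnn = resnet50(weights=None)
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])   # spatial feature map
        self.proj = nn.Conv2d(2048, dim, kernel_size=1)
        self.time_embed = nn.Embedding(2, dim)                      # current vs. prior
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, current: torch.Tensor, prior: Optional[torch.Tensor] = None):
        images = [current] if prior is None else [current, prior]
        tokens = []
        for t, img in enumerate(images):
            feat = self.proj(self.backbone(img))         # (B, dim, H', W')
            feat = feat.flatten(2).transpose(1, 2)       # (B, H'*W', dim) patch tokens
            tokens.append(feat + self.time_embed.weight[t])
        fused = self.transformer(torch.cat(tokens, dim=1))
        return fused.mean(dim=1)                         # single joint representation

# Works with a single image (no relevant prior) or with a current/prior pair.
enc = MultiImageEncoder()
x_now, x_prev = torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224)
print(enc(x_now).shape, enc(x_now, x_prev).shape)
```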

Figure 2: Example of current (left) and prior (right) chest x-ray scans. The attention maps computed within the vision transformer show (in purple) how the model interprets disease progression by focusing on these image regions. In this example, the airspace disease seen in the left lung lobe has improved since the prior acquisition.

Grounded model prediction

In our work, we found that linking multiple images during pretraining makes for both better language and vision representations, enabling the AI model to better associate information present in both the text and the images. This means that when given a radiology report of a chest x-ray, for example, with the description “increased opacities in the left lower lung compared with prior examination,” a model can more accurately identify, locate, and compare findings, such as opacities. This improved alignment between data modalities is crucial because it allows the model to provide more accurate and relevant insights, such as identifying abnormalities in medical images, generating more accurate diagnostic reports, or tracking the progression of a disease over time.

Two findings were particularly insightful for us during our experimentation with BioViL-T:

  • Today’s language-generating AI models are often trained by masking portions of text and then prompting them to fill in the blanks as a means of encouraging the models to account for context in outputting a prediction. We extended the traditional masked language modeling (MLM) approach to be guided by multi-image context, essentially making the approach multimodal. This, in return, helped us better analyze whether BioViL-T was learning a progression based on provided images or making a random prediction of the masked words based solely on the text context. We gave the model radiology images and reports with progression-related language, such as “improving,” masked. An example input would be “pleural effusion has been [MASKED] since yesterday.” We then tasked the model with predicting the missing word(s) based on single and multi-image inputs. When provided with a single image, the model was unsuccessful in completing the task; however, when provided with a current and prior image, performance improved, demonstrating that the model is basing its prediction on the prior image.
  • Additionally, we found that training on prior images decreases instances of the generative AI model producing ungrounded outputs that seem plausible but are factually incorrect, in this case, when there’s a lack of information. Prior work into radiology report generation utilizes single input images, resulting in the model potentially outputting text that describes progression without having access to past scans. This severely limits the potential adoption of AI solutions in a high-stakes domain such as healthcare. A decrease in ungrounded outputs, however, could enable automated report generation or assistive writing in the future, which could potentially help reduce administrative duties and ease burnout in the healthcare community. Note that these models aren’t intended for any clinical use at the moment, but they’re important proof points to assess the capabilities of healthcare AI.
Moving longitudinal analysis forward

Through our relationships with practicing radiologists and Nuance, we were able to identify and concentrate on a clinically important research problem, finding that accounting for patient history matters if we want to develop AI solutions with value. To help the research community advance longitudinal analysis, we’ve released a new benchmark dataset, MS-CXR-T, curated by a board-certified radiologist. It consists of current-prior chest x-ray pairs labeled with a state of progression for the temporal image classification task, and pairs of sentences about disease progression that either contradict each other or capture the same assessment in different phrasing for the sentence similarity task.

We focused on chest x-rays and lung diseases, but we see our work as having the potential to be extended into other medical imaging settings where analyzing images over time plays an important part in clinician decision-making, such as scenarios involving MRI or CT scans. However far the reach, it’s vital to ensure that models such as BioViL-T generalize well across different population groups and under the various conditions in which medical images are captured. This important part of the journey requires extensive benchmarking of models on unseen datasets. These datasets should widely vary in terms of acquisition settings, patient demographics, and disease prevalence. Another aspect of this work we look forward to exploring and monitoring is the potential role of general foundation models like GPT-4 in domain-specific foundation model training and the benefits of pairing larger foundation models with smaller specialized models such as BioViL-T.

To learn more and to access our text and image models and source code, visit the BioViL-T Hugging Face page and GitHub.

BioViL-T models
BioViL-T code

Acknowledgments

We’d like to thank our co-authors: Shruthi Bannur, Stephanie Hyland, Qianchu Liu, Fernando Pérez-García, Maximilian Ilse, Daniel C. Castro, Benedikt Boecking, Harshita Sharma, Kenza Bouzid, Anja Thieme, Anton Schwaighofer, Maria Wetscherek, and Aditya Nori. We’d also like to thank Hoifung Poon, Melanie Bernhardt, Melissa Bristow, and Naoto Usuyama for their valuable technical feedback and Hannah Richardson for assisting with compliance reviews.

MEDICAL DEVICE DISCLAIMER

BioViL-T was developed for research purposes and is not designed, intended, or made available as a medical device and should not be used to replace or as a substitute for professional medical advice, diagnosis, treatment, or judgment.

The post Accounting for past imaging studies: Enhancing radiology AI and reporting appeared first on Microsoft Research.

Categories: Microsoft

Research Focus: Week of June 5, 2023

Wed, 06/07/2023 - 18:00

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

In this article
  1. The GPT-x Revolution in Medicine, with Peter Lee 
  2. SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning 
  3. Analyzing Leakage of Personally Identifiable Information in Language Models
  4. Automatic Prompt Optimization with "Gradient Descent" and Beam Search

    PODCAST  The GPT-x Revolution in Medicine, with Peter Lee

    Microsoft Research’s Peter Lee recently sat down to discuss the impact of GPT-4 and large language models in medicine on physician-scientist Eric Topol’s Ground Truths podcast. Drawing from Lee’s recent book, The AI Revolution in Medicine, the conversation includes his early experimentation with GPT-4 and his views of its potential as well as its weaknesses. 

    For example: 

    • GPT-4 excels at evaluating and reviewing content, insightfully spotting inconsistencies and missing citations, and perceiving a lack of inclusivity and diversity in terminology 
    • GPT-4 can help reduce medical errors and coach physicians to consider different diagnoses and show greater empathy to patients 
    • GPT-4 has the potential to empower patients with new tools and to democratize access to expert medical information 
    • AI needs appropriate regulation, particularly in the field of medicine 
    Explore the podcast

    NEW RESEARCH  SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning

    Deploying machine learning models in production may allow adversaries to infer sensitive information about training data. Inference risks range from membership inference to data reconstruction attacks. Inspired by the success of games in cryptography to study security properties, some authors describe privacy inference risks in machine learning using a similar game-based formalism. However, adversary capabilities and goals are often stated in subtly different ways from one presentation to the next, which makes it hard to relate and compose results. 

    In a new research paper, SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning, researchers from Microsoft present a game-based framework to systematize the body of knowledge on privacy inference risks in machine learning. In the paper, which was presented at the 2023 IEEE Symposium on Security and Privacy, the authors use this framework to (1) provide a unifying structure for definitions of inference risks, (2) formally establish known relations among definitions, and (3) uncover hitherto unknown relations that would have been difficult to spot otherwise. 

    Read the paper

    NEW RESEARCH  Analyzing Leakage of Personally Identifiable Information in Language Models

    Language models (LMs) are widely deployed for performing several different downstream tasks. However, they have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks. Understanding the risk of LMs leaking personally identifiable information (PII) has received less attention. Dataset curation techniques such as scrubbing reduce, but do not prevent, the risk of PII leakage—in practice, scrubbing is imperfect and must balance the trade-off between minimizing disclosure and preserving the utility of the dataset. On the other hand, it is unclear to what extent algorithmic defenses such as differential privacy, designed to guarantee sentence- or user-level privacy, prevent PII disclosure.  

    In a new research paper, Analyzing Leakage of Personally Identifiable Information in Language Models, researchers from Microsoft introduce rigorous game-based definitions for three types of PII leakage via black-box extraction, inference, and reconstruction attacks with only API access to an LM. In the paper, which was presented at the 2023 IEEE Symposium on Security and Privacy, they empirically evaluate the attacks against GPT-2 models fine-tuned with and without defenses in three domains: case law, health care, and e-mail.  

    Their findings show that differential privacy can largely, but not completely, mitigate PII leakage. Traditional data curation approaches such as PII scrubbing are still necessary to achieve sufficient protection. The authors advocate for the design of less aggressive PII scrubbing techniques that account for the protection afforded by DP and achieve a better privacy/utility trade-off. 

    Read the paper
    Download the code

    NEW RESEARCH  Automatic Prompt Optimization with “Gradient Descent” and Beam Search

    Large Language Models (LLMs) have shown impressive performance as general-purpose agents, but their abilities remain highly dependent on hand-written prompts, which require onerous trial-and-error work. Automatic or semiautomatic procedures would help people write the best prompts while reducing manual effort. In a recent research paper, Automatic Prompt Optimization with “Gradient Descent” and Beam Search, researchers from Microsoft propose a simple and nonparametric solution to this problem. Automatic Prompt Optimization (APO) is inspired by numerical gradient descent to automatically improve prompts, assuming access to training data and an LLM API. The algorithm uses minibatches of data to form natural language “gradients” that criticize the current prompt. The gradients are then “propagated” into the prompt by editing it in the opposite semantic direction of the gradient. These gradient descent steps are guided by a beam search and bandit selection procedure which significantly improves algorithmic efficiency. Preliminary results across three benchmark NLP tasks and the novel problem of LLM jailbreak detection suggest that APO can outperform prior prompt editing techniques and improve an initial prompt’s performance by up to 31%, by using data to rewrite vague task descriptions into more precise annotation instructions. 
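
A simplified sketch of the APO loop, as we read it from the description above, is shown below; `score`, `textual_gradient`, and `apply_gradient` are hypothetical stand-ins for LLM calls, and the bandit selection step is replaced by exhaustive scoring for clarity. This is not the authors' implementation.

```python
# Hedged sketch of Automatic Prompt Optimization: natural-language "gradients"
# from failing minibatch examples guide prompt edits, kept in a small beam.
import random

def apo(initial_prompt, train_data, score, textual_gradient, apply_gradient,
        beam_width=4, steps=6, minibatch_size=16, edits_per_prompt=3):
    beam = [initial_prompt]
    for _ in range(steps):
        candidates = list(beam)
        for prompt in beam:
            batch = random.sample(train_data, min(minibatch_size, len(train_data)))
            errors = [ex for ex in batch if score(prompt, [ex]) == 0.0]   # failing examples
            if not errors:
                continue
            critique = textual_gradient(prompt, errors)       # natural-language "gradient"
            candidates += [apply_gradient(prompt, critique)   # edit the prompt in the
                           for _ in range(edits_per_prompt)]  # opposite semantic direction
        # The paper uses a bandit selection procedure here to keep evaluation cheap;
        # exhaustive scoring on a minibatch is shown for clarity.
        eval_batch = random.sample(train_data, min(minibatch_size, len(train_data)))
        beam = sorted(candidates, key=lambda p: score(p, eval_batch), reverse=True)[:beam_width]
    return beam[0]
```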

    Read the paper

    The post Research Focus: Week of June 5, 2023 appeared first on Microsoft Research.

Categories: Microsoft

3D telemedicine brings better care to underserved and rural communities, even across continents

Tue, 05/30/2023 - 18:00
Introduction

Providing healthcare in remote or rural areas is challenging, particularly specialized medicine and surgical procedures. Patients may need to travel long distances just to get to medical facilities and to communicate with caregivers. They may not arrive in time to receive essential information before their medical appointments and may have to return home before they can receive crucial follow-up care at the hospital. Some patients may wait several days just to meet with their surgeon. This is a very different experience from that of urban or suburban residents or people in more developed areas, where patients can get to a nearby clinic or hospital with relative ease.

In recent years, telemedicine has emerged as a potential solution for underserved remote populations. The COVID-19 pandemic, which prevented many caregivers and patients from meeting in person, helped popularize virtual medical appointments. Yet 2D telemedicine (2DTM) fails to fully replicate the experience of a face-to-face consultation.

To improve the quality of virtual care, researchers from Microsoft worked with external partners in Scotland to conduct the first validated clinical use of a novel, real-time 360-degree 3D telemedicine system (3DTM). This work produced three studies beginning in 2020, in which 3DTM based on Microsoft’s Holoportation™ communication technology outperformed a 2DTM equivalent. Building on the success of this research, the collaborators conducted a follow-up trial in 2022 with partners in Ghana, where they demonstrated the first intercontinental use of 3DTM. This research provides critical progress toward increasing access to specialized healthcare for rural and underserved communities.

3DTM beats 2DTM in Scotland trials

The dramatic expansion of virtual medicine helped fill a void created by COVID restrictions, but it also underscored the need for more realistic remote consultations. While 2DTM can extend the reach of specialized medicine, it fails to provide doctors and surgeons with the same quantity and quality of information they get from an in-person consultation. Previous research efforts had theorized that 3DTM could raise the bar, but the advantages were purely speculative. Until now, real-time 3DTM had been proposed within a research setting only, because of constraints on complexity, bandwidth, and technology.

In December 2019, researchers from Microsoft began discussing the development of a 3DTM system leveraging Microsoft Holoportation communication technology with collaborators from the Canniesburn Plastic Surgery Unit in Glasgow, Scotland, and Korle Bu Teaching Hospital (KBTH) in Accra, Ghana.

With the emergence of COVID-19 in early 2020, this effort accelerated as part of Microsoft Research’s COVID response, with the recognition that it would allow patients, including those with weakened immune systems, to visit a specialist remotely from the relative safety of a local physician’s office, rather than having to travel to the specialist at a hospital with all the concurrent risk of infection.

The initial research included a deployment in Scotland, with 10 specialized cameras capturing patient images, combining them into a 3D model, and transmitting the 3D image to a medical professional. The patient could view the same images as their doctor, which allowed them to discuss them in real time—almost as if they were in the same room.

Figure 1: A patient participates in a consultation with doctors using the 3D Telemedicine system. The screen allows the patient to view the same images as the clinician.

This work produced three separate studies: a clinician feedback study (23 clinicians, November–December 2020), a patient feedback study (26 patients, July–October 2021), and a study focusing on safety and reliability (40 patients, October 2021–March 2022).

Participatory testing demonstrated improved patient metrics with 3DTM versus 2DTM. Although patients still prefer face-to-face visits, 3DTM was rated significantly higher than 2DTM. Overall patient satisfaction increased to 88 percent with 3DTM from 51 percent with 2DTM; realism, or “presence,” rated higher at 80 percent for 3DTM versus 53 percent for 2DTM; and quality as measured by a Telehealth Usability Questionnaire came in at 85 percent for 3DTM compared with 77 percent for 2DTM. Safety and clinical concordance of 3DTM with a face-to-face consultation were 95 percent – equivalent to or exceeding estimates for 2DTM.

Figure 2: In three studies produced during a trial in Scotland, 3D telemedicine outperformed 2D telemedicine in satisfaction, realism and quality, with a direct correlation between realism and satisfaction.

One of the ultimate goals of telemedicine is to bring the quality of remote consultations closer to face-to-face experiences. This data provides the first evidence that Microsoft’s Holoportation communication technology moves 3DTM closer to this goal than a 2D equivalent.

“We showed that we can do it using off-the-shelf components, making it affordable. And we can deploy it and make it reliable enough so that a doctor or a clinical team could use it to conduct consultations,” said Spencer Fowers, Principal Researcher at Microsoft Research.

Ghana study: 3DTM brings doctors and patients closer

After the successful deployment in Scotland, the team turned its focus to Ghana. The research team visited KBTH in February 2022. That began the collaboration on the next phase of the project and the installation of the first known 3D telemedicine system on the African continent.

Ghana has a population of 31 million people but only 16 reconstructive surgeons, 14 of whom work at KBTH. It’s one of the largest hospitals in west Africa and the country’s main hospital for reconstructive surgery and burn treatment. Traveling to Accra can be difficult for people who live in rural areas of Ghana. It may require a 24-hour bus ride just to get to the clinic. Some patients can’t stay long enough to receive follow-up care or adequate pre-op preparation and counseling. Many people in need of surgery never receive treatment, and those that do may receive incomplete or sub-optimal follow-up care. They show up, have surgery, and go home.

“As a doctor, you typically take it for granted that a patient will come back to see you if they have complications. These are actually very complex operations. But too often in Ghana, the doctors may never see the patient again,” said Steven Lo, a reconstructive surgeon at the Canniesburn Plastic Surgery and Burns Unit in Scotland. Lo has worked for years with KBTH and was the project’s clinical lead in Glasgow.

The researchers worked with surgical team members in Scotland and Ghana to build a portable system with enhanced lighting and camera upgrades compared to the original setup deployed in Scotland. This system would enable patients to meet in 3D with doctors in Scotland and in Ghana, both before and after their surgeries, using Microsoft Holoportation communication technology.

Figure 3: As part of a multidisciplinary team (MDT), doctors in Glasgow visit with patients virtually both before and after their in-person visits at the clinic in Accra. Clinicians in Accra manage follow-up care on site.

The results were multiple successful multidisciplinary team (MDT) engagements—both pre-operative and post-operative—supporting surgeries led by visiting doctors from Scotland at KBTH. The 3DTM system using Microsoft  Holoportation communication technology helped doctors communicate to patients precisely what their surgery would entail ahead of time and then ensure that patients had access to any necessary follow-up procedures and post-operation therapy. The medical team in Glasgow used Microsoft Holoportation communication technology to manipulate and mark up 3D images of their patients. Patients watching from Accra could visualize the procedure, including the exact locations where the surgical incisions would occur.

Figure 4: 3DTM enables better planning, safety, and integration among the international team, plus better patient education and follow-up care.

For a patient who came to KBTH to address a chronic problem with his jaw, this visualization gave him a much better understanding than he had had with previous surgeries, said Levi Ankrah​, a reconstructive surgeon at KBTH​ who participated in the remote consultations and the surgeries in Ghana.

“These are quite complex things to explain. But when the patient could actually see it for himself from the outside, that helped him feel more involved with his care and his follow-up plan,” Ankrah said.

Figure 5: A 3D consultation between a patient in Ghana using “the rig” and doctors in Scotland, who can see the patient and transmit details about his upcoming surgery.

Conclusion

One of the ultimate goals of telemedicine is for the quality of remote consultations to approach that of face-to-face visits. The data presented in this research suggests significant progress toward that goal, which is particularly relevant to specialties with a strong 3D focus, such as reconstructive surgery.

Nothing can replace the authenticity and confidence that come from a face-to-face visit with a doctor. But 3DTM shows great promise as a potential state-of-the-art solution for remote telemedicine, replacing current 2DTM virtual visits and driving better access and outcomes for patients.

3D Telemedicine project
Watch the video

Acknowledgments

We would like to acknowledge the following contributors to this project: Andrea Britto; Thiago Spina; Ben Cutler; Chris O’Dowd; Amber Hoak; Spencer Fowers; David Tittsworth; Whitney Hudson; Mike Shepperd; Johnny Johnson; Steven Lo, Canniesburn Regional Plastic Surgery and Burns Unit, Glasgow; Kwame Darko, Levi Ankrah, and Opoku Ampomah, National Reconstructive Plastic Surgery and Burns Center, Korle Bu Teaching Hospital, Accra. 

Additional thanks to: Korle Bu Teaching Hospital, NHS Scotland West of Scotland Innovation Hub, Canniesburn Plastic Surgery and Burns Unit.

Figure 6: Two views of medical team members. On the left (from left to right): Daniel Dromobi Nii Ntreh, Thiago Spina, Spencer Fowers, Chris O’Dowd, Steven Lo, Arnold Godonu, Andrea Britto. 
 On the right, in medical gear (from left to right): Chris O’Dowd, Kwame Darko, Thiago Spina, Andrea Britto and Spencer Fowers.

The post 3D telemedicine brings better care to underserved and rural communities, even across continents appeared first on Microsoft Research.

Categories: Microsoft

Research Focus: Week of May 22, 2023

Wed, 05/24/2023 - 18:00

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

In this article
  1. Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
  2. DNA storage in thermoresponsive microcapsules for repeated random multiplexed data access
  3. Human-centered AI with Ben Shneiderman, Distinguished University Professor—University of Maryland Department of Computer Science
  4. AI and the New Future of Work – call for proposals

    NEW RESEARCH  Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

    Emre Kıcıman, Robert Ness, Amit Sharma, Chenhao Tan

    Recent advances in scaling large language models (LLMs) have led to breakthroughs in AI capabilities, including writing code in programming languages, generating stories, poems, essays, and other texts, and strong performance in certain reasoning tasks. LLMs can even create plausible explanations for their outputs, and update their conclusions given new evidence.

    At the same time, LLMs can make absurd claims and basic errors of logic, mathematics, and complex reasoning, which raises questions about their applicability in societally impactful domains such as medicine, science, law, and policy.

    In a new paper: Causal Reasoning and Large Language Models: Opening a New Frontier for Causality, researchers from Microsoft examine the causal capabilities of LLMs. They find that LLMs, on average, can outperform state-of-the-art causal algorithms in graph discovery and counterfactual inference, and can systematize nebulous concepts like necessity and sufficiency of cause by operating solely on natural language input. They show that by capturing commonsense and domain knowledge about causal mechanisms, LLMs open new frontiers for advancing the research, practice, and adoption of causality. The researchers envision pairing LLMs alongside existing causal methods to reduce the required manual effort that has been a major impediment to widespread adoption of causal analysis. 

    Read the paper

    NEW RESEARCH  DNA storage in thermoresponsive microcapsules for repeated random multiplexed data access

    As the world generates more and more data, data storage capacity has not kept pace. Traditional long-term storage media such as hard disks or magnetic tape have limited durability and storage density. But DNA has an intrinsic capacity for information storage, durability, and high information density.

    In DNA data storage, a large amount of data is stored together, and it is important to perform random access – selective retrieval of individual data files. This is achieved using polymerase chain reaction (PCR), a molecular process that can exponentially amplify a target file. However, this process can damage the data and cause errors. PCR amplification of multiple files simultaneously creates serious undesired DNA crosstalk. Currently one can only read one file at a time, but not a subset of files in a larger set.

    In a recent paper: DNA storage in thermoresponsive microcapsules for repeated random multiplexed data access, researchers from Microsoft and external colleagues report on their work to develop a microcapsule-based PCR random access. By encapsulating individual files in each capsule, DNA files were physically separated, reducing undesired crosstalk. This enabled the simultaneous reading of all 25 files in the pool, without significant errors. The use of microcapsules also allowed DNA files to be recovered after random access, addressing the destructive reads problem and potentially making DNA data storage more economical.

    Read the paper

    MICROSOFT RESEARCH TALK  Human-centered AI with Ben Shneiderman, Distinguished University Professor—University of Maryland Department of Computer Science

    A new synthesis is emerging that integrates AI technologies with human-computer interaction (HCI) to produce human-centered AI (HCAI). Advocates of HCAI seek to amplify, augment, and enhance human abilities, so as to empower people, build their self-efficacy, support creativity, recognize responsibility, and promote social connections. Researchers, developers, business leaders, policy makers, and others are expanding the technology-centered scope of AI to include HCAI ways of thinking.

    In this recent Microsoft Research Talk, Human-Centered AI: Ensuring Human Control While Increasing Automation, Ben Shneiderman discusses his HCAI framework, design metaphors, governance structures, and other ideas drawn from his award-winning new book, Human-Centered AI. The talk by Shneiderman, a Distinguished University Professor in the University of Maryland Department of Computer Science, is hosted by Mary Czerwinski, Partner Researcher and Research Manager with Microsoft Research.

    OPPORTUNITIES AI and the New Future of Work – call for proposals

    The Microsoft New Future of Work Initiative is now accepting proposals to fund academic projects that help maximize the impact of LLMs and related AI systems on how work gets done. This call for proposals targets work that specifically supports the use of LLMs in productivity scenarios. The program plans to distribute five $50,000 USD unrestricted awards to support creative research that redefines what work might mean in various contexts. 

    For example: how can we ensure these new technologies truly accelerate productivity rather than having effects on the margins; how can LLMs achieve these gains by augmenting human labor; what is the future of a ‘document’ in a world where natural language can be so easily remixed and repurposed.  

    Proposals will be accepted through June 5, 2023.

    Learn more & apply

    The post Research Focus: Week of May 22, 2023 appeared first on Microsoft Research.

Categories: Microsoft

REACT — A synergistic cloud-edge fusion architecture

Thu, 05/18/2023 - 19:00

This research paper was accepted by the eighth ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI), which is a premier venue on IoT. The paper describes a framework that leverages cloud resources to execute large deep neural network (DNN) models with higher accuracy to improve the accuracy of models running on edge devices.

Leveraging the cloud and edge concurrently

The internet is evolving towards an edge-computing architecture to support latency-sensitive DNN workloads in the emerging Internet of Things and mobile computing applications domains. However, unlike cloud environments, the edge has limited computing resources and cannot run large, high accuracy DNN models. As a result, past work has focused on offloading some of the computation to the cloud to get around this limitation. However, this comes at the cost of increased latency.

For example, in edge video analytics use cases, such as road traffic monitoring, drone surveillance, and driver assist technology, one can transmit occasional frames to the cloud to perform object detection—a task ideally suited to models hosted on powerful GPUs. On the other hand, the edge handles interpolation of intermediate frames through object tracking—a relatively inexpensive computational task performed using general-purpose CPUs, a low-powered edge GPU, or other edge accelerators (e.g., Intel Movidius Neural Stick). However, for most real-time applications, processing data in the cloud is infeasible due to strict latency constraints.

In our research paper, REACT: Streaming Video Analytics On The Edge With Asynchronous Cloud Support, we propose and demonstrate a novel architecture that leverages both the edge and the cloud concurrently to perform redundant computations at both ends. This helps retain the low latency of the edge while boosting accuracy with the power of the cloud. Our key technical contribution is in fusing the cloud inputs, which are received asynchronously, into the stream of computation at the edge, thereby improving the quality of detection without sacrificing latency.

Fusing edge and cloud detections

Figure 1(a): Orange and green boxes indicate detection from edge and cloud. Tracking performance degrades with every frame, indicated by the fading shades of blue.
Figure 1(b): REACT uses asynchronous cloud detections to correct the box labels and detect more objects.

We illustrate our fusion approach in REACT for object detection in videos. Figure 1 shows the result of object detection using a lightweight edge model. This suffers from both missed objects (e.g., cars in Frame 1 are not detected) and misclassified objects (e.g., the van on the right of the frame that has been misclassified as a car).

To address the challenges of limited edge computation capacity and the drop in accuracy from using edge models, we follow a two-pronged approach. First, since consecutive video frames are spatiotemporally correlated, it suffices to run edge object detection only once every few frames. As illustrated in Figure 1(a), edge detection runs every fifth frame. To interpolate across the intermediate frames, we employ comparatively lightweight object tracking. Second, to improve the accuracy of inference, select frames are asynchronously transmitted to the cloud for inference. Depending on network delay and the availability of cloud resources, cloud detections reach the edge device only after a few frames. Next, the newer cloud detections—objects previously undetected at the edge—are merged with the current frame. To do this, we feed the cloud detection, which was made on an old frame, into another instance of the object tracker to “fast forward” it to the current time. The newly detected objects can then be merged into the current frame so long as the scene does not change abruptly. Figure 1(b) shows a visual result of our approach on a dashcam video dataset.
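
The per-frame control flow described above can be sketched as follows; `edge_detect`, `cloud`, `tracker`, and `fuse` are hypothetical components (a sketch of the fusion step itself follows the next paragraph), and the detection cadences are illustrative rather than REACT's tuned values.

```python
# Conceptual sketch of the REACT frame loop: periodic edge detection, tracking
# on intermediate frames, and asynchronous cloud results merged when they arrive.
EDGE_EVERY, CLOUD_EVERY = 5, 30   # illustrative cadences

def process_stream(frames, edge_detect, cloud, tracker, fuse):
    current = []                                      # current list of detected objects
    for i, frame in enumerate(frames):
        if i % CLOUD_EVERY == 0:
            cloud.submit(frame, frame_id=i)           # asynchronous; result arrives later
        if i % EDGE_EVERY == 0:
            current = fuse(current, edge_detect(frame), source="edge")
        else:
            current = tracker.track(frame, current)   # cheap interpolation between detections
        for old_id, dets in cloud.poll():             # any cloud results that just arrived
            dets = tracker.fast_forward(dets, old_id, i)   # bring stale boxes to the present
            current = fuse(current, dets, source="cloud")
        yield frame, current
```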

Here’s a more detailed description of how REACT combines the edge and cloud detections. Each detection contains objects represented by a ⟨class_label, bounding_box, confidence_score⟩ tuple. Whenever we receive a new detection (either edge or cloud), we purge from the current list the objects that were previously obtained from the same detection source (either cloud or edge). Then we form a zero matrix of size (c, n), where c and n are the number of objects in the current list and in the new detection, respectively. We populate each matrix cell with the Intersection over Union (IoU) value between the corresponding current and new detections, but only if it is greater than 0.5. We then perform a linear sum assignment, which matches pairs of objects with the maximum overlap. For overlapped objects, we modify the confidence values, bounding box, and class label based on the new detections’ source. Specifically, our analysis reveals that edge detection models could correctly localize objects but often had false positives, i.e., they assigned class labels incorrectly. In contrast, cloud detections have higher localization error but lower error for class labels. Finally, newer (unmatched) objects are added to the list of current objects with the returned confidence values, bounding boxes, and class labels. Thus, REACT’s fusion algorithm must consider multiple cases, such as misaligned bounding boxes and class label mismatches, to consolidate the edge and cloud detections into a single list.
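
A compact sketch of this fusion step, using SciPy's linear sum assignment, might look like the following; detections are represented as dictionaries, and the merge policy is simplified relative to the full algorithm, which conditions the update on whether the new source is the edge or the cloud.

```python
# Sketch of the edge-cloud fusion step: IoU matching via linear sum assignment,
# in-place updates for matched objects, and appending unmatched new objects.
# Detections are dicts with keys "label", "bbox" (x1, y1, x2, y2), "score", "source".
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def fuse(current, new, source):
    # Purge objects previously contributed by this source, then match the rest
    # to the new detections by IoU (counted only when above 0.5) and merge.
    current = [d for d in current if d["source"] != source]
    m = np.zeros((len(current), len(new)))
    for i, c in enumerate(current):
        for j, n in enumerate(new):
            v = iou(c["bbox"], n["bbox"])
            if v > 0.5:
                m[i, j] = v
    rows, cols = linear_sum_assignment(-m)            # maximize total overlap
    matched = set()
    for i, j in zip(rows, cols):
        if m[i, j] > 0:                               # matched objects: update in place
            current[i].update(label=new[j]["label"], bbox=new[j]["bbox"],
                              score=new[j]["score"], source=source)
            matched.add(j)
    # Unmatched new objects join the current list.
    current += [dict(new[j], source=source) for j in range(len(new)) if j not in matched]
    return current
```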

Detector | Backbone | Where | #params
Faster R-CNN | ResNet50-FPN | Cloud | 41.5M
RetinaNet | ResNet50-FPN | Cloud | 36.1M
CenterNet | DLA34 | Cloud | 20.1M
TinyYOLOv3 | DN19 | Edge | 8.7M
SSD | MobileNetV2 | Edge | 3.4M

Table 1: Models used in our evaluation

In our experimentation, we leveraged state-of-the-art computer vision algorithms for getting object detections at the edge and the cloud (see Table 1). Further, we use mAP@0.5 (mean average precision at 0.5 IoU), a metric popular in the computer vision community to measure the performance of object detections. Moreover, to evaluate the efficacy of REACT, we looked at two datasets:

  1. VisDrone: as drone-based surveillance
  2. D2City: dashcam-based driver assist

Based on our evaluation, we observed that REACT outperforms baseline algorithms by as much as 50%. Also, we noted that edge and cloud models can complement each other, and overall performance improves due to our edge-cloud fusion algorithm.

As already noted, the object detector runs only once every few frames, and lightweight object tracking is performed on the intermediate frames. Running detection redundantly at both the edge and the cloud allows an application developer to flexibly trade off the frequency of edge versus cloud executions while achieving the same accuracy, as shown in Figure 2. For example, if the edge device experiences thermal throttling, we can pick a lower edge detection frequency (say, once every 20 frames) and complement it with cloud detection once every 30 frames to get an mAP@0.5 of around 22.8. However, if there are fewer constraints at the edge, we can increase the edge detection frequency to once every five frames and reduce cloud detections to once every 120 frames to get similar performance (mAP@0.5 of 22.7). This provides fine-grained programmatic control over the accuracy-resource trade-off.

Figure 2: mAP@0.5 values for varying cloud and edge detection frequency on the D2-City dataset. Similar shading corresponds to similar mAP@0.5.

Further, one can amortize the cost of using cloud resources over multiple edge devices by having these share the same cloud hosted model. Specifically, if an application can tolerate a median latency of up to 500 ms, we can support over 60 concurrent devices at a time using the V100 GPU (Figure 3).

Figure 3: 50th percentile response time vs. number of edge devices that concurrently share a cloud GPU

Conclusion

REACT represents a new paradigm of edge + cloud computing that leverages the resources of each to improve accuracy without sacrificing latency. As we have shown above, the choice between offloading and on-device inference is not binary, and redundant execution at cloud and edge locations complement each other when carefully employed. While we have focused on object detection, we believe that this approach could be employed in other contexts such as human pose-estimation, instance and semantic segmentation applications to have the “best of both worlds.”

The post REACT — A synergistic cloud-edge fusion architecture appeared first on Microsoft Research.

Categories: Microsoft

Achieving Zero-COGS with Microsoft Editor Neural Grammar Checker

Thu, 05/18/2023 - 18:00

Microsoft Editor provides AI-powered writing assistance to millions of users around the world. One of its features that writers of all levels and domains rely on is the grammar checker, which detects grammar errors in a user’s writing and offers suggested corrections and explanations of the detected errors.

The technology behind the grammar checker has evolved significantly since the 1970s, when the first-generation tool was based on simple pattern matching. A major breakthrough occurred in 1997, when Microsoft Word 97 introduced a grammar checker that relied on a full-fledged natural language processing system (Heidorn, 2000), enabling more sophisticated and accurate error detection and correction. Another major breakthrough occurred in 2020, when Microsoft launched a neural grammar checker that leveraged deep neural networks with a novel fluency boost learning and inference mechanism, achieving state-of-the-art results on both CoNLL-2014 and JFLEG benchmark datasets[1,2]. In 2022, Microsoft released a highly optimized version of the Microsoft Editor neural grammar checker on expanded endpoints in Word Win32, Word Online, Outlook Online, and the Editor Browser Extension.

In this blog post, we will describe how we have optimized the Editor neural grammar checker model using the Aggressive Decoding algorithm pioneered by Microsoft Research (MSR) and accelerated with high performance ONNX Runtime (ORT). With the Aggressive Decoding algorithm and ORT optimizations, the server model has achieved ~200% increase in inference speed while saving two-thirds of the cost, with no loss of model prediction quality compared to the previous production model.

But we did not stop there. We also implemented EdgeFormer, MSR’s cutting-edge on-device seq2seq modeling technology, to obtain a lightweight generative language model with competitive performance that can be run on a user’s device, allowing us to achieve the ultimate zero-cost-of-goods-sold (COGS) goal.

Shipping a client model offers three other key benefits in addition to achieving zero-COGS:

  1. Increased privacy. A client model that runs locally on the user’s device does not need to send any personal data to a remote server.
  2. Increased availability. A client model operates offline without relying on network connectivity, bandwidth, or server capacity.
  3. Reduced cost and increased scalability. Shipping a client model to a user’s device removes all the computation that a server would be required to execute, which allows us to ship to more customers.

Additionally, we leveraged GPT-3.5 (the most advanced AI model at the time) to generate high-quality training data and identify and remove low-quality training examples, leading to a boost of model performance.

Innovation: Aggressive Decoding

Behind the AI-powered grammar checker in Microsoft Editor is the transformer model, enhanced by cutting-edge research innovations[1,2,3] from MSR for grammar correction. As with most seq2seq tasks, we used autoregressive decoding for high-quality grammar correction. However, conventional autoregressive decoding is very inefficient as it cannot fully utilize modern computing devices (CPUs, GPUs) due to its low computational parallelism, which results in high model serving costs and prevents us from scaling quickly to more (web/desktop) endpoints.

To address the challenge of serving cost reduction, we adopt the latest decoding innovation, Aggressive Decoding,[4] published by MSR researchers Tao Ge and Furu Wei at ACL 2021. Unlike previous methods that speed up inference at the cost of a drop in prediction quality, Aggressive Decoding is the first efficient decoding algorithm for lossless speedup of seq2seq tasks, such as grammar checking and sentence rewriting. Aggressive Decoding works for tasks whose inputs and targeted outputs are highly similar. It uses the input as the targeted output and verifies it in parallel instead of decoding sequentially, one token at a time, as in conventional autoregressive decoding. As a result, it can substantially speed up the decoding process (handling trillions of requests per year) without sacrificing quality, by better utilizing the powerful parallel computing capabilities of modern computing devices, such as PCs with graphics processing units (GPUs).

The figure above shows how Aggressive Decoding works. If we find a bifurcation during Aggressive Decoding, we discard all the predictions after the bifurcation and re-decode them using conventional one-by-one autoregressive decoding. If we find a suffix match (i.e., some tokens highlighted with the blue dotted lines) between the output and the input during one-by-one re-decoding, we switch back to Aggressive Decoding by copying the tokens (highlighted with the orange dashed lines) that follow the matched tokens in the input to the decoder input, assuming they will be the same. In this way, Aggressive Decoding can guarantee that the generated tokens are identical to those of autoregressive greedy decoding but with far fewer decoding steps, significantly improving decoding efficiency.
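
The following is a deliberately simplified sketch of this verify-then-fall-back loop, with hypothetical `verify` and `step` callables standing in for batched and single-step decoder passes; the production implementation in Fairseq/HuggingFace handles re-alignment and batching more carefully.

```python
# Simplified sketch of Aggressive Decoding: use the input as a draft output,
# verify it in one parallel pass, and fall back to one-by-one decoding after a
# bifurcation until the output re-aligns with an input suffix.
def aggressive_decode(src, verify, step, max_len=200, eos="</s>"):
    out, draft = [], list(src)             # start by drafting the entire input
    while len(out) < max_len:
        if draft:
            # Aggressive phase: verify the whole draft with one parallel decoder pass.
            preds = verify(out, draft)     # model's prediction at every draft position
            k = 0
            while k < len(draft) and preds[k] == draft[k]:
                k += 1                     # accepted tokens up to the bifurcation
            out += draft[:k]
            if k < len(preds):
                out.append(preds[k])       # keep the model's token at the bifurcation
            if out and out[-1] == eos:
                break
        # Fallback phase: decode one token at a time; when a generated token matches
        # an input token again, copy the input tokens that follow it as the next draft
        # and return to the aggressive phase.
        draft = []
        while len(out) < max_len:
            tok = step(out)
            out.append(tok)
            if tok == eos:
                return out
            if tok in src:
                draft = list(src[src.index(tok) + 1:])   # simplified re-alignment
                break
    return out
```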

Offline evaluations

We test Aggressive Decoding in grammar correction and other text rewriting tasks, such as text simplification, with a 6+6 standard transformer as well as a transformer with deep encoder and shallow decoder. All results confirm that Aggressive Decoding can introduce a significant speedup without quality loss.

Model | CoNLL14 F0.5 | Speedup | NLCC-18 F0.5 | Speedup | Wikilarge SARI | Wikilarge BLEU | Speedup
6+6 Transformer (beam=1) | 61.3 | 1 | 29.4 | 1 | 36.1 | 90.7 | 1
6+6 Transformer (AD) | 61.3 | 6.8 | 29.4 | 7.7 | 36.1 | 90.7 | 8

Model | CoNLL14 F0.5 | Speedup
12+2 Transformer (beam=1) | 66.4 | 1
12+2 Transformer (AD) | 66.4 | 4.2

Aggressive Decoding works even better on more powerful computing devices that excel at parallel computing (e.g., the A100 GPU).

Online evaluation

We ran an A/B experiment comparing a Marian server model with an equal-size server model using Aggressive Decoding on ONNX Runtime. The latter shows a 2x+ improvement @p50 and a 3x+ improvement @p95 and @p99 over the Marian runtime with conventional autoregressive decoding on CPU, as shown in the graph below. Moreover, it offers better efficiency stability than the previous autoregressive decoding, whose latency varies drastically (approximately proportional to the sentence length), because Aggressive Decoding substantially reduces the decoding cost to only a few steps of parallel computing regardless of the sentence length. This substantial inference speedup resulted in a two-thirds COGS reduction in the production endpoints.

Both offline and online evaluations confirm that Aggressive Decoding allows us to achieve significant COGS reduction without any loss of model prediction quality. Building on this result, we generalize[5] Aggressive Decoding to more general seq2seq tasks. Its high efficiency with lossless quality makes Aggressive Decoding likely to become the de facto decoding standard for seq2seq tasks and to play a vital role in the cost reduction of seq2seq model deployment.

Accelerate Grammar Checker with ONNX Runtime

ONNX Runtime is a high-performance engine, developed by Microsoft, that runs AI models across various hardware targets. A wide range of ML-powered Microsoft products leverage ONNX Runtime for inferencing performance acceleration. To further reduce inferencing latency, the PyTorch Grammar Checker with Aggressive Decoding was exported to ONNX format using the PyTorch-ONNX exporter, then inferenced with ONNX Runtime, which enables transformer optimizations and quantization for CPU performance acceleration as well as model size reduction. A number of techniques are enabled in this end-to-end solution to run the advanced grammar checker model efficiently.

PyTorch provides a built-in function to export the PyTorch model to ONNX format with ease. To support the unique architecture of the grammar checker model, we enabled export of complex nested control flows to ONNX in the exporter. During this effort, we also extended the official ONNX specification on sequence type and operators to represent more complex scenarios (i.e., the autoregressive search algorithm). This eliminates the need to separately export model encoder and decoder components and stitch them together later with additional sequence generation implementation for production. With sequence type and operators support in PyTorch-ONNX exporter and ONNX Runtime, we were able to export one single ONNX graph, including encoder and decoder and sequence generation, which brings in both efficient computation and simpler inference logic. Furthermore, the shape type inference component of PyTorch ONNX exporter is enhanced to produce a valid ONNX model under stricter ONNX shape type constraints.
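
As a generic illustration of this export path (not the actual Editor model, whose exported graph also embeds the decoding loop), the following shows a toy module exported with torch.onnx.export and executed with ONNX Runtime:

```python
# Generic PyTorch-to-ONNX export and ONNX Runtime inference sketch; the toy
# module and file name are placeholders, not the production grammar model.
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

class ToyEncoder(nn.Module):
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, vocab)

    def forward(self, input_ids):
        return self.proj(self.emb(input_ids)).argmax(-1)   # token predictions

model = ToyEncoder().eval()
dummy = torch.randint(0, 1000, (1, 16))

torch.onnx.export(
    model, (dummy,), "toy.onnx",
    input_names=["input_ids"], output_names=["tokens"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"}},    # variable batch/sequence length
    opset_version=17,
)

sess = ort.InferenceSession("toy.onnx", providers=["CPUExecutionProvider"])
out = sess.run(None, {"input_ids": dummy.numpy().astype(np.int64)})[0]
print(out.shape)
```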

The innovative Aggressive Decoding algorithm introduced in the grammar checker model was originally implemented in Fairseq. To make it ONNX compatible, we reimplemented this Aggressive Decoding algorithm in HuggingFace for easy exporting. When diving into the implementation, we identified certain components that are not directly supported in the ONNX standard operator set (e.g., bifurcation detector). There are two approaches for exporting unsupported operators to ONNX and running with ONNX Runtime. We can either create a graph composing several standard ONNX operators that have equivalent semantics or implement a custom operator in ONNX Runtime with more efficient implementation. ONNX Runtime custom operator capability allows users to implement their own operators to run within ONNX Runtime with more flexibility. This is a tradeoff between implementation cost and performance. Considering the complexity of these components, the composition of standard ONNX operators might become a performance bottleneck. Hence, we introduced custom operators in ONNX Runtime to represent these components.

ONNX Runtime enables transformer optimizations and quantization, showing very promising performance gain on both CPU and GPU. We further enhanced encoder attention fusion and decoder reshape fusion for the grammar checker model. Another big challenge of supporting this model is multiple model subgraphs. We implemented subgraphs fusion in ONNX Runtime transformers optimizer and quantization tool. ONNX Runtime Quantization was applied to the whole model, further improving throughput and latency.

Quality Enhancement by GPT-3.5 LLMs

To further improve the precision and recall of the models in production, we employ the powerful GPT-3.5 as the teacher model. Specifically, the GPT-3.5 model works in the following two ways to help improve the result:

  • Training data augmentation: We fine-tune the GPT-3.5 model and use it to generate labels for massive unannotated texts. The annotations obtained are verified to be of high quality and can be used as augmented training data to enhance the performance of our model.
  • Training data cleaning: We leverage the powerful zero/few-shot capability of GPT-3.5 to distinguish between high-quality and low-quality training examples. The annotations of the identified low-quality examples are then regenerated by the GPT-3.5 model, resulting in a cleaner and higher-quality training set, which directly enhances the performance of our model. A minimal sketch of this filtering step follows this list.
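
Here is a minimal sketch of the data-cleaning idea, assuming a hypothetical `llm(prompt)` completion function; the actual GPT-3.5 prompts and filtering criteria used in production are not reproduced here.

```python
# Hedged sketch of zero-shot quality filtering with a teacher LLM; `llm` is a
# hypothetical completion function, not a real API client.
def is_high_quality(source_sentence: str, corrected_sentence: str, llm) -> bool:
    prompt = (
        "You are checking grammar-correction training data.\n"
        f"Source: {source_sentence}\n"
        f"Correction: {corrected_sentence}\n"
        "Answer YES if the correction fixes the grammar errors without changing "
        "the meaning, otherwise answer NO."
    )
    return llm(prompt).strip().upper().startswith("YES")

# Low-quality pairs can then be re-annotated by the teacher model, e.g.:
# cleaned = [(s, c if is_high_quality(s, c, llm) else llm(f"Correct the grammar: {s}"))
#            for (s, c) in training_pairs]
```
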
EdgeFormer: Cost-effective parameterization for on-device seq2seq modeling

In recent years, the computational power of client devices has greatly increased, allowing for the use of deep neural networks to achieve the ultimate zero-COGS goal. However, running generative language models on these devices still poses a significant challenge, as the memory efficiency of these models must be strictly controlled. The traditional methods of compression used for neural networks in natural language understanding are often not applicable when it comes to generative language models.

To ship a client grammar model, the model should be highly efficient (e.g., within 100ms latency), which has already been solved by Aggressive Decoding, mentioned earlier. Moreover, the client model must be memory-efficient (e.g., within a 50MB RAM footprint), which is the main bottleneck for a powerful (generative) transformer model (usually over 50 million parameters) to run on a client device.

To address this challenge, we introduce EdgeFormer[6], a cutting-edge on-device seq2seq modeling technology for obtaining lightweight generative language models with competitive performance that can be easily run on a user’s computer.

The main idea of EdgeFormer lies in two principles that we proposed for cost-effective parameterization:

  • Encoder-favored parameterization
  • Load-balanced parameterization

We designed EdgeFormer with the above principles of cost-effective parameterization, allowing each parameter to be utilized to its maximum potential, which achieves competitive results despite the stringent computational and memory constraints of client devices.

Based on EdgeFormer, we further propose EdgeLM – the pretrained version of EdgeFormer, which is the first publicly available pretrained on-device seq2seq model that can be easily fine-tuned for seq2seq tasks with strong results. EdgeLM serves as the foundation model of the grammar client model to realize the zero-COGS goal, which achieves over 5x model size compression with minimal quality loss compared to the server model.

Inference cost reduction to empower client-device deployment

Model deployment on client devices has strict requirements on hardware usage, such as memory and disk size, to avoid interference with other user applications. ONNX Runtime shows advantages for on-device deployment along with its lightweight engine and comprehensive client-inference focused solutions, such as ONNX Runtime quantization and ONNX Runtime extensions. In addition, to maintain service quality while meeting shipping requirements, MSR introduced a series of optimization techniques, including system-aware model optimization, model metadata simplification, and deferred parameter loading as well as customized quantization strategy. Based on the EdgeFormer modeling, these system optimizations can further reduce the memory cost by 2.7x, without sacrificing model performance.

We will elaborate on each one in the following sections: 

System-aware model optimization. Because the model is represented as a dataflow graph, the major memory cost for this model comes from the many subgraphs generated. As shown in the figure below, a branch in the PyTorch code is mapped to a subgraph. Therefore, we optimize the model implementation to reduce the usage of branch instructions. In particular, we leverage greedy search as the decoder search algorithm, since beam search contains more branch instructions. The usage of this method can reduce memory cost by 38%.

Mapping of PyTorch model and ONNX model graph

Model metadata simplification. Also shown in the figure above, the model contains a lot of metadata that consumes memory, such as the node name and type, input and output, and parameters. To reduce the cost, we simplify the metadata to keep only the basic required information for inference. For example, the node name is simplified from a long string to an index. Besides that, we optimize the model graph implementation in ONNX Runtime to keep just one copy of the metadata, rather than duplicating all the available metadata each time a subgraph is generated.

Deferred weight loading in ONNX Runtime. Current model files include both the model graphs and weights, which are then loaded into memory together during model initialization. However, this increases memory usage as shown in the figure below, because the weights will be copied repeatedly during model graph parsing and conversion. To avoid this, we save model graphs and weights separately. During initialization in ONNX Runtime, only the graphs are loaded into memory for actual parsing and conversion. The weights, on the other hand, still reside on disk with only the pointer kept in memory, through file mapping. The actual weight loading to memory will be deferred until the model inference. This technique can reduce the peak memory cost by 50%.
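
Conceptually, deferred weight loading behaves like the following Python sketch using a memory-mapped weight file; the real mechanism is implemented inside ONNX Runtime's native session rather than in Python, and the file name is a placeholder.

```python
# Conceptual illustration of deferred, file-mapped weight loading: no weight
# bytes are copied into RAM at initialization; pages are faulted in on demand.
import numpy as np

class LazyWeights:
    def __init__(self, path: str):
        self._path = path
        self._mm = None                               # nothing loaded at init

    def get(self):
        if self._mm is None:                          # deferred until first inference
            # np.load with mmap_mode maps the file instead of copying it into memory.
            self._mm = np.load(self._path, mmap_mode="r")
        return self._mm

weights = LazyWeights("decoder_weights.npy")          # init: only a path is kept
# ... later, during the first inference call:
# w = weights.get()                                   # pages loaded on demand
```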

Deferred weights loading by file mapping during model initialization

ONNX Runtime quantization and ONNX Runtime extensions. Quantization is a well-known model compression technique that brings in both performance acceleration and model size reduction while sacrificing model accuracy. ONNX Runtime Quantization offers diverse tuning knobs to allow us to apply customized quantization strategy. Specifically, we customize the strategy as post-training, dynamic, UINT8, per-channel and all-operator quantization, for this model for minimum accuracy impact. Onnxruntime-extensions provides a set of ONNX Runtime custom operators to support the common pre- and post-processing operators for vision, text, and natural language processing models. With it, the pre- and post-processing for this model, including tokenization, string manipulation, and so on, can be integrated into one self-contained ONNX model file, leading to improved performance, simplified deployment, reduced memory usage, and better portability.
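
For reference, a basic post-training dynamic quantization call with the ONNX Runtime quantization tool looks like the sketch below; the file names are placeholders, and the production setup further customizes per-channel and operator coverage beyond what is shown.

```python
# Minimal post-training dynamic quantization sketch with ONNX Runtime.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="grammar_checker.onnx",        # FP32 model exported earlier (placeholder name)
    model_output="grammar_checker.int8.onnx",  # quantized output model
    weight_type=QuantType.QUInt8,              # UINT8 weights; activations quantized dynamically
)
```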

Conclusion

In this blog post, we have presented how we leveraged the cutting-edge research innovations from MSR and ONNX Runtime to optimize the server grammar checker model and achieve the ultimate zero-COGS goal with the client grammar checker model. The server model has achieved ~200% increase in inference speed while saving two-thirds of the cost, with no loss of model prediction quality. The client model has achieved over 5x model size compression with minimal quality loss compared to the server model. These optimizations have enabled us to scale quickly to more web and desktop endpoints and provide AI-powered writing assistance to millions of users around the world.

The innovation shared in this blog post is just the first milestone in our long-term continuous effort of COGS reduction for generative AI models. Our proposed approach is not limited to accelerating the neural grammar checker; it can be easily generalized and applied more broadly to scenarios such as abstractive summarization, translation, or search engines to accelerate large language models for COGS reduction[5,8], which is critical not only for Microsoft but also for the entire industry in the artificial general intelligence (AGI) era.

Reference

[1] Tao Ge, Furu Wei, Ming Zhou: Fluency Boost Learning and Inference for Neural Grammatical Error Correction. In ACL 2018.

[2] Tao Ge, Furu Wei, Ming Zhou: Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study. https://arxiv.org/abs/1807.01270

[3] Xin Sun, Tao Ge, Shuming Ma, Jingjing Li, Furu Wei, Houfeng Wang: A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-lingual Language Model. In IJCAI 2022.

[4] Xin Sun, Tao Ge, Furu Wei, Houfeng Wang: Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding. In ACL 2021.

[5] Tao Ge, Heming Xia, Xin Sun, Si-Qing Chen, Furu Wei: Lossless Acceleration for Seq2seq Generation with Aggressive Decoding. https://arxiv.org/pdf/2205.10350.pdf

[6] Tao Ge, Si-Qing Chen, Furu Wei: EdgeFormer: A Parameter-efficient Transformer for On-device Seq2seq Generation. In EMNLP 2022.

[7] Heidorn, George. “Intelligent Writing Assistance.” Handbook of Natural Language Processing. Robert Dale, Hermann L. Moisl, and H. L. Somers, editors. New York: Marcel Dekker, 2000: 181-207.

[8] Nan Yang, Tao Ge, Liang Wang, Binxing Jiao, Daxin Jiang, Linjun Yang, Rangan Majumder, Furu Wei: Inference with Reference: Lossless Acceleration of Large Language Models. https://arxiv.org/abs/2304.04487

The post Achieving Zero-COGS with Microsoft Editor Neural Grammar Checker appeared first on Microsoft Research.

Categories: Microsoft

Large-language models for automatic cloud incident management

Tue, 05/16/2023 - 18:00

This research was accepted by the IEEE/ACM International Conference on Software Engineering (ICSE), which is a forum for researchers, practitioners, and educators to gather, present, and discuss the most recent innovations, trends, experiences, and issues in the field of software engineering.

The Microsoft 365 Systems Innovation research group has a paper accepted at the 45th International Conference on Software Engineering (ICSE), widely recognized as one of the most prestigious research conferences on software engineering. This paper, Recommending Root-Cause and Mitigation Steps for Cloud Incidents using Large Language Models, focuses on using state-of-the-art large language models (LLMs) to help generate recommendations for cloud incident root cause analysis and mitigation plans. With a rigorous study on real production incidents and analysis of several LLMs in different settings using semantic and lexical metrics as well as human evaluation, the research shows the efficacy and future potential of using AI for resolving cloud incidents.

Challenges of building reliable cloud services

Building highly reliable hyperscale cloud services such as Microsoft 365 (M365), which supports the productivity of hundreds of thousands of organizations, is very challenging. This includes the challenge of quickly detecting incidents, then performing root cause analysis and mitigation.

Our recent research starts with understanding the fundamentals of production incidents: we analyze the life cycle of incidents, then determine the common root causes, mitigations, and engineering efforts for resolution. In a previous paper: How to Fight Production Incidents? An Empirical Study on a Large-scale Cloud Service, which won a Best Paper award at SoCC’22, we provide a comprehensive, multi-dimensional empirical study of production incidents from Microsoft Teams. From this study, we envision that automation should support incident diagnosis and help identify the root cause and mitigation steps to quickly resolve an incident and minimize customer impact. We should also leverage past lessons to build resilience for future incidents. We posit that adopting AIOps and using state-of-the-art AI/ML technologies can help achieve both goals, as we show in the ICSE paper.

Adapting large-language models for automated incident management

Recent breakthroughs in AI have enabled LLMs to develop a rich understanding of natural language. They can understand and reason over large volumes of data and complete a diverse set of tasks, such as code completion, translation, and Q&A. Given the complexities of incident management, we sought to evaluate the effectiveness of LLMs in analyzing the root cause of production incidents and generating mitigation steps.

Figure 1: Leveraging GPT-3.x for root cause analysis and mitigation

In our recently published ICSE paper, we demonstrated the usefulness of LLMs for production incident diagnosis for the first time. When an incident ticket is created, the author specifies a title for the incident and describes any relevant information, such as error messages, anomalous behavior, and other details that might help with resolution. We used the title and the summary of a given incident as the input for LLMs and generated root cause and mitigation steps, as shown in Figure 1.
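To make this input format concrete, here is a minimal sketch (in Python) that assembles an incident’s title and summary into a prompt requesting a root cause or mitigation recommendation. The field names and prompt wording are illustrative assumptions, not the exact format used in the paper.

    # Illustrative only: field names and prompt phrasing are assumptions,
    # not the exact format used in the ICSE paper.

    def build_incident_prompt(title: str, summary: str, task: str = "root cause") -> str:
        """Assemble an incident's title and summary into a single prompt string.

        task: "root cause" or "mitigation", selecting which recommendation to request.
        """
        return (
            f"Incident title: {title}\n"
            f"Incident summary: {summary}\n\n"
            f"Recommend the most likely {task} for this incident:"
        )

    if __name__ == "__main__":
        prompt = build_incident_prompt(
            title="Spike in 5xx errors for the checkout service",
            summary="Error rate rose from 0.1% to 12% after the 14:00 UTC deployment; "
                    "logs show timeouts calling the payment dependency.",
            task="mitigation",
        )
        print(prompt)  # This string would be sent to a fine-tuned GPT-3.x model.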

We did a rigorous study on more than 40,000 incidents generated from more than 1000 services and compared several LLMs in zero-shot, fine-tuned, and multi-task settings. We find that fine-tuning the GPT-3 and GPT-3.5 models significantly improves the effectiveness of LLMs for incident data.

Effectiveness of GPT-3.x models at finding root causes (each cell shows Top1 / Top5)

Model                  | BLEU-4        | ROUGE-L       | METEOR        | BERTScore     | BLEURT        | NUBIA
RoBERTa                | 4.21 / NA     | 12.83 / NA    | 9.89 / NA     | 85.38 / NA    | 35.66 / NA    | 33.94 / NA
CodeBERT               | 3.38 / NA     | 10.17 / NA    | 6.58 / NA     | 84.88 / NA    | 33.19 / NA    | 39.05 / NA
Curie                  | 3.40 / 6.29   | 9.04 / 15.44  | 7.21 / 13.65  | 84.90 / 86.36 | 32.62 / 40.08 | 33.52 / 49.76
Codex                  | 3.44 / 6.25   | 8.98 / 15.51  | 7.33 / 13.82  | 84.85 / 86.33 | 32.50 / 40.11 | 33.64 / 49.77
Davinci                | 3.34 / 5.94   | 8.53 / 15.10  | 6.67 / 12.95  | 83.13 / 84.41 | 31.06 / 38.61 | 35.28 / 50.79
Davinci-002            | 4.24 / 7.15   | 11.43 / 17.2  | 10.42 / 16.8  | 85.42 / 86.78 | 36.77 / 42.87 | 32.3 / 51.34
%gain for Davinci-002  | 23.26 / 13.67 | 26.44 / 10.90 | 42.16 / 21.56 | 0.61 / 0.49   | 12.72 / 6.88  | -8.45 / 1.08

Table 1: Lexical and semantic performance of different LLMs

In our offline evaluation, we compared the performance of GPT-3.5 against three GPT-3 models by computing several semantic and lexical metrics (which measure text similarity) between the generated recommendations and the ground-truth root cause or mitigation steps recorded in the incident management (IcM) portal. The average gains for the GPT-3.5 model across the different tasks were as follows (a short scoring sketch appears after the list): 

  1. For root cause and mitigation recommendation tasks, Davinci-002 (GPT-3.5) provided at least 15.38% and 11.9% gains over all the GPT-3 models, respectively, as shown in Table 1.
  2. When we generated mitigation plans by adding the root cause as input to the model, the GPT-3.5 model provided at least an 11.16% gain over the GPT-3 models.
  3. LLMs performed better on machine-reported incidents (MRIs) than on customer-reported incidents (CRIs), due to the repetitive nature of the MRIs.
  4. Fine-tuning LLMs with incident data improved performance significantly. A fine-tuned GPT-3.5 model improved the average lexical similarity score by 45.5% for root cause generation and 131.3% for mitigation generation over the zero-shot setting (i.e., inferencing directly on a pretrained GPT-3 or GPT-3.5 model).
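To make the lexical side of this scoring concrete, the sketch below (assuming the nltk and rouge-score packages are installed) computes BLEU-4 and ROUGE-L for a set of candidate recommendations against a ground-truth string and keeps the best candidate, in the spirit of the Top-5 columns in Table 1. It is an illustration, not the paper’s evaluation harness, and the example strings are hypothetical.

    # Rough illustration of lexical scoring (not the paper's evaluation code).
    # Requires: pip install nltk rouge-score
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
    from rouge_score import rouge_scorer

    def lexical_scores(candidates, reference):
        """Return the best BLEU-4 and ROUGE-L across all candidates (Top-k style)."""
        smooth = SmoothingFunction().method1
        rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
        ref_tokens = reference.split()

        best_bleu = max(
            sentence_bleu([ref_tokens], c.split(), smoothing_function=smooth)
            for c in candidates
        )
        best_rouge = max(rouge.score(reference, c)["rougeL"].fmeasure for c in candidates)
        return best_bleu, best_rouge

    if __name__ == "__main__":
        ground_truth = "restart the payment gateway pods and roll back the 14:00 deployment"
        generated = [
            "roll back the latest deployment and restart the payment gateway",
            "increase the timeout for the payment dependency",
        ]
        bleu, rouge_l = lexical_scores(generated, ground_truth)
        print(f"Best BLEU-4: {bleu:.3f}, Best ROUGE-L: {rouge_l:.3f}")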
Looking through the incident owners’ eyes

In addition to analysis with semantic and lexical metrics, we also interviewed the incident owners to evaluate the effectiveness of the generated recommendations. Overall, GPT-3.5 outperforms GPT-3 in a majority of the metrics. More than 70% of on-call engineers gave a rating of 3 out of 5 or better for the usefulness of recommendations in a real-time production setting.

Looking forward

With future versions of LLMs coming, we expect performance on automatic incident resolution to improve further, and the need for fine-tuning may decrease. Yet we are still at an early stage, with many open research questions in this field. For instance, how can we incorporate additional context about the incident, such as discussion entries, logs, service metrics, and even dependency graphs of the impacted services, to improve the diagnosis? Another challenge is staleness, since the models need to be retrained frequently with the latest incident data. To address these challenges, we are working on combining the latest LLMs with retrieval-augmented approaches to improve incident diagnosis via a conversational interface, as shown in Figure 2.

Figure 2: Workflow of retrieval-augmented root cause analysis

Moreover, ChatGPT can be actively integrated into the “discussion” of the incident diagnosis. By collecting evidence from available documents and logs, the model can generate coherent, contextual, natural-sounding responses to inquiries and offer corresponding suggestions, thereby facilitating the discussion and accelerating the incident resolution process. We believe this could deliver a step-function improvement in the overall incident management process, with contextual and meaningful root cause analysis and mitigation, thereby reducing the significant human effort required and bolstering reliability and customer satisfaction.
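As a simplified sketch of the retrieval-augmented idea in Figure 2 (and not the production system), the Python below retrieves the most similar past incidents with TF-IDF similarity and prepends them as context to a diagnosis prompt. The incident text, function names, and prompt wording are illustrative assumptions.

    # Simplified sketch of retrieval-augmented context building (illustrative only).
    # Requires: pip install scikit-learn
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def retrieve_similar(new_incident, past_incidents, k=2):
        """Return the k past incidents most similar to the new one (TF-IDF cosine)."""
        vectorizer = TfidfVectorizer(stop_words="english")
        matrix = vectorizer.fit_transform(past_incidents + [new_incident])
        sims = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
        top = sims.argsort()[::-1][:k]
        return [past_incidents[i] for i in top]

    def build_context_prompt(new_incident, past_incidents):
        """Prepend retrieved incidents (and, in practice, their resolutions) as context."""
        retrieved = retrieve_similar(new_incident, past_incidents)
        context = "\n".join(f"- {inc}" for inc in retrieved)
        return (f"Similar past incidents:\n{context}\n\n"
                f"New incident:\n{new_incident}\n\nLikely root cause:")

    if __name__ == "__main__":
        history = [
            "Checkout latency spike caused by exhausted database connection pool",
            "Login failures after expired TLS certificate on the auth gateway",
            "5xx errors from payment service following a bad configuration rollout",
        ]
        print(build_context_prompt("Payment service returning 5xx after config change", history))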

Acknowledgement

This post includes contributions from Toufique Ahmed during his internship at Microsoft.

The post Large-language models for automatic cloud incident management appeared first on Microsoft Research.


Highlights from CHI 2023

Mon, 05/15/2023 - 19:21

The ways in which people are able to interact with technologies can have a profound effect on a technology’s utility and adoptability. Building computing tools and services around people’s natural styles of work, communication, and play can give technology the value it needs to have meaningful impact. For decades, human-computer interaction (HCI) has examined the relationship between people and computers to help maximize the capabilities of each across a range of experiences and situations.

The ACM CHI Conference on Human Factors in Computing Systems (CHI) is a renowned meeting ground for top talent in the HCI field and a showcase for some of its most compelling work. Hosted April 23 through April 28, this year’s conference drew more than 4,500 participants from 79 countries. Contributions from Microsoft researchers and their collaborators demonstrated the breadth of work inspired by the myriad and diverse ways people use computing today and will in the future.

Check out a few highlights from this year’s conference below, including researchers’ efforts to better understand the role of wellbeing in work, to augment memory through our sense of smell, and to bridge the gap between programmers and code-generating models, work that received an honorable mention at the conference.

“What It Wants Me To Say”: Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models
CHI 2023 Honorable Mention

Michael Xieyang Liu, Advait Sarkar, Carina Negreanu, Ben Zorn, Jack Williams, Neil Toronto, Andy Gordon

Programming languages are an extremely powerful form of user interface. They also happen to be extremely difficult to learn, especially for non-expert end-user programmers who lack training in computing. What if end-user programmers could instead use a natural language they already know? This prospect can be realized through large language models (LLMs): deep neural networks using the transformer architecture, trained on large corpora, and fine-tuned to generate code from natural language. Despite impressive benchmark performance, LLMs are beset with issues in practical use. Lab and field studies have shown that the mapping between natural language and code is poorly understood, that generated code can contain subtle bugs, and that generated code can be difficult to verify.

In their paper, researchers consider the specific problem of abstraction matching: when the user has well-formed intent, how do they select an utterance from the near infinite space of naturalistic utterances that they believe the system will reliably map to a satisfactory solution? This involves “matching” the utterance to the right level of “abstraction” by specifying the utterance at a level of granularity and detail that matches the set of actions the system can take and selecting suitable words and grammar.

View publication

Workplace Rhythm Variability and Emotional Distress in Information Workers

Subigya Kumar Nepal, Javier Hernandez, Judith Amores, Mehrab Bin Morshed, Robert Lewis, Hemma Prafullchandra, Mary Czerwinski

Regularity in daily activities has been linked to positive wellbeing outcomes, but previous studies have mainly focused on clinical populations and traditional daily activities such as sleep and exercise. This research extends prior work by examining the regularity of both self-reported and digital activities of 49 information workers in a four-week naturalistic study. Findings suggest that greater variability in self-reported mood, job demands, lunch time, and sleep quality may be associated with increased stress, anxiety, and depression. However, when it comes to digital activity–based measures, greater variability in rhythm is associated with reduced emotional distress. This study expands our understanding of workers and the potential insights that can be gained from analyzing technology interactions and wellbeing.

View publication


Olfactory Wearables for Targeted Memory Reactivation

Judith Amores, Nirmita Mehra, Bjoern Rasch, Pattie Maes

This paper investigates how a smartphone-controlled olfactory wearable might improve memory recall. Researchers conducted a within-subjects experiment with 32 participants using the device and not using the device (control). In the experimental condition, bursts of odor were released during visuo-spatial memory navigation tasks, which also had a language learning component, and rereleased during sleep the following night in the subjects’ home. The researchers found that compared with control, there was an improvement in memory performance when using the scent wearable in memory tasks that involved walking in a physical space. Furthermore, participants recalled more objects and translations when re-exposed to the same scent during the recall test in addition to during sleep. These effects were statistically significant, and in the object recall task, they also persisted for more than a week. This experiment demonstrates a potential practical application of olfactory interfaces that can interact with a user during wake, as well as sleep, to support memory.

View publication

AdHocProx: Sensing Mobile, Ad-Hoc Collaborative Device Formations using Dual Ultra-Wideband Radios

Richard Li, Teddy Seyed, Nicolai Marquardt, Eyal Ofek, Steve Hodges, Mike Sinclair, Hugo Romat, Michel Pahud, Jatin Sharma, William A. S. Buxton, Ken Hinckley, Nathalie Henry Riche

In their paper, researchers present AdHocProx, a system that uses device-relative, inside-out sensing to augment co-located collaboration across multiple devices without recourse to externally anchored beacons or even reliance on Wi-Fi connectivity.

AdHocProx achieves this via sensors, including dual ultra-wideband (UWB) radios for sensing distance and angle to other devices in dynamic, ad-hoc arrangements and capacitive grip to determine where the user’s hands hold the device and to partially correct for the resulting UWB signal attenuation. All spatial sensing and communication take place via the side-channel capability of the UWB radios, suitable for small-group collaboration across up to four devices (eight UWB radios).

Together, these sensors detect proximity and natural, socially meaningful device movements to enable contextual interaction techniques. Researchers find that AdHocProx can obtain 95 percent accuracy recognizing various ad-hoc device arrangements in an offline evaluation, with participants particularly appreciative of interaction techniques that automatically leverage proximity-awareness and relative orientation among multiple devices.

View publication

Escapement: A Tool for Interactive Prototyping with Video via Sensor-Mediated Abstraction of Time

Molly Jane Nicholas, Nicolai Marquardt, Michel Pahud, Nathalie Henry Riche, Hugo Romat, Christopher Collins, David Ledo, Rohan Kadekodi, Badrish Chandramouli, Ken Hinckley

This paper introduces Escapement, a video prototyping tool built around a powerful new concept for prototyping screen-based interfaces: flexibly mapping sensor values to dynamic playback control of videos. This recasts the time dimension of video mockups as sensor-mediated interaction.

This abstraction of time as interaction, which the researchers dub video-escapement prototyping, empowers designers to rapidly explore and viscerally experience direct touch or sensor-mediated interactions across one or more device displays. The system affords cross-device and bidirectional remote (telepresent) experiences via cloud-based state sharing across multiple devices. This makes Escapement especially potent for exploring multi-device, dual-screen, or remote-work interactions for screen-based applications. Researchers share the results of observations of long-term usage of video-escapement techniques with experienced interaction designers and articulate design choices for supporting a reflective, iterative, and open-ended creative design process.
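The core mapping can be sketched in a few lines of Python: a raw sensor reading is normalized and used to scrub to a frame of a pre-recorded clip. This toy example only illustrates the idea of treating time as a sensor-mediated dimension; it is not part of the Escapement system, and all names are hypothetical.

    # Toy sketch of sensor-driven video scrubbing (illustrative of the mapping idea only).

    def sensor_to_frame(sensor_value: float, sensor_min: float, sensor_max: float,
                        total_frames: int) -> int:
        """Map a raw sensor reading onto a frame index of a pre-recorded video clip."""
        # Normalize the reading to [0, 1], clamping out-of-range values.
        span = max(sensor_max - sensor_min, 1e-9)
        t = min(max((sensor_value - sensor_min) / span, 0.0), 1.0)
        # Scale to the clip's frame range; playback position tracks the sensor.
        return round(t * (total_frames - 1))

    if __name__ == "__main__":
        # e.g., a touch position between 0 and 1080 px driving a 240-frame mockup video
        for x in (0, 270, 540, 1080):
            print(x, "->", sensor_to_frame(x, 0, 1080, total_frames=240))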

View publication

Your Mileage May Vary: Case Study of a Robotic Telepresence Pilot Roll-out for a Hybrid Knowledge Work Organization

Andriana Boudouraki, Joel E. Fischer, Stuart Reeves, Sean Rintel

Organizations wishing to maintain employee satisfaction for hybrid collaboration need to explore flexible solutions that provide value for both remote and on-site employees. This case study reports on the roll-out of a telepresence robot pilot at Microsoft Research Cambridge to test whether robots would provide enjoyable planned and unplanned encounters between remote and on-site employees. Researchers describe the work that was undertaken to prepare for the roll-out, including the occupational health and safety assessment, systems for safety and security, and the information for employees on safe and effective use practices. The pilot ended after three months, and robot use has been discontinued after weighing the opportunities against low adoption and other challenges. The researchers discuss the pros and cons within this organizational setting and make suggestions for future work and roll-outs.

View publication

Focus Time for Wellbeing and Work Engagement of Information Workers 

Koustuv Saha, Shamsi Iqbal 

Having little time for focused work is a major challenge in information work. While research has explored computing-assisted user-facing solutions for protecting time for focused work, there’s limited empirical evidence about the effectiveness of these features on wellbeing and work engagement. To address this problem, researchers study the effects of automatically scheduling time for focused work on people’s work calendars using the “focus time” feature on Outlook calendars. The researchers conducted an experimental study over six weeks with 15 treatment and 10 control participants, who responded to survey questions on wellbeing and work engagement throughout the study. They found that the treatment participants showed higher wellbeing, including increased excitement, relaxation, and satisfaction, and decreased anger, frustration, tiredness, and stress. The researchers examine the needs, benefits, and challenges of scheduling focus time and discuss the importance of, and design recommendations for, mechanisms and tools supporting focused work.

View publication

The post Highlights from CHI 2023 appeared first on Microsoft Research.


Microsoft at EuroSys 2023: Systems innovation across the stack to help support an easier, faster, safer, and smarter cloud

Fri, 05/12/2023 - 18:48

EuroSys 2023 is the premier systems conference in Europe, and 2023 marks its 18th edition. Sponsored by ACM SIGOPS Europe and hosted May 8 to May 12, the conference covers a wide range of topics, including operating systems, real-time and networked systems, storage and middleware, and distributed, parallel, and embedded computing, as well as their implications for applications and hardware.

As in previous years, Microsoft has a strong presence in the conference, drawing from research and production teams in Asia, Europe, and the United States, including Azure Systems Research, in collaboration with many universities. This work spans areas including systems for machine learning, serverless computing, datacenter networking, caching, and debugging. We’re also participating in several of the associated workshops and in key aspects of the organization, including Senior Principal Researcher Dushyanth Narayanan as the program co-chair for the main conference.

Here are some of the highlights (see below for more information about Microsoft at EuroSys, including the authors of the published papers):

Datacenter networking

The paper “Saba: Rethinking Datacenter Network Allocation from Application’s Perspective” proposes allocating datacenter network bandwidth according to applications’ sensitivity to bandwidth, achieving significant performance gains compared to fair sharing. In “FlexPass: A Case for Flexible Credit-based Transport for Datacenter Networks,” the authors make the case to incrementally deploy new proactive, credit-based transport protocols in the datacenter.

Serverless computing

In serverless computing, the paper “Palette Load Balancing: Locality Hints for Serverless Functions” proposes adding locality to Function-as-a-Service (FaaS) serverless systems, closing the performance gap between serverful data-intensive applications and their serverless implementation. In “With Great Freedom Comes Great Opportunity: Rethinking Resource Allocation for Serverless Functions,” the authors revisit the FaaS interface and find that correctly choosing memory, CPU, and architecture for each serverless function can allow both providers and customers to improve cost and performance. Finally, in “Groundhog: Efficient Request Isolation in FaaS,” the authors present a system that enables efficient snapshots for better isolation between function invocations. Microsoft is also represented in the SErverless Systems, Applications and MEthodologies (SESAME) workshop with a keynote and work-in-progress paper.

Concurrency debugging

In “WAFFLE: Exposing Memory Ordering Bugs Efficiently with Active Delay Injection,” the authors tackle the difficult problem of finding memory ordering bugs, a type of concurrency bug caused by incorrect timing between a memory access to a particular object and the object’s initialization or deallocation. Their proposed tool uses delay injection techniques and, through key innovations, can expose more bugs with less overhead than the state of the art.

Systems for machine learning

In “SiloD: A Co-design of Caching and Scheduling for Deep Learning Clusters,” the proposed framework treats cache and remote I/O as first-class resources and can integrate different state-of-the-art deep learning scheduling policies in a unified scheduling framework. 

Caching

In “FrozenHot Cache: Rethinking Cache Management for Modern Hardware,” the authors introduce a generic approach to improve the scalability of traditional list-based caches, such as least recently used (LRU), by separating the objects into two regions: a frozen region that serves requests for hot objects with minimal latency by eliminating promotion and locking and a regular dynamic region that uses the existing cache design to achieve workload adaptivity. 
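As a hedged sketch of the two-region idea (not the authors’ implementation), the following Python cache serves hot keys from a read-only frozen region with no promotion or locking and falls back to a conventional LRU for the rest of the working set.

    # Hedged sketch of a two-region cache in the spirit of FrozenHot (not the paper's code).
    from collections import OrderedDict

    class TwoRegionCache:
        def __init__(self, hot_items: dict, dynamic_capacity: int):
            # Frozen region: hot objects, read-only, no promotion bookkeeping.
            self._frozen = dict(hot_items)
            # Dynamic region: ordinary LRU for the remaining working set.
            self._lru = OrderedDict()
            self._capacity = dynamic_capacity

        def get(self, key):
            if key in self._frozen:            # fast path: no reordering, no eviction
                return self._frozen[key]
            if key in self._lru:               # regular LRU hit: promote to most-recent
                self._lru.move_to_end(key)
                return self._lru[key]
            return None                        # miss

        def put(self, key, value):
            if key in self._frozen:            # frozen entries are not updated in place
                return
            self._lru[key] = value
            self._lru.move_to_end(key)
            if len(self._lru) > self._capacity:
                self._lru.popitem(last=False)  # evict least recently used

    if __name__ == "__main__":
        cache = TwoRegionCache(hot_items={"home": "<html>...</html>"}, dynamic_capacity=2)
        cache.put("a", 1); cache.put("b", 2); cache.put("c", 3)   # "a" is evicted
        print(cache.get("home"), cache.get("a"), cache.get("c"))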


Microsoft papers published at EuroSys with their authors:

  1. Saba: Rethinking Datacenter Network Allocation from Application’s Perspective
    M.R. Siavash Katebzadeh, University of Edinburgh; Paolo Costa, Microsoft Research; Boris Grot, University of Edinburgh
  2. FlexPass: A Case for Flexible Credit-based Transport for Datacenter Networks
    Hwijoon Lim, Jaehong Kim, KAIST; Inho Cho, MIT CSAIL; Keon Jang, MPI-SWS, Rubrik; Wei Bai, Microsoft Research; Dongsu Han, KAIST
  3. Palette Load Balancing: Locality Hints for Serverless Functions
    Mania Abdi, Northeastern University; Samuel Ginzburg, Princeton; Xiayue Charles Lin, Anyscale; Jose Faleiro, unaffiliated; Gohar Irfan Chaudhry, Íñigo Goiri, Ricardo Bianchini, Daniel S. Berger, Rodrigo Fonseca, Azure Systems Research
  4. With Great Freedom Comes Great Opportunity: Rethinking Resource Allocation for Serverless Functions
    Muhammad Bilal, Instituto Superior Técnico (ULisboa), INESC-ID, UCLouvain; Marco Canini, KAUST; Rodrigo Fonseca, Azure Systems Research; Rodrigo Rodrigues, Instituto Superior Técnico (ULisboa), INESC-ID
  5. Groundhog: Efficient Request Isolation in FaaS
    Mohamed Alzayat, Max Planck Institute for Software Systems (MPI-SWS); Jonathan Mace, Microsoft Research; Peter Druschel, Deepak Garg, Max Planck Institute for Software Systems (MPI-SWS)
  6. WAFFLE: Exposing Memory Ordering Bugs Efficiently with Active Delay Injection
    Bogdan Alexandru Stoica, Shan Lu, University of Chicago; Madanlal Musuvathi, Suman Nath, Microsoft Research
  7. SiloD: A Co-design of Caching and Scheduling for Deep Learning Clusters
    Hanyu Zhao, Peking University; Zhenhua Han, Microsoft Research; Zhi Yang, Peking University; Quanlu Zhang, Microsoft Research; Mingxia Li, USTC; Fan Yang, Qianxi Zhang, Microsoft Research; Binyang Li, Microsoft; Yuqing Yang, Lili Qiu, Microsoft Research; Lintao Zhang, BaseBit Technologies; Lidong Zhou, Microsoft Research
  8. FrozenHot Cache: Rethinking Cache Management for Modern Hardware
    Ziyue Qiu, University of Science and Technology of China, Microsoft Research, Carnegie Mellon University; Juncheng Yang, Carnegie Mellon University; Juncheng Zhang, University of Science and Technology of China; Cheng Li, University of Science and Technology of China, Anhui Province Key Laboratory of High Performance Computing; Xiaosong Ma, Qatar Computing Research Institute, HBKU; Qi Chen, Mao Yang, Microsoft Research; Yinlong Xu, University of Science and Technology of China, Anhui Province Key Laboratory of High Performance Computing

EuroSys 2023 Organization Committee:

Program Committee:

SESAME Workshop

  • Keynote: Rodrigo Fonseca, Azure Systems Research
  • Work in Progress: The Neglected Cost of Serverless Cluster Management
    Lazar Cvetković, ETH Zürich; Rodrigo Fonseca, Azure Systems Research; Ana Klimovic, ETH Zürich

EuroSys Doctoral Workshop

PaPoC Workshop

EuroMLSys Workshop

SysTEX Workshop

The post Microsoft at EuroSys 2023: Systems innovation across the stack to help support an easier, faster, safer, and smarter cloud appeared first on Microsoft Research.


Research Focus: Week of May 8, 2023

Wed, 05/10/2023 - 18:00

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

In this article
  1. Microsoft’s danah boyd awarded MIT’s Morison Prize
  2. Microsoft’s Nicole Immorlica receives 2023 SIGecom Test of Time Award
  3. Microsoft’s Lorin Crawford named 2023 COPSS Emerging Leader
  4. Microsoft researchers receive Test of Time award for personalized news recommendation work
  5. A Frequency Domain Approach to Predict Power System Transients
  6. Inference with Reference: Lossless Acceleration of Large Language Models
  7. High-throughput ab initio reaction mechanism exploration in the cloud with automated multi-reference validation

    AWARD: Microsoft’s danah boyd awarded MIT’s Morison Prize

    danah boyd, a partner researcher at Microsoft Research, has been awarded MIT’s Morison Prize in Science, Technology, and Society for outstanding work combining humanistic values with effectiveness in the world of practical affairs, particularly in science and technology.

    Dr. boyd, who is also a Distinguished Visiting Professor at Georgetown University, is currently conducting a multi-year ethnographic study of the U.S. census to understand how data are made legitimate. Her previous studies have focused on media manipulation, algorithmic bias, privacy practices, social media, and teen culture. 

    To learn more, see the Microsoft Research Summit presentation Statistical Imaginaries: An Ode to Responsible Data Science or the publication Differential Perspectives: Epistemic Disconnects Surrounding the U.S. Census Bureau’s Use of Differential Privacy.

    Award announcement

    AWARD: Microsoft’s Nicole Immorlica receives 2023 SIGecom Test of Time Award

    Nicole Immorlica, a Senior Principal Researcher with Microsoft Research New England, has been awarded the 2023 SIGecom Test of Time Award for her work on a 2005 paper on matching markets. The award from the Association for Computing Machinery (ACM) recognizes “an influential paper or series of papers published between ten and twenty-five years ago that has significantly impacted research or applications exemplifying the interplay of economics and computation.” 

    In the award-winning paper, Marriage, honesty, and stability, Immorlica and a co-author explored centralized two-sided markets, such as the medical residency market, that match participants by running a stable marriage algorithm. While no matching mechanism based on a stable marriage algorithm can guarantee truthfulness as a dominant strategy, the paper showed that in certain probabilistic settings, truthfulness is the best strategy for the participants.

    Award announcement

    AWARD: Microsoft’s Lorin Crawford named 2023 COPSS Emerging Leader

    Lorin Crawford, a principal researcher at Microsoft Research New England, has been named a 2023 COPSS Emerging Leader by the Committee of Presidents of Statistical Societies. The award announcement cited Crawford’s path-breaking research combining theory and methods of mathematics, statistics and computing to generate new knowledge and insight about the genetic basis of disease, and exceptional mentoring of students from multiple scientific disciplines.

    The award recognizes the important role of early-career statistical scientists in shaping the future of their discipline. The selection criteria are designed to highlight contributions in areas not traditionally recognized by other early-career awards in the statistical sciences.

    Crawford, who is also a faculty member at Brown University’s School of Public Health, focuses on developing novel and efficient algorithms that address complex problems in quantitative genetics, cancer pharmacology, molecular genomics, and geometric morphometrics.

    Award announcement

    AWARD: Microsoft researchers receive Test of Time award for personalized news recommendation work

    A paper co-authored by two Microsoft researchers has received a 2023 Seoul Test of Time Award from the International World Wide Web Conference Committee (IW3C2). The 2010 paper, A Contextual-Bandit Approach to Personalized News Article Recommendation, was written by John Langford and Robert Schapire, along with two industry colleagues. The authors proposed a new approach for personalized recommendation using contextual bandit algorithms. According to the IW3C2, the paper now has more than 2,730 citations and has become foundational research in the area of recommendation systems.

    The award announcement also states: “The paper addressed fundamental challenges in real-world recommendation systems via computationally efficient algorithms grounded in learning theory. It also showed that recommendation algorithms can be reliably evaluated offline, enabling algorithm selection without operational impact, and that contextual bandits can yield significant gains in user engagement.”

    Award announcement

    NEW RESEARCH: A Frequency Domain Approach to Predict Power System Transients

    The dynamics of power grids are governed by a large number of nonlinear differential and algebraic equations (DAEs). To safely run the system, operators need to check that the states described by these DAEs stay within prescribed limits after various potential faults. However, current numerical solvers of DAEs are often too slow for real-time system operations. In addition, detailed system parameters are often not exactly known. Machine learning approaches have been proposed to reduce the computational efforts, but existing methods generally suffer from overfitting and failures to predict unstable behaviors.

    In a new paper: A Frequency Domain Approach to Predict Power System Transients, Microsoft researchers propose a novel framework to predict power system transients by learning in the frequency domain. The intuition is that although the system behavior is complex in the time domain, relatively few dominant modes exist in the frequency domain. Therefore, the researchers learn to predict by constructing neural networks with Fourier transform and filtering layers. System topology and fault information are encoded by taking a multi-dimensional Fourier transform, allowing researchers to leverage the fact that the trajectories are sparse both in time and spatial frequencies. This research shows that the proposed approach does not need detailed system parameters, greatly speeds up prediction computations and is highly accurate for different fault types.
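    To give a flavor of this approach, here is a minimal sketch (assuming PyTorch) of a learnable frequency-domain filtering layer: the input trajectory is transformed with a real FFT, multiplied by a learned complex filter, and transformed back to the time domain. It illustrates the general idea only, not the architecture used in the paper.

        # Illustrative sketch of a learnable frequency-domain filtering layer (assumes PyTorch);
        # this shows the general idea, not the architecture from the paper.
        import torch
        import torch.nn as nn

        class SpectralFilter(nn.Module):
            def __init__(self, signal_length: int):
                super().__init__()
                n_freq = signal_length // 2 + 1          # length of the rfft output
                # Learnable complex filter, stored as real and imaginary parts.
                self.weight_re = nn.Parameter(torch.randn(n_freq) * 0.1)
                self.weight_im = nn.Parameter(torch.randn(n_freq) * 0.1)

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                # x: (batch, time) trajectory, e.g. a synthetic voltage or frequency signal
                spec = torch.fft.rfft(x, dim=-1)                      # to the frequency domain
                filt = torch.complex(self.weight_re, self.weight_im)  # learned filter
                filtered = spec * filt                                # reweight dominant modes
                return torch.fft.irfft(filtered, n=x.shape[-1], dim=-1)  # back to time domain

        if __name__ == "__main__":
            layer = SpectralFilter(signal_length=128)
            trajectory = torch.randn(4, 128)             # batch of 4 synthetic trajectories
            print(layer(trajectory).shape)               # torch.Size([4, 128])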

    Read the paper

    NEW RESEARCH: Inference with Reference: Lossless Acceleration of Large Language Models

    The growing use of large foundation models like GPT-3.5/4 for real-world applications has raised concerns about high deployment costs. While general methodologies such as quantization, pruning, compression, and distillation help reduce those costs, output tokens must still be decoded sequentially, one by one, at test time, which poses significant challenges for deploying LLMs at scale.

    In a new paper: Inference with Reference: Lossless Acceleration of Large Language Models, Microsoft researchers study accelerating LLM inference by improving the efficiency of autoregressive decoding. In multiple real-world applications, this research shows that an LLM’s output tokens often come from its context. For example, in a retrieval-augmented generation scenario for a search engine, an LLM’s context usually includes relevant documents retrieved from an external corpus as reference according to a query, and its output usually contains many text spans found in the reference (i.e., retrieved documents). Motivated by this observation, the researchers propose an LLM accelerator (LLMA) to losslessly speed inference with references. Its improved computational parallelism allows LLMA to achieve over 2x speed-up for LLMs, with identical generation results as greedy decoding, in many practical generation scenarios where significant overlap between in-context reference and outputs exists. The researchers are collaborating with the Bing search team to explore integrating this technique into snippet/caption generation, Bing chat, and other potential scenarios.
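    A stripped-down sketch of the copy-based drafting idea follows: find where the tail of the current output appears in the reference text and propose the tokens that come next as a draft continuation. This is illustrative Python only; it omits the step in which the LLM verifies the drafted tokens in parallel, and it is not the LLMA implementation.

        # Stripped-down sketch of reference-based drafting (illustrative, not the LLMA code).
        # The parallel verification of drafted tokens by the LLM is omitted.

        def propose_from_reference(output_tokens, reference_tokens, match_len=3, draft_len=5):
            """If the last `match_len` output tokens appear in the reference,
            return the `draft_len` reference tokens that follow as a draft continuation."""
            if len(output_tokens) < match_len:
                return []
            suffix = output_tokens[-match_len:]
            for i in range(len(reference_tokens) - match_len + 1):
                if reference_tokens[i:i + match_len] == suffix:
                    start = i + match_len
                    return reference_tokens[start:start + draft_len]
            return []   # no match: fall back to ordinary one-token-at-a-time decoding

        if __name__ == "__main__":
            reference = "the outage was caused by an expired certificate on the auth gateway".split()
            generated_so_far = "we found that the outage was caused by".split()
            print(propose_from_reference(generated_so_far, reference))
            # -> ['an', 'expired', 'certificate', 'on', 'the']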

    Read the paper

    NEW RESEARCH: High-throughput ab initio reaction mechanism exploration in the cloud with automated multi-reference validation

    Quantum chemical calculations on atomistic systems have evolved into a standard approach to studying molecular matter. But these calculations often involve a significant amount of manual input and expertise. Most of these calculations could be automated, alleviating the need for expertise in software and hardware accessibility.

    In a new paper: High-throughput ab initio reaction mechanism exploration in the cloud with automated multi-reference validation, researchers from Microsoft present the AutoRXN workflow, an automated workflow for exploratory high-throughput electronic structure calculations of molecular systems.

    This workflow (i) uses density functional theory methods to deliver minimum and transition-state structures and corresponding energies and properties, (ii) launches coupled cluster calculations for optimized structures to provide more accurate energy and property estimates, and (iii) evaluates multi-reference diagnostics to back-check the coupled cluster results and subjects potential multi-configurational cases to automated multi-configurational calculations.

    All calculations take place in a cloud environment and support massive computational campaigns. Key features of all components of the AutoRXN workflow are autonomy, stability, and minimum operator interference.

    The paper was recently published in The Journal of Chemical Physics.

    Read the paper

    The post Research Focus: Week of May 8, 2023 appeared first on Microsoft Research.
