Microsoft Research Publications

Keep current with all the latest Microsoft Research Publications and Technical Reports

Making Sense of Temporal Queries with Interactive Visualization

Sat, 05/07/2016 - 08:00
As real-time monitoring and analysis become increasingly important, researchers and developers turn to data stream management systems (DSMSs) for fast, efficient ways to pose temporal queries over their datasets. However, these systems are inherently complex, and even database experts find it difficult to understand the behavior of DSMS queries. To help analysts better understand these temporal queries, we developed StreamTrace, an interactive visualization tool that breaks down, step by step, how a temporal query processes a given dataset. The design of StreamTrace is based on input from expert DSMS users; we evaluated the system with a lab study of programmers who were new to streaming queries. Results from the study demonstrate that StreamTrace can help users verify that queries behave as expected and isolate the regions of a query that may be causing unexpected results.
Categories: Microsoft

Reasoning in Vector Space: An Exploratory Study of Question Answering

Mon, 05/02/2016 - 08:00
Question answering tasks have shown remarkable progress with distributed vector representations. In this paper, we investigate the recently proposed Facebook bAbI tasks, which consist of twenty different categories of questions that require complex reasoning. Because all previous work on bAbI uses end-to-end models, errors could come from either an imperfect understanding of semantics or certain steps of the reasoning. For clearer analysis, we propose two vector space models inspired by Tensor Product Representation (TPR) to perform knowledge encoding and logical reasoning based on common-sense inference. Together they achieve near-perfect accuracy on all categories, including positional reasoning and path finding, which have proved difficult for most previous approaches. We hypothesize that the difficulties in these categories are due to their multi-relational nature, in contrast to the uni-relational character of the other categories. Our exploration sheds light on designing more sophisticated datasets and moves one step toward integrating the transparent and interpretable formalism of TPR into existing learning paradigms.
Categories: Microsoft

Filo: consolidated consensus as a cloud service

Sun, 05/01/2016 - 08:00
Categories: Microsoft

Password Guidance

Sun, 05/01/2016 - 08:00
This paper provides Microsoft’s recommendations for password management based on current research and lessons from our own experience as one of the largest Identity Providers (IdPs) in the world. It covers recommendations for end users and identity administrators. Microsoft sees over 10 million username/password pair attacks every day. This gives us a unique vantage point to understand the role of passwords in account takeover. The guidance in this paper is scoped to users of Microsoft’s identity platforms (Azure Active Directory, Active Directory, and Microsoft account) though it generalizes to other platforms.
Categories: Microsoft

Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups

Sun, 05/01/2016 - 08:00
We propose a new method for training computationally efficient and compact convolutional neural networks (CNNs) using a novel sparse connection structure that resembles a tree root. Our sparse connection structure facilitates a significant reduction in computational cost and number of parameters of state-of-the-art deep CNNs without compromising accuracy. We validate our approach by using it to train more efficient variants of state-of-the-art CNN architectures, evaluated on the CIFAR10 and ILSVRC datasets. Our results show similar or higher accuracy than the baseline architectures with much less compute, as measured by CPU and GPU timings. For example, for ResNet 50, our model has 40% fewer parameters, 45% fewer floating point operations, and is 31% (12%) faster on a CPU (GPU). For the deeper ResNet 200 our model has 25% fewer floating point operations and 44% fewer parameters, while maintaining state-of-the-art accuracy. For GoogLeNet, our model has 7% fewer parameters and is 21% (16%) faster on a CPU (GPU).
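To make the parameter savings concrete, the sketch below counts the weights of a single convolutional layer when its filters are split into groups. It uses the standard grouped-convolution arithmetic rather than the paper's specific architectures; the function name `conv_params` and the layer sizes are illustrative assumptions.

```python
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weight count (biases ignored) of a k x k convolution with filter groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("groups must divide both channel counts")
    # With filter groups, each filter sees only c_in / groups input channels.
    return k * k * (c_in // groups) * c_out


# Hypothetical layer: 256 input channels, 256 filters, 3 x 3 kernels.
dense = conv_params(256, 256, 3)
grouped = conv_params(256, 256, 3, groups=4)
print(dense, grouped, dense // grouped)
```

Within one layer, g groups cut the weights (and the multiply-adds per output position) by a factor of g. The whole-network reductions quoted above are smaller, presumably because not every layer is grouped to the same degree.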
Categories: Microsoft

Measuring Neural Net Robustness with Constraints

Sun, 05/01/2016 - 08:00
Despite having high accuracy, neural nets have been shown to be susceptible to adversarial examples, where a small perturbation to an input can cause it to become mislabeled. We propose metrics for measuring the robustness of a neural net and devise a novel algorithm for approximating these metrics based on an encoding of robustness as a linear program. We show how our metrics can be used to evaluate the robustness of deep neural nets with experiments on the MNIST and CIFAR-10 datasets. Our algorithm generates more informative estimates of robustness metrics than estimates based on existing algorithms. Furthermore, we show how existing approaches to improving robustness “overfit” to adversarial examples generated using a specific algorithm. Finally, we show that our techniques can be used to further improve neural net robustness, both according to the metrics that we propose and according to previously proposed metrics.
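For intuition only, consider a linear binary classifier sign(w·x + b). The smallest L-infinity perturbation that moves an input onto the decision boundary has the closed form |w·x + b| / ||w||_1. The sketch below assumes robustness at a point is measured as this minimum L-infinity distance, which the abstract does not state. Deep nets have no such closed form, which is the gap the linear-programming approximation is meant to fill. The function name is hypothetical.

```python
from typing import Sequence


def linf_robustness(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Minimum L-infinity perturbation that moves x onto the boundary w.x + b = 0."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    l1_norm = sum(abs(wi) for wi in w)
    if l1_norm == 0:
        raise ValueError("w must be non-zero")
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    # The L1 norm is dual to the L-infinity norm, hence the denominator.
    return abs(score) / l1_norm


# Illustrative values, not taken from the paper.
print(linf_robustness([1.0, -2.0], 0.5, [1.0, 1.0]))
```

A larger value means every coordinate of the input must change by more before the label can flip.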
Categories: Microsoft

Things We Own Together: Sharing Possessions at Home

Sun, 05/01/2016 - 08:00
Sharing is an important facet of human relationships, yet there is a lack of research on how people share ownership of possessions. This paper reports on a study that investigates shared ownership of physical and digital possessions through interviews with couples and families in 13 households. We offer a more nuanced definition of shared ownership and show that certain practices, which are central to sharing physical objects, are not supported in the sharing of digital content. We suggest potential approaches to address this, focusing in particular on how the sharing of possessions plays a role in the building of relationships and is done against a backdrop of trust.
Categories: Microsoft

Modeling Signals Embedded in a Euclidean Domain

Sun, 05/01/2016 - 08:00
Graphs are often used to model signals defined on a set of points embedded in a Euclidean domain. Examples are distributed sensor readings, measures of congestion in a transportation network, samples in a feature space, and colors on 3D point clouds. However, it may be better to model such signals as samples of a Gaussian Process defined on the Euclidean domain. We show, on a 3D point cloud example, that Karhunen-Loève Transforms (KLTs) based on Gaussian Process models can have significantly higher energy compaction and coding gain than KLTs based on sparse graph models. The latter KLTs are known as Graph Transforms; we call the former Gaussian Process Transforms.
Categories: Microsoft

Functions of Code-Switching in Tweets: An Annotation Scheme and Some Initial Experiments

Sun, 05/01/2016 - 08:00
Code-switching (CS) is very common among multilinguals, who switch between two or more languages when communicating with each other. CS is not confined to speech; it also appears in written text, and the popularity of social media has made such platforms a common venue for CS in text form. This has created a need for computational processing of code-switched data. In this study, we focus on CS between English and Hindi in a Twitter corpus, which consists of informal text. Using this data, we carry out a detailed linguistic study of various aspects of CS. Understanding, processing, and generating code-switched text requires annotated code-switched data. Hence, in this paper, we present an annotation scheme for the functions of CS in Hindi-English (Hi-En) code-switched tweets, along with some initial experiments. We focus on CS in text data from social media, whereas earlier studies have focused on CS in spoken data from a small number of speakers.
Categories: Microsoft

The Bones of the System: A Case Study of Logging and Telemetry at Microsoft

Sun, 05/01/2016 - 08:00
Large software organizations are transitioning to event data platforms as they culturally shift to better support data-driven decision making. This paper offers a case study at Microsoft during such a transition. Through qualitative interviews of 28 participants and a quantitative survey of 1,823 respondents, we catalog a diverse set of activities that leverage event data sources, identify challenges in conducting these activities, and describe tensions that emerge in data-driven cultures as event data flow through these activities within the organization. We find that the use of event data spans every job role in our interviews and survey, that different perspectives on event data create tensions between roles or teams, and that professionals report social and technical challenges across activities.
Categories: Microsoft

Automated Synthesis and Analysis of Switching Gene Regulatory Networks

Sun, 05/01/2016 - 08:00
Studying the gene regulatory networks (GRNs) that govern how cells change into specific cell types with unique roles throughout development is an active area of experimental research. The fate specification process can be viewed as a program prescribing the system dynamics, governed by a network of genetic interactions. To investigate the possibility that GRNs are not fixed but rather change their topology, for example as cells progress through commitment, we introduce the concept of Switching Gene Regulatory Networks (SGRNs) to enable the modelling and analysis of network reconfiguration. We define the synthesis problem of constructing SGRNs that are guaranteed to satisfy a set of constraints representing experimental observations of cell behaviour. We propose a solution to this problem that employs methods based upon Satisfiability Modulo Theories (SMT) solvers, and evaluate the feasibility and scalability of our approach by considering a set of synthetic benchmarks exhibiting possible biological behaviour of cell development. We outline how our approach is applied to a more realistic biological system, by considering a simplified network involved in the processes of neuron maturation and fate specification in the mammalian cortex.
Categories: Microsoft

Improved bounded-strength decoupling schemes for local Hamiltonians

Sun, 05/01/2016 - 08:00
We address the task of switching off the Hamiltonian of a system by removing all internal and system-environment couplings. We propose dynamical decoupling schemes that use only bounded-strength controls for quantum many-body systems with local system Hamiltonians and local environmental couplings. To do so, we introduce the combinatorial concept of balanced-cycle orthogonal arrays (BOAs) and show how to construct them from classical error-correcting codes. The derived decoupling schemes may be useful as a primitive for more complex schemes, e.g., for Hamiltonian simulation. For the case of n qubits and a 2-local Hamiltonian, the length of the resulting decoupling scheme scales as O(n log(n)), improving over the previously best-known schemes, which scaled quadratically with n. More generally, using balanced-cycle orthogonal arrays constructed from families of BCH codes, we show that bounded-strength decoupling for any local Hamiltonian can be achieved.
Categories: Microsoft

Information Flows in Encrypted Databases

Sun, 05/01/2016 - 08:00
In encrypted databases, sensitive data is protected from an untrusted server by encrypting columns using partially homomorphic encryption schemes and storing encryption keys in a trusted client. However, encrypting columns and protecting encryption keys does not ensure confidentiality: sensitive data can leak during query processing due to information flows through the trusted client. In this paper, we propose SecureSQL, an encrypted database that partitions query processing between an untrusted server and a trusted client while ensuring the absence of information flows. Our evaluation based on OLTP benchmarks suggests that SecureSQL can protect against explicit flows with low overheads (< 30%). However, protecting against implicit flows can be expensive because it precludes the use of key database optimizations and introduces additional round trips between client and server.
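As a rough illustration of what "partially homomorphic" means here, the sketch below implements a toy Paillier cryptosystem, an additively homomorphic scheme. The abstract does not name the schemes SecureSQL uses, so Paillier is an assumption chosen for illustration. The primes are tiny and the randomness is fixed so the example is reproducible; neither is safe for real use, and all function names are hypothetical.

```python
from math import gcd

# Toy key material held by the trusted client (insecure sizes, illustration only).
P, Q = 47, 59
N = P * Q                                      # public modulus
N2 = N * N
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)   # private: lcm(p-1, q-1)
MU = pow(LAM, -1, N)                           # private: needs Python 3.8+


def encrypt(m: int, r: int) -> int:
    """Encrypt 0 <= m < N with generator g = N + 1 and blinding value r."""
    if not 0 <= m < N:
        raise ValueError("message out of range")
    if gcd(r, N) != 1:
        raise ValueError("r must be coprime to N")
    return (pow(1 + N, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Recover the plaintext using the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """What an untrusted server could do: add plaintexts without any key."""
    return (c1 * c2) % N2


# In practice r would be drawn at random for every encryption.
total = add_encrypted(encrypt(12, r=5), encrypt(30, r=7))
print(decrypt(total))
```

Multiplying two ciphertexts yields an encryption of the sum, so a server could evaluate something like a SUM over an encrypted column while only the client can read the result. Operations the scheme does not support are the ones that must be routed through the trusted client, which is where the information flows discussed above can arise.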
Categories: Microsoft

Networks of Gratitude: Structures of Thanks and User Expectations in Workplace Appreciation Systems

Sun, 05/01/2016 - 08:00
Appreciation systems (platforms for users to exchange thanks and praise) are becoming common in the workplace, where employees share appreciation, managers are notified, and aggregate scores are sometimes made visible. Who do people thank on these systems, and what do they expect from each other and their managers? After introducing the design affordances of 13 appreciation systems, we discuss a system we call Gratia, in use at a large multinational company for over four years. Using logs of 422,000 appreciation messages and user surveys, we explore the social dynamics of use and ask if use of the system addresses the recognition problem. We find that while thanks is mostly exchanged among employees at the same level and in different parts of the company, addressing the recognition problem, managers do not always act on that recognition in ways that employees expect.
Categories: Microsoft

Universal Models of Multivariate Temporal Point Processes

Sun, 05/01/2016 - 08:00
With the rapidly increasing availability of event stream data, there is growing interest in multivariate temporal point process models to capture both qualitative and quantitative features of this type of data. Recent research on multivariate point processes has focused on inference and estimation problems for restricted classes of models such as continuous time Bayesian networks, Markov jump processes, Gaussian Cox processes, and Hawkes processes. In this paper, we study the expressive power and learnability of Graphical Event Models (GEMs), the analogue of directed graphical models for multivariate temporal point processes. In particular, we describe a set of GEMs and show that this class can universally approximate any smooth multivariate temporal point process. We also describe a universal learning algorithm for this class of GEMs and show, under a mild set of assumptions, learnability results for both the dependency structures and distributions in this class. Our consistency results demonstrate the possibility of learning about both qualitative and quantitative dependencies from rich event stream data.
Categories: Microsoft

Surviving an "Eternal September" — How an Online Community Managed a Surge of Newcomers

Sun, 05/01/2016 - 08:00
We present a qualitative analysis of interviews with participants in the NoSleep community within Reddit where millions of fans and writers of horror fiction congregate. We explore how the community handled a massive, sudden, and sustained increase in new members. Although existing theory and stories like Usenet's infamous "Eternal September" suggest that large influxes of newcomers can hurt online communities, our interviews suggest that NoSleep survived without major incident. We propose that three features of NoSleep allowed it to manage the rapid influx of newcomers gracefully: (1) an active and well-coordinated group of administrators, (2) a shared sense of community which facilitated community moderation, and (3) technological systems that mitigated norm violations. We also point to several important trade-offs and limitations.
Categories: Microsoft
