Task Completion Platform: A self-serve multi-domain goal oriented dialogue platform
Categories: Microsoft
Semi-Supervised Query Classification Using Matrix Sketching
The enormous scale of unlabeled text available today necessitates scalable schemes for representation learning in natural language processing. For instance, in this paper we are interested in classifying the intent of a user query. While our labeled data is quite limited, we have access to a virtually unlimited amount of unlabeled queries, which could be used to induce useful representations: for instance by principal component analysis (PCA). However, it is prohibitive to even store the data in memory due to its sheer size, let alone apply conventional batch algorithms. In this work, we apply the recently proposed matrix sketching algorithm (Liberty, 2013) to entirely obviate the scalability problem. This algorithm approximates the data within a specified memory bound while preserving the covariance structure necessary for PCA. Using matrix sketching, we significantly improve user intent classification accuracy by leveraging large amounts of unlabeled queries.
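For readers unfamiliar with the sketching step, below is a minimal numpy sketch of the Frequent Directions algorithm from the cited work (Liberty, 2013): rows are streamed through a fixed-size buffer that is periodically shrunk with an SVD, and the top principal directions can later be read off the small sketch. The feature dimensionality, sketch size, and toy data are illustrative, not the paper's actual setup.

```python
import numpy as np

def frequent_directions(rows, d, ell=64):
    """Maintain an ell x d sketch B of a streamed matrix A such that
    B^T B approximates A^T A within a bound that shrinks as ell grows."""
    B = np.zeros((ell, d))
    next_free = 0
    for a in rows:                        # stream of d-dimensional query vectors
        B[next_free] = a
        next_free += 1
        if next_free == ell:              # sketch is full: shrink it with an SVD
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2
            s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s[:, None] * Vt
            next_free = ell // 2          # at least half of the rows are now zero
    return B

# Toy usage: approximate top principal directions from the sketch, never
# materializing the full data matrix (dimensions here are illustrative).
rng = np.random.default_rng(0)
d = 1000                                  # e.g. hashed bag-of-words features
stream = (rng.standard_normal(d) for _ in range(20_000))
B = frequent_directions(stream, d, ell=64)
_, _, Vt = np.linalg.svd(B, full_matrices=False)
top_components = Vt[:10]                  # approximate top-10 PCA directions
                                          # (of the uncentered second moment)
```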
Categories: Microsoft
Platypus - Indoor Localization and Identification through Sensing Electric Potential Changes in Human Bodies
Platypus is the first system to localize and identify people by remotely and passively sensing the changes in body electric potential that occur naturally during walking. It uses three or more electric potential sensors with a maximum range of 2 m, but as a tag-free system it does not require the user to carry any special hardware. We describe the physical principles behind body electric potential changes and a predictive mathematical model of how these changes affect a passive electric field sensor. By inverting this model and combining data from multiple sensors, we derive a method for localizing people and experimentally demonstrate a median localization error of 0.16 m. We also use the model to remotely infer the change in body electric potential with a mean error of 8.8% compared to direct contact-based measurements. We show how the reconstructed body electric potential differs from person to person and thereby how to perform identification. Based on short walking sequences of 5 s, we identify four users with an accuracy of 94% and 30 users with an accuracy of 75%. We demonstrate that the identification features remain valid over multiple days, though they change with footwear.
Categories: Microsoft
The Label Complexity of Mixed-Initiative Classifier Training
Mixed-initiative classifier training, where the human teacher can either choose which items to label or label items chosen by the computer, has enjoyed empirical success, but without a rigorous justification from statistical learning theory. We analyze the label complexity of a simple mixed-initiative training mechanism using teaching dimension and active learning. We show that mixed-initiative training is advantageous compared to either computer-initiated (represented by active learning) or human-initiated classifier training. The advantage exists across all human teaching abilities, from optimal to completely unhelpful teachers. We further improve classifier training by educating the human teachers, which we do by showing them example optimal teaching sets. We conduct Mechanical Turk human experiments on two stylistic classifier training tasks to illustrate our approach.
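The flavor of the label-complexity comparison can be seen on the textbook 1-D threshold class: passive labeling needs on the order of 1/ε labels, computer-initiated active learning needs about log(1/ε), and an optimal teacher needs only two well-chosen items (the teaching dimension). The toy computation below is an illustration of that gap under noise-free labels, not the paper's actual mechanism or bounds.

```python
import math

# 1-D threshold classifiers on [0, 1]: h_t(x) = 1 iff x >= t, labels are noise-free.
# Number of labels needed to pin the threshold t down to precision eps:
eps = 1e-3

# Human-initiated / passive labeling of uniformly drawn items: about 1/eps labels.
passive_labels = math.ceil(1 / eps)

# Computer-initiated (active learning) via binary search: about log2(1/eps) labels.
active_labels = math.ceil(math.log2(1 / eps))

# Optimal teacher: two items tightly straddling the threshold (teaching dimension 2).
teaching_labels = 2

print(passive_labels, active_labels, teaching_labels)   # 1000 10 2
```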
Categories: Microsoft
Accelerating Relational Databases by Leveraging Remote Memory and RDMA
Memory is a crucial resource in relational databases (RDBMSs). When there is insufficient memory, RDBMSs are forced to use slower media such as SSDs or HDDs, which can significantly degrade workload performance. Cloud database services are deployed in data centers where network adapters supporting remote direct memory access (RDMA) at low latency and high bandwidth are becoming prevalent. We study the novel problem of how a Symmetric Multi-Processing (SMP) RDBMS, whose memory demands exceed locally available memory, can leverage available remote memory in the cluster, accessed via RDMA, to improve query performance. We expose available memory on remote servers using a lightweight file API that allows an SMP RDBMS to leverage the benefits of remote memory with modest changes. We identify and implement several novel scenarios to demonstrate these benefits, and address design challenges that are crucial for efficient implementation. We implemented the scenarios in the Microsoft SQL Server engine and present the first end-to-end study demonstrating the benefits of remote memory for a variety of micro-benchmarks and industry-standard benchmarks. Compared to using disks when memory is insufficient, we improve the throughput and latency of queries with short reads and writes by 3× to 10×, and the latency of multiple TPC-H and TPC-DS queries by 2× to 100×.
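The design point the abstract emphasizes is that remote memory is surfaced through an ordinary file API, so the engine's existing spill code only needs to be pointed at a different location. The Python sketch below illustrates that indirection generically; the directory names, the "remote memory mount", and the placement policy are hypothetical stand-ins, not SQL Server's actual interface.

```python
import os

# Stand-in directories: in the paper's setting the first would be a mount backed
# by remote memory reached over RDMA; both paths here are hypothetical.
REMOTE_MEM_DIR = "./remote_memory_mount"
LOCAL_SSD_DIR = "./local_spill"

def spill_path(size_bytes, remote_free_bytes):
    """Pick where a spilled work file (sort run, hash partition, ...) should live:
    prefer the remote-memory-backed location when it has room, else local media."""
    base = REMOTE_MEM_DIR if size_bytes <= remote_free_bytes else LOCAL_SSD_DIR
    os.makedirs(base, exist_ok=True)
    return os.path.join(base, "workfile.tmp")

# The engine's read/write code is unchanged: it simply opens whatever path it gets.
path = spill_path(size_bytes=64 << 20, remote_free_bytes=1 << 30)
with open(path, "wb") as f:
    f.write(b"\x00" * (64 << 20))   # spill a 64 MB sort run through the file API
```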
Categories: Microsoft
Automated Demand-driven Resource Scaling in Relational Database-as-a-Service
Relational Database-as-a-Service (DaaS) platforms today support the abstraction of a resource container that guarantees a fixed amount of resources. Tenants are responsible for selecting a container size suitable for their workloads, which they can change to leverage the cloud’s elasticity. However, automating this task is daunting for most tenants, since estimating resource demands for arbitrary SQL workloads in an RDBMS is complex and challenging. In addition, workloads and resource requirements can vary significantly within minutes to hours, and container sizes vary by orders of magnitude both in the amount of resources and in monetary cost. We present a solution that enables a DaaS to auto-scale container sizes on behalf of its tenants. Approaches to auto-scaling stateless services, such as web servers, that rely on historical resource utilization as the primary signal often perform poorly for stateful database servers, which are significantly more complex. Our solution derives a set of robust signals from database engine telemetry and combines them to significantly improve the accuracy of demand estimation for database workloads, resulting in more accurate scaling decisions. It also raises the level of abstraction by allowing tenants to reason about monetary budget and query latency rather than resources. We prototyped our approach in Microsoft Azure SQL Database and ran extensive experiments using workloads with realistic time-varying resource demand patterns obtained from production traces. Compared to an approach that uses only resource utilization to estimate demand, our approach results in 1.5× to 3× lower monetary costs while achieving comparable query latencies.
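As a concrete illustration of the kind of signal fusion described above, the sketch below combines two telemetry-derived signals into a core-demand estimate and then picks the cheapest container that satisfies it under a monetary budget. The signal names, container sizes, prices, and combination rule are hypothetical, chosen only to make the idea runnable; they are not the production estimator.

```python
# Hypothetical container sizes: (name, cores, monthly cost in $); illustrative only.
CONTAINERS = [("S1", 1, 30), ("S2", 2, 75), ("S4", 4, 150), ("S8", 8, 300)]

def estimate_demand_cores(telemetry):
    """Combine engine signals into a core-demand estimate. Utilization alone
    underestimates demand when the workload is throttled, so time spent waiting
    for CPU is folded in as well (both signal names are hypothetical)."""
    used = telemetry["cpu_utilization"] * telemetry["current_cores"]
    throttled = telemetry["cpu_wait_fraction"] * telemetry["current_cores"]
    return used + throttled

def choose_container(telemetry, budget_per_month):
    demand = estimate_demand_cores(telemetry)
    feasible = [c for c in CONTAINERS if c[1] >= demand and c[2] <= budget_per_month]
    if feasible:
        # Cheapest container that covers the estimated demand within budget.
        return min(feasible, key=lambda c: c[2])
    affordable = [c for c in CONTAINERS if c[2] <= budget_per_month]
    # Otherwise, the largest container the budget allows (or the smallest overall).
    return max(affordable, key=lambda c: c[1]) if affordable else CONTAINERS[0]

print(choose_container({"cpu_utilization": 0.9, "cpu_wait_fraction": 0.6,
                        "current_cores": 2}, budget_per_month=200))  # ('S4', 4, 150)
```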
Categories: Microsoft
A Design and Verification Methodology for Secure Isolated Regions
Hardware support for isolated execution (such as Intel SGX) enables development of applications that keep their code and data confidential even while running in a hostile or compromised host. However, automatically verifying that such applications satisfy confidentiality remains challenging. We present a methodology for designing such applications in a way that enables certifying their confidentiality. Our methodology consists of forcing the application to communicate with the external world through a narrow interface, compiling it with runtime checks that aid verification, and linking it with a small runtime that implements the narrow interface. The runtime includes core services such as secure communication channels and memory management. We formalize this restriction on the application as Information Release Confinement (IRC), and we show that it allows us to decompose the task of proving confidentiality into (a) a one-time, human-assisted functional verification of the runtime to ensure that it does not leak secrets, and (b) automatic verification of the application’s machine code to ensure that it satisfies IRC and does not directly read or corrupt the runtime’s internal state. We present /CONFIDENTIAL: a verifier for IRC that is modular, automatic, and keeps our compiler out of the trusted computing base. Our evaluation suggests that the methodology scales to real-world applications.
Categories: Microsoft
Enabling Privacy-Preserving Incentives for Mobile Crowd Sensing Systems
Recent years have witnessed the proliferation of mobile crowd sensing (MCS) systems that leverage the public crowd, equipped with various mobile devices (e.g., smartphones, smartglasses, smartwatches), for large-scale sensing tasks. Because of the importance of incentivizing worker participation in such MCS systems, several auction-based incentive mechanisms have been proposed in the literature. However, these mechanisms fail to consider the preservation of workers’ bid privacy. Therefore, different from prior work, we propose a differentially private incentive mechanism that preserves the privacy of each worker’s bid against the other honest-but-curious workers. The motivation of this design comes from the concern that a worker’s bid usually contains her private information that should not be disclosed. We design our incentive mechanism based on the single-minded reverse combinatorial auction. Specifically, we design a differentially private, approximately truthful, individually rational, and computationally efficient mechanism that approximately minimizes the platform’s total payment with a guaranteed approximation ratio. The advantageous properties of the proposed mechanism are justified through not only rigorous theoretical analysis but also extensive simulations.
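The standard primitive for making a selection depend on bids while protecting each individual bid is the exponential mechanism, and the sketch below shows it choosing a payment threshold from candidate prices. This is only an illustration of the kind of building block such a mechanism relies on; the bids, utility function, and parameters are made up, and it is not the paper's exact construction.

```python
import numpy as np

def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng):
    """Select a candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity))  (the exponential mechanism)."""
    scores = np.array([utility(c) for c in candidates], dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Toy setting: pick a payment threshold that recruits at least k = 3 workers
# while keeping the payment low; bids, k, and the utility are illustrative.
rng = np.random.default_rng(0)
bids = np.array([2.0, 3.5, 4.0, 6.0, 9.0])      # workers' (private) bids
k = 3

def utility(price):
    recruited = int((bids <= price).sum())
    return -price if recruited >= k else -1e3   # heavily penalize infeasible prices

price = exponential_mechanism(sorted(bids), utility, epsilon=0.5,
                              sensitivity=1.0, rng=rng)
print("chosen payment threshold:", price)
```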
Categories: Microsoft
Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee
Data volumes are growing exponentially in our decision-support systems, making it challenging to ensure interactive response times for ad-hoc queries without increasing the cost of hardware. Aggregation queries with Group By, which produce an aggregate value for every combination of values in the grouping columns, are the most important class of ad-hoc queries. As small errors are usually tolerable for such queries, approximate query processing (AQP) has the potential to answer them over very large datasets much faster. In many cases, analysts require the distribution of (group, aggvalue) pairs in the estimated answer to be guaranteed within a certain error threshold of the exact distribution. Existing AQP techniques are inadequate for two main reasons. First, users cannot express such guarantees. Second, sampling techniques used in traditional AQP can produce arbitrarily large errors even for SUM queries. To address these limitations, we first introduce a new precision metric, called distribution precision, to express such error guarantees. We then study how to provide fast approximate answers to aggregation queries with distribution precision guaranteed within a user-specified error bound. The main challenges are to provide rigorous error guarantees and to handle arbitrary highly selective predicates without maintaining large samples. We propose a novel sampling scheme called measure-biased sampling to address the former challenge. For the latter, we propose two new indexes to augment in-memory samples. Like other sampling-based AQP techniques, our solution supports any aggregate that can be estimated from random samples. In addition to deriving theoretical guarantees, we conduct an experimental study comparing our system with state-of-the-art AQP techniques and a commercial column-store database system on both synthetic and real enterprise datasets. Our system provides a median speed-up of more than 100× with around 5% distribution error compared with the commercial database.
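To make measure-biased sampling concrete: rows are drawn with probability proportional to the aggregated measure rather than uniformly, so a handful of huge values cannot blow up the error of per-group SUM estimates. The numpy sketch below illustrates that idea under this reading of the abstract; the data, group count, and sample size are synthetic, and the reported error is a simple distribution-style metric rather than the paper's exact definition.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
groups = rng.integers(0, 50, size=n)                     # grouping column
measure = rng.lognormal(mean=0.0, sigma=2.0, size=n)     # heavy-tailed measure

total = measure.sum()
k = 10_000                                               # sample size (memory budget)

# Measure-biased sample: row i is drawn with probability measure[i] / total.
probs = measure / total
sample_idx = rng.choice(n, size=k, replace=True, p=probs)

# SUM(measure) per group g is estimated as total * (fraction of sampled rows in g),
# since each sampled row represents total / k units of the measure (unbiased).
est = np.bincount(groups[sample_idx], minlength=50) * (total / k)
true = np.bincount(groups, weights=measure, minlength=50)
print(np.abs(est - true).max() / total)                  # distribution-style error
```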
Categories: Microsoft
Quickr: Lazily Approximating Complex Ad-Hoc Queries in Big Data Clusters
We present a system that approximates the answer to complex ad-hoc queries in big-data clusters by injecting samplers on the fly, without requiring pre-existing samples. Improvements can be substantial when big-data queries take multiple passes over the data and when samplers execute early in the query plan. We present a new universe sampler that can sample multiple join inputs. By incorporating samplers natively into a cost-based query optimizer, we automatically generate plans with appropriate samplers at appropriate locations. We devise an accuracy analysis method with which we ensure that query plans with samplers will not miss groups and that aggregate values are within a small ratio of their true value. An implementation on a cluster with tens of thousands of machines shows that queries in the TPC-DS benchmark use a median of 2× fewer resources. In contrast, approaches that construct samples of the input ahead of time, even when given a storage budget of 10× the input size, improve only 22% of the queries, i.e., a median speed-up of 0×.
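The universe sampler's key trick is to sample join-key values rather than rows: both join inputs keep exactly those rows whose key hashes into the same sampled slice of the key universe, so sampled rows on one side still find their join partners on the other. The sketch below illustrates that idea; the hash choice, schema, and scaling of the aggregate are illustrative assumptions rather than Quickr's implementation.

```python
import hashlib

def in_universe(key, fraction):
    """Keep a deterministic `fraction` of the join-key universe: a key survives
    iff its hash falls in the lowest fraction of the 64-bit hash range."""
    h = int.from_bytes(hashlib.sha1(str(key).encode()).digest()[:8], "big")
    return h < fraction * 2**64

fraction = 0.1
orders    = [(k, 100.0 + k) for k in range(10_000)]        # (order_key, price)
lineitems = [(k % 10_000, 1) for k in range(50_000)]       # (order_key, qty)

# Both join inputs are filtered with the SAME predicate on the join key,
# so every sampled lineitem still finds its matching order.
orders_s    = [r for r in orders    if in_universe(r[0], fraction)]
lineitems_s = [r for r in lineitems if in_universe(r[0], fraction)]

price_by_key = {k: p for k, p in orders_s}
sampled_revenue = sum(q * price_by_key[k] for k, q in lineitems_s if k in price_by_key)
# Only a `fraction` of the key universe was retained, so scale the aggregate up.
print("estimated revenue:", sampled_revenue / fraction)
```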
Categories: Microsoft
An Efficient Algorithm for Contextual Bandits with Knapsacks, and an Extension to Concave Objectives
Categories: Microsoft
Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization
The emergence of wearable devices such as portable cameras and smart glasses makes it possible to record life-logging first-person videos. Browsing such long unstructured videos is time-consuming and tedious. This paper studies the discovery of the moments of a user’s major or special interest (i.e., highlights) in a video, for generating summaries of first-person videos. Specifically, we propose a novel pairwise deep ranking model that employs deep learning techniques to learn the relationship between highlight and non-highlight video segments. A two-stream network structure, which represents video segments using complementary information on the appearance of video frames and the temporal dynamics across frames, is developed for video highlight detection. Given a long personal video, the highlight detection model assigns a highlight score to each segment. The obtained highlight segments are applied for summarization in two ways: video timelapse and video skimming. The former plays the highlight (non-highlight) segments at low (high) speed rates, while the latter assembles the sequence of segments with the highest scores. On 100 hours of first-person videos covering 15 unique sports categories, our highlight detection improves over the state-of-the-art RankSVM method by 10.5% in terms of accuracy. Moreover, our approaches produce video summaries of better quality, as judged in a user study with 35 human subjects.
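The pairwise ranking objective at the heart of the model is a margin loss that pushes a highlight segment's score above a non-highlight segment's score from the same video. The sketch below shows that loss with a simple linear scorer standing in for the two-stream deep network; the feature dimension, margin, synthetic pairs, and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128                                   # segment feature dimension (illustrative)
w = np.zeros(d)
margin, lr = 1.0, 0.01

def score(x, w):
    return x @ w

# Training pairs: (highlight segment, non-highlight segment) from the same video.
pairs = [(rng.standard_normal(d) + 0.5, rng.standard_normal(d)) for _ in range(2000)]

for epoch in range(5):
    for h, n in pairs:
        # Hinge-style pairwise ranking loss: max(0, margin - (s_h - s_n)).
        if margin - (score(h, w) - score(n, w)) > 0:
            w += lr * (h - n)             # gradient step that widens the score gap

# At test time each segment gets a highlight score; top-scored segments form the
# skim, while playback speed is modulated by score for the timelapse.
test_h, test_n = pairs[0]
print(score(test_h, w) > score(test_n, w))
```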
Categories: Microsoft
CloudBuild: Microsoft’s Distributed and Caching Build Service
Thousands of Microsoft engineers build and test hundreds of software products several times a day. It is essential that this continuous integration scales, guarantees short feedback cycles, and functions reliably with minimal human intervention. This paper describes CloudBuild, the build service infrastructure developed within Microsoft over the last few years. CloudBuild is responsible for all aspects of a continuous integration workflow, including builds, test and code analysis, as well as drops, package and symbol creation and storage. CloudBuild supports multiple build languages as long as they fulfill a coarse-grained, file-I/O-based contract. CloudBuild uses content-based caching to run build-related tasks only when needed. Lastly, it builds on many machines in parallel. CloudBuild offers a reliable build service in the presence of unreliable components. It aims to rapidly onboard teams and hence has to support non-deterministic build tools and specification languages that under-declare dependencies. We outline how we addressed these challenges and characterize the operations of CloudBuild. CloudBuild has onboarded hundreds of codebases with only man-months of effort each. Some of these codebases are used by thousands of developers. The speed-ups of build and test range from 1.3× to 10×, and service availability is 99%.
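Content-based caching keys a task's outputs on the content of its declared inputs plus its command line, so a task is re-run only when something it actually reads changes. Below is a minimal sketch of that lookup; the in-memory dictionary stands in for CloudBuild's distributed cache, the task description is illustrative, and the example assumes a C compiler (cc) on the PATH.

```python
import hashlib
import subprocess
from pathlib import Path

CACHE = {}   # content hash -> recorded output bytes (stand-in for a distributed cache)

def cache_key(command, input_files):
    """Hash the command line plus the *content* of every declared input file."""
    h = hashlib.sha256(command.encode())
    for path in sorted(input_files):
        h.update(Path(path).read_bytes())
    return h.hexdigest()

def run_task(command, input_files, output_file):
    key = cache_key(command, input_files)
    if key in CACHE:
        Path(output_file).write_bytes(CACHE[key])        # cache hit: no rebuild
        return
    subprocess.run(command, shell=True, check=True)      # cache miss: run the tool
    CACHE[key] = Path(output_file).read_bytes()

# Example: the task re-runs only if hello.c or the command line changes.
Path("hello.c").write_text("int main(void){return 0;}\n")
run_task("cc -c hello.c -o hello.o", ["hello.c"], "hello.o")
run_task("cc -c hello.c -o hello.o", ["hello.c"], "hello.o")   # served from cache
```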
Categories: Microsoft
Jointly Modeling Embedding and Translation to Bridge Video and Language
Automatically describing video content with natural language is a fundamental challenge of computer vision. Recurrent Neural Networks (RNNs), which model sequence dynamics, have attracted increasing attention for visual interpretation. However, most existing approaches generate a word locally given the previous words and the visual content, while the relationship between sentence semantics and visual content is not holistically exploited. As a result, the generated sentences may be contextually correct but semantically wrong (e.g., in their subjects, verbs, or objects). This paper presents a novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which can simultaneously explore the learning of LSTM and visual-semantic embedding. The former aims to locally maximize the probability of generating the next word given the previous words and visual content, while the latter creates a visual-semantic embedding space that enforces the relationship between the semantics of the entire sentence and the visual content. Experiments on the YouTube2Text dataset show that our proposed LSTM-E achieves the best published performance to date in generating natural sentences: 45.3% and 31.0% in terms of BLEU@4 and METEOR, respectively. Superior performance is also reported on two movie description datasets (M-VAD and MPII-MD). In addition, we demonstrate that LSTM-E outperforms several state-of-the-art techniques in predicting Subject-Verb-Object (SVO) triplets.
Categories: Microsoft
Parameter Estimation for Generalized Thurstone Choice Models
We consider the maximum likelihood parameter estimation problem for a generalized Thurstone choice model, where choices are made from comparison sets of two or more items. We provide tight characterizations of the mean squared error, as well as necessary and sufficient conditions for correct classification when each item belongs to one of two classes. These results provide insights into how the estimation accuracy depends on the choice of a generalized Thurstone choice model and the structure of comparison sets. We find that for a priori unbiased structures of comparisons, e.g., when comparison sets are drawn independently and uniformly at random, the number of observations needed to achieve a prescribed estimation accuracy depends on the choice of a generalized Thurstone choice model. For a broad set of generalized Thurstone choice models, which includes all popular instances used in practice, the estimation error is shown to be largely insensitive to the cardinality of comparison sets. On the other hand, we find that there exist generalized Thurstone choice models for which the estimation error decreases much faster with the cardinality of comparison sets. We report results of empirical evaluations using schedules of contests as observed in some popular sport competitions and online crowdsourcing systems.
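For concreteness, the sketch below fits the Luce (softmax) instance of a generalized Thurstone choice model, one popular member of the family, by gradient ascent on the log-likelihood over synthetic comparison sets of size four. The data, step size, and iteration count are illustrative; the paper's analysis covers the family in far greater generality.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_obs, set_size = 20, 2000, 4
theta_true = rng.standard_normal(n_items)

# Each observation: a comparison set S of `set_size` items and the chosen winner,
# drawn from the softmax (Luce) choice probabilities implied by theta_true.
obs = []
for _ in range(n_obs):
    S = rng.choice(n_items, size=set_size, replace=False)
    p = np.exp(theta_true[S]); p /= p.sum()
    obs.append((S, S[rng.choice(set_size, p=p)]))

theta = np.zeros(n_items)
lr = 1.0
for _ in range(200):                      # gradient ascent on the average log-likelihood
    grad = np.zeros(n_items)
    for S, winner in obs:
        p = np.exp(theta[S]); p /= p.sum()
        grad[winner] += 1.0
        grad[S] -= p                      # softmax pull on every item in the set
    theta += lr * grad / n_obs
    theta -= theta.mean()                 # scores are identifiable only up to a shift

print(np.corrcoef(theta, theta_true)[0, 1])   # should be close to 1
```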
Categories: Microsoft
MSR-VTT: A Large Video Description Dataset for Bridging Video and Language
While there has been increasing interest in the task of describing video with natural language, current computer vision algorithms are still severely limited in terms of the variability and complexity of the videos and their associated language that they can recognize. This is in part due to the simplicity of current benchmarks, which mostly focus on specific fine-grained domains with limited videos and simple descriptions. While researchers have provided several benchmark datasets for image captioning, we are not aware of any large-scale video description dataset with comprehensive categories yet diverse video content. In this paper we present MSR-VTT (standing for “MSR Video to Text”), a new large-scale video benchmark for video understanding, especially the emerging task of translating video to text. This is achieved by collecting 257 popular queries from a commercial video search engine, with 118 videos for each query. In its current version, MSR-VTT provides 10K web video clips totaling 41.2 hours and 200K clip-sentence pairs, covering the most comprehensive categories and diverse visual content, and representing the largest dataset in terms of sentences and vocabulary. Each clip is annotated with about 20 natural sentences by 1,327 AMT workers. We present a detailed analysis of MSR-VTT in comparison to a complete set of existing datasets, together with a summarization of different state-of-the-art video-to-text approaches. We also provide an extensive evaluation of these approaches on this dataset, showing that the hybrid Recurrent Neural Network-based approach, which combines single-frame and motion representations with a soft-attention pooling strategy, yields the best generalization capability on MSR-VTT.
Categories: Microsoft
Lower Runtime Bounds for Integer Programs
We present a technique to infer lower bounds on the worst-case runtime complexity of integer programs. To this end, we construct symbolic representations of program executions using a framework for iterative, under-approximating program simplification. The core of this simplification is a method for (under-approximating) program acceleration based on recurrence solving and a variation of ranking functions. Afterwards, we deduce asymptotic lower bounds from the resulting simplified programs. We implemented our technique in our tool LoAT and show that it infers non-trivial lower bounds for a large number of examples.
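The acceleration step based on recurrence solving can be illustrated on a one-line loop: solving the loop's update as a recurrence yields a closed form for the variable after n iterations, from which a lower bound on the number of iterations, and hence on the runtime, follows. The toy sketch below uses sympy for the recurrence; the loop and the resulting bound are illustrative and not LoAT output.

```python
from sympy import Function, rsolve, solve, symbols

n, x0 = symbols("n x0", positive=True)
x = Function("x")

# Loop:  while (x > 0) { x = x - 2; }   modelled as the recurrence x(n+1) = x(n) - 2.
closed_form = rsolve(x(n + 1) - x(n) + 2, x(n), {x(0): x0})
print(closed_form)                       # x0 - 2*n

# The guard x > 0 can only fail once x0 - 2*n <= 0, i.e. after at least x0/2
# iterations, so the worst-case runtime is bounded below by Omega(x0).
iterations = solve(closed_form, n)[0]
print(iterations)                        # x0/2
```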
Categories: Microsoft