Microsoft Research Publications
Keep current with all the latest Microsoft Research Publications and Technical Reports
Updated: 8 years 26 weeks ago
You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images
Video concept learning often requires a large set of training samples. In practice, however, acquiring noise-free training labels with sufficient positive examples is very expensive. A plausible solution for training-data collection is to sample from the vast quantities of images and videos on the Web, motivated by the assumption that the retrieved images or videos are highly correlated with the query. Still, a number of challenges remain. First, Web videos are often untrimmed, so only parts of a video are relevant to the query. Second, although the retrieved Web images are always highly relevant to the issued query, naively utilizing them in the video domain may even hurt performance due to the well-known semantic drift and domain gap problems. A valid question, then, is how Web images and videos should interact for video concept learning. In this paper, we propose a Lead–Exceed Neural Network (LENN), which reinforces training on Web images and videos in a curriculum manner. Specifically, a network is first trained on frames of the Web videos. The Web images are then filtered by the learned network, and the selected images are additionally fed into the network to enhance the architecture and further trim the videos. In addition, a Long Short-Term Memory (LSTM) network can be applied to the trimmed videos to exploit temporal information. Encouraging results are reported on UCF101 and the TRECVID 2013 and 2014 MEDTest sets, in the context of both action recognition and event detection. Without using any human-annotated exemplars, the proposed LENN achieves 74.4% accuracy on the UCF101 dataset.
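The curriculum step can be pictured with a short sketch. The snippet below is a minimal illustration, not the paper's implementation: the softmax-confidence criterion and the threshold tau are assumptions. It shows the image-filtering stage, where a network trained on Web video frames scores the retrieved Web images and only confidently on-topic images are kept for further training.

```python
import numpy as np

# Stage 1 ("lead"): a classifier is assumed to have been trained on frames of
# retrieved Web videos. Stage 2 ("exceed"): score the retrieved Web images with
# that classifier and keep only the confidently on-topic ones.
def select_confident_images(image_probs, target_class, tau=0.8):
    """image_probs: (n_images, n_classes) softmax outputs of the video-trained net.
    Keep images predicted as the target concept with confidence >= tau."""
    preds = image_probs.argmax(axis=1)
    conf = image_probs[np.arange(len(preds)), target_class]
    keep = (preds == target_class) & (conf >= tau)
    return np.where(keep)[0]

# Example: 5 Web images, 3 concepts, query concept = 2.
probs = np.array([[0.10, 0.10, 0.80],
                  [0.60, 0.30, 0.10],
                  [0.05, 0.05, 0.90],
                  [0.20, 0.50, 0.30],
                  [0.15, 0.00, 0.85]])
print(select_confident_images(probs, target_class=2))   # -> [0 2 4]
```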
Categories: Microsoft
FlashBack: Immersive Virtual Reality on Mobile Devices via Rendering Memoization
Virtual reality head-mounted displays (VR HMDs) are attracting users with the promise of full sensory immersion in virtual environments. Creating the illusion of immersion on a near-eye display results in very heavy rendering workloads: low latency, high framerate, and high visual quality are all needed. Tethered VR setups, in which the HMD is bound to a powerful gaming desktop, limit mobility and exploration and are difficult to deploy widely. Products such as Google Cardboard and Samsung Gear VR purport to offer any user a mobile VR experience, but their GPUs are too power-constrained to produce an acceptable framerate and latency, even for scenes of modest visual quality. We present FlashBack, an unorthodox design point for HMD VR that eschews all real-time scene rendering. Instead, FlashBack aggressively precomputes and caches all possible images that a VR user might encounter. FlashBack memoizes costly rendering effort in an offline step to build a cache full of panoramic images. At runtime, FlashBack constructs and maintains a hierarchical storage cache index to quickly look up the images that the user should be seeing. On a cache miss, FlashBack uses fast approximations of the correct image while concurrently fetching more closely matching entries from its cache for future requests. Moreover, FlashBack works not only for static scenes but also for dynamic scenes with moving and animated objects. We evaluate a prototype implementation of FlashBack and report up to an 8× improvement in framerate, a 97× reduction in energy consumption per frame, and a 15× reduction in latency compared to a locally rendered mobile VR setup. In some cases, FlashBack even delivers better framerates and responsiveness than a tethered HMD configuration on graphically complex scenes.
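The lookup-with-fallback idea can be sketched compactly. The class below is a minimal illustration only: the grid cell size, the pose quantization, and the nearest-entry fallback are assumed choices, not FlashBack's actual index or storage hierarchy.

```python
import math

class PanoramaCache:
    """Pose-indexed lookup over pre-rendered panoramas (illustrative sketch)."""

    def __init__(self, cell=0.5):
        self.cell = cell
        self.entries = {}                      # quantized (x, y, z) -> panorama

    def _key(self, pos):
        return tuple(round(c / self.cell) for c in pos)

    def insert(self, pos, panorama):
        self.entries[self._key(pos)] = panorama

    def lookup(self, pos):
        key = self._key(pos)
        if key in self.entries:                # hit: serve the cached panorama
            return self.entries[key], True
        # Miss: serve the nearest cached pose as a fast approximation; a closer
        # match would be fetched from slower storage tiers in the background.
        nearest = min(self.entries, key=lambda k: math.dist(k, key))
        return self.entries[nearest], False

cache = PanoramaCache()
cache.insert((0.0, 0.0, 0.0), "pano_origin")
cache.insert((1.0, 0.0, 0.0), "pano_east")
print(cache.lookup((0.9, 0.1, 0.0)))           # -> ('pano_east', True)
```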
Categories: Microsoft
Task Completion Platform: A self-serve multi-domain goal oriented dialogue platform
Categories: Microsoft
Semi-Supervised Query Classification Using Matrix Sketching
The enormous scale of unlabeled text available today necessitates scalable schemes for representation learning in natural language processing. For instance, in this paper we are interested in classifying the intent of a user query. While our labeled data is quite limited, we have access to a virtually unlimited amount of unlabeled queries, which could be used to induce useful representations, for instance by principal component analysis (PCA). However, it is prohibitive to even store the data in memory due to its sheer size, let alone apply conventional batch algorithms. In this work, we apply the recently proposed matrix sketching algorithm to entirely obviate the scalability problem (Liberty, 2013). This algorithm approximates the data within a specified memory bound while preserving the covariance structure necessary for PCA. Using matrix sketching, we significantly improve user intent classification accuracy by leveraging large amounts of unlabeled queries.
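The sketching algorithm the abstract cites (Liberty, 2013) is Frequent Directions: a small sketch B with ℓ rows is maintained in a single streaming pass so that BᵀB approximates XᵀX, and approximate PCA directions can then be read off from B. Below is a minimal numpy rendering of the half-shrink variant (assuming ℓ ≤ d), not the paper's production pipeline.

```python
import numpy as np

def frequent_directions(X, ell):
    """Streaming sketch B (ell x d) with B.T @ B ~= X.T @ X, assuming ell <= d.
    Rows are processed one at a time, so memory stays bounded by the sketch."""
    n, d = X.shape
    B = np.zeros((ell, d))
    next_zero = 0
    for row in X:
        if next_zero == ell:                          # sketch full: shrink it
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2
            s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B = s[:, None] * Vt                       # bottom half of rows become zero
            next_zero = ell // 2
        B[next_zero] = row
        next_zero += 1
    return B

# Approximate top principal components from the sketch instead of the full data.
X = np.random.randn(20000, 50)
B = frequent_directions(X, ell=20)
_, _, Vt = np.linalg.svd(B, full_matrices=False)
top_components = Vt[:5]                               # ~ top-5 PCA directions
```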
Categories: Microsoft
Platypus - Indoor Localization and Identification through Sensing Electric Potential Changes in Human Bodies
Platypus is the first system to localize and identify people by remotely and passively sensing the changes in their body electric potential that occur naturally during walking. While it uses three or more electric potential sensors with a maximum range of 2 m, as a tag-free system it does not require the user to carry any special hardware. We describe the physical principles behind body electric potential changes and a predictive mathematical model of how they affect a passive electric field sensor. By inverting this model and combining data from several sensors, we derive a method for localizing people and experimentally demonstrate a median localization error of 0.16 m. We also use the model to remotely infer the change in body electric potential, with a mean error of 8.8% compared to direct contact-based measurements. We show how the reconstructed body electric potential differs from person to person and thereby how to perform identification. Based on short walking sequences of 5 s, we identify four users with an accuracy of 94%, and 30 users with an accuracy of 75%. We demonstrate that the identification features remain valid over multiple days, though they change with footwear.
Categories: Microsoft
The Label Complexity of Mixed-Initiative Classifier Training
Mixed-initiative classifier training, where the human teacher can choose which items to label or label items chosen by the computer, has enjoyed empirical success but without a rigorous justification from statistical learning theory. We analyze the label complexity of a simple mixed-initiative training mechanism using teaching dimension and active learning. We show that mixed-initiative training is advantageous compared to either computer-initiated (represented by active learning) or human-initiated classifier training. The advantage holds across all human teaching abilities, from optimal to completely unhelpful teachers. We further improve classifier training by educating the human teachers, which we do by showing them example optimal teaching sets. We conduct Mechanical Turk human experiments on two stylistic classifier training tasks to illustrate our approach.
Categories: Microsoft
Accelerating Relational Databases by Leveraging Remote Memory and RDMA
Memory is a crucial resource in relational databases (RDBMSs). When there is insufficient memory, RDBMSs are forced to use slower media such as SSDs or HDDs, which can significantly degrade workload performance. Cloud database services are deployed in data centers where network adapters supporting remote direct memory access (RDMA) at low latency and high bandwidth are becoming prevalent. We study the novel problem of how a Symmetric Multi-Processing (SMP) RDBMS, whose memory demands exceed locally available memory, can leverage available remote memory in the cluster, accessed via RDMA, to improve query performance. We expose available memory on remote servers through a lightweight file API that allows an SMP RDBMS to reap the benefits of remote memory with modest changes. We identify and implement several novel scenarios to demonstrate these benefits, and address design challenges that are crucial for an efficient implementation. We implemented the scenarios in the Microsoft SQL Server engine and present the first end-to-end study demonstrating the benefits of remote memory for a variety of micro-benchmarks and industry-standard benchmarks. Compared to using disks when memory is insufficient, we improve the throughput and latency of queries with short reads and writes by 3× to 10×, while improving the latency of multiple TPC-H and TPC-DS queries by 2× to 100×.
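To make the "lightweight file API" idea concrete, here is a hypothetical sketch only: it assumes the remote-memory layer exposes leased remote DRAM behind an ordinary file path, so an engine could evict and reload pages through plain file I/O instead of local disk. The path, page size, and class are placeholders, not the paper's actual interface.

```python
import os

PAGE_SIZE = 8192

class RemoteSpillFile:
    """Hypothetical spill target backed by remote memory exposed as a file."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

    def write_page(self, page_no, data):
        assert len(data) == PAGE_SIZE        # evict one fixed-size page
        os.pwrite(self.fd, data, page_no * PAGE_SIZE)

    def read_page(self, page_no):
        return os.pread(self.fd, PAGE_SIZE, page_no * PAGE_SIZE)

    def close(self):
        os.close(self.fd)

spill = RemoteSpillFile("/tmp/remote_memory_lease.bin")   # placeholder path
spill.write_page(0, b"x" * PAGE_SIZE)
assert spill.read_page(0) == b"x" * PAGE_SIZE
spill.close()
```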
Categories: Microsoft
Automated Demand-driven Resource Scaling in Relational Database-as-a-Service
Relational Database-as-a-Service (DaaS) platforms today support the abstraction of a resource container that guarantees a fixed amount of resources. Tenants are responsible for selecting a container size suitable for their workloads, which they can change to leverage the cloud's elasticity. However, automating this task is daunting for most tenants, since estimating resource demands for arbitrary SQL workloads in an RDBMS is complex and challenging. In addition, workloads and resource requirements can vary significantly within minutes to hours, and container sizes vary by orders of magnitude both in the amount of resources and in monetary cost. We present a solution that enables a DaaS to auto-scale container sizes on behalf of its tenants. Approaches for auto-scaling stateless services such as web servers, which rely on historical resource utilization as the primary signal, often perform poorly for stateful database servers, which are significantly more complex. Our solution derives a set of robust signals from database engine telemetry and combines them to significantly improve the accuracy of demand estimation for database workloads, resulting in more accurate scaling decisions. It also raises the level of abstraction by allowing tenants to reason about monetary budget and query latency rather than resources. We prototyped our approach in Microsoft Azure SQL Database and ran extensive experiments using workloads with realistic time-varying resource demand patterns obtained from production traces. Compared to an approach that uses only resource utilization to estimate demand, our approach results in 1.5× to 3× lower monetary costs while achieving comparable query latencies.
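A toy sketch of the decision structure follows. Everything here is illustrative: the waiting-time signal, the container sizes and prices, and the selection rule are assumptions, not the paper's estimator or Azure SQL Database SKUs. The point is that demand is inferred from more than raw utilization and is then mapped to a container under a budget constraint.

```python
# Illustrative signals and container sizes only.
CONTAINERS = [(2, 0.5), (4, 1.0), (8, 2.0), (16, 4.0)]   # (cores, $/hour), made up

def estimate_cpu_demand(used_cores, waiting_fraction):
    """If the workload spent part of its time waiting for CPU, observed
    utilization understates true demand; inflate it accordingly (assumed signal)."""
    return used_cores / max(1e-6, 1.0 - waiting_fraction)

def pick_container(demand_cores, budget_per_hour):
    """Smallest affordable container that covers estimated demand,
    else the largest one the budget allows."""
    affordable = [c for c in CONTAINERS if c[1] <= budget_per_hour]
    covering = [c for c in affordable if c[0] >= demand_cores]
    return min(covering) if covering else max(affordable)

demand = estimate_cpu_demand(used_cores=3.2, waiting_fraction=0.4)   # ~5.3 cores
print(pick_container(demand, budget_per_hour=3.0))                   # -> (8, 2.0)
```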
Categories: Microsoft
A Design and Verification Methodology for Secure Isolated Regions
Hardware support for isolated execution (such as Intel SGX) enables the development of applications that keep their code and data confidential even while running on a hostile or compromised host. However, automatically verifying that such applications satisfy confidentiality remains challenging. We present a methodology for designing such applications in a way that enables certifying their confidentiality. Our methodology consists of forcing the application to communicate with the external world through a narrow interface, compiling it with runtime checks that aid verification, and linking it with a small runtime that implements the narrow interface. The runtime includes core services such as secure communication channels and memory management. We formalize this restriction on the application as Information Release Confinement (IRC), and we show that it allows us to decompose the task of proving confidentiality into (a) a one-time, human-assisted functional verification of the runtime to ensure that it does not leak secrets, and (b) automatic verification of the application's machine code to ensure that it satisfies IRC and does not directly read or corrupt the runtime's internal state. We present /CONFIDENTIAL, a verifier for IRC that is modular, automatic, and keeps our compiler out of the trusted computing base. Our evaluation suggests that the methodology scales to real-world applications.
Categories: Microsoft
Enabling Privacy-Preserving Incentives for Mobile Crowd Sensing Systems
Recent years have witnessed the proliferation of mobile crowd sensing (MCS) systems that leverage the public crowd, equipped with various mobile devices (e.g., smartphones, smart glasses, smartwatches), for large-scale sensing tasks. Because of the importance of incentivizing worker participation in such MCS systems, several auction-based incentive mechanisms have been proposed in the literature. However, these mechanisms fail to consider the preservation of workers' bid privacy. Therefore, unlike prior work, we propose a differentially private incentive mechanism that preserves the privacy of each worker's bid against the other honest-but-curious workers. The motivation for this design comes from the concern that a worker's bid usually contains private information that should not be disclosed. We design our incentive mechanism based on the single-minded reverse combinatorial auction. Specifically, we design a differentially private, approximately truthful, individually rational, and computationally efficient mechanism that approximately minimizes the platform's total payment with a guaranteed approximation ratio. The advantageous properties of the proposed mechanism are justified through not only rigorous theoretical analysis but also extensive simulations.
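The paper's mechanism is a differentially private reverse combinatorial auction; the snippet below does not reproduce it. It only illustrates the generic exponential-mechanism primitive that such designs typically build on, with made-up bids, candidate thresholds, and sensitivity, to show how a selection step can be randomized so that any single bid has limited influence on the outcome.

```python
import numpy as np

def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng):
    """Pick a candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity))."""
    u = np.array([utility(c) for c in candidates], dtype=float)
    logits = epsilon * u / (2.0 * sensitivity)
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Toy use: pick a payment threshold whose utility is the negative total payment
# it would induce over the received bids (all numbers are made up).
bids = [3.0, 5.0, 7.0, 9.0]
thresholds = [4.0, 6.0, 8.0, 10.0]
def neg_payment(t):
    return -t * sum(1 for b in bids if b <= t)   # winners are bids at or below t

rng = np.random.default_rng(0)
print(exponential_mechanism(thresholds, neg_payment,
                            epsilon=0.5, sensitivity=10.0, rng=rng))
```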
Categories: Microsoft
Sample + Seek: Approximating Aggregates with Distribution Precision Guarantee
Data volumes are growing exponentially in decision-support systems, making it challenging to ensure interactive response times for ad-hoc queries without increasing hardware costs. Aggregation queries with GROUP BY, which produce an aggregate value for every combination of values in the grouping columns, are the most important class of ad-hoc queries. As small errors are usually tolerable for such queries, approximate query processing (AQP) has the potential to answer them over very large datasets much faster. In many cases, analysts require the distribution of (group, aggvalue) pairs in the estimated answer to be guaranteed within a certain error threshold of the exact distribution. Existing AQP techniques are inadequate for two main reasons. First, users cannot express such guarantees. Second, the sampling techniques used in traditional AQP can produce arbitrarily large errors even for SUM queries. To address these limitations, we first introduce a new precision metric, called distribution precision, to express such error guarantees. We then study how to provide fast approximate answers to aggregation queries with distribution precision guaranteed within a user-specified error bound. The main challenges are to provide rigorous error guarantees and to handle arbitrary, highly selective predicates without maintaining large samples. We propose a novel sampling scheme called measure-biased sampling to address the former challenge. For the latter, we propose two new indexes to augment in-memory samples. Like other sampling-based AQP techniques, our solution supports any aggregate that can be estimated from random samples. In addition to deriving theoretical guarantees, we conduct an experimental study comparing our system with state-of-the-art AQP techniques and a commercial column-store database system on both synthetic and real enterprise datasets. Our system provides a median speed-up of more than 100× with around 5% distribution error compared with the commercial database.
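A minimal numpy sketch of the measure-biased idea for SUM with GROUP BY follows; the indexes that handle highly selective predicates and the formal precision guarantees are not shown. Rows are drawn with probability proportional to the measure column, so the expected number of sampled rows in a group is proportional to that group's true sum, and giving every sampled row a contribution of T/k (T = total measure, k = sample size) yields an unbiased estimate.

```python
import numpy as np

def measure_biased_group_sums(groups, measure, k, rng):
    """Sample k rows with probability proportional to the measure column; each
    sampled row contributes T/k to its group's SUM estimate (T = total measure)."""
    T = measure.sum()
    idx = rng.choice(len(measure), size=k, p=measure / T)
    est = {}
    for g in groups[idx]:
        est[g] = est.get(g, 0.0) + T / k
    return est

rng = np.random.default_rng(0)
groups = np.array(["a", "b"] * 50000)
measure = rng.exponential(scale=10.0, size=100000)
print(measure_biased_group_sums(groups, measure, k=2000, rng=rng))
print({g: measure[groups == g].sum() for g in ("a", "b")})   # exact sums, for comparison
```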
Categories: Microsoft
Quickr: Lazily Approximating Complex Ad-Hoc Queries in Big Data Clusters
We present a system that approximates the answers to complex ad-hoc queries in big-data clusters by injecting samplers on the fly, without requiring pre-existing samples. Improvements can be substantial when big-data queries take multiple passes over the data and when samplers execute early in the query plan. We present a new universe sampler that is able to sample multiple join inputs. By incorporating samplers natively into a cost-based query optimizer, we automatically generate plans with appropriate samplers at appropriate locations. We devise an accuracy analysis method with which we ensure that query plans with samplers will not miss groups and that aggregate values are within a small ratio of their true values. An implementation on a cluster with tens of thousands of machines shows that queries in the TPC-DS benchmark use a median of 2× fewer resources. In contrast, approaches that construct input samples beforehand improve only 22% of the queries (a median speed-up of 0×), even when given 10× the size of the input to store samples.
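The core trick behind sampling multiple join inputs is to sample the value universe of the join key rather than rows: both inputs keep exactly the keys whose hash lands in the same fraction of hash space, so matching rows survive together. The sketch below illustrates only that idea with toy data; the optimizer integration and accuracy analysis are not shown, and the table and column names are placeholders.

```python
import hashlib

def in_universe(key, p):
    """Keep a join key iff its hash lands in the first p-fraction of hash space;
    applying the same test to both inputs keeps matching keys together."""
    h = int.from_bytes(hashlib.sha1(str(key).encode()).digest()[:8], "big")
    return h / 2**64 < p

p = 0.25
orders = [{"user": u, "amount": a} for u, a in [(1, 10), (2, 20), (3, 30), (4, 40)]]
clicks = [{"user": u} for u in (1, 1, 2, 3, 4, 4)]
orders_s = [r for r in orders if in_universe(r["user"], p)]
clicks_s = [r for r in clicks if in_universe(r["user"], p)]
# Because a surviving key keeps all of its rows on both sides, a COUNT over the
# sampled join is scaled up by 1/p.
joined = [(o, c) for o in orders_s for c in clicks_s if o["user"] == c["user"]]
print(len(joined) / p, "estimated join size (true size is 6)")
```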
Categories: Microsoft
An Efficient Algorithm for Contextual Bandits with Knapsacks, and an Extension to Concave Objectives
Categories: Microsoft
Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization
The emergence of wearable devices such as portable cameras and smart glasses makes it possible to record life-logging first-person videos. Browsing such long, unstructured videos is time-consuming and tedious. This paper studies the discovery of the moments of a user's major or special interest (i.e., highlights) in a video, for generating summaries of first-person videos. Specifically, we propose a novel pairwise deep ranking model that employs deep learning techniques to learn the relationship between highlight and non-highlight video segments. A two-stream network structure is developed for video highlight detection, representing video segments with complementary information: the appearance of video frames and the temporal dynamics across frames. Given a long personal video, the highlight detection model assigns a highlight score to each segment. The detected highlight segments are used for summarization in two ways: video timelapse and video skimming. The former plays the highlight (non-highlight) segments at low (high) speed, while the latter assembles the sequence of segments with the highest scores. On 100 hours of first-person videos covering 15 unique sports categories, our highlight detection improves over the state-of-the-art RankSVM method by 10.5% in terms of accuracy. Moreover, our approaches produce video summaries of better quality, as judged in a user study with 35 human subjects.
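The pairwise ranking objective itself is easy to sketch: a scorer should rank a highlight segment above a non-highlight segment from the same video by a margin. The PyTorch snippet below is a minimal sketch, not the paper's two-stream architecture; the feature dimension, network shape, and random features are placeholders.

```python
import torch
import torch.nn as nn

class SegmentScorer(nn.Module):
    """Scores a segment's feature vector; higher means more highlight-worthy."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

scorer = SegmentScorer()
optimizer = torch.optim.Adam(scorer.parameters(), lr=1e-4)
ranking_loss = nn.MarginRankingLoss(margin=1.0)

highlight_feats = torch.randn(32, 512)         # placeholder highlight features
non_highlight_feats = torch.randn(32, 512)     # placeholder non-highlight features

s_pos = scorer(highlight_feats)
s_neg = scorer(non_highlight_feats)
loss = ranking_loss(s_pos, s_neg, torch.ones_like(s_pos))  # want s_pos > s_neg + margin
loss.backward()
optimizer.step()
```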
Categories: Microsoft
CloudBuild: Microsoft’s Distributed and Caching Build Service
Thousands of Microsoft engineers build and test hundreds of software products several times a day. It is essential that this continuous integration scales, guarantees short feedback cycles, and functions reliably with minimal human intervention. This paper describes CloudBuild, the build service infrastructure developed within Microsoft over the last few years. CloudBuild is responsible for all aspects of a continuous integration workflow, including builds, test and code analysis, as well as drops, package and symbol creation and storage. CloudBuild supports multiple build languages as long as they fulfill a coarse-grained, file-I/O-based contract. CloudBuild uses content-based caching to run build-related tasks only when needed. Lastly, it builds on many machines in parallel. CloudBuild offers a reliable build service in the presence of unreliable components. It aims to rapidly onboard teams and hence has to support non-deterministic build tools and specification languages that under-declare dependencies. We outline how we addressed these challenges and characterize the operation of CloudBuild. CloudBuild has on-boarded hundreds of codebases with only man-months of effort each. Some of these codebases are used by thousands of developers. The speed-ups of build and test range from 1.3× to 10×, and service availability is 99%.
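Content-based caching can be illustrated with a few lines: a task's cache key is derived from the tool, its arguments, and the contents of its declared inputs, so identical work is never rebuilt. The sketch below is a minimal illustration of that idea, not CloudBuild's actual cache protocol; the function names are placeholders.

```python
import hashlib
import json

def file_digest(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def cache_key(tool, args, input_paths):
    """Content-based key: same tool, arguments, and input contents imply the same
    outputs, so the task's results can be reused instead of rebuilt."""
    payload = {
        "tool": tool,
        "args": list(args),
        "inputs": {p: file_digest(p) for p in sorted(input_paths)},
    }
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def run_cached(tool, args, input_paths, cache, build_fn):
    key = cache_key(tool, args, input_paths)
    if key in cache:                 # hit: skip the build task entirely
        return cache[key]
    cache[key] = build_fn()          # miss: run the task and remember its outputs
    return cache[key]
```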
Categories: Microsoft
Jointly Modeling Embedding and Translation to Bridge Video and Language
Automatically describing video content with natural language is a fundamental challenge of computer vision. Recurrent Neural Networks (RNNs), which model sequence dynamics, have attracted increasing attention for visual interpretation. However, most existing approaches generate each word locally from the given previous words and the visual content, while the relationship between the semantics of the entire sentence and the visual content is not holistically exploited. As a result, the generated sentences may be contextually correct while the semantics (e.g., subjects, verbs, or objects) are wrong. This paper presents a novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which simultaneously explores LSTM learning and visual-semantic embedding. The former aims to locally maximize the probability of generating the next word given the previous words and the visual content, while the latter creates a visual-semantic embedding space that enforces the relationship between the semantics of the entire sentence and the visual content. Experiments on the YouTube2Text dataset show that LSTM-E achieves the best published performance to date in generating natural sentences: 45.3% and 31.0% in terms of BLEU@4 and METEOR, respectively. Superior performance is also reported on two movie description datasets (M-VAD and MPII-MD). In addition, we demonstrate that LSTM-E outperforms several state-of-the-art techniques in predicting Subject-Verb-Object (SVO) triplets.
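The two-term structure of the objective can be sketched compactly. The snippet below is a sketch in the spirit of LSTM-E, not the paper's exact formulation: the weight lam, the squared-distance relevance term, and all shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def joint_loss(word_logits, target_words, video_emb, sentence_emb, lam=0.3):
    """Local word-generation loss plus a global relevance loss that pulls the
    video embedding toward the embedding of the whole sentence."""
    gen_loss = F.cross_entropy(word_logits.reshape(-1, word_logits.size(-1)),
                               target_words.reshape(-1))
    rel_loss = F.mse_loss(video_emb, sentence_emb)
    return (1 - lam) * gen_loss + lam * rel_loss

# Placeholder shapes: batch of 8 sentences, 12 time steps, 5000-word vocabulary,
# 256-dimensional shared visual-semantic embedding space.
logits = torch.randn(8, 12, 5000)
targets = torch.randint(0, 5000, (8, 12))
v_emb = torch.randn(8, 256)
s_emb = torch.randn(8, 256)
print(joint_loss(logits, targets, v_emb, s_emb))
```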
Categories: Microsoft
Parameter Estimation for Generalized Thurstone Choice Models
We consider the maximum-likelihood parameter estimation problem for a generalized Thurstone choice model, where choices are made from comparison sets of two or more items. We provide tight characterizations of the mean squared error, as well as necessary and sufficient conditions for correct classification when each item belongs to one of two classes. These results provide insight into how the estimation accuracy depends on the choice of the generalized Thurstone choice model and on the structure of the comparison sets. We find that for a priori unbiased structures of comparisons, e.g., when comparison sets are drawn independently and uniformly at random, the number of observations needed to achieve a prescribed estimation accuracy depends on the choice of the generalized Thurstone choice model. For a broad set of generalized Thurstone choice models, which includes all popular instances used in practice, the estimation error is shown to be largely insensitive to the cardinality of the comparison sets. On the other hand, we find that there exist generalized Thurstone choice models for which the estimation error decreases much faster with the cardinality of the comparison sets. We report results of empirical evaluations using schedules of contests as observed in popular sports competitions and online crowdsourcing systems.
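For concreteness, here is a small sketch of maximum-likelihood fitting for the Luce (softmax) choice model, which is one instance of a generalized Thurstone model; the paper's general noise distributions and error analysis are not reproduced, and the step size, iteration count, and toy data are assumptions.

```python
import numpy as np

def fit_luce(comparisons, n_items, steps=2000, lr=0.5):
    """Gradient ascent on the log-likelihood of the Luce (softmax) choice model.
    Each observation is a pair (comparison_set, winner); larger theta = stronger item."""
    theta = np.zeros(n_items)
    for _ in range(steps):
        grad = np.zeros(n_items)
        for items, winner in comparisons:
            items = np.asarray(items)
            p = np.exp(theta[items] - theta[items].max())
            p /= p.sum()
            grad[items] -= p             # expected win indicators under the model
            grad[winner] += 1.0          # observed win
        theta += lr * grad / len(comparisons)
        theta -= theta.mean()            # remove the shift ambiguity
    return theta

# Toy comparison sets of sizes two and three (made-up outcomes).
comparisons = [([0, 1, 2], 0), ([0, 2, 3], 2), ([1, 2], 1),
               ([0, 3], 0), ([2, 3], 3), ([0, 1], 1)]
print(fit_luce(comparisons, n_items=4))
```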
Categories: Microsoft