Blogroll
uLink: Enabling user-defined deep linking to app content
Web deep links are instrumental to many fundamental user experiences such as navigating to one web page from another, bookmarking a page, or sharing it with others. Such experiences are not possible with individual pages inside mobile apps, since historically mobile apps did not have links equivalent to web deep links. Mobile deep links, introduced in recent years, still lack many important properties of web deep links. Unlike web links, mobile deep links require significant developer effort, cover only a small number of predefined pages, and are defined statically: they can navigate to a page for a given link, but cannot dynamically generate a link for a given page. We propose uLink, a novel deep linking mechanism that addresses these problems. uLink is implemented as an application library that transparently tracks the data and UI-event dependencies of app pages and encodes this information in links to those pages; when a link is invoked, the information is used to recreate the target page quickly and accurately. uLink also employs techniques, based on static and dynamic analysis of the app, that can provide feedback to users about whether a link may break in the future due to, e.g., modifications of external resources such as a file the link depends on. We have implemented uLink on Android. Our evaluation with 34 of the 1000 most downloaded Android apps shows that, compared to existing mobile deep links, uLink requires minimal developer effort, achieves significantly higher coverage, and can provide accurate user feedback on a broken link. Video: https://youtu.be/CkEGtwtpomc
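To make the mechanism concrete, here is a minimal Python sketch of the recording-and-replay idea: a library records the data reads and UI events behind a page and serializes them into a link that can later be decoded to rebuild that page. All class, method, and URI names here are hypothetical illustrations, not uLink's actual API.

```python
import json
import base64

class ShortcutRecorder:
    """Hypothetical illustration: record the data reads and UI events that
    produced the current page, then serialize them into a shareable link."""

    def __init__(self, page_name):
        self.page_name = page_name
        self.data_deps = []   # e.g., files, URIs, or key-value reads the page used
        self.ui_events = []   # e.g., clicks/selections that led to this page state

    def on_data_read(self, uri):
        self.data_deps.append(uri)

    def on_ui_event(self, event):
        self.ui_events.append(event)

    def make_link(self):
        payload = {"page": self.page_name,
                   "deps": self.data_deps,
                   "events": self.ui_events}
        token = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
        return f"myapp://ulink/{token}"

def replay_link(link):
    """Decode a link and return the recorded state needed to recreate the page."""
    token = link.rsplit("/", 1)[-1]
    return json.loads(base64.urlsafe_b64decode(token.encode()).decode())

# Example: recording a note-viewing page and later restoring it.
rec = ShortcutRecorder("NoteViewPage")
rec.on_data_read("content://notes/42")
rec.on_ui_event({"type": "click", "widget": "open_note", "arg": 42})
link = rec.make_link()
print(replay_link(link)["page"])  # -> NoteViewPage
```

Invoking such a link re-applies the recorded dependencies and events, which is how the abstract describes recreating the target page quickly and accurately.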
Categories: Microsoft
Design of Mirror Assembly for an Agile Reconfigurable Data Center Interconnect
Free-space data center interconnects (wireless, optical) require a reflection mechanism to avoid the obstacles and interference that can arise from concentrating signals in a plane consisting of source/destination nodes. A proper reflection mechanism above the communicating nodes enables a beam to depart the source, hit the reflector structure well above the source and destination, and head towards the desired destination by utilizing all three spatial dimensions. This technical report presents the design of a mirror assembly (i.e., reflector structure) for 3-D beam steering in data center networks; the design makes use of the diffraction properties of Digital Micromirror Devices (DMDs).
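The basic geometry described above can be illustrated with a short calculation: for a flat specular reflector, the mirror normal at the reflection point must bisect the incoming and outgoing beam directions. This sketch only shows that simple geometric fact with made-up coordinates; the report's actual design concerns DMD diffraction, not flat mirrors.

```python
import numpy as np

def mirror_normal(source, reflect_point, dest):
    """Unit normal a flat mirror at reflect_point must have so that a beam from
    source reflects toward dest (specular reflection: the normal bisects the
    unit incoming and outgoing directions)."""
    d_in = reflect_point - source
    d_out = dest - reflect_point
    n = d_out / np.linalg.norm(d_out) - d_in / np.linalg.norm(d_in)
    return n / np.linalg.norm(n)

# Source and destination nodes in a plane (z = 0), reflector 3 m above them.
src = np.array([0.0, 0.0, 0.0])
dst = np.array([4.0, 2.0, 0.0])
mid = np.array([2.0, 1.0, 3.0])   # reflection point well above both nodes
print(mirror_normal(src, mid, dst))  # -> [0. 0. -1.]: mirror faces straight down
```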
Categories: Microsoft
Stable Matching Algorithm for an Agile Reconfigurable Data Center Interconnect
This document describes the scheduling algorithm used in the ProjecToR data-center interconnect system. The algorithm assigns bundles to laser-photodetector edges as they arrive online, with the goal of minimizing the bundle/flow completion times over all bundles in the system. We compare the performance of the proposed stable-matching algorithm to that of a more powerful, hindsight-optimal algorithm that is aware of all bundles arriving in the system and can operate offline. We also give an integer linear programming formulation for the problem of maximizing instantaneous throughput.
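For intuition about the matching step, the sketch below runs a textbook Gale-Shapley stable matching between bundles and laser-photodetector edges; the preference lists are toy placeholders, not the paper's actual preference functions or its handling of online arrivals.

```python
def stable_match(bundle_prefs, edge_prefs):
    """Gale-Shapley with bundles proposing.
    bundle_prefs: {bundle: [edge, ...]} ordered best-first.
    edge_prefs:   {edge: [bundle, ...]} ordered best-first.
    Returns a stable assignment {edge: bundle}."""
    rank = {e: {b: i for i, b in enumerate(prefs)} for e, prefs in edge_prefs.items()}
    free = list(bundle_prefs)           # bundles not yet matched
    next_choice = {b: 0 for b in bundle_prefs}
    match = {}                          # edge -> bundle
    while free:
        b = free.pop()
        e = bundle_prefs[b][next_choice[b]]
        next_choice[b] += 1
        if e not in match:
            match[e] = b
        elif rank[e][b] < rank[e][match[e]]:   # edge prefers the new bundle
            free.append(match[e])
            match[e] = b
        else:
            free.append(b)
    return match

# Toy example: bundles prefer less-loaded edges; edges prefer shorter bundles.
bundle_prefs = {"b1": ["e1", "e2"], "b2": ["e1", "e2"]}
edge_prefs = {"e1": ["b2", "b1"], "e2": ["b2", "b1"]}
print(stable_match(bundle_prefs, edge_prefs))  # {'e1': 'b2', 'e2': 'b1'}
```

In the online setting the preferences would be rebuilt or updated as bundles arrive; the paper's algorithm addresses exactly that dynamic case.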
Categories: Microsoft
Shrink – Prescribing Resiliency Solutions for Streaming
Streaming query deployments make up a vital part of cloud-oriented applications. They vary widely in their data, logic, and statefulness, and are typically executed in multi-tenant distributed environments with varying uptime SLAs. In order to achieve these SLAs, one of a number of proposed resiliency strategies is employed to protect against failure. This paper introduces the first comprehensive, cloud-friendly comparison of resiliency techniques for streaming queries. We present models which capture the costs associated with different resiliency strategies and, through a series of experiments which implement and validate these models, show that (1) there is no single resiliency strategy which efficiently handles most streaming scenarios; (2) the optimization space is too complex for a person to navigate using rules of thumb; and (3) there exists a clear generalization of periodic checkpointing that is worth considering in many cases. Finally, the models presented in this paper can be adapted to fit a wide variety of resiliency strategies, and likely have important consequences for cloud services beyond those that are obviously streaming.
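To give a flavor of what such cost models trade off, the toy calculation below compares the steady-state overhead of periodic checkpointing against active-active replication under made-up parameters; the paper's actual models are far richer (SLAs, state size, recovery time, multi-tenant pricing).

```python
def checkpoint_overhead(state_gb, ckpt_interval_s, write_gbps,
                        failure_rate_per_s, recompute_ratio=0.5):
    """Rough steady-state overhead (fraction of one machine's capacity) of
    periodic checkpointing: time spent writing checkpoints plus expected
    replay work after failures."""
    ckpt_cost = (state_gb / write_gbps) / ckpt_interval_s
    # on failure, roughly half a checkpoint interval of work must be replayed
    replay_cost = failure_rate_per_s * (ckpt_interval_s * recompute_ratio)
    return ckpt_cost + replay_cost

def replication_overhead(replicas=2):
    """Active-active replication simply multiplies the compute cost."""
    return replicas - 1  # extra full copies of the computation

# Hypothetical numbers: 10 GB state, checkpoint every 60 s, 1 GB/s writes,
# one failure per day on average.
print(checkpoint_overhead(10, 60, 1.0, 1 / 86400))   # ~0.167
print(replication_overhead(2))                        # 1
```

Under these particular numbers checkpointing looks much cheaper, but shrinking the checkpoint interval or growing the state quickly changes the answer, which is exactly the kind of non-obvious trade-off the paper argues defeats rules of thumb.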
Categories: Microsoft
Secure Data Exchange: A Marketplace in the Cloud
A vast amount of data belonging to companies and individuals is currently stored in the cloud in encrypted form by trustworthy service providers such as Microsoft, Amazon, and Google. Unfortunately, the only way for the cloud to use the data in computations is to first decrypt it, then compute on it, and finally re-encrypt it, resulting in a problematic trade-off between value/utility and security. At a high level, our goal in this paper is to present a general and practical cryptographic solution to this dilemma. More precisely, we describe a scenario that we call Secure Data Exchange (SDE), where several data owners store private encrypted data in a semi-honest non-colluding cloud, and an evaluator (a third party) wishes to engage in a secure function evaluation on the data belonging to some subset of the data owners. We require that none of the parties involved learns anything beyond what they already know and what is revealed by the function, even when the parties (except the cloud) are actively malicious. We also recognize the ubiquity of scenarios where the lack of an efficient SDE protocol prevents, for example, business transactions, research collaborations, or mutually beneficial computations on aggregated private data from taking place, and discuss several such scenarios in detail. Our main result is an efficient and practical protocol for enabling SDE using Secure Multi-Party Computation (MPC) in a novel adaptation of the server-aided setting. We also present the details of an implementation along with performance numbers.
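As a toy illustration of one building block such protocols can rest on, the sketch below shows additive secret sharing over a prime field: no single share reveals anything about a value, yet shares can be combined component-wise to compute a sum. The paper's server-aided MPC protocol is substantially more general than this.

```python
import secrets

P = 2**61 - 1  # a prime modulus; each share is uniform mod P

def share(value, n_parties):
    """Split value into n additive shares that sum to value mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Two data owners secret-share their inputs; the evaluator learns only the sum.
a_shares = share(25, 3)
b_shares = share(17, 3)
sum_shares = [(x + y) % P for x, y in zip(a_shares, b_shares)]  # done share-wise
print(reconstruct(sum_shares))  # 42
```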
Categories: Microsoft
Subspace Clustering with a Twist
Subspace segmentation or clustering can be defined as the process of assigning subspace labels to a set of data points assumed to lie on the union of multiple low-dimensional, linear subspaces. Given that each point can be efficiently expressed using a linear combination of other points from the same subspace, a variety of segmentation algorithms built upon $\ell_1$, nuclear norm, and other convex penalties have recently shown state-of-the-art robustness on multiple benchmarks. However, what if, instead of observing the original data points, we only have access to transformed, or `twisted' so to speak, measurements? Here we consider underdetermined affine transformations that may arise in computer vision applications such as bidirectional reflectance distribution function (BRDF) estimation. Unfortunately, most existing approaches, convex or otherwise, do not address this highly useful generalization. To fill this void, we derive a probabilistic model that simultaneously estimates the latent data points and subspace memberships using simple EM update rules. Moreover, in certain restricted settings this approach is guaranteed to produce the correct clustering. Finally, a wide range of corroborating empirical evidence, including a BRDF estimation task, speaks to the practical efficacy of this algorithm.
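In our own notation (paraphrasing the abstract, so the symbols below are assumptions rather than the paper's), the `twisted' setting can be written as observing each latent point only through a known underdetermined affine map:

```latex
\[
  y_i = \Phi_i x_i + b_i, \qquad
  x_i \in \bigcup_{k=1}^{K} \mathcal{S}_k, \qquad
  \Phi_i \in \mathbb{R}^{m_i \times d},\; m_i < d,
\]
```

and the goal is to recover both the latent points $x_i$ and their subspace labels from $\{y_i, \Phi_i, b_i\}$, which is what the proposed EM updates estimate jointly.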
Categories: Microsoft
Analysis of VB Factorizations for Sparse and Low-Rank Estimation
Variational Bayesian (VB) factorial approximations anchor a wide variety of probabilistic models, where tractable posterior inference is almost never possible. This basic strategy is particularly attractive when estimating structured low-dimensional models of high-dimensional data, exemplified by the search for minimal rank and/or sparse approximations to observed data. To this end, VB models are frequently deployed across applications including multi-task learning, robust PCA, subspace clustering, matrix completion, affine rank minimization, source localization, compressive sensing, and assorted combinations thereof. Perhaps surprisingly, however, there exists almost no attendant theoretical explanation for how various VB factorizations operate, and in which situations one may be preferable to another. We address this relative void by comparing arguably the two most popular factorizations, one built upon Gaussian scale mixture priors, the other upon bilinear Gaussian priors, both of which can favor minimal rank or sparsity depending on the context. More specifically, by reexpressing the respective VB objective functions, we weigh multiple factors related to local minima avoidance, feature transformation invariance and correlation, and computational complexity to arrive at insightful conclusions useful in explaining performance and deciding which VB flavor is advantageous. We also envision that the principles explored here are quite relevant to other structured inverse problems where VB serves as a viable solution.
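For reference, the two factorization families being compared are commonly written as follows; these are the standard textbook forms, and the paper's exact parameterizations may differ.

```latex
% Gaussian scale mixture (GSM) prior: a zero-mean Gaussian with latent
% per-coefficient variances, which can drive coefficients to zero (sparsity).
\[
  p(x \mid \gamma) = \mathcal{N}\!\big(x;\, 0,\, \operatorname{diag}(\gamma)\big),
  \qquad \gamma \sim p(\gamma).
\]
% Bilinear Gaussian prior: the estimated matrix is an explicit product of two
% factors with independent Gaussian priors, which can favor low rank.
\[
  X = A B^{\top}, \qquad
  A_{ij} \sim \mathcal{N}(0, \sigma_a^{2}), \quad
  B_{ij} \sim \mathcal{N}(0, \sigma_b^{2}).
\]
```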
Categories: Microsoft
You Lead, We Exceed: Labor-Free Video Concept Learning by Jointly Exploiting Web Videos and Images
Video concept learning often requires a large set of training samples. In practice, however, acquiring noise-free training labels with sufficient positive examples is very expensive. A plausible solution for training data collection is to sample from the vast quantities of images and videos on the Web. Such a solution is motivated by the assumption that the retrieved images or videos are highly correlated with the query. Still, a number of challenges remain. First, Web videos are often untrimmed, so only parts of the videos are relevant to the query. Second, although the retrieved Web images are always highly relevant to the issued query, naively utilizing them in the video domain may even hurt performance due to the well-known semantic drift and domain gap problems. As a result, a valid question is how Web images and videos should interact for video concept learning. In this paper, we propose a Lead–Exceed Neural Network (LENN), which reinforces the training on Web images and videos in a curriculum manner. Specifically, the training first uses frames of Web videos to obtain an initial network. The Web images are then filtered by the learnt network, and the selected images are additionally fed into the network to enhance the architecture and further trim the videos. In addition, Long Short-Term Memory (LSTM) can be applied on the trimmed videos to explore temporal information. Encouraging results are reported on UCF101, TRECVID 2013 and 2014 MEDTest in the context of both action recognition and event detection. Without using human-annotated exemplars, our proposed LENN achieves 74.4% accuracy on the UCF101 dataset.
Categories: Microsoft
FlashBack: Immersive Virtual Reality on Mobile Devices via Rendering Memoization
Virtual reality head-mounted displays (VR HMDs) are attracting users with the promise of full sensory immersion in virtual environments. Creating the illusion of immersion for a near-eye display results in very heavy rendering workloads: low latency, high framerate, and high visual quality are all needed. Tethered VR setups in which the HMD is bound to a powerful gaming desktop limit mobility and exploration, and are difficult to deploy widely. Products such as Google Cardboard and Samsung Gear VR purport to offer any user a mobile VR experience, but their GPUs are too power-constrained to produce an acceptable framerate and latency, even for scenes of modest visual quality. We present FlashBack, an unorthodox design point for HMD VR that eschews all real-time scene rendering. Instead, FlashBack aggressively precomputes and caches all possible images that a VR user might encounter. FlashBack memoizes costly rendering effort in an offline step to build a cache full of panoramic images. At runtime, FlashBack constructs and maintains a hierarchical storage cache index to quickly look up images that the user should be seeing. On a cache miss, FlashBack uses fast approximations of the correct image while concurrently fetching more closely-matching entries from its cache for future requests. Moreover, FlashBack not only works for static scenes, but also for dynamic scenes with moving and animated objects. We evaluate a prototype implementation of FlashBack and report up to an 8× improvement in framerate, a 97× reduction in energy consumption per frame, and a 15× reduction in latency compared to a locally-rendered mobile VR setup. In some cases, FlashBack even delivers better framerates and responsiveness than a tethered HMD configuration on graphically complex scenes. Paper to appear.
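The runtime behavior is easy to picture with a small sketch: quantize the head pose, look it up in a cache of pre-rendered panoramas, and on a miss serve the nearest cached entry while the exact one would be fetched in the background. The class and key scheme below are illustrative stand-ins, not FlashBack's hierarchical cache index.

```python
import math

class PoseCache:
    """Toy memoization cache keyed by a quantized head pose (yaw, pitch)."""

    def __init__(self, step_deg=10.0):
        self.step = step_deg
        self.store = {}  # quantized pose -> pre-rendered panorama (any payload)

    def _key(self, yaw, pitch):
        return (round(yaw / self.step), round(pitch / self.step))

    def insert(self, yaw, pitch, image):
        self.store[self._key(yaw, pitch)] = image

    def lookup(self, yaw, pitch):
        key = self._key(yaw, pitch)
        if key in self.store:
            return self.store[key], True          # hit at the quantized pose
        if not self.store:
            return None, False
        # miss: approximate with the nearest cached pose; a real system would
        # also warp the image and asynchronously fetch the exact entry
        nearest = min(self.store, key=lambda k: math.dist(k, key))
        return self.store[nearest], False

cache = PoseCache()
cache.insert(0, 0, "pano_forward.jpg")
cache.insert(90, 0, "pano_right.jpg")
print(cache.lookup(85, 3))   # ('pano_right.jpg', False) -- served as an approximation
```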
Categories: Microsoft
Task Completion Platform: A self-serve multi-domain goal oriented dialogue platform
Categories: Microsoft
Semi-Supervised Query Classification Using Matrix Sketching
The enormous scale of unlabeled text available today necessitates scalable schemes for representation learning in natural language processing. For instance, in this paper we are interested in classifying the intent of a user query. While our labeled data is quite limited, we have access to a virtually unlimited amount of unlabeled queries, which could be used to induce useful representations: for instance, by principal component analysis (PCA). However, it is prohibitive to even store the data in memory due to its sheer size, let alone apply conventional batch algorithms. In this work, we apply the recently proposed matrix sketching algorithm of Liberty (2013) to entirely obviate the scalability problem. This algorithm approximates the data within a specified memory bound while preserving the covariance structure necessary for PCA. Using matrix sketching, we significantly improve user intent classification accuracy by leveraging large amounts of unlabeled queries.
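The referenced sketching algorithm, Frequent Directions (Liberty, 2013), is compact enough to show in full. The NumPy rendering below maintains an ℓ×d sketch B such that BᵀB approximates AᵀA, which is exactly the covariance information needed for PCA; the function signature and parameter names are ours.

```python
import numpy as np

def frequent_directions(A, ell):
    """Frequent Directions (Liberty, 2013): stream the rows of A (n x d) into an
    ell x d sketch B such that B.T @ B approximates A.T @ A, preserving the
    covariance structure needed for PCA. Assumes ell <= d."""
    n, d = A.shape
    assert ell <= d
    B = np.zeros((ell, d))
    for row in A:
        zero_rows = np.where(~B.any(axis=1))[0]
        if len(zero_rows) == 0:
            # sketch is full: shrink all directions by the median squared
            # singular value, which zeroes out roughly half of the rows
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2
            B = np.sqrt(np.maximum(s ** 2 - delta, 0.0))[:, None] * Vt
            zero_rows = np.where(~B.any(axis=1))[0]
        B[zero_rows[0]] = row
    return B

# Example: sketch 10,000 rows of 50-dimensional data down to 20 rows.
A = np.random.randn(10000, 50)
B = frequent_directions(A, ell=20)
print(B.shape)  # (20, 50)
```

The top right singular vectors of B can then stand in for the principal components of the full query corpus when inducing representations, without ever materializing the data in memory.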
Categories: Microsoft
Platypus - Indoor Localization and Identification through Sensing Electric Potential Changes in Human Bodies
Platypus is the first system to localize and identify people by remotely and passively sensing the changes in their body electric potential which occur naturally during walking. While it uses three or more electric potential sensors with a maximum range of 2 m, as a tag-free system it does not require the user to carry any special hardware. We describe the physical principles behind body electric potential changes, and a predictive mathematical model of how these changes affect a passive electric field sensor. By inverting this model and combining data from multiple sensors, we derive a method for localizing people and experimentally demonstrate a median localization error of 0.16 m. We also use the model to remotely infer the change in body electric potential with a mean error of 8.8% compared to direct contact-based measurements. We show how the reconstructed body electric potential differs from person to person and thereby how to perform identification. Based on short walking sequences of 5 s, we identify four users with an accuracy of 94%, and 30 users with an accuracy of 75%. We demonstrate that identification features remain valid over multiple days, though they change with footwear.
Categories: Microsoft
The Label Complexity of Mixed-Initiative Classifier Training
Mixed-initiative classifier training, in which the human teacher can choose which items to label or label items chosen by the computer, has enjoyed empirical success but lacks a rigorous statistical learning-theoretic justification. We analyze the label complexity of a simple mixed-initiative training mechanism using teaching dimension and active learning. We show that mixed-initiative training is advantageous compared to either computer-initiated (represented by active learning) or human-initiated classifier training. The advantage holds across all human teaching abilities, from optimal to completely unhelpful teachers. We further improve classifier training by educating the human teachers, namely by showing them example optimal teaching sets. We conduct Mechanical Turk human experiments on two stylistic classifier training tasks to illustrate our approach.
Categories: Microsoft
Accelerating Relational Databases by Leveraging Remote Memory and RDMA
Memory is a crucial resource in relational database management systems (RDBMSs). When there is insufficient memory, RDBMSs are forced to use slower media such as SSDs or HDDs, which can significantly degrade workload performance. Cloud database services are deployed in data centers where network adapters supporting remote direct memory access (RDMA) at low latency and high bandwidth are becoming prevalent. We study the novel problem of how a Symmetric Multi-Processing (SMP) RDBMS, whose memory demands exceed locally-available memory, can leverage available remote memory in the cluster, accessed via RDMA, to improve query performance. We expose available memory on remote servers using a lightweight file API that allows an SMP RDBMS to leverage the benefits of remote memory with modest changes. We identify and implement several novel scenarios to demonstrate these benefits, and address design challenges that are crucial for efficient implementation. We implemented the scenarios in the Microsoft SQL Server engine and present the first end-to-end study demonstrating the benefits of remote memory for a variety of micro-benchmarks and industry-standard benchmarks. Compared to using disks when memory is insufficient, we improve the throughput and latency of queries with short reads and writes by 3× to 10×, while improving the latency of multiple TPC-H and TPC-DS queries by 2× to 100×.
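The "lightweight file API" idea can be pictured with a conceptual stand-in: the engine treats remote memory as just another file-like spill target, so existing code paths that write pages to local files only need a different handle. The class below uses a local byte buffer in place of an RDMA-registered remote region and is purely illustrative of the interface shape, not the actual API used with SQL Server.

```python
class RemoteMemoryFile:
    """Conceptual stand-in for a file-like object backed by remote memory.
    A real implementation would issue RDMA reads/writes against a registered
    buffer on another server; here a local bytearray plays that role."""

    def __init__(self, size_bytes):
        self._buf = bytearray(size_bytes)

    def write_at(self, offset, data):
        self._buf[offset:offset + len(data)] = data

    def read_at(self, offset, length):
        return bytes(self._buf[offset:offset + length])

# An RDBMS could spill evicted buffer-pool pages here instead of to local disk:
spill = RemoteMemoryFile(16 * 1024)
page = b"\x00" * 8192            # an 8 KB page image
spill.write_at(0, page)
assert spill.read_at(0, 8192) == page
```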
Categories: Microsoft
Automated Demand-driven Resource Scaling in Relational Database-as-a-Service
Relational Database-as-a-Service (DaaS) platforms today support the abstraction of a resource container that guarantees a fixed amount of resources. Tenants are responsible for selecting a container size suitable for their workloads, which they can change to leverage the cloud’s elasticity. However, automating this task is daunting for most tenants, since estimating resource demands for arbitrary SQL workloads in an RDBMS is complex and challenging. In addition, workloads and resource requirements can vary significantly within minutes to hours, and container sizes vary by orders of magnitude both in the amount of resources and in monetary cost. We present a solution that enables a DaaS to auto-scale container sizes on behalf of its tenants. Approaches for auto-scaling stateless services, such as web servers, rely on historical resource utilization as the primary signal and often perform poorly for stateful database servers, which are significantly more complex. Our solution derives a set of robust signals from database engine telemetry and combines them to significantly improve the accuracy of demand estimation for database workloads, resulting in more accurate scaling decisions. Our solution also raises the level of abstraction by allowing tenants to reason about monetary budget and query latency rather than resources. We prototyped our approach in Microsoft Azure SQL Database and ran extensive experiments using workloads with realistic time-varying resource demand patterns obtained from production traces. Compared to an approach that uses only resource utilization to estimate demand, our approach results in 1.5× to 3× lower monetary costs while achieving comparable query latencies.
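A highly simplified sketch of the decision loop: derive a demand estimate from engine telemetry (utilization alone under-estimates demand when the workload is being throttled, so waiting/throttling signals push the estimate up), then pick the smallest container that satisfies the estimate within the tenant's budget. Every signal name, container size, price, and combination rule below is hypothetical.

```python
CONTAINERS = [  # (name, cores, dollars_per_hour) -- illustrative only
    ("S2", 2, 0.50),
    ("S4", 4, 1.00),
    ("S8", 8, 2.00),
]

def estimate_core_demand(telemetry):
    """Combine hypothetical telemetry signals into an estimate of cores needed:
    cores actually used plus crude headroom for work queued behind the CPU."""
    used = telemetry["avg_cpu_cores_used"]
    waiting = telemetry["avg_tasks_waiting_for_cpu"]
    return used + 0.5 * waiting

def choose_container(telemetry, budget_per_hour):
    demand = estimate_core_demand(telemetry)
    for name, cores, price in CONTAINERS:            # sorted small -> large
        if cores >= demand and price <= budget_per_hour:
            return name
    return CONTAINERS[-1][0]                          # fall back to the largest

print(choose_container({"avg_cpu_cores_used": 3.2,
                        "avg_tasks_waiting_for_cpu": 2.0},
                       budget_per_hour=2.50))         # -> 'S8'
```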
Categories: Microsoft
A Design and Verification Methodology for Secure Isolated Regions
Hardware support for isolated execution (such as Intel SGX) enables the development of applications that keep their code and data confidential even while running on a hostile or compromised host. However, automatically verifying that such applications satisfy confidentiality remains challenging. We present a methodology for designing such applications in a way that enables certifying their confidentiality. Our methodology consists of forcing the application to communicate with the external world through a narrow interface, compiling it with runtime checks that aid verification, and linking it with a small runtime that implements the narrow interface. The runtime includes core services such as secure communication channels and memory management. We formalize this restriction on the application as Information Release Confinement (IRC), and we show that it allows us to decompose the task of proving confidentiality into (a) a one-time, human-assisted functional verification of the runtime to ensure that it does not leak secrets, and (b) automatic verification of the application’s machine code to ensure that it satisfies IRC and does not directly read or corrupt the runtime’s internal state. We present /CONFIDENTIAL: a verifier for IRC that is modular, automatic, and keeps our compiler out of the trusted computing base. Our evaluation suggests that the methodology scales to real-world applications.
Categories: Microsoft
Enabling Privacy-Preserving Incentives for Mobile Crowd Sensing Systems
Recent years have witnessed the proliferation of mobile crowd sensing (MCS) systems that leverage the public crowd, equipped with various mobile devices (e.g., smartphones, smartglasses, smartwatches), for large-scale sensing tasks. Because of the importance of incentivizing worker participation in such MCS systems, several auction-based incentive mechanisms have been proposed in the literature. However, these mechanisms fail to consider the preservation of workers’ bid privacy. Therefore, different from prior work, we propose a differentially private incentive mechanism that preserves the privacy of each worker’s bid against the other honest-but-curious workers. The motivation for this design comes from the concern that a worker’s bid usually contains her private information that should not be disclosed. We design our incentive mechanism based on the single-minded reverse combinatorial auction. Specifically, we design a differentially private, approximately truthful, individually rational, and computationally efficient mechanism that approximately minimizes the platform’s total payment with a guaranteed approximation ratio. The advantageous properties of the proposed mechanism are justified through not only rigorous theoretical analysis but also extensive simulations.
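One standard building block behind results of this kind is the exponential mechanism, which selects an outcome with probability decaying exponentially in its cost, so that any single bid has only a bounded influence on the outcome distribution. The sketch below is that generic mechanism, not the paper's specific reverse-auction construction.

```python
import math
import random

def exponential_mechanism(candidates, cost, epsilon, sensitivity):
    """Pick one candidate with Pr[c] proportional to
    exp(-epsilon * cost(c) / (2 * sensitivity)). Lower-cost candidates
    (e.g., cheaper sets of winning bids) are exponentially more likely,
    while the selection remains epsilon-differentially private in the bids."""
    weights = [math.exp(-epsilon * cost(c) / (2.0 * sensitivity)) for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

# Toy use: choose among three candidate total payments with sensitivity 1.
print(exponential_mechanism([10, 12, 30], cost=lambda p: p, epsilon=0.5, sensitivity=1))
```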
Categories: Microsoft