Microsoft Research Publications

Syndicate content
Keep current with all the latest Microsoft Research Publications and Technical Reports
Updated: 7 years 45 weeks ago

MSR-VTT: A Large Video Description Dataset for Bridging Video and Language

Wed, 06/01/2016 - 09:00
While there has been increasing interest in the task of describing video with natural language, current computer vision algorithms are still severely limited in terms of the variability and complexity of the videos and their associated language that they can recognize. This is in part due to the simplicity of current benchmarks, which mostly focus on specific fine-grained domains with limited videos and simple descriptions. While researchers have provided several benchmark datasets for image captioning, we are not aware of any large-scale video description dataset with comprehensive categories yet diverse video content. In this paper we present MSR-VTT (standing for “MSR Video to Text”) which is a new large-scale video benchmark for video understanding, especially the emerging task of translating video to text. This is achieved by collecting 257 popular queries from a commercial video search engine, with 118 videos for each query. In its current version, MSR-VTT provides 10K web video clips with 41.2 hours and 200K clip-sentence pairs in total, covering the most comprehensive categories and diverse visual content, and representing the largest dataset in terms of sentence and vocabulary. Each clip is annotated with about 20 natural sentences by 1,327 AMT workers. We present a detailed analysis of MSR-VTT in comparison to a complete set of existing datasets, together with a summarization of different state-of-the-art video-to-text approaches. We also provide an extensive evaluation of these approaches on this dataset, showing that the hybrid Recurrent Neural Networkbased approach, which combines single-frame and motion representations with soft-attention pooling strategy, yields the best generalization capability on MSR-VTT.
Categories: Microsoft

Lower Runtime Bounds for Integer Programs

Wed, 06/01/2016 - 09:00
We present a technique to infer lower bounds on the worst-case runtime complexity of integer programs. To this end, we construct symbolic representations of program executions using a framework for iterative, under-approximating program simplification. The core of this simplification is a method for (under-approximating) program acceleration based on recurrence solving and a variation of ranking functions. Afterwards, we deduce asymptotic lower bounds from the resulting simplified programs. We implemented our technique in our tool LoAT and showthat it infers non-trivial lower bounds for a large number of examples.
Categories: Microsoft

Deep Neural Decision Forests

Wed, 06/01/2016 - 09:00
Categories: Microsoft

The Global Patch Collider

Wed, 06/01/2016 - 09:00
This paper proposes a novel extremely efficient, fully-parallelizable, task-specific algorithm for the computation of global point-wise correspondences in images and videos. Our algorithm, the Global Patch Collider, is based on detecting unique collisions between image points using a collection of learned tree structures that act as conditional hash functions. In contrast to conventional approaches that rely on pairwise distance computation, our algorithm isolates distinctive pixel pairs that hit the same leaf during traversal through multiple learned tree structures. The split functions stored at the intermediate nodes of the trees are trained to ensure that only visually similar patches or their geometric or photometric transformed versions fall into the same leaf node. The matching process involves passing all pixel positions in the images under analysis through the tree structures. We then compute matches by isolating points that uniquely collide with each other ie. fell in the same empty leaf in multiple trees. Our algorithm is linear in the number of pixels but can be made constant time on a parallel computation architecture as the tree traversal for individual image points is decoupled. We demonstrate the efficacy of our method by using it to perform optical flow matching and stereo matching on some challenging benchmarks. Experimental results show that not only is our method extremely computationally efficient, but it is also able to match or outperform state of the art methods that are much more complex.
Categories: Microsoft

Rich Image Captioning in the Wild

Wed, 06/01/2016 - 09:00
We present an image caption system that addresses new challenges of automatically describing images in the wild. The challenges include high quality caption quality with respect to human judgments, out-of-domain data handling, and low latency required in many applications. Built on top of a state-of-the-art framework, we developed a deep vision model that detects a broad range of visual concepts, an entity recognition model that identifies celebrities and landmarks, and a confidence model for the caption output. Experimental results show that our caption engine outperforms previous state-of-the-art systems significantly on both in-domain dataset (i.e. MS COCO) and out-of-domain datasets.
Categories: Microsoft

Large Scale and Streaming Time Series Segmentation and Piece-Wise Approximation: Extended Version

Sat, 05/28/2016 - 09:00
Segmenting a time series or approximating it with piecewise linear function is often needed when handling data in the time domain to detect outliers, clean data, detect events and more. The data varies from ECG signals, traffic monitors to stock prices and sensor networks. Modern data-sets of this type are large and in many cases are infinite in the sense that the data is a stream rather than a finite sample. Therefore, in order to segment it, an algorithm has to scale gracefully with the size of the data. Dynamic Programming (DP) can findthe optimal segmentation, however, the DP approach has a complexity of O T 2 thus cannot handle datasets with millions of elements, nor can it handle streaming data. Therefore, various heuristics are used in practice to handle the data. This study shows that if the approximation measure has an inverse triangle inequality property (ITIP), the solution of the dynamic program can be computed in linear time and streaming data can be handled too. The ITIP is shown to hold in many cases of interest. The speedup due to the new algorithms is evaluated on a variety of data-sets to be in the range of 8 8200x over the DP solution without sacrificing accuracy. Confidence intervals for segmentations are derived as well.
Categories: Microsoft

FourQNEON: Faster Elliptic Curve Scalar Multiplications on ARM Processors

Thu, 05/26/2016 - 09:00
We present a high speed, high security implementation of the recently proposed elliptic curve FourQ (ASIACRYPT 2015) for 32-bit ARM processors with NEON support. Exploiting the versatile and compact arithmetic of this curve, we design a vectorized implementation that achieves high-performance across a large variety of ARM architectures. Our software is fully protected against timing and cache attacks, and showcases the impressive speed of FourQ when compared with other curve-based alternatives. For example, one single variable-base scalar multiplication is computed in about 235,000 Cortex-A8 cycles or 132,000 Cortex-A15 cycles which, compared to the fastest genus 2 Kummer and Curve25519 implementations on the same platforms, offers speedups between 1.3x-1.7x and between 2.1x-2.4x, respectively. In comparison with the NIST standard curve K-283, we achieve speedups above 4x and 5.5x.
Categories: Microsoft

Biggish: A solution for the inverse privacy problem

Tue, 05/24/2016 - 09:00
An item of your personal information is inversely private if some party has access to it but you do not. Inverse privacy is ubiquitous. Each interaction you have with commercial and other institutions generates inversely private data. The inverse privacy problem is unjustified inaccessibility of your inversely private data to you. Elsewhere a subset of these authors determined that the problem has a market-based solution that provides consumers with large amounts of their personal data to be mined and processed to benefit them. Here we sketch a particular solution.
Categories: Microsoft

E-TIPSY: Search Query Corpus Annotated with Entities, Term Importance, POS Tags, and Syntactic Parses

Tue, 05/24/2016 - 09:00
We present E-TIPSY, a search query corpus annotated with named Entities, Term Importance, POS tags, and SYntactic parses. This corpus contains crowdsourced (gold) annotations of the three most important terms in each query. In addition, it contains automatically produced annotations of named entities, part-of-speech tags, and syntactic parses for the same queries. This corpus comes in two formats: (1) Sober Subset: annotations that two or more crowd workers agreed upon, and (2) Full Glass: all annotations. We analyze the strikingly low correlation between term importance and syntactic headedness, which invites research into effective ways of combining these different signals. Our corpus can serve as a benchmark for term importance methods aimed at improving search engine quality and as an initial step toward developing a dataset of gold linguistic analysis of web search queries. In addition, it can be used as a basis for linguistic inquiries into the kind of expressions used in search.
Categories: Microsoft

Speeding up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography

Mon, 05/23/2016 - 09:00
The Number Theoretic Transform (NTT) provides efficient algorithms for cyclic and nega-cyclic convolutions, which have many applications in computer arithmetic, e.g., for multiplying large integers and large degree polynomials. It is commonly used in cryptographic schemes that are based on the hardness of the Ring Learning With Errors (R-LWE) problem to efficiently implement modular polynomial multiplication. We present a new modular reduction technique that is tailored for the special moduli required by the NTT. Based on this reduction, we speed up the NTT and propose faster, multi-purpose algorithms. We present two implementations of these algorithms: a portable C implementation and a high-speed implementation using assembly with AVX2 instructions. In comparison to state-of-the-art implementations of the NTT using the same parameters, our C and assembly implementations achieve factor-1.90 and factor-1.25 speedups, respectively. To demonstrate the improved efficiency in an application example, we benchmarked the algorithms in the context of the R-LWE key exchange protocol that has recently been proposed by Alkim, Ducas, Poppelmann and Schwabe. In this case, our C and assembly implementations compute the full key exchange 1.40 and 1.34 times faster, respectively. These results are achieved with full protection against timing attacks.
Categories: Microsoft

Information Dissemination in Heterogeneous-Intent Networks

Sun, 05/22/2016 - 09:00
Individual’s motivations for participating in social networks can vary greatly, and this heterogeneity is reflected in the wide-variety of behaviors and communication styles we see in large social networks. In this paper, we present a first study of the basic properties of communication in a heterogeneous-intent network and their implications. First, through both user studies and large-scale data analyses, we validate that individual’s specific intents affect their style of communication and their selection of messages. Secondly, we propose a simple structural model of information diffusion within a heterogeneous-intent network and present the results of experiments exploring its emergent properties, finding that it recreates aspects of empirically observed cascades that are not captured in previous models.
Categories: Microsoft

Training A Quantum Optimizer

Wed, 05/18/2016 - 09:00
We study a variant of the quantum approximate optimization algorithm [ E. Farhi, J. Goldstone, and S. Gutmann, arXiv:1411.4028] with slightly different parametrization and different objective: rather than looking for a state which approximately solves an optimization problem, our goal is to find a quantum algorithm that, given an instance of MAX-2-SAT, will produce a state with high overlap with the optimal state. Using a machine learning approach, we chose a ``training set" of instances and optimized the parameters to produce large overlap for the training set. We then tested these optimized parameters on a larger instance set. As a training set, we used a subset of the hard instances studied by E. Crosson, E. Farhi, C. Yen-Yu Lin, H.-H. Lin, and P. Shor (CFLLS) [arXiv:1401.7320]. When tested on the full set, the parameters that we find produce significantly larger overlap than the optimized annealing times of CFLLS. Testing on other random instances from 20 to 28 bits continues to show improvement over annealing, with the improvement being most notable on the hardest instances. Further tests on instances of MAX-3-SAT also showed improvement on the hardest instances. This algorithm may be a possible application for near-term quantum computers with limited coherence times.
Categories: Microsoft

The quality of online answers to parents who suspect that their child has an Autism Spectrum Disorder

Wed, 05/18/2016 - 09:00
The growing diagnosis and public awareness of Autism Spectrum Disorders (ASD) leads more parents to seek answers to their suspicions for ASD in their child on Internet forums. This study describes an analysis of the quality of content of 371 answers on Yahoo Answers (YA), a social question and answer forum, to parents querying whether their child has ASD. We contrasted the perceived quality of answers by clinicians with that of parents. The study tested the feasibility of automatically assisting parents in selecting answers with higher quality using a predictive model based on the text of answers and the attributes of answerers.
Categories: Microsoft

Towards an Open-Domain Framework for Distilling the Outcomes of Personal Experiences from Social Media Timelines

Tue, 05/17/2016 - 09:00
Millions of people share details about their real-world experiences on social media. This provides an opportunity to observe the outcomes of common and critical situations and actions for individual and societal benefit. In this paper, we discuss our efforts to design and build an open-domain framework for mining the outcomes of any given experience from social media timelines. Through a number of example situations and actions across multiple domains, we discuss the kinds of outcomes we are able to extract and their relevance.
Categories: Microsoft

Recommendations meet web browsing: Enhancing Collaborative

Mon, 05/16/2016 - 09:00
Collaborative filtering (CF) recommendation systems are one of the most popular and successful methods for recommending products to people. CF systems work by finding similarities between different people according to their past purchases, and using these similarities to suggest possible items of interest. Here we investigate how CF systems can be enhanced using Internet browsing data and search engine query logs, both of which represent a rich profile of individuals’ interests. We introduce two approaches to enhancing user modeling using this data. Our approaches preserve the privacy of individuals while significantly enhancing model accuracy. We present extensive experimentation based on one-class, implicit feedback matrix factorization. We do not assume the existence of explicit ratings, but rather rely on unweighted, positive signals of the kind available in most commercial contexts. We demonstrate the value of our approach on two real datasets each comprising of the activities of tens of thousands of individuals. The first dataset details the downloads of Windows Phone 8 mobile applications and the second - item views in an online retail store. Both datasets are enhanced using anonymized Internet browsing logs. Our results show that prediction accuracy is improved by up to 72%. This improvement is largest when building a model which can predict for the entire catalog of items, not just popular ones. Finally, we discuss additional benefits of our approach, which include: improved recommendations for users with few past purchases, and enabling recommendations based on short-term purchase intent.
Categories: Microsoft

Differences in physical status, mental state and online behavior of people in pro-anorexia web communities

Tue, 05/10/2016 - 09:00
Background: There is a debate about the effects of pro-anorexia (colloquially referred to as pro-ana) websites. Research suggests that the effect of these websites is not straightforward. Indeed, the actual function of these sites is disputed, with studies indicating both negative and positive effects. Aim: This is the first study which systematically examined the differences between pro-anorexia web communities in four main aspects: web language used (posts); web interests/search behaviors (queries); users' self-reported weight status and weight goals; and associated self-reported mood/pathology. Methods:We collected three primary sources of data, including messages posed on three pro-anawebsites, a survey completed by over 1000 participants of a pro-ana website, and the searches made on the Bing search engine of pro-anorexia users. These data were analyzed for content, reported demographics and pathology, and behavior over time. Results: Although members of the main pro-ana website investigated appear to be depressed, with high rates of self-harm and suicide attempts, users are significantly more interested in treatment, have wishes of procreation and reported the highest goal weights among the investigated sites. In contrast, users of other pro-ana websites investigated, are more interested in morbid themes including depression, self-harm and suicide. The percentage of severely malnourished website users, in general, appears to be small (20%). Conclusions: Our results indicate that a new strategy is required to facilitate the communication between mental health specialists and pro-ana web users, recognizing the differences in harm associated with different websites.
Categories: Microsoft

Factoring with qutrits: Shor's algorithm on ternary and metaplectic quantum architectures

Tue, 05/10/2016 - 09:00
We determine the cost of performing Shor's algorithm for integer factorization on a ternary quantum computer, using two natural models of universal fault tolerant computing on ternary quantum systems: (i) a model based on magic state distillation that assumes the availability of the ternary Clifford gates, projective measurements, classical control and (ii) a model based on a metaplectic topological quantum computer (MTQC). Arguably, a natural choice to implement Shor's algorithm on a ternary quantum computer is to translate the entire arithmetic into a ternary form. However, it is also possible to simply emulate the standard binary version of the algorithm by encoding each qubit in a three level system. In this paper we address this emulation approach and analyze the complexity of implementing Shor's period finding function in both models, (i) and (ii).We compare the costs in terms of magic state counts required in each mode and find that a binary emulation implementation of Shor's algorithm on a ternary quantum computer requires slightly smaller circuit depth than the corresponding implementation in the binary Clifford+T framework. The reason for this are simplifications for binary arithmetic that can be leveraged over ternary gate sets. We also highlight that magic state preparation on MTQC requires magic state preprocessor of asymptotically smaller size which gives the MTQC solution a significant advantage over the binary framework.
Categories: Microsoft

Peer-to-peer in the workplace: A view from the road

Sat, 05/07/2016 - 09:00
This paper contributes to the growing literature on peer-to-peer (P2P) applications through an ethnographic study of auto-rickshaw drivers in Bengaluru, India. We describe how the adoption of a P2P application, Ola, which connects passengers to rickshaws, changes drivers work practices. Ola is part of the ‘peer services’ phenomenon which enable new types of ad-hoc trade in labour, skills and goods. Auto-rickshaw drivers present an interesting case because prior to Ola few had used Smartphones or the Internet. Furthermore, as financially vulnerable workers in the informal sector, concerns about driver welfare become prominent. Whilst technologies may promise to improve livelihoods, they do not necessarily deliver [57]. We describe how Ola does lit-tle to change the uncertainty which characterizes an auto drivers’ day. This leads us to consider how a more equitable and inclusive system might be designed.
Categories: Microsoft

Meerkat and Periscope: I Stream, You Stream, Apps Stream for Live Streams

Sat, 05/07/2016 - 09:00
We conducted a mixed methods study of the use of the Meerkat and Periscope apps for live streaming video and audio broadcasts from a mobile device. We crowdsourced a task to describe the content, setting, and other characteristics of 767 live streams. We also interviewed 20 frequent streamers to explore their motivations and experiences. Together, the data provide a snapshot of early live streaming use practices. We found a diverse range of activities broadcast, which interviewees said were used to build their personal brand. They described live streaming as providing an authentic, unedited view into their lives. They liked how the interaction with viewers shaped the content of their stream. We found some evidence for multiple live streams from the same event, which represent an opportunity for multiple perspectives on events of shared public interest.
Categories: Microsoft

eXTReMe Tracker