Blogroll

Jointly Modeling Embedding and Translation to Bridge Video and Language

Microsoft Research Publications - Wed, 06/01/2016 - 09:00
Automatically describing video content with natural language is a fundamental challenge of computer vision. Recurrent Neural Networks (RNNs), which models sequence dynamics, has attracted increasing attention on visual interpretation. However, most existing approaches generate a word locally with the given previous words and the visual content, while the relationship between sentence semantics and visual content is not holistically exploited. As a result, the generated sentences may be contextually correct but the semantics (e.g., subjects, verbs or objects) are not true. This paper presents a novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which can simultaneously explore the learning of LSTM and visual-semantic embedding. The former aims to locally maximize the probability of generating the next word given previous words and visual content, while the latter is to create a visual-semantic embedding space for enforcing the relationship between the semantics of the entire sentence and visual content. The experiments on YouTube2Text dataset show that our proposed LSTM-E achieves to-date the best published performance in generating natural sentences: 45.3% and 31.0% in terms of BLEU@4 and METEOR, respectively. Superior performances are also reported on two movie description datasets (M-VAD and MPII-MD). In addition, we demonstrate that LSTM-E outperforms several state-of-the-art techniques in predicting Subject-Verb-Object (SVO) triplets.
Categories: Microsoft

Parameter Estimation for Generalized Thurstone Choice Models

Microsoft Research Publications - Wed, 06/01/2016 - 09:00
We consider the maximum likelihood parameter estimation problem for a generalized Thurstone choice model, where choices are from comparison sets of two or more items. We provide tight characterizations of the mean square error, as well as necessary and sufficient conditions for correct classification when each item belongs to one of two classes. These results provide insights into how the estimation accuracy depends on the choice of a generalized Thurstone choice model and the structure of comparison sets. We find that for a priori unbiased structures of comparisons, e.g., when comparison sets are drawn independently and uniformly at random, the number of observations needed to achieve a prescribed estimation accuracy depends on the choice of a generalized Thurstone choice model. For a broad set of generalized Thurstone choice models, which includes all popular instances used in practice, the estimation error is shown to be largely insensitive to the cardinality of comparison sets. On the other hand, we found that there exist generalized Thurstone choice models for which the estimation error decreases much faster with the cardinality of comparison sets. We report results of empirical evaluations using schedules of contests as observed in some popular sport competitions and online crowdsourcing systems.
Categories: Microsoft

MSR-VTT: A Large Video Description Dataset for Bridging Video and Language

Microsoft Research Publications - Wed, 06/01/2016 - 09:00
While there has been increasing interest in the task of describing video with natural language, current computer vision algorithms are still severely limited in terms of the variability and complexity of the videos and their associated language that they can recognize. This is in part due to the simplicity of current benchmarks, which mostly focus on specific fine-grained domains with limited videos and simple descriptions. While researchers have provided several benchmark datasets for image captioning, we are not aware of any large-scale video description dataset with comprehensive categories yet diverse video content. In this paper we present MSR-VTT (standing for “MSR Video to Text”) which is a new large-scale video benchmark for video understanding, especially the emerging task of translating video to text. This is achieved by collecting 257 popular queries from a commercial video search engine, with 118 videos for each query. In its current version, MSR-VTT provides 10K web video clips with 41.2 hours and 200K clip-sentence pairs in total, covering the most comprehensive categories and diverse visual content, and representing the largest dataset in terms of sentence and vocabulary. Each clip is annotated with about 20 natural sentences by 1,327 AMT workers. We present a detailed analysis of MSR-VTT in comparison to a complete set of existing datasets, together with a summarization of different state-of-the-art video-to-text approaches. We also provide an extensive evaluation of these approaches on this dataset, showing that the hybrid Recurrent Neural Networkbased approach, which combines single-frame and motion representations with soft-attention pooling strategy, yields the best generalization capability on MSR-VTT.
Categories: Microsoft

Lower Runtime Bounds for Integer Programs

Microsoft Research Publications - Wed, 06/01/2016 - 09:00
We present a technique to infer lower bounds on the worst-case runtime complexity of integer programs. To this end, we construct symbolic representations of program executions using a framework for iterative, under-approximating program simplification. The core of this simplification is a method for (under-approximating) program acceleration based on recurrence solving and a variation of ranking functions. Afterwards, we deduce asymptotic lower bounds from the resulting simplified programs. We implemented our technique in our tool LoAT and showthat it infers non-trivial lower bounds for a large number of examples.
Categories: Microsoft

The Global Patch Collider

Microsoft Research Publications - Wed, 06/01/2016 - 09:00
This paper proposes a novel extremely efficient, fully-parallelizable, task-specific algorithm for the computation of global point-wise correspondences in images and videos. Our algorithm, the Global Patch Collider, is based on detecting unique collisions between image points using a collection of learned tree structures that act as conditional hash functions. In contrast to conventional approaches that rely on pairwise distance computation, our algorithm isolates distinctive pixel pairs that hit the same leaf during traversal through multiple learned tree structures. The split functions stored at the intermediate nodes of the trees are trained to ensure that only visually similar patches or their geometric or photometric transformed versions fall into the same leaf node. The matching process involves passing all pixel positions in the images under analysis through the tree structures. We then compute matches by isolating points that uniquely collide with each other ie. fell in the same empty leaf in multiple trees. Our algorithm is linear in the number of pixels but can be made constant time on a parallel computation architecture as the tree traversal for individual image points is decoupled. We demonstrate the efficacy of our method by using it to perform optical flow matching and stereo matching on some challenging benchmarks. Experimental results show that not only is our method extremely computationally efficient, but it is also able to match or outperform state of the art methods that are much more complex.
Categories: Microsoft

Rich Image Captioning in the Wild

Microsoft Research Publications - Wed, 06/01/2016 - 09:00
We present an image caption system that addresses new challenges of automatically describing images in the wild. The challenges include high quality caption quality with respect to human judgments, out-of-domain data handling, and low latency required in many applications. Built on top of a state-of-the-art framework, we developed a deep vision model that detects a broad range of visual concepts, an entity recognition model that identifies celebrities and landmarks, and a confidence model for the caption output. Experimental results show that our caption engine outperforms previous state-of-the-art systems significantly on both in-domain dataset (i.e. MS COCO) and out-of-domain datasets.
Categories: Microsoft

Large Scale and Streaming Time Series Segmentation and Piece-Wise Approximation: Extended Version

Microsoft Research Publications - Sat, 05/28/2016 - 09:00
Segmenting a time series or approximating it with piecewise linear function is often needed when handling data in the time domain to detect outliers, clean data, detect events and more. The data varies from ECG signals, traffic monitors to stock prices and sensor networks. Modern data-sets of this type are large and in many cases are infinite in the sense that the data is a stream rather than a finite sample. Therefore, in order to segment it, an algorithm has to scale gracefully with the size of the data. Dynamic Programming (DP) can findthe optimal segmentation, however, the DP approach has a complexity of O T 2 thus cannot handle datasets with millions of elements, nor can it handle streaming data. Therefore, various heuristics are used in practice to handle the data. This study shows that if the approximation measure has an inverse triangle inequality property (ITIP), the solution of the dynamic program can be computed in linear time and streaming data can be handled too. The ITIP is shown to hold in many cases of interest. The speedup due to the new algorithms is evaluated on a variety of data-sets to be in the range of 8 8200x over the DP solution without sacrificing accuracy. Confidence intervals for segmentations are derived as well.
Categories: Microsoft

FourQNEON: Faster Elliptic Curve Scalar Multiplications on ARM Processors

Microsoft Research Publications - Thu, 05/26/2016 - 09:00
We present a high speed, high security implementation of the recently proposed elliptic curve FourQ (ASIACRYPT 2015) for 32-bit ARM processors with NEON support. Exploiting the versatile and compact arithmetic of this curve, we design a vectorized implementation that achieves high-performance across a large variety of ARM architectures. Our software is fully protected against timing and cache attacks, and showcases the impressive speed of FourQ when compared with other curve-based alternatives. For example, one single variable-base scalar multiplication is computed in about 235,000 Cortex-A8 cycles or 132,000 Cortex-A15 cycles which, compared to the fastest genus 2 Kummer and Curve25519 implementations on the same platforms, offers speedups between 1.3x-1.7x and between 2.1x-2.4x, respectively. In comparison with the NIST standard curve K-283, we achieve speedups above 4x and 5.5x.
Categories: Microsoft

Biggish: A solution for the inverse privacy problem

Microsoft Research Publications - Tue, 05/24/2016 - 09:00
An item of your personal information is inversely private if some party has access to it but you do not. Inverse privacy is ubiquitous. Each interaction you have with commercial and other institutions generates inversely private data. The inverse privacy problem is unjustified inaccessibility of your inversely private data to you. Elsewhere a subset of these authors determined that the problem has a market-based solution that provides consumers with large amounts of their personal data to be mined and processed to benefit them. Here we sketch a particular solution.
Categories: Microsoft

E-TIPSY: Search Query Corpus Annotated with Entities, Term Importance, POS Tags, and Syntactic Parses

Microsoft Research Publications - Tue, 05/24/2016 - 09:00
We present E-TIPSY, a search query corpus annotated with named Entities, Term Importance, POS tags, and SYntactic parses. This corpus contains crowdsourced (gold) annotations of the three most important terms in each query. In addition, it contains automatically produced annotations of named entities, part-of-speech tags, and syntactic parses for the same queries. This corpus comes in two formats: (1) Sober Subset: annotations that two or more crowd workers agreed upon, and (2) Full Glass: all annotations. We analyze the strikingly low correlation between term importance and syntactic headedness, which invites research into effective ways of combining these different signals. Our corpus can serve as a benchmark for term importance methods aimed at improving search engine quality and as an initial step toward developing a dataset of gold linguistic analysis of web search queries. In addition, it can be used as a basis for linguistic inquiries into the kind of expressions used in search.
Categories: Microsoft

Speeding up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography

Microsoft Research Publications - Mon, 05/23/2016 - 09:00
The Number Theoretic Transform (NTT) provides efficient algorithms for cyclic and nega-cyclic convolutions, which have many applications in computer arithmetic, e.g., for multiplying large integers and large degree polynomials. It is commonly used in cryptographic schemes that are based on the hardness of the Ring Learning With Errors (R-LWE) problem to efficiently implement modular polynomial multiplication. We present a new modular reduction technique that is tailored for the special moduli required by the NTT. Based on this reduction, we speed up the NTT and propose faster, multi-purpose algorithms. We present two implementations of these algorithms: a portable C implementation and a high-speed implementation using assembly with AVX2 instructions. In comparison to state-of-the-art implementations of the NTT using the same parameters, our C and assembly implementations achieve factor-1.90 and factor-1.25 speedups, respectively. To demonstrate the improved efficiency in an application example, we benchmarked the algorithms in the context of the R-LWE key exchange protocol that has recently been proposed by Alkim, Ducas, Poppelmann and Schwabe. In this case, our C and assembly implementations compute the full key exchange 1.40 and 1.34 times faster, respectively. These results are achieved with full protection against timing attacks.
Categories: Microsoft

Information Dissemination in Heterogeneous-Intent Networks

Microsoft Research Publications - Sun, 05/22/2016 - 09:00
Individual’s motivations for participating in social networks can vary greatly, and this heterogeneity is reflected in the wide-variety of behaviors and communication styles we see in large social networks. In this paper, we present a first study of the basic properties of communication in a heterogeneous-intent network and their implications. First, through both user studies and large-scale data analyses, we validate that individual’s specific intents affect their style of communication and their selection of messages. Secondly, we propose a simple structural model of information diffusion within a heterogeneous-intent network and present the results of experiments exploring its emergent properties, finding that it recreates aspects of empirically observed cascades that are not captured in previous models.
Categories: Microsoft

Training A Quantum Optimizer

Microsoft Research Publications - Wed, 05/18/2016 - 09:00
We study a variant of the quantum approximate optimization algorithm [ E. Farhi, J. Goldstone, and S. Gutmann, arXiv:1411.4028] with slightly different parametrization and different objective: rather than looking for a state which approximately solves an optimization problem, our goal is to find a quantum algorithm that, given an instance of MAX-2-SAT, will produce a state with high overlap with the optimal state. Using a machine learning approach, we chose a ``training set" of instances and optimized the parameters to produce large overlap for the training set. We then tested these optimized parameters on a larger instance set. As a training set, we used a subset of the hard instances studied by E. Crosson, E. Farhi, C. Yen-Yu Lin, H.-H. Lin, and P. Shor (CFLLS) [arXiv:1401.7320]. When tested on the full set, the parameters that we find produce significantly larger overlap than the optimized annealing times of CFLLS. Testing on other random instances from 20 to 28 bits continues to show improvement over annealing, with the improvement being most notable on the hardest instances. Further tests on instances of MAX-3-SAT also showed improvement on the hardest instances. This algorithm may be a possible application for near-term quantum computers with limited coherence times.
Categories: Microsoft

The quality of online answers to parents who suspect that their child has an Autism Spectrum Disorder

Microsoft Research Publications - Wed, 05/18/2016 - 09:00
The growing diagnosis and public awareness of Autism Spectrum Disorders (ASD) leads more parents to seek answers to their suspicions for ASD in their child on Internet forums. This study describes an analysis of the quality of content of 371 answers on Yahoo Answers (YA), a social question and answer forum, to parents querying whether their child has ASD. We contrasted the perceived quality of answers by clinicians with that of parents. The study tested the feasibility of automatically assisting parents in selecting answers with higher quality using a predictive model based on the text of answers and the attributes of answerers.
Categories: Microsoft

Towards an Open-Domain Framework for Distilling the Outcomes of Personal Experiences from Social Media Timelines

Microsoft Research Publications - Tue, 05/17/2016 - 09:00
Millions of people share details about their real-world experiences on social media. This provides an opportunity to observe the outcomes of common and critical situations and actions for individual and societal benefit. In this paper, we discuss our efforts to design and build an open-domain framework for mining the outcomes of any given experience from social media timelines. Through a number of example situations and actions across multiple domains, we discuss the kinds of outcomes we are able to extract and their relevance.
Categories: Microsoft

Recommendations meet web browsing: Enhancing Collaborative

Microsoft Research Publications - Mon, 05/16/2016 - 09:00
Collaborative filtering (CF) recommendation systems are one of the most popular and successful methods for recommending products to people. CF systems work by finding similarities between different people according to their past purchases, and using these similarities to suggest possible items of interest. Here we investigate how CF systems can be enhanced using Internet browsing data and search engine query logs, both of which represent a rich profile of individuals’ interests. We introduce two approaches to enhancing user modeling using this data. Our approaches preserve the privacy of individuals while significantly enhancing model accuracy. We present extensive experimentation based on one-class, implicit feedback matrix factorization. We do not assume the existence of explicit ratings, but rather rely on unweighted, positive signals of the kind available in most commercial contexts. We demonstrate the value of our approach on two real datasets each comprising of the activities of tens of thousands of individuals. The first dataset details the downloads of Windows Phone 8 mobile applications and the second - item views in an online retail store. Both datasets are enhanced using anonymized Internet browsing logs. Our results show that prediction accuracy is improved by up to 72%. This improvement is largest when building a model which can predict for the entire catalog of items, not just popular ones. Finally, we discuss additional benefits of our approach, which include: improved recommendations for users with few past purchases, and enabling recommendations based on short-term purchase intent.
Categories: Microsoft

Differences in physical status, mental state and online behavior of people in pro-anorexia web communities

Microsoft Research Publications - Tue, 05/10/2016 - 09:00
Background: There is a debate about the effects of pro-anorexia (colloquially referred to as pro-ana) websites. Research suggests that the effect of these websites is not straightforward. Indeed, the actual function of these sites is disputed, with studies indicating both negative and positive effects. Aim: This is the first study which systematically examined the differences between pro-anorexia web communities in four main aspects: web language used (posts); web interests/search behaviors (queries); users' self-reported weight status and weight goals; and associated self-reported mood/pathology. Methods:We collected three primary sources of data, including messages posed on three pro-anawebsites, a survey completed by over 1000 participants of a pro-ana website, and the searches made on the Bing search engine of pro-anorexia users. These data were analyzed for content, reported demographics and pathology, and behavior over time. Results: Although members of the main pro-ana website investigated appear to be depressed, with high rates of self-harm and suicide attempts, users are significantly more interested in treatment, have wishes of procreation and reported the highest goal weights among the investigated sites. In contrast, users of other pro-ana websites investigated, are more interested in morbid themes including depression, self-harm and suicide. The percentage of severely malnourished website users, in general, appears to be small (20%). Conclusions: Our results indicate that a new strategy is required to facilitate the communication between mental health specialists and pro-ana web users, recognizing the differences in harm associated with different websites.
Categories: Microsoft

Factoring with qutrits: Shor's algorithm on ternary and metaplectic quantum architectures

Microsoft Research Publications - Tue, 05/10/2016 - 09:00
We determine the cost of performing Shor's algorithm for integer factorization on a ternary quantum computer, using two natural models of universal fault tolerant computing on ternary quantum systems: (i) a model based on magic state distillation that assumes the availability of the ternary Clifford gates, projective measurements, classical control and (ii) a model based on a metaplectic topological quantum computer (MTQC). Arguably, a natural choice to implement Shor's algorithm on a ternary quantum computer is to translate the entire arithmetic into a ternary form. However, it is also possible to simply emulate the standard binary version of the algorithm by encoding each qubit in a three level system. In this paper we address this emulation approach and analyze the complexity of implementing Shor's period finding function in both models, (i) and (ii).We compare the costs in terms of magic state counts required in each mode and find that a binary emulation implementation of Shor's algorithm on a ternary quantum computer requires slightly smaller circuit depth than the corresponding implementation in the binary Clifford+T framework. The reason for this are simplifications for binary arithmetic that can be leveraged over ternary gate sets. We also highlight that magic state preparation on MTQC requires magic state preprocessor of asymptotically smaller size which gives the MTQC solution a significant advantage over the binary framework.
Categories: Microsoft
Syndicate content

eXTReMe Tracker