Microsoft

The quality of online answers to parents who suspect that their child has an Autism Spectrum Disorder

Microsoft Research Publications - Wed, 05/18/2016 - 09:00
The growing diagnosis and public awareness of Autism Spectrum Disorders (ASD) lead more parents to seek answers on Internet forums when they suspect ASD in their child. This study analyzes the quality of content of 371 answers on Yahoo Answers (YA), a social question and answer forum, to parents asking whether their child has ASD. We contrasted the quality of answers as perceived by clinicians with the quality perceived by parents. The study also tested the feasibility of automatically helping parents select higher-quality answers using a predictive model based on the text of answers and the attributes of answerers.
Categories: Microsoft

Towards an Open-Domain Framework for Distilling the Outcomes of Personal Experiences from Social Media Timelines

Microsoft Research Publications - Tue, 05/17/2016 - 09:00
Millions of people share details about their real-world experiences on social media. This provides an opportunity to observe the outcomes of common and critical situations and actions for individual and societal benefit. In this paper, we discuss our efforts to design and build an open-domain framework for mining the outcomes of any given experience from social media timelines. Through a number of example situations and actions across multiple domains, we discuss the kinds of outcomes we are able to extract and their relevance.
Categories: Microsoft

Recommendations meet web browsing: Enhancing Collaborative

Microsoft Research Publications - Mon, 05/16/2016 - 09:00
Collaborative filtering (CF) recommendation systems are one of the most popular and successful methods for recommending products to people. CF systems work by finding similarities between different people according to their past purchases, and using these similarities to suggest possible items of interest. Here we investigate how CF systems can be enhanced using Internet browsing data and search engine query logs, both of which represent a rich profile of individuals’ interests. We introduce two approaches to enhancing user modeling using this data. Our approaches preserve the privacy of individuals while significantly enhancing model accuracy. We present extensive experimentation based on one-class, implicit feedback matrix factorization. We do not assume the existence of explicit ratings, but rather rely on unweighted, positive signals of the kind available in most commercial contexts. We demonstrate the value of our approach on two real datasets, each comprising the activities of tens of thousands of individuals. The first dataset details the downloads of Windows Phone 8 mobile applications, and the second details item views in an online retail store. Both datasets are enhanced using anonymized Internet browsing logs. Our results show that prediction accuracy is improved by up to 72%. This improvement is largest when building a model which can predict for the entire catalog of items, not just popular ones. Finally, we discuss additional benefits of our approach, which include improved recommendations for users with few past purchases and enabling recommendations based on short-term purchase intent.
Categories: Microsoft
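
The basic CF idea in the abstract above (find people with similar past purchases, then suggest what they bought) can be illustrated with a small neighborhood-based sketch. This is not the paper's method: the paper uses one-class, implicit feedback matrix factorization enhanced with browsing logs, whereas this example swaps in plain Jaccard similarity over sets of positive signals. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Similarity of two item sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, purchases, top_n=3):
    """Rank items the target lacks by the summed similarity of users who have them."""
    owned = purchases[target]
    scores = defaultdict(float)
    for user, items in purchases.items():
        if user == target:
            continue
        sim = jaccard(owned, items)
        if sim == 0.0:
            continue  # users with nothing in common contribute no evidence
        for item in items - owned:
            scores[item] += sim
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Toy implicit-feedback data (assumed): each user maps to the set of items
# they downloaded. There are no ratings, only positive signals.
purchases = {
    "ann": {"maps", "mail", "chess"},
    "bob": {"maps", "mail", "weather"},
    "cat": {"chess", "sudoku"},
    "dan": {"radio"},
}

print(recommend("ann", purchases))
```

Here "bob" shares two of four distinct items with "ann" and "cat" shares one of four, so bob's extra item outranks cat's. A user such as "dan", with no overlap, contributes nothing, which hints at why users with few past purchases are hard to serve from purchase data alone.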

Differences in physical status, mental state and online behavior of people in pro-anorexia web communities

Microsoft Research Publications - Tue, 05/10/2016 - 09:00
Background: There is a debate about the effects of pro-anorexia (colloquially referred to as pro-ana) websites. Research suggests that the effect of these websites is not straightforward. Indeed, the actual function of these sites is disputed, with studies indicating both negative and positive effects. Aim: This is the first study which systematically examined the differences between pro-anorexia web communities in four main aspects: web language used (posts); web interests/search behaviors (queries); users' self-reported weight status and weight goals; and associated self-reported mood/pathology. Methods: We collected three primary sources of data, including messages posted on three pro-ana websites, a survey completed by over 1000 participants of a pro-ana website, and the searches made on the Bing search engine by pro-anorexia users. These data were analyzed for content, reported demographics and pathology, and behavior over time. Results: Although members of the main pro-ana website investigated appear to be depressed, with high rates of self-harm and suicide attempts, its users are significantly more interested in treatment, wish to have children, and reported the highest goal weights among the investigated sites. In contrast, users of the other pro-ana websites investigated are more interested in morbid themes including depression, self-harm and suicide. The percentage of severely malnourished website users, in general, appears to be small (20%). Conclusions: Our results indicate that a new strategy is required to facilitate communication between mental health specialists and pro-ana web users, recognizing the differences in harm associated with different websites.
Categories: Microsoft

Factoring with qutrits: Shor's algorithm on ternary and metaplectic quantum architectures

Microsoft Research Publications - Tue, 05/10/2016 - 09:00
We determine the cost of performing Shor's algorithm for integer factorization on a ternary quantum computer, using two natural models of universal fault-tolerant computing on ternary quantum systems: (i) a model based on magic state distillation that assumes the availability of the ternary Clifford gates, projective measurements, and classical control, and (ii) a model based on a metaplectic topological quantum computer (MTQC). Arguably, a natural choice to implement Shor's algorithm on a ternary quantum computer is to translate the entire arithmetic into a ternary form. However, it is also possible to simply emulate the standard binary version of the algorithm by encoding each qubit in a three-level system. In this paper we address this emulation approach and analyze the complexity of implementing Shor's period finding function in both models, (i) and (ii). We compare the costs in terms of the magic state counts required in each model and find that a binary emulation implementation of Shor's algorithm on a ternary quantum computer requires slightly smaller circuit depth than the corresponding implementation in the binary Clifford+T framework. The reason for this is that simplifications for binary arithmetic can be leveraged over ternary gate sets. We also highlight that magic state preparation on an MTQC requires a magic state preprocessor of asymptotically smaller size, which gives the MTQC solution a significant advantage over the binary framework.
Categories: Microsoft

Peer-to-peer in the workplace: A view from the road

Microsoft Research Publications - Sat, 05/07/2016 - 09:00
This paper contributes to the growing literature on peer-to-peer (P2P) applications through an ethnographic study of auto-rickshaw drivers in Bengaluru, India. We describe how the adoption of a P2P application, Ola, which connects passengers to rickshaws, changes drivers’ work practices. Ola is part of the ‘peer services’ phenomenon, which enables new types of ad-hoc trade in labour, skills and goods. Auto-rickshaw drivers present an interesting case because prior to Ola few had used smartphones or the Internet. Furthermore, because they are financially vulnerable workers in the informal sector, concerns about driver welfare become prominent. Whilst technologies may promise to improve livelihoods, they do not necessarily deliver [57]. We describe how Ola does little to change the uncertainty which characterizes an auto driver’s day. This leads us to consider how a more equitable and inclusive system might be designed.
Categories: Microsoft

Meerkat and Periscope: I Stream, You Stream, Apps Stream for Live Streams

Microsoft Research Publications - Sat, 05/07/2016 - 09:00
We conducted a mixed methods study of the use of the Meerkat and Periscope apps for live streaming video and audio broadcasts from a mobile device. We crowdsourced a task to describe the content, setting, and other characteristics of 767 live streams. We also interviewed 20 frequent streamers to explore their motivations and experiences. Together, the data provide a snapshot of early live streaming use practices. We found a diverse range of activities broadcast, which interviewees said were used to build their personal brand. They described live streaming as providing an authentic, unedited view into their lives. They liked how the interaction with viewers shaped the content of their stream. We found some evidence for multiple live streams from the same event, which represent an opportunity for multiple perspectives on events of shared public interest.
Categories: Microsoft

Making Sense of Temporal Queries with Interactive Visualization

Microsoft Research Publications - Sat, 05/07/2016 - 09:00
As real-time monitoring and analysis become increasingly important, researchers and developers turn to data stream management systems (DSMSs) for fast, efficient ways to pose temporal queries over their datasets. However, these systems are inherently complex, and even database experts find it difficult to understand the behavior of DSMS queries. To help analysts better understand these temporal queries, we developed StreamTrace, an interactive visualization tool that breaks down, step by step, how a temporal query processes a given dataset. The design of StreamTrace is based on input from expert DSMS users; we evaluated the system with a lab study of programmers who were new to streaming queries. Results from the study demonstrate that StreamTrace can help users to verify that queries behave as expected and to isolate the regions of a query that may be causing unexpected results.
Categories: Microsoft

Reasoning in Vector Space: An Exploratory Study of Question Answering

Microsoft Research Publications - Mon, 05/02/2016 - 09:00
Question answering tasks have shown remarkable progress with distributed vector representation. In this paper, we investigate the recently proposed Facebook bAbI tasks, which consist of twenty different categories of questions that require complex reasoning. Because previous work on bAbI consists entirely of end-to-end models, errors could come either from an imperfect understanding of semantics or from certain steps of the reasoning. For clearer analysis, we propose two vector space models inspired by Tensor Product Representation (TPR) to perform knowledge encoding and logical reasoning based on common-sense inference. Together they achieve near-perfect accuracy on all categories, including positional reasoning and path finding, which have proved difficult for most previous approaches. We hypothesize that the difficulties in these categories are due to their multi-relational nature, in contrast to the uni-relational characteristic of other categories. Our exploration sheds light on designing more sophisticated datasets and moves one step toward integrating the transparent and interpretable formalism of TPR into existing learning paradigms.
Categories: Microsoft

Password Guidance

Microsoft Research Publications - Sun, 05/01/2016 - 09:00
This paper provides Microsoft’s recommendations for password management based on current research and lessons from our own experience as one of the largest Identity Providers (IdPs) in the world. It covers recommendations for end users and identity administrators. Microsoft sees over 10 million username/password pair attacks every day. This gives us a unique vantage point to understand the role of passwords in account takeover. The guidance in this paper is scoped to users of Microsoft’s identity platforms (Azure Active Directory, Active Directory, and Microsoft account) though it generalizes to other platforms.
Categories: Microsoft

Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups

Microsoft Research Publications - Sun, 05/01/2016 - 09:00
We propose a new method for training computationally efficient and compact convolutional neural networks (CNNs) using a novel sparse connection structure that resembles a tree root. Our sparse connection structure facilitates a significant reduction in computational cost and number of parameters of state-of-the-art deep CNNs without compromising accuracy. We validate our approach by using it to train more efficient variants of state-of-the-art CNN architectures, evaluated on the CIFAR10 and ILSVRC datasets. Our results show similar or higher accuracy than the baseline architectures with much less compute, as measured by CPU and GPU timings. For example, for ResNet 50, our model has 40% fewer parameters, 45% fewer floating point operations, and is 31% (12%) faster on a CPU (GPU). For the deeper ResNet 200 our model has 25% fewer floating point operations and 44% fewer parameters, while maintaining state-of-the-art accuracy. For GoogLeNet, our model has 7% fewer parameters and is 21% (16%) faster on a CPU (GPU).
Categories: Microsoft
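
To see why filter groups reduce parameters and computation, it helps to count the weights in a single convolution layer. The sketch below is only an illustration of grouped convolutions in general, not a reproduction of the paper's hierarchical root structure or its reported figures. The helper name `conv_params` and the layer sizes are assumptions chosen for the example.

```python
def conv_params(c_in, c_out, kernel, groups=1):
    """Number of weights in a 2D convolution layer, ignoring biases.

    With `groups` filter groups, each output filter only sees
    c_in / groups input channels instead of all c_in.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return kernel * kernel * (c_in // groups) * c_out


# Hypothetical layer: 256 input channels, 256 filters, 3x3 kernels.
dense = conv_params(256, 256, 3)
grouped = conv_params(256, 256, 3, groups=4)

print(dense, grouped, grouped / dense)
```

With four groups the layer keeps a quarter of the weights, and the multiply-accumulate count per output position shrinks by the same factor. Whether accuracy survives such sparsification is the empirical question the paper addresses.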

Measuring Neural Net Robustness with Constraints

Microsoft Research Publications - Sun, 05/01/2016 - 09:00
Despite having high accuracy, neural nets have been shown to be susceptible to adversarial examples, where a small perturbation to an input can cause it to become mislabeled. We propose metrics for measuring the robustness of a neural net and devise a novel algorithm for approximating these metrics based on an encoding of robustness as a linear program. We show how our metrics can be used to evaluate the robustness of deep neural nets with experiments on the MNIST and CIFAR-10 datasets. Our algorithm generates more informative estimates of robustness metrics compared to estimates based on existing algorithms. Furthermore, we show how existing approaches to improving robustness “overfit” to adversarial examples generated using a specific algorithm. Finally, we show that our techniques can be used to additionally improve neural net robustness, both according to the metrics that we propose and according to previously proposed metrics.
Categories: Microsoft
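
The notion of "the smallest perturbation that changes the label" can be made concrete in a much simpler setting than the paper's. The sketch below assumes a linear two-class scorer f(x) = w·x + b, for which the minimal L-infinity perturbation that reaches the decision boundary has the closed form |f(x)| / ||w||_1. It does not implement the paper's metrics or its linear-program encoding for deep nets. All function names and numbers are hypothetical.

```python
def score(w, b, x):
    """Linear decision function f(x) = w.x + b; the sign gives the label."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def linf_robustness(w, b, x):
    """Smallest L-infinity distance from x to the boundary f = 0."""
    l1 = sum(abs(wi) for wi in w)
    if l1 == 0:
        raise ValueError("w must be non-zero")
    return abs(score(w, b, x)) / l1


def boundary_perturbation(w, b, x):
    """A perturbation of that minimal size that lands exactly on the boundary."""
    eps = linf_robustness(w, b, x)
    # Move every coordinate by eps in the direction that pushes the score
    # toward zero.
    toward_zero = -1.0 if score(w, b, x) > 0 else 1.0
    return [toward_zero * eps * ((wi > 0) - (wi < 0)) for wi in w]


# Assumed toy classifier and input.
w, b = [2.0, -1.0], 0.5
x = [1.0, 1.0]

eps = linf_robustness(w, b, x)
delta = boundary_perturbation(w, b, x)
moved = [xi + di for xi, di in zip(x, delta)]
print(eps, delta, score(w, b, moved))
```

For a deep net no such closed form exists, which is why an approximation scheme like the one the abstract describes is needed.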

Things We Own Together: Sharing Possessions at Home

Microsoft Research Publications - Sun, 05/01/2016 - 09:00
Sharing is an important facet of human relationships, yet there is a lack of research on how people share ownership of possessions. This paper reports on a study that investigates shared ownership of physical and digital possessions through interviews with couples and families in 13 households. We offer a more nuanced definition of shared ownership and show that certain practices, which are central to sharing physical objects, are not supported in the sharing of digital content. We suggest potential approaches to address this, focusing in particular on how the sharing of possessions plays a role in the building of relationships and is done against a backdrop of trust.
Categories: Microsoft

Modeling Signals Embedded in a Euclidean Domain

Microsoft Research Publications - Sun, 05/01/2016 - 09:00
Graphs are often used to model signals defined on a set of points embedded in a Euclidean domain. Examples are distributed sensor readings, measures of congestion in a transportation network, samples in a feature space, and colors on a 3D point cloud. However, it may be better to model such signals as samples of a Gaussian Process defined on the Euclidean domain. We show, on a 3D point cloud example, that Karhunen-Loève Transforms (KLTs) based on Gaussian Process models can have significantly higher energy compaction and coding gain than KLTs based on sparse graph models. The latter KLTs are known as Graph Transforms; we call the former Gaussian Process Transforms.
Categories: Microsoft

Functions of Code-Switching in Tweets: An Annotation Scheme and Some Initial Experiments

Microsoft Research Publications - Sun, 05/01/2016 - 09:00
Code-Switching (CS) is very common among multilinguals, who switch between two or more languages when communicating with each other. CS is not confined to speech; it also appears in written text. With the popularity of social media, people now code-switch in text on these platforms, which has created a need for computational processing of code-switched data. In this study, we focus on CS between English and Hindi in a corpus of tweets, which are informal text. Using this data, we carried out a detailed linguistic study of various aspects of CS. Understanding, processing, and generating code-switched data requires annotated code-switched data. Hence, in this paper, we present an annotation scheme for the functions of CS in Hindi-English (Hi-En) code-switched tweets, along with some initial experiments. In this effort, we focus on CS in text data from social media, whereas earlier studies have focused on CS in spoken data from a small number of speakers.
Categories: Microsoft

The Bones of the System: A Case Study of Logging and Telemetry at Microsoft

Microsoft Research Publications - Sun, 05/01/2016 - 09:00
Large software organizations are transitioning to event data platforms as they culturally shift to better support data-driven decision making. This paper offers a case study at Microsoft during such a transition. Through qualitative interviews of 28 participants and a quantitative survey of 1,823 respondents, we catalog a diverse set of activities that leverage event data sources, identify challenges in conducting these activities, and describe tensions that emerge in data-driven cultures as event data flow through these activities within the organization. We find that the use of event data spans every job role in our interviews and survey, that different perspectives on event data create tensions between roles or teams, and that professionals report social and technical challenges across activities.
Categories: Microsoft

Automated Synthesis and Analysis of Switching Gene Regulatory Networks

Microsoft Research Publications - Sun, 05/01/2016 - 09:00
Studying the gene regulatory networks (GRNs) that govern how cells change into specific cell types with unique roles throughout development is an active area of experimental research. The fate specification process can be viewed as a program prescribing the system dynamics, governed by a network of genetic interactions. To investigate the possibility that GRNs are not fixed but rather change their topology, for example as cells progress through commitment, we introduce the concept of Switching Gene Regulatory Networks (SGRNs) to enable the modelling and analysis of network reconfiguration. We define the synthesis problem of constructing SGRNs that are guaranteed to satisfy a set of constraints representing experimental observations of cell behaviour. We propose a solution to this problem that employs methods based upon Satisfiability Modulo Theories (SMT) solvers, and evaluate the feasibility and scalability of our approach by considering a set of synthetic benchmarks exhibiting possible biological behaviour of cell development. We outline how our approach is applied to a more realistic biological system, by considering a simplified network involved in the processes of neuron maturation and fate specification in the mammalian cortex.
Categories: Microsoft