Microsoft

Gated Graph Sequence Neural Networks

Microsoft Research Publications - Sun, 05/01/2016 - 08:00

Graph-structured data appears frequently in domains including chemistry, natural language semantics, social networks, and knowledge bases. In this work, we study feature learning techniques for graph-structured inputs. Our starting point is previous work on Graph Neural Networks (Scarselli et al., 2009), which we modify to use gated recurrent units and modern optimization techniques and then extend to output sequences. The result is a flexible and broadly useful class of neural network models that has favorable inductive biases relative to purely sequence-based models (e.g., LSTMs) when the problem is graph-structured. We demonstrate the capabilities on some simple AI (bAbI) and graph algorithm learning tasks. We then show it achieves state-of-the-art performance on a problem from program verification, in which subgraphs need to be matched to abstract data structures.
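The idea of replacing the Graph Neural Network update with a gated recurrent unit can be sketched roughly as follows. This is a toy illustration, not the authors' model: node states are scalars rather than vectors, the weights in `params` are hand-picked hypothetical values rather than learned ones, and messages are a plain sum over incoming edges.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_propagation_step(h, edges, params):
    """One GRU-style update of scalar node states h over directed edges (src, dst)."""
    # Aggregate incoming messages: each node sums the states of its in-neighbours.
    a = [0.0] * len(h)
    for src, dst in edges:
        a[dst] += h[src]
    new_h = []
    for h_v, a_v in zip(h, a):
        z = sigmoid(params["wz"] * a_v + params["uz"] * h_v)  # update gate
        r = sigmoid(params["wr"] * a_v + params["ur"] * h_v)  # reset gate
        cand = math.tanh(params["w"] * a_v + params["u"] * r * h_v)  # candidate state
        new_h.append((1.0 - z) * h_v + z * cand)  # gated blend of old and candidate
    return new_h

# Hypothetical weights and a 3-node directed cycle, for illustration only.
params = {"wz": 0.5, "uz": 0.5, "wr": 0.5, "ur": 0.5, "w": 1.0, "u": 1.0}
edges = [(0, 1), (1, 2), (2, 0)]
h = [1.0, 0.0, 0.0]
for _ in range(3):
    h = gated_propagation_step(h, edges, params)
```

After a few steps, information that started at node 0 has reached every node in the cycle; the gates control how much of each node's previous state is kept.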

Categories: Microsoft

Finding Email in a Multi-Account, Multi-Device World

Microsoft Research Publications - Sun, 05/01/2016 - 08:00

Email is far from dead; in fact the volume of messages exchanged daily, the number of accounts per user, and the number of devices on which email is accessed have been constantly growing. Most previous studies on email have focused on management and retrieval behaviour within a single account and on a single device. In this paper, we examine how people retrieve email in today’s ecosystem through an in-depth qualitative diary study with 16 participants. We found that personal and work accounts are managed differently, resulting in diverse retrieval strategies: while work accounts are more structured and thus email is retrieved through folders, personal accounts have fewer folders and users rely primarily on the built-in search option. Moreover, retrieval occurs primarily on laptops and PCs compared to smartphones. We explore the reasons, and uncover barriers and workarounds related to managing multiple accounts and devices. Finally, we consider new design possibilities for email clients to better support how email is used today.

Categories: Microsoft

The Big Distraction: The Impact of Popular TV on Online Retail Sales

Microsoft Research Publications - Sun, 05/01/2016 - 08:00

Timing online auctions to attract a large number of prospective buyers is important for sellers. This study examines whether online auction sellers need to account for exogenous effects like TV viewing when timing and predicting their auction results. An ongoing debate questions whether TV viewers can spread their attention across multiple devices while watching TV, for example, by concurrently shopping online or posting on social media. Recent research has focused on understanding cross-media effects; however, little attention has been given to TV viewership’s relationship with a very important economic activity, namely participation in online auctions. We examine this potential cross-media effect by analyzing the four-year sales history of a German online auction platform and addressing potential endogeneity problems with an instrumental variable approach. We use three different instrumental variables that have different advantages and disadvantages but can, in sum, be used for triangulation as they lead to the same result. The analyses reveal a significant negative cross-media effect between TV consumption and online auction sales, indicating that TV consumption and online auction sales might compete for the scarce attention of consumers and are thus substitutes for each other rather than complements.

Categories: Microsoft

Complete addition formulas for prime order elliptic curves

Microsoft Research Publications - Sun, 05/01/2016 - 08:00

An elliptic curve addition law is said to be complete if it correctly computes the sum of any two points in the elliptic curve group. One of the main reasons for the increased popularity of Edwards curves in the ECC community is that they can allow a complete group law that is also relatively efficient (e.g., when compared to all known addition laws on Edwards curves). Such complete addition formulas can simplify the task of an ECC implementer and, at the same time, can greatly reduce the potential vulnerabilities of a cryptosystem. Unfortunately, until now, complete addition laws that are relatively efficient have only been proposed on curves of composite order and have thus been incompatible with all of the currently standardized prime order curves. In this paper we present optimized addition formulas that are complete on every prime order short Weierstrass curve defined over a field k with char(k) not 2 or 3. Compared to their incomplete counterparts, these formulas require a larger number of field additions, but interestingly require fewer field multiplications. We discuss how these formulas can be used to achieve secure, exception-free implementations on all of the prime order curves in the NIST (and many other) standards.
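To see why completeness matters, the sketch below shows the textbook affine chord formula on a short Weierstrass curve failing on an exceptional input, and the branching an implementer otherwise has to add. It does not reproduce the paper's complete formulas; the curve y^2 = x^3 + 2x + 3 over F_97 and the point `G` are toy values chosen only for illustration, and the function names are hypothetical.

```python
P_MOD, A, B = 97, 2, 3  # toy curve y^2 = x^3 + 2x + 3 over F_97 (illustrative only)

def on_curve(pt):
    """True if pt is the point at infinity (None) or satisfies the curve equation."""
    if pt is None:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0

def chord_only(p, q):
    """Textbook chord formula; raises ValueError whenever x1 == x2 (incomplete)."""
    (x1, y1), (x2, y2) = p, q
    lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return x3, (lam * (x1 - x3) - y1) % P_MOD

def add_with_branches(p, q):
    """Affine addition that handles every exceptional case with an explicit branch."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # q is the inverse of p, so the sum is the point at infinity
    if p == q:
        # Doubling needs the tangent slope instead of the chord slope.
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
        x3 = (lam * lam - 2 * x1) % P_MOD
        return x3, (lam * (x1 - x3) - y1) % P_MOD
    return chord_only(p, q)

G = (3, 6)                       # a point on the toy curve
doubled = add_with_branches(G, G)
```

Each branch in `add_with_branches` is a place where an implementation can go wrong or leak information through data-dependent control flow, which is the motivation for a single formula that works on all inputs.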

Categories: Microsoft

Tattio: Fabrication of Aesthetic and Functional Temporary Tattoos

Microsoft Research Publications - Sun, 05/01/2016 - 08:00

We present Tattio, a fabrication process that draws from current body decoration processes (i.e., jewelry-like metallic temporary tattoos) for the creation of on-skin technology. The fabrication process generates functional components ranging from NFC tags and circuitry to thermochromic tattoos, while maintaining the aesthetics and user experience of existing metallic temporary tattoos. The fabrication process is low cost, accessible, and customizable; we seek to enable individuals to design, make, and wear their own skin technology creations. We present the fabrication flow, fabricated components, and an initial user study probing perceptions towards wearing Tattio circuitry.

Categories: Microsoft

The Social Comfort of Wearable Light Therapy for Seasonal Affective Disorder

Microsoft Research Publications - Sun, 05/01/2016 - 08:00

We explored the social comfort and user experience of wearable form factors as a portable option for Bright Light Therapy (BLT). BLT remains the predominant therapy for Seasonal Affective Disorder despite a non-compliance rate of ~70% commonly attributed to the inconvenience of prolonged daily sitting in front of light boxes. To date, attempts to address convenience using wearable/portable light treatment options have been met with limited success for nuanced reasons (e.g., stigma and efficacy). In an effort to more substantively explore factors related to wearability, convenience, and contextual appropriateness/acceptability of on-body light therapy usage, we developed and evaluated six fashion-aligned wearable therapy prototypes leveraging light-emitting materials and low-profile hardware. Our results showed that participants preferred more mainstream and convenient form factors (e.g., glasses, golfer’s hat, scarf), were open to wearing their BLT in certain public and private locations, and appreciated device duality and the fashionable potential of treatment (to counter stigma).

Categories: Microsoft

"What Went Right and What Went Wrong": An Analysis of 155 Postmortems from Game Development

Microsoft Research Publications - Sun, 05/01/2016 - 08:00

In game development, software teams often conduct postmortems to reflect on what went well and what went wrong in a project. The postmortems are shared publicly on gaming sites or at developer conferences. In this paper, we present an analysis of 155 postmortems published on the gaming site Gamasutra.com. We identify characteristics of game development, link the characteristics to positive and negative experiences in the postmortems and distill a set of best practices and pitfalls for game development.

Categories: Microsoft

Beliefs, Practices, and Personalities of Software Engineers: A Survey in a Large Software Company

Microsoft Research Publications - Sun, 05/01/2016 - 08:00

In this paper we present the results from a survey about the beliefs, practices, and personalities of software engineers in a large software company. The survey received 797 responses. We report statistics about beliefs of software engineers, their work practices, as well as differences in those with respect to personality traits. For example, we observed no personality differences between developers and testers; managers were conscientious and more extraverted. We observed several differences for engineers who listen to music and for engineers who have built a tool. We also observed that engineers who agree with the statement “Agile development is awesome” were more extraverted and less neurotic.

Categories: Microsoft

Belief & Evidence in Empirical Software Engineering

Microsoft Research Publications - Sun, 05/01/2016 - 08:00

Empirical software engineering has produced a steady stream of evidence-based results concerning the factors that affect important outcomes such as cost, quality, and interval. However, programmers often also have strongly held a priori opinions about these issues. These opinions are important, since developers are highly trained professionals whose beliefs would doubtless affect their practice. As in evidence-based medicine, disseminating empirical findings to developers is a key step in ensuring that the findings impact practice. In this paper, we describe a case study on the prior beliefs of developers at Microsoft, and the relationship of these beliefs to actual empirical data on the projects in which these developers work. Our findings are that a) programmers do indeed have very strong beliefs on certain topics; b) their beliefs are primarily formed based on personal experience, rather than on findings in empirical research; and c) beliefs can vary with each project, but do not necessarily correspond with actual evidence in that project. Our findings suggest that more effort should be taken to disseminate empirical findings to developers and that more in-depth study of the interplay of belief and evidence in software practice is needed.

Categories: Microsoft

The Emerging Role of Data Scientists on Software Development Teams

Microsoft Research Publications - Sun, 05/01/2016 - 08:00

Creating and running software produces large amounts of raw data about the development process and the customer usage, which can be turned into actionable insight with the help of skilled data scientists. Unfortunately, data scientists with the analytical and software engineering skills to analyze these large data sets have been hard to come by; only recently have software companies started to develop competencies in software-oriented data analytics. To understand this emerging role, we interviewed data scientists across several product groups at Microsoft. In this paper, we describe their education and training background, their missions in software engineering contexts, and the type of problems on which they work. We identify five distinct working styles of data scientists: (1) Insight Providers, who work with engineers to collect the data needed to inform decisions that managers make; (2) Modeling Specialists, who use their machine learning expertise to build predictive models; (3) Platform Builders, who create data platforms, balancing both engineering and data analysis concerns; (4) Polymaths, who do all data science activities themselves; and (5) Team Leaders, who run teams of data scientists and spread best practices. We further describe a set of strategies that they employ to increase the impact and actionability of their work.

Categories: Microsoft

RETracer: Triaging Crashes by Reverse Execution from Partial Memory Dumps

Microsoft Research Publications - Sun, 05/01/2016 - 08:00

Many software providers operate crash reporting services to automatically collect crashes from millions of customers and file bug reports. Precisely triaging crashes is necessary and important for software providers because the millions of crashes that may be reported every day are critical in identifying high impact bugs. However, the triaging accuracy of existing systems is limited, as they rely only on the syntactic information of the stack trace at the moment of a crash without analyzing program semantics. In this paper, we present RETracer, the first system to triage software crashes based on program semantics reconstructed from memory dumps. RETracer was designed to meet the requirements of large-scale crash reporting services. RETracer performs binary-level backward taint analysis without a recorded execution trace to understand how functions on the stack contribute to the crash. The main challenge is that the machine state at an earlier time cannot be recovered completely from a memory dump, since most instructions are information-destroying. We have implemented RETracer for x86 and x86-64 native code, and compared it with the existing crash triaging tool used by Microsoft. We found that RETracer eliminates two thirds of triage errors based on a manual analysis of 140 bugs fixed in Microsoft Windows and Office. RETracer has been deployed as the main crash triaging system on Microsoft’s crash reporting service.

Categories: Microsoft

Robust and Efficient Multiple Alignment of Unsynchronized Meeting Recordings

Microsoft Research Publications - Sun, 05/01/2016 - 08:00

This paper proposes a way to generate a single high-quality audio recording of a meeting using no equipment other than participants’ personal devices. Each participant in the meeting uses their mobile device as a local recording node, and they begin recording whenever they arrive in an unsynchronized fashion. The main problem in generating a single summary recording is to temporally align the various audio recordings in a robust and efficient manner. We propose a way to do this using an adaptive audio fingerprint based on spectrotemporal eigenfilters, where the fingerprint design is learned on-the-fly in a totally unsupervised way to perform well on the data at hand. The adaptive fingerprints require only a few seconds of data to learn a robust design, and they require no tuning. Our method uses an iterative, greedy two-stage alignment algorithm which finds a rough alignment using indexing techniques, and then performs a more fine-grained alignment based on Hamming distance. Our proposed system achieves >99% alignment accuracy on challenging alignment scenarios extracted from the ICSI meeting corpus, and it outperforms five other well-known and state-of-the-art fingerprint designs. We conduct extensive analyses of the factors that affect the robustness of the adaptive fingerprints, and we provide a simple heuristic that can be used to adjust the fingerprint’s robustness according to the amount of computation we are willing to perform.
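The Hamming-distance part of the alignment can be illustrated with a brute-force sketch. It assumes each audio frame has already been reduced to a small integer bit fingerprint; the fingerprint learning and the index-based rough alignment described in the abstract are not shown, and the 4-bit values and the name `best_offset` are made up for illustration.

```python
def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")

def best_offset(ref, query):
    """Return (offset, mean Hamming distance) where query best matches ref."""
    best = None
    for off in range(len(ref) - len(query) + 1):
        # zip stops at the end of query, so only the overlapping frames are compared.
        total = sum(hamming(r, q) for r, q in zip(ref[off:], query))
        score = total / len(query)
        if best is None or score < best[1]:
            best = (off, score)
    return best

# Toy 4-bit fingerprints: query is frames 2..4 of ref with one bit flipped.
ref = [0b1010, 0b1100, 0b0111, 0b0001, 0b1111, 0b0000]
query = [0b0111, 0b0011, 0b1111]
offset, score = best_offset(ref, query)
```

The offset with the lowest mean distance is taken as the alignment; a single flipped bit in the query does not change which offset wins.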

Categories: Microsoft

LatticeCrypto

Microsoft Research Downloads - Fri, 04/29/2016 - 05:45

LatticeCrypto is a high-performance and portable software library that implements lattice-based cryptographic algorithms. The first release of the library provides an implementation of lattice-based key exchange with security based on the Ring Learning With Errors (R-LWE) problem using new algorithms for the underlying Number Theoretic Transform (NTT). The chosen parameters provide at least 128 bits of security against attackers running classical and quantum computers.
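For readers unfamiliar with the Number Theoretic Transform, the sketch below multiplies two small polynomials by transforming them, multiplying pointwise, and transforming back. It is a naive quadratic-time toy and not the library's algorithm or API: the modulus 17, length 4, and root 4 are illustrative parameters, and it computes a cyclic product (modulo x^4 - 1), whereas ring-LWE schemes commonly work modulo x^n + 1, which needs an extra twist not shown here.

```python
Q, N, ROOT = 17, 4, 4  # toy parameters: 4 is a primitive 4th root of unity mod 17

def ntt(coeffs, root=ROOT):
    """Evaluate the polynomial at the powers root^0 .. root^(N-1), all modulo Q."""
    return [sum(c * pow(root, i * j, Q) for j, c in enumerate(coeffs)) % Q
            for i in range(N)]

def intt(values):
    """Inverse transform: forward transform with root^-1, then scale by N^-1."""
    inv_n = pow(N, -1, Q)
    inv_root = pow(ROOT, -1, Q)
    return [v * inv_n % Q for v in ntt(values, inv_root)]

def cyclic_multiply(f, g):
    """Multiply f and g modulo (x^N - 1, Q) via a pointwise product of transforms."""
    return intt([a * b % Q for a, b in zip(ntt(f), ntt(g))])

# (1 + 2x) * (3 + 4x) = 3 + 10x + 8x^2
product = cyclic_multiply([1, 2, 0, 0], [3, 4, 0, 0])
```

Production implementations replace the quadratic-time loops with a butterfly network, which is where most of the performance engineering goes.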

Categories: Microsoft

SIDH Library

Microsoft Research Downloads - Wed, 04/27/2016 - 03:35

SIDH is a fast and portable software library that implements a new suite of algorithms for Supersingular Isogeny Diffie-Hellman (SIDH) key exchange. The chosen parameters aim to provide 128 bits of security against attackers running a large-scale quantum computer, and 192 bits of security against classical algorithms. SIDH has the option of a hybrid key exchange that combines supersingular isogeny Diffie-Hellman with a high-security classical elliptic curve Diffie-Hellman key exchange at a small overhead.

Categories: Microsoft

Program for TPC-H Data Generation with Skew

Microsoft Research Downloads - Wed, 04/27/2016 - 00:00

The schema and queries of the TPC-H (formerly TPC-D) benchmark are widely used by people in the database community. One of the requirements of the benchmark is that data for columns in the database are generated from a uniform distribution. However, this requirement makes it hard for users to draw conclusions about the robustness/effectiveness of their system, since real-world data distributions are often non-uniform. We have therefore created a new data generation program for TPC-H that is capable of generating a database where the columns have non-uniform (skewed) data distributions. In particular, the program can generate data from a Zipfian distribution, where the Zipf value (z), which controls the degree of skew in the data, is a parameter that can be specified to the program. In addition, the program allows the generation of a database with a “mixed” data distribution, where the skew of each column in the database is randomly chosen from the Zipfian values {0,1,2,3,4}. Note that the total number of rows in the tables and the total database size are not affected by our changes.
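How the Zipf value controls skew can be sketched in a few lines. This is not the data generation program itself or its command-line interface; `zipf_weights` and `sample_column` are hypothetical helpers that assume a column with a finite set of distinct values, where the value of rank k is drawn with probability proportional to 1/k^z.

```python
import random

def zipf_weights(n_values, z):
    """Unnormalised Zipfian weights: the value of rank k gets weight 1 / k**z."""
    return [1.0 / (rank ** z) for rank in range(1, n_values + 1)]

def sample_column(n_rows, n_values, z, seed=0):
    """Draw n_rows value indices (0 = most frequent) with Zipfian skew z."""
    rng = random.Random(seed)
    return rng.choices(range(n_values), weights=zipf_weights(n_values, z), k=n_rows)

uniform_col = sample_column(1000, 10, z=0)   # z = 0 reduces to a uniform distribution
skewed_col = sample_column(1000, 10, z=4)    # z = 4 concentrates on the first value

# "Mixed" mode, sketched: pick the skew of each column at random from {0,1,2,3,4}.
mixed_z = random.Random(1).choice([0, 1, 2, 3, 4])
mixed_col = sample_column(1000, 10, z=mixed_z)
```

The number of rows is the same in every case; only the frequency of each value changes, which matches the note that table sizes are unaffected.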

Categories: Microsoft

RIoT - A Foundation for Trust in the Internet of Things

Microsoft Research Publications - Thu, 04/21/2016 - 08:00

RIoT (Robust Internet-of-Things) is an architecture for providing foundational trust services to computing devices. The trust services include device identity, sealing, attestation, and data integrity. The term “Robust” is used because the minimal trusted computing base is tiny, and because RIoT capabilities can remotely re-establish trust in devices that have been compromised by malware. The term IoT is used because these services can be provided at low cost on even the tiniest of devices.

Categories: Microsoft

Diverse Algebra Word Problem Dataset with Derivation Annotations

Microsoft Research Downloads - Tue, 04/19/2016 - 23:14

This dataset provides training and testing examples for solving algebra word problems automatically. In addition to providing 1000 completely new problems, we augmented the data by annotating the full derivations (template + alignments) for each word problem. We also performed cross-dataset cleaning across all datasets, so that the template annotations across different sets are unified. The instances come from the following resources: (1) 1000 new training/testing data with diverse templates and narratives crawled from algebra.com. (Our contribution) (2) http://groups.csail.mit.edu/rbg/code/wordprobs/ (3) http://research.microsoft.com/en-us/projects/dolphin/ (their dataset contains non-linear problems; we took the subset of problems that are linear). Please cite the corresponding report (and the original papers of the other datasets) if you find the dataset useful. We are aware that some of the problems might contain annotation errors.

Categories: Microsoft

Ripples of mediatization: Social media and the exposure of the pool interview

Microsoft Research Publications - Sun, 04/17/2016 - 08:00

During the 2011 UK public sector protests, controversy ignited over the “Miliband Loop”, an unedited video from a pool interview showing Labour leader Ed Miliband to have provided largely the same answer in response to six questions. The interviewer subsequently complained in a TwitLonger that the incident epitomized the clash of public relations and journalism. In this paper we unpack the practical production of the pool interview as a delamination of the interview-as-lived from the interview-as-media-production-mechanism. We then explore professional and public understanding (or lack thereof) of exposure of this delamination issue and its relation to politics. While the controversy did not directly affect Miliband’s position as leader, it is clear that the Internet is a dangerous place for the old rules of mediatization.

Categories: Microsoft

Table Cell Search for Question Answering

Microsoft Research Publications - Mon, 04/11/2016 - 08:00

Tables are pervasive on the Web. Informative web tables range across a large variety of topics, which can naturally serve as a significant resource to satisfy user information needs. Driven by such observations, in this paper, we investigate an important yet largely under-addressed problem: given millions of tables, how to precisely retrieve table cells to answer a user question. This work proposes a novel table cell search framework to attack this problem. We first formulate the concept of a relational chain which connects two cells in a table and represents the semantic relation between them. With the help of search engine snippets, our framework generates a set of relational chains pointing to potentially correct answer cells. We further employ deep neural networks to conduct more fine-grained inference on which relational chains best match the input question and finally extract the corresponding answer cells. Based on millions of tables crawled from the Web, we evaluate our framework in the open-domain question answering (QA) setting, using both the well-known WebQuestions dataset and user queries mined from Bing search engine logs. On WebQuestions, our framework is comparable to state-of-the-art QA systems based on knowledge bases (KBs), while on Bing queries, it outperforms other systems with a 56.7% relative gain. Moreover, when combined with results from our framework, KB-based QA performance can obtain a relative improvement of 28.1% to 66.7%, demonstrating that web tables supply rich knowledge that might not exist or is difficult to identify in existing KBs.

Categories: Microsoft

Improving Document Ranking with Dual Word Embeddings

Microsoft Research Publications - Mon, 04/11/2016 - 08:00

This paper investigates the popular neural word embedding method word2vec as a source of evidence in document ranking. In contrast to NLP applications of word2vec, which tend to use only the input embeddings, we retain both the input and the output embeddings, allowing us to calculate a different word similarity that may be more suitable for document ranking. We map the query words into the input space and the document words into the output space, and compute a relevance score by aggregating the cosine similarities across all the query-document word pairs. We postulate that the proposed Dual Embedding Space Model (DESM) provides evidence that a document is about a query term, in addition to and complementing the traditional term frequency based approach.
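The scoring described above can be sketched as follows. The abstract does not say how the pairwise similarities are aggregated, so this sketch assumes a plain mean over all query-document word pairs; the two-dimensional embeddings are made-up toy values rather than trained word2vec vectors, and `dual_space_score` is a hypothetical name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def dual_space_score(query, doc, in_emb, out_emb):
    """Mean cosine similarity between query words (input space) and
    document words (output space), taken over all word pairs."""
    sims = [cosine(in_emb[q], out_emb[d]) for q in query for d in doc]
    return sum(sims) / len(sims)

# Toy embeddings, for illustration only.
in_emb = {"yale": [1.0, 0.0]}
out_emb = {"faculty": [0.9, 0.1], "alumni": [0.8, 0.3], "banana": [0.0, 1.0]}

on_topic = dual_space_score(["yale"], ["faculty", "alumni"], in_emb, out_emb)
off_topic = dual_space_score(["yale"], ["banana"], in_emb, out_emb)
```

With these toy vectors the document whose words lie close to the query word in the output space receives the higher score, even though neither document contains the query term itself.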

Categories: Microsoft
