Blogroll

"What Went Right and What Went Wrong": An Analysis of 155 Postmortems from Game Development

Microsoft Research Publications - Sun, 05/01/2016 - 09:00
In game development, software teams often conduct postmortems to reflect on what went well and what went wrong in a project. The postmortems are shared publicly on gaming sites or at developer conferences. In this paper, we present an analysis of 155 postmortems published on the gaming site Gamasutra.com. We identify characteristics of game development, link the characteristics to positive and negative experiences in the postmortems and distill a set of best practices and pitfalls for game development.
Categories: Microsoft

Beliefs, Practices, and Personalities of Software Engineers: A Survey in a Large Software Company

Microsoft Research Publications - Sun, 05/01/2016 - 09:00
In this paper we present the results from a survey about the beliefs, practices, and personalities of software engineers in a large software company. The survey received 797 responses. We report statistics about beliefs of software engineers, their work practices, as well as differences in those with respect to personality traits. For example, we observed no personality differences between developers and testers; managers were conscientious and more extraverted. We observed several differences for engineers who listen to music and for engineers who have built a tool. We also observed that engineers who agree with the statement “Agile development is awesome” were more extraverted and less neurotic.
Categories: Microsoft

Belief & Evidence in Empirical Software Engineering

Microsoft Research Publications - Sun, 05/01/2016 - 09:00
Empirical software engineering has produced a steady stream of evidence-based results concerning the factors that affect important outcomes such as cost, quality, and interval. However, programmers often also have strongly held a priori opinions about these issues. These opinions are important, since developers are highly trained professionals whose beliefs would doubtless affect their practice. As in evidence-based medicine, disseminating empirical findings to developers is a key step in ensuring that the findings impact practice. In this paper, we describe a case study on the prior beliefs of developers at Microsoft, and the relationship of these beliefs to actual empirical data on the projects in which these developers work. Our findings are that (a) programmers do indeed have very strong beliefs on certain topics; (b) their beliefs are primarily formed based on personal experience, rather than on findings in empirical research; and (c) beliefs can vary with each project, but do not necessarily correspond with actual evidence in that project. Our findings suggest that more effort should be taken to disseminate empirical findings to developers and that more in-depth study of the interplay of belief and evidence in software practice is needed.
Categories: Microsoft

The Emerging Role of Data Scientists on Software Development Teams

Microsoft Research Publications - Sun, 05/01/2016 - 09:00
Creating and running software produces large amounts of raw data about the development process and the customer usage, which can be turned into actionable insight with the help of skilled data scientists. Unfortunately, data scientists with the analytical and software engineering skills to analyze these large data sets have been hard to come by; only recently have software companies started to develop competencies in software-oriented data analytics. To understand this emerging role, we interviewed data scientists across several product groups at Microsoft. In this paper, we describe their education and training background, their missions in software engineering contexts, and the type of problems on which they work. We identify five distinct working styles of data scientists: (1) Insight Providers, who work with engineers to collect the data needed to inform decisions that managers make; (2) Modeling Specialists, who use their machine learning expertise to build predictive models; (3) Platform Builders, who create data platforms, balancing both engineering and data analysis concerns; (4) Polymaths, who do all data science activities themselves; and (5) Team Leaders, who run teams of data scientists and spread best practices. We further describe a set of strategies that they employ to increase the impact and actionability of their work.
Categories: Microsoft

RETracer: Triaging Crashes by Reverse Execution from Partial Memory Dumps

Microsoft Research Publications - Sun, 05/01/2016 - 09:00
Many software providers operate crash reporting services to automatically collect crashes from millions of customers and file bug reports. Precisely triaging crashes is necessary and important for software providers because the millions of crashes that may be reported every day are critical in identifying high-impact bugs. However, the triaging accuracy of existing systems is limited, as they rely only on the syntactic information of the stack trace at the moment of a crash without analyzing program semantics. In this paper, we present RETracer, the first system to triage software crashes based on program semantics reconstructed from memory dumps. RETracer was designed to meet the requirements of large-scale crash reporting services. RETracer performs binary-level backward taint analysis without a recorded execution trace to understand how functions on the stack contribute to the crash. The main challenge is that the machine state at an earlier time cannot be recovered completely from a memory dump, since most instructions are information-destroying. We have implemented RETracer for x86 and x86-64 native code, and compared it with the existing crash triaging tool used by Microsoft. We found that RETracer eliminates two thirds of triage errors based on a manual analysis of 140 bugs fixed in Microsoft Windows and Office. RETracer has been deployed as the main crash triaging system on Microsoft’s crash reporting service.
Categories: Microsoft

Robust and Efficient Multiple Alignment of Unsynchronized Meeting Recordings

Microsoft Research Publications - Sun, 05/01/2016 - 09:00
This paper proposes a way to generate a single high-quality audio recording of a meeting using no equipment other than participants’ personal devices. Each participant in the meeting uses their mobile device as a local recording node, and they begin recording whenever they arrive in an unsynchronized fashion. The main problem in generating a single summary recording is to temporally align the various audio recordings in a robust and efficient manner. We propose a way to do this using an adaptive audio fingerprint based on spectrotemporal eigenfilters, where the fingerprint design is learned on-the-fly in a totally unsupervised way to perform well on the data at hand. The adaptive fingerprints require only a few seconds of data to learn a robust design, and they require no tuning. Our method uses an iterative, greedy two-stage alignment algorithm which finds a rough alignment using indexing techniques, and then performs a more fine-grained alignment based on Hamming distance. Our proposed system achieves >99% alignment accuracy on challenging alignment scenarios extracted from the ICSI meeting corpus, and it outperforms five other well-known and state-of-the-art fingerprint designs. We conduct extensive analyses of the factors that affect the robustness of the adaptive fingerprints, and we provide a simple heuristic that can be used to adjust the fingerprint’s robustness according to the amount of computation we are willing to perform.
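The fine-grained alignment stage mentioned in the abstract can be illustrated with a minimal sketch. It assumes each recording has already been reduced to a sequence of integer fingerprint words (one per frame); the function names, the `max_shift` search window, and the `min_overlap` guard are hypothetical and are not taken from the paper, and the exhaustive search stands in for the paper's indexing-based rough alignment.

```python
def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two fingerprint words."""
    return bin(a ^ b).count("1")


def best_offset(ref, query, max_shift, min_overlap=3):
    """Return the frame shift of `query` relative to `ref` that gives the
    lowest mean Hamming distance over the overlapping frames.

    Shifts with fewer than `min_overlap` overlapping frames are skipped so
    that a tiny overlap cannot win by chance.
    """
    best_score, best_shift = None, None
    for shift in range(-max_shift, max_shift + 1):
        total, count = 0, 0
        for i, word in enumerate(query):
            j = i + shift
            if 0 <= j < len(ref):
                total += hamming(ref[j], word)
                count += 1
        if count < min_overlap:
            continue
        score = total / count
        if best_score is None or score < best_score:
            best_score, best_shift = score, shift
    return best_shift


# Toy data (hypothetical): the query is frames 3..7 of the reference,
# with one bit flipped to mimic noise in a second recording.
reference = [0x1A2B, 0x3C4D, 0x5E6F, 0x7081, 0x92A3,
             0xB4C5, 0xD6E7, 0xF809, 0x1B2D, 0x3F41]
noisy_query = [0x7081, 0x92A2, 0xB4C5, 0xD6E7, 0xF809]
print(best_offset(reference, noisy_query, max_shift=5))
```

Using the mean rather than the total distance keeps shifts with different overlap lengths comparable.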
Categories: Microsoft

LatticeCrypto

Microsoft Research Downloads - Fri, 04/29/2016 - 06:45
LatticeCrypto is a high-performance and portable software library that implements lattice-based cryptographic algorithms. The first release of the library provides an implementation of lattice-based key exchange with security based on the Ring Learning With Errors (R-LWE) problem using new algorithms for the underlying Number Theoretic Transform (NTT). The chosen parameters provide at least 128 bits of security against attackers running classical and quantum computers.
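To show what a Number Theoretic Transform does, here is a toy sketch that multiplies two polynomials through pointwise products in the transform domain. Everything in it is an assumption for illustration: the modulus 17, length 4, and root 4 are far smaller than any real R-LWE parameters; the transform is the naive quadratic-time form rather than a fast algorithm; and it computes cyclic products (modulo x^N - 1), whereas R-LWE schemes typically work modulo x^N + 1. It does not reproduce the library's algorithms or API.

```python
Q = 17      # toy prime modulus (assumption)
N = 4       # transform length; N must divide Q - 1
OMEGA = 4   # primitive N-th root of unity mod Q: 4**2 % 17 == 16, 4**4 % 17 == 1


def ntt(values, root):
    """Naive O(N^2) transform: evaluate the polynomial at powers of `root`."""
    return [
        sum(v * pow(root, i * k, Q) for i, v in enumerate(values)) % Q
        for k in range(N)
    ]


def intt(values):
    """Inverse transform: use the inverse root, then scale by N^-1 mod Q."""
    inv_n = pow(N, Q - 2, Q)         # Fermat inverse, valid because Q is prime
    inv_root = pow(OMEGA, Q - 2, Q)
    return [(x * inv_n) % Q for x in ntt(values, inv_root)]


def cyclic_multiply(a, b):
    """Multiply two length-N polynomials modulo (x^N - 1, Q) via the NTT."""
    fa, fb = ntt(a, OMEGA), ntt(b, OMEGA)
    return intt([(x * y) % Q for x, y in zip(fa, fb)])


def naive_cyclic_multiply(a, b):
    """Schoolbook reference implementation used to check the NTT result."""
    out = [0] * N
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[(i + j) % N] = (out[(i + j) % N] + x * y) % Q
    return out


print(cyclic_multiply([1, 2, 3, 4], [5, 6, 7, 8]))
```

The benefit in practice comes from fast transforms, which reduce the cost of multiplication from quadratic to roughly N log N operations.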
Categories: Microsoft

SIDH Library

Microsoft Research Downloads - Wed, 04/27/2016 - 04:35
SIDH is a fast and portable software library that implements a new suite of algorithms for Supersingular Isogeny Diffie-Hellman (SIDH) key exchange. The chosen parameters aim to provide 128 bits of security against attackers running a large-scale quantum computer, and 192 bits of security against classical algorithms. SIDH has the option of a hybrid key exchange that combines supersingular isogeny Diffie-Hellman with a high-security classical elliptic curve Diffie-Hellman key exchange at a small overhead.
Categories: Microsoft

Program for TPC-H Data Generation with Skew

Microsoft Research Downloads - Wed, 04/27/2016 - 01:00
The schema and queries of the TPC-H (formerly TPC-D) benchmark are widely used by people in the database community. One of the requirements of the benchmark is that data for columns in the database are generated from a uniform distribution. However, this requirement makes it hard for users to draw conclusions about the robustness and effectiveness of their systems, since real-world data distributions are often non-uniform. We have therefore created a new data generation program for TPC-H that is capable of generating a database where the columns have non-uniform (skewed) data distributions. In particular, the program can generate data from a Zipfian distribution, where the Zipf value (z), which controls the degree of skew in the data, is a parameter that can be specified to the program. In addition, the program allows the generation of a database with a “mixed” data distribution, where the skew of each column in the database is randomly chosen from the Zipfian values {0,1,2,3,4}. Note that the total number of rows in the tables and the total database size are not affected by our changes.
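The effect of the Zipf value can be sketched as follows. This is a generic finite Zipfian sampler, not the program's actual implementation: the function names, the `seed` parameter, and the choice of weight 1 / rank^z for the value of a given rank are assumptions made for illustration.

```python
import random


def zipf_weights(n_values, z):
    """Unnormalized Zipf weights: the value of rank r gets weight 1 / r**z.

    z = 0 gives equal weights (a uniform column); larger z gives more skew.
    """
    return [1.0 / (rank ** z) for rank in range(1, n_values + 1)]


def generate_column(n_rows, n_values, z, seed=0):
    """Draw `n_rows` values from {1, ..., n_values} with Zipfian skew `z`."""
    rng = random.Random(seed)
    weights = zipf_weights(n_values, z)
    return rng.choices(range(1, n_values + 1), weights=weights, k=n_rows)


def generate_mixed(n_rows, n_values, column_names, seed=0):
    """'Mixed' mode: pick each column's skew at random from {0, 1, 2, 3, 4}."""
    rng = random.Random(seed)
    return {
        name: generate_column(n_rows, n_values, rng.choice([0, 1, 2, 3, 4]),
                              seed=rng.randrange(2**32))
        for name in column_names
    }


column = generate_column(n_rows=1000, n_values=10, z=2)
print(len(column), column[:10])
```

The row count is fixed by `n_rows` regardless of z, which mirrors the note that skew changes the value distribution but not the table sizes.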
Categories: Microsoft

RIoT - A Foundation for Trust in the Internet of Things

Microsoft Research Publications - Thu, 04/21/2016 - 09:00
RIoT (Robust Internet-of-Things) is an architecture for providing foundational trust services to computing devices. The trust services include device identity, sealing, attestation, and data integrity. The term “Robust” is used because the minimal trusted computing base is tiny, and because RIoT capabilities can remotely re-establish trust in devices that have been compromised by malware. The term IoT is used because these services can be provided at low cost on even the tiniest of devices.
Categories: Microsoft

Diverse Algebra Word Problem Dataset with Derivation Annotations

Microsoft Research Downloads - Wed, 04/20/2016 - 00:14
This dataset provides training and testing examples for solving algebra word problems automatically. In addition to providing 1000 completely new problems, we augmented the data by annotating the full derivations (template + alignments) for each word problem. We also performed cross-dataset cleaning across all datasets, so that the template annotations across different sets are unified. The instances come from the following resources: (1) 1000 new training/testing problems with diverse templates and narratives crawled from algebra.com (our contribution) (2) http://groups.csail.mit.edu/rbg/code/wordprobs/ (3) http://research.microsoft.com/en-us/projects/dolphin/ (their dataset contains non-linear problems; we took the subset of those problems that are linear). Please cite the corresponding report (and the original papers of the other datasets) if you find the dataset useful. We are aware that some of the problems might contain annotation errors.
Categories: Microsoft

Ripples of mediatization: Social media and the exposure of the pool interview

Microsoft Research Publications - Sun, 04/17/2016 - 09:00
During the 2011 UK public sector protests, controversy ignited over the “Miliband Loop”, an unedited video from a pool interview showing Labour leader Ed Miliband to have provided largely the same answer in response to six questions. The interviewer subsequently complained in a TwitLonger that the incident epitomized the clash of public relations and journalism. In this paper we unpack the practical production of the pool interview as a delamination of the interview-as-lived from the interview-as-media-production-mechanism. We then explore professional and public understanding (or lack thereof) of exposure of this delamination issue and its relation to politics. While the controversy did not directly affect Miliband’s position as leader, it is clear that the Internet is a dangerous place for the old rules of mediatization.
Categories: Microsoft

Table Cell Search for Question Answering

Microsoft Research Publications - Mon, 04/11/2016 - 09:00
Tables are pervasive on the Web. Informative web tables range across a large variety of topics, which can naturally serve as a significant resource to satisfy user information needs. Driven by such observations, in this paper, we investigate an important yet largely under-addressed problem: given millions of tables, how can we precisely retrieve table cells to answer a user question? This work proposes a novel table cell search framework to attack this problem. We first formulate the concept of a relational chain which connects two cells in a table and represents the semantic relation between them. With the help of search engine snippets, our framework generates a set of relational chains pointing to potentially correct answer cells. We further employ deep neural networks to conduct more fine-grained inference on which relational chains best match the input question and finally extract the corresponding answer cells. Based on millions of tables crawled from the Web, we evaluate our framework in the open-domain question answering (QA) setting, using both the well-known WebQuestions dataset and user queries mined from Bing search engine logs. On WebQuestions, our framework is comparable to state-of-the-art QA systems based on knowledge bases (KBs), while on Bing queries, it outperforms other systems with a 56.7% relative gain. Moreover, when combined with results from our framework, KB-based QA performance can obtain a relative improvement of 28.1% to 66.7%, demonstrating that web tables supply rich knowledge that might not exist or is difficult to identify in existing KBs.
Categories: Microsoft

Improving Document Ranking with Dual Word Embeddings

Microsoft Research Publications - Mon, 04/11/2016 - 09:00
This paper investigates the popular neural word embedding method word2vec as a source of evidence in document ranking. In contrast to NLP applications of word2vec, which tend to use only the input embeddings, we retain both the input and the output embeddings, allowing us to calculate a different word similarity that may be more suitable for document ranking. We map the query words into the input space and the document words into the output space, and compute a relevance score by aggregating the cosine similarities across all the query-document word pairs. We postulate that the proposed Dual Embedding Space Model (DESM) provides evidence that a document is about a query term, in addition to and complementing the traditional term frequency based approach.
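The scoring idea in the abstract can be sketched in a few lines. The sketch assumes the aggregation is a plain mean over all query-document word pairs, which the abstract does not specify; the function names and the tiny two-dimensional embedding tables are hypothetical and stand in for trained word2vec input and output matrices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def desm_score(query_words, doc_words, in_emb, out_emb):
    """Mean cosine similarity over all query-document word pairs.

    Query words are looked up in the input embedding table and document
    words in the output embedding table, following the abstract.
    """
    sims = [
        cosine(in_emb[q], out_emb[d])
        for q in query_words
        for d in doc_words
    ]
    return sum(sims) / len(sims)


# Hypothetical toy embeddings, for illustration only.
toy_in = {"cat": [1.0, 0.0]}
toy_out = {"kitten": [1.0, 0.0], "car": [0.0, 1.0]}

print(desm_score(["cat"], ["kitten", "car"], toy_in, toy_out))
```

A score like this would be combined with a term-frequency signal rather than replace it, in line with the abstract's remark that the two kinds of evidence are complementary.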
Categories: Microsoft

A Software Methodology for Compiling Quantum Programs

Microsoft Research Publications - Tue, 04/05/2016 - 09:00
Quantum computers promise to transform our notions of computation by offering a completely new paradigm. To achieve scalable quantum computation, optimizing compilers and a corresponding software design flow will be essential. We present a software architecture for compiling quantum programs from a high-level language program to hardware-specific instructions. We describe the necessary layers of abstraction and their differences and similarities to classical layers of a computer-aided design flow. For each layer of the stack, we discuss the underlying methods for compilation and optimization. Our software methodology facilitates more rapid innovation among quantum algorithm designers, quantum hardware engineers, and experimentalists. It enables scalable compilation of complex quantum algorithms and can be targeted to any specific quantum hardware implementation.
Categories: Microsoft

Emerging and Recurring Data-Driven Storytelling Techniques: Analysis of a Curated Collection of Recent Stories

Microsoft Research Publications - Sun, 04/03/2016 - 09:00
Storytelling with data is becoming an important component of many fields such as graphic design, the advocacy of causes, and journalism. New techniques for integrating data visualization into narrative stories have now become commonplace. Authors are enabling new reader experiences, such as linking textual narrative and data visualizations through dynamic queries embedded in the text. Novel means of communicating position and navigating within the narrative have also emerged, such as utilizing scrolling to advance narration and initiate animations. We advance the study of narrative visualization through an analysis of a curated collection of recent data-driven stories shared on the web. Drawing from the results of this analysis, we present a set of techniques being employed in these examples, organized under four high-level categories that help authors to tell stories in creative ways: communicating narrative and explaining data, linking separated story elements, enhancing structure and navigation, and providing controlled exploration. We describe the benefits of each storytelling technique along with a number of example applications of the ideas through recent data-driven stories. Additionally, we discuss the trends we observed as well as how the field has evolved and grown. Finally, we conclude with a discussion of areas for future research.
Categories: Microsoft

Learning to Verify the Heap

Microsoft Research Publications - Fri, 04/01/2016 - 09:00
We present a data-driven verification framework to automatically prove memory safety and functional correctness of heap programs. For this, we introduce a novel statistical machine learning technique that maps observed program states to (possibly disjunctive) separation logic formulas describing the invariant shape of data structures at relevant program locations. We then attempt to verify these predictions using a theorem prover, where counterexamples to a predicted invariant are used as additional input to the shape predictor in a refinement loop. After obtaining valid shape invariants, we use a second learning algorithm to strengthen them with data invariants, again employing a refinement loop using the underlying theorem prover. We have implemented our techniques in Cricket, an extension of the GRASShopper verification tool. Cricket is able to automatically prove memory safety and correctness of implementations of a variety of classical list-manipulating algorithms such as insertion sort.
Categories: Microsoft

Analyzing Runtime and Size Complexity of Integer Programs

Microsoft Research Publications - Fri, 04/01/2016 - 09:00
We present a modular approach to automatic complexity analysis of integer programs. Based on a novel alternation between finding symbolic time bounds for program parts and using these to infer bounds on the absolute values of program variables, we can restrict each analysis step to a small part of the program while maintaining a high level of precision. The bounds computed by our method are polynomial or exponential expressions that depend on the absolute values of input parameters. We show how to extend our approach to arbitrary cost measures, allowing our technique to be used to find upper bounds for other expended resources, such as network requests or memory consumption. Our contributions are implemented in the open source tool KoAT, and extensive experiments show the performance and power of our implementation in comparison with other tools.
Categories: Microsoft

The Dialog State Tracking Challenge Series: A Review

Microsoft Research Publications - Fri, 04/01/2016 - 09:00
In a spoken dialog system, dialog state tracking refers to the task of correctly inferring the state of the conversation – such as the user’s goal – given all of the dialog history up to that turn. Dialog state tracking is crucial to the success of a dialog system, yet until recently there were no common resources, hampering progress. The Dialog State Tracking Challenge series of 3 tasks introduced the first shared testbed and evaluation metrics for dialog state tracking, and has underpinned three key advances in dialog state tracking: the move from generative to discriminative models; the adoption of discriminative sequential techniques; and the incorporation of the speech recognition results directly into the dialog state tracker. This paper reviews this research area, covering both the challenge tasks themselves and summarizing the work they have enabled.
Categories: Microsoft