Blogroll
Exploring limits to prediction in complex social systems: Predicting cascade size on Twitter
How predictable is success in complex social systems? In spite of a recent profusion of prediction studies that exploit online social and information network data, this question remains unanswered, in part because it has not been adequately specified. In this paper we attempt to clarify the question by presenting a simple stylized model of success that attributes prediction error to one of two generic sources: insufficiency of available data and/or models on the one hand; and inherent unpredictability of complex social systems on the other. We then use this model to motivate an illustrative empirical study of information cascade size prediction on Twitter. Despite an unprecedented volume of information about users, content, and past performance, our best performing models can explain less than half of the variance in cascade sizes. In turn, this result suggests that even with unlimited data predictive performance would be bounded well below deterministic accuracy. Finally, we explore this potential bound theoretically using simulations of a diffusion process on a random scale free network similar to Twitter. We show that although higher predictive power is possible in theory, such performance requires a homogeneous system and perfect ex-ante knowledge of it: even a small degree of uncertainty in estimating product quality or slight variation in quality across products leads to substantially more restrictive bounds on predictability. We conclude that realistic bounds on predictive accuracy are not dissimilar from those we have obtained empirically, and that such bounds for other complex social systems for which data is more difficult to obtain are likely even lower.
Categories: Microsoft
Query-Less: Predicting Task Repetition for NextGen Proactive Search and Recommendation Engines
Categories: Microsoft
Toward Full Elasticity in Distributed Static Analysis
In this paper we present the design and implementation of an elastic static analysis framework that is designed to scale with the size of the input. Our approach is based on the actor programming model and is deployed in the cloud. This provides a degree of elasticity for CPU, memory and storage resources. To demonstrate the potential of our technique, we show how a typical call graph analysis can be implemented. We experimentally validate the analysis using a combination of both synthetic and real benchmarks. The results show that our analysis scales well in terms of memory pressure independently of the input size, as we add more machines. Despite using stock hardware and incurring a non-trivial communication overhead, our processing time for projects of close to 1M LOC is about 15 minutes. As the number of machines increases, we show that the analysis time does not suffer. Lastly, we demonstrate that querying the results can be performed with a median latency of well under 20 ms.
Categories: Microsoft
XFabric: A Reconfigurable In-Rack Network for Rack-Scale Computers
Rack-scale computers are dense clusters with hundreds of micro-servers per rack. Designed for data center workloads, they can have significant power, cost and performance benefits over current racks. The rack network can be distributed, with small packet switches embedded on each processor as part of a system-on-chip (SoC) design. Ingress/egress traffic is forwarded by SoCs that have direct uplinks to the data center. Such fabrics are not fully provisioned, and the chosen topology and uplink placement impact performance for different workloads. XFabric is a rack-scale network that reconfigures the topology and uplink placement using a circuit-switched physical layer over which SoCs perform packet switching. To satisfy tight power and space requirements in the rack, XFabric does not use a single large circuit switch, instead relying on a set of independent smaller circuit switches. This introduces partial reconfigurability, as some ports in the rack cannot be connected by a circuit. XFabric optimizes the physical topology and manages uplinks, efficiently coping with partial reconfigurability. It significantly outperforms static topologies and has a performance similar to fully reconfigurable fabrics. We demonstrate the benefits of XFabric using flow-based simulations and a prototype built with electrical crosspoint switch ASICs.
Categories: Microsoft
Video Cube
VideoCube allows one to load an AVI movie file as a volume, and play back the movie sampling space and time in different ways. It also provides a single cutting plane for interactively viewing single spacetime slices of the video.
Categories: Microsoft
StereoMatcher
StereoMatcher is an implementation of some commonly used two-frame stereo matching algorithms. It also contains code to evaluate the quality of a computed depth map relative to a ground truth image.
Categories: Microsoft
Pan - Source
Pan is an experimental embedded language and compiler for image synthesis and manipulation, based on principles from functional programming.
Categories: Microsoft
Pan - Components
Pan is an experimental embedded language and compiler for image synthesis and manipulation, based on principles from functional programming.
Categories: Microsoft
Mppt
Mppt is an add-in for PowerPoint that allows a presenter to multicast PowerPoint slides, including animations and effects, to a group of viewers.
Categories: Microsoft
Mping
Mping is a simple command line application that sends and receives multicast packets.
Categories: Microsoft
JCLUSTER
JCLUSTER is a fast, simple clustering program that produces hierarchical, binary-branching, tree-structured clusters.
Categories: Microsoft
Functional Reactive Animation
Fran is a Haskell library (or "embedded language") for interactive animations with 2D and 3D graphics and sound.
Categories: Microsoft
SizeCap: Coordinating Energy Storage Sizing and Power Capping for Fuel Cell Powered Data Centers
Fuel cells are a promising power source for future data centers, offering high energy efficiency, low greenhouse gas emissions, and high reliability. However, due to mechanical limitations related to fuel delivery, fuel cells are slow to adjust to sudden increases in data center power demands, which can result in temporary power shortfalls. To mitigate the impact of power shortfalls, prior work has proposed either performing power capping by throttling the servers, or leveraging energy storage devices (ESDs) that can temporarily provide enough power to make up for the shortfall while the fuel cells ramp up power generation. Both approaches have disadvantages: power capping conservatively limits server performance and can lead to service level agreement (SLA) violations, while ESD-only solutions must significantly overprovision energy storage capacity to tolerate the shortfalls caused by worst-case (i.e., largest) power surges, which greatly increases the total cost of ownership (TCO). We propose SizeCap, the first ESD sizing framework for fuel cell powered data centers, which coordinates ESD sizing with power capping to allow data centers to employ a cost-effective solution to power shortfalls. SizeCap sizes the ESD just large enough to cover the majority of power surges, but not the worst-case surges that occur infrequently, to greatly reduce TCO. It then uses the smaller capacity ESD in conjunction with power capping to cover the power shortfalls caused by worst-case power surges. As part of our new flexible framework, we propose multiple power capping policies with different degrees of awareness of fuel cell and workload behavior, and evaluate their impact on workload performance and ESD size. Using traces from production data center systems, we demonstrate that SizeCap significantly reduces the ESD size (by 85% for a workload with infrequent yet large power surges, and by 50% for a workload with frequent power surges) without violating any SLAs.
Categories: Microsoft
XForms for Archives: Toward a more thoroughly integrated ...
I am making updates to our systems in preparation for the initial publication of NEH/Mellon EBooks. Part of the project is to thoroughly integrate these EBooks with our collection, archives, IGCH, and related project databases.
Categories: Development, Technology
Opinion analysis: Justices strike a blow against state health-care ...
For him, the issue clearly was one of institutional competency. The Court can only say that state data collection is, or is not, preempted. The Department of Labor, by contrast, can define a standardized format that would obviate ...
Categories: Development
NROSH+ data collection - News stories - GOV.UK
These documents are located here temporarily to provide access when the NROSH+ system closes from 5 March 2016 to 31 March 2016.
Categories: Development
FLO Cycling - Wheel Design Series Step 1 - Data Collection
In order to collect real-world data, we needed to build a device that could record live wind data while riding. Not only did the sensor have to be accurate, it needed to be precise, and it needed to collect measurements at a high ...
Categories: Development
KS1/2 Sample tests – data collection | Ramblings of a Teacher
In an effort to help, I am hoping to collect data from as many schools as possible so that we can draw some comparisons. I'm asking Year 2/Year 6 teachers who have used the sample tests to share the raw score data for their ...
Categories: Development