
Proceedings of the 2012 International Workshop on User Evaluation for Software Engineering Researchers

Fullname: Proceedings of the First International Workshop on User Evaluation for Software Engineering Researchers
Editors: Andrew Begel; Caitlin Sadowski
Location: Zurich, Switzerland
Standard No: ISBN 978-1-4673-1859-4; hcibib: USER12
Combining experiments and grounded theory to evaluate a research prototype: lessons from the Umple model-oriented programming technology (pp. 1-4)
  Omar Badreddin; Timothy C. Lethbridge
Research prototypes typically lack the level of quality and readiness required for industrial deployment. Hence, conducting realistic experimentation with professional users that reflects real-life tasks is challenging. Experimentation with toy examples and tasks suffers from significant threats to external validity. Consequently, results from such experiments fail to gain confidence or mitigate risks, a prerequisite for industrial adoption. This paper presents two empirical studies conducted to evaluate a model-oriented programming language called Umple: a grounded theory study and a controlled experiment of comprehension. Evaluation of model-oriented programming is particularly challenging. First, there is a need to provide highly sophisticated development environments for realistic evaluation. Second, the scarcity of experienced users poses additional challenges. In this paper we discuss our experiences, lessons learned, and future considerations in the evaluation of a research prototype tool.
User evaluation of a domain-oriented end-user design environment for building 3D virtual chemistry experiments (pp. 5-8)
  Ying Zhong; Chang Liu
Three-dimensional virtual world technologies have the potential to be applied in the domain of education. However, end users such as teachers find it difficult to apply virtual world technologies because of technical issues. This paper discusses the technical difficulties end users face when developing 3D virtual worlds. We investigate the problem from the perspective of end-user programming and propose a methodology for solving it. To evaluate this methodology, a domain-oriented end-user design environment implementing the methodology has been developed and applied in the domain of educational virtual chemistry laboratories. Two user studies are designed to assess the methodology from two different perspectives. The first user study evaluates the usability of the methodology. The second user study assesses the usability of virtual experiments generated using the methodology.
An experiment in developing small mobile phone applications comparing on-phone to off-phone development (pp. 9-12)
  Tuan A. Nguyen; Sarker T. A. Rumee; Christoph Csallner; Nikolai Tillmann
TouchDevelop represents a radically new mobile application development model, as it enables mobile application development on a mobile device. That is, with TouchDevelop, the task of programming, say, a Windows Phone is shifted from the desktop computer to the mobile phone itself. We describe a first experiment on independent, nonexpert subjects to compare programmer productivity using TouchDevelop vs. using a more traditional approach to mobile application development.
How helpful are automated debugging tools? (pp. 13-16)
  Jeremias Rößler
The field of automated debugging, which is concerned with the automation of identifying and correcting a failure's root cause, has made tremendous advancements in the past. However, some of the reported progress may be due to unrealistic assumptions that underlie the evaluation of automated debugging tools. These unrealistic assumptions concern the work process of developers and their ability to detect faulty code without explanatory context, as well as the size and arrangement of fixes. Instead of trying to locate the fault, we propose to help the developer understand it, thus enabling her to decide which fix she deems most appropriate. This would entail the need to employ a completely different evaluation scheme that is based on feedback from actual users of the tools in realistic usage scenarios. With this paper we propose the details for a first such user study.
Evaluating live sequence charts as a programming technique for non-programmers (pp. 17-20)
  Michal Gordon; David Harel
Behavioral programming is a recent programming paradigm that uses independent scenarios to program the behavior of reactive systems. Live sequence charts (LSC) is a visual formalism that implements the approach of behavioral programming. The approach attempts to liberate programming by allowing the user to program the behavior of reactive systems by scenarios. We would like to evaluate the approach and determine which interface for creating the visual artifact of LSCs is the most natural. Several such interfaces exist, among them a novel interactive natural language (NL) interface. Initial testing indicates that the LSCs' NL interface may be preferred by programmers to procedural programming and that in certain tasks LSCs may be a viable and more natural alternative to conventional programming. Many challenges exist in trying to prove the intuitive and natural nature of a new programming paradigm, which differs from others not only in syntax but in many other respects. We describe these challenges in this proposal.
Do we stop learning from our mistakes when using automatic code analysis tools?: an experiment proposal (pp. 21-24)
  Jan-Peter Ostberg; Stefan Wagner
When we learn how to program, we often do so by trial and error. We struggle with the syntax and with our own understanding of what the idea of the program should look like in the specific programming language. Today a huge number of tools are available that automatically check your code and recommend alterations for the sake of maintainability or correctness. The question that has not yet been asked by science is: are we still learning something from these mistakes, besides the knowledge that such mistakes will be corrected for us? In the following, we propose an experimental setup that aims to answer this question.
Towards an evaluation of bidirectional model-driven spreadsheets (pp. 25-28)
  Jácome Cunha; João Paulo Fernandes; Jorge Mendes; João Saraiva
Spreadsheets are widely recognized as popular programming systems with a huge number of spreadsheets being created every day. Also, spreadsheets are often used in the decision processes of profit-oriented companies. While this illustrates their practical importance, studies have shown that up to 90% of real-world spreadsheets contain errors.
   In order to improve the productivity of spreadsheet end-users, the software engineering community has proposed to employ model-driven approaches to spreadsheet development.
   In this paper we describe the evaluation of a bidirectional model-driven spreadsheet environment. In this environment, models and data instances are kept in conformity, even after an update to any of these artifacts. We describe the issues of an empirical study we plan to conduct, based on our previous experience with end-user studies. Our goal is to assess whether this model-driven spreadsheet development framework does in fact contribute to improving the productivity of spreadsheet users.
Revisiting bug triage and resolution practices (pp. 29-30)
  Olga Baysal; Reid Holmes; Michael W. Godfrey
Bug triaging is an error-prone, tedious and time-consuming task. However, little qualitative research has been done on the actual use of bug tracking systems, bug triage, and resolution processes. We are planning to conduct a qualitative study to understand the dynamics of the bug triage and fixing process, as well as bug reassignments and reopens. We will study interviews conducted with Mozilla Core and Firefox developers to get insights into the primary obstacles developers face during the bug fixing process. Is the triage process flawed? Does bug review slow things down? Does approval take too long? We will also categorize the main reasons for bug reassignments and reopens. We will then combine the results with a quantitative study of Firefox bug reports, focusing on factors related to bug report edits and the number of people involved in handling the bug.
Is essence a measure of maintainability? (pp. 31-34)
  Dmitrijs Zaparanuks; Matthias Hauswirth
We recently published a paper at ECOOP presenting a new software design metric, essence, that quantifies the amount of indirection in a software design. The reviews were overwhelmingly positive and included statements such as "The evaluation of the metric is fantastic." However, we also received feedback from senior researchers who do not believe that we have meaningfully evaluated our metric. This paper represents our effort towards a meaningful evaluation of essence. Given our lack of experience in human-subject studies, we hope to receive valuable feedback on our proposed study design.
Evaluating awareness information in distributed collaborative editing by software-engineers (pp. 35-38)
  Julia Schenk
In co-located collaborative software development activities like pair programming, side-by-side programming, code reviews or code walkthroughs, the individuals automatically gain a fine-grained mutual understanding of where in the shared workspace the other participants are, what they are doing and what their levels of interest are. These points of so-called awareness information are critical for efficient and smooth collaboration but cannot be obtained via the natural mechanisms in virtual teams. Application sharing and groupware for collaborative editing are widely used for collaborative tasks in distributed software development, but in terms of awareness and flexibility they fall far short of the co-located setting. To better support virtual team collaboration by improving tools for distributed software development, it is necessary to evaluate awareness and its impact on certain collaborative situations. Awareness itself is an invisible phenomenon and, due to its intangible nature, cannot be easily observed or measured. Thus we recorded virtual teams using either Saros, a groupware for distributed collaborative party programming, or VNC, and now analyse these videos using the grounded theory methodology. This approach to evaluating awareness leads to various problems concerning the recording setup and the time required for analysis.
An experimental study of a design-driven, tool-based development approach (pp. 39-42)
  Quentin Enard; Christine Louberry; Charles Consel; Xavier Blanc
Design-driven software development approaches have long been praised for their many benefits on the development process and the resulting software system. This paper discusses a step towards assessing these benefits by proposing an experimental study that involves a design-driven, tool-based development approach. This study raises various questions including whether a design-driven approach improves software quality and whether the tool-based approach improves productivity. In examining these questions, we explore specific issues such as the approaches that should be involved in the comparison, the metrics that should be used, and the experimental framework that is required.
Industrially validating longitudinal static and dynamic analyses (pp. 43-44)
  Reid Holmes; David Notkin; Mark Hancock
Software systems gradually evolve over time, becoming increasingly difficult to understand as new features are added and old defects are repaired. Some modifications are harder to understand than others; e.g., an explicit method call is usually easy to trace in the source code, while a reflective method call may perplex both developers and analysis tools. Our tool, the Inconsistency Inspector, collects static and dynamic call graphs of systems and composes them to help developers more systematically address the static and dynamic implications of a change to a system.
   We have quantitatively validated the Inconsistency Inspector and have convinced ourselves that it can expose both interesting and surprising facets of a system's evolution. An initial case study with an industrial organization showed promise, leading to the Inconsistency Inspector being installed at the organization for the past several months in preparation for a more in-depth analysis.
   In July 2012 we will have the opportunity to examine 8 months of industrial data, enabling us to perform an in-depth longitudinal evaluation of how their system has evolved and whether the Inconsistency Inspector can expose surprising and helpful facts for the industrial team. At the USER workshop, we hope to gather opinions about evaluation options for validating the industrial utility of our approach and the complex longitudinal data we have collected.
User evaluation of a domain specific program comprehension tool (pp. 45-48)
  Leon Moonen
The user evaluation in this paper concerns a domain-specific tool to support the comprehension of large safety-critical component-based software systems for the maritime sector. We discuss the context and motivation of our research, and present the user-specific details of our tool, called FlowTracker. We include a walk-through of the system and present the profiles of our prospective users. Next, we discuss the design of an exploratory qualitative study that we have conducted to evaluate the usability and effectiveness of our tool. We conclude with a summary of lessons learned and challenges that we see for user evaluation of such domain-specific program comprehension tools.
Stakeholder involvement into quality definition and evaluation for service-oriented systems (pp. 49-52)
  Vladimir A. Shekhovtsov; Heinrich C. Mayr; Christian Kop
The paper addresses the matter of quality in the software process for service-oriented systems. We argue for the need to involve the users/stakeholders in the specification and evaluation of quality (requirements), and we develop means for supporting such involvement. For this purpose we introduce classifications of user and quality types as a basis for the characterization of evaluation cases.