Crunch is a web proxy, usable with essentially all web browsers, that extracts content from HTML web pages (equivalently, it reduces clutter). Crunch includes a flexible plug-in API through which various heuristics can be integrated as filters that collectively remove non-content from a page, leaving the extracted content.
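The filter idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Crunch's actual plug-in API: the `Filter` type, `remove_tags`, `remove_link_lists`, `extract_content`, and the link-ratio threshold are all hypothetical names and values, and the sample page is well-formed markup so that the standard-library XML parser can read it.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Hypothetical plug-in interface: a filter edits the parsed page in place.
Filter = Callable[[ET.Element], None]


def remove_tags(*tags: str) -> Filter:
    """Build a filter that deletes every element whose tag is in `tags`."""
    def apply(root: ET.Element) -> None:
        for parent in list(root.iter()):
            for child in list(parent):
                if child.tag in tags:
                    parent.remove(child)
    return apply


def remove_link_lists(max_link_ratio: float = 0.5) -> Filter:
    """Build a filter that deletes <ul> blocks whose text is mostly link text.

    The ratio heuristic and its default threshold are assumptions made
    for this sketch.
    """
    def apply(root: ET.Element) -> None:
        for parent in list(root.iter()):
            for child in list(parent):
                if child.tag != "ul":
                    continue
                total = len("".join(child.itertext()).strip())
                linked = sum(
                    len("".join(a.itertext()).strip()) for a in child.iter("a")
                )
                if total and linked / total > max_link_ratio:
                    parent.remove(child)
    return apply


def extract_content(page: str, filters: List[Filter]) -> str:
    """Parse the page, run each filter in order, and return the remaining text."""
    root = ET.fromstring(page)
    for page_filter in filters:
        page_filter(root)
    return " ".join("".join(root.itertext()).split())


# A tiny, well-formed sample page (hypothetical input).
PAGE = (
    "<html><body>"
    "<script>track()</script>"
    "<ul><li><a href='/a'>Home</a></li><li><a href='/b'>Sports</a></li></ul>"
    "<p>Crunch removes clutter.</p>"
    "</body></html>"
)

if __name__ == "__main__":
    print(extract_content(PAGE, [remove_tags("script", "style"), remove_link_lists()]))
```

Each filter here is independent, so heuristics can be added, removed, or reordered without touching the others; a real proxy would also need a parser that tolerates malformed HTML.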

This proxy has evolved from a program whose individual settings had to be tuned by hand by the end user into an extraction system designed to adapt to the user’s workflow and needs. It classifies web pages by genre and uses that classification to extract content in a similar manner from similar sites. This reduces human involvement in applying heuristic settings to websites: Crunch instead tries to automate the job by detecting and using the content genre of a given website.
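As a rough illustration of genre-driven settings, the sketch below picks a settings preset from a page's detected genre. Everything in it is an assumption made for the example: the keyword vocabularies, the genre names, the setting names, and the simple keyword-count classifier stand in for whatever classifier and settings Crunch actually uses.

```python
import re
from collections import Counter
from typing import Dict

# Hypothetical genre vocabularies; a real classifier would be learned from data.
GENRE_KEYWORDS: Dict[str, set] = {
    "news": {"breaking", "reporter", "headline", "editorial"},
    "shopping": {"cart", "price", "checkout", "shipping"},
}

# Hypothetical per-genre heuristic settings, plus a fallback preset.
GENRE_SETTINGS: Dict[str, Dict[str, object]] = {
    "news": {"remove_link_lists": True, "max_link_ratio": 0.3},
    "shopping": {"remove_link_lists": False, "max_link_ratio": 0.8},
    "unknown": {"remove_link_lists": True, "max_link_ratio": 0.5},
}


def classify_genre(text: str) -> str:
    """Return the genre whose keywords occur most often, or 'unknown'."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        genre: sum(words[keyword] for keyword in keywords)
        for genre, keywords in GENRE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def settings_for(text: str) -> Dict[str, object]:
    """Look up the settings preset for the page's detected genre."""
    return GENRE_SETTINGS[classify_genre(text)]


if __name__ == "__main__":
    print(classify_genre("Breaking headline from our reporter"))
    print(settings_for("Add to cart and proceed to checkout"))
```

Because settings are keyed by genre rather than by site, pages from similar sites receive the same treatment without the user tuning each one by hand.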

One of the major goals of Crunch is to make web pages more accessible to people with disabilities. We believed that preprocessing web pages with Crunch would make otherwise inaccessible pages more accessible.

Suhit Gupta, Gail Kaiser, “CRUNCH – Web-based Collaboration for Persons with Disabilities”, W3C Web Accessibility Initiative, Teleconference on Making Collaboration Technologies Accessible for Persons with Disabilities, Apr 2003.

Suhit Gupta, Gail Kaiser, David Neistadt, Peter Grimm, “DOM-based Content Extraction of HTML Documents”, WWW2003

Suhit Gupta, Gail E Kaiser, Peter Grimm, Michael F Chiang, Justin Starren, “Automating Content Extraction of HTML Documents”, World Wide Web Journal, January 2004

Michael F. Chiang, Roy G. Cole, Suhit Gupta, Gail E Kaiser, Justin Starren, “World Wide Web Accessibility by Visually Disabled Patients: Problems and Solutions”, Submitted to the Journal of Ophthalmology, January 2004

Suhit Gupta, Gail E Kaiser, Salvatore Stolfo, “Extracting Context To Improve Accuracy For HTML Content Extraction”, Poster at the World Wide Web Conference 2005

Suhit Gupta, Gail E Kaiser, Salvatore Stolfo, Hila Becker, “Genre Classification of Websites Using Search Engine Snippets for Content Extraction”, Submitted to SIGIR 2005

Suhit Gupta, Gail Kaiser, “Extracting content from accessible webpages”, Proceedings of the 2005 International Cross-Disciplinary Workshop on Web Accessibility (W4A), May 2005