Active Projects

Exploring Applications of Symbolic Execution to Testing, Debugging and Patch Validation

As a first step, we are investigating an approach to runtime information flow analysis for managed languages that tracks metadata about data values through the execution of a program. We first considered metadata that propagates labels representing the originating source of each data value, e.g., sensitive data from the address book or GPS of a mobile […]


Reducing Testing Overhead

Unit test virtualization: significantly reducing the time to set up unit tests


Graphical Analysis of Program Behaviors to Discover Opportunities for New APIs

A joint project encompassing computer architecture, machine learning and software engineering


Finding Bugs in Machine Learning, Data Mining and Big Data Applications

Automating metamorphic testing techniques at runtime


Post-Deployment Checking for Bugs, Security Vulnerabilities and Privacy Breaches

Executing tests in the deployment environment, using the state of the running application


Crunch is a web proxy, usable with essentially all web browsers, that performs content extraction (or clutter reduction) on HTML web pages. Crunch includes a flexible plug-in API so that various heuristics can be integrated as filters that collectively remove non-content and extract the content.
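To make the idea of heuristics acting collectively as filters concrete, here is a minimal sketch in Python. It is not Crunch's actual plug-in API: the `Filter` type, the two example heuristics (`remove_tags`, `remove_link_lists`), the 0.5 link-ratio threshold, and the assumption that the page is well-formed enough for an XML parser are all hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable

# Hypothetical plug-in type: a filter receives the document root and edits it in place.
Filter = Callable[[ET.Element], None]


def drop(parent: ET.Element, child: ET.Element) -> None:
    """Remove `child` from `parent`, keeping the text that follows the child."""
    tail = child.tail or ""
    index = list(parent).index(child)
    if index > 0:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    else:
        parent.text = (parent.text or "") + tail
    parent.remove(child)


def text_len(element: ET.Element) -> int:
    """Length of all text contained in `element`, ignoring outer whitespace."""
    return len("".join(element.itertext()).strip())


def remove_tags(*tags: str) -> Filter:
    """Build a filter that deletes every element whose tag is in `tags`."""
    def apply(root: ET.Element) -> None:
        # Snapshot the tree so that removals do not disturb the iteration.
        for parent in list(root.iter()):
            for child in list(parent):
                if child.tag in tags:
                    drop(parent, child)
    return apply


def remove_link_lists(threshold: float = 0.5) -> Filter:
    """Build a filter that deletes list or div blocks made up mostly of link text."""
    def apply(root: ET.Element) -> None:
        for parent in list(root.iter()):
            for child in list(parent):
                if child.tag not in ("ul", "ol", "div"):
                    continue
                total = text_len(child)
                linked = sum(text_len(a) for a in child.iter("a"))
                if total and linked / total > threshold:
                    drop(parent, child)
    return apply


def extract_content(markup: str, filters: Iterable[Filter]) -> str:
    """Run every filter in order, then return the remaining text."""
    root = ET.fromstring(markup)  # assumption: well-formed markup
    for page_filter in filters:
        page_filter(root)
    return " ".join("".join(root.itertext()).split())


PAGE = """<html><body>
<script>track()</script>
<ul><li><a href="/a">Home</a></li><li><a href="/b">News</a></li></ul>
<p>Crunch keeps the <b>article</b> text.</p>
</body></html>"""

print(extract_content(PAGE, [remove_tags("script", "style"), remove_link_lists()]))
```

Each heuristic is an independent callable, so adding or reordering filters only changes the list passed to `extract_content`. A real proxy would also need a tolerant HTML parser, which this sketch leaves out.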

This proxy has evolved from a program in which the end user had to tweak individual settings by hand into an extraction system designed to adapt to the user’s workflow and needs. It classifies web pages by genre and uses this information to extract content in a similar manner from similar sites. This reduces human involvement in applying heuristic settings for websites: Crunch instead tries to automate the job by detecting and using the content genre of a given website.
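The genre-driven selection of settings described above can be sketched as a lookup from a detected genre to a settings profile. Everything in this Python sketch is an assumption made for illustration: the genre names, the keyword-overlap classifier, the `FilterSettings` fields and their values are hypothetical and do not reflect how Crunch classifies pages or which settings it exposes.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FilterSettings:
    """Hypothetical per-genre heuristic settings."""
    remove_images: bool
    link_ratio_threshold: float


# Illustrative genres and values only.
GENRE_SETTINGS = {
    "news": FilterSettings(remove_images=False, link_ratio_threshold=0.5),
    "shopping": FilterSettings(remove_images=False, link_ratio_threshold=0.8),
}
DEFAULT_SETTINGS = FilterSettings(remove_images=True, link_ratio_threshold=0.5)

# Stand-in classifier data: a few indicative words per genre.
GENRE_KEYWORDS = {
    "news": {"breaking", "reporter", "headline"},
    "shopping": {"cart", "price", "checkout"},
}


def classify_genre(text: str) -> Optional[str]:
    """Return the genre whose keywords overlap the text most, or None if none match."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {genre: len(words & keywords) for genre, keywords in GENRE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def settings_for(text: str) -> FilterSettings:
    """Pick the settings profile for a page, falling back to the default profile."""
    return GENRE_SETTINGS.get(classify_genre(text), DEFAULT_SETTINGS)


print(settings_for("Add to cart and checkout at this price"))
```

The point of the design is that the user no longer edits individual settings: once a page is assigned to a genre, the profile tuned for that genre is applied to every similar site.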

One of the major goals of Crunch is to make web pages more accessible to people with disabilities. We believed that preprocessing web pages with Crunch would make inaccessible web pages more accessible.

Suhit Gupta, Gail Kaiser, “CRUNCH – Web-based Collaboration for Persons with Disabilities”, W3C Web Accessibility Initiative, Teleconference on Making Collaboration Technologies Accessible for Persons with Disabilities, Apr 2003.

Suhit Gupta, Gail Kaiser, David Neistadt, Peter Grimm, “DOM-based Content Extraction of HTML Documents”, WWW2003

Suhit Gupta, Gail E. Kaiser, Peter Grimm, Michael F. Chiang, Justin Starren, “Automating Content Extraction of HTML Documents”, World Wide Web Journal, January 2004

Michael F. Chiang, Roy G. Cole, Suhit Gupta, Gail E. Kaiser, Justin Starren, “World Wide Web Accessibility by Visually Disabled Patients: Problems and Solutions”, Submitted to the Journal of Ophthalmology, January 2004

Suhit Gupta, Gail E. Kaiser, Salvatore Stolfo, “Extracting Context To Improve Accuracy For HTML Content Extraction”, Poster at the World Wide Web Conference 2005

Suhit Gupta, Gail E. Kaiser, Salvatore Stolfo, Hila Becker, “Genre Classification of Websites Using Search Engine Snippets for Content Extraction”, Submitted to SIGIR 2005

Suhit Gupta, Gail Kaiser, “Extracting content from accessible webpages”, Proceedings of the 2005 International Cross-Disciplinary Workshop on Web Accessibility (W4A), May 2005