CRUNCH

 


Crunch is a web proxy, usable with essentially all web browsers, that performs content extraction (or clutter reduction) on HTML web pages. Its flexible plug-in API lets various heuristics be integrated as filters that collectively remove non-content and extract the content.
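The idea of heuristics acting collectively as filters can be sketched as follows. This is a minimal illustration, not Crunch's actual plug-in API: the names `Filter`, `remove_tags`, `remove_link_heavy` and `run_filters` are hypothetical, the link-density heuristic is only one plausible example of a clutter filter, and the input is assumed to be well-formed XHTML so that the Python standard library can parse it.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Hypothetical filter type: a callable that edits the parsed page in place.
Filter = Callable[[ET.Element], None]

# Assumption: only these container tags are candidates for link-density removal.
BLOCK_TAGS = {"div", "ul", "ol", "table", "td", "nav"}


def _drop(parent: ET.Element, child: ET.Element) -> None:
    """Remove child from parent, keeping any text that followed the child."""
    tail = child.tail or ""
    idx = list(parent).index(child)
    if idx == 0:
        parent.text = (parent.text or "") + tail
    else:
        prev = parent[idx - 1]
        prev.tail = (prev.tail or "") + tail
    parent.remove(child)


def remove_tags(*tags: str) -> Filter:
    """Build a filter that deletes every element whose tag is in `tags`."""
    doomed = set(tags)

    def apply(root: ET.Element) -> None:
        for parent in list(root.iter()):
            for child in list(parent):
                if child.tag in doomed:
                    _drop(parent, child)

    return apply


def remove_link_heavy(threshold: float) -> Filter:
    """Build a filter that deletes blocks whose text is mostly link text."""

    def apply(root: ET.Element) -> None:
        for parent in list(root.iter()):
            for child in list(parent):
                if child.tag not in BLOCK_TAGS:
                    continue
                total = len("".join(child.itertext()).strip())
                linked = sum(len("".join(a.itertext())) for a in child.iter("a"))
                if total > 0 and linked / total > threshold:
                    _drop(parent, child)

    return apply


def run_filters(xhtml: str, filters: List[Filter]) -> str:
    """Parse the page, apply each filter in order, return the remaining text."""
    root = ET.fromstring(xhtml)
    for apply_filter in filters:
        apply_filter(root)
    return " ".join("".join(root.itertext()).split())


page = (
    "<html><body>"
    "<div id='nav'><ul><li><a href='/a'>Home</a></li>"
    "<li><a href='/b'>News</a></li></ul></div>"
    "<script>track()</script>"
    "<p>Crunch removes clutter.</p>"
    "</body></html>"
)
filters = [remove_tags("script", "style"), remove_link_heavy(0.5)]
print(run_filters(page, filters))  # prints "Crunch removes clutter."
```

Each filter only sees the tree left behind by the filters before it, so the order of the list matters. Real-world HTML is rarely well-formed XML, so a practical proxy would need a tolerant HTML parser in place of `ET.fromstring`.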

The proxy has evolved from a program whose individual settings the end user had to tweak by hand into an extraction system designed to adapt to the user’s workflow and needs. It classifies web pages by genre and uses that classification to extract content in a similar manner from similar sites, reducing the manual effort of choosing heuristic settings for each website.
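A minimal sketch of genre-driven settings selection is shown below. Everything in it is an assumption made for illustration: the genre names, the keyword lists, the setting keys and the functions `classify_genre` and `settings_for` are hypothetical, and simple keyword counting stands in for whatever classifier Crunch actually uses.

```python
import re
from collections import Counter
from typing import Dict, Optional

# Hypothetical genres and indicative keywords (illustration only).
GENRE_KEYWORDS = {
    "news": {"breaking", "reporter", "headline", "editorial"},
    "shopping": {"cart", "price", "checkout", "shipping"},
}

# Hypothetical per-genre filter settings; the keys are invented for this sketch.
GENRE_SETTINGS: Dict[str, Dict[str, object]] = {
    "news": {"remove_ads": True, "link_threshold": 0.5},
    "shopping": {"remove_ads": True, "link_threshold": 0.8},
}

# Conservative fallback when no genre matches.
DEFAULT_SETTINGS: Dict[str, object] = {"remove_ads": False, "link_threshold": 0.9}


def classify_genre(text: str) -> Optional[str]:
    """Return the genre whose keywords occur most often, or None if none occur."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        genre: sum(words[w] for w in keywords)
        for genre, keywords in GENRE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def settings_for(text: str) -> Dict[str, object]:
    """Pick filter settings from the detected genre instead of asking the user."""
    return GENRE_SETTINGS.get(classify_genre(text), DEFAULT_SETTINGS)


print(classify_genre("Breaking headline: reporter files editorial"))
print(settings_for("Add to cart and pay at checkout"))
```

The point of the design is that the settings lookup is keyed by genre rather than by site, so a page from a previously unseen site inherits the settings of similar sites.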

One of the major goals of Crunch is to make web pages more accessible to people with disabilities; we believe that preprocessing web pages with Crunch would make otherwise inaccessible pages more accessible.


Publications

Suhit Gupta, Gail Kaiser, “CRUNCH – Web-based Collaboration for Persons with Disabilities”, W3C Web Accessibility Initiative, Teleconference on Making Collaboration Technologies Accessible for Persons with Disabilities, Apr 2003.

Suhit Gupta, Gail Kaiser, David Neistadt, Peter Grimm, “DOM-based Content Extraction of HTML Documents”, WWW2003

Suhit Gupta, Gail E Kaiser, Peter Grimm, Michael F Chiang, Justin Starren, “Automating Content Extraction of HTML Documents”, World Wide Web Journal, January 2004

Michael F. Chiang, Roy G. Cole, Suhit Gupta, Gail E Kaiser, Justin Starren, “World Wide Web Accessibility by Visually Disabled Patients: Problems and Solutions”, Submitted to the Journal of Ophthalmology, January 2004

Suhit Gupta, Gail E Kaiser, Salvatore Stolfo, “Extracting Context To Improve Accuracy For HTML Content Extraction”, Poster at the World Wide Web Conference 2005

Suhit Gupta, Gail E Kaiser, Salvatore Stolfo, Hila Becker, “Genre Classification of Websites Using Search Engine Snippets for Content Extraction”, Submitted to SIGIR 2005

Suhit Gupta, Gail Kaiser, “Extracting content from accessible webpages”, Proceedings of the 2005 International Cross-Disciplinary Workshop on Web Accessibility (W4A), May 2005