Project Student Advertisements
The Programming Systems Laboratory (PSL) is seeking graduate and advanced undergraduate students for individual and team research and development projects, possibly including some user studies. Preference is given to students interested in participating for multiple consecutive semesters, potentially including summer(s). Prerequisites (except as otherwise specified for particular projects): COMS W3157 or equivalent programming experience in Java or C/C++. Recommended co-requisites: one or more of COMS W4111: Introduction to Databases, COMS W4156: Advanced Software Engineering, and COMS W4444: Programming & Problem Solving are desirable but not required. Non-majors are very welcome, particularly students with a background in the life sciences, social sciences or statistics. The time commitment is approximately 12 hours per week for a 3-point project. However, projects are graded based on results rather than effort, so prospective project students must have strong time management and organizational skills. Most work will be conducted in the Programming Systems Lab, located in 6LE1 CEPSR; some work can be conducted remotely. Initial projects are only for academic credit, but a particularly spectacular project could lead to a GRA or paid part-time position for a later semester (and/or summer).
Topics of current interest include but are not limited to:
- social computing and collaborative filtering for genomics and other scientific research
- tradeoffs between privacy preservation, regulatory localization, green computing and other societal issues
- MMORPG gaming concepts for engaging software development processes and software engineering education
- better approaches to maintaining the reliability of complex software systems
- novel techniques for finding and fixing security vulnerabilities
- building and operating reliable software for the smart grid, green buildings, and other cyber-physical systems
Contact Jonathan Bell (jbell@cs.columbia.edu) to be matched to an ongoing or suggested project, or to help define your own.
Current projects:
World of Warcraft data visualization - As part of an ongoing project, we have built a data set containing over 10 million World of Warcraft characters and their profiles. We are seeking project students to build a website to provide one-click reporting and visualization for this data. Candidates must have experience with MySQL and PHP.
Contact: Jonathan Bell (jbell@cs.columbia.edu)
Gamifying Software Engineering – We are studying methods to apply engaging qualities from games to a software engineering environment. We are creating a system called HALO SE (Highly Addictive, sociaLly Optimized Software Engineering) to implement this approach. This method follows a general move to “gamify” everyday life, or make it more “gameful.” HALO represents software engineering tasks as quests and uses a storyline to bind multiple quests together – users must complete quests in order to advance the plot. Our current implementation of HALO is a plugin for the Eclipse IDE. We have positions available for students at multiple levels:
- We have positions available for introductory computer science students (with general familiarity with Java) who are creative and are able to develop game-like content (stories and to some extent, artwork). These positions will have limited programming required.
- We have positions available for advanced computer science students (with a strong command of Java and extensive familiarity with J2EE) to continue to build our HALO system. Sample projects include:
- HALO currently appears to students as a simple task list. We would like to have this extended to support graphics and animations, through moving avatars (similar to Microsoft’s Clippy). This project will be developed within an Eclipse plugin and use Glassfish and JAX-WS.
- We would like to scale HALO across multiple courses. To do so, we will need to create integrated support systems and documentation, and to bring general polish to the system. We are seeking project students with familiarity with Java UI programming for this purpose.
Contact: Jonathan Bell (jbell@cs.columbia.edu)
Social Software Engineering – We have developed a tool called genSpace that enables collaboration and knowledge sharing via social networking metaphors. This tool is currently integrated with the geWorkbench platform for integrated genomics and computational biology, and is designed with a component-based, plug-in architecture. There are many different projects available, which are described below.
- We are looking for students to investigate new CSCW (computer-supported cooperative work) techniques, and to implement them within genSpace. Sample projects include:
- Users of geWorkbench often obtain their datasets from a repository provided by NCBI. We would like to provide an ability to search for and download NCBI datasets from within genSpace (a web services API is available). genSpace should also allow users to organize their datasets on their computer, storing annotated information with each (e.g., the source of the file, the experiments that it is used for, etc.). This project will be developed with Glassfish and JAX-WS.
- genSpace currently tracks all activity that users perform within geWorkbench. We would like to provide users with an ability to annotate each analysis that they perform with notes. There must also be an interface for sharing these notes with other users within their networks, and reviewing past notes. This project will be developed with Glassfish and JAX-WS.
- genSpace is currently a component within geWorkbench. We would like to abstract it to also function as a component in other applications, like Bioclipse (and other Eclipse-based workbenches).
- genSpace currently provides recommendations and tool information only for users who have an active connection to the genSpace server. We are seeking project students to build a client-side caching system. Candidates must have demonstrated experience with Java.
Contact: Jonathan Bell (jbell@cs.columbia.edu)
Recommender Systems and Privacy – As recommender systems such as those used by Amazon and Netflix become popular, the privacy of recommendations is becoming a major concern. For this project, we are looking for students to study the software architectures of various recommender systems, with a focus on privacy and privacy laws, and compare them to the state of the art in academia. Applicants must be proficient in software engineering, should have received a grade of A- or higher in COMS 4156 Advanced Software Engineering (or similar), and should have a background in software design and privacy. Contact: Swapneel Sheth, swapneel@cs.columbia.edu.
Privacy User Study – Related to the above, we’re also looking for project students to do a user study on privacy concerns relevant to software engineers. Applicants should have a background in user studies, possibly from a previous class such as COMS 4170 User Interface Design. Contact: Swapneel Sheth, swapneel@cs.columbia.edu.
Software Reliability of Deployed Systems – We are seeking to develop new approaches to detecting defects in fielded systems that are not possible to fully test and debug in the lab – which includes virtually every complex software system. One promising approach is for an application to automatically test itself while it is running in the deployment environment. This kind of testing can be done at the system level, e.g., to detect configuration errors peculiar to the installation, or at the internal unit (function or class) level, which can detect a much wider range of problems by running crafted test cases in the varied application states reached during customer operation. We call the latter in vivo testing (‘in vivo’ refers to inside the living organism – the software system – in the field as opposed to ‘in vitro’ in the glass, i.e., the development lab). We have previously developed a proof-of-concept implementation of an in vivo framework that required source code modification to insert test case instrumentation. We now plan to implement a more robust, higher performance framework using dynamic binary instrumentation, and apply the approach more generally, including to security concerns (see below) and concurrent systems. We also need to develop a more concrete approach to devising the in vivo test cases. Applicants should be proficient in software engineering and systems programming. Contact: Jon Bell, jbell@cs.columbia.edu.
Testing Nontestable Programs - We are investigating how to detect bugs in so-called nontestable programs, where there is no test oracle to determine whether or not a given output is correct for an arbitrary input. This problem is common in machine learning, data mining, information retrieval, simulation, optimization and scientific computing applications. Our current approach is based on ‘metamorphic testing’, a general technique for creating follow-up test cases based on existing ones, particularly those that have not revealed any failure, in order to try to uncover flaws. Instead of being an approach for test case selection, it is a methodology of reusing input test data to create additional test cases whose outputs can be predicted. In metamorphic testing, if input x produces an output f(x), the function’s metamorphic properties can then be used to guide the creation of a transformation function t, which can then be applied to the input to produce t(x); this transformation then allows us to predict the output f(t(x)), based on the (already known) value of f(x). If the output is not as expected, then a defect must exist. Of course, this can only show the existence of defects and cannot demonstrate their absence, since the correct output cannot be known in advance (and even if the outputs are as expected, both could be incorrect), but metamorphic testing provides a powerful technique to reveal defects in otherwise nontestable programs by use of a built-in pseudo-oracle. We have applied this approach to a wide range of nontestable programs and indeed found many conventional bugs, such as off-by-one errors. We now plan to extend this technique to address application and system state, both in memory and in the file system, in combination with the in vivo testing described above. Applicants should be proficient in software engineering and systems programming. Contact: Jon Bell, jbell@cs.columbia.edu.
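The f(x), t(x), f(t(x)) relationship above can be made concrete with a small sketch. It is written in Python and uses the arithmetic mean as a stand-in for a program without an oracle; the helper check_property, the deliberately defective buggy_mean, and the two properties are illustrative assumptions, not the lab's tooling.

```python
def check_property(f, x, transform, predict, tol=1e-9):
    """Check one metamorphic property: f(t(x)) should equal predict(f(x))."""
    expected = predict(f(x))      # predicted from the already-known f(x)
    actual = f(transform(x))      # follow-up test case run on t(x)
    return abs(actual - expected) <= tol


def mean(values):
    return sum(values) / len(values)


def buggy_mean(values):
    # Deliberate off-by-one: the last element is never summed.
    return sum(values[:-1]) / len(values)


# Two metamorphic properties of the mean: reordering the input leaves it
# unchanged, and doubling every element doubles it.
def reverse(values):
    return list(reversed(values))


def double(values):
    return [2 * v for v in values]


def unchanged(y):
    return y


def doubled(y):
    return 2 * y


data = [1.0, 2.0, 3.0, 4.0]
results = {
    "mean/reverse": check_property(mean, data, reverse, unchanged),
    "buggy/reverse": check_property(buggy_mean, data, reverse, unchanged),
    "buggy/double": check_property(buggy_mean, data, double, doubled),
}
```

Here the reordering property exposes the off-by-one defect without anyone needing to know the correct mean of data, while the scaling property is satisfied by the defective function as well. That matches the caveat above: a violated property shows that a defect exists, but a satisfied one does not show that the output is correct.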
Treating Security Vulnerabilities as a Class of Nontestable Programs - We seek to explore whether and how the techniques developed for nontestable programs (see above) might apply to finding security vulnerabilities. Here the target programs may indeed be testable in the conventional sense, for finding more mundane errors regarding their functionality, but lack an appropriate specification to determine whether an arbitrary input (or series of inputs) violates security requirements. Applicants should be proficient in software engineering and systems programming. Contact: Jon Bell, jbell@cs.columbia.edu.
