genSpace
About genSpace
Many researchers in the field of computational biology use the geWorkbench tool, a Java-based open-source platform for integrated genomics. Using a component architecture, it allows individually developed plug-ins to be configured into complex bioinformatic applications. At present there are more than 70 available plug-ins supporting the visualization and analysis of gene expression and sequence data. However, without genSpace it is purely a stand-alone application with very limited collaboration capabilities.
This project, started in Fall 2007, is designed to develop proof-of-concept collaboration facilities suitable for inclusion in geWorkbench and, at least in principle, other plug-in-based tool integration environments. The project aims to include a few of the well-known collaboration mechanisms, possibly including but not necessarily limited to instant messaging (as in AIM), social networking (as in Facebook), knowledge repositories (as in Usenet newsgroups) and shared folders (as in WebDAV). However, only those existing collaboration mechanisms that will in some manner facilitate new research towards novel mechanisms will be chosen.
Some basic functionality has already been implemented: logging which analysis tools users run, the ability to log in to genSpace, rudimentary integration with the jClaim chat client, and a server-side instant messaging bot that allows for simple user interaction, such as the creation of social networks.
In Spring 2008, we integrated all of this functionality into an end-to-end system which allows all geWorkbench users to log into genSpace, communicate with other users, find experts on the tools they are using, and find other researchers who are similar to them. We also built visualization tools that allow users to see their social networks, and to see “collaboratively created workflows”, which are automatically generated based on other users’ activities.
A paper about our work was published at the Social Software Engineering and Applications (SoSEA) workshop colocated with ASE in 2008. In 2010, our work was published at the Social Software Engineering (SSE) workshop colocated with SE and at the Recommender Systems for Software Engineering (RSSE) workshop colocated with ICSE. In 2011, our work was published at the International Conference on Data Engineering and Internet Technology (DEIT). The links for our papers can be found below.
In Summer 2008, we began the development of a recommendation system in which genSpace can recommend actions based on what has been observed. We also implemented an admin user interface and performed further enhancement of some of the server-side features.
In Fall 2008, we implemented many new features which make use of our recommendation system. Users can view popular workflows which include or start with a particular tool; for example, a user can choose to see all the workflows which involve ARACNE. Users can also view overall geWorkbench statistics such as the top 3 most popular tools of all time and the most popular workflows. We also added a Real Time Workflow Suggestion feature where genSpace provides real-time suggestions depending on the user's activities. For example, if a user has run analysis A followed by analysis B, genSpace can suggest that the next best tool to use is C and that the historical superflows which start with A and B are A, B, C, D and A, B, C, F. In addition, users can now rate and comment on different tools and workflows. Users can rate the tools using the genSpace plugin in geWorkbench. The plugin also provides a link to a web application where users can read and post comments. The web application is written in PHP and Symfony.
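The A, B example above can be sketched in a few lines of Python. This is only an illustrative sketch, not genSpace's actual implementation: the function name `suggest_next_tool`, the in-memory `history` list, and the rule of ranking by simple frequency are all assumptions made for the example. It treats a "superflow" as any logged workflow that starts with the tools the user has run so far.

```python
from collections import Counter


def suggest_next_tool(history, recent):
    """Suggest the next tool and list superflows for the tools run so far.

    history: logged workflows, each a sequence of tool names (hypothetical data).
    recent:  the tools the current user has just run, in order.
    Returns (best_next_tool, superflows), or (None, []) if nothing matches.
    """
    recent = tuple(recent)
    n = len(recent)
    next_counts = Counter()   # how often each tool followed the prefix
    superflows = Counter()    # how often each longer workflow was observed

    for flow in history:
        flow = tuple(flow)
        # A superflow must be strictly longer than the prefix and start with it.
        if len(flow) > n and flow[:n] == recent:
            next_counts[flow[n]] += 1
            superflows[flow] += 1

    if not next_counts:
        return None, []

    best = next_counts.most_common(1)[0][0]
    flows = [list(f) for f, _ in superflows.most_common()]
    return best, flows


# Hypothetical usage log mirroring the example in the text.
history = [
    ["A", "B", "C", "D"],
    ["A", "B", "C", "F"],
    ["A", "B", "C", "D"],
    ["A", "E"],
    ["B", "C"],
]
best, flows = suggest_next_tool(history, ["A", "B"])
print(best)   # C
print(flows)  # [['A', 'B', 'C', 'D'], ['A', 'B', 'C', 'F']]
```

Counting only prefix matches keeps the sketch close to the wording "superflows which start with A and B"; a real system would also need to decide how to weight recency, user similarity, and tool ratings.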
In Summer 2009, we evaluated our cache implementation for serving user requests and compared it against answering the same requests with pure SQL.
In Fall 2009, we implemented an XMPP chat server for users logged into genSpace to chat with each other. We also created a proof-of-concept implementation for users to share individual workflows and geWorkbench windows (screen sharing similar to web conference systems) with other users.
Over 2010, we implemented a user interface that allows users to perform actions such as joining social networks, seeing who is logged in, and getting advice and recommendations from the system. This is integrated with geWorkbench and serves as an alternative to using an IM client for working with social networks.
During 2011, we upgraded the genSpace recommendation engine to use the Apache Mahout library. We also implemented a new SOAP-based architecture for genSpace built on JAX-WS, improving the compatibility of the genSpace protocol. Additionally, we created a web-based usage reporting tool that gives geWorkbench developers insight into how researchers are using geWorkbench and helps them focus development efforts. Finally, we created systems for sharing workflows between users and for suggesting data sets.
In 2012, we created a research notebook that is integrated with the analysis-logging functionality of genSpace, allowing researchers to annotate their log files with notes regarding each analysis.
For 2013, we are continuing to investigate the possibility of creating recommendations for datasets, as well as the potential privacy concerns that this would raise. We are also looking to understand the software engineering implications of retrofitting social networking capabilities onto standalone applications, such as caching, fault tolerance, and privacy.
This research is in collaboration with the Center for the Multiscale Analysis of Genomic and Cellular Networks (MAGNet) on the Columbia University Health Sciences campus, which is funded by NIH and NCI.
Team Members
Faculty
Prof. Gail Kaiser, kaiser [at] cs.columbia.edu
PhD Students
Fang-Hsiang (Mike) Su, mikefhsu [at] cs.columbia.edu
Masters Students
Nikhil Sarda, ns2847 [at] columbia.edu
Former PhD Students
Jon Bell, jbell [at] cs.columbia.edu
Swapneel Sheth, swapneel [at] cs.columbia.edu
Chris Murphy, cmurphy [at] cs.columbia.edu
Former project students
Diana Chang
Anureet Dhillon
Gowri Kanugovi
Mayur Lodha
Koichiro Matsunaga
Lakshmi Nadig
Joshua Nankin
Cheng Niu
Gaurav Pandey
Hyuksoo Seo
Yuan Wang
Eric Schmidt
Nan Luo
Danielle Cauthen
Flavio Antonelli
Ning Yu
Jason Halpern
Evgeny Fedetov
Aditya Bir
Alison Yang
Links
Papers, Presentations, etc.
DEIT 2011 paper and slides – “Towards using Cached Data Mining for Large Scale Recommender Systems”
RSSE 2010 paper and poster – “The weHelp Reference Architecture for Community-Driven Recommender Systems”
C2B2 retreat posters (1 and 2), April 2010
SSE 2010 paper and workshop presentation – “weHelp: A Reference Architecture for Social Recommender Systems”
C2B2 retreat presentation and poster, March 2009
SoSEA 2008 paper and workshop presentation – “genSpace: Exploring Social Networking Metaphors for Knowledge Sharing and Scientific Collaborative Work”
C2B2 retreat presentation and poster, April 2008
Documentation
genSpace wiki
geWorkbench wiki
C2B2 project management wiki
Source Code
geWorkbench repository (login required)
Contact: Fang-Hsiang (Mike) Su
Available student project positions:
We are not currently actively seeking new students for this project.
