Wednesday, September 7, 2011

Paper Reading #4: Gestalt

Gestalt: Integrated Support for Implementation and Analysis in Machine Learning


Kayur Patel, Naomi Bancroft, Steven M. Drucker, James Fogarty, Andrew J. Ko, James Landay


Kayur Patel is a PhD student at the University of Washington and focuses on research related to machine learning.
Naomi Bancroft is a recent graduate of the University of Washington and currently works for Google.
Steven M. Drucker is a Principal Researcher at Microsoft Research and also acts as an affiliate professor at the University of Washington.
James A. Fogarty is an assistant professor at the University of Washington.
Andrew J. Ko is an assistant professor at the University of Washington and has a PhD from Carnegie Mellon University.
James Landay is a professor at the University of Washington specializing in human-computer interaction and previously worked at Intel.


This paper was presented at UIST '10, the 23rd annual ACM Symposium on User Interface Software and Technology.


Summary


In this paper, the researchers discuss how machine learning can improve software and present a tool that makes it easier to implement machine learning methods in new software. Machine learning makes many programming problems easier to solve but requires a different set of skills (analyzing rather than just debugging), which has led to its slow adoption in industry. The hypothesis proposed is that if machine learning were easier to apply to problems, programmers would use it to solve those problems in better ways than before.


Gestalt, the product introduced in this paper, is an IDE that allows developers to easily create machine learning applications. The paper uses movie review analysis and pen-based gesture analysis as examples. The classification pipeline commonly found in machine learning programs was used as the basis for the design of the IDE. This pipeline consists of information gathering, parsing, attribute generation, training, and testing steps.
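To make the five pipeline steps concrete, here is a minimal sketch in plain Python. It is not Gestalt's API or the authors' code. The toy review data, the function names, and the word-count classifier are all assumptions chosen only to show how the steps hand data to one another.

```python
from collections import Counter

# Hypothetical toy data standing in for movie reviews: "text<TAB>label".
RAW_LINES = [
    "a great and fun movie\tpos",
    "great acting and a fun plot\tpos",
    "a boring and dull movie\tneg",
    "dull plot and boring acting\tneg",
]

def gather():
    """Information gathering: return the raw records."""
    return list(RAW_LINES)

def parse(lines):
    """Parsing: split each raw record into a (text, label) pair."""
    return [tuple(line.split("\t")) for line in lines]

def generate_attributes(text):
    """Attribute generation: turn text into bag-of-words counts."""
    return Counter(text.lower().split())

def train(examples):
    """Training: accumulate word counts per label (a deliberately simple model)."""
    model = {}
    for text, label in examples:
        model.setdefault(label, Counter()).update(generate_attributes(text))
    return model

def classify(model, text):
    """Pick the label whose training word counts overlap most with the text."""
    attrs = generate_attributes(text)
    scores = {
        label: sum(counts[word] * n for word, n in attrs.items())
        for label, counts in model.items()
    }
    return max(scores, key=scores.get)

def evaluate(model, examples):
    """Testing: fraction of examples the model labels correctly."""
    correct = sum(classify(model, text) == label for text, label in examples)
    return correct / len(examples)

examples = parse(gather())
model = train(examples)
accuracy = evaluate(model, examples)  # measured on the training data, for illustration only
```

A bug in any one step (for example, a parser that swaps text and label) only shows up as a poor result at the end, which is the kind of problem the paper's tool is meant to help track down.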


Gestalt is designed with traditional IDEs in mind to make adoption easy for developers familiar with similar products, such as Eclipse and Microsoft Visual Studio. The implementation perspective works the same way as in other IDEs: a file can be opened, modified, and run from this perspective. The difference is that, in addition to traditional programming support, all steps in the classification pipeline are editable so that users can modify individual steps to fit their needs. For each step in the pipeline, both an implementation perspective and an analysis perspective exist, so developers can see what changes on the analysis side when the implementation changes.
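One way to picture the pairing of implementation and analysis for each step is a runner that keeps every step's output for inspection. The sketch below is only an illustration under that assumption. The `run_with_trace` helper and the two example steps are hypothetical and are not taken from Gestalt.

```python
def run_with_trace(steps, data):
    """Run each (name, function) step in order, keeping every intermediate result."""
    trace = {}
    for name, step in steps:
        data = step(data)
        trace[name] = data  # saved so the output of this step can be inspected later
    return data, trace

# Hypothetical steps: parse tab-separated records, then build word-set attributes.
steps = [
    ("parse", lambda lines: [line.split("\t") for line in lines]),
    ("attributes", lambda rows: [(set(text.lower().split()), label) for text, label in rows]),
]

result, trace = run_with_trace(steps, ["Great fun\tpos", "Dull plot\tneg"])

# Looking at the data after each step is a rough stand-in for an analysis view.
for name, output in trace.items():
    print(name, output)
```

Editing a step's function and rerunning shows immediately how the data after that step changes, which is the kind of feedback the summary describes.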


Gestalt differs from the tools already used for machine learning implementation because those tools often support only the implementation perspective or only the analysis perspective, or they contain both but do not give the developer enough flexibility to change the attributes being observed from the data.


Testing the effectiveness of Gestalt consisted of observing 8 participants debug machine learning applications using both a baseline environment and Gestalt. The variable being measured was the difference in the number of bugs found. Solutions to the problems were created beforehand, and bugs were inserted for the participants to find. The results showed that participants found significantly more bugs using Gestalt than with the standard baseline debugging.


The researchers concluded that an IDE like Gestalt, which can implement a classification pipeline and analyze data as it moves through the pipeline, greatly contributes to the success of developers solving machine learning problems and improves upon what is already possible with the domain-specific products currently in use.




Discussion


I think the researchers achieved what they set out to do and effectively showed that the product improves upon what developers can do when debugging their machine learning programs. With advancements like this making machine learning more practical, it is conceivable that machine learning will continue growing in popularity and begin to be used in industry as a solution, not just an experiment. This paper is interesting because it tackles one of the most common problems found today, which is how to make programming easier for the developer. Future work in this field is almost guaranteed and will be integral in finding user-generated solutions in a user-generated world.
