Meet team 4: The SLIF Team
Amr Ahmed - Carnegie Mellon University, United States
Andrew Arnold - Carnegie Mellon University, United States
Luís Pedro Coelho - Joint Carnegie Mellon University - University of Pittsburgh PhD. Program in Computational Biology, United States
Saboor Sheikh - Carnegie Mellon University, United States
Eric Xing - Carnegie Mellon University, United States
William Cohen - Carnegie Mellon University, United States
Robert F. Murphy - Joint Carnegie Mellon University - University of Pittsburgh PhD. Program in Computational Biology, United States
Project Title: Structured Literature Image Finder
Project Website: http://murphylab.web.cmu.edu/services/SLIF/
>> Academic background of team members
Amr Ahmed is originally from Egypt. He studied Computer Engineering at the University of Cairo in Egypt where he got his BSc and MSc degrees.
He is currently pursuing his PhD in the Language Technologies Institute and a secondary Master's degree in the Machine Learning Department, both within the School of Computer Science at Carnegie Mellon University. His advisor is Dr. Xing.
Andrew Arnold, born in Los Angeles, CA, received his bachelor's degree in computer science from Columbia University in the City of New York. He is currently pursuing his Ph.D. in machine learning at Carnegie Mellon University, working with his advisor, Dr. Cohen, on robust methods for learning from text.
Luís Pedro Coelho was born in the coastal city of Southampton but grew up in Lisbon. He studied computer science at the Technical University of Lisbon and spent one year in Vienna as an exchange student. Although Luís Pedro had initially planned to graduate and move on to an industry job, intellectual curiosity led him to enroll in graduate courses as electives and obtain an M.S. His dissertation covered theoretical work on machine learning problems (how to handle noisy data for certain parameter estimation problems). At the same time, he came into contact with people working on bioinformatics and started attending their research presentations. He was attracted by the large, complex, unsolved problems that were both theoretically challenging and of huge practical importance as they cross over to medicine. He therefore entered the Joint Carnegie Mellon University-University of Pittsburgh Ph.D. program in Computational Biology. He hopes to build the skills and knowledge needed to become a researcher with an impact. His advisor is Dr. Murphy.
Reminded always of Mark Twain's advice not to let schooling interfere with one's education, Luís Pedro has taken care to participate in a mix of side projects and hobbies. He has participated in the university's theatre group, with which he traveled to international festivals to perform. He has received a prize in a short story competition and has also written open source software for the KDE project. He has taught computer usage in underprivileged neighbourhoods and designed webpages for nonprofits.
Luís Pedro gave the presentation on behalf of the SLIF team during the semi-final round.
Joshua Kangas received his Bachelor of Science degree in Computer Science from Truman State University in Kirksville, Missouri. He is a first-year student in the Joint Carnegie Mellon University-University of Pittsburgh Ph.D. program in Computational Biology, working with Dr. Murphy.
Abdul-Saboor Sheikh was born in Kuwait City, Kuwait, but grew up in Karachi, Pakistan. He obtained his bachelor's degree in computer science from Sir Syed University of Engineering & Technology in Karachi. He later went to Germany for graduate studies and obtained his MS in data and knowledge engineering from Otto-von-Guericke University Magdeburg. He currently works as a research programmer for Dr. Murphy in the Center for Bioimage Informatics at Carnegie Mellon University.
Eric Xing is an assistant professor in the Machine Learning Department, the Language Technologies Institute, and the Computer Science Department within the School of Computer Science at Carnegie Mellon University. His principal research interests lie in the development of machine learning and statistical methodology, especially for building quantitative models and predictive understandings of the evolutionary mechanism, regulatory circuitry, and developmental processes of biological systems, and for building computational intelligence systems involving automated learning, reasoning, and decision-making in open, evolving possible worlds. Professor Xing received his B.S. in Physics from Tsinghua University, his first Ph.D. in Molecular Biology and Biochemistry from Rutgers University, and his second Ph.D. in Computer Science from UC Berkeley. He has been a member of the faculty at CMU since 2004, and his current work involves 1) graphical models, Bayesian methodologies, inference algorithms, and optimization techniques for analyzing and mining high-dimensional, longitudinal, and relational data; 2) computational and comparative genomic analysis of biological sequences, systems biology investigation of gene regulation, and statistical analysis of genetic variation, demography and disease linkage; and 3) application of statistical learning in social networks, text/image mining, vision, and machine translation. He is a recipient of the NSF Career Award and the Sloan Research Fellowship in Computer Science.
William Cohen received his bachelor's degree in Computer Science from Duke University in 1984, and a PhD in Computer Science from Rutgers University in 1990. From 1990 to 2000 Dr. Cohen worked at AT&T Bell Labs and later AT&T Labs-Research, and from April 2000 to May 2002 he worked at Whizbang Labs, a company specializing in extracting information from the web. Dr. Cohen is a member of the board of the International Machine Learning Society, is an Associate Editor for the journal Artificial Intelligence, and has served as an action editor for the Journal of Machine Learning Research, the journal Machine Learning and the Journal of Artificial Intelligence Research. He was Program Co-Chair of the 1994 and 2006 International Machine Learning Conference, and General Chair of the 2008 Conference, and has served on more than 20 program committees or advisory committees.
Dr. Cohen's research interests include information integration and machine learning, particularly information extraction, text categorization and learning from large datasets. He holds seven patents related to learning, discovery, information retrieval, and data integration, and is the author of more than 100 publications.
Robert F. (Bob) Murphy is the Ray and Stephanie Lane Professor of Computational Biology and Director of the Ray and Stephanie Lane Center for Computational Biology at Carnegie Mellon University. He also is Professor of Biological Sciences, Biomedical Engineering, and Machine Learning. He directs (with Ivet Bahar) the joint Carnegie Mellon University-University of Pittsburgh Ph.D. Program in Computational Biology. In 2003 he obtained a major grant from the National Science Foundation to found the Center for Bioimage Informatics at Carnegie Mellon (of which he and Jelena Kovacevic were the initial Directors). From 2005-2007, he served as the first full-term chair of NIH’s Biodata Management and Analysis Study Section. He was named a Fellow of the American Institute for Medical and Biological Engineering in 2006, and he received an Alexander von Humboldt Foundation Research Award in 2008. Dr. Murphy has received research grants from the National Institutes of Health, the National Science Foundation, the American Cancer Society, the American Heart Association, the Arthritis Foundation, and the Rockefeller Brothers Fund. He has co-edited two books and published over 150 research papers. He is President of the International Society for Advancement of Cytometry and is on the Editorial Boards of Cytometry and the Journal of Proteome Research. He was named as the first External Senior Fellow of the School of Life Sciences in the Freiburg (Germany) Institute for Advanced Studies in 2008.
Dr. Murphy’s career has centered on combining fluorescence-based cell measurement methods with quantitative and computational methods. In the mid-1990s, his group at Carnegie Mellon pioneered the application of machine learning methods to high-resolution fluorescence microscope images depicting subcellular location patterns. This work led to the development of the first systems for automatically recognizing all major organelle patterns in 2D and 3D images. He currently leads NIH-funded projects for proteome-wide determination of subcellular location in 3T3 cells (with Peter Berget and Jonathan Jarvik) and continued development of the SLIF system for automated extraction of information from text and images in online journal articles (with William Cohen and Eric Xing). He has authored or co-authored more than 150 publications.
Eric, William and Bob are also members of the faculty of the Lane Center for Computational Biology.
Our project comprises two phases: information extraction from biological articles, followed by synthesis of the extracted information in a form that eases visualization and semantic retrieval.
Amr mainly worked on the second phase, where he used probabilistic topic models to discover themes in the extracted information. These themes summarize the collection and provide a structured way of browsing and managing it, in addition to enabling retrieval queries for similar items (such as figures, panels, or papers).
Andrew worked mostly on the caption text processing, especially improving named entity recognition, matching panels with caption text, and finding information on scale bar dimensions.
Luís Pedro worked on developing robust classification of subcellular location from fluorescence micrograph images. His solution was to use active learning to build a hand-labeled dataset on which a classifier could be trained to achieve the best possible performance.
Josh worked on augmenting the image recognition system to recognize other image types such as gels or non-FMI micrographs. He addressed this problem using active learning to determine which images would be most beneficial to label by hand. Using these labeled images and their associated features, classifiers were trained for various image categories. He also worked to match panel serial labels with panel images in order to improve panel classification through the inclusion of caption scope information.
Saboor was responsible for implementation of many of the methods used, especially the paper processing pipeline that creates the SLIF database and the web application that provides access to it.
Eric, William, and Bob provided advice and direction for the project.
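The active learning step mentioned above, choosing which images are most worth labeling by hand, can be illustrated with a minimal sketch. The selection rule shown here (smallest margin between the two most probable classes) is an assumption made for illustration; the text does not say which criterion the team used, and the image identifiers and probabilities are hypothetical.

```python
def margin(probs):
    """Gap between the two highest class probabilities (small gap = uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second


def pick_for_labeling(predictions, budget):
    """Return the `budget` image ids whose predictions are least certain.

    `predictions` maps an image id to the class probabilities produced by
    the current classifier (hypothetical values in the example below).
    """
    ranked = sorted(predictions, key=lambda image_id: margin(predictions[image_id]))
    return ranked[:budget]


# Hypothetical classifier outputs for three unlabeled images.
predictions = {
    "img_1": [0.95, 0.03, 0.02],  # confident, little to gain from a label
    "img_2": [0.40, 0.35, 0.25],  # ambiguous
    "img_3": [0.55, 0.44, 0.01],  # ambiguous between two classes
}

if __name__ == "__main__":
    print(pick_for_labeling(predictions, budget=2))
```

In a full loop, the selected images would be labeled by hand, added to the training set, and the classifier retrained before the next round of selection.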
>> What was the problem you were trying to solve, and how does your solution address it?
The overall problem we were trying to address is how to find relevant information in the combination of images and text in life science literature. One approach we implemented was to use figures as a proxy to help biologists find life science articles of interest. Our approach comprises two phases: an information extraction phase that builds a layered representation of the papers (the figures, their constituent panels, biological entities mentioned in the figures, etc.) and a knowledge synthesis phase that takes this layered representation and discovers latent themes (which we call topics) in the collection. These themes serve as the basis for visualization and semantic retrieval. They give the biologist a bird’s-eye view of the collection and guide them in browsing the otherwise unstructured collection of articles in a structured way.
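As a rough illustration of the two phases described above, the sketch below models the layered representation (paper, figure, panel, entities) as plain Python data classes and ranks figures by the similarity of their topic proportions. All class names, field names, and numbers are hypothetical, and cosine similarity is an assumed stand-in for whatever retrieval measure SLIF actually uses.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Panel:
    label: str                                         # e.g. "A" or "B"
    image_type: str                                    # e.g. "fluorescence micrograph", "gel"
    entities: List[str] = field(default_factory=list)  # entities mentioned for this panel


@dataclass
class Figure:
    figure_id: str
    caption: str
    panels: List[Panel] = field(default_factory=list)
    topic_mix: List[float] = field(default_factory=list)  # proportions over latent topics


@dataclass
class Paper:
    paper_id: str
    figures: List[Figure] = field(default_factory=list)


def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_figures(query, papers, k=3):
    """Return up to k figures from `papers` whose topic mix is closest to the query's."""
    candidates = [
        fig for paper in papers for fig in paper.figures
        if fig.figure_id != query.figure_id
    ]
    candidates.sort(key=lambda fig: cosine(query.topic_mix, fig.topic_mix), reverse=True)
    return candidates[:k]


# Hypothetical collection: two papers, three figures, three latent topics.
fig_a = Figure("p1_fig1", "caption text", [Panel("A", "fluorescence micrograph", ["protein X"])],
               topic_mix=[0.8, 0.1, 0.1])
fig_b = Figure("p1_fig2", "caption text", [Panel("A", "gel")], topic_mix=[0.1, 0.1, 0.8])
fig_c = Figure("p2_fig1", "caption text", [Panel("A", "fluorescence micrograph", ["protein X"])],
               topic_mix=[0.7, 0.2, 0.1])
papers = [Paper("p1", [fig_a, fig_b]), Paper("p2", [fig_c])]

if __name__ == "__main__":
    print([fig.figure_id for fig in similar_figures(fig_a, papers, k=2)])
```

The same ranking could be applied at the panel or paper level by attaching a topic mix to those layers instead.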
INTERVIEW WITH TEAM 4
>> Current research interests related to Elsevier Grand Challenge
We are working on the SLIF (Subcellular Location Image Finder) project, which was the first system to extract information from both text and images in biological journal articles. A major goal of SLIF (which is supported by a grant from NIH) is to use figures to provide better searching, better summarization, and better information extraction from biological literature.
>> Why were you inspired to enter the Grand Challenge?
Our focus on maximizing information extraction led us to enter the Grand Challenge.
>> What do you see as the greatest challenge in finalizing your Grand Challenge project? (whether substantive, logistical, team composition, working solo etc.)
The greatest challenge is dealing with the extreme variation in the way scientific data are represented in figures, even for just a single type of figure panel like a microscope image. A close second is the absence of standard practices for describing what protein (or other macromolecule) is depicted in a particular image.
>> What would you do with the prize money?
If we won, we would use the prize money for subscriptions to Elsevier journals (or something else)!
>> Does your team have a name? If not, what would best personify your Grand Challenge team?
Our SLIF team is best personified by being willing to tackle very hard problems!