Number of co-authors: 11
Number of publications with 3 favourite co-authors: Nick Koudas, Divesh Srivastava, Beibei Li
Panagiotis G. Ipeirotis's 3 most productive colleagues in number of publications: Luis Gravano (19), Kenneth R. Wood (17), Divesh Srivastava (8)
Panagiotis G. Ipeirotis
Current place of employment: New York University
Panagiotis G. Ipeirotis joined NYU Stern in Fall 2004 as an
Assistant Professor of Information Systems in the Information, Operations, &
Management Sciences Department. His area of expertise is databases and
information retrieval, with an emphasis on management of textual data. His
research interests include web searching, text and web mining, data cleaning and
data integration. He received his Ph.D. degree in Computer Science from Columbia
University in 2004. He also received an M.Sc. degree in Computer Science from
Columbia University in 2001 and a B.Sc. degree from the Computer Engineering and
Informatics Department (CEID) of the University of Patras, Greece in 1999.
Publications by Panagiotis G. Ipeirotis (bibliography)
Li, Beibei, Ghose, Anindya and Ipeirotis, Panagiotis G. (2011): Towards a theory model for product search. In: Proceedings of the 2011 International Conference on the World Wide Web 2011. pp. 327-336. Available online
With the growing pervasiveness of the Internet, online search for products and services is constantly increasing. Most product search engines are based on adaptations of theoretical models devised for information retrieval. However, the decision mechanism that underlies the process of buying a product is different from the process of locating relevant documents or objects. We propose a theory model for product search based on expected utility theory from economics. Specifically, we propose a ranking technique in which we rank highest the products that generate the highest surplus after the purchase. In a sense, the top-ranked products are the "best value for money" for a specific user. Our approach builds on research on "demand estimation" from economics and presents a solid theoretical foundation on which further research can build. We build algorithms that take into account consumer demographics and heterogeneity of consumer preferences, and also account for the varying price of the products. We show how to achieve this without knowing the demographics or purchasing histories of individual consumers but by using aggregate demand data. We evaluate our work by applying the techniques to hotel search. Our extensive user studies, using more than 15,000 user-provided ranking comparisons, demonstrate an overwhelming preference for the rankings generated by our techniques, compared to a large number of existing strong state-of-the-art baselines.
© All rights reserved Li et al. and/or ACM Press
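The surplus-based ranking described in the abstract above can be illustrated with a minimal sketch. It assumes a linear utility over product features and a single price-sensitivity coefficient; the function name, the feature names, and all numeric values are hypothetical, and the paper itself estimates preferences from aggregate demand data rather than taking them as given.

```python
def rank_by_surplus(products, weights, price_sensitivity):
    """Order products by estimated surplus, highest first.

    Assumption (not from the paper): surplus is a weighted sum of
    feature values minus price_sensitivity times the price.
    """
    def surplus(product):
        utility = sum(weights[name] * product["features"][name] for name in weights)
        return utility - price_sensitivity * product["price"]

    return sorted(products, key=surplus, reverse=True)


# Hypothetical hotels and preference weights, for illustration only.
hotels = [
    {"name": "A", "price": 400, "features": {"stars": 5, "beach": 1}},
    {"name": "B", "price": 100, "features": {"stars": 3, "beach": 1}},
    {"name": "C", "price": 120, "features": {"stars": 4, "beach": 0}},
]
ranking = rank_by_surplus(hotels, {"stars": 1.0, "beach": 0.5}, price_sensitivity=0.01)
print([hotel["name"] for hotel in ranking])
```

In this toy setting the most expensive hotel ranks last even though it has the highest raw utility, which is the "best value for money" effect the abstract refers to.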
Li, Beibei, Ghose, Anindya and Ipeirotis, Panagiotis G. (2011): A demo search engine for products. In: Proceedings of the 2011 International Conference on the World Wide Web 2011. pp. 233-236. Available online
Most product search engines today build on models of relevance devised for information retrieval. However, the decision mechanism that underlies the process of buying a product is different from the process of locating relevant documents or objects. We propose a theory model for product search based on expected utility theory from economics. Specifically, we propose a ranking technique in which we rank highest the products that generate the highest surplus after the purchase. We instantiate our research by building a demo search engine for hotels that takes into account heterogeneous consumer preferences and also accounts for the varying hotel price. Moreover, we achieve this without explicitly asking for the preferences or purchasing histories of individual consumers but by using aggregate demand data. This new ranking system is able to recommend products with the "best value for money" to consumers in a privacy-preserving manner. The demo is accessible at http://nyuhotels.appspot.com/
© All rights reserved Li et al. and/or ACM Press
Ipeirotis, Panagiotis G. and Paritosh, Praveen K. (2011): Managing crowdsourced human computation: a tutorial. In: Proceedings of the 2011 International Conference on the World Wide Web 2011. pp. 287-288. Available online
The tutorial covers an emerging topic of wide interest: crowdsourcing. Specifically, we cover areas of crowdsourcing related to managing structured and unstructured data in a web-related context. Many researchers and practitioners today see the great opportunity that becomes available through easily available crowdsourcing platforms. However, most newcomers face the same questions: How can we manage the (noisy) crowds to generate high-quality output? How can we estimate the quality of the contributors? How can we best structure the tasks? How can we get results in a short amount of time while minimizing the necessary resources? How should the incentives be set up? How should such crowdsourcing markets be set up? The presented material will cover topics from a variety of fields, including computer science, statistics, economics, and psychology. Furthermore, the material will include real-life examples and case studies from years of experience in running and managing crowdsourcing applications in business settings.
© All rights reserved Ipeirotis and Paritosh and/or ACM Press
Ipeirotis, Panagiotis G. and Gravano, Luis (2008): Classification-aware hidden-web text database selection. In ACM Transactions on Information Systems, 26 (2) p. 6. Available online
Many valuable text databases on the web have noncrawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over multiple such "hidden-web" text databases at once through a unified query interface. An important step in the metasearching process is database selection, or determining which databases are the most relevant for a given user query. The state-of-the-art database selection techniques rely on statistical summaries of the database contents, generally including the database vocabulary and associated word frequencies. Unfortunately, hidden-web text databases typically do not export such summaries, so previous research has developed algorithms for constructing approximate content summaries from document samples extracted from the databases via querying. We present a novel "focused-probing" sampling algorithm that detects the topics covered in a database and adaptively extracts documents that are representative of the topic coverage of the database. Our algorithm is the first to construct content summaries that include the frequencies of the words in the database. Unfortunately, Zipf's law practically guarantees that for any relatively large database, content summaries built from moderately sized document samples will fail to cover many low-frequency words; in turn, incomplete content summaries might negatively affect the database selection process, especially for short queries with infrequent words. To enhance the sparse document samples and improve the database selection decisions, we exploit the fact that topically similar databases tend to have similar vocabularies, so samples extracted from databases with a similar topical focus can complement each other. We have developed two database selection algorithms that exploit this observation. The first algorithm proceeds hierarchically and selects the best categories for a query, and then sends the query to the appropriate databases in the chosen categories. 
The second algorithm uses "shrinkage," a statistical technique for improving parameter estimation in the face of sparse data, to enhance the database content summaries with category-specific words. We describe how to modify existing database selection algorithms to adaptively decide (at runtime) whether shrinkage is beneficial for a query. A thorough evaluation over a variety of databases, including 315 real web databases as well as TREC data, suggests that the proposed sampling methods generate high-quality content summaries and that the database selection algorithms produce significantly more relevant database selection decisions and overall search results than existing algorithms.
© All rights reserved Ipeirotis and Gravano and/or ACM Press
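As a rough illustration of the "shrinkage" idea in the abstract above, the sketch below smooths a sparse, sample-based content summary with the word distribution of its topic category. The fixed mixing weight `lam`, the function name, and the probabilities are assumptions made for the example; the abstract does not state how the weights are chosen, and the paper decides adaptively per query whether shrinkage should be applied at all.

```python
def shrink_summary(db_probs, category_probs, lam=0.7):
    """Mix a database's word probabilities with its category's.

    lam is a hypothetical fixed weight on the database's own estimate;
    words missing from the sample inherit probability mass from the category.
    """
    words = set(db_probs) | set(category_probs)
    return {
        word: lam * db_probs.get(word, 0.0) + (1 - lam) * category_probs.get(word, 0.0)
        for word in words
    }


# Hypothetical summaries: the sample missed the word "hypertension".
sampled = {"heart": 0.6, "blood": 0.4}
category = {"heart": 0.3, "blood": 0.3, "hypertension": 0.4}
print(shrink_summary(sampled, category))
```

After shrinkage the unseen word receives a nonzero estimate, so a query containing it no longer rules the database out.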
Dakka, Wisam, Gravano, Luis and Ipeirotis, Panagiotis G. (2008): Answering general time sensitive queries. In: Shanahan, James G., Amer-Yahia, Sihem, Manolescu, Ioana, Zhang, Yi, Evans, David A., Kolcz, Aleksander, Choi, Key-Sun and Chowdhury, Abdur (eds.) Proceedings of the 17th ACM Conference on Information and Knowledge Management - CIKM 2008 October 26-30, 2008, Napa Valley, California, USA. pp. 1437-1438. Available online
Ghose, Anindya and Ipeirotis, Panagiotis G. (2007): Designing novel review ranking systems: predicting the usefulness and impact of reviews. In: Gini, Maria L., Kauffman, Robert J., Sarppo, Donna, Dellarocas, Chrysanthos and Dignum, Frank (eds.) Proceedings of the 9th International Conference on Electronic Commerce - ICEC 2007 August 19-22, 2007, Minneapolis, MN, USA. pp. 303-310. Available online
Dakka, Wisam, Ipeirotis, Panagiotis G. and Wood, Kenneth R. (2005): Automatic construction of multifaceted browsing interfaces. In: Herzog, Otthein, Schek, Hans-Jorg and Fuhr, Norbert (eds.) Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management October 31 - November 5, 2005, Bremen, Germany. pp. 768-775. Available online
Gravano, Luis, Ipeirotis, Panagiotis G. and Sahami, Mehran (2003): QProber: A system for automatic classification of hidden-Web databases. In ACM Transactions on Information Systems, 21 (1) pp. 1-41. Available online
The contents of many valuable Web-accessible databases are only available through search interfaces and are hence invisible to traditional Web "crawlers." Recently, commercial Web sites have started to manually organize Web-accessible databases into Yahoo!-like hierarchical classification schemes. Here we introduce QProber, a modular system that automates this classification process by using a small number of query probes, generated by document classifiers. QProber can use a variety of types of classifiers to generate the probes. To classify a database, QProber does not retrieve or inspect any documents or pages from the database, but rather just exploits the number of matches that each query probe generates at the database in question. We have conducted an extensive experimental evaluation of QProber over collections of real documents, experimenting with different types of document classifiers and retrieval models. We have also tested our system with over one hundred Web-accessible databases. Our experiments show that our system has low overhead and achieves high classification accuracy across a variety of databases.
© All rights reserved Gravano et al. and/or ACM Press
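The probing idea in the QProber abstract above, classifying a database from match counts alone without retrieving documents, can be sketched as follows. This is a simplified illustration and not the QProber system: the `count_matches` callable, the probe lists, and the `min_share` threshold are all hypothetical, and in QProber the probes are generated by document classifiers rather than written by hand.

```python
def classify_by_probes(count_matches, probes_by_category, min_share=0.4):
    """Assign categories to a database using only per-probe match counts.

    count_matches: callable taking a query string and returning the number
    of matches the database reports (a stand-in for a real search interface).
    min_share: hypothetical threshold on a category's share of all matches.
    """
    totals = {
        category: sum(count_matches(query) for query in probes)
        for category, probes in probes_by_category.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return sorted(c for c, n in totals.items() if n / grand_total >= min_share)


# Hypothetical match counts standing in for a remote database.
reported = {"cancer": 900, "diabetes": 600, "soccer": 20, "tennis": 5}
probes = {"Health": ["cancer", "diabetes"], "Sports": ["soccer", "tennis"]}
print(classify_by_probes(lambda q: reported.get(q, 0), probes))
```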
Gravano, Luis, Ipeirotis, Panagiotis G., Koudas, Nick and Srivastava, Divesh (2003): Text joins in an RDBMS for web data integration. In: Proceedings of the 2003 International Conference on the World Wide Web 2003. pp. 90-101. Available online
The integration of data produced and collected across autonomous, heterogeneous web services is an increasingly important and challenging problem. Due to the lack of global identifiers, the same entity (e.g., a product) might have different textual representations across databases. Textual data is also often noisy because of transcription errors, incomplete information, and lack of standard formats. A fundamental task during data integration is matching of strings that refer to the same entity. In this paper, we adopt the widely used and established cosine similarity metric from the information retrieval field in order to identify potential string matches across web sources. We then use this similarity metric to characterize this key aspect of data integration as a join between relations on textual attributes, where the similarity of matches exceeds a specified threshold. Computing an exact answer to the text join can be expensive. For query processing efficiency, we propose a sampling-based join approximation strategy for execution in a standard, unmodified relational database management system (RDBMS), since more and more web sites are powered by RDBMSs with a web-based front end. We implement the join inside an RDBMS, using SQL queries, for scalability and robustness reasons. Finally, we present a detailed performance evaluation of an implementation of our algorithm within a commercial RDBMS, using real-life data sets. Our experimental results demonstrate the efficiency and accuracy of our techniques.
© All rights reserved Gravano et al. and/or ACM Press
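The text join described in the abstract above pairs strings whose cosine similarity exceeds a threshold. The sketch below computes that join exactly and in memory, which is an assumption made for brevity: the paper implements the join in SQL inside an unmodified RDBMS and approximates it by sampling. The whitespace tokenizer, the particular TF-IDF weighting, and the example strings are likewise hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(strings):
    """Return unit-length TF-IDF vectors (as dicts) for whitespace tokens."""
    docs = [Counter(s.lower().split()) for s in strings]
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())
    vectors = []
    for doc in docs:
        weights = {t: tf * math.log(1 + len(docs) / doc_freq[t]) for t, tf in doc.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


def text_join(left, right, threshold):
    """Return (left string, right string, similarity) for pairs at or above threshold."""
    vectors = tfidf_vectors(left + right)
    left_vecs, right_vecs = vectors[:len(left)], vectors[len(left):]
    matches = []
    for i, a in enumerate(left_vecs):
        for j, b in enumerate(right_vecs):
            # Vectors are normalized, so the dot product is the cosine similarity.
            similarity = sum(w * b.get(token, 0.0) for token, w in a.items())
            if similarity >= threshold:
                matches.append((left[i], right[j], similarity))
    return matches


print(text_join(["Acme Corp", "Globex Inc"], ["ACME Corp", "Initech"], threshold=0.5))
```

The nested loop compares every pair, which is the expensive exact computation that the paper's sampling-based strategy is designed to avoid.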
Ipeirotis, Panagiotis G., Barry, Tom and Gravano, Luis (2002): Extending SDARTS: extracting metadata from web databases and interfacing with the open archives initiative. In: JCDL02: Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries 2002. pp. 162-170. Available online
SDARTS is a protocol and toolkit designed to facilitate metasearching. SDARTS combines two complementary existing protocols, SDLIP and STARTS, to define a uniform interface that collections should support for searching and exporting metasearch-related metadata. SDARTS also includes a toolkit with wrappers that are easily customized to make both local and remote document collections SDARTS-compliant. This paper describes two significant ways in which we have extended the SDARTS toolkit. First, we have added a tool that automatically builds rich content summaries for remote web collections by probing the collections with appropriate queries. These content summaries can then be used by a metasearcher to select over which collections to evaluate a given query. Second, we have enhanced the SDARTS toolkit so that all SDARTS-compliant collections export their metadata under the emerging Open Archives Initiative (OAI) protocol. Conversely, the SDARTS toolkit now also allows all OAI-compliant collections to be made SDARTS-compliant with minimal effort. As a result, we implemented a bridge between SDARTS and OAI, which will facilitate easy interoperability among a potentially large number of collections. The SDARTS toolkit, with all related documentation and source code, is publicly available at http://sdarts.cs.columbia.edu.
© All rights reserved Ipeirotis et al. and/or ACM Press
Green, Noah, Ipeirotis, Panagiotis G. and Gravano, Luis (2001): SDLIP + STARTS = SDARTS: A Protocol and Toolkit for Metasearching. In: JCDL01: Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries 2001. pp. 207-214. Available online
In this paper we describe how we combined SDLIP and STARTS, two complementary protocols for searching over distributed document collections. The resulting protocol, which we call SDARTS, is simple yet expressive enough to enable building sophisticated metasearch engines. SDARTS can be viewed as an instantiation of SDLIP with metasearch-specific elements from STARTS. We also report on our experience building three SDARTS-compliant wrappers: for locally available plain-text document collections, for locally available XML document collections, and for external web-accessible collections. These wrappers were developed to be easily customizable for new collections. Our work was developed as part of Columbia University's Digital Libraries Initiative--Phase 2 (DLI2) project, which involves the departments of Computer Science, Medical Informatics, and Electrical Engineering, the Columbia University libraries, and a large number of industrial partners. The main goal of the project is to provide personalized access to a distributed patient-care digital library.
© All rights reserved Green et al. and/or ACM Press
Ipeirotis, Panagiotis G., Gravano, Luis and Sahami, Mehran (2001): PERSIVAL Demo: Categorizing Hidden-Web Resources. In: JCDL01: Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries 2001. p. 454. Available online