Publication statistics

Pub. period: 2000-2009
Pub. count: 12
Number of co-authors: 27



Co-authors

Number of publications with 3 favourite co-authors:

Steve Lawrence: 5
Robert Krovetz: 5
Clyde Lee Giles: 4

 

 

Productive colleagues

David M. Pennock's 3 most productive colleagues in number of publications:

Clyde Lee Giles: 71
Steve Lawrence: 20
Hemant K. Bhargava: 17
 
 
 


David M. Pennock


Publications by David M. Pennock (bibliography)

2009

Feigenbaum, Joan, Parkes, David C. and Pennock, David M. (2009): Computational challenges in e-commerce. In Communications of the ACM, 52 (1) pp. 70-74.

2005

Mangold, Bernard, Dooley, Mike, Flake, Gary William, Hoffman, Havi, Kasturi, Tejaswi, Pennock, David M. and Dornfest, Rael (2005): The Tech Buzz Game. In IEEE Computer, 38 (7) pp. 94-97.

2004

Park, Seung-Taek, Pennock, David M., Giles, Clyde Lee and Krovetz, Robert (2004): Analysis of lexical signatures for improving information persistence on the World Wide Web. In ACM Transactions on Information Systems, 22 (4) pp. 540-572.

A lexical signature (LS) consisting of several key words from a Web document is often sufficient information for finding the document later, even if its URL has changed. We conduct a large-scale empirical study of nine methods for generating lexical signatures, including Phelps and Wilensky's original proposal (PW), seven of our own static variations, and one new dynamic method. We examine their performance on the Web over a 10-month period, and on a TREC data set, evaluating their ability to both (1) uniquely identify the original (possibly modified) document, and (2) locate other relevant documents if the original is lost. Lexical signatures chosen to minimize document frequency (DF) are good at unique identification but poor at finding relevant documents. PW works well on the relatively small TREC data set, but acts almost identically to DF on the Web, which contains billions of documents. Term-frequency-based lexical signatures (TF) are very easy to compute and often perform well, but are highly dependent on the ranking system of the search engine used. The term-frequency inverse-document-frequency- (TFIDF-) based method and hybrid methods (which combine DF with TF or TFIDF) seem to be the most promising candidates among static methods for generating effective lexical signatures. We propose a dynamic LS generator called Test&Select (TS) to mitigate LS conflict. TS outperforms all eight static methods in terms of both extracting the desired document and finding relevant information, over three different search engines. All LS methods show significant performance degradation as documents in the corpus are edited.

© All rights reserved Park et al. and/or ACM Press
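
The static signature methods named in the abstract above (TF, DF and TFIDF) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the tokenizer, the default signature length `k=5`, the `log(N / df)` weighting, and the function names `tokenize`, `document_frequencies` and `lexical_signature`. The hybrid, PW and Test&Select methods are not reproduced.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def document_frequencies(corpus):
    """Count, for each term, how many documents in the corpus contain it."""
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return df


def lexical_signature(doc, corpus, method="tfidf", k=5):
    """Return the k terms of `doc` that score highest under `method`.

    method="tf":    most frequent terms in the document
    method="df":    terms that occur in the fewest corpus documents
    method="tfidf": term frequency weighted by log(N / df)
    """
    tf = Counter(tokenize(doc))
    df = document_frequencies(corpus)
    n_docs = len(corpus)

    def score(term):
        # Terms unseen in the corpus are treated as df = 1 (assumption).
        d = max(df.get(term, 0), 1)
        if method == "tf":
            return tf[term]
        if method == "df":
            return -d
        if method == "tfidf":
            return tf[term] * math.log(n_docs / d)
        raise ValueError(f"unknown method: {method}")

    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(tf, key=lambda term: (-score(term), term))
    return ranked[:k]


corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
print(lexical_signature(corpus[0], corpus, method="tf", k=2))
print(lexical_signature(corpus[0], corpus, method="df", k=1))
print(lexical_signature(corpus[0], corpus, method="tfidf", k=2))
```

On this toy corpus the TF signature is dominated by the very common word "the", while the DF signature picks the rarest term, which mirrors the trade-off the abstract describes between finding relevant documents and uniquely identifying one.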

2003

Feng, Juan, Bhargava, Hemant K. and Pennock, David M. (2003): Comparison of allocation rules for paid placement advertising in search engines. In: Sadeh, Norman M., Dively, Mary Jo, Kauffman, Robert J., Labrou, Yannis, Shehory, Onn, Telang, Rahul and Cranor, Lorrie Faith (eds.) Proceedings of the 5th International Conference on Electronic Commerce - ICEC 2003 September 30 - October 03, 2003, Pittsburgh, Pennsylvania, USA. pp. 294-299.

 

Dave, Kushal, Lawrence, Steve and Pennock, David M. (2003): Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In: Proceedings of the 2003 International Conference on the World Wide Web 2003. pp. 519-528.

The web contains a wealth of product reviews, but sifting through them is a daunting task. Ideally, an opinion mining tool would process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good). We begin by identifying the unique properties of this problem and develop a method for automatically distinguishing between positive and negative reviews. Our classifier draws on information retrieval techniques for feature extraction and scoring, and the results for various metrics and heuristics vary depending on the testing situation. The best methods work as well as or better than traditional machine learning. When operating on individual sentences collected from web searches, performance is limited due to noise and ambiguity. But in the context of a complete web-based tool and aided by a simple method for grouping sentences into attributes, the results are qualitatively quite useful.

© All rights reserved Dave et al. and/or ACM Press

2002

Park, Seung-Taek, Pennock, David M., Giles, Clyde Lee and Krovetz, Robert (2002): Analysis of lexical signatures for finding lost or related documents. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2002. pp. 11-18.

A lexical signature of a web page is often sufficient for finding the page, even if its URL has changed. We conduct a large-scale empirical study of eight methods for generating lexical signatures, including Phelps and Wilensky's [14] original proposal (PW) and seven of our own variations. We examine their performance on the web and on a TREC data set, evaluating their ability both to uniquely identify the original document and to locate other relevant documents if the original is lost. Lexical signatures chosen to minimize document frequency (DF) are good at unique identification but poor at finding relevant documents. PW works well on the relatively small TREC data set, but acts almost identically to DF on the web, which contains billions of documents. Term-frequency-based lexical signatures (TF) are very easy to compute and often perform well, but are highly dependent on the ranking system of the search engine used. In general, the TFIDF-based method and hybrid methods (which combine DF with TF or TFIDF) seem to be the most promising candidates for generating effective lexical signatures.

© All rights reserved Park et al. and/or ACM Press

 

Schein, Andrew I., Popescul, Alexandrin, Ungar, Lyle H. and Pennock, David M. (2002): Methods and metrics for cold-start recommendations. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2002. pp. 253-260.

We have developed a method for recommending items that combines content and collaborative data under a single probabilistic framework. We benchmark our algorithm against a naive Bayes classifier on the cold-start problem, where we wish to recommend items that no one in the community has yet rated. We systematically explore three testing methodologies using a publicly available data set, and explain how these methods apply to specific real-world applications. We advocate heuristic recommenders when benchmarking to give competent baseline performance. We introduce a new performance metric, the CROC curve, and demonstrate empirically that the various components of our testing strategy combine to obtain deeper understanding of the performance characteristics of recommender systems. Though the emphasis of our testing is on cold-start recommending, our methods for recommending and evaluation are general.

© All rights reserved Schein et al. and/or ACM Press

 

Glover, Eric J., Pennock, David M., Lawrence, Steve and Krovetz, Robert (2002): Inferring hierarchical descriptions. In: Proceedings of the 2002 ACM CIKM International Conference on Information and Knowledge Management November 4-9, 2002, McLean, VA, USA. pp. 507-514.

 

Chakrabarti, Soumen, Joshi, Mukul M., Punera, Kunal and Pennock, David M. (2002): The structure of broad topics on the web. In: Proceedings of the 2002 International Conference on the World Wide Web 2002. pp. 251-262.

The Web graph is a giant social network whose properties have been measured and modeled extensively in recent years. Most such studies concentrate on the graph structure alone, and do not consider textual properties of the nodes. Consequently, Web communities have been characterized purely in terms of graph structure and not on page content. We propose that a topic taxonomy such as Yahoo! or the Open Directory provides a useful framework for understanding the structure of content-based clusters and communities. In particular, using a topic taxonomy and an automatic classifier, we can measure the background distribution of broad topics on the Web, and analyze the capability of recent random walk algorithms to draw samples which follow such distributions. In addition, we can measure the probability that a page about one broad topic will link to another broad topic. Extending this experiment, we can measure how quickly topic context is lost while walking randomly on the Web graph. Estimates of this topic mixing distance may explain why a global PageRank is still meaningful in the context of broad queries. In general, our measurements may prove valuable in the design of community-specific crawlers and link-based ranking systems.

© All rights reserved Chakrabarti et al. and/or ACM Press

 

Glover, Eric J., Tsioutsiouliklis, Kostas, Lawrence, Steve, Pennock, David M. and Flake, Gary W. (2002): Using web structure for classifying and describing web pages. In: Proceedings of the 2002 International Conference on the World Wide Web 2002. pp. 562-569.

The structure of the web is increasingly being used to improve organization, search, and analysis of information on the web. For example, Google uses the text in citing documents (documents that link to the target document) for search. We analyze the relative utility of document text, and the text in citing documents near the citation, for classification and description. Results show that the text in citing documents, when available, often has greater discriminative and descriptive power than the text in the target document itself. The combination of evidence from a document and citing documents can improve on either information source alone. Moreover, by ranking words and phrases in the citing documents according to expected entropy loss, we are able to accurately name clusters of web pages, even with very few positive examples. Our results confirm, quantify, and extend previous research using web structure in these areas, introducing new methods for classification and description of pages.

© All rights reserved Glover et al. and/or ACM Press
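
The ranking by expected entropy loss mentioned in the abstract above can be sketched as follows, assuming the usual information-gain reading: the entropy of the class prior minus the expected entropy after observing whether a feature is present. The set-of-features document representation and the names `binary_entropy`, `expected_entropy_loss` and `rank_features` are hypothetical and chosen for illustration; the paper's own feature extraction and thresholds are not reproduced.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a two-class distribution with P(positive) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_entropy_loss(pos_docs, neg_docs, feature):
    """Prior class entropy minus expected entropy given feature presence.

    Each document is a set of features (words or phrases).
    """
    n_pos, n_neg = len(pos_docs), len(neg_docs)
    n = n_pos + n_neg
    pos_with = sum(feature in doc for doc in pos_docs)
    neg_with = sum(feature in doc for doc in neg_docs)
    n_with = pos_with + neg_with
    n_without = n - n_with

    prior = binary_entropy(n_pos / n)
    h_with = binary_entropy(pos_with / n_with) if n_with else 0.0
    h_without = (
        binary_entropy((n_pos - pos_with) / n_without) if n_without else 0.0
    )
    posterior = (n_with / n) * h_with + (n_without / n) * h_without
    return prior - posterior


def rank_features(pos_docs, neg_docs):
    """Rank all observed features by descending expected entropy loss."""
    features = set().union(*pos_docs, *neg_docs)
    return sorted(
        features,
        key=lambda f: (-expected_entropy_loss(pos_docs, neg_docs, f), f),
    )


pos = [{"car", "engine"}, {"car", "wheel"}]
neg = [{"recipe", "oven"}, {"recipe", "car"}]
print(rank_features(pos, neg))
```

Note that this score is symmetric: a feature that appears only in negative examples ranks as highly as one that appears only in positive examples. A naming step would presumably keep only high-scoring features that are more common in the positive set, which is left out of this sketch.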

2001

Lawrence, Steve, Pennock, David M., Flake, Gary William, Krovetz, Robert, Coetzee, Frans, Glover, Eric J., Nielsen, Finn Årup, Kruger, Andries and Giles, Clyde Lee (2001): Persistence of Web References in Scientific Research. In IEEE Computer, 34 (2) pp. 26-31.

2000

Lawrence, Steve, Coetzee, Frans, Flake, Gary William, Pennock, David M., Krovetz, Robert, Nielsen, Finn Årup, Kruger, Andries and Giles, Clyde Lee (2000): Persistence of information on the web: Analyzing citations contained in research articles. In: Proceedings of the 2000 ACM CIKM International Conference on Information and Knowledge Management November 6-11, 2000, McLean, VA, USA. pp. 235-242.

 

Changes to this page (author)

18 Aug 2009: Modified
09 Jul 2009: Modified
09 Jul 2009: Modified
09 Jul 2009: Modified
01 Jun 2009: Modified
01 Jun 2009: Modified
30 May 2009: Modified
29 May 2009: Modified
29 May 2009: Modified
29 May 2009: Modified
29 May 2009: Modified
24 Jun 2007: Modified
24 Jun 2007: Modified
23 Jun 2007: Added

Page Information

Page maintainer: The Editorial Team
URL: http://www.interaction-design.org/references/authors/david_m__pennock.html
