Publication statistics

Publication period: 1999-2008
Publication count: 23
Number of co-authors: 24



Co-authors

Number of joint publications with his 3 most frequent co-authors:

Ophir Frieder: 17
Steven M. Beitzel: 13
David A. Grossman: 13

 

 

Productive colleagues

Abdur Chowdhury's 3 most productive colleagues, by number of publications:

Ophir Frieder: 54
Lorrie Faith Cranor: 44
David A. Grossman: 23
 
 
 


Abdur Chowdhury

Personal Homepage:
ir.cs.georgetown.edu/~abdur/index.html


 

Publications by Abdur Chowdhury (bibliography)

2008
 

Jensen, Eric C., Beitzel, Steven M., Chowdhury, Abdur and Frieder, Ophir (2008): Repeatable evaluation of search services in dynamic environments. In ACM Transactions on Information Systems, 26 (1) p. 1. Available online

In dynamic environments, such as the World Wide Web, a changing document collection, query population, and set of search services demands frequent repetition of search effectiveness (relevance) evaluations. Reconstructing static test collections, such as in TREC, requires considerable human effort, as large collection sizes demand judgments deep into retrieved pools. In practice it is common to perform shallow evaluations over small numbers of live engines (often pairwise, engine A vs. engine B) without system pooling. Although these evaluations are not intended to construct reusable test collections, their utility depends on conclusions generalizing to the query population as a whole. We leverage the bootstrap estimate of the reproducibility probability of hypothesis tests in determining the query sample sizes required to ensure this, finding they are much larger than those required for static collections. We propose a semiautomatic evaluation framework to reduce this effort. We validate this framework against a manual evaluation of the top ten results of ten Web search engines across 896 queries in navigational and informational tasks. Augmenting manual judgments with pseudo-relevance judgments mined from Web taxonomies reduces both the chances of missing a correct pairwise conclusion, and those of finding an errant conclusion, by approximately 50%.

© All rights reserved Jensen et al. and/or ACM Press
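A minimal sketch of the bootstrap idea from the abstract above (illustrative only: the per-query score differences, trial count, and normal-approximation cutoff are assumptions, not the authors' code). It estimates how often a paired significance test on a resampled query set of size n would reproduce the conclusion drawn from the full set:

```python
import random
import statistics

def reproducibility_probability(deltas, n, trials=2000):
    """Estimate how often a paired test over a bootstrap sample of n
    per-query score differences (engine A minus engine B) declares a
    significant difference in the same direction as the full data."""
    sign = 1 if statistics.mean(deltas) >= 0 else -1
    hits = 0
    for _ in range(trials):
        sample = [random.choice(deltas) for _ in range(n)]
        sd = statistics.stdev(sample)
        if sd == 0:
            continue  # degenerate resample; cannot run the test
        t = statistics.mean(sample) / (sd / n ** 0.5)
        if sign * t > 1.96:  # crude normal approximation, two-sided 0.05
            hits += 1
    return hits / trials

# Hypothetical per-query P@10 differences between two engines:
deltas = [0.1, -0.2, 0.0, 0.3, 0.1, -0.1, 0.2, 0.0, 0.1, 0.4] * 20
for n in (50, 200, 800):
    print(n, reproducibility_probability(deltas, n))
```

Larger n drives the estimate toward 1; the abstract's point is that the n required on the web is much larger than static-collection practice suggests.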

 

Shanahan, James G., Amer-Yahia, Sihem, Manolescu, Ioana, Zhang, Yi, Evans, David A., Kolcz, Aleksander, Choi, Key-Sun and Chowdhury, Abdur (eds.) (2008): Proceedings of the 17th ACM Conference on Information and Knowledge Management - CIKM 2008 October 26-30, 2008, Napa Valley, California, USA.

2007
 

Beitzel, Steven M., Jensen, Eric C., Chowdhury, Abdur and Frieder, Ophir (2007): Varying approaches to topical web query classification. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2007. pp. 783-784. Available online

Topical classification of web queries has drawn recent interest because of the promise it offers in improving retrieval effectiveness and efficiency. However, much of this promise depends on whether classification is performed before or after the query is used to retrieve documents. We examine two previously unaddressed issues in query classification: pre- versus post-retrieval classification effectiveness and the effect of training explicitly from classified queries versus bridging a classifier trained using a document taxonomy. Bridging classifiers map the categories of a document taxonomy onto those of a query classification problem to provide sufficient training data. We find that training classifiers explicitly from manually classified queries outperforms the bridged classifier by 48% in F1 score. Also, a pre-retrieval classifier using only the query terms performs merely 11% worse than the bridged classifier, which requires snippets from retrieved documents.

© All rights reserved Beitzel et al. and/or ACM Press

 

Beitzel, Steven M., Jensen, Eric C., Lewis, David D., Chowdhury, Abdur and Frieder, Ophir (2007): Automatic classification of Web queries using very large unlabeled query logs. In ACM Transactions on Information Systems, 25 (2) p. 9. Available online

Accurate topical classification of user queries allows for increased effectiveness and efficiency in general-purpose Web search systems. Such classification becomes critical if the system must route queries to a subset of topic-specific and resource-constrained back-end databases. Successful query classification poses a challenging problem, as Web queries are short, thus providing few features. This feature sparseness, coupled with the constantly changing distribution and vocabulary of queries, hinders traditional text classification. We attack this problem by combining multiple classifiers, including exact lookup and partial matching in databases of manually classified frequent queries, linear models trained by supervised learning, and a novel approach based on mining selectional preferences from a large unlabeled query log. Our approach classifies queries without using external sources of information, such as online Web directories or the contents of retrieved pages, making it viable for use in demanding operational environments, such as large-scale Web search services. We evaluate our approach using a large sample of queries from an operational Web search engine and show that our combined method increases recall by nearly 40% over the best single method while maintaining adequate precision. Additionally, we compare our results to those from the 2005 KDD Cup and find that we perform competitively despite our operational restrictions. This suggests it is possible to topically classify a significant portion of the query stream without requiring external sources of information, allowing for deployment in operationally restricted environments.

© All rights reserved Beitzel et al. and/or ACM Press
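The combination strategy can be illustrated with a toy backoff classifier (hypothetical queries and categories; the paper's selectional-preference mining over a large unlabeled log and its supervised linear models are not reproduced here):

```python
from collections import Counter, defaultdict

# Hypothetical hand-labeled frequent queries (the exact-lookup database).
LABELED = {
    "britney spears": "music",
    "britney spears lyrics": "music",
    "white house": "government",
    "yankees score": "sports",
}

# Derive crude term -> category votes from the labeled queries. The
# paper instead mines selectional preferences from a huge unlabeled
# query log; this stand-in only shows the backoff combination.
term_votes = defaultdict(Counter)
for q, cat in LABELED.items():
    for term in q.split():
        term_votes[term][cat] += 1

def classify(query):
    q = query.lower().strip()
    if q in LABELED:                      # 1) exact lookup
        return LABELED[q]
    votes = Counter()                     # 2) partial, term-level match
    for term in q.split():
        votes.update(term_votes.get(term, Counter()))
    return votes.most_common(1)[0][0] if votes else None

print(classify("britney spears tickets"))  # -> music
print(classify("quantum chromodynamics"))  # -> None (left unclassified)
```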

 

Beitzel, Steven M., Jensen, Eric C., Chowdhury, Abdur, Frieder, Ophir and Grossman, David A. (2007): Temporal analysis of a very large topically categorized Web query log. In JASIST - Journal of the American Society for Information Science and Technology, 58 (2) pp. 166-178. Available online

2006
 

Egelman, Serge, Cranor, Lorrie Faith and Chowdhury, Abdur (2006): An analysis of P3P-enabled web sites among top-20 search results. In: Fox, Mark S. and Spencer, Bruce (eds.) Proceedings of the 8th International Conference on Electronic Commerce - ICEC 2006 2006, Fredericton, New Brunswick, Canada. pp. 197-207. Available online

2005
 

Beitzel, Steven M., Jensen, Eric C., Frieder, Ophir, Grossman, David A., Lewis, David D., Chowdhury, Abdur and Kolcz, Aleksandr (2005): Automatic web query classification using labeled and unlabeled training data. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2005. pp. 581-582. Available online

Accurate topical categorization of user queries allows for increased effectiveness, efficiency, and revenue potential in general-purpose web search systems. Such categorization becomes critical if the system is to return results not just from a general web collection but from topic-specific databases as well. Maintaining sufficient categorization recall is very difficult as web queries are typically short, yielding few features per query. We examine three approaches to topical categorization of general web queries: matching against a list of manually labeled queries, supervised learning of classifiers, and mining of selectional preference rules from large unlabeled query logs. Each approach has its advantages in tackling the web query classification recall problem, and combining the three techniques allows us to classify a substantially larger proportion of queries than any of the individual techniques. We examine the performance of each approach on a real web query stream and show that our combined method accurately classifies 46% of queries, outperforming the recall of the best single approach by nearly 20%, with a 7% improvement in overall effectiveness.

© All rights reserved Beitzel et al. and/or ACM Press

 

Beitzel, Steven M., Jensen, Eric C., Frieder, Ophir, Chowdhury, Abdur and Pass, Greg (2005): Surrogate scoring for improved metasearch precision. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2005. pp. 583-584. Available online

We describe a method for improving the precision of metasearch results based upon scoring the visual features of documents' surrogate representations. These surrogate scores are used during fusion in place of the original scores or ranks provided by the underlying search engines. Visual features are extracted from typical search result surrogate information, such as title, snippet, URL, and rank. This approach specifically avoids the use of search engine-specific scores and collection statistics that are required by most traditional fusion strategies. This restriction correctly reflects the use of metasearch in practice, in which knowledge of the underlying search engines' strategies cannot be assumed. We evaluate our approach using a precision-oriented test collection of manually constructed binary relevance judgments for the top ten results from ten web search engines over 896 queries. We show that our visual fusion approach significantly outperforms the rCombMNZ fusion algorithm.

© All rights reserved Beitzel et al. and/or ACM Press
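For reference, the rank-based CombMNZ baseline mentioned above can be sketched as follows (the rank-to-score mapping and the toy result lists are assumptions for illustration):

```python
def rcombmnz(rankings, depth=10):
    """Rank-based CombMNZ: each engine contributes a score of
    (depth - rank) for the documents it returns; a document's fused
    score is that sum multiplied by the number of engines that
    retrieved it."""
    scores, hits = {}, {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked[:depth]):
            scores[doc] = scores.get(doc, 0) + (depth - rank)
            hits[doc] = hits.get(doc, 0) + 1
    fused = {d: scores[d] * hits[d] for d in scores}
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical top-5 lists from three engines:
a = ["d1", "d2", "d3", "d4", "d5"]
b = ["d2", "d1", "d6", "d3", "d7"]
c = ["d8", "d2", "d1", "d9", "d3"]
print(rcombmnz([a, b, c], depth=5))  # d2 and d1 rise to the top
```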

 

Jensen, Eric C., Beitzel, Steven M., Grossman, David A., Frieder, Ophir and Chowdhury, Abdur (2005): Predicting query difficulty on the web by learning visual clues. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2005. pp. 615-616. Available online

We describe a method for predicting query difficulty in a precision-oriented web search task. Our approach uses visual features from retrieved surrogate document representations (titles, snippets, etc.) to predict retrieval effectiveness for a query. By training a supervised machine learning algorithm with manually evaluated queries, visual clues indicative of relevance are discovered. We show that this approach has a moderate correlation of 0.57 with precision-at-10 (P@10) scores from manual relevance judgments of the top ten documents retrieved by ten web search engines over 896 queries. Our findings indicate that difficulty predictors that have been successful in recall-oriented ad-hoc search, such as clarity metrics, are not nearly as correlated with engine performance in precision-oriented tasks such as this, yielding a maximum correlation of 0.3. Additionally, relying only on visual clues avoids the need for collection statistics that are required by these prior approaches. This enables our approach to be employed in environments where these statistics are unavailable or costly to retrieve, such as metasearch.

© All rights reserved Jensen et al. and/or ACM Press
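A correlation like the 0.57 reported here is computed between per-query predictions and measured precision at 10; a minimal sketch, assuming a plain Pearson coefficient and made-up numbers:

```python
def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical predicted difficulty scores vs. measured P@10:
predicted = [0.9, 0.4, 0.7, 0.2, 0.8]
actual_p10 = [0.8, 0.3, 0.5, 0.3, 0.9]
print(round(pearson(predicted, actual_p10), 2))
```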

 

Jensen, Eric C., Beitzel, Steven M., Frieder, Ophir and Chowdhury, Abdur (2005): A framework for determining necessary query set sizes to evaluate web search effectiveness. In: Proceedings of the 2005 International Conference on the World Wide Web 2005. pp. 1176-1177. Available online

We describe a framework of bootstrapped hypothesis testing for estimating the confidence in one web search engine outperforming another over any randomly sampled query set of a given size. To validate this framework, we have constructed and made available a precision-oriented test collection consisting of manual binary relevance judgments for each of the top ten results of ten web search engines across 896 queries and the single best result for each of those queries. Results from this bootstrapping approach over typical query set sizes indicate that examining repeated statistical tests is imperative, as a single test is quite likely to find significant differences that do not necessarily generalize. We also find that the number of queries needed for a repeatable evaluation in a dynamic environment such as the web is much higher than previously studied.

© All rights reserved Jensen et al. and/or ACM Press

2004
 

Beitzel, Steven M., Jensen, Eric C., Chowdhury, Abdur, Grossman, David A. and Frieder, Ophir (2004): Hourly analysis of a very large topically categorized web query log. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 321-328. Available online

We review a query log of hundreds of millions of queries that constitute the total query traffic for an entire week of a general-purpose commercial web search service. Previously, query logs have been studied from a single, cumulative view. In contrast, our analysis shows changes in popularity and uniqueness of topically categorized queries across the hours of the day. We examine query traffic on an hourly basis by matching it against lists of queries that have been topically pre-categorized by human editors. This represents 13% of the query traffic. We show that query traffic from particular topical categories differs both from the query stream as a whole and from other categories. This analysis provides valuable insight for improving retrieval effectiveness and efficiency. It is also relevant to the development of enhanced query disambiguation, routing, and caching algorithms.

© All rights reserved Beitzel et al. and/or ACM Press
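The matching step is conceptually simple: bucket each logged query by hour, look it up in the editor-built category lists, and tally per-category traffic. A miniature sketch with invented log entries and categories:

```python
from collections import Counter, defaultdict

# Hypothetical (hour, query) log entries and an editor-built
# query -> category list, standing in for the week-long commercial log.
LOG = [(2, "britney spears"), (2, "insomnia"), (9, "cnn news"),
       (9, "stock quotes"), (21, "movie times"), (21, "britney spears")]
CATEGORIES = {"britney spears": "entertainment", "cnn news": "news",
              "stock quotes": "finance", "movie times": "entertainment"}

by_hour = defaultdict(Counter)
matched = 0
for hour, query in LOG:
    cat = CATEGORIES.get(query)        # only pre-categorized queries count
    if cat:
        by_hour[hour][cat] += 1
        matched += 1

print(f"matched share of traffic: {matched / len(LOG):.0%}")
for hour in sorted(by_hour):
    print(hour, dict(by_hour[hour]))
```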

 

Beitzel, Steven M., Jensen, Eric C., Chowdhury, Abdur, Grossman, David A. and Frieder, Ophir (2004): Evaluation of filtering current news search results. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 494-495. Available online

We describe an evaluation of result set filtering techniques for providing ultra-high precision in the task of presenting related news for general web queries. In this task, the negative user experience generated by retrieving non-relevant documents has a much worse impact than not retrieving relevant ones. We adapt cost-based metrics from the document filtering domain to this result filtering problem in order to explicitly examine the tradeoff between missing relevant documents and retrieving non-relevant ones. A large manual evaluation of three simple threshold filters shows that the basic approach of counting matching title terms outperforms also incorporating selected abstract terms based on part-of-speech or higher-level linguistic structures. Simultaneously, leveraging these cost-based metrics allows us to explicitly determine what other tasks would benefit from these alternative techniques.

© All rights reserved Beitzel et al. and/or ACM Press
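The best-performing filter described above simply counts query terms that appear in a story's title; a sketch of such a threshold filter (the threshold value and example strings are illustrative):

```python
def passes_filter(query, title, threshold=2):
    """Keep a candidate news story only if at least `threshold` of the
    query's terms appear in its title; erring toward dropping stories
    suits this ultra-high-precision task."""
    qterms = set(query.lower().split())
    tterms = set(title.lower().split())
    return len(qterms & tterms) >= threshold

print(passes_filter("mars rover landing", "NASA Mars rover lands safely"))  # True
print(passes_filter("mars rover landing", "Stock markets rally"))           # False
```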

 

Beitzel, Steven M., Jensen, Eric C., Chowdhury, Abdur, Grossman, David A., Frieder, Ophir and Goharian, Nazli (2004): Fusion of effective retrieval strategies in the same information retrieval system. In JASIST - Journal of the American Society for Information Science and Technology, 55 (10) pp. 859-868. Available online

2003
 

Beitzel, Steven M., Jensen, Eric C., Chowdhury, Abdur, Grossman, David A. and Frieder, Ophir (2003): Using manually-built web directories for automatic evaluation of known-item retrieval. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2003. pp. 373-374. Available online

Information retrieval system evaluation is complicated by the need for manually assessed relevance judgments. Large manually built directories on the web open the door to new evaluation procedures. By assuming that web pages are the known relevant items for queries that exactly match their title, we use the ODP (Open Directory Project) and LookSmart directories for system evaluation. We test our approach with a sample from a log of ten million web queries and show that such an evaluation is unbiased in terms of the directory used, stable with respect to the query set selected, and correlated with a reasonably large manual evaluation.

© All rights reserved Beitzel et al. and/or ACM Press
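The resulting evaluation loop is straightforward: treat each directory entry's title as a query, its listed URL as the known relevant item, and score the engine by reciprocal rank. A minimal sketch (the directory entries and the stub search function are placeholders):

```python
# Hypothetical directory entries: page title -> listed URL. Titles act
# as queries; the listed URL is taken to be the known relevant item.
DIRECTORY = {"python software foundation": "python.org",
             "internet movie database": "imdb.com"}

def search(query):               # stand-in for the engine under test
    return ["example.com", "python.org", "imdb.com"]

def mean_reciprocal_rank(directory, depth=10):
    total = 0.0
    for title, url in directory.items():
        results = search(title)[:depth]
        if url in results:
            total += 1.0 / (results.index(url) + 1)
    return total / len(directory)

print(mean_reciprocal_rank(DIRECTORY))
```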

 

Beitzel, Steven M., Jensen, Eric C., Chowdhury, Abdur and Grossman, David A. (2003): Using titles and category names from editor-driven taxonomies for automatic evaluation. In: Proceedings of the 2003 ACM CIKM International Conference on Information and Knowledge Management November 2-8, 2003, New Orleans, Louisiana, USA. pp. 17-23. Available online

 

Chowdhury, Abdur and Pass, Greg (2003): Operational requirements for scalable search systems. In: Proceedings of the 2003 ACM CIKM International Conference on Information and Knowledge Management November 2-8, 2003, New Orleans, Louisiana, USA. pp. 435-442. Available online

 

Ma, Ling, Goharian, Nazli, Chowdhury, Abdur and Chung, Misun (2003): Extracting unstructured data from template generated web documents. In: Proceedings of the 2003 ACM CIKM International Conference on Information and Knowledge Management November 2-8, 2003, New Orleans, Louisiana, USA. pp. 512-515. Available online

2002
 

Chowdhury, Abdur, Frieder, Ophir, Grossman, David A. and McCabe, Mary Catherine (2002): Collection statistics for fast duplicate document detection. In ACM Transactions on Information Systems, 20 (2) pp. 171-191. Available online

We present a new algorithm for duplicate document detection that uses collection statistics. We compare our approach with the state-of-the-art approach using multiple collections. These collections include a 30 MB, 18,577-document web collection developed by Excite@Home and three NIST collections. The first NIST collection consists of 100 MB of 18,232 LA Times documents, roughly similar in document count to the Excite@Home collection. The other two collections are both 2 GB: a 247,491-document web collection and the 528,023-document collection from TREC disks 4 and 5. We show that our approach, called I-Match, scales in terms of the number of documents and works well for documents of all sizes. We compared our solution to the state of the art and found that, in addition to improved detection accuracy, our approach executed in roughly one-fifth the time.

© All rights reserved Chowdhury et al. and/or ACM Press
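The core of I-Match fits in a few lines: filter each document's terms by collection-wide idf, then hash the surviving term set, so equal hashes flag (near-)duplicates. The sketch below is a toy rendering; the published algorithm evaluates several idf filtration strategies, and the band thresholds here are arbitrary:

```python
import hashlib
import math
from collections import Counter

def imatch_signatures(docs, low=0.25, high=0.75):
    """Toy I-Match: keep each document's terms whose normalized idf
    falls in [low, high], then hash the surviving term set."""
    tokenized = [set(d.lower().split()) for d in docs]
    df = Counter(t for terms in tokenized for t in terms)
    n = len(docs)
    idf = {t: math.log(n / df[t]) for t in df}
    max_idf = max(idf.values())
    sigs = []
    for terms in tokenized:
        kept = sorted(t for t in terms if low <= idf[t] / max_idf <= high)
        # Documents retaining no terms cannot be signed or compared.
        sigs.append(hashlib.sha1(" ".join(kept).encode()).hexdigest()
                    if kept else None)
    return sigs

docs = [
    "breaking news the senate passes budget bill",
    "breaking news the senate passes budget bill today",
    "easy recipe for chocolate chip cookies",
    "local weekend weather forecast",
]
print(imatch_signatures(docs))  # first two collide; last two go unsigned
```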

 

Chowdhury, Abdur, McCabe, M. Catherine, Grossman, David A. and Frieder, Ophir (2002): Document normalization revisited. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2002. pp. 381-382. Available online

Cosine pivoted document length normalization has reached a point of stability where many researchers indiscriminately apply a slope value of 0.2 regardless of the collection. Our efforts, however, demonstrate that applying this specific value without tuning for the document collection degrades average precision.

© All rights reserved Chowdhury et al. and/or ACM Press
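Pivoted normalization replaces each document's cosine norm with a value interpolated toward the collection average; the slope is the parameter the abstract argues must be tuned per collection rather than fixed at 0.2. A sketch with toy vectors and raw term frequencies as weights:

```python
import math

def cosine_norm(tf_vector):
    """Classic cosine length norm: sqrt of the sum of squared weights."""
    return math.sqrt(sum(w * w for w in tf_vector.values()))

def pivoted_norm(tf_vector, pivot, slope=0.2):
    """Pivoted cosine normalization: interpolate between the
    collection-average norm (the pivot) and the document's own norm.
    slope=0.2 is the value often applied without tuning."""
    return (1.0 - slope) * pivot + slope * cosine_norm(tf_vector)

# Hypothetical term-frequency vectors for three documents:
docs = [{"web": 3, "search": 2}, {"web": 1}, {"query": 5, "log": 4, "web": 2}]
pivot = sum(cosine_norm(d) for d in docs) / len(docs)
for slope in (0.1, 0.2, 0.4):
    print(slope, [round(pivoted_norm(d, pivot, slope), 2) for d in docs])
```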

 

Chowdhury, Abdur and Soboroff, Ian (2002): Automatic evaluation of world wide web search services. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2002. pp. 421-422. Available online

Users of the World Wide Web are confronted not only by an immense overabundance of information, but also by a plethora of tools for searching for the web pages that suit their information needs. Web search engines differ widely in interface, features, coverage of the web, ranking methods, delivery of advertising, and more. In this paper, we present a method for comparing search engines automatically based on how they rank known-item search results. Because the engines search overlapping (but different) subsets of the web collected at different points in time, search engine evaluation poses significant challenges to traditional information retrieval methodology. Our method uses known-item searching, comparing the relative ranks of the items in the search engines' rankings. Our approach automatically constructs known-item queries using query-log analysis and automatically identifies the known items via analysis of editor comments from the ODP (Open Directory Project). Additionally, we present our comparison of five well-known search services (Lycos, Netscape, Fast, Google, HotBot) and find that some services perform known-item searches better than others, but the majority are statistically equivalent.

© All rights reserved Chowdhury and Soboroff and/or ACM Press

2001
 

Chowdhury, Abdur, Frieder, Ophir, Grossman, David A. and McCabe, Catherine (2001): Analyses of multiple-evidence combinations for retrieval strategies. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2001. pp. 394-395. Available online

2000
 

McCabe, M. Catherine, Lee, Jinho, Chowdhury, Abdur, Grossman, David A. and Frieder, Ophir (2000): On the Design and Evaluation of a Multi-Dimensional Approach to Information Retrieval. In: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2000. pp. 363-365. Available online

1999
 

McCabe, M. Catherine, Chowdhury, Abdur, Grossman, David A. and Frieder, Ophir (1999): A Unified Environment for Fusion of Information Retrieval Approaches. In: Proceedings of the 1999 ACM CIKM International Conference on Information and Knowledge Management November 2-6, 1999, Kansas City, Missouri, USA. pp. 330-334. Available online

 
 
 


Page Information

Page maintainer: The Editorial Team
URL: http://www.interaction-design.org/references/authors/abdur_chowdhury.html