W. Bruce Croft

Has also published under the name of:
"W. B. Croft"




Publications by W. Bruce Croft (bibliography)

» 2008 «

Lee, Kyung Soon, Croft, W. Bruce and Allan, James (2008): A cluster-based resampling method for pseudo-relevance feedback. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2008. pp. 235-242. Available online

Typical pseudo-relevance feedback methods assume the top-retrieved documents are relevant and use these pseudo-relevant documents to expand terms. The initial retrieval set can, however, contain a great deal of noise. In this paper, we present a cluster-based resampling method to select better pseudo-relevant documents based on the relevance model. The main idea is to use document clusters to find dominant documents for the initial retrieval set, and to repeatedly feed the documents to emphasize the core topics of a query. Experimental results on large-scale web TREC collections show significant improvements over the relevance model. For justification of the resampling approach, we examine relevance density of feedback documents. A higher relevance density will result in greater retrieval accuracy, ultimately approaching true relevance feedback. The resampling approach shows higher relevance density than the baseline relevance model on all collections, resulting in better retrieval accuracy in pseudo-relevance feedback. This result indicates that the proposed method is effective for pseudo-relevance feedback.

Xue, Xiaobing, Jeon, Jiwoon and Croft, W. Bruce (2008): Retrieval models for question and answer archives. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2008. pp. 475-482. Available online

Retrieval in a question and answer archive involves finding good answers for a user's question. In contrast to typical document retrieval, a retrieval model for this task can exploit question similarity as well as ranking the associated answers. In this paper, we propose a retrieval model that combines a translation-based language model for the question part with a query likelihood approach for the answer part. The proposed model incorporates word-to-word translation probabilities learned through exploiting different sources of information. Experiments show that the proposed translation-based language model for the question part outperforms baseline methods significantly. By combining with the query likelihood language model for the answer part, substantial additional effectiveness improvements are obtained.

Bendersky, Michael and Croft, W. Bruce (2008): Discovering key concepts in verbose queries. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2008. pp. 491-498. Available online

Current search engines do not, in general, perform well with longer, more verbose queries. One of the main issues in processing these queries is identifying the key concepts that will have the most impact on effectiveness. In this paper, we develop and evaluate a technique that uses query-dependent, corpus-dependent, and corpus-independent features for automatic extraction of key concepts from verbose queries. We show that our method achieves higher accuracy in the identification of key concepts than standard weighting methods such as inverse document frequency. Finally, we propose a probabilistic model for integrating the weighted key concepts identified by our method into a query, and demonstrate that this integration significantly improves retrieval effectiveness for a large set of natural language description queries derived from TREC topics on several newswire and web collections.

Seo, Jangwon and Croft, W. Bruce (2008): Local text reuse detection. In: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2008. pp. 571-578. Available online

Text reuse occurs in many different types of documents and for many different reasons. One form of reuse, duplicate or near-duplicate documents, has been a focus of researchers because of its importance in Web search. Local text reuse occurs when sentences, facts or passages, rather than whole documents, are reused and modified. Detecting this type of reuse can be the basis of new tools for text analysis. In this paper, we introduce a new approach to detecting local text reuse and compare it to other approaches. This comparison involves a study of the amount and type of reuse that occurs in real documents, including TREC newswire and blog collections.

Seo, Jangwon and Croft, W. Bruce (2008): Blog site search using resource selection. In: Shanahan, James G., Amer-Yahia, Sihem, Manolescu, Ioana, Zhang, Yi, Evans, David A., Kolcz, Aleksander, Choi, Key-Sun and Chowdhury, Abdur (eds.) Proceedings of the 17th ACM Conference on Information and Knowledge Management - CIKM 2008 October 26-30, 2008, Napa Valley, California, USA. pp. 1053-1062. Available online

Croft, W. Bruce (2008): Unsolved problems in search: (and how we approach them). In: Shanahan, James G., Amer-Yahia, Sihem, Manolescu, Ioana, Zhang, Yi, Evans, David A., Kolcz, Aleksander, Choi, Key-Sun and Chowdhury, Abdur (eds.) Proceedings of the 17th ACM Conference on Information and Knowledge Management - CIKM 2008 October 26-30, 2008, Napa Valley, California, USA. p. 1001. Available online

» 2007 «

Strohman, Trevor and Croft, W. Bruce (2007): Efficient document retrieval in main memory. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2007. pp. 175-182. Available online

Disk access performance is a major bottleneck in traditional information retrieval systems. Compared to system memory, disk bandwidth is poor, and seek times are worse. We circumvent this problem by considering query evaluation strategies in main memory. We show how new accumulator trimming techniques combined with inverted list skipping can produce extremely high performance retrieval systems without resorting to methods that may harm effectiveness. We evaluate our techniques using Galago, a new retrieval system designed for efficient query processing. Our system achieves a 69% improvement in query throughput over previous methods.

Metzler, Donald and Croft, W. Bruce (2007): Latent concept expansion using markov random fields. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2007. pp. 311-318. Available online

Query expansion, in the form of pseudo-relevance feedback or relevance feedback, is a common technique used to improve retrieval effectiveness. Most previous approaches have ignored important issues, such as the role of features and the importance of modeling term dependencies. In this paper, we propose a robust query expansion technique based on the Markov random field model for information retrieval. The technique, called latent concept expansion, provides a mechanism for modeling term dependencies during expansion. Furthermore, the use of arbitrary features within the model provides a powerful framework for going beyond simple term occurrence features that are implicitly used by most other expansion techniques. We evaluate our technique against relevance models, a state-of-the-art language modeling query expansion technique. Our model demonstrates consistent and significant improvements in retrieval effectiveness across several TREC data sets. We also describe how our technique can be used to generate meaningful multi-term concepts for tasks such as query suggestion/reformulation.

Zhou, Yun and Croft, W. Bruce (2007): Query performance prediction in web search environments. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2007. pp. 543-550. Available online

Current prediction techniques, which are generally designed for content-based queries and are typically evaluated on relatively homogeneous test collections of small sizes, face serious challenges in web search environments where collections are significantly more heterogeneous and different types of retrieval tasks exist. In this paper, we present three techniques to address these challenges. We focus on performance prediction for two types of queries in web search environments: content-based and Named-Page finding. Our evaluation is mainly performed on the GOV2 collection. In addition to evaluating our models for the two types of queries separately, we consider a more challenging and realistic situation in which the two types of queries are mixed together without prior information on query types. To assist prediction under the mixed-query situation, a novel query classifier is adopted. Results show that our prediction of web query performance is substantially more accurate than the current state-of-the-art prediction techniques. Consequently, our paper provides a practical approach to performance prediction in real-world web settings.

Strohman, Trevor, Croft, W. Bruce and Jensen, David (2007): Recommending citations for academic papers. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2007. pp. 705-706. Available online

We approach the problem of academic literature search by considering an unpublished manuscript as a query to a search system. We use the text of previous literature as well as the citation graph that connects it to find relevant related material. We evaluate our technique with manual and automatic evaluation methods, and find an order of magnitude improvement in mean average precision as compared to a text similarity baseline.

Yi, Xing, Allan, James and Croft, W. Bruce (2007): Matching resumes and jobs based on relevance models. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2007. pp. 809-810. Available online

We investigate the difficult problem of matching semi-structured resumes and jobs in a large-scale real-world collection. We compare standard approaches to Structured Relevance Models (SRM), an extension of relevance-based language models for modeling and retrieving semi-structured documents. Preliminary experiments show that the SRM approach achieved promising performance and performed better than typical unstructured relevance models.

Balasubramanian, Niranjan, Allan, James and Croft, W. Bruce (2007): A comparison of sentence retrieval techniques. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2007. pp. 813-814. Available online

Identifying redundant information in sentences is useful for several applications such as summarization, document provenance, detecting text reuse and novelty detection. The task of identifying redundant information in sentences is defined as follows: given a query sentence, the task is to retrieve sentences from a given collection that express all or some subset of the information present in the query sentence. Sentence retrieval techniques rank sentences based on some measure of their similarity to a query. The effectiveness of such techniques depends on the similarity measure used to rank sentences. An effective retrieval model should be able to handle low word overlap between query and candidate sentences and go beyond just word overlap. Simple language modeling techniques like query likelihood retrieval have outperformed TF-IDF and word-overlap-based methods for ranking sentences. In this paper, we compare the performance of sentence retrieval using different language modeling techniques for the problem of identifying redundant information.
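
Query likelihood, the simple language-modeling baseline named above, ranks each sentence by the probability its smoothed language model assigns to the query. A minimal Python sketch, assuming Dirichlet smoothing against the pool of candidate sentences and an illustrative smoothing parameter mu = 10 (neither choice is stated in the abstract); `rank_sentences` is a hypothetical helper, not the authors' code:

```python
import math
from collections import Counter
from typing import List, Tuple


def rank_sentences(query: List[str], sentences: List[List[str]],
                   mu: float = 10.0) -> List[Tuple[int, float]]:
    """Rank tokenized sentences by Dirichlet-smoothed query log-likelihood.

    Returns (sentence_index, log P(Q|S)) pairs, best first.
    """
    coll = Counter(t for s in sentences for t in s)  # candidate pool acts as the "collection"
    total = sum(coll.values())
    # Terms never seen in the pool would give P(q|C) = 0 and log(0); drop them.
    q = [t for t in query if t in coll]
    scored = []
    for i, s in enumerate(sentences):
        tf = Counter(s)
        # P(q|S) = (tf(q,S) + mu * P(q|C)) / (|S| + mu)
        score = sum(math.log((tf[t] + mu * coll[t] / total) / (len(s) + mu))
                    for t in q)
        scored.append((i, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored
```

For example, `rank_sentences(["cat", "sat"], [["the", "cat", "sat"], ["a", "dog", "barked"]])` places the first sentence on top, because it is the only one that contains both query terms; smoothing still gives the second sentence a finite score.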

Petkova, Desislava and Croft, W. Bruce (2007): Proximity-based document representation for named entity retrieval. In: Silva, Mario J., Laender, Alberto H. F., Baeza-Yates, Ricardo A., McGuinness, Deborah L., Olstad, Bjørn, Olsen, Øystein Haug and Falcão, André O. (eds.) Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management - CIKM 2007 November 6-10, 2007, Lisbon, Portugal. pp. 731-740. Available online

» 2006 «

Jeon, Jiwoon, Croft, W. Bruce, Lee, Joon Ho and Park, Soyeon (2006): A framework to predict the quality of answers with non-textual features. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2006. pp. 228-235. Available online

New types of document collections are being developed by various web services. The service providers keep track of non-textual features such as click counts. In this paper, we present a framework to use non-textual features to predict the quality of documents. We also show our quality measure can be successfully incorporated into the language modeling-based retrieval model. We test our approach on a collection of question and answer pairs gathered from a community-based question answering service where people ask and answer questions. Experimental results using our quality measure show a significant improvement over our baseline.

Liu, Xiaoyong and Croft, W. Bruce (2006): Representing clusters for retrieval. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2006. pp. 671-672. Available online

Li, Xiaoyan and Croft, W. Bruce (2006): Improving novelty detection for general topics using sentence level information patterns. In: Yu, Philip S., Tsotras, Vassilis J., Fox, Edward A. and Liu, Bing (eds.) Proceedings of the 2006 ACM CIKM International Conference on Information and Knowledge Management November 6-11, 2006, Arlington, Virginia, USA. pp. 238-247. Available online

Eguchi, Koji and Croft, W. Bruce (2006): Boosting relevance model performance with query term dependence. In: Yu, Philip S., Tsotras, Vassilis J., Fox, Edward A. and Liu, Bing (eds.) Proceedings of the 2006 ACM CIKM International Conference on Information and Knowledge Management November 6-11, 2006, Arlington, Virginia, USA. pp. 792-793. Available online

Shah, Chirag, Croft, W. Bruce and Jensen, David (2006): Representing documents with named entities for story link detection (SLD). In: Yu, Philip S., Tsotras, Vassilis J., Fox, Edward A. and Liu, Bing (eds.) Proceedings of the 2006 ACM CIKM International Conference on Information and Knowledge Management November 6-11, 2006, Arlington, Virginia, USA. pp. 868-869. Available online

Zhou, Yun and Croft, W. Bruce (2006): Ranking robustness: a novel framework to predict query performance. In: Yu, Philip S., Tsotras, Vassilis J., Fox, Edward A. and Liu, Bing (eds.) Proceedings of the 2006 ACM CIKM International Conference on Information and Knowledge Management November 6-11, 2006, Arlington, Virginia, USA. pp. 567-574. Available online

» 2005 «

Strohman, Trevor, Turtle, Howard and Croft, W. Bruce (2005): Optimization strategies for complex queries. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2005. pp. 219-225. Available online

Previous research into the efficiency of text retrieval systems has dealt primarily with methods that consider inverted lists in sequence; these methods are known as term-at-a-time methods. However, the literature for optimizing document-at-a-time systems remains sparse. We present an improvement to the max_score optimization, which is the most efficient known document-at-a-time scoring method. Like max_score, our technique, called term bounded max_score, is guaranteed to return exactly the same scores and documents as an unoptimized evaluation, which is particularly useful for query model research. We simulated our technique to explore the problem space, then implemented it in Indri, our large scale language modeling search engine. Tests with the GOV2 corpus on title queries show our method to be 23% faster than max_score alone, and 61% faster than our document-at-a-time baseline. Our optimized query times are competitive with conventional term-at-a-time systems on this year's TREC Terabyte task.
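
The abstract builds on max_score without spelling it out. The Python sketch below shows the standard document-at-a-time max_score idea, not the paper's term-bounded variant: terms are ordered by their largest per-document score, and once the k-th best score (the threshold) reaches the combined bound of the lowest-bound lists, those lists become "non-essential". Only documents in the remaining lists are enumerated, and non-essential lists are probed by skipping only while a document could still enter the top k. It assumes non-negative, precomputed per-term scores in non-empty, in-memory posting lists sorted by document id; `max_score_topk` and this posting layout are hypothetical.

```python
import bisect
import heapq
from typing import Dict, List, Tuple

Postings = List[Tuple[int, float]]  # (doc_id, term score), sorted by doc_id


def max_score_topk(lists: Dict[str, Postings], k: int) -> List[Tuple[float, int]]:
    """Exact top-k (score, doc_id) by document-at-a-time max_score pruning."""
    if k <= 0:
        return []
    # Order terms by upper bound (max per-document score), smallest first.
    terms = sorted(lists, key=lambda t: max(s for _, s in lists[t]))
    prefix, acc = [], 0.0  # prefix[i] = sum of upper bounds of terms[0..i]
    for t in terms:
        acc += max(s for _, s in lists[t])
        prefix.append(acc)
    docs = [[d for d, _ in lists[t]] for t in terms]
    scores = [[s for _, s in lists[t]] for t in terms]
    pos = [0] * len(terms)
    heap: List[Tuple[float, int]] = []  # min-heap holding the current top k
    theta = 0.0                         # k-th best score once the heap is full
    first_essential = 0                 # lists [0, first_essential) are non-essential
    while True:
        heads = [docs[i][pos[i]] for i in range(first_essential, len(terms))
                 if pos[i] < len(docs[i])]
        if not heads:
            break
        d = min(heads)  # next candidate comes from essential lists only
        score = 0.0
        for i in range(first_essential, len(terms)):
            if pos[i] < len(docs[i]) and docs[i][pos[i]] == d:
                score += scores[i][pos[i]]
                pos[i] += 1
        # Probe non-essential lists (largest bound first) while d can still beat theta.
        for i in range(first_essential - 1, -1, -1):
            if score + prefix[i] <= theta:
                break
            j = bisect.bisect_left(docs[i], d, pos[i])  # skip forward to d
            pos[i] = j
            if j < len(docs[i]) and docs[i][j] == d:
                score += scores[i][j]
        if len(heap) < k:
            heapq.heappush(heap, (score, d))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, d))
        if len(heap) == k:
            theta = heap[0][0]
            while first_essential < len(terms) and prefix[first_essential] <= theta:
                first_essential += 1
    return sorted(heap, reverse=True)
```

Like the technique described in the abstract, this pruning is safe: a document found only in non-essential lists has a total score of at most their combined bound, which is no greater than the threshold, so the returned top k matches an exhaustive evaluation.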

Jeon, Jiwoon, Croft, W. Bruce and Lee, Joon Ho (2005): Finding semantically similar questions based on their answers. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2005. pp. 617-618. Available online

A large number of question and answer pairs can be collected from question and answer boards and FAQ pages on the Web. This paper proposes an automatic method of finding the questions that have the same meaning. The method can detect semantically similar questions that have little word overlap because it calculates question-question similarities by using the corresponding answers as well as the questions. We develop two different similarity measures based on language modeling and compare them with the traditional similarity measures. Experimental results show that semantically similar question pairs can be effectively found with the proposed similarity measures.

Metzler, Donald, Bernstein, Yaniv, Croft, W. Bruce, Moffat, Alistair and Zobel, Justin (2005): The recap system for identifying information flow. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2005. p. 678. Available online

Jeon, Jiwoon, Croft, W. Bruce and Lee, Joon Ho (2005): Finding similar questions in large question and answer archives. In: Herzog, Otthein, Schek, Hans-Jörg and Fuhr, Norbert (eds.) Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management October 31 - November 5, 2005, Bremen, Germany. pp. 84-90. Available online

Liu, Xiaoyong, Croft, W. Bruce and Koll, Matthew B. (2005): Finding experts in community-based question-answering services. In: Herzog, Otthein, Schek, Hans-Jörg and Fuhr, Norbert (eds.) Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management October 31 - November 5, 2005, Bremen, Germany. pp. 315-316. Available online

Li, Xiaoyan and Croft, W. Bruce (2005): Novelty detection based on sentence level patterns. In: Herzog, Otthein, Schek, Hans-Jörg and Fuhr, Norbert (eds.) Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management October 31 - November 5, 2005, Bremen, Germany. pp. 744-751. Available online

Metzler, Donald, Bernstein, Yaniv, Croft, W. Bruce, Moffat, Alistair and Zobel, Justin (2005): Similarity measures for tracking information flow. In: Herzog, Otthein, Schek, Hans-Jörg and Fuhr, Norbert (eds.) Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management October 31 - November 5, 2005, Bremen, Germany. pp. 517-524. Available online

Zhou, Yun and Croft, W. Bruce (2005): Document quality models for web ad hoc retrieval. In: Herzog, Otthein, Schek, Hans-Jörg and Fuhr, Norbert (eds.) Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management October 31 - November 5, 2005, Bremen, Germany. pp. 331-332. Available online

Abiteboul, Serge, Agrawal, Rakesh, Bernstein, Philip A., Carey, Michael J., Ceri, Stefano, Croft, W. Bruce, DeWitt, David J., Franklin, Michael J., Garcia-Molina, Hector, Gawlick, Dieter, Gray, Jim, Haas, Laura M., Halevy, Alon Y., Hellerstein, Joseph M., Ioannidis, Yannis E., Kersten, Martin L. and Pazzani, Michael J. (2005): The Lowell database research self-assessment. In Communications of the ACM, 48 (5) pp. 111-118

» 2004 «

Shah, Chirag and Croft, W. Bruce (2004): Evaluating high accuracy retrieval techniques. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 2-9. Available online

Although information retrieval research has always been concerned with improving the effectiveness of search, in some applications, such as information analysis, a more specific requirement exists for high accuracy retrieval. This means that achieving high precision in the top document ranks is paramount. In this paper we present work aimed at achieving high accuracy in ad-hoc document retrieval by incorporating approaches from question answering (QA). We focus on getting the first relevant result as high as possible in the ranked list and argue that traditional precision and recall are not appropriate measures for evaluating this task. We instead use the mean reciprocal rank (MRR) of the first relevant result. We evaluate three different methods for modifying queries to achieve high accuracy. The experiments done on TREC data provide support for the approach of using MRR and incorporating QA techniques for getting high accuracy in the ad-hoc retrieval task.
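
Mean reciprocal rank, as used above, averages 1/rank of the first relevant document over all queries, scoring a query as 0 when no relevant document is retrieved. A minimal Python sketch; the function names and the (ranking, relevant set) input layout are illustrative assumptions:

```python
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(ranking: Sequence[str], relevant: Set[str]) -> float:
    """1/rank of the first relevant document (ranks start at 1), else 0.0."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranking, relevant_set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Three hypothetical queries: first relevant hit at rank 2, rank 1, and never.
runs = [(["d3", "d1", "d7"], {"d1"}), (["d2", "d9"], {"d2"}), (["d4", "d5"], {"d8"})]
print(mean_reciprocal_rank(runs))  # 0.5
```

Only the first relevant document matters, which is why MRR rewards pushing that one result toward the top rather than improving recall further down the list.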

Liu, Xiaoyong and Croft, W. Bruce (2004): Cluster-based retrieval using language models. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 186-193. Available online

Previous research on cluster-based retrieval has been inconclusive as to whether it does bring improved retrieval effectiveness over document-based retrieval. Recent developments in the language modeling approach to IR have motivated us to re-examine this problem within this new retrieval framework. We propose two new models for cluster-based retrieval and evaluate them on several TREC collections. We show that cluster-based retrieval can perform consistently across collections of realistic size, and significant improvements over document-based retrieval can be obtained in a fully automatic manner and without relevance information provided by humans.

Corrada-Emmanuel, Andres and Croft, W. Bruce (2004): Answer models for question answering passage retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 516-517. Available online

Answer patterns have been shown to improve the performance of open-domain factoid QA systems. Their use, however, requires either constructing the patterns manually or developing algorithms for learning them automatically. We present here a simpler approach that extends the techniques of language modeling to create answer models. These are language models trained on the correct answers to training questions. We show how they fit naturally into a probabilistic model for answer passage retrieval and demonstrate their effectiveness on the TREC 2002 QA Corpus.

Metzler, Donald, Lavrenko, Victor and Croft, W. Bruce (2004): Formal multiple-bernoulli models for language modeling. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 540-541. Available online

Liu, Xiaoyong, Croft, W. Bruce, Oh, Paul and Hart, David (2004): Automatic recognition of reading levels from user queries. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 548-549. Available online

Cronen-Townsend, Stephen, Zhou, Yun and Croft, W. Bruce (2004): A framework for selective query expansion. In: Grossman, David A., Gravano, Luis, Zhai, Chengxiang, Herzog, Otthein and Evans, David A. (eds.) Proceedings of the 2004 ACM CIKM International Conference on Information and Knowledge Management November 8-13, 2004, Washington, DC, USA. pp. 236-237. Available online

Croft, W. Bruce and Callan, Jamie (2004): A Language Modeling Approach to Metadata for Cross-Database Linkage and Search. In: DG.O 2004. Available online

Wei, Xing, Croft, W. Bruce and Pinto, David (2004): Question Answering Performance on Table Data. In: DG.O 2004. Available online

» 2003 «

Croft, W. Bruce (2003): Information retrieval and computer science: an evolving relationship. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2003. pp. 2-3. Available online

Following the tradition of these acceptance talks, I will be giving my thoughts on where our field is going. Any discussion of the future of information retrieval (IR) research, however, needs to be placed in the context of its history and relationship to other fields. Although IR has had a very strong relationship with library and information science, its relationship to computer science (CS) and its relative standing as a sub-discipline of CS has been more dynamic. IR is quite an old field, and when a number of CS departments were forming in the 60s, it was not uncommon for a faculty member to be pursuing research related to IR. Early ACM curriculum recommendations for CS contained courses on information retrieval, and encyclopedias described IR and database systems as different aspects of the same field.

Pinto, David, McCallum, Andrew, Wei, Xing and Croft, W. Bruce (2003): Table extraction using conditional random fields. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2003. pp. 235-242. Available online

The ability to find tables and extract information from them is a necessary component of data mining, question answering, and other information retrieval tasks. Documents often contain tables in order to communicate densely packed, multi-dimensional information. Tables do this by employing layout patterns to efficiently indicate fields and records in two-dimensional form. Their rich combination of formatting and content presents difficulties for traditional language modeling techniques, however. This paper presents the use of conditional random fields (CRFs) for table extraction, and compares them with hidden Markov models (HMMs). Unlike HMMs, CRFs support the use of many rich and overlapping layout and language features, and as a result, they perform significantly better. We show experimental results on plain-text government statistical reports in which tables are located with 92% F1, and their constituent lines are classified into 12 table-related categories with 94% accuracy. We also discuss future work on undirected graphical models for segmenting columns, finding cells, and classifying them as data cells or label cells.

Lawrie, Dawn J. and Croft, W. Bruce (2003): Generating hierarchical summaries for web searches. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2003. pp. 457-458. Available online

Hierarchies provide a means of organizing, summarizing and accessing information. We describe a method for automatically generating hierarchies from small collections of text, and then apply this technique to summarizing the documents retrieved by a search engine.

Li, Xiaoyan and Croft, W. Bruce (2003): Time-based language models. In: Proceedings of the 2003 ACM CIKM International Conference on Information and Knowledge Management November 2-8, 2003, New Orleans, Louisiana, USA. pp. 469-475. Available online

Nallapati, Ramesh, Croft, W. Bruce and Allan, James (2003): Relevant query feedback in statistical language modeling. In: Proceedings of the 2003 ACM CIKM International Conference on Information and Knowledge Management November 2-8, 2003, New Orleans, Louisiana, USA. pp. 560-563. Available online

Pinto, David, McCallum, Andrew, Wei, Xing and Croft, W. Bruce (2003): Table Extraction Using Conditional Random Fields. In: DG.O 2003. Available online

» 2002 «

Pinto, David, Branstein, Michael, Coleman, Ryan, Croft, W. Bruce, King, Matthew, Li, Wei and Wei, Xing (2002): QuASM: a system for question answering using semi-structured data. In: JCDL02: Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries 2002. pp. 46-55. Available online

This paper describes a system for question answering using semi-structured metadata, QuASM (pronounced "chasm"). Question answering systems aim to improve search performance by providing users with specific answers, rather than having users scan retrieved documents for these answers. Our goal is to answer factual questions by exploiting the structure inherent in documents found on the World Wide Web (WWW). Based on this structure, documents are indexed into smaller units and associated with metadata. Transforming table cells into smaller units associated with metadata is an important part of this task. In addition, we report on work to improve question classification using language models. The domain used to develop this system is documents retrieved from a crawl of www.fedstats.gov.

Lavrenko, Victor, Choquette, Martin and Croft, W. Bruce (2002): Cross-lingual relevance models. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2002. pp. 175-182. Available online

We propose a formal model of Cross-Language Information Retrieval that does not rely on either query translation or document translation. Our approach leverages recent advances in language modeling to directly estimate an accurate topic model in the target language, starting with a query in the source language. The model integrates popular techniques of disambiguation and query expansion in a unified formal framework. We describe how the topic model can be estimated with either a parallel corpus or a dictionary. We test the framework by constructing Chinese topic models from English queries and using them in the CLIR task of TREC9. The model achieves performance around 95% of the strong mono-lingual baseline in terms of average precision. In initial precision, our

Cronen-Townsend, Steve, Zhou, Yun and Croft, W. Bruce (2002): Predicting query performance. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2002. pp. 299-306. Available online

We develop a method for predicting query performance by computing the relative entropy between a query language model and the corresponding collection language model. The resulting clarity score measures the coherence of the language usage in documents whose models are likely to generate the query. We suggest that clarity scores measure the ambiguity of a query with respect to a collection of documents and show that they correlate positively with average precision in a variety of TREC test sets. Thus, the clarity score may be used to identify ineffective queries, on average, without relevance information. We develop an algorithm for automatically setting the clarity score threshold between predicted poorly-performing queries and acceptable queries and validate it using TREC data. In particular, we compare the automatic thresholds to optimum thresholds and also check how frequently results as good are achieved in sampling experiments that randomly assign queries to the two classes.
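
The clarity score described above is the relative entropy, in bits, between the query language model and the collection language model: clarity = Σ_w P(w|Q) log2(P(w|Q) / P_coll(w)), where P(w|Q) = Σ_D P(w|D) P(D|Q) and P(D|Q) is proportional to P(Q|D). A minimal Python sketch under illustrative assumptions: document models use linear (Jelinek-Mercer) smoothing with λ = 0.6, document priors are uniform, and one small tokenized document list serves as both the collection and the set of documents summed over (in practice the sum usually runs over top-ranked documents of a larger collection). `clarity_score` is a hypothetical helper:

```python
import math
from collections import Counter
from typing import Dict, List


def clarity_score(query: List[str], docs: List[List[str]], lam: float = 0.6) -> float:
    """KL divergence (bits) between the query model P(w|Q) and the collection model."""
    counts = Counter(t for d in docs for t in d)
    total = sum(counts.values())
    p_coll: Dict[str, float] = {w: c / total for w, c in counts.items()}
    q = [t for t in query if t in p_coll]  # unseen terms would zero every P(Q|D)
    if not q:
        return 0.0
    # Smoothed document models: lam * P_ml(w|D) + (1 - lam) * P_coll(w).
    models = []
    for d in docs:
        tf, n = Counter(d), len(d)
        models.append({w: lam * (tf[w] / n if n else 0.0) + (1 - lam) * pc
                       for w, pc in p_coll.items()})
    # P(D|Q) proportional to P(Q|D), computed in log space to avoid underflow.
    log_liks = [sum(math.log(m[t]) for t in q) for m in models]
    top = max(log_liks)
    weights = [math.exp(l - top) for l in log_liks]
    z = sum(weights)
    posterior = [w / z for w in weights]
    score = 0.0
    for w, pc in p_coll.items():
        p_wq = sum(p * m[w] for p, m in zip(posterior, models))
        if p_wq > 0:
            score += p_wq * math.log2(p_wq / pc)
    return score
```

A query whose likely documents share the collection's overall word distribution scores near zero (ambiguous), while a query that concentrates probability on a distinctive subset of documents scores higher, matching the abstract's reading of clarity as the inverse of ambiguity.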

Murdock, Vanessa and Croft, W. Bruce (2002): Task orientation in question answering. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2002. pp. 355-356. Available online

Kelly, Diane, Yuan, Xiao-Jun, Belkin, Nicholas J., Murdock, Vanessa and Croft, W. Bruce (2002): Features of documents relevant to task- and fact-oriented questions. In: Proceedings of the 2002 ACM CIKM International Conference on Information and Knowledge Management November 4-9, 2002, McLean, VA, USA. pp. 645-647. Available online

Liu, Xiaoyong and Croft, W. Bruce (2002): Passage retrieval based on language models. In: Proceedings of the 2002 ACM CIKM International Conference on Information and Knowledge Management November 4-9, 2002, McLean, VA, USA. pp. 375-382. Available online

Luk, Robert W. P., Leong, Hong Va, Dillon, Tharam S., Chan, Alvin T. S., Croft, W. Bruce and Allan, James (2002): A survey in indexing and searching XML documents. In JASIST - Journal of the American Society for Information Science and Technology, 53 (6) pp. 415-437

» 2001 «

Lavrenko, Victor and Croft, W. Bruce (2001): Relevance based language models. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2001. pp. 120-127. Available online

We explore the relation between classical probabilistic models of information retrieval and the emerging language modeling approaches. It has long been recognized that the primary obstacle to effective performance of classical models is the need to estimate a relevance model: probabilities of words in the relevant class. We propose a novel technique for estimating these probabilities using the query alone. We demonstrate that our technique can produce highly accurate relevance models, addressing important notions of synonymy and polysemy. Our experiments show relevance models outperforming baseline language modeling systems on TREC retrieval and TDT tracking tasks. The main contribution of this work is an effective formal method for estimating a relevance model with no training data.
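
The following is a rough sketch of estimating a relevance model from the query alone. It assumes that the retrieved set is small enough to sum over every document, that the document prior is uniform, and that documents use Jelinek-Mercer smoothing with a hypothetical weight `lam`. The sketch weights each document's word distribution by the likelihood of the query under that document. It is an illustration, not the authors' implementation.

```python
import math
from collections import Counter


def relevance_model(query, docs, lam=0.6):
    """Estimate P(w|R) for every word in docs using only the query terms."""
    coll = Counter(w for d in docs for w in d)
    total = sum(coll.values())

    def p(w, counts, n):
        # smoothed document model P(w|D) (Jelinek-Mercer, an assumption)
        return lam * counts[w] / n + (1 - lam) * coll[w] / total

    stats = [(Counter(d), len(d)) for d in docs]
    # P(q_1..q_k | D) for each document
    query_lik = [math.prod(p(q, c, n) for q in query) for c, n in stats]
    # joint P(w, q_1..q_k) = sum_D P(D) P(w|D) prod_i P(q_i|D), with uniform P(D)
    joint = {
        w: sum(ql * p(w, c, n) for ql, (c, n) in zip(query_lik, stats)) / len(docs)
        for w in coll
    }
    z = sum(joint.values())
    return {w: v / z for w, v in joint.items()}  # normalise to a distribution
```

For the query `["jaguar"]`, words from documents that contain "jaguar" (such as "cat") receive more probability mass than words from documents that do not (such as "stock"). This is the kind of behaviour that lets a relevance model capture related vocabulary.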

Lawrie, Dawn, Croft, W. Bruce and Rosenberg, Arnold (2001): Finding topic words for hierarchical summarization. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2001. pp. 349-357. Available online

Hierarchies have long been used for organization, summarization, and access to information. In this paper we define summarization in terms of a probabilistic language model and use the definition to explore a new technique for automatically generating topic hierarchies by applying a graph-theoretic algorithm, which is an approximation of the Dominating Set Problem. The algorithm efficiently chooses terms according to a language model. We compare the new technique to previous methods proposed for constructing topic hierarchies including subsumption and lexical hierarchies, as well as the top TF.IDF terms. Our results show that the new technique consistently performs as well as or better than these other techniques. They also show the usefulness of hierarchies compared with a list of terms.
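
The greedy heuristic below is one standard approximation for a (weighted) dominating set. It is shown here to make the graph-theoretic idea concrete. Several parts are illustrative assumptions: the term graph, the weights (for example, language-model probabilities), and the name `greedy_topic_terms`. The paper's own graph construction and selection criterion may differ.

```python
def greedy_topic_terms(graph, weight, k):
    """Greedy weighted dominating-set approximation.

    graph: dict term -> set of neighbouring terms that the term can "stand for";
    weight: dict term -> weight (e.g. a language-model probability); k: max terms to pick.
    """
    uncovered = set(graph)
    chosen = []

    def gain(t):
        # total weight of still-uncovered terms that t would dominate (itself + neighbours)
        return sum(weight[u] for u in ({t} | graph[t]) & uncovered)

    while uncovered and len(chosen) < k:
        best = max(graph, key=gain)  # ties go to the first term in dict order
        if gain(best) == 0:
            break
        chosen.append(best)
        uncovered -= {best} | graph[best]
    return chosen
```

Each chosen term covers itself and its neighbours, so a small set of selected terms can summarise the whole vocabulary at one level of the hierarchy.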

» 2000 «

Xu, Jinxi and Croft, W. Bruce (2000): Improving the effectiveness of information retrieval with local context analysis. In ACM Transactions on Information Systems, 18 (1) pp. 79-112

Techniques for automatic query expansion have been extensively studied in information retrieval research as a means of addressing the word mismatch between queries and documents. These techniques can be categorized as either global or local. While global techniques rely on analysis of a whole collection to discover word relationships, local techniques emphasize analysis of the top-ranked documents retrieved for a query. Although local techniques have been shown to be more effective than global techniques in general, existing local techniques are not robust and can seriously hurt retrieval when few of the retrieved documents are relevant. We propose a new technique, called local context analysis, which selects expansion terms based on cooccurrence with the query terms within the top-ranked documents. Experiments on a number of collections, both English and non-English, show that local context analysis offers more effective and consistent retrieval results.
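
The core of local context analysis can be sketched as follows. It scores candidate terms by how strongly they co-occur with every query term in the top-ranked documents. The formula roughly follows the published LCA form, but several details are assumptions made for illustration. These are the constants (`delta`, the idf cap and divisor), the use of single words rather than concepts or passages, and the function name.

```python
import math
from collections import Counter


def lca_expansion(query, top_docs, doc_freq, num_docs, delta=0.1, k=5):
    """Rank candidate expansion terms from the top-ranked documents."""
    n = len(top_docs)
    if n < 2:
        raise ValueError("need at least two top-ranked documents")
    tfs = [Counter(d) for d in top_docs]

    def idf(t):
        # capped idf; the cap of 1.0 and divisor of 5 are assumptions
        return min(1.0, math.log10(num_docs / max(doc_freq.get(t, 0), 1)) / 5.0)

    candidates = {t for tf in tfs for t in tf} - set(query)
    scores = {}
    for c in candidates:
        score = 1.0
        for w in query:
            co = sum(tf[c] * tf[w] for tf in tfs)  # co-occurrence within the top-ranked set
            co_degree = math.log10(co + 1) * idf(c) / math.log10(n)
            score *= (delta + co_degree) ** idf(w)  # rarer query terms weigh more
        scores[c] = score
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]
```

Because the per-query-term factors are multiplied, a candidate that co-occurs with only some query terms is penalised. This is one way to reduce the damage when only a few of the top-ranked documents are relevant.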

» 1999 «

Greiff, Warren R., Croft, W. Bruce and Turtle, Howard (1999): PIC matrices: a computationally tractable class of probabilistic query operators. In ACM Transactions on Information Systems, 17 (4) pp. 367-405

The inference network model of information retrieval allows a probabilistic interpretation of query operators. In particular, Boolean query operators are conveniently modeled as link matrices of the Bayesian network. Prior work has shown, however, that these operators do not perform as well as the p-norm operators used for modeling query operators in the context of the vector space model. This motivates the search for alternative probabilistic formulations for these operators. The design of such alternatives must contend with the issue of computational tractability, since the evaluation of an arbitrary operator requires exponential time. We define a flexible class of link matrices that are natural candidates for the implementation of query operators and an O(n^2) algorithm (where n is the number of parent nodes) for the computation of probabilities involving link matrices of this class. We present experimental results indicating that Boolean operators implemented in terms of link matrices from this class perform as well as p-norm operators in the context of the INQUERY inference network.
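
The sketch below illustrates why such link matrices can be evaluated in quadratic rather than exponential time. It makes two assumptions. First, as our reading of the class, the child's probability depends only on how many parents are true (`alpha[k]` for k true parents). Second, the parents are independent. Under those assumptions, a dynamic program over the count distribution needs O(n^2) steps. The paper's exact definition may be more general.

```python
def pic_belief(parent_probs, alpha):
    """P(child is true) when P(child | parents) depends only on the number of true parents.

    parent_probs: P(parent_i is true), parents assumed independent;
    alpha[k]: P(child is true | exactly k parents are true), for k = 0..n.
    """
    n = len(parent_probs)
    if len(alpha) != n + 1:
        raise ValueError("alpha must have one entry per possible count 0..n")
    # dist[k] = P(exactly k of the parents processed so far are true)
    dist = [1.0] + [0.0] * n
    for i, p in enumerate(parent_probs, start=1):
        # update from high k to low k so each step reads the previous row's values
        for k in range(i, 0, -1):
            dist[k] = dist[k] * (1 - p) + dist[k - 1] * p
        dist[0] *= 1 - p
    return sum(a * d for a, d in zip(alpha, dist))
```

With `alpha = [0, ..., 0, 1]`, the node behaves like a probabilistic AND (the product of the parent probabilities). With `alpha = [0, 1, ..., 1]`, it behaves like an OR. Intermediate settings give softer operators between the two.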

Xu, Jinxi and Croft, W. Bruce (1999): Cluster-Based Language Models for Distributed Retrieval. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1999. pp. 254-261. Available online

Song, Fei and Croft, W. Bruce (1999): A General Language Model for Information Retrieval. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1999. pp. 279-280. Available online

Song, Fei and Croft, W. Bruce (1999): A General Language Model for Information Retrieval. In: Proceedings of the 1999 ACM CIKM International Conference on Information and Knowledge Management November 2-6, 1999, Kansas City, Missouri, USA. pp. 316-321. Available online

» 1998 «

Xu, Jinxi and Croft, W. Bruce (1998): Corpus-Based Stemming using Cooccurrence of Word Variants. In ACM Transactions on Information Systems, 16 (1) pp. 61-81

Stemming is used in many information retrieval (IR) systems to reduce variant word forms to common roots. It is one of the simplest applications of natural-language processing to IR and is one of the most effective in terms of user acceptance and consistency, though it yields only small retrieval improvements. Current stemming techniques do not, however, reflect the language use in specific corpora, and this can lead to occasional serious retrieval failures. We propose a technique for using corpus-based word variant cooccurrence statistics to modify or create a stemmer. The experimental results generated using English newspaper and legal text and Spanish text demonstrate the viability of this technique and its advantages relative to conventional approaches that only employ morphological rules.
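
One way to use co-occurrence statistics to modify a stemmer is to start from an aggressive stem class and split it into groups of variants that actually co-occur. The sketch assumes an association measure that compares observed co-occurrences with the count expected by chance (`k * n_a * n_b`, where `k` is a hypothetical constant). It also assumes a hypothetical `threshold` and then takes connected components. It is illustrative, not the paper's exact procedure.

```python
from itertools import combinations


def refine_stem_class(variants, counts, pair_counts, k, threshold):
    """Split one stem class into connected components of strongly co-occurring variants.

    counts: word -> corpus frequency; pair_counts: frozenset({a, b}) -> co-occurrence count.
    """
    def association(a, b):
        n_ab = pair_counts.get(frozenset((a, b)), 0)
        expected = k * counts[a] * counts[b]  # co-occurrences expected by chance (assumed)
        return max((n_ab - expected) / (counts[a] + counts[b]), 0.0)

    parent = {v: v for v in variants}  # union-find forest over the variants

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in combinations(variants, 2):
        if association(a, b) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for v in variants:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())
```

For example, a rule-based stemmer might conflate "policy" and "police". If the corpus shows that these forms rarely co-occur, the class splits, while "policy"/"policies" and "police"/"policing" stay together.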

Ballesteros, Lisa and Croft, W. Bruce (1998): Resolving Ambiguity for Cross-Language Retrieval. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1998. pp. 64-71. Available online

One of the main hurdles to improved CLIR effectiveness is resolving ambiguity associated with translation. Availability of resources is also a problem. First we present a technique based on co-occurrence statistics from unlinked corpora which can be used to reduce the ambiguity associated with phrasal and term translation. We then combine this method with other techniques for reducing ambiguity and achieve more than 90% monolingual effectiveness. Finally, we compare the co-occurrence method with parallel corpus and machine translation techniques and show that good retrieval effectiveness can be achieved without complex resources.

Ponte, Jay M. and Croft, W. Bruce (1998): A Language Modeling Approach to Information Retrieval. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1998. pp. 275-281. Available online

Models of document indexing and document retrieval have been extensively studied. The integration of these two classes of models has been the goal of several researchers but it is a very difficult problem. We argue that much of the reason for this is the lack of an adequate indexing model. This suggests that perhaps a better indexing model would help solve the problem. However, we feel that making unwarranted parametric assumptions will not lead to better retrieval performance. Furthermore, making prior assumptions about the similarity of documents is not warranted either. Instead, we propose an approach to retrieval based on probabilistic language modeling. We estimate models for each document individually. Our approach to modeling is non-parametric and integrates document indexing and document retrieval into a single model. One advantage of our approach is that collection statistics which are used heuristically in many other retrieval models are an integral part of our model. We have implemented our model and tested it empirically. Our approach significantly outperforms standard tf*idf weighting on two different collections and query sets.
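
For a concrete picture of ranking with probabilistic language models, the sketch below scores each document by the log-likelihood of the query under a smoothed document model. Note that it uses Dirichlet smoothing (parameter `mu`). This is a later and simpler estimator, swapped in for illustration. It is not the non-parametric, risk-adjusted estimate described in this paper.

```python
import math
from collections import Counter


def rank_query_likelihood(query, docs, mu=2000.0):
    """Rank document ids by log P(Q|D) under Dirichlet-smoothed unigram models.

    docs: dict doc_id -> list of tokens; the collection model is built from docs itself.
    """
    coll = Counter(w for d in docs.values() for w in d)
    total = sum(coll.values())
    scores = {}
    for doc_id, words in docs.items():
        tf = Counter(words)
        score = 0.0
        for q in query:
            p_c = coll[q] / total
            if p_c == 0:
                continue  # term unseen in the collection: same effect on every doc, skip it
            # P(q|D) = (tf(q, D) + mu * P(q|C)) / (|D| + mu)
            score += math.log((tf[q] + mu * p_c) / (len(words) + mu))
        scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))
```

As in the abstract, collection statistics enter the model directly, here through the smoothing term `mu * P(q|C)`, rather than through a separately tuned idf heuristic.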

Shneiderman, Ben, Byrd, Donald and Croft, W. Bruce (1998): Sorting Out Searching: A User-Interface Framework for Text Searches. In Communications of the ACM, 41 (4) pp. 95-98

» 1997 «

Pyreddy, Pallavi and Croft, W. Bruce (1997): TINTIN: A System for Retrieval in Text Tables. In: DL97: Proceedings of the 2nd ACM International Conference on Digital Libraries 1997. pp. 193-200. Available online

Tables form an important kind of data element in text retrieval. Often, the gist of an entire news article or other exposition can be concisely captured in tabular form. In this paper, we examine the utility of exploiting information other than the key words in a digital document to provide users with more flexible and powerful query capabilities. More specifically, we exploit the structural information in a document to identify tables and their component fields and let users query based on these fields. Our empirical results demonstrate that heuristic table extraction and component tagging can be performed effectively and efficiently. Moreover, our retrieval experiments with the TINTIN system strongly indicate that such structural decomposition can facilitate better representation of users' information needs and hence more effective retrieval of tables.

Ballesteros, Lisa and Croft, W. Bruce (1997): Phrasal Translation and Query Expansion Techniques for Cross-Language Information Retrieval. In: Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1997. pp. 84-91. Available online

Dictionary methods for cross-language information retrieval give performance below that for mono-lingual retrieval. Failure to translate multi-term phrases has been shown to be one of the factors responsible for the errors associated with dictionary methods. First, we study the importance of phrasal translation for this approach. Second, we explore the role of phrases in query expansion via local context analysis and local feedback and show how they can be used to significantly reduce the error associated with automatic dictionary translation.

Greiff, Warren R., Croft, W. Bruce and Turtle, Howard (1997): Computationally Tractable Probabilistic Modelling of Boolean Operators. In: Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1997. pp. 119-128. Available online

The inference network model of information retrieval allows for a probabilistic interpretation of Boolean query operators. Prior work has shown, however, that these operators do not perform as well as the p-norm operators developed in the context of the vector space model. The design of alternative operators in the inference network framework must contend with the issue of computational tractability. We define a flexible class of link matrices that are natural candidates for the implementation of Boolean operators and an O(n^2) algorithm for the computation of probabilities involving link matrices of this class. We present experimental results indicating that Boolean operators implemented in terms of link matrices from this class perform as well as p-norm operators.

» 1996 «

Manmatha, R., Han, Chengfeng, Riseman, E. M. and Croft, W. Bruce (1996): Indexing Handwriting Using Word Matching. In: DL96: Proceedings of the 1st ACM International Conference on Digital Libraries 1996. pp. 151-159. Available online

There are many historical manuscripts written in a single hand that would be useful to index. Examples include the W. E. B. Du Bois collection at the University of Massachusetts and the early Presidential libraries at the Library of Congress. The standard technique for indexing documents is to scan them in, convert them to machine-readable form (ASCII) using Optical Character Recognition (OCR), and then index them using a text retrieval engine. However, OCR does not work well on handwriting. Here an alternative scheme is proposed for indexing such texts. Each page of the document is segmented into words. The images of the words are then matched against each other to create equivalence classes (each equivalence class contains multiple instances of the same word). The user then provides ASCII equivalents for, say, the top 2000 equivalence classes. The current paper deals with the matching aspects of this process. Due to variations in even a single person's handwriting, matching is expected to be the most difficult step in the whole process. A matching technique based on Euclidean distance mapping is discussed. Experiments are shown demonstrating the feasibility of the approach.

Xu, Jinxi and Croft, W. Bruce (1996): Query Expansion Using Local and Global Document Analysis. In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1996. pp. 4-11. Available online

Automatic query expansion has long been suggested as a technique for dealing with the fundamental issue of word mismatch in information retrieval. A number of approaches to expansion have been studied and, more recently, attention has focused on techniques that analyze the corpus to discover word relationships (global techniques) and those that analyze documents retrieved by the initial query (local feedback). In this paper, we compare the effectiveness of these approaches and show that, although global analysis has some advantages, local analysis is generally more effective. We also show that using global analysis techniques, such as word context and phrase structure, on the local set of documents produces results that are both more effective and more predictable than simple local feedback.

Larkey, Leah S. and Croft, W. Bruce (1996): Combining Classifiers in Text Categorization. In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1996. pp. 289-297. Available online

Three different types of classifiers were investigated in the context of a text categorization problem in the medical domain: the automatic assignment of ICD-9 codes to dictated inpatient discharge summaries. K-nearest-neighbor, relevance feedback, and Bayesian independence classifiers were applied individually and in combination. A combination of different classifiers produced better results than any single type of classifier. For this specific medical categorization problem, new query formulation and weighting methods used in the k-nearest-neighbor classifier improved performance.

Croft, W. Bruce, Broglio, John and Fujii, Hideo (1996): Applications of Multilingual Text Retrieval. In: HICSS 1996 1996. pp. 98-. Available online

» 1995 «

Callan, James P., Lu, Zhihong and Croft, W. Bruce (1995): Searching Distributed Collections with Inference Networks. In: Proceedings of the Eighteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1995. pp. 21-28. Available online

The use of information retrieval systems in networked environments raises a new set of issues that have received little attention. These issues include ranking document collections for relevance to a query, selecting the best set of collections from a ranked list, and merging the document rankings that are returned from a set of collections. This paper describes methods of addressing each issue in the inference network model, discusses their implementation in the INQUERY system, and presents experimental results demonstrating their effectiveness.

Rajashekar, T. B. and Croft, W. Bruce (1995): Combining Automatic and Manual Index Representations in Probabilistic Retrieval. In JASIST - Journal of the American Society for Information Science and Technology, 46 (4) pp. 272-283

Croft, W. Bruce (1995): NSF Center for Intelligent Information Retrieval. In Communications of the ACM, 38 (4) pp. 42-43

» 1993 «

Haines, David and Croft, W. Bruce (1993): Relevance Feedback and Inference Networks. In: Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1993. pp. 2-11. Available online

Relevance feedback, which modifies queries using judgements of the relevance of a few highly ranked documents, has historically been an important method for increasing the performance of information retrieval systems. In this paper, we extend the inference network model introduced by Turtle and Croft to include relevance feedback techniques. The difference between relevance feedback on text abstracts and on full-text collections is studied. Preliminary results for relevance feedback on the structured queries supported by the inference net model are also reported.

Fujii, Hideo and Croft, W. Bruce (1993): A Comparison of Indexing Techniques for Japanese Text Retrieval. In: Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1993. pp. 237-246. Available online

A series of Japanese full-text retrieval experiments was conducted using an inference network document retrieval model. The retrieval performance of two major indexing methods, character-based and word-based, was evaluated. Using structured queries, character-based indexing performed as well as, or slightly better than, the word-based system. This result has practical significance since character-based indexing is considerably faster than traditional word-based indexing. All the queries in this experiment were automatically formulated from natural language input.

Belkin, Nicholas J., Cool, C., Croft, W. Bruce and Callan, J. P. (1993): The Effect of Multiple Query Representations on Information Retrieval System Performance. In: Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1993. pp. 339-346. Available online

Five independently generated Boolean query formulations for ten different TREC topics were produced by ten different expert online searchers. These different formulations were grouped, and the groups, and combinations of them, were used as searches against the TREC test collection, using the INQUERY probabilistic inference network retrieval engine. Results show that progressive combination of query formulations leads to progressively improving retrieval performance. Results were compared against the performance of INQUERY natural language based queries, and in combination with them. The issue of recall as a performance measure in large databases was raised, since overlap between the searches conducted in this study, and the TREC-1 searches, was smaller than expected.

Callan, James P. and Croft, W. Bruce (1993): An Evaluation of Query Processing Strategies Using the TIPSTER Collection. In: Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1993. pp. 347-355. Available online

The TIPSTER collection is unusual because of both its size and detail. In particular, it describes a set of information needs, as opposed to traditional queries. These detailed representations of information need are an opportunity for research on different methods of formulating queries. This paper describes several methods of constructing queries for the INQUERY information retrieval system, and then evaluates those methods on the TIPSTER document collection. Both ad hoc and routing query processing methods are evaluated.

» 1992 «

Krovetz, Robert and Croft, W. Bruce (1992): Lexical Ambiguity and Information Retrieval. In ACM Transactions on Information Systems, 10 (2) pp. 115-141

Lexical ambiguity is a pervasive problem in natural language processing. However, little quantitative information is available about the extent of the problem or about the impact that it has on information retrieval systems. We report on an analysis of lexical ambiguity in information retrieval test collections and on experiments to determine the utility of word meanings for separating relevant from nonrelevant documents. The experiments show that there is considerable ambiguity even in a specialized database. Word senses provide a significant separation between relevant and nonrelevant documents, but several factors contribute to determining whether disambiguation will make an improvement in performance. For example, resolving lexical ambiguity was found to have little impact on retrieval effectiveness for documents that have many words in common with the query. Other uses of word sense disambiguation in an information retrieval context are discussed.

Croft, W. Bruce, Smith, Lisa A. and Turtle, Howard (1992): A Loosely-Coupled Integration of a Text Retrieval System and an Object-Oriented Database System. In: Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1992. pp. 223-232. Available online

Document management systems are needed for many business applications. This type of system would combine the functionality of a database system (for describing, storing, and maintaining documents with complex structure and relationships) with a text retrieval system (for effective retrieval based on full text). The retrieval model for a document management system is complicated by the variety and complexity of the objects that are represented. In this paper, we describe an approach to complex object retrieval using a probabilistic inference net model, and an implementation of this approach using a loose coupling of an object-oriented database system (IRIS) and a text retrieval system based on inference nets (INQUERY). The resulting system is used to store long, structured documents and can retrieve document components (sections, figures, etc.) based on their text contents or the contents of related components. The lessons learnt from the implementation are discussed.

Croft, W. Bruce, Fuhr, Norbert, Harman, Donna and Stanfill, Craig (1992): Experience with Large Document Collections. In: Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1992. p. 347. Available online

Belkin, Nicholas J. and Croft, W. Bruce (1992): Information Filtering and Information Retrieval: Two Sides of the Same Coin?. In Communications of the ACM, 35 (12) pp. 29-38

» 1991 «

Croft, W. Bruce (1991): Editorial. In ACM Transactions on Information Systems, 9 (3) p. 185

Turtle, Howard and Croft, W. Bruce (1991): Evaluation of an Inference Network-Based Retrieval Model. In ACM Transactions on Information Systems, 9 (3) pp. 187-222

The use of inference networks to support document retrieval is introduced. A network-based retrieval model is described and compared to conventional probabilistic and Boolean models. The performance of a retrieval system based on the inference network model is evaluated and compared to performance with conventional retrieval models.

Croft, W. Bruce, Turtle, Howard and Lewis, David D. (1991): The Use of Phrases and Structured Queries in Information Retrieval. In: Proceedings of the Fourteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1991. pp. 32-45. Available online

Both phrases and Boolean queries have a long history in information retrieval, particularly in commercial systems. In previous work, Boolean queries have been used as a source of phrases for a statistical retrieval model. This work, like the majority of research on phrases, resulted in little improvement in retrieval effectiveness. In this paper, we describe an approach where phrases identified in natural language queries are used to build structured queries for a probabilistic retrieval model. Our results show that using phrases in this way can improve performance, and that phrases that are automatically extracted from a natural language query perform nearly as well as manually selected phrases.

» 1990 «

Croft, W. Bruce, Belkin, Nicholas J., Bruandet, Marie-France, Kuhlen, Rainer and Oren, Tim (1990): Hypertext and Information Retrieval: What are the Fundamental Concepts?. In: Rizk, Antoine, Streitz, Norbert A. and Andre, Jacques (eds.) ECHT 90 - European Conference on Hypertext November 27-30, 1990, Versailles, France. pp. 362-366.

Both hypertext and information retrieval (IR) systems provide access to databases consisting primarily of text documents. Both types of systems structure the content of these documents and support interaction with the users in order to improve the effectiveness of retrieval. Despite these similarities, hypertext and IR are generally regarded as separate research areas, with some overlap, but essentially different research agendas. To clarify these differences as well as the areas of overlap, the members of this panel will attempt to define the fundamental concepts and the major research issues in each area, with special emphasis on their own research.

Mahling, Dirk E. and Croft, W. Bruce (1990): An Interface for the Acquisition and Display of Office Procedures. In: Lochovsky, Frederick H. and Allen, Robert (eds.) Proceedings of the Conference on Office Information Systems 1990 April 25-27, 1990, Cambridge, Massachusetts, USA. pp. 123-130.

A central problem in the design of intelligent office systems is the acquisition of knowledge about office procedures. In this paper we describe a graphical interface for the acquisition and display of office procedures from a goal- and plan-based perspective. The DACRON interface is based on a model of the office workers' view of work. DACRON supports the acquisition of plan knowledge by providing graphical representations of domain entities from the users' point of view. It also displays office procedures graphically. An experimental usability study, involving more than twenty subjects, shows that DACRON can be used to acquire plan knowledge and give relevant advice.

Turtle, Howard and Croft, W. Bruce (1990): Inference Networks for Document Retrieval. In: Proceedings of the Thirteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1990. pp. 1-24.

The use of inference networks to support document retrieval is introduced. A network-based retrieval model is described and compared to conventional probabilistic and Boolean models.

Croft, W. Bruce and Das, Raj (1990): Experiments with Query Acquisition and Use in Document Retrieval Systems. In: Proceedings of the Thirteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1990. pp. 349-368.

In some recent experimental document retrieval systems, emphasis has been placed on the acquisition of a detailed model of the information need through interaction with the user. It has been argued that these "enhanced" queries, in combination with relevance feedback, will improve retrieval performance. In this paper, we describe a study with the aim of evaluating how easily enhanced queries can be acquired from users and how effectively this additional knowledge can be used in retrieval. The results indicate that significant effectiveness benefits can be obtained through the acquisition of domain concepts related to query concepts, together with their level of importance to the information need.

Lewis, David D. and Croft, W. Bruce (1990): Term Clustering of Syntactic Phrases. In: Proceedings of the Thirteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1990. pp. 385-404.

Term clustering and syntactic phrase formation are methods for transforming natural language text. Both have had only mixed success as strategies for improving the quality of text representations for document retrieval. Since the strengths of these methods are complementary, we have explored combining them to produce superior representations. In this paper we discuss our implementation of a syntactic phrase generator, as well as our preliminary experiments with producing phrase clusters. These experiments show small improvements in retrieval effectiveness resulting from the use of phrase clusters, but it is clear that corpora much larger than standard information retrieval test collections will be required to thoroughly evaluate the use of this technique.

Mahling, Dirk E., Sandvik, Oddmar A. and Croft, W. Bruce (1990): Visual Interaction Between End Users and Goal Based Systems. In: VL 1990 1990. pp. 182-186.

» 1989 «

Croft, W. Bruce and Turtle, Howard (1989): A Retrieval Model for Incorporating Hypertext Links. In: Halasz, Frank and Meyrowitz, Norman (eds.) Proceedings of ACM Hypertext 89 Conference November 5-8, 1989, Pittsburgh, Pennsylvania. pp. 213-224.

Mahling, Dirk E. and Croft, W. Bruce (1989): Relating Human Knowledge of Tasks to the Requirements of Plan Libraries. In International Journal of Man-Machine Studies, 31 (1) pp. 61-97

This article explores the fundamental issues of plan knowledge acquisition from domain experts. The general question is: Are humans with their knowledge of a domain and its procedures able to provide a planner with the necessary information for automatic planning? To answer this question we first review the requirements of the plan library of a situation-calculus-based planner. Then we review existing frameworks for the representation of human activity knowledge and investigate to what extent these frameworks address the requirements. A major factor in evaluating the frameworks is the psychological reality the framework has to the individual. From this review and interviews we conducted in a pilot study, we construct a framework for task recall. In this framework, the representation of a recallable activity is called an act. An act consists of a goal, a pre-situation, an operations-list and a post-situation. Acts can be decomposed and put into sequences. In experiments with the framework, we find support for all our hypotheses except the one dealing with effects. Further investigation of this issue is discussed.

Thompson, R. H. and Croft, W. Bruce (1989): Support for Browsing in an Intelligent Text Retrieval System. In International Journal of Man-Machine Studies, 30 (6) pp. 639-668

Browsing is potentially an extremely important technique for retrieving text documents from large knowledge bases. The advantages of this technique are that users get immediate feedback from the structure of the knowledge base and exert complete control over the outcome of the search. The primary disadvantages are that it is easy to get lost in a complex network of nodes representing documents and concepts, and there is no guarantee that a browsing search will be as effective as a more conventional search. In this paper, we show how a browsing capability can be integrated into an intelligent text retrieval system. The disadvantages mentioned above are avoided by providing facilities for controlling the browsing and for using the information derived during browsing in more formal search strategies. The architecture of the text retrieval system is described and the browsing techniques are illustrated using an example session.

Croft, W. Bruce (1989): Research and Development in Information Retrieval. In ACM Transactions on Information Systems, 7 (3) pp. 181-182

This Special Issue contains selected papers from the SIGIR Conference on Research and Development in Information Retrieval held at Cambridge, Massachusetts in June, 1989. The papers were selected by the program committee and revised for publication in TOIS. Information retrieval is a diverse field of research, and the areas covered at this conference include formal models, search strategies, hypermedia, storage structures, evaluation, natural language processing, interfaces, and knowledge-based architectures. The unifying goal of this research is the efficient and effective retrieval of complex, multimedia objects, with a primary focus on text documents.

Krovetz, Robert and Croft, W. Bruce (1989): Word Sense Disambiguation Using Machine-Readable Dictionaries. In: Proceedings of the Twelfth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1989. pp. 127-136.

Most approaches to full-text information retrieval currently index documents based on the words they contain, and retrieve them based on the words' frequency of occurrence. This can cause many irrelevant documents to be retrieved because words are often ambiguous. We propose an approach in which documents are indexed by word senses, and in which these senses are taken from a machine-readable dictionary. We review some of the work on machine-readable dictionaries and the approaches that have been taken to word sense disambiguation. We then discuss our own approach to the problem based on the use of multiple sources of evidence. We conclude with the results of some experiments that indicate the degree to which lexical ambiguity is a factor in current systems.

» 1988 «

Borgman, Christine L., Belkin, Nicholas J., Croft, W. Bruce, Lesk, Michael E. and Landauer, Thomas K. (1988): Retrieval Systems for the Information Seeker: Can the Role of the Intermediary be Automated? In: Soloway, Elliot, Frye, Douglas and Sheppard, Sylvia B. (eds.) Proceedings of the ACM CHI 88 Human Factors in Computing Systems Conference June 15-19, 1988, Washington, DC, USA. pp. 51-53.

The introduction of automated information retrieval (IR) systems was met with great enthusiasm and predictions that manual literature searching soon would be replaced. Three decades later, IR systems have not progressed to the stage where any but the dedicated few can operate them without a highly skilled human intermediary acting as interface between user and system. In the interim, we have learned that the retrieval process is extremely complex both in terms of understanding people and their communication and in terms of understanding scientific information and technical vocabulary. Experiments with new techniques suggest to many the possibility of eliminating the human intermediary, either in large part or altogether; others would argue that the retrieval problems are too complex to be resolved for more than highly restricted domains. The possibility of eliminating the human intermediary is of current research interest to the several disciplines that are represented on this panel.

Croft, W. Bruce and Savino, Pasquale (1988): Implementing Ranking Strategies Using Text Signatures. In ACM Transactions on Information Systems, 6 (1) pp. 42-62

Signature files provide an efficient access method for text in documents, but retrieval is usually limited to finding documents that contain a specified Boolean pattern of words. Effective retrieval requires that documents with similar meanings be found through a process of plausible inference. The simplest way of implementing this retrieval process is to rank documents in order of their probability of relevance. In this paper techniques are described for implementing probabilistic ranking strategies with sequential and bit-sliced signature files and the limitations of these implementations with regard to their effectiveness are pointed out. A detailed comparison is made between signature-based ranking techniques and ranking using term-based document representatives and inverted files. The comparison shows that term-based representations are at least competitive (in terms of efficiency) with signature files and, in some situations, superior.

Croft, W. Bruce and Lefkowitz, Lawrence S. (1988): Using a Planner to Support Office Work. In: Allen, Robert (ed.) Proceedings of the Conference on Office Information Systems 1988 March 23-25, 1988, Palo Alto, California, USA. pp. 55-62.

Supporting a wide range of activities in offices has been a major objective for designers of office systems. The complex nature of office work and the fact that there are no simple limits on the amount of domain knowledge required to do this work have made the achievement of this objective very difficult. Planning and representation techniques from artificial intelligence appear to have some advantages for this task in terms of flexibility and adaptability. In this paper we describe the POLYMER planning system and representation language. In particular, we point out the system components that are required to make planning useful in real environments. The operation of the system is illustrated using an example activity.

Croft, W. Bruce and Krovetz, Robert (1988): Interactive Retrieval of Office Documents. In: Allen, Robert (ed.) Proceedings of the Conference on Office Information Systems 1988 March 23-25, 1988, Palo Alto, California, USA. pp. 228-235.

Office information systems are being used to describe and store documents with complex structure and multimedia content. Users of these systems can potentially make very complex specifications of the structure, layout and content of the documents they wish to retrieve. Although these complex queries could be more effective in identifying relevant documents, it is important that a well-defined model of retrieval is used, both as the basis for the retrieval strategies and the user interface. In this paper, we present a system (OFFICER) for the retrieval of office documents that is based on a model of plausible inference. The OFFICER query interface allows the specification of uncertain queries and combines uncertainties in the matching of queries and documents to produce an overall ranking for the documents.

Croft, W. Bruce, Lucia, T. J. and Cohen, P. R. (1988): Retrieving Documents by Plausible Inference: A Preliminary Study. In: Proceedings of the Eleventh Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1988. pp. 481-494.

Choosing an appropriate document representation and search strategy for document retrieval has been largely guided by achieving good average performance instead of optimizing the results for each individual query. A model of retrieval based on plausible inference gives us a different perspective and suggests that techniques should be found for combining multiple sources of evidence (or search strategies) into an overall assessment of a document's relevance, rather than attempting to pick a single strategy. In this paper, we explain our approach to plausible inference for retrieval and describe some preliminary experiments designed to test this approach. The experiments use a spreading activation search to implement the plausible inference process. The results show that significant effectiveness improvements are possible using this approach.

» 1987 «

Croft, W. Bruce and Lewis, David D. (1987): An Approach to Natural Language Processing for Document Retrieval. In: Proceedings of the Tenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1987. pp. 26-32.

Document retrieval systems have been restricted, by the nature of the task, to techniques that can be used with large numbers of documents and broad domains. The most effective techniques that have been developed are based on the statistics of word occurrences in text. In this paper, we describe an approach to using natural language processing (NLP) techniques for what is essentially a natural language problem: the comparison of a request text with the text of document titles and abstracts. The proposed NLP techniques are used to develop a request model based on "conceptual case frames" and to compare this model with the texts of candidate documents. The request model is also used to provide information to statistical search techniques that identify the candidate documents. As part of a preliminary evaluation of this approach, case frame representations of a set of requests from the CACM collection were constructed. Statistical searches carried out using dependency and relative importance information derived from the request models indicate that performance benefits can be obtained.

» 1986 «

Croft, W. Bruce (1986): User-Specified Domain Knowledge for Document Retrieval. In: Proceedings of the Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1986. pp. 201-206.

The introduction of domain knowledge into a document retrieval system has two important consequences: an increase in the effectiveness of retrieval and a decrease in the efficiency of text processing. In this paper, a method is presented for combining user-specified domain knowledge with efficient retrieval techniques based on probabilistic models. The domain knowledge is represented as a collection of frames that contain rules specifying recognition conditions for domain concepts and relationships between concepts. The inference network represented in these frames is used to infer the concepts that are related to a user's query. This approach is being implemented as part of the I³R expert intermediary system.

» 1984 «

Croft, W. Bruce (1984): The Role of Context and Adaptation in User Interfaces. In International Journal of Man-Machine Studies, 21 (4) pp. 283-292

A user interface can be viewed as a means of mapping user tasks to system tools. Context and adaptation are important features of a user/system interaction that can be used to simplify the task-to-tool mapping and thereby improve the interface. A system based on these features would be able to adapt its actions to be appropriate for a given context. Two systems are used as examples of the use of context and adaptation. The POISE system provides assistance to the users of an office system based on models of office tasks. The adaptive document-retrieval system chooses the most effective search strategy for retrieving relevant documents in a given context. The techniques used to implement context and adaptation in these systems are considerably different, but in both systems the user interface is made more effective.

Croft, W. Bruce and Lefkowitz, Lawrence S. (1984): Task Support in an Office System. In ACM Transactions on Information Systems, 2 (3) pp. 197-212

A major goal of an office system is to support tasks that are central to office functions. Some office tasks are readily implemented with generic office tools, such as calendars, forms packages, and mail. Many tasks, however, involve complex sequences of actions which do not all correspond to tool invocations but, instead, rely on the problem-solving abilities of office workers. In this paper we describe a system (POISE) that can be used to both automate routine tasks and provide assistance in more complex situations. The type of assistance provided can range from maintaining a record of the tasks currently being executed to suggesting possible next steps and answering natural language queries about the tasks. The POISE system uses both a procedure-based and a goal-based representation of the tasks to achieve efficiency and flexibility. The mechanisms used by POISE are described with example procedures from a university office.

Croft, W. Bruce and Lefkowitz, Lawrence S. (1984): Task Support in an Office System. In: Ellis, Clarence (ed.) Proceedings of the Second ACM-SIGOA Conference on Office Information Systems 1984 June 25-27, 1984, Toronto, Canada. pp. 22-24.

Publication statistics

Publication period: 1984-2008
Publication count: 102
Number of co-authors: 107



Productive colleagues

W. Bruce Croft's 3 most productive colleagues in number of publications:

Ben Shneiderman: 206
James Allan: 52
Jamie Callan: 43


Collaboration count

Number of publications with 3 favourite co-authors:

Howard Turtle: 8
Xiaoyong Liu: 5
Yun Zhou: 5

Page information

Page maintainer: The Editorial Team
URL: http://www.interaction-design.org/references/authors/w__bruce_croft.html