Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval


 
Time and place: 2004
Conf. description: SIGIR is the major international forum for the presentation of new research results and the demonstration of new systems and techniques in the field of information retrieval.

References from this conference (2004)

The following articles are from "Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval":

Articles

Bell, Gordon, Gemmell, Jim and Lueder, Roger (2004): Challenges in using lifetime personal information stores. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. p. 1.

Within five years, our personal computers with terabyte disk drives will be able to store everything we read, write, and hear, and many of the images we see, including video. Vannevar Bush outlined such a system in his famous 1945 Memex article [1]. For the last four years we have worked on MyLifeBits (http://www.MyLifeBits.com), a system to digitally store everything from one's life, including books, articles, personal financial records, memorabilia, email, written correspondence, photos (with the time and location taken), telephone calls, video, television programs, and web pages visited. We recently added content from personal devices that automatically record photos and audio. The project started with the capture of Bell's content [2], followed by an effort to explore the use of an SQL database for storage and retrieval. Work has continued along these lines to extend content capture to every useful source, e.g., a meeting capture system. The second phase of the project includes the design of tools and links for annotation, collections, cluster analysis, facets for characterizing the content, creation of timelines and stories, and other inherent database-related capabilities, e.g., the ability to pivot on an event, photo, or person to retrieve linked information [3]. Ideally we would like to have a system that would read every document, extract meta-data (e.g., Dublin Core), and classify it using multiple ontologies, faceted classifications, or other relevant schemes. While such a system has implications for future computing devices and their users, these systems will only exist if we can effectively utilize the vast personal stores. Although our system is exploratory, the Stuff I've Seen system [4] demonstrates the utility and necessity of easy search and access to one's own data. Other research efforts with similar goals relating to personal information include Haystack [5], LifeStreams [6], and the UK "Memories for Life" Grand Challenge.
There are serious research issues beyond the problem of making the information useful through rapid and easy retrieval. The "Dear Appy" problem ("Dear Appy, My application, or platform, or media left me unreadable. Signed, Lost Data") is unsettling to archivists and computer professionals, and must be solved. Merely navigating the stored life of an individual would, at first glance, appear to take almost a lifetime. While we are making progress in capturing less traditionally archived content (e.g., meetings, phone calls, and video), automatic interpretation and indexing of voice remain elusive. MyLifeBits currently focuses on retrieval, including the (ideally automatic) addition of meta-data, e.g., document type identification and high-level knowledge. While such data is essential for the archivist, it is unclear how useful such meta-data is for one's own information; yet without such higher-level knowledge and concepts, the vast amount of raw bits may be completely unusable. The most cited problem of personal archives is control of the content, including personal security, together with joint ownership of content by other individuals and organizations. In many corporations, periodic expunging of documents is standard practice. Similarly, the aspects of a person's life not available in public documents are owned by the organization, and all documents may have to be tagged so that they can be expunged, if necessary, when an individual is no longer part of the organization. The HIPAA law in the US and even more stringent privacy laws in other countries have major implications for personal stores.

Copyrights may apply

Amitay, Einat, Carmel, David, Lempel, Ronny and Soffer, Aya (2004): Scaling IR-system evaluation using term relevance sets. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 10-17.

This paper describes an evaluation method based on Term Relevance Sets (Trels) that measures an IR system's quality by examining the content of the retrieved results rather than by looking for pre-specified relevant pages. Trels consist of a list of terms believed to be relevant for a particular query as well as a list of irrelevant terms. The proposed method does not involve any document relevance judgments, and as such is not adversely affected by changes to the underlying collection. Therefore, it can better scale to very large, dynamic collections such as the Web. Moreover, this method can evaluate a system's effectiveness on an updatable "live" collection, or on collections derived from different data sources. Our experiments show that the proposed method is very highly correlated with official TREC measures.
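
The abstract does not spell out how Trels are turned into a score. The sketch below is a minimal illustration under our own assumptions: each of the top-k results earns the fraction of relevant Trels terms it contains minus the fraction of irrelevant terms it contains, and these values are averaged. The function name `trels_score` and this particular formula are hypothetical, not the authors' definition.

```python
def trels_score(results, relevant_terms, irrelevant_terms, k=10):
    """Hypothetical Trels-style score for one query.

    Each of the top-k results earns (fraction of relevant terms it contains)
    minus (fraction of irrelevant terms it contains); the mean is returned.
    No document-level relevance judgments are needed.
    """
    relevant = {t.lower() for t in relevant_terms}
    irrelevant = {t.lower() for t in irrelevant_terms}
    top = results[:k]
    if not top:
        return 0.0
    total = 0.0
    for text in top:
        tokens = set(text.lower().split())
        rel = len(tokens & relevant) / len(relevant) if relevant else 0.0
        irr = len(tokens & irrelevant) / len(irrelevant) if irrelevant else 0.0
        total += rel - irr
    return total / len(top)
```

For a query about the animal "jaguar", listing habitat-related terms as relevant and car-related terms as irrelevant penalises a result list dominated by car dealers, without judging any individual page.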

Kokiopoulou, E. and Saad, Y. (2004): Polynomial filtering in latent semantic indexing for information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 104-111.

Latent Semantic Indexing (LSI) is a well-established and effective framework for conceptual information retrieval. In traditional implementations of LSI the semantic structure of the collection is projected into the k-dimensional space derived from a rank-k approximation of the original term-by-document matrix. This paper discusses a new way to implement the LSI methodology, based on polynomial filtering. The new framework does not rely on any matrix decomposition and therefore its computational cost and storage requirements are low relative to traditional implementations of LSI. Additionally, it can be used as an effective information filtering technique when updating LSI models based on user feedback.
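
As we understand the general idea, rank-k LSI scores can be written as A^T φ(A A^T) q, where φ is a step function that keeps only the dominant eigen-directions of A A^T; polynomial filtering replaces φ with a polynomial p, so the scores need only matrix-vector products. The sketch below uses the crude filter p(t) = (t/c)^d, which damps small eigenvalues relative to large ones. This choice of polynomial and the names `filtered_scores`, `degree`, and `scale` are illustrative assumptions, not the filters constructed in the paper.

```python
def matvec(A, x):
    """A x for a dense matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, y):
    """A^T y without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, yi in zip(A, y):
        for j, a in enumerate(row):
            out[j] += a * yi
    return out


def filtered_scores(A, q, degree=4, scale=None):
    """Document scores A^T p(A A^T) q with the illustrative filter p(t) = (t/scale)**degree.

    A is a terms-by-documents matrix and q a query vector over terms.
    Only matrix-vector products are used; no SVD is computed.
    """
    if scale is None:
        # The squared Frobenius norm (trace of A A^T) bounds its largest eigenvalue.
        scale = sum(a * a for row in A for a in row)
    v = list(q)
    for _ in range(degree):
        v = [x / scale for x in matvec(A, rmatvec(A, v))]  # v <- (A A^T / scale) v
    return rmatvec(A, v)
```

With `degree=0` the function reduces to plain vector-space scores A^T q; raising the degree shifts weight towards the leading eigen-directions.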

Canny, John (2004): GaP: a factor model for discrete data. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 122-129.

We present a probabilistic model for a document corpus that combines many of the desirable features of previous models. The model is called "GaP" for Gamma-Poisson, the distributions of the first and last random variables. GaP is a factor model; that is, it gives an approximate factorization of the document-term matrix into a product of matrices A and X. These factors have strictly non-negative terms. GaP is a generative probabilistic model that assigns finite probabilities to documents in a corpus. It can be computed with an efficient and simple EM recurrence. For a suitable choice of parameters, the GaP factorization maximizes independence between the factors, so it can be used as an independent-component algorithm adapted to document data. The form of the GaP model is empirically as well as analytically motivated. It gives very accurate results as a probabilistic model (measured via perplexity) and as a retrieval model. The GaP model projects documents and terms into a low-dimensional space of "themes," and models texts as "passages" of terms on the same theme.
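
The generative story in the abstract (Gamma-distributed theme weights, Poisson-distributed term counts, and a non-negative factorization of the document-term matrix) can be sketched as a sampler. In this minimal sketch, `A` is a hypothetical terms-by-themes loading matrix, each theme weight x_k is drawn from a Gamma distribution, and each term count is Poisson with rate (A x)_i. The function and parameter names are our own, and the paper's EM fitting procedure is not shown.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1


def sample_document(A, shapes, scales, rng):
    """Draw one document from a Gamma-Poisson factor model.

    A: terms x themes matrix of non-negative loadings (hypothetical input).
    Theme weights x_k ~ Gamma(shapes[k], scales[k]);
    term counts f_i ~ Poisson(sum_k A[i][k] * x_k).
    """
    x = [rng.gammavariate(a, b) for a, b in zip(shapes, scales)]
    rates = [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in A]
    return [sample_poisson(r, rng) for r in rates], x


# Usage: three terms, two themes; the last term loads only on theme 2.
rng = random.Random(42)
A = [[1.0, 0.0], [0.5, 0.5], [0.0, 3.0]]
counts, themes = sample_document(A, shapes=[2.0, 2.0], scales=[1.0, 1.0], rng=rng)
```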

Lau, Raymond Y. K., Bruza, Peter D. and Song, Dawei (2004): Belief revision for adaptive information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 130-137.

Applying belief revision logic to model adaptive information retrieval is appealing since it provides a rigorous theoretical foundation to model the partiality and uncertainty inherent in any information retrieval (IR) process. In particular, a retrieval context can be formalised as a belief set, and the formalised context is used to disambiguate vague user queries. Belief revision logic also provides a robust computational mechanism to revise an IR system's beliefs about the users' changing information needs. In addition, information flow is proposed as a text mining method to automatically acquire the initial IR contexts. The advantage of a belief-based IR system is that its IR behaviour is more predictable and explanatory. However, computational efficiency is often a concern when belief revision formalisms are applied to large real-life applications. This paper describes our belief-based adaptive IR system, which is underpinned by an efficient belief revision mechanism. Our initial experiments show that the belief-based symbolic IR model is more effective than a classical quantitative IR model. To the best of our knowledge, this is the first successful empirical evaluation of a logic-based IR model based on large IR benchmark collections.

Fan, Weiguo, Luo, Ming, Wang, Li, Xi, Wensi and Fox, Edward A. (2004): Tuning before feedback: combining ranking discovery and blind feedback for robust retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 138-145.

Both ranking functions and user queries are very important factors affecting a search engine's performance. Prior research has looked at how to improve ad-hoc retrieval performance for existing queries by tuning the ranking function, or at how to modify and expand user queries with blind feedback under a fixed ranking scheme. However, almost no research has looked at how to combine ranking function tuning and blind feedback to improve ad-hoc retrieval performance. In this paper, we look at the performance improvement for ad-hoc retrieval from a more integrated point of view by combining the merits of both techniques. In particular, we argue that the ranking function should be tuned first, using user-provided queries, before applying the blind feedback technique. The intuition is that a highly tuned ranking function places more high-quality documents at the top of the hit list and thus offers a stronger baseline for blind feedback. We verify this integrated model on a large-scale heterogeneous collection, and the experimental results show that combining ranking function tuning and blind feedback can improve search performance by almost 30% over the baseline Okapi system.
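
Blind (pseudo-relevance) feedback, the second ingredient in the abstract, can be illustrated with a generic sketch: assume the top k documents of the initial ranking are relevant and append their most frequent new terms to the query. This is a textbook-style simplification with hypothetical names (`expand_query`, `k`, `m`), not the expansion method used in the paper; it does show why ranking quality matters, since every expansion term is taken from the top of the hit list.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, k=10, m=5, stopwords=frozenset()):
    """Generic blind (pseudo-relevance) feedback.

    Treat the top-k documents of the initial ranking as relevant and append
    the m most frequent terms from them that are not already in the query.
    Ties are broken by first occurrence (Counter.most_common behaviour).
    """
    counts = Counter()
    for doc in ranked_docs[:k]:
        counts.update(t for t in doc.lower().split() if t not in stopwords)
    existing = set(query_terms)
    expansion = [t for t, _ in counts.most_common() if t not in existing][:m]
    return list(query_terms) + expansion
```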

Rogati, Monica and Yang, Yiming (2004): Resource selection for domain-specific cross-lingual IR. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 154-161.

An under-explored question in cross-language information retrieval (CLIR) is to what degree the performance of CLIR methods depends on the availability of high-quality translation resources for particular domains. To address this issue, we evaluate several competitive CLIR methods -- with different training corpora -- on test documents in the medical domain. Our results show severe performance degradation when using a general-purpose training corpus or a commercial machine translation system (SYSTRAN), versus a domain-specific training corpus. A related unexplored question is whether we can improve CLIR performance by systematically analyzing training resources and optimally matching them to target collections. We start exploring this problem by suggesting a simple criterion for automatically matching training resources to target corpora. By using cosine similarity between training and target corpora as resource weights we obtained an average of 5.6% improvement over using all resources with no weights. The same metric yields 99.4% of the performance obtained when an oracle chooses the optimal resource every time.
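
The resource-weighting idea in the abstract can be sketched as follows: represent each training corpus and the target collection as term-count vectors, compute their cosine similarity, and normalise the similarities into weights. Using raw counts (rather than, say, tf-idf) and normalising the weights to sum to one are our assumptions; `resource_weights` is a hypothetical name.

```python
import math
from collections import Counter


def term_vector(texts):
    """Raw term counts over a list of documents."""
    return Counter(t for text in texts for t in text.lower().split())


def cosine(u, v):
    dot = sum(c * v.get(t, 0) for t, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def resource_weights(training_corpora, target_docs):
    """Weight each training resource by its cosine similarity to the target
    collection, normalised so the weights sum to one."""
    target = term_vector(target_docs)
    sims = {name: cosine(term_vector(docs), target) for name, docs in training_corpora.items()}
    total = sum(sims.values())
    return {name: (s / total if total else 0.0) for name, s in sims.items()}
```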

Zhang, Ying and Vines, Phil (2004): Using the web for automated translation extraction in cross-language information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 162-169.

There have been significant advances in Cross-Language Information Retrieval (CLIR) in recent years. One of the major remaining reasons that CLIR does not perform as well as monolingual retrieval is the presence of out-of-vocabulary (OOV) terms. Previous work has either relied on manual intervention or has been only partially successful in solving this problem. We extend earlier work in this area with statistical analysis and corpus-based translation disambiguation to dynamically discover translations of OOV terms. The method can be applied to both Chinese-English and English-Chinese CLIR, correctly extracting translations of OOV terms from the Web automatically, and is thus a significant improvement on earlier work.

Gao, Jianfeng, Nie, Jian-Yun, Wu, Guangyuan and Cao, Guihong (2004): Dependence language model for information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 170-177.

This paper presents a new dependence language modeling approach to information retrieval. The approach extends the basic language modeling approach based on unigram by relaxing the independence assumption. We integrate the linkage of a query as a hidden variable, which expresses the term dependencies within the query as an acyclic, planar, undirected graph. We then assume that a query is generated from a document in two stages: the linkage is generated first, and then each term is generated in turn depending on other related terms according to the linkage. We also present a smoothing method for model parameter estimation and an approach to learning the linkage of a sentence in an unsupervised manner. The new approach is compared to the classical probabilistic retrieval model and the previously proposed language models with and without taking into account term dependencies. Results show that our model achieves substantial and significant improvements on TREC collections.

Hiemstra, Djoerd, Robertson, Stephen and Zaragoza, Hugo (2004): Parsimonious language models for information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 178-185.

We systematically investigate a new approach to estimating the parameters of language models for information retrieval, called parsimonious language models. Parsimonious language models explicitly address the relation between levels of language models that are typically used for smoothing. As such, they need fewer (non-zero) parameters to describe the data. We apply parsimonious models at three stages of the retrieval process: 1) at indexing time; 2) at search time; 3) at feedback time. Experimental results show that we are able to build models that are significantly smaller than standard models, but that still perform at least as well as the standard approaches.
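
A minimal sketch of the EM estimation behind parsimonious models, as it is commonly presented: the E-step computes, for each term, the expected count e_t = tf(t,D) * lambda*P(t|D) / ((1-lambda)*P(t|C) + lambda*P(t|D)); the M-step renormalises these counts into P(t|D), and terms whose probability falls below a threshold are dropped. In this sketch pruning is applied after every M-step; that placement, the defaults for `lam`, `iterations`, and `threshold`, and the function name are assumptions.

```python
def parsimonious_model(doc_tf, collection_prob, lam=0.1, iterations=20, threshold=1e-4):
    """EM estimation of a parsimonious document language model.

    doc_tf: term -> count in the document.
    collection_prob: term -> P(t|C), assumed positive for every document term.
    Returns term -> P(t|D) with low-probability terms pruned.
    """
    total = sum(doc_tf.values())
    p_doc = {t: tf / total for t, tf in doc_tf.items()}  # start from the ML estimate
    for _ in range(iterations):
        # E-step: expected count of each term generated by the document model
        e = {}
        for t, p in p_doc.items():
            num = lam * p
            e[t] = doc_tf[t] * num / (num + (1 - lam) * collection_prob[t])
        # M-step: renormalise, then drop terms below the threshold
        norm = sum(e.values())
        kept = {t: v / norm for t, v in e.items() if v / norm >= threshold}
        if not kept:
            break
        s = sum(kept.values())
        p_doc = {t: v / s for t, v in kept.items()}
    return p_doc
```

Terms that the background model already explains well (such as very frequent function words) lose probability mass to document-specific terms, which is what lets the model shrink.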

Diaz, Fernando and Jones, Rosie (2004): Using temporal profiles of queries for precision prediction. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 18-24.

A key component missing from information retrieval systems is a self-diagnostic test to establish whether the system can provide reasonable results for a given query on a document collection. If we can measure properties of a retrieved set of documents that allow us to predict average precision, we can automate the decision of whether to elicit relevance feedback or to modify the retrieval system in other ways. We use meta-data attached to documents in the form of time stamps to measure the distribution over time of the documents retrieved in response to a query, creating a temporal profile for the query. We define some useful features over this temporal profile. We find that by using these temporal features together with the content of the documents retrieved, we can improve the prediction of average precision for a query.
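
One way to build a temporal profile is sketched below: spread the retrieval scores of the returned documents over time bins, smooth the result with the collection's own distribution over the same bins, and measure how far the profile departs from that background with KL divergence. The bin granularity, the smoothing weight `lam`, and the choice of KL divergence as the example feature are assumptions for illustration, not a description of the exact features in the paper.

```python
import math
from collections import Counter


def temporal_profile(retrieved, bins, background, lam=0.9):
    """Score-weighted distribution of retrieved documents over time bins.

    retrieved: list of (time_bin, retrieval_score) pairs with positive scores.
    background: time_bin -> fraction of the whole collection in that bin.
    The raw profile is smoothed with the background so no bin has zero mass.
    """
    total = sum(score for _, score in retrieved)
    raw = Counter()
    for b, score in retrieved:
        raw[b] += score / total
    return {b: lam * raw[b] + (1 - lam) * background[b] for b in bins}


def kl_divergence(p, q):
    """KL(p || q) for distributions over the same bins."""
    return sum(p[b] * math.log(p[b] / q[b]) for b in p if p[b] > 0)
```

A feature such as this divergence could then be combined with content-based features in a precision predictor.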

Liu, Xiaoyong and Croft, W. Bruce (2004): Cluster-based retrieval using language models. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 186-193.

Previous research on cluster-based retrieval has been inconclusive as to whether it brings improved retrieval effectiveness over document-based retrieval. Recent developments in the language modeling approach to IR have motivated us to re-examine this problem within this new retrieval framework. We propose two new models for cluster-based retrieval and evaluate them on several TREC collections. We show that cluster-based retrieval can perform consistently across collections of realistic size, and that significant improvements over document-based retrieval can be obtained in a fully automatic manner and without relevance information provided by humans.
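
A minimal sketch of cluster-based smoothing in the language-modeling framework: a document's term probabilities are interpolated with those of its cluster and of the whole collection, and documents are ranked by query log-likelihood. The fixed mixing weights `lam` and `beta`, the maximum-likelihood estimates, and the function names are illustrative assumptions; the two models proposed in the paper may differ in detail.

```python
import math
from collections import Counter


def ml_model(tokens):
    """Maximum-likelihood unigram model."""
    counts = Counter(tokens)
    n = len(tokens)
    return {t: c / n for t, c in counts.items()}


def cluster_smoothed_score(query, doc, cluster_docs, collection_docs, lam=0.7, beta=0.5):
    """log P(Q|D) where
    P(w|D) = lam*P_ml(w|D) + (1-lam)*(beta*P_ml(w|cluster) + (1-beta)*P_ml(w|collection)).
    All documents are token lists; cluster_docs is the document's own cluster.
    """
    p_doc = ml_model(doc)
    p_cluster = ml_model([t for d in cluster_docs for t in d])
    p_coll = ml_model([t for d in collection_docs for t in d])
    score = 0.0
    for w in query:
        p = lam * p_doc.get(w, 0.0) + (1 - lam) * (
            beta * p_cluster.get(w, 0.0) + (1 - beta) * p_coll.get(w, 0.0)
        )
        if p == 0.0:
            return float("-inf")  # term unseen anywhere: zero likelihood
        score += math.log(p)
    return score
```

For example, a document that never mentions a query term can still outrank another document if its cluster neighbours use that term.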

Kurland, Oren and Lee, Lillian (2004): Corpus structure, language models, and ad hoc information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 194-201.

Most previous work on the recently developed language-modeling approach to information retrieval focuses on document-specific characteristics, and therefore does not take into account the structure of the surrounding corpus. We propose a novel algorithmic framework in which information provided by document-based language models is enhanced by the incorporation of information drawn from clusters of similar documents. Using this framework, we develop a suite of new algorithms. Even the simplest typically outperforms the standard language-modeling approach in precision and recall, and our new interpolation algorithm posts statistically significant improvements for both metrics over all three corpora tested.

Shah, Chirag and Croft, W. Bruce (2004): Evaluating high accuracy retrieval techniques. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 2-9.

Although information retrieval research has always been concerned with improving the effectiveness of search, in some applications, such as information analysis, a more specific requirement exists for high-accuracy retrieval. This means that achieving high precision in the top document ranks is paramount. In this paper we present work aimed at achieving high accuracy in ad-hoc document retrieval by incorporating approaches from question answering (QA). We focus on getting the first relevant result as high as possible in the ranked list and argue that traditional precision and recall are not appropriate measures for evaluating this task. We instead use the mean reciprocal rank (MRR) of the first relevant result. We evaluate three different methods for modifying queries to achieve high accuracy. Experiments on TREC data support the approach of using MRR and incorporating QA techniques to achieve high accuracy in the ad-hoc retrieval task.
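
Mean reciprocal rank, the measure the abstract adopts, averages 1/rank of the first relevant document over all queries. A short sketch, assuming that a query whose ranking contains no relevant document contributes zero; the function name is hypothetical.

```python
def mean_reciprocal_rank(rankings, relevant):
    """rankings: query id -> ranked list of doc ids;
    relevant: query id -> set of relevant doc ids."""
    if not rankings:
        return 0.0
    total = 0.0
    for qid, ranked in rankings.items():
        rel = relevant.get(qid, set())
        for rank, doc in enumerate(ranked, start=1):
            if doc in rel:
                total += 1.0 / rank  # only the first relevant document counts
                break
    return total / len(rankings)
```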

Xu, Wei and Gong, Yihong (2004): Document clustering by concept factorization. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 202-209.

In this paper, we propose a new data clustering method called concept factorization that models each concept as a linear combination of the data points, and each data point as a linear combination of the concepts. With this model, the data clustering task is accomplished by computing the two sets of linear coefficients, and this computation is carried out by finding the non-negative solution that minimizes the reconstruction error of the data points. The cluster label of each data point can be easily derived from the obtained linear coefficients. This method differs from clustering based on non-negative matrix factorization (NMF) [Xu03] in that it can be applied to data containing negative values and can be implemented in the kernel space. Our experimental results show that the proposed data clustering method and its variations perform best among 11 algorithms and their variations that we have evaluated on both the TDT2 and Reuters-21578 corpora. In addition to its good performance, the new method also has the merit of easy and reliable derivation of the clustering results.
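
Concept factorization approximates the data matrix X (features by data points) as X W V^T with non-negative W and V. Below is a minimal pure-Python sketch using multiplicative updates written only in terms of K = X^T X, as we recall them from the concept factorization literature (they follow from the gradient of the squared reconstruction error). Reading each point's label off the largest entry in its row of V, and skipping any normalisation before labeling, are simplifying assumptions; the function names are hypothetical.

```python
import random


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def concept_factorization(X, k, iterations=200, seed=0, eps=1e-12):
    """Multiplicative updates for X ~ X W V^T (X: features x points, non-negative).

    Uses only K = X^T X:
      W <- W * (K V) / (K W V^T V),   V <- V * (K W) / (V W^T K W)
    Returns W, V (both points x k) and an argmax label per point.
    """
    rng = random.Random(seed)
    n = len(X[0])
    K = matmul(transpose(X), X)
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    V = [[rng.random() for _ in range(k)] for _ in range(n)]
    for _ in range(iterations):
        num = matmul(K, V)
        den = matmul(matmul(K, W), matmul(transpose(V), V))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
        KW = matmul(K, W)
        den = matmul(V, matmul(transpose(W), KW))
        V = [[V[i][j] * KW[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    labels = [max(range(k), key=lambda j: V[i][j]) for i in range(n)]
    return W, V, labels


def reconstruction_error(X, W, V):
    """Squared Frobenius norm of X - X W V^T."""
    R = matmul(matmul(X, W), transpose(V))
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))
```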

Zeng, Hua-Jun, He, Qi-Cai, Chen, Zheng, Ma, Wei-Ying and Ma, Jinwen (2004): Learning to cluster web search results. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 210-217.

Organizing Web search results into clusters facilitates users' quick browsing through search results. Traditional clustering techniques are inadequate since they don't generate clusters with highly readable names. In this paper, we reformalize the clustering problem as a salient phrase ranking problem. Given a query and the ranked list of documents (typically a list of titles and snippets) returned by a certain Web search engine, our method first extracts and ranks salient phrases as candidate cluster names, based on a regression model learned from human labeled training data. The documents are assigned to relevant salient phrases to form candidate clusters, and the final clusters are generated by merging these candidate clusters. Experimental results verify our method's feasibility and effectiveness.

Li, Tao, Ma, Sheng and Ogihara, Mitsunori (2004): Document clustering via adaptive subspace iteration. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 218-225.

Document clustering has long been an important problem in information retrieval. In this paper, we present a new clustering algorithm, ASI, which explicitly models the subspace structure associated with each cluster. ASI simultaneously performs data reduction and subspace identification via an iterative alternating optimization procedure. Motivated by the optimization procedure, we then provide a novel method to determine the number of clusters. We also discuss the connections of ASI with various existing clustering approaches. Finally, extensive experimental results on real data sets show the effectiveness of the ASI algorithm.

Buckley, Chris and Voorhees, Ellen M. (2004): Retrieval evaluation with incomplete information. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 25-32.

This paper examines whether the Cranfield evaluation methodology is robust to gross violations of the completeness assumption (i.e., the assumption that all relevant documents within a test collection have been identified and are present in the collection). We show that current evaluation measures are not robust to substantially incomplete relevance judgments. A new measure is introduced that is both highly correlated with existing measures when complete judgments are available and more robust to incomplete judgment sets. This finding suggests that substantially larger or dynamic test collections built using current pooling practices should be viable laboratory tools, despite the fact that the relevance information will be incomplete and imperfect.
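
The measure introduced in this paper is widely known as bpref. A minimal sketch of its definition, as we recall it from the paper: bpref = (1/R) * sum over retrieved relevant documents r of (1 - |n ranked higher than r| / R), where R is the number of judged relevant documents and n counts only the first R judged non-relevant documents retrieved. Unjudged documents are simply skipped, which is what makes the measure tolerant of incomplete judgments. Some later implementations normalise slightly differently when fewer than R non-relevant documents are judged; the function name here is hypothetical.

```python
def bpref(ranked, relevant, nonrelevant):
    """bpref for one topic.

    ranked: retrieved doc ids in rank order.
    relevant / nonrelevant: sets of judged doc ids; anything else is unjudged
    and ignored. R = number of judged relevant documents.
    """
    R = len(relevant)
    if R == 0:
        return 0.0
    nonrel_above = 0
    total = 0.0
    for doc in ranked:
        if doc in relevant:
            # only the first R judged non-relevant documents count
            total += 1.0 - min(nonrel_above, R) / R
        elif doc in nonrelevant:
            nonrel_above += 1
    return total / R
```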

Davidov, Dmitry, Gabrilovich, Evgeniy and Markovitch, Shaul (2004): Parameterized generation of labeled datasets for text categorization based on a hierarchical directory. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 250-257.

Although text categorization is a burgeoning area of IR research, readily available test collections in this field are surprisingly scarce. We describe a methodology and system (named ACCIO) for automatically acquiring labeled datasets for text categorization from the World Wide Web, by capitalizing on the body of knowledge encoded in the structure of existing hierarchical directories such as the Open Directory. We define parameters of categories that make it possible to acquire numerous datasets with desired properties, which in turn allow better control over categorization experiments. In particular, we develop metrics that estimate the difficulty of a dataset by examining the host directory structure. These metrics are shown to be good predictors of the categorization accuracy that can be achieved on a dataset, and serve as efficient heuristics for generating datasets subject to the user's requirements. A large collection of automatically generated datasets is made available for other researchers to use.

Kim, Sang-Bum, Seo, Hee-Cheol and Rim, Hae-Chang (2004): Information retrieval using word senses: root sense tagging approach. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 258-265.

Information retrieval using word senses is emerging as a good research challenge in semantic information retrieval. In this paper, we propose a new method of using word senses in information retrieval: the root sense tagging method. This method assigns coarse-grained word senses defined in WordNet to query terms and document terms in an unsupervised way, using co-occurrence information constructed automatically. Our sense tagger is crude, but performs consistent disambiguation by considering only the single most informative word as evidence to disambiguate the target word. We also allow multiple-sense assignment to alleviate the problem caused by incorrect disambiguation. Experimental results on a large-scale TREC collection show that our approach improves retrieval effectiveness, whereas most previous work failed to improve performance even on small text collections. Our method also shows promising results when combined with pseudo relevance feedback and state-of-the-art retrieval functions such as BM25.

Liu, Shuang, Liu, Fang, Yu, Clement and Meng, Weiyi (2004): An effective approach to document retrieval via utilizing WordNet and recognizing phrases. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 266-272.

Noun phrases in queries are identified and classified into four types: proper names, dictionary phrases, simple phrases and complex phrases. A document has a phrase if all content words in the phrase are within a window of a certain size. The window sizes for different types of phrases are different and are determined using a decision tree. Phrases are more important than individual terms. Consequently, documents in response to a query are ranked with matching phrases given a higher priority. We utilize WordNet to disambiguate word senses of query terms. Whenever the sense of a query term is determined, its synonyms, hyponyms, words from its definition and its compound words are considered for possible additions to the query. Experimental results show that our approach yields between 23% and 31% improvements over the best-known results on the TREC 9, 10 and 12 collections for short (title only) queries, without using Web data.
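
The window test in the abstract can be sketched directly: a document contains a phrase if some span of `window` consecutive tokens includes every content word of the phrase. In the paper the window size depends on the phrase type and is chosen by a decision tree; here it is simply a parameter, and the function name is hypothetical.

```python
def has_phrase(doc_tokens, phrase_words, window):
    """True if every content word of the phrase occurs inside some span of
    `window` consecutive document tokens (word order is ignored)."""
    needed = set(phrase_words)
    for start in range(len(doc_tokens)):
        if needed <= set(doc_tokens[start:start + window]):
            return True
    return False
```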

Amitay, Einat, Har'El, Nadav, Sivan, Ron and Soffer, Aya (2004): Web-a-where: geotagging web content. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 273-280.

We describe Web-a-Where, a system for associating geography with Web pages. Web-a-Where locates mentions of places and determines the place each name refers to. In addition, it assigns to each page a geographic focus -- a locality that the page discusses as a whole. The tagging process is simple and fast, aimed to be applied to large collections of Web pages and to facilitate a variety of location-based applications and data analyses. Geotagging involves arbitrating two types of ambiguities: geo/non-geo and geo/geo. A geo/non-geo ambiguity occurs when a place name also has a non-geographic meaning, such as a person name (e.g., Berlin) or a common word (Turkey). Geo/geo ambiguity arises when distinct places have the same name, as in London, England vs. London, Ontario. An implementation of the tagger within the framework of the WebFountain data mining system is described, and evaluated on several corpora of real Web pages. Precision of up to 82% on individual geotags is achieved. We also evaluate the relative contribution of various heuristics the tagger employs, and evaluate the focus-finding algorithm using a corpus pretagged with localities, showing that as many as 91% of the foci reported are correct up to the country level.

Zhang, Li, Pan, Yue and Zhang, Tong (2004): Focused named entity recognition using machine learning. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 281-288.

In this paper we study the problem of finding the most topical named entities among all entities in a document, which we refer to as focused named entity recognition. We show that these focused named entities are useful for many natural language processing applications, such as document summarization, search result ranking, and entity detection and tracking. We propose a statistical model for focused named entity recognition by converting it into a classification problem. We then study the impact of various linguistic features and compare a number of classification algorithms. From experiments on an annotated Chinese news corpus, we demonstrate that the proposed method can achieve near human-level accuracy.

Lam, Wai, Huang, Ruizhang and Cheung, Pik-Shan (2004): Learning phonetic similarity for matching named entity translations and mining new translations. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 289-296.

We propose a novel named entity matching model which considers both semantic and phonetic clues. The matching is formulated as an optimization problem. One major component is a phonetic matching model which exploits similarity at the phoneme level. We investigate three learning algorithms for obtaining the similarity information of basic phoneme units based on training examples. By applying this proposed named entity matching model, we also develop a mining framework for discovering new, unseen named entity translations from online daily Web news. This framework harvests comparable news in different languages using an existing bilingual dictionary. It is able to discover new name translations not found in the dictionary.

Copyrights may apply

p. 297-304

Kumaran, Giridhar and Allan, James (2004): Text classification and named entities for new event detection. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 297-304. Available online

New Event Detection is a challenging task that still offers scope for great improvement after years of effort. In this paper we show how performance on New Event Detection (NED) can be improved by the use of text classification techniques as well as by using named entities in a new way. We explore modifications to the document representation in a vector space-based NED system. We also show that addressing named entities preferentially is useful only in certain situations. A combination of all the above results in a multi-stage NED system that performs much better than baseline single-stage NED systems.

Copyrights may apply

p. 313-320

Tryfonopoulos, Christos, Koubarakis, Manolis and Drougas, Yannis (2004): Filtering algorithms for information retrieval models with named attributes and proximity operators. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 313-320. Available online

In the selective dissemination of information (or publish/subscribe) paradigm, clients subscribe to a server with continuous queries (or profiles) that express their information needs. Clients can also publish documents to servers. Whenever a document is published, the continuous queries satisfying this document are found and notifications are sent to appropriate clients. This paper deals with the filtering problem that needs to be solved efficiently by each server: Given a database of continuous queries db and a document d, find all queries q ∈ db that match d. We present data structures and indexing algorithms that enable us to solve the filtering problem efficiently for large databases of queries expressed in the model AWP, which is based on named attributes with values of type text, and word proximity operators.
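To make the filtering problem concrete, here is a minimal Python sketch of a counting-based filter for a simplified model in which each continuous query is a conjunction of (attribute, word) pairs. It is not the AWP model or the authors' algorithm: word proximity operators are left out, and the class and method names (`ConjunctiveFilter`, `subscribe`, `match`) are hypothetical.

```python
from collections import defaultdict


class ConjunctiveFilter:
    """Hypothetical counting-based filter for conjunctive (attribute, word) queries.

    Simplification (assumption): a query matches a document when every one of
    its (attribute, word) pairs occurs in the document. No proximity operators.
    """

    def __init__(self):
        self.index = defaultdict(list)  # (attribute, word) -> ids of queries using it
        self.sizes = {}                 # query id -> number of distinct pairs

    def subscribe(self, qid, pairs):
        # Normalize words to lower case so they match the document tokens.
        pairs = {(attr, word.lower()) for attr, word in pairs}
        self.sizes[qid] = len(pairs)
        for pair in pairs:
            self.index[pair].append(qid)

    def match(self, document):
        # Collect the distinct (attribute, word) pairs present in the document.
        doc_pairs = {
            (attr, word)
            for attr, text in document.items()
            for word in text.lower().split()
        }
        # Count, for each query, how many of its pairs the document contains.
        counts = defaultdict(int)
        for pair in doc_pairs:
            for qid in self.index.get(pair, ()):
                counts[qid] += 1
        # A query matches only when all of its pairs were found.
        return sorted(qid for qid, c in counts.items() if c == self.sizes[qid])
```

The inverted index means a published document only touches the queries that share at least one pair with it, rather than every stored query.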

Copyrights may apply

p. 321-328

Beitzel, Steven M., Jensen, Eric C., Chowdhury, Abdur, Grossman, David A. and Frieder, Ophir (2004): Hourly analysis of a very large topically categorized web query log. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 321-328. Available online

We review a query log of hundreds of millions of queries that constitute the total query traffic for an entire week of a general-purpose commercial web search service. Previously, query logs have been studied from a single, cumulative view. In contrast, our analysis shows changes in popularity and uniqueness of topically categorized queries across the hours of the day. We examine query traffic on an hourly basis by matching it against lists of queries that have been topically pre-categorized by human editors. These pre-categorized queries account for 13% of the query traffic. We show that query traffic from particular topical categories differs both from the query stream as a whole and from other categories. This analysis provides valuable insight for improving retrieval effectiveness and efficiency. It is also relevant to the development of enhanced query disambiguation, routing, and caching algorithms.

Copyrights may apply

p. 33-40

Sanderson, Mark and Joho, Hideo (2004): Forming test collections with no system pooling. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 33-40. Available online

Forming test collection relevance judgments from the pooled output of multiple retrieval systems has become the standard process for creating resources such as the TREC, CLEF, and NTCIR test collections. This paper presents a series of experiments examining three different ways of building test collections where no system pooling is used. First, a collection formation technique combining manual feedback and multiple systems is adapted to work with a single retrieval system. Second, an existing method based on pooling the output of multiple manual searches is re-examined: testing a wider range of searchers and retrieval systems than has been examined before. Third, a new approach is explored where the ranked output of a single automatic search on a single retrieval system is assessed for relevance: no pooling whatsoever. Using established techniques for evaluating the quality of relevance judgments, in all three cases, test collections are formed that are as good as TREC.

Copyrights may apply

p. 337-344

Jin, Rong, Chai, Joyce Y. and Si, Luo (2004): An automatic weighting scheme for collaborative filtering. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 337-344. Available online

Collaborative filtering identifies the information interests of a particular user based on the information provided by other, similar users. The memory-based approaches for collaborative filtering (e.g., the Pearson correlation coefficient approach) identify the similarity between two users by comparing their ratings on a set of items. In these approaches, different items are weighted either equally or by some predefined functions. The impact of rating discrepancies among different users has not been taken into consideration. For example, an item that is highly favored by most users should have a smaller impact on the user-similarity than an item for which different types of users tend to give different ratings. Even though simple weighting methods such as variance weighting try to address this problem, empirical studies have shown that they are ineffective in improving the performance of collaborative filtering. In this paper, we present an optimization algorithm to automatically compute the weights for different items based on their ratings from training users. More specifically, the new weighting scheme will create a clustered distribution for user vectors in the item space by bringing users with similar interests closer together and pushing users with different interests farther apart. Empirical studies over two datasets have shown that our new weighting scheme substantially improves the performance of the Pearson correlation coefficient method for collaborative filtering.
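The role of item weights in the Pearson correlation approach can be illustrated with a weighted Pearson similarity. This Python sketch assumes ratings stored as `item -> rating` dicts, user means taken over all of a user's ratings, and weights supplied by the caller (default 1.0, which gives the ordinary Pearson coefficient). It does not implement the paper's optimization procedure for learning the weights, and `weighted_pearson` is a hypothetical name.

```python
import math


def mean_rating(ratings):
    """Mean of one user's ratings (assumed: over all items the user rated)."""
    return sum(ratings.values()) / len(ratings)


def weighted_pearson(ra, ru, weights=None):
    """Pearson correlation between two users over their co-rated items,
    with an optional per-item weight (hypothetical helper, not the paper's code)."""
    common = ra.keys() & ru.keys()
    if not common:
        return 0.0
    ma, mu = mean_rating(ra), mean_rating(ru)
    num = den_a = den_u = 0.0
    for item in common:
        w = 1.0 if weights is None else weights.get(item, 1.0)
        da, du = ra[item] - ma, ru[item] - mu
        num += w * da * du
        den_a += w * da * da
        den_u += w * du * du
    if den_a == 0 or den_u == 0:
        return 0.0  # undefined correlation; treat as no similarity
    return num / math.sqrt(den_a * den_u)
```

Setting an item's weight to 0 removes its influence entirely, so a learned weighting can, for example, discount an item that nearly everyone rates the same way.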

Copyrights may apply

p. 345-352

Zhang, Yi (2004): Using bayesian priors to combine classifiers for adaptive filtering. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 345-352. Available online

An adaptive information filtering system monitors a document stream to identify the documents that match information needs specified by user profiles. As the system filters, it also refines its knowledge about the user's information needs based on long-term observations of the document stream and periodic feedback (training data) from the user. Low variance profile learning algorithms, such as Rocchio, work well at the early stage of filtering when the system has very little training data. Low bias profile learning algorithms, such as Logistic Regression, work well at the later stage of filtering when the system has accumulated enough training data. However, a practical system needs to work well consistently at all stages of the filtering process. This paper addresses this problem by proposing a new technique to combine different text classification algorithms via a constrained maximum likelihood Bayesian prior. This technique provides a trade-off between bias and variance, and the combined classifier may achieve consistently good performance at different stages of filtering. We implemented the proposed technique to combine two complementary classification algorithms: Rocchio and logistic regression. The new algorithm is shown to compare favorably with Rocchio, Logistic Regression, and the best methods in the TREC-9 and TREC-11 adaptive filtering tracks.
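For reference, a minimal sketch of the classic Rocchio profile update, one of the two component learners named above, over sparse term-weight dicts. The parameter values (alpha=1.0, beta=0.75, gamma=0.15) and the clipping of negative weights are common conventions assumed here, not taken from the paper; the Bayesian-prior combination with logistic regression is not shown.

```python
from collections import defaultdict


def centroid(vectors):
    """Average of sparse term-weight vectors (term -> weight)."""
    total = defaultdict(float)
    for vec in vectors:
        for term, w in vec.items():
            total[term] += w
    n = len(vectors)
    return {t: w / n for t, w in total.items()} if n else {}


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio: alpha*query + beta*centroid(rel) - gamma*centroid(nonrel)."""
    profile = defaultdict(float)
    for term, w in query.items():
        profile[term] += alpha * w
    for term, w in centroid(relevant).items():
        profile[term] += beta * w
    for term, w in centroid(nonrelevant).items():
        profile[term] -= gamma * w
    # Assumption: negative term weights are clipped (dropped), a common convention.
    return {t: w for t, w in profile.items() if w > 0}


def score(profile, doc):
    """Dot-product score of a document vector against the profile."""
    return sum(w * doc.get(t, 0.0) for t, w in profile.items())
```

Because the profile is an average of a handful of examples, it changes little as feedback arrives, which is the low-variance behavior the abstract attributes to Rocchio in the early stage of filtering.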

Copyrights may apply

p. 353-360

Yu, Kai, Tresp, Volker and Yu, Shipeng (2004): A nonparametric hierarchical bayesian framework for information filtering. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 353-360. Available online

Information filtering has made considerable progress in recent years. The predominant approaches are content-based methods and collaborative methods. Researchers have largely concentrated on one or the other of the two approaches, since a principled unifying framework is still lacking. This paper suggests that both approaches can be combined under a hierarchical Bayesian framework. Individual content-based user profiles are generated and collaboration between various user models is achieved via a common learned prior distribution. However, it turns out that a parametric distribution (e.g. Gaussian) is too restrictive to describe such a common learned prior distribution. We thus introduce a nonparametric common prior, which is a sample generated from a Dirichlet process that assumes the role of a hyperprior. We describe effective means to learn this nonparametric distribution, and apply it to learn users' information needs. The resultant algorithm is simple and understandable, and offers a principled solution to combine content-based filtering and collaborative filtering. Within our framework, we are now able to interpret various existing techniques from a unifying point of view. Finally we demonstrate the empirical success of the proposed information filtering methods.

Copyrights may apply

p. 361-368

Fan, Jianping, Gao, Yuli, Luo, Hangzai and Xu, Guangyou (2004): Automatic image annotation by using concept-sensitive salient objects for image content representation. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 361-368. Available online

Multi-level annotation of images is a promising solution to enable more effective semantic image retrieval by using various keywords at different semantic levels. In this paper, we propose a multi-level approach to annotate the semantics of natural scenes by using both the dominant image components and the relevant semantic concepts. In contrast to the well-known image-based and region-based approaches, we use the salient objects as the dominant image components to achieve automatic image annotation at the content level. By using the salient objects for image content representation, a novel image classification technique is developed to achieve automatic image annotation at the concept level. To detect the salient objects automatically, a set of detection functions is learned from the labeled image regions by using Support Vector Machine (SVM) classifiers with an automatic scheme for searching the optimal model parameters. To generate the semantic concepts, finite mixture models are used to approximate the class distributions of the relevant salient objects. An adaptive EM algorithm is proposed to determine the optimal model structure and model parameters simultaneously. We also demonstrate that our algorithms are very effective in enabling multi-level annotation of natural scenes in a large-scale dataset.

Copyrights may apply

p. 369-376

Rath, Toni M., Manmatha, R. and Lavrenko, Victor (2004): A search engine for historical manuscript images. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 369-376. Available online

Many museum and library archives are digitizing their large collections of handwritten historical manuscripts to enable public access to them. These collections are only available in image formats, and making them accessible requires expensive manual annotation. Current handwriting recognizers have word error rates in excess of 50% and therefore cannot be used for such material. We describe two statistical models for retrieval in large collections of handwritten manuscripts given a text query. Both use a set of transcribed page images to learn a joint probability distribution between features computed from word images and their transcriptions. The models can then be used to retrieve unlabeled images of handwritten documents given a text query. We show experiments with a training set of 100 transcribed pages and a test set of 987 handwritten page images from the George Washington collection. Experiments show that the precision at 20 documents is about 0.4 to 0.5 depending on the model. To the best of our knowledge, this is the first automatic retrieval system for historical manuscripts using text queries, without manual transcription of the original corpus.

Copyrights may apply

p. 377-384

Kelly, Diane and Belkin, Nicholas J. (2004): Display time as implicit feedback: understanding task effects. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 377-384. Available online

Recent research has had some success using the length of time a user displays a document in their web browser as implicit feedback for document preference. However, most studies have been confined to specific search domains, such as news, and have not considered the effects of task on display time, and the potential impact of this relationship on the effectiveness of display time as implicit feedback. We describe the results of an intensive naturalistic study of the online information-seeking behaviors of seven subjects during a fourteen-week period. Throughout the study, subjects' online information-seeking activities were monitored with various pieces of logging and evaluation software. Subjects were asked to identify the tasks with which they were working, classify the documents that they viewed according to these tasks, and evaluate the usefulness of the documents. Results of a user-centered analysis demonstrate no general, direct relationship between display time and usefulness, and that display times differ significantly according to specific task, and according to specific user.

Copyrights may apply

p. 385-392

Wu, Mingfang, Muresan, Gheorghe, McLean, Alistair, Tang, Muh-Chyun (Morris), Wilkinson, Ross, Li, Yuelin, Lee, Hyuk-Jin and Belkin, Nicholas J. (2004): Human versus machine in the topic distillation task. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 385-392. Available online

This paper reports on and discusses a set of user experiments using the TREC 2003 Web interactive track protocol. The focus is on comparing humans and machine algorithms in terms of performance in a topic distillation task. We also investigated the effect of the search results layout in supporting the users' effort. We have demonstrated that machines can perform nearly as well as people on the topic distillation task. Given a system tailored to the task, there is a significant performance improvement; and given a presentation that supports the task, there is strong user satisfaction.

Copyrights may apply

p. 393

Willett, Peter (2004): Chemoinformatics: an application domain for information retrieval techniques. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. p. 393. Available online

Chemoinformatics is the generic name for the techniques used to represent, store and process information about the two-dimensional (2D) and three-dimensional (3D) structures of chemical molecules [1, 2]. Chemoinformatics has attracted much recent prominence as a result of developments in the methods that are used to synthesize new molecules and then to test them for biological activity. These developments have resulted in a massive increase in the amounts of structural and biological information that is available to support discovery programmes in the pharmaceutical and agrochemical industries. Chemoinformatics may appear to be far removed from information retrieval (IR), and there are indeed many significant differences, most notably in the use of graph representations to encode chemical molecules, rather than the strings that are used to encode text; however, there are also many similarities between the two fields, and this paper will exemplify some of these relationships. The most obvious area of similarity is in the principal types of database search that are carried out, with both application domains making extensive use of exact match, partial match and best match searching procedures: in the IR context these are known-item searching, Boolean searching and ranked-output searching; in the chemical context, these are structure searching, substructure searching and similarity searching. In IR, there is a natural distinction between an initial ranked-output search and one in which relevance feedback can be employed, where the keywords in the query statement are assigned weights based on their differential occurrences in known-relevant and known-nonrelevant documents. In the chemoinformatics technique called substructural analysis, substructural fragments are assigned weights based on their occurrence in molecules that do possess, and molecules that do not possess, some desired biological activity [3]. 
The analogy between relevance and biological activity has also resulted in the development of measures to quantify the effectiveness of chemical searching procedures that are based on the standard IR concepts of recall and precision [4]. Analogies such as these have provided the basis for some of the chemoinformatics research carried out in Sheffield. The starting point was the recognition that techniques applicable to documents represented by keywords might also be applicable to molecules represented by substructural fragments. This led directly to the introduction of similarity searching, something that is now a standard tool in chemoinformatics software systems; in particular, its use for virtual screening, i.e., the ranking of a database in order of decreasing probability of activity so as to maximize the cost-effectiveness of biological testing [5]. Measures of inter-molecular structural similarity also lie at the heart of systems for clustering chemical databases: just as IR has the Cluster Hypothesis (similar documents tend to be relevant to the same requests) as a basis for document clustering, so the Similar Property Principle (similar molecules tend to have similar properties) has led to clustering becoming a well-established tool for the organization of large chemical databases [6]. More recently, we have applied another IR technique, the use of data fusion to combine different rankings of a database, to chemoinformatics and again found that it is equally applicable in this new domain [7]. The many similarities between IR and chemoinformatics that have already been identified suggest that chemoinformatics is a domain of which IR researchers should be aware when considering the applicability of new techniques that they have developed.
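The similarity searching and data fusion ideas above can be sketched in a few lines of Python. The sketch assumes each molecule is represented by a set of substructural fragment ids and uses the Tanimoto coefficient, a widely used chemoinformatics similarity measure that the abstract itself does not name; CombSUM (summing scores across searches) stands in for whichever fusion rule the Sheffield work used. All function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| between two fragment-id sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_scores(query, database):
    """Score every molecule in `database` (id -> fragment set) against `query`."""
    return {mol_id: tanimoto(query, fp) for mol_id, fp in database.items()}


def rank(scores):
    """Order molecule ids by decreasing score (ties broken by id)."""
    return sorted(scores, key=lambda mol_id: (-scores[mol_id], mol_id))


def comb_sum(score_maps):
    """CombSUM fusion (assumed rule): add each molecule's scores from several searches."""
    fused = {}
    for scores in score_maps:
        for mol_id, s in scores.items():
            fused[mol_id] = fused.get(mol_id, 0.0) + s
    return rank(fused)
```

`rank(similarity_scores(q, db))` plays the role of a ranked-output search, with fragments in place of keywords; in virtual screening the top of this ranking would be prioritized for biological testing.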

Copyrights may apply

p. 394-401

Xi, Wensi, Lind, Jesper and Brill, Eric (2004): Learning effective ranking functions for newsgroup search. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 394-401. Available online

Web communities are virtual broadcasting spaces on the web where people can freely discuss anything. While such communities function as discussion boards, they have even greater value as large repositories of archived information. In order to unlock the value of this resource, we need an effective means for searching archived discussion threads. Unfortunately the techniques that have proven successful for searching document collections and the Web are not ideally suited to the task of searching archived community discussions. In this paper, we explore the problem of creating an effective ranking function to predict the most relevant messages to queries in community search. We extract a set of predictive features from the thread trees of newsgroup messages as well as features of message authors and lexical distribution within a message thread. Our final results indicate that when using linear regression with this feature set, our search system achieved a 28.5% performance improvement compared to our baseline system.

Copyrights may apply

p. 402-409

Larkey, Leah S., Feng, Fangfang, Connell, Margaret and Lavrenko, Victor (2004): Language-specific models in multilingual topic tracking. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 402-409. Available online

Topic tracking is complicated when the stories in the stream occur in multiple languages. Typically, researchers have trained only English topic models because the training stories have been provided in English. In tracking, non-English test stories are then machine translated into English to compare them with the topic models. We propose a native language hypothesis stating that comparisons would be more effective in the original language of the story. We first test and support the hypothesis for story link detection. For topic tracking the hypothesis implies that it should be preferable to build separate language-specific topic models for each language in the stream. We compare different methods of incrementally building such native language topic models.

Copyrights may apply

p. 41-48

Oard, Douglas W., Soergel, Dagobert, Doermann, David, Huang, Xiaoli, Murray, G. Craig, Wang, Jianqiang, Ramabhadran, Bhuvana, Franz, Martin, Gustman, Samuel, Mayfield, James, Kharevych, Liliya and Strassel, Stephanie (2004): Building an information retrieval test collection for spontaneous conversational speech. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 41-48. Available online

Test collections model use cases in ways that facilitate evaluation of information retrieval systems. This paper describes the use of search-guided relevance assessment to create a test collection for retrieval of spontaneous conversational speech. Approximately 10,000 thematically coherent segments were manually identified in 625 hours of oral history interviews with 246 individuals. Automatic speech recognition results, manually prepared summaries, controlled vocabulary indexing, and name authority control are available for every segment. Those features were leveraged by a team of four relevance assessors to identify topically relevant segments for 28 topics developed from actual user requests. Search-guided assessment yielded sufficient inter-annotator agreement to support formative evaluation during system development. Baseline results for ranked retrieval are presented to illustrate use of the collection.

Copyrights may apply

p. 410-417

Zhang, Dell and Lee, Wee Sun (2004): Web taxonomy integration through co-bootstrapping. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 410-417. Available online

We address the problem of integrating objects from a source taxonomy into a master taxonomy. This problem is not only currently pervasive on the web, but also important to the emerging semantic web. A straightforward approach to automating this process would be to learn a classifier that can classify objects from the source taxonomy into categories of the master taxonomy. The key insight is that the availability of the source taxonomy data could be helpful to build better classifiers for the master taxonomy if their categorizations have some semantic overlap. In this paper, we propose a new approach, co-bootstrapping, to enhance the classification by exploiting such implicit knowledge. Our experiments with real-world web data show substantial improvements in the performance of taxonomy integration.

Copyrights may apply

p. 418-424

Xu, Jinxi, Weischedel, Ralph and Licuanan, Ana (2004): Evaluation of an extraction-based approach to answering definitional questions. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 418-424. Available online

This paper evaluates an extraction-based approach to answering definitional questions. Our system extracted useful linguistic constructs called linguistic features from raw text using information extraction tools and formulated answers based on such features. The features employed include appositives, copulas, structured patterns, relations, propositions and raw sentences. The features were ranked based on feature type and similarity to a question profile. Redundant features were detected using a simple heuristic-based strategy. The approach achieved state of the art performance at the TREC 2003 QA evaluation. Component analysis of the system was carried out using an automatic scoring function called Rouge (Lin and Hovy, 2003). Major findings include 1) answers using linguistic features are significantly better than those using raw sentences; 2) the most useful features are appositives and copulas; 3) question profiles, as a means of modeling user interests, can significantly improve system performance; 4) the Rouge scores are closely correlated with subjective evaluation results, indicating the suitability of using Rouge for evaluating definitional QA systems.
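Because Rouge is used here as the evaluation measure, a simplified ROUGE-N recall computation may help: the number of reference n-grams that also appear in the candidate (clipped by count), divided by the total number of reference n-grams. This Python sketch assumes whitespace tokenization and lower-casing with no stemming or stopword removal, so its scores will not exactly match the official ROUGE toolkit.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, references, n=1):
    """Simplified ROUGE-N recall of `candidate` against a list of reference texts."""
    cand = ngrams(candidate.lower().split(), n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Clip each n-gram's match count by how often it occurs in the candidate.
        matched += sum(min(c, cand[g]) for g, c in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0
```

For example, "the cat sat" scored against "the cat sat on the mat" recovers 3 of the 6 reference unigrams.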

Copyrights may apply

p. 425-432

Chieu, Hai Leong and Lee, Yoong Keok (2004): Query based event extraction along a timeline. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 425-432. Available online

In this paper, we present a framework and a system that extracts events relevant to a query from a collection C of documents, and places such events along a timeline. Each event is represented by a sentence extracted from C, based on the assumption that "important" events are widely cited in many documents for a period of time within which these events are of interest. In our experiments, we used queries that are event types (e.g. "earthquake") and person names (e.g. "George Bush"). Evaluation was performed using G8 leader names as queries: comparison made by human evaluators between manually and system generated timelines showed that although manually generated timelines are on average preferred, system generated timelines are sometimes judged to be better than manually constructed ones.

Copyrights may apply

p. 433-439

Grabski, Korinna and Scheffer, Tobias (2004): Sentence completion. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 433-439. Available online

We discuss a retrieval model in which the task is to complete a sentence, given an initial fragment, and given an application specific document collection. This model is motivated by administrative and call center environments, in which users have to write documents with a certain repetitiveness. We formulate the problem setting and discuss appropriate performance metrics. We present an index-based retrieval algorithm and a cluster-based approach, and evaluate our algorithms using collections of emails that have been written by two distinct service centers.

Copyrights may apply

p. 440-447

Cai, Deng, He, Xiaofei, Wen, Ji-Rong and Ma, Wei-Ying (2004): Block-level link analysis. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 440-447. Available online

Link Analysis has shown great potential in improving the performance of web search. PageRank and HITS are two of the most popular algorithms. Most of the existing link analysis algorithms treat a web page as a single node in the web graph. However, in most cases, a web page contains multiple semantics, and hence the web page may not be an appropriate atomic node. In this paper, the web page is partitioned into blocks using the vision-based page segmentation algorithm. By extracting the page-to-block and block-to-page relationships from link structure and page layout analysis, we can construct a semantic graph over the WWW such that each node represents exactly one semantic topic. This graph can better describe the semantic structure of the web. Based on block-level link analysis, we propose two new algorithms, Block Level PageRank and Block Level HITS, whose performances we study extensively using web data.
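As background for the block-level variants, here is a minimal power-iteration PageRank over an ordinary node graph in Python. In block-level analysis the nodes would instead be page blocks obtained by segmentation, with edges derived from the page-to-block and block-to-page relationships; that construction is not reproduced here. The damping factor of 0.85 and the uniform redistribution of rank from dangling nodes are conventional assumptions, not details from the paper.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank. `links` maps each node to a list of out-link targets."""
    nodes = set(links) | {v for targets in links.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Random-jump share, spread uniformly over all nodes.
        new = {v: (1.0 - damping) / n for v in nodes}
        # Assumption: rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        for v in nodes:
            new[v] += damping * dangling / n
        # Each node passes its rank equally along its out-links.
        for u, targets in links.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

The scores always sum to 1, so they can be read as a probability distribution over the nodes of whatever graph (pages or blocks) is supplied.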

Copyrights may apply

p. 448-455

Plachouras, Vassilis and Ounis, Iadh (2004): Usefulness of hyperlink structure for query-biased topic distillation. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 448-455. Available online

In this paper, we introduce an information theoretic method for estimating the usefulness of the hyperlink structure induced from the set of retrieved documents. We evaluate the effectiveness of this method in the context of an optimal Bayesian decision mechanism, which selects the most appropriate retrieval approaches on a per-query basis for two TREC tasks. The estimation of the hyperlink structure's usefulness is stable when we use different weighting schemes, or when we employ sampling of documents to reduce the computational overhead. Next, we evaluate the effectiveness of the hyperlink structure's usefulness in a realistic setting, by setting the thresholds of a decision mechanism automatically. Our results show that improvements over the baselines are obtained.

Copyrights may apply

p. 456-463

Cai, Deng, Yu, Shipeng, Wen, Ji-Rong and Ma, Wei-Ying (2004): Block-based web search. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 456-463. Available online

Copyrights may apply

p. 464-465

Doran, William P., Stokes, Nicola, Newman, Eamonn, Dunnion, John and Carthy, Joe (2004): A hybrid statistical/linguistic model for generating news story gists. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 464-465. Available online

In this paper, we describe a News Story Gisting system that generates a 10-word short summary of a news story. This system uses a machine learning technique to combine linguistic, statistical and positional information in order to generate an appropriate summary. We also present the results of an automatic evaluation of this system with respect to the performance of other baseline summarisers using the new ROUGE evaluation metric.

Copyrights may apply

p. 466-467

Sanderson, Mark and Pasley, Robert (2004): Image based gisting in CLIR. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 466-467. Available online

In this paper, we describe research which could lead to a novel approach to gathering an overview of a document in a foreign language. The research explores how much of the meaning of a document could be represented using images by measuring the ability of subjects to derive the search term that might have been used to return a set of images from an image library. The Google image search engine was used to retrieve the images for this experiment, which uses English throughout. The results were analysed with respect to a previous paper [1] exploring the ability to recognise concrete objects in hierarchies. It was found that there is a tendency to use one particular level of categorization.

Copyrights may apply

p. 468-469

Greevy, Edel and Smeaton, Alan F. (2004): Classifying racist texts using a support vector machine. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 468-469. Available online

In this poster we present an overview of the techniques we used to develop and evaluate a text categorisation system to automatically classify racist texts. Detecting racism is difficult because the presence of indicator words is insufficient to indicate racist texts, unlike some other text classification tasks. Support Vector Machines (SVM) are used to automatically categorise web pages based on whether or not they are racist. Different interpretations of what constitutes a term are taken, and in this poster we look at three representations of a web page within an SVM -- bag-of-words, bigrams and part-of-speech tags.

Copyrights may apply

p. 470-471

Azman, Azreen and Ounis, Iadh (2004): Discovery of aggregate usage profiles based on clustering information needs. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 470-471. Available online

We present an alternative technique for discovering aggregate usage profiles from Web access logs. The technique is based on clustering information needs inferred from users' browsing paths. Browsing paths are extracted from users' access logs. Information need is inferred from each browsing path by using the Ostensive Model [1]. The technique is evaluated in a document recommendation application. We compare the performance of our technique against the well-established transaction-based technique proposed in [2]. Based on an initial evaluation, the results are encouraging.

Copyrights may apply

p. 472-473

Lu, Jie and Callan, Jamie (2004): Merging retrieval results in hierarchical peer-to-peer networks. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 472-473. Available online

p. 474-475

Sakai, Tetsuya, Saito, Yoshimi, Ichimura, Yumi, Kokubu, Tomoharu and Koyama, Makoto (2004): The effect of back-formulating questions in question answering evaluation. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 474-475. Available online

p. 476-477

Montgomery, Jesse, Si, Luo, Callan, Jamie and Evans, David A. (2004): Effect of varying number of documents in blind feedback: analysis of the 2003 NRRC RIA workshop. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 476-477. Available online

p. 478-479

Granka, Laura A., Joachims, Thorsten and Gay, Geri (2004): Eye-tracking analysis of user behavior in WWW search. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 478-479. Available online

We investigate how users interact with the results page of a WWW search engine using eye-tracking. The goal is to gain insight into how users browse the presented abstracts and how they select links for further exploration. Such understanding is valuable for improved interface design, as well as for more accurate interpretations of implicit feedback (e.g. clickthrough) for machine learning. The following presents initial results, focusing on the amount of time spent viewing the presented abstracts, the total number of abstracts viewed, and measures of how thoroughly searchers evaluate their result sets.

Copyrights may apply

p. 480-481

Chandrasekar, Raman, Chen, Harr, Corston-Oliver, Simon and Brill, Eric (2004): Subwebs for specialized search. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 480-481. Available online

We describe a method to define and use subwebs, user-defined neighborhoods of the Internet. Subwebs help improve search performance by inducing a topic-specific page relevance bias over a collection of documents. Subwebs may be automatically identified using a simple algorithm we describe, and used to provide highly-relevant topic-specific information retrieval. Using subwebs in a Help and Support topic, we see marked improvements in precision compared to generic search engine results.

Copyrights may apply

p. 482-483

Gu, Zhenmei and Luo, Ming (2004): Comparison of using passages and documents for blind relevance feedback in information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 482-483. Available online

This paper compares document blind feedback and passage blind feedback in Information Retrieval (IR), based on the work during the NRRC 2003 Reliable Information Access Summer workshop. The analysis of our experimental results shows overall consistency on the performance impact of using passages and documents for blind feedback. However, it is observed that the behavior of passage blind feedback, compared to document blind feedback, is both system dependent and topic dependent. The relationships between the performance impact of passage blind feedback and the number of feedback terms and the topic's average relevant document length, respectively, are examined to illustrate these dependencies.

Copyrights may apply

p. 484-485

Clough, Paul and Sanderson, Mark (2004): Measuring pseudo relevance feedback & CLIR. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 484-485. Available online

In this poster, we report on the effects of pseudo relevance feedback (PRF) on a cross-language image retrieval task using a test collection. In previous CLIR experiments, PRF has typically been shown to improve retrieval performance as measured by average precision at a fixed rank. However, our experiments show that PRF also increases the number of queries for which no relevant documents are returned. Because query reformulation is likely to be harder for cross-language searching than for monolingual searching, this scenario would be associated with a great deal of user dissatisfaction. We propose that an additional effectiveness measure based on failed queries may reflect user satisfaction better than average precision alone.

Copyrights may apply
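One simple form of the failed-query measure proposed above is the fraction of queries whose top N results contain no relevant document. The sketch below assumes that definition. The cutoff of N = 2 is used only for the toy example, and the paper's exact cutoff and formulation may differ. The runs and judgments are made up to illustrate how one run can fail on more queries than another.

```python
def failure_rate(runs, qrels, cutoff=10):
    """Fraction of queries whose top `cutoff` results contain no relevant document.

    runs:  dict query_id -> ranked list of document ids
    qrels: dict query_id -> set of relevant document ids
    """
    if not runs:
        return 0.0
    failed = sum(
        1
        for qid, ranking in runs.items()
        if not any(doc in qrels.get(qid, set()) for doc in ranking[:cutoff])
    )
    return failed / len(runs)

# Toy judgments and runs (hypothetical, not from the paper).
qrels = {"q1": {"d3"}, "q2": {"d9"}, "q3": {"d1", "d2"}}
baseline = {"q1": ["d3", "d4"], "q2": ["d5", "d6"], "q3": ["d2", "d7"]}
with_prf = {"q1": ["d3", "d8"], "q2": ["d6", "d5"], "q3": ["d7", "d8"]}

# 1 of 3 queries fails for the baseline run, 2 of 3 for the PRF run.
print(failure_rate(baseline, qrels, cutoff=2), failure_rate(with_prf, qrels, cutoff=2))
```

This measure complements rather than replaces average precision. A run can raise mean average precision while also increasing the count of queries that return nothing useful.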

p. 486-487

Tao, Tao and Zhai, Chengxiang (2004): A two-stage mixture model for pseudo feedback. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 486-487. Available online

Pseudo feedback is a commonly used technique to improve information retrieval performance. It assumes a few top-ranked documents to be relevant, and learns from them to improve the retrieval accuracy. A serious problem is that the performance is often very sensitive to the number of pseudo feedback documents. In this poster, we address this problem in a language modeling framework. We propose a novel two-stage mixture model, which is less sensitive to the number of pseudo feedback documents than an effective existing feedback model. The new model can tolerate a more flexible setting of the number of pseudo feedback documents without the danger of losing much retrieval accuracy.

Copyrights may apply
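The basic pseudo-feedback loop described above can be sketched as follows. This is a plain interpolation of the query model with term frequencies from the top k documents. It is not the paper's two-stage mixture model. The parameter names `k`, `alpha` and `n_terms` and the toy documents are illustrative assumptions. The `k` argument is the number of pseudo feedback documents, which the abstract identifies as the setting that performance is often sensitive to.

```python
from collections import Counter

def pseudo_feedback_query(query, ranked_docs, k=2, alpha=0.5, n_terms=5):
    """Return an expanded query model as a dict term -> probability.

    Assumes the top-k ranked documents are relevant and interpolates:
    p'(w) = (1 - alpha) * p(w | query) + alpha * p(w | feedback docs).
    """
    q_counts = Counter(query.lower().split())
    q_total = sum(q_counts.values())
    if q_total == 0:
        raise ValueError("query must contain at least one term")

    # Pool term counts from the top-k documents (the pseudo-relevant set).
    fb_counts = Counter()
    for doc in ranked_docs[:k]:
        fb_counts.update(doc.lower().split())

    # Keep only the n_terms most frequent feedback terms, renormalised below.
    top = dict(fb_counts.most_common(n_terms))
    top_total = sum(top.values())
    if top_total == 0:
        return {w: c / q_total for w, c in q_counts.items()}

    vocab = set(q_counts) | set(top)
    return {
        w: (1 - alpha) * q_counts[w] / q_total + alpha * top.get(w, 0) / top_total
        for w in vocab
    }

docs = ["jaguar speed cat", "jaguar cat habitat", "car engine parts"]
model = pseudo_feedback_query("jaguar speed", docs, k=2)
for term, p in sorted(model.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {p:.3f}")
```

With `k=2`, the third document never contributes terms. Raising `k` to 3 would pull "car" and "engine" into the expanded query, which is the kind of sensitivity the poster addresses.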

p. 488-489

Crestan, Eric and Loupy, Claude de (2004): Natural language processing for browse help. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 488-489. Available online

In this paper, we present three "browsing" systems intended to save users' time. The first uses named entities and gives a way to reduce the search space. By using an information visualization system, the user can more easily comprehend the content of a corpus or a document. Named entities are highlighted for quick reading, and temporal and geographic representations give a global view of the results of a query. All these browsing and search aids seem to be very useful; nevertheless, an evaluation would give more practical results.

Copyrights may apply

p. 49-56

Fang, Hui, Tao, Tao and Zhai, Chengxiang (2004): A formal study of information retrieval heuristics. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 49-56. Available online

Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. One basic research question is thus what exactly are these "necessary" heuristics that seem to cause good retrieval performance. In this paper, we present a formal study of retrieval heuristics. We formally define a set of basic desirable constraints that any reasonable retrieval function should satisfy, and check these constraints on a variety of representative retrieval functions. We find that none of these retrieval functions satisfies all the constraints unconditionally. Empirical results show that when a constraint is not satisfied, it often indicates non-optimality of the method, and when a constraint is satisfied only for a certain range of parameter values, its performance tends to be poor when the parameter is out of the range. In general, we find that the empirical performance of a retrieval formula is tightly related to how well it satisfies these constraints. Thus the proposed constraints provide a good explanation of many empirical observations and make it possible to evaluate any existing or new retrieval formula analytically.

Copyrights may apply
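To make the idea of a retrieval constraint concrete, the sketch below checks one property of the kind the paper formalises. The property is that, with document length held fixed, one more occurrence of a query term should strictly raise the score. The two scoring functions are hypothetical toy examples. A grid check like this can only find violations within the sampled range, whereas the paper analyses the constraints formally.

```python
import math

def satisfies_tf_constraint(score, max_tf=20, lengths=(50, 100, 500)):
    """Empirically check a term-frequency constraint on score(tf, doc_length):
    with length fixed, an extra occurrence must strictly increase the score."""
    return all(
        score(tf + 1, length) > score(tf, length)
        for length in lengths
        for tf in range(max_tf)
    )

def log_tf(tf, length):
    """Hypothetical sublinear TF score with simple length normalisation."""
    return (1 + math.log(tf)) / length if tf > 0 else 0.0

def binary(tf, length):
    """Hypothetical presence-only score: ignores repeated occurrences."""
    return 1.0 if tf > 0 else 0.0

print(satisfies_tf_constraint(log_tf), satisfies_tf_constraint(binary))  # True False
```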

p. 490-491

Mayfield, James and McNamee, Paul (2004): Triangulation without translation. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 490-491. Available online

Transitive retrieval and triangulation have been proposed as ways to improve cross-language retrieval quality when translation resources have poor lexical coverage. We demonstrate that cross-language retrieval is viable for European languages with no translation resources at all; that transitive retrieval without translation does not suffer the drop-off in retrieval quality sometimes reported for transitive retrieval with translation; and that triangulation that combines multiple transitive runs with no translation can boost performance over direct translation-free retrieval.

Copyrights may apply

p. 492-493

Sriram, Smitha, Shen, Xuehua and Zhai, Chengxiang (2004): A session-based search engine. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 492-493. Available online

In this poster, we describe a novel session-based search engine, which puts the search in context. The search engine has a number of session-based features. These include expansion of the current query with user query history and clickthrough data (title and summary of clicked web pages) from the same search session, and session boundary recognition based on temporal closeness and probabilistic similarity between query terms. In addition, the search engine visualizes the rank change of web pages as different queries are submitted in the same search session to help the user reformulate the query.

Copyrights may apply
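Session boundary detection of the kind described above combines temporal closeness with query similarity. The sketch below is a simplified stand-in that uses Jaccard overlap of query terms in place of the paper's probabilistic similarity. The 10-minute gap, the 0.2 similarity threshold, and the rule that both conditions must hold are all hypothetical choices.

```python
def jaccard(q1, q2):
    """Term-set overlap between two queries (0 when both are empty)."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def split_sessions(query_log, max_gap=600, min_similarity=0.2):
    """Group (timestamp_in_seconds, query) pairs into sessions.

    A query continues the current session only if it is close in time to the
    previous query AND shares enough terms with it; otherwise a new session starts.
    """
    sessions = []
    for ts, query in sorted(query_log):
        if sessions:
            prev_ts, prev_query = sessions[-1][-1]
            if ts - prev_ts <= max_gap and jaccard(query, prev_query) >= min_similarity:
                sessions[-1].append((ts, query))
                continue
        sessions.append([(ts, query)])
    return sessions

log = [
    (0, "jaguar speed"),
    (60, "jaguar top speed km"),
    (120, "python tutorial"),      # topic switch: new session
    (5000, "python tutorial pdf"), # long pause: new session
]
for session in split_sessions(log):
    print([q for _, q in session])
```

Term overlap misses reformulations that share no words, such as "car" followed by "automobile". A probabilistic similarity model like the one the poster mentions is meant to handle cases of that kind.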

p. 494-495

Beitzel, Steven M., Jensen, Eric C., Chowdhury, Abdur, Grossman, David A. and Frieder, Ophir (2004): Evaluation of filtering current news search results. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 494-495. Available online

We describe an evaluation of result set filtering techniques for providing ultra-high precision in the task of presenting related news for general web queries. In this task, the negative user experience generated by retrieving non-relevant documents has a much worse impact than not retrieving relevant ones. We adapt cost-based metrics from the document filtering domain to this result filtering problem in order to explicitly examine the tradeoff between missing relevant documents and retrieving non-relevant ones. A large manual evaluation of three simple threshold filters shows that the basic approach of counting matching title terms outperforms variants that also incorporate selected abstract terms based on part-of-speech or higher-level linguistic structures. At the same time, leveraging these cost-based metrics allows us to explicitly determine what other tasks would benefit from these alternative techniques.

Copyrights may apply
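The best-performing filter above counts query terms matched in a result's title. The following is a minimal sketch of such a threshold filter, paired with a linear cost that charges more for showing a non-relevant item than for missing a relevant one. The threshold values, the 3:1 cost ratio, and the example titles are assumptions for illustration, not the paper's settings.

```python
def title_match_count(query, title):
    """Number of distinct query terms that also appear in the result title."""
    return len(set(query.lower().split()) & set(title.lower().split()))

def filter_results(query, titles, threshold=2):
    """Keep results whose titles contain at least `threshold` query terms."""
    return [t for t in titles if title_match_count(query, t) >= threshold]

def filtering_cost(kept, relevant, fp_cost=3.0, fn_cost=1.0):
    """Linear cost: each non-relevant result shown costs more than each relevant one missed."""
    kept, relevant = set(kept), set(relevant)
    false_pos = len(kept - relevant)
    false_neg = len(relevant - kept)
    return fp_cost * false_pos + fn_cost * false_neg

query = "mars rover landing"
titles = [
    "NASA rover landing on Mars succeeds",
    "Mars bar sales rise",
    "Rover recall announced",
    "Second rover landing planned for Mars",
]
relevant = {titles[0], titles[3]}
for threshold in (1, 2):
    kept = filter_results(query, titles, threshold)
    print(threshold, len(kept), filtering_cost(kept, relevant))
```

With a threshold of 1, two off-topic titles slip through and the run pays the higher false-positive cost twice. A threshold of 2 keeps only the two relevant results.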

p. 502-503

Leuski, Anton (2004): Email is a stage: discovering people roles from email archives. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 502-503. Available online

p. 504-505

Shah, Gauri and Syeda-Mahmood, Tanveer (2004): Searching databases for semantically-related schemas. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 504-505. Available online

In this paper, we address the problem of searching schema databases for semantically-related schemas. We first give a method of finding semantic similarity between pair-wise schemas based on tokenization, part-of-speech tagging, word expansion, and ontology matching. We then address the problem of indexing the schema database through a semantic hash table. Matching schemas in the database are found by hashing the query attributes and recording peaks in the histogram of schema hits. Results indicated a 90% improvement in search performance while maintaining high precision and recall.

Copyrights may apply
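The indexing step above hashes the query attributes and then looks for peaks in a histogram of schema hits. It can be sketched with a plain dictionary. This simplified version hashes only lowercase tokens split on underscores. The paper's part-of-speech tagging, word expansion and ontology matching are omitted, and the schemas are invented examples.

```python
from collections import Counter, defaultdict

def tokenize(attribute):
    """Split an attribute name such as 'customer_name' into lowercase tokens."""
    return [t for t in attribute.lower().split("_") if t]

def build_index(schemas):
    """Hash table mapping each attribute token to the set of schema ids containing it."""
    index = defaultdict(set)
    for schema_id, attributes in schemas.items():
        for attr in attributes:
            for token in tokenize(attr):
                index[token].add(schema_id)
    return index

def search(index, query_attributes, top_n=3):
    """Hash each query token, accumulate a histogram of schema hits, return the peaks."""
    hits = Counter()
    for attr in query_attributes:
        for token in tokenize(attr):
            for schema_id in index.get(token, ()):
                hits[schema_id] += 1
    return hits.most_common(top_n)

schemas = {
    "orders": ["order_id", "customer_name", "order_date"],
    "clients": ["client_id", "customer_name", "address"],
    "stock": ["item_id", "quantity"],
}
index = build_index(schemas)
print(search(index, ["customer_name", "order_date"]))  # [('orders', 4), ('clients', 2)]
```

Lookup cost depends on the number of query tokens and their posting sets rather than on the total number of schemas. That property is what makes a hash-based index attractive for large schema databases.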

p. 506-507

Buckley, Chris (2004): Topic prediction based on comparative retrieval rankings. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 506-507. Available online

A new measure, AnchorMap, is introduced to evaluate how close two document retrieval rankings are to each other. It is shown that AnchorMap scores, when run on a set of initial ranked document lists from 8 different systems, are very highly correlated with categorization of topics as easy or hard, and separately, are highly correlated with those topics on which blind feedback works. In another experiment, AnchorMap is used to compare the initial ranked document list from a single system against the ranked document list from that system after blind feedback. Again, high AnchorMap values are highly correlated with both topic difficulty and successful application of blind feedback. Both experiments are examples of using properties of a topic which are independent of relevance information to predict the actual performance of IR systems on the topic. Initial experiments to attempt to improve retrieval performance based upon AnchorMap failed; the causes for failure are discussed.

Copyrights may apply

p. 508-509

Liddy, Elizabeth D., Diekema, Anne R. and Yilmazel, Ozgur (2004): Context-based question-answering evaluation. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 508-509. Available online

In this poster, we will present the results of efforts we have undertaken to conduct evaluations of a QA system in a real-world environment and to understand the nature of the dimensions on which users evaluate QA systems when given free rein to comment on whatever dimensions they deem important.

Copyrights may apply

p. 510-511

Sun, Yixing, Harper, David J. and Watt, Stuart N. K. (2004): Design of an e-book user interface and visualizations to support reading for comprehension. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 510-511. Available online

Current e-Book browsers provide minimal support for comprehending the organization, narrative structure, and themes of large complex books. In order to build an understanding of such books, readers should be provided with user interfaces that present, and relate, the organizational, narrative and thematic structures. We propose adapting information retrieval techniques for the purpose of discovering these structures, and sketch three distinctive visualizations for presenting these structures to the e-Book reader. These visualizations are presented within an initial design for an e-Book browser.

Copyrights may apply

p. 512-513

Hawking, David, Upstill, Trystan and Craswell, Nick (2004): Toward better weighting of anchors. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 512-513. Available online

Okapi BM25 scoring of anchor text surrogate documents has been shown to facilitate effective ranking in navigational search tasks over web data. We hypothesize that even better ranking can be achieved in certain important cases, particularly when anchor scores must be fused with content scores, by avoiding length normalisation and by reducing the attenuation of scores associated with high tf. Preliminary results are presented.

Copyrights may apply
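For reference, the standard Okapi BM25 weight of one query term in one document is sketched below with the usual defaults k1 = 1.2 and b = 0.75. Setting b = 0 removes length normalisation. Raising k1 slows the saturation of the tf component, which is one way to reduce the attenuation of high-tf scores. These are the two directions the abstract proposes. The specific values in the usage lines are illustrative only, not the authors' settings.

```python
import math

def bm25_term_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Okapi BM25 contribution of a single query term to a document's score."""
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = 1 - b + b * doc_len / avg_doc_len  # equals 1 when b == 0
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)

# Anchor-text surrogate: tf = number of incoming anchors containing the term (toy numbers).
args = dict(avg_doc_len=50, n_docs=1_000_000, doc_freq=1_000)
short = bm25_term_weight(5, doc_len=20, **args)
long_ = bm25_term_weight(5, doc_len=2000, **args)
print(short > long_)  # default b = 0.75 penalises the longer surrogate

def high_tf_ratio(k1):
    """How much more tf = 50 scores than tf = 5, with length normalisation disabled."""
    return (bm25_term_weight(50, 100, b=0.0, k1=k1, **args)
            / bm25_term_weight(5, 100, b=0.0, k1=k1, **args))

print(round(high_tf_ratio(1.2), 2), round(high_tf_ratio(10.0), 2))  # roughly 1.21 vs 2.5
```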

p. 514-515

Ye, Jiamin and Smeaton, Alan F. (2004): Aggregated feature retrieval for MPEG-7 via clustering. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 514-515. Available online

In this paper, we describe an approach to combining text and visual features from MPEG-7 descriptions of video. A video retrieval process is aligned to a text retrieval process based on the TF*IDF vector space model via clustering of low-level visual features. Our assumption is that shots within the same cluster are not only similar visually but also semantically, to a certain extent. Our experiments on the TRECVID2002 and TRECVID2003 collections show that adding extra meaning to a shot based on the shots from the same cluster is useful when each video in a collection contains a high proportion of similar shots, for example in documentaries.

Copyrights may apply

p. 516-517

Corrada-Emmanuel, Andres and Croft, W. Bruce (2004): Answer models for question answering passage retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 516-517. Available online

Answer patterns have been shown to improve the performance of open-domain factoid QA systems. Their use, however, requires either constructing the patterns manually or developing algorithms for learning them automatically. We present here a simpler approach that extends the techniques of language modeling to create answer models. These are language models trained on the correct answers to training questions. We show how they fit naturally into a probabilistic model for answer passage retrieval and demonstrate their effectiveness on the TREC 2002 QA Corpus.

Copyrights may apply

p. 518-519

Wu, Harris and Gordon, Michael D. (2004): Collaborative filing in a document repository. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 518-519. Available online

We introduce an emergent, collaborative filing system. In such a system, an individual is allowed to organize a subset of documents in a repository into a personal hierarchy and share the hierarchy with others. The system generates a "consensus" hierarchy from all users' personal hierarchies, which provides a full, common, and emergent view of all documents. We believe that collaborative filing helps translate personal, tacit knowledge into sharable structures, which help the user as well as a community of which he or she is a part. Our filing system is suitable for any documents from text to multimedia files. Initial results on an experimental website show promise. For a knowledge task involving extensive document retrieval, hierarchies are not only used frequently but are also effective in identifying high quality documents. One surprising finding is how often subjects use others' personal hierarchies, and upon close examination, social networks play a key role as well.

Copyrights may apply

p. 520-521

White, Ryen W. and Jose, Joemon M. (2004): A study of topic similarity measures. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 520-521. Available online

In this poster we describe an investigation of topic similarity measures. We elicit assessments of the similarity of 10 pairs of topics from 76 subjects and use these as a benchmark to assess how well each measure performs. The measures have the potential to form the basis of a predictive technique for adaptive search systems. The results of our evaluation show that measures based on the level of correlation between topics concord most with general subject perceptions of search topic similarity.

Copyrights may apply

p. 522-523

Yang, Hui and Chua, Tat-Seng (2004): Effectiveness of web page classification on finding list answers. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 522-523. Available online

List question answering (QA) offers a unique challenge in effectively and efficiently locating a complete set of distinct answers from huge corpora or the Web. In TREC-12, the median average F1 performance of list QA systems was

Copyrights may apply

p. 524-525

Zhang, Ying and Vines, Phil (2004): Detection and translation of OOV terms prior to query time. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 524-525. Available online

Accurate cross-language information retrieval requires that query terms be correctly translated. Several new techniques to improve the translation of out-of-vocabulary terms in English-Chinese cross-language information retrieval have been developed. However, these require queries and a document collection to enable translation disambiguation. Although effective, they involve much processing and searching of the Web at query time, and may not be practical in a production web search engine. In this work, we consider what tasks may be carried out beforehand, the goal being to reduce the processing required at query time. We have successfully developed new techniques to extract and translate out-of-vocabulary terms using the Web and add them to a translation dictionary prior to query time.

Copyrights may apply

p. 526-527

Nemeth, Yael, Shapira, Bracha and Taeib-Maimon, Meirav (2004): Evaluation of the real and perceived value of automatic and interactive query expansion. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 526-527. Available online

The paper describes a user study examining methods for improving users' queries, specifically interactive and automatic query expansion and advanced search options. The user study includes subjective and objective evaluation of the effect of the above methods and a comparison between the real and perceived effect.

Copyrights may apply

p. 528-529

Harman, Donna and Buckley, Chris (2004): The NRRC reliable information access (RIA) workshop. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 528-529. Available online

p. 530-531

Soboroff, Ian (2004): On evaluating web search with very few relevant documents. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 530-531. Available online

Many common web searches by their nature have a very small number of relevant documents. Homepage and "namedpage" searching are known-item searches where there is only a single relevant document. Topic distillation is a special kind of topical relevance search where the user wishes to find a few key web sites rather than every relevant web page. Because these types of searches are so common, web search evaluations have come to focus on tasks where there are very few relevant documents. Evaluations with few relevant documents pose special challenges for current metrics. In particular, the TREC 2003 topic distillation evaluation is unable to distinguish most submitted runs from each other.

Copyrights may apply

p. 532-533

Li, Qing, Kim, Byeong Man, Guan, Dong Hai and Oh, Duk whan (2004): A music recommender based on audio features. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 532-533. Available online

Many collaborative music recommender systems (CMRS) have succeeded in capturing the similarity among users or items based on ratings; however, they have rarely considered the information available from the multimedia content, such as genres, let alone audio features from the media stream. Such information is valuable and can be used to solve several problems in recommender systems (RS). In this paper, we design a CMRS based on audio features of the multimedia stream. In the CMRS, we provide a recommendation service using our proposed method, in which a clustering technique is used to integrate the audio features of music into the collaborative filtering (CF) framework in hopes of achieving better performance. Experiments are carried out to demonstrate that our approach is feasible.

Copyrights may apply

p. 534-535

Ma, Liping and Shepherd, John (2004): Information extraction using two-phase pattern discovery. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 534-535. Available online

This paper presents a new two-phase pattern (2PP) discovery technique for information extraction. 2PP consists of orthographic pattern discovery (OPD) and semantic pattern discovery (SPD): the OPD determines the structural features of an identified region of a document, and the SPD discovers a dominant semantic pattern for the region via inference, apposition and analogy. The discovered pattern is then applied back to the region to extract the required data items through pattern matching. We evaluated 2PP using 6500 data items and obtained effective results.

Copyrights may apply

p. 536-537

Lu, Yue, Zhang, Li and Tan, Chew Lim (2004): A search engine for imaged documents in PDF files. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 536-537. Available online

Large quantities of documents in the Internet and digital libraries are simply scanned and archived in image format, many of which are packed in PDF files. The word search tool provided by Adobe Reader/Acrobat does not work for these imaged documents. In this paper, we present a search engine to deal with this issue for imaged documents in PDF files. The experimental results show an encouraging performance.

Copyrights may apply

p. 538-539

Liu, Yan, Carbonell, Jaime, Klein-Seetharaman, Judith and Gopalakrishnan, Vanathi (2004): Context sensitive vocabulary and its application in protein secondary structure prediction. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 538-539. Available online

Protein secondary structure prediction is an important step towards understanding the relation between protein sequence and structure. However, most current prediction methods use features that are difficult for biologists to interpret. In this paper, we present a new method that applies information retrieval techniques to solve the problem: we extract a context-sensitive biological vocabulary for protein sequences and apply text classification methods to predict protein secondary structure. Experimental results show that our method performs comparably to state-of-the-art methods. Furthermore, the context-sensitive vocabularies can serve as a useful tool to discover meaningful regular expression patterns for protein structures.

Copyrights may apply

p. 540-541

Metzler, Donald, Lavrenko, Victor and Croft, W. Bruce (2004): Formal multiple-bernoulli models for language modeling. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 540-541. Available online

p. 542-543

Azzopardi, L., Girolami, M. and Rijsbergen, C. J. Van (2004): User biased document language modelling. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 542-543. Available online

Capitalizing on the intuitive underlying assumptions of Language Modelling for Ad-Hoc Retrieval we present a novel approach that is capable of injecting the user's context of the document collection into the retrieval process. The preliminary findings from the evaluation undertaken suggest that improved IR performance is possible under certain circumstances. This motivates further investigation to determine the extent and significance of this improved performance.

Copyrights may apply

p. 544-545

Collins-Thompson, Kevyn and Callan, Jamie (2004): Information retrieval for language tutoring: an overview of the REAP project. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 544-545. Available online

p. 546-547

Xu, Yinghui and Umemura, Kyoji (2004): A unified model of literal mining and link analysis for ranking web resources. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 546-547. Available online

Web link analysis has been shown to significantly enhance the precision of Web search in practice. The PageRank algorithm, which is used in the Google search engine, plays an important role in improving the quality of its results by exploiting the explicit hyperlink structure among Web pages. The prestige of Web pages defined by PageRank is derived purely from a random surfer's walk on the Web graph, without consideration of textual content. In practice, however, user surfing behavior is far from random jumping. In this paper, we present a unified model for a more accurate page rank, in which the user's surfing is guided by a probabilistic model based on literal matching between connected pages. The results show that our proposed ranking algorithms perform better than the original PageRank.

Copyrights may apply
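The baseline that this paper extends is the standard random-surfer PageRank. A minimal power-iteration sketch follows. It uses the common damping factor 0.85 and spreads the rank of pages without out-links uniformly over all pages. The paper's contribution biases the surfer's choice of out-link by textual matching between connected pages. That would replace the uniform per-link `share` below with non-uniform transition probabilities, and this part is not shown.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a dict mapping page -> list of out-linked pages."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Random-jump component: every page receives (1 - damping) / n.
        new = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Uniform random surfer: equal share to each out-link.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for p in pages:
                    new[p] += damping * rank[page] / n
        diff = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if diff < tol:
            break
    return rank

ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```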

p. 548-549

Liu, Xiaoyong, Croft, W. Bruce, Oh, Paul and Hart, David (2004): Automatic recognition of reading levels from user queries. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 548-549. Available online

p. 550-551

Basilico, Justin and Hofmann, Thomas (2004): A joint framework for collaborative and content filtering. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 550-551. Available online

This paper proposes a novel, unified, and systematic approach to combine collaborative and content-based filtering for ranking and user preference prediction. The framework incorporates all available information by coupling together multiple learning problems and using a suitable kernel or similarity function between user-item pairs. We propose and evaluate an on-line algorithm (JRank) that generalizes perceptron learning using this framework and shows significant improvement over other approaches.

Copyrights may apply
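
The Basilico and Hofmann abstract above couples user and item information through a kernel on user-item pairs. As a rough illustration of that idea, the sketch below trains a plain kernel perceptron for binary like/dislike prediction. It is not JRank, which handles ranking. The product kernel, the user similarity (1.0 for the same user and 0.5 otherwise), and the genre-based item similarity are all hypothetical choices made for the example.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical item content: genre sets per movie.
GENRES = {"m1": {"scifi", "action"}, "m2": {"scifi"},
          "m3": {"romance"}, "m4": {"romance", "drama"}}


def user_sim(u, v):
    # Collaborative part (hypothetical): same user counts fully, others partly.
    return 1.0 if u == v else 0.5


def item_sim(i, j):
    # Content part (hypothetical): genre overlap.
    return jaccard(GENRES[i], GENRES[j])


def pair_kernel(x, y):
    """Kernel on (user, item) pairs: product of user and item similarities."""
    (u1, i1), (u2, i2) = x, y
    return user_sim(u1, u2) * item_sim(i1, i2)


def train_kernel_perceptron(examples, kernel, epochs=10):
    """examples: list of ((user, item), label) with label in {+1, -1}."""
    alphas = [0] * len(examples)  # mistake counts per training example

    def score(x):
        return sum(a * label * kernel(xk, x)
                   for a, (xk, label) in zip(alphas, examples) if a)

    for _ in range(epochs):
        mistakes = 0
        for j, (x, label) in enumerate(examples):
            if label * score(x) <= 0:  # misclassified (or on the boundary)
                alphas[j] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return lambda x: 1 if score(x) > 0 else -1


examples = [(("alice", "m1"), 1), (("alice", "m3"), -1),
            (("bob", "m2"), 1), (("bob", "m4"), -1)]
predict = train_kernel_perceptron(examples, pair_kernel)
```

Alice has never rated `m2`. Because the item kernel links `m2` to her liked `m1`, the model predicts that she likes it. Bob's dislike of `m4` carries over to the related `m3` in the same way.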

p. 552-553

Kim, Hee-soo, Choi, Ikkyu and Kim, Minkoo (2004): Refining term weights of documents using term dependencies. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 552-553. Available online

When raw documents are processed in an Information Retrieval (IR) system, a term-weighting scheme is used to calculate the importance of each term that occurs in a document. However, most term-weighting schemes assume that a term is independent of the other terms. Term dependency is an indispensable consequence of language use [1], so this assumption can cause information in a document to be lost. In this paper, we propose a new approach to refining the term weights of documents using term dependencies discovered from a set of documents. We then evaluate our method with two experiments based on the vector space model [2] and the language model [3].

Copyrights may apply

p. 554-555

Sigurbjornsson, Borkur, Kamps, Jaap and Rijke, Maarten de (2004): Multiple sources of evidence for XML retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 554-555. Available online

Document-centric XML collections contain text-rich documents, marked up with XML tags. The tags add lightweight semantics to the text. Querying such collections calls for a hybrid query language: the text-rich nature of the documents suggests a content-oriented (IR) approach, while the markup allows users to add structural constraints to their IR queries. We show how evidence of relevance from different sources helps to answer such hybrid queries. We evaluate our methods using the INEX 2003 test set, and show that structural hints in hybrid queries help to improve retrieval effectiveness.

Copyrights may apply

p. 558-559

Hedley, Y. L., Younas, M., James, A. and Sanderson, M. (2004): Query-related data extraction of hidden web documents. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 558-559. Available online

A large amount of the information on the Web is stored in document databases and is not indexed by general-purpose search engines (e.g., Google and Yahoo). Such information is generated dynamically by querying these databases, which are referred to as Hidden Web databases. Documents returned in response to a user query are typically presented using template-generated Web pages. This paper proposes a novel approach that identifies Web page templates by analysing the textual contents and the adjacent tag structures of a document in order to extract query-related data. Preliminary results demonstrate that our approach effectively detects templates and retrieves data with high recall and precision.

Copyrights may apply
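
Template detection, as described in the Hedley et al. abstract above, can be sketched in a much-simplified form. Any text segment that appears with the same enclosing tag on every result page from one source is treated as template text, and the remaining segments are kept as query-related data. The sketch uses only the standard-library `html.parser`. The rule "present on all pages means template" and all helper names are assumptions for illustration, not the paper's algorithm.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class SegmentExtractor(HTMLParser):
    """Collects (enclosing_tag, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.segments = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.segments.append((self.stack[-1] if self.stack else "", text))


def segments(html):
    parser = SegmentExtractor()
    parser.feed(html)
    parser.close()
    return parser.segments


def extract_query_data(pages):
    """pages: two or more HTML result pages returned by one Hidden Web source."""
    per_page = [segments(p) for p in pages]
    # Segments shared by every page are treated as template text.
    template = set(per_page[0]).intersection(*(set(s) for s in per_page[1:]))
    return [[seg for seg in segs if seg not in template] for segs in per_page]


pages = [
    "<html><body><h1>Search results</h1><p>Title: Alpha</p><div>Footer</div></body></html>",
    "<html><body><h1>Search results</h1><p>Title: Beta</p><div>Footer</div></body></html>",
]
data = extract_query_data(pages)
```

The heading and footer occur on both pages and are discarded. Only the differing `<p>` segments survive. A real system would also need to handle template text that varies slightly between pages, which this exact-match rule does not.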

p. 560-561

Fujii, Atsushi, Iwayama, Makoto and Kando, Noriko (2004): The patent retrieval task in the fourth NTCIR workshop. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 560-561. Available online

This paper describes the Patent Retrieval Task in the Fourth NTCIR Workshop, and the test collections produced in this task. We perform the invalidity search task, in which each participant group searches a patent collection for the patents that can invalidate the demand in an existing claim. We also perform the automatic patent map generation task, in which the patents associated with a specific topic are organized in a multi-dimensional matrix.

Copyrights may apply

p. 562-563

Voorhees, Ellen M. (2004): Measuring ineffectiveness. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 562-563. Available online

An evaluation methodology that targets ineffective topics is needed to support research on obtaining more consistent retrieval across topics. Using average values of traditional evaluation measures is not an appropriate methodology because it emphasizes effective topics: poorly performing topics' scores are by definition small, and they are therefore difficult to distinguish from the noise inherent in retrieval evaluation. We examine two new measures that emphasize a system's worst topics. While these measures focus on different aspects of retrieval behavior than traditional measures, the measures are less stable than traditional measures and the margin of error associated with the new measures is large relative to the observed differences in scores.

Copyrights may apply
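
Voorhees' abstract above does not define its two new measures. The sketch below therefore only illustrates the general idea of scoring a system by its worst topics. It assumes two hypothetical measures: the fraction of topics with no relevant document in the top `k`, and the mean average precision over the `n` worst topics. Neither should be read as the paper's definitions.

```python
def average_precision(ranking, relevant):
    """Standard (non-interpolated) average precision for one topic."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant) if relevant else 0.0


def fraction_no_relevant_in_top(runs, qrels, k=10):
    """runs: topic -> ranked doc ids; qrels: topic -> set of relevant doc ids."""
    misses = sum(1 for t in qrels if not set(runs[t][:k]) & qrels[t])
    return misses / len(qrels)


def mean_ap_of_worst(runs, qrels, n):
    """Mean AP over the n lowest-scoring topics."""
    aps = sorted(average_precision(runs[t], qrels[t]) for t in qrels)
    worst = aps[:n]
    return sum(worst) / len(worst)


qrels = {"t1": {"d1"}, "t2": {"d5"}}
runs = {"t1": ["d1", "d2"], "t2": ["d2", "d3"]}
```

Both measures concentrate on topics whose scores are near zero. As the abstract notes, this makes such measures harder to estimate reliably than ordinary means.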

p. 564-565

Cowans, Philip J. (2004): Information retrieval using hierarchical dirichlet processes. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 564-565. Available online

An information retrieval method is proposed using a hierarchical Dirichlet process as a prior on the parameters of a set of multinomial distributions. The resulting method naturally includes a number of features found in other popular methods. Specifically, tf.idf-like term weighting and document length normalisation are recovered. The new method is compared with Okapi BM-25 [3] and the Twenty-One model [1] on TREC data and is shown to give better performance.

Copyrights may apply

p. 566-567

Goweder, Abduelbaset, Poesio, Massimo and Roeck, Anne De (2004): Broken plural detection for Arabic information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 566-567. Available online

p. 568-569

Jin, Rong and Si, Luo (2004): A study of methods for normalizing user ratings in collaborative filtering. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 568-569. Available online

The goal of collaborative filtering is to make recommendations for a test user by using the rating information of users who share similar interests. Because ratings are determined not only by user interests but also by users' rating habits, it is important to normalize the ratings of different users to the same scale. In this paper, we compare two normalization strategies for user ratings, namely the Gaussian normalization method and the decoupling normalization method. Specifically, we incorporated these two rating normalization methods into two collaborative filtering algorithms and evaluated their effectiveness on the EachMovie dataset. The experimental results show that the decoupling method is more effective than the Gaussian normalization method in improving the performance of collaborative filtering algorithms.

Copyrights may apply
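
Gaussian normalization, mentioned in the Jin and Si abstract above, is commonly understood as converting each user's ratings to z-scores. That means subtracting the user's mean rating and dividing by the user's standard deviation. The Python sketch below assumes that definition and uses the population standard deviation. It does not attempt the decoupling method, whose details the abstract does not give.

```python
from statistics import mean, pstdev


def gaussian_normalize(ratings):
    """ratings: user -> {item: rating}. Returns per-user z-scored ratings."""
    normalized = {}
    for user, items in ratings.items():
        values = list(items.values())
        mu = mean(values)
        sigma = pstdev(values)
        # A user who gives every item the same rating carries no scale
        # information, so all of that user's ratings map to 0.
        normalized[user] = {
            item: (r - mu) / sigma if sigma > 0 else 0.0
            for item, r in items.items()
        }
    return normalized


ratings = {"strict": {"a": 1, "b": 2, "c": 3}, "lenient": {"a": 3, "b": 4, "c": 5}}
norm = gaussian_normalize(ratings)
```

A lenient rater who gives 3, 4, 5 and a strict rater who gives 1, 2, 3 to the same items end up with identical normalized ratings.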

p. 57-63

Wen, Ji-Rong, Lao, Ni and Ma, Wei-Ying (2004): Probabilistic model for contextual retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 57-63. Available online

Contextual retrieval is a critical technique for facilitating many important applications such as mobile search, personalized search, and PC troubleshooting. Despite its importance, there is no comprehensive retrieval model that describes the contextual retrieval process. We observed that incompatible context, noisy context, and incomplete queries are important issues common to contextual retrieval applications. However, these issues have not previously been explored or discussed. In this paper, we propose probabilistic models to address these problems. Our study clearly shows that the query log is the key to building effective contextual retrieval models. We also conduct a case study in the PC troubleshooting domain to evaluate the performance of the proposed models, and the experimental results show that the models achieve very good retrieval precision.

Copyrights may apply

p. 570-571

Warren, Robert H. and Liu, Ting (2004): A review of relevance feedback experiments at the 2003 reliable information access (RIA) workshop. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 570-571. Available online

We review here the results of one of the experiments performed at the 2003 Reliable Information Access (RIA) Workshop, hosted by the MITRE Corporation and the Northeast Regional Research Center (NRRC). The experiment concentrates on query expansion using relevance feedback and explores the behaviour of several information retrieval systems using variable numbers of relevant documents.

Copyrights may apply
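
The Warren and Liu abstract above does not say which feedback method each system used. One classic form of query expansion from relevance feedback is the Rocchio formula. It moves the query vector toward the centroid of the relevant documents and away from the centroid of the non-relevant ones. The sketch assumes sparse term-weight dictionaries and the often-quoted defaults alpha = 1.0, beta = 0.75, gamma = 0.15. The `top_terms` cut-off is a hypothetical parameter.

```python
def rocchio(query, relevant_docs, nonrelevant_docs=(), alpha=1.0, beta=0.75,
            gamma=0.15, top_terms=None):
    """query and each document: dict term -> weight. Returns the expanded query."""
    new = {t: alpha * w for t, w in query.items()}
    for docs, coef in ((relevant_docs, beta), (nonrelevant_docs, -gamma)):
        for doc in docs:
            for t, w in doc.items():
                # Adds coef * centroid weight, accumulated one document at a time.
                new[t] = new.get(t, 0.0) + coef * w / len(docs)
    # Negative or zero weights are usually dropped from the expanded query.
    expanded = {t: w for t, w in new.items() if w > 0}
    if top_terms is not None:
        best = sorted(expanded.items(), key=lambda kv: kv[1], reverse=True)
        expanded = dict(best[:top_terms])
    return expanded


query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "car": 2.0}]
nonrelevant = [{"cat": 1.0}]
expanded = rocchio(query, relevant, nonrelevant)
```

Varying the number of relevant documents, as in the RIA experiment, amounts to passing a longer or shorter `relevant_docs` list.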

p. 572-573

Liu, Bicheng, Harper, David J. and Watt, Stuart (2004): Supporting federated information sharing communities. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 572-573. Available online

In this paper we describe the concept of Federated Information Sharing Communities (FISC) and an associated architecture, which provide a way for organisations, distributed workgroups and individuals to build up a federated community based on their common interests over the World Wide Web. To support communities, we develop capabilities that go beyond the generic retrieval of documents to include the ability to retrieve people, their interests and inter-relationships. We focus on providing social awareness "in the large" to help users understand the members within a community and the relationships between them. Within the FISC framework, we provide viewpoint retrieval to enable a user to construct visual contextual views of the community from the perspective of any community member. To evaluate these ideas we develop test beds to compare individual component technologies such as user and group profile construction and similarity matching, and we develop prototypes to explore the broader architecture and usage issues.

Copyrights may apply

p. 574-575

Collins-Thompson, Kevyn, Callan, Jamie, Terra, Egidio and Clarke, Charles L. A. (2004): The effect of document retrieval quality on factoid question answering performance. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 574-575. Available online

p. 576-577

Upstill, Trystan and Robertson, Stephen (2004): Exploiting hyperlink recommendation evidence in navigational web search. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 576-577. Available online

p. 578-579

Hunnisett, D. S. and Teahan, W. J. (2004): Context-based methods for text categorisation. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 578-579. Available online

We propose several context-based methods for text categorization. One method, a small modification to the PPM compression-based model which is known to significantly degrade compression performance, counter-intuitively has the opposite effect on categorization performance. Another method, called C-measure, simply counts the presence of higher order character contexts, and outperforms all other approaches investigated.

Copyrights may apply
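
The Hunnisett and Teahan abstract above describes a method that "counts the presence of higher order character contexts". One possible reading is sketched below. Each category is represented by the set of character n-grams seen in its training texts, and a document is assigned to the category that shares the most distinct n-grams with it. The order n = 4 and the helper names are assumptions, and the paper's C-measure may differ in detail.

```python
def char_ngrams(text, n=4):
    """Set of distinct lower-cased character n-grams (the 'contexts')."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_context_sets(categories, n=4):
    """categories: label -> list of training texts."""
    return {label: set().union(*(char_ngrams(t, n) for t in texts))
            for label, texts in categories.items()}


def categorize(model, text, n=4):
    """Pick the category whose context set contains most of the text's n-grams."""
    grams = char_ngrams(text, n)
    return max(model, key=lambda label: len(grams & model[label]))


model = build_context_sets({
    "sport": ["the team won the football match"],
    "finance": ["the stock market fell and shares dropped"],
})
```

Unlike a PPM model, this scorer ignores how often a context occurs and only records whether it occurs, which keeps the classifier very simple.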

p. 580-581

Aery, Manu and Chakravarthy, Sharma (2004): eMailSift: mining-based approaches to email classification. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 580-581. Available online

p. 584-585

Buckley, Chris (2004): Why current IR engines fail. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 584-585. Available online

Observations from a unique failure-analysis investigation of Information Retrieval (IR) research engines are presented. The Reliable Information Access (RIA) Workshop invited seven leading IR research groups to supply both their systems and their experts to an effort to analyze why their systems fail on some topics and whether the failures are due to system flaws, approach flaws, or the topic itself. The cross-system failure analysis produced surprising results. One is that, despite systems retrieving very different documents, the major cause of failure for any particular topic was almost always the same across all systems. Another is that relationships between aspects of a topic are not especially important for state-of-the-art systems; the systems are failing at a much more basic level, where the top-retrieved documents do not reflect some aspect at all.

Copyrights may apply

p. 586-587

Zahariev, Manuel (2004): Automatic sense disambiguation for acronyms. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 586-587. Available online

A machine learning methodology for the disambiguation of acronym senses is presented, starting from an acronym sense dictionary. Training data is automatically extracted from downloaded documents identified from the results of search engine queries. Leave-one-out cross-validation on 9,963 documents with 47 acronym forms achieves an accuracy of 92.58% and an F-measure (β = 1) of 91.52%.

Copyrights may apply
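
The Zahariev abstract above evaluates with leave-one-out cross-validation, a protocol that can be illustrated with a deliberately simple classifier. In this sketch, each sense of one acronym is represented by the summed word counts of its training contexts. A held-out context is assigned to the sense with the highest cosine similarity. The bag-of-words features and the nearest-centroid classifier are assumptions for illustration, since the abstract does not specify the learner.

```python
import math
from collections import Counter


def bow(text):
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def leave_one_out_accuracy(examples):
    """examples: list of (context_text, sense) for one acronym form."""
    correct = 0
    for i, (text, sense) in enumerate(examples):
        # Summed counts per sense; cosine ignores scale, so this has the
        # same direction as the sense centroid.
        profiles = {}
        for j, (other, s) in enumerate(examples):
            if j != i:
                profiles.setdefault(s, Counter()).update(bow(other))
        guess = max(profiles, key=lambda s: cosine(bow(text), profiles[s]))
        correct += guess == sense
    return correct / len(examples)


acl = [
    ("association computational linguistics conference parsing", "linguistics"),
    ("computational linguistics parsing papers", "linguistics"),
    ("knee ligament injury surgery", "ligament"),
    ("ligament tear knee", "ligament"),
]
```

Each document is classified by a model that never saw it, which is what makes the reported accuracy an estimate of performance on unseen text.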

p. 588-589

Somlo, Gabriel L. and Howe, Adele E. (2004): Filtering for personal web information agents. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 588-589. Available online

p. 590-591

Christel, Michael G., Moraveji, Neema and Huang, Chang (2004): Evaluating content-based filters for image and video retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 590-591. Available online

This paper investigates the level of metadata accuracy required for image filters to be valuable to users. Access to large digital image and video collections is hampered by ambiguous and incomplete metadata attributed to imagery. Though improvements are constantly made in the automatic derivation of semantic feature concepts such as indoor, outdoor, face, and cityscape, it is unclear how good these improvements should be and under what circumstances they are effective. This paper explores the relationship between metadata accuracy and effectiveness of retrieval using an amateur photo collection, documentary video, and news video. The accuracy of the feature classification is varied from performance typical of automated classifications today to ideal performance taken from manually generated truth data. Results establish an accuracy threshold at which semantic features can be useful, and empirically quantify the collection size when filtering first shows its effectiveness.

Copyrights may apply

p. 596

Gey, Fredric C., Chen, Aitao, Larson, Ray and Carl, Kim (2004): Geotemporal querying of multilingual documents. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. p. 596. Available online

This demonstration utilizes a geographic information system interface to display multilingual news documents in time and space by extracting place names from text and matching them to a multilingual multi-script gazetteer which identifies the latitude and longitude of the location.

Copyrights may apply

p. 597

Shen, Xuehua, Sriram, Smitha and Zhai, Chengxiang (2004): ACES: a contextual engine for search. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. p. 597. Available online

p. 598

Chapman, Sam, Dingli, Alexiei and Ciravegna, Fabio (2004): Armadillo: harvesting information for the semantic web. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. p. 598. Available online

p. 599

Kruschwitz, Udo and Al-Bakour, Hala (2004): UKSearch: search with automatically acquired domain knowledge. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. p. 599. Available online

p. 600

Larson, Ray R. and Frontiera, Patricia (2004): Geographic information retrieval (GIR): searching where and what. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. p. 600. Available online

p. 602

Bot, Razvan Stefan (2004): Improving document representation by accumulating relevance feedback (abstract only): the relevance feedback accumulation algorithm. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. p. 602. Available online

This paper presents a document representation improvement technique named the Relevance Feedback Accumulation (RFA) algorithm. Using prior relevance feedback assessments and a data mining measure called support, this algorithm improves document representations and generates higher-quality indexes. At the same time, the algorithm is efficient and scalable, suited for retrieval systems managing large document collections. The results of the preliminary evaluation reveal that the RFA algorithm is able to reduce the index dimensionality while improving retrieval effectiveness.

Copyrights may apply
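
The Bot abstract above does not define how support is computed, so the following is a hypothetical reading offered for illustration only. For each document, the sketch looks at the queries for which that document was judged relevant. It keeps those of the document's index terms that occur in at least a `min_support` fraction of those queries, and it leaves documents without feedback unchanged. Because only existing index terms can survive, a representation can only shrink, which mirrors the reported reduction in index dimensionality.

```python
from collections import Counter


def accumulate_feedback(doc_terms, feedback, min_support=0.5):
    """doc_terms: doc_id -> set of index terms (original representation).
    feedback: list of (query_terms, doc_id) pairs judged relevant."""
    queries_per_doc = {}
    for q_terms, doc in feedback:
        queries_per_doc.setdefault(doc, []).append(set(q_terms))
    new_repr = {}
    for doc, terms in doc_terms.items():
        queries = queries_per_doc.get(doc)
        if not queries:
            new_repr[doc] = set(terms)  # no feedback yet: keep as-is
            continue
        # Support of a term = fraction of this doc's relevant queries containing it.
        counts = Counter(t for q in queries for t in q if t in terms)
        kept = {t for t, c in counts.items() if c / len(queries) >= min_support}
        new_repr[doc] = kept or set(terms)  # never leave a document empty
    return new_repr


docs = {"d1": {"solar", "panel", "roof", "price"}, "d2": {"wind", "turbine"}}
feedback = [(["solar", "panel"], "d1"), (["solar", "price"], "d1")]
reduced = accumulate_feedback(docs, feedback)
```

Here `roof` never appears in a query that found `d1` relevant, so it is dropped from `d1`'s representation. `d2` has no feedback and is unchanged.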

p. 603

Trotman, Andrew (2004): An artificial intelligence approach to information retrieval (abstract only). In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. p. 603. Available online

Current approaches to information retrieval rely on the creativity of individuals to develop new algorithms. In this investigation, the use of genetic algorithms (GA) and genetic programming (GP) to learn IR algorithms is examined. Document structure weighting is a technique whereby different parts of a document (title, abstract, etc.) contribute unevenly to the overall document weight during ranking. Near-optimal weights can be learned with a GA. Doing so yields a statistically significant 5% relative improvement in MAP for vector space inner product and Croft's probabilistic ranking, but no improvement for BM25. Two applications of this approach are suggested: offline learning and relevance feedback. In a second set of experiments, a new ranking function was learned using GP. This new function yields a statistically significant 11% relative improvement on unseen queries tested on the training documents. Portability tests on different collections (not used in training) show that the new function outperforms the vector space and probabilistic models and slightly outperforms BM25. Learning weights for this new function is proposed. The application of genetic learning to stemming and thesaurus construction is discussed. Stemming rules such as those of the Porter algorithm are candidates for GP learning, whereas synonym sets are candidates for GA learning.

Copyrights may apply
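
The Trotman abstract above learns document structure weights with a GA, and a toy version is sketched below. Several parts of it are assumptions chosen for brevity. The fitness function is the fraction of training queries whose relevant document ranks first under a weighted sum of per-field match counts. Selection is by truncation, crossover is one-point, and mutation is Gaussian with clamping. The paper itself reports MAP on real collections, and none of these choices should be taken as its configuration.

```python
import random


def weighted_score(field_matches, weights):
    """field_matches: per-field match counts for one (query, document) pair."""
    return sum(w * m for w, m in zip(weights, field_matches))


def fitness(weights, queries):
    """queries: list of (candidates, relevant_index); each candidate is a
    tuple of per-field match counts. Ties go to the earlier candidate."""
    hits = 0
    for candidates, rel in queries:
        scores = [weighted_score(c, weights) for c in candidates]
        hits += scores.index(max(scores)) == rel
    return hits / len(queries)


def evolve(queries, n_fields, pop_size=20, generations=30, mutation=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_fields)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, queries), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection (keeps the elite)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_fields) if n_fields > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover
            # Gaussian mutation, clamped to the [0, 1] weight range.
            child = [min(1.0, max(0.0, g + rng.gauss(0, mutation))) for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, queries))


# Hypothetical training data: fields are (title, body) match counts.
QUERIES = [([(2, 1), (0, 5)], 0), ([(0, 4), (1, 1)], 1), ([(3, 0), (0, 2)], 0)]
best = evolve(QUERIES, n_fields=2)
```

In this toy data, every relevant document matches in the title, so weightings that favor the title field score best. Because the better half of each generation is carried over unchanged, the best fitness found never decreases from one generation to the next.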

p. 604

Yuan, Xiaojun (2004): Supporting multiple information-seeking strategies in a single system framework (abstract only). In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. p. 604. Available online

This research explores the relationship between information-seeking strategies (ISSs) and information retrieval (IR) system design. When people seek information, they engage in a variety of ISSs in order to search for specific items, learn about the contents of the database, evaluate retrieved information, and so on. The theoretical foundations of the work are based on the information-seeking episode model developed by Belkin (1996), and the multi-facet classification scheme of information behaviors proposed by Cool & Belkin (2002). The goal of this research is to construct and evaluate an interactive retrieval system which uses different combinations of IR techniques to support different ISSs. Example IR techniques include comparison using exact and probabilistic matching algorithms; summarization of information objects using titles, snippets or abstracts; visualization techniques such as lists or classified results; and navigation techniques such as scrolling or following links. By designing a retrieval system with diverse strategies in mind, we can adaptively support multiple ISSs, permitting a user to move seamlessly from one strategy to another, choosing instantiations of each support technique tailored to the specific ISS. The research will be conducted in a series of four steps. (1) Develop an object-oriented framework for representing basic IR techniques. (2) Design, implement and evaluate systems which support individual ISSs such as browsing and searching. (3) Specify an interaction structure for guiding and controlling sequences of different supporting techniques. (4) Design, implement, and evaluate a dynamically adaptive system supporting multiple ISSs in comparison to a non-adaptive baseline system.

Copyrights may apply

p. 64-71

Nallapati, Ramesh (2004): Discriminative models for information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 64-71. Available online

Discriminative models have been preferred over generative models in many machine learning problems in the recent past owing to some of their attractive theoretical properties. In this paper, we explore the applicability of discriminative classifiers for IR. We have compared the performance of two popular discriminative models, namely the maximum entropy model and support vector machines, with that of language modeling, the state-of-the-art generative model for IR. Our experiments on ad-hoc retrieval indicate that although maximum entropy is significantly worse than language models, support vector machines are on par with language models. We argue that the main reason to prefer SVMs over language models is their ability to learn arbitrary features automatically, as demonstrated by our experiments on the home-page finding task of TREC-10.

Copyrights may apply

p. 72-79

Kazai, Gabriella, Lalmas, Mounia and Vries, Arjen P. de (2004): The overlap problem in content-oriented XML retrieval evaluation. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 72-79. Available online

Within the INitiative for the Evaluation of XML Retrieval (INEX) a number of metrics to evaluate the effectiveness of content-oriented XML retrieval approaches were developed. Although these metrics provide a solution towards addressing the problem of overlapping result elements, they do not consider the problem of overlapping reference components within the recall-base, thus leading to skewed effectiveness scores. We propose alternative metrics that aim to provide a solution to both overlap issues.

Copyrights may apply

p. 80-87

Kamps, Jaap, Rijke, Maarten de and Sigurbjornsson, Borkur (2004): Length normalization in XML retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 80-87. Available online

XML retrieval is a departure from standard document retrieval in which each individual XML element, ranging from italicized words or phrases to full-blown articles, is a potentially retrievable unit. The distribution of XML element lengths is unlike what we usually observe in standard document collections, prompting us to revisit the issue of document length normalization. We perform a comparative analysis of arbitrary elements versus relevant elements, and show the importance of length as a parameter for XML retrieval. Within the language modeling framework, we investigate a range of techniques that deal with length either directly or indirectly. We observe a length bias introduced by the amount of smoothing, and show the importance of extreme length priors for XML retrieval. We also show that simply removing shorter elements from the index (by introducing a cut-off value) does not create an appropriate document length normalization. Even after increasing the minimal size of XML elements occurring in the index, the importance of an extreme length bias remains.

Copyrights may apply
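
The Kamps et al. abstract above discusses the interaction between smoothing and element length. To make it concrete, the sketch below scores an element with a Jelinek-Mercer smoothed query likelihood plus a length prior proportional to |e|^beta, which is beta * log|e| in log space. The smoothing method, lambda = 0.5, and the beta values are assumptions for illustration, since the abstract does not give the exact priors used.

```python
import math
from collections import Counter


def lm_score(query, element_tokens, collection_counts, collection_len,
             lam=0.5, beta=1.0):
    """log P(e) + sum_t log(lam * P(t|e) + (1 - lam) * P(t|C)), P(e) ~ |e|^beta."""
    length = len(element_tokens)
    if length == 0:
        return float("-inf")
    tf = Counter(element_tokens)
    score = beta * math.log(length)  # length prior in log space
    for t in query:
        p = lam * tf[t] / length + (1 - lam) * collection_counts.get(t, 0) / collection_len
        if p == 0:
            return float("-inf")  # term unseen in element and collection
        score += math.log(p)
    return score


short = ["xml", "retrieval"]
long_ = ["xml", "retrieval", "evaluation", "of", "element", "length",
         "normalization", "in", "practice", "today"]
coll = Counter(short + long_)
coll_len = len(short) + len(long_)
```

With beta = 0, the short element wins on the query `["xml"]` because the term is denser there. With a large beta such as 2, the longer element wins. That reversal is the kind of length bias an "extreme" prior introduces.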

p. 88-95

Liu, Shaorong, Zou, Qinghua and Chu, Wesley W. (2004): Configurable indexing and ranking for XML information retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 88-95. Available online

Indexing and ranking are two key factors for efficient and effective XML information retrieval. Inappropriate indexing may result in false negatives and false positives, and improper ranking may lead to low precision. In this paper, we propose a configurable XML information retrieval system in which users can configure appropriate index types for XML tags and text contents. Based on users' index configurations, the system transforms XML structures into a compact tree representation, Ctree, and indexes XML text contents. To support XML ranking, we propose the concepts of "weighted term frequency" and "inverted element frequency," where the weight of a term depends on its frequency and location within an XML element as well as its popularity among similar elements in an XML dataset. We evaluate the effectiveness of our system through extensive experiments on the INEX 03 dataset and 30 content and structure (CAS) topics. The experimental results show that our system achieves notably high precision in low-recall regions and the highest average precision (0.3309) compared with 38 official INEX 03 submissions using the strict evaluation metric.

Copyrights may apply
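
The Liu, Zou and Chu abstract above names "weighted term frequency" and "inverted element frequency" without giving formulas. The sketch below assumes simple analogues of tf and idf. Occurrences are weighted by the tag they appear under, and the inverse frequency is computed over a list of elements with the same tag, taken here as the "similar elements". The location weights and helper names are hypothetical.

```python
import math
from collections import Counter


def weighted_tf(element, location_weights, default=1.0):
    """element: list of (child_tag, tokens) pairs making up one XML element."""
    wtf = Counter()
    for tag, tokens in element:
        w = location_weights.get(tag, default)
        for t in tokens:
            wtf[t] += w  # each occurrence counts with its location weight
    return wtf


def inverted_element_frequency(term, similar_elements):
    """log(N / n_t) over elements of the same type; 0 if the term never occurs."""
    n = len(similar_elements)
    containing = sum(1 for el in similar_elements
                     if any(term in tokens for _, tokens in el))
    return math.log(n / containing) if containing else 0.0


def element_term_weight(term, element, similar_elements, location_weights):
    return (weighted_tf(element, location_weights)[term]
            * inverted_element_frequency(term, similar_elements))


articles = [
    [("title", ["xml", "ranking"]), ("p", ["xml", "index"])],
    [("title", ["databases"]), ("p", ["index", "tuning"])],
]
weights = {"title": 2.0}  # hypothetical: title occurrences count double
```

With these weights, `xml` in the first article gets a weighted frequency of 3 (2 from the title and 1 from the body). `index` occurs in every article, so its inverted element frequency, and hence its weight, is zero.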

p. 96-103

He, Xiaofei, Cai, Deng, Liu, Haifeng and Ma, Wei-Ying (2004): Locality preserving indexing for document representation. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 96-103. Available online

Document representation and indexing is a key problem for document analysis and processing, such as clustering, classification and retrieval. Conventionally, Latent Semantic Indexing (LSI) is considered effective in deriving such an indexing. LSI essentially detects the most representative features for document representation rather than the most discriminative features. Therefore, LSI might not be optimal in discriminating documents with different semantics. In this paper, a novel algorithm called Locality Preserving Indexing (LPI) is proposed for document indexing. Each document is represented by a vector with low dimensionality. In contrast to LSI which discovers the global structure of the document space, LPI discovers the local structure and obtains a compact document representation subspace that best detects the essential semantic structure. We compare the proposed LPI approach with LSI on two standard databases. Experimental results show that LPI provides better representation in the sense of semantic structure.

Copyrights may apply





Software design is the act of determining the user's experience with a piece of software. It has nothing to do with how the code works inside, or how big or small the code is. The designer's task is to specify completely and unambiguously the user's whole experience.

-- David Liddle, from Bringing Design to Software, edited by Terry Winograd, 1996

Eva Hornecker on Tangible Interaction

Eva Hornecker explains the evolving concept of Tangible Interaction.

Page information

Page maintainer: The Editorial Team
URL: http://www.interaction-design.org/references/conferences/proceedings_of_the_27th_annual_international_acm_sigir_conference_on_research_and_development_in_information_retrieval.html