Fernando Diaz
About the author:
No description available of Fernando Diaz...
Publications by Fernando Diaz (bibliography)
» 2007 «
Diaz, Fernando (2007): Performance prediction using spatial autocorrelation. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2007. pp. 583-590. Available online
Evaluation of information retrieval systems is one of the core tasks in information retrieval. Problems include the inability to exhaustively label all documents for a topic, generalizability from a small number of topics, and incorporating the variability of retrieval systems. Previous work addresses the evaluation of systems, the ranking of queries by difficulty, and the ranking of individual retrievals by performance. Approaches exist for the case of few and even no relevance judgments. Our focus is on zero-judgment performance prediction of individual retrievals. One common shortcoming of previous techniques is the assumption of uncorrelated document scores and judgments. If documents are embedded in a high-dimensional space (as they often are), we can apply techniques from spatial data analysis to detect correlations between document scores. We find that the low correlation between scores of topically close documents often implies a poor retrieval performance. When compared to a state of the art baseline, we demonstrate that the spatial analysis of retrieval scores provides significantly better prediction performance. These new predictors can also be incorporated with classic predictors to improve performance further. We also describe the first large-scale experiment to evaluate zero-judgment performance prediction for a massive number of retrieval systems over a variety collections in several languages.
Copyrights may apply
Jones, Rosie and Diaz, Fernando (2007): Temporal profiles of queries. In ACM Transactions on Information Systems, 25 (3) p. 14
Documents with timestamps, such as email and news, can be placed along a timeline. The timeline for a set of documents returned in response to a query gives an indication of how documents relevant to that query are distributed in time. Examining the timeline of a query result set allows us to characterize both how temporally dependent the topic is, as well as how relevant the results are likely to be. We outline characteristic patterns in query result set timelines, and show experimentally that we can automatically classify documents into these classes. We also show that properties of the query result set timeline can help predict the mean average precision of a query. These results show that meta-features associated with a query can be combined with text retrieval techniques to improve our understanding and treatment of text search on documents with timestamps.
Copyrights may apply
» 2006 «
Diaz, Fernando and Metzler, Donald (2006): Improving the estimation of relevance models using large external corpora. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2006. pp. 154-161. Available online
Information retrieval algorithms leverage various collection statistics to improve performance. Because these statistics are often computed on a relatively small evaluation corpus, we believe using larger, non-evaluation corpora should improve performance. Specifically, we advocate incorporating external corpora based on language modeling. We refer to this process as external expansion. When compared to traditional pseudo-relevance feedback techniques, external expansion is more stable across topics and up to 10% more effective in terms of mean average precision. Our results show that using a high quality corpus that is comparable to the evaluation corpus can be as, if not more, effective than using the web. Our results also show that external expansion outperforms simulated relevance feedback. In addition, we propose a method for predicting the extent to which external expansion will improve retrieval performance. Our new measure demonstrates positive correlation with improvements in mean average precision.
Copyrights may apply
» 2005 «
Diaz, Fernando (2005): Regularizing ad hoc retrieval scores. In: Herzog, Otthein, Schek, Hans-Jörg and Fuhr, Norbert (eds.) Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management October 31 - November 5, 2005, Bremen, Germany. pp. 672-679. Available online
» 2004 «
Diaz, Fernando and Jones, Rosie (2004): Using temporal profiles of queries for precision prediction. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 18-24. Available online
A key missing component in information retrieval systems is self-diagnostic tests to establish whether the system can provide reasonable results for a given query on a document collection. If we can measure properties of a retrieved set of documents which allow us to predict average precision, we can automate the decision of whether to elicit relevance feedback, or modify the retrieval system in other ways. We use meta-data attached to documents in the form of time stamps to measure the distribution of documents retrieved in response to a query, over the time domain, to create a temporal profile for a query. We define some useful features over this temporal profile. We find that using these temporal features, together with the content of the documents retrieved, we can improve the prediction of average precision for a query.
Copyrights may apply
SHOW THIS LIST ON YOUR HOMEPAGE
What do YOU think?
Give us your opinion! Do you have any comments/additions
that you would like other visitors to see?
You say:
Mar 21st, 2010
Changes to this page (author)
17 Feb 2010: Enabled abstracts to be shown on Fernando Diaz's author page.29 May 2009: Author was edited 29 May 2009: Author was edited
12 May 2008: Author was edited
12 May 2008: Author was edited
24 Jun 2007: Author was edited
24 Jun 2007: Author was added to the bibliography