David Carmel
Publications by David Carmel (bibliography)
» 2009 «
Amitay, Einat, Carmel, David, Har'El, Nadav, Ofek-Koifman, Shila, Soffer, Aya, Yogev, Sivan and Golbandi, Nadav (2009): Social search and discovery using a unified approach. In: Proceedings of the 2009 International Conference on the World Wide Web 2009. pp. 1211-1212. Available online
We explore new ways of improving a search engine using data from Web 2.0 applications such as blogs and social bookmarks. This data contains entities such as documents, people and tags, and relationships between them. We propose a simple yet effective method, based on faceted search, that treats all entities in a unified manner: returning all of them (documents, people and tags) on every search, and allowing all of them to be used as search terms. We describe an implementation of such a social search engine on the intranet of a large enterprise, and present large-scale experiments which verify the validity of our approach.
Copyrights may apply
» 2008 «
Carmel, David, Yom-Tov, Elad and Roitman, Haggai (2008): Enhancing digital libraries using missing content analysis. In: JCDL08 Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries 2008. pp. 1-10. Available online
This work shows how the content of a digital library can be enhanced to better satisfy its users' needs. Missing content is identified by finding missing content topics in the system's query log or in a pre-defined taxonomy of required knowledge. The collection is then enhanced with new relevant knowledge, which is extracted from external sources that satisfy those missing content topics. Experiments we conducted measure the precision of the system before and after content enhancement. The results demonstrate a significant improvement in the system effectiveness as a result of content enhancement and the superiority of the missing content enhancement policy over several other possible policies.
» 2006 «
Mamou, Jonathan, Carmel, David and Hoory, Ron (2006): Spoken document retrieval from call-center conversations. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2006. pp. 51-58. Available online
We are interested in retrieving information from conversational speech corpora, such as call-center data. This data comprises spontaneous speech conversations with low recording quality, which makes automatic speech recognition (ASR) a highly difficult task. For typical call-center data, even state-of-the-art large vocabulary continuous speech recognition systems produce a transcript with a word error rate of 30% or higher. In addition to the output transcript, advanced systems provide word confusion networks (WCNs), a compact representation of word lattices associating each word hypothesis with its posterior probability. Our work exploits the information provided by WCNs in order to improve retrieval performance. In this paper, we show that the mean average precision (MAP) is improved using WCNs compared to the raw word transcripts. Finally, we analyze the effect of increasing ASR word error rate on search effectiveness. We show that MAP is still reasonable even under an extremely high error rate.
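To make the idea of exploiting WCN posteriors concrete, here is a minimal sketch. It assumes a WCN can be modelled as a list of slots, each mapping competing word hypotheses to posterior probabilities; the data layout, function names, and example words are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical layout: a WCN is a list of slots, and each slot maps a
# competing word hypothesis to its posterior probability.
def expected_term_counts(wcn):
    """Sum each word's posteriors over all slots (its expected count)."""
    counts = defaultdict(float)
    for slot in wcn:
        for word, posterior in slot.items():
            counts[word] += posterior
    return dict(counts)

def one_best_transcript(wcn):
    """Keep only the most probable word of each slot (the raw transcript)."""
    return [max(slot, key=slot.get) for slot in wcn]

wcn = [
    {"reset": 0.6, "recent": 0.4},
    {"my": 0.9, "by": 0.1},
    {"password": 0.45, "passport": 0.55},
]
best = one_best_transcript(wcn)
counts = expected_term_counts(wcn)
print(best)                # the 1-best path misses "password"
print(counts["password"])  # the WCN still gives it non-zero weight
```

In this toy example the 1-best transcript drops "password", while an index built from all hypotheses can still match it with a weight tied to its posterior.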
» 2005 «
Mishne, Gilad, Carmel, David, Hoory, Ron, Roytman, Alexey and Soffer, Aya (2005): Automatic analysis of call-center conversations. In: Herzog, Otthein, Schek, Hans-Jörg and Fuhr, Norbert (eds.) Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management October 31 - November 5, 2005, Bremen, Germany. pp. 453-459. Available online
Anagnostopoulos, Aris, Broder, Andrei Z. and Carmel, David (2005): Sampling search-engine results. In: Proceedings of the 2005 International Conference on the World Wide Web 2005. pp. 245-256. Available online
We consider the problem of efficiently sampling Web search engine query results. In turn, using a small random sample instead of the full set of results leads to efficient approximate algorithms for several applications, such as:
* Determining the set of categories in a given taxonomy spanned by the search results;
* Finding the range of metadata values associated to the result set in order to enable "multi-faceted search";
* Estimating the size of the result set;
* Data mining associations to the query terms.
We present and analyze an efficient algorithm for obtaining uniform random samples applicable to any search engine based on posting lists and document-at-a-time evaluation. (To our knowledge, all popular Web search engines, e.g. Google, Inktomi, AltaVista, AllTheWeb, belong to this class.) Furthermore, our algorithm can be modified to follow the modern object-oriented approach whereby posting lists are viewed as streams equipped with a next method, and the next method for Boolean and other complex queries is built from the next method for primitive terms. In our case we show how to construct a basic next(p) method that samples term posting lists with probability p, and show how to construct next(p) methods for Boolean operators (AND, OR, WAND) from primitive methods. Finally, we test the efficiency and quality of our approach on both synthetic and real-world data.
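The sampling idea can be illustrated with a deliberately naive baseline. The sketch below evaluates an AND query document-at-a-time over sorted posting lists and keeps each matching document independently with probability p. It is not the paper's algorithm, which builds next(p) into the operators themselves to avoid scanning every result; the function names and toy posting lists are assumptions.

```python
import random

def intersect(a, b):
    """Document-at-a-time AND over two sorted posting lists of doc ids."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            yield a[i]
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1

def sample_stream(stream, p, rng):
    """Keep each doc id of the stream independently with probability p."""
    for doc_id in stream:
        if rng.random() < p:
            yield doc_id

# Toy posting lists (hypothetical data): doc ids containing each term.
term_a = list(range(0, 1000, 2))
term_b = list(range(0, 1000, 3))

full = list(intersect(term_a, term_b))
rng = random.Random(7)
p = 0.25
sample = list(sample_stream(intersect(term_a, term_b), p, rng))

# One of the listed applications: estimate the result-set size from the sample.
estimated_size = len(sample) / p
print(len(full), estimated_size)
```

Because every result is kept with the same probability, the sample is uniform over the result set, and dividing the sample size by p gives an unbiased estimate of the number of results.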
» 2004 «
Amitay, Einat, Carmel, David, Lempel, Ronny and Soffer, Aya (2004): Scaling IR-system evaluation using term relevance sets. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 10-17. Available online
This paper describes an evaluation method based on Term Relevance Sets (Trels) that measures an IR system's quality by examining the content of the retrieved results rather than by looking for pre-specified relevant pages. Trels consist of a list of terms believed to be relevant for a particular query as well as a list of irrelevant terms. The proposed method does not involve any document relevance judgments, and as such is not adversely affected by changes to the underlying collection. Therefore, it can better scale to very large, dynamic collections such as the Web. Moreover, this method can evaluate a system's effectiveness on an updatable "live" collection, or on collections derived from different data sources. Our experiments show that the proposed method is very highly correlated with official TREC measures.
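As a rough illustration of scoring results by their content, the sketch below rates each retrieved text by how many relevant terms it contains minus how many irrelevant terms it contains, and averages over the result list. The abstract does not give the scoring formula, so this particular function, its names, and the example query are assumptions made for illustration only.

```python
def trel_score(text, on_topic, off_topic):
    """Count distinct relevant terms present minus irrelevant terms present."""
    words = set(text.lower().split())
    return len(words & on_topic) - len(words & off_topic)

def system_score(results, on_topic, off_topic):
    """Average the per-result scores over a system's retrieved texts."""
    if not results:
        return 0.0
    return sum(trel_score(r, on_topic, off_topic) for r in results) / len(results)

# Hypothetical Trels for the query "jaguar" when the car is meant.
on_topic = {"car", "engine", "dealer"}
off_topic = {"cat", "jungle"}

system_a = ["jaguar car dealer", "new jaguar engine"]
system_b = ["jaguar cat in the jungle", "jaguar car"]

score_a = system_score(system_a, on_topic, off_topic)
score_b = system_score(system_b, on_topic, off_topic)
print(score_a, score_b)
```

No document-level relevance judgments are needed here: the same term lists can score any result list, including one drawn from a collection that changed after the lists were written.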
Amitay, Einat, Carmel, David, Herscovici, Michael, Lempel, Ronny and Soffer, Aya (2004): Trend detection through temporal link analysis. In JASIST - Journal of the American Society for Information Science and Technology, 55 (14) pp. 1270-1281
» 2003 «
Amitay, Einat, Carmel, David, Darlow, Adam, Lempel, Ronny and Soffer, Aya (2003): The connectivity sonar: detecting site functionality by structural patterns. In: Proceedings of the Fourteenth ACM Conference on Hypertext 2003. pp. 38-47. Available online
Web sites today serve many different functions, such as corporate sites, search engines, e-stores, and so forth. As sites are created for different purposes, their structure and connectivity characteristics vary. However, this research argues that sites of similar role exhibit similar structural patterns, as the functionality of a site naturally induces a typical hyperlinked structure and typical connectivity patterns to and from the rest of the Web. Thus, the functionality of Web sites is reflected in a set of structural and connectivity-based features that form a typical signature. In this paper, we automatically categorize sites into eight distinct functional classes, and highlight several search-engine related applications that could make immediate use of such technology. We purposely limit our categorization algorithms by tapping connectivity and structural data alone, making no use of any content analysis whatsoever. When applying two classification algorithms to a set of 202 sites of the eight defined functional categories, the algorithms correctly classified between 54.5% and 59% of the sites. On some categories, the
Carmel, David, Maarek, Yoelle S., Mandelbrod, Matan, Mass, Yosi and Soffer, Aya (2003): Searching XML documents via XML fragments. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2003. pp. 151-158. Available online
Most of the work on XML query and search has stemmed from the publishing and database communities, mostly for the needs of business applications. Recently, the Information Retrieval community began investigating the XML search issue to answer information discovery needs. Following this trend, we present here an approach where information needs can be expressed in an approximate manner as pieces of XML documents or "XML fragments" of the same nature as the documents that are being searched. We present an extension of the vector space model for searching XML collections via XML fragments and ranking results by relevance. We describe how we have extended a full-text search engine to comply with this model. The value of the proposed method is demonstrated by the relatively high precision of our system, which was among the top performers in the recent INEX workshop. Our results indicate that certain queries are more appropriate than others for the extended vector space model. Specifically, queries with relatively specific contexts but vague information needs are best situated to reap the benefit of this model. Finally, our results show that one method may not fit all types of queries and that it could be worthwhile to use different solutions for different applications.
Broder, Andrei Z., Carmel, David, Herscovici, Michael, Soffer, Aya and Zien, Jason Y. (2003): Efficient query evaluation using a two-level retrieval process. In: Proceedings of the 2003 ACM CIKM International Conference on Information and Knowledge Management November 2-8, 2003, New Orleans, Louisiana, USA. pp. 426-434. Available online
» 2002 «
Aridor, Yariv, Carmel, David, Maarek, Yoelle S., Soffer, Aya and Lempel, Ronny (2002): Knowledge encapsulation for focused search from pervasive devices. In ACM Transactions on Information Systems, 20 (1) pp. 25-46
Mobile knowledge seekers often need access to information on the Web during a meeting or on the road, while away from their desktop. A common practice today is to use pervasive devices such as Personal Digital Assistants or mobile phones. However, these devices have inherent constraints (e.g., slow communication, form factor) which often make information discovery tasks impractical. In this paper, we present a new focused-search approach specifically oriented for the mode of work and the constraints dictated by pervasive devices. It combines focused search within specific topics with encapsulation of topic-specific information in a persistent repository. One key characteristic of these persistent repositories is that their footprint is small enough to fit on local devices, and yet they are rich enough to support many information discovery tasks in disconnected mode. More specifically, we suggest a representation for topic-specific information based on "knowledge-agent bases" that comprise all the information necessary to access information about a topic (in the form of key concepts and key Web pages) and assist in the full search process from query formulation assistance to result scanning on the device itself. The key contribution of our work is the coupling of focused search with encapsulated knowledge representation, making information discovery from pervasive devices practical as well as efficient. We describe our model in detail and demonstrate its aspects through sample scenarios.
Carmel, David, Farchi, Eitan, Petruschka, Yael and Soffer, Aya (2002): Automatic query refinement using lexical affinities with maximal information gain. In: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2002. pp. 283-290. Available online
This work describes an automatic query refinement technique, which focuses on improving precision of the top ranked documents. The terms used for refinement are lexical affinities (LAs), pairs of closely related words which contain exactly one of the original query terms. Adding these terms to the query is equivalent to re-ranking search results; thus, precision is improved while recall is preserved. We describe a novel method that selects the most "informative" LAs for refinement, namely, those LAs that best separate relevant documents from irrelevant documents in the set of results. The information gain of candidate LAs is determined using unsupervised estimation that is based on the scoring function of the search engine. This method is thus fully automatic and its quality depends on the quality of the scoring function. Experiments we conducted with TREC data clearly show a significant improvement in the precision of the top ranked documents.
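The selection criterion can be sketched with the textbook definition of information gain: how much knowing whether a result contains a given LA reduces the entropy of the relevant/irrelevant split. The sketch assumes the relevance labels are already given, whereas the paper estimates them without supervision from the engine's scores; the function names and example LAs are hypothetical.

```python
import math

def entropy(positives, total):
    """Binary entropy, in bits, of a set with `positives` relevant items."""
    if total == 0 or positives == 0 or positives == total:
        return 0.0
    q = positives / total
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

def information_gain(contains_la, relevant):
    """Entropy reduction from splitting the results on LA presence."""
    n = len(relevant)
    if n == 0:
        return 0.0
    with_la = [r for c, r in zip(contains_la, relevant) if c]
    without_la = [r for c, r in zip(contains_la, relevant) if not c]
    before = entropy(sum(relevant), n)
    after = sum(len(part) / n * entropy(sum(part), len(part))
                for part in (with_la, without_la))
    return before - after

# Six results; 1 marks a relevant document (labels assumed known here).
relevant = [1, 1, 1, 0, 0, 0]
candidates = {
    "index pruning": [1, 1, 1, 0, 0, 0],  # occurs only in relevant results
    "index card":    [1, 0, 1, 0, 1, 0],  # occurs in both kinds of results
}
gains = {la: information_gain(present, relevant)
         for la, present in candidates.items()}
best_la = max(gains, key=gains.get)
print(gains, best_la)
```

An LA that appears in exactly the relevant results gets the maximum gain of one bit, so it would be preferred for refinement over an LA spread evenly across both groups.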
Baeza-Yates, Ricardo A., Carmel, David, Maarek, Yoelle S. and Soffer, Aya (2002): Preface. In JASIST - Journal of the American Society for Information Science and Technology, 53 (6) pp. 413-414
» 2001 «
Carmel, David, Cohen, Doron, Fagin, Ronald, Farchi, Eitan, Herscovici, Michael, Maarek, Yoelle S. and Soffer, Aya (2001): Static index pruning for information retrieval systems. In: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2001. pp. 43-50. Available online
We introduce static index pruning methods that significantly reduce the index size in information retrieval systems. We investigate uniform and term-based methods that each remove selected entries from the index and yet have only a minor effect on retrieval results. In uniform pruning, there is a fixed cutoff threshold, and all index entries whose contribution to relevance scores is bounded above by a given threshold are removed from the index. In term-based pruning, the cutoff threshold is determined for each term, and thus may vary from term to term. We give experimental evidence that for each level of compression, term-based pruning outperforms uniform pruning, under various measures of precision. We present theoretical and experimental evidence that under our term-based pruning scheme, it is possible to prune the index greatly and still get retrieval results that are almost as good as those based on the full index.
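The contrast between the two pruning schemes can be sketched on a toy index. Uniform pruning below drops every entry whose score does not exceed one fixed threshold. For term-based pruning, the abstract says only that the threshold is set per term, so the rule used here (a fraction epsilon of the term's k-th highest score) is an assumption made for illustration, as are the function names and the data.

```python
def uniform_prune(index, tau):
    """Drop every posting whose score is at most the fixed threshold tau."""
    return {term: {doc: s for doc, s in postings.items() if s > tau}
            for term, postings in index.items()}

def term_based_prune(index, k, epsilon):
    """Drop postings scoring at most epsilon times the term's k-th best score.

    Assumed rule: terms with fewer than k postings use their lowest score.
    """
    pruned = {}
    for term, postings in index.items():
        ranked = sorted(postings.values(), reverse=True)
        kth_best = ranked[min(k, len(ranked)) - 1] if ranked else 0.0
        tau_t = epsilon * kth_best
        pruned[term] = {doc: s for doc, s in postings.items() if s > tau_t}
    return pruned

# Toy index (hypothetical): term -> {doc id: contribution to relevance score}.
index = {
    "apple": {1: 0.9, 2: 0.5, 3: 0.1},
    "pear":  {1: 0.2, 2: 0.15, 4: 0.05},
}

uniform = uniform_prune(index, tau=0.2)
term_based = term_based_prune(index, k=2, epsilon=0.5)
print(uniform)
print(term_based)
```

With a single global cutoff, the low-scoring term "pear" loses all of its postings, whereas the per-term cutoff keeps the top entries of every term. Both variants remove two of the six postings in this example.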
Aridor, Yariv, Carmel, David, Maarek, Yoelle S., Soffer, Aya and Lempel, Ronny (2001): Knowledge encapsulation for focused search from pervasive devices. In: Proceedings of the 2001 International Conference on the World Wide Web 2001. pp. 754-764. Available online