Number of co-authors: 28
Number of publications with 3 favourite co-authors: Daniel S. Weld: 7, Ana-Maria Popescu: 3, Stephen Soderland: 2
Oren Etzioni's 3 most productive colleagues in number of publications: Daniel S. Weld: 28, Jeff Huang: 15, Tessa A. Lau: 7
Publications by Oren Etzioni (bibliography)
Huang, Jeff, Etzioni, Oren, Zettlemoyer, Luke, Clark, Kevin and Lee, Christian (2012): RevMiner: an extractive interface for navigating reviews on a smartphone. In: Proceedings of the 2012 ACM Symposium on User Interface Software and Technology 2012. pp. 3-12.
Smartphones are convenient, but their small screens make searching, clicking, and reading awkward. Thus, perusing product reviews on a smartphone is difficult. In response, we introduce RevMiner -- a novel smartphone interface that utilizes Natural Language Processing techniques to analyze and navigate reviews. RevMiner was run over 300K Yelp restaurant reviews extracting attribute-value pairs, where attributes represent restaurant attributes such as sushi and service, and values represent opinions about the attributes such as fresh or fast. These pairs were aggregated and used to: 1) answer queries such as "cheap Indian food", 2) concisely present information about each restaurant, and 3) identify similar restaurants. Our user studies demonstrate that on a smartphone, participants preferred RevMiner's interface to tag clouds and color bars, and that they preferred RevMiner's results to Yelp's, particularly for conjunctive queries (e.g., "great food and huge portions"). Demonstrations of RevMiner are available at revminer.com.
© All rights reserved Huang et al. and/or ACM Press
Etzioni, Oren, Banko, Michele, Soderland, Stephen and Weld, Daniel S. (2008): Open information extraction from the web. In Communications of the ACM, 51 (12) pp. 68-74.
Banko, Michele and Etzioni, Oren (2007): Strategies for lifelong knowledge extraction from the web. In: Sleeman, Derek H. and Barker, Ken (eds.) K-CAP 2007 - Proceedings of the 4th International Conference on Knowledge Capture October 28-31, 2007, Whistler, BC, Canada. pp. 95-102.
Etzioni, Oren (2007): Machine reading of web text. In: Sleeman, Derek H. and Barker, Ken (eds.) K-CAP 2007 - Proceedings of the 4th International Conference on Knowledge Capture October 28-31, 2007, Whistler, BC, Canada. pp. 1-4.
Cafarella, Michael J. and Etzioni, Oren (2005): A search engine for natural language applications. In: Proceedings of the 2005 International Conference on the World Wide Web 2005. pp. 442-452.
Many modern natural language-processing applications utilize search engines to locate large numbers of Web documents or to compute statistics over the Web corpus. Yet Web search engines are designed and optimized for simple human queries -- they are not well suited to support such applications. As a result, these applications are forced to issue millions of successive queries resulting in unnecessary search engine load and in slow applications with limited scalability. In response, this paper introduces the Bindings Engine (BE), which supports queries containing typed variables and string-processing functions. For example, in response to the query "powerful <noun>" BE will return all the nouns in its index that immediately follow the word "powerful", sorted by frequency. In response to the query "Cities such as ProperNoun(Head(<NounPhrase>))", BE will return a list of proper nouns likely to be city names. BE's novel neighborhood index enables it to do so with O(k) random disk seeks and O(k) serial disk reads, where k is the number of non-variable terms in its query. As a result, BE can yield several orders of magnitude speedup for large-scale language-processing applications. The main cost is a modest increase in space to store the index. We report on experiments validating these claims, and analyze how BE's space-time tradeoff scales with the size of its index and the number of variable types. Finally, we describe how a BE-based application extracts thousands of facts from the Web at interactive speeds in response to simple user queries.
© All rights reserved Cafarella and Etzioni and/or ACM Press
Etzioni, Oren, Cafarella, Michael, Downey, Doug, Kok, Stanley, Popescu, Ana-Maria, Shaked, Tal, Soderland, Stephen, Weld, Daniel S. and Yates, Alexander (2004): Web-scale information extraction in knowitall: (preliminary results). In: Proceedings of the 2004 International Conference on the World Wide Web 2004. pp. 100-110.
Manually querying search engines in order to accumulate a large body of factual information is a tedious, error-prone process of piecemeal search. Search engines retrieve and rank potentially relevant documents for human perusal, but do not extract facts, assess confidence, or fuse information from multiple documents. This paper introduces KnowItAll, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner. The paper describes preliminary experiments in which an instance of KnowItAll, running for four days on a single machine, was able to automatically extract 54,753 facts. KnowItAll associates a probability with each fact enabling it to trade off precision and recall. The paper analyzes KnowItAll's architecture and reports on lessons learned for the design of large-scale information extraction systems.
© All rights reserved Etzioni et al. and/or ACM Press
McDowell, Luke, Etzioni, Oren, Halevy, Alon and Levy, Henry (2004): Semantic email. In: Proceedings of the 2004 International Conference on the World Wide Web 2004. pp. 244-254.
This paper investigates how the vision of the Semantic Web can be carried over to the realm of email. We introduce a general notion of semantic email, in which an email message consists of an RDF query or update coupled with corresponding explanatory text. Semantic email opens the door to a wide range of automated, email-mediated applications with formally guaranteed properties. In particular, this paper introduces a broad class of semantic email processes. For example, consider the process of sending an email to a program committee asking who will attend the PC dinner, automatically collecting the responses, and tallying them up. We define both logical and decision-theoretic models where an email process is modeled as a set of updates to a data set on which we specify goals via certain constraints or utilities. We then describe a set of inference problems that arise while trying to satisfy these goals and analyze their computational tractability. In particular, we show that for the logical model it is possible to automatically infer which email responses are acceptable w.r.t. a set of constraints in polynomial time, and for the decision-theoretic model it is possible to compute the optimal message-handling policy in polynomial time. Finally, we discuss our publicly available implementation of semantic email and outline research challenges in this realm.
© All rights reserved McDowell et al. and/or ACM Press
Popescu, Ana-Maria, Etzioni, Oren and Kautz, Henry (2003): Towards a theory of natural language interfaces to databases. In: Johnson, Lewis and Andre, Elisabeth (eds.) International Conference on Intelligent User Interfaces 2003 January 12-15, 2003, Miami, Florida, USA. pp. 149-157.
The need for Natural Language Interfaces to databases (NLIs) has become increasingly acute as more and more people access information through their web browsers, PDAs, and cell phones. Yet NLIs are only usable if they map natural language questions to SQL queries correctly. As Shneiderman and Norman have argued, people are unwilling to trade reliable and predictable user interfaces for intelligent but unreliable ones. In this paper, we introduce a theoretical framework for reliable NLIs, which is the foundation for the fully implemented Precise NLI. We prove that, for a broad class of semantically tractable natural language questions, Precise is guaranteed to map each question to the corresponding SQL query. We report on experiments testing Precise on several hundred questions drawn from user studies over three benchmark databases. We find that over 80% of the questions are semantically tractable questions, which Precise answers correctly. Precise automatically recognizes the 20% of questions that it cannot handle, and requests a paraphrase. Finally, we show that Precise compares favorably with Mooney's learning NLI and with Microsoft's English Query product.
© All rights reserved Popescu et al. and/or ACM Press
Yates, Alexander, Etzioni, Oren and Weld, Daniel S. (2003): A reliable natural language interface to household appliances. In: Johnson, Lewis and Andre, Elisabeth (eds.) International Conference on Intelligent User Interfaces 2003 January 12-15, 2003, Miami, Florida, USA. pp. 189-196.
As household appliances grow in complexity and sophistication, they become harder and harder to use, particularly because of their tiny display screens and limited keyboards. This paper describes a strategy for building natural language interfaces to appliances that circumvents these problems. Our approach leverages decades of research on planning and natural language interfaces to databases by reducing the appliance problem to the database problem; the reduction provably maintains desirable properties of the database interface. The paper goes on to describe the implementation and evaluation of the EXACT interface to appliances, which is based on this reduction. EXACT maps each English user request to an SQL query, which is transformed to create a PDDL goal, and uses the Blackbox planner to map the planning problem to a sequence of appliance commands that satisfy the original request. Both theoretical arguments and experimental evaluation show that EXACT is highly reliable.
© All rights reserved Yates et al. and/or ACM Press
Popescu, Ana-Maria, Etzioni, Oren and Kautz, Henry (2003): Towards a theory of natural language interfaces to databases. In: Johnson, Lewis and Andre, Elisabeth (eds.) International Conference on Intelligent User Interfaces 2003 January 12-15, 2003, Miami, Florida, USA. p. 327.
The need for Natural Language Interfaces (NLIs) to databases has become increasingly acute as more nontechnical people access information through their web browsers, PDAs and cell phones. Yet NLIs are only usable if they map natural language questions to SQL queries correctly. We introduce the Precise NLI, which reduces the semantic interpretation challenge in NLIs to a graph matching problem. Precise uses the max-flow algorithm to efficiently solve this problem. Each max-flow solution corresponds to a possible semantic interpretation of the sentence. Precise collects max-flow solutions, discards the solutions that do not obey syntactic constraints and retains the rest as the basis for generating SQL queries corresponding to the question q. The syntactic information is extracted from the parse tree corresponding to the given question which is computed by a statistical parser. For a broad, well-defined class of semantically tractable natural language questions, Precise is guaranteed to map each question to the corresponding SQL query. Semantically tractable questions correspond to a natural, domain-independent subset of English that can be efficiently and accurately interpreted as nonrecursive Datalog clauses. Precise is transportable to arbitrary databases, such as the Restaurants, Jobs and Geography databases used in our implementation. Examples of semantically tractable questions include: "What Chinese restaurants with a 3.5 rating are in Seattle?", "What are the areas of US states with large populations?", "What jobs require 4 years of experience and desire a B.S.CS degree?". Given a question which is not semantically tractable, Precise recognizes it as such and informs the user that it cannot answer it. Given a semantically tractable question, Precise computes the set of non-equivalent SQL interpretations corresponding to the question.
If a unique such SQL interpretation exists, Precise outputs it together with the corresponding result set obtained by querying the current database. If the set contains more than one SQL interpretation, the natural language question is ambiguous in the context of the current database. In this case, Precise asks for the user's help in determining which interpretation is the correct one. Our experiments have shown that Precise has high coverage and accuracy over common English questions. In future work, we plan to explore increasingly broad classes of questions and include Precise as a module in a full-fledged dialog system. An important direction for future work is helping users understand the types of questions Precise cannot handle via dialog, enabling them to build an accurate mental model of the system and its capabilities. Also, our own group's work on the EXACT natural language interface builds on Precise and on the underlying theoretical framework. EXACT composes an extended version of Precise with a sound and complete planner to develop a powerful and provably reliable interface to household appliances.
© All rights reserved Popescu et al. and/or ACM Press
Kwok, Cody, Etzioni, Oren and Weld, Daniel S. (2001): Scaling question answering to the web. In ACM Transactions on Information Systems, 19 (3) pp. 242-262.
The wealth of information on the web makes it an attractive resource for seeking quick answers to simple, factual questions such as "who was the first American in space?" or "what is the second tallest mountain in the world?" Yet today's most advanced web search services (e.g., Google and AskJeeves) make it surprisingly tedious to locate answers to such questions. In this paper, we extend question-answering techniques, first studied in the information retrieval literature, to the web and experimentally evaluate their performance. First we introduce Mulder, which we believe to be the first general-purpose, fully-automated question-answering system available on the web. Second, we describe Mulder's architecture, which relies on multiple search-engine queries, natural-language parsing, and a novel voting procedure to yield reliable answers coupled with high recall. Finally, we compare Mulder's performance to that of Google and AskJeeves on questions drawn from the TREC-8 question answering track. We find that Mulder's recall is more than a factor of three higher than that of AskJeeves. In addition, we find that Google requires 6.6 times as much user effort to achieve the same level of recall as Mulder.
© All rights reserved Kwok et al. and/or ACM Press
Levy, David, Arms, William, Etzioni, Oren, Nester, Diane and Tillett, Barbara (2001): High Tech or High Touch: Automation and Human Mediation in Libraries. In: JCDL01: Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries 2001. p. 345.
There are those who now think that traditional library services, such as cataloging and reference, will no longer be needed in the future, or at least will be fully automated. Others are equally adamant that human intervention is not only important but essential. Underlying such positions are a host of assumptions - about the continued existence and place of paper, the role of human intelligence and interpretation, the nature of research, and the significance of the human element. This panel brings together experts in libraries and digital technology to uncover such issues and assumptions and to discuss and debate the place of people and machines in cataloging and reference work.
© All rights reserved Levy et al. and/or ACM Press
Kwok, Cody C. T., Etzioni, Oren and Weld, Daniel S. (2001): Scaling question answering to the Web. In: Proceedings of the 2001 International Conference on the World Wide Web 2001. pp. 150-161.
Perkowitz, Mike and Etzioni, Oren (2000): Adaptive Web sites. In Communications of the ACM, 43 (8) pp. 152-158.
Etzioni, Amitai and Etzioni, Oren (1999): Face-to-Face and Computer-Mediated Communities, A Comparative Analysis. In The Information Society, 15 (4).
Lau, Tessa A., Etzioni, Oren and Weld, Daniel S. (1999): Privacy Interfaces for Information Management. In Communications of the ACM, 42 (10) pp. 88-94.
Zamir, Oren and Etzioni, Oren (1998): Web Document Clustering: A Feasibility Demonstration. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1998. pp. 46-54.
Users of Web search engines are often forced to sift through the long ordered list of documents returned by the engines. The IR community has explored document clustering as an alternative method of organizing retrieval results, but clustering has yet to be deployed on the major search engines. The paper articulates the unique requirements of Web document clustering and reports on the first evaluation of clustering methods in this domain. A key requirement is that the methods create their clusters based on the short snippets returned by Web search engines. Surprisingly, we find that clusters based on snippets are almost as good as clusters created using the full text of Web documents. To satisfy the stringent requirements of the Web domain, we introduce an incremental, linear time (in the document collection size) algorithm called Suffix Tree Clustering (STC), which creates clusters based on phrases shared between documents. We show that STC is faster than standard clustering methods in this domain, and argue that Web document clustering via STC is both feasible and potentially beneficial.
© All rights reserved Zamir and Etzioni and/or ACM Press
Etzioni, Oren (1996): The World-Wide Web: Quagmire or Gold Mine?. In Communications of the ACM, 39 (11) pp. 65-68.
Etzioni, Oren and Weld, Daniel S. (1994): A Softbot-Based Interface to the Internet. In Communications of the ACM, 37 (7) pp. 72-76.
Changes to this page (author)
23 Nov 2012: Added
27 Feb 2010: Modified
18 Aug 2009: Added
18 Aug 2009: Added
17 Aug 2009: Added
17 Aug 2009: Added
17 Aug 2009: Added
09 Jul 2009: Added
09 Jul 2009: Added
09 Jul 2009: Added
09 Jul 2009: Added
03 Jun 2009: Added
03 Jun 2009: Added
01 Jun 2009: Added
25 Jun 2007: Added
24 Jun 2007: Added
28 Apr 2003: Added
Page maintainer: The Editorial Team