Weiguo Fan
Publications by Weiguo Fan (bibliography)
» 2009 «
Liu, Ning, Yan, Jun, Fan, Weiguo, Yang, Qiang and Chen, Zheng (2009): Identifying vertical search intention of query through social tagging propagation. In: Proceedings of the 2009 International Conference on the World Wide Web 2009. pp. 1209-1210. Available online
A pressing task during the unification process is to identify a user's vertical search intention based on the user's query. In this paper, we propose a novel method to propagate social annotation, which includes user-supplied tag data, to both queries and VSEs for semantically bridging them. Our proposed algorithm consists of three key steps: query annotation, vertical annotation and query intention identification. Our algorithm, referred to as TagQV, verifies that the social tagging can be propagated to represent Web objects such as queries and VSEs besides Web pages. Experiments on real Web search queries demonstrate the effectiveness of TagQV in query intention identification.
Copyrights may apply
» 2008 «
Roussinov, Dmitri, Fan, Weiguo and Robles-Flores, Jose Antonio (2008): Beyond keywords: Automated question answering on the web. In Communications of the ACM, 51 (9) pp. 60-65
» 2007 «
Yu, Xiaoyan, Tungare, Manas, Fan, Weiguo, Perez-Quinones, Manuel, Fox, Edward A., Cameron, William, Teng, GuoFang and Cassel, Lillian (2007): Automatic syllabus classification. In: JCDL07: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries 2007. pp. 440-441. Available online
Syllabi are important educational resources. However, searching for a syllabus on the Web using a generic search engine is an error-prone process and often yields too many non-relevant links. In this paper, we present a syllabus classifier to filter noise out from search results. We discuss various steps in the classification process, including class definition, training data preparation, feature selection, and classifier building using SVM and Naive Bayes. Empirical results indicate that the best version of our method achieves a high classification accuracy, i.e., an F value of 83% on average.
Li, Xin, Yan, Jun, Deng, Zhihong, Ji, Lei, Fan, Weiguo, Zhang, Benyu and Chen, Zheng (2007): A novel clustering-based RSS aggregator. In: Proceedings of the 2007 International Conference on the World Wide Web 2007. pp. 1309-1310. Available online
In recent years, different commercial Weblog subscribing systems have been proposed to return stories from users' subscribed feeds. In this paper, we propose a novel clustering-based RSS aggregator called the RSS Clusgator System (RCS) for Weblog reading. Note that an RSS feed may have several different topics. A user may only be interested in a subset of these topics. In addition, there could be many different stories from multiple RSS feeds that discuss a similar topic from different perspectives. A user may be interested in this topic but not know how to collect all feeds related to it. In contrast to many previous works, we cluster all stories in RSS feeds into a hierarchical structure to better serve the readers. In this way, users can easily find all the stories that interest them. To keep the system current, we propose a flexible time window for incremental clustering. RCS utilizes both link information and content information for efficient clustering. Experiments show the effectiveness of RCS.
Lin, Hui, Fan, Weiguo, Wallace, Linda and Zhang, Zhongju (2007): An Empirical Study of Web-Based Knowledge Community Success. In: HICSS 2007 - 40th Hawaii International Conference on Systems Science 3-6 January, 2007, Waikoloa, Big Island, HI, USA. p. 178. Available online
» 2006 «
Shen, Rao, Vemuri, Naga Srinivas, Fan, Weiguo, Torres, Ricardo da Silva and Fox, Edward A. (2006): Exploring digital libraries: integrating browsing, searching, and visualization. In: JCDL06: Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006. pp. 1-10. Available online
Exploring services for digital libraries (DLs) include two major paradigms, browsing and searching, as well as other services such as clustering and visualization. In this paper, we formalize and generalize DL exploring services within a DL theory. We develop theorems to indicate that browsing and searching can be converted or mapped to each other under certain conditions. The theorems guide the design and implementation of exploring services for an integrated archaeological DL, ETANA-DL. Its integrated browsing and searching can support users in moving seamlessly between these operations, minimizing context switching, and keeping users focused. It also integrates browsing and searching into a single visual interface for DL exploration. A user study to evaluate ETANA-DL's exploring services helped validate our hypotheses.
Vemuri, Naga Srinivas, Shen, Rao, Tupe, Sameer, Fan, Weiguo and Fox, Edward A. (2006): ETANA-ADD: an interactive tool for integrating archaeological DL collections. In: JCDL06: Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006. pp. 161-162. Available online
ETANA-DL is an archaeology digital library built based on the principles of Open Digital Libraries. A key challenge addressed in ETANA-DL is integration of new archaeological sites. To enable archaeologists to build OAI data providers for easy integration, we developed an interactive software tool for database-to-XML generation, schema mapping, and global archive generation. This tool greatly enhances our ability to build new Open Archives. We tested the tool with data from the Umm el-Jimal site.
Gorton, Douglas, Shen, Rao, Vemuri, Naga Srinivas, Fan, Weiguo and Fox, Edward A. (2006): ETANA-GIS: GIS for archaeological digital libraries. In: JCDL06: Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006. p. 379. Available online
With the growing importance of mapping land, regions, and their related features, Geographic Information Systems (GIS) has become an ever more important standard in fields where such detailed study of land features is required. Our archaeology digital library, ETANA-DL (http://etana.dlib.vt.edu), contains thousands of records from eight member excavations. Here, we draw on the Space aspect of the 5S meta-model [1] for digital libraries and demonstrate a methodology used to integrate archaeological GIS data with the wealth of information within ETANA-DL. ETANA-GIS connects the digital library's textual records with a spatial representation of their original locations, enhancing users' understanding of the find. Using a dataset of the University of Toronto's Tell Madaba excavation project [2], we developed an interactive, Web-based representation of the original ArcGIS document (accessible from ETANA-DL homepage). For dynamic generation of maps from geospatial data, we use the MapServer [3] project, a mature project which boasts a rich toolset of features for cartographic related image generation. MapServer can directly utilize ArcGIS layer resources but some translation and additional authoring must occur for proper image generation. Then, using PHP, the MapScript MapServer API, and navigation tools, the map was ported to an interactive, Web-accessible format. Based on a study of alternatives, the technology we chose for our technique seemed to be the best suited for digital library integration and is also completely open source. To explore the presentation of the map, a user employs the navigation tools displayed in the corner of the main view (see Figure 1). In addition, full control of displayed layers, a smaller map showing overall view and context, as well as a dynamic scale bar are available for use.
To integrate the Web-based version of the Tell Madaba GIS map with the existing digital library, the layers depicting archaeological divisions are clickable and labeled for easy identification. Any area queried results in a pop-up box with ETANA-DL's records and artifacts for that area. While this integration connects the digital library with the spatial representation of the region, the unique quality of various GIS maps causes certain difficulties. The lack of standard in denoting spatial divisions in GIS is one hindrance to producing a more automated technique. Future work will include more automation, usability evaluation, and integration of additional excavations. We hope integration of the digital library and GIS greatly aids users' understanding of the spatial organization of the included data.
Lacerda, Anisio, Cristo, Marco, Goncalves, Marcos Andre, Fan, Weiguo, Ziviani, Nivio and Ribeiro-Neto, Berthier A. (2006): Learning to advertise. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2006. pp. 549-556. Available online
Content-targeted advertising, the task of automatically associating ads to a Web page, constitutes a key Web monetization strategy nowadays. Further, it introduces new challenging technical problems and raises interesting questions. For instance, how to design ranking functions able to satisfy conflicting goals such as selecting advertisements (ads) that are relevant to the users and suitable and profitable to the publishers and advertisers? In this paper we propose a new framework for associating ads with web pages based on Genetic Programming (GP). Our GP method aims at learning functions that select the most appropriate ads, given the contents of a Web page. These ranking functions are designed to optimize overall precision and minimize the number of misplacements. By using a real ad collection and web pages from a newspaper, we obtained a gain over a state-of-the-art baseline method of 61.7% in average precision. Further, by evolving individuals to provide good ranking estimations, GP was able to discover ranking functions that are very effective in placing ads in web pages while avoiding irrelevant ones.
Roussinov, Dmitri and Fan, Weiguo (2006): Learning Ranking vs. Modeling Relevance. In: HICSS 2006 - 39th Hawaii International Conference on Systems Science 4-7 January, 2006, Kauai, HI, USA. Available online
Schaupp, L. Christian, Fan, Weiguo and Belanger, France (2006): Determining Success for Different Website Goals. In: HICSS 2006 - 39th Hawaii International Conference on Systems Science 4-7 January, 2006, Kauai, HI, USA. Available online
Fox, Edward A., Neves, Fernando A. Das, Yu, Xiaoyan, Shen, Rao, Kim, Seonho and Fan, Weiguo (2006): Exploring the computing literature with visualization and stepping stones & pathways. In Communications of the ACM, 49 (4) pp. 52-58
Belanger, France, Fan, Weiguo, Schaupp, L. Christian, Krishen, Anjala, Everhart, Jeannine, Poteet, David and Nakamoto, Kent (2006): Web site success metrics: addressing the duality of goals. In Communications of the ACM, 49 (12) pp. 114-116
Fan, Weiguo, Wallace, Linda, Rich, Stephanie and Zhang, Zhongju (2006): Tapping the power of text mining. In Communications of the ACM, 49 (9) pp. 76-82
» 2005 «
Raghavan, Ananth, Rangarajan, Divya, Shen, Rao, Goncalves, Marcos Andre, Vemuri, Naga Srinivas, Fan, Weiguo and Fox, Edward A. (2005): Schema mapper: a visualization tool for DL integration. In: JCDL05: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries 2005. p. 414. Available online
Schema mapping is a challenging problem. It has come to the fore in recent years; there are important applications like database schema integration and, more recently, digital library merging of heterogeneous data. Previous studies have approached the schema mapping process either from algorithmic or visualization perspectives, with few integrating both. With Schema Mapper we demonstrate a semi-automatic tool for schema integration that combines a novel visual interface with an algorithm-based recommendation engine. Schemas are visualized as hyperbolic trees (see Fig. 1), thus allowing more schema nodes to be displayed at one time. Matches to selections are recommended to the user, which makes the mapping operation easier and faster.
Yan, Jun, Liu, Ning, Zhang, Benyu, Yan, Shuicheng, Chen, Zheng, Cheng, Qiansheng, Fan, Weiguo and Ma, Wei-Ying (2005): OCFS: optimal orthogonal centroid feature selection for text categorization. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2005. pp. 122-129. Available online
Text categorization is an important research area in many Information Retrieval (IR) applications. To save the storage space and computation time in text categorization, efficient and effective algorithms for reducing the data before analysis are highly desired. Traditional techniques for this purpose can generally be classified into feature extraction and feature selection. Because of efficiency, the latter is more suitable for text data such as web documents. However, many popular feature selection techniques such as Information Gain (IG) and χ²-test (CHI) are all greedy in nature and thus may not be optimal according to some criterion. Moreover, the performance of these greedy methods may deteriorate when the reserved data dimension is extremely low. In this paper, we propose an efficient optimal feature selection algorithm by optimizing the objective function of Orthogonal Centroid (OC) subspace learning algorithm in a discrete solution space, called Orthogonal Centroid Feature Selection (OCFS). Experiments on 20 Newsgroups (20NG), Reuters Corpus Volume 1 (RCV1) and Open Directory Project (ODP) data show that OCFS is consistently better than IG and CHI with smaller computation time especially when the reduced dimension is extremely small.
Zhang, Benyu, Li, Hua, Liu, Yi, Ji, Lei, Xi, Wensi, Fan, Weiguo, Chen, Zheng and Ma, Wei-Ying (2005): Improving web search results using affinity graph. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2005. pp. 504-511. Available online
In this paper, we propose a novel ranking scheme named Affinity Ranking (AR) to re-rank search results by optimizing two metrics: (1) diversity -- which indicates the variance of topics in a group of documents; (2) information richness -- which measures the coverage of a single document to its topic. Both of the two metrics are calculated from a directed link graph named Affinity Graph (AG). AG models the structure of a group of documents based on the asymmetric content similarities between each pair of documents. Experimental results in Yahoo! Directory, ODP Data, and Newsgroup data demonstrate that our proposed ranking algorithm significantly improves the search performance. Specifically, the algorithm achieves 31% improvement in diversity and 12% improvement in information richness relatively within the top 10 search results.
Zhang, Baoping, Chen, Yuxin, Fan, Weiguo, Fox, Edward A., Goncalves, Marcos Andre, Cristo, Marco and Calado, Pavel (2005): Intelligent fusion of structural and citation-based evidence for text classification. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2005. pp. 667-668. Available online
This paper shows how different measures of similarity derived from the citation information and the structural content (e.g., title, abstract) of the collection can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP) techniques. Our experiments with the ACM Computing Classification Scheme, using documents from the ACM Digital Library, indicate that GP can discover similarity functions superior to those based solely on a single type of evidence. Effectiveness of the similarity functions discovered through simple majority voting is better than that of content-based as well as combination-based Support Vector Machine classifiers. Experiments also were conducted to compare the performance between GP techniques and other fusion techniques such as Genetic Algorithms (GA) and linear fusion. Empirical results show that GP was able to discover better similarity functions than other fusion techniques.
Roussinov, Dmitri, Fan, Weiguo and Neves, Fernando A. Das (2005): Semantic verification for fact seeking engines. In: Herzog, Otthein, Schek, Hans-Jörg and Fuhr, Norbert (eds.) Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management October 31 - November 5, 2005, Bremen, Germany. pp. 323-324. Available online
Roussinov, Dmitri, Fan, Weiguo and Neves, Fernando A. Das (2005): Discretization based learning approach to information retrieval. In: Herzog, Otthein, Schek, Hans-Jörg and Fuhr, Norbert (eds.) Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management October 31 - November 5, 2005, Bremen, Germany. pp. 321-322. Available online
Torres, Ricardo da Silva, Falcão, Alexandre X., Zhang, Baoping, Fan, Weiguo, Fox, Edward A., Goncalves, Marcos Andre and Calado, Pavel (2005): A new framework to combine descriptors for content-based image retrieval. In: Herzog, Otthein, Schek, Hans-Jörg and Fuhr, Norbert (eds.) Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management October 31 - November 5, 2005, Bremen, Germany. pp. 335-336. Available online
Zhang, Baoping, Chen, Yuxin, Fan, Weiguo, Fox, Edward A., Goncalves, Marcos Andre, Cristo, Marco and Calado, Pavel (2005): Intelligent GP fusion from multiple sources for text classification. In: Herzog, Otthein, Schek, Hans-Jörg and Fuhr, Norbert (eds.) Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management October 31 - November 5, 2005, Bremen, Germany. pp. 477-484. Available online
Radev, Dragomir R., Fan, Weiguo, Qi, Hong, Wu, Harris and Grewal, Amardeep (2005): Probabilistic question answering on the Web. In JASIST - Journal of the American Society for Information Science and Technology, 56 (6) pp. 571-583
» 2004 «
Ravindranathan, Unni, Shen, Rao, Goncalves, Marcos Andre, Fan, Weiguo, Fox, Edward A. and Flanagan, James W. (2004): ETANA-DL: a digital library for integrated handling of heterogeneous archaeological data. In: JCDL04: Proceedings of the 4th ACM/IEEE-CS Joint Conference on Digital Libraries 2004. pp. 76-77. Available online
Archaeologists have to deal with vast quantities of information, generated both in the field and laboratory. That information is heterogeneous in nature, and different projects have their own systems to store and use it. This adds to the challenges regarding collaborative research between such projects as well as information retrieval for other more general purposes. This paper describes our approach towards creating ETANA-DL, a digital library (DL) to help manage these vast quantities of information and to provide various kinds of services. The 5S framework for modeling a DL gives us an edge in understanding this vast and complex information space, as well as in designing and prototyping a DL to satisfy information needs of archaeologists and other user communities.
Fan, Weiguo, Luo, Ming, Wang, Li, Xi, Wensi and Fox, Edward A. (2004): Tuning before feedback: combining ranking discovery and blind feedback for robust retrieval. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2004. pp. 138-145. Available online
Both ranking functions and user queries are very important factors affecting a search engine's performance. Prior research has looked at how to improve ad-hoc retrieval performance for existing queries while tuning the ranking function, or modify and expand user queries using a fixed ranking scheme using blind feedback. However, almost no research has looked at how to combine ranking function tuning and blind feedback together to improve ad-hoc retrieval performance. In this paper, we look at the performance improvement for ad-hoc retrieval from a more integrated point of view by combining the merits of both techniques. In particular, we argue that the ranking function should be tuned first, using user-provided queries, before applying the blind feedback technique. The intuition is that highly-tuned ranking offers more high quality documents at the top of the hit list, thus offers a stronger baseline for blind feedback. We verify this integrated model in a large scale heterogeneous collection and the experimental results show that combining ranking function tuning and blind feedback can improve search performance by almost 30% over the baseline Okapi system.
Zhang, Baoping, Goncalves, Marcos Andre, Fan, Weiguo, Chen, Yuxin, Fox, Edward A., Calado, Pavel and Cristo, Marco (2004): Combining structural and citation-based evidence for text classification. In: Grossman, David A., Gravano, Luis, Zhai, Chengxiang, Herzog, Otthein and Evans, David A. (eds.) Proceedings of the 2004 ACM CIKM International Conference on Information and Knowledge Management November 8-13, 2004, Washington, DC, USA. pp. 162-163. Available online
Xue, Gui-Rong, Zeng, Hua-Jun, Chen, Zheng, Yu, Yong, Ma, Wei-Ying, Xi, Wensi and Fan, Weiguo (2004): Optimizing web search using web click-through data. In: Grossman, David A., Gravano, Luis, Zhai, Chengxiang, Herzog, Otthein and Evans, David A. (eds.) Proceedings of the 2004 ACM CIKM International Conference on Information and Knowledge Management November 8-13, 2004, Washington, DC, USA. pp. 118-126. Available online
Fan, Weiguo, Fox, Edward A., Pathak, Praveen and Wu, Harris (2004): The effects of fitness functions on genetic programming-based ranking discovery for Web search. In JASIST - Journal of the American Society for Information Science and Technology, 55 (7) pp. 628-636
Fan, Weiguo, Gordon, Michael D., Pathak, Praveen, Xi, Wensi and Fox, Edward A. (2004): Ranking Function Optimization for Effective Web Search by Genetic Programming: An Empirical Study. In: HICSS 2004. Available online
» 2002 «
Radev, Dragomir R., Libner, Kelsey and Fan, Weiguo (2002): Getting answers to natural language questions on the Web. In JASIST - Journal of the American Society for Information Science and Technology, 53 (5) pp. 359-364
Radev, Dragomir, Fan, Weiguo, Qi, Hong, Wu, Harris and Grewal, Amardeep (2002): Probabilistic question answering on the web. In: Proceedings of the 2002 International Conference on the World Wide Web 2002. pp. 408-419. Available online
Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this paper we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR) using proximity and question type features achieves a total reciprocal document rank of .20 on the TREC 8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
Gordon, Michael D., Lindsay, Robert K. and Fan, Weiguo (2002): Literature-based discovery on the World Wide Web. In ACM Trans. Internet Techn., 2 (4) pp. 261-275
» 2001 «
Radev, Dragomir R., Qi, Hong, Zheng, Zhiping, Blair-Goldensohn, Sasha, Zhang, Zhu, Fan, Weiguo and Prager, John M. (2001): Mining the Web for Answers to Natural Language Questions. In: Proceedings of the 2001 ACM CIKM International Conference on Information and Knowledge Management November 5-10, 2001, Atlanta, Georgia, USA. pp. 143-150. Available online
» 2000 «
Pathak, Praveen, Gordon, Michael D. and Fan, Weiguo (2000): Effective Information Retrieval using Genetic Algorithms based Matching Functions Adaptation. In: HICSS 2000. Available online