Yan Zhang


Publications by Yan Zhang (bibliography)

Yan, Rui, Huang, Congrui, Tang, Jie, Zhang, Yan and Li, Xiaoming (2012): To better stand on the shoulder of giants. In: JCDL12 Proceedings of the 2012 Joint International Conference on Digital Libraries 2012. pp. 51-60.

Scientists usually develop research ideas inspired by previous publications, but they are unlikely to follow every publication in an unbounded literature collection. The volume of literature keeps expanding extremely fast, while not all papers have equal impact on the academic community. Being aware of potentially influential literature puts one in a better position to choose important research references. Hence, estimating potential influence is of great significance. We study the challenging problem of identifying potentially influential literature. We examine a set of hypotheses about the fundamental characteristics of highly cited papers and find some interesting patterns. Based on these observations, we learn to identify potentially influential literature via Future Influence Prediction (FIP), which aims to estimate the future influence of literature. The system takes a series of features of a particular publication as input and produces as output the estimated citation count of that article after a given time period. We consider several regression models to formulate the learning process and evaluate their performance based on the coefficient of determination (R^2). Experimental results on a large real data set show an average predictive performance of 83.6% measured in R^2. We apply the learned model to bibliography recommendation and obtain a marked performance improvement in terms of Mean Average Precision (MAP).
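The abstract above evaluates regression models by the coefficient of determination. As a minimal sketch of that metric only (not the authors' code; the function name `r_squared` and the citation counts are made up for illustration), R^2 can be computed as one minus the ratio of the residual sum of squares to the total sum of squares:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: squared deviation of the targets from their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical citation counts, not data from the paper.
actual_citations = [10, 20, 30, 40]
predicted_citations = [12, 18, 33, 37]
score = r_squared(actual_citations, predicted_citations)
print(f"R^2 = {score:.3f}")
```

A perfect predictor scores 1.0, and a model that always predicts the mean citation count scores 0.0.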

Yan, Rui, Kong, Liang, Li, Yu, Zhang, Yan and Li, Xiaoming (2011): A fine-grained digestion of news webpages through Event Snippet Extraction. In: Proceedings of the 2011 International Conference on the World Wide Web 2011. pp. 157-158.

We describe a framework to digest news webpages at a finer granularity: to extract event snippets from contexts. "Events" are atomic text snippets, and a news article consists of more than one event snippet. Event Snippet Extraction (ESE) aims to mine these snippets out. The problem is important because its solutions may be applied to many information mining and retrieval tasks. The challenge is to exploit rich features, including various semantic, syntactic and visual features, to detect snippet boundaries. We run experiments to demonstrate the effectiveness of our approaches.

Yan, Rui, Wan, Xiaojun, Otterbacher, Jahna, Kong, Liang, Li, Xiaoming and Zhang, Yan (2011): Evolutionary timeline summarization: a balanced optimization framework via iterative substitution. In: Proceedings of the 34th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2011. pp. 745-754.

Classic news summarization plays an important role given the exponential growth of documents on the Web. Many approaches have been proposed to generate summaries, but they seldom consider the evolutionary characteristics of news in addition to traditional summary elements. Therefore, we present a novel framework for the web mining problem named Evolutionary Timeline Summarization (ETS). Given a massive collection of time-stamped web documents related to a general news query, ETS aims to return the evolution trajectory along the timeline, consisting of individual but correlated summaries of each date, emphasizing relevance, coverage, coherence and cross-date diversity. ETS greatly facilitates fast news browsing and knowledge comprehension and hence is a necessity. We formulate the task as an optimization problem via iterative substitution from a set of sentences to a subset of sentences that satisfies the above requirements, balancing coherence/diversity measurement and local/global summary quality. The optimized substitution is conducted iteratively, incorporating several constraints, until convergence. We develop experimental systems and evaluate them on 6 intrinsically different datasets which amount to 10251 documents. Performance comparisons between different system-generated timelines and timelines manually created by human editors demonstrate the effectiveness of our proposed framework in terms of ROUGE metrics.

Zhang, Yan (2011): Exploring a web space for consumer health information: implications for design. In: Proceedings of the 2011 iConference 2011. pp. 811-812.

Knowledge about people's natural or preferred ways of exploring a system could reveal cognitive paths through which users learn a system. This knowledge can also inform the design of interfaces that facilitate users' learning of the system and the design of training materials that better accommodate users' preferences. In this study, we investigate first-time users' behavior in exploring a web space for consumer health information, MedlinePlus, and discuss implications for designing consumer health information retrieval systems.

Zhang, Yan (2011): Effects of tasks on users' perceptions of the content of a web-based IR system. In: Proceedings of the 2011 iConference 2011. pp. 813-815.

Finding relevant information is a major goal that motivates people to seek information using an IR system. Therefore, it is important to understand how people perceive the content of a system while interacting with it to solve specific problems. This article presents a preliminary study on users' perceptions of the content of a web-based IR system and the effects of tasks on their perceptions.

Zhang, Yan (2008): Undergraduate students' mental models of the Web as an information retrieval system. In JASIST - Journal of the American Society for Information Science and Technology, 59 (13) pp. 2087-2098.

Jiang, Qiancheng, Zhang, Lei, Zhu, Yizhen and Zhang, Yan (2008): Larger is better: seed selection in link-based anti-spamming algorithms. In: Proceedings of the 2008 International Conference on the World Wide Web 2008. pp. 1065-1066.

Seed selection is of significant importance for biased PageRank algorithms such as TrustRank, which are used to combat link spamming. Previous work usually uses a small seed set, which causes the top-ranked results to be strongly biased towards the seeds. In this paper, we analyze the relationship between the result bias and the number of seeds. Furthermore, we experimentally show that an automatically selected large seed set can work better than a carefully selected small seed set.
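To illustrate where the seed set enters a biased PageRank computation, the following is a minimal sketch in the spirit of TrustRank, assuming the teleport distribution is uniform over the seeds. It is not the authors' implementation; the function name `biased_pagerank`, the damping factor, and the toy graph are assumptions made for illustration.

```python
def biased_pagerank(out_links, seeds, damping=0.85, iterations=50):
    """Power iteration where random jumps land only on seed pages.

    out_links maps every page to the list of pages it links to; every
    link target must itself be a key of out_links.
    """
    if not seeds or not set(seeds) <= set(out_links):
        raise ValueError("seeds must be a non-empty subset of the pages")
    nodes = sorted(out_links)
    # Teleport distribution: uniform over the seeds, zero elsewhere.
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            targets = out_links[n]
            if targets:
                # Split this page's score evenly among its out-links.
                share = damping * score[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling page: hand its score back to the seeds.
                for m in nodes:
                    nxt[m] += damping * score[n] * teleport[m]
        score = nxt
    return score


# Toy graph (hypothetical): "spam" links into the cycle, nothing links to it.
graph = {"a": ["b"], "b": ["c"], "c": ["a"], "spam": ["a"]}
scores = biased_pagerank(graph, seeds={"a"})
print(scores)
```

With a single seed, the seed page itself keeps the highest score, which is the kind of bias towards seeds that the abstract attributes to small seed sets. The page that no seed can reach receives no score at all.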

Zhang, Yan, Sun, Zhengxing and Li, Wenhui (2008): Texture synthesis based on Direction Empirical Mode Decomposition. In Computers & Graphics, 32 (2) pp. 175-186.

Capra, Robert, Marchionini, Gary, Oh, Jung Sun, Stutzman, Fred and Zhang, Yan (2007): Effects of structure and interaction style on distinct search tasks. In: JCDL07: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries 2007. pp. 442-451.

In this paper we present the results of a study that investigates the relationships between search tasks, information architecture, and interaction style. Three kinds of search tasks (simple lookup, complex lookup and exploratory) were performed using three different user interfaces (standard web site, hierarchical text-based faceted interface, and dynamic query faceted interface) for a large-scale public corpus containing semi-structured statistical data and reports. Twenty-eight people conducted the three kinds of searches in a between-subjects study and twelve others conducted the three kinds of searches on all three systems in a within-subjects study. Quantitative results demonstrate that the alternative general-purpose user interfaces that accept automated structuring of data offer comparable effectiveness, efficiency, and aesthetics to manually constructed architectures. Qualitative results demonstrate the manual architectures are favored.

Zhang, Yan, Jia, Yan, Huang, Xiaobin, Zhou, Bin and Gu, Jian (2007): A Scalable Method for Efficient Grid Resource Discovery. In: Luo, Yuhua (ed.) Cooperative Design, Visualization, and Engineering, 4th International Conference - CDVE 2007 Shanghai, China, 2007, September 16-20. pp. 97-103.

Zhang, Yan, Qu, Wei and Liu, Anna (2006): Adaptive Self-Configuration Architecture for J2EE-Based Middleware Systems. In: HICSS 2006 - 39th Hawaii International Conference on Systems Science 4-7 January, 2006, Kauai, HI, USA.

Zhao, Xinyou and Zhang, Yan (2006): An Instructor-Oriented Prototype System for Virtual Classroom. In: ICALT 2006 - Proceedings of the 6th IEEE International Conference on Advanced Learning Technologies 5-7 July, 2006, Kerkrade, The Netherlands. pp. 200-204.

Zhang, Yan, Goonetilleke, Ravindra S., Plocher, Thomas and Liang, Sheau-Farn Max (2005): Time-related behaviour in multitasking situations. In International Journal of Human-Computer Studies, 62 (4) pp. 425-455.

Researchers have classified differing time-related behaviours as Monochronicity (M) and Polychronicity (P). The objective of this study was to evaluate control strategy and control performance differences between M and P persons in a process control domain. Forty-two people participated in an experimental study. Time-related behaviour was evaluated using the Modified Polychronic Attitude Index 3 (M/P score) scale. Each participant was asked to monitor and control two processes at the same time using the Control Station software. A 2 (control system order) × 5 (trials) factorial experiment was used. Performance was quantified using overall mean error and overall Root-Mean-Square (RMS) error. Control strategy was quantified using the number of switches between the two processes and the number of magnitude changes within each of the processes. Correlation and regression analyses showed that the M/P score was significantly correlated with the strategy variables and performance variables. When the participants were split into three groups, M (M/P score greater than or equal to 1 and less than or equal to 3), neutral (M/P score between 3 and 5) and P (M/P score greater than or equal to 5 and less than or equal to 7), there were significant differences in the performance and strategy measures among the three groups. The strategy variables indicated that monochrons attempted to control the two processes serially, while polychrons controlled both processes somewhat simultaneously. The neutral group was in between the M and P groups. The results also showed that the overall mean error and overall RMS error of polychrons were significantly smaller than those of the monochrons. Furthermore, there was no significant difference in the NASA-Task Load Index score between monochrons and polychrons, even though there were significant correlations between the M/P score and some of the scale dimensions' weightings. The results of this study can have important implications for the training and selection of personnel in multitask situations, such as industrial process control.
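The two performance measures named in the abstract can be sketched as follows. The abstract does not define them precisely, so this sketch assumes that error means the signed deviation of a process reading from its target value. The function names, the target of 50 and the readings are hypothetical.

```python
import math


def mean_error(readings, target):
    """Average signed deviation from the target (assumed definition)."""
    return sum(r - target for r in readings) / len(readings)


def rms_error(readings, target):
    """Root-mean-square deviation from the target."""
    return math.sqrt(sum((r - target) ** 2 for r in readings) / len(readings))


# Hypothetical process readings around a target value of 50.
readings = [53, 46, 50, 51]
me = mean_error(readings, target=50)
rms = rms_error(readings, target=50)
print(f"mean error = {me:.3f}, RMS error = {rms:.3f}")
```

Under this definition, deviations above and below the target cancel in the mean error but not in the RMS error, which is one reason to report both.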

Zhang, Yan and Sun, Zhanli (2005): A general introduction to the research and legislation of Chinese electronic commerce law. In: Li, Qi and Liang, Ting-Peng (eds.) Proceedings of the 7th International Conference on Electronic Commerce - ICEC 2005 August 15-17, 2005, Xian, China. pp. 864-870.

Chen, Jianwen and Zhang, Yan (2004): An extended logic programming based multi-agent system formalization in mobile environments. In: Grossman, David A., Gravano, Luis, Zhai, Chengxiang, Herzog, Otthein and Evans, David A. (eds.) Proceedings of the 2004 ACM CIKM International Conference on Information and Knowledge Management November 8-13, 2004, Washington, DC, USA. pp. 166-167.

