Publication statistics

Pub. period: 1995-2006
Pub. count: 5
Number of co-authors: 7



Co-authors

Number of publications with 3 favourite co-authors:

Marc Najork: 3
Dennis Fetterly: 3
Hania Gajewska: 2

 

 

Productive colleagues

Mark Manasse's 3 most productive colleagues in number of publications:

Marc Najork: 21
Alexandros Ntoulas: 9
Dennis Fetterly: 8
 
 
 


Mark Manasse

 

Publications by Mark Manasse (bibliography)

2006
 

Ntoulas, Alexandros, Najork, Marc, Manasse, Mark and Fetterly, Dennis (2006): Detecting spam web pages through content analysis. In: Proceedings of the 2006 International Conference on the World Wide Web 2006. pp. 83-92.

In this paper, we continue our investigations of "web spam": the injection of artificially-created pages into the web in order to influence the results from search engines, to drive traffic to certain pages for fun or profit. This paper considers some previously-undescribed techniques for automatically detecting spam pages, examines the effectiveness of these techniques in isolation and when aggregated using classification algorithms. When combined,

© All rights reserved Ntoulas et al. and/or ACM Press

2005
 

Fetterly, Dennis, Manasse, Mark and Najork, Marc (2005): Detecting phrase-level duplication on the world wide web. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2005. pp. 170-177.

Two years ago, we conducted a study on the evolution of web pages over time. In the course of that study, we discovered a large number of machine-generated "spam" web pages emanating from a handful of web servers in Germany. These spam web pages were dynamically assembled by stitching together grammatically well-formed German sentences drawn from a large collection of sentences. This discovery motivated us to develop techniques for finding other instances of such "slice and dice" generation of web pages, where pages are automatically generated by stitching together phrases drawn from a limited corpus. We applied these techniques to two data sets, a set of 151 million web pages collected in December 2002 and a set of 96 million web pages collected in June 2004. We found a number of other instances of large-scale phrase-level replication within the two data sets. This paper describes the algorithms we used to discover this type of replication, and highlights the results of our data mining.

© All rights reserved Fetterly et al. and/or ACM Press

2003
 

Fetterly, Dennis, Manasse, Mark, Najork, Marc and Wiener, Janet (2003): A large-scale study of the evolution of web pages. In: Proceedings of the 2003 International Conference on the World Wide Web 2003. pp. 669-678.

How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated to any other property of the page? All of these questions are of interest to those who mine the web, including all the popular search engines, but few studies have been performed to date to answer them. One notable exception is a study by Cho and Garcia-Molina, who crawled a set of 720,000 pages on a daily basis over four months, and counted pages as having changed if their MD5 checksum changed. They found that 40% of all web pages in their set changed within a week, and 23% of those pages that fell into the .com domain changed daily. This paper expands on Cho and Garcia-Molina's study, both in terms of coverage and in terms of sensitivity to change. We crawled a set of 150,836,209 HTML pages once every week, over a span of 11 weeks. For each page, we recorded a checksum of the page, and a feature vector of the words on the page, plus various other data such as the page length, the HTTP status code, etc. Moreover, we pseudo-randomly selected 0.1% of all of our URLs, and saved the full text of each download of the corresponding pages. After completion of the crawl, we analyzed the degree of change of each page, and investigated which factors are correlated with change intensity. We found that the average degree of change varies widely across top-level domains, and that larger pages change more often and more severely than smaller ones. This paper describes the crawl and the data transformations we performed on the logs, and presents some statistical observations on the degree of change of different classes of pages.

© All rights reserved Fetterly et al. and/or ACM Press
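
As a rough illustration of two ideas mentioned in the abstract above (counting a page as changed when its checksum changes, and pseudo-randomly selecting about 0.1% of URLs for full-text retention), here is a minimal Python sketch. It is not the authors' implementation: the function names, the use of MD5 for the sampling decision, and the per-thousand bucket scheme are all assumptions made for this example.

```python
import hashlib


def md5_checksum(page_bytes: bytes) -> str:
    """Return the hex MD5 digest of a page's raw bytes."""
    return hashlib.md5(page_bytes).hexdigest()


def has_changed(old_page: bytes, new_page: bytes) -> bool:
    """Checksum-based change test: any byte-level difference counts as a change."""
    return md5_checksum(old_page) != md5_checksum(new_page)


def in_sample(url: str, rate_per_thousand: int = 1) -> bool:
    """Hypothetical deterministic sampler: keep roughly rate_per_thousand/1000 of URLs.

    Hashing the URL (an assumption, not taken from the paper) means the same
    URL gets the same answer on every crawl, so every download of a sampled
    page can be saved.
    """
    digest = hashlib.md5(url.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 1000
    return bucket < rate_per_thousand
```

A checksum test of this kind only says whether a page changed at all, not by how much, which is presumably why the study also recorded a feature vector of the words on each page.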

1995
 

Berc, Lance, Gajewska, Hania and Manasse, Mark (1995): Pssst: Side Conversations in the Argo Telecollaboration System. In: Robertson, George G. (ed.) Proceedings of the 8th Annual ACM Symposium on User Interface and Software Technology, November 15-17, 1995, Pittsburgh, Pennsylvania, United States. pp. 155-156.

We describe side conversations, a new facility we have added to the Argo telecollaboration system. Side conversations allow subgroups of teleconference participants to whisper to each other. The other participants can see who is whispering to whom, but cannot hear what is being said.

© All rights reserved Berc et al. and/or ACM Press

 

Gajewska, Hania, Manasse, Mark and Redell, Dave (1995): Argohalls: Adding Support for Group Awareness to the Argo Telecollaboration System. In: Robertson, George G. (ed.) Proceedings of the 8th Annual ACM Symposium on User Interface and Software Technology, November 15-17, 1995, Pittsburgh, Pennsylvania, United States. pp. 157-158.

Members of geographically distributed work groups often complain of a feeling of isolation and of not knowing "who is around". Argohalls attempt to solve this problem by integrating video icons, clustered into groups representing physical hallways, into the Argo telecollaboration system. Argo users can "hang out" in hallways in order to keep track of the co-workers on their projects, and they can roam other hallways to "run into" whoever happens to be there.

© All rights reserved Gajewska et al. and/or ACM Press

 
Page Information

Page maintainer: The Editorial Team
URL: http://www.interaction-design.org/references/authors/mark_manasse.html