Publication statistics

Pub. period: 1987-2012
Pub. count: 22
Number of co-authors: 38


Number of publications with 3 favourite co-authors:

Alison Babeu: 5
David Bamman: 4
David A. Smith: 3



Productive colleagues

Gregory Crane's 3 most productive colleagues in number of publications:

Edward A. Fox: 109
Gary Marchionini: 74
Robert J. K. Jacob: 57

Gregory Crane

Has also published under the name of:
"Gregory R. Crane"


Publications by Gregory Crane (bibliography)

Crane, Gregory, Almas, Bridget, Babeu, Alison, Cerrato, Lisa, Harrington, Matthew, Bamman, David and Diakoff, Harry (2012): Student researchers, citizen scholars and the trillion word library. In: JCDL12 Proceedings of the 2012 Joint International Conference on Digital Libraries 2012. pp. 213-222. Available online

The surviving corpora of Greek and Latin are relatively compact but the shift from books and written objects to digitized texts has already challenged students of these languages to move away from books as organizing metaphors and to ask, instead, what do you do with a billion, or even a trillion, words? We need a new culture of intellectual production in which student researchers and citizen scholars play a central role. And we need as a consequence to reorganize the education that we provide in the humanities, stressing participatory learning, and supporting a virtuous cycle where students contribute data as they learn and learn in order to contribute knowledge. We report on five strategies that we have implemented to further this virtuous cycle: (1) reading environments by which learners can work with languages that they have not studied, (2) feedback for those who choose to internalize knowledge about a particular language, (3) methods whereby those with knowledge of different languages can collaborate to develop interpretations and to produce new annotations, (4) dynamic reading lists that allow learners to assess and to document what they have mastered, and (5) general e-portfolios in which learners can track what they have accomplished and document what they have contributed and learned to the public or to particular groups.

© All rights reserved Crane et al. and/or ACM Press

Bamman, David and Crane, Gregory (2011): Measuring historical word sense variation. In: JCDL11 Proceedings of the 2011 Joint International Conference on Digital Libraries 2011. pp. 1-10. Available online

We describe here a method for automatically identifying word sense variation in a dated collection of historical books in a large digital library. By leveraging a small set of known translation book pairs to induce a bilingual sense inventory and labeled training data for a WSD classifier, we are able to automatically classify the Latin word senses in a 389 million word corpus and track the rise and fall of those senses over a span of two thousand years. We evaluate the performance of seven different classifiers both in a tenfold test on 83,892 words from the aligned parallel corpus and on a smaller, manually annotated sample of 525 words, measuring both the overall accuracy of each system and how well that accuracy correlates (via mean square error) to the observed historical variation.

© All rights reserved Bamman and Crane and/or their publisher

Bamman, David, Babeu, Alison and Crane, Gregory (2010): Transferring structural markup across translations using multilingual alignment and projection. In: JCDL10 Proceedings of the 2010 Joint International Conference on Digital Libraries 2010. pp. 11-20. Available online

We present here a method for automatically projecting structural information across translations, including canonical citation structure (such as chapters and sections), speaker information, quotations, markup for people and places, and any other element in TEI-compliant XML that delimits spans of text that are linguistically symmetrical in two languages. We evaluate this technique on two datasets, one containing perfectly transcribed texts and one containing errorful OCR, and achieve an accuracy rate of 88.2% projecting 13,023 XML tags from source documents to their transcribed translations, with an 83.6% accuracy rate when projecting to texts containing uncorrected OCR. This approach has the potential to allow a highly granular multilingual digital library to be bootstrapped by applying the knowledge contained in a small, heavily curated collection to a much larger but unstructured one.

© All rights reserved Bamman et al. and/or their publisher

Berti, Monica, Romanello, Matteo, Babeu, Alison and Crane, Gregory (2009): Collecting fragmentary authors in a digital library. In: JCDL09 Proceedings of the 2009 Joint International Conference on Digital Libraries 2009. pp. 259-262. Available online

This paper discusses new work to represent, in a digital library of classical sources, authors whose works themselves are lost and who survive only where surviving authors quote, paraphrase or allude to them. It describes initial works from a digital collection of such fragmentary authors designed not only to capture but to extend the ontologies that traditional scholarship has developed over generations: the aim is representing every nuance of print conventions while using the capabilities of digital libraries to extend our ability to identify fragments, to represent what we have identified, and to render the results of that work intellectually and physically more accessible than was possible in print culture.

© All rights reserved Berti et al. and/or their publisher

Romanello, Matteo, Berti, Monica, Babeu, Alison and Crane, Gregory (2009): When printed hypertexts go digital: information extraction from the parsing of indices. In: Proceedings of the 20th ACM Conference on Hypertext and Hypermedia 2009. pp. 357-358. Available online

Modern critical editions of ancient works generally include manually created indices of other sources quoted in the text. Since indices can be considered as a form of domain specific language, the paper presents a parsing-based approach to the problem of extracting information from them to support the creation of a collection of fragmentary texts. This paper first considers the characteristics and structure of quotation indices and their importance when dealing with fragmentary texts. It then presents the results of applying a fuzzy parser to the OCR transcription of an index of quotations to extract information from potentially noisy input.

© All rights reserved Romanello et al. and/or their publisher

Bamman, David and Crane, Gregory (2008): Building a dynamic lexicon from a digital library. In: JCDL08 Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries 2008. pp. 11-20. Available online

We describe here in detail our work toward creating a dynamic lexicon from the texts in a large digital library. By leveraging a small structured knowledge source (a 30,457 word treebank), we are able to extract selectional preferences for words from a 3.5 million word Latin corpus. This is promising news for low-resource languages and digital collections seeking to leverage a small human investment into much larger gain. The library architecture in which this work is developed allows us to query customized subcorpora to report on lexical usage by author, genre or era and allows us to continually update the lexicon as new texts are added to the collection.

© All rights reserved Bamman and Crane and/or ACM Press

Ray, Joyce, Lynch, Clifford, Bobley, Brett, Crane, Gregory and Wheatley, Steven (2007): Cyberinfrastructure for the humanities and social sciences: advancing the humanities research agenda. In: JCDL07: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries 2007. p. 214. Available online

In 2006 the American Council of Learned Societies (ACLS) released Our Cultural Commonwealth, the final report of the Commission on Cyberinfrastructure for the Humanities and Social Sciences. The report, based on a study funded by the Mellon Foundation, explored how research environments might be created for the humanities and social sciences to complement those being developed to support scientific research. The report includes key recommendations addressed to universities, funding agencies, scholarly societies, academic libraries, publishers, Congress, state legislatures, and others. Implementation of the recommendations could potentially transform scholarship and exponentially increase access to resources and new scholarship in the humanities and social sciences. But the report has not been universally embraced. How will humanities scholarship be advanced by new technologies and research practices, and how will the academic community recognize new forms of scholarship? How will funding agencies respond to the challenges and issues raised? What does cyberinfrastructure mean for different domains within the humanities? These questions will be addressed by panelists and discussed by participants.

© All rights reserved Ray et al. and/or ACM Press

Stewart, Gordon, Crane, Gregory and Babeu, Alison (2007): A new generation of textual corpora: mining corpora from very large collections. In: JCDL07: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries 2007. pp. 356-365. Available online

While digital libraries based on page images and automatically generated text have made possible massive projects such as the Million Book Library, Open Content Alliance, Google, and others, humanists still depend upon textual corpora expensively produced with labor-intensive methods such as double-keyboarding and manual correction. This paper reports the results from an analysis of OCR-generated text for classical Greek source texts. Classicists have depended upon specialized manual keyboarding that costs two or more times as much as keyboarding of English both for accuracy and because classical Greek OCR produced no usable results. We found that we could produce texts by OCR that, in some cases, approached the 99.95% professional data entry accuracy rate. In most cases, OCR-generated text yielded results that, by including the variant readings that digital corpora traditionally have left out, provide better recall and, we argue, can better serve many scholarly needs than the expensive corpora upon which classicists have relied for a generation. As digital collections expand, we will be able to collate multiple editions against each other, identify quotations of primary sources, and provide a new generation of services.

© All rights reserved Stewart et al. and/or ACM Press

Crane, Gregory and Jones, Alison (2006): The challenge of Virginia Banks: an evaluation of named entity analysis in a 19th-century newspaper collection. In: JCDL06: Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006. pp. 31-40. Available online

This paper evaluates automatic extraction of ten named entity classes from a 19th century newspaper, the Civil War years of the Richmond Times Dispatch, digitized with IMLS support by the University of Richmond. This paper analyzes success with ten categories of entities prominent in these newspapers and the particular problems that these classes of named entities raise. Personal and place names are familiar but some more important categories (such as ship names and military units) illustrate some of the challenges that named entity identification confronts as it evolves into a fundamental tool not only for automatic metadata generation but also for searching and browsing as well. We conclude by suggesting the kinds of knowledge sources that digital libraries need to assemble as part of their machine readable reference collections to support named entity identification as a core service.

© All rights reserved Crane and Jones and/or ACM Press

Weaver, Gabriel, Strickland, Barbara and Crane, Gregory (2006): Quantifying the accuracy of relational statements in Wikipedia: a methodology. In: JCDL06: Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries 2006. p. 358. Available online

An initial evaluation of the English Wikipedia indicates that it may provide accurate data for disambiguating and finding relations among named entities.

© All rights reserved Weaver et al. and/or ACM Press

Mimno, David, Jones, Alison and Crane, Gregory (2005): Finding a catalog: generating analytical catalog records from well-structured digital texts. In: JCDL05: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries 2005. pp. 271-280. Available online

One of the criticisms library users often make of catalogs is that they rarely include information below the bibliographic level. It is generally impossible to search a catalog for the titles and subjects of particular chapters or volumes. There has been no way to add this information to catalog records without exponentially increasing the workload of catalogers. At the same time, well-structured full-text XML transcriptions of printed works are becoming increasingly available. This paper describes how existing investments in full text digitization and structural markup combined with current named-entity extraction technology can efficiently generate the detailed level of catalog data that users want, at no significant additional cost. This system is demonstrated on an existing digital collection within the Perseus Digital Library.

© All rights reserved Mimno et al. and/or ACM Press

Shiaw, Horn-yeu, Jacob, Robert J. K. and Crane, Gregory (2004): The 3D vase museum: a new approach to context in a digital library. In: JCDL04: Proceedings of the 4th ACM/IEEE-CS Joint Conference on Digital Libraries 2004. pp. 125-134. Available online

We present a new approach to displaying and browsing a digital library collection, a set of Greek vases in the Perseus digital library. Our design takes advantage of three-dimensional graphics to preserve context even while the user focuses in on a single item. In a typical digital library user interface, a user can either get an overview for context or else see a single selected item, sacrificing the context view. In our 3D Vase Museum, the user can navigate seamlessly from a high level scatterplot-like plan view to a perspective overview of a subset of the collection, to a view of an individual item, to retrieval of data associated with that item, all within the same virtual room and without any mode change or special command. We present this as an example of a solution to the problem of focus-plus-context in information visualization. We developed 3D models from the 2D photographs in the collection and placed them in our 3D virtual room. We evaluated our approach by comparing it to the conventional interface in Perseus using tasks drawn from archaeology courses and found a clear improvement. Subjects who used our 3D Vase Museum performed the tasks 33% better and did so nearly three times faster.

© All rights reserved Shiaw et al. and/or ACM Press

Fox, Edward A., Crane, Gregory, Griffin, Stephen M., Larsen, Ronald L., Levy, David M., McArthur, David J. and Sugimoto, Shigeo (2004): Digital libraries settling the score: 10 years hence and 10 before. In: JCDL04: Proceedings of the 4th ACM/IEEE-CS Joint Conference on Digital Libraries 2004. p. 374. Available online

Six panelists and a moderator leverage knowledge of the first ten years of the digital libraries field to suggest key future directions.

© All rights reserved Fox et al. and/or ACM Press

Smith, David A., Mahoney, Anne and Crane, Gregory (2002): Integrating harvesting into digital library content. In: JCDL02: Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries 2002. pp. 183-184. Available online

The Open Archives Initiative has gained success by aiming between complex federation schemes and low functionality web crawling. Much information still remains hidden inside documents catalogued by OAI metadata. We discuss how subdocument information can be exposed by data providers and exploited by service providers. We discuss services for citation reversal and name and term linking with harvested data in the Perseus Project's document management system and a proxy service for automatically adding these links to OAI documents outside Perseus.

© All rights reserved Smith et al. and/or ACM Press

Crane, Gregory, Smith, David A. and Wulfman, Clifford E. (2001): Building a Hypertextual Digital Library in the Humanities: A Case Study on London. In: JCDL01: Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries 2001. pp. 426-434. Available online

This paper describes the creation of a new humanities digital library collection: 11,000,000 words and 10,000 images representing books, images and maps on pre-twentieth century London and its environs. The London collection contained far more dense and precise information than the materials from the Greco-Roman world on which we had previously concentrated. The London collection thus allowed us to explore new problems of data structure, manipulation, and visualization. This paper contrasts our model for how humanities digital libraries are best used with the assumptions that underlie many academic digital libraries on the one hand and more literary hypertexts on the other. Since encoding guidelines such as those from the TEI provide collection designers with far more options than any one project can realize, this paper describes what structures we used to organize the collection and why. We particularly emphasize the importance of mining historical authority lists (encyclopedias, gazetteers, etc.) and then generating automatic span-to-span links within the collection.

© All rights reserved Crane et al. and/or ACM Press

Rydberg-Cox, Jeffrey A., Mahoney, Anne and Crane, Gregory (2001): Document Quality Indicators and Corpus Editions. In: JCDL01: Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries 2001. pp. 435-436. Available online

Corpus editions can only be useful to scholars when users know what to expect of the texts. We argue for text quality indicators, both general and domain-specific.

© All rights reserved Rydberg-Cox et al. and/or ACM Press

Crane, Gregory, Chavez, Robert F., Mahoney, Anne, Milbank, Thomas L., Rydberg-Cox, Jeffrey A., Smith, David A. and Wulfman, Clifford E. (2001): Drudgery and deep thought. In Communications of the ACM, 44 (5) pp. 34-40. Available online

Crane, Gregory and Rydberg-Cox, Jeffrey A. (2000): New Technology and New Roles: The Need for "Corpus Editors". In: DL00: Proceedings of the 5th ACM International Conference on Digital Libraries 2000. pp. 252-253. Available online

Digital libraries challenge humanists and other academics to rethink the relationship between technology and their work. At the Perseus Project, we have seen the rise of a new combination of skills. The "Corpus Editor" manages a collection of materials that are thematically coherent and focused but are too large to be managed solely with the labor-intensive techniques of traditional editing. The corpus editor must possess a degree of domain specific knowledge and technical expertise that virtually no established graduate training provides. This new position poses a challenge to humanists as they train and support members of the field pursuing new, but necessary tasks.

© All rights reserved Crane and Rydberg-Cox and/or ACM Press

Crane, Gregory (1996): Building a Digital Library: The Perseus Project as a Case Study in the Humanities. In: DL96: Proceedings of the 1st ACM International Conference on Digital Libraries 1996. pp. 3-10. Available online

This paper outlines some of our preliminary findings in the Perseus Project, an on-going digital library on ancient Greek culture that has been under development since 1987.

© All rights reserved Crane and/or ACM Press

Marchionini, Gary and Crane, Gregory (1994): Evaluating Hypermedia and Learning: Methods and Results from the Perseus Project. In ACM Transactions on Information Systems, 12 (1) pp. 5-34. Available online

The Perseus Project has developed a hypermedia corpus of materials related to the ancient Greek world. The materials include a variety of texts and images, and tools for using these materials and navigating the system. Results from a three-year evaluation of Perseus use in a variety of college settings are described. The evaluation assessed both this particular system and the application of the technological genre to information management and to learning. The evaluation used a variety of methods to address questions about learning and teaching with hypermedia and to guide the development of early versions of the system. Results illustrate that such environments offer potential for accelerating learning and for supporting new types of learning and teaching; that students and instructors must develop new strategies for learning and teaching with such technology; and that institutions must develop infrastructural support for such technology. The results also illustrate the importance of well-designed interfaces and different types of assignments on user performance.

© All rights reserved Marchionini and Crane and/or ACM Press

Kahn, Peter H., Nyce, James M., Oren, Tim, Crane, Gregory, Smith, Linda C., Trigg, Randall H. and Meyrowitz, Norman (1991): From Memex to Hypertext: Understanding the Influence of Vannevar Bush. In: Walker, Jan (ed.) Proceedings of ACM Hypertext 91 Conference December 15-18, 1991, San Antonio, Texas. p. 361. Available online

Crane, Gregory (1987): From the Old to the New: Integrating Hypertext into Traditional Scholarship. In: Weiss, Stephen and Schwartz, Mayer (eds.) Proceedings of ACM Hypertext 87 Conference November 13-15, 1987, Chapel Hill, North Carolina. pp. 51-55.

Hypertext allows academics to structure and manipulate their ideas in a radically new way, but it should also reinforce traditional scholarly activity. Those designing Hypertext systems that are intended for the general academic market must be careful to support not only new possibilities, but those functions with which academics are already familiar. Further, many scholars hope that their documents will be useful for decades to come. We need standard document architectures that will separate a particular Hypertext from the system in which it was designed.

© All rights reserved Crane and/or ACM Press

