Abstract: There is an ever-increasing interest in publishing Linked Open Datasets about scientific papers. The current landscape is very fragmented: some projects focus on bibliographic data, others on authorship data, others on citations, and so on. The quality is also heterogeneous, and the production and maintenance of such datasets are difficult and time-consuming. In this paper we introduce the Semantic Lancet Project, whose goal is to make available rich semantic data about scholarly publications and to provide users with sophisticated services on top of those data. We developed a chain of tools that produce high-quality data from multiple sources. It has been successfully used to produce a rich and freely available Linked Open Dataset (LOD), which is also described here.
Keywords: data reengineering, linked open dataset, scholarly publications, semantic enhancement, semantic publishing
1. Introduction
The authors of a scientific paper can cite another paper for different reasons. A very common reason is that the cited paper has been useful to those authors, for instance because it has proposed a problem to investigate or a solution to analyse. Another common reason is that the cited paper contains information that is necessary for understanding those authors' work.
The readers of a scientific paper can instead use the references (or citation list) to appreciate the context of the paper. For instance, by looking at the publication year of each cited work, a reader can try to guess how obsolete the contents of the paper are. Citations are also increasingly being used for evaluation purposes, to assess the quality of the scientific production of single researchers, teams, or even communities.
Citations are, however, only a part of the picture. The availability of rich data about scientific papers (including information about authors, publishing processes and, more importantly, the content itself) opens the way to novel applications for a large spectrum of users. The processes of evaluation, access and exploitation of research results can themselves be improved by combining all this information.
In fact, the knowledge management of scholarly products is an emerging research area: for authors, it includes the gathering of personal repositories of papers, citations, and their relationships with the author's work; for publishers, it includes the construction of large repositories of assets from conferences or...