Department: Information Engineering

Total Contribution: Euro 130.000

Call: STARS-2017-StG

Project Duration in months: 24

Start Date: 15/03/2018
End Date: 14/03/2020

CDC - Computational Data Citation

Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research and directing investments in science. Science has become “data-intensive”: large volumes of data are collected and analyzed to discover intricate patterns through simulations and experiments. As a result, most scientific reference works have been replaced by online curated datasets. Yet, given a dataset, there is no quantitative, consistent, and established way of knowing how it has been used over time, who contributed to its curation, what results it has yielded, and what its value is.

The CDC project targeted the lack of an automatic, deep, and persistent mechanism for data citation, which is the leading cause of the issues presented above. The present STARS project shaped the theory and practice of an emerging field of computer science: Computational Data Citation (CDC). We designed the first well-founded model, efficient algorithms, and a robust system for citing data. The broader impact of this research program is on scientists who publish their findings in organized and structured databases, on data centres that curate, preserve, and publish data, and on government agencies that direct research investments.

This program supported the transition to data-intensive scientific discovery by favouring:
(i) the development of new publication models;
(ii) the promotion of scientific dataset sharing and integration between data and published literature; and
(iii) the evolution of scientific data management and access.

Moreover, this research program consolidated Data Citation as a new scientific field of computer science. It also helped create a young and dynamic research group at the University of Padua, composed of six Ph.D. students, with a strong multidisciplinary focus.

The work carried out in CDC consolidated an international scientific network working on data citation, comprising the University of Edinburgh (UK), the University of Pennsylvania (USA), the University of Amsterdam (The Netherlands), and Aalborg University (Denmark). Overall, in two years CDC produced more than 30 peer-reviewed publications in international journals and conferences.