Name of the presenter: Joonas Kesäniemi

About the presenter: Joonas Kesäniemi is an all-around (linked) data enthusiast working within the library to develop services for the whole university and beyond.

Intended audience:

Anyone interested in research data, linked data and/or information system architectures.


Activities related to research data offer new opportunities for academic libraries to increase their visibility in the eyes of researchers. Helsinki University Library has been working for years to find new ways of integrating itself into research activities, for example through data management training, content curation and new discovery services for licensed content. When it comes to actually managing data during a project, or storing and distributing it afterwards, the only alternative is usually to endorse an external service. But not this time. What started as a request for metadata-related consultation and an idea for a simple web form has turned into a flexible, reusable and scalable platform for building a customized web-based data management tool that produces data that is linked, distributed and made instantly applicable through APIs provided by the library.

For the past year and a half the library has been working in close co-operation with a research group from the Research Unit for Variation, Contacts and Change in English (VARIENG) to develop a tool and model for managing their research data. Together we are building the Language Change Database (LCD), which is designed to be a collaborative, cumulative and open access resource for corpus-based research on the English language. By providing comparative and machine-readable baseline data, for example about the focus, sources and results of earlier studies, the LCD aims to facilitate historical linguistic research in general, as well as statistical modelling, systematic review and replication of prior research with other datasets.

In addition to providing the research group with a customizable data platform and consultation on how to incrementally build their data model, the library also handles the distribution of the openly available subset of the research data through its linked data platform. The platform combines external datasets and data from different university systems such as CRIS, ILS, semantic wikis, and now the LCD, into one read-only linked dataset. This means that we can, for example, follow links from an organizational unit to a research group, from the group to a publication, from the publication to a dataset and finally from the dataset to the data itself all within the same service.
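The link-following described above can be illustrated with a minimal sketch in Python. It is not the library's actual platform or API: the identifiers, the predicate names (hasGroup, hasPublication, hasDataset, hasDistribution) and the helper functions are all hypothetical, and a plain in-memory list of triples stands in for the linked dataset.

```python
# Hypothetical triples (subject, predicate, object). All identifiers and
# predicate names are invented for illustration; they are not taken from
# the library's real linked data platform.
TRIPLES = [
    ("unit:example-unit", "hasGroup", "group:example-group"),
    ("group:example-group", "hasPublication", "pub:example-article"),
    ("pub:example-article", "hasDataset", "dataset:example-dataset"),
    ("dataset:example-dataset", "hasDistribution", "data:example-entries"),
]


def follow(subject, predicate, triples=TRIPLES):
    """Return every object linked from `subject` via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def walk(start, path, triples=TRIPLES):
    """Follow a sequence of predicates from `start`; return the list of nodes reached."""
    frontier = [start]
    for predicate in path:
        # Expand each node in the current frontier by one link.
        frontier = [o for node in frontier for o in follow(node, predicate, triples)]
    return frontier


# Organizational unit -> research group -> publication -> dataset -> data
PATH = ["hasGroup", "hasPublication", "hasDataset", "hasDistribution"]

if __name__ == "__main__":
    print(walk("unit:example-unit", PATH))
```

Because every source is merged into a single dataset, the whole chain can be resolved by one traversal rather than by separate queries to each originating system. In an RDF-based setup the same walk would more likely be expressed as a graph query, which is an assumption here rather than a description of the service.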

This presentation describes the infrastructure built and maintained by the library to support research groups in their research data management, publication and discovery tasks at different stages of a project’s life cycle. We will present the technologies, architecture and services involved, as well as lessons learned during the process. The underlying technology and the accumulated knowledge on how to incrementally build a data model for research data provide a foundation for a more general data management solution for other research groups dealing with relatively small but potentially complex data. The current project is a one-time deal, but it highlights an opportunity for libraries to be part of the research and to be the place to go, not only for data about data, but for the data itself.