Open Licenses for Data, Software and Code
This 2-hour long unit provides trainers with the core skills and knowledge to teach the nuances of licensing research outputs. After completing the unit, the trainer will be able to reflect on how to tailor the content to their local environment, and have more knowledge about how and when to apply licenses throughout a project, while complying with funders, institutional requirements and the research discipline/aims of the project.
Licensing is a complex topic, so we present a very thorough slide deck covering many aspects of licensing. Because this learning path has a wealth of slides, the instructor should choose the slides that best suit a given occasion. We leave this slimming down of the content to the discretion of the instructor, depending on the context and target group in which they are teaching.
Learning Objectives
- Explain the different types of licenses that can be applied to FAIR research outputs in accordance with regulations, policies and other legal requirements
- Apply (multiple) licenses appropriately during the development and publication of a research output
- Argue for the choice of one or more appropriate licenses for a research output and identify the importance of creating machine-readable rights statements
- Engage in and influence the research process so that more data are made available and licensed
Target Audience
- Data Stewards
- Data Professionals
Duration
- 120 mins
Prerequisites
It is expected that the learner has taken an introduction to research data management and has the following:
- basic knowledge of what Creative Commons licenses are
- basic knowledge of the common terms used in licensing, such as “attribution”, “copyleft” and “non-commercial”
- basic understanding of the difference between a waiver and a license
- basic awareness of restrictions to licensing data as well as an awareness that funders, publishers, the research project and disciplines can have different requirements to licensing research output
- awareness of the need to contact a legal expert (the boundaries of their expertise)
Learning Tools: None
Slides
The entire library of slides we recommend you pick and choose from can be downloaded here.
The slides adapted for the Train-the-trainer event held on the 19th of June 2024 can be downloaded here.
Instructor notes for the three topics taught in the learning unit
Unit 1: Why you shouldn’t license your research output
To introduce the topic of licensing, in this unit we start by asking the students why they should NOT license their data. In this way, they can share any experiences they may have had with people who were reluctant to use licenses for research data. Based on these shared experiences, the instructor can then begin to teach and discuss the advantages of and hesitations towards licensing data in contexts relevant for the learners.
Key takeaways:
- Applying a licence is of crucial importance for disseminating a research product.
- Choosing to apply a license is a decisive stage in the process of disseminating a research product. This choice has concrete consequences and should not be taken lightly.
- When faced with reluctance to use a license, it is necessary to insist that a license is not a constraint but a benefit for the researchers themselves and for their scientific community (protection of one’s rights, more citations, more re-use).
Instructor notes unit 1
Slides 2-4: Introduction and housekeeping. These slides lay out the overview of the learning path, some ground rules, learning objectives and an overview of what the data steward can expect from this 2-hour learning path. These 10 minutes can be used to introduce trainers and helpers, share the agenda and learning objectives (as shown on the slides). Instructors can reiterate that this is an intermediate learning path aimed at data stewards who may know some basics about licensing and are keen to develop their knowledge further.
Slide 5: Icebreaker. These 10 minutes can be used for participants to introduce themselves to one another using the prompts on the slides. For an in-person training, this could be done in small breakout groups. If the training is virtual, it could also be done in a shared document (Google Doc/HackMD) to enable participation.
Slide 6-7: This unit is titled ‘Why you shouldn’t license your research output’ and is intended to be provocative. The aim is for the instructor to use common scenarios encountered by data stewards where researchers may be hesitant to apply licenses, and for the instructor to advocate for license use through addressing each scenario in turn.
Slides 8-9: These slides cover the aforementioned scenarios, through imagining a hypothetical conversation between a data steward and a researcher. Each scenario/reason not to license has a counter or follow up on how licensing can help and why it is important. Instructors are encouraged to ask data stewards to reflect on similar conversations they may have had and how arguments used by the (fictional) data steward in the slides could help address some of the questions.
Slide 10: The instructor concludes the unit by stating the takeaways. The instructor is free to develop these two elements according to the time remaining.
Slide 11: This learning activity uses two question prompts to engage participants in a discussion about how they would advocate for license use among researchers who are reluctant to use licenses. Participants can choose between the two scenarios. There may not be enough time in the learning path to discuss each breakout group or participant response in detail; in this case, participants will be asked to share reflections from the learning activity after the session.
Unit 2: What to consider before applying licenses to research outputs
This unit addresses challenges for data stewards and researchers when sharing and licensing research outputs. These challenges arise in connection with requirements from external stakeholders like funders and publishers, ethical or legal obligations, and in relation to intellectual property rights and commercialization. The learner is made aware of factors that can negatively impact the novelty of research outputs and make sharing and licensing research data, software and code unattractive. Finally, mitigation strategies are introduced that a data steward can employ to adhere to external requirements and obligations, while at the same time protecting researchers’ rights to their data.
Key takeaways:
- Before applying licenses to research outputs like data, code, or software, you have to balance many different factors like stakeholder requirements, ethical or legal obligations, regulations, policies, and rights.
- Mitigation strategies can help to navigate these partially contradicting requirements and maintain the ability to share and license research outputs.
- You have to be aware of the benefits and risks of choosing one licensing pathway over another.
Instructor notes unit 2
The aim of this unit is to encourage a discussion of the different requirements, policies and regulations that can influence the sharing and licensing of research outputs. The unit begins with a presentation of the research life cycle and how licenses may influence a project’s dissemination strategy. Then, the unit introduces funder and journal requirements, ethical concerns, personal data and contractual obligations that research outputs may be subject to. Finally, the unit explores how licensing may raise concerns about maintaining the novelty of licensed data (software and code), commercialisation, copyright infringement, dual use and rights to assign licenses.
Slides 15 through 35 present the factors influencing the sharing and licensing of research outputs one by one. Each factor is generally presented on two slides. The first slide introduces the factor and gives a concrete example of where a researcher, or you as a data steward, may come across a given requirement (something you should comply with) or an expectation (something you can comply with). The second slide expands on the concrete example and points to potential mitigation strategies. Mitigation strategies are ways of balancing the given requirements or expectations with other, potentially contradictory requirements or expectations, thereby maintaining your ability to share and license research outputs. Being able to assign a license is a capability that you must have, and the mitigation strategies are ways to secure that capability by the time of licensing.
Remember: Sharing non-licensed research outputs makes their reuse almost impossible - or will at least prevent their correct attribution and citation.
General teaching hint for this unit: You do not have to go through all the slides one by one. Think about your audience and then choose the factors that are most relevant for them.
Slide 13: Licensing is part of good planning practice (understood as preparation before the project). But as research changes along the way, there is a continuous need to focus on change management and what that means for the ability to license research outputs. Remember: a plan only survives until its first meeting with reality. Teaching hint: You can use this slide either as a talking primer for yourself (“lecturing”) or as a discussion primer to explore together with the learners where licenses play a role in the research life cycle. It could be particularly interesting to discuss disciplinary and/or organisational differences and opportunities.
Slide 14: This slide provides an overview of all the factors influencing the decision of whether to share a research output (or not) and how to license it. The image of the balance is meant to visualise that sharing outputs and choosing an appropriate license is a balancing act, where many different factors like stakeholder requirements, ethical or legal obligations, regulations, policies, and rights have to be weighed in order to make the right decision. Sometimes these factors may pull in different directions or even plainly contradict one another, e.g. a journal requiring data to be published openly when the data are personal, or a funder requiring early data sharing while the researcher does not want to risk being unable to publish novel findings in a prestigious journal.
Slide 15+16 (funders): Funders often require that research outputs are made openly available, or at least FAIR, in order to ensure high accessibility and equally high reusability. The example on the first slide is clipped from Horizon Europe’s Annotated Grant Agreement (AGA) and presents a number of requirements for open and FAIR data. These are explained in more detail in the first two bullets on the second slide. Here, it is important to note that the licensing requirements are known early on in the process (in general, already during the grant application phase). The recommended mitigation strategy is therefore that both the FAIRification and the licensing of research outputs are negotiated between project partners early on and documented in writing, e.g. in a data management plan (DMP).
Slide 17-18 (journals): Many journals expect researchers to share research outputs like data and code under a given license. These expectations are usually documented in the journals’ data policies. You can find them by looking for “Open Data”, “FAIR data”, “Data Policy” or “Data Availability Statement” in the “Information for Authors” on the journal or publisher website. The researcher and their collaborators have to investigate this as soon as possible and then act accordingly. If a journal requires you, for instance, to publish data with a CC-BY 4.0 license, you cannot include CC-BY-SA 4.0-licensed data in your dataset, because share-alike material that has been adapted may only be redistributed under the same (or a compatible) license.
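The CC-BY versus CC-BY-SA case can be illustrated with a small lookup. The following is a minimal sketch in Python, assuming a deliberately simplified, hand-written compatibility table for three licenses. The table and the function name `blocking_inputs` are illustrative assumptions, not an official tool, and a real assessment should follow the Creative Commons guidance or legal advice.

```python
# Illustrative only: for each inbound license, the outbound licenses under
# which a combined or adapted dataset containing that material could be
# released. Hand-written, simplified, and not legal advice.
CAN_BE_RELEASED_UNDER = {
    "CC0-1.0": {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0"},
    "CC-BY-4.0": {"CC-BY-4.0", "CC-BY-SA-4.0"},
    "CC-BY-SA-4.0": {"CC-BY-SA-4.0"},
}


def blocking_inputs(inbound_licenses, outbound_license):
    """Return the inbound licenses that prevent the chosen outbound license.

    Licenses missing from the table are treated as blocking, so that
    unknown cases are flagged for a human to check.
    """
    return sorted(
        lic
        for lic in set(inbound_licenses)
        if outbound_license not in CAN_BE_RELEASED_UNDER.get(lic, set())
    )


# A journal asks for CC-BY-4.0, but one source dataset is share-alike.
print(blocking_inputs(["CC0-1.0", "CC-BY-SA-4.0"], "CC-BY-4.0"))
```

In a training session, a table like this can serve as a discussion primer: participants can extend it with the licenses common in their own discipline.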
Slide 19-20 (ethical): Ethical concerns often relate to data sharing rather than to the ability to license. The slides list a number of research subjects whose rights have to be protected, and these are not only human participants but also animals, plants and even entire ecosystems. You live up to the principles of ethical research conduct by assessing the potential impact of your research before you even start interacting with your research subjects and objects; by choosing only a (potentially de-identified) subset for publication; and by obtaining proper informed consent that includes data sharing.
Slide 21 (personal data): Personal data are difficult to manage, and even harder to share and license. This is due to data protection regulations such as the GDPR in Europe and related national legislation.
Slide 22-24 (contractual): If you collect, buy, or produce data in a confidential environment or under a non-disclosure agreement (NDA), or if you produce code under contractual obligations, you may be restricted in your ability to share and license. The sooner you match your sharing and licensing aspirations with your contractual obligations, the sooner you can apply the mitigation strategies.
Slide 25-26 (novelty of findings): Although more and more journals require sharing and licensing data and/or code, there is a risk in doing this too soon. Openly sharing and licensing the data and/or code for others may jeopardize the novelty of your findings, if somebody else uses the data and/or code to reach the same results before you.
Slide 27-28 (commercialization): Researchers are obliged through their job contract with the university to report novel findings in order to explore potential commercial spinouts, patents or the like. This obligation can be given by law, but can sometimes also be required by funders. It is best practice to seek advice early on in the process, at any rate before sharing and/or licensing data and/or code, to investigate if this hinders the ability to commercialize.
Slide 29 (copyright/license infringement): If you share and license something you did not create, e.g. by combining it with your own data and/or code, you may violate the copyright of others or the license of the original material. Let’s go through the three examples on slide 30.
Slide 30: (1) Bloomberg: If you buy data from a commercial data provider, the data are usually subject to quite restrictive usage rights agreements that you signed in the purchase process, e.g. you cannot redistribute the data and/or code or waive the original rights to them. (2) CC BY-NC-ND box: This can also be a problem if you reuse other people’s publicly available work that is licensed in a specific way. A work licensed CC BY-NC-ND, for instance, cannot be reused in a commercial context (NC = non-commercial), and you are not allowed to share changed versions of the work (ND = no derivatives). (3) Pizza: The last example is the case of a citizen science project, where citizens are invited to participate and contribute ideas. When volunteers or citizen scientists then produce drawings of their design ideas, these have to be considered original works that are subject to copyright. Does the research project have the ability to assign a license to this work as part of the dissemination of its research? The answer is no, but this could be achieved either by written agreement or by having the citizen scientists license their work in the first place (see Hansen et al. 2021 for details).
Slide 31-32 (dual use): So-called “dual use” concerns the misuse of research data and/or code for undesired military or civil applications. Data and code that can be (mis)used for military purposes must not be shared and/or licensed. This is a legal requirement.
Slide 33-35 (right to assign a license): The ability of a researcher and/or data steward to waive rights to data and/or code cannot be taken for granted. There might be restrictions or procedures (as illustrated by the Bristol example) that limit this ability, or national legislation (as illustrated by the French example) that enables the creators of data and/or code to license them without an explicit permit. Against the backdrop of the national legislation in France, research projects are free to choose whether or not to license their research data, without being obliged to do so. As a matter of best practice, however, it would still be useful to continue using re-use licenses, as they clearly inform re-users of the extent of their rights and obligations. Learn more about license compatibility in Europe in Graux 2023. Learn more about national legislation at Data Europa.
Slide 36 (summing up, identical to slide 14): This slide provides an overview of all the factors influencing the decision of whether to share a research output (or not) and how to license it. The image of the balance is meant to visualise that sharing outputs and choosing an appropriate license is a balancing act, where many different factors like stakeholder requirements, ethical or legal obligations, regulations, policies, and rights have to be weighed in order to make the right decision. Sometimes these factors may pull in different directions or even plainly contradict one another, e.g. a journal requiring data to be published openly when the data are personal, or a funder requiring early data sharing while the researcher does not want to risk being unable to publish novel findings in a prestigious journal.
Slide 37 (Summing up, what to do in practice before awarding a license): Checklist and reminders. These are the basic essentials to remember when all the variables described on the previous slide come into play.
Slide 38 (Learning Activity): See Learning Activity notes for instructions for this exercise
Slide 39 (Key Takeaways):
Unit 3: Practical application of licenses to (FAIR) research output
Following on from the abstract discussion and after checking the ability to license research outputs in unit 2, this third unit is dedicated to the concrete process of choosing the appropriate license. It presents the learner with practical tools and guides.
The unit starts with a general introduction and an overview of the different licenses that are available for data, software or code. Then, a step-by-step guide to licensing research output is suggested:
- Step 1 (to be prepared to award a license): identify the holder of the copyright
- Step 2 (when awarding a license): how to choose the most suitable license for your research output and what considerations should be taken into account when doing so. Some tools are suggested that can help the student.
- Step 3 (after awarding a license): the work is not quite over yet, as some things need to be checked once the license is awarded. The trainer briefly addresses the importance of human and machine readability and of licensing metadata.
Key takeaways:
- Research products are not equal in legal terms and require specific licenses. There are many different licenses available, each one having its own characteristics. Your choice of license partly depends on the nature of your research output. You need to identify the rights that apply to the different research outputs before awarding a license.
- There are free online tools that can guide you towards finding the most suitable license, considering your own needs and conditions.
- Even when you have found the license that matches your needs, your work is not over yet. You still have to check its readability, both for humans and for machines.
Instructor notes unit 3
Slide 41 (Overview): Presents an overview of the different types of existing licenses.
- The aim is to present a variety of different licenses and explain that they meet different needs. Some are internationally recognized and widely shared, while others are country-specific.
- Point out, if necessary, that a license is a legal instrument allowing the holder of the rights in a work to grant certain rights over the use of that work.
- The Public Domain icon allows you to introduce the concept of intellectual property, discuss the restrictions that may apply under national laws, and move on to the subject of the rights granted by licenses.
- One or two licenses can be looked at more closely to examine the terms and conditions for use (or you can use the SPDX list for this)
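If the SPDX list is used here, it may help to show how its short identifiers make a license statement machine readable. The following is a minimal sketch in Python, assuming a small hand-picked subset of SPDX identifiers; the set `KNOWN_SPDX_IDS` and the function `spdx_header` are illustrative names, and the full SPDX License List should be consulted for real projects.

```python
# A small, hand-picked subset of identifiers from the SPDX License List.
KNOWN_SPDX_IDS = {
    "MIT",
    "Apache-2.0",
    "GPL-3.0-or-later",
    "CC-BY-4.0",
    "CC0-1.0",
    "ODbL-1.0",
}


def spdx_header(license_id, comment_prefix="#"):
    """Build the one-line SPDX tag that can be placed at the top of a file."""
    if license_id not in KNOWN_SPDX_IDS:
        raise ValueError(f"identifier not in the illustrative subset: {license_id}")
    return f"{comment_prefix} SPDX-License-Identifier: {license_id}"


print(spdx_header("MIT"))
print(spdx_header("Apache-2.0", comment_prefix="//"))
```

Because the tag has a fixed form, tools can scan files for it without having to interpret free-text license notices.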
Slide 42-43 (Who is the copyright holder?): Question who has the right to license in our public institutions and how much latitude researchers have to publish, distribute and communicate their data.
- To keep things simple, we consider 3 types of research outputs as examples: dataset, database, software
- The aim here is to explain that researchers do not have all the rights to scientific production, and that before granting a license it is necessary to determine who is entitled to do so, what is protected and under what legal conditions.
- Only the holder of the rights can define and impose a distribution license for their work.
Slide 44 (What is the appropriate license for my research output): This slide explains which of the most popular licenses can be used for the 3 types of research output described above (one of these licenses is specifically French).
- The aim is to explain that some licenses are more suitable for a given research output than others: for example, CC licenses should not be used for software, specific licenses such as Open Data Commons are available for databases, and CC licenses before version 4.0 are not suitable for databases.
- The various research outputs are not all of the same nature in the legal sense, and require specific licenses to be used.
- Concerning software licenses, you can explain that there are different types of licenses, with a progression in how strongly the original license carries over to derived works:
- without copyleft (permissive license): keeping the initial license is not mandatory. It is possible to redistribute and modify the software, but also to add restrictions,
- weak copyleft, for which the initial license remains on the original component while additions can have another license, and
- strong copyleft, where the license imposes itself on everything.
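The three categories above can be made concrete with a few widely used licenses. The following is a minimal sketch in Python, assuming a commonly cited classification of a handful of licenses (named by their SPDX identifiers); the table `COPYLEFT_STRENGTH` and the function `group_by_strength` are illustrative, and edge cases in individual licenses are not captured.

```python
# Commonly cited classification of a few well-known software licenses.
# Illustrative and simplified: it ignores license-specific exceptions.
COPYLEFT_STRENGTH = {
    "MIT": "permissive",
    "BSD-3-Clause": "permissive",
    "Apache-2.0": "permissive",
    "LGPL-3.0-or-later": "weak copyleft",
    "MPL-2.0": "weak copyleft",
    "GPL-3.0-or-later": "strong copyleft",
    "AGPL-3.0-or-later": "strong copyleft",
}


def group_by_strength(table=COPYLEFT_STRENGTH):
    """Group license identifiers by category, keeping the table's order."""
    groups = {}
    for license_id, strength in table.items():
        groups.setdefault(strength, []).append(license_id)
    return groups


for strength, licenses in group_by_strength().items():
    print(f"{strength}: {', '.join(licenses)}")
```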
Teaching hint: If you want to discuss the subject of software a little more deeply
- Some key notions can be explained: What does the term "free software" refer to? How does it differ from "open source software"?
- Explain that the term "free software" refers to the freedom for the users to run, copy, distribute, study, modify and improve the software.
- Explain that free software or "libre software" isn't necessarily cost-free
- Clarify that although open source software is made available to the public free of charge, it is not part of the public domain; some licenses may allow copyright protection and the sale of works derived from open source code
- Explain that open source is based on the principles of free software, but with ten criteria (see the definition)
- Explain that free or open source software is not “free of rights”
- Explain that most free software is also open source, and vice versa; in both cases the source code is publicly accessible and can be modified
Slide 45 (Focus on Creative Commons): The table is here to help learners understand and use Creative Commons licenses.
- The aim is to understand the benefits and challenges of Creative Commons licenses.
- Remind the audience that Creative Commons licenses are contracts by which the author of a creation (be it a document, a drawing, software, a video, or even a database) authorizes, in advance, anyone to exploit, distribute, modify or develop that creation, provided they comply with the license conditions.
- Consider the different licenses: What are the different options? How do we choose the license that best suits our needs? How can licenses be combined?
- The fact sheet on Creative Commons can be used as a basis for discussion or just to illustrate common concerns.
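To support the discussion of the different options, the license elements can be spelled out from a license code. The following is a minimal sketch in Python; the function `explain_cc` and the wording of the conditions are illustrative assumptions, the legal code of each license remains authoritative, and the CC0 waiver is deliberately not handled.

```python
# The four Creative Commons license elements and a plain-language gloss.
CC_ELEMENTS = {
    "BY": "attribution required",
    "SA": "adaptations must be shared under the same license",
    "NC": "non-commercial use only",
    "ND": "no derivatives may be shared",
}


def explain_cc(code):
    """Translate a code such as 'CC BY-NC-ND' into its list of conditions."""
    tokens = code.upper().replace("-", " ").split()
    if not tokens or tokens[0] != "CC":
        raise ValueError(f"not a CC license code: {code!r}")
    conditions = []
    for token in tokens[1:]:
        if token[0].isdigit():
            continue  # skip a version suffix such as '4.0'
        if token not in CC_ELEMENTS:
            raise ValueError(f"unknown license element: {token}")
        conditions.append(CC_ELEMENTS[token])
    return conditions


for condition in explain_cc("CC BY-NC-ND"):
    print("-", condition)
```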
Slide 46-47 (License advantages and limitations, example of the UK context): This table shows licenses that are used in the British context. The advantages and disadvantages of each license can be compared. It illustrates the fact that every consideration should be taken into account when choosing a license.
Slide 48 (What to do in practice while awarding a license): These slides are dedicated to choosing a license. A step-by-step approach is suggested.
- If reusing already existing data, be careful about the terms and conditions under which they can be reused. It might not be possible to reuse them.
- When awarding a license, you can use online tools to guide you (five examples are given in the two following slides).
- Take a moment to remind learners that the choice of license should take the file format into account. In the spirit of choosing a license that is as open as possible, you should also choose a format that is as open as possible.
Slide 49: What to do in practice while awarding a license – the tools 1
- The "Choo-choo-choose your license" illustration shows how you can choose a CC license when taking into account your needs and considerations and what is expected from the license.
- Choosealicense and the EUDAT license selector wizard are two online tools that can help in the same way.
Teaching hint: The trainer can take a few minutes to go online and demonstrate one or both tools (need for an internet connection).
Slide 50: What to do in practice while awarding a license – the tools 2
- The ARDC published a Research Data Rights guide that can be useful
- The Joinup Licensing Assistant can be used when awarding a license for software. It works on the same principle as Choosealicense and the EUDAT license selector wizard: you pick your conditions and needs, and it shows you which license might be best suited to your situation.
Teaching hint: As for the previous slide and depending on the time available, the trainer can take a few minutes to show the guide and / or demonstrate the Joinup Licensing Assistant.
Not all the tools presented on this slide and the previous one need to be demonstrated, but one or two might give a good example of what is available online.
Slide 51: What to do in practice after awarding a license? This checklist slide contains recommendations that can be commented on.
- The idea is to raise awareness of the fact that once you've chosen a license, it's advisable to indicate and explain your choices.
- An example statement is provided to illustrate good practice.
Slide 52 (A minute about machine readability): This slide points out the importance of making the license machine readable. It can be commented on briefly. In practice, this is handled by the chosen repository, but trainees need to understand why the license needs to be both human and machine readable. The anatomy of a CC license is given as an example.
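To make the difference between the two audiences tangible, the same license choice can be expressed once for humans and once for machines. The following is a minimal sketch in Python, assuming schema.org JSON-LD as the machine-readable form; the function `rights_statement` and the example title are hypothetical, and repositories may use other metadata schemas.

```python
import json


def rights_statement(title, license_name, license_url):
    """Return a human-readable sentence and a JSON-LD string for one license."""
    human = f'"{title}" is licensed under {license_name} ({license_url}).'
    machine = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        # The license is given as a URL so that software can resolve it.
        "license": license_url,
    }
    return human, json.dumps(machine, indent=2)


human, machine = rights_statement(
    "Example survey data",  # hypothetical dataset title
    "CC BY 4.0",
    "https://creativecommons.org/licenses/by/4.0/",
)
print(human)
print(machine)
```

The sentence is what a visitor reads on the landing page, while the JSON-LD is what a harvester or search engine can parse.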
Slide 53 (A minute about licensing metadata): Like the previous slide, this one can be explained briefly. The idea is to show that metadata can have their own separate license and to explain why they should be as open as possible, since they can be reused on their own.
Slide 54 (Takeaways): The trainer focuses on the three main points presented on the slide, which have been the common thread throughout the unit. Use the remaining time to answer potential questions.
- Research products are not equal in legal terms and require specific licenses. There are many different licenses available, each one having its own characteristics. Your choice of license partly depends on the nature of your research output. You need to identify the rights that apply to the different research outputs before awarding a license.
- There are free online tools that can guide you towards finding the most suitable license, considering your own needs and conditions.
- Even when you have found the license that matches your needs, your work is not over yet. You still have to check its readability.
- for humans: how your choice of license is expressed when landing on your research output page
- for machines: for interoperability
Slide 55 (Learning Activity): See Learning Activity notes for instructions for this exercise
Summary
- Applying a licence is of crucial importance for disseminating a research product.
- Choosing to apply a license is a decisive stage in the process of disseminating a research product. This choice has concrete consequences and should not be taken lightly.
- When faced with reluctance to use a license, it is necessary to insist that a license is not a constraint but a benefit for the researchers themselves and for their scientific community (protection of one’s rights, more citations, more re-use).
- Before applying licenses to research outputs like data, code or software, you have to balance many different factors like stakeholder requirements, ethical or legal obligations, regulations, policies and rights.
- Mitigation strategies can help to navigate these partially contradicting requirements and maintain the ability to share and license research outputs.
- You have to be aware of the benefits and risks of choosing one licensing pathway over another.
- Research products are not equal in legal terms and require specific licenses. There are many different licenses available, each one having its own characteristics. Your choice of license partly depends on the nature of your research output. You need to identify the rights that apply to the different research outputs before awarding a license.
- There are free online tools that can guide you towards finding the perfect license, considering your own needs and conditions.
- Even when you have found the license that perfectly matches your needs, your work is not over yet. You still have to check its readability.
- for humans: how your choice of license is expressed when landing on your research output page
- for machines: for interoperability
Suggested Reading
Please refer to the slides for a full list of references and links to tools presented during the training