Guidelines and HOWTOs/Licensing
Latest revision as of 21:52, 17 September 2024
This document explains how to state license information in KDE projects according to the KDE Licensing Policy. Stating licenses correctly is a very important task, because only correctly licensed software can be distributed and delivered to our users.
Introduction
Every source code file shall contain information about its license to inform the user of the file how it may be used. Such information is usually added in the form of source code comments at the top of the file. Traditionally, those comments were formulated as so-called "license headers", which are either the complete license texts or short texts that explain where to find the license text for the file along with some legal constraints. Nowadays, this system is being replaced with "SPDX Expressions", which should become the default way to state a license (see REUSE.software, https://reuse.software).
In a Nutshell
Whenever you add a new source file to a KDE repository, follow this checklist:
- The file must have a license defined, which shall be one of the licenses in the KDE Licensing Policy.
- You should add yourself to the copyright holders whenever you make a non-trivial contribution to a file.
- The copyright and license statements shall follow the REUSE.software specification (see below).
Thus, your file shall start with a license statement similar to:
/*
    SPDX-FileCopyrightText: [CURRENT YEAR] [YOUR NAME] <[YOUR MAIL ADDRESS]>
    [FURTHER CONTRIBUTORS]
    SPDX-License-Identifier: [LICENSE IDENTIFIER EXPRESSION FROM POLICY]
*/
REUSE.software
REUSE.software is an initiative by the Free Software Foundation Europe (FSFE) that provides recommendations to make licensing easier. Its guidelines state how to use SPDX identifiers to simplify license statements in source files. When following these recommendations, the correct statement of license information can be tested with the "reuse" Python tool, which checks the syntactical correctness of the license statements and the overall conformance with the REUSE specification.
SPDX Identifiers and Expressions
Software Package Data Exchange® (SPDX) is an open standard for communicating software bill of material information. The SPDX specification is developed by the SPDX workgroup, which is hosted by The Linux Foundation. The idea is to have a public registry of all open source licenses and important license exceptions, such that license statements can be reduced to simply stating short license identifiers.
Complex Expressions
Not all source code is licensed under just one license. For example, you might want to state that a file can be used under the terms of the BSD-2-Clause license or under the terms of the GNU General Public License version 2 or later. For these cases, the SPDX workgroup also provides a specification of how to state complex license statements (which we call "SPDX expressions"). The SPDX expression language also allows tooling-based syntax checks, which enables us to use tools to check the correctness of license statements.
For SPDX expressions that do not only consist of one SPDX identifier, the following keywords can be used (for details, see SPDX specification):
- OR
- AND
- WITH
These operators are listed above in order of increasing precedence (cf. SPDX Specification, Appendix IV): WITH binds most tightly, then AND, then OR. For example, for
GPL-3.0-only OR LGPL-2.1-only WITH Qt-LGPL-exception-1.1
the Qt-LGPL-exception-1.1 applies only when using the code under the LGPL-2.1 license.
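The precedence rule can be illustrated with a small Python sketch. It is not part of the reuse tool or of any SPDX library; the function name `group_spdx` is made up for this example, and the sketch assumes an expression without parentheses. It only shows how OR, AND and WITH group the identifiers:

```python
def group_spdx(expression):
    """Group a parenthesis-free SPDX expression by operator precedence.

    OR binds loosest, then AND, then WITH (tightest).
    Returns nested tuples such as ("OR", left, right).
    """
    tokens = expression.split()

    def split_on(operator, items, next_level):
        # Split the token list at every occurrence of `operator`,
        # then parse each part with the next (tighter) level.
        parts, current = [], []
        for token in items:
            if token == operator:
                parts.append(current)
                current = []
            else:
                current.append(token)
        parts.append(current)
        nodes = [next_level(part) for part in parts]
        result = nodes[0]
        for node in nodes[1:]:
            result = (operator, result, node)
        return result

    def parse_with(items):
        # Either "LICENSE WITH EXCEPTION" or a single license identifier.
        if len(items) == 3 and items[1] == "WITH":
            return ("WITH", items[0], items[2])
        if len(items) == 1:
            return items[0]
        raise ValueError(f"cannot parse: {' '.join(items)}")

    def parse_and(items):
        return split_on("AND", items, parse_with)

    return split_on("OR", tokens, parse_and)


print(group_spdx("GPL-3.0-only OR LGPL-2.1-only WITH Qt-LGPL-exception-1.1"))
# ('OR', 'GPL-3.0-only', ('WITH', 'LGPL-2.1-only', 'Qt-LGPL-exception-1.1'))
```

The exception ends up attached to LGPL-2.1-only alone, which matches the reading given above.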
SPDX Expression Examples
A REUSE-compliant license statement always states the copyright holders first and then specifies the license or licenses under which the source code can be used. The following example states that both Jane Doe and John Doe hold copyrights of the source code and that it can be used under the GNU Lesser General Public License version 2.1 or any later version of this license:
/*
    SPDX-FileCopyrightText: 2019 Jane Doe <[email protected]>
    SPDX-FileCopyrightText: 2019-2020 John Doe <[email protected]>
    SPDX-License-Identifier: LGPL-2.1-or-later
*/
License Statements in Source-Code Files
For source code files, it is simple: follow the recommendations from reuse.software (https://reuse.software/spec/) and add a comment at the top of the file that states the required SPDX tags, valid for the code of the complete file. The comment should be at the first possible position for a comment. For example, your file could begin with:
/*
    SPDX-FileCopyrightText: <year> <name> <contact-address>
    SPDX-FileCopyrightText: <year> <name> <contact-address>
    ...
    SPDX-License-Identifier: <SPDX-expression>
*/
For example, in script files the license comment comes directly after the shebang line if one is needed, and is otherwise the first thing in the file. In C or C++ header files the license comment starts at the very first line, before the include guard, so that the comment sits at the same location in header and source files and humans can compare them easily.
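The "first possible position" rule can be sketched in Python as follows. This is only an illustration under the assumption that the header text is already written in the comment syntax of the target file; the function name `add_header` and the example address are hypothetical:

```python
def add_header(source, header):
    """Return `source` with `header` placed at the first possible position.

    A leading shebang line ("#!...") stays first; otherwise the header
    becomes the very first lines of the file. `header` must end with a
    newline and already use the comment syntax of the file.
    """
    if source.startswith("#!"):
        shebang, _, rest = source.partition("\n")
        return shebang + "\n" + header + rest
    return header + source


# Placeholder header for a shell or Python script (hypothetical address).
HEADER = (
    "# SPDX-FileCopyrightText: 2019 Jane Doe <jane@example.org>\n"
    "# SPDX-License-Identifier: LGPL-2.1-or-later\n"
)

print(add_header("#!/bin/sh\necho hello\n", HEADER))
```

For a C or C++ header the same function simply prepends the comment, so it ends up before the include guard.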
SPDX-FileCopyrightText or Copyright Statement
A copyright statement shall always contain the name of the copyright holder, the year of publication and a contact address. According to the REUSE specification, there are several ways to state it correctly. In KDE we prefer, for simplicity, the following kind of statement:
SPDX-FileCopyrightText: 2019 Jane Doe <[email protected]>
For copyright statements, please follow these rules:
- Use "Copyright" or "SPDX-FileCopyrightText:" to state the copyright holders.
- State the copyright information in the order: year, name, contact address.
- Any contact address should be stated in angle brackets.
- The year of publication can be a single year, multiple years, or a span of years.
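A statement in the preferred form can be checked mechanically. The following Python sketch is not taken from the reuse tool; the regular expression is an assumption about what "year, name, contact address in angle brackets" means in practice, and the address in the example is a placeholder:

```python
import re

# Years (single year, span such as 2019-2020, or a comma-separated list),
# then the name, then the contact address in angle brackets.
COPYRIGHT_LINE = re.compile(
    r"^SPDX-FileCopyrightText: "
    r"(?P<years>\d{4}(?:-\d{4})?(?:, ?\d{4}(?:-\d{4})?)*) "
    r"(?P<name>[^<>]+?) "
    r"<(?P<address>[^<>\s]+)>$"
)


def parse_copyright(line):
    """Return the years, name and address of a statement, or None."""
    match = COPYRIGHT_LINE.match(line.strip())
    return match.groupdict() if match else None


print(parse_copyright("SPDX-FileCopyrightText: 2019-2020 John Doe <john@example.org>"))
# A statement with the year after the name is rejected:
print(parse_copyright("SPDX-FileCopyrightText: John Doe 2019 <john@example.org>"))
```

The check is stricter than the REUSE specification, which accepts several other forms; it only covers the form preferred here.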
SPDX-License-Identifier Statement
The SPDX-License-Identifier tag must be followed by a valid SPDX License Expression.
Here are several example statements from the KDE project. For the list of all allowed licenses in the KDE project, please see the KDE Licensing Policy:
SPDX-License-Identifier: LGPL-2.1-or-later
SPDX-License-Identifier: LGPL-2.1-only OR LGPL-3.0-only OR LicenseRef-KDE-Accepted-LGPL
SPDX-License-Identifier: LGPL-2.1-only WITH Qt-LGPL-exception-1.1
SPDX-License-Identifier: GPL-2.0-only OR GPL-3.0-only OR LicenseRef-KDE-Accepted-GPL
SPDX-License-Identifier: GPL-2.0-or-later
SPDX-License-Identifier: GPL-3.0-or-later
SPDX-License-Identifier: MIT
SPDX-License-Identifier: BSD-2-Clause
License Texts
Each repository shall contain a folder LICENSES/ in the root of the repository. In this folder, there shall be a license file for all (and only for those!) SPDX Identifiers that are used inside the project. The license file must be in plain text and state the license text.
For all SPDX identifiers and exception identifiers that are listed in the SPDX registry, use the reuse tool (https://github.com/fsfe/reuse-tool) to download the correct license text with:
reuse download <IDENTIFIER>
For example, in order to download the "LGPL-2.1-or-later" license file use the command reuse download LGPL-2.1-or-later, which places the file into the LICENSES/ folder of your project.
All SPDX identifiers that start with "LicenseRef-" are custom identifiers (cf. SPDX Specification, Section 6), which are not listed in the SPDX registry. The correct license file contents for
- LicenseRef-KDE-Accepted-LGPL
- LicenseRef-KDE-Accepted-GPL
are listed in the KDE Licensing Policy.
For details, check the REUSE.software specification, Section "License Files" (https://reuse.software/spec/#license-files).
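The "all and only those" rule for the LICENSES/ folder can be sketched as a set comparison. The Python below is an illustration, not a replacement for the reuse tool; the function names are hypothetical, and it assumes that each license file is named after its identifier with a ".txt" suffix:

```python
import re

# Words of the expression syntax that are not identifiers themselves.
OPERATORS = {"OR", "AND", "WITH"}


def identifiers_in(expression):
    """Return the license and exception identifiers named in an expression."""
    tokens = re.findall(r"[A-Za-z0-9.+-]+", expression)
    return {token for token in tokens if token not in OPERATORS}


def compare_with_licenses_dir(expressions, license_files):
    """Return (identifiers without a file, files without any use)."""
    used = set()
    for expression in expressions:
        used |= identifiers_in(expression)
    # Assumption: files are named "<identifier>.txt".
    present = {
        name[:-4] if name.endswith(".txt") else name for name in license_files
    }
    return sorted(used - present), sorted(present - used)


missing, unused = compare_with_licenses_dir(
    ["LGPL-2.1-only WITH Qt-LGPL-exception-1.1", "MIT"],
    ["MIT.txt", "LGPL-2.1-only.txt", "BSD-2-Clause.txt"],
)
print(missing)  # ['Qt-LGPL-exception-1.1']
print(unused)   # ['BSD-2-Clause']
```

Both lists should be empty for a repository that follows the rule.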
License Statements in Non-Source-Code Files
For source code files, it is simple to add a comment at the top of the file that states the required SPDX tags. For files that do not contain source code, this is slightly more complicated:
UI Files (*.ui)
You can add the copyright information inside the <author> tag.
<?xml version="1.0" encoding="UTF-8"?>
<ui version="4.0">
 <author>
    SPDX-FileCopyrightText: none
    SPDX-License-Identifier: GPL-3.0-or-later
 </author>
...
DocBook Files (*.docbook)
For now, add an XML comment at the top of the file after the DOCTYPE declaration. This has to duplicate the copyright information given in the respective DocBook tags elsewhere. This could look like:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//KDE//DTD DocBook XML V4.5-Based Variant V1.1//EN"
"dtd/kdedbx45.dtd" [
]>
<!--
    SPDX-FileCopyrightText: Author <email>
    SPDX-License-Identifier: GFDL-1.2-or-later
-->
...
Other XML Files (appdata.xml, *.qrc)
Just add an XML comment at the top of the file after the XML declaration. This could look like:
<?xml version="1.0" encoding="utf-8"?>
<!--
    SPDX-FileCopyrightText: none
    SPDX-License-Identifier: CC0-1.0
-->
<component type="desktop">
...
License Compatibility
Unfortunately, not every license can be combined with every other license. This is because licenses may contain contradicting requirements that a licensee cannot fulfill at the same time. Thus, it is important to choose licenses that are compatible with each other.
This topic is discussed in depth by, for example, the Free Software Foundation (https://www.gnu.org/licenses/license-compatibility.html). For the scope of this howto it is usually enough to remember:
- All source code files that are compiled into a binary artifact must be compatible with each other.
- A binary artifact is a (shared) library, a plugin or an executable.
- If you dynamically link an application under any GPL license to a library under any LGPL, BSD or MIT license, the combination is compatible.
- For combinations of differently licensed files, see the essay by David Wheeler (https://dwheeler.com/essays/floss-license-slide.html), which forms the basis of the license compatibility matrix in the outbound license check generator.
License Tooling
REUSE Compliance Checking
As we follow the reuse.software specification, we can use their compliance tool to check the correctness of license statements. The tool is available via https://github.com/fsfe/reuse-tool or can simply be installed via pip:
pip3 install reuse
For details, see the README.md file. The most important options are:
- download : Download the specified license into the LICENSES/ directory.
- lint : Verify the project for REUSE compliance.
Please note that a positive result from this tool only indicates that the license statements were added in a reasonable way, not that the chosen licenses themselves are reasonable (see license compatibility).
Outbound License Checking
The outbound license in this context is the license of a specific binary artifact (in contrast to the inbound license, which is the license of the source code files). To make sure that the individual file licenses are compatible with the desired outbound license, the compatibility of licenses has to be checked. An easy way to do this is to use the Outbound-License-Check Generator in Extra-Cmake-Modules (not yet released, probably part of KF5 5.75).
Conversion from Traditional License Headers to SPDX Expressions
In KDE, we have a tool called license-digger, which is prepared specifically for the license headers typically used in KDE projects. You can simply run it over a repository that you want to convert, review the changes it made and create a merge request.
For details about its usage and about extending it for headers it does not detect, see its README.md.
Frequently Asked Questions
Shall I add a COPYING or COPYING.LIB file to my repository?
No. Those files shall be replaced completely by files with standardized names and canonical license texts in the LICENSES/ subfolder of your repository. All files in that folder shall follow the REUSE specification.
The discussion took place at https://phabricator.kde.org/T12730.
Which copyright holder shall I add to a CC0-1.0 file?
For a non-copyrightable file there cannot be a copyright holder. However, the REUSE project expects you to make a conscious decision about it. You can state this by adding an SPDX-FileCopyrightText statement that says nobody holds copyright on this file, e.g.:
SPDX-FileCopyrightText: none
SPDX-License-Identifier: CC0-1.0
The REUSE.software project discusses uncopyrightable software in their FAQ (https://reuse.software/faq/#uncopyrightable).
How do I state the copyright information only for a code/text snippet?
If some part of a file is licensed differently from the rest, e.g. because it was copied from somewhere else, you might want to explicitly state a license for only this part of the code. A good reason for this might be that you want to simplify possible later license changes of the whole file; if you do not mark the snippet as licensed differently, the overall license information of the file applies.
When marking a snippet, it is always important to clearly mark its beginning and its end and to add precisely the same information as when documenting the copyright and license constraints for the whole file.
However, as of today (2020-10), it is still under discussion both in the REUSE project and the SPDX specification how to best specify this information. The latest discussion can be found in this mailing list thread: https://lists.spdx.org/g/Spdx-legal/message/2852.
So, for now, the best advice is to state the snippet's beginning and end and, in between, to name the copyright holders and the license.
How can I best give credit to somebody in a copyright header?
Copyright statements are sometimes used to give praise for somebody's work or to thank somebody who made it possible that people could work on the file. Such statements are completely fine to put there! Yet, we have a few best practices that make license tooling easier and that also make such statements more readable:
- Put a copyright statement and a "thank you" notice or other additional information on separate lines, because license tag scanners can then find the tagged statements more reliably.
- Make it clear to which copyright holders the statement belongs by inserting empty lines inside the copyright header.
An example could look like:
SPDX-License-Identifier: LGPL-2.1-or-later

Project idea and initial maintainer:
SPDX-FileCopyrightText: Contributor A <...>

We thank some company XY who sponsored working on this library:
SPDX-FileCopyrightText: Contributor B <...>
SPDX-FileCopyrightText: Contributor C <...>
I found a license declaration for a GPL license but there is no number, which is it?
If there is a license statement for a file that states that the file is licensed under the GNU General Public License but does not state any license version number, then you can refer to §9 of GPL-2.0 (https://www.gnu.org/licenses/old-licenses/gpl-2.0.html). That section states that in such a case, without a stated license version number, any version of the GPL applies. I.e. you can translate such a header to GPL-2.0-or-later.
How do I quickly get a list of contributors that have contributed to a certain file?
For existing projects that have not yet been made SPDX compliant, it should be possible to find all the users that have contributed to a certain file with a bash command or script such as the following:
git log --date=format:%Y --format='%ad %an <%ae>' -- fileyouwant.txt | sort --unique
Replace the fileyouwant.txt placeholder accordingly.
The above bash command will result in a list of contributors in the format "YEAR Full Name <[email protected]>" separated by a newline, with any duplicates removed.
This only works to a certain extent. If the file was migrated from a previous version control system to git and history was lost, it is no longer possible to retrieve the necessary information by this means. Similarly, if the file has been moved from one repository to another, it is unlikely that its history was preserved. In that case, you must look through the history of the file to find out its origin, and track the list of contributors from its original repository.
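Once such a list of "YEAR Full Name <address>" lines exists, it can be folded into one copyright statement per contributor. The Python sketch below is only a starting point: the function names and the addresses are hypothetical, and the result still needs a manual review, since not every commit is a non-trivial contribution:

```python
from collections import defaultdict


def format_years(years):
    """Format {2019, 2020, 2022} as "2019-2020, 2022"."""
    runs = []
    for year in sorted(years):
        if runs and year == runs[-1][1] + 1:
            runs[-1][1] = year  # extend the current run of consecutive years
        else:
            runs.append([year, year])
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


def copyright_lines(log_lines):
    """Turn "YEAR Name <address>" lines into SPDX-FileCopyrightText lines."""
    years_by_holder = defaultdict(set)
    for line in log_lines:
        year, _, holder = line.strip().partition(" ")
        if year.isdigit() and holder:
            years_by_holder[holder].add(int(year))
    return sorted(
        f"SPDX-FileCopyrightText: {format_years(years)} {holder}"
        for holder, years in years_by_holder.items()
    )


for statement in copyright_lines([
    "2019 Jane Doe <jane@example.org>",
    "2020 Jane Doe <jane@example.org>",
    "2020 John Doe <john@example.org>",
]):
    print(statement)
# SPDX-FileCopyrightText: 2019-2020 Jane Doe <jane@example.org>
# SPDX-FileCopyrightText: 2020 John Doe <john@example.org>
```

Consecutive years are merged into a span, and gaps are kept as a comma-separated list so that no year without a contribution is claimed.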