Open Source Publishing

External Packages


The vast collection of available software is one of Python's core strengths. The standard library is extensive and covers a wide range of functionality, but it does not encompass everything a developer might need.

External packages like numpy, pandas, scipy, or scikit-learn not only

  • save development time by extending Python's capabilities, but also
  • provide optimized and well-tested functionalities, enhancing overall productivity.

💡 Do not reinvent the wheel, use external packages!


⚠️ In what follows, "package" refers to a "distribution package", i.e. a collection of software to be installed. This is different from an "import package", which is a container of modules. For more details, read this discussion.

⚠️ In the following we assume Python to be version 3.4 or above. If not, some details might change.


Installing External Packages

The simplest way to install an external package is via pip, Python's official package installer.
pip comes bundled with Python and is the standard way to install packages from the Python Package Index (PyPI).

Ensure that pip is available:

python3 -m pip --version

😭 Not the case? Check here.

pip install numpy

The above command downloads from PyPI and installs

  • the latest compatible version of numpy and
  • all its dependencies

in a "global" location (e.g. in /usr/lib/).

⚠️ In shared environments, users might not have permission for a "global" installation. Installing packages in user mode with the --user flag avoids this issue:

pip install --user numpy

If a specific version is required:

pip install numpy==1.26.2

Or a range of versions:

pip install "numpy>=1.20,<2"
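To make the meaning of such a version range concrete, the following is a minimal sketch of how a specifier like ">=1.20,<2" could be checked. It assumes purely numeric, dot-separated versions; pip itself implements the complete versioning rules, and the function name satisfies is hypothetical.

```python
import operator

# Comparison operators supported by this simplified sketch.
# Two-character operators come first so ">=" is not mistaken for ">".
_OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def _parse(version):
    """Turn "1.26.2" into (1, 26, 2); only numeric versions are handled."""
    return tuple(int(part) for part in version.split("."))


def _pad(a, b):
    """Pad the shorter tuple with zeros so that 2 compares like 2.0.0."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version, specifier):
    """Return True if `version` fulfils every comma-separated clause."""
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS.items():
            if clause.startswith(symbol):
                left, right = _pad(_parse(version), _parse(clause[len(symbol):]))
                if not compare(left, right):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


print(satisfies("1.26.2", ">=1.20,<2"))  # True
print(satisfies("2.0.0", ">=1.20,<2"))   # False
```

Both clauses must hold at the same time, which is why version 2.0.0 is rejected by the range above.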

💡 When installing a package, pip takes care of its dependencies for you!


Once installed, a package can be upgraded:

pip install --upgrade numpy

or uninstalled:

pip uninstall numpy

To list all the installed packages:

pip list
Virtual Environments


Packages might depend on other packages, often with strict version requirements. When installing multiple packages, version conflicts between these dependencies can arise.

Virtual environments are extremely helpful when working on multiple projects, as they manage project-specific dependencies.

By isolating your project's requirements, virtual environments

  • prevent version conflicts with other projects
  • enhance reproducibility of the results
  • simplify maintaining and managing dependencies
  • improve portability of your software stack

With the built-in venv module, creating a virtual environment is extremely easy:

python3 -m venv myenv

The command will create a myenv directory in the current directory which will be used as installation location.

💡 You can also specify a different location:

python3 -m venv /path/to/myenv
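The same can be done from within Python, since venv is a regular standard library module. The sketch below creates an environment in a temporary directory; the helper name create_environment is hypothetical, and pip is left out (with_pip=False) only to keep the example fast.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(env_dir):
    """Create a virtual environment, similar to `python3 -m venv env_dir`."""
    # with_pip=False keeps the sketch fast; the command line tool
    # installs pip into the new environment by default.
    venv.create(env_dir, with_pip=False)
    return Path(env_dir)


with tempfile.TemporaryDirectory() as tmp:
    env = create_environment(Path(tmp) / "myenv")
    # Every environment created by venv contains a pyvenv.cfg file.
    print((env / "pyvenv.cfg").is_file())
```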

💡 There are other external tools for dependency management: poetry and conda are popular ones. For simplicity, the course focuses only on the combination of pip and venv.


Once created, the environment can be activated with

$ source myenv/bin/activate

💡 On activation, (myenv) will appear next to your prompt.

Inside a virtual environment packages can be managed with pip normally:

(myenv)$ pip install pkg1 pkg2==0.0.1 "pkg3>1,<2"

Once done, deactivate the environment with deactivate.
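If it is unclear whether an environment is currently active, the interpreter itself can be asked. A small sketch, assuming an environment created with venv (the function name is hypothetical):

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv."""
    # In a venv, sys.prefix points to the environment directory,
    # while sys.base_prefix still points to the base installation.
    return sys.prefix != sys.base_prefix


print("prefix:", sys.prefix)
print("inside a virtual environment:", in_virtual_environment())
```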


It is good practice to store the dependencies of a project in a requirements.txt file:

(myenv)$ pip freeze > requirements.txt

The packages listed in requirements.txt can be installed in a new environment new_env simply by

(new_env)$ pip install -r requirements.txt

💡 The requirements.txt file allows collaborators to replicate the project's environment.
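To illustrate what ends up in such a file, the following sketch produces similar "name==version" lines with the standard library module importlib.metadata (available from Python 3.8). It is an approximation for illustration, not a replacement for pip freeze; the function name freeze is chosen here only for the analogy.

```python
from importlib.metadata import distributions


def freeze():
    """Return pinned requirement lines for all installed distributions."""
    lines = set()
    for dist in distributions():
        metadata = dist.metadata
        name = metadata.get("Name") if metadata else None
        version = metadata.get("Version") if metadata else None
        if name and version:  # skip distributions with broken metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


for line in freeze():
    print(line)
```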


Best practices

  • Use a virtual environment for each project to keep dependencies isolated.
  • Avoid installing unnecessary packages globally to maintain a clean environment.
  • Use and commit a requirements.txt file to track project dependencies.
  • Do not commit your virtual environment directory (e.g., myenv/) to Git!

💡 You can add the virtual environment directory to the .gitignore file to prevent the virtual environment from being tracked.

Open Source Software


🤔 What is Open Source Software?

Open source software is software with source code that anyone can inspect, modify, and enhance.
from opensource.com

Open Source Software vs. Free Software

Free software is open source, but not vice versa.

  • Free software: emphasis on the freedom to do anything with the software
  • Open source software: emphasis on the collaborative development process

Open access policy of the Max Planck Society

Making its scientists' research findings available for the benefit of the whole of humanity, free of charge whenever possible (Open Access), is a key aspiration of the Society.
from the Max Planck Open Access Policy

Generally speaking, all completed research results financed predominantly by public funds must be published in suitable, independent scientific media in good time. In particular, this includes making use of the opportunities offered by open access publications.
from the Rules of conduct for good scientific practice, Section 2.5

Publish your code


🤔 Why publish your code?

  • Present code related to a scientific publication
  • Get a review of your code by other experts
  • Let others reuse or extend your code to address further scientific questions
  • Request help for fixing bugs or implementing new features

Quasi-mandatory:

  • README.md: overview of the project
  • LICENSE or COPYING: copy and paste from license website
  • pyproject.toml: meta-data for the Python project

Optional:

  • CHANGELOG.md: high-level summary of changes in every release
  • CODE_OF_CONDUCT.md: how to interact among contributors (see, e.g., here)
  • CONTRIBUTING.md: how others can contribute, report bugs, propose changes
  • INSTALL.md: how to build the code

Readme file

💡 Even the best code is useless without a Readme file.

The Readme file should cover the following information:

  • Brief description of the code and its features
  • Usage information (with examples, screenshots, or plots)
  • Build and installation instructions (including a list of build dependencies)
  • Project status or roadmap (whether and how it is maintained)
  • Related projects (optional)
  • Known issues (optional)
  • License

Find some inspiration here.


License file

🚫 Disclaimer: We do not provide legal advice here.

Default case: No license

  • Code remains under copyright by law
  • No permission to use or modify the code

⚠️ Always choose a license and add a license file!


How to license your code

  1. Ask contributors for permission
  2. Select a suitable license (see, e.g., choosealicense.com)
  3. Check compatibility with all included libraries
  4. Add the license file to your repository: LICENSE or COPYING
  5. Optional: Add a license header to all source files

💡 The MIT license is a short and simple permissive license with conditions only requiring preservation of copyright and license notices.


CONTRIBUTING.md file

  • Document the Git workflow
  • Explain when to open a new issue and what information must be given
  • Mention whether you are open to merge requests (bugfixes and/or features?)
  • Outline the coding conventions and testing strategies
  • Add links to the communication channels

Software forges

💡 A forge is a collaborative development platform hosting repositories.

Many different software forges exist.

  • Most of them support Git
  • Nearly all provide merge requests, an issue tracker, and a wiki
  • Many are run by companies
  • Some can be self-hosted

Public: GitHub

  • Most popular option
  • Many features in the free account
  • Collaborators need a GitHub account
  • Owned by Microsoft Corp.

Public: GitLab

  • Quite the same functionality as GitHub
  • Open source itself
  • Free community edition for self-hosting
  • Owned by GitLab Inc.

MPG: GitLab @ MPCDF

  • All repositories stored in-house
  • All features are free
  • Shared runners for pipelines (various hardware, access to GPUs)
  • Docker images with environment modules (module load ...)
  • Collaborators need an MPCDF account (invite via Self Service)

Other options

Package your project


🤔 Why make a package out of a project?

Allow others to easily install and use the project


Packaging a Python project

From a high-level point of view, packaging a Python project means preparing its components (source code, related data, and configuration) to be built into a distribution format:

  • source distributions (sdists)
  • binary distributions (wheels)

Once the package has been built into a distribution format it can be installed via pip or uploaded to a package index like PyPI.


Anatomy of a Python project

The structure of the project myproj ready for distribution could look like this:

myproj/
├── docs/                      # documentation
├── src/                       # source code directory
│   └── mypkg/                 # package directory
│       ├── __init__.py
│       ├── module1.py
│       └── utils/
│           ├── __init__.py
│           └── helper.py
├── tests/                     # tests directory
│   └── test_module1.py
├── LICENSE                    # license file
├── pyproject.toml             # meta data
├── README.md                  # brief project documentation
└── requirements.txt           # dependencies
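As a starting point, the structure shown above can be generated with a few lines of Python. This is only a sketch that creates empty placeholder files in a temporary directory; the names SKELETON and create_skeleton are hypothetical.

```python
import tempfile
from pathlib import Path

# Files of the example project, relative to the project root.
SKELETON = [
    "src/mypkg/__init__.py",
    "src/mypkg/module1.py",
    "src/mypkg/utils/__init__.py",
    "src/mypkg/utils/helper.py",
    "tests/test_module1.py",
    "LICENSE",
    "pyproject.toml",
    "README.md",
    "requirements.txt",
]


def create_skeleton(root):
    """Create empty placeholder files for the layout shown above."""
    root = Path(root)
    (root / "docs").mkdir(parents=True, exist_ok=True)
    for relative in SKELETON:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    return root


with tempfile.TemporaryDirectory() as tmp:
    project = create_skeleton(Path(tmp) / "myproj")
    for path in sorted(project.rglob("*")):
        print(path.relative_to(project))
```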

To distribute your Python project:

  1. Organize the import packages
  2. Define the build configuration and dependencies in pyproject.toml
  3. [Recommended] Add tests
  4. [Recommended] Add documentation
  5. [Recommended] Add a license
  6. Generate distribution archives
  7. [Optional] Upload to a package index (e.g., PyPI)

Organize the import packages

There are two layouts for the import packages:

  • flat layout: config files and import packages are all in the top-level directory
    myproj/
    ├── mypkg/                       # package directory
    │   ├── __init__.py
    
  • src layout: import packages are in a subdirectory (typically called src)
    myproj/
    ├── src/                         # source code directory
    │   └── mypkg/                   # package directory
    │       ├── __init__.py
    

⚠️ To run the code, the src layout requires the installation of the project, and the flat layout does not.


Project configuration

In modern Python projects, metadata and build configurations are defined in the pyproject.toml file.

There are three possible sections (TOML tables) in this file serving different purposes:

  • [build-system] - build settings
  • [project] - project metadata and dependencies
  • [tool] - external tools configurations

Project configuration - [build-system]

The [build-system] table is strongly recommended as it defines the build backend to use.

For example, to use setuptools:

[build-system]
requires = ["setuptools >= 77.0.3"]
build-backend = "setuptools.build_meta"

⚠️ Both of these values will be provided by the documentation for your build backend. There should be no need for you to customize these settings.

💡 There are other build backends available like hatchling or uv_build.


Project configuration - [project]

The [project] table specifies the project’s basic metadata:

[project]
name = "myproj-unique-name"
version = "1.2.3"
authors = [
  { name="Max Planck", email="max.planck@mpg.de" },
]
description = "A small example package"
readme = "README.md"
requires-python = ">=3.9"
dependencies = ["other-pkg"]
classifiers = [
    "Programming Language :: Python :: 3",
    "Operating System :: OS Independent",
]
license = "MIT"
license-files = ["LICEN[CS]E*"]

[project.urls]
Homepage = "https://github.com/user/myproj"

Project configuration - [project]

[project]
name = "myproj-unique-name"

This is the distribution name of your package. It is a required field and must consist only of ASCII letters, digits, underscores (_), hyphens (-), and periods (.), and it must start and end with a letter or digit.

⚠️ The distribution name is case insensitive and should be unique in the package index!
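The naming rules can be sketched in a few lines. The validity pattern and the normalization (lowercasing and collapsing runs of -, _ and . into a single hyphen) follow the Python packaging name specification; the function names are hypothetical.

```python
import re

# Pattern for valid names from the Python packaging name specification.
_VALID_NAME = re.compile(
    r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE
)


def is_valid_name(name):
    """Return True if `name` is an allowed distribution name."""
    return _VALID_NAME.match(name) is not None


def normalize(name):
    """Lowercase the name and collapse runs of -, _ and . into one hyphen."""
    return re.sub(r"[-_.]+", "-", name).lower()


print(is_valid_name("myproj-unique-name"))  # True
print(is_valid_name("-myproj"))             # False
print(normalize("MyProj.Unique_Name"))      # myproj-unique-name
```

Two names that normalize to the same string refer to the same project in a package index, which is why uniqueness has to be checked on the normalized form.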


Project configuration - [project]

[project]
version = "1.2.3"

This field is also required and defines the package version.

💡 There are mainly two versioning schemes:

  • Semantic versioning uses three-part version numbers, major.minor.patch
  • Calendar versioning typically takes the form year.month
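A short sketch of both schemes, assuming the usual semantic versioning convention (major for incompatible changes, minor for new backwards-compatible features, patch for bug fixes). The helper names bump and calendar_version are hypothetical.

```python
from datetime import date


def bump(version, part):
    """Return the next semantic version after bumping the given part."""
    major, minor, patch = (int(number) for number in version.split("."))
    if part == "major":  # incompatible changes
        return f"{major + 1}.0.0"
    if part == "minor":  # new backwards-compatible functionality
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # backwards-compatible bug fixes
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def calendar_version(day):
    """Return a year.month calendar version for the given date."""
    return f"{day.year}.{day.month}"


print(bump("1.2.3", "patch"))              # 1.2.4
print(bump("1.2.3", "minor"))              # 1.3.0
print(bump("1.2.3", "major"))              # 2.0.0
print(calendar_version(date(2024, 5, 1)))  # 2024.5
```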

💡 Depending on the build backend the version can be defined dynamically from a package attribute, a file, or from a git tag.

For example, if the __init__.py file defines a __version__ attribute, setuptools can be configured to read it from there:

[project]
dynamic = ["version"]
# ...
[tool.setuptools.dynamic]
version = {attr = "mypkg.__version__"}
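The idea of reading the version from a package attribute can be illustrated as follows. The sketch extracts a top-level __version__ assignment from source text without importing the package; it is meant only as an illustration and is not how setuptools is implemented. The file content and the function name read_version are hypothetical.

```python
import ast


def read_version(source):
    """Find a top-level `__version__ = "..."` assignment without importing."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__version__":
                    return ast.literal_eval(node.value)
    raise ValueError("no __version__ assignment found")


# Hypothetical content of src/mypkg/__init__.py
INIT_PY = '''
"""Example package."""
__version__ = "1.2.3"
'''

print(read_version(INIT_PY))  # 1.2.3
```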

Project configuration - [project]

[project]
dependencies = [
  "numpy>=1.23.5; python_version<'3.12'",
  "numpy>=1.26.0; python_version>='3.12'",
  "pandas",
]
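The part after the semicolon is an environment marker: the requirement applies only if the marker is true for the installing interpreter. A hand-written sketch of the decision an installer makes for the two numpy entries above (the function name is hypothetical, and real installers evaluate markers generically):

```python
import sys


def numpy_requirement(version_info=sys.version_info):
    """Pick the numpy requirement whose python_version marker matches."""
    # python_version compares only the major.minor part of the version.
    if tuple(version_info[:2]) < (3, 12):
        return "numpy>=1.23.5"
    return "numpy>=1.26.0"


print(numpy_requirement((3, 11)))  # numpy>=1.23.5
print(numpy_requirement((3, 12)))  # numpy>=1.26.0
print(numpy_requirement())         # depends on the running interpreter
```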

💡 You can make some dependencies optional with optional-dependencies:

[project.optional-dependencies]
plot = ["matplotlib"]
dev = [
    "matplotlib",
    "pre-commit",
]

💡 An optional group such as dev can then be installed together with the package:

pip install "myproj-unique-name[dev]"
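Inside the package, code that relies on an optional dependency should cope with its absence. A minimal sketch, assuming matplotlib is the optional dependency of the plot group; the function names are hypothetical.

```python
import importlib.util


def has_plot_support():
    """Return True if the optional dependency matplotlib is importable."""
    return importlib.util.find_spec("matplotlib") is not None


def describe_plotting():
    """Give the user a hint on how to enable the optional feature."""
    if has_plot_support():
        return "plotting available"
    return 'plotting disabled; install "myproj-unique-name[plot]" to enable it'


print(describe_plotting())
```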

Project configuration - [tool]

The [tool] table is composed of tool-specific subtables, for example:

[tool.setuptools.dynamic]
version = {attr = "mypkg.__version__"}

[tool.ruff]
line-length = 88

[tool.ruff.lint]
extend-select = ["I", "W505"]

[tool.ruff.lint.pycodestyle]
max-doc-length = 88

Install your package locally

Once pyproject.toml is defined, the project can be installed via

pip install path/to/myproj

setuptools will find and install (copy into the default site-packages directory) all import packages in the src directory.

💡 In case of a flat layout, setuptools looks for import packages in the top-level directory.

⚠️ Any Python module not contained in the src directory will not be installed!


Install your package locally

During development, an editable installation can be useful to implement and test changes iteratively:

pip install -e path/to/myproj

🤓 With an editable installation, the import packages are not copied to site-packages; instead, a link to src is created.
💡 Changes in the Python source code become effective immediately without requiring a new installation.
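To check which kind of installation is in effect, it helps to look at where an import package is loaded from: a path inside site-packages indicates a regular installation, a path inside your project's src directory an editable one. A small sketch with a hypothetical helper; the standard library package json is used only as a stand-in.

```python
import importlib.util
from pathlib import Path


def module_location(name):
    """Return the file a top-level import package would be loaded from."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


# "json" is only a stand-in from the standard library;
# replace it with your own import package, e.g. "mypkg".
print(module_location("json"))
```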


Build distribution archives

To generate the distribution archives for the package run the following from the directory where pyproject.toml is located:

python3 -m pip install --upgrade build
python3 -m build

The above commands will generate a directory dist/:

dist/
├── myproj_unique_name-1.2.3-py3-none-any.whl
└── myproj_unique_name-1.2.3.tar.gz

Build distribution archives

You can extract and inspect the content of the

  • Wheel

    unzip myproj_unique_name-1.2.3-py3-none-any.whl
    ls mypkg/
    

    💡 It contains only the import packages and a .dist-info metadata directory!

  • Source distribution

    tar -xf myproj_unique_name-1.2.3.tar.gz
    ls myproj_unique_name-1.2.3/
    

    💡 It contains src, tests, README.md, LICENSE, pyproject.toml, etc.
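The archives can also be inspected without unpacking them, because a wheel is a ZIP archive and a source distribution is a gzipped tarball. The sketch below uses only the standard library; the helper name archive_members is hypothetical, and the demonstration builds a tiny stand-in wheel in a temporary directory instead of relying on an existing dist/ directory.

```python
import tarfile
import tempfile
import zipfile
from pathlib import Path


def archive_members(path):
    """List the file names stored in a wheel (.whl) or sdist (.tar.gz)."""
    path = str(path)
    if path.endswith(".whl"):
        # A wheel is a ZIP archive with a special file name.
        with zipfile.ZipFile(path) as wheel:
            return sorted(wheel.namelist())
    if path.endswith(".tar.gz"):
        with tarfile.open(path, "r:gz") as sdist:
            return sorted(sdist.getnames())
    raise ValueError(f"not a wheel or sdist: {path}")


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in archive for demonstration; a real wheel comes from dist/.
    demo = Path(tmp) / "demo-0.0.1-py3-none-any.whl"
    with zipfile.ZipFile(demo, "w") as wheel:
        wheel.writestr("mypkg/__init__.py", "")
    print(archive_members(demo))  # ['mypkg/__init__.py']
```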


Upload to PyPI

Install twine

python3 -m pip install --upgrade twine

Upload to TestPyPI:

python3 -m twine upload --repository testpypi dist/*

💡 Before uploading to TestPyPI, you will need to create an account and generate an API token.