Git for collaborative projects
Different layers of complexity
Tracking the changes in a "linear" fashion
Ideal for projects with a simple codebase where one person works on the project, or where there is minimal need for collaboration.
All work is done on a single branch, usually the main branch.
Every commit is added directly to the tip of the main branch, creating a linear history.
Since there is only one branch, there is no need to deal with branches or merging.
Typical workflow: change/add/delete files → git add → git commit → git push
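To make the idea of a linear history concrete, here is a minimal Python sketch. It is a simplified model, not how Git stores objects internally: each commit only points to its parent, and the branch is just a reference to the newest commit. The class and function names and the short SHAs are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """A commit in this toy model: an id, a message and a link to its parent."""

    sha: str
    message: str
    parent: Optional["Commit"] = None


def commit(tip: Optional[Commit], sha: str, message: str) -> Commit:
    """Create a commit on top of the current tip and return it as the new tip."""
    return Commit(sha=sha, message=message, parent=tip)


def history(tip: Optional[Commit]) -> list[str]:
    """Follow parent links from the tip back to the root, newest first (like git log)."""
    messages = []
    while tip is not None:
        messages.append(tip.message)
        tip = tip.parent
    return messages


# Single-branch workflow: every commit is added directly to the tip of main.
main = None
main = commit(main, "a1", "Initial commit")
main = commit(main, "b2", "draft of train script")
main = commit(main, "c3", "added the dataset handling")

print(history(main))
# ['added the dataset handling', 'draft of train script', 'Initial commit']
```

Because every new commit has exactly one parent, namely the previous tip, the history is a single chain with no branches to merge.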
Workflow with branches and Merge Requests
This workflow involves creating separate branches for different features, bug fixes, or experiments.
Changes are reviewed and integrated into the main branch through merge requests (called Pull Requests on GitHub).
Use cases:
Collaborative projects: Ideal for teams where multiple developers work on different features simultaneously.
Complex projects: Suitable for projects where features are developed independently and integrated later.
Workflow Example
Create a branch: Create a new branch for a feature or bugfix using git checkout -b feature/new-feature.
Introduce changes and commit them: change/add/delete files → git add → git commit -m "commit message"
Push the branch: Push the branch to the remote repository using git push origin feature/new-feature.
Create a Merge Request: Open a merge request to merge the branch into the main branch.
Code review: Review the changes in the merge request, and discuss or request changes if necessary.
Merge: Once approved, merge the merge request into the main branch.
Delete the branch: Optionally, delete the feature branch after merging.
Forking repositories:
A fork is a copy of a repository that lives in your GitLab/GitHub account.
It allows you to freely experiment and make changes without affecting the original repository.
Forks are essential when you want to contribute to a project but do not have write access to the original repository.
When you "fork" a project, GitLab/GitHub makes a copy of the project that is entirely yours; it lives in your namespace, and you can push to it.
How to use forking to contribute to an open-source project:
Fork the original repository on GitHub/GitLab.
Clone your fork to your local machine:
Create a topic branch from the main branch.
Make some commits to improve the project.
Open a Merge Request.
Discuss, and optionally continue committing.
The project owner merges or closes the Merge Request.
Sync the updated main branch back to your fork.
Branching - Basic concepts
Mastering Branching and Merging
Branching
Branching means that you diverge from the main line of development and continue to do work without messing with that main line (or any other line of development that exists):
Each branch maintains its own independent history.
Changes in one branch don't impact other branches until you explicitly merge them.
A branch can be created at any time and from any other branch (usually from a "parent" branch like main or dev).
An analogy: branches are like alternative timelines.
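The timeline analogy can be sketched with a toy Python model, assuming a simplified view in which a branch is just a name pointing at a commit and each commit points to its parent (real Git commits also store a snapshot of all files). Creating a branch copies the pointer; new commits then move only the pointer of the branch you commit on, so the two histories diverge after their common ancestor. All names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Toy commit: a message and a link to the parent commit."""

    message: str
    parent: Optional["Commit"] = None


def log(tip: Optional[Commit]) -> list[str]:
    """Messages from the given commit back to the first one, newest first."""
    out = []
    while tip is not None:
        out.append(tip.message)
        tip = tip.parent
    return out


# A branch is just a name pointing at a commit.
branches: dict[str, Commit] = {}
branches["main"] = Commit("Initial commit")
branches["main"] = Commit("draft of train script", branches["main"])

# Creating a branch copies the pointer, not the history.
branches["feature/eval"] = branches["main"]

# Committing on feature/eval moves only that branch's pointer...
branches["feature/eval"] = Commit("draft of evaluation", branches["feature/eval"])
# ...while main keeps evolving independently.
branches["main"] = Commit("added the dataset handling", branches["main"])

print(log(branches["main"]))
# ['added the dataset handling', 'draft of train script', 'Initial commit']
print(log(branches["feature/eval"]))
# ['draft of evaluation', 'draft of train script', 'Initial commit']
```

Both branches share the "draft of train script" commit; everything after it exists in only one timeline until the branches are merged.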
Merging & merge conflicts
Merging
Merge conflicts
What is a merge conflict?
A merge conflict occurs when competing changes contradict one another, so Git cannot decide automatically which version to keep.
When can a merge conflict occur?
When more than one person changes the same line in a file and tries to merge the change to the same branch.
When a developer deletes a file, but another developer edits it, and they both try to merge their changes to the same branch.
When a developer deletes a line, but another developer edits it, and they both try to merge their changes to the same branch.
etc...
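The first case (two people editing the same line) can be illustrated with a deliberately simplified three-way merge in Python. It assumes that the three versions (the common ancestor, "ours" and "theirs") have the same number of lines, i.e. lines were only edited, never inserted or deleted; real Git first aligns the versions with a diff algorithm. Git writes similar conflict markers into the file, labeled with HEAD and the name of the branch being merged instead of "ours"/"theirs".

```python
def merge_lines(base, ours, theirs):
    """Simplified three-way merge of three versions of a file (lists of lines).

    Assumes lines were only edited in place (no insertions or deletions).
    Returns (merged_lines, has_conflict).
    """
    if not len(base) == len(ours) == len(theirs):
        raise ValueError("this sketch only handles in-place line edits")
    merged, has_conflict = [], False
    for b, o, t in zip(base, ours, theirs):
        if o == t:  # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:  # only "theirs" changed the line: take their version
            merged.append(t)
        elif t == b:  # only "ours" changed the line: keep our version
            merged.append(o)
        else:  # both changed the same line differently: conflict
            has_conflict = True
            merged += ["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"]
    return merged, has_conflict


base = ["lr = 0.01", "epochs = 10"]
ours = ["lr = 0.001", "epochs = 10"]  # we changed the learning rate
theirs = ["lr = 0.1", "epochs = 20"]  # they changed both lines

merged, has_conflict = merge_lines(base, ours, theirs)
print(has_conflict)  # True
print("\n".join(merged))
```

The epochs line merges cleanly because only one side changed it, while the learning-rate line conflicts because both sides changed it differently and a human has to choose.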
How to avoid merge conflicts:
Isolate changes:
Change code in small, isolated steps.
Merge the changes into the main branch frequently.
Minimize overlap:
Avoid simultaneous edits on the same file or lines.
If you need to make large changes or refactor shared code, communicate with the team, and try to schedule it for a time when fewer people are working on those files.
Regularly pull the latest changes from the main branch (or from whichever branch you share with collaborators).
Branching and merging in practice
Now, let's see this in practice in our repo:
First, let's check which branch we're currently on by running git branch:
Create a branch called feature/eval and switch into that branch by running git checkout -b feature/eval:
$ git checkout -b feature/eval
Switched to a new branch 'feature/eval'
Now, if we check which branch we're on, we'll see that we've switched to the feature/eval branch:
$ git branch
* feature/eval
main
Now, let's commit the changes as usual:
$ git add src/eval.py
$ git commit -m "draft of evaluation"
[feature/eval 7134380] draft of evaluation
1 file changed, 10 insertions(+)
create mode 100644 src/eval.py
Finally, let's push our changes:
$ git push --set-upstream origin feature/eval
Enumerating objects: 10, done.
Counting objects: 100% (10/10), done.
Delta compression using up to 16 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (7/7), 750 bytes | 750.00 KiB/s, done.
Total 7 (delta 1), reused 0 (delta 0), pack-reused 0
remote:
remote: To create a merge request for feature/eval, visit:
remote: https://gitlab.com/mockusername/cnn_training/-/merge_requests/new?merge_request%5Bsource_branch%5D=feature%2Feval
remote:
To gitlab.com:mockusername/cnn_training.git
* [new branch] feature/eval -> feature/eval
Branch 'feature/eval' set up to track remote branch 'feature/eval' from 'origin'.
Note that we run git push --set-upstream origin feature/eval instead of the usual git push.
If we had run a plain git push, we would have received an error like this:
$ git push
fatal: The current branch feature/eval has no upstream branch.
To push the current branch and set the remote as upstream, use
git push --set-upstream origin feature/eval
That's because the new local branch has no upstream branch configured: feature/eval doesn't "exist" in the remote repository yet.
When you create a new branch locally, Git doesn't automatically know where to push it on the remote repository unless you specify it explicitly.
After running git push --set-upstream origin feature/eval once, future git push and git pull commands on this branch work without these additional arguments, because Git now knows the upstream branch.
Now, let's examine our Git repository:
Branch main
Branch feature/eval
You'll see that while the main branch only contains the dataset.py and train.py scripts, the feature/eval branch also contains the newly created eval.py script.
Let's now merge the branch back into main!
Let's go over the fields that are not self-explanatory:
Assignee: the person who worked on the feature/issue and who is in charge of merging the merge request after receiving comments and change requests from other maintainers.
Reviewer: someone you want to review the code.
Squash commits:
Squashing takes all the commits in the branch (F, G, H) and melds them into a single commit. That commit is then added to the history, but none of the individual commits that made up the branch are preserved.
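The net effect of squashing can be sketched in Python with a hypothetical model in which each commit is a mapping from file path to new content (None meaning the file was deleted). Squashing F, G and H yields one change that, applied to main, produces the same files as applying the three commits one after another; only the intermediate steps are lost.

```python
from typing import Optional

# Toy model: a commit is a change {path: new content}; None means "file deleted".
# A snapshot is the full set of files {path: content}.
Change = dict[str, Optional[str]]
Snapshot = dict[str, str]


def apply(snapshot: Snapshot, change: Change) -> Snapshot:
    """Return a new snapshot with the change applied."""
    result = dict(snapshot)
    for path, content in change.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


def squash(commits: list[Change]) -> Change:
    """Meld several commits into one change with the same net effect."""
    combined: Change = {}
    for change in commits:
        combined.update(change)  # a later commit wins for the same path
    return combined


main = {"src/train.py": "v1"}
F = {"src/eval.py": "draft"}
G = {".gitignore": "*.pyc"}
H = {"src/eval.py": "final"}

squashed = squash([F, G, H])
print(squashed)  # {'src/eval.py': 'final', '.gitignore': '*.pyc'}
# Same resulting files as applying F, G and H one after another:
print(apply(main, squashed) == apply(apply(apply(main, F), G), H))  # True
```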
Reviewing the changes:
Once the merge request is created, ask someone to review your changes before merging them into the desired branch.
Guidelines for reviewing the changes:
Be kind: review the code, not the person.
Don't just criticize, give guidance: don't just highlight what is wrong; suggest how it could be improved.
Explain your reasoning: make the "why" behind suggestions clear.
Label comment severity: distinguish blocking issues from non-blocking suggestions or personal preferences.
Accept good explanations: if the author justifies a decision well, don't force a change.
Awesome! Your changes are now applied to the main branch, which also contains your eval script:
Branching: Useful commands
Switch into an existing local branch branch_name : git checkout branch_name
Switching to a branch that exists on the remote but doesn't exist locally yet:
First fetch the remote branches with git fetch, then list all branches, including the remote-tracking ones, with git branch -a:
$ git branch
* feat/slides
master
$ git fetch
$ git branch -a
* feat/slides
master
remotes/origin/HEAD -> origin/master
remotes/origin/feat/ex2
remotes/origin/feat/slides
remotes/origin/master
Check out the remote branch as a new local branch:
either with checkout: git checkout -b 'feat/ex2' 'remotes/origin/feat/ex2'
or with switch: git switch -c feat/ex2 remotes/origin/feat/ex2
Branching: Useful commands
Show the history of changes with git log:
As the output is quite verbose, you can define an alias for this command with various parameters to "prettify" your output.
For that, run the following command:
git config --global alias.lg "log --all --graph \
--pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset \
%s %Cgreen(%an, %ad)%Creset' --abbrev-commit"
Now, executing git lg in our repo will produce the following output:
* 4bcaf7a - (HEAD -> main, origin/main, origin/HEAD) Merge branch 'feature/eval' into 'main' (Test User, Fri May 23 14:25:37 2025 +0000)
|\
| * 8c3d83c - (origin/feature/eval, feature/eval) draft of evaluation (John Doe, Fri May 23 16:19:36 2025 +0200)
| * 39ccc3f - gitignore added (John Doe, Fri May 23 16:18:34 2025 +0200)
|/
* d1f690b - added the dataset handling (John Doe, Fri May 23 13:02:06 2025 +0200)
* 4334cba - draft of train script (John Doe, Fri May 23 11:29:47 2025 +0200)
* 8dafb80 - Initial commit (Test User, Fri May 23 08:02:05 2025 +0000)
Good practices: Branch naming conventions
Branch naming conventions
E.g., as listed here:
feature/ or feat/: For developing new features.
bugfix/ or fix/: To fix bugs in the code. Often created in association with an issue.
hotfix/: To fix critical bugs in production.
docs/: Used to write, modify, or correct documentation.
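A naming convention is easiest to follow when it is checked automatically, e.g. in a local hook or a CI job. Below is a minimal Python sketch of such a check. The prefixes come from the list above; the rule that the rest of the name consists of lowercase words separated by hyphens (as in feature/new-feature) is an assumption, so adapt the regular expression to your team's convention.

```python
import re

# Allowed prefixes from the convention above. The part after the slash is
# assumed to be lowercase words/numbers separated by hyphens (an assumption).
BRANCH_NAME = re.compile(r"(feature|feat|bugfix|fix|hotfix|docs)/[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_branch_name(name: str) -> bool:
    """True if the whole name matches the convention."""
    return BRANCH_NAME.fullmatch(name) is not None


for name in ["feature/new-feature", "fix/issue-42", "docs/readme", "my-branch", "Feature/Eval"]:
    print(f"{name}: {is_valid_branch_name(name)}")
```

In a hook or CI job, the current branch name could be obtained with git rev-parse --abbrev-ref HEAD and passed to this function; long-lived branches such as main would need to be exempted.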
Good practices: pre-commit hooks
pre-commits:
What are pre-commit hooks?
A Git mechanism that runs specified code before committing the changes.
Why use them?
Enforces consistent coding standards across the team.
Prevents common mistakes and errors before they enter the codebase.
What can they do?
Validate the code
Check for formatting errors
Perform custom checks based on project needs
Reject commits if issues are detected
Pre-commit hooks in practice
When you run git commit, Git first runs the .git/hooks/pre-commit executable, which checks the files you staged.
If any of the checks fail, then the commit is aborted.
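Git only looks at the exit status of the hook: if it exits with a non-zero status, the commit is aborted. When hooks are managed by the Python pre-commit package, each hook is additionally called with the names of the staged files as arguments. The following is a minimal sketch of a custom check written in that style; the rule itself (rejecting leftover breakpoint() or pdb.set_trace() calls) and the script name no_debugger.py are hypothetical examples.

```python
import sys
from pathlib import Path

# Hypothetical rule: leftover debugger calls must not be committed.
FORBIDDEN = ("breakpoint()", "pdb.set_trace()")


def main(filenames: list[str]) -> int:
    """Check the given files; return 1 if any forbidden call is found, else 0."""
    failed = False
    for filename in filenames:
        text = Path(filename).read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(token in line for token in FORBIDDEN):
                print(f"{filename}:{lineno}: leftover debugger call")
                failed = True
    return 1 if failed else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Called as: python no_debugger.py file1.py file2.py ...
    # A non-zero exit status makes Git abort the commit.
    sys.exit(main(sys.argv[1:]))
```

With the pre-commit package, a script like this could be registered as a hook from a repo: local entry in .pre-commit-config.yaml.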
Pre-commit hooks in practice
Python package pre-commit:
An interface that simplifies the creation and use of pre-commit hooks.
Works with any project, not just Python-based projects.
No need to write custom hooks: simply select pre-existing ones!
Pre-commit hooks in practice
Let's create simple pre-commit hooks
Install Python's pre-commit package:
pip install pre-commit
Create a .pre-commit-config.yaml in your project directory:
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.5.0
    hooks:
      - id: ruff
        args: [ --fix ]
      - id: ruff-format
Pre-commit hooks in practice
Set up the git hook scripts:
pre-commit install
(Optional) See how it works by running it against all files:
pre-commit run --all-files
Pre-commit hooks in practice
Defining .pre-commit-config.yaml file
Pre-commit hooks in practice
Bypassing pre-commit hooks
If necessary, you can bypass pre-commit hooks during a commit:
git commit --no-verify -m "<your commit message>"
Use --no-verify only when absolutely necessary, such as during emergency fixes or when hooks are malfunctioning. Regular use may lead to inconsistent code quality.
Pre-commit hooks: key takeaways
Catch issues early: Automatically detect and prevent common errors before they enter your codebase.
Enforce consistency: Maintain uniform code formatting and style across the team.
Customizable & extendable: Leverage community-maintained hooks or create custom ones tailored to your project's needs.
Pre-commit hooks are only active if installed!
Pre-commit hooks are not enforced by default. Each contributor must run pre-commit install to activate them in their local Git environment.
CI/CD pipelines in GitLab
What is CI/CD?
A continuous software development method where code changes are automatically built, tested, deployed, and monitored
These automated steps are organized into CI/CD pipelines.
The pipeline is defined by you and can be configured to be triggered by various actions, e.g.
After each commit is pushed
For merge requests
For certain branches
One could e.g. prevent merging into the main branch if the CI/CD pipeline fails, as described here
CI = Continuous Integration:
the practice of automatically building and testing each specified change.
Examples in Python projects:
"Building":
Create a virtual environment & install all dependencies
Build a Python package you're developing
Install other external tools if applicable
Testing:
Run a defined test suite, e.g. via pytest (unit tests, integration tests, etc.)
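For illustration, a test suite run by CI could contain a file like the tests/test_unit.py sketched here. The function normalize is hypothetical and is defined inline only to keep the example self-contained; in a real project it would be imported from your package. pytest collects functions whose names start with test_ from files named test_*.py, and a test fails if an assertion in it fails or an unexpected exception is raised.

```python
# tests/test_unit.py (hypothetical contents)
import math


def normalize(values: list[float]) -> list[float]:
    """Hypothetical function under test: scale values so they sum to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]


def test_normalize_sums_to_one():
    assert math.isclose(sum(normalize([1.0, 2.0, 5.0])), 1.0)


def test_normalize_keeps_proportions():
    assert normalize([2.0, 6.0]) == [0.25, 0.75]


def test_normalize_rejects_zero_sum():
    try:
        normalize([0.0, 0.0])
    except ValueError:
        return  # expected
    raise AssertionError("expected a ValueError")
```

In a real test file, the last test would typically use pytest.raises(ValueError) as a context manager instead of the explicit try/except.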
Automated testing:
Helps detect errors early.
Prevents oversights, like skipping tests, thus improving code reliability.
Makes developers aware of which commits work and who made them, so problem areas can be identified quickly.
Automated builds:
Enhance reproducibility.
Check after every change that the project still builds in a clean environment, which helps eliminate the "it works on my machine" problem.
Help you catch missing dependencies (e.g., if you forgot to include a required library).
Allow testing the code in multiple environments.
=> Improved collaboration, code reliability and reproducibility
CD = Continuous Deployment / Delivery
Extension of Continuous Integration
Once the code has been built and tested as part of the CI process, CD automatically deploys the code changes to a testing and/or production environment.
Examples in Python projects:
Publish a package to PyPI (so it can be installed with pip)
Create and publish documentation for the package you're developing
Deploy software (e.g. web app / API endpoint)
How CI/CD works in GitLab
A CI/CD pipeline consists of jobs, which are lists of tasks to be executed.
Jobs are organized into stages, which define the sequence in which the jobs run.
Jobs are executed on GitLab runners:
daemons running on another server
Types of GitLab runners:
GitLab-hosted runners:
available on gitlab.com with monthly usage quotas
Custom-hosted runners:
managed by organizations for their own GitLab instances, e.g. the MPCDF GitLab service:
MPCDF provides its own runners for the MPCDF GitLab service
only available if your repository is hosted on gitlab.mpcdf.mpg.de
Self-managed runners:
installed, configured, and managed in your own infrastructure.
Creating a CI pipeline: .gitlab-ci.yml
In your repository, create a .gitlab-ci.yml file:
(minimal example)
default:
  image: python:3.12
  before_script:
    - pip install -r requirements.txt
run_tests:
  script:
    - echo "This is the test stage"
    - pytest tests/test_unit.py
Let's break it down step by step!
(Optional) Define the stages (e.g. in our case, it will only be the test stage)
stages:
  - test
(Optional) Define the variables:
variables:
  USERNAME: "mockusername"
  PROJECT_NAME: "my-demo"
Variables can be defined at various levels:
At the top level: available to all jobs unless a job overrides them.
Within a specific job: only accessible within that job's script, before_script, or after_script.
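The precedence rule above can be sketched in Python as a dictionary merge in which job-level values win. This is a simplified model: GitLab knows further places to define variables (e.g. the project's CI/CD settings) that are ignored here, and the job-level override of PROJECT_NAME is a hypothetical example. Inside a job, the variables are exposed as environment variables, so a script line such as echo "Project: $USERNAME/$PROJECT_NAME" sees the merged values; string.Template mimics that $NAME expansion.

```python
from string import Template

# Top-level variables from the example above.
top_level = {"USERNAME": "mockusername", "PROJECT_NAME": "my-demo"}
# Hypothetical job-level variables: this job overrides PROJECT_NAME.
job_level = {"PROJECT_NAME": "my-demo-docs"}


def effective_variables(top: dict[str, str], job: dict[str, str]) -> dict[str, str]:
    """Job-level values win over top-level values with the same name."""
    return {**top, **job}


effective = effective_variables(top_level, job_level)
# Mimic how the shell expands $NAME references in a job's script.
message = Template("Project: $USERNAME/$PROJECT_NAME").substitute(effective)
print(message)  # Project: mockusername/my-demo-docs
```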
Define the jobs:
Define a job called run_unit_tests:
run_unit_tests:
  stage: test
  script:
    - echo "This is the test stage"
    - pytest -v --junitxml=tests/report_unit.xml tests/test_unit.py
  artifacts:
    when: always
    reports:
      junit: tests/report_unit.xml
Job run_unit_tests belongs to the stage test and executes a script.
A script is just a list of shell commands.
The --junitxml parameter makes pytest write the test results to an XML file. In the artifacts section, GitLab uses this file to produce a graphical report, which can be found in the pipeline overview under Tests; when: always makes sure the report is uploaded even if some tests fail.
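To see what GitLab reads from such a report, here is a Python sketch that parses a JUnit XML file with the standard library and groups the test cases into passed and failed. The sample report is hand-written and simplified (real pytest reports carry more attributes, such as timings), and the test names in it are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified sample in the JUnit XML format (not real output).
SAMPLE_REPORT = """<testsuites>
  <testsuite name="pytest" tests="3" failures="1" errors="0" skipped="0">
    <testcase classname="tests.test_unit" name="test_dataset_length"/>
    <testcase classname="tests.test_unit" name="test_model_output_shape"/>
    <testcase classname="tests.test_unit" name="test_loss_is_finite">
      <failure message="assert False">AssertionError</failure>
    </testcase>
  </testsuite>
</testsuites>"""


def summarize(xml_text: str) -> dict[str, list[str]]:
    """Sort test cases into passed and failed based on <failure>/<error> children.

    Skipped tests (a <skipped> child) are counted as passed in this sketch.
    """
    root = ET.fromstring(xml_text)
    summary: dict[str, list[str]] = {"passed": [], "failed": []}
    for case in root.iter("testcase"):
        broken = case.find("failure") is not None or case.find("error") is not None
        summary["failed" if broken else "passed"].append(case.get("name"))
    return summary


print(summarize(SAMPLE_REPORT))
```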
CI/CD Pipelines - Key takeaways
GitLab CI/CD pipelines:
Automate build → test → deploy steps on configured events (e.g. push or merge)
Defined in .gitlab-ci.yml
Run on runners (hosted or self-managed)
Use rules, variables, and artifacts to control execution & collect test reports.
CI (Continuous Integration):
Automatically builds & tests every commit
Helps detect bugs early and ensures stable, reproducible builds
CD (Continuous Delivery/Deployment):
Automatically delivers tested code to staging/production
Benefits:
Catches bugs early → faster feedback for developers
Better collaboration across teams
Enables fast, traceable releases
Key takeaways - pre-commit hooks vs. a CI/CD pipeline
pre-commit hooks:
Run locally before a commit
Give immediate feedback to the developer
Best for fast checks on individual files, such as:
badly formatted code
syntax errors
unused imports
broken config files
CI/CD pipeline:
Runs remotely after push / PR / merge
Provides shared feedback for the whole team
Best for full-project checks in a clean environment, such as:
running the complete test suite
checking that the package can actually be built and installed
checking that the entry points (API / CLI) work as expected
- Be Kind: Always maintain a courteous and respectful tone. Focus your comments on the code, not the developer.
- Explain Your Reasoning: Provide clear explanations for your suggestions to help the developer understand the rationale behind them.
- Balance Guidance: Strike a balance between pointing out issues and offering direct solutions. Encourage developers to think critically and make informed decisions.
- Encourage Simplification: Advocate for simplifying complex code or adding comments to enhance clarity, rather than just explaining the complexity.
- Label Comment Severity: Differentiate the importance of your comments.
- Accepting Explanations: If a developer provides an explanation for a piece of code, it often indicates that the code could be clearer. Suggest rewriting for clarity or adding comments as appropriate.
First, we need to define a Docker image to be used as the basis for the container that executes the CI pipeline:
GitLab will automatically clone the Git repository into the running Docker container, so all files of your project are available inside the container.
The job N belongs to the stage M and executes a script.
Every job in a CI pipeline gets its own Docker container. Jobs in the same stage can run in parallel, while the stages run one after the other.
We can specify the option "needs": e.g. if job K needs job N, then job K will only be executed if job N was successful (e.g. a job in the "deploy" stage that uploads the package to PyPI should only run if the tests pass).
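The difference between plain stage ordering and needs can be sketched with a small Python model. It is a simplification, not GitLab's implementation: it assumes jobs either succeed or fail, ignores allow_failure, rules and manual jobs, and assumes needs only refers to jobs in earlier stages. The pipeline and job names are hypothetical.

```python
# Simplified model of when a job may start (a sketch, not GitLab's code).
STAGES = ["build", "test", "deploy"]

# Hypothetical pipeline: upload_to_pypi uses `needs`, publish_docs does not.
JOBS = {
    "build_package": {"stage": "build", "needs": None},
    "run_unit_tests": {"stage": "test", "needs": None},
    "run_linter": {"stage": "test", "needs": None},
    "upload_to_pypi": {"stage": "deploy", "needs": ["build_package", "run_unit_tests"]},
    "publish_docs": {"stage": "deploy", "needs": None},
}


def prerequisites(job: str) -> list[str]:
    """Jobs that must succeed before `job` is allowed to start."""
    spec = JOBS[job]
    if spec["needs"] is not None:
        # With `needs`, only the listed jobs matter.
        return list(spec["needs"])
    # Default: every job in all earlier stages must succeed.
    stage = STAGES.index(spec["stage"])
    return [name for name, other in JOBS.items() if STAGES.index(other["stage"]) < stage]


def run_pipeline(failing: set[str]) -> dict[str, str]:
    """Status of every job, given the set of jobs that would fail if they ran."""
    status: dict[str, str] = {}
    # Visit jobs stage by stage so that prerequisites are decided first.
    for job in sorted(JOBS, key=lambda name: STAGES.index(JOBS[name]["stage"])):
        if all(status[p] == "success" for p in prerequisites(job)):
            status[job] = "failed" if job in failing else "success"
        else:
            status[job] = "skipped"
    return status


print(run_pipeline({"run_linter"}))
```

With a failing linter, upload_to_pypi still runs because it only needs build_package and run_unit_tests, while publish_docs (no needs) is skipped because a job in an earlier stage failed.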
- GitLab runners regularly poll the central GitLab server for CI jobs to execute.