I was recently asked by a customer to outline the pros and cons of using git submodules vs. Google's repo tool to manage multi-repository integrations in git.
There are a lot of articles on the internet bashing each of the tools, but in our opinion most of the criticism comes from misunderstanding the tool’s design or trying to apply it in an inappropriate context.
This post summarizes the general rules of thumb we at Otomato follow when choosing a solution for this admittedly nontrivial situation.
First of all, whenever possible, we recommend integrating your components at the binary package level rather than compiling everything from source each time. That is: package components as jars, npm packages, eggs, rpms or Docker images, upload them to a binary repository and pull them in as versioned dependencies during the build.
Still, sometimes this is not the optimal solution, especially if you do a lot of feature branch development (which is itself an antipattern in the classical Continuous Delivery approach; see here for example).
For these cases we stick to the following guidelines.
Git submodules
Pros:
1. An integrated solution, part of git since v1.5.3
2. Deterministic relationship definition (parent project always points to a specific commit in submodule)
3. Integration points are recorded in parent repo.
4. Easy to recreate historical configurations.
5. Total separation of lifecycles between the parent and the submodules.
6. Supported by the Jenkins git plugin.
Cons:
1. Management overhead. (Need separate clones to introduce changes in submodules)
2. Developers get confused if they don’t understand the inner mechanics.
3. Need for additional commands (`git clone --recursive` and `git submodule update`)
4. External tool support is not perfect (Bitbucket, SourceTree, IDE plugins)
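A lot of the confusion mentioned above comes from not knowing whether a working copy actually matches the commits the parent project has recorded. As a minimal sketch (not part of our tooling), the Python below parses the text printed by `git submodule status`, whose first character on each line is a space (in sync), `-` (not initialized), `+` (checked-out commit differs from the recorded one) or `U` (merge conflict). The function names, the sample paths and the fake SHA values are made up for illustration, and the parser assumes paths without spaces.

```python
from dataclasses import dataclass
from typing import List

# Status prefixes printed by `git submodule status`.
STATE_BY_PREFIX = {
    " ": "in sync",
    "-": "not initialized",
    "+": "differs from recorded commit",
    "U": "merge conflict",
}


@dataclass
class SubmoduleStatus:
    path: str
    sha: str
    state: str


def parse_submodule_status(output: str) -> List[SubmoduleStatus]:
    """Parse the output of `git submodule status` (do not strip it first:
    the leading space on a line is the 'in sync' marker)."""
    statuses = []
    for line in output.splitlines():
        if not line.strip():
            continue
        prefix, rest = line[0], line[1:]
        fields = rest.split()  # sha, path, optional "(describe)" suffix
        sha, path = fields[0], fields[1]
        state = STATE_BY_PREFIX.get(prefix, "unknown")
        statuses.append(SubmoduleStatus(path=path, sha=sha, state=state))
    return statuses


def needs_attention(statuses: List[SubmoduleStatus]) -> List[str]:
    """Return paths of submodules that are not at the recorded commit."""
    return [s.path for s in statuses if s.state != "in sync"]


# Hypothetical sample output; the SHA values and paths are placeholders.
SAMPLE_OUTPUT = (
    " 1111111111111111111111111111111111111111 libs/core (v1.2.0)\n"
    "-2222222222222222222222222222222222222222 libs/ui\n"
    "+3333333333333333333333333333333333333333 libs/net (heads/main)\n"
)

if __name__ == "__main__":
    for path in needs_attention(parse_submodule_status(SAMPLE_OUTPUT)):
        print(f"{path}: run `git submodule update --init` or commit the new pointer")
```

A check like this could run in a pre-build step so that a stale or uninitialized submodule is reported before it causes a confusing build failure.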
Google repo
Pros:
1. Tracking synchronized development effort is easier.
2. Gerrit integration (?)
3. A separate Jenkins plugin.
Cons:
1. An external obscure mechanism
2. Requires an extra repository for management.
3. Nondeterministic relationship definition (each repo version can be defined as a floating head)
4. Hard to reconstruct old versions.
5. No support in Bitbucket or GUI tools.
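The "floating head" point can be checked mechanically. The sketch below is only an illustration under stated assumptions: it reads a repo manifest (an XML file with `<default>` and `<project>` elements carrying a `revision` attribute) and reports which projects are pinned to a full 40-character commit SHA and which follow a branch or other moving ref. The project names, the URL and the function name are hypothetical, and the check deliberately ignores less common manifest features such as a `revision` set on `<remote>`.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict

# Only a full 40-hex-digit SHA-1 is treated as a deterministic pin here;
# branches and tags are reported as floating because they can be moved.
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def classify_revisions(manifest_xml: str) -> Dict[str, str]:
    """Map each project path to 'pinned' or 'floating'."""
    root = ET.fromstring(manifest_xml)
    default = root.find("default")
    default_rev = default.get("revision") if default is not None else None

    result = {}
    for project in root.findall("project"):
        # A project without a path is checked out under its name.
        path = project.get("path") or project.get("name")
        # A project without its own revision inherits the default one.
        revision = project.get("revision") or default_rev or ""
        result[path] = "pinned" if FULL_SHA.match(revision) else "floating"
    return result


# Hypothetical manifest: names, URL and SHA are placeholders.
SAMPLE_MANIFEST = """<manifest>
  <remote name="origin" fetch="https://example.com/git/" />
  <default remote="origin" revision="main" />
  <project name="platform/core" path="core" />
  <project name="platform/ui" path="ui" revision="release-1.x" />
  <project name="platform/net" path="net"
           revision="0123456789abcdef0123456789abcdef01234567" />
</manifest>"""

if __name__ == "__main__":
    for path, kind in classify_revisions(SAMPLE_MANIFEST).items():
        print(f"{path}: {kind}")
```

With the sample above, only `net` is pinned; `core` and `ui` will resolve to whatever their branches point at when the workspace is synced, which is exactly what makes old configurations hard to reproduce.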
In general: whenever we want to integrate separate, decoupled components with distinct lifecycles, we recommend submodules over repo, but their adoption must come with proper education about the special workflow they require. In the long run it pays off, as integration points can be managed in a deterministic manner, and with knowledge comes confidence in the tool.
If you find your components are too tightly coupled, or you need continuous intensive development occurring concurrently in multiple repos, you should probably use git subtrees or just spare yourself the headache and drop everything into one big monorepo. (This depends, of course, on how big your codebase is.)
The important thing to understand is that software integration is never totally painless and there is no perfect cure for the pain. Choose the solution that makes your life easier and assume the responsibility of learning the accompanying workflow. As they say: “it’s not the tool, it’s how you use it.”
I’ll be happy to hear what you think, as this is a controversial issue with many different opinions flying around on the internet.
Nice post!!!
One question: You haven’t stated when to prefer using repo tool?
Regarding “4. Hard to reconstruct old versions”: the repo tool can create a snapshot manifest.
Hey Philip, thanks for checking out our blog. In order to keep a backup of the project, you can create a repo manifest snapshot for the project. To explain this further:
The purpose of Git is to manage a project, or a set of files, as they change over time. Git stores this information in a data structure called a repository.
Repo is a repository management tool built on top of Git. Its primary purpose is to download files from multiple Git repositories into your local working directory.
A repo manifest describes the structure of a repo client; that is, the directories that are visible and where they should be obtained from using Git.
Manifests are inherently version controlled, since they are kept within a Git repository. Updates to manifests are automatically obtained by clients during `repo sync`.
Hope this helps. Cheers!