Mono-repos vs Multi-repos

Faced with a new project where a team has to set up a code repository, or faced with a project that grows and diversifies, we have to decide whether a single repo can suit all our needs or if we should split our code into several repos.

Discussions about the pros and cons of both are numerous, and examples of prominent companies with huge mono-repos abound.

I was recently asked what I thought about this. My opinion has changed over the years and, as with many topics, at some point the answer becomes what it should have been from the start: “it depends”.

Indeed, a general debate on the merits of each is not worth having (both have merits). It’s better to discuss a specific situation. What matters is choosing what is best for your project and your team. You have a unique context and should not feel forced to mirror the decisions of other projects or other companies (even well-known ones).

How can I decide?

There’s an overhead to having multiple repos because it’s harder to integrate them (coordinated releases, publishing artifacts, testing across several repos, etc.). Therefore, as long as you can get by with only one repo, stick with it.

At some point, it won’t be that great any more, usually because different parts of the repo have different life cycles. At that point, you can consider splitting your repo.

It’s a decision that must be made after carefully examining your context: analyze the problems you are facing and how well splitting your repository would solve them.

How can I know that I have different life cycles for my components?

I find it difficult to come up with a general rule. Here are some examples:

When the changes, more often than not, do not happen together.
When it leads to extremely complicated pipelines in your CI.
When you want to decouple the releases of your components (for example, if there is a need to support several versions of a component).
When a component turns out to have several use cases beyond the original one and becomes a shared resource / dependency.
When responsibility for part of the code moves to another team.

Mostly, rather than trying to follow those examples, it’s better to get a feel for it yourself. Trust me, you’ll know when you have to split :).
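
If you would like a number to go with that feeling, the first signal in the list (changes that do not happen together) can be roughly measured. Here is a minimal sketch, assuming each component lives in its own top-level directory and that you already have, for each commit, the list of files it touched (for instance extracted from `git log --name-only`). The function names and the sample data are made up for illustration; this is not a standard tool.

```python
from collections import Counter
from itertools import combinations


def top_level_components(paths):
    """Return the top-level directories touched by one commit.

    Files at the repository root (no "/") are ignored.
    """
    return {p.split("/", 1)[0] for p in paths if "/" in p}


def co_change_ratio(commits):
    """For each pair of components, compute the share of commits
    touching either component that touch both of them."""
    touched = Counter()   # commits touching each component
    together = Counter()  # commits touching both components of a pair
    for paths in commits:
        components = sorted(top_level_components(paths))
        touched.update(components)
        together.update(combinations(components, 2))

    ratios = {}
    for a, b in combinations(sorted(touched), 2):
        both = together[(a, b)]
        either = touched[a] + touched[b] - both
        ratios[(a, b)] = both / either
    return ratios


# Hypothetical history: one list of touched files per commit.
commits = [
    ["api/server.py", "web/app.js"],
    ["api/server.py"],
    ["web/app.js", "web/style.css"],
    ["docs/index.md"],
    ["api/models.py", "web/app.js"],
]

for pair, ratio in co_change_ratio(commits).items():
    print(pair, round(ratio, 2))
```

With this sample history, `api` and `web` change together in half of the commits that touch either of them, while `docs` never changes with anything else. A low ratio is only a hint that two components may have different life cycles; it says nothing about the cost of splitting them.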

The cost of splitting

You should be acutely aware of the substantial overhead of having several repos; keep your single repo as long as you can.

At some point, splitting will be the right decision. Still, it will inevitably complicate your life. For example, any change in the “public interface” of one project will require at least one other merge request (MR) in another project to synchronize the changes (or a later MR to bump a version somewhere). As long as you have a single repository, project-wide changes can be done atomically, which makes things easier.

This is not to say that there is a bad choice here, of course. What I’m saying is that there’s an inevitable overhead in having several repos, whether it’s integration, project-wide changes, deciding where to put the documentation and so forth… There will be an overhead.

Consequently, there has to be something compensating that overhead, otherwise it’s an unnecessary cost (that can be lived with, but could have been avoided).

It’s what happened on a few projects I worked on. In one case, we started with a single repo and at some point needed to split it. It became 2, 3, 5… We now have around 15, I think. In other words: sometimes it’s worth it. You’ll probably know when it gets to that point, because it becomes obvious that your current situation is not tenable any more.

Once you have switched to having several repos, adding one does not increase the overhead significantly – the overhead is already there.

I’m using Git; what about submodules?

I was also asked whether submodules were an option. They are, of course, though personally I’ve had bad experiences with them and try to avoid them. It might very well be because I never learned to use them proficiently, but from what I can see in the projects I encounter on GitHub, most people avoid them as much as I do.

I prefer having controlled dependencies that go through a dependency manager. Most language ecosystems have decent tooling for dependency management, and using it brings more consistency. Often these tools can even fetch dependencies directly from Git repos, which makes them similar in concept to Git’s submodules.
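
As one illustration from the Python ecosystem: pip accepts requirements that point straight at a Git repository, pinned to a tag, branch or commit. The small helper below only builds such a requirement line; the package name and repository URL are placeholders I made up, and other ecosystems have their own syntax for the same idea.

```python
def git_requirement(name: str, repo_url: str, ref: str) -> str:
    """Build a direct-reference requirement that pip can install
    from a Git repository, pinned to the given tag, branch or commit."""
    return f"{name} @ git+{repo_url}@{ref}"


# Hypothetical shared component, pinned to a release tag.
line = git_requirement(
    "shared-lib",
    "https://example.com/acme/shared-lib.git",
    "v1.4.0",
)
print(line)
```

The resulting line can go into a requirements file. Pinning to a tag or a commit plays the same role as the commit recorded by a submodule, except that the version bump stays in the same place as the rest of your dependencies.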