Choosing between single or multiple projects in a git repository?

  • In a git environment where we have modularized most projects, we face a design choice: one project per repository, or multiple projects per repository. Consider a modularized project:

    myProject/
       +-- gui
       +-- core
       +-- api
       +-- implA
       +-- implB
    

    Today we have one project per repository. That gives us the freedom to

    • release individual components
    • tag individual components

    But it is also cumbersome to branch components, because branching api often requires equivalent branches in core and perhaps in other components.

    Given that we want to release individual components, can we still get similar flexibility with a multiple-projects-per-repository design?

    What experiences are there and how/why did you address these issues?

    I have a very similar issue right now. I need to release different versions of a project, so they will need to be in different repositories. This is a nightmare to manage, though. It would be great if there were a way to branch just subdirectories.

    Each module needs to have its own version number, and we use `git-describe`.
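
    One way this could work inside a single shared repository is to give each module its own tag namespace and restrict `git describe` to it with `--match`. The sketch below assumes a hypothetical tag convention of `<component>-v<version>` (for example `api-v1.2.0`); that convention and the helper names are illustrative and do not come from the thread. Note that `git describe` counts every commit in the repository since the tag, not only commits that touched the module's folder.

```python
import re
import subprocess

# Assumed (hypothetical) tag convention: "<component>-v<version>", e.g. "api-v1.2.0".
# git describe appends "-<commits since tag>-g<short sha>" when HEAD is past the tag.
DESCRIBE_RE = re.compile(
    r"^(?P<component>.+)-v(?P<version>\d+(?:\.\d+)*)"
    r"(?:-(?P<ahead>\d+)-g(?P<sha>[0-9a-f]+))?$"
)


def describe_component(component, repo="."):
    """Run git describe, considering only tags that belong to one component."""
    done = subprocess.run(
        ["git", "-C", repo, "describe", "--tags", "--match", f"{component}-v*"],
        check=True,
        capture_output=True,
        text=True,
    )
    return done.stdout.strip()


def parse_describe(text):
    """Split describe output into (component, version, commits_ahead, sha)."""
    match = DESCRIBE_RE.match(text)
    if match is None:
        raise ValueError(f"unexpected describe output: {text!r}")
    ahead = int(match["ahead"] or 0)  # no suffix means HEAD is exactly on the tag
    return match["component"], match["version"], ahead, match["sha"]
```

    With tags such as `api-v1.2.0` and `core-v2.0` in the same repository, `parse_describe(describe_component("api"))` would report the api version independently of core.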

    You say "one project per repository" and then you list *one* project (named `myProject`) with multiple folders. But then you talk about branching the folders `api` and `core` as if they were repositories rather than folders.

  • Christopher

    Correct answer

    8 years ago

    There are three major disadvantages to one project per repository, the way you've described it above. These are less true if the projects are truly distinct, but from the sound of it, changes to one often require changes to another, which can really exacerbate these problems:

    1. It's harder to discover when bugs were introduced. Tools like git bisect become much more difficult to use when you fracture your repository into sub-repositories. It's possible, it's just not as easy, meaning bug-hunting in times of crisis is that much harder.
    2. Tracking the entire history of a feature is much more difficult. History traversing commands like git log just don't output history as meaningfully with fractured repository structures. You can get some useful output with submodules or subtrees, or through other scriptable methods, but it's just not the same as typing tig --grep=<caseID> or git log --grep=<caseID> and scanning all the commits you care about. Your history becomes harder to understand, which makes it less useful when you really need it.
    3. New developers spend more time learning the version control structure before they can start coding. Every new job requires picking up procedures, but fracturing a project repository means they have to pick up the VC structure in addition to the code's architecture. In my experience, this is particularly difficult for developers new to git who come from more traditional, centralized shops that use a single repository.
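
    To make point 2 concrete: in a single repository, one `git log --grep=<caseID>` is enough. With fractured repositories, one of the "other scriptable methods" might look like the sketch below, which runs the same search in each repository and merges the hits by commit time. The function names and the repository names in the usage note are hypothetical.

```python
import subprocess

# Tab-separated fields: committer timestamp, abbreviated hash, subject line.
LOG_FORMAT = "--format=%ct%x09%h%x09%s"


def log_command(repo, case_id):
    """Build the git log call that finds commits whose message mentions case_id."""
    return ["git", "-C", repo, "log", f"--grep={case_id}", LOG_FORMAT]


def parse_log(repo, output):
    """Turn git log output into (timestamp, repo, hash, subject) tuples."""
    rows = []
    for line in output.splitlines():
        if not line:
            continue
        # maxsplit=2 keeps any tab characters inside the subject intact.
        stamp, sha, subject = line.split("\t", 2)
        rows.append((int(stamp), repo, sha, subject))
    return rows


def search_history(repos, case_id):
    """Search every repository and merge the hits, newest first."""
    hits = []
    for repo in repos:
        done = subprocess.run(
            log_command(repo, case_id),
            check=True,
            capture_output=True,
            text=True,
        )
        hits.extend(parse_log(repo, done.stdout))
    return sorted(hits, reverse=True)
```

    A call such as `search_history(["gui", "core", "api"], "CASE-42")` would then stand in for the single `git log --grep` that a unified repository allows.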

    In the end, it's an opportunity cost calculation. At one former employer, we had our primary application divided into 35 different sub-repositories. On top of them we used a complicated set of scripts to search history, make sure state (i.e. production vs. development branches) was the same across them, and deploy them individually or en masse.

    It was just too much; too much for us at least. The management overhead made our features less nimble, made deployments much harder, made teaching new devs take too much time, and by the end of it, we could barely recall why we fractured the repository in the first place. One beautiful spring day, I spent $10 for an afternoon of cluster compute time in EC2. I wove the repos back together with a couple dozen git filter-branch calls. We never looked back.

    As an off-topic aside, there are few more enjoyable things as a repository manager than purchasing time on a system that can do in two hours what your laptop couldn't do in 20, for less than the price of lunch. Sometimes I really love the internet.

    How would you release those individual projects as separate releases? Or do you never need to do that? That is the problem I have. For example, what if you need to create a V1 of Project A and a V2 of Project B?

    For moving between the "one project per repo" and "multiple repos" consider git-subtree (good explanation at http://stackoverflow.com/a/17864475/15585)

    I wrote a script to automate this for common use cases: https://github.com/Oakleon/git-join-repos

    What is a "VC structure?"

    @RobertHarvey My guess is that "VC" means "Version Control".

    @RobertHarvey -- yeah it means "version control" in this context

    Is there any way to update the first link in the answer? Seems broken

    @D.BenKnoble -- unfortunately the link looks dead and I can't find a suitable replacement. I've removed the reference. Basically, you need to `git submodule update` after each bisection.
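
    A minimal sketch of that advice, assuming the bisection is automated with `git bisect run`: a small wrapper syncs the submodules to the commit under test before running the real test command. The function name and the test command are hypothetical. Exit code 125 is the value `git bisect run` treats as "this commit cannot be tested, skip it".

```python
import subprocess

SKIP = 125  # git bisect run: skip this commit, it cannot be tested
GOOD = 0
BAD = 1


def bisect_step(test_cmd, run=subprocess.run):
    """Sync submodules to the commit being bisected, then run the test command.

    `run` is injectable so the logic can be exercised without a real repository.
    """
    sync = run(["git", "submodule", "update", "--init", "--recursive"])
    if sync.returncode != 0:
        # Submodules could not be checked out, so the result would be meaningless.
        return SKIP
    result = run(test_cmd)
    # Design choice in this sketch: any test failure marks the commit as bad.
    return GOOD if result.returncode == 0 else BAD
```

    Saved as a hypothetical `bisect_step.py` that ends with `sys.exit(bisect_step(sys.argv[1:]))`, it could be invoked as `git bisect run python bisect_step.py make test`.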

    Things get a lot easier if you stop caring about the repository history. The only thing "git bisect" does is find out which developer to tell "You broke this!", and we don't assign blame. If there is a bug, we fix it and move on.

    @Calmarius - The commit message is more important than the user. History tells you why changes were made. It's more difficult to remove potentially dead code if you don't know why it was added in the first place.

Licensed under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM