commit 5ca3bcf - Git collaborating with others



Last time, we covered the basics of the ever-popular distributed version control system, Git - enough to get started using it alone, locally. This time round, we’ll be looking into how we use Git to collaborate with others remotely and manage our repositories.


Merging

Between them, branching and merging are fundamental to Git's success over much of its competition. How this is achieved may differ based on the context - for example, if you're simply using Git locally without using a cloud-based service such as GitHub, then you'll likely become familiar with the merge process directly, such as via the command line or a Git GUI client.

On the other hand, if you are using a service such as GitHub to collaborate with others (for example, contributing to open source or working as part of a team in an organisation), you'll most likely encounter merging through a web interface in the form of Pull Requests or Merge Requests. These are a visual method of proposing changes to be merged from one branch into another, often only after certain conditions have been met, such as successful testing, code review, and build verification. Of course, it depends on how that project, team, or organisation decides to regulate merges and the wider development workflow, but in most cases, it'll be one of (or a combination of) the two.

Put simply, merging in Git is much like merging traffic on a road - where traffic has diverged into two lanes previously and moved at different speeds, we eventually need to re-merge them around one another without causing an accident.

When we request to merge two branches, Git will attempt to locate the most recent commit that the histories of both branches share, known as the common base. Then, Git will create a new commit in the target branch (the one we're merging into) to incorporate all the changes made between the common base and the tip of the source branch (the one we're merging) which the target does not already have.

To be practical though, let's look at the two most common forms of local merging - fast-forward (the standard type Git uses, where possible) and no fast-forward (an option).

Fast-forward

In a fast-forward merge (the default), Git will analyse the history of commits between the target and the source to determine whether it can form a single linear path between them. For example, if we branched out from main (call the new branch myNewBranch), added 3 commits, and then merged back into main (which hadn't changed in the interim), no new commit is needed: the tip pointer of the main branch is simply "fast-forwarded" to point to the commit at the tip of our myNewBranch.

 topic               x --- x --- x
                   /
 main x --- x --- x

So how do we do it...

foo@bar ~/opt/projects: (main)
$ git merge 123-feature-fizzbuzz
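To see a fast-forward for ourselves, here is a minimal sketch that builds a throwaway repository and merges a branch into an unchanged main. The scratch directory, the placeholder identity, and the branch name topic are assumptions made for illustration. The --ff-only flag is used so that Git would refuse the merge rather than quietly create a merge commit.

```shell
set -eu
cd "$(mktemp -d)"                        # throwaway location for the demo
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main
git config user.name "Example Dev"       # placeholder identity
git config user.email "dev@example.com"

# One commit on main, then three on a topic branch.
git commit -q --allow-empty -m "initial commit"
git checkout -q -b topic
for n in 1 2 3; do
  git commit -q --allow-empty -m "topic commit $n"
done

# main has not moved, so Git only needs to advance its pointer.
git checkout -q main
git merge --ff-only topic

# Both branch names now resolve to the same commit.
git rev-parse main topic
```

After the merge, main and topic share one tip and no extra commit has been created.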

No Fast-forward

But what happens if the mainline branch moves on whilst we are still working on our forked branch? Well, Git can't simply "fast-forward" the tip of main to match the tip of myNewBranch anymore because other commits in our target stand in the way. Git can't simply apply the changes atop the target as these aren't contained in the source branch.

 topic              x --- x --- x
                   /
 main x --- x --- x --- x

Which raises the question, how can we merge these branches? When the histories have diverged like this, Git cannot "fast-forward" the tip of our target branch. Instead, it creates a merge commit that integrates the changes from our branch with those that others have made in the target branch. The --no-ff option asks for such a merge commit every time, even in cases where a fast-forward would have been possible.

foo@bar ~/opt/projects: (main)
$ git pull
    ...
foo@bar ~/opt/projects: (main)
$ git merge --no-ff 123-feature-fizzbuzz

So our branches end up looking something like this...

 topic              x --- x --- x
                   /              \
 main x --- x --- x --- x --- x --- x
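The shape in the diagram can be reproduced in a scratch repository. This is only a sketch: the directory, the identity, and the branch name topic are placeholders, and -m is passed so that Git does not open an editor for the merge message.

```shell
set -eu
cd "$(mktemp -d)"                        # throwaway location for the demo
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"       # placeholder identity
git config user.email "dev@example.com"

git commit -q --allow-empty -m "initial commit"
git checkout -q -b topic
git commit -q --allow-empty -m "topic work"

# Meanwhile, main moves on.
git checkout -q main
git commit -q --allow-empty -m "mainline work"

# Always record a merge commit, even where a fast-forward might apply.
git merge -q --no-ff -m "Merge topic into main" topic

# The merge commit has two parents: the previous main tip and the topic tip.
git log --graph --oneline
git rev-parse HEAD^1 HEAD^2
```

Because the whole feature arrives through a single merge commit, it can later be undone in one step with git revert -m 1 on that commit.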

Generally speaking, teams adopt the --no-ff approach as a workflow policy, either from the start of a repository's life or from an agreed point onwards. It's a great way to integrate a feature (or any larger set of changes) via a single merge commit, which can be reverted in one go if required. Conversely, one might use the --ff-only option to refuse any merge that cannot be fast-forwarded, so that no merge commits are created. Check out this article from Atlassian for more about the inner workings, types, and options of merging.

It's worth knowing that git merge has no --dry-run option. The closest equivalent is to combine --no-commit with --no-ff, which performs the merge but stops before recording the merge commit, giving us a chance to inspect the result. Note that this does update the index and working directory, so once we've seen enough, we run git merge --abort to return to the state we started from.
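One way to trial a merge with stock Git is to stop it just short of the commit and then back out. The sketch below assumes a scratch repository, a placeholder identity, and a hypothetical file called notes.txt.

```shell
set -eu
cd "$(mktemp -d)"                        # throwaway location for the demo
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"       # placeholder identity
git config user.email "dev@example.com"

echo "base" > notes.txt                  # hypothetical file
git add notes.txt
git commit -q -m "initial commit"

git checkout -q -b topic
echo "topic change" >> notes.txt
git commit -q -am "topic work"

git checkout -q main
before="$(git rev-parse HEAD)"

# Perform the merge but stop before the merge commit is recorded.
git merge --no-commit --no-ff topic

# Inspect what would be committed, then back out.
git diff --cached --stat
git merge --abort

after="$(git rev-parse HEAD)"
echo "main before: $before"
echo "main after:  $after"
```

After the abort, main points at the same commit as before and the working directory is clean again.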

Conflicts

Whichever branch strategy we choose, we're bound to eventually encounter a merge conflict if we're working on multiple branches simultaneously or collaborating on projects with others.

Fundamentally, a conflict is Git's way of saying "two doesn't go into one - I've no context about the correctness of these changes, so only you can resolve this". Essentially, imagine multiple people trying to fit through a single standard sized door - it's not going to happen. They need to determine who is meant to pass through and coordinate their order.

A conflict will typically happen when we merge branches or, for those working with a remote origin repository, when pulling changes. Git will warn us upon merging that a conflict has arisen, and will leave the affected files, with both sets of changes marked, in the working directory. We can then either resolve them, commit, and so complete the merge, or abort the merge with git merge --abort and prevent those changes from being integrated.

Git gives us a list of the conflicting files, and we need to open these and examine the changes ourselves. Many GUI-based Git clients and IDEs have great tools for visualising and simplifying the merge conflict resolution process, but let's focus on Git itself. When we open the files concerned, notice that Git has inserted conflict markers around the areas concerned.

var x = 1
var y = x++;

<<<<<<< HEAD
console.log(x);
=======
console.log(y);
>>>>>>> 123-feature-fizzbuzz

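Everything between the <<<<<<< and ======= markers is the version on the branch we're merging into (main), and everything between ======= and >>>>>>> is the version from the branch being merged in. So Git is telling us that a commit in our feature branch (in the case of a merge targeting main, this is our source branch) is trying to print the variable y. But, whilst we've been working on our feature, someone else has committed a change to main that prints x.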

In this example, the impact of the change is minor, but to comprehend the importance of paying due attention to conflicts, imagine for a moment this conflict represents the same line of code being updated to run two very different database queries in an e-commerce company's codebase. One commit might apply a 10% discount to a range of products, whereas the other might apply a 60% discount on a different range of products - those are radically different changes that will have an impact upon the codebase, data, and the organisation. That's why merge conflicts require human intervention: someone has to apply context to the proposed changes and determine the correct course of action, ensuring these commits are desired, correct, and free of unintended side effects. So we must ask things such as:

  • Which is the correct percentage the marketing department wanted to apply for this promotion?
  • Which products are and are not included in this promotion?
  • Could one of these commits be a typo accidentally committed that might cost the company a great deal of money?
  • Has this same line been merged incorrectly, previously?
  • Should this promotion currently be running at all?
  • What process, decisions, and communication led to the breakdown in agreement on the percentage to apply, if at all?

In terms of resolving this, let's say we've spoken with our colleagues and we're sure that printing y is the correct course of action. In which case, we simply delete the lines we don't need (including the conflict markers themselves), like so:

var x = 1
var y = x++;

console.log(y);

Now that we've resolved the conflict in our working directory mid-merge, like any other change, we need to stage and commit this resolution. Staging marks the file as resolved, and once every conflicting file has been resolved and staged, committing completes the merge.
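The whole cycle of conflict, resolution, staging, and committing can be rehearsed in a scratch repository. In this sketch the file name app.js, the identity, and the branch name topic are all placeholders, and the file is overwritten with the agreed line rather than edited by hand.

```shell
set -eu
cd "$(mktemp -d)"                        # throwaway location for the demo
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"       # placeholder identity
git config user.email "dev@example.com"

echo "console.log(0);" > app.js          # hypothetical file
git add app.js
git commit -q -m "initial commit"

# Each branch changes the same line in a different way.
git checkout -q -b topic
echo "console.log(y);" > app.js
git commit -q -am "print y"

git checkout -q main
echo "console.log(x);" > app.js
git commit -q -am "print x"

# The merge stops with a conflict, so a non-zero exit is expected here.
git merge topic || true
git status --short                       # app.js is listed as "UU" (unmerged)

# Resolve by writing the agreed content, then stage and commit.
echo "console.log(y);" > app.js
git add app.js
git commit -q -m "Merge topic: keep printing y"
```

Running git merge --continue instead of the final git commit would have the same effect once every conflict is staged.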

Rebasing

Merging has become the de facto method of combining changes primarily because it's non-destructive. Meaning, the commits in each branch are preserved and coordinated around one another as the changes are integrated. At the same time, it can litter the mainline branch with merge commits, and merging the mainline into a feature branch to keep it up to date adds further merge commits there. So what options exist for scenarios where we needn't worry about preserving the history of our branch? Enter rebase.

If merging is the tool that preserves branch history, consider rebasing the tool for rewriting history whilst coordinating changes. That is why this section comes with a note... the key thing to learn with rebasing is when not to do it. Primarily, do yourself and your team(s) a favour and avoid rebasing any mainline branch. Why? Bear this in mind whilst we cover what rebase does and it'll become clear.

Let's assume our colleagues have continued working on the main branch whilst we develop our topic branch.

 topic              x --- x --- x
                   /
 main x --- x --- w --- y --- z

Let's also assume that we need commits w, y, and z which were committed after the topic branch was created. We could merge main into our topic branch, but perhaps we want to preserve the history and order of commits in our topic branch for a clear linear progression of this feature. We could cherry-pick those multiple commits, but it's simpler to rebase in this case.

foo@bar ~/opt/projects: (123-feature-fizzbuzz)
$ git rebase main

What we end up with is the same set of topic branch changes sitting atop the latest main history, as we see below. We should also note that the x commits are now labelled a, b, and c. When rebasing, the hashes of our topic branch commits change: a commit's hash depends on its parent, so when the history beneath a commit changes, the commit is recreated with a new hash.

 topic                          a --- b --- c
                               /
 main x --- x --- w --- y --- z
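The change of hashes is easy to observe in a scratch repository. The following is a sketch under assumed names: the files base.txt, feature.txt, and main.txt, the identity, and the branch name topic are placeholders.

```shell
set -eu
cd "$(mktemp -d)"                        # throwaway location for the demo
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"       # placeholder identity
git config user.email "dev@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "initial commit"

git checkout -q -b topic
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "topic work"
old_tip="$(git rev-parse topic)"

# main moves on after the topic branch was created.
git checkout -q main
echo "mainline" > main.txt
git add main.txt
git commit -q -m "mainline work"

# Replay the topic commit on top of the latest main.
git checkout -q topic
git rebase -q main
new_tip="$(git rev-parse topic)"

echo "before rebase: $old_tip"
echo "after rebase:  $new_tip"
git log --oneline --graph
```

The topic commit carries the same change and message as before, but its parent is now the tip of main, so its hash differs.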

Now imagine we ran git push to the remote, from which our colleagues also work. What Git will tell us is that our local topic and the origin topic have diverged. Meaning, origin has commits which our local topic doesn't, and our local topic has commits origin doesn't. In this scenario, assuming we're the only individual working on this topic branch, we can overwrite the origin branch with our local rewritten branch using git push --force.
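The rejected push can be reproduced locally by using a bare repository as a stand-in for the remote. This is a sketch with placeholder paths, file names, and identity. It also uses --force-with-lease rather than the plain --force mentioned above; this is a swapped-in alternative that only overwrites the remote branch if it is still where our local copy last saw it.

```shell
set -eu
work="$(mktemp -d)"                      # throwaway location for the demo
git init -q --bare "$work/remote.git"    # stands in for the hosted remote
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"       # placeholder identity
git config user.email "dev@example.com"
git remote add origin "$work/remote.git"

echo "base" > base.txt; git add base.txt; git commit -q -m "initial commit"
git push -q origin main

git checkout -q -b topic
echo "feature" > feature.txt; git add feature.txt; git commit -q -m "topic work"
git push -q --set-upstream origin topic

git checkout -q main
echo "mainline" > main.txt; git add main.txt; git commit -q -m "mainline work"

# Rewrite the topic branch on top of the new main.
git checkout -q topic
git rebase -q main

# A plain push is rejected because the two histories have diverged.
if git push -q origin topic 2>/dev/null; then
  echo "unexpected: push succeeded"
else
  echo "push rejected, as expected"
fi

# Overwrite the remote branch, but only if nobody else has moved it.
git push -q --force-with-lease origin topic
```

If a colleague had pushed to the remote topic branch in the meantime, the final command would fail instead of discarding their work.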

Knowing what rebase is for and how it works, it becomes clear why rebasing a mainline branch is a terrible idea. Imagine the above in reverse - our feature commits would not only end up as the base of main, integrating changes before their time and likely causing conflicts upon conflicts, but it would entirely rewrite the commit history of main. Whilst we'll encounter the divergence issue when using rebase on any local copy of a remote branch, the worst thing we could do is issue the infamous git push --force after rebasing main, as it would overwrite the remote history that everyone else relies upon.

In closing, we can see why merging is the preferred method of branch integration - it preserves history, avoids rewriting commits that others may rely on, and keeps things simple when integrating and collaborating with others.

Synchronising with remotes

Now we're familiar with using Git locally - from the essential commit right through to branches and change coordination through merging, let's touch upon the distributed part of this distributed version control system.

Origin and Remote

As we know, Git is distributed: there's typically a centrally hosted repository with any number of cloned copies, each of which pulls the entire repository down to a local machine. Two related terms that are commonly conflated and confused, by beginners and experts alike, are remote and origin.

When we clone a repository, Git will record the source of that repository as the location with which to synchronise. This source is what Git terms a remote. We can add as many of these to a repository as desired and synchronise with each independently as we please. In fact, remote is the only one of the two terms that is also a command; git remote add allows us to register additional remote locations.

foo@bar ~/opt/projects: (main)
$ git push --set-upstream origin main

So what of origin? As it happens, we've already covered it - that first remote that Git adds is the origin. It's simply the default name Git chooses for the initial remote for a repository.
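The relationship between the two terms can be seen by listing a clone's remotes. In this sketch, two local bare repositories stand in for hosted ones; their paths and the remote name backup are hypothetical.

```shell
set -eu
work="$(mktemp -d)"                      # throwaway location for the demo
git init -q --bare "$work/origin.git"    # stand-in for the hosted repository
git init -q --bare "$work/backup.git"    # a second, hypothetical location

# Cloning records the source as a remote named "origin".
# (stderr is silenced because Git warns that the repository is empty.)
git clone -q "$work/origin.git" "$work/local" 2>/dev/null
cd "$work/local"

# Register a second, independently named remote.
git remote add backup "$work/backup.git"

# List every remote with the URLs used for fetching and pushing.
git remote -v
```

The listing shows both backup and origin, each with its own fetch and push URL, which is all that distinguishes origin from any other remote.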

Clone

The purpose of clone is as one might deduce from the name - to make a local copy of a repository (either entirely, or partially) located at the given remote location and record that remote as its origin for synchronisation.

The resulting local repository will clone the entire data set from the remote - including every variation of every file tucked away in the history of that repository, meaning any copy can (in theory) become a replacement for the remote should it become corrupted.

This is typically done in one of two ways - via ssh or https (a third option, the git protocol, also exists but offers no authentication). Either way, the resulting local repository is the same whichever is chosen; what differs is the protocol and URL recorded for the origin remote, and the method used to authenticate with that remote.

foo@bar ~/opt/projects: (123-feature-fizzbuzz)
$ git clone ssh://foobar@example.com/path/to/project.git 

foo@bar ~/opt/projects: (123-feature-fizzbuzz)
$ git clone https://example.com/path/to/project [<directory> | . ]

# . = the current directory

Fetch

As a distributed VCS, obviously we need a way to synchronise our local repository with its remote counterpart. How we do that depends on what we're aiming to achieve - do we simply need to gather the latest changes to the repository locally but leave our working directory untouched, or do we need to download and integrate those changes immediately?

The answer to that question determines whether we use fetch or pull. Using git fetch will download any changes from the remote repository which our local repository doesn't have. However, it does not merge them into our current branches. So what does this mean in practice?

If we look under the .git/ directory in our local project, we'll find a subdirectory called refs. Dig a little deeper and we'll find further directories for any remote repositories and the branches those remotes are tracking at the source.

foo@bar ~/opt/projects: (123-feature-fizzbuzz)
$ ls -l .git/refs/remotes/origin
total 16
-rw-r--r--  1 user  group  41  1 Jan 12:00 main
-rw-r--r--  1 user  group  41  1 Jan 13:00 123-feature-fizzbuzz

These are locally accessible references to the state of the remote repository. So when we issue a fetch, Git will update the references for these branches with the latest remote changes, whilst also syncing any new branches available at the remote. It's here that Git looks when we ask it to merge - it uses the references here as sources for the changes at our remote to integrate into our local branches.

But, it doesn't do this automatically - a good analogy being installing new software onto a computer - consider fetch the download phase of such a process. The installation phase is then either a manual effort (using merge for example), or automated via the use of git pull. Meaning, although fetch will update our local repository's references to the remote branches, it doesn't integrate those changes into our local branches (whose references are stored under .git/refs/heads) or into our working directory.
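The download-only nature of fetch can be demonstrated with two local repositories sharing a bare "remote". All paths, the identity, and the directory names theirs and ours are placeholders chosen for this sketch.

```shell
set -eu
work="$(mktemp -d)"                      # throwaway location for the demo
git init -q --bare "$work/remote.git"    # stands in for the hosted remote
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/main

# A colleague's repository seeds the remote with one commit.
git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"       # placeholder identity
git config user.email "dev@example.com"
git commit -q --allow-empty -m "first commit"
git push -q "$work/remote.git" main

# We clone, and then the colleague pushes a second commit.
git clone -q "$work/remote.git" "$work/ours"
git commit -q --allow-empty -m "second commit"
git push -q "$work/remote.git" main

# Download the new commit into our clone without integrating it.
cd "$work/ours"
git fetch -q origin

# The remote-tracking reference has moved; our local branch has not.
echo "origin/main: $(git rev-parse origin/main)"
echo "main:        $(git rev-parse main)"
git log --oneline main..origin/main
```

The final log lists the commits that have been downloaded but not yet merged, which is a handy check before deciding how to integrate them.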

Pull

Like fetch, pull is for updating our local references to the remote repository with any changes it has. The difference is that pull also integrates those changes into our current local branch and working directory. Put simply, pull is the download and integrate command. In Git's own words ...

Incorporates changes from a remote repository into the current branch. In its default mode, git pull is shorthand for git fetch followed by git merge FETCH_HEAD.

It's quite literally a fetch to update our local references, and then a merge to integrate those into our local repository and working directory. This is often the more commonly used command when we want to retrieve our colleagues' changes. These days our remotes often have lovely UIs atop them (such as GitHub, Azure DevOps, and GitLab) that give us a simpler view of the remote repository's branches and history, but fetch certainly still has its place for keeping our local references up to date with the remote.
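To check the "fetch followed by merge" description, the sketch below updates two identical clones in the two different ways and compares where they end up. The paths, the identity, and the clone names two-step and one-step are hypothetical; --ff-only is added so that neither clone needs to create a merge commit.

```shell
set -eu
work="$(mktemp -d)"                      # throwaway location for the demo
git init -q --bare "$work/remote.git"    # stands in for the hosted remote
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/main

git init -q "$work/theirs"
cd "$work/theirs"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"       # placeholder identity
git config user.email "dev@example.com"
git commit -q --allow-empty -m "first commit"
git push -q "$work/remote.git" main

# Two identical clones, taken before the colleague pushes again.
git clone -q "$work/remote.git" "$work/two-step"
git clone -q "$work/remote.git" "$work/one-step"
git commit -q --allow-empty -m "second commit"
git push -q "$work/remote.git" main

# Clone 1: fetch, then merge what was fetched.
cd "$work/two-step"
git fetch -q origin main
git merge -q --ff-only FETCH_HEAD

# Clone 2: pull performs both steps in one command.
cd "$work/one-step"
git pull -q --ff-only origin main

git -C "$work/two-step" rev-parse main
git -C "$work/one-step" rev-parse main
```

Both clones finish on the same commit, the one the colleague pushed last.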

Push

Quite simply, push is the opposite of pull... it pushes the history of your local active branch to the copy of said branch at the remote. In the case of main, it'll already have a remote counterpart recorded locally so we can simply issue a basic push command.

foo@bar ~/opt/projects: (main)
$ git push

However, if our topic branch was created locally from the local copy of our main branch, then we need to tell Git to not just push this branch up to the remote for others to access, but to also tell it that this local branch should be set to track that new remote branch - in other words, they're related such that this local branch should refer to that branch on that remote by default in future.

This is done using the command below. We're telling Git not just to push our changes to the 123-feature-fizzbuzz branch on the remote, but also, via --set-upstream, that whenever we refer to 123-feature-fizzbuzz it should assume we mean the 123-feature-fizzbuzz located at the remote named origin. This saves us having to keep entering a remote name in future if we know it'll remain the same.

foo@bar ~/opt/projects: (123-feature-fizzbuzz)
$ git push --set-upstream origin 123-feature-fizzbuzz

Finally, push also supports --dry-run - a helpful way to detect any potential issues before pushing our changes up, for an easier ride.
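Both push options can be tried against a local bare repository standing in for the remote. In this sketch the paths, the identity, and the branch name topic are placeholders.

```shell
set -eu
work="$(mktemp -d)"                      # throwaway location for the demo
git init -q --bare "$work/remote.git"    # stands in for the hosted remote
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Dev"       # placeholder identity
git config user.email "dev@example.com"
git remote add origin "$work/remote.git"
git commit -q --allow-empty -m "initial commit"
git push -q --set-upstream origin main

git checkout -q -b topic
git commit -q --allow-empty -m "topic work"

# Rehearse the push: Git reports what it would do but sends nothing.
git push --dry-run origin topic
remote_topic_after_dry_run="$(git ls-remote --heads origin topic)"

# The real push, recording origin/topic as the upstream of topic.
git push -q --set-upstream origin topic
git rev-parse --abbrev-ref 'topic@{upstream}'
```

With the upstream recorded, a bare git push or git pull on the topic branch knows which remote branch is meant.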

Version control policies

It's as likely as the sky is blue that the companies you'll work with within your career will have an internal policy or guidelines for how they expect their chosen version control package to be used by all.

Likewise, it's unlikely that in this day and age you'll find an established open-source project that doesn't have guidelines on how to contribute to the project with some expectations for things such as commits, branches, and pull requests in whichever cloud-based service it's hosted on.

These guidelines are there to ensure the integrity of the project and its codebase, a legible audit trail of changes, and ultimately, productive and collaborative engineers. Oftentimes, they include guidelines for using Git features such as:

  • Branching - where to create new branches from; how to prefix and name branches; highlight mainline branches
  • Merging - where to merge to; ensuring topic branches are up-to-date with mainline; where merging is conducted (locally or remotely)
  • Pull requests - details about the changes made; verification (testing) and solution improvement (feedback)
  • Commit messages - making them brief, descriptive, and informative

What next?

Now that we're more familiar with Git, why not research and experiment with a few more advanced concepts yourself, else I might ramble on forevermore.

  • Interactive rebasing - fixing and squashing
  • Cherry picking
  • Hooks
  • Sub-repositories
  • Coordinating merges and conflicts with others' work
  • Collaboration focused features and workflows via on-premises or cloud services

If you're not so familiar and still looking for more free guidance or practice, there's an abundance of articles, videos, and tutorials online to consume. In fact, the folks over at Codecademy have a great course for learning Git pragmatically with plenty of juicy explanations as you go. If you're stretched for time or resources, then the fantastic people who make up freeCodeCamp have an abundance of resources on the subject, including a crash course video series for time-constrained students and a detailed article on the interactions between Git and GitHub.

Finally, the best advice any experienced engineer can give regarding Git is much the same as for any other tool in our tool belts - practice. Get familiar with the concepts and commands and make mistakes - getting yourself into a pickle with Git is the best way to understand how it works in practice and how to resolve those issues. So, practice makes perfect ✌️