Our Magento Git Guide and Work Flow

Introduction

We have long been advocates of using SVN – but times have changed and so has the style of the way we work – which is what makes Git such an appealing choice for us. So if you’re coming from SVN too, some things worth knowing are:

  1. Repositories are de-centralised – With SVN, you have 1 master repository in a central location and everything is checked in/out of this location; with Git, its different. Each copy of the project tree (ie. your working copy) has its own repository – the .git sub-directory of the project tree root.
  2. Revisions are no longer decimal numbers – With SVN, your revisions are numbered sequentially with an integer. Due to the distributed nature of Git, and its potential to scaling to hundreds of thousands of revisions, the revisions are identified by a SHA1 hash. You can still short-cut your way through the tree though, HEAD (the latest revision), HEAD^ (the latest revision’s parent), HEAD^^ or HEAD~2 (the latest revision’s parent’s parent); etc.
  3. Hierarchy – Perhaps the biggest adjustment you’ll make is that SVN has a folder based hierarchy, a trunk isn’t anything special, its just a folder, the same with your tags or branches. In fact, with SVN you can check-out a single folder – which gives it a great advantage. Where Git, is just a URL – which identifies its repositories location (be it local or remote), within it, it automatically contains your master, branches and tags (not your local branches, but we’ll come back to this later).

NB. Its worth noting – this guide is not to be a fully fledged Git tutorial – there are plenty of those around and we’ve even listed the ones we prefer in the resources section below. This guide will serve as an insight into Git, traversing from SVN to Git and if you ever work with Sonassi as your development team – our rulebook and work-flow for collaborative development.

Contents


Getting started with Git

The best way to start anything is to introduce yourself, so be sure to set your name and email

git config --global user.name "Sonassi"
git config --global user.email "contact@sonassi.com"

And its always nice to have colours enabled

git config --global color.diff auto
git config --global color.status auto
git config --global color.branch auto

At this stage, we’ll assume you’ve already got Git installed (if not read, How do I install Git?) – and we’ll follow the same practice of setting up a repository like we did in our Magento SVN guide. For the purpose of simplicity, we’ll also assume you are using a remote repository, like BitBucket, GitHub or Springloops.

So start by entering your root directory for your project,

cd /home/sonassi/public_html/
git init

If you are checking out an existing repository, use:

git clone git@bitbucket.org:sonassi/sonassi.git

Whereas if it is empty, then use:

git remote add origin ssh://git@bitbucket.org:sonassi/sonassi.git

If it is an empty repository, Git will set up the current directory. If the repository contains files, it will download them to a sub-directory (named from the repository name). Eg. ./sonassi. It is safe to move the contents of this directory to your current directory (including the .git directory it will create).

Lets start by ignoring the non-essential files

cat > .gitignore << EOF
.gitignore
.htaccess
app/etc/local.xml
cron.php
cron.sh
downloader
errors/
includes
index.php
index.php.sample
install.php
LICENSE*
media
pear
php.ini.sample
RELEASE_NOTES.txt
robots.txt
shell
var
EOF

Then add the files we want to version control

git add app skin js lib
git commit -m "Initial Commit"
git push origin master

That's it! We've created the repository, committed the initial files and pushed that to our remote repository. Now we've nailed the basics, we can start to understand how Git should fall into your daily work-flow.

Comparable SVN:Git Functions

If you're coming from SVN, this look-up table will serve as a helpful reference to show you the equivalent Git command.

Commiting
git clone url svn checkout url
git pull svn update
git pull svn update
-- --
git init
git add . svnadmin create repo
git commit svn import file://repo
-- --
git diff svn diff | less
git diff rev path svn diff -r rev path
-- --
git apply patch -p0
-- --
git status svn status
git checkout path svn revert path
-- --
git add file svn add file
git rm file svn rm file
git mv file svn mv file
-- --
git commit -a svn commit
Branching
git branch branch svn copy path/trunk/ path/branches/branch
git checkout branch svn switch path/branches/branch
git checkout rev svn update -r rev
Merging
git merge branch svn merge -r rev:HEAD path/branches/branch
Resetting
git reset --hard origin master svn checkout -r rev path/to/branch

Rules

The rule that we follow; to ensure a clean, stable and scalable development, for anything other than short-lived changes, is to branch early - merge & commit often. Most changes, should be branched off into a local branch, the change made, a commit made and when complete - can be merged back into the master and pushed back to the repository. The exclusion being when you need to make a quick single-file change (eg. a minor CSS edit).

  1. Merge from origin/master at the start of the work day
  2. Commit your edits to your local repository at the end of the day
  3. The database name must be suffixed with _branch_purpose/name, Eg. _live or _stag
  4. Never copy/rename (Eg. creating a .bak file) a file to create a restore point for temporary edits
  5. The staging site must only be used for short-lived changes and previews of branch releases
  6. Live/production site pulls must be authorised by the relevant party only after the staging release has been approved
  7. Always give descriptive commit comments with ticket number/bug ID references where necessary
  8. If possible, use a hub system to isolate live pulls
  9. The staging site must never be left in an inconsistent state - all changes should either be committed or reset
  10. When you are finished with a branch, delete the branch, its files, its database and associated users
  11. Never push to the remote repository master, unless you have sufficient time to test and approve the changes for live use
  12. When creating a branch, do not give it an arbitrary name - ether name it after a specific ticket that is to be resolved (eg. bugfix1999) or if it is a long-term personal branch, use your name-companyname (eg. ben-sonassi)

Hierarchy

In practice, our development, staging and live Magento VCS environment would be set up like this.

Git Magento Hierarchy

You'll notice that the staging site and live site actually are both master, we define staging as being the final pre-live test environment. So after all your local branch testing, you test once more on the staging site; once that is approved; you can then pull on the production/live site. Remember rule #5.

To create the hierarchy is very straightforward, it begins by creating quick clones of the live site; we've wrote a few scripts to automate the process, so that's what we'll use.

Creating/refreshing the staging site

We clone the live site to create the mirrored staging environment. But we deliberately exclude some of the content (logs, sessions etc.) and we create a symbolic link for the media directory (to save on disk space usage). Just remember, with this set up, if you delete an image/product via the admin on the staging site - it will remove the file from the live site.

cd /home/sonassi
LIVE_DIR="public_html/"
mkdir subdomains
rsync -axHPS --delete $LIVE_DIR subdomains/staging --exclude="var/log/*" --exclude="var/cache/*" --exclude="var/session/*" --exclude="var/tmp/*" --exclude="media" --exclude="errors/*" --exclude="*.tgz" --exclude="*.gz" --exclude="app/etc/local.xml"
ln -s /home/sonassi/public_html/media /home/sonassi/subdomains/staging/media

Then edit the database connection details to suit in ./subdomains/staging/app/etc/local.xml. Then we'll dump the live database ready for import on the staging site - we have wrote a script which can dump the live database faster than the normal process and without causing table-level locks (ie. without impacting the live site at all - read the full guide here). After the dump, we also used sed to find and replace all he URLs to be the new staging URL.

cd /home/sonassi/public_html
wget -O mage-dbdump.sh sys.sonassi.com/mage-dbdump.sh
chmod +x mage-dbdump.sh
./mage-dbdump.sh
mv ./var/db.sql ../subdomains/staging/var/db.sql
mv ./mage-dbdump.sh ../subdomains/staging/
cd ../subdomains/staging/
sed -i 's/www.mydomain.com/staging.mydomain.com/g' ./var/db.sql
./mage-dbdump.sh --restore

Now, we've got an operational staging site, which is a clone of the live site, and also a working directory for master. This now gives us the final preview point before any changes are made live - and a little environment for tiny changes to be made.

Work Flow

Git Workflow Diagram

Short lived or tiny changes

The only exception to making direct edits to the staging site is when you have a change that can be executed very quickly and you know the definitive output. A good example is if you need to make a quick CSS correction and you know the exact code (or close enough) to make the change. So you edit the relevant file(s), test the output, commit your changes with an appropriate comment, then push the changes to origin master. After approval from the live site maintainer, you can then perform a pull on the live site.

In practice, the code execution would be like this:

cd /home/sonassi/subdomains/staging
git status (you need to make sure that no-one has left the staging site in an inconsistent state - breaking rule #9)
nano ./skin/frontend/mypackage/default/custom.css (you then make some changes to the CSS)
git add ./skin/frontend/mypackage/default/custom.css (contrary to SVN, even if a file is under version control, you still have to state if the change should be part of this commit by "adding" the file)
git commit -m "Edited line +33 of custom.css to resolve ticket #1555"
git push origin master

cd /home/sonassi/public_html
git reset --hard origin master
git pull origin master

You shouldn't have to reset the live site's working directory - but we just want to be extra cautious to remove any temporary edits some naughty developers may have done.

So in summary, make sure you remember:

  1. NEVER edit the live site files directly - ever!
  2. NEVER make a change on the staging site, unless it takes less than a few minutes to complete

Branching out & merging

Git Magento Workflow Diagram

Branching

Branching out is a very straightforward practice, the first step is to set up the new branch environment. Thankfully, the process is near identical to creating a staging environment, with two additional commands at the end:

cd /home/sonassi/subdomains/mybranch
git branch mybranch
git checkout mybranch

This command then changes the current working directory to be an instance of your new branch. You can make changes in here, commit as frequently as you desire, then whenever necessary, either merge master into your branch - or if you are complete, fold your branch into master.

Remember, that branches are local to your own repository - so no-one else will be able to use, see or edit your branch unless you push the branch to the origin repository. But, unless you have a compelling reason to share a branch, its unlikely you'll need to do this.

Merging

Merging, assuming no conflicts happen, are very painless to carry out. There are two ways you can go about a merge,

  1. Merge another branch into your branch
  2. Merge your branch into another branch

It doesn't actually matter which you are doing, but sometimes one approach may seem more appropriate over another.

For example, if you have an actively developed branch, and you need to bring it up to date with the master - then you would execute

cd /home/sonassi/subdomains/bugfix1999
git fetch origin master
git merge master

Or, as another example, you have finished with your branch and intend to fold it into the master - then you would execute

cd /home/sonassi/subdomains/bugfix1999
git commit -am "Final changes to bugfix1999- to resolve ticket #1999, prior to folding into master"
git checkout master
git merge bugfix1999
git branch -D bugfix1999
git push origin master

Conflict resolution

Git is very sophisticated with auto-merging and will only fall back to typical conflict resolution as a last resort, but once you get your head around the <<<<<<<< notation - it will soon make sense, and you'll no longer have that sinking feeling when you've hit a conflict. A common file for conflicts to exist is likely to be a stylesheet, so we'll use style.css as our example.

Auto-merging style.css
CONFLICT (content): Merge conflict in style.css
Automatic merge failed; fix conflicts and then commit the result.

First thing to do is make sure you don't panic, resolving a conflict manually isn't any where near as hard as you might think it is. The second step is to open the conflicted file and scroll down to the conflict, you'll notice that Git has added a number of left chevrons to denote where the conflict(s) has happened.

  • HEAD represents the current working directory/branch you have selected - ie. your code.
  • master represents the current remove directory/branch - ie. someone else's code (similarly, this could be a branch or anywhere else you are merging from)
<<<<<<<< HEAD
body {
 background:blue;
 font-size:28px;
 font-weight:normal;
}
=======
body {
 background:red;
 font-size:48px;
 font-weight:bold;
}
>>>>>>>> master

So you have to make the educated decision as to what the correct output should be; which may be ...

  1. Entirely the other code
  2. Entirely your code
  3. A combination of the two.

The key here is to make a decision and edit the file accordingly - making sure you remove the conflict notation (<<<<<<<<,======= and >>>>>>>>) as necessary. So in our example, we've decided that we actually need a bit of both, leaving:

body {
 background:blue;
 font-size:48px;
 font-weight:bold;
}

Then, after resolving your conflict(s), its worth committing those changes before actually continuing with your work.

Your daily routine

Git Daily Workflow (courtesy of Naked Startup)

Getting into the habit of pulling down the changes from the remote repository in the morning and pushing your changes (or at the very least, committing your changes) in the evening is important to ensure you have a (relatively) conflict free and up-to-date working copy of the site. The last thing you want to do is go 3 days without a pull, suddenly finding dozens of conflicts that you've got to fiddle your way through.

So, print this PDF off, stick it on your wall - and live by it - the Git daily work-flow guide.

If you already having an existing branch that you are working on, then the first thing to do is merge the latest changes from master into your branch. You shouldn't have to commit any changes in your branch because you should have done that the night before!

git pull origin master
git merge master

If you have any conflicts, work through them, if not - you can carry on with the development for the day. Be sure to regularly commit your changes throughout the day (with meaningful commits). Then as it comes to the end of the day, you'll need to make sure any new files you've added to the repository have been added to version control and then make a final commit for the end of the day.

git add a_new_file
git commit -am "Addition of a_new_file, and commit of existing files for EOD 29/07/2012"

Do not merge your changes to the remote repository's master unless you have sufficient time to test and gain approval on the staging site.


Undoing a file deletion from a historical commit

Find the last commit that affected the given path. As the file isn't in the HEAD commit, this commit must have deleted it; then checkout the version at the commit before.

git rev-list -n 1 HEAD -- path/to/file
git checkout revision^ -- path/to/file

Or in one command, if $file is the file in question.

file="path/to/file"
git checkout $(git rev-list -n 1 HEAD -- "$file")^ -- "$file"

Quick and dirty restore points

Git Index Structure

The Git index is a great resource for making checkpoints in your code, that don't really require a commit exactly. If you are not familiar with the Git index, it is the "placeholder" in which changes are stored after a git add and before a git commit (see diagram). Normally, there isn't any time between add and commit - but you can actually use the index to your advantage.

The index can be used exactly the same way you would rely on the undo history in your PHP editor to provide a way to restore back to a previous point in the code. So if you are ultimately quite happy with your progress on a specific file - or you want to try some slightly different/risky code, then before you proceed you can quickly add that file to the index.

git add ./app/code/community/Sonassi/MyNewExtension/etc/config.xml

Then if you do happen to make a change and you want to restore back to the previous "working" version of the file quickly, you can just checkout the file.

git checkout ./app/code/community/Sonassi/MyNewExtension/etc/config.xml

This way, you can add files to the index repeatedly throughout the day without muddying your commit history.

Amending previous commits

git commit –amend

One unmentioned commit management feature is git commit –amend which would allow you to update the last commit with new edits. If you’re familiar with git rebase -i squashing, then this is like squashing your index into the last commit. You could also amend with the working files by using git commit –amend -a or providing specific files on the command line.

If you have made a mistake and want to remove the last commit - provided you still haven't pushed it yet, it is simply a case of running a single command

git reset --soft HEAD~1

This will not undo any changes in the files, but rather just remove the last commit from the repository. If you swapped --soft for --hard, it would perform the same action, but also remove all your changes too.

Rebase-ing

In Git, there are two main ways to integrate changes from one branch into another - the merge and the rebase. Rather than clone someone else's article, there is a fantastic and clear explanation of rebasing - what it does, when to do it and importantly, when not to do it. You can find the article here


Resources

This article provided an insight into Git management and work-flow - but is only a taster for what can be done with Git. It also wouldn't have been possible to write it without the excellent resources that people have taken the time to write; so here are some great websites that you can also use to supplement your knowledge.

The Simple Guide Guide - http://rogerdudler.github.com/git-guide/
Git Crash Course - http://git.or.cz/course/svn.html
The Definitive Git Website - http://git-scm.com/
Kent Nguyen's Git Practices - http://kentnguyen.com/development/visualized-git-practices-for-team/
Oliver Steel's Git Workflow - http://osteele.com/archives/2008/05/my-git-workflow
Git for Beginners - The definitive practical guide - http://stackoverflow.com/questions/315911/git-for-beginners-the-definitive-practical-guide
Joe Maller's Web Focused Git Work Flow - http://joemaller.com/990/a-web-focused-git-workflow/
Vincent Driessen's Successful Branching Model - http://nvie.com/posts/a-successful-git-branching-model/
Naked Startup's Simple Daily Git Work Flow - http://nakedstartup.com/2010/04/simple-daily-git-workflow

  • http://about.me/hans2103 hans2103

    Useful documentation. Thank you!

  • Sander Mangel

    Great post, just what I was looking for. Im looking for a way to use Git between development and live but im strugling with database changes. When I alter the database i’d like to store that (like configuration changes in the backend > system). But I dont want to overwrite the orders table in the live environment when pushing the development to live.
    Does anyone have any suggestions?

  • Wes

    Hey guys,

    Excellent post!

    Question for you, how do you manage the product images of large sites, especially when doing development work on your local machine? One option I’ve looked at is just keeping an active rsync directory to continually sync changes, but this doesn’t seem like a perfect solution. Or possibly using URL rewrites to link up the production server images… do you guys have any suggestions here?

    Thanks again!

  • http://twitter.com/Alphydan Alphydan

    I would be careful with “git add .” you don’t want all kinds of libraries, tar files, OS files and such potential new things added by mistake. Maybe “git add -u” or start with “git status”, and then “git add” some specific files?