Our Magento Git Guide and Work Flow

Introduction

We have long been advocates of using SVN – but times have changed and so has the way we work – which is what makes Git such an appealing choice for us. So if you’re coming from SVN too, some things worth knowing are:

  1. Repositories are de-centralised – With SVN, you have one master repository in a central location and everything is checked in to and out of it; with Git, it’s different. Each copy of the project tree (ie. your working copy) has its own repository – the .git sub-directory of the project tree root.
  2. Revisions are no longer sequential numbers – With SVN, your revisions are numbered sequentially with an integer. Due to the distributed nature of Git, and its potential to scale to hundreds of thousands of revisions, revisions are identified by a SHA-1 hash. You can still short-cut your way through the tree though: HEAD (the latest revision), HEAD^ (the latest revision’s parent), HEAD^^ or HEAD~2 (the latest revision’s parent’s parent), etc.
  3. Hierarchy – Perhaps the biggest adjustment you’ll make is that SVN has a folder-based hierarchy: a trunk isn’t anything special, it’s just a folder, and the same goes for your tags and branches. In fact, with SVN you can check out a single folder – which gives it a great advantage. A Git repository, by contrast, is identified by a single URL (be it local or remote), and it automatically contains your master, branches and tags (not your local branches, but we’ll come back to this later).
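
To see the revision shortcuts from point 2 in action, here is a minimal sketch that builds a throwaway repository with three commits and prints what each shortcut resolves to. The temporary directory, file name, demo identity and commit messages are purely illustrative.

```shell
set -e
# Throwaway repository; nothing here touches a real project
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

# Three commits so that HEAD has a parent and a grandparent
for n in 1 2 3; do
    echo "revision $n" > file.txt
    git add file.txt
    git commit -q -m "Commit $n"
done

# Each revision is identified by a hash rather than a sequential number
head_sha=$(git rev-parse HEAD)
parent_sha=$(git rev-parse HEAD^)
grandparent_a=$(git rev-parse HEAD^^)
grandparent_b=$(git rev-parse HEAD~2)

echo "HEAD    $head_sha"
echo "HEAD^   $parent_sha"
echo "HEAD^^  $grandparent_a"
echo "HEAD~2  $grandparent_b"
```

The last two lines should print the same hash, since HEAD^^ and HEAD~2 are two spellings of the same grandparent commit.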

NB. It’s worth noting – this guide is not a fully fledged Git tutorial – there are plenty of those around and we’ve even listed the ones we prefer in the resources section below. This guide will serve as an insight into Git, the move from SVN to Git and, if you ever work with Sonassi as your development team, our rulebook and work-flow for collaborative development.

Getting started with Git

The best way to start anything is to introduce yourself, so be sure to set your name and email

git config --global user.name "Sonassi"
git config --global user.email "contact@sonassi.com"

And it’s always nice to have colours enabled

git config --global color.diff auto
git config --global color.status auto
git config --global color.branch auto

At this stage, we’ll assume you’ve already got Git installed (if not, read How do I install Git?) – and we’ll follow the same practice of setting up a repository as we did in our Magento SVN guide. For the purpose of simplicity, we’ll also assume you are using a remote repository, like BitBucket, GitHub or Springloops.

So start by entering your root directory for your project,

cd /home/sonassi/public_html/
git init

If the remote repository already contains files, you don’t need git init; clone it instead:

git clone git@bitbucket.org:sonassi/sonassi.git

Whereas if the remote repository is empty, keep the repository you have just initialised and point it at the remote:

git remote add origin git@bitbucket.org:sonassi/sonassi.git

With an empty remote, git init has already set up the current directory. With git clone, the files are downloaded to a sub-directory named after the repository (eg. ./sonassi). It is safe to move the contents of this directory to your current directory (including the .git directory it will create).
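
Both routes can be tried end to end without a hosted account. The sketch below is only an illustration: it uses a local bare repository as a stand-in for BitBucket/GitHub, and every path, file name and identity in it is hypothetical.

```shell
set -e
work=$(mktemp -d)

# Stand-in for the hosted remote (assumption: a local bare repository)
git init -q --bare "$work/remote.git"
git --git-dir="$work/remote.git" symbolic-ref HEAD refs/heads/master

# Route 1: the remote is empty, so initialise locally and attach the remote
mkdir "$work/site"
cd "$work/site"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$work/remote.git"
mkdir app
echo "placeholder" > app/placeholder.txt
git add app
git commit -q -m "Initial Commit"
git push -q origin master

# Route 2: the remote now has content, so a plain clone is enough
git clone -q "$work/remote.git" "$work/clone"
ls "$work/clone/app"
```

Against a real remote only the URL changes; the order of the steps stays the same.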

Let’s start by ignoring the non-essential files

cat > .gitignore << EOF
/.gitignore
/.htaccess
/app/etc/local.xml
/cron.php
/cron.sh
/downloader
/errors/
/includes
/index.php
/index.php.sample
/install.php
/LICENSE*
/media
/pear
/php.ini.sample
/RELEASE_NOTES.txt
/robots.txt
/shell
/var
EOF
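
Before the first commit it can be worth confirming that the ignore rules match what you expect. A minimal sketch, assuming a trimmed copy of the list above and some made-up file names, using git check-ignore (which exits 0 for an ignored path and 1 otherwise):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Trimmed copy of the ignore list, for illustration only
cat > .gitignore << 'EOF'
/app/etc/local.xml
/media
/var
EOF

# Hypothetical files: three that should be ignored, one that should not
mkdir -p app/etc app/code media var/log
touch app/etc/local.xml app/code/Example.php media/logo.png var/log/system.log

for path in app/etc/local.xml media/logo.png var/log/system.log app/code/Example.php; do
    if git check-ignore -q "$path"; then
        echo "ignored:     $path"
    else
        echo "not ignored: $path"
    fi
done
```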

Then add the files we want to version control

git add app skin js lib
git commit -m "Initial Commit"
git push origin master

That’s it! We’ve created the repository, committed the initial files and pushed that to our remote repository. Now we’ve nailed the basics, we can start to understand how Git should fall into your daily work-flow.

Comparable SVN:Git Functions

If you’re coming from SVN, this look-up table will serve as a helpful reference to show you the equivalent Git command.

Git                                    SVN

Committing
git clone url                          svn checkout url
git pull                               svn update
git init; git add .; git commit        svnadmin create repo; svn import file://repo
git diff                               svn diff | less
git diff rev path                      svn diff -r rev path
git apply                              patch -p0
git status                             svn status
git checkout path                      svn revert path
git add file                           svn add file
git rm file                            svn rm file
git mv file                            svn mv file
git commit -a                          svn commit

Branching
git branch branch                      svn copy path/trunk/ path/branches/branch
git checkout branch                    svn switch path/branches/branch
git checkout rev                       svn update -r rev

Merging
git merge branch                       svn merge -r rev:HEAD path/branches/branch

Resetting
git reset --hard origin/master         svn checkout -r rev path/to/branch

Rules

The rule that we follow to ensure clean, stable and scalable development, for anything other than short-lived changes, is to branch early and merge and commit often. Most changes should be made in a local branch: branch off, make the change, commit it and, when complete, merge it back into master and push to the remote repository. The exception is when you need to make a quick single-file change (eg. a minor CSS edit).

  1. Merge from origin/master at the start of the work day
  2. Commit your edits to your local repository at the end of the day
  3. The database name must be suffixed with _branch_purpose/name, Eg. _live or _stag
  4. Never copy/rename (Eg. creating a .bak file) a file to create a restore point for temporary edits
  5. The staging site must only be used for short-lived changes and previews of branch releases
  6. Live/production site pulls must be authorised by the relevant party only after the staging release has been approved
  7. Always give descriptive commit comments with ticket number/bug ID references where necessary
  8. If possible, use a hub system to isolate live pulls
  9. The staging site must never be left in an inconsistent state – all changes should either be committed or reset
  10. When you are finished with a branch, delete the branch, its files, its database and associated users
  11. Never push to the remote repository master, unless you have sufficient time to test and approve the changes for live use
  12. When creating a branch, do not give it an arbitrary name – either name it after a specific ticket that is to be resolved (eg. bugfix1999) or, if it is a long-term personal branch, use your name-companyname (eg. ben-sonassi)
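
Rule 12 is easy to check mechanically. The helper below is a hypothetical sketch, not part of our tooling: it assumes ticket branches start with bugfix followed by a number and that personal branches end in -sonassi, so adjust the patterns to your own conventions.

```shell
# Hypothetical helper: returns 0 when a branch name follows rule 12
valid_branch_name() {
    case "$1" in
        bugfix[0-9]*) return 0 ;;   # ticket branch, eg. bugfix1999
        ?*-sonassi)   return 0 ;;   # personal branch, eg. ben-sonassi
        *)            return 1 ;;   # anything else is treated as arbitrary
    esac
}

for name in bugfix1999 ben-sonassi test2 tmp; do
    if valid_branch_name "$name"; then
        echo "ok:       $name"
    else
        echo "rejected: $name"
    fi
done
```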

Hierarchy

In practice, our development, staging and live Magento VCS environment would be set up like this.

[Figure: Git Magento Hierarchy]

You’ll notice that the staging site and live site are actually both master; we define staging as being the final pre-live test environment. So after all your local branch testing, you test once more on the staging site; once that is approved, you can then pull on the production/live site. Remember rule #5.

Creating the hierarchy is very straightforward. It begins by creating quick clones of the live site; we’ve written a few scripts to automate the process, so that’s what we’ll use.

Creating/refreshing the staging site

We clone the live site to create the mirrored staging environment. But we deliberately exclude some of the content (logs, sessions etc.) and we create a symbolic link for the media directory (to save on disk space usage). Just remember, with this set up, if you delete an image/product via the admin on the staging site – it will remove the file from the live site.

cd /home/sonassi
LIVE_DIR="public_html/"
mkdir -p subdomains/staging
# /media is anchored, so only the top-level media directory is skipped
rsync -axHPS --delete "$LIVE_DIR" subdomains/staging --exclude="var/log/*" --exclude="var/cache/*" --exclude="var/session/*" --exclude="var/tmp/*" --exclude="/media" --exclude="errors/*" --exclude="*.tgz" --exclude="*.gz" --exclude="app/etc/local.xml"
# -sfn replaces an existing link on a refresh rather than nesting a new one inside it
ln -sfn /home/sonassi/public_html/media /home/sonassi/subdomains/staging/media

Then edit the database connection details to suit in ./subdomains/staging/app/etc/local.xml (the file is excluded from the sync, so on a first run copy it across from the live site). Then we’ll dump the live database ready for import on the staging site – we have written a script which can dump the live database faster than the normal process and without causing table-level locks (ie. without impacting the live site at all – read the full guide here). After the dump, we also use sed to find and replace all the URLs with the new staging URL.

cd /home/sonassi/public_html
wget -O mage-dbdump.sh sys.sonassi.com/mage-dbdump.sh
chmod +x mage-dbdump.sh
./mage-dbdump.sh
mv ./var/db.sql ../subdomains/staging/var/db.sql
mv ./mage-dbdump.sh ../subdomains/staging/
cd ../subdomains/staging/
sed -i 's/www\.mydomain\.com/staging.mydomain.com/g' ./var/db.sql
./mage-dbdump.sh --restore
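
If you want to see what the find and replace does before running it on a real dump, you can try it on a couple of sample rows. This is a sketch only: the two INSERT lines are made-up stand-ins for a dump, and the output is written to a second file rather than edited in place.

```shell
set -e
work=$(mktemp -d)

# Two sample rows standing in for a real dump (illustrative data only)
cat > "$work/db.sql" << 'EOF'
INSERT INTO core_config_data VALUES (1,'default',0,'web/unsecure/base_url','http://www.mydomain.com/');
INSERT INTO core_config_data VALUES (2,'default',0,'web/secure/base_url','https://www.mydomain.com/');
EOF

# Escaping the dots makes them match literal dots rather than any character
sed 's/www\.mydomain\.com/staging.mydomain.com/g' "$work/db.sql" > "$work/db.staging.sql"

# Count the lines that now point at the staging URL
grep -c 'staging\.mydomain\.com' "$work/db.staging.sql"
```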

Now, we’ve got an operational staging site, which is a clone of the live site, and also a working directory for master. This now gives us the final preview point before any changes are made live – and a little environment for tiny changes to be made.

Work Flow

[Figure: Git Workflow Diagram]

Short lived or tiny changes

The only time you should edit the staging site directly is when you have a change that can be executed very quickly and you know the definitive output. A good example is if you need to make a quick CSS correction and you know the exact code (or close enough) to make the change. So you edit the relevant file(s), test the output, commit your changes with an appropriate comment, then push the changes to origin master. After approval from the live site maintainer, you can then perform a pull on the live site.

In practice, the code execution would be like this:

cd /home/sonassi/subdomains/staging
# Make sure that no-one has left the staging site in an inconsistent state (breaking rule #9)
git status
# Make your changes to the CSS
nano ./skin/frontend/mypackage/default/custom.css
# Contrary to SVN, even if a file is under version control, you still have to
# state that the change should be part of this commit by "adding" the file
git add ./skin/frontend/mypackage/default/custom.css
git commit -m "Edited line +33 of custom.css to resolve ticket #1555"
git push origin master

cd /home/sonassi/public_html
# Discard any uncommitted edits to tracked files, then bring in the approved changes
git reset --hard HEAD
git pull origin master

You shouldn’t have to reset the live site’s working directory – but we just want to be extra cautious to remove any temporary edits some naughty developers may have done.
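
The rule #9 check can also be scripted rather than read by eye. A minimal sketch, assuming a throwaway repository and a hypothetical is_clean helper built on git status --porcelain, which prints nothing when there is nothing modified, staged or untracked:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "body { color: black; }" > custom.css
git add custom.css
git commit -q -m "Initial Commit"

# Hypothetical helper: succeeds only when nothing is modified, staged or untracked
is_clean() {
    [ -z "$(git status --porcelain)" ]
}

if is_clean; then state_before="clean"; else state_before="dirty"; fi

# Simulate someone leaving an uncommitted edit behind
echo "body { color: red; }" > custom.css
if is_clean; then state_after="clean"; else state_after="dirty"; fi

echo "before edit: $state_before"
echo "after edit:  $state_after"
```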

So in summary, make sure you remember:

  1. NEVER edit the live site files directly – ever!
  2. NEVER make a change on the staging site, unless it takes less than a few minutes to complete

Branching out & merging

[Figure: Git Magento Workflow Diagram]

Branching

Branching out is very straightforward. The first step is to set up the new branch environment; thankfully, the process is near identical to creating a staging environment, with two additional commands at the end:

cd /home/sonassi/subdomains/mybranch
git branch mybranch
git checkout mybranch

These commands switch the current working directory over to your new branch. You can make changes in here, commit as frequently as you like, then whenever necessary either merge master into your branch or, once you have finished, fold your branch into master.

Remember that branches are local to your own repository – so no-one else will be able to use, see or edit your branch unless you push the branch to the origin repository. But, unless you have a compelling reason to share a branch, it’s unlikely you’ll need to do this.
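
For the occasions when you do need to share a branch, publishing it is an explicit step. The sketch below assumes a local bare repository standing in for origin, with hypothetical paths and names throughout:

```shell
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"   # stand-in for the hosted remote

mkdir "$work/site"
cd "$work/site"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$work/remote.git"
echo "base" > file.txt
git add file.txt
git commit -q -m "Initial Commit"
git push -q origin master

# The branch exists only in the local repository at this point
git checkout -q -b mybranch
remote_before=$(git ls-remote --heads origin mybranch)

# -u also records origin/mybranch as the upstream for later pulls and pushes
git push -q -u origin mybranch
remote_after=$(git ls-remote --heads origin mybranch)

echo "before push: ${remote_before:-<not on remote>}"
echo "after push:  $remote_after"
```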

Merging

Merging, assuming no conflicts happen, is painless to carry out. There are two ways you can go about a merge:

  1. Merge another branch into your branch
  2. Merge your branch into another branch

It doesn’t actually matter which you are doing, but sometimes one approach may seem more appropriate over another.

For example, if you have an actively developed branch, and you need to bring it up to date with the master – then you would execute

cd /home/sonassi/subdomains/bugfix1999
# Fetching updates origin/master; merging that picks up the latest remote changes
git fetch origin
git merge origin/master

Or, as another example, you have finished with your branch and intend to fold it into the master – then you would execute

cd /home/sonassi/subdomains/bugfix1999
git commit -am "Final changes to bugfix1999- to resolve ticket #1999, prior to folding into master"
git checkout master
git merge bugfix1999
git branch -d bugfix1999
git push origin master
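
If you would rather Git double-check that a branch really has been merged before it is removed, the lowercase -d flag refuses to delete unmerged work, whereas -D forces the deletion. A sketch in a throwaway repository (names and messages are illustrative):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m "Initial Commit"

# Work on a ticket branch
git checkout -q -b bugfix1999
echo "fix" >> file.txt
git commit -q -am "Resolve ticket #1999"

# Back on master the branch is not merged yet, so -d refuses to delete it
git checkout -q master
if git branch -d bugfix1999 2>/dev/null; then
    delete_before_merge="deleted"
else
    delete_before_merge="refused"
fi

# After folding the branch in, the same command succeeds
git merge -q bugfix1999
git branch -q -d bugfix1999
remaining=$(git branch --list bugfix1999)

echo "delete before merge: $delete_before_merge"
echo "bugfix1999 after merge: ${remaining:-gone}"
```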

Conflict resolution

Git is very sophisticated with auto-merging and will only fall back to manual conflict resolution as a last resort. Once you get your head around the <<<<<<< notation it will soon make sense, and you’ll no longer have that sinking feeling when you’ve hit a conflict. Stylesheets are a common place for conflicts, so we’ll use style.css as our example.

Auto-merging style.css
CONFLICT (content): Merge conflict in style.css
Automatic merge failed; fix conflicts and then commit the result.

The first thing to do is make sure you don’t panic; resolving a conflict manually isn’t anywhere near as hard as you might think. The second step is to open the conflicted file and scroll down to the conflict; you’ll notice that Git has added rows of chevrons to denote where the conflict(s) has happened.

  • HEAD represents the branch you currently have checked out – ie. your code.
  • master represents the branch being merged in – ie. someone else’s code (this could equally be another branch or anywhere else you are merging from)

<<<<<<< HEAD
body {
 background:blue;
 font-size:28px;
 font-weight:normal;
}
=======
body {
 background:red;
 font-size:48px;
 font-weight:bold;
}
>>>>>>> master

So you have to make the educated decision as to what the correct output should be; which may be …

  1. Entirely the other code
  2. Entirely your code
  3. A combination of the two.

The key here is to make a decision and edit the file accordingly – making sure you remove the conflict notation (<<<<<<<, ======= and >>>>>>>) as necessary. So in our example, we’ve decided that we actually need a bit of both, leaving:

body {
 background:blue;
 font-size:48px;
 font-weight:bold;
}

Then, after resolving your conflict(s), mark each file as resolved with git add and commit the result before continuing with your work.
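
If you would like to practise this somewhere safe, the whole cycle can be reproduced in a throwaway repository. The sketch below is illustrative only: it creates a deliberate conflict on one line of style.css, resolves it in favour of the branch’s version, then stages and commits to conclude the merge.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
printf 'body {\n background:white;\n}\n' > style.css
git add style.css
git commit -q -m "Initial Commit"

# Your branch changes the background one way...
git checkout -q -b mybranch
printf 'body {\n background:blue;\n}\n' > style.css
git commit -q -am "Blue background"

# ...while master changes the same line another way
git checkout -q master
printf 'body {\n background:red;\n}\n' > style.css
git commit -q -am "Red background"

# Merging master into your branch now stops with a conflict
git checkout -q mybranch
git merge master > /dev/null 2>&1 || echo "merge stopped with a conflict"
grep -c '^<<<<<<< ' style.css

# Resolve by writing the intended result, then stage and commit to conclude the merge
printf 'body {\n background:blue;\n}\n' > style.css
git add style.css
git commit -q -m "Merge master into mybranch, resolving style.css"
```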

Your daily routine

[Figure: Git Daily Workflow (courtesy of Naked Startup)]

Getting into the habit of pulling down the changes from the remote repository in the morning and pushing your changes (or at the very least, committing your changes) in the evening is important to ensure you have a (relatively) conflict free and up-to-date working copy of the site. The last thing you want to do is go 3 days without a pull, suddenly finding dozens of conflicts that you’ve got to fiddle your way through.

So, print this PDF off, stick it on your wall – and live by it – the Git daily work-flow guide.

If you already have an existing branch that you are working on, then the first thing to do is merge the latest changes from master into your branch. You shouldn’t have to commit any changes in your branch because you should have done that the night before!

# Fetch the latest remote state, then merge origin/master into the branch you have checked out
git fetch origin
git merge origin/master

If you have any conflicts, work through them, if not – you can carry on with the development for the day. Be sure to regularly commit your changes throughout the day (with meaningful commits). Then as it comes to the end of the day, you’ll need to make sure any new files you’ve added to the repository have been added to version control and then make a final commit for the end of the day.

git add a_new_file
git commit -am "Addition of a_new_file, and commit of existing files for EOD 29/07/2012"

Do not merge your changes to the remote repository’s master unless you have sufficient time to test and gain approval on the staging site.

 

Undoing a file deletion from a historical commit

Find the last commit that affected the given path. As the file isn’t in the HEAD commit, this commit must have deleted it; then checkout the version at the commit before.

git rev-list -n 1 HEAD -- path/to/file
git checkout revision^ -- path/to/file

Or in one command, if $file is the file in question.

file="path/to/file"
git checkout $(git rev-list -n 1 HEAD -- "$file")^ -- "$file"
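
The same steps can be rehearsed in a throwaway repository before you rely on them. In this sketch the file names and commit messages are made up; the two Git commands are the ones shown above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "keep me" > notes.txt
echo "other" > other.txt
git add notes.txt other.txt
git commit -q -m "Initial Commit"

# Delete the file in one commit, then carry on with unrelated work
git rm -q notes.txt
git commit -q -m "Remove notes.txt"
echo "more" >> other.txt
git commit -q -am "Unrelated change"

# Last commit that touched the path, ie. the one that deleted it
file="notes.txt"
deleting_commit=$(git rev-list -n 1 HEAD -- "$file")

# Check the file out from that commit's parent
git checkout "${deleting_commit}^" -- "$file"
cat "$file"
```

The restored file comes back staged, so it is ready to be committed again.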

Quick and dirty restore points

[Figure: Git Index Structure]

The Git index is a great resource for making checkpoints in your code that don’t really warrant a commit. If you are not familiar with the Git index, it is the “placeholder” in which changes are stored after a git add and before a git commit (see diagram). Normally, there isn’t any time between add and commit – but you can actually use the index to your advantage.

The index can be used exactly the same way you would rely on the undo history in your PHP editor to provide a way to restore back to a previous point in the code. So if you are ultimately quite happy with your progress on a specific file – or you want to try some slightly different/risky code, then before you proceed you can quickly add that file to the index.

git add ./app/code/community/Sonassi/MyNewExtension/etc/config.xml

Then if you do happen to make a change and you want to restore back to the previous “working” version of the file quickly, you can just checkout the file.

git checkout -- ./app/code/community/Sonassi/MyNewExtension/etc/config.xml

This way, you can add files to the index repeatedly throughout the day without muddying your commit history.
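
As a quick illustration of the checkpoint idea, the sketch below (throwaway repository, made-up file contents) stages a good version, breaks the file, and then restores the staged copy rather than the last commit:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "version 1 (committed)" > config.xml
git add config.xml
git commit -q -m "Initial Commit"

# A change you are happy with: stage it as a checkpoint, but do not commit
echo "version 2 (working)" > config.xml
git add config.xml

# A risky experiment that does not work out
echo "version 3 (broken)" > config.xml

# Checking the file out restores the staged copy, not the last commit
git checkout -- config.xml
cat config.xml
```

Bear in mind that each new git add of the same file overwrites the previous checkpoint, so there is only ever one restore point per file.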

Amending previous commits

git commit --amend

One commit management feature we haven’t mentioned yet is git commit --amend, which allows you to update the last commit with new edits. If you’re familiar with git rebase -i squashing, then this is like squashing your index into the last commit. You can also amend with the working files by using git commit --amend -a or by providing specific files on the command line.

If you have made a mistake and want to remove the last commit – provided you still haven’t pushed it yet, it is simply a case of running a single command

git reset --soft HEAD~1

This will not undo any changes in the files, but rather just remove the last commit from the repository. If you swapped --soft for --hard, it would perform the same action, but also remove all your changes too.
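
To see what the soft reset leaves behind, here is a short sketch in a throwaway repository (file name and messages are illustrative): the last commit disappears from the history, while its changes remain staged and ready to be committed again.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "Initial Commit"
echo "two" >> file.txt
git commit -q -am "Mistaken commit"

commits_before=$(git rev-list --count HEAD)

# Drop the last commit but keep its changes staged
git reset -q --soft HEAD~1

commits_after=$(git rev-list --count HEAD)
staged=$(git diff --cached --name-only)

echo "commits: $commits_before -> $commits_after"
echo "still staged: $staged"
```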

Rebasing

In Git, there are two main ways to integrate changes from one branch into another – the merge and the rebase. Rather than clone someone else’s article, we’ll point you to a fantastic and clear explanation of rebasing – what it does, when to do it and, importantly, when not to do it. You can find the article here

 

Resources

This article provided an insight into Git management and work-flow – but is only a taster for what can be done with Git. It also wouldn’t have been possible to write it without the excellent resources that people have taken the time to write; so here are some great websites that you can also use to supplement your knowledge.

The Simple Git Guide: http://rogerdudler.github.com/git-guide/
Git Crash Course: http://git.or.cz/course/svn.html
The Definitive Git Website: http://git-scm.com/
Kent Nguyen’s Git Practices: http://kentnguyen.com/development/visualized-git-practices-for-team/
Oliver Steele’s Git Workflow: http://osteele.com/archives/2008/05/my-git-workflow
Git for Beginners – The definitive practical guide: http://stackoverflow.com/questions/315911/git-for-beginners-the-definitive-practical-guide
Joe Maller’s Web Focused Git Work Flow: http://joemaller.com/990/a-web-focused-git-workflow/
Vincent Driessen’s Successful Branching Model: http://nvie.com/posts/a-successful-git-branching-model/
Naked Startup’s Simple Daily Git Work Flow: http://nakedstartup.com/2010/04/simple-daily-git-workflow

  • http://about.me/hans2103 hans2103

    Useful documentation. Thank you!

  • Sander Mangel

    Great post, just what I was looking for. I’m looking for a way to use Git between development and live but I’m struggling with database changes. When I alter the database I’d like to store that (like configuration changes in the backend > system). But I don’t want to overwrite the orders table in the live environment when pushing the development to live.
    Does anyone have any suggestions?

  • Wes

    Hey guys,

    Excellent post!

    Question for you, how do you manage the product images of large sites, especially when doing development work on your local machine? One option I’ve looked at is just keeping an active rsync directory to continually sync changes, but this doesn’t seem like a perfect solution. Or possibly using URL rewrites to link up the production server images… do you guys have any suggestions here?

    Thanks again!

  • http://twitter.com/Alphydan Alphydan

    I would be careful with “git add .” you don’t want all kinds of libraries, tar files, OS files and such potential new things added by mistake. Maybe “git add -u” or start with “git status”, and then “git add” some specific files?

  • http://www.zone-connect.com/ zone-connect

    gitignore – will be your solution here i suppose..

  • Dmitriy Zavalkin

    What about git submodules? We have around 10 gitsubmodules in our magento project. Is there some best practices how to create feature branches in main repo and submodules and merge them to master? Also, how to deal with merging submodules pointers in main repo?
    When person A in submodule 1 merges feature branch to this submodule master and commits. Then person B in submodule 1 merges feature branch to this submodule master and commits. Then person B commits submodule pointer (maybe along with other changes) to the main repo. Then person A commits submodule pointer to the main repo. Result – submodule pointer is set on old revision and feature committed by person B is broken.

  • yesvinnie

    Great article but there are a few confusing bits where you don’t know whether the commands need to be executed on the server or the developer’s machine. It would be of immense help if you could clarify a little bit more on how a developer would check out to his machine from the staging environment. Thanks.

  • http://about.me/mtupper Michael Tupper

    Great post.
    I experience the same problem as @Sander with the issue of managing changes to the DB, since content elements are split between the web files and the DB. If it were just the web files this would be all you need, but how do you manage changes made on a dev/staging DB to the production DB? Is MySQL replication the answer?

  • yesvinnie

    BTW, the rsync --delete --exclude “media” deletes any folder named media in any directory/subdirectory.

  • sonassi

    Actually the --delete argument removes everything from the destination directory that isn’t in the source directory

  • Melvyn Sopacua

    You’re carefully avoiding the real challenge in this article: how do you handle data mutations from the live site? There’s two ways to deal with it: record changes in the development environment and apply them at the live site or use incremental sales data import/export and freeze all other operations at the live site during acceptance stage.
    Otherwise, this a solid article for when you’re allowed to have customer data on ::1.

  • sonassi

    We don’t go into too much detail because it simply isn’t relevant to code versioning. Magento modules have installer files which modify the DB structure, the only changing element is customer data/orders/products/cms – syncing these conditionally would be pointless. It would be just as easy to take a DB dump from the live site and drop that into any respective branch (which we do touch on). If you are not storing cardholder data (which the majority of merchants are not), there is no PCI-DSS compliance issue in having order history and customer information in a local development environment.

    Ps. At least someone is using IPv6 (::1) 😀

  • Melvyn Sopacua

    It’s true that Magento modules have installer files, but rarely the defaults are good enough for the client. Just take Idev OSC for example: it requires a license key and the installer doesn’t know what it is (duh!). So, as the person that cuts the releases, I create simple SQL statements that update core_config_data and stick them in deploy/ (sister dir of webroot). This works for small changes and I do that just by diffing the live and dev dump of the table (--skip-extended-insert). More elaborate changes I expect my team mates to document them or create upgrade scripts in deploy/ themselves or use an upgrade module using Magento’s upgrade framework. And yes, this is kept in git, cleaned out from master branch after deployment and merged to develop.
    But when you add a new storeview with it’s own root catalog, locale etc., creating upgrade scripts for deployment is madness. This is where you want to build the new storeview on it’s own project branch, branched from master, built up on stage and pushed to live with incremental updates of the sales data.
    I’m really interested in what others are doing, like with Capistrano, but I don’t think it’s fundamentally different then what we’re doing now and it still impacts development in that you have to register your data changes one way or the other.

    Re: PCI-DSS – while it’s an industry standard, clients are allowed to be more paranoid and we refer to them as “careful”. :)

    Ps. Yep, it’s really sad to see Magento site doesn’t, especially when you’re using IPv6 only hosts.

  • Zac Courie

    I’m confused by the command “git add app skin js lib” that comes right after the .gitignore file which is set to ignore “js” and “lib” directories. Is that a mistake in your article; if not, please explain. Thanks!

  • sonassi

    A total mistake! Sometimes we monitor those dirs, sometimes we don’t. I think there’s just a bit of a mix of those two different concepts going on.

    I’ll edit it and make it clearer.

  • Zac Courie

    Thanks. Yeah, I was using it as a step-by-step set up guide with our Magento install, but that, of course, got me stuck. Thanks, again, and please let me know when it’s updated. I’m curious to see what the update is.

  • sonassi

    Just make a judgement call as to whether you want to version control those DIRs or not.

  • James Phillips

    How do you go about applying patches to magento? What steps would be taken? If the patch modifies files that are not watched by git what is the reasonable work flow?

    – After becoming more familiar with git I now realize this is a simple task.

  • Jon

    The issue isn’t that it deletes anything, but it doesn’t copy ANY directories named “media” in the path. For example this path won’t be copied using that rsync command: app/design/adminhtml/default/default/template/media/
    Cheers!

  • kayintveen

    Its a little older topic but still very relevant. We do love versioning, but we struggle with the same question “Sander Mangel” does. We often change settings, update extensions. or sometimes even upgrade entire magento from 1.7.0.2 to 1.9 for example. how do you go around with those database related changes without having the problem of overwriting order, customer and product changes in the mean time?

  • Dave S

    You talk about setting up a staging env by using rsync, but you don’t talk about creating a dev env. Perhaps you should explain how you expect to bring on a new developer and how the process of them starting up their dev env would work. Also, how do you go about updating your magento?