Mercurial Hook for Syntax Checking (PHP)
Friday, October 8th, 2010
For those unfamiliar with Mercurial, it is an awesome Source Control Management (SCM) tool. One of my favorite features of Mercurial is that repositories are distributed, which gives each machine a full copy of the project's history. Being distributed has many advantages, such as faster committing, branching, tagging and merging, since it is all done locally. Of course this setup also creates a backup of the repository each time an engineer clones it. There are a lot of benefits to using Mercurial, but that is not the focus of this post.
In this article, I am going to discuss how to set up a Mercurial hook that checks the syntax of files. Specifically, the hook will check the syntax of PHP files. This is beneficial as it prevents users from adding invalid files to Mercurial and keeps the repository clean. Better yet, when dealing with a repository for a live website, it prevents invalid files from ever being added to the live site.
The Pretxnchangegroup Event
Mercurial hooks are programs that Mercurial executes during specific events. Ideally, a hook such as a syntax check would run just before a commit is made (the precommit event). Since Mercurial is distributed, this would require each client to install and set up the hook. This may work for some, but it does require more work and can cause issues if the hook is not set up correctly on each machine.
There is a better solution for environments that have a central repository for everyone to push their changes to. Basically, the hook can be set up on the pretxnchangegroup event. This event is executed just before a changegroup (a group of changesets) is added to a repository, which is what happens on the receiving side of a push.
To set up a hook on the pretxnchangegroup event, the syntax check needs to build a list of every file that was changed in any of the incoming changesets and then check the syntax of the latest version of each file. If there is a syntax error, the hook can exit with a non-zero status code to prevent the changesets from being added to the central repository.
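The two steps just described (collect every changed file once, then turn the check results into an exit status) can be sketched in plain Python without touching Mercurial at all. The function names and the list-of-lists input below are hypothetical stand-ins chosen for illustration, not part of Mercurial's API.

```python
def files_to_check(changesets):
    """Return the unique file names touched by a group of changesets.

    `changesets` is a hypothetical stand-in: a list in which each item is
    the list of file names modified by one changeset, oldest first.
    """
    seen = set()
    ordered = []
    for files in changesets:
        for name in files:
            if name not in seen:
                seen.add(name)
                ordered.append(name)
    return ordered


def hook_status(files, has_syntax_error):
    """Return 1 (reject the push) if any file fails the check, else 0."""
    return 1 if any(has_syntax_error(name) for name in files) else 0


# Two changesets that both touch index.php: it is only checked once.
pushed = [["index.php", "lib/db.php"], ["index.php", "README"]]
unique = files_to_check(pushed)
print(unique)
# Pretend that lib/db.php contains a syntax error.
print(hook_status(unique, lambda name: name == "lib/db.php"))
```

Checking each file only once matters because only the latest version of a file is inspected, no matter how many of the pushed changesets modified it.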
When using the pretxnchangegroup event, each machine will be able to commit changes with files that have syntax errors. However, when trying to push the files to the central server, the changesets will be rejected until the syntax errors have been fixed.
In Process vs. External Hooks
With Mercurial, there are two types of hooks: in-process and external. An in-process hook is a Python function that Mercurial loads and runs inside its own process. An external hook is a separate program, written in any language supported by the OS, that Mercurial runs through the shell.
There are advantages to both kinds of hook. An external hook is most beneficial when the code is already written in another language or the developers are more familiar with a language other than Python. An in-process hook has some nice advantages as it gives the developer access to the internals of Mercurial. It also gives the ability to display a message to the user when making a change in the repository.
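To make the in-process case concrete, here is a minimal sketch of the shape such a hook takes. It assumes a Python 3 build of Mercurial, where messages and values such as the node are byte strings. The `FakeUI` class is a hypothetical stand-in used only so the function can be tried without Mercurial installed.

```python
def example_hook(ui, repo, hooktype, node=None, **kwargs):
    """Minimal in-process hook: report the incoming node and accept it.

    Mercurial treats a false-ish return value as success and a true-ish
    return value as failure.
    """
    ui.warn(b"checking incoming changes starting at %s\n" % (node or b"?"))
    return False  # accept the changes


class FakeUI:
    """Hypothetical stand-in for Mercurial's ui object, for a dry run."""

    def __init__(self):
        self.messages = []

    def warn(self, message):
        self.messages.append(message)


fake_ui = FakeUI()
result = example_hook(fake_ui, repo=None, hooktype=b"pretxnchangegroup",
                      node=b"abc123")
print(result, fake_ui.messages)
```

An external hook has no such function signature: it is just a program whose exit status decides the outcome, with details passed in environment variables such as HG_NODE.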
External Hook Using a Shell Script
In order to show how Mercurial hooks work, I have developed both an external and in-process hook to check the syntax of PHP files. Below is the source code for an external hook. This hook is a bash script that I named php_syntax.sh.
#!/usr/local/bin/bash
echo "STARTING PHP SYNTAX CHECK..."
# create a random temp file to hold the list of changed files
temp_file=`/usr/bin/mktemp -t php_syntax_files`
# get all modified files, one per line, and remove duplicates
# note: use file_mods,file_adds instead
hg log -r "$HG_NODE:tip" --template "{files}\n" | tr ' ' '\n' | sort | uniq > "$temp_file"
# Walk through each file name
for line in $(< "$temp_file"); do
    # Make sure it is a php file (extension php, php4 or php5)
    if echo "$line" | grep -Eiq '.+\.(php|php4|php5)$'
    then
        # create a random temp file
        php_file=`/usr/bin/mktemp -t php_syntax_check`
        # save the contents of this file (latest commit) to the temp file
        hg cat -r tip "$line" > "$php_file"
        # check the syntax
        php_syntax_output=`/usr/local/bin/php -l -d display_errors=1 -d error_reporting=4 -d html_errors=0 < "$php_file"`
        # remove the temp file
        rm -f "$php_file"
        test_syntax=`echo "$php_syntax_output" | grep "Parse error"`
        if [ "$test_syntax" ]; then
            # clean up the file list and reject the push
            rm -f "$temp_file"
            exit 1
        fi
    fi
done
rm -f "$temp_file"
The above code will check the latest version of each file that is being changed when pushing to the server. It will only check files that have an extension of PHP, PHP4 or PHP5. The content of each file that is being pushed to the server is then stored in a temporary file and passed to PHP to check the syntax. If the syntax check fails, the program returns a 1 for failure which causes the entire push to fail so that no changes are pushed to the server. If there are no syntax errors, the hook exits normally and continues to push the files to the server.
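One subtlety when filtering by extension is the precedence of alternation in regular expressions. A pattern written as `.+\.(php)|(php4)|(php5)$` is parsed as three separate alternatives, so it also accepts names that merely contain "php4" or ".php" somewhere. The sketch below compares that form with a grouped, anchored pattern. The names `LOOSE`, `STRICT` and `is_php_file` are mine, chosen for illustration.

```python
import re

# Ungrouped alternation: ".+\.(php)" OR "(php4)" OR "(php5)$".
LOOSE = re.compile(r".+\.(php)|(php4)|(php5)$", re.IGNORECASE)
# Grouped alternation, anchored at the end of the file name.
STRICT = re.compile(r".+\.(php|php4|php5)$", re.IGNORECASE)


def is_php_file(name):
    """Return True only when the name ends in .php, .php4 or .php5."""
    return STRICT.search(name) is not None


for name in ["index.php", "old.PHP4", "notes.php.txt", "php4_migration.txt"]:
    print(name, bool(LOOSE.search(name)), is_php_file(name))
```

The last two names are accepted by the loose pattern but rejected by the grouped one, which is the behavior the extension filter is meant to have.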
In order to install the above hook in Mercurial, simply add the following 2 lines to the .hgrc and/or the hgweb.config file.
[hooks]
pretxnchangegroup.syntax_check = /usr/home/mercurial/php_syntax.sh
Of course the path in the above line needs to be updated to where the bash script was saved. The bash script will most likely need to be updated to contain the correct paths as well.
With all of the above in place the following message will be displayed to the user when trying to push a file that has a syntax error:
running hook pretxnchangegroup.syntax_check: /usr/home/code.softwareprojects.com/php_syntax.sh
transaction abort!
rollback completed
abort: pretxnchangegroup.syntax_check hook exited with status 127
warning: commit.autopush hook exited with status 1
In-Process Hook Using Python
The major flaw with the above shell script is that it does not display a nice informative error to the user when their push fails due to a syntax error. This is one advantage of using a Python in-process hook instead. I have written very similar logic in Python, which can be seen below:
import os
import re
import subprocess
import tempfile

def check(ui, repo, hooktype, node, **kwargs):
    # initialize variables
    error = ""
    fileSet = set()
    # Loop through each changeset being added to the repository
    for change_id in xrange(repo[node].rev(), len(repo)):
        # Loop through each file for the current changeset
        for currentFile in repo[change_id].files():
            # Only check PHP files (extension php, php4 or php5)
            if re.match(r'.*\.(php|php4|php5)$', currentFile):
                # Build a unique list of each file that has changed
                fileSet.add(currentFile)
    # Grab the latest version of the files in the changegroup
    ctx = repo['tip']
    # Loop through each file that has changed
    for currentFile in fileSet:
        # Do not check the file if it is being deleted
        if currentFile not in ctx:
            continue
        # Create a unique temporary file
        fd, temp_file = tempfile.mkstemp(prefix='php_syntax_check.')
        # Save the contents of the current file to the temp file
        f = os.fdopen(fd, 'w')
        f.write(ctx[currentFile].data())
        f.close()
        # Check the syntax of the current/temp file
        proc = subprocess.Popen('/usr/local/bin/php-cgi -l -d display_errors=1 -d error_reporting=4 -d html_errors=0 < %s' % temp_file, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        # Retrieve the output of the syntax check
        out, err = proc.communicate()
        # Remove the temp file
        os.remove(temp_file)
        # Check for syntax errors and save them
        if 'Parse error' in out:
            error += "%s%s\n" % (out, currentFile)
    # Check if an error occurred in any of the files that were changed
    if error != "":
        # Display a message to the user about each file that contained a syntax error
        ui.warn("******************************************************\n" +
                error +
                "******************************************************\n")
        # Reject the changesets
        return 1
    # Accept the changesets
    return 0
This code is very similar in functionality to the shell script. It first builds a list of all of the files being pushed that have a PHP, PHP4 or PHP5 extension. Then it obtains the contents of each file that is being pushed and stores each file in a random temporary file. It checks the syntax of each file and then cancels the push if there is one or more files with invalid syntax.
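Since both hooks feed the file to PHP on standard input, the temporary file is not strictly required. As an alternative sketch (not the approach used in the hook above), the contents can be piped straight to the linter. The `lint` function, the `PHP_LINT` command and the `FAKE_LINT` stand-in below are assumptions for illustration. The fake linter exists only so the sketch can be tried on a machine without PHP.

```python
import subprocess
import sys

# Assumed command line; adjust the path to the php binary as needed.
PHP_LINT = ["php", "-l", "-d", "display_errors=1"]


def lint(source, command=PHP_LINT):
    """Pipe `source` (bytes) to the linter on stdin.

    Returns the linter output if it reports a parse error, else None.
    """
    proc = subprocess.run(command, input=source,
                          stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output = proc.stdout.decode("utf-8", "replace")
    return output if "Parse error" in output else None


# Hypothetical stand-in linter: flags any input containing "@@".
FAKE_LINT = [sys.executable, "-c",
             "import sys; d = sys.stdin.read(); "
             "print('Parse error: fake' if '@@' in d "
             "else 'No syntax errors detected')"]

print(lint(b"<?php echo 1;", FAKE_LINT))
print(lint(b"<?php @@", FAKE_LINT))
```

Passing the command as a list also avoids building a shell command string, so file names never need quoting.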
Since this is an in-process hook, it is able to display a nice message to the user about why the push was not allowed. This hook is also set up to check every single file and display a message about every file that has a syntax error. This allows the hook to display a message to the user such as the following:
******************************************************
Parse error: syntax error, unexpected T_ECHO in - on line 3
Errors parsing - afile_test.php
Parse error: syntax error, unexpected '@' in - on line 15
Errors parsing - anotherfile_test.php
******************************************************
In order to set up this hook with Mercurial, save the above Python code in a file named php_syntax.py in a directory that is on the PYTHONPATH. Then add the following two lines to the .hgrc and/or the hgweb.config file.
[hooks]
pretxnchangegroup.syntax_check = python:php_syntax.check
It is important to point out that the text on the right half of the equals sign tells Mercurial what to load. In this example, it says use Python, look for a file named php_syntax.py and call the function check.
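As a rough illustration of how a value such as python:php_syntax.check maps to a module and a function, the sketch below splits the string at the last dot and imports the module. This is a simplification written for this article, not Mercurial's actual loader, and `resolve_hook` is a hypothetical name. A standard library function is used as the target so the example runs anywhere.

```python
import importlib


def resolve_hook(spec):
    """Turn 'python:module.function' into the function object it names."""
    prefix, _, target = spec.partition(":")
    if prefix != "python" or "." not in target:
        raise ValueError("not an in-process hook spec: %r" % spec)
    # Everything before the last dot is the module, the rest the function.
    module_name, _, func_name = target.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


# Stand-in target from the standard library instead of php_syntax.check.
basename = resolve_hook("python:os.path.basename")
print(basename("/srv/hooks/php_syntax.py"))
```

The import step is why the file has to live somewhere on the PYTHONPATH: if the module cannot be imported, there is no function to call.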
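Also, any long-running Mercurial process (for example, the web server process that serves the repository through hgweb) will need to be restarted after setting up the above hook and each time the hook is modified. This is because the Python module for an in-process hook is loaded once and then stays in memory for the life of that process.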
Conclusion
Mercurial is a great SCM tool and can be very powerful when combined with either in-process or external hooks. In-process hooks provide much more control and are the preferred method in most cases. The examples above are just an introduction to Mercurial hooks and they can easily be modified for specific environments or checking the syntax of other languages.
Please leave a comment if you have found this code useful or share your experiences with Mercurial and hooks.



