May 16, 2011

PHP Azure Contest Wrap Up

By admin in PHP Azure

Over the past couple of months I’ve participated in the #PHPAzureContest. As I was doing a large body of work for my dissertation, it seemed like a good opportunity to try something new and get my project running on a new platform.

A Quick Recap Of The Product

The product is called ‘Twitter Sentiment Engine’. It looks at the sentiment towards search terms on Twitter. It first assembles a training sample of Tweets using the Twitter Search API and then monitors the search terms using the Twitter Streams API. A basic user interface is provided for monitoring the results of the analysis. A full explanation of the product can be found in this post.
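The train-then-classify flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it is written in Python for brevity (the project itself is PHP), it assumes a multinomial naive Bayes classifier with add-one smoothing (the comments below only mention "Bayesian logic"), and the class name, tokenizer, labels and sample tweets are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Toy multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of tweets seen
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        # Hypothetical tokenizer: lowercase and split on whitespace.
        return text.lower().split()

    def train(self, text, label):
        # Add one labelled tweet from the training sample.
        words = self.tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        # Pick the label with the highest log-probability.
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # class prior
            for word in self.tokenize(text):
                count = self.word_counts[label][word]
                # Add-one smoothing keeps unseen words from zeroing the score.
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training sample, standing in for Search API results.
classifier = NaiveBayesSentiment()
classifier.train("i love this phone", "positive")
classifier.train("what a great day", "positive")
classifier.train("i hate this phone", "negative")
classifier.train("what a terrible day", "negative")
print(classifier.classify("love this great day"))
```

In the real product the training tweets would come from the Twitter Search API and the tweets to classify from the Streams API; here both are hard-coded strings.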

At The End Of The Competition What Shape Is The Product In?

The product itself is very much a proof of concept. I have put little work into the UI and instead concentrated on the sentiment calculation implementation and building the two RESTful APIs (public and private – both hosted on Azure) that gather the data from Twitter.

The project is in three tiers. The API (hosted on Azure) marshals requests from the other two elements: the UI and the worker nodes.

The UI

The UI of the project is a basic interface to the data gathered by the components below. It allows new keywords to be tracked, tracked keywords to be monitored and shows graph and tabular output of the data gathered on tracked keywords.

Worker Nodes

Worker nodes are the workhorses of the application. They make requests to the API, which allocates them jobs to work on. Jobs consist of new keywords to track (which require a training sample to be gathered) or an instruction to create and monitor a Twitter stream for a tracked keyword.

The Fulfilment API

The fulfilment API is my first attempt at creating a fully RESTful API using APP.  It glues the UI and worker nodes together. It is responsible for provisioning incoming requests to track new keywords, exposing processing jobs to worker nodes (sample gathering and stream monitoring) and exposing data to the UI to display.

This is the element of the architecture hosted on Azure. It also makes use of a SQL Azure database to store data gathered on keywords and data used in maintaining the state of a keyword (whether in sampling or tracking).
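To make the job allocation concrete, here is a small in-memory sketch of how the API might hand out the two kinds of job and move a keyword from sampling to tracking. It is an assumption-laden illustration in Python, not the real PHP implementation: the names `FulfilmentApi`, `track_keyword`, `next_job` and `sample_complete` are hypothetical, and plain lists and dicts stand in for the SQL Azure database.

```python
from dataclasses import dataclass, field
from enum import Enum


class KeywordState(Enum):
    SAMPLING = "sampling"   # training sample still being gathered
    TRACKING = "tracking"   # Twitter stream is being monitored


class JobType(Enum):
    GATHER_SAMPLE = "gather_sample"
    MONITOR_STREAM = "monitor_stream"


@dataclass
class Job:
    keyword: str
    job_type: JobType


@dataclass
class FulfilmentApi:
    """In-memory stand-in for the API (hypothetical, no persistence)."""
    states: dict = field(default_factory=dict)   # keyword -> KeywordState
    pending: list = field(default_factory=list)  # jobs waiting for a worker

    def track_keyword(self, keyword):
        # A new keyword starts in the sampling state and needs a training sample.
        self.states[keyword] = KeywordState.SAMPLING
        self.pending.append(Job(keyword, JobType.GATHER_SAMPLE))

    def next_job(self):
        # Called by a worker node; returns None when there is nothing to do.
        return self.pending.pop(0) if self.pending else None

    def sample_complete(self, keyword):
        # Once the sample exists, the keyword moves on to stream monitoring.
        self.states[keyword] = KeywordState.TRACKING
        self.pending.append(Job(keyword, JobType.MONITOR_STREAM))


api = FulfilmentApi()
api.track_keyword("azure")          # request coming from the UI
first = api.next_job()              # a worker asks for work
api.sample_complete(first.keyword)  # the worker reports the sample is done
second = api.next_job()             # the next worker request gets the stream job
print(first.job_type.value, second.job_type.value, api.states["azure"].value)
```

In the real system each of these methods would sit behind an HTTP endpoint, with the UI calling the first and the worker nodes calling the other two.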

The diagram below shows some of the interactions:

Azure: The Good & The Bad

Components Used

I hosted the fulfilment API on an Azure extra-small compute instance and used SQL Azure as the database for the project.

I found I needed more control over the worker processes, so these were hosted on a Windows Server 2008 VPS on Rackspace. More on this later.

The Good

  1. SQL Azure was pretty fantastic. Microsoft have got it just right here. It ‘Just Works™’. It’s fully compatible with SQL Management Studio, which compares favorably to the MySQL GUI tools. It is also very easy to provision new databases and set up access white lists. The PDO_SQLSRV driver allows PHP developers to access SQL Server in a familiar way. The Doctrine 2 project supports this driver, which allowed me to use Doctrine 2 as the DBAL / ORM for the project. See here for my brief post on setting up an SQL Azure instance.
  2. The Windows Azure Command Line Tools For PHP are also a good innovation from the Microsoft camp. The tools provide a mechanism for packaging PHP projects, testing them on Dev Fabric and then preparing them for a release on the live platform. In my opinion this is preferable to forcing developers to use Eclipse PDT to release projects. The PHP IDE market is a lot more varied than the Microsoft / .NET one (where VS is god). My criticism of the tools is that the learning curve is a little steep and documentation is available for packaging only the simplest of projects.
  3. The platform management console. The Azure Manager has a mechanism allowing a quick, seamless switch between staging and production environments.

The Bad

  1. Azure for PHP is a relatively new platform. The interoperability team are making great efforts to produce guides and tutorials on a wide range of subjects. Most of the time it was easy to find the information I needed, but frustratingly, when I hit edge cases, the guides and tutorials I needed either were not there or I couldn’t find them using Google. Examples of this are how to offload PHP libraries to some other kind of storage (my attempt here) to reduce upload time, creating an extra-small compute instance with the command line tools, and gaining access to PHP error logs (my attempt here).
  2. Total Deployment Time. Each time a change is made to a project, its entire source must be repackaged and uploaded to Azure. With Zend Framework (ZF) and Doctrine 2 (D2) in the project, this took in excess of 30 minutes for packaging and upload. While there is a testing version of Azure, subtle differences often meant that it could take three or four attempts to get a deployment right. Deployments were further hampered by a sometimes clunky interface to the platform manager screen.

The Microsoft / PHP Experience In General

I was surprised and pleased to find that PHP on Microsoft has a great community of people. A range of sites exist to help people find their feet on the MS platform.

Interoperability Bridge

Ubelly

Brian Swan’s Blog – Brian is an MS PHP evangelist. He helped a lot with a few issues I had with the project.

Josh Holmes’s Blog – I met Josh at PHPUK 2010. He has a few good PHP Azure articles on his blog.

Juozas Kaziukėnas’s Blog – Jo is a Doctrine 2 core contributor. He is responsible for interop with Windows.

In general I found PHP on Microsoft to be a good platform to work on. As of PHP 5.3, PHP is ‘feature complete’ on Windows and runs well on IIS. The IIS control panel is also really smooth, giving a nice graphical interface to site management.

What’s Next For The Project?

Clearly the project is in its infancy, and there is so much I’d like to do with it. The first step will be to rework the fulfilment API into a queue-based architecture. Queues would allow a better and more reliable way to provision work between the worker processes. In addition, they would allow a single point of contact with Twitter. This is more scalable than my current implementation, which uses Twitter Search API calls and Twitter Filter Streams, both of which are rate limited. The diagram below shows what this might look like.
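As a rough illustration of the queue-based idea, the sketch below has one producer (standing in for the single Twitter-facing process) put tweets on a queue, while two workers pull from it. Everything here is an assumption made for the example: it is Python rather than the project's PHP, an in-process `queue.Queue` stands in for whatever queue service would actually be used, the tweets are made up, and the "processing" is a placeholder word count rather than real sentiment analysis.

```python
import queue
import threading

work_queue = queue.Queue()
results = []
results_lock = threading.Lock()


def worker():
    # Each worker pulls tweets until it receives the None sentinel.
    while True:
        tweet = work_queue.get()
        if tweet is None:
            work_queue.task_done()
            break
        # Placeholder processing: count words instead of scoring sentiment.
        word_count = len(tweet["text"].split())
        with results_lock:
            results.append((tweet["keyword"], word_count))
        work_queue.task_done()


workers = [threading.Thread(target=worker) for _ in range(2)]
for thread in workers:
    thread.start()

# Single producer: hypothetical tweets standing in for the Twitter stream.
incoming = [
    {"keyword": "azure", "text": "deploying php to azure"},
    {"keyword": "php", "text": "php 5.3 runs well on iis"},
    {"keyword": "azure", "text": "sql azure just works"},
]
for tweet in incoming:
    work_queue.put(tweet)
for _ in workers:
    work_queue.put(None)  # one sentinel per worker

work_queue.join()
for thread in workers:
    thread.join()
print(sorted(results))
```

The point of the design is that only the producer talks to Twitter, so rate limits apply in one place, and adding workers does not add Twitter connections.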

Finally

Thanks to those who helped me along the way. I’ve enjoyed participating in the contest and I look forward to working on the Azure platform in the future.



2 Responses to “PHP Azure Contest Wrap Up”

  1. khy says:

    Hi Ben,

    Great project !

    Are you going to make your project available to all ?

    I wanted to create similar engine but for a different cause .

    Any chance of get hold of the code if you do make it open source.

  2. admin says:

    Hi,

    The code for the whole project is on my github page: http://github.com/benwaine I also put all the Bayesian logic in a reusable component.

    Hope this helps.
