Finally managed to complete an early version of my PyArb statistical arbitrage project... Can I call myself a quant now? ;) I published it on GitHub as a Python module here, although the quickest way to view it is to check out the IPython Notebook here at nbviewer.ipython.org. It is a model-dependent equity statistical arbitrage backtest module for Python. Roughly speaking, the input is a universe of N stock prices over a selected time period, and the output is a mean-reverting portfolio which can be used for trading. The idea is to model "interacting" (correlated, anticorrelated or cointegrated) stock prices as a system of stochastic differential equations, roughly as
$$ dX_t^i = A^i_j X_t^j dt + X_t^i dW_t^i,$$
where $X_t^i$ are the prices, $dW_t^i$ are white noises, and summation over the repeated index $j$ is implied.
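To get a feel for what this model does, here is a minimal Euler-Maruyama simulation sketch of the equation above. It is not taken from PyArb: the function name `simulate_prices`, the volatility scale `sigma`, the daily step `dt = 1/252` and the assumption of independent noises for each stock are all mine, for illustration only.

```python
import numpy as np

def simulate_prices(A, x0, n_steps, dt=1.0 / 252, sigma=0.2, seed=0):
    """Euler-Maruyama discretization of dX^i = A^i_j X^j dt + sigma X^i dW^i.

    sigma is a hypothetical volatility scale (the equation in the post
    corresponds to sigma = 1); the noises are assumed independent.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    x = np.empty((n_steps + 1, len(x0)))
    x[0] = x0
    for t in range(n_steps):
        # Wiener increments: independent normals with variance dt
        dW = rng.normal(0.0, np.sqrt(dt), size=len(x0))
        # drift couples the stocks through A, noise is multiplicative
        x[t + 1] = x[t] + A @ x[t] * dt + sigma * x[t] * dW
    return x
```

For example, `simulate_prices(-np.eye(2), [1.0, 2.0], 300)` gives a `(301, 2)` array of two uncoupled, decaying price paths; off-diagonal entries in `A` are what make the stocks "interact".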
The stochastic part doesn't yet play any important role, but that will soon change as I'm planning to implement a Box-Tiao style predictability measure for determining the portfolios (see e.g. the paper by de Prado)... not sure if it will improve the performance though.
This is just a backtest for a strategy, so there's no saying it will actually work in a live situation (but I'm planning to try paper trading next). Specifically, there's no modelling of slippage, market impact, short-sell contract and borrow costs etc. I just assumed a flat cost of \$0.005 per share, taken from Interactive Brokers' website, as a sort of ballpark figure. It gives roughly 12% annualized returns with a Sharpe ratio of about 5 and a maximum drawdown of 0.6%. Maybe that sounds a bit too good to be true? Well, maybe I made a mistake, so go ahead and check the code! :) (I need to check it again myself anyway, or give it a go in e.g. Quantopian.)
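For anyone who wants to sanity-check figures like these, here is a rough sketch of how the three numbers can be computed from a series of daily returns. This is not necessarily how PyArb computes them: `performance_summary` is a hypothetical helper, and it assumes 252 trading days per year, a zero risk-free rate and returns that are already net of costs.

```python
import numpy as np

def performance_summary(daily_returns, periods_per_year=252):
    """Annualized return, Sharpe ratio and maximum drawdown.

    Assumptions: simple (not log) returns, zero risk-free rate,
    and at least two returns that are not all identical.
    """
    r = np.asarray(daily_returns, dtype=float)
    equity = np.cumprod(1.0 + r)  # growth of 1 unit of capital
    ann_return = equity[-1] ** (periods_per_year / len(r)) - 1.0
    sharpe = np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)
    # include the starting capital so a loss on day one counts as drawdown
    curve = np.concatenate(([1.0], equity))
    running_max = np.maximum.accumulate(curve)
    max_drawdown = np.max(1.0 - curve / running_max)
    return ann_return, sharpe, max_drawdown
```

A Sharpe ratio of 5 with a 12% annual return implies an annualized volatility of only about 2.4%, which is one quick consistency check on the backtest output.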
Here's a plot of the cumulative returns for a period of about 300 days. The "mode=0" is the best portfolio and corresponds to the lowest eigenvalue of the evolution matrix $A$ in the equation above.
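To make the "modes" a bit more concrete, here is a sketch of one way to get them, which may differ from what PyArb actually does. It assumes $A$ is estimated by least squares on the discretized drift $\Delta X_t \approx A X_t \Delta t$, and that the portfolio weights are the left eigenvectors of $A$: if $w^T A = \lambda w^T$, then the portfolio value $w^T X_t$ has drift $\lambda\, w^T X_t\, dt$, so the most negative $\lambda$ gives the fastest mean reversion. The function names are hypothetical.

```python
import numpy as np

def estimate_drift_matrix(prices, dt=1.0):
    """Least-squares fit of A in X[t+1] - X[t] ~ A X[t] dt.

    prices: array of shape (T, N), one row per time step.
    """
    X = np.asarray(prices, dtype=float)
    dX = np.diff(X, axis=0)
    # solves X[:-1] @ B ~ dX, where B = A.T * dt
    B, *_ = np.linalg.lstsq(X[:-1], dX, rcond=None)
    return B.T / dt

def mean_reverting_modes(A):
    """Eigenvalues (sorted by real part, lowest first) and left eigenvectors.

    Row k of the returned weights satisfies w @ A = lambda_k * w.
    For a non-symmetric A the eigenvalues may be complex; this sketch
    does not handle that case specially.
    """
    eigvals, left_vecs = np.linalg.eig(A.T)
    order = np.argsort(eigvals.real)
    return eigvals[order], left_vecs[:, order].T
```

With these, `lam, W = mean_reverting_modes(estimate_drift_matrix(prices, dt))` would give `W[0]` as the "mode=0" weights under the assumptions above, and `prices @ W[0]` as the corresponding portfolio value series.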