Wednesday, December 18, 2013

Statistical arbitrage with Python

UPDATE: The current GitHub version of the backtest is a bit broken: a silly bug let the algo "see" 1 min into the future. Also, the overnight effects/jumps at the morning open kind of ruin the intraday trades, so I'm currently rewriting the code to use 1 sec bid/ask bar data, intraday only... The current model is also a bit too naive (especially at higher frequencies) and will need several other improvements to be useful in practice. I probably won't share future versions publicly, but if you want to talk about it, feel free to drop me an email.
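The "see 1 min into the future" bug is a classic lookahead bias. Below is a minimal sketch of the usual guard against it. It assumes a `(T, N)` array of bar prices and a same-shaped array of share positions decided at the close of each bar. `strategy_pnl` is a hypothetical helper, not the PyArb code. The rule it enforces is that the position chosen at bar $t$ only earns the price change from $t$ to $t+1$.

```python
import numpy as np

def strategy_pnl(prices, positions):
    """Per-bar P&L in price units (hypothetical helper, not PyArb code).

    prices, positions: arrays of shape (T, N); positions[t] is the number of
    shares held from bar t to bar t+1, decided using data up to bar t only.
    """
    prices = np.asarray(prices, dtype=float)
    positions = np.asarray(positions, dtype=float)
    # Price change from bar t to bar t+1, shape (T-1, N)
    price_changes = np.diff(prices, axis=0)
    # Lag the positions by one bar: positions[:-1] earns price_changes.
    # Pairing positions[1:] with price_changes would let each bar's position
    # "see" the move it is trading, i.e. the one-bar lookahead bug.
    return np.sum(positions[:-1] * price_changes, axis=1)

prices = np.array([[10.0, 20.0], [11.0, 19.0], [10.5, 19.5]])
positions = np.array([[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]])
print(strategy_pnl(prices, positions))  # [2. 1.]
```

Keeping the lag in a single place like this makes it harder for an off-by-one index elsewhere in the backtest to silently reintroduce the bias.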

Finally managed to complete an early version of my PyArb statistical arbitrage project... I published it on GitHub as a Python module here, although the quickest way to view it is to check out the IPython Notebook here. It is a model-dependent equity statistical arbitrage backtest module for Python. Roughly speaking, the input is a universe of N stock prices over a selected time period, and the output is a mean-reverting portfolio that can be used for trading. The idea is to model "interacting" (correlated, anticorrelated or cointegrated) stock prices as a system of stochastic differential equations, roughly as

$$ dX_t^i = A^i_j X_t^j dt + X_t^i dW_t^i,$$

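where $X_t^i$ is the price of stock $i$, the repeated index $j$ is summed over ($j = 1, \dots, N$), $A$ is an $N \times N$ matrix, and the $dW_t^i$ are white-noise (Wiener process) increments.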
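To get a feel for the model, one can simulate it with the Euler-Maruyama scheme. This is a minimal sketch that assumes a hypothetical drift matrix `A`, a time step `dt`, and an extra volatility scale `sigma`. The equation above corresponds to `sigma=1`. `simulate_prices` is a hypothetical name, not part of PyArb.

```python
import numpy as np

def simulate_prices(A, x0, dt, n_steps, sigma=1.0, seed=None):
    """Euler-Maruyama paths for dX^i = A^i_j X^j dt + sigma X^i dW^i.

    Hypothetical helper: returns an array of shape (n_steps + 1, N).
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    n = len(x0)
    x = np.empty((n_steps + 1, n))
    x[0] = x0
    for t in range(n_steps):
        # Independent Brownian increments, each ~ N(0, dt)
        dW = rng.normal(0.0, np.sqrt(dt), size=n)
        # Drift term: sum over j of A[i, j] * X^j
        drift = A @ x[t]
        # Multiplicative noise: each price is scaled by its own increment
        x[t + 1] = x[t] + drift * dt + sigma * x[t] * dW
    return x

# Two coupled stocks over 10 daily steps (illustrative numbers only)
paths = simulate_prices([[-0.5, 0.5], [0.5, -0.5]], [100.0, 100.0],
                        dt=1 / 252, n_steps=10, sigma=0.2, seed=1)
print(paths.shape)  # (11, 2)
```

With `sigma=0` the noise drops out and each step is just $X_{t+\Delta t} = X_t + A X_t \Delta t$, which makes the drift part easy to check by hand.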

The stochastic part doesn't yet play any important role, but that will soon change...

This is just a backtest of a strategy, so there's no guarantee it will actually work live (I'm planning to try paper trading next). Specifically, there's no modelling of slippage or market impact, short-selling contracts, borrow costs etc. I just assumed a flat cost of \$0.005 per share, taken from Interactive Brokers' website, as a ballpark figure. The backtest gives roughly 12% annualized returns with a Sharpe ratio of about 5 and a maximum drawdown of 0.6%. Maybe that sounds a bit too good to be true? Well, maybe I made a mistake, so go ahead and check the code! :) (I need to check it again myself anyway, or give it a go on e.g. Quantopian.)
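The post doesn't spell out how these figures are computed, so here is a sketch of one common convention. I'm assuming 252 trading days per year, a zero risk-free rate, arithmetic annualization of the mean daily return, and drawdown measured on compounded wealth. `commission` and `performance` are hypothetical helpers. The commission uses the flat \$0.005 per share mentioned above and ignores any per-order minimums a real broker may charge.

```python
import numpy as np

COMMISSION_PER_SHARE = 0.005  # flat per-share rate assumed in the post

def commission(shares_traded):
    """Total commission for an array of signed share trades (hypothetical helper)."""
    return COMMISSION_PER_SHARE * np.sum(np.abs(shares_traded))

def performance(daily_returns, periods_per_year=252):
    """Annualized return, Sharpe ratio and max drawdown (hypothetical helper).

    Assumes a zero risk-free rate and simple (arithmetic) annualization.
    """
    r = np.asarray(daily_returns, dtype=float)
    ann_return = np.mean(r) * periods_per_year
    sharpe = np.sqrt(periods_per_year) * np.mean(r) / np.std(r, ddof=1)
    # Compounded wealth, starting from 1 so an initial loss counts as drawdown
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    running_max = np.maximum.accumulate(wealth)
    max_drawdown = np.max(1.0 - wealth / running_max)
    return ann_return, sharpe, max_drawdown

print(commission([100, -200]))  # 1.5
```

Under the same zero-risk-free-rate assumption, a Sharpe ratio of 5 with 12% annualized return implies an annualized volatility of about $0.12/5 \approx 2.4\%$. That figure can be sanity-checked directly against the backtest's daily returns.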

Here's a plot of the cumulative returns over a period of about 300 days. The "mode=0" curve is the best portfolio and corresponds to the lowest eigenvalue of the evolution matrix $A$ in the equation above.
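One way to read "the lowest eigenvalue of the evolution matrix $A$" is through left eigenvectors. This is my assumption, not necessarily how PyArb does it. If $w$ is a left eigenvector of $A$, i.e. $w_i A^i_j = \lambda w_j$, then the portfolio value $p_t = w_i X_t^i$ satisfies $dp_t = \lambda p_t\,dt + w_i X_t^i dW_t^i$. A negative $\lambda$ therefore pulls its drift back toward zero, and the most negative one reverts fastest.

The sketch below estimates $A$ from a price matrix by ordinary least squares on the increments. It then returns the left eigenvectors sorted by eigenvalue. `estimate_drift_matrix` and `eigen_portfolios` are hypothetical names, and the gross-exposure normalization is an arbitrary choice.

```python
import numpy as np

def estimate_drift_matrix(prices, dt=1.0):
    """Least-squares fit of dX_t ~ A X_t dt from a (T, N) price array (hypothetical helper)."""
    X = np.asarray(prices, dtype=float)
    dX = np.diff(X, axis=0)   # increments, shape (T-1, N)
    Z = X[:-1] * dt           # regressors X_t dt, shape (T-1, N)
    # Row by row, dX = Z @ A.T, so lstsq returns A.T
    A_T, *_ = np.linalg.lstsq(Z, dX, rcond=None)
    return A_T.T

def eigen_portfolios(A):
    """Left eigenvectors of A as columns, sorted by eigenvalue (real part), lowest first."""
    # Eigenvectors of A.T are the left eigenvectors of A
    eigvals, eigvecs = np.linalg.eig(np.asarray(A, dtype=float).T)
    order = np.argsort(eigvals.real)
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Assumes the modes of interest have real eigenvalues, so taking real parts is safe
    weights = eigvecs.real
    # Scale each mode to unit gross exposure (sum of |weights| = 1)
    weights = weights / np.sum(np.abs(weights), axis=0)
    return eigvals, weights
```

Column 0 of `weights` is then the "mode=0" candidate under this reading. With real, noisy prices the estimate of $A$ will be much rougher than in a noise-free toy example, so the eigenvalue ordering is worth checking for stability over different estimation windows.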