## Monday, March 2, 2015

### Playing with AlgoCanvas

Some time ago I discovered an interesting data analysis (and trading) platform called AlgoCanvas. AlgoCanvas is a product of a Finnish startup called Streamr, which actually did quite well in the recent Slush startup competition (they made it to the top 20).

Basically the idea is to let the user come up with trading algorithms easily by simply connecting different modules with "cables" and eventually routing the output to (for example) a chart/plot. I've found, however, that it's also a very useful tool for data munging, pre-processing and analysis. One picture is worth n words, so here's a screenshot (click for full size image):

Here I took the bid and ask prices for AAPL during one day and sent them to the Chart on the right. Then I took the cumulative dollar value of trades (price * quantity * sign... basically the order flow), discarded the hidden orders and sent it to the Chart on a different axis. Pretty simple!
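For reference, the same order flow computation is only a few lines of Python. This is a hypothetical sketch of the wiring described above, not AlgoCanvas code; the array names and the hidden-order flag are my own assumptions:

```python
import numpy as np

def cumulative_dollar_flow(price, qty, is_buy, hidden):
    """Cumulative signed dollar value of trades (price * quantity * sign),
    with hidden orders discarded -- the Multiply/Not/Sum wiring in the canvas."""
    sign = np.where(is_buy, 1.0, -1.0)
    flow = price * qty * sign * (~hidden)   # zero out hidden orders
    return np.cumsum(flow)
```

For example, a buy of 1 share at 10, a sell of 2 at 10, and a hidden buy of 3 at 10 would give a cumulative flow of 10, -10, -10.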

Getting the order flow took a few too many modules (Multiply, Not and Sum), but it was pretty quick to do. It might make sense to have a custom "Function" module with user defined inputs and outputs and a simple mathematical expression in between... anyway, the product is still in beta and developing fast, so maybe that will be implemented eventually. Mind you, there is a custom "JavaModule", but since I don't really know Java, it's pretty useless for me... anyway, I've heard a Python module is coming too!

Also, if you have a trading algorithm in mind, you can run a full backtest... and this is with event time data, bid and ask prices, trades etc... basically full order book data with all orders sent to the order book (not just trades as in my example above). Pretty impressive when compared to e.g. Quantopian's 1 min bars...

## Using IPython Notebook and Theano on an AWS GPU machine

Using a service such as Sage Math Cloud is an excellent way to use IPython Notebooks (or Sage Worksheets) online, but I wanted to give Theano's GPU features a serious try, so I decided to spin up a VM with a powerful GPU (using Amazon AWS's g2.2xlarge instance). This post is a sort of "note-to-self" step-by-step guide on how to get it done, since it's not completely trivial... I'm not going to repeat what others have written about this, so I just added the relevant links.

1) Check out this post to set up Theano on AWS. Make sure to run the test too... The test passed for me, but it turns out I ran into further problems with BLAS (see below).

2) Check out this post to set up an IPython Notebook server on AWS. Make sure to allow inbound traffic: go to Security Groups on the AWS Dashboard, click Actions and then Edit Inbound Rules, and add a custom TCP rule with the port number you wish to use.

3) I still got the warning "WARNING (theano.tensor.blas): We did not found [sic] a dynamic library into the library_dir of the library we use for blas. If you use ATLAS, make sure to compile it with dynamics library." Maybe it's just something I did, but apparently Theano could not find BLAS, even though it was installed with Anaconda... Anyway, I just edited the .theanorc file and added a flag for blas like this (after installing OpenBLAS; or maybe it was installed by default??):

```
[blas]
ldflags = -lopenblas
```

That sorted it out... and now it's running like crazy, at only about \$0.09 per hour!! ;)

## Friday, October 10, 2014

### Cython is awesome

After some initial frustration with learning Cython, I found the SciPy 2013 Cython tutorial by Kurt Smith of Enthought, which was a hugely positive surprise. IMO it's by far the best introduction to Cython (the official Cython tutorials can be a bit tedious...). So I decided to put my newly learned skills to use in optimizing some stochastic differential equation simulations. To make a long story short, here's the original Python+NumPy version:

...and here's the Cython optimized version:

Previously I had to wait about 30 s for 1,000,000 iterations, and now it takes only about 0.1 seconds... definitely worth the effort ;) (BTW sorry about the lack of documentation... anyway, the algo has an adaptive time step, since I'm simulating some pretty wildly behaving SDEs).

## Wednesday, August 6, 2014

### Holographic principle

So just a while ago I received an email from World Scientific about my article "Easy Holography", which was published in their journal IJMPD. According to the email, my paper "...is a great contribution in understanding the relevant field of study, which is worth to be shared." Oo well gee, thanks for the recognition, World Scientific (I'm sure only a select few receive such encouragement)! Oh well, I guess I'll just do as I'm told and share the article!

Anyway, the idea is roughly as follows (it's been a while since I worked on this stuff, so my apologies for some possible inaccuracies): let's pretend we live in a two dimensional universe, and that this universe is a two dimensional quantum gravity theory which is "asymptotically $AdS_2$"; $AdS_2$ is very similar to the hyperbolic plane $\mathbb H_2$ but of Minkowskian signature (google it or stop reading). Or just think of $\mathbb H_2$. The "asymptotic symmetries" of $\mathbb H_2$ are those diffeomorphisms which preserve the metric near the boundary.
For $\mathbb H_2$ these symmetries are isomorphic to the circle diffeomorphisms $Diff(S^1)$, which form an infinite dimensional group. The only thing we need to assume about the quantum gravity theory is that it is integrable w.r.t. the asymptotic symmetries, which in this context just means that every physical state can be mapped to another physical state by the asymptotic symmetries extended to all of the space (a la Conformal Field Theory).

The toy model I used in the "Easy Holography" paper was basically just a pixelated image (of a bunny, since it was submitted on Easter). A rough estimate of the entropy of that "model" is then the logarithm of the number of all distinct images that are mapped from an original image by the asymptotic symmetry group, which is isomorphic to $Diff(S^1)$. So first I have a bunny. Now "coarse grain" it on an $N \times N$ grid, so that a cell is black if most of the (Euclidean) surface area under it is black, and white otherwise. Then apply a map from the asymptotic symmetries on the original image and coarse grain that. If the coarse grained image is different from the original coarse grained image, count it as a distinct state/configuration. Repeat this for all asymptotic symmetries (there's an uncountably infinite number of them) and write down the number of all distinct images, or configurations, as a function of $N$.

If the maps were just the usual diffeomorphisms, it's easy to see (see the image above or the paper) that the number of distinct configurations is just $2^{N^2}$ and thus the entropy is $S = N^2 \log(2)$. So the entropy is just proportional to the volume of the system, $N^2$. That's the old physics. But if instead we use the asymptotic symmetry maps, it turns out that the entropy thus computed is proportional to $N$, or the "area" of the system! This happens also for $AdS_3$, whose asymptotic symmetries form the conformal group, isomorphic (if I recall correctly) to $Diff(S^1) \times Diff(S^1)$.
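Before moving on to higher dimensions: the coarse-graining step of the toy model is easy to sketch in Python. This is just a hypothetical illustration of the majority-vote rule described above; actually applying the asymptotic symmetry maps and counting configurations is the hard part and is not shown:

```python
import numpy as np

def coarse_grain(img, N):
    """Majority-vote coarse graining of a binary image onto an N x N grid:
    a cell is 1 ("black") if most of the area under it is black."""
    h, w = img.shape
    out = np.zeros((N, N), dtype=int)
    for i in range(N):
        for j in range(N):
            block = img[i * h // N:(i + 1) * h // N, j * w // N:(j + 1) * w // N]
            out[i, j] = int(block.mean() > 0.5)
    return out

# Under ordinary diffeomorphisms every N x N configuration is reachable,
# so the entropy is S = log(2**(N**2)) = N**2 * log(2): "volume" scaling.
```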
For $AdS_3$ the entropy would be proportional to $N^3$ (volume) for ordinary diffeomorphisms and to $N^2$ (area) for the asymptotic symmetries. So the conclusion would seem to be that the asymptotic symmetries and the asymptotic $AdS$ structure of the universe are somehow related to the Holographic Principle, which states that the entropy is indeed proportional to the surface area of the system instead of the volume... but of course the universe is not three dimensional! So what happens in higher dimensions? The asymptotic symmetry group of $AdS_4$ is disappointingly not infinite dimensional, but instead the six dimensional $SO(2,2)$. That would actually just lead to a logarithmic entropy, so no luck there...

But what actually is an $AdS$ space, and why would they be interesting? Let's look at (one) definition of an $AdS$ space. We can define $AdS_d$ as the homogeneous space $SO(2, d-1) / SO(2, d-2)$. For $d=2$ we have $AdS_2 \simeq SO(2, 1) / SO(2)$, and for $d=3$, $AdS_3 \simeq SO(2, 2) / SO(2,1)$. For low dimensional Lie groups there exist so called "accidental isomorphisms", which we can use to write $SO(2,1) \simeq SL(2)$ and $SO(2,2) \simeq SL(2) \times SL(2)$, where $SL(2)$ is the special linear group. So we can write the $AdS$ spaces in an equivalent way as $AdS_2 \simeq SL(2)/SO(2)$ and $AdS_3 \simeq SL(2)$ (Lie groups are manifolds too). For $d>2$ such accidental isomorphisms do not exist. More specifically, the higher dimensional $AdS$ spaces are not related at all to the special linear groups $SL(n)$.

A short recap: $AdS_2$ and $AdS_3$ are intimately related to the Lie group $SL(2)$, but higher dimensional $AdS$ spaces are not. $AdS_2$ and $AdS_3$ admit infinite dimensional asymptotic symmetries, but higher dimensional $AdS$ spaces do not. Clearly the Lie group $SL(2)$ is the key here! Some natural questions are then:

1) Do the higher dimensional Lie groups $SL(n)$ (or related homogeneous spaces) also admit infinite dimensional asymptotic symmetries?
2) If they do, will they also incorporate a "natural holography" as with $AdS_2$ and $AdS_3$?

3) Is it then possible to come up with a higher dimensional quantum gravity theory which is similarly exactly solvable as $AdS_3$ quantum gravity (well, it should be exactly solvable, I guess...)?

The answer to question 1) is "yes". I actually showed that for the homogeneous (symmetric) space $SL(3)/SO(3)$ in an earlier paper called "Infinite symmetry on the boundary of $SL(3)/SO(3)$". That paper is not very elegant, and I was working on an extended and more sophisticated version, but I didn't have time to finish writing it... the final equations were actually solved by Robert Bryant at MathOverflow. I couldn't get any research funding for that though, and now I'm doing something completely different! Anyway, hopefully someone will some day find these results useful.

## Wednesday, December 18, 2013

### Statistical arbitrage with Python

UPDATE: The current GitHub version of the backtest is a bit broken: there was a silly bug that caused the algo to "see" 1 min into the future. Also, the overnight effects/jumps in the morning kinda ruin the intraday trades, so I'm currently rewriting the code for 1 sec bar bid/ask data to be used intraday only... the current model is also a bit too naive (especially for higher frequencies) and will need several other improvements to be useful in practice. I will probably not share it publicly in the future though, but if you want to talk about it, feel free to drop me an email etc.

Finally managed to complete an early version of my PyArb statistical arbitrage project... I published it on GitHub as a Python module here, although the best way to view it right away is to check out the IPython Notebook here at nbviewer.ipython.org. It is a model dependent equity statistical arbitrage backtest module for Python.
Roughly speaking, the input is a universe of N stock prices over a selected time period, and the output is a mean reverting portfolio which can be used for trading. The idea is to model "interacting" (correlated, anticorrelated or cointegrated) stock prices as a system of stochastic differential equations, roughly as

$$dX_t^i = A^i_j X_t^j dt + X_t^i dW_t^i,$$

where $X_t$ are the prices and $dW_t$ are white noises. The stochastic part doesn't yet play any important role, but that will soon change...

This is just a backtest for a strategy, so there's no saying it will actually work in a live situation (but I'm planning to try paper trading next). Specifically, there's no slippage and impact modelling, short sell contract and borrow costs etc. I just assumed a flat rate \$.005 per share cost from Interactive Brokers' website as a sort of ballpark figure. It gives roughly 12% annualized returns with a Sharpe ratio of about 5 and a maximum drawdown of 0.6%. Maybe that sounds a bit too good to be true? Well, maybe I made a mistake; go ahead and check the code! :) (I need to check it again myself anyway, or give it a go in e.g. Quantopian.)
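A minimal Euler-Maruyama discretization of this system of SDEs looks roughly as follows. This is my own sketch for illustration, not PyArb code, and the noise scale `sigma` is an assumption of the sketch:

```python
import numpy as np

def simulate(A, x0, dt, steps, sigma=0.1, seed=0):
    """Euler-Maruyama scheme for dX^i = A^i_j X^j dt + sigma X^i dW^i."""
    rng = np.random.default_rng(seed)
    X = np.empty((steps + 1, len(x0)))
    X[0] = x0
    for t in range(steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=len(x0))  # Brownian increments
        X[t + 1] = X[t] + A @ X[t] * dt + sigma * X[t] * dW
    return X
```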

Here's a plot of the cumulative returns for a period of about 300 days. The "mode=0" is the best portfolio and corresponds to the lowest eigenvalue of the evolution matrix $A$ in the equation above.
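The "lowest eigenvalue" portfolio selection can be sketched roughly like this. This is my own hedged reconstruction, not the actual PyArb code: I estimate $A$ with a naive least squares fit of the discretized SDE (ignoring the noise term) and take the eigenvector with the most negative real eigenvalue as the fastest mean reverting mode:

```python
import numpy as np

def estimate_evolution_matrix(X, dt=1.0):
    """Naive least squares estimate of A in dX_t = A X_t dt + noise,
    given a (T x N) array of prices: solves dX ~ (X dt) A^T row-wise."""
    dX = np.diff(X, axis=0)
    A_T, *_ = np.linalg.lstsq(X[:-1] * dt, dX, rcond=None)
    return A_T.T

def mean_reverting_portfolio(X, dt=1.0):
    """Portfolio weights = eigenvector of the estimated A with the smallest
    (most negative) real eigenvalue, i.e. the "mode=0" portfolio."""
    A = estimate_evolution_matrix(X, dt)
    vals, vecs = np.linalg.eig(A)
    mode = np.argmin(vals.real)
    return np.real(vecs[:, mode])
```

On noiseless data generated by a known $A$ the least squares step recovers $A$ exactly; on real prices it is of course only a crude estimate.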

## Saturday, September 14, 2013

### Funny statistics in stock market data


#### Market data structure functions

I recently got some minute level stock market data from The Bonnot Gang for some data analytic (and stat arb design) purposes, and noticed some funny behavior in the data's structure functions. Now the concept of a structure function may not be very widely known among quants, data analysts and economists, so here's a definition:

Suppose there's a time series $X_t$. The structure function of $X_t$ is defined as

$$S_n(\tau) \equiv \mathbb E\left(|X_{t+\tau} - X_t|^n\right),$$

where for a given sample of data you just replace the ensemble expectation $\mathbb E(\cdot)$ by the sample mean, $\frac{1}{N} \sum\limits_{t=0}^N(\cdot)$.
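With the expectation replaced by the sample mean, the estimator is essentially a one-liner. A minimal NumPy sketch (variable names are mine):

```python
import numpy as np

def structure_function(x, n, taus):
    """Sample estimate of S_n(tau) = E[|X_{t+tau} - X_t|^n] for each lag tau."""
    return np.array([np.mean(np.abs(x[tau:] - x[:-tau]) ** n) for tau in taus])
```

As a sanity check, for a pure trend $X_t = t$ every increment over lag $\tau$ equals $\tau$, so $S_n(\tau) = \tau^n$ exactly.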

These types of structure functions have been studied for some time now in finance, in the context of similarities between financial markets and hydrodynamic turbulence. I think it all started in 1996 with the paper Turbulent cascades in foreign exchange markets by Ghashghaie et al. They computed the structure functions for some FX market data and found a scaling relation $S_n(\tau) \propto \tau^{\xi_n}$, where $\xi_n$ is a concave function of $n$, implying multiscaling in FX markets, similarly to hydrodynamic turbulence (BTW their conclusions about the result were a bit out there, but I guess the data analysis is still good).
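The exponent $\xi_n$ can be estimated as the slope of $\log S_n(\tau)$ against $\log \tau$. A hedged sketch (real data only scales like this over limited regimes, as the plots below show, so in practice you'd fit each regime separately):

```python
import numpy as np

def scaling_exponent(x, n, taus):
    """Fit S_n(tau) ~ tau^xi_n by linear regression in log-log coordinates."""
    S = np.array([np.mean(np.abs(x[t:] - x[:-t]) ** n) for t in taus])
    slope, _intercept = np.polyfit(np.log(taus), np.log(S), 1)
    return slope
```

For a monoscaling process like Brownian motion $\xi_n = n/2$; a concave $\xi_n$ curve is the multiscaling signature Ghashghaie et al. reported.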

So I did some of my own data analysis with the Bonnot Gang data (I hope it's not bad data!). Here's a few plots of the structure functions, first for $n=1$:

Then $n=3$:

This is close to linear, i.e. $\xi_3 \approx 1$, as in turbulence. Then $n=10$:

Clearly you can't fit a single power law to all of this, but there seem to be clear power law regimes separated at about 6, 18, 60 and 180 minutes! I don't know the reason for this, but if I had to guess, I'd say it's traders/algorithms operating on different data timeframes... or maybe it's the finite tick size...

Anyway, I don't have time to get to the bottom of this, but maybe someone else will... so if you see this stuff in a paper someday, you saw it first here!! ;)

Written with StackEdit. Try it out, it's awesome!! You can do MathJax, sync everything with Google Drive or Dropbox, and publish directly to Blogger, Wordpress, Tumblr etc.!