Monday, January 13, 2014

Last post on this blog, we are moving.

Goodbye Matlab, hello Python!

This blog will not be updated any more (except for the post comments).
New location is http://tradingwithpython.blogspot.com/

Please update your bookmarks and feeds.


Monday, February 4, 2013

Trading With Python online course starting in April

A couple of observant readers already figured out that, in spite of the URL of this blog, past posts have been based on Python code. The gradual transfer of all of my research code from Matlab to Python is now complete. After working with Python for more than two years, I can state that it is an excellent tool for research and data crunching. When it comes to collecting data from the web, cleaning and aligning datasets with missing data, or building GUIs, it is the best tool I have ever come across. And don't forget that Python is open source and free.
The only downside that I found is that it can be challenging for a new user to set up the needed tools and libraries and the information is scattered around the web. This is the reason that I decided to create a 'Trading With Python' course which is focused on trading strategy research and automation of boring daily tasks. The course will take you all the way from installation & data acquisition to strategy design and backtesting. After just four weeks you'll be able to do most of the things you have seen on this blog yourself.

Tuesday, January 1, 2013

Intraday mean reversion

In my previous post I came to the conclusion that close-to-close pairs trading is not as profitable today as it used to be before 2010. A reader pointed out that the mean-reverting nature of spreads may simply have shifted towards shorter timescales. I happen to share the same idea, so I decided to test this hypothesis.

This time only one pair is tested: long $100 of SPY against short $80 of IWM. The backtest is performed on 30-second bar data from November 2011 to December 2012.
The rules are simple and similar to the strategy I tested in the last post:
if the z-score of the pair's bar return exceeds 1 in magnitude, trade the next bar.
The result looks very pretty:

I would consider this to be enough proof that there is still plenty of mean reversion on the 30-second scale.
If you think that this chart is too good to be true, that is unfortunately indeed the case. No transaction costs or bid-ask spread were taken into account. In fact, I doubt that there would be any profit left after subtracting all trading costs.
Still, this kind of chart is the carrot dangling in front of my nose, keeping me going...
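
For readers who want to try the rule themselves, here is a minimal sketch in pandas. It is not the code behind the chart. The post does not say how the z-score is computed or in which direction the trade goes, so the 100-bar rolling window, the definition of the z-score as bar P&L divided by its rolling standard deviation, and the choice to fade the move are all assumptions. The function names are made up for illustration.

```python
import pandas as pd

def pair_bar_pnl(spy, iwm, usd_spy=100.0, usd_iwm=-80.0):
    """Dollar P&L per bar of holding usd_spy of SPY and usd_iwm of IWM."""
    return usd_spy * spy.pct_change() + usd_iwm * iwm.pct_change()

def fade_next_bar(pair_pnl, window=100, threshold=1.0):
    """P&L of fading any bar whose z-score exceeds the threshold.

    Assumed z-score: bar P&L divided by its rolling standard deviation.
    Costs and bid-ask spread are ignored, as in the post.
    """
    z = pair_pnl / pair_pnl.rolling(window).std()
    signal = pd.Series(0.0, index=pair_pnl.index)
    signal[z > threshold] = -1.0   # pair jumped up: short it (assumption)
    signal[z < -threshold] = 1.0   # pair dropped: go long (assumption)
    # shift(1): the signal computed on bar t is traded during bar t+1
    return (signal.shift(1) * pair_pnl).fillna(0.0)
```

With two price series of 30-second bars, `fade_next_bar(pair_bar_pnl(spy, iwm)).cumsum()` gives the cumulative P&L curve before costs.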

Sunday, December 30, 2012

Is pairs trading dead?

Bad news everybody: according to my calculations (which I sincerely hope are incorrect), classical pairs trading is dead. Some people would strongly disagree, but here is what I found:

Let's take a hypothetical strategy that works on a basket of ETFs:
['SPY','XLY','XLE','XLF','XLI','XLB','XLK','IWM','QQQ','DIA']
From these ETFs 90 pairs can be made if the order of the legs matters (10 x 9); ignoring the order, that is 45 unique combinations. Each pair is constructed as a market-neutral spread.

Strategy rules:
On each day, for each pair, calculate the z-score based on the 25-day standard deviation.
If z-score > threshold, go short, close next day.
If z-score < -threshold, go long, close next day.

To keep it all simple, the calculation is done without any capital management (one can have up to 90 pairs in the portfolio on each day). Transaction costs are not taken into account either.
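
The rules above can be sketched in a few lines of pandas. This is an illustration under stated assumptions, not the code that produced the results in this post. The spread here is a plain return difference between the two legs, because the post does not say how the spreads were hedged. The z-score is taken as the spread return divided by its 25-day rolling standard deviation. With a plain difference, swapping the legs only flips the sign of the spread, so the sketch enumerates the 45 unordered combinations. The function names are made up.

```python
from itertools import combinations

import numpy as np
import pandas as pd

def make_spreads(returns):
    """One column per pair: return of leg a minus return of leg b.

    A plain difference is a simplification; the post only says the spreads
    are market-neutral, not how they are hedged.
    """
    cols = {f"{a}-{b}": returns[a] - returns[b]
            for a, b in combinations(returns.columns, 2)}
    return pd.DataFrame(cols)

def one_day_reversion_pnl(spreads, window=25, threshold=1.0):
    """Sum over all pairs of the next-day P&L, no costs, no position limits."""
    # assumed z-score: spread return scaled by its rolling standard deviation
    z = spreads / spreads.rolling(window).std()
    # short when z > threshold, long when z < -threshold, flat otherwise
    position = -np.sign(z).where(z.abs() > threshold, 0.0)
    # the position opened at today's close earns tomorrow's spread return
    return (position.shift(1) * spreads).sum(axis=1)
```

Given a DataFrame `returns` of daily returns with one column per ETF, `one_day_reversion_pnl(make_spreads(returns), threshold=1.5).cumsum()` produces one curve; looping over several thresholds gives a family of curves to compare.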

To put it simply, this strategy tracks the one-day mean-reverting nature of market-neutral spreads.
Here are the results simulated for several thresholds:


No matter what threshold is used, the strategy is highly profitable in 2008, pretty good through 2009 and completely worthless from early 2010.
This is not the first time I have come across this change in mean-reverting behavior in ETFs. No matter what I've tried, I had no luck in finding a pairs trading strategy that would work on ETFs past 2010. My conclusion is that these types of simple stat-arb models just don't cut it any more.

PCA - how it really works

I suppose that my previous post did not provide insights on how PCA really works. Here is another try at the subject, using a simple pair as an example.
Let's take SPY and IWM, which are highly correlated. If daily returns of IWM are plotted against daily returns of SPY, the relationship is highly linear (see left chart).
Applying PCA to this data gives two principal component vectors, plotted in red (first) and green (second). These two vectors are orthogonal, with the first one pointing in the direction of highest variance. The transformed data is nothing more than the original data projected onto the new coordinate axes formed by these two vectors. The transformed data is shown in the right chart. As you can clearly see, all points are still there, but the dataset is rotated.
The second vector is in this case -0.78 SPY + 0.62 IWM, which produces a market-neutral spread. Of course the same result would be achieved by using the beta of IWM.
The fun thing about PCA is that it is useful in building spreads with three or more legs. The procedure is exactly the same as above, but the transformation is done in a higher-dimensional space.
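
The rotation described above can be reproduced with plain NumPy. The sketch below uses synthetic returns, not real SPY and IWM data: the second series is built as 1.25 times the first plus noise, and that factor and the noise levels are assumptions chosen only to mimic a highly correlated pair. The function name is made up for illustration.

```python
import numpy as np

def principal_components(returns):
    """returns: array of shape (n_days, n_assets).

    Gives (variances, vectors) sorted by decreasing variance; each column
    of `vectors` holds the asset weights of one principal component.
    """
    centered = returns - returns.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# synthetic stand-ins for daily returns of two correlated ETFs
rng = np.random.default_rng(0)
spy = rng.normal(0.0, 0.01, 1000)
iwm = 1.25 * spy + rng.normal(0.0, 0.003, 1000)
X = np.column_stack([spy, iwm])

variances, vectors = principal_components(X)
# projecting onto the component vectors only rotates the data
transformed = (X - X.mean(axis=0)) @ vectors
print(vectors[:, 1])   # weights of the second component (the spread)
```

The two weights of the second component come out with opposite signs, which is what makes it a long-short spread. The overall sign of an eigenvector is arbitrary, so the whole vector may appear flipped.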

Monday, December 3, 2012

Using PCA for spread trading

Classical pairs trading usually involves building a pair consisting of two legs, which ideally should be market-neutral; in other words, pair returns should have zero correlation with market returns. The process of building a 'good' pair is pretty standard. A typical way of building a pair (spread) involves choosing two correlated securities and forming a market-neutral pair using stock betas.

Multi-legged spreads are more advanced and very difficult to build using the traditional method.
However, there is a mathematical method called Principal Component Analysis (PCA) that can easily be used to create stable (= tradeable?) spreads. All the linear algebra is luckily hidden inside the princomp function, but if you'd like to understand how PCA really works, take a look at this tutorial. The transformed data can be described as:
1st component: 'max volatility portfolio', which is usually very highly correlated with the market.
2nd component: 'market-neutral' portfolio, having the maximum remaining variance.
3rd and further components have decreasing degrees of variance.
Note that by design, PCA produces orthogonal components, meaning that the portfolios are uncorrelated with each other. Since the first component closely tracks the market, the 2nd and further portfolios are approximately market-neutral.

Here is an example of applying PCA to some correlated ETFs in the energy sector:
The upper chart shows raw prices; the lower chart shows the cumulative returns of the principal components. To compute the principal components I only used the first 250 days of data. It seems that the principal components, which are linear combinations of the individual security returns, are quite stable out-of-sample, which is a pleasant surprise. The first (blue) component has most of the variance, and it is clearly correlated with the movement of the prices in the upper chart.
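
The in-sample fit and out-of-sample application described above can be sketched as follows. This is a NumPy/pandas illustration, not the princomp-based code behind the charts. The function name and the `n_fit` parameter are made up, and summing the component returns into a cumulative curve is a simplifying assumption.

```python
import numpy as np
import pandas as pd

def pca_portfolios(returns, n_fit=250):
    """Fit PCA weights on the first n_fit rows, apply them to all rows.

    returns: DataFrame of daily returns, one column per security.
    Gives (weights, component_returns); weights has one column per component.
    """
    fit = returns.iloc[:n_fit]
    cov = np.cov((fit - fit.mean()).values, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    names = [f"PC{i + 1}" for i in range(returns.shape[1])]
    weights = pd.DataFrame(eigvecs[:, order], index=returns.columns, columns=names)
    # each component is a fixed linear combination of security returns,
    # so rows after n_fit are out-of-sample
    component_returns = returns.dot(weights)
    return weights, component_returns
```

Calling `weights, comp = pca_portfolios(returns)` and plotting `comp.cumsum()` gives curves comparable to the lower chart; everything after row 250 was not used to compute the weights.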

Let's take a closer look at the last two components: these seem to be quite stable and tradeable even far out-of-sample.


Thursday, September 27, 2012

Gap strategy with intraday data

The gap fading strategy from previous posts looked all right, but my worry is that Yahoo data does not provide accurate quotes. To check the strategy performance, I've generated a new OHLC dataset based on the Weighted Average Price (wap) of 30-second intraday data. So the opening quote is the wap of the first 30 seconds of trading and the close is the last 30-second wap. To make sure that my dataset is correct, I have compared it to the Yahoo quotes. As shown in the chart below, the difference between the two quotes is about 5 cents, which seems very reasonable.
Now, testing the gap fade strategy on the OHLC data that I generated myself produces a much less favorable result:
One look at the pnl chart is enough to say that this strategy would be rubbish.
This brings me to a conclusion that I was already aware of: Yahoo opening quotes are not suitable for strategy backtesting.
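
Building daily OHLC bars from 30-second wap data, as described above, could look something like this in pandas. The post only defines the open and the close; taking the high and low as the extremes of the 30-second wap values is an assumption, and it will understate the true daily range. The function name is made up.

```python
import pandas as pd

def daily_ohlc_from_wap(wap):
    """wap: Series of 30-second weighted average prices with a DatetimeIndex.

    open  = wap of the first 30-second bar of the day
    close = wap of the last 30-second bar of the day
    high/low = extremes of the bar waps (assumption, see note above)
    """
    by_day = wap.groupby(wap.index.normalize())
    return pd.DataFrame({
        "open": by_day.first(),
        "high": by_day.max(),
        "low": by_day.min(),
        "close": by_day.last(),
    })
```

Subtracting the Yahoo open from `daily_ohlc_from_wap(wap)["open"]` on matching dates is one way to reproduce the comparison between the two data sources.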