Saturday, October 31, 2009

Blog is Moving

I have moved this blog to tr8dr.wordpress.com as Blogger seems to have been left to rot by Google (i.e. too many things are broken and do not work properly).

Friday, October 30, 2009

Intraday volatility prediction and estimation

GARCH has been shown to be a reasonable estimator of variance for daily or longer period returns. Some have adapted GARCH to use intraday returns to improve daily variance estimates. GARCH does very poorly in estimating intra-day variance, however.

The GARCH model is based on the empirical observation that there is strong autocorrelation in the square of returns at lower frequencies (such as daily). This can easily be seen by observing the clustering and "smooth-ish" decay of squared returns in many daily series.
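For reference, the standard GARCH(1,1) recursion that captures this clustering is

    \sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2

where r_{t-1} is the prior period's return and the persistence of variance shocks is governed by \alpha + \beta.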

Intra-day squared returns, however, have many jumps, with little in the way of autocorrelated decay pattern. Here is a sample of squared returns for EUR/USD. There are many spikes followed by immediate drops in return (as opposed to smoother decay). There does appear to be a longer-term pattern, though, allowing for a model.

With expanded processing power and general access to tick data, research has begun to focus on intra-day variance estimation. In particular, expressing variance in terms of price duration has become an emergent theme. Andersen, Dobrev, and Schaumburg are among a growing community developing this in a new direction.

At this point I have disqualified GARCH as a useful measure for my work, but am investigating a formulation of a duration-based measure.

Thursday, October 29, 2009

Hawkes Process & Strategies

Call me unread, but I had not encountered the Hawkes process before today. The Hawkes process is a "point process" that models event intensity as a function of past event occurrences: it is self-exciting, in that each event raises the intensity of subsequent events.

The discrete form of the process gives the conditional intensity as a baseline rate plus decaying contributions from past events, where ti is the time of the ith event with ti < t. The kernel is typically an exponential, but it can be any function that models the decay of an event's influence within the counting process.
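In its standard form, with a baseline intensity \mu and an exponential kernel with parameters \alpha and \beta, the conditional intensity is

    \lambda(t) = \mu + \sum_{t_i < t} \nu(t - t_i), \qquad \nu(s) = \alpha e^{-\beta s}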

Ok, that's great but what are the applications in strategies research?

Intra-day Stochastic Volatility Prediction
The recent theme in the literature has been to replace the quadratic-variance approach with a time-based approach. The degree of movement within an interval of time is equivalent in measure to the amount of time required for a given movement, and can be interchanged easily as Andersen, Dobrev, and Schaumburg have shown in "Duration-Based Volatility Estimation".
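A simplified version of the equivalence (not the exact estimator in the paper): for a driftless Brownian motion with volatility \sigma, the expected first exit time from a band of half-width \delta around the current price is E[\tau] = \delta^2 / \sigma^2, so the average exit duration \bar{\tau} observed for a fixed threshold \delta yields a local variance estimate

    \hat{\sigma}^2 = \delta^2 / \bar{\tau}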

Cai, Kim, and Leduc in "A model for intraday volatility" approached the problem by combining an Autoregressive Conditional Duration process and a Hawkes process to model decay, showing that:

and then equivalently expressed in terms of intensity (where N represents the number of events of size dY):


relating back to volatility measure as:

The intensity process is comprised of an ACD part and a Hawkes part:




They claim to model the intra-day volatility closely and propose a long/short straddle strategy to take advantage of the predictive ability.
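Stepping back, the rough link between intensity and variance (a sketch of the idea only, not the paper's exact formulation): if N(t) counts price moves of absolute size \delta, the quadratic variation accumulated by those moves is \delta^2 N(t), and so the instantaneous variance scales with the move intensity,

    \sigma^2(t) \approx \delta^2 \lambda(t)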

High Frequency Order Prediction Strategy
The literature suggests the use of Hawkes processes to model the buying and selling processes of market participants.

John Carlsson, in "Modeling Stock Orders Using Hawkes's Self-Exciting Process", suggests a strategy where, if the Hawkes-predicted ratio of buy to sell intensity exceeds a threshold (say 5), one buys (or sells, for the inverse ratio) and exits the position within N seconds (he used 10).

This plays on the significant autocorrelation (i.e. non-zero decay time) of the intensity as it reverts to its mean. A skewed ratio of buy versus sell orders will tend to move the market in the direction of the order skew.

The strategy can be enhanced to include information about volume, trade size, etc. We can also look at the buy/sell intensity of highly correlated assets and use it to enhance the signal.
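A minimal sketch of the signal, assuming exponential-kernel Hawkes intensities fitted separately to buy and sell arrivals (the parameters, threshold, and function names here are placeholders, not Carlsson's calibration):

import math

def hawkes_intensity(event_times, t, mu, alpha, beta):
    """Exponential-kernel Hawkes intensity at time t given past event times."""
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in event_times if ti < t)

def order_flow_signal(buy_times, sell_times, t, params_buy, params_sell,
                      threshold=5.0):
    """+1 = buy, -1 = sell, 0 = stand aside, based on the buy/sell intensity ratio."""
    lam_buy = hawkes_intensity(buy_times, t, *params_buy)
    lam_sell = hawkes_intensity(sell_times, t, *params_sell)
    if lam_buy > threshold * lam_sell:
        return +1   # enter long, exit within N seconds
    if lam_sell > threshold * lam_buy:
        return -1   # enter short, exit within N seconds
    return 0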

Wednesday, October 7, 2009

What multivariate approaches can tell us

I've focused on "univariate" strategies in the high frequency space for the last few years. Recently I did some work on medium/long term strategies for the Canadian market. In the course of that investigation I realized that I've been ignoring information by focusing on signals from a single asset alone and not looking at related assets to provide additional signal.

Of course it has always been in the back of my mind to diversify into strategies of more than one asset. But even if one intends to trade single securities, the information that other related assets or indicators can provide gives us an edge. In particular I need to be looking at:

  • Multivariate SDEs with jointly distributed series
    In the simplest case this is expressed as covariance, but covariance captures only one moment of the relationship.

  • Economic signals (well for medium / long term)

  • Cointegration relationships
    Not only linear relationships but quadratic ones. These need to be tested carefully in and out of sample.
My initial successes with Canadian strategies underscored how effective multivariate signals can be.

Wednesday, September 30, 2009

Determining whether a movement is mean reverting

What characteristics would positive or negative momentum have to indicate that this is not a movement to a new level, but ultimately a mean-reverting cycle (in the short term)? I would like to come up with a view on the likely duration of the cycle (how strong and how long).

I have not fully studied this, but would look at the following:

  • volume in the buying or selling relative to historical for the time period as well as recent
  • lack or presence of news event
  • speed of ascent / descent (how do we distinguish a brief period of aggressive execution from a sustained move)
  • changes in complexion of order book

Wednesday, September 23, 2009

Cointegration "Machine"

I've done some interesting cointegration work for Canadian securities. We should have a basket trading strategy worked out shortly.

I was thinking about the equity and FX markets and the vast number of total securities one might investigate. We can test for stable cointegration and mean-reversion style trading in a systematic manner. Why not create a "machine" to test the many combinations for viable trading strategies?

Of course we will need market data for a large number of securities to pull this off ...

Thursday, September 17, 2009

Momentum strong indicator in daily returns

On the side I have been working with someone who is looking for long term strategies in the fixed income space. My strategies focus primarily on intra-day trading, but I have found the beginnings of a number of very attractive longer term (low frequency) strategies.

In particular, we are building a multi-factor model to predict market movements for Canadian bonds. Alternatively, we are also looking at cointegration models that would be implemented as long/short baskets of securities.

Sometimes the simplest ideas work best. I decided to look at a function of momentum over a period as a predictor of return over the following period. I did not expect such strong results. Here is the average return predicted by momentum at various standard deviations from parity:

An alternate graph of this showing standard deviation bands for returns against momentum levels:

There is certainly more work to be done to understand maximum drawdown and optimal money management.
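A minimal sketch of the test in Python, assuming daily closes in a pandas Series; the windows and z-scoring here are my own choices, not the exact formulation behind the charts above:

import pandas as pd

def momentum_forward_returns(prices: pd.Series,
                             momentum_window: int = 60,
                             forward_window: int = 20,
                             n_buckets: int = 7) -> pd.Series:
    """Bucket a momentum z-score and report the average forward return per bucket."""
    momentum = prices.pct_change(momentum_window)
    # z-score momentum against its own rolling history ("std devs from parity")
    z = (momentum - momentum.rolling(250).mean()) / momentum.rolling(250).std()
    forward = prices.pct_change(forward_window).shift(-forward_window)
    df = pd.DataFrame({"z": z, "fwd": forward}).dropna()
    return df.groupby(pd.cut(df["z"], n_buckets))["fwd"].mean()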

Next Steps
Beyond momentum, we are also looking at building a continuous economic index (much like the Aruoba-Diebold-Scotti Business Conditions Index). This provides a continuous forecast of economic variables based on a stochastic state space model. Will discuss further in the next post.

Saturday, September 12, 2009

Cointegration Models

A colleague had asked if I could help develop a multi-factor cointegration model for the Canadian bond market on daily or more frequent sampling, based on a variety of market data and fundamental factors. I had not developed a model like this before and was skeptical that I could produce a useful result short of some man-years of research.

To my surprise, I found a very strong model, with 95% R-squared values and very high significance in a variety of tests. I now have a variety of models based on it, depending on all or some of the below:
  • US 3m rates
  • US 2y swap rates
  • S&P 500
  • S&P / TSE Composite
  • Shanghai Composite index (SSE 300)
  • Momentum
  • CAD/USD fx rate
  • CAD 5Y liquid bond
  • Surprise Index
With two-variable cointegration, one simply trades mean reversion on the spread between one security and another. With multivariate cointegration, one trades a long or short basket against the cointegrating security.
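A minimal sketch of the pairwise (Engle-Granger) case using statsmodels; a multivariate basket would replace the single regressor with the full factor set, and the series names here are placeholders:

import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, coint

def cointegration_spread(y: pd.Series, x: pd.Series):
    """Estimate a hedge ratio by OLS and test the residual spread for stationarity."""
    X = sm.add_constant(x)
    fit = sm.OLS(y, X).fit()
    spread = y - fit.predict(X)            # the mean-reverting spread to trade
    adf_pvalue = adfuller(spread.dropna())[1]
    eg_pvalue = coint(y, x)[1]             # Engle-Granger test applied directly
    return spread, fit.params, adf_pvalue, eg_pvalue

A small p-value on the residual test is what justifies trading the spread back to its mean.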

Monday, August 3, 2009

Strategy Discovery

Today I want to discuss the process of building or discovering a strategy. Generally medium to high-frequency models fall into one of the following categories:
  • set of rules / heuristics on top of statistical observations
  • analysis of price signal
  • evolving state-based model of prices
  • spread or portfolio based relationships
  • technical indicators
  • some combination of these within a Bayesian or amplification framework

These models share a common problem: they are crude approximations that attempt to capture behavior at a macro level.

The market is the emergent behavior of the trades and order activity of all of its participants. The perfect model is one that would have to be able to predict the behavior of each individual participant and be aware of all external stimuli affecting their behavior. This is at worst unknowable and at best would require something akin to an omniscient AI.

The best we can do is have a view or views around how to model market behavior. We can choose one of three approaches towards modelling:

  • create models that rationalize some statistical or behavioral aspect of the market
  • create models using an evolved program or regression, without a preconceived rationalization
  • create models that embody a combination of the above two approaches

Preconceived models have the advantage of being explainable, whereas generated models often are not. That said, it is intriguing to pursue evolution and/or program generation as a means of discovering strategies in an automated fashion.

Rationale
Manual model development and testing is very time consuming. One will start with a conjecture or skeleton idea for a new strategy. The parameter space or variants of the idea may be large. Each has to be tested, optimized, retested.

Many of my strategies start out as models that digest raw prices and produce some form of "hidden state". This hidden state is designed to tell us something useful with less noise than the original signal. This state may be multidimensional and may require further regression to map to buy and sell signals.

Obtaining optimal strategies points towards a multivariate numerical or codified regression approach. The testing and discovery of parameters and model variations would best be automated.

Tools
There are a number of approaches used in optimization, regression, or discovery problems:

  • Regression
    ANN (Artificial Neural Nets), SVM (Support Vector Machines), RL (Reinforcement Learning)
  • Optimization
    GA (Genetic Algorithms), Gradient Descent, Quadratic Optimization, etc
  • Discovery
    GP (Genetic Programming), perhaps ANN as well

Strategy Discovery
Thus far I have mostly used tools in the Regression and Optimization categories to calibrate models. Genetic Programming represents an interesting alternative, where we generate "programs" or strategies, testing for viability and performance.

The "program" represents a random combination from an algebra of possible operations that operates on a set of inputs to produce an output. In our case, our inputs will be the digested information that our models produce. The program will map this into something that can be used to generate buy/sell/out signals.

Thousands of such programs are generated and evaluated against a fitness function. The fittest programs replicate, perform crossover, and mutate. This can be repeated for thousands of generations until programs with strong trading performance are found.

An alternative and perhaps simpler approach is to use an ANN coupled with a GA. The GA generates weights / connections between neurons to produce a model between inputs and outputs.
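A minimal sketch of the ANN-plus-GA idea, assuming a tiny fixed-topology feed-forward net and a user-supplied trading fitness function; this uses truncation selection and Gaussian mutation only (crossover omitted), and every name and parameter here is a placeholder:

import numpy as np

def feed_forward(weights, x, n_in=8, n_hidden=6):
    """Single hidden-layer net; `weights` is a flat vector split into two matrices."""
    w1 = weights[: n_in * n_hidden].reshape(n_in, n_hidden)
    w2 = weights[n_in * n_hidden:].reshape(n_hidden, 1)
    h = np.tanh(x @ w1)
    return np.tanh(h @ w2)                 # output in [-1, 1] -> position signal

def evolve(fitness, n_in=8, n_hidden=6, pop=200, gens=100, sigma=0.3, elite=20):
    """Evolve flat weight vectors against a trading fitness function."""
    rng = np.random.default_rng(0)
    dim = n_in * n_hidden + n_hidden
    population = rng.normal(size=(pop, dim))
    for _ in range(gens):
        scores = np.array([fitness(w) for w in population])
        order = np.argsort(scores)[::-1]                   # best first
        parents = population[order[:elite]]
        children = parents[rng.integers(0, elite, pop - elite)]
        children = children + rng.normal(scale=sigma, size=children.shape)
        population = np.vstack([parents, children])
    return population[np.argmax([fitness(w) for w in population])]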

Questions Under Consideration
ANNs and GPs differ in a number of important ways. Need to think further on the following:

  • ANNs and GPs can represent an infinite number of functions
    ANNs accomplish this, though, at the cost of numerous neurons
  • ANNs and GPs may have a very different search space in terms of volume
    We want to choose an approach that will converge more quickly (ie have a smaller search space)
  • How should we constrain the algebra or permutations to affect convergence
    There are many "programs" which are equivalent; there may also be certain permutations we may not want to allow.
  • What sort of inputs are useful and how do we detect those that are not
    Inputs that are not useful should ultimately have very little trace through the model. Will have to determine how to detect and prune these.

More thought needed ...

Friday, July 31, 2009

Price Path Probability (Again)

So I completed a model and calibration which determines the probability of a price going through a level within a given time period. If one can arrive at a high confidence level, this is incredibly useful for multi-leg execution and as a prop strategy in its own right.

The model uses an SDE with mean reversion, a trend component, and an evolving distribution to describe the price across time. The SDE is evaluated as a Monte Carlo simulation on a grid. We determine the conditional probability of moving from one price level to the next over a given (small) time interval. Summing the products of these conditional probabilities along the paths into a node gives the probability of being at that price node at a given time.

With the grid in hand, one can query it to determine the probability of the price being above or below a level within a given time, etc. For some markets we are seeing a 75% confidence level, meaning we are right 3/4 of the time. There were some markets where the approach had no distinct edge; I have ideas on how to adjust for this, but have not had the time to revisit.
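A minimal sketch of the simulation piece under simplifying assumptions: an OU-style SDE with a linear trend and Gaussian shocks, plain path simulation rather than the grid of conditional probabilities, and placeholder parameter names throughout:

import numpy as np

def prob_touch_level(p0, level, horizon, dt, kappa, mean, trend, sigma,
                     n_paths=20_000, seed=1):
    """Estimate the probability that a mean-reverting, trending price path
    touches `level` at some point within `horizon`."""
    rng = np.random.default_rng(seed)
    n_steps = int(horizon / dt)
    p = np.full(n_paths, float(p0))
    touched = np.zeros(n_paths, dtype=bool)
    above = level >= p0
    for _ in range(n_steps):
        drift = kappa * (mean - p) + trend
        p = p + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=n_paths)
        touched |= (p >= level) if above else (p <= level)
    return touched.mean()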

The evolution of the distribution was the most complex part to model. Unlike idealized option models, where the distribution is stationary and generally Gaussian, the observed intra-day distribution over short periods is neither Gaussian nor stationary. We noted that the first 3 or 4 moments have dynamics which can be modelled and fitted on top of an empirical distribution.

The trending and mean reversion functions were fitted using a maximum likelihood estimate, which was easily obtained from the distribution for each time step under given assumptions. The parameters were evolved with a GA to maximize the likelihood.

Switch Hitting

I've used many languages over the years, both functional and imperative. Since I live in the real world (i.e. non-academic), I've generally had to use languages with a large user base, strong momentum, etc. In general my must-have criteria have been:

  • performance
  • access to wide range of libraries, both infrastructural and numerical
  • scalable to large, complex, applications
  • runs on one of the major VMs (JVM or CLR)

I would have liked to have had the following as well:

  • elegance
  • functional programming constructs

Changing Climate
Things have changed over the last few years though. We now have strong functional languages with high performance, access to diverse libraries, and a wider user base. There is now even talk in the wider (non-functional programming) community about functional languages.

On the imperative side I have used (by frequency): Java, C#, Python, C/C++, Fortran, etc. I have probably invested the most in Java at this point, as I have found the complexity of C++ over the years distasteful and time consuming.

I write parallel-targeted numerical models and write trading strategies around them. The problem is that none of the above imperative languages map well to the way I think about problems, and they require a lot of unnecessary scaffolding.

Functional Languages on the VM
There have been some stealth functional language projects (i.e. not all that well known outside of the functional community), such as Clojure for the JVM. Clojure, however, does not have the performance level I am looking for and is not much elevated above Lisp. I love Scheme / Lisp, but feel that it doesn't scale well for large projects.

Relatively new on the scene is Scala. Scala aims to be a statically typed language with OO and functional features, bridging the two paradigms. It has many nice features, but is more verbose than other functional languages. It is certainly much more concise than Java, which is a big plus, but it does have some bizarre syntax and inconsistencies I'm not thrilled with. There are also some performance issues related to for-comprehensions that mean I cannot use it right now. Nevertheless, for the JVM, I think it is the only language I could consider for functional programming.

Enter F#. Ok, Microsoft scares me. That said, as with C# and the CLR, they have been leading the pack in innovation. They picked up the Java mantle, fixed its flaws, and have since evolved Java into something much better: C#.

F# is a new language (well, a few years old) for the .NET CLR based on OCaml, not too distant from Haskell or ML. The language is concise, integrates well with the CLR and libraries, and performs very well.

Frankly, I think F# is superior to any of the other languages available on the JVM or CLR. My concern now is: is it worthwhile switching from the JVM to the CLR? I have a large set of libraries written in Java. I am also concerned about the degree of portability and that the Web 2.0 APIs (which Google is largely defining) tend to be Java / JVM based.

Practical Details
Am I going to have to live in a hybrid world where I write, say, some GWT apps in Java and do my core work in the .NET CLR? Is there any hope of anything like F# on the horizon for the JVM? Should I abandon Eclipse and my other tools and look at Mono tools, or worse, be required to use Visual Studio?

Give me an F# equivalent on the JVM. Please!

Thursday, July 30, 2009

Trader Bots

I came across this site today. I'm not a huge believer in technical analysis as a basis for trading, however these guys are doing something interesting. They are generating / seeding strategies as a genetic program based on a combination of technical, momentum, and sentiment inputs into a neural net. These are then bred / cross-pollinated to refine further.

The next part is an extrapolation from the very little they have indicated. I suspect they are doing the following:

  1. Generate initial strategies using a random genetic program that selects inputs from a subset of available technical, sentiment, and momentum indicators.
  2. Calibrate to the best possible trading signal (given the inputs) using an ANN (neural net)
  3. Evaluate utility function across some years of historical data
  4. Based on results, refine by breeding the strategies with a GA
  5. Rinse and Repeat
It is an automated approach to strategy discovery, avoiding costly manual research. Though it does not appear to make use of more sophisticated inputs & models, the general approach is nice. It would not be a surprise to find that some of these strategies are successful.

The approach can be expanded to incorporate more sophisticated models as inputs (such as basis function based signal decomposition, stochastic state systems, etc).

Monday, July 27, 2009

Future For Commercial HPC

As I noted in an earlier post I have High Performance Computing requirements. Basically if you can give me thousands of processors, I can use them. The problem with HPC today is that it is one or more of the following (depending on where you are):
  • academic and only open on a limited basis to researchers based on their proposals
  • internal
  • available but not cost effective (8 core @ $7000 / compute year at Amazon)
This flies in the face of what we know:
  • there are many thousands of under or un-utilized computers available
  • the true cost of computing power + ancillary costs (power, people) can be scaled to a much lower number
  • organizations should want to monetize this underutilized capacity

Why do I care about this? Well, I could use cheap computing power today, but also I used to be a parallel algorithm researcher back in the day, so have been waiting for this for a long time.

The solution needs to be to allow compute resource providers a means to auction their unused resources for blocks of time, immediate or future. HPC users that want to evaluate a massively parallel problem can collect a forward-dated/timed group of nodes for execution, finding a group within their cost range or waiting for lower cost nodes to become available.

How would this be accomplished?
  1. Exchanges are set up for geographical areas where providers can offer gflop-hr futures and consumers can buy computing futures or alternatively sell their unused futures.
  2. Contract requires standardized power metrics (SPECfprate2006 for instance)
  3. Contract requires standardized non-CPU resource (min memory, disk)
  4. Standard means of code and data delivery (binary form, encryption, etc)
  5. Safe VM in which to run code
  6. Checkpointing to allow for a computation to be moved (optional)

Research into auction-based scheduling and resource allocation began in the early 90s, perhaps earlier. The first paper I saw in this regard was in 1991. There are now hundreds of papers on this and a few academic experiments. There should be a big market for this amongst web hosting companies, etc.

Amazon and Google, although likely to be very efficient with resource utilization, are likely to have peak periods and slack periods like everyone else. The strategy would be to price resources lower during slack periods to attract "greedy" computations looking for cheap power.

I have specific ideas about how this would be implemented. Contact me if you are interested.

Feed Forward NN in "real life"

Turns out that the nematode Caenorhabditis elegans has a nervous system that is similar to a feed forward network. A feed forward network is one where neurons have no backward feedback from neurons "downsignal" (i.e. the neurons and synapses can be arranged as a directed acyclic graph). This is very analogous to the feed forward network first envisaged for Artificial Neural Networks.

The worm has exactly 302 neurons and ~5000 synapses, with little variation in connectivity from one worm to another. This implies, on average, fewer than 20 synaptic connections per neuron (roughly 5000 / 302 ≈ 17). This is in contrast to the mammalian brain, where most neurons have feedback loops from other neurons downstream of the signal.

I am very enthusiastic about this area of research as it progresses us step-by-step closer to mapping an organism's brain onto a machine substrate. The nematode is quite tractable because of its fixed and very finite number of neurons.

ANNs are no longer in vogue, but I use feed forward ANNs for some regression problems. Of course my activation function is likely to be quite different from the biological equivalent. ANNs are not a very active area of research given their limitations, but one does find them convenient for massive multivariate regression problems where one does not understand the dynamics.

The regressions that I solve have sparse {X,Y} pairs, if any at all, and can only be evaluated as a utility function across the whole data set. This precludes the various standard incremental "learning" approaches. Instead I use a genetic algorithm to find the synapse matrix that maximizes the utility function.

SVM is more likely to be used in this era than ANNs for regression. Its drawback is that it requires one to do much trial and error to determine an appropriate basis function, transforming a nonlinear data set into a reasonably hyperlinear dataset in another space.

High Performance Computing on the Cheap

I have a couple of trading strategies in research that require extremely compute intensive calibrations that can run for many days or weeks on a multi-cpu box. Fortunately the problem lends itself to massive parallelism.

I am starting my own trading operation, so it is especially important to determine how to maximize my gflops / $. Some preliminaries:
  • my calibration is not amenable to SIMD (therefore GPUs are not going to help much)
  • I need to have a minimum of 8 GB memory available
  • my problem performance is best characterized by the SPECfprate benchmark
I started by investigating grid solutions. Imagine if I could use a couple of thousand boxes on one of the grids for a few hours. How much would that cost?

Commercial Grids
So I investigated Amazon EC2 and the Google App Engine. Of the two, only Amazon looked to have higher performance servers available. Going through the cost math for both Amazon and Google revealed that neither of these platforms is priced in a reasonable way for HPC.

Amazon charges $0.80 per compute hour, i.e. $580 / month or ~$7000 / compute year, on one of their "extra-large high cpu" boxes. This configuration is a 2007-spec Opteron or Xeon. This would imply a dual Xeon X5300-family 8 core with a SPECfprate of 66, at best. $7000 per compute year is much too dear; certainly there are cheaper options.

Hosting Services
It turns out that there are some inexpensive hosting services that can provide SPECfprate ~70 machines for around $150 / month. That works out to $1800 / year. Not bad, but can we do better?

Just How Expensive Is One of these "High Spec" boxes?
The high-end MacPro 8 core X5570 based box is the least expensive high-end Xeon based server. It does not, however, offer the best performance per dollar if your computation can be distributed. The X5500 family performs at 140-180 SPECfprates, at a cost of > $2000 just for the 2 CPUs.

There is a new kid on the block, the Core i7 family. The Core i7 920, priced at $230 generates ~80 SPECfprates and can be overclocked to around 100. A barebones compute box can be built for around $550. I could build 2 of these and surpass the performance of a dual cpu X5500 system, saving $2000 (given that the least expensive such X5500 system is ~$3000).

Cost Comparison Summary
Here is a comparison of cost per 100 SPECfprate compute year for the various alternatives. We will assume 150 watt power consumption per cpu at $0.10 / kWh, in addition to system costs.

  1. Amazon EC2
    $10,600 / year. 100/66 perf x $0.80 / hr x 365 x 24

  2. Hosting Service
    $2,700 / year. 100/70 perf x $150 x 12

  3. MacPro 2009 8 core dual X5570
    $1070 / year. 100 / 180 perf x $3299 / 2 + $160 power

  4. Core i7 920 Custom Build
    $430 / year. 100 / 80 perf x $550 / 2 + $88 power

  5. Core i7 920 Custom Build Overclocked
    $385 / year. 100 / 100 perf x $550 / 2 + $100 power

The Core i7 920 build is the clear winner. One can build 5-6 of these for the cost of every X5570 based system. Will build a cluster of these.
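A small helper that reproduces the arithmetic above; the hardware term assumes the system cost is amortized over two years (my reading of the "/2" in the figures), with power billed at $0.10 / kWh:

def cost_per_100_specfprate_year(specfprate, system_cost=0.0, hourly_rate=0.0,
                                 monthly_rate=0.0, watts=0.0, kwh_price=0.10,
                                 amortize_years=2.0):
    """Annual cost normalized to 100 SPECfprate of sustained throughput."""
    hours = 365 * 24
    hardware = system_cost / amortize_years
    rental = hourly_rate * hours + monthly_rate * 12
    power = watts / 1000.0 * hours * kwh_price
    return (100.0 / specfprate) * (hardware + rental) + power

# e.g. Amazon EC2 "extra-large high cpu" at $0.80/hr and SPECfprate ~66:
print(round(cost_per_100_specfprate_year(66, hourly_rate=0.80)))   # ~10618, the ~$10,600 / year figure above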