We will look at the mathematical foundations of probability theory and at how deep our team has to dive into it to manage its optimizing solutions.
Random number generators
A random number generator is a basic building block of many optimization and computational algorithms, be it Monte Carlo methods, genetic and other evolution-based algorithms, or the initialization of weights in deep neural networks. There are still many unknowns about how to use them, and general human intuition falls short.
An ideal random number generator produces uniformly distributed values with absolutely no relationship between them.
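The two properties of an ideal generator can be checked empirically. The sketch below is only an illustration, not a full statistical test suite: it assumes NumPy's default generator, and the helper name `uniformity_and_lag1_correlation`, the seed and the bin count are all made up for this example. It measures how far a histogram of the values is from flat and how strongly each value is correlated with the next one.

```python
import numpy as np

def uniformity_and_lag1_correlation(samples, bins=10):
    """Return (max relative deviation of bin counts from a flat histogram,
    correlation between each value and its successor)."""
    samples = np.asarray(samples, dtype=float)
    counts, _ = np.histogram(samples, bins=bins, range=(0.0, 1.0))
    expected = samples.size / bins
    max_rel_dev = np.max(np.abs(counts - expected)) / expected
    lag1 = np.corrcoef(samples[:-1], samples[1:])[0, 1]
    return max_rel_dev, lag1

rng = np.random.default_rng(12345)  # arbitrary seed, chosen for the example
values = rng.random(100_000)        # floats in [0, 1)
max_rel_dev, lag1 = uniformity_and_lag1_correlation(values)
print(f"max relative bin deviation: {max_rel_dev:.4f}")
print(f"lag-1 autocorrelation:      {lag1:.4f}")
```

Both numbers should be close to zero for a good generator. A small value here does not prove the generator is good; it only fails to show an obvious defect.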
Truly random generators
Truly random events are all around us at the low level; we call them noise.
Atmospheric noise is radio noise caused by natural atmospheric processes. Anybody can experience it by tuning a radio to a frequency where no station transmits. Other sources are thermal, electromagnetic and quantum events: cosmic background radiation, radioactive decay, or something as simple as the noise generated by various events on a semiconductor’s PN junction.
Pseudo random generators
In computing we try to eliminate noise as much as possible. Computers are exact and precise, which is the exact opposite of what we need for randomness.
Getting something random out of them requires either collecting random events from their surroundings or dedicated hardware with a truly random generator.
Examples of events produced around computers are time (of very limited use), delays between key presses on the keyboard, precise mouse movements, delays between packets on the network interfaces, etc. The larger the collection of these events, the better the entropy available for the initialization “vector” or “seed” of the pseudo random generator.
In other words, the longer the computer has been running, the more entropy is available to a pseudo random generator.
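In practice the operating system usually does this collecting for us. As a minimal sketch, assuming the OS randomness source exposed by Python's `os.urandom` is an acceptable entropy source and that 16 bytes is enough for the seed (both are choices made for this example), a pseudo random generator can be seeded like this:

```python
import os
import numpy as np

# Ask the operating system for 16 bytes from its randomness source.
seed_bytes = os.urandom(16)
seed = int.from_bytes(seed_bytes, "big")

# Use the collected entropy as the seed of a pseudo random generator.
rng = np.random.default_rng(seed)
print("seed:", seed)
print("first draws:", rng.random(3))
```

Printing or logging the seed keeps the run repeatable later, even though the seed itself was not chosen by hand.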
There are many ways to initialize a random number generator with its first seed/vector. But when we are developing a program, it is essential to be able to compare program runs with each other. It is useful to set the random number generator to the same seed so that every run of the program produces the same results.
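A short illustration of this, assuming NumPy's generator API; `run_experiment` and the value of `SEED` are hypothetical placeholders for a real program run:

```python
import numpy as np

SEED = 42  # arbitrary fixed value, chosen for the example

def run_experiment(seed):
    """Stand-in for a program run that consumes random numbers."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=5)

first = run_experiment(SEED)
second = run_experiment(SEED)
other = run_experiment(SEED + 1)

print("same seed, same results:     ", np.array_equal(first, second))
print("different seed, same results:", np.array_equal(first, other))
```

Passing the generator (or the seed) explicitly into each component, rather than relying on a global random state, makes it easier to keep runs comparable as the program grows.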
The human world is driven mostly by Normal distribution processes.
The Normal distribution assigns a very low probability to extreme events. In finance, a move of 2 standard deviations or more is often considered an extreme event. Other literature strongly advises against using the Normal distribution and its associated standard deviation, because, as has been documented, financial markets might be driven by Poisson or other processes with much heavier tails.
Human intuition expects a Normal distribution and definitely nothing like the flat distribution of white noise.
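To get a feeling for how much the tails matter, the sketch below compares the probability of a move beyond k standard deviations under the Normal distribution with the same probability under a heavy-tailed alternative. The alternative used here is a Student's t distribution with 3 degrees of freedom rescaled to unit variance; that is purely an illustrative choice for this example, not the process the literature above proposes for markets.

```python
import math
import numpy as np

def normal_two_sided_tail(k):
    """P(|Z| > k) for a standard normal variable Z."""
    return math.erfc(k / math.sqrt(2.0))

rng = np.random.default_rng(0)
dof = 3  # assumed degrees of freedom, an illustrative heavy-tailed choice
t_samples = rng.standard_t(dof, size=1_000_000)
# The variance of Student's t is dof / (dof - 2); rescale to unit variance.
t_samples /= math.sqrt(dof / (dof - 2))

tails = {}
for k in (2, 3, 4, 5):
    p_normal = normal_two_sided_tail(k)
    p_heavy = float(np.mean(np.abs(t_samples) > k))  # empirical estimate
    tails[k] = (p_normal, p_heavy)
    print(f"{k} sigma: normal {p_normal:.2e}, heavy-tailed {p_heavy:.2e}")
```

Around 2 standard deviations the two distributions are comparable, but further out the heavy-tailed one assigns far more probability to extreme moves than the Normal distribution does.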
Randomness with evolution based algorithms
In finance and trading, the simplest evidence is the ability of a genetic algorithm to find bugs in the backtesting process. If there is a bug that allows a look into the future, it usually takes just tens of generations for the algorithm to find and exploit it.
Human intuition would expect the algorithm not to be able to find such an edge case. Our experience says it finds it each and every time.
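A toy illustration of why such a bug is so attractive to an optimizer, assuming synthetic returns that are pure noise (all names and parameters below are made up for the example). A signal that accidentally sees the return of the same day it trades on looks spectacularly profitable, while the correctly lagged version of the same rule earns roughly nothing:

```python
import numpy as np

rng = np.random.default_rng(7)
returns = rng.normal(0.0, 0.01, size=2_000)  # synthetic daily returns, pure noise

# Buggy signal: the position for day t is decided from the return of day t itself.
leaky_position = np.sign(returns)

# Correct signal: the position for day t uses only the return of day t-1.
lagged_position = np.zeros_like(returns)
lagged_position[1:] = np.sign(returns[:-1])

leaky_pnl = float(np.sum(leaky_position * returns))
honest_pnl = float(np.sum(lagged_position * returns))
print(f"PnL with look-ahead bug: {leaky_pnl:.2f}")
print(f"PnL without the bug:     {honest_pnl:.2f}")
```

Any fitness function that rewards profit will pull the population toward the leaky variant as soon as a single individual stumbles on it.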
At the same time, a human-optimized approach to data generation and caching falls short. By definition, the genetic algorithm explores all types of available data and searches for the solution probabilistically across the whole multidimensional space, without any slim tails at the borders of the space.
Missing entropy is the main cause of low robustness of financial models and trading systems. Respecting its value will improve the robustness and validation of our optimizing solutions.
Once upon a time, the term Artificial intelligence was defined. Its subcategories emerged soon after, and some of them were inspired by nature, be it the reimplementation of evolution by natural selection, as proposed by Charles Darwin and Alfred Russel Wallace, or deep learning, inspired by the structures found in our own brains. Nowadays most implementations are tied to the computational resources of general-purpose processors, graphics cards, FPGAs and ASIC circuits.
Genetic programming algorithm
It seems we might have started with the easier target. Genetic programming is a technique that generates computer programs using an evolution-based algorithm.
At the start of a run, the evolution algorithm generates a population with a set number of randomly generated individuals. Each individual is represented as a list of a specific number of trees.
Each tree in the individual represents a specific aspect of trading strategy behavior: asset selection, entry condition, entry filter, stop loss, exit condition or exit filter. That makes six trees in every individual in this specific case.
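The representation described above could be sketched as follows. This is a simplified illustration, not our production code: the primitive set (`FUNCTIONS`, `TERMINALS`), the tree depth and the nested-tuple encoding are all assumptions made for the example. Only the six tree roles come from the text.

```python
import random

TREE_ROLES = (
    "asset_selection", "entry_condition", "entry_filter",
    "stop_loss", "exit_condition", "exit_filter",
)

# Hypothetical primitive set: function name -> number of arguments.
FUNCTIONS = {"add": 2, "sub": 2, "gt": 2, "neg": 1}
# Hypothetical terminals: names of input series.
TERMINALS = ("close", "volume", "sma_20", "rsi_14")

def random_tree(rng, max_depth):
    """Build a random expression tree as nested tuples:
    either a terminal name or (function_name, child, ...)."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    children = tuple(random_tree(rng, max_depth - 1) for _ in range(FUNCTIONS[name]))
    return (name,) + children

def random_individual(rng, max_depth=3):
    """An individual is a list with one tree per role."""
    return [random_tree(rng, max_depth) for _ in TREE_ROLES]

def random_population(rng, size):
    return [random_individual(rng) for _ in range(size)]

rng = random.Random(1)  # fixed seed so the run can be repeated
population = random_population(rng, size=50)
for role, tree in zip(TREE_ROLES, population[0]):
    print(f"{role:16s} {tree}")
```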
Each individual is run as a trading strategy. It might generate some trades, and then statistics are calculated. Various statistics can serve as the individual’s fitness. Right now we use the Sharpe ratio multiplied by the number of trades, to push the evolution algorithm to prefer individuals with a higher number of trades.
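A minimal sketch of such a fitness function. It assumes a plain per-trade Sharpe ratio (mean trade return divided by its sample standard deviation, with no risk-free rate and no annualization) and returns zero when there are too few trades to compute it. Those details are assumptions for the example; the exact Sharpe variant is not specified here.

```python
import numpy as np

def fitness(trade_returns):
    """Per-trade Sharpe ratio multiplied by the number of trades."""
    trade_returns = np.asarray(trade_returns, dtype=float)
    n_trades = trade_returns.size
    if n_trades < 2:
        return 0.0  # not enough trades to estimate a standard deviation
    std = trade_returns.std(ddof=1)
    if std < 1e-12:
        return 0.0  # guard against division by (numerically) zero
    sharpe = trade_returns.mean() / std
    return float(sharpe * n_trades)

few = [0.02, 0.01, 0.03]
many = few * 10  # the same trade pattern repeated ten times
print("fitness with  3 trades:", round(fitness(few), 2))
print("fitness with 30 trades:", round(fitness(many), 2))
```

With the same quality of individual trades, the strategy that trades more often receives the higher fitness, which is the intended pressure.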
Based on fitness, the evolution algorithm selects individuals for the mating and mutation phase. The mated/mutated individuals become part of the population and their fitness is calculated again. The best individual is kept in the hall of fame.
This iterative process goes on, and after several generations we expect a working trading strategy in the hall of fame.
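The loop described above can be sketched in a few lines. To keep the example short and runnable it evolves fixed-length lists of numbers instead of trees, and it uses a toy fitness function in place of a backtest. Tournament selection, one-point crossover, Gaussian mutation and full replacement of the population each generation are assumptions made for this sketch, not a description of our actual operators.

```python
import random

def toy_fitness(individual):
    """Stand-in for a backtest: higher is better, best possible value is 0."""
    return -sum((gene - 1.0) ** 2 for gene in individual)

def tournament(rng, scored, k=3):
    """Return the fittest of k randomly chosen (fitness, individual) pairs."""
    return max(rng.sample(scored, k), key=lambda pair: pair[0])[1]

def mate(rng, a, b):
    """One-point crossover of two parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(rng, individual, rate=0.2, sigma=0.3):
    """Add Gaussian noise to each gene with probability `rate`."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g
            for g in individual]

def evolve(rng, pop_size=40, genes=6, generations=30):
    population = [[rng.uniform(-5, 5) for _ in range(genes)]
                  for _ in range(pop_size)]
    scored = [(toy_fitness(ind), ind) for ind in population]
    hall_of_fame = max(scored, key=lambda pair: pair[0])  # best seen so far
    history = [hall_of_fame[0]]
    for _ in range(generations):
        # Select parents by fitness, then mate and mutate them.
        population = [
            mutate(rng, mate(rng, tournament(rng, scored), tournament(rng, scored)))
            for _ in range(pop_size)
        ]
        # Fitness is calculated again for the new individuals.
        scored = [(toy_fitness(ind), ind) for ind in population]
        best = max(scored, key=lambda pair: pair[0])
        if best[0] > hall_of_fame[0]:
            hall_of_fame = best
        history.append(hall_of_fame[0])
    return hall_of_fame, history

hall_of_fame, history = evolve(random.Random(3))
print(f"best fitness at start: {history[0]:.3f}")
print(f"best fitness at end:   {history[-1]:.3f}")
```

Because the hall of fame only ever replaces its entry with a better one, the best individual is never lost even when a later generation happens to be worse.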
The “slowness” of Python is negligible, as most of the time is spent calculating an individual’s fitness. Our individuals create signals: the genetic programming trees are built from optimized numpy and talib functions (C code), and the trading algorithm is represented as a fully optimized state machine in Cython. In other words, from a programming point of view it is a mixed vectorized and event-based approach to trading system programming.
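The mixed approach can be illustrated with a small sketch. It is deliberately simplified: the signals come from a moving-average crossover computed with plain NumPy rather than from evolved trees and talib, and the state machine is an ordinary Python loop standing in for the Cython one. The function names, window lengths and the two-state design are all assumptions for the example.

```python
import numpy as np

def sma(prices, window):
    """Simple moving average; the first window-1 values are NaN."""
    out = np.full(prices.shape, np.nan)
    out[window - 1:] = np.convolve(prices, np.ones(window) / window, mode="valid")
    return out

def make_signals(prices, fast=5, slow=20):
    """Vectorized part: compute entry and exit signals for all bars at once."""
    fast_ma, slow_ma = sma(prices, fast), sma(prices, slow)
    entry = fast_ma > slow_ma   # comparisons with NaN evaluate to False
    exit_ = fast_ma < slow_ma
    return entry, exit_

FLAT, LONG = 0, 1

def run_state_machine(prices, entry, exit_):
    """Event-based part: walk through the bars and track the position state."""
    state, entry_price, trades = FLAT, 0.0, []
    for price, go_long, go_flat in zip(prices, entry, exit_):
        if state == FLAT and go_long:
            state, entry_price = LONG, price
        elif state == LONG and go_flat:
            trades.append(float(price / entry_price - 1.0))  # trade return
            state = FLAT
    return trades

rng = np.random.default_rng(11)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=500)))  # synthetic
entry, exit_ = make_signals(prices)
trades = run_state_machine(prices, entry, exit_)
print("number of closed trades:", len(trades))
```

The expensive array work happens once per individual in compiled code, and only the inherently sequential bookkeeping of positions runs bar by bar.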
By changing the fitness function we can push the evolution algorithm to generate specific trading strategies, be it a specific risk profile, maximum profit or an optimized portfolio. It is also possible to optimize just a part of an already existing strategy. For example, we can take an existing strategy with defined entries and let genetic programming evolve the strategy exits. Additional inputs can be used as well: sources of sentiment information, market and intermarket indices, deep learning networks with pre-trained alphas, well-known market alpha signals, etc.
An interesting property of genetic programming combined with a large number of alpha factors might be the ability to select the ones that really matter in the market without being tied to the usual evaluation over a specific number of days into the future.