Abstract

In today’s article, we will look at an approach to price analysis that has been known for centuries – technical analysis. We will, however, look at it from a different point of view than usual: through the market logic hidden behind it. As part of our applications for discovering and exploiting trading strategies, we have been developing the PatternLab software, whose foundations are based on technical analysis.

Patterns in price movement behaviour – where do they come from and how to use them?

The basis of systems used for trading on financial markets is usually a particular pattern in price movement behaviour which tends to repeat, and which can therefore be called a systematic error. These errors (interferences) in the data are created by the specific, repeated behaviour of strong market participants; to understand them, it is important to have at least a basic knowledge of market microstructure (i.e. to understand terms such as market participant, order types, order book and depth of market).

An example of a specific price behaviour pattern (double bottom, technical analysis)

Repeated price patterns are a side effect of the behaviour (buying/selling) of a market participant powerful enough to move the market price of the traded asset, and the participant may not even notice this side effect at all. As an example, take an imaginary pension fund which at 11:00 adjusts its position in the stock of company “S”, for instance by liquidating it (selling). Liquidation of large positions is often handled by simple algorithms which time the sales and split them into smaller parts so as not to create price pressure, which would hurt the executor of the order (they would be selling at a lower price). Technically speaking, the market participant – the pension fund – sends a considerable number of sell orders to the exchange order book, thereby creating downward pressure on the price. Since the fund is aware of this, the orders are sent to the order book at time intervals so that their impact on the asset price is as small as possible. The price pattern you can see in the picture above is created as a side effect of this process.
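
To make the mechanism concrete, here is a minimal, purely illustrative sketch of the kind of order slicing described above (the quantities and the helper name are made up; real execution algorithms also randomise sizes and timing, cap volume participation and react to the order book):

```python
def twap_slices(total_qty: int, n_slices: int) -> list[int]:
    """Split a parent sell order into (roughly) equal child orders.

    A minimal TWAP-style sketch only; real execution algorithms are
    far more sophisticated.
    """
    base, rem = divmod(total_qty, n_slices)
    # Spread the remainder over the first `rem` child orders.
    return [base + (1 if i < rem else 0) for i in range(n_slices)]

# Example: liquidate 100,000 shares of "S" in 20 child orders,
# submitted at time intervals to keep the price impact small.
print(twap_slices(100_000, 20))
```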

Such systematic errors arise in different time frames and in different forms. There can, of course, be various reasons for the creation of such a price pattern, some of which can be observed precisely (in contrast to our example, where the process can be observed but the specific cause would be very difficult to identify), for example a reaction to the announcement of a company’s quarterly results. The table below shows an example of changes in some macroeconomic variables which can lead to the creation of a price pattern.

Changes in economic data

Source: https://tradingeconomics.com/

The business idea based on the observation of price patterns is very simple: if I observe a particular pattern (systematic error) in the price development, I receive market information about an emerging, familiar situation, and I know I am watching a process in which I understand exactly what is happening in the market. Traders call this situation “finding your market”. Exploiting it is intuitive: if I know who is doing what in the market, and why, I can place my own order accordingly and participate in the price movement. As I outlined in one of the previous posts, when using such a trading opportunity we do not rely on the outputs of prediction models; it is a reaction to the current situation, or more precisely, the discovery and exploitation of a newly emerged trading opportunity.

Practical use

Our team is developing an application which helps you search for your preferred price patterns by scanning thousands of instruments and testing whether the occurrence of these patterns is statistically significant, saving you a great deal of time otherwise spent studying and scanning markets. In addition, the application is equipped with a function which instantly informs you whenever a market you track records your preferred price pattern, ensuring that you never miss a trading opportunity. With the overall application workflow in place, research on the patterns offered in the application is reaching its peak. Once the library of predefined patterns has been finished, the final production environment will be built.

If you are interested in more information about our service, do not hesitate to contact us.

Michal Dufek

In today’s post, we will introduce our MTA (Multicriterial Text Analysis) software. The MTA product helps users make decisions when shopping for various products and services.

Motivation

The product aims to help users get their heads around the large amounts of opinions published on the internet about specific goods or services which they would like to buy or use. User reviews and ratings are scattered across various discussion forums, product review websites and portals dedicated to specific areas. It is difficult and time-consuming for an ordinary user to look up this information, familiarise themselves with it and form their own opinion.

MTA architecture

Data Collection

To collect data, we use a set of tools (crawlers) to download user reviews and articles about a selected group of products or services. These crawlers are adapted to the structure of the defined websites, from which they collect data relevant for analysing topics and attitudes. Using this set of crawlers, we have already downloaded more than a million user reviews.

Pre-processing

When collecting data, we usually face a few problems. One of the biggest is the varied way products are labelled on different websites. Even though it is the same product, its name differs from shop to shop, which complicates product identification. For instance, the product “Canon EOS 600D” appears under a number of different sales names.

It is important to correctly recognise which names identify the same product and to link them to the published reviews. We use machine learning methods for this.
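
The article does not describe the specific model; as a hedged illustration, a simple string-similarity baseline (Python standard library only; the names and threshold are assumptions) might look like this:

```python
from difflib import SequenceMatcher

def normalise(name: str) -> str:
    # Lowercase and collapse whitespace so cosmetic differences do not matter.
    return " ".join(name.lower().split())

def same_product(name_a: str, name_b: str, threshold: float = 0.75) -> bool:
    """Guess whether two listing names refer to the same product.

    A baseline sketch only; a trained classifier over richer features
    (brand, model number, price range) would replace this in practice.
    """
    ratio = SequenceMatcher(None, normalise(name_a), normalise(name_b)).ratio()
    return ratio >= threshold

print(same_product("Canon EOS 600D", "CANON Eos 600 D"))   # True
```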

For further analysis, the obtained reviews need to be modified. The first step is to split them into individual sentences, which usually carry independent topics. We then transform words into their base form and remove diacritics. It is also useful to remove words which do not bear any required information value (such as prepositions, conjunctions etc.). For this we use our own POS analyser, which assigns a part of speech to each word in the sentence, together with a dataset of word stems created by our own means. Documents edited this way are transformed into vector form using the tf-idf method.
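
A compact sketch of this pipeline is below. Our POS analyser and stem dataset are internal, so the sketch substitutes a toy stop-word list and a standard diacritics-stripping routine; only the overall flow matches the description above:

```python
import unicodedata
from sklearn.feature_extraction.text import TfidfVectorizer

STOP_WORDS = {"a", "the", "and", "or", "of", "in", "on", "is"}   # illustrative only

def strip_diacritics(text: str) -> str:
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def preprocess(review: str) -> list[str]:
    # 1) naive sentence split, 2) lowercasing + diacritics removal,
    # 3) stop-word filtering (a stand-in for the POS-based filtering).
    sentences = [s.strip() for s in review.replace("!", ".").split(".") if s.strip()]
    cleaned = []
    for sentence in sentences:
        tokens = strip_diacritics(sentence.lower()).split()
        cleaned.append(" ".join(t for t in tokens if t not in STOP_WORDS))
    return cleaned

docs = preprocess("Skvělý fotoaparát. The autofocus is fast and accurate!")
vectors = TfidfVectorizer().fit_transform(docs)   # one tf-idf vector per sentence
print(docs, vectors.shape)
```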

Text Analysis

To analyse large amounts of unstructured data, we use machine learning methods. With them, we identify the most discussed topics in the data and determine reviewers’ positive or negative attitudes towards individual features of the products. Using clustering methods (k-means), we divide reviews into clusters with the same topics. We are able to identify clusters with a high degree of internal consistency, in which the identified topics relate to the main parameters of the product segment under examination. The clusters created for a particular segment, based on professional articles, are then used to classify the reviews of individual products.
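
A minimal sketch of this step (the example sentences and the cluster count are assumptions) could be:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "the sensor resolution is excellent",
    "image quality is unrivalled",
    "the battery drains quickly",
    "battery life could be better",
    "the grip is very comfortable",
]

X = TfidfVectorizer().fit_transform(sentences)

# k-means over tf-idf vectors; sentences about the same feature share a label.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
for sentence, label in zip(sentences, kmeans.labels_):
    print(label, sentence)
```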

Results Presentation

The simplest way we present the results of the text analysis is a static report. This output includes product names, their discussed features, and statistics on how often the listed features are perceived positively or negatively.

Example

Nikon D850

positive:

* excellent image sensor resolution,
* excellent focus sensitivity,
* comfortable grab,
* unrivalled image quality,
* rear buttons backlit,
* 4k uhd video 1920 x 1080 / record slow motion,
* pleasantly surprised with nikon d850,
* well-managed sum values 6400,
* ergonomics.

negative:

* price,
* gb high consumption,
* more expensive optics,
* use of potential is needed to have adequate quality optics which means the best mani,
* price quality is not free of charge.

We are currently developing an interactive web application as well as an app for mobile devices. At the same time, there will be an API with regularly updated data, allowing easy integration into existing solutions.

Do not hesitate to contact us for more information or to provide us with feedback.

Jan Přichystal

In today’s article, we will take a look at the possibilities of starting cooperation with our team. The point is to use a specific example to show the potential strength that cooperation offers. You might be asking: what could such cooperation possibly bring me? Read the following paragraphs.

Our activities can help active traders with demonstrable trading results (which does not mean hundreds of per cent in profit, but the ability to prove they understand what they are doing) and with evident experience.

What we offer is the development and enhancement of trading strategies – we have developers, analysts, traders and statisticians. A trader who runs their own strategy tends to focus on robustness, trading conditions, risks and strategy workflow rather than on yields alone. And that is exactly where we see the greatest potential for cooperation: every trading strategy can be reinforced or enhanced in a number of ways.

One real-life example: we recently met an options trader who successfully trades several of his (relatively well-known and standard) strategies. Knowing that his strategy “suffers” from certain weaknesses, he decided to start cooperating with our group in order to remove them. By sharing experience and knowledge, both parties profited from the cooperation: our team gained innovative stimuli for its development, namely a different view of a generally known trading strategy, while the trader received professional support in automation and statistics/probability, which increases the performance of trading strategies. The results can take many forms – from an automatic screener and notifier of new trading opportunities to fully automated trading strategies including execution, position sizing and position management.

If this article has caught your interest, whether from the point of view of a potential collaborator or of a client, do not hesitate to contact us with your questions; we will be happy to meet you and discuss all the possibilities.

Michal Dufek

Data for text analysis is often available only on websites, in unstructured form. How can this data be obtained as easily as possible?

For downloading text from websites, there are specialised tools called scrapers or crawlers. For some programming languages, there are frameworks which considerably simplify building a scraper for a particular website. We use one of the most popular, a framework called Scrapy, written in Python.

As a practical example, we can mention a tool for collecting subsidy calls as supporting material for Dotační manager (Grant Manager), the largest portal about grants in the Czech Republic, which brings together “calls” from various public sources. The tool automatically walks through the structure of a portal, such as the Agency for Business and Innovations, finds the web page of a subsidy call and mechanically processes it into a structured format. The tool can be run repeatedly, so it also catches newly published calls. The whole tool, including the source code, is available at http://git.pef.mendelu.cz/MTA/oppik-scraper/.
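
For readers unfamiliar with Scrapy, a skeleton of such a spider looks roughly as follows; the start URL and CSS selectors are placeholders, since the real oppik-scraper is adapted to the structure of the specific portal:

```python
import scrapy

class CallSpider(scrapy.Spider):
    name = "subsidy_calls"
    start_urls = ["https://example.org/calls"]          # placeholder URL

    def parse(self, response):
        # Turn each listed call into a structured record.
        for call in response.css("div.call"):           # placeholder selector
            yield {
                "title": call.css("h2::text").get(),
                "deadline": call.css(".deadline::text").get(),
                "url": response.urljoin(call.css("a::attr(href)").get()),
            }
        # Follow pagination so newly published calls are picked up too.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, self.parse)
```

Such a spider can be run with `scrapy runspider spider.py -o calls.json`, producing the structured output directly.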

The example above is fairly simple; in practice, things tend to be more complicated. The website structure of every portal varies, it is often not uniform even within a single portal, it changes over time, and so on. To avoid writing similar tools for individual sources over and over again, we are developing our own robust crawler which can automatically extract text data from various sources.

Vladimír Vacula

The development of investment strategies is usually based on analyzing the behavior of financial instruments on historical data. The main axiom is that we are able to quantify certain behavior patterns and that similar patterns will occur in the future.

We can identify behavior patterns using three types of analysis

The most commonly used patterns are based on technical analysis. Historical price data is very easy to access, even at one-minute granularity, for all markets; in essence, this is an analysis of the relative positions of the open, high, low and close prices over a certain time period, or of the values of technical indicators. To search for patterns using technical analysis, you can use the TA-Lib library (https://www.ta-lib.org/), an open-source solution which contains the well-known technical indicators (Bollinger Bands, Average True Range, Moving Average, RSI, Commodity Channel Index, …) and also includes predefined price patterns (Three Black Crows, Doji, Hanging Man, Marubozu, Shooting Star, …). Price patterns themselves are a very powerful tool for analyzing future price development, because they can be tested automatically on historical data and directly express the price behavior of a financial instrument.
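
As a quick illustration, pattern recognition with TA-Lib takes the four OHLC arrays and returns a signal per bar; the price series below is synthetic, standing in for real market data:

```python
import numpy as np
import talib

rng = np.random.default_rng(0)
close = 100 + np.cumsum(rng.normal(0, 1, 250))
open_ = close + rng.normal(0, 0.5, 250)
high = np.maximum(open_, close) + rng.uniform(0, 1, 250)
low = np.minimum(open_, close) - rng.uniform(0, 1, 250)

# Candlestick pattern recognition: 100 = bullish, -100 = bearish, 0 = none.
doji = talib.CDLDOJI(open_, high, low, close)
print("DOJI at bars:", np.flatnonzero(doji))

# The classic indicators are available in the same style.
rsi = talib.RSI(close, timeperiod=14)
print("last RSI:", round(rsi[-1], 1))
```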

Innovation

As part of the internal development of PatternLab, the goal is to take standard, affordable and well-known price patterns and enrich them with additional input parameters. We do this enrichment in two ways. The first is based on the position of the price pattern relative to the historical price. In practice, this means that when monitoring the DOJI pattern, we are also interested in whether it occurred, for example, near the high of the current week or near the high of the last month.
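
A hedged sketch of this first enrichment is below: it keeps only those pattern signals that occur near a rolling high (the window length and tolerance are assumptions, not PatternLab's actual parameters):

```python
import numpy as np

def near_rolling_high(close: np.ndarray, signal_idx: np.ndarray,
                      window: int = 21, tolerance: float = 0.01) -> np.ndarray:
    """Keep only the signals where the close is within `tolerance`
    of the highest close over the previous `window` bars."""
    keep = []
    for i in signal_idx:
        if i < window:
            continue   # not enough history yet
        if close[i] >= close[i - window:i + 1].max() * (1 - tolerance):
            keep.append(i)
    return np.asarray(keep, dtype=int)

# Example: filter the DOJI signals from the TA-Lib sketch above.
# enriched = near_rolling_high(close, np.flatnonzero(doji))
```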

The second way to enrich price patterns is to add fundamental information such as dividend dates, earnings, changes in the company, and so on. This information can be obtained from Quandl (https://www.quandl.com), which aggregates various sources of data. In practice, this means that the DOJI pattern is monitored only within a few days after a dividend, or only alongside surprising earnings results, etc.
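
The corresponding filter is simple once the event dates are available; in this sketch the dates are invented and would in practice come from a fundamentals feed such as Quandl:

```python
import pandas as pd

signals = pd.to_datetime(["2018-03-02", "2018-05-14", "2018-08-20"])   # pattern dates
dividends = pd.to_datetime(["2018-05-10", "2018-11-09"])               # event dates

def within_days_after(signals, events, max_days=5):
    # Keep only signals that fall within `max_days` after some event.
    return [s for s in signals
            if any(0 <= (s - e).days <= max_days for e in events)]

print(within_days_after(signals, dividends))   # keeps only 2018-05-14
```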

It is also possible to integrate alternative data sources, such as social media analysis (Instagram, Facebook, Youtube, Twitter – https://www.quandl.com/databases/SMA1), satellite images (https://www.quandl.com/databases/RSMMS), or railroad traffic for individual commodities (https://www.quandl.com/data/RR1-Railroad-Traffic).

Benefit for PatternLab users

All of the features mentioned above can be seen in the PatternLab application, which serves to survey and find the desired price patterns. PatternLab users will be able to easily use enriched price patterns and will get a tool for analyzing future price development of the monitored financial instruments more accurately. Users will get information about the historical performance of enriched price patterns and will know more about their risks and profit potential.

For more information about the analytics tools, please contact us at brno@cyrrusadvisory.cz or +420 538 705 775.

Jan Budík

We will look at the mathematical foundations of probability theory and at how deep our team has to dive into this theory to manage its optimization solutions.

Random number generators

A random number generator is a basic building block of many optimization and computational algorithms, be it Monte Carlo methods, genetic and other evolution-based algorithms, or the initialization of weights in deep neural networks. There are still many unknowns about how to use them, and general human intuition falls short.

An ideal random number generator produces values with a uniform distribution and with absolutely no relationship between them.

Types

Truly random generators

Truly random events are all around us at the low level – we call them noise.

Atmospheric noise is radio noise caused by natural atmospheric processes – everybody can experience it when tuning a radio to a frequency where no station transmits. Other sources are thermal, electromagnetic and quantum events – cosmic background radiation, radioactive decay, or something as simple as the noise generated by various events on a semiconductor’s PN junction.

White noise

White noise distribution

Pseudo random generators

In computing, we try to eliminate noise as much as possible. Computers are exact and precise – exactly the opposite of what we need for randomness.

Getting something random out of them requires either collecting random events from their surroundings or dedicated hardware with a truly random generator.

Examples of events produced around computers are time (of very limited use), delays between keyboard key presses, precise mouse movements, delays between packets on network interfaces, etc. The bigger the collection of these events, the better the entropy for the pseudo-random generator’s initialization “vector” or “seed”.

In other words, the longer the computer has been running, the more entropy is available for a pseudo-random generator.

In programming

There are many ways to initialize a random number generator with its first seed/vector. But when developing a program, it is essential to be able to compare program runs with each other, so it is useful to set the random number generator to the same seed so that each run of the program produces the same results.
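
For example, with NumPy two generators created from the same seed produce identical sequences, which makes two runs of a stochastic program directly comparable:

```python
import numpy as np

rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

# Same seed, same sequence: the runs can be compared deterministically.
print(np.array_equal(rng_a.random(5), rng_b.random(5)))   # True

# A different seed (or no seed at all) gives an unrelated sequence.
rng_c = np.random.default_rng(seed=7)
print(np.array_equal(rng_a.random(5), rng_c.random(5)))   # False (almost surely)
```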

Random intuition

The human world is driven mostly by processes with a Normal distribution.

 


Normal distribution

The Normal distribution assigns very low probability to extreme events. In finance, a deviation of two standard deviations or more is often considered an extreme event. Some literature strongly advises against using the Normal distribution and its associated standard deviation: as has been documented, financial markets may be driven by Poisson or other processes with much heavier tails.
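
To put numbers on this, the sketch below compares two-sided tail probabilities of the Normal distribution with a fat-tailed alternative; Student's t is used here purely as an illustrative heavy-tailed stand-in (the article speaks of Poisson-like processes, which are not reproduced exactly):

```python
from scipy import stats

for sigmas in (2, 3, 4):
    p_normal = 2 * stats.norm.sf(sigmas)      # two-sided Normal tail probability
    p_fat = 2 * stats.t.sf(sigmas, df=3)      # heavy-tailed stand-in (t, 3 dof)
    print(f"beyond {sigmas} sigma: normal {p_normal:.5f}, fat-tailed {p_fat:.5f}")
```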

Poisson distribution

Human intuition expects a Normal distribution and definitely nothing like the flat distribution of white noise.

Randomness with evolution based algorithms

In finance and trading, simple evidence of this is a genetic algorithm’s ability to find bugs in the backtesting process. If there is a bug that allows looking into the future, it usually takes just tens of generations to find and exploit it.

Human intuition would expect the algorithm to be unable to find such an edge case. Our experience dictates that it finds it each and every time.

At the same time, a human-optimized approach to data generation and caching falls short. By definition, the genetic algorithm will explore all of the types of data available, searching for the solution probabilistically in the whole multidimensional space – without any slim tails at the border of the space.

Missing entropy is the main cause of low robustness in financial models and trading systems. Respecting its value will increase the robustness and validity of our optimization solutions.

Milon Krejca


One of the previous contributions, “RelativeValues Lab principles”, briefly described the partial problems that software for creating trading strategies based on the Relative Value approach should be able to solve. A key step in such a strategy is the identification of pairs of sufficiently similar assets. This is a very loosely defined concept, so this step can be implemented in different ways. In order for the found pairs to be as stable as possible, it makes sense first to filter out the assets that are unsuitable for pairs trading because they have very little in common (technically, economically, politically, etc.), even though their temporary (random) behaviour can appear similar. The initial selection can be done through various easily available indicators (descriptive and fundamental, though technical indicators can be used for this purpose too). … Among the pre-selected candidates, the pairs that have historically moved together as much as possible are then sought.

Several procedures based on different mathematical and statistical methods can be used to test the co-movement of pairs of time series:

The analysis of correlation between the differences of the time series values tells us about the strength of the common movement: series with a high correlation of value differences can still drift away from each other (e.g. both grow, while one grows faster) and never come back together. What is desirable for pairs trading, however, is a situation where the time series are pulled back towards each other after a temporary deviation, and the presence of a high correlation of differences does not guarantee this. Cointegration analysis is better able to identify this mutual attraction of time series. The analysis of the correlation of value differences can therefore be used as one of the initial filters for selecting candidates, among which the pairs suitable for pairs trading are then sought.
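
A short sketch of this filter is below; the two synthetic series share a common driver and stand in for real prices:

```python
import numpy as np

rng = np.random.default_rng(1)
common = np.cumsum(rng.normal(0, 1, 500))              # shared non-stationary driver
a = 100 + common + np.cumsum(rng.normal(0, 0.3, 500))  # synthetic price series A
b = 120 + common + np.cumsum(rng.normal(0, 0.3, 500))  # synthetic price series B

# Correlate first differences, not raw prices: raw-price correlation is
# inflated by shared trends even when the series drift apart permanently.
corr = np.corrcoef(np.diff(a), np.diff(b))[0, 1]
print(f"difference correlation: {corr:.2f}")
```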

Cointegration analysis is currently the most common approach used to identify pairs. It relies on the statistical concept of stationary random processes: processes without a trend, with variance constant in time and with an unchanging autocorrelation function. Most financial time series do not meet these characteristics and are therefore non-stationary processes; nevertheless, stationarity testing can be used to find common behaviour of time series.

In general, a linear combination of two non-stationary time series is again a non-stationary time series. However, if the time series are generated by the same or very similar processes, it is possible that some linear combination of them is stationary. The basic idea of cointegration analysis is thus to find a pair of non-stationary time series whose linear combination creates a stationary time series. For the series to be considered cointegrated, the residuals from the found linear combination must be stationary.

Various transformations of the time series entering the analysis, such as normalization, taking logarithms of the original values, or moving averages, are often applied to satisfy the assumptions of the methods used. The best-known method for finding the linear combination is ordinary least squares; total least squares (orthogonal regression) can also be used. Stationarity of the series (residuals) is tested by a number of unit root tests, the best-known representatives being the ADF, PP and KPSS tests. For better decision-making, it is advisable to perform the KPSS test together with one of the other tests of this type (e.g. ADF). The KPSS test differs from the others in the construction of its null hypothesis (it looks for evidence of non-stationarity, while ADF and the others look for evidence of stationarity); if the tests agree, it is very likely that the conclusion is really true.
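
A minimal sketch of the whole check with statsmodels follows: an OLS fit provides the linear combination, then the ADF and KPSS tests are applied to its residuals (the two cointegrated series are synthetic):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(1)
common = np.cumsum(rng.normal(0, 1, 500))     # shared non-stationary driver
a = 100 + common + rng.normal(0, 1, 500)      # each series alone is non-stationary,
b = 120 + common + rng.normal(0, 1, 500)      # but a - beta*b is stationary

residuals = sm.OLS(a, sm.add_constant(b)).fit().resid   # a ~ const + beta * b

adf_p = adfuller(residuals)[1]                # H0: unit root (non-stationary)
kpss_p = kpss(residuals, regression="c")[1]   # H0: stationary

# Cointegration is plausible when ADF rejects its null (small p-value)
# while KPSS does not reject its null (large p-value).
print(f"ADF p-value: {adf_p:.3f}, KPSS p-value: {kpss_p:.3f}")
```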

In addition to testing the stationarity of the residuals from the found linear combination, it is advisable to perform a simple analysis of how often the series returns to its mean value, or of the time between crossings of the mean. The advantage of this analysis is that it directly targets the particular feature of the series that is, from a practical perspective, the most important one for pairs trading. A pair whose residuals are stationary may still return to the mean so rarely that it is uninteresting for pairs trading, since positions in such a pair would have to stay open for too long, which can be a problem from a risk management perspective. Stationarity itself is thus not what matters most here; one of its manifestations is more significant: a high frequency of the time series returning to its mean value.
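
This frequency analysis reduces to counting sign changes of the centred spread, as in the sketch below (the residual series is again a synthetic stand-in):

```python
import numpy as np

rng = np.random.default_rng(2)
residuals = rng.normal(0, 1, 500)            # stand-in for cointegration residuals
centered = residuals - residuals.mean()

# A mean crossing happens wherever consecutive values change sign.
crossings = np.flatnonzero(np.diff(np.sign(centered)) != 0)
avg_gap = np.diff(crossings).mean()

print(f"{crossings.size} mean crossings, {avg_gap:.1f} bars between them on average")
```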

However, even after finding pairs with properties close to the ideal requirements (residuals from the linear combination that are stationary and that return to the mean frequently enough), it can be safely assumed that these relationships will not last forever. A period of 6 months is normally recommended as a suitable interval for repeating the search process.

Naďa Chalupová