Historically, the expertise and tools required in cancer research had little, if anything, to do with those used in quantitative finance. But new machine learning techniques are proving useful in both health and finance.
A working paper by Zura Kakushadze, president and co-founder of Quantigic Solutions and a professor at the Free University of Tbilisi’s business school and school of physics, along with Willie Yu, a research fellow at Singapore’s Duke-NUS Medical School, outlines a machine learning model for predicting treasury yields that has roots in the world of health care.
In 2015, Kakushadze received an email from an old friend who works in computational biology in the field of cancer research, asking him to look at a problem he was stuck on. “That was out of left field for me because I had never worked on computational biology. I was not a cancer research expert in any way, shape or form and so I was surprised. But I asked my friend to send me the data so I could look at it with no expectations.”
Kakushadze was intrigued by the data because he had solved similar problems in quantitative trading.
During the work, Kakushadze used a machine learning tool called non-negative matrix factorization, which is popular in fields like computer vision, document classification, biogenetics and computational biology, but hasn’t been used widely in quantitative finance and investing.
One problem the cancer researchers faced was that the data was very noisy when they applied the machine learning algorithm. But Kakushadze was able to apply his expertise from finance to help denoise the data. “Where my expertise came in, and the quantitative trading expertise came in, was that I was able to apply certain methodologies to first take the data, remove the noise from the data and then apply this machine learning algorithm called non-negative matrix factorization to the so-denoised data and then magically things got improved by an order of magnitude.”
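The article does not say which denoising methodology Kakushadze used, so the following is only an illustrative sketch of the general idea of cleaning a data matrix before factorizing it. It assumes a truncated singular value decomposition as the denoiser, and the function name `denoise_low_rank`, the synthetic data and the noise level are all hypothetical.

```python
import numpy as np

def denoise_low_rank(X, rank):
    """Keep only the top `rank` singular components of X (an assumed
    denoiser, not the paper's method), then clip to non-negative values
    so the result can be fed to non-negative matrix factorization."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return np.clip(X_hat, 0.0, None)

# Synthetic example: a non-negative rank-2 signal plus Gaussian noise.
rng = np.random.default_rng(0)
clean = rng.random((200, 2)) @ rng.random((2, 20))
noisy = clean + 0.05 * rng.standard_normal(clean.shape)

denoised = denoise_low_rank(noisy, rank=2)

# Distance from the true signal before and after denoising.
err_noisy = np.linalg.norm(noisy - clean)
err_denoised = np.linalg.norm(denoised - clean)
print(err_noisy, err_denoised)
```

In this toy setting the denoised matrix sits closer to the underlying signal than the raw noisy one, which is the property that makes a later factorization more stable.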
After this interdisciplinary success, Kakushadze decided to try to turn the tables and apply the popular method of non-negative matrix factorization to quantitative finance.
This kind of algorithm cannot handle negative numbers, which are common in finance. So Kakushadze mulled over where it could be applied and landed on treasury yields.
“One idea was to take the treasury yields — which is exactly what we did in the paper — and try to understand what are the underlying factors in these treasury yields using non-negative matrix factorization.”
The paper walks through the details of applying the machine learning algorithm to treasury yields, drawing on lessons from the earlier cancer research, such as the importance of denoising the data. “We had all of those tools and we had to understand how to apply them to treasury yields and that’s exactly what we did.”
By using denoised data and non-negative matrix factorization, the researchers were able to build a factor model and identify clean underlying factors that drive treasury yields.
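To make the idea of a factor model built with non-negative matrix factorization concrete, here is a minimal sketch using the standard multiplicative update rules. It is not the paper's implementation: the function `nmf`, the synthetic yield matrix, the maturities and the "level" and "slope" series are all assumptions made for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix X (dates x maturities) as W @ H,
    with W (dates x k) the factor time series and H (k x maturities)
    the weights, using multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_dates, n_maturities = X.shape
    W = rng.random((n_dates, k)) + eps
    H = rng.random((k, n_maturities)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # stay non-negative throughout.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical yields (in percent) driven by two non-negative factors.
maturities = np.array([1, 2, 3, 5, 7, 10, 20, 30], dtype=float)
t = np.linspace(0.0, 2.0 * np.pi, 250)
level = 2.0 + 0.5 * np.sin(t)
slope = 1.0 + 0.5 * np.cos(t)
yields = np.outer(level, np.ones_like(maturities)) + np.outer(slope, maturities / 30.0)

W, H = nmf(yields, k=2)
rel_err = np.linalg.norm(yields - W @ H) / np.linalg.norm(yields)
print(rel_err)
```

Because every entry of `W` and `H` is non-negative, each yield is expressed as a purely additive mix of the factors, which is what makes the factors easier to interpret than components that can cancel each other out.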
Investors can then look back over historical periods to find trends in those factors.
People are also interested in forecasting, Kakushadze says. “Now just as with the stock market, treasury yields are no different. There’s no crystal ball. So you can’t just write a perfect model that will predict your interest rates moving forward because, if you could do that, then you wouldn’t have to do anything.”
He also highlights that there’s always noise and unforeseen events. However, the algorithm can help identify good factors and provide their weights. “Then you can look at these factors and . . . you can assume that the weights are going to remain constant, and then by looking at the trends and the factors you can try to predict what’s going to happen into the future.”
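The forecasting recipe described in that quote, holding the weights fixed and extrapolating the factor trends, can be sketched as follows. The article gives no details on how trends are measured, so the straight-line fit, the function name `forecast_yields`, the window length and the example numbers are all assumptions.

```python
import numpy as np

def forecast_yields(W, H, horizon, window=60):
    """Extrapolate each factor with a straight line fitted to its last
    `window` observations (an assumed trend model), keep the weights H
    constant, and rebuild the forecast yields as factors @ weights."""
    n_dates, k = W.shape
    t = np.arange(n_dates - window, n_dates)
    future_t = np.arange(n_dates, n_dates + horizon)
    W_future = np.empty((horizon, k))
    for j in range(k):
        trend_slope, intercept = np.polyfit(t, W[-window:, j], 1)
        W_future[:, j] = trend_slope * future_t + intercept
    # Keep the extrapolated factors non-negative, as in the factorization.
    W_future = np.clip(W_future, 0.0, None)
    return W_future @ H

# Hypothetical factors (100 dates, 2 factors) and weights (2 factors, 3 maturities).
t = np.arange(100)
W = np.column_stack([2.0 + 0.01 * t, 1.0 - 0.001 * t])
H = np.array([[1.0, 1.0, 1.0],
              [0.1, 0.5, 1.0]])

forecast = forecast_yields(W, H, horizon=5)
print(forecast.shape)
```

The sketch inherits the caveat Kakushadze raises: if the weights or the factor trends shift during the forecast period, the extrapolation will be off.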
Kakushadze cautions that it isn’t a perfect system. “With our methodology, the advantage is that the factors are clearly interpretable . . . and you are able to use them for forecasting, with a caveat, which is present in every single method under the sun, that basically something in your machine learning computation may or may not be stable for the time period into the future for which you were trying to do forecasting.”
A copy of the paper can be found here: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3514832