Topics on Machine Learning for Algorithmic Trading
Time: Mon 2024-12-16 14.00
Location: F3 (Flodis), Lindstedtsvägen 26 & 28, Stockholm
Language: English
Subject area: Applied and Computational Mathematics, Mathematical Statistics
Doctoral student: Hanna Hultin, Probability Theory, Mathematical Physics and Statistics
Opponent: Professor Olivier Guéant, Université Paris 1 Panthéon-Sorbonne
Supervisor: Professor Henrik Hult, Probability Theory, Mathematical Physics and Statistics; Alexandre Proutiere, Automatic Control
Abstract
Recent advancements in machine learning have opened up new possibilities for algorithmic trading, enabling the optimization of trading strategies in complex market environments. This thesis aims to improve algorithmic trading methods by developing machine learning models for the realistic simulation of limit order books and the learning of optimal strategies. Consisting of three papers, the thesis combines theoretical insights with practical applications.
The first paper presents a generative model for the dynamic evolution of a limit order book, based on recurrent neural networks. The model captures the complete dynamics of the limit order book by decomposing the probability of each transition into a product of conditional probabilities for order type, price level, order size, and time delay, each modeled by a recurrent neural network. Additionally, the paper introduces several evaluation metrics for generative models related to order execution. The generative model is trained on both synthetic data generated by a Markov model and real data from the Nasdaq Stockholm exchange.
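As a concrete illustration of this kind of factorization, the sketch below writes the probability of the next limit order book event as a product of conditional distributions, each conditioned on a recurrent summary of the preceding order flow. The single shared LSTM, the discretized size bins, and the exponential model for the time delay are illustrative assumptions and not details of the architecture used in the paper.

```python
import torch
import torch.nn as nn

class LOBEventModel(nn.Module):
    """Illustrative sketch: factorizes the probability of the next limit order book
    event as p(type) * p(price | type) * p(size | type, price) * p(delay | ...),
    each term conditioned on an RNN summary of the event history."""

    def __init__(self, n_features, n_types, n_price_levels, n_size_bins, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden, batch_first=True)
        self.type_head = nn.Linear(hidden, n_types)
        self.price_head = nn.Linear(hidden + n_types, n_price_levels)
        self.size_head = nn.Linear(hidden + n_types + n_price_levels, n_size_bins)
        # time delay modeled as an exponential with a learned rate (an assumption)
        self.delay_head = nn.Linear(hidden + n_types + n_price_levels + n_size_bins, 1)

    def log_prob(self, history, event_type, price, size, delay):
        # history: (batch, seq_len, n_features); event_type/price/size: (batch,) indices
        h, _ = self.rnn(history)
        h = h[:, -1]                                   # summary of the order flow so far
        type_logits = self.type_head(h)
        type_oh = nn.functional.one_hot(event_type, type_logits.shape[-1]).float()
        price_logits = self.price_head(torch.cat([h, type_oh], dim=-1))
        price_oh = nn.functional.one_hot(price, price_logits.shape[-1]).float()
        size_logits = self.size_head(torch.cat([h, type_oh, price_oh], dim=-1))
        size_oh = nn.functional.one_hot(size, size_logits.shape[-1]).float()
        rate = nn.functional.softplus(
            self.delay_head(torch.cat([h, type_oh, price_oh, size_oh], dim=-1))
        ).squeeze(-1)
        # sum of conditional log-probabilities = log-probability of the full transition
        return (
            torch.distributions.Categorical(logits=type_logits).log_prob(event_type)
            + torch.distributions.Categorical(logits=price_logits).log_prob(price)
            + torch.distributions.Categorical(logits=size_logits).log_prob(size)
            + torch.distributions.Exponential(rate).log_prob(delay)
        )
```

Training such a model by maximum likelihood amounts to maximizing the summed conditional log-probabilities over observed event sequences.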
The second paper proposes an iterative deterministic policy gradient method for stochastic control problems in finance that incorporate both temporary and permanent market impact. The method is based on a derived policy gradient theorem and uses mini-batch stochastic gradient descent for optimization. It is applied to both order execution and option hedging, demonstrating consistently strong performance across several objectives and market dynamics.
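The general setting can be illustrated with a minimal sketch in which a deterministic neural-network policy for order execution is trained by mini-batch stochastic gradient descent on simulated episodes with linear temporary and permanent impact. The dynamics, cost parameters, and pathwise gradient used below are assumptions made for illustration only and do not reproduce the policy gradient theorem derived in the paper.

```python
import torch
import torch.nn as nn

# Illustrative order-execution sketch: a deterministic policy maps (time, inventory)
# to a selling rate and is trained by mini-batch SGD on simulated episodes with
# linear temporary and permanent price impact. All parameters are assumptions.

torch.manual_seed(0)
policy = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

T, dt, sigma = 10, 0.1, 0.5          # horizon steps, step size, volatility
eta, gamma = 0.05, 0.01              # temporary and permanent impact coefficients

for step in range(2000):
    batch = 64
    q = torch.ones(batch, 1)         # inventory to liquidate (normalized)
    s = torch.zeros(batch, 1)        # mid-price deviation from the initial price
    cost = torch.zeros(batch, 1)
    for t in range(T):
        tau = torch.full((batch, 1), t * dt)
        v = torch.relu(policy(torch.cat([tau, q], dim=1)))   # selling rate >= 0
        v = torch.minimum(v, q / dt)                          # cannot sell more than held
        exec_price = s - eta * v                              # temporary impact
        cost = cost - exec_price * v * dt                     # negative revenue
        q = q - v * dt
        s = s - gamma * v * dt + sigma * dt**0.5 * torch.randn(batch, 1)  # permanent impact + noise
    cost = cost + 10.0 * q**2        # penalize leftover inventory at the horizon
    loss = cost.mean()               # pathwise gradient through the simulated dynamics
    opt.zero_grad()
    loss.backward()
    opt.step()
```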
The third paper studies a policy gradient method with parameter-based exploration, where a single deterministic policy is sampled at the beginning of an episode and used throughout the whole episode. A marginal equivalence between parameter-based and action-based exploration is shown, facilitating the adaptation of previously established convergence results for policy gradient methods with action-based exploration. Convergence rates to first-order stationary points are derived under mild assumptions, and global convergence is established under a Fisher-non-degeneracy condition introduced for parameter-based exploration.
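The idea of parameter-based exploration can be sketched in the style of PGPE: a deterministic policy's parameters are drawn once from a Gaussian at the start of each episode, and a likelihood-ratio gradient is taken with respect to the mean and log standard deviation of that Gaussian. The toy one-dimensional environment and linear policy below are assumptions made purely for illustration and are not the setting analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, T=50):
    """Toy 1-D episode: a linear deterministic policy theta (2-vector) acts for T steps;
    the reward penalizes distance from the origin. Purely illustrative."""
    x, total = 1.0, 0.0
    for _ in range(T):
        a = theta[0] * x + theta[1]          # deterministic policy, fixed for the episode
        x = 0.9 * x + 0.1 * a + 0.05 * rng.standard_normal()
        total += -x**2
    return total

dim, lr = 2, 0.01
mu, log_sigma = np.zeros(dim), np.zeros(dim)

for it in range(500):
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal((16, dim))          # mini-batch of parameter perturbations
    thetas = mu + sigma * eps                     # one deterministic policy per episode
    returns = np.array([rollout(th) for th in thetas])
    adv = returns - returns.mean()                # baseline for variance reduction
    # likelihood-ratio gradients of E[R] under theta ~ N(mu, diag(sigma^2))
    grad_mu = np.mean(adv[:, None] * eps / sigma, axis=0)
    grad_log_sigma = np.mean(adv[:, None] * (eps**2 - 1.0), axis=0)
    mu += lr * grad_mu
    log_sigma += lr * grad_log_sigma
```

In this scheme the exploration noise enters only through the sampled parameters, so each episode is executed with a single deterministic policy.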