The popularity of data science techniques such as data mining and machine learning has grown enormously in recent years. They offer effective ways to process and analyze the huge amounts of data available to risk managers and financial analysts.

With the advances in computing power and distributed processing, it is now possible to process - and make sense of - the vast array of information that can be gathered from several different data sources.

This hands-on programme covers key techniques - including several aspects of supervised and unsupervised machine learning - that can be used when mining financial data. The programme also focuses on advanced data science techniques that are becoming widely used in financial markets for text analysis and artificial intelligence (AI): Natural Language Processing (NLP) and Deep Learning (DL).

The programme is delivered entirely through workshops and case studies. Participants will learn how to implement natural language processing techniques by building a sentiment analysis model to analyze text. In the deep learning section, participants will focus on the different neural networks that can be put to work for data classification, time-series forecasting and pattern recognition.

All exercises and case studies are illustrated in Python, allowing you to learn how to work with this flexible, open-source programming language.

Date: 10th - 12th April 2019

Venue: Central London

Fee: £1330 per day

You might be eligible for preferential rates. Please contact us to check if your company is a member of the LFS Global Client Programme.

Who The Course is For

  • Portfolio managers
  • Risk managers
  • Professionals looking to introduce data-mining concepts in their day-to-day tasks
  • IT developers
  • Statisticians
  • Quant analysts
  • Financial engineers
  • Consultants

Learning Objectives

  • Build a solid knowledge base on data mining techniques and tools, as well as their application to the financial industry
  • Gain hands-on experience with Natural Language Processing and Deep Learning in finance
  • Learn how to apply Python to data mining and processing, and to solve real-world NLP and DL problems
  • Gain an understanding of Artificial Neural Network (ANN) algorithms and how to use them to design, build and develop DL models

Prior Knowledge

  • Basic notions of statistics
  • Good working knowledge of Excel
  • No prior knowledge of Python is required

Course Outline

Day One

Overview of Data Mining

Laying out the different components of data mining

  • Association rules
  • Classification vs. regression problems
  • Clustering analysis

Data Visualization

  • Overview of third-party solutions (Tableau, QlikTech, etc.) for visualizing large data sets. Case studies will be worked through using the matplotlib library and plotly (an open-source online data-collaboration platform)
  • Graphical databases: an introduction, with network theory applied to portfolio analysis
  • Outlier detection
  • Mahalanobis Distance
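
As an illustration of outlier detection with the Mahalanobis distance, here is a minimal sketch. It assumes synthetic two-asset returns, a planted outlier and a chi-square cutoff; the data, the function name and the threshold are illustrative assumptions, not course material.

```python
import numpy as np

def mahalanobis_distances(X):
    """Distance of each row of X from the sample mean, scaled by the sample covariance."""
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Quadratic form d_i^2 = (x_i - mu)^T S^{-1} (x_i - mu), one value per row
    d_squared = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(d_squared)

rng = np.random.default_rng(0)
# Synthetic daily returns of two correlated assets (illustrative data only)
cov = [[1.0, 0.8], [0.8, 1.0]]
returns = rng.multivariate_normal([0.0, 0.0], cov, size=500)
# One planted observation that breaks the correlation structure
returns[0] = [3.0, -3.0]

distances = mahalanobis_distances(returns)
# 13.82 is roughly the 99.9% quantile of a chi-square with 2 degrees of freedom
outliers = np.where(distances**2 > 13.82)[0]
print(outliers)
```

The planted point is not extreme in either asset on its own; it stands out only because it moves against the correlation, which is what the covariance scaling picks up.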


Regression

  • OLS (ordinary least squares)
  • Ridge regression
  • Sparsity
  • Lasso
  • Elastic Net

Workshop: Working out the optimal hedge of a large real-world equity portfolio using futures. The portfolio has a global nature (100+ shares), but only a limited set of futures is available
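
A hedge of this kind can be framed as a regression of portfolio returns on futures returns. The sketch below is one possible setup, assuming synthetic returns and four hypothetical futures; it compares OLS with closed-form ridge regression using NumPy only. Lasso and Elastic Net have no closed form and are available in scikit-learn (`sklearn.linear_model.Lasso`, `ElasticNet`).

```python
import numpy as np

rng = np.random.default_rng(1)
n_days, n_futures = 750, 4

# Synthetic daily futures returns (illustrative, not market data)
futures = rng.normal(0.0, 0.01, size=(n_days, n_futures))
# Hypothetical portfolio: exposure to the futures plus stock-specific noise
true_exposure = np.array([0.6, 0.3, 0.0, 0.1])
portfolio = futures @ true_exposure + rng.normal(0.0, 0.002, size=n_days)

def ols_hedge(X, y):
    """Least-squares hedge ratios: minimise ||y - X b||^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def ridge_hedge(X, y, alpha):
    """Ridge hedge ratios: minimise ||y - X b||^2 + alpha * ||b||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

beta_ols = ols_hedge(futures, portfolio)
beta_ridge = ridge_hedge(futures, portfolio, alpha=0.01)

# Selling beta units of each future leaves the regression residual as hedged P&L
hedged = portfolio - futures @ beta_ols
print("OLS hedge ratios:  ", beta_ols.round(3))
print("Ridge hedge ratios:", beta_ridge.round(3))
print("Volatility reduction:", 1 - hedged.std() / portfolio.std())
```

The ridge penalty shrinks the hedge ratios towards zero, trading a little in-sample fit for more stable positions.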

Principal Component Analysis (PCA)

  • Principal component analysis of the term structure of interest rates and implied volatilities
  • Principal component regression (PCR)
  • Partial least squares (PLS)

Workshop: Using PCA to reduce the dimensionality of a large data set of historical interest rate curves. The complex behaviour of the curve is spread across different maturities, and PCA gives a risk manager a much clearer view of the dynamics of interest rate curves
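
To show the idea on a small scale, the sketch below runs PCA on synthetic curve changes generated from a level factor, a slope factor and noise. The maturities, the factor volatilities and the variable names are assumptions made for this illustration; real curve data would replace them.

```python
import numpy as np

rng = np.random.default_rng(2)
maturities = np.array([0.25, 1, 2, 3, 5, 7, 10, 30])  # years
n_days = 500

# Synthetic daily curve changes driven by a level and a slope factor plus noise
level = rng.normal(0.0, 0.10, size=(n_days, 1)) * np.ones(len(maturities))
slope = rng.normal(0.0, 0.03, size=(n_days, 1)) * np.linspace(-1.0, 1.0, len(maturities))
noise = rng.normal(0.0, 0.005, size=(n_days, len(maturities)))
curve_changes = level + slope + noise

# PCA via the singular value decomposition of the centred data matrix
centered = curve_changes - curve_changes.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)

# Project each day's move onto the first two principal components
scores = centered @ components[:2].T
print(explained.round(4))
```

Two components capture almost all of the variance here because the data were built from two factors; on real curves the split depends on the market and the sample.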

Data Classification – Regression

Kernel Density Estimation and Classification

  • Kernel density estimation is an unsupervised learning procedure, which leads to a simple family of procedures for non-parametric classification

Case Study: Using kernels to derive probability distributions for financial data
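
A minimal sketch of a Gaussian kernel density estimate, assuming synthetic returns drawn from a calm and a volatile regime. The bandwidth uses the normal-reference rule of thumb; the function name and all parameters are illustrative choices.

```python
import numpy as np

def kernel_density(samples, grid, bandwidth):
    """Average of Gaussian bumps of width `bandwidth` centred on each sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(3)
# Synthetic returns: a calm regime mixed with a volatile one
returns = np.concatenate([rng.normal(0.0, 0.01, 900), rng.normal(0.0, 0.04, 100)])

# Normal-reference rule of thumb for the bandwidth
bandwidth = 1.06 * returns.std() * len(returns) ** (-1 / 5)
grid = np.linspace(-0.3, 0.3, 1201)
density = kernel_density(returns, grid, bandwidth)

# A density should integrate to one; check with a Riemann sum
area = density.sum() * (grid[1] - grid[0])
print(round(area, 4))
```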

Classification - Part I

  • Naive Bayes classification: A straightforward and powerful technique to classify data

Case Study: Building a Bayes predictor for a large data set containing different attributes of US banks. The Bayes classifier will be used to separate the banks that are likely to fail from those that are likely to remain solvent
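
The mechanics of a Naive Bayes classifier can be sketched in a few lines. The example below assumes Gaussian features and two made-up bank attributes (capital ratio and non-performing loan ratio); the class, the data and the attribute choice are hypothetical and are not the case-study data set.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per feature and class."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_])
        self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X):
        # log p(c) + sum_j log N(x_j | mean_cj, var_cj), for every row and class
        diff = X[:, None, :] - self.means_[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.vars_) + diff**2 / self.vars_).sum(axis=2)
        return self.classes_[np.argmax(log_lik + self.log_priors_, axis=1)]

rng = np.random.default_rng(4)
# Hypothetical bank attributes: [capital ratio, non-performing loan ratio]
solvent = rng.normal([0.12, 0.02], [0.02, 0.01], size=(400, 2))
failed = rng.normal([0.06, 0.08], [0.02, 0.02], size=(100, 2))
X = np.vstack([solvent, failed])
y = np.array([0] * 400 + [1] * 100)  # 1 = failed

model = GaussianNaiveBayes().fit(X, y)
accuracy = np.mean(model.predict(X) == y)
print(round(accuracy, 3))
```

scikit-learn provides the same model as `sklearn.naive_bayes.GaussianNB`; the accuracy above is measured in-sample, so a held-out set would be needed for an honest estimate.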

Classification - Part II

  • Robust data mining techniques
  • Logistic Regression

Case Study: Applying logistic regression to a real-world dataset with high dimensionality
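
As a rough sketch of logistic regression on data with many features, the example below fits an L2-penalised model by gradient descent. The synthetic data (50 features, of which only three carry signal), the penalty and the learning rate are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, l2=0.1, lr=0.1, n_iter=500):
    """Gradient descent on the mean log-loss with an L2 penalty on the weights."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y) / n + l2 * w)
        b -= lr * np.mean(p - y)
    return w, b

rng = np.random.default_rng(5)
n, d = 400, 50  # many features, only the first three matter
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:3] = [2.0, -1.5, 1.0]
y = (rng.random(n) < sigmoid(X @ true_w)).astype(float)

w, b = fit_logistic(X, y)
accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1.0))
print(round(accuracy, 3))
```

The penalty keeps the weights on the uninformative features small, which matters more as the number of features approaches the number of observations.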

Day Two

Data Classification (cont.)

Classification - Part III

  • Classification Trees: CART-modelling leads to easy-to-use practical decision trees
  • The concept of decision trees will be extended with techniques such as Random Forest and Bagging

Case Study: Concepts such as cost functions, impurity levels, tree pruning and cross-validation will be handled in detail
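
The impurity level behind a CART split can be made concrete with a small sketch. The code below computes the Gini impurity and searches one feature for the threshold with the lowest weighted impurity; the function names and the toy data are illustrative assumptions.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / counts.sum()
    return 1.0 - np.sum(proportions**2)

def best_split(x, y):
    """Threshold on a single feature that minimises the weighted Gini impurity."""
    best_threshold, best_cost = None, np.inf
    values = np.unique(x)
    # Candidate thresholds are midpoints between consecutive distinct values
    for threshold in (values[:-1] + values[1:]) / 2.0:
        left, right = y[x <= threshold], y[x > threshold]
        cost = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
threshold, cost = best_split(x, y)
print(threshold, cost)
```

A full tree repeats this search over every feature and recurses on each side of the split; pruning and cross-validation then decide how deep the tree should grow.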

  • K-Nearest Neighbour learning
  • Logistic Regression

Case Study: The classification methods (K-Nearest Neighbour and CART) will be put to work on different technical indicators (RSI, MACD, etc.) of large sets of real-world financial data. This will illustrate how these classifiers can quickly partition stocks into different buckets according to the strength of different attributes
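
A minimal K-Nearest Neighbour sketch along these lines, assuming two hypothetical indicator features per stock (an RSI value and a MACD histogram value) and two synthetic buckets. The feature values are made up and are not computed from price data.

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k=5):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    distances = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(distances, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])

rng = np.random.default_rng(6)
# Hypothetical features per stock: [RSI, MACD histogram], in two synthetic buckets
weak = rng.normal([30.0, -0.5], [5.0, 0.2], size=(100, 2))
strong = rng.normal([70.0, 0.5], [5.0, 0.2], size=(100, 2))
X = np.vstack([weak, strong])
y = np.array([0] * 100 + [1] * 100)  # 0 = weak bucket, 1 = strong bucket

# Standardise so that RSI (0-100 scale) does not dominate the distance
mean, std = X.mean(axis=0), X.std(axis=0)
X_scaled = (X - mean) / std

new_stocks = (np.array([[28.0, -0.4], [72.0, 0.6]]) - mean) / std
predictions = knn_predict(X_scaled, y, new_stocks)
print(predictions)
```

Standardising the features first is a design choice: without it, the indicator with the largest numerical range would decide which neighbours count as nearest.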

Workshop: Data mining tools

An introduction to Python - A powerful programming language

The applicability of Python in the domain of data analysis will be illustrated through practical examples, with a focus on machine learning using the 'scikit-learn' package. All examples will be covered in Jupyter notebooks. Delegates will learn how to build custom reports in Python

Day Three

Natural Language Processing

Extracting real value from social media posts, images, email, PDFs and other sources of unstructured data is a big challenge for enterprises.

This section is devoted to the application of Natural Language Processing (NLP) to extract value from unstructured data. Several real-world examples of examining unstructured data in finance - including sentiment analysis of financial news - will be explored.

Workshop: Using the NLTK package of Python to:

  • Explore and tokenize a text, and represent it using TF-IDF and count vectors
  • Predict words in a text: build a word predictor from a text by writing a program that predicts the word that follows a given word
  • Understand the sentiment of a news item on a particular stock
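
The first two workshop tasks can be sketched in plain Python to show what happens under the hood. The example assumes a tiny made-up corpus of headlines, a regex tokenizer standing in for a library tokenizer, and an unsmoothed TF-IDF formula; in practice NLTK's `word_tokenize` and scikit-learn's `CountVectorizer` and `TfidfVectorizer` cover the same ground.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case word tokens; a simple stand-in for a library tokenizer."""
    return re.findall(r"[a-z']+", text.lower())

# Tiny made-up corpus of headlines (illustrative only)
docs = [
    "Bank profits rise as rates rise",
    "Bank shares fall after profit warning",
    "Tech shares rise on strong earnings",
    "Energy shares rise as oil prices climb",
]
tokens = [tokenize(d) for d in docs]

# Count vectors: term frequencies per document
counts = [Counter(t) for t in tokens]

# TF-IDF with idf = log(N / document frequency); libraries often smooth this
doc_freq = Counter(word for t in tokens for word in set(t))
tfidf = [
    {w: c / len(t) * math.log(len(docs) / doc_freq[w]) for w, c in cnt.items()}
    for t, cnt in zip(tokens, counts)
]

# Bigram next-word predictor: most frequent follower of each word
followers = defaultdict(Counter)
for t in tokens:
    for current_word, next_word in zip(t, t[1:]):
        followers[current_word][next_word] += 1

def predict_next(word):
    """Most common word seen after `word`, or None if the word is unseen."""
    if word not in followers:
        return None
    return followers[word].most_common(1)[0][0]

print(counts[0]["rise"], predict_next("shares"))
```

A bigram model like this only looks one word back; longer contexts or a neural language model would be the natural next step.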

Deep Learning

Deep Learning as a subfield of machine learning - Artificial Neural Network (ANN) algorithms.

  • Introduction to Deep Learning
  • Forward propagation
  • Word2vec approach
  • Deeper networks and forward propagation
  • Optimizing a neural network with backward propagation

Case Study: Building a Deep Learning model with Python (with a focus on the Keras and TensorFlow packages)
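
Before reaching for a framework, forward and backward propagation can be written out by hand. The sketch below trains a one-hidden-layer network on the XOR problem with NumPy only; the layer size, learning rate and iteration count are arbitrary assumptions, and the example is not the case-study model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(7)
# XOR: a classic problem that a single linear layer cannot solve
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# One hidden layer with 4 units (sizes and learning rate are arbitrary choices)
W1, b1 = rng.normal(0.0, 1.0, size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(0.0, 1.0, size=(4, 1)), np.zeros((1, 1))
learning_rate = 0.1
losses = []

for _ in range(5000):
    # Forward propagation
    hidden = np.tanh(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    losses.append(-np.mean(y * np.log(output) + (1 - y) * np.log(1 - output)))

    # Backward propagation (chain rule, from the output back to the input)
    d_output = (output - y) / len(X)  # gradient w.r.t. the output pre-activation
    d_W2, d_b2 = hidden.T @ d_output, d_output.sum(axis=0, keepdims=True)
    d_hidden = (d_output @ W2.T) * (1 - hidden**2)  # tanh derivative
    d_W1, d_b1 = X.T @ d_hidden, d_hidden.sum(axis=0, keepdims=True)

    # Gradient descent update
    W1 -= learning_rate * d_W1
    b1 -= learning_rate * d_b1
    W2 -= learning_rate * d_W2
    b2 -= learning_rate * d_b2

print(losses[0], losses[-1])
```

Libraries such as Keras and TensorFlow compute these gradients automatically, so the loop above shrinks to a model definition and a call to fit it.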


Last updated: October 25, 2018
Delivery: Campus based
Start Date: Apr. 10, 2019
End Date: Apr. 12, 2019
Duration: 3 days
Fee: 3,975 GBP (£1325 per day)
Application deadline: Apr. 10, 2019

LFS Webcast series - Applying Data-Mining in Finance