Intro to Machine Learning in Less Than 50 Lines of Code

Machine learning is growing in popularity and has become a buzzword in the quantitative finance community. It is a branch of artificial intelligence in which algorithms and mathematical models are used to progressively improve performance on a specific task. Today we will cover the basic framework for coding a machine learning algorithm on FXCM’s CFD index, SPX500. This article is based on the free course Introduction to Machine Learning by QuantInsti. You can find this course, along with many others, in our QuantNews community here (https://community.quantnews.com/c/fx-cfd-crypto).

The machine learning algorithm in this article will learn from basic price data: the open, high, low and close of each daily candle. To begin, there are two main types of machine learning models: supervised and unsupervised (much more on these topics can be found in the course). Unsupervised models have only input data and no corresponding output data, so the algorithm has no pre-set categories to classify the output into. Supervised learning models, on the other hand, have input variables and an output variable, and the algorithm learns to map each input to one category of output. The input is represented by (X) and the output by (Y). X is a dataset holding the variables that are used to predict Y. For today’s example, X will consist of ‘Open – Close’ and ‘High – Low’ data on the FXCM instrument SPX500.

1. Connect to REST API

We will be using the fxcmpy wrapper as well as the REST API, so if you don’t have a token yet, go ahead and follow the steps on this page first and come back.

2. Install and Import the Libraries

There are many machine learning libraries for Python, but today we will use one of the most popular, scikit-learn. First, make sure to install the needed libraries using pip. You will need a working installation of numpy and scipy, which will already be installed if you use Anaconda.


pip install -U scikit-learn

Then open the IDE you will be using and import the libraries you will need today. I am using Jupyter notebook as my environment. The first two imports are from the scikit-learn library and will be used in the machine learning process. Pandas and numpy will be used for data manipulation, and matplotlib and seaborn will be used for plotting the instrument and our returns. Lastly, I will import the fxcmpy wrapper so that I can pull FXCM historical data for our machine learning algo.

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

import pandas as pd
import numpy as np
import fxcmpy

import matplotlib.pyplot as plt
import seaborn

3. Download the Data

Next we need to pull the historical data for the instrument we want to apply machine learning to. The token for my FXCM demo account has been saved in a config file so I can easily connect to an FXCM demo account and pull historical data. Daily candles from January 1, 2017 to August 1, 2018 will be pulled for the SPX500 and visualized using matplotlib. The SPX500 is an index CFD, which is a derivative product based on the price of the E-Mini S&P 500 Future.

# Pull the historical data
import datetime as dt

# Jupyter magic so plots render inline (omit outside a notebook)
%matplotlib inline

# Connect using the token stored in the config file
socket = fxcmpy.fxcmpy(config_file='fxcm.cfg')

# Daily candles for SPX500; keep the ask prices under shorter column names
Df = socket.get_candles(instrument='SPX500', period='D1', start=dt.datetime(2017, 1, 1), end=dt.datetime(2018, 8, 1))
Df = Df.dropna()
Df = Df.rename(columns={'askopen': 'Open', 'askhigh': 'High', 'asklow': 'Low', 'askclose': 'Close'})

Df.Close.plot(figsize=(10, 5))
plt.ylabel("SPX500 Price")
plt.show()
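If you do not have an FXCM token to hand, the remaining steps can still be tried on stand-in data. The sketch below builds a random-walk frame with the same Open, High, Low and Close columns. The helper name make_synthetic_ohlc and all of its parameters are hypothetical, and the output is not market data, so any accuracy or return figures obtained from it say nothing about the SPX500.

```python
import numpy as np
import pandas as pd

def make_synthetic_ohlc(n_days=250, start_price=2500.0, seed=0):
    """Build a random-walk daily OHLC frame (illustrative only, not market data)."""
    rng = np.random.default_rng(seed)
    # Daily close-to-close log returns with a 1% standard deviation
    log_returns = rng.normal(loc=0.0, scale=0.01, size=n_days)
    close = start_price * np.exp(np.cumsum(log_returns))
    # Each day opens at the previous close; the first day opens at start_price
    open_ = np.concatenate(([start_price], close[:-1]))
    # Push the high above and the low below the open/close range
    wiggle = np.abs(rng.normal(scale=0.003, size=n_days))
    high = np.maximum(open_, close) * (1 + wiggle)
    low = np.minimum(open_, close) * (1 - wiggle)
    # Business-day index so the frame looks like daily candles
    index = pd.bdate_range("2017-01-02", periods=n_days)
    return pd.DataFrame(
        {"Open": open_, "High": high, "Low": low, "Close": close}, index=index
    )

Df_demo = make_synthetic_ohlc()
print(Df_demo.head())
```

Assigning the result to Df instead of Df_demo lets the later steps run unchanged.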

4. Determine Trading Signal

Next we will determine the signal for trading. If tomorrow’s close is greater than today’s close, we will buy the SPX500; if not, we will sell it. We will store +1 for a buy signal and -1 for a sell signal in the array Y. Y is the target dataset holding the correct trading signal, which the machine learning algorithm will try to predict.

Y = np.where(Df['Close'].shift(-1) > Df['Close'],1,-1)
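To see how shift(-1) lines up tomorrow’s close with today’s row, here is a small sketch on made-up closing prices (the numbers are illustrative, not SPX500 data). One detail worth noticing is that the last row has no next-day close, so the comparison is False and that row is labeled -1 by default.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, purely for illustration
toy = pd.DataFrame({'Close': [100.0, 101.0, 99.0, 102.0]})

# shift(-1) moves tomorrow's close onto today's row; the last row becomes NaN
next_close = toy['Close'].shift(-1)

# +1 where tomorrow closes higher than today, -1 otherwise (NaN compares as False)
toy_signal = np.where(next_close > toy['Close'], 1, -1)
print(toy_signal)
```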

5. Create the Predictor Dataset

The input data is defined in this step. The X dataset holds the variables that will be used to predict Y, here ‘Open – Close’ and ‘High – Low’. These are the indicators the algorithm will use to predict tomorrow’s trend: (1) if the price will go up and (-1) if it will go down.

Df['Open-Close'] = Df.Open - Df.Close
Df['High-Low'] = Df.High - Df.Low
X = Df[['Open-Close', 'High-Low']]
X.head()

6. Split Data into Training and Test Dataset

The data will now be split into a training set and a test set. The first 80% of the data is used for training and the remaining 20% for testing. We will name the training data sets X_train and Y_train, and the test data sets X_test and Y_test.

split_percentage = 0.8
split = int(split_percentage*len(Df))
# Train data set
X_train = X[:split]
Y_train = Y[:split]

# Test data set
X_test = X[split:]
Y_test = Y[split:]
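The slicing above keeps the rows in time order, which matters for price data because a shuffled split would let the model train on days that come after the test period. As a sketch, scikit-learn’s train_test_split produces the same chronological split when shuffle=False is passed. The toy arrays below are placeholders standing in for X and Y.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 10 "days" with two features each, in time order
X_toy = np.arange(20).reshape(10, 2)
Y_toy = np.array([1, -1, 1, 1, -1, 1, -1, -1, 1, 1])

# Manual split, as in the article
split_toy = int(0.8 * len(X_toy))
X_tr_manual, X_te_manual = X_toy[:split_toy], X_toy[split_toy:]

# Same split via scikit-learn; shuffle=False preserves the time order
X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_toy, Y_toy, test_size=0.2, shuffle=False
)

print(len(X_tr), len(X_te))
print(np.array_equal(X_tr, X_tr_manual), np.array_equal(X_te, X_te_manual))
```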

7. Define the Classifier

We will use the SVC class from the sklearn.svm module for the classification, and create our classifier model by calling its fit() method on the training data set. SVC stands for Support Vector Classification. The classifier fits the training data and learns a decision boundary that best separates the two classes. We define the classifier with the following code:

cls = SVC().fit(X_train, Y_train)
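To get a feel for the fit and predict pattern in isolation, the sketch below trains a default SVC on two synthetic, well-separated clusters and then classifies two new points. The data and variable names are made up for illustration. Real price features are far noisier, so the clean separation seen here should not be expected on SPX500 data.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Two well-separated synthetic clusters standing in for the two signal classes
X_down = rng.normal(loc=-5.0, scale=0.5, size=(20, 2))
X_up = rng.normal(loc=5.0, scale=0.5, size=(20, 2))
X_syn = np.vstack([X_down, X_up])
Y_syn = np.array([-1] * 20 + [1] * 20)

# Default SVC (RBF kernel), fitted the same way as in the article
cls_syn = SVC().fit(X_syn, Y_syn)

# Predict the class of two new points, one near each cluster
new_points = np.array([[-5.0, -5.0], [5.0, 5.0]])
print(cls_syn.predict(new_points))
```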

8. Calculate Accuracy of the Classifier

To compute the accuracy of the algorithm on the train and test data sets, the actual signal values are compared with the predicted signal values. The function accuracy_score() is used to calculate the accuracy.
The syntax is as follows: accuracy_score(target_actual_value,target_predicted_value)
1. target_actual_value: correct signal values
2. target_predicted_value: predicted signal values

accuracy_train = accuracy_score(Y_train, cls.predict(X_train))
accuracy_test = accuracy_score(Y_test, cls.predict(X_test))

print('\nTrain Accuracy:{: .2f}%'.format(accuracy_train*100))
print('Test Accuracy:{: .2f}%'.format(accuracy_test*100))

Traders will want to look for a test accuracy higher than 50%. For a two-class signal like this one, 50% is roughly what random guessing would achieve, so only an accuracy clearly above 50% on the test data suggests that the classifier has learned something useful.
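One way to put the 50% threshold in context is to compare the classifier with a naive rule that always predicts "buy". In a period where the market rose on most days, that rule alone scores above 50%. The sketch below uses made-up labels and predictions (not results from the SPX500 model) to show the comparison.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical test labels: the market rose on 7 of 10 days
y_true = np.array([1, 1, 1, -1, 1, -1, 1, 1, 1, -1])

# Hypothetical classifier output, for illustration only
y_pred = np.array([1, -1, 1, -1, 1, 1, -1, 1, 1, 1])

# Naive baseline: always predict +1 (always buy)
y_base = np.ones_like(y_true)

model_acc = accuracy_score(y_true, y_pred)
baseline_acc = accuracy_score(y_true, y_base)

print('Model accuracy:    {:.0f}%'.format(model_acc * 100))
print('Always-buy accuracy: {:.0f}%'.format(baseline_acc * 100))
```

In this toy case the classifier clears 50% yet still trails the always-buy rule, so it helps to check both numbers on your own test set.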

9. Prediction

We will predict the signal (buy or sell) using the cls.predict() function, compute the strategy returns from the predicted signal, save them in the column ‘Strategy_Return’, and plot the cumulative strategy returns over the test period.

Df['Predicted_Signal'] = cls.predict(X)
# Calculate log returns
Df['Return'] = np.log(Df.Close.shift(-1) / Df.Close)*100
Df['Strategy_Return'] = Df.Return * Df.Predicted_Signal
Df.Strategy_Return.iloc[split:].cumsum().plot(figsize=(10,5))
plt.ylabel("Strategy Returns (%)")
plt.show()
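The return arithmetic above can be checked by hand on a tiny example. The sketch below uses three made-up closes and signals (not SPX500 data): the price rises 10% and then falls 10%, and the signals happen to be right on both days, so both strategy returns come out positive.

```python
import numpy as np
import pandas as pd

# Hypothetical closes and predicted signals, for illustration only
toy = pd.DataFrame({'Close': [100.0, 110.0, 99.0],
                    'Predicted_Signal': [1, -1, 1]})

# Next-day log return in percent, aligned to the day the signal is generated
toy['Return'] = np.log(toy.Close.shift(-1) / toy.Close) * 100

# A -1 signal flips the sign of the return (a short position gains when price falls)
toy['Strategy_Return'] = toy.Return * toy.Predicted_Signal

# Log returns add across days, so cumsum gives the cumulative log return
toy['Cumulative'] = toy.Strategy_Return.cumsum()
print(toy.round(3))
```

The last row has no next-day close, so its return is NaN and it contributes nothing to the cumulative sum.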

Based on the graph, the strategy generates a cumulative return of up to 7% on the test data set. You can learn more about machine learning models in the QuantInsti course Trading with Machine Learning: Classification and SVM. As mentioned before, this article is based on the free course Introduction to Machine Learning by QuantInsti. You can enroll now by following this link: https://community.quantnews.com/t/free-course-by-quantra-intro-to-machine-learning-for-trading/57.


Risk Warning: The FXCM Group does not guarantee accuracy and will not accept liability for any loss or damage which arise directly or indirectly from use of or reliance on information contained within the webinars. The FXCM Group may provide general commentary which is not intended as investment advice and must not be construed as such. FX/CFD trading carries a risk of losses in excess of your deposited funds and may not be suitable for all investors. Please ensure that you fully understand the risks involved.