Exploring FXCM’s Free Trader Sentiment Data with Python and Pandas

FXCM offers premium data packages with valuable sentiment, volume and order flow data. In this article we will download a sample of the sentiment data set into a Pandas DataFrame and do some exploratory data analysis to better understand the story this data tells.

What is Sentiment Data?

FXCM’s Speculative Sentiment Index (SSI) focuses on buyers and sellers, comparing how many are active in the market and producing a ratio to indicate how traders are behaving in relation to a particular currency pair. A positive SSI ratio indicates more buyers are in the market than sellers, while a negative SSI ratio indicates that more sellers are in the market. FXCM’s sentiment data was designed around this index, providing 12 sentiment measurements per minute:

  • LongAmountK is the total volume of FXCM retail clients that are long
  • ShortAmountK is the total volume of FXCM retail clients that are short
  • SSIHist is calculated as LongAmountK/ShortAmountK if there are more longs than shorts, or as the negative of ShortAmountK/LongAmountK if there are more shorts than longs
  • LongAmountKNET is the total volume of all retail clients that have a net long position on the instrument
  • ShortAmountKNET is the total volume of all retail clients that have a net short position on the instrument
  • SSIHistNET is calculated as LongAmountKNET/ShortAmountKNET if there are more net longs than net shorts, or as the negative of ShortAmountKNET/LongAmountKNET if there are more net shorts than net longs
  • LongAmountKOrders is the total number of open long positions among FXCM retail clients
  • ShortAmountKOrders is the total number of open short positions among FXCM retail clients
  • SSIHistOrders is calculated as LongAmountKOrders/ShortAmountKOrders if there are more long positions than short positions, or as the negative of ShortAmountKOrders/LongAmountKOrders if there are more short positions than long positions
  • LongAmountKOrdersNET is the total number of retail clients that have a net long open position
  • ShortAmountKOrdersNET is the total number of retail clients that have a net short open position
  • SSIHistOrdersNET is calculated as LongAmountKOrdersNET/ShortAmountKOrdersNET if there are more net long positions than net short positions, or as the negative of ShortAmountKOrdersNET/LongAmountKOrdersNET if there are more net short positions than net long positions
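All four SSI measures above follow the same signed-ratio rule. Here is a minimal sketch of that rule, assuming the sign convention described in this article (positive when longs dominate, negative when shorts dominate). The function name signed_ssi is hypothetical, and mapping an exact tie to +1.0 is an assumption rather than documented FXCM behaviour:

```python
import numpy as np
import pandas as pd


def signed_ssi(long_amount: pd.Series, short_amount: pd.Series) -> pd.Series:
    """Signed long/short ratio following the SSI convention described above.

    Returns long/short when longs are at least as large as shorts, and
    -(short/long) when shorts are larger. A tie gives +1.0 (an assumption).
    Both inputs are assumed to be strictly positive and to share one index.
    """
    longs = long_amount.astype(float)
    shorts = short_amount.astype(float)
    # Pick the positive ratio where longs dominate, the negated inverse otherwise
    ratio = np.where(longs >= shorts, longs / shorts, -(shorts / longs))
    return pd.Series(ratio, index=longs.index)
```

Applied to the pivoted data built later in this article, for example signed_ssi(sentiment_pvt['LongAmountK'], sentiment_pvt['ShortAmountK']), it could serve as a rough cross-check against the published SSIHist column.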

Download Sentiment Data into a Pandas DataFrame

First, we will download the sentiment data set into a Pandas DataFrame. The sample data, documented on FXCM’s GitHub, is stored as a gzip-compressed CSV file at https://sampledata.fxcorporate.com/sentiment/{instrument}.csv.gz. To download a file, we use this URL with {instrument} replaced by the instrument of our choice. For this example we’ll use EUR/USD.

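import datetime
import pandas as pd

# Sample sentiment file for EUR/USD (gzip-compressed CSV)
url = 'https://sampledata.fxcorporate.com/sentiment/EURUSD.csv.gz'
# Use the DateTime column as the index and parse it into timestamps
data = pd.read_csv(url, compression='gzip', index_col='DateTime', parse_dates=True)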

DataFrame containing sentiment data
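The same download should work for other instruments in the sample set by filling in the {instrument} part of the URL. A small sketch, assuming (as in the EURUSD example) that the file name is the pair in upper case without its slash; sentiment_url is a hypothetical helper, not part of any FXCM library:

```python
SENTIMENT_URL_TEMPLATE = 'https://sampledata.fxcorporate.com/sentiment/{instrument}.csv.gz'


def sentiment_url(pair: str) -> str:
    """Build the sample sentiment URL for a pair such as 'EUR/USD'.

    Assumes the file name is the pair in upper case with the '/' removed,
    matching the EURUSD.csv.gz example above.
    """
    instrument = pair.replace('/', '').upper()
    return SENTIMENT_URL_TEMPLATE.format(instrument=instrument)
```

For EUR/USD, sentiment_url('EUR/USD') yields the same URL used above, so it can be passed straight to the same pd.read_csv call.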

Examining the DataFrame, we notice two things to address before going any further. First, this data set is timestamped in US Eastern time, whereas the m1 (1-minute) price data we will download later is in UTC, so we convert the sentiment timestamps to UTC:

import pytz

# Mark the naive timestamps as US Eastern time, then convert them to UTC
data = data.tz_localize(pytz.timezone('US/Eastern'))
data = data.tz_convert(pytz.timezone('UTC'))
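To see what this conversion does, here is a small check on two hand-picked timestamps (illustrative only, not taken from the data set). In January, US Eastern time is UTC-5, so 09:30 Eastern becomes 14:30 UTC; in July, daylight saving time makes it UTC-4. The January 2018 period used in this article contains no daylight-saving transition; for data that does span one, tz_localize also accepts ambiguous and nonexistent arguments to handle the repeated and skipped hours.

```python
import pandas as pd

# Two illustrative wall-clock times in US Eastern time (not from the data set)
eastern = pd.DatetimeIndex(['2018-01-02 09:30', '2018-07-02 09:30'])

# Localize to US Eastern, then convert to UTC:
# 2018-01-02 14:30 UTC (EST, UTC-5) and 2018-07-02 13:30 UTC (EDT, UTC-4)
utc = eastern.tz_localize('US/Eastern').tz_convert('UTC')

# Dropping the time zone keeps the UTC wall-clock time, as the pivot step does
naive_utc = utc.tz_localize(None)
print(utc)
print(naive_utc)
```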

In this DataFrame the sentiment measurements are stacked in long format: the ‘Name’ column says which measurement a row holds and the ‘Value’ column holds its value. It is easier to examine and evaluate the data with each measure in its own column, so we use the pivot method to reshape the data into a pivot table.

# Drop the time zone (timestamps stay in UTC) and give each measure its own column
sentiment_pvt = data.tz_localize(None).pivot(columns='Name', values='Value')

DataFrame to pivot table
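The pivot method raises a ValueError if the same timestamp and Name appear more than once, which can happen if a file contains repeated rows. A more forgiving variant is sketched below. It assumes that keeping the last reported value for a repeated pair is acceptable, and to_wide is a hypothetical helper name:

```python
import pandas as pd


def to_wide(long_df: pd.DataFrame) -> pd.DataFrame:
    """Reshape long-format sentiment rows into one column per measure.

    Expects a DatetimeIndex plus 'Name' and 'Value' columns, like the
    downloaded sentiment file. Unlike DataFrame.pivot, duplicate
    (timestamp, Name) pairs do not raise; the last value wins (an assumption).
    """
    # Index by (timestamp, Name) so each measurement is addressable
    values = long_df.set_index('Name', append=True)['Value']
    # Collapse duplicates, then turn the Name level into columns
    return values.groupby(level=[0, 1]).last().unstack('Name')
```

On data without duplicates it should give the same table as the pivot call above, so to_wide(data.tz_localize(None)) could be used in its place.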

Pull Historical Price Data for Same Period

Now that we have downloaded the sentiment data, it would be helpful to have price data for the same instrument over the same period. The sentiment data is in 1-minute increments, so we need 1-minute EURUSD candles. We could pull this data into a DataFrame quickly and easily using fxcmpy; however, fxcmpy can pull at most 10,000 candles, which is fewer than the number of 1-minute candles in January 2018. Instead, we can download the candles in 1-week packages listed on FXCM’s GitHub and use a loop to compile them into a single DataFrame. This sounds like a lot of work, but it takes only a few lines of code. As with the sentiment data, the historical candle data is stored in gzip files that can be read directly from their URLs.

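url = 'https://candledata.fxcorporate.com/'
periodicity = 'm1'  # periodicity: m1, H1 or D1
url_suffix = '.csv.gz'
symbol = 'EURUSD'
start_dt = datetime.date(2018, 1, 2)  # start date
end_dt = datetime.date(2018, 2, 1)  # end date

# ISO week numbers of the first and last week; the year is taken from the
# start date, so this loop assumes both dates fall in the same ISO year
start_wk = start_dt.isocalendar()[1]
end_wk = end_dt.isocalendar()[1]
year = str(start_dt.isocalendar()[0])

# Download each weekly file and combine them once at the end. 'data' now
# holds the price candles (the sentiment data already lives in sentiment_pvt)
weekly_frames = []
for i in range(start_wk, end_wk + 1):
    url_data = url + periodicity + '/' + symbol + '/' + year + '/' + str(i) + url_suffix
    tempdata = pd.read_csv(url_data, compression='gzip', index_col='DateTime', parse_dates=True)
    weekly_frames.append(tempdata)
data = pd.concat(weekly_frames)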

DataFrame of price data
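The loop above assumes that the start and end dates fall in the same ISO year. For a range such as late December to early January, range(start_wk, end_wk+1) would be empty, and the year would be wrong for the January weeks. The sketch below takes the year from each week’s own ISO calendar instead. Like the loop, it assumes FXCM numbers its weekly files by ISO week, and weekly_candle_urls is a hypothetical helper:

```python
import datetime

BASE_URL = 'https://candledata.fxcorporate.com/'  # same base URL as the loop above


def weekly_candle_urls(symbol, start_dt, end_dt, periodicity='m1'):
    """Return one candle-file URL per ISO week touched by [start_dt, end_dt].

    Each week's (ISO year, ISO week) pair comes from that week's Monday, so a
    range that crosses New Year produces the right year for every week.
    """
    urls = []
    # Start from the Monday of the first week and step one week at a time
    day = start_dt - datetime.timedelta(days=start_dt.weekday())
    while day <= end_dt:
        iso_year, iso_week, _ = day.isocalendar()
        urls.append(f'{BASE_URL}{periodicity}/{symbol}/{iso_year}/{iso_week}.csv.gz')
        day += datetime.timedelta(days=7)
    return urls
```

Each URL it returns could then be read with the same pd.read_csv(..., compression='gzip', index_col='DateTime', parse_dates=True) call and the results combined with pd.concat.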

We now have a DataFrame with sentiment data and a DataFrame with price data. To make it easier to compare the data, we will combine the AskClose price with the sentiment data:

# Keep exactly the sentiment timestamps and attach the AskClose price at each one;
# dropna() removes rows that have no matching candle
combineddf = data[['AskClose']].join(sentiment_pvt, how='right').dropna()

DataFrame of price and sentiment data
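The join above keeps only sentiment timestamps that have a candle at exactly the same minute. If the two sources’ timestamps do not line up exactly, one alternative (a different technique from the join, swapped in here) is pandas’ merge_asof, which attaches the most recent price at or before each sentiment timestamp within a tolerance. A sketch, assuming both inputs have naive DatetimeIndexes in the same time zone and a one-minute tolerance chosen purely for illustration; align_last_price is a hypothetical name:

```python
import pandas as pd


def align_last_price(sentiment: pd.DataFrame, price: pd.Series,
                     tolerance: str = '1min') -> pd.DataFrame:
    """Attach to each sentiment row the latest price at or before its timestamp.

    Prices older than `tolerance` are treated as missing (NaN). Both inputs
    need naive DatetimeIndexes in the same time zone (UTC in this article).
    """
    # merge_asof needs sorted key columns, so move the indexes into columns
    left = sentiment.sort_index().rename_axis('DateTime').reset_index()
    right = (price.sort_index()
                  .rename('AskClose')
                  .rename_axis('DateTime')
                  .reset_index())
    merged = pd.merge_asof(left, right, on='DateTime', direction='backward',
                           tolerance=pd.Timedelta(tolerance))
    return merged.set_index('DateTime')
```

With the data from this article, align_last_price(sentiment_pvt, data['AskClose']).dropna() would play a similar role to combineddf, with the price column placed last.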

Exploratory Data Analysis

Now we can begin to explore the data. First, we can view descriptive statistics for a single measure by calling .describe() on a column, for example combineddf['SSIHistOrders'].describe(), or for every column at once with combineddf.describe().
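To compare all measures side by side, the statistics can be collected into one table with a row per column. The sketch below also adds skewness and excess kurtosis, which give a numeric hint of how far each measure is from a normal distribution; summary_table is a hypothetical helper:

```python
import pandas as pd


def summary_table(df: pd.DataFrame) -> pd.DataFrame:
    """describe() transposed (one row per column), plus skew and excess kurtosis."""
    numeric = df.select_dtypes('number')
    stats = numeric.describe().T  # count, mean, std, min, 25%, 50%, 75%, max
    stats['skew'] = numeric.skew()
    stats['kurtosis'] = numeric.kurt()  # excess kurtosis: 0 for a normal distribution
    return stats
```

Calling summary_table(combineddf) would give one row for AskClose and one for each sentiment measure.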


Next, we can use Seaborn to create a heatmap of the correlations between the columns. Seaborn is a great package for creating visualizations from DataFrames and is conventionally imported as sns.

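import seaborn as sns
import matplotlib.pyplot as plt

# Pearson correlation between every pair of columns, on a -1 to 1 color scale
sns.heatmap(combineddf.corr(), cmap='coolwarm', vmin=-1, vmax=1)
plt.show()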


Heatmap of sentiment data and price

We can also get an idea of how the data is distributed by creating a histogram of one of the sentiment indexes.

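import seaborn as sns

# Density-scaled histogram (36 bins) with a kernel density estimate overlaid;
# histplot is the Seaborn 0.11+ function for this plot
sns.histplot(combineddf['SSIHistOrders'], bins=int(180/5), stat='density',
             kde=True, color='darkblue', line_kws={'linewidth': 4})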

Histogram of SSIHistOrders

The histogram shows clearly that SSIHistOrders is not normally distributed.
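Because the sentiment measures are far from normal, a rank-based correlation can be a useful cross-check on the Pearson coefficients that DataFrame.corr() computes by default. The sketch below compares the two for price against each sentiment column. Spearman’s rank correlation is a swapped-in technique, not part of the original analysis, and price_correlations is a hypothetical helper:

```python
import pandas as pd


def price_correlations(df: pd.DataFrame, price_col: str = 'AskClose') -> pd.DataFrame:
    """Pearson and Spearman correlation of price_col with every other column."""
    pearson = df.corr(method='pearson')[price_col].drop(price_col)
    spearman = df.corr(method='spearman')[price_col].drop(price_col)
    return pd.DataFrame({'pearson': pearson, 'spearman': spearman})
```

Running price_correlations(combineddf) shows, for each measure, whether the linear and rank-based views of its relationship with price agree. A large gap between the two suggests that outliers or a non-linear (but monotonic) relationship are driving the Pearson value.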

Now let’s plot price against a sentiment index, SSIHistOrders, whose magnitude, when it is negative, is the ratio of short orders to long orders. Charting SSIHistOrders and price together, it appears that as price increases, SSIHistOrders becomes more negative, and as price decreases, SSIHistOrders moves back toward positive territory.

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10,8))

# Price on the left y-axis
ax1 = combineddf['AskClose'].plot(ax=ax, color='b')
ax1.set_ylabel('EURUSD', color='b', fontsize=10)

# SSIHistOrders on a second y-axis sharing the same time axis
ax2 = ax1.twinx()
combineddf['SSIHistOrders'].plot(ax=ax2, color='g')
ax2.set_ylabel('SSIHistOrders', color='g', fontsize=10)

ax.set_title("Price vs SSIHistOrders")
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')
plt.show()

Chart of SSI Hist Orders and price
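The relationship in the chart is a visual impression. One way to check how stable it is over time is a rolling correlation between price and SSIHistOrders. A sketch, assuming a window of 60 rows (about one hour of 1-minute data) chosen only for illustration; rolling_price_sentiment_corr is a hypothetical helper. Correlating the levels of two trending series can overstate a relationship, so an option to correlate minute-to-minute changes instead is included:

```python
import pandas as pd


def rolling_price_sentiment_corr(df: pd.DataFrame, sentiment_col: str = 'SSIHistOrders',
                                 price_col: str = 'AskClose', window: int = 60,
                                 use_changes: bool = False) -> pd.Series:
    """Rolling Pearson correlation between price and a sentiment column.

    window counts rows, not minutes; if the data has gaps, a window spans
    more wall-clock time. With use_changes=True, first differences are
    correlated instead of levels.
    """
    price = df[price_col]
    sentiment = df[sentiment_col]
    if use_changes:
        price = price.diff()
        sentiment = sentiment.diff()
    return price.rolling(window).corr(sentiment)
```

Plotting rolling_price_sentiment_corr(combineddf) shows whether the correlation stays consistently negative, as the chart suggests, or flips sign from one period to the next.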

Now that we have price data and sentiment data, we can explore the relationship between the two. An interesting experiment would be to run the data through some machine learning algorithms and build a trading strategy from the results. To find out more about the data available for free download, visit FXCM’s GitHub page.

Need more historical data or want to subscribe to receive this sentiment data live? Contact FXCM’s premium data department at premiumdata@fxcm.com.

Risk Warning: The FXCM Group does not guarantee accuracy and will not accept liability for any loss or damage which arise directly or indirectly from use of or reliance on information contained within the webinars. The FXCM Group may provide general commentary which is not intended as investment advice and must not be construed as such. FX/CFD trading carries a risk of losses in excess of your deposited funds and may not be suitable for all investors. Please ensure that you fully understand the risks involved.