Digital Assets

Love it or hate it, crypto is here to stay!

The digital asset world is anxiously waiting for the U.S. Securities and Exchange Commission’s (SEC) decision on the Winklevoss Bitcoin Trust ETF (COIN). Digital Asset Services, LLC, the sponsor of the Winklevoss Bitcoin Trust, has filed a preliminary registration statement for the Trust with the SEC to offer its Winklevoss Bitcoin Shares to investors after the Trust’s registration statement is declared effective by the SEC. The Trust’s purpose is to hold only Bitcoin. The Shares will be listed on the Bats BZX Exchange under the ticker symbol “COIN”.

For some firms, regulated investment vehicles are the only practical way to obtain these new digital assets like Bitcoin. However, COIN is not the first product available to investors.

Global Advisors, regulated by the Jersey Financial Services Commission, have the Global Advisors Bitcoin Investment Fund plc (GABI), which is listed on the Channel Islands Securities Exchange. In addition to GABI, Global Advisors also manage two bitcoin-tracking Exchange Traded Notes (ETNs), both listed on Nordic Nasdaq under tickers COINXBT & COINXBE. These are issued by XBT Provider AB, a Stockholm-based company acquired by Global Advisors.

Grayscale Investments have the Bitcoin Investment Trust. Eligible shares of the Bitcoin Investment Trust are quoted on OTCQX® under the symbol GBTC.

The Zurich-based private bank Vontobel AG offers a tracker certificate, listed on the Swiss SIX Structured Products Exchange.

Revoltura have an Exchange Traded Instrument (ETI) that is listed on the Gibraltar Stock Exchange and trades on the Deutsche Börse.

Maltese prime broker EXANTE has a Bitcoin Fund which is traded exclusively on the EXANTE fund platform. EXANTE is authorized and regulated by the Malta Financial Services Authority.

Investment firms Fortress Investment Group (FIG), Benchmark Capital and Ribbit Capital have teamed up with Pantera Capital, the U.S. Bitcoin investment firm, to launch a Bitcoin investment fund called the Pantera Bitcoin Partners Fund.

SolidX Partners Inc have filed a registration statement with the SEC to offer a Bitcoin Trust on the NYSE Arca exchange.

The second product to be launched by Grayscale Investments, the Ethereum Classic (ETC) Investment Trust, allows investors to gain exposure to the price movement of Ethereum Classic through the purchase of a titled security.

The world’s leading and most diverse derivatives marketplace, CME Group, is getting in on the action too. CME Group has launched the CME CF Bitcoin Reference Rate (BRR) and CME CF Bitcoin Real Time Index (BRTI), a standardized reference rate and spot price index with independent oversight. These tools are expected to boost the adoption of Bitcoin trading and hopefully establish digital assets as a new asset class.

If you aren’t investing in or trading digital assets yet, you should pay attention. This space could get really big!

I’m offering a “Cryptocurrency Trading with Python” online workshop – moderated by Ernest Chan.

It’s on March 11th and 18th (two consecutive Saturdays). Registration is here:

Gemini’s Sandbox environment will be used. It offers full exchange functionality using test funds, for testing API connectivity and the execution of strategies.

The goal of the workshop is to help raise awareness of crypto. With exchanges acting as the on/off ramps between digital and fiat currency, more liquidity at the exchanges will ultimately help with the adoption of digital assets. This is a good thing – I am a believer and I am long crypto. More info can be found on The Cointelegraph.


Strategy Replication – Evolutionary Optimization based on Financial Sentiment Data

Wow, I enjoyed replicating this neatly written paper by Ronald Hochreiter.
Ronald is an Assistant Professor at the Vienna University of Economics and Business (Institute for Statistics and Mathematics).

In his paper he applies evolutionary optimization techniques to compute optimal rule-based trading strategies based on financial sentiment data.

The evolutionary technique is a general Genetic Algorithm (GA).

The GA is a mathematical optimization algorithm that draws inspiration from the processes of biological evolution to breed solutions to problems. Each member of the population (genotype) encodes a solution (phenotype) to the problem. Evolution in the population of encodings is simulated by means of three evolutionary processes: selection, crossover and mutation.
Selection exploits information in the current population, concentrating interest on high-fitness solutions. Crossover and mutation perturb these solutions in an attempt to uncover better solutions. Mutation does this by introducing new gene values into the population, while crossover allows the recombination of fragments of existing solutions to create new ones.
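
To make the three operators concrete, here is a minimal Python sketch of a generic GA. It is not the code used in the replication: the bit-string encoding, the tournament selection, the one-point crossover, the toy fitness function and every parameter value are assumptions chosen purely for illustration.

```python
import random


def evolve(fitness, n_genes, pop_size=30, generations=50,
           crossover_rate=0.7, mutation_rate=0.02, seed=0):
    """Toy genetic algorithm over fixed-length bit strings (illustrative only)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]

    def tournament():
        # Selection: the fitter of two randomly drawn individuals is chosen.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(pop, key=fitness)
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                # Crossover: recombine fragments of two existing solutions.
                cut = rng.randint(1, n_genes - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)  # keep the best solution seen so far
    return best


# Toy fitness ("OneMax"): the number of 1-bits in the chromosome.
best = evolve(fitness=sum, n_genes=20)
```

In a trading setting the chromosome would encode the parameters of a rule and the fitness would be a performance measure of that rule, but those details depend on the paper being replicated.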

After reading Ronald’s paper I immediately wanted to test the hypothesis that the model is good at predicting the 1-day-ahead direction of returns. For example, when the rule determines to go long, are the next-day returns positive, and when the rule determines to exit the long position or stay flat, are the next-day returns negative? The results are not much better than a flip of a coin (see the results in the attachment below). Also, turnover is high (see the plot), which may render the strategy useless.
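
The directional test described above can be sketched in a few lines of Python. This is an illustration only, not the code behind the attached results; the function name, the 1/0 signal encoding and the sample numbers are all made up.

```python
def hit_rate(signals, next_day_returns):
    """Fraction of days on which the signal agrees with the next-day return.

    signals: 1 = long, 0 = flat (assumed encoding).
    A long signal counts as a hit when the next-day return is positive;
    a flat signal counts as a hit when it is zero or negative.
    """
    if len(signals) != len(next_day_returns):
        raise ValueError("signals and returns must have the same length")
    hits = sum(1 for s, r in zip(signals, next_day_returns) if (s == 1) == (r > 0))
    return hits / len(signals)


# Made-up example: 3 of the 5 calls are correct.
signals = [1, 1, 0, 0, 1]
returns = [0.01, -0.02, -0.01, 0.03, 0.02]
rate = hit_rate(signals, returns)
```

A hit rate close to 0.5 is what a coin flip would give.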

However, many variations on this genetic algorithm exist; different selection and mutation operators could be tested and a crossover operator could be added. Instead of using financial sentiment data, a variety of technical indicators could be used for generating an optimal trading rule – e.g. see “Evolving Trading Rule-Based Policies”.

I emailed Ronald to get clarification regarding several questions I had. He kindly and swiftly responded with appropriate answers.

  • No crossover is used as the chromosome is too short.
  • The target return for the Markowitz portfolio is calculated as the mean of the scenario means, i.e. mean of the mean vector.
  • Pyramiding is not considered. The rule just checks whether we are invested (long) in the asset or not.
  • A maximum number of iterations is specified as the stopping rule.
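
As a small worked example of the second point, the Markowitz target return is just the mean of the per-asset mean returns. The sketch below assumes the scenarios are held as a dictionary of return lists; the tickers and numbers are invented.

```python
from statistics import fmean


def markowitz_target_return(scenarios):
    """Mean of the mean vector: average the per-asset scenario means."""
    asset_means = [fmean(asset_returns) for asset_returns in scenarios.values()]
    return fmean(asset_means)


# Hypothetical scenario returns for three assets.
scenarios = {"AAA": [0.01, 0.03], "BBB": [-0.01, 0.01], "CCC": [0.02, 0.04]}
target = markowitz_target_return(scenarios)  # mean of 0.02, 0.00 and 0.03
```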

Like my earlier post, End-of-Day (EOD) stock prices are sourced from QuoteMedia through Quandl’s premium subscription and the StockTwits data is sourced from PsychSignal.

The following comparisons and portfolios were constructed:

  1. In-Sample single stock results – Long-only buy-and-hold strategy vs. Optimal rule-based trading strategy
  2. Out-of-Sample Buy-and-hold optimal Markowitz portfolio
  3. Out-of-Sample Buy-and-hold 1-over-N portfolio
  4. Out-of-Sample Equally weighted portfolio of the single investment evolutionary strategies

I used R packages quadprog and PerformanceAnalytics but I wrote my own Genetic Algorithm. I’ll continue using this algorithm to evaluate other indicators, signals and rules 🙂

Here’s some code with the results. The evolutionary risk metrics (pg. 11) are not as good as those in the original paper (I used 100 generations for my GA) but as you can see, my output is almost identical to Ronald’s. A clone perhaps – hehe.


If you have a specific paper or strategy that you would like replicated, either for viewing publicly or for private use, please contact me.

Strategy Replication – Nonlinear SVMs can systematically identify stocks with high and low future returns

I’ve replicated the following academic paper from my favourite journal:

• Title: Nonlinear support vector machines can systematically identify stocks with high and low future returns
• Authors: Ramon Huerta, Fernando Corbacho, and Charles Elkan
• Journal: Algorithmic Finance (2013), pp. 45-58, DOI 10.3233/AF-13016, IOS Press


The authors explore if there are features in accounting data and in historical price information that can help predict the stock price changes of companies.

The original source data was from the CRSP (Center for Research in Security Prices)/Compustat merged database (CCM); 7 technical features are calculated from CRSP and 44 fundamental features are obtained from Compustat.
All U.S. stocks between 1981 and 2011 are used.

A Support Vector Machine (SVM) is used as the classifier to help predict the future direction of the stock price returns. A distinct contribution by the authors is the selection of hyper-parameters of the SVM model by a type of reinforcement learning.

Stocks that do not seem to have strong correlations with the technical and fundamental features are removed from the training data set. This leads to a significant reduction in computational time without hindering the predictive power of the model.
When forming the tail sets that constitute the positive and negative classes of the training data, an ordered list of stocks with volatility-adjusted returns is created. The estimate of the volatility is an exponential moving average, using a type of absolute deviation calculation.
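
Here is a rough Python sketch of how tail sets could be formed. It is not the authors' exact procedure: the use of absolute returns as the deviation, the smoothing factor `alpha`, the function names and the sample data are all assumptions for illustration.

```python
def ema_abs_deviation(returns, alpha=0.1):
    """Exponential moving average of absolute returns, used as a volatility proxy.

    alpha is an assumed smoothing factor, not a value taken from the paper.
    """
    vol = abs(returns[0])
    for r in returns[1:]:
        vol = alpha * abs(r) + (1 - alpha) * vol
    return vol


def tail_sets(period_return, volatility, quantile=0.25):
    """Rank stocks by return / volatility and return the (top, bottom) tails."""
    scored = {s: period_return[s] / volatility[s]
              for s in period_return if volatility[s] > 0}
    ranked = sorted(scored, key=scored.get, reverse=True)
    k = max(1, int(len(ranked) * quantile))
    return ranked[:k], ranked[-k:]  # positive class, negative class


# Invented data for eight hypothetical stocks.
period_return = {"A": 0.10, "B": 0.02, "C": -0.05, "D": 0.04,
                 "E": -0.08, "F": 0.01, "G": 0.06, "H": -0.01}
volatility = {"A": 0.02, "B": 0.02, "C": 0.01, "D": 0.04,
              "E": 0.04, "F": 0.01, "G": 0.03, "H": 0.02}
top, bottom = tail_sets(period_return, volatility)
```

Stocks in the middle of the ranking are simply left out of the training data, which is what keeps the two classes well separated.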

The fundamental features come from accounting data: Income Statements, Balance Sheets and Statements of Cash Flows (e.g. BV, EPS). Technical features were selected by the authors based on the following claims:
• Stocks with high (low) returns over periods of three to 12 months continue to have high (low) returns over subsequent three to 12 month periods.
• Volume is a way to characterize under-reactions and over-reactions in stock price changes.
• The number of n-day highs and n-day lows as suggested in prominent literature.
• The maximum of daily returns is considered an indicator of the interest for traders with few open positions.
• A proxy for the resistance level is used (i.e. the percentage difference from the closest peak in the past) because, at least psychologically, it can be an important factor for traders.

A model for each sector is built, as defined by the Global Industry Classification Standard (GICS): Energy (10), Materials (15), Industrials (20), Consumer Discretionary (25), Consumer Staples (30), Health Care (35), Financials (40), Information Technology (45), Telecommunication Services (50), and Utilities (55). Stocks without a sector are omitted, and because the Telecommunication Services and Utilities sectors have so few stocks, they too are omitted.

My implementation

I was able to download the CCM data through the Wharton Research Data Service (WRDS) interface. Substantial effort was required to load, subset, merge and clean the data.

It is worth noting that a Google Summer of Code project to develop a WRDS CRSP R package is in progress and its use is something to consider in the future.

Typically, the number of features characterizing each stock varies from 7 to 51 depending on whether technical data, fundamental data, or both are used.
However, using all these features with 30 years of data for ALL stocks in the U.S. resulted in the replication taking far too long to simulate. The SVM model was also re-trained every day. I contacted the original authors of the paper and they told me I may reproduce the results in 3-4 months of computation if I have access to several multicore computers.

Therefore, to ensure I could at least work through the strategy replication process and test the main hypothesis, I had to reduce the dataset as follows:
• Removal of the fundamental features. The paper demonstrated that these were not as good as the technical features in characterizing the stocks (i.e. the predictive power was not as good).
• The stocks are divided into 8 sectors. I have concentrated on one sector only (Energy).
• Reduce the time series to 2 years only.
• I did not apply the stock filters (liquidity (LIQ) and dollar trading volume (DTV)). These filters eliminate stocks that do not have sufficient capacity to be traded by large mutual funds.

I developed the SVM model using the e1071 R package. Each day, the model was trained on tail sets using a quantile of 25% (i.e. the 25% highest and 25% lowest ranked vol-adj-returns) using a history of 10 days. The model was then used to predict the future (1-day-ahead) direction of stock prices. After each prediction was made, the model was tested for accuracy.

I was unable to use the R package PortfolioAnalytics because the portfolio rebalancing was too complicated – i.e. form portfolios of 10 equally weighted long and 10 equally weighted short positions. Each position is closed at the end of the last trading day in the following 91 days. Every 28 days we open an additional 20 positions.
Hence, I wrote my own code to manage and track the sub-portfolios using a “queue”: the first sub-portfolio in the queue is always the one to be sold first.
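
The queue idea can be sketched in Python with `collections.deque`. This is a simplified illustration, not my original code: it counts plain day indices rather than trading days, closes a sub-portfolio as soon as 91 days have elapsed, and leaves the long and short position lists empty.

```python
from collections import deque

HOLD_DAYS = 91    # each sub-portfolio is closed after this many days
OPEN_EVERY = 28   # a new sub-portfolio is opened at this interval


def simulate(n_days):
    """Return the number of open sub-portfolios on each day."""
    queue = deque()      # the oldest sub-portfolio sits at the left end
    open_counts = []
    for day in range(n_days):
        # First in, first out: close sub-portfolios whose holding period is over.
        while queue and day - queue[0]["opened"] >= HOLD_DAYS:
            queue.popleft()
        if day % OPEN_EVERY == 0:
            # Placeholder for 10 equally weighted longs and 10 equally weighted shorts.
            queue.append({"opened": day, "longs": [], "shorts": []})
        open_counts.append(len(queue))
    return open_counts


counts = simulate(200)
```

With these two intervals, at most four sub-portfolios are open at the same time.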

I implemented all 7 technical features from the original source paper:
• Momentum 3 months
• Momentum 1 year
• Volume change 3 months
• Volume change 1 month
• 12-month Highs/Lows
• maxR
• Resistance levels
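
To give a flavour of these features, here is a hedged Python sketch of three of them. The exact definitions in the paper may differ: in particular, treating the "closest peak" as the most recent local maximum is my assumption, and the function names and price series are invented.

```python
def momentum(prices, lookback):
    """Simple return over the last `lookback` observations."""
    return prices[-1] / prices[-1 - lookback] - 1.0


def max_daily_return(prices, lookback):
    """Largest single-period return within the last `lookback` periods (maxR)."""
    window = prices[-1 - lookback:]
    return max(b / a - 1.0 for a, b in zip(window, window[1:]))


def resistance(prices):
    """Percentage difference between the latest price and the most recent local peak."""
    for i in range(len(prices) - 2, 0, -1):
        if prices[i] > prices[i - 1] and prices[i] > prices[i + 1]:
            return prices[-1] / prices[i] - 1.0
    return 0.0  # no earlier peak found


# Made-up price series.
prices = [100, 104, 102, 108, 105, 106]
mom = momentum(prices, 5)            # 106 / 100 - 1
max_r = max_daily_return(prices, 5)  # 108 / 102 - 1
res = resistance(prices)             # 106 / 108 - 1
```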

A full set of summary statistics was written, utilizing R packages PerformanceAnalytics and quantmod when needed:
• Annualized Returns
• Cumulative Returns
• Annualized Sharpe ratio (SR)
• Non-normality adjusted Standard Error (se) for the Annualized SR
• Annualized STARR using a Non-Parametric Expected Tail Loss (ETL) estimate
• Maximum Drawdown
• Average Turnover
• Average Diversification
• Accuracy of the SVM model
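
For readers curious about the non-normality adjusted standard error, the Python sketch below uses one common form, which adjusts the usual Sharpe ratio variance for skewness and kurtosis. Whether this matches my R implementation exactly is not guaranteed; the zero risk-free rate, the use of population moments and the square-root-of-time annualization are all assumptions.

```python
import math
from statistics import fmean


def sharpe_with_se(returns, periods_per_year=252):
    """Annualized Sharpe ratio and a skew/kurtosis-adjusted standard error.

    Assumes a zero risk-free rate and uses population (biased) moments.
    """
    n = len(returns)
    mu = fmean(returns)
    dev = [r - mu for r in returns]
    m2 = fmean([d ** 2 for d in dev])
    m3 = fmean([d ** 3 for d in dev])
    m4 = fmean([d ** 4 for d in dev])
    sr = mu / math.sqrt(m2)
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    # With skew = 0 and kurt = 3 this reduces to (1 + SR^2 / 2) / n.
    var = (1 + 0.5 * sr ** 2 - skew * sr + (kurt - 3) / 4 * sr ** 2) / n
    scale = math.sqrt(periods_per_year)
    return sr * scale, math.sqrt(var) * scale


# Made-up daily returns; periods_per_year=1 leaves the figures un-annualized.
sr, se = sharpe_with_se([-0.01, 0.0, 0.01, 0.02, 0.03], periods_per_year=1)
```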

Please note, when constructing and rebalancing the sub-portfolios, transaction costs were ignored and it was assumed that execution was instant (i.e. no price slippage).
An especially common error when backtesting and making a decision at time t is to use/include the data at time t+1. Thus, particular attention was given to any possibility of look-ahead bias. For example, when using the model to make predictions for time t+1, only data at time t was used; i.e. the 7 technical features from EOD yesterday were used to predict the price direction for today.
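
A minimal Python sketch of the alignment that avoids this error: features known at the close of day t are paired with the direction of the return on day t+1. The function name and the +1/-1 label encoding are illustrative assumptions.

```python
def align_features_and_labels(features, returns):
    """Pair features from day t with the return direction of day t+1.

    features[t] must only contain information available at the close of day t.
    The last day has no following return yet, so it produces no training pair.
    """
    pairs = []
    for t in range(len(features) - 1):
        label = 1 if returns[t + 1] > 0 else -1
        pairs.append((features[t], label))
    return pairs


# Made-up example with placeholder feature vectors.
pairs = align_features_and_labels(["f0", "f1", "f2"], [0.01, -0.02, 0.03])
```

Shifting the labels rather than the features makes it easy to check by eye that no pair uses a return from the same day as its features.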



## [1] “====================================================================”
## [1] “====================================================================”
## [1] “”
## [1] “ENERGY SECTOR (U.S. STOCKS) 2009-01-01::2010-12-31 SVM MODEL”
## [1] “”
## [1] “”
## [1] “Annualized return (arith): 9.9 %”
## [1] “Annualized Geo mean return: 8.7 %”
## [1] “Cumulative returns: 131.9 %”
## [1] “Annualized volatility: 15.2 %”
## [1] “Annualized sharpe ratio: 0.65 (0.326)”
## [1] “Ratio (SR/se(SR)): 0.502”
## [1] “STARR (5% ETL): -24.321”
## [1] “NSTARR (5% NETL): 2.151”
## [1] “Annualized STARR (5% ETL): -175.384”
## [1] “Annualized NSTARR (5% NETL): 15.511”
## [1] “Max Drawdown: 0.366”
## [1] “SVM Accuracy (Mean): 48.8 %”
## [1] “SVM Accuracy (Max): 58.1 %”
## [1] “SVM Accuracy (Min): 41.1 %”
## [1] “====================================================================”


The SVM model accuracy is not good at all. It’s possible that the technical features do not characterize the stocks well enough or the hyper-parameters are not tuned.

The returns and SR are similar to those demonstrated in the original source paper but the backtest needs to be run for longer to have statistical significance in its results.

If the model was not re-trained every day, a significant reduction in the time needed to run the backtest would be achieved. How often the classifier should be trained is an active area of research.

It was fun to replicate the paper and much was learned. The paper is definitely worth reading!

Tap into the Pulse of the Markets

My first post 🙂

So to begin, here is a strategy I created recently, combining sentiment data with technical indicators, and using a machine learning classification technique named Support Vector Machines (SVM).

End-of-Day (EOD) U.S. stock prices are sourced from QuoteMedia through Quandl’s premium subscription.
The chosen source of sentiment is from StockTwits message posts that have been aggregated and scored by PsychSignal. This data too can be obtained through Quandl.

