Table Of Contents
Recap
Revisiting Our Pre-Cost SR Mockup
Look-ahead Bias In PNL Calculations
Migrating The Pre-Cost SR Calculation
Pre-cost SR Does Not Match Expected Value
Syncing State Between Old & New Backtester
Cleaning Up Pre-Cost SR Code
Testing The Post-Cost SR
Recap
In our last refactoring article we migrated and tested the implementation that rebalances our positions with an error threshold of 10%. Along the way we found ourselves in need of a lot of other functionality from our old backtest.py and had a look at the design and implementation of the concepts involved. All of this came on top of the fact that we initially just wanted to implement and test the calculation of our strategy's pre-cost Sharpe Ratio. We ended up traversing the complete tree of logic, going through all its different branches until we ended up inside their leaf nodes (the parts that can't be simplified any further). This is fine though! The Sharpe Ratio is a complex concept that relies on these leaf nodes to become meaningful to us.
Along the way we identified a lot of common code smells, mainly duplication and tight coupling, which we tackled by favoring composition over inheritance. When porting over the logic to create continuous volatility adjusted forecasts we decoupled its ingredients into instruments, trading rules, features and their end product: the continuous signal.
This all may have seemed like a lot of unproductive and tedious work, but it led us to working code modules, including test suites via assert calls, for:
fetching EOD prices from DB
the EMAC trading rule (raw forecast)
volatility normalized, rescaled and capped forecasts
daily volatility normalized & overall risk-targeted positions (ideal positions)
error threshold rebalancing
which we can rely on in the future to check if our code still works after we make changes or introduce new stuff.
Since we now have actual positions, we can calculate our PNL, which we need for calculating our pre-cost Sharpe Ratio and later on the post-cost Sharpe Ratio.
Revisiting Our Pre-Cost SR Mockup
Let's have a look at our initial pre-cost SR mockup function. It helped us better visualize the underlying design of calculating our strategy's performance and which concepts/objects are involved in the process.
def calculate_strat_pre_cost_sr(price_series):
    signals = generate_signals(price_series)
    rebalanced_positions = generate_rebalanced_positions(rebalance_threshold=10)
    raw_usd_pnl = price_series.diff() * rebalanced_positions.shift(1)
    pnl_stddev = calculate_vol(price_series.diff())
    raw_returns_vol_adjusted = raw_usd_pnl.mean() / pnl_stddev
    return raw_returns_vol_adjusted
    # return 1.9914208916281093
The Sharpe Ratio is calculated by taking our strategy's mean returns and adjusting them by their volatility. Calculating the returns of our strategy is easy. We can simply multiply the instrument's returns with our rebalanced positions.
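To make the ratio itself a bit more tangible, here is a minimal sketch on made-up daily returns. It uses a plain sample standard deviation as a simplifying assumption (our backtester uses an exponentially weighted one), and the sqrt(365) factor mirrors the annualization used further down in this article. The numbers are toy data, not results from the backtest.

```python
import numpy as np
import pandas as pd

# Toy daily percentage returns (made-up numbers for illustration only).
daily_returns = pd.Series([0.5, -0.2, 0.3, 0.1, -0.4, 0.6])

# Daily Sharpe Ratio: mean return divided by the volatility of the returns.
# Simplification: plain sample std instead of an exponentially weighted one.
daily_sr = daily_returns.mean() / daily_returns.std()

# Annualize by the square root of the number of trading days per year
# (365 here, assuming a market that trades every day).
annualized_sr = np.sqrt(365) * daily_sr

print(f"daily SR: {daily_sr:.4f}, annualized SR: {annualized_sr:.4f}")
```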
Look-ahead Bias In PNL Calculations
We have to be careful though to not introduce lookahead bias (also called data leakage). Lots of people don't realize that if they just push through the latest EOD price, they actually use future data, which they couldn't have possibly acted upon in live trading.
When we look at the latest close price - bear in mind that we're not working with up to date data but a snapshot so today means the 10th of March 2025 - it says 78532.00.
# close
# time_close
# 2025-03-06 89961.727244
# 2025-03-07 86742.675624
# 2025-03-08 86154.593210
# 2025-03-09 80601.041311
# 2025-03-10 78532.001808
However, since today is the 10th of March 2025, in reality we don't have that close price yet. We only get the close price for the 10th of March 2025 once the day is done, in other words on the 11th of March 2025. To iron things out we can simply use "yesterday's" close price by lagging our signal - or our positions in this case, because we calculated them from our signal - using shift(1). This way we simulate acting and trading on the latest data that is actually available in live trading.
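The effect of the lag is easiest to see side by side. The sketch below uses the close prices from the snapshot above (rounded) together with made-up positions, so the PNL numbers are purely illustrative. The point is only which position gets paired with which price change.

```python
import pandas as pd

# Close prices from the snapshot above, rounded to two decimals.
close = pd.Series(
    [89961.73, 86742.68, 86154.59, 80601.04, 78532.00],
    index=pd.to_datetime(
        ["2025-03-06", "2025-03-07", "2025-03-08", "2025-03-09", "2025-03-10"]
    ),
)

# Hypothetical positions, each one decided using that same day's close.
positions = pd.Series([0.10, 0.10, -0.05, -0.05, -0.10], index=close.index)

price_change = close.diff()

# Biased: today's position earns today's price change, although that position
# could only be known after today's close was published.
biased_pnl = price_change * positions

# Lagged: the position decided at yesterday's close earns today's price change.
lagged_pnl = price_change * positions.shift(1)

print(pd.DataFrame({"biased": biased_pnl, "lagged": lagged_pnl}))
```

On the 8th of March the two versions even disagree on the sign of the PNL, because the biased version already "knows" about the flip to a short position.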
Another thing to note here is that in traditional futures trading we'd also need to multiply our PNL by the instrument's contract_unit. Since the contract unit in our example is 1, we can omit it, but it's something you have to be wary of for other contract specifications.
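As a rough sketch of what that would look like, the helper below carries the contract unit as an explicit parameter. The function name calculate_raw_pnl and the multiplier of 50 are assumptions made up for this example, they are not part of our backtester.

```python
import pandas as pd


def calculate_raw_pnl(close, positions, contract_unit=1.0):
    """PNL in currency: price change x lagged position x contract unit."""
    return (close.diff() * positions.shift(1) * contract_unit).fillna(0)


close = pd.Series([100.0, 101.0, 103.0])
positions = pd.Series([2.0, 2.0, 2.0])

# Contract unit of 1, as in our example: the multiplier has no effect.
unit_pnl = calculate_raw_pnl(close, positions)

# Hypothetical futures contract with a multiplier of 50.
futures_pnl = calculate_raw_pnl(close, positions, contract_unit=50.0)

print(unit_pnl.tolist(), futures_pnl.tolist())
```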
Migrating The Pre-Cost SR Calculation
Since we have already implemented and tested our rebalanced positions, we can simply pass them into our current mockup function. If you're an OOP fanatic and think this SR calculation function should be a method on a strategy object instead, you're right, but we're going to come to that soon!
def calculate_strat_pre_cost_sr(rebalanced_positions, price_series):
    instr_price_returns = price_series['close'].diff()
    raw_usd_pnl = instr_price_returns * rebalanced_positions.shift(1)
    raw_usd_pnl = raw_usd_pnl.fillna(0)
    perc_returns = (raw_usd_pnl / 10_000) * 100
    pnl_stddev = calculate_vol(perc_returns, VOL_LOOKBACK)
    perc_returns_vol_adjusted = perc_returns.mean() / pnl_stddev
    annualized_vol_adj_perc_returns = np.sqrt(365) * perc_returns_vol_adjusted
    return annualized_vol_adj_perc_returns
    # return 1.9914208916281093

pre_cost_sr = calculate_strat_pre_cost_sr(rebalanced_positions, price_series).iloc[-1]
assert pre_cost_sr == 1.9914208916281093, f"Pre-cost Sharpe Ratio {pre_cost_sr} does not match expected value."
Pre-cost SR Does Not Match Expected Value
If we run our script now we get AssertionError: Pre-cost Sharpe Ratio 1.6779372930452778 does not match expected value.
Our assert call is letting us know that we clearly messed up at some point.
When calculating the volatility of our strategy's returns in our old backtest.py, we didn't pass the lookback of 35 days to ewm() as the named parameter span. We only passed 35 in as the first positional argument: strat_std_dev = strat_pct_returns.ewm(35, min_periods=10).std()
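Now if you look at the ewm() documentation, its first parameter isn't span, it's com (center of mass). That's something totally different! With com the smoothing factor is alpha = 1 / (1 + com), with span it is alpha = 2 / (span + 1), so ewm(35) weights history like a span of 71 instead of 35.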
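If you want to convince yourself, the following sketch compares the variants on random data. The series is generated just for this demonstration and has nothing to do with our strategy returns. It only relies on how pandas interprets the first positional argument of ewm().

```python
import numpy as np
import pandas as pd

# Random toy returns, generated only for this comparison.
rng = np.random.default_rng(42)
returns = pd.Series(rng.normal(0, 1, 500))

# The first positional argument of ewm() is com, so these two are identical.
positional = returns.ewm(35, min_periods=10).std()
explicit_com = returns.ewm(com=35, min_periods=10).std()

# What we actually wanted: a span of 35.
explicit_span = returns.ewm(span=35, min_periods=10).std()

# span=35 means alpha = 2 / 36, which corresponds to com = 17, not 35.
equivalent_com = returns.ewm(com=17, min_periods=10).std()

print(positional.iloc[-1], explicit_span.iloc[-1])
```

The positional call and the span version produce visibly different volatility estimates, which is enough to move the Sharpe Ratio.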
Well.. that's not good.. It's definitely a bug! Remember: This system isn't ready to trade live yet! If you try to use this in the hopes of recreating its result, you're going to have a bad time!
Glancing over the old backtest.py we can see that we failed to specify it as span in a bunch of places. This type of bug happens often when you type everything out manually instead of reusing code. Calling your functions consistently can be hard!
Luckily we caught that and were more diligent in specifying it for the calculation algorithm in our second, new iteration - our current backtest_refactored.py. Since we're now reusing it instead of typing it out manually each time, this won't happen again. It's already fixed!
Syncing State Between Old & New Backtester
This divergence makes relying on our assert calls somewhat weird because our "new" results differ from the old ones. If we quickly swap the old, faulty call back in for the correct one, we can see that it does in fact produce the same pre-cost SR again.
[...]
    perc_returns = (raw_usd_pnl / 10_000) * 100
    # pnl_stddev = calculate_vol(perc_returns, VOL_LOOKBACK)
    pnl_stddev = perc_returns.ewm(VOL_LOOKBACK, min_periods=10).std()  # old bug: VOL_LOOKBACK is passed as com, not span
[...]
pre_cost_sr = calculate_strat_pre_cost_sr(rebalanced_positions, price_series).iloc[-1]
assert np.isclose(
    pre_cost_sr,
    1.9914208916281093,
    rtol=0.01
), "Sharpe Ratio calculations differ by more than 1%"
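Note that the assert above switched from an exact == comparison to np.isclose with a relative tolerance. A small sketch of what that tolerance does and does not let through: the 0.5% drift is a made-up value, the other two numbers are the ones from the assert and the AssertionError above.

```python
import numpy as np

expected_sr = 1.9914208916281093

# Hypothetical recomputed value that drifted by 0.5%.
slightly_off_sr = expected_sr * 1.005

# Exact equality fails on any difference, however small.
exact_match = slightly_off_sr == expected_sr

# np.isclose accepts it because the difference is within 1% of the expected value.
close_enough = np.isclose(slightly_off_sr, expected_sr, rtol=0.01)

# The value from the AssertionError above is far outside the tolerance,
# so a real change in the calculation is still caught.
bug_still_caught = not np.isclose(1.6779372930452778, expected_sr, rtol=0.01)

print(exact_match, close_enough, bug_still_caught)
```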
To make the migration easier we can and should also update the old backtest.py. I simply grepped backtest.py for ewm(35 and replaced it with ewm(span=35, ran the script again and updated the asserts accordingly:
assert (strat_tot_return == 958.3412684422372)
assert (strat_mean_ann_return == 65.7261486248434)
assert (strat_std_dev.iloc[-1] == 2.1539648978227586)
assert (strat_sr.iloc[-1] == 1.597177306126823)
assert (df['fees_paid'].sum() == 1038.6238698915147)
assert (df['slippage_paid'].sum() == 944.2035180831953)
assert (df['funding_paid'].sum() == 3130.3644113437113)
assert (ann_turnover == 37.672650094739545)
assert (rolling_pre_cost_sr.iloc[-1] == 1.6785643249186135)
assert (rolling_post_cost_sr.iloc[-1] == 1.597177306126823)
assert (strat_rolling_trading_costs_sr.iloc[-1] == 0.08138701879179044)
Then I transferred the new source of truth over to our new tests.py so we're on the same page again.
Cleaning Up Pre-Cost SR Code
Our pre-cost SR calculation works now! It does not use any mocked or stubbed values anymore and calculates the Sharpe Ratio of our strategy in realtime. However, its current implementation is pretty ugly and also not really decoupled. Let's fix that!
Instead of calculating the instrument's raw returns manually - and possibly again every other time we're interested in them, just like with our previous ewm() problem - we can add the calculation as a method on our new Instrument class itself and then pass the instrument into our SR calculation instead of the price_series. We can also pass in our trading_capital instead of hardcoding the 10_000 because we already specified it for our strategy. We didn't take the time to make it a proper config object yet, but again, we'll come to that. Other than that, renaming a few things should also help:
class Instrument:
    CLOSE_COLUMN = 'close'
    [...]
    def get_raw_returns(self):
        return self.get_feature(self.CLOSE_COLUMN).diff()
    [...]

def calculate_pnl(instrument, rebalanced_positions):
    pnl = instrument.get_raw_returns() * rebalanced_positions.shift(1)
    return pnl.fillna(0)

def calculate_strat_pre_cost_sr(instrument, trading_capital, rebalanced_positions):
    raw_pnl = calculate_pnl(instrument, rebalanced_positions)
    perc_pnl = (raw_pnl / trading_capital) * 100
    perc_pnl_vol = calculate_vol(perc_pnl, VOL_LOOKBACK)
    # daily SR because we use daily returns
    daily_sr = perc_pnl.mean() / perc_pnl_vol
    annualized_sr = np.sqrt(instrument.trading_days_in_year) * daily_sr
    return annualized_sr
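A nice side effect of this decoupling is that calculate_pnl only needs something that offers get_raw_returns(), so it can be tested with hand-checkable numbers. The sketch below assumes exactly that: StubInstrument is a hypothetical test double made up for this example (not the real Instrument class), and calculate_pnl is repeated only so the snippet runs on its own.

```python
import pandas as pd


class StubInstrument:
    """Hypothetical test double that only exposes get_raw_returns()."""

    def __init__(self, close):
        self._close = close

    def get_raw_returns(self):
        return self._close.diff()


def calculate_pnl(instrument, rebalanced_positions):
    # Same logic as the refactored function above.
    pnl = instrument.get_raw_returns() * rebalanced_positions.shift(1)
    return pnl.fillna(0)


# Tiny, hand-checkable toy data.
close = pd.Series([100.0, 102.0, 101.0, 105.0])
positions = pd.Series([1.0, 2.0, 2.0, 0.0])

pnl = calculate_pnl(StubInstrument(close), positions)

# Price changes: NaN, +2, -1, +4. Lagged positions: NaN, 1, 2, 2.
assert pnl.tolist() == [0.0, 2.0, -2.0, 8.0]
```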
The full code can be found in this GitHub repository
Testing The Post-Cost SR
In our next article we're going to do the same thing but for the post-cost SR, which includes the calculation and testing of current cost assumptions like funding, fees, etc. We're again going to have a look at current designs and implementations and how we can improve them during the process, then do another iteration of design<->implementation optimization before we finally move on to other things.
At that point our very first real backtester MVP is done. Which features and improvements we need to work on from there depends on where we want to go with it, what type of strategy we want to run, etc. We'll continue to build it in public, so stay tuned!
So long, happy trading!
- Hōrōshi バガボンド
Disclaimer: The content and information provided by Vagabond Research, including all other materials, are for educational and informational purposes only and should not be considered financial advice or a recommendation to buy or sell any type of security or investment. Vagabond Research and its members are not currently regulated or authorised by the FCA, SEC, CFTC, or any other regulatory body to give investment advice. Always conduct your own research and consult with a licensed financial professional before making investment decisions. Trading and investing can involve significant risk, and you should understand these risks before making any financial decisions. Backtested and actual historic results are no guarantee of future performance. Use of the material presented is entirely at your own risk.