Backtest example

duanqs123 2016-08-06


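Introduction to backtest analysis: the dual moving average strategy

In this post we use the classic "dual moving average" strategy from technical analysis to show how to test an idea of your own with Python in the quant lab (量化实验室), and finally turn it into a strategy.

1. Preparation

A handful of Python libraries need to be imported before use:

matplotlib, for drawing charts
numpy, for numerical array computation (used here on time series)
pandas, for handling structured tabular data
DataAPI, the data API provided by Tonglian Data (通联数据)
seaborn, for nicer styling of matplotlib charts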

In [ ]:

from matplotlib import pylab

import numpy as np

import pandas as pd

import DataAPI

import seaborn as sns

sns.set_style('white')

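Our focus is investing in a single ETF: the ChinaAMC SSE 50 ETF (华夏上证50ETF), code 510050.XSHG. The backtest period we consider is:

Start: January 1, 2008
End: April 23, 2015

Here we use the data API function MktFunddGet to fetch daily trading prices for the fund; the resulting security is a pandas DataFrame object: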

In [ ]:

secID = '510050.XSHG'

start = '20080101'

end = '20150423'

security = DataAPI.MktFunddGet(secID, beginDate=start, endDate=end, field=['tradeDate', 'closePrice'])

security['tradeDate'] = pd.to_datetime(security['tradeDate'])

security = security.set_index('tradeDate')

security.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1775 entries, 2008-01-02 00:00:00 to 2015-04-23 00:00:00
Data columns (total 1 columns):
closePrice    1775 non-null float64
dtypes: float64(1)
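
DataAPI is the vendor's data service and may not be available outside the quant lab. As a rough sketch for following along offline, the block below builds a DataFrame of the same shape from a synthetic random-walk price series. The helper name make_fake_security, the seed, the starting price and the volatility are all made up for illustration and have nothing to do with real 510050.XSHG prices.

```python
import numpy as np
import pandas as pd

def make_fake_security(start='20080101', end='20150423', seed=0):
    """Hypothetical helper: a synthetic closePrice series on weekdays."""
    rng = np.random.RandomState(seed)
    # Weekdays only; real exchange holidays are NOT excluded here.
    dates = pd.bdate_range(start, end)
    # Geometric random walk: cumulative sum of small daily log returns.
    log_ret = rng.normal(loc=0.0, scale=0.02, size=len(dates))
    close = 2.0 * np.exp(np.cumsum(log_ret))  # 2.0 is an arbitrary start level
    frame = pd.DataFrame({'closePrice': close}, index=dates)
    frame.index.name = 'tradeDate'
    return frame

security_demo = make_fake_security()
security_demo.info()
```

Because the index is a DatetimeIndex named tradeDate and the only column is closePrice, the later cells should run unchanged on such a frame, although the resulting numbers are of course meaningless.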

The closing prices of the last 5 trading days are:

In [ ]:

security.tail()

Out[ ]:

	closePrice
tradeDate
2015-04-17	3.185
2015-04-20	3.103
2015-04-21	3.141
2015-04-22	3.241
2015-04-23	3.212

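A suitable chart helps a researcher get an intuitive view of the instrument's historical price path. Here we simply use the plot method built into pandas: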

In [ ]:

security['closePrice'].plot(grid=False, figsize=(12,8))

sns.despine()

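2. Strategy description

We take the classic "dual moving average" strategy as an example of how to carry out analysis and research in the quant lab.

The moving averages we use are defined as follows:

Short-term moving average: window_short = 20, roughly a monthly average
Long-term moving average: window_long = 120, roughly a half-year average
Deviation threshold: SD = 5%, the width of the band; this is explained in detail later

To compute the averages we use the moving-average function from pandas, rolling_mean. In later pandas releases pd.rolling_mean was deprecated and then removed; the equivalent call is Series.rolling(window).mean().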

In [ ]:

window_short = 20

window_long = 120

SD = 0.05

# pd.rolling_mean no longer exists in current pandas; Series.rolling(...).mean() is the equivalent
security['short_window'] = np.round(security['closePrice'].rolling(window=window_short).mean(), 2)

security['long_window'] = np.round(security['closePrice'].rolling(window=window_long).mean(), 2)

security[['closePrice', 'short_window', 'long_window']].tail()

Out[ ]:

	closePrice	short_window	long_window
tradeDate
2015-04-17	3.185	2.82	2.30
2015-04-20	3.103	2.85	2.31
2015-04-21	3.141	2.87	2.33
2015-04-22	3.241	2.90	2.34
2015-04-23	3.212	2.93	2.35
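
To see what the moving average does on a small scale, here is a sketch with a made-up five-point price series and a window of 3. It uses the Series.rolling(...).mean() spelling; the numbers are illustrative only.

```python
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# A window of 3 needs 3 observations, so the first 2 results are NaN.
ma3 = prices.rolling(window=3).mean()
print(ma3)
# Remaining values are the means of (1,2,3), (2,3,4), (3,4,5): 2.0, 3.0, 4.0
```

The same thing happens in the real data: with window_long = 120 the first 119 rows of long_window are NaN. Comparisons against NaN evaluate to False, so a signal built from those rows stays off until enough history has accumulated.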

As before, we can plot all three lines, closing price included, on one chart and see whether it suggests anything.

In [ ]:

security[['closePrice', 'short_window', 'long_window']].plot(grid=False, figsize=(12,8))

sns.despine()

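2.1 Defining the signals

Buy signal: the short-term average is above the long-term average by more than SD (here 5%) of the long-term average;

Sell signal: every case in which the buy signal does not hold;

We first compute the difference s-l between the short-term and long-term averages. In pandas such vectorized operations are written just like ordinary scalar arithmetic: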

In [ ]:

security['s-l'] = security['short_window'] - security['long_window']

security['s-l'].tail()

Out[ ]:

tradeDate
2015-04-17    0.52
2015-04-20    0.54
2015-04-21    0.54
2015-04-22    0.56
2015-04-23    0.58
Name: s-l, dtype: float64

Based on the value of s-l we can define the signal:

$s - l > SD \times long\_window$: buy signal, Regime is set to 1 (True)
Otherwise: sell signal, Regime is set to 0 (False)

In [ ]:

security['Regime'] = np.where(security['s-l'] > security['long_window'] * SD, 1, 0)

security['Regime'].value_counts()

Out[ ]:

0    1394
1     381
dtype: int64
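
A small worked example may make the threshold concrete. The moving-average values below are invented; with SD = 0.05, a long-term average of 2.00 means the short-term average has to exceed 2.10 before the signal turns on.

```python
import numpy as np
import pandas as pd

SD = 0.05
demo = pd.DataFrame({
    'short_window': [2.05, 2.08, 2.11, 2.30, np.nan],
    'long_window':  [2.00, 2.00, 2.00, 2.00, 2.00],
})

demo['s-l'] = demo['short_window'] - demo['long_window']
# 1 where the gap exceeds 5% of the long average, 0 everywhere else
# (a NaN gap compares as False, so it also maps to 0).
demo['Regime'] = np.where(demo['s-l'] > demo['long_window'] * SD, 1, 0)
print(demo)
```

Only the rows with short_window of 2.11 and 2.30 clear the band; the row with a missing short average falls back to 0, i.e. out of the market.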

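The counts above show how many trading days carry a buy signal (381) and how many carry a sell signal (1394).

The chart below shows how the signal is distributed over time: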

In [ ]:

security['Regime'].plot(grid=False, lw=1.5, figsize=(12,8))

pylab.ylim((-0.1,1.1))

sns.despine()
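
Note that value_counts counts days spent in each state, not trades. If one wants the number of actual position changes, one possible approach (a sketch, not part of the original notebook) is to count the days on which Regime differs from the previous day. The toy series below is made up.

```python
import pandas as pd

regime_demo = pd.Series([0, 0, 1, 1, 1, 0, 1, 1, 0, 0])

# Days spent in each state (what value_counts reports).
days_in_each_state = regime_demo.value_counts()

# diff() is +1 on the day the signal turns on, -1 when it turns off.
step = regime_demo.diff()           # first element is NaN (no previous day)
entries = int((step == 1).sum())    # number of buys
exits = int((step == -1).sum())     # number of sells
n_switches = entries + exits

print(days_in_each_state)
print(n_switches, entries, exits)
```

Here the signal is on for 5 of 10 days, but there are only 2 entries and 2 exits.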

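Once we have the signal we can carry out the buys and sells, and then compute the daily return from those positions. Note that the strategy return is computed as today's signal multiplied by the next day's return. The decision is made today, so the only return we can capture is the next day's; multiplying today's signal by today's return would use future data (look-ahead bias).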

In [ ]:

security['Market'] = np.log(security['closePrice'] / security['closePrice'].shift(1))

security['Strategy'] = security['Regime'].shift(1) * security['Market']

security[['Market', 'Strategy', 'Regime']].tail()

Out[ ]:

	Market	Strategy	Regime
tradeDate
2015-04-17	0.012638	0.012638	1
2015-04-20	-0.026083	-0.026083	1
2015-04-21	0.012172	0.012172	1
2015-04-22	0.031341	0.031341	1
2015-04-23	-0.008988	-0.008988	1
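
The effect of shift(1) can be checked on a tiny made-up example: the signal computed at the close of one day is applied to the return of the following day. The dates and numbers are illustrative only, and the LookAhead column is included purely to show the mistake.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2015-01-05', periods=4, freq='D')
toy = pd.DataFrame({
    'Regime': [0, 1, 1, 0],
    'Market': [0.010, 0.020, -0.030, 0.040],
}, index=idx)

# Correct: yesterday's signal decides whether we hold the position today.
toy['Strategy'] = toy['Regime'].shift(1) * toy['Market']

# Wrong: the same-day signal already "knows" today's close.
toy['LookAhead'] = toy['Regime'] * toy['Market']

print(toy)
```

On the second day the signal turns on, but the strategy only starts earning (or losing) from the third day; the LookAhead column would wrongly book the second day's 0.020 as well.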

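Finally, summing the daily returns gives the cumulative return (because we use log returns, adding up daily returns is legitimate). The running sum can be done easily with the DataFrame built-in cumsum: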

In [ ]:

security[['Market', 'Strategy']].cumsum().apply(np.exp).plot(grid=False, figsize=(12,8))

sns.despine()
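
Why summing is legitimate for log returns: the sum telescopes, $\sum_{t=1}^{T} \ln(P_t / P_{t-1}) = \ln(P_T / P_0)$, so exponentiating the cumulative sum recovers the price path normalized by the first price. A quick numerical check on made-up prices:

```python
import numpy as np
import pandas as pd

close = pd.Series([2.0, 2.2, 2.1, 2.5])

log_ret = np.log(close / close.shift(1))   # first entry is NaN (no previous price)
growth = log_ret.cumsum().apply(np.exp)    # same idiom as in the cell above

# exp(sum of log returns up to t) should equal price_t / price_0
expected = close / close.iloc[0]

print(growth)
print(expected)
```

Apart from the leading NaN, the two series agree up to floating-point rounding. The same would not hold for simple percentage returns, which have to be compounded by multiplication rather than summed.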

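3. Implementing the strategy with quartz

The sections above walked through the workflow of researching a strategy in the quant lab, starting from the data. In practice we can also use quartz, the framework built into the quant lab. quartz hides data retrieval, data cleaning, and the backtest logic from the user, so the user can focus on describing the strategy logic.

4. Putting it all together: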

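import matplotlib.font_manager
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# `ohlc` must already exist when this def runs (it is the default argument); presumably
# a DataFrame whose first four columns are O/H/L/C prices, with the close in column `C`.
def backtest8(ohlc=ohlc, SD=1.0, n_short=2, n_long=20):
    #import seaborn as sns
    #sns.set_style('white')

    # Microsoft YaHei font, so the Chinese plot titles render (Windows path)
    myfontprops = matplotlib.font_manager.FontProperties(
                        fname='C:/Windows/Fonts/msyh.ttf')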

    maShort = ohlc.C.rolling(n_short).mean()
    maLong = ohlc.C.rolling(n_long).mean()

    plt.figure() # create new figure
    ohlc.iloc[:,[0,1,2,3]].plot(grid=True,figsize=(8,4))
    plt.title(u'历史股价', fontproperties=myfontprops)  # "Historical prices"; title text passed positionally

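    # SD is a ratio threshold here: 1.0 means "short MA above long MA";
    # 1.05 would match the 5% band (s - l > 0.05 * l) used earlier.
    regime = np.where( maShort/maLong > SD, 1, 0)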
    regime = pd.Series(regime, index=maShort.index)
    print ('Regime Length = %s'%regime.size)
    plt.figure() # create new figure
    regime[:].plot(lw=1.5, ylim=(-0.1, 1.1), figsize=(8,4), title=u'Regime')
    plt.figure() # create new figure
    regime[-100:].plot(lw=1.5, ylim=(-0.1, 1.1), figsize=(8,4), title=u'Regime')



    pp_ratio_bnh = np.log(ohlc.C / ohlc.C.shift(1) )
    pp_ratio_strategy = regime.shift(1) * pp_ratio_bnh
    # Finally, summing the daily returns gives the cumulative return
    # (we use log returns, so adding up daily returns is legitimate);
    # the running sum is done with the built-in cumsum:
    norm_return_bnh = pp_ratio_bnh.cumsum().apply(np.exp)
    norm_return_strategy = pp_ratio_strategy.cumsum().apply(np.exp)

    plt.figure() # create new figure
    norm_return_strategy.plot(lw=1.5, figsize=(8,4), label=u'Strategy')
    norm_return_bnh.plot(lw=1.5, label=u'BnH')

    plt.legend(loc='best')
    plt.title(u'策略收益率与历史价格对比', fontproperties=myfontprops)  # "Strategy return vs. historical price"

    # assert raises AssertionError when the condition is false, showing the message to the developer
    assert (regime.index == ohlc.C.index).all(), 'signal index does not equal price index'

backtest8()

Result plots:

I could not insert the pictures from my local PC, so the plots are missing here.