1000字范文 > 量化投资实战（三）之配对交易策略---协整法

量化投资实战（三）之配对交易策略---协整法

时间：2023-03-15 05:47:16

点赞、关注再看，养成良好习惯
Life is short, U need Python
初学量化投资实战，[快来点我吧]

配对交易策略实战—协整法

基本流程
配对组合 --> 计算价差 --> 决策标准 --> 确定头寸 --> 平仓获利

案例描述

本案例以上证50股票数据为对象、以协整模型为方法、以Python语言为工具进行配对交易策略的实证分析。在实证分析的过程中，主要借助于 arch 和 statsmodels等第三方库单位根检验和协整检验，进而选取股票对；然后设置决策标准和交易头寸进行回测交易。

数据集

上证50：sh50_stock_data.csv

导入模块

import pandas as pdimport numpy as npimport matplotlib.pyplot as pltplt.rcParams['font.sans-serif'] = ['SimHei'] # 字体设置import matplotlibmatplotlib.rcParams['axes.unicode_minus']=False # 负号显示问题from arch.unitroot import ADF # pip install archimport statsmodels.api as sm

导入数据

sh = pd.read_csv('sh50_stock_data.csv',index_col='Trddt')sh.index = pd.to_datetime(sh.index)

形成期：协整检验

# 提取数据P_zhonghang = sh['601988'] # 中国银行P_pufa = sh['600000'] # 浦发银行

# 设置形成期formStart = '-01-01'formEnd = '-01-01'P_zhonghang_f = P_zhonghang[formStart:formEnd]P_pufa_f = P_pufa[formStart:formEnd]

# 形成期：协整关系检验# 中国银行（A）一阶单整检验log_P_zhonghang_f = np.log(P_zhonghang_f)adf_zhonghang = ADF(log_P_zhonghang_f)print(adf_zhonghang.summary().as_text())

Augmented Dickey-Fuller Results =====================================Test Statistic 3.409P-value1.000Lags 12-------------------------------------Trend: ConstantCritical Values: -3.46 (1%), -2.87 (5%), -2.57 (10%)Null Hypothesis: The process contains a unit root.Alternative Hypothesis: The process is weakly stationary.

# 形成期：差分后的协整检验adf_zhonghang_diff = ADF(log_P_zhonghang_f.diff()[1:])print(adf_zhonghang_diff.summary().as_text())

Augmented Dickey-Fuller Results =====================================Test Statistic -4.571P-value0.000Lags 11-------------------------------------Trend: ConstantCritical Values: -3.46 (1%), -2.87 (5%), -2.57 (10%)Null Hypothesis: The process contains a unit root.Alternative Hypothesis: The process is weakly stationary.

# 形成期：对数序列可视化log_P_zhonghang_f.plot()plt.title('图1 中国银行对数收益率序列趋势（形成期）')plt.show()

# 形成期：差分对数序列可视化log_P_zhonghang_f.diff()[1:].plot()plt.title('图2 中国银行差分对数收益率序列趋势（形成期）')plt.show()

# 形成期：协整关系检验# 浦发银行（B）一阶单整检验log_P_pufa_f = np.log(P_pufa_f)adf_pufa = ADF(log_P_pufa_f)print(adf_pufa.summary().as_text())

Augmented Dickey-Fuller Results =====================================Test Statistic 2.392P-value0.999Lags 12-------------------------------------Trend: ConstantCritical Values: -3.46 (1%), -2.87 (5%), -2.57 (10%)Null Hypothesis: The process contains a unit root.Alternative Hypothesis: The process is weakly stationary.

# 形成期：差分后的协整检验adf_pufa_diff = ADF(log_P_pufa_f.diff()[1:])print(adf_pufa_diff.summary().as_text())

Augmented Dickey-Fuller Results =====================================Test Statistic -3.888P-value0.002Lags 11-------------------------------------Trend: ConstantCritical Values: -3.46 (1%), -2.87 (5%), -2.57 (10%)Null Hypothesis: The process contains a unit root.Alternative Hypothesis: The process is weakly stationary.

# 形成期：对数序列可视化log_P_pufa_f.plot()plt.title('图3 浦发银行对数收益率序列趋势（形成期）')plt.show()

# 形成期：差分对数序列可视化log_P_pufa_f.diff()[1:].plot()plt.title('图4 浦发银行差分对数收益率序列趋势（形成期）')plt.show()

形成期：股票对的回归方程（协整模型）

model = sm.OLS(log_P_pufa_f,sm.add_constant(log_P_zhonghang_f))result = model.fit()print(result.summary())

OLS Regression Results ==============================================================================Dep. Variable: 600000 R-squared: 0.949Model: OLS Adj. R-squared: 0.949Method: Least Squares F-statistic: 4560.Date:Fri, 20 Nov Prob (F-statistic):1.83e-159Time: 10:59:05 Log-Likelihood: 509.57No. Observations: 245 AIC: -1015.Df Residuals: 243 BIC: -1008.Df Model: 1 Covariance Type: nonrobust ==============================================================================coef std errtP>|t|[0.0250.975]------------------------------------------------------------------------------const1.22690.01583.0710.000 1.198 1.256601988 1.06410.01667.5310.000 1.033 1.095==============================================================================Omnibus: 19.538 Durbin-Watson: 0.161Prob(Omnibus): 0.000 Jarque-Bera (JB):13.245Skew: 0.444 Prob(JB): 0.00133Kurtosis: 2.286 Cond. No.15.2==============================================================================Warnings:[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.D:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py:2389: FutureWarning: Method .ptp is deprecated and will be removed in a future version. Use numpy.ptp instead.return ptp(axis=axis, out=out, **kwargs)

# 设置alpha、beta（交易头寸，即投资比例）alpha = result.params[0]beta = result.params[1]

形成期：残差单位根检验（残差序列为价格差序列）

# 回归方程（形成期）spread_f = log_P_pufa_f - beta * log_P_zhonghang_f - alphaadfSpread = ADF(spread_f)print(adfSpread.summary().as_text())

Augmented Dickey-Fuller Results =====================================Test Statistic -3.193P-value0.020Lags 0-------------------------------------Trend: ConstantCritical Values: -3.46 (1%), -2.87 (5%), -2.57 (10%)Null Hypothesis: The process contains a unit root.Alternative Hypothesis: The process is weakly stationary.

# 计算协整方程中残差序列的均值、方差mu = np.mean(spread_f)sd = np.std(spread_f)

构造Pair Trading类（封装协整检验）

# 构建类(协整检验)class PairTrading:# 形成期的协整检验（形成期协整方程残差的ADF检验）def cointegration(self,priceX,priceY):log_priceX = np.log(priceX)log_priceY = np.log(priceY)results = sm.OLS(log_priceY,sm.add_constant(log_priceX)).fit()resid = results.residadfSpread = ADF(resid)if adfSpread.pvalue >= 0.05:print('''交易价格不具有协整关系.P-value of ADF test: %fCoefficients of regression:Intercept: %fBeta: %f''' % (adfSpread.pvalue, results.params[0], results.params[1]))return(None)else:print('''交易价格具有协整关系.P-value of ADF test: %fCoefficients of regression:Intercept: %fBeta: %f''' % (adfSpread.pvalue, results.params[0], results.params[1]))return(results.params[0], results.params[1])# 根据形成期的协整方程系数构造交易期协整方程，进而计算交易期协整残差序列：Co_Spread_Tradedef CointegrationSpread(self,priceX,priceY,formStart,formEnd,tradeStart,tradeEnd):formX = priceX[formStart:formEnd]formY = priceY[formStart:formEnd]tradeX = priceX[tradeStart:tradeEnd]tradeY = priceY[tradeStart:tradeEnd]coefficients = self.cointegration(formX,formY)if coefficients is None:print('未形成协整关系,无法配对.')else:spread = np.log(tradeY) - coefficients[0] - coefficients[1] * np.log(tradeX)return(spread)# 计算形成期的最优策略阈值：UpperBound 和 LowerBound，作为交易期的策略阈值def Cointegration_calBound(self,priceX,priceY,formStart,formEnd,width):spread = self.CointegrationSpread(priceX,priceY,formStart,formEnd,formStart,formEnd)mu = np.mean(spread)sd = np.std(spread)UpperBound = mu + width * sdLowerBound = mu - width * sdreturn(UpperBound,LowerBound)

# 设定交易期tradeStart = '-01-01'tradeEnd = '-06-30'

# 选取标的price_zhonghang = sh['601988']price_pufa = sh['600000']

# 形成期标的价格price_zhonghang_f = price_zhonghang[formStart:formEnd]price_pufa_f = price_pufa[formStart:formEnd]

# 类的实例化pt = PairTrading() # 调用类PairTrading（）

# 协整关系的系数（形成期）coefficients = pt.cointegration(price_zhonghang_f,price_pufa_f)

交易价格具有协整关系.P-value of ADF test: 0.020415Coefficients of regression:Intercept: 1.226852Beta: 1.064103

# 形成期：价差序列CoSpreadForm = pt.CointegrationSpread(price_zhonghang,price_pufa,formStart,formEnd,formStart,formEnd)

交易价格具有协整关系.P-value of ADF test: 0.020415Coefficients of regression:Intercept: 1.226852Beta: 1.064103

# 绘制价差序列图（形成期）plt.figure(figsize=(12,6))CoSpreadForm.plot()plt.title('图5 价差序列(协整配对)（形成期）',loc='center', fontsize=16)plt.axhline(y=mu,color='black')plt.axhline(y=mu+0.2*sd,color='blue',ls='-',lw=2)plt.axhline(y=mu-0.2*sd,color='blue',ls='-',lw=2)plt.axhline(y=mu+1.5*sd,color='green',ls='--',lw=2.5)plt.axhline(y=mu-1.5*sd,color='green',ls='--',lw=2.5)plt.axhline(y=mu+2.5*sd,color='red',ls='-.',lw=3) plt.axhline(y=mu-2.5*sd,color='red',ls='-.',lw=3) plt.show()

# 交易期：价差序列CoSpreadTrade = pt.CointegrationSpread(price_zhonghang,price_pufa,formStart,formEnd,tradeStart,tradeEnd)

交易价格具有协整关系.P-value of ADF test: 0.020415Coefficients of regression:Intercept: 1.226852Beta: 1.064103

# 根据形成期协整配对后价差序列得到的阈值bound = pt.Cointegration_calBound(price_zhonghang,price_pufa,formStart,formEnd,width=1.5)

交易价格具有协整关系.P-value of ADF test: 0.020415Coefficients of regression:Intercept: 1.226852Beta: 1.06410

交易期：回测检验

# 提取交易期标的数据P_zhonghang_t = P_zhonghang[tradeStart:tradeEnd]P_pufa_t = P_pufa[tradeStart:tradeEnd]

# 计算交易期协整方程的残差序列（价差序列）CoSpreadTrade = np.log(P_pufa_t) - beta * np.log(P_zhonghang_t) - alpha

# 绘制价格区间图（交易期）plt.figure(figsize=(12,6))CoSpreadTrade.plot()plt.title('图6 价差序列(协整配对)（交易期）',loc='center', fontsize=16)plt.axhline(y=mu,color='black')plt.axhline(y=mu+0.2*sd,color='blue',ls='-',lw=2)plt.axhline(y=mu-0.2*sd,color='blue',ls='-',lw=2)plt.axhline(y=mu+1.5*sd,color='green',ls='--',lw=2.5)plt.axhline(y=mu-1.5*sd,color='green',ls='--',lw=2.5)plt.axhline(y=mu+2.5*sd,color='red',ls='-.',lw=3) plt.axhline(y=mu-2.5*sd,color='red',ls='-.',lw=3) plt.show()

设置策略区间

# 设置触发区间：t_0=0.2; t_1=1.5; t_2=2.5level = (float('-inf'),mu-2.5*sd,mu-1.5*sd,mu-0.2*sd,mu+0.2*sd,mu+1.5*sd,mu+2.5*sd,float('inf'))

# 把交易期的价差序列按照触发区间标准分类【-3，+3】prcLevel = pd.cut(CoSpreadTrade,level,labels=False)-3 #剪切函数pd.cut()

pandas.cut: pandas.cut(x, bins, right=True, labels=None,
retbins=False, precision=3, include_lowest=False)
参数：
(1) x，类array对象，且必须为一维，待切割的原形式(2) bins, 整数、序列尺度、或间隔索引。如果bins是一个整数，它定义了x宽度范围内的等宽面元数量，但是在这种情况下，x的范围在每个边上被延长1%，以保证包括x的最小值或最大值。如果bin是序列，它定义了允许非均匀bin宽度的bin边缘。在这种情况下没有x的范围的扩展。(3) right,布尔值。是否是左开右闭区间(4) labels,用作结果箱的标签。必须与结果箱相同长度。如果FALSE，只返回整数指标面元(区间位置，即第几个区间，注意以0位置开始)。(5) retbins,布尔值。是否返回面元(6) precision，整数。返回面元的小数点几位(7) include_lowest，布尔值。第一个区间的左端点是否包含返回值：若labels为False则返回整数填充的Categorical(分类)或数组或Series(序列)若retbins为True还返回用浮点数填充的N维数组

构造交易信号函数

# 构造交易信号函数def TradeSig(prcLevel):n = len(prcLevel)signal = np.zeros(n)for i in range(1,n):if prcLevel[i-1] == 1 and prcLevel[i] == 2: # 价差从1区上穿2区，反向建仓signal[i] = -2elif prcLevel[i-1] == 1 and prcLevel[i] == 0: # 价差从1区下穿0区，平仓signal[i] = 2elif prcLevel[i-1] == 2 and prcLevel[i] == 3: # 价差从2区上穿3区，即突破3区，平仓signal[i] = 3elif prcLevel[i-1] == -1 and prcLevel[i] == -2:# 价差从-1区下穿-2区，正向建仓signal[i] = 1elif prcLevel[i-1] == -1 and prcLevel[i] == 0:# 价差从-1区上穿0区，平仓signal[i] = -1elif prcLevel[i-1] == -2 and prcLevel[i] == -3:# 价差从-2区下穿-3区，即突破-3区，平仓signal[i] = -3return(signal)

# 设置每个每天的交易信号signal = TradeSig(prcLevel)

position = [signal[0]]ns = len(signal)

# 设置每天开仓、平仓指令for i in range(1,ns):position.append(position[-1])if signal[i] == 1:position[i] = 1 # 价差从-1区下穿-2区，正向建仓: 买B卖A <------（价差为B-A）elif signal[i] == -2:position[i] = -1# 价差从1区上穿2区，反向建仓：卖B买A <------（价差为B-A）elif signal[i] == -1 and position[i-1] == 1:position[i] = 0# 平仓elif signal[i] == 2 and position[i-1] == -1:position[i] = 0# 平仓elif signal[i] == 3:position[i] = 0 # 平仓elif signal[i] == -3:position[i] = 0 # 平仓

position = pd.Series(position, index=CoSpreadTrade.index)

# 构造交易模拟函数def TradeSim(priceX,priceY,position):n = len(position)size = 1000 # 因为浦发银行价格15左右，1000股对应15000,10%—20%保证金即：初始资金cash=1500-3000shareY = size * positionshareX = [(-beta) * shareY[0] * priceY[0] / priceX[0]] # CoSpreadT = np.log(PBt)-beta*np.log(PAt)-alphacash = [2000]for i in range(1,n):shareX.append(shareX[i-1])cash.append(cash[i-1])if position[i-1] == 0 and position[i] == 1:shareX[i] = (-beta)*shareY[i]*priceY[i]/priceX[i]cash[i] = cash[i-1]-(shareY[i]*priceY[i]+shareX[i]*priceX[i])elif position[i-1] == 0 and position[i] == -1:shareX[i] = (-beta)*shareY[i]*priceY[i]/priceX[i]cash[i] = cash[i-1]-(shareY[i]*priceY[i]+shareX[i]*priceX[i])elif position[i-1] == 1 and position[i] == 0:shareX[i] = 0cash[i] = cash[i-1]+(shareY[i-1]*priceY[i]+shareX[i-1]*priceX[i])elif position[i-1] == -1 and position[i] == 0:shareX[i] = 0cash[i] = cash[i-1]+(shareY[i-1]*priceY[i]+shareX[i-1]*priceX[i])cash = pd.Series(cash,index=position.index)shareY = pd.Series(shareY,index=position.index)shareX = pd.Series(shareX,index=position.index)asset = cash + shareY*priceY + shareX*priceXaccount = pd.DataFrame({'Position':position,'ShareY':shareY,'ShareX':shareX,'Cash':cash,'Asset':asset})return(account)

account = TradeSim(P_zhonghang_t,P_pufa_t,position)

account.iloc[:, [1,2,3,4]].plot(style=['--','--','-',':'], figsize=(16,8))plt.title('图6 配对交易账户（交易期）',loc='center', fontsize=16) plt.show()

其中，（1）ShareY 可以表示配对仓位（1000倍）（正反向持仓与不持仓）；（2）ShareX 表示对应配对仓位（1000倍）（反正向持仓与不持仓）；（3）Cash 曲线表示现金的变化，初始现金为2000元；（4）Asset 曲线表示资产的变化。

结论：

观察交易仓位曲线图，可以看出自1月1日到6月底期间，配对交易信号触发不多（计4次）。

观察现金曲线图，由于开仓可能需要现金，现金曲线有升有降，而第三次平仓之后获利很多，现金曲线大幅上涨，到6月底，现金部位达到了5992.514元。

再观察资产曲线图，配对资产整体呈现上升趋势，资产由2000元转变成5992.514元。

整体而言，对中国银行和浦发银行两只股票进行配对交易的策略绩效表现不错。

构造Back Testing类（封装回测检验）

class BackTesting:# 构造交易信号函数def TradeSigPosition(self,CoSpreadT,level,k):prcLevel = pd.cut(CoSpreadT,level,labels=False)-kn = len(prcLevel)signal = np.zeros(n)for i in range(1,n):if prcLevel[i-1] == 1 and prcLevel[i] == 2: # 价差从1区上穿2区，反向建仓signal[i] = -2elif prcLevel[i-1] == 1 and prcLevel[i] == 0: # 价差从1区下穿0区，平仓signal[i] = 2elif prcLevel[i-1] == 2 and prcLevel[i] == 3: # 价差从2区上穿3区，即突破3区，平仓signal[i] = 3elif prcLevel[i-1] == -1 and prcLevel[i] == -2:# 价差从-1区下穿-2区，正向建仓signal[i] = 1elif prcLevel[i-1] == -1 and prcLevel[i] == 0:# 价差从-1区上穿0区，平仓signal[i] = -1elif prcLevel[i-1] == -2 and prcLevel[i] == -3:# 价差从-2区下穿-3区，即突破-3区，平仓signal[i] = -3position = [signal[0]]ns = len(signal)# 设置每天开仓、平仓指令for i in range(1,ns):position.append(position[-1])if signal[i] == 1:position[i] = 1 # 价差从-1区下穿-2区，正向建仓: 买B卖A <------（价差为B-A）elif signal[i] == -2:position[i] = -1# 价差从1区上穿2区，反向建仓：卖B买A <------（价差为B-A）elif signal[i] == -1 and position[i-1] == 1:position[i] = 0# 平仓elif signal[i] == 2 and position[i-1] == -1:position[i] = 0# 平仓elif signal[i] == 3:position[i] = 0 # 平仓elif signal[i] == -3:position[i] = 0 # 平仓position = pd.Series(position, index=CoSpreadT.index)return(position)# 构造交易模拟函数def TradeSim(self,priceX,priceY,position,size,cash):n = len(position)# size = 1000 # 因为浦发银行价格15左右，1000股对应15000,10%—20%保证金即：初始资金cash=1500-3000shareY = size * positionshareX = [(-beta) * shareY[0] * priceY[0] / priceX[0]] # CoSpreadT = np.log(PBt)-beta*np.log(PAt)-alpha# cash = [2000]for i in range(1,n):shareX.append(shareX[i-1])cash.append(cash[i-1])if position[i-1] == 0 and position[i] == 1:shareX[i] = (-beta)*shareY[i]*priceY[i]/priceX[i]cash[i] = cash[i-1]-(shareY[i]*priceY[i]+shareX[i]*priceX[i])elif position[i-1] == 0 and position[i] == -1:shareX[i] = (-beta)*shareY[i]*priceY[i]/priceX[i]cash[i] = cash[i-1]-(shareY[i]*priceY[i]+shareX[i]*priceX[i])elif position[i-1] == 1 and position[i] == 0:shareX[i] = 0cash[i] = cash[i-1]+(shareY[i-1]*priceY[i]+shareX[i-1]*priceX[i])elif position[i-1] == -1 and position[i] == 0:shareX[i] = 0cash[i] = cash[i-1]+(shareY[i-1]*priceY[i]+shareX[i-1]*priceX[i])cash = pd.Series(cash,index=position.index)shareY = pd.Series(shareY,index=position.index)shareX = pd.Series(shareX,index=position.index)asset = cash + shareY*priceY + shareX*priceXaccount = pd.DataFrame({'Position':position,'ShareY':shareY,'ShareX':shareX,'Cash':cash,'Asset':asset})return(account)

bt = BackTesting()position = bt.TradeSigPosition(CoSpreadTrade,level,3)account = bt.TradeSim(P_zhonghang_t,P_pufa_t,position,1000,[2000])account.iloc[:, [1,2,3,4]].plot(style=['--','--','-',':'], figsize=(16,8))plt.title('配对交易账户（交易期）',loc='center', fontsize=16) plt.show()