How do you optimize the parameters of a factor algorithm?

Steps

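Step_1: Design and write the factor class: compute the factor stock by stock, combine the results into a DataFrame, and return it with a MultiIndex

Step_2: Stock pool and data loading

Step_3: Instantiate the factor class and get the factor values

Step_4: Visualize and compute IC performance

Step_5: Optimize the factor's parameters

Step_6: Visualize the best-performing result and save it to Excel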

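Step_1: Design and write the factor class: compute the factor stock by stock, combine the results into a DataFrame, and return it with a MultiIndex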

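  • The factor class: parent class, subclass, inheritance, instantiation
  • Designing and writing the parameters and the algorithm
  • Data format and how the per-stock results are combined
  • Returning a MultiIndex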
# -*- coding: utf-8 -*-
import pandas as pd
from fxdayu_alphaman.factor import Factor

class Factor_Volume001(Factor):

    c = 3

    def calculate_factor_by_column(self, data):
        # Compute the volume001 factor one stock at a time; data is a (code, DataFrame) pair
        candle_data = data[1].dropna()
        if len(candle_data) == 0:
            return
        high = candle_data["high"]
        volume = candle_data["volume"]
        adv_s = self.ts_mean(volume, 10)
        result = -self.correlation(high, adv_s, self.c)  # compute the factor value
        result.index = candle_data.index
        result = pd.DataFrame(result)
        result.columns = [data[0]]
        return result

    def factor_calculator(self, pn_data):
        # volume001
        factor = map(self.calculate_factor_by_column, pn_data.iteritems())
        factor = pd.concat(factor, axis=1)
        factor = self.winsorize(factor)  # clip extreme values
        factor = self.standardize(factor)  # standardize
        factor = self.factor_df_to_factor_mi(factor)  # convert to MultiIndex format (equivalent to the stack() method)
        return factor
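
To make the per-stock calculation concrete, here is a minimal sketch in plain pandas on synthetic data. It assumes that `ts_mean` behaves like a rolling mean and `correlation` like a rolling Pearson correlation; the `Factor` base class is not used, and the helper name `volume_factor_one_stock` is hypothetical.

```python
import numpy as np
import pandas as pd


def volume_factor_one_stock(candle, mean_window=10, c=3):
    """Negative rolling correlation between `high` and the rolling mean of `volume`.

    Assumption: ts_mean(x, n) ~ x.rolling(n).mean() and
    correlation(x, y, n) ~ x.rolling(n).corr(y).
    """
    adv = candle["volume"].rolling(mean_window).mean()
    return -candle["high"].rolling(c).corr(adv)


# Synthetic candles for a single stock, for illustration only.
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-05", periods=30, freq="B")
candle = pd.DataFrame(
    {
        "high": 10 + rng.normal(size=30).cumsum(),
        "volume": rng.integers(1_000, 5_000, size=30).astype(float),
    },
    index=dates,
)

factor = volume_factor_one_stock(candle, c=3)
print(factor.dropna().head())
```

Under these assumptions, a 10-day mean followed by a 3-day correlation leaves the first 11 rows empty, so the first factor value appears on the 12th row. Raising `c` lengthens that warm-up period, which is worth keeping in mind when comparing parameter values over a fixed date range.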

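Step_2: Stock pool and data loading

  • Select the stock codes of the CSI 300 (hs300) constituents; they must be passed as a tuple
  • Fetch the Panel data used to compute the factor and the DataFrame of prices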
from datetime import datetime
from fxdayu_data import DataAPI
# Time settings
start = datetime(2015, 1, 1)
end = datetime(2017, 8, 1, 15)
periods = (1, 5, 10)

# Fetch the data
codes = DataAPI.info.codes('hs300')
PN = DataAPI.candle(codes, 'D', ('high','close', 'volume'), start=start, end=end, adjust='after')
prices = PN.minor_xs('close')
print(PN)
print(prices.head())
<class 'pandas.core.panel.Panel'>
Dimensions: 300 (items) x 629 (major_axis) x 3 (minor_axis)
Items axis: 000001.XSHE to 603993.XSHG
Major_axis axis: 2015-01-05 15:00:00 to 2017-08-01 15:00:00
Minor_axis axis: close to volume
                     000001.XSHE  000002.XSHE  000008.XSHE  000009.XSHE  \
datetime                                                                  
2015-01-05 15:00:00   889.136424  1340.204221    30.280252          NaN   
2015-01-06 15:00:00   875.850859  1290.735544    30.462517          NaN   
2015-01-07 15:00:00   859.160869  1279.118634    31.161198    71.912931   
2015-01-08 15:00:00   830.264765  1221.614928    32.151504    72.015559   
2015-01-09 15:00:00   836.990582  1209.029942    32.333768    70.021643

                     000027.XSHE  000039.XSHE  000060.XSHE  000061.XSHE  \
datetime                                                                  
2015-01-05 15:00:00    62.456065   491.222155   244.407901   285.692765   
2015-01-06 15:00:00    62.569829   487.457473   253.067838   273.653467   
2015-01-07 15:00:00    65.090124   474.786104   256.338847   272.370415   
2015-01-08 15:00:00    62.456065   461.012876   250.032154   268.991713   
2015-01-09 15:00:00    60.259558   454.103307   243.466604   262.020468

                     000063.XSHE  000069.XSHE     ...       601933.XSHG  \
datetime                                          ...                     
2015-01-05 15:00:00   213.049186   323.423454     ...         18.016449   
2015-01-06 15:00:00   225.373661   310.671992     ...         17.974898   
2015-01-07 15:00:00   220.888270   306.537427     ...         17.895951   
2015-01-08 15:00:00   224.034945   294.520139     ...         17.638336   
2015-01-09 15:00:00   219.094113   290.385574     ...         17.185432

                     601939.XSHG  601958.XSHG  601985.XSHG  601988.XSHG  \
datetime                                                                  
2015-01-05 15:00:00     8.443892    13.505666          NaN     5.533117   
2015-01-06 15:00:00     8.321436    13.165430          NaN     5.708088   
2015-01-07 15:00:00     8.089190    13.349342          NaN     5.683707   
2015-01-08 15:00:00     7.748563    13.047202          NaN     5.421250   
2015-01-09 15:00:00     7.809087    12.851467          NaN     5.596221

                     601989.XSHG  601998.XSHG  603000.XSHG  603885.XSHG  \
datetime                                                                  
2015-01-05 15:00:00    15.737842     9.711167    42.404787          NaN   
2015-01-06 15:00:00    16.123040     9.436754    44.042705          NaN   
2015-01-07 15:00:00    15.989058     9.413886    43.324463          NaN   
2015-01-08 15:00:00    15.287328     8.973047    43.123029          NaN   
2015-01-09 15:00:00    14.803319     9.044191    42.748648          NaN

                     603993.XSHG  
datetime                          
2015-01-05 15:00:00     3.302398  
2015-01-06 15:00:00     3.398134  
2015-01-07 15:00:00     3.514912  
2015-01-08 15:00:00     3.418123  
2015-01-09 15:00:00     3.394978

[5 rows x 300 columns]
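
`PN.minor_xs('close')` slices the Panel so that dates become rows and stock codes become columns. `pandas.Panel` has been removed from recent pandas releases, so the following sketch shows one way to build the same wide price table from a dictionary of per-stock DataFrames. The `candles` dictionary and its values are made up for illustration; they are not output of `DataAPI`.

```python
import pandas as pd

dates = pd.date_range("2015-01-05", periods=3, freq="B")

# Hypothetical per-stock candles; real data would come from a data API.
candles = {
    "000001.XSHE": pd.DataFrame(
        {"close": [10.0, 10.2, 10.1], "volume": [100.0, 120.0, 90.0]}, index=dates
    ),
    "000002.XSHE": pd.DataFrame(
        {"close": [20.0, 19.8, 20.4], "volume": [300.0, 280.0, 310.0]}, index=dates
    ),
}

# Same layout as PN.minor_xs('close'): one row per date, one column per stock code.
prices = pd.DataFrame({code: df["close"] for code, df in candles.items()})
prices.index.name = "datetime"
print(prices)
```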

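Step_3: Instantiate the factor class and get the factor and return data

  • Use the get_factor method
  • Pass in the Panel data
  • Use alphalens's utils.get_clean_factor_and_forward_returns to obtain forward returns (called inside plot_performance in Step_4)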
volume001 = Factor_Volume001()
factor = volume001.get_factor(PN)
factor.head()
                                   factor
date                asset                
2015-01-20 15:00:00 000001.XSHE -0.604585
                    000002.XSHE -0.618333
                    000008.XSHE -0.632032
                    000027.XSHE -0.615691
                    000039.XSHE -0.627686
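
The (date, asset) layout and the idea of a forward return can be sketched with plain pandas. This is an illustration on made-up numbers, not the code of `get_factor` or of alphalens; the helper `forward_returns` is hypothetical.

```python
import pandas as pd

dates = pd.date_range("2015-01-05", periods=4, freq="B", name="date")

# Wide factor table: one row per date, one column per asset.
wide = pd.DataFrame(
    {"A": [0.1, 0.2, 0.3, 0.4], "B": [-0.1, -0.2, -0.3, -0.4]}, index=dates
)
wide.columns.name = "asset"

# stack() turns the wide table into a Series indexed by (date, asset).
factor_mi = wide.stack()

prices = pd.DataFrame(
    {"A": [10.0, 11.0, 12.1, 12.1], "B": [20.0, 19.0, 19.0, 20.9]}, index=dates
)


def forward_returns(prices, period):
    """Return earned from each date over the next `period` rows."""
    return prices.shift(-period) / prices - 1


fwd1 = forward_returns(prices, 1)
print(factor_mi.head())
print(fwd1)
```

The last `period` rows of a forward-return table are empty because the future prices are not available yet, which is why the evaluated sample is slightly shorter than the price history.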

Step_4: Visualize and compute IC performance

  • Use alphalens's plotting module to visualize the IC
  • Use calculate_performance to compute performance
from alphalens import utils,performance,plotting
import matplotlib.pyplot as plt

def plot_performance(factor, prices):

    factor_data = utils.get_clean_factor_and_forward_returns(factor,prices,quantiles=5,periods=(1,5,10))
    ic = performance.factor_information_coefficient(factor_data)
    plotting.plot_ic_hist(ic)
    plotting.plot_ic_ts(ic)

    mean_ic = performance.mean_information_coefficient(factor_data,by_time="M")
    plotting.plot_monthly_ic_heatmap(mean_ic)

    # Mean holding return per quantile (demeaned by the overall mean)
    mean_return_by_q = performance.mean_return_by_quantile(factor_data, by_date=True, demeaned=True)[0]
    # Plot cumulative holding returns by quantile
    for i in [1, 5, 10]:
        plotting.plot_cumulative_returns_by_quantile(mean_return_by_q, period=i)
    plt.show()

plot_performance(factor,prices)

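[Figures omitted: IC histograms, IC time series, monthly mean IC heatmap, and cumulative returns by quantile for the 1-, 5-, and 10-day periods]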

from fxdayu_alphaman.factor.admin import Admin

factor_admin = Admin()
original_perf = factor_admin.calculate_performance('Volume001',
                                                   factor,
                                                   start,
                                                   end,
                                                   periods=(1,5,10),
                                                   quantiles=5,
                                                   price=prices)
original_perf.mean_ic
           0
1   0.014715
5   0.018427
10  0.015767
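
For readers unfamiliar with the metric: the IC (information coefficient) is the rank correlation between factor values and forward returns, computed across stocks on each date; the mean IC averages those daily values. Below is a minimal sketch for a single holding period on made-up numbers. The helper `daily_rank_ic` and the column name `forward_return` are hypothetical, and the rank correlation is obtained by ranking both columns and taking the Pearson correlation.

```python
import pandas as pd


def daily_rank_ic(data, factor_col="factor", ret_col="forward_return"):
    """Spearman (rank) correlation between factor and forward return, per date."""
    ics = {}
    for date, group in data.groupby(level="date"):
        # Pearson correlation of the ranks equals the Spearman correlation.
        ics[date] = group[factor_col].rank().corr(group[ret_col].rank())
    return pd.Series(ics, name="ic")


dates = pd.date_range("2015-01-20", periods=2, freq="B")
index = pd.MultiIndex.from_product([dates, list("ABCD")], names=["date", "asset"])
data = pd.DataFrame(
    {
        "factor": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
        "forward_return": [0.01, 0.02, 0.03, 0.04, 0.02, 0.01, 0.04, 0.03],
    },
    index=index,
)

ic = daily_rank_ic(data)
mean_ic = ic.mean()
print(ic)
print(mean_ic)
```

On the first date the factor orders the stocks exactly as their returns do (IC of 1.0); on the second date two pairs are swapped (IC of 0.6), giving a mean IC of 0.8. The table above reports one such mean per holding period.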

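Step_5: Optimize the factor's parameters, then retrieve the factor values and performance for the best parameters

  • Use enumerate_parameter to enumerate the parameter space; it returns a list of factor values and a list of parameter dictionaries
  • Use show_factors_performance to compute performance; it returns a performance object for each parameter set
  • Use rank_performance to sort the results, then print the performance and parameters of the top-ranked one
  • Use instantiate_factor_and_get_factor_value to instantiate the factor and return the factor values for the best parameters
  • Use calculate_performance to compute the performance of the best factor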
# Parameter optimization

para_range_dict = {"c":range(3,11,1)}
# 1. Enumerate the parameter space
factor_value_list,para_dict_list = factor_admin.enumerate_parameter('Factor_Volume001',
                                                                    para_range_dict,
                                                                    codes,start,
                                                                    end, 
                                                                    Factor=volume001,data=PN)
factor_name_list = []
for para_dict in para_dict_list:
    factor_name_list.append("Factor_Volume001+" + str(para_dict))

factors_dict = dict(zip(factor_name_list,factor_value_list))

# 2. Evaluate the performance of the factor values computed under each parameter set
performance_list = factor_admin.show_factors_performance(factors_dict,
                                                         start,
                                                         end,
                                                         periods=(1,5,10),
                                                         quantiles=5,
                                                         price=prices)

# 3. Sort the results by a performance metric (the search step).
# Here all results are sorted in descending order of the 10-day holding period mean IC.
performance_list = factor_admin.rank_performance(performance_list,
                                                 target_period=10,
                                                 ascending=False)

# Print the IC and the name of the best factor
print(performance_list[0].mean_ic)
print(performance_list[0].factor_name)
           0
1   0.015963
5   0.019791
10  0.026054
Factor_Volume001+{'c': 6}
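
The enumerate, score, and rank pattern used above is an exhaustive grid search. The sketch below reproduces the pattern with the standard library only; it does not call `Admin`, and `enumerate_params`, `rank_by_score`, and `toy_score` are hypothetical names. The toy score is chosen to peak at `c == 6` only so that the example has a known answer; in practice the score would be the mean IC for the target period.

```python
from itertools import product


def enumerate_params(para_range_dict):
    """Expand {"name": iterable_of_values} into a list of parameter dicts."""
    names = sorted(para_range_dict)
    value_lists = [para_range_dict[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def rank_by_score(candidates, score_fn, ascending=False):
    """Score every candidate and sort the (score, params) pairs."""
    scored = [(score_fn(params), params) for params in candidates]
    return sorted(scored, key=lambda item: item[0], reverse=not ascending)


def toy_score(params):
    # Stand-in for a real metric such as the 10-day mean IC.
    return -abs(params["c"] - 6)


grid = enumerate_params({"c": range(3, 11)})
ranked = rank_by_score(grid, toy_score)
best_score, best_params = ranked[0]
print(best_params)
```

Because every parameter set is scored on the same sample, the top-ranked result is an in-sample optimum; checking it on a held-out date range is a sensible follow-up.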
factor_opt = factor_admin.instantiate_factor_and_get_factor_value('Factor_Volume001',
                                                                  start=start, 
                                                                  end=end, 
                                                                  pool=codes, 
                                                                  data=PN, 
                                                                  Factor=volume001,
                                                                  para_dict={'c':6})
perf = factor_admin.calculate_performance('Volume001',
                                          factor_opt,
                                          start,
                                          end,
                                          periods=(1,5,10),
                                          quantiles=5, 
                                          price=prices)
perf.mean_ic
           0
1   0.015963
5   0.019791
10  0.026054

Step_6: Visualize the best-performing result and save it to Excel

plot_performance(factor_opt,prices)

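[Figures omitted: IC histograms, IC time series, monthly mean IC heatmap, and cumulative returns by quantile for the 1-, 5-, and 10-day periods, for the optimized factor]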

import alphalens
import numpy as np

quantile = alphalens.utils.quantize_factor(factor_opt,quantiles=5)
factor_sheet = quantile[quantile==5].unstack().replace(np.nan, 0).replace(5, 1)
factor_sheet.to_excel('factor_opt.xlsx')
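
The last cell marks, for each date, which stocks fall in the top factor quantile and writes that 0/1 table to Excel. The following sketch shows the same idea on made-up data, using `pd.qcut` as a stand-in for `alphalens.utils.quantize_factor`; the helper `to_quantile` is hypothetical, and the Excel export is left out so the example has no extra dependencies.

```python
import pandas as pd

dates = pd.date_range("2015-01-20", periods=2, freq="B")
assets = ["A", "B", "C", "D", "E"]
index = pd.MultiIndex.from_product([dates, assets], names=["date", "asset"])
factor = pd.Series(
    [0.1, 0.5, 0.3, 0.9, 0.7, 0.8, 0.2, 0.6, 0.4, 1.0], index=index, name="factor"
)


def to_quantile(values, q=5):
    """Bucket one date's factor values into q quantiles labelled 1..q."""
    return pd.qcut(values, q, labels=False) + 1


# Quantile label per (date, asset), computed separately for each date.
quantile = factor.groupby(level="date").transform(to_quantile)

# 1 where the stock is in the top quantile on that date, 0 otherwise.
top_sheet = (quantile == 5).astype(int).unstack("asset")
print(top_sheet)
```

Each row of `top_sheet` is a date and each column a stock, so the table can be read directly as a list of holdings for a top-quantile portfolio.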