
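WorldQuant101 factor set complete; next up, rebuilding the Qlib Alpha158 factors (code + data)

AI量化实验室 (AI Quant Lab), 2024-04-23, published in Beijing

Original article No. 519, focused on "AI quantitative investing, the laws by which the world runs, and personal growth and financial freedom".

In the articles from the past two days, we started building the WorldQuant101 factor set:

Integrating the WorldQuant101 factor set into the Quantlab expression engine

Reproducing the WorldQuant101 factor expression engine in Quantlab (code + data)

WorldQuant101 factor library

With the functions accumulated over the past two days, we can now simply add the factors one by one.

Look closely and you will find that most WorldQuant101 factors are "similar" in their underlying idea.

They come down to the price-volume relationship: rank the inputs, then take their correlation, which captures "price-volume divergence".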

names.append('alpha012')
features.append('(sign(delta(volume, 1)) * (-1 * delta(close, 1)))')

names.append('alpha013')
features.append('(-1 * rank(covariance(rank(close), rank(volume), 5))) ')

names.append('alpha014')
features.append('((-1 * rank(delta(returns, 3))) * correlation(open, volume, 10))')

names.append('alpha015')
features.append('(-1 * sum(rank(correlation(rank(high), rank(volume), 3)), 3))')
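
To make the "rank, then correlate" idea concrete, here is a minimal pandas sketch of alpha003. It is not the Quantlab implementation: it assumes a wide layout (one row per date, one column per symbol), a percentile-style cross-sectional `rank`, and a per-symbol rolling `correlation`. The toy data and the symbol names are made up for illustration.

```python
import numpy as np
import pandas as pd


def rank(df: pd.DataFrame) -> pd.DataFrame:
    """Cross-sectional rank: each date (row) is ranked across symbols.

    Assumption: percentile ranks in (0, 1]; the real engine may rank differently.
    """
    return df.rank(axis=1, pct=True)


def correlation(x: pd.DataFrame, y: pd.DataFrame, d: int) -> pd.DataFrame:
    """Rolling d-day Pearson correlation, computed column by column (per symbol)."""
    return x.rolling(d).corr(y)


# Toy data, purely illustrative: 5 hypothetical symbols over 30 days.
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=30, freq="D")
symbols = ["A", "B", "C", "D", "E"]
open_ = pd.DataFrame(rng.uniform(9, 11, (30, 5)), index=dates, columns=symbols)
volume = pd.DataFrame(rng.uniform(1e5, 2e5, (30, 5)), index=dates, columns=symbols)

# alpha003: (-1 * correlation(rank(open), rank(volume), 10))
# A high value means the price rank and the volume rank moved in opposite
# directions over the last 10 days, i.e. price and volume diverged.
alpha003 = -1 * correlation(rank(open_), rank(volume), 10)
print(alpha003.tail(3))
```

The first nine rows are empty because a 10-day window is not yet full.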

Factors 29 and 30, for instance, are this long:

Alpha#29: (min(product(rank(rank(scale(log(sum(ts_min(rank(rank((-1 * rank(delta((close - 1), 5))))), 2), 1))))), 1), 5) + ts_rank(delay((-1 * returns), 6), 5)) 
Alpha#30: (((1.0 - rank(((sign((close - delay(close, 1))) + sign((delay(close, 1) - delay(close, 2)))) + sign((delay(close, 2) - delay(close, 3)))))) * sum(volume, 5)) / sum(volume, 20))

Our expression engine computes these without any trouble.
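
The engine itself is not shown in this article. The sketch below illustrates one way such an engine could work: register operators in a namespace and evaluate the expression string against the data fields. The names `OPS` and `calc_expr`, the use of `eval`, and the wide DataFrame layout are assumptions made for illustration, not the actual Quantlab code.

```python
import numpy as np
import pandas as pd


def delay(x: pd.DataFrame, d: int) -> pd.DataFrame:
    """Value of x d days ago."""
    return x.shift(d)


def sign(x: pd.DataFrame) -> pd.DataFrame:
    return np.sign(x)


def rank(x: pd.DataFrame) -> pd.DataFrame:
    """Cross-sectional percentile rank (assumed convention)."""
    return x.rank(axis=1, pct=True)


def ts_sum(x: pd.DataFrame, d: int) -> pd.DataFrame:
    """Rolling d-day sum."""
    return x.rolling(d).sum()


# Operator names as they appear inside the expression strings.
OPS = {"delay": delay, "sign": sign, "rank": rank, "sum": ts_sum}


def calc_expr(expr: str, fields: dict) -> pd.DataFrame:
    """Evaluate a factor expression string against named data fields.

    eval is used only because the expressions are trusted, hand-written strings.
    """
    namespace = {**OPS, **fields}
    return eval(expr, {"__builtins__": {}}, namespace)


# Toy data, purely illustrative: 4 hypothetical symbols over 25 days.
rng = np.random.default_rng(1)
dates = pd.date_range("2024-01-01", periods=25, freq="D")
cols = ["A", "B", "C", "D"]
close = pd.DataFrame(rng.uniform(9, 11, (25, 4)), index=dates, columns=cols)
volume = pd.DataFrame(rng.uniform(1e5, 2e5, (25, 4)), index=dates, columns=cols)

ALPHA030 = (
    "(((1.0 - rank(((sign((close - delay(close, 1))) + "
    "sign((delay(close, 1) - delay(close, 2)))) + "
    "sign((delay(close, 2) - delay(close, 3)))))) * sum(volume, 5)) / sum(volume, 20))"
)
alpha030 = calc_expr(ALPHA030, {"close": close, "volume": volume})
print(alpha030.tail(3))
```

Because each operator takes and returns a whole DataFrame, arbitrarily deep nesting costs nothing extra to support: Python's own parser handles the parentheses.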

Factor 31 involves a new function, decay_linear, which we just need to implement:

Alpha#31: ((rank(rank(rank(decay_linear((-1 * rank(rank(delta(close, 10)))), 10)))) + rank((-1 * delta(close, 3)))) + sign(scale(correlation(adv20, low, 12))))
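
In the WorldQuant 101 definitions, decay_linear(x, d) is a weighted moving average over the past d days with linearly decaying weights d, d-1, ..., 1, rescaled to sum to 1. The following is a minimal pandas sketch of that definition, assuming a wide DataFrame with one column per symbol; it is not necessarily how `datafeed.expr_functions` implements it.

```python
import numpy as np
import pandas as pd


def decay_linear(x: pd.DataFrame, d: int) -> pd.DataFrame:
    """Linearly decaying weighted moving average over the past d days.

    The most recent day gets weight d, the oldest gets weight 1,
    and the weights are rescaled to sum to 1.
    """
    # Rolling windows arrive ordered oldest -> newest, so weights run 1..d.
    weights = np.arange(1, d + 1, dtype=float)
    weights /= weights.sum()
    return x.rolling(d).apply(lambda w: float(np.dot(w, weights)), raw=True)


# Worked example with d = 3: weights are 1/6, 2/6, 3/6.
x = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0]})
out = decay_linear(x, 3)
# Row 2: (1*1 + 2*2 + 3*3) / 6 = 14/6
# Row 3: (1*2 + 2*3 + 3*4) / 6 = 20/6
print(out)
```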

After one full pass through WorldQuant 101, all the operator functions needed have been added.

from datafeed.expr_functions import *


class AlphaBase:
    pass


# https://www./data/dict/alpha101
class WorldQuant101(AlphaBase):
    def get_names_features(self):
        names = []
        features = []

        # names.append('alpha001')
        # features.append('(rank(ts_argmax(signed_power((stddev(returns, 20) if (returns < 0) else close), 2.), '
        #                 '5)) - 0.5)')

        names.append('alpha002')
        features.append('(-1 * correlation(rank(delta(log(volume), 2)), rank(((close - open) / open)), 6))')

        names.append('alpha003')
        features.append('(-1 * correlation(rank(open), rank(volume), 10))')

        names.append('alpha004')
        features.append('(-1 * ts_rank(rank(low), 9))')

        '''
        Alpha#5: (rank((open - (sum(vwap, 10) / 10))) * (-1 * abs(rank((close - vwap)))))
        Alpha#6: (-1 * correlation(open, volume, 10))
        '''

        names.append('alpha006')
        features.append('(-1 * correlation(open, volume, 10))')

        '''
        Alpha#7: ((adv20 < volume) ? ((-1 * ts_rank(abs(delta(close, 7)), 60)) * sign(delta(close, 7))) : (-1 * 1))
        Alpha#8: (-1 * rank(((sum(open, 5) * sum(returns, 5)) - delay((sum(open, 5) * sum(returns, 5)), 10))))
        '''

        names.append('alpha008')
        features.append(
            '(-1 * rank(((sum(open, 5) * sum(returns, 5)) - delay((sum(open, 5) * sum(returns, 5)), 10))))')

        '''
        Alpha#11: ((rank(ts_max((vwap - close), 3)) + rank(ts_min((vwap - close), 3))) * rank(delta(volume, 3)))
        Alpha#12: (sign(delta(volume, 1)) * (-1 * delta(close, 1)))
        Alpha#13: (-1 * rank(covariance(rank(close), rank(volume), 5)))
        Alpha#14: ((-1 * rank(delta(returns, 3))) * correlation(open, volume, 10))
        Alpha#15: (-1 * sum(rank(correlation(rank(high), rank(volume), 3)), 3))
        '''
        names.append('alpha012')
        features.append('(sign(delta(volume, 1)) * (-1 * delta(close, 1)))')

        names.append('alpha013')
        features.append('(-1 * rank(covariance(rank(close), rank(volume), 5))) ')

        names.append('alpha014')
        features.append('((-1 * rank(delta(returns, 3))) * correlation(open, volume, 10))')

        names.append('alpha015')
        features.append('(-1 * sum(rank(correlation(rank(high), rank(volume), 3)), 3))')

        '''
        Alpha#18: (-1 * rank(((stddev(abs((close - open)), 5) + (close - open)) + correlation(close, open, 10))))
        Alpha#19: ((-1 * sign(((close - delay(close, 7)) + delta(close, 7)))) * (1 + rank((1 + sum(returns, 250)))))
        Alpha#20: (((-1 * rank((open - delay(high, 1)))) * rank((open - delay(close, 1)))) * rank((open - delay(low, 1))))
        '''
        names.append('alpha018')
        features.append('(-1 * rank(((stddev(abs((close - open)), 5) + (close - open)) + correlation(close, open, 10))))')

        names.append('alpha019')
        features.append(
            '((-1 * sign(((close - delay(close, 7)) + delta(close, 7)))) * (1 + rank((1 + sum(returns, 250)))))')

        names.append('alpha020')
        features.append(
            '(((-1 * rank((open - delay(high, 1)))) * rank((open - delay(close, 1)))) * rank((open - delay(low, 1))))')

        '''
        Alpha#22: (-1 * (delta(correlation(high, volume, 5), 5) * rank(stddev(close, 20))))
        Alpha#26: (-1 * ts_max(correlation(ts_rank(volume, 5), ts_rank(high, 5), 5), 3))
        Alpha#28: scale(((correlation(adv20, low, 5) + ((high + low) / 2)) - close))

        Alpha#29: (min(product(rank(rank(scale(log(sum(ts_min(rank(rank((-1 * rank(delta((close - 1), 5))))), 2), 1))))), 1), 5) + ts_rank(delay((-1 * returns), 6), 5))
        Alpha#30: (((1.0 - rank(((sign((close - delay(close, 1))) + sign((delay(close, 1) - delay(close, 2)))) + sign((delay(close, 2) - delay(close, 3)))))) * sum(volume, 5)) / sum(volume, 20))

        '''
        names.append('alpha022')
        features.append(
            '(-1 * (delta(correlation(high, volume, 5), 5) * rank(stddev(close, 20))))')

        names.append('alpha026')
        features.append(
            '(-1 * ts_max(correlation(ts_rank(volume, 5), ts_rank(high, 5), 5), 3))')

        # names.append('alpha028')
        # features.append(
        #     'scale(((correlation(adv20, low, 5) + ((high + low) / 2)) - close))')

        names.append('alpha029')
        features.append(
            '(min(product(rank(rank(scale(log(sum(ts_min(rank(rank((-1 * rank(delta((close - 1), 5))))), 2), 1))))), 1), 5) + ts_rank(delay((-1 * returns), 6), 5))')

        names.append('alpha030')
        features.append('(((1.0 - rank(((sign((close - delay(close, 1))) + sign((delay(close, 1) - delay(close, 2)))) + sign((delay(close, 2) - delay(close, 3)))))) * sum(volume, 5)) / sum(volume, 20))')

        '''
        Alpha#31: ((rank(rank(rank(decay_linear((-1 * rank(rank(delta(close, 10)))), 10)))) + rank((-1 * delta(close, 3)))) + sign(scale(correlation(adv20, low, 12))))
        Alpha#32: (scale(((sum(close, 7) / 7) - close)) + (20 * scale(correlation(vwap, delay(close, 5), 230))))
        Alpha#33: rank((-1 * ((1 - (open / close))^1)))
        Alpha#34: rank(((1 - rank((stddev(returns, 2) / stddev(returns, 5)))) + (1 - rank(delta(close, 1)))))
        Alpha#35: ((Ts_Rank(volume, 32) * (1 - Ts_Rank(((close + high) - low), 16))) * (1 - Ts_Rank(returns, 32)))
        '''

        names.append('alpha034')
        features.append(
            'rank(((1 - rank((stddev(returns, 2) / stddev(returns, 5)))) + (1 - rank(delta(close, 1)))))')

        names.append('alpha035')
        features.append(
            '((ts_rank(volume, 32) * (1 - ts_rank(((close + high) - low), 16))) * (1 - ts_rank(returns, 32)))')

        '''
        Alpha#36: (((((2.21 * rank(correlation((close - open), delay(volume, 1), 15))) + (0.7 * rank((open - close)))) + (0.73 * rank(Ts_Rank(delay((-1 * returns), 6), 5)))) + rank(abs(correlation(vwap, adv20, 6)))) + (0.6 * rank((((sum(close, 200) / 200) - open) * (close - open)))))
        Alpha#37: (rank(correlation(delay((open - close), 1), close, 200)) + rank((open - close)))
        Alpha#38: ((-1 * rank(Ts_Rank(close, 10))) * rank((close / open)))
        Alpha#39: ((-1 * rank((delta(close, 7) * (1 - rank(decay_linear((volume / adv20), 9)))))) * (1 + rank(sum(returns, 250))))
        Alpha#40: ((-1 * rank(stddev(high, 10))) * correlation(high, volume, 10))
        '''

        names.append('alpha037')
        features.append(
            '(rank(correlation(delay((open - close), 1), close, 200)) + rank((open - close)))')

        names.append('alpha038')
        features.append(
            '((-1 * rank(ts_rank(close, 10))) * rank((close / open)))')

        names.append('alpha040')
        features.append(
            '((-1 * rank(stddev(high, 10))) * correlation(high, volume, 10))')

        '''
        Alpha#44: (-1 * correlation(high, rank(volume), 5))
        Alpha#45: (-1 * ((rank((sum(delay(close, 5), 20) / 20)) * correlation(close, volume, 2)) * rank(correlation(sum(close, 5), sum(close, 20), 2))))
        '''

        names.append('alpha044')
        features.append(
            '(-1 * correlation(high, rank(volume), 5))')

        names.append('alpha045')
        features.append(
            '(-1 * ((rank((sum(delay(close, 5), 20) / 20)) * correlation(close, volume, 2)) * rank(correlation(sum(close, 5), sum(close, 20), 2))))')

        '''
        Alpha#52: ((((-1 * ts_min(low, 5)) + delay(ts_min(low, 5), 5)) * rank(((sum(returns, 240) - sum(returns, 20)) / 220))) * ts_rank(volume, 5))
        Alpha#53: (-1 * delta((((close - low) - (high - close)) / (close - low)), 9))
        Alpha#54: ((-1 * ((low - close) * (open^5))) / ((low - high) * (close^5)))
        Alpha#55: (-1 * correlation(rank(((close - ts_min(low, 12)) / (ts_max(high, 12) - ts_min(low, 12)))), rank(volume), 6))
        Alpha#60: (0 - (1 * ((2 * scale(rank(((((close - low) - (high - close)) / (high - low)) * volume)))) - scale(rank(ts_argmax(close, 10))))))
        '''

        names.append('alpha052')
        features.append(
            '((((-1 * ts_min(low, 5)) + delay(ts_min(low, 5), 5)) * rank(((sum(returns, 240) - sum(returns, 20)) / 220))) * ts_rank(volume, 5))')

        names.append('alpha053')
        features.append(
            '(-1 * delta((((close - low) - (high - close)) / (close - low)), 9))')

        names.append('alpha055')
        features.append(
            '(-1 * correlation(rank(((close - ts_min(low, 12)) / (ts_max(high, 12) - ts_min(low, 12)))), rank(volume), 6))')

        names.append('alpha060')
        features.append('(0 - (1 * ((2 * scale(rank(((((close - low) - (high - close)) / (high - low)) * volume)))) - scale(rank(ts_argmax(close, 10))))))')

        names.append('alpha101')
        features.append('((close - open) / ((high - low) + .001))')
        return names, features
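
The expressions above rely on operators imported from `datafeed.expr_functions`, whose source is not reproduced in this article. As a rough guide, here is how a few of them could be written in pandas, following the operator definitions published with WorldQuant 101. This sketch assumes a wide DataFrame (dates as rows, symbols as columns); the tie handling in `ts_rank` and the 0-based position returned by `ts_argmax` are my own choices and may differ from the real library.

```python
import numpy as np
import pandas as pd


def delta(x: pd.DataFrame, d: int) -> pd.DataFrame:
    """Today's value minus the value d days ago."""
    return x - x.shift(d)


def ts_rank(x: pd.DataFrame, d: int) -> pd.DataFrame:
    """Time-series rank of the latest value within its trailing d-day window.

    Returned as the fraction of window values <= the latest one, in (0, 1].
    """
    return x.rolling(d).apply(lambda w: float((w <= w[-1]).mean()), raw=True)


def ts_argmax(x: pd.DataFrame, d: int) -> pd.DataFrame:
    """Position (0 = oldest day) of the maximum within the trailing d-day window."""
    return x.rolling(d).apply(lambda w: float(np.argmax(w)), raw=True)


def scale(x: pd.DataFrame, a: float = 1.0) -> pd.DataFrame:
    """Rescale each date's cross-section so that sum(abs(x)) equals a."""
    return x.div(x.abs().sum(axis=1), axis=0) * a


# Tiny hand-checkable example with two hypothetical symbols.
close = pd.DataFrame({"A": [1.0, 3.0, 2.0, 5.0], "B": [4.0, 3.0, 2.0, 1.0]})
d1 = delta(close, 1)       # A: NaN, 2, -1, 3
tr = ts_rank(close, 3)     # A row 2: window [1, 3, 2] -> 2/3
am = ts_argmax(close, 3)   # A row 3: window [3, 2, 5] -> position 2
sc = scale(close)          # row 0: 1/5 and 4/5
print(d1, tr, am, sc, sep="\n\n")
```

Note that `scale` divides by zero on any date where every value is zero, so a real implementation would likely need to guard that case.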

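A few takeaways:

1. We have built up a library of operator functions that can be reused later for machine-driven factor mining, for example with gplearn and reinforcement learning.

2. Ideas for constructing factors. In fact, I had GPT do this in an earlier article:

Quantlab 3.9 code: built-in LLM factor mining, a full A-share data source, and a bundled GUI

The code covered in today's article can be found here:

AI量化实验室: the vast frontier of quantitative investing in 2024 (code is released on the 星球 (Planet) community, with at least one new version per week). Tomorrow we will work through Qlib's Alpha158.

Building a quant data center

Yesterday a reader asked about data. All the data used in this public account's articles is packaged with the code and released on the Planet.

To make it more convenient to use, though, we could consider building a data platform that updates automatically.

This also shows that we intend to pursue AI quant as a long-term endeavor.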

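Large language models (LLM)

The big news in the LLM space lately is that Llama 3 has been open-sourced.

As things stand, the llama3-8B version is something many small teams can work with on just a few GPUs, which is exciting news.

There are three things one can do with it:

1. Incremental (continued) pretraining, for example adding more Chinese datasets or domain datasets. An open question: if you only have a small dataset, how much effect can it have in incremental pretraining?
2. Instruction tuning, so that the model follows human instructions better.
3. SFT (supervised fine-tuning).

All of these require building your own datasets, instruction sets, domain datasets and so on, plus an evaluation method to assess the training results.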
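
As a small illustration of what "building an instruction set plus an evaluation split" can look like, the sketch below turns a few of the factor expressions from this article into instruction records and holds some out for evaluation. The `instruction` / `input` / `output` field layout is one common convention for instruction-tuning data, not something this article prescribes, and the prompt wording, function names, and split ratio are all hypothetical.

```python
import json
import random

# Factor expressions taken from the WorldQuant101 class earlier in this article.
FACTOR_EXAMPLES = [
    ("alpha003", "(-1 * correlation(rank(open), rank(volume), 10))"),
    ("alpha006", "(-1 * correlation(open, volume, 10))"),
    ("alpha012", "(sign(delta(volume, 1)) * (-1 * delta(close, 1)))"),
    ("alpha044", "(-1 * correlation(high, rank(volume), 5))"),
    ("alpha101", "((close - open) / ((high - low) + .001))"),
]


def build_records(examples):
    """Turn (name, expression) pairs into instruction-tuning records."""
    return [
        {
            "instruction": f"Write the expression for WorldQuant101 factor {name}.",
            "input": "",
            "output": expr,
        }
        for name, expr in examples
    ]


def split_train_eval(records, eval_ratio=0.2, seed=42):
    """Shuffle deterministically and hold out a fraction for evaluation."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_ratio))
    return shuffled[n_eval:], shuffled[:n_eval]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


records = build_records(FACTOR_EXAMPLES)
train, held_out = split_train_eval(records)
jsonl_text = to_jsonl(train)
print(jsonl_text)
```

The held-out records give a simple starting point for evaluation: ask the fine-tuned model the same instruction and compare its answer with the stored expression.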

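The other route is to interact with a large model through its API, using instructions (prompts). The advantage is that there is little to do yourself: just call the API. The drawbacks are that you cannot intervene when the model does not behave as expected, and that tokens are still fairly expensive.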

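Daily self-reflection

Lately I keep seeing workplace disputes triggered by non-compete agreements, with quite a few people even paying out substantial sums.

This used to happen only at large companies and among senior executives. For most of them the money was a minor matter; the point was more about patent protection, technical confidentiality, and similar concerns.

Now things seem to have changed.

Some of the people involved are just ordinary developers and operations staff.

In the internet industry of the past, people moved around quickly. Many companies working on a similar project would even poach staff from the competitor at high salaries.

In the internet industry of the past, letting people go was also quick. Now, with the new labor law in place, it basically cannot be settled without N+1 severance.

When a company cannot efficiently unleash its vitality, that is a burden. When talent cannot move freely, that is a burden too.

But from the standpoint of a mature, contract-respecting culture, and of work-life balance, it may not be a mistake.

After all, we work in order to live better.

In the next era, apart from people with a strong entrepreneurial appetite for risk, perhaps the "one-person company" will increasingly become a complement and an aspiration.

"USB-drive-style living": the one-person company earning a million a year is the trend of the future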

Previous articles:

Integrating the WorldQuant101 factor set into the Quantlab expression engine

AI量化实验室: the vast frontier of quantitative investing in 2024
