https://blog.csdn.net/cdj0311/article/details/107246476

When training large-scale recommendation models, we usually have many different kinds of features, which broadly fall into two categories: sparse features and dense features.
Sparse features are generally passed through an Embedding to become dense before entering the fully connected layers. However, when the sparse features contain many ID-class features, their raw dimensionality is extremely high (a UserID vocabulary is almost always in the tens of millions or more), so training such a huge Embedding is very slow. One remedy is to increase the learning rate, but a learning rate that is too large in turn hurts the training of the dense features (such as some vector features). So we can design two optimizers that update the sparse Embeddings and the dense parameters with different learning rates.
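Concretely, the ID-class features can be declared as hashed categorical columns with very large bucket sizes and then embedded, while the dense side uses plain numeric columns. Below is a minimal sketch of such a setup; the column names, dimensions, and bucket sizes are made-up values for illustration only:

```python
import tensorflow as tf

# Hypothetical large ID feature: hashed into a huge bucket space and
# embedded. The large hash_bucket_size is what later marks it as a
# "sparse" embedding that gets its own learning rate.
user_id = tf.feature_column.categorical_column_with_hash_bucket(
    "user_id", hash_bucket_size=10000000)
user_id_emb = tf.feature_column.embedding_column(user_id, dimension=32)

# Hypothetical small categorical feature: also embedded, but its bucket
# space is small, so its embedding is treated as a "dense" variable.
city = tf.feature_column.categorical_column_with_hash_bucket(
    "city", hash_bucket_size=1000)
city_emb = tf.feature_column.embedding_column(city, dimension=8)

# Hypothetical dense vector feature, passed through as-is.
item_vec = tf.feature_column.numeric_column("item_vec", shape=(64,))

feature_columns = [user_id_emb, city_emb, item_vec]
```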
Here is an implementation based on tf.estimator + tf.feature_column:
```python
import tensorflow as tf

def isSparse(variable, fields):
    """Return True if the variable belongs to one of the sparse columns."""
    for field in fields:
        if field in variable.name:
            return True
    return False

def model_fn(features, labels, mode, params):
    # The network itself is omitted in the original post; this minimal
    # placeholder (input_layer -> logits -> loss) is an assumption.
    net = tf.feature_column.input_layer(
        features, list(params["feature_configs"].all_columns.values()))
    logits = tf.layers.dense(net, 1)
    loss = tf.losses.sigmoid_cross_entropy(labels, logits)

    global_step = tf.train.get_global_step()
    trainable_variables = [variable for variable in tf.trainable_variables()]

    # Names of embedding columns whose hash bucket space is large enough
    # (here: more than 200000 buckets) to be treated as "sparse".
    sparse_list = [x.name for x in params["feature_configs"].all_columns.values()
                   if "EmbeddingColumn" in str(type(x))
                   and "HashedCategoricalColumn" in str(type(x.categorical_column))
                   and x.categorical_column.hash_bucket_size > 200000]

    # Split the trainable variables into three groups:
    # sparse embeddings, dense embeddings, and all remaining parameters.
    embedding_variables = [variable for variable in trainable_variables
                           if "embedding_weights" in variable.name]
    embedding_sparse_variables = [variable for variable in embedding_variables
                                  if isSparse(variable, sparse_list)]
    embedding_dense_variables = [variable for variable in embedding_variables
                                 if variable not in embedding_sparse_variables]
    param_variables = [variable for variable in trainable_variables
                       if variable not in embedding_variables]

    # Large-ID embeddings get a larger learning rate ...
    optimizer_sparse_emb = tf.train.AdagradOptimizer(learning_rate=0.01)
    train_op_sparse_emb = optimizer_sparse_emb.minimize(
        loss, var_list=embedding_sparse_variables)
    # ... small embeddings a smaller one ...
    optimizer_dense_emb = tf.train.AdagradOptimizer(learning_rate=0.001)
    train_op_dense_emb = optimizer_dense_emb.minimize(
        loss, var_list=embedding_dense_variables)
    # ... and the remaining dense parameters use Adam. Only this op is
    # given global_step, so the counter advances once per training step.
    optimizer_param = tf.train.AdamOptimizer(learning_rate=params["learning_rate"])
    train_op_param = optimizer_param.minimize(
        loss, var_list=param_variables, global_step=global_step)

    update_ops = tf.compat.v1.get_collection(tf.GraphKeys.UPDATE_OPS)
    train_op = tf.group(update_ops, train_op_sparse_emb,
                        train_op_dense_emb, train_op_param)
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
```
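To wire this up, the Estimator's params must carry the feature columns and the base learning rate. The FeatureConfigs container below is a hypothetical stand-in for whatever object params["feature_configs"] actually is in the post (model_fn only reads its all_columns attribute); the model_dir and learning rate are likewise assumed values:

```python
import collections
import tensorflow as tf

# Hypothetical container matching how model_fn reads
# params["feature_configs"].all_columns -- an assumption, not a TF API.
FeatureConfigs = collections.namedtuple("FeatureConfigs", ["all_columns"])

# Reuse the feature_columns list from the sketch above.
feature_configs = FeatureConfigs(
    all_columns={c.name: c for c in feature_columns})

estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir="/tmp/two_lr_model",  # assumed path
    params={
        "feature_configs": feature_configs,
        "learning_rate": 0.0005,    # base learning rate for the Adam optimizer
    })

# estimator.train(input_fn=train_input_fn)  # train_input_fn defined elsewhere
```

Note the design choice in model_fn: three optimizers each call minimize on a disjoint var_list over the same loss, but only the Adam optimizer receives global_step, so the step counter is incremented exactly once per iteration.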