[Recommendation] Multi-Objective Ranking and How MMoE Works
I. An Introduction to Multi-Objective Ranking
Explicit feedback is scarce in recommender systems, because most user feedback does not come in the form of direct ratings. Recommenders therefore rely largely on implicit feedback such as clicks, favorites, shares, watch time, and purchases. When these signals are used to gauge user satisfaction, several biases creep in, chiefly goal bias, item bias, and user bias.
Goal bias: different actions express different strengths of preference. In e-commerce, for instance, a purchase expresses a stronger preference than adding an item to favorites; in video, watching for more than 20 seconds expresses a stronger preference than a click.
Item bias: a recommended item needs to be measured against multiple objectives, because any single objective is incomplete. In short-video recommendation, optimizing only for completion rate encourages videos that end on a cliffhanger so users must open the next one, adding interactions and leaving users dissatisfied. In feed recommendation, optimizing only for click-through rate leaves an opening for clickbait headlines, while optimizing only for shares leaves one for "share this for good luck" style content.
User bias: different users express satisfaction in different ways. When reading articles on Zhihu, for example, some users prefer to upvote while others prefer to bookmark.
In summary, the purpose of multi-objective ranking is:
to use suitable methods to counter goal bias, item bias, and user bias, and to optimize multiple objectives together. In e-commerce, for example, we may want to raise the click-through rate while still optimizing GMV (paid plus unpaid orders); in a feed, we may want to grow follows, likes, and comments on top of the click-through rate. As a recommender system matures, it therefore tends to evolve toward multi-objective learning, taking on more business goals in order to increase user stickiness.
The main approaches to multi-objective ranking:
- Re-weighting training samples
- Fusing the scores of multiple single-objective models (a sketch of one common fusion formula follows this list)
- Multi-task learning (MTL)
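To make the score-fusion option concrete, here is a minimal sketch: assume each objective has its own model emitting a probability, and the final ranking score is a weighted geometric mean of those predictions. The function name, the choice of signals, and the exponent values below are illustrative assumptions, not a prescribed formula.

def fuse_scores(pctr, pfav, pbuy, w_ctr=1.0, w_fav=0.5, w_buy=2.0):
    """Weighted geometric mean of per-objective scores (hypothetical weights)."""
    eps = 1e-6  # keep a near-zero prediction from zeroing out the product
    return ((pctr + eps) ** w_ctr) * ((pfav + eps) ** w_fav) * ((pbuy + eps) ** w_buy)

# Example: a candidate's predicted click / favorite / purchase probabilities
print(fuse_scores(pctr=0.12, pfav=0.03, pbuy=0.01))

The exponents act as per-objective knobs: raising w_buy makes the ranking more purchase-driven without retraining any model.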
For implementation schemes at large companies, see the ShowMeAI article in the references at the end.
II. The MMoE Network Structure
In the evolution of multi-task models, parameter sharing has gone through three main stages: Shared-Bottom, MoE, and MMoE.
The Shared-Bottom Multi-task Model is the classic DNN structure in multi-task learning (MTL): all tasks share the same bottom network, and each task has its own tower on top of it.
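Since the original figure is not reproduced here, a minimal Keras sketch of the Shared-Bottom structure may help; the layer sizes, task names, and num_features value are illustrative assumptions.

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

num_features = 100  # illustrative input dimension

inputs = Input(shape=(num_features,))
# Bottom network shared by every task
shared = Dense(64, activation='relu')(inputs)
shared = Dense(32, activation='relu')(shared)
# One independent tower per task on top of the shared representation
out_a = Dense(1, activation='sigmoid', name='task_a')(shared)
out_b = Dense(1, activation='sigmoid', name='task_b')(shared)
shared_bottom_model = Model(inputs, [out_a, out_b])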
MoE modifies the shared-bottom part, replacing it with a group of neural networks acting as expert networks, plus a single gating network.
MMoE builds on MoE by adding one gating network per task; each task's gate learns its own combination of the experts, i.e., task-adaptive weighting.
MMoE mainly addresses the fact that a traditional multi-task network (typically a Shared-Bottom structure) can perform poorly when the tasks are weakly correlated; research has shown that the quality of a multi-task model depends heavily on the correlation between its tasks. Borrowing the MoE idea, MMoE introduces multiple experts (multiple NN subnetworks) and then a separate gating network for each task. Each gate learns its own combination pattern over the experts, i.e., it adaptively weights the experts' outputs. This is very similar to attention: the experts produce a sequence of embeddings, while the gating network learns adaptive weights and takes a weighted sum of the experts' outputs; each task's result is then fed into that task's tower network. Note that the number of gating networks equals the number of tasks.
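Formally (in the notation echoed by the comments in the code below), with $n$ experts $f_i$, a per-task gate $g^k$, and a per-task tower $h^k$:

$$f_i(x) = \mathrm{ReLU}(W_i x + b_i), \qquad g^k(x) = \mathrm{softmax}(W_{gk}\,x + b_{gk}),$$
$$f^k(x) = \sum_{i=1}^{n} g^k(x)_i\, f_i(x), \qquad y_k = h^k\big(f^k(x)\big).$$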
Tricks for improving results:
- If one task in the MTL setup has extremely sparse data, try Kaiming He's focal loss. I have tried this loss in short-video recommendation; for very sparse signals such as likes, shares, and reposts, it may well make you jump out of your chair (a sketch follows this list).
- Analyze and observe the data distribution carefully. If a task's data is not sparse but has an overwhelming number of negatives, or many easy negatives, down-weighting the negatives or mining harder negatives can also work wonders. As the saying goes: negatives are king.
- One more thing that arguably counts as a trick: feed task A's prediction in as an input to task B. When implementing it, make sure task B's gradient is not propagated back into A's prediction (see the stop-gradient sketch after this list).
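A minimal sketch of a binary focal loss for the sparse-task trick above; this is my own illustrative implementation of Lin et al.'s formula with the paper's default gamma and alpha, not code from the original post.

import tensorflow as tf

def binary_focal_loss(gamma=2.0, alpha=0.25):
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        eps = tf.keras.backend.epsilon()
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        # p_t: predicted probability assigned to the true class
        p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
        # alpha_t: class-balancing factor
        alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
        # (1 - p_t)^gamma down-weights easy examples, focusing training on hard ones
        return -tf.reduce_mean(alpha_t * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t))
    return loss

And a sketch of the stop-gradient trick: tf.stop_gradient passes the value forward unchanged but contributes zero gradient, so task B's loss cannot disturb task A's head. The tensors pred_a and task_b_features are hypothetical placeholders.

import tensorflow as tf
from tensorflow.keras.layers import Concatenate, Lambda

# pred_a / task_b_features: hypothetical tensors from task A's head and task B's features
pred_a_frozen = Lambda(tf.stop_gradient)(pred_a)
tower_b_input = Concatenate(axis=-1)([task_b_features, pred_a_frozen])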
On the network details:
- Gate: map the input to num_experts dimensions with a linear transform, then take a softmax to obtain each expert's weight; adding more layers to the gate network is worth an experimental comparison.
- Expert: a simple fully connected network of a few layers with ReLU activations; each expert has its own independent weights.
III. Code Implementation
1. The MMoE layer
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras import activations, initializers, regularizers, constraints
from tensorflow.keras.layers import Layer, InputSpec
class MMoE(Layer):
    """
    Multi-gate Mixture-of-Experts model.
    """

    def __init__(self,
                 units,
                 num_experts,
                 num_tasks,
                 use_expert_bias=True,
                 use_gate_bias=True,
                 expert_activation='relu',
                 gate_activation='softmax',
                 expert_bias_initializer='zeros',
                 gate_bias_initializer='zeros',
                 expert_bias_regularizer=None,
                 gate_bias_regularizer=None,
                 expert_bias_constraint=None,
                 gate_bias_constraint=None,
                 expert_kernel_initializer='VarianceScaling',
                 gate_kernel_initializer='VarianceScaling',
                 expert_kernel_regularizer=None,
                 gate_kernel_regularizer=None,
                 expert_kernel_constraint=None,
                 gate_kernel_constraint=None,
                 activity_regularizer=None,
                 **kwargs):
        """
        Method for instantiating MMoE layer.

        :param units: Number of hidden units
        :param num_experts: Number of experts
        :param num_tasks: Number of tasks
        :param use_expert_bias: Boolean to indicate the usage of bias in the expert weights
        :param use_gate_bias: Boolean to indicate the usage of bias in the gate weights
        :param expert_activation: Activation function of the expert weights
        :param gate_activation: Activation function of the gate weights
        :param expert_bias_initializer: Initializer for the expert bias
        :param gate_bias_initializer: Initializer for the gate bias
        :param expert_bias_regularizer: Regularizer for the expert bias
        :param gate_bias_regularizer: Regularizer for the gate bias
        :param expert_bias_constraint: Constraint for the expert bias
        :param gate_bias_constraint: Constraint for the gate bias
        :param expert_kernel_initializer: Initializer for the expert weights
        :param gate_kernel_initializer: Initializer for the gate weights
        :param expert_kernel_regularizer: Regularizer for the expert weights
        :param gate_kernel_regularizer: Regularizer for the gate weights
        :param expert_kernel_constraint: Constraint for the expert weights
        :param gate_kernel_constraint: Constraint for the gate weights
        :param activity_regularizer: Regularizer for the activity
        :param kwargs: Additional keyword arguments for the Layer class
        """
        # Hidden nodes parameter
        self.units = units
        self.num_experts = num_experts
        self.num_tasks = num_tasks

        # Weight parameter
        self.expert_kernels = None
        self.gate_kernels = None
        self.expert_kernel_initializer = initializers.get(expert_kernel_initializer)
        self.gate_kernel_initializer = initializers.get(gate_kernel_initializer)
        self.expert_kernel_regularizer = regularizers.get(expert_kernel_regularizer)
        self.gate_kernel_regularizer = regularizers.get(gate_kernel_regularizer)
        self.expert_kernel_constraint = constraints.get(expert_kernel_constraint)
        self.gate_kernel_constraint = constraints.get(gate_kernel_constraint)

        # Activation parameter
        self.expert_activation = activations.get(expert_activation)
        self.gate_activation = activations.get(gate_activation)

        # Bias parameter
        self.expert_bias = None
        self.gate_bias = None
        self.use_expert_bias = use_expert_bias
        self.use_gate_bias = use_gate_bias
        self.expert_bias_initializer = initializers.get(expert_bias_initializer)
        self.gate_bias_initializer = initializers.get(gate_bias_initializer)
        self.expert_bias_regularizer = regularizers.get(expert_bias_regularizer)
        self.gate_bias_regularizer = regularizers.get(gate_bias_regularizer)
        self.expert_bias_constraint = constraints.get(expert_bias_constraint)
        self.gate_bias_constraint = constraints.get(gate_bias_constraint)

        # Activity parameter
        self.activity_regularizer = regularizers.get(activity_regularizer)

        # Keras parameter
        self.input_spec = InputSpec(min_ndim=2)
        self.supports_masking = True

        super(MMoE, self).__init__(**kwargs)
    def build(self, input_shape):
        """
        Method for creating the layer weights.

        :param input_shape: Keras tensor (future input to layer)
            or list/tuple of Keras tensors to reference
            for weight shape computations
        """
        assert input_shape is not None and len(input_shape) >= 2
        input_dimension = input_shape[-1]

        # Initialize expert weights (number of input features * number of units per expert * number of experts)
        self.expert_kernels = self.add_weight(
            name='expert_kernel',
            shape=(input_dimension, self.units, self.num_experts),
            initializer=self.expert_kernel_initializer,
            regularizer=self.expert_kernel_regularizer,
            constraint=self.expert_kernel_constraint,
        )

        # Initialize expert bias (number of units per expert * number of experts)
        if self.use_expert_bias:
            self.expert_bias = self.add_weight(
                name='expert_bias',
                shape=(self.units, self.num_experts),
                initializer=self.expert_bias_initializer,
                regularizer=self.expert_bias_regularizer,
                constraint=self.expert_bias_constraint,
            )

        # Initialize gate weights (number of input features * number of experts * number of tasks)
        self.gate_kernels = [self.add_weight(
            name='gate_kernel_task_{}'.format(i),
            shape=(input_dimension, self.num_experts),
            initializer=self.gate_kernel_initializer,
            regularizer=self.gate_kernel_regularizer,
            constraint=self.gate_kernel_constraint
        ) for i in range(self.num_tasks)]

        # Initialize gate bias (number of experts * number of tasks)
        if self.use_gate_bias:
            self.gate_bias = [self.add_weight(
                name='gate_bias_task_{}'.format(i),
                shape=(self.num_experts,),
                initializer=self.gate_bias_initializer,
                regularizer=self.gate_bias_regularizer,
                constraint=self.gate_bias_constraint
            ) for i in range(self.num_tasks)]

        self.input_spec = InputSpec(min_ndim=2, axes={-1: input_dimension})

        super(MMoE, self).build(input_shape)
    def call(self, inputs, **kwargs):
        """
        Method for the forward function of the layer.

        :param inputs: Input tensor
        :param kwargs: Additional keyword arguments for the base method
        :return: A list of output tensors, one per task
        """
        gate_outputs = []
        final_outputs = []

        # f_{i}(x) = activation(W_{i} * x + b), where activation is ReLU according to the paper
        # expert_outputs shape: (batch_size, units, num_experts)
        expert_outputs = tf.tensordot(a=inputs, b=self.expert_kernels, axes=1)
        # Add the bias term to the expert weights if necessary
        if self.use_expert_bias:
            expert_outputs = K.bias_add(x=expert_outputs, bias=self.expert_bias)
        expert_outputs = self.expert_activation(expert_outputs)

        # g^{k}(x) = activation(W_{gk} * x + b), where activation is softmax according to the paper
        # gate_output shape: (batch_size, num_experts)
        for index, gate_kernel in enumerate(self.gate_kernels):
            gate_output = K.dot(x=inputs, y=gate_kernel)
            # Add the bias term to the gate weights if necessary
            if self.use_gate_bias:
                gate_output = K.bias_add(x=gate_output, bias=self.gate_bias[index])
            gate_output = self.gate_activation(gate_output)
            gate_outputs.append(gate_output)

        # f^{k}(x) = sum_{i=1}^{n}(g^{k}(x)_{i} * f_{i}(x))
        for gate_output in gate_outputs:
            expanded_gate_output = K.expand_dims(gate_output, axis=1)
            weighted_expert_output = expert_outputs * K.repeat_elements(expanded_gate_output, self.units, axis=1)
            final_outputs.append(K.sum(weighted_expert_output, axis=2))

        return final_outputs
    def compute_output_shape(self, input_shape):
        """
        Method for computing the output shape of the MMoE layer.

        :param input_shape: Shape tuple (tuple of integers)
        :return: List of output shape tuples where the size of the list is equal to the number of tasks
        """
        assert input_shape is not None and len(input_shape) >= 2

        output_shape = list(input_shape)
        output_shape[-1] = self.units
        output_shape = tuple(output_shape)

        return [output_shape for _ in range(self.num_tasks)]
    def get_config(self):
        """
        Method for returning the configuration of the MMoE layer.

        :return: Config dictionary
        """
        config = {
            'units': self.units,
            'num_experts': self.num_experts,
            'num_tasks': self.num_tasks,
            'use_expert_bias': self.use_expert_bias,
            'use_gate_bias': self.use_gate_bias,
            'expert_activation': activations.serialize(self.expert_activation),
            'gate_activation': activations.serialize(self.gate_activation),
            'expert_bias_initializer': initializers.serialize(self.expert_bias_initializer),
            'gate_bias_initializer': initializers.serialize(self.gate_bias_initializer),
            'expert_bias_regularizer': regularizers.serialize(self.expert_bias_regularizer),
            'gate_bias_regularizer': regularizers.serialize(self.gate_bias_regularizer),
            'expert_bias_constraint': constraints.serialize(self.expert_bias_constraint),
            'gate_bias_constraint': constraints.serialize(self.gate_bias_constraint),
            'expert_kernel_initializer': initializers.serialize(self.expert_kernel_initializer),
            'gate_kernel_initializer': initializers.serialize(self.gate_kernel_initializer),
            'expert_kernel_regularizer': regularizers.serialize(self.expert_kernel_regularizer),
            'gate_kernel_regularizer': regularizers.serialize(self.gate_kernel_regularizer),
            'expert_kernel_constraint': constraints.serialize(self.expert_kernel_constraint),
            'gate_kernel_constraint': constraints.serialize(self.gate_kernel_constraint),
            'activity_regularizer': regularizers.serialize(self.activity_regularizer)
        }
        base_config = super(MMoE, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
2. MMoE with task towers
To build the multi-objective model, we add a separate task tower (a small NN per task) on top of the MMoE layer:
import tensorflow as tf
from tensorflow.keras.initializers import VarianceScaling
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# num_features is the input feature dimension; output_info is a list of
# (num_classes, task_name) pairs, e.g. [(2, 'income'), (2, 'marital')]
# in the census-income demo this example is drawn from.

# Set up the input layer
input_layer = Input(shape=(num_features,))

# Set up MMoE layer
mmoe_layers = MMoE(
    units=4,
    num_experts=3,
    num_tasks=2
)(input_layer)

output_layers = []

# Build one tower layer per task from the MMoE layer outputs
for index, task_layer in enumerate(mmoe_layers):
    tower_layer = Dense(
        units=8,
        activation='relu',
        kernel_initializer=VarianceScaling())(task_layer)
    output_layer = Dense(
        units=output_info[index][0],
        name=output_info[index][1],
        activation='softmax',
        kernel_initializer=VarianceScaling())(tower_layer)
    output_layers.append(output_layer)

# Compile model
model = Model(inputs=[input_layer], outputs=output_layers)
adam_optimizer = Adam()
model.compile(
    loss={'income': 'binary_crossentropy', 'marital': 'binary_crossentropy'},
    optimizer=adam_optimizer,
    metrics=['accuracy', tf.keras.metrics.AUC(curve='ROC'), tf.keras.metrics.AUC(curve='PR')]
)

# Print out model architecture summary
model.summary()
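To sanity-check the wiring, here is a minimal, hypothetical training call on random data; the shapes assume the two 2-class softmax heads above and that num_features is defined.

import numpy as np

x = np.random.rand(256, num_features).astype('float32')
# One-hot dummy labels matching the 2-unit softmax outputs
y_income = tf.keras.utils.to_categorical(np.random.randint(0, 2, size=256), num_classes=2)
y_marital = tf.keras.utils.to_categorical(np.random.randint(0, 2, size=256), num_classes=2)
model.fit(x, {'income': y_income, 'marital': y_marital}, epochs=1, batch_size=32)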
References:
- Multi-objective optimization and its applications at large companies (with code), from the Recommendation & Computational Advertising series, ShowMeAI (showmeai.tech)
- Ma et al., Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts, KDD 2018 (acm.org)
- MMoE multi-task learning model: introduction and source-code walkthrough, Zhihu (zhihu.com)