[Recommendation] Multi-Objective Ranking and How MMoE Works
I. An Introduction to Multi-Objective Ranking
Explicit feedback is scarce in recommender systems, because most user feedback does not come in the form of direct ratings. Recommenders therefore rely largely on implicit feedback such as clicks, favorites, shares, watch time, and purchases. When these signals are used to gauge user satisfaction, several biases creep in, chiefly goal bias, item bias, and user bias.
Goal bias: different actions express different strengths of preference. In e-commerce, for instance, a purchase expresses a stronger preference than adding an item to favorites; in video, watching for more than 20 seconds expresses a stronger preference than a click.
Item bias: a recommended item needs to be measured against multiple objectives, because any single objective is incomplete. In short-video recommendation, optimizing only for completion rate encourages videos that end on a cliffhanger so users must open the next one, adding interactions and leaving users dissatisfied. In feed recommendation, optimizing only for click-through rate leaves an opening for clickbait headlines, while optimizing only for shares leaves one for "share this for good luck" style content.
User bias: different users express satisfaction in different ways. When reading articles on Zhihu, for example, some users prefer to upvote while others prefer to bookmark.
In summary, the purpose of multi-objective ranking is:
to use suitable methods to counter goal bias, item bias, and user bias, and to optimize multiple objectives together. In e-commerce, for example, we may want to raise the click-through rate while still optimizing GMV (paid plus unpaid orders); in a feed, we may want to grow follows, likes, and comments on top of the click-through rate. As a recommender system matures, it therefore tends to evolve toward multi-objective learning, taking on more business goals in order to increase user stickiness.
The main approaches to multi-objective ranking:
- Re-weighting training samples
- Fusing the scores of multiple single-objective models (a sketch of one common fusion formula follows this list)
- Multi-task learning (MTL)
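To make the score-fusion option concrete, here is a minimal sketch: assume each objective has its own model emitting a probability, and the final ranking score is a weighted geometric mean of those predictions. The function name, the choice of signals, and the exponent values below are illustrative assumptions, not a prescribed formula.

def fuse_scores(pctr, pfav, pbuy, w_ctr=1.0, w_fav=0.5, w_buy=2.0):
    """Weighted geometric mean of per-objective scores (hypothetical weights)."""
    eps = 1e-6  # keep a near-zero prediction from zeroing out the product
    return ((pctr + eps) ** w_ctr) * ((pfav + eps) ** w_fav) * ((pbuy + eps) ** w_buy)

# Example: a candidate's predicted click / favorite / purchase probabilities
print(fuse_scores(pctr=0.12, pfav=0.03, pbuy=0.01))

The exponents act as per-objective knobs: raising w_buy makes the ranking more purchase-driven without retraining any model.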
For implementation schemes at large companies, see the ShowMeAI article in the references at the end.
II. The MMoE Network Structure
In the evolution of multi-task models, parameter sharing has gone through three main stages: Shared-Bottom, MoE, and MMoE.
The Shared-Bottom Multi-task Model is the classic DNN structure in multi-task learning (MTL): all tasks share the same bottom network, and each task has its own tower on top of it.
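Since the original figure is not reproduced here, a minimal Keras sketch of the Shared-Bottom structure may help; the layer sizes, task names, and num_features value are illustrative assumptions.

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

num_features = 100  # illustrative input dimension

inputs = Input(shape=(num_features,))
# Bottom network shared by every task
shared = Dense(64, activation='relu')(inputs)
shared = Dense(32, activation='relu')(shared)
# One independent tower per task on top of the shared representation
out_a = Dense(1, activation='sigmoid', name='task_a')(shared)
out_b = Dense(1, activation='sigmoid', name='task_b')(shared)
shared_bottom_model = Model(inputs, [out_a, out_b])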
MoE modifies the shared-bottom part, replacing it with a group of neural networks acting as expert networks, plus a single gating network.
MMoE builds on MoE by adding one gating network per task; each task's gate learns its own combination of the experts, i.e., task-adaptive weighting.
MMoE mainly addresses the fact that a traditional multi-task network (typically a Shared-Bottom structure) can perform poorly when the tasks are weakly correlated; research has shown that the quality of a multi-task model depends heavily on the correlation between its tasks. Borrowing the MoE idea, MMoE introduces multiple experts (multiple NN subnetworks) and then a separate gating network for each task. Each gate learns its own combination pattern over the experts, i.e., it adaptively weights the experts' outputs. This is very similar to attention: the experts produce a sequence of embeddings, while the gating network learns adaptive weights and takes a weighted sum of the experts' outputs; each task's result is then fed into that task's tower network. Note that the number of gating networks equals the number of tasks.
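Formally (in the notation echoed by the comments in the code below), with $n$ experts $f_i$, a per-task gate $g^k$, and a per-task tower $h^k$:

$$f_i(x) = \mathrm{ReLU}(W_i x + b_i), \qquad g^k(x) = \mathrm{softmax}(W_{gk}\,x + b_{gk}),$$
$$f^k(x) = \sum_{i=1}^{n} g^k(x)_i\, f_i(x), \qquad y_k = h^k\big(f^k(x)\big).$$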
Tricks for improving results:
- If one task in the MTL setup has extremely sparse data, try Kaiming He's focal loss. I have tried this loss in short-video recommendation; for very sparse signals such as likes, shares, and reposts, it may well make you jump out of your chair (a sketch follows this list).
- Analyze and observe the data distribution carefully. If a task's data is not sparse but has an overwhelming number of negatives, or many easy negatives, down-weighting the negatives or mining harder negatives can also work wonders. As the saying goes: negatives are king.
- One more thing that arguably counts as a trick: feed task A's prediction in as an input to task B. When implementing it, make sure task B's gradient is not propagated back into A's prediction (see the stop-gradient sketch after this list).
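A minimal sketch of a binary focal loss for the sparse-task trick above; this is my own illustrative implementation of Lin et al.'s formula with the paper's default gamma and alpha, not code from the original post.

import tensorflow as tf

def binary_focal_loss(gamma=2.0, alpha=0.25):
    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        eps = tf.keras.backend.epsilon()
        y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
        # p_t: predicted probability assigned to the true class
        p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)
        # alpha_t: class-balancing factor
        alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
        # (1 - p_t)^gamma down-weights easy examples, focusing training on hard ones
        return -tf.reduce_mean(alpha_t * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t))
    return loss

And a sketch of the stop-gradient trick: tf.stop_gradient passes the value forward unchanged but contributes zero gradient, so task B's loss cannot disturb task A's head. The tensors pred_a and task_b_features are hypothetical placeholders.

import tensorflow as tf
from tensorflow.keras.layers import Concatenate, Lambda

# pred_a / task_b_features: hypothetical tensors from task A's head and task B's features
pred_a_frozen = Lambda(tf.stop_gradient)(pred_a)
tower_b_input = Concatenate(axis=-1)([task_b_features, pred_a_frozen])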
On the network details:
- Gate: map the input to num_experts dimensions with a linear transform, then take a softmax to obtain each expert's weight; adding more layers to the gate network is worth an experimental comparison.
- Expert: a simple fully connected network of a few layers with ReLU activations; each expert has its own independent weights.
III. Code Implementation
1. The MMoE layer
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras import activations, initializers, regularizers, constraints
from tensorflow.keras.layers import Layer, InputSpec
class MMoE(Layer):
    """
    Multi-gate Mixture-of-Experts model.
    """

    def __init__(self,
                 units,
                 num_experts,
                 num_tasks,
                 use_expert_bias=True,
                 use_gate_bias=True,
                 expert_activation='relu',
                 gate_activation='softmax',
                 expert_bias_initializer='zeros',
                 gate_bias_initializer='zeros',
                 expert_bias_regularizer=None,
                 gate_bias_regularizer=None,
                 expert_bias_constraint=None,
                 gate_bias_constraint=None,
                 expert_kernel_initializer='VarianceScaling',
                 gate_kernel_initializer='VarianceScaling',
                 expert_kernel_regularizer=None,
                 gate_kernel_regularizer=None,
                 expert_kernel_constraint=None,
                 gate_kernel_constraint=None,
                 activity_regularizer=None,
                 **kwargs):
        """
        Method for instantiating MMoE layer.

        :param units: Number of hidden units
        :param num_experts: Number of experts
        :param num_tasks: Number of tasks
        :param use_expert_bias: Boolean to indicate the usage of bias in the expert weights
        :param use_gate_bias: Boolean to indicate the usage of bias in the gate weights
        :param expert_activation: Activation function of the expert weights
        :param gate_activation: Activation function of the gate weights
        :param expert_bias_initializer: Initializer for the expert bias
        :param gate_bias_initializer: Initializer for the gate bias
        :param expert_bias_regularizer: Regularizer for the expert bias
        :param gate_bias_regularizer: Regularizer for the gate bias
        :param expert_bias_constraint: Constraint for the expert bias
        :param gate_bias_constraint: Constraint for the gate bias
        :param expert_kernel_initializer: Initializer for the expert weights
        :param gate_kernel_initializer: Initializer for the gate weights
        :param expert_kernel_regularizer: Regularizer for the expert weights
        :param gate_kernel_regularizer: Regularizer for the gate weights
        :param expert_kernel_constraint: Constraint for the expert weights
        :param gate_kernel_constraint: Constraint for the gate weights
        :param activity_regularizer: Regularizer for the activity
        :param kwargs: Additional keyword arguments for the Layer class
        """
        # Hidden nodes parameter
        self.units = units
        self.num_experts = num_experts
        self.num_tasks = num_tasks

        # Weight parameter
        self.expert_kernels = None
        self.gate_kernels = None
        self.expert_kernel_initializer = initializers.get(expert_kernel_initializer)
        self.gate_kernel_initializer = initializers.get(gate_kernel_initializer)
        self.expert_kernel_regularizer = regularizers.get(expert_kernel_regularizer)
        self.gate_kernel_regularizer = regularizers.get(gate_kernel_regularizer)
        self.expert_kernel_constraint = constraints.get(expert_kernel_constraint)
        self.gate_kernel_constraint = constraints.get(gate_kernel_constraint)

        # Activation parameter
        self.expert_activation = activations.get(expert_activation)
        self.gate_activation = activations.get(gate_activation)

        # Bias parameter
        self.expert_bias = None
        self.gate_bias = None
        self.use_expert_bias = use_expert_bias
        self.use_gate_bias = use_gate_bias
        self.expert_bias_initializer = initializers.get(expert_bias_initializer)
        self.gate_bias_initializer = initializers.get(gate_bias_initializer)
        self.expert_bias_regularizer = regularizers.get(expert_bias_regularizer)
        self.gate_bias_regularizer = regularizers.get(gate_bias_regularizer)
        self.expert_bias_constraint = constraints.get(expert_bias_constraint)
        self.gate_bias_constraint = constraints.get(gate_bias_constraint)

        # Activity parameter
        self.activity_regularizer = regularizers.get(activity_regularizer)

        # Keras parameter
        self.input_spec = InputSpec(min_ndim=2)
        self.supports_masking = True

        super(MMoE, self).__init__(**kwargs)
    def build(self, input_shape):
        """
        Method for creating the layer weights.

        :param input_shape: Keras tensor (future input to layer)
            or list/tuple of Keras tensors to reference
            for weight shape computations
        """
        assert input_shape is not None and len(input_shape) >= 2
        input_dimension = input_shape[-1]

        # Initialize expert weights (number of input features * number of units per expert * number of experts)
        self.expert_kernels = self.add_weight(
            name='expert_kernel',
            shape=(input_dimension, self.units, self.num_experts),
            initializer=self.expert_kernel_initializer,
            regularizer=self.expert_kernel_regularizer,
            constraint=self.expert_kernel_constraint,
        )

        # Initialize expert bias (number of units per expert * number of experts)
        if self.use_expert_bias:
            self.expert_bias = self.add_weight(
                name='expert_bias',
                shape=(self.units, self.num_experts),
                initializer=self.expert_bias_initializer,
                regularizer=self.expert_bias_regularizer,
                constraint=self.expert_bias_constraint,
            )

        # Initialize gate weights (number of input features * number of experts * number of tasks)
        self.gate_kernels = [self.add_weight(
            name='gate_kernel_task_{}'.format(i),
            shape=(input_dimension, self.num_experts),
            initializer=self.gate_kernel_initializer,
            regularizer=self.gate_kernel_regularizer,
            constraint=self.gate_kernel_constraint
        ) for i in range(self.num_tasks)]

        # Initialize gate bias (number of experts * number of tasks)
        if self.use_gate_bias:
            self.gate_bias = [self.add_weight(
                name='gate_bias_task_{}'.format(i),
                shape=(self.num_experts,),
                initializer=self.gate_bias_initializer,
                regularizer=self.gate_bias_regularizer,
                constraint=self.gate_bias_constraint
            ) for i in range(self.num_tasks)]

        self.input_spec = InputSpec(min_ndim=2, axes={-1: input_dimension})

        super(MMoE, self).build(input_shape)
    def call(self, inputs, **kwargs):
        """
        Method for the forward function of the layer.

        :param inputs: Input tensor
        :param kwargs: Additional keyword arguments for the base method
        :return: A list of output tensors, one per task
        """
        gate_outputs = []
        final_outputs = []

        # f_{i}(x) = activation(W_{i} * x + b), where activation is ReLU according to the paper
        # expert_outputs shape: (batch_size, units, num_experts)
        expert_outputs = tf.tensordot(a=inputs, b=self.expert_kernels, axes=1)
        # Add the bias term to the expert weights if necessary
        if self.use_expert_bias:
            expert_outputs = K.bias_add(x=expert_outputs, bias=self.expert_bias)
        expert_outputs = self.expert_activation(expert_outputs)

        # g^{k}(x) = activation(W_{gk} * x + b), where activation is softmax according to the paper
        # gate_output shape: (batch_size, num_experts)
        for index, gate_kernel in enumerate(self.gate_kernels):
            gate_output = K.dot(x=inputs, y=gate_kernel)
            # Add the bias term to the gate weights if necessary
            if self.use_gate_bias:
                gate_output = K.bias_add(x=gate_output, bias=self.gate_bias[index])
            gate_output = self.gate_activation(gate_output)
            gate_outputs.append(gate_output)

        # f^{k}(x) = sum_{i=1}^{n}(g^{k}(x)_{i} * f_{i}(x))
        for gate_output in gate_outputs:
            expanded_gate_output = K.expand_dims(gate_output, axis=1)
            weighted_expert_output = expert_outputs * K.repeat_elements(expanded_gate_output, self.units, axis=1)
            final_outputs.append(K.sum(weighted_expert_output, axis=2))

        return final_outputs
    def compute_output_shape(self, input_shape):
        """
        Method for computing the output shape of the MMoE layer.

        :param input_shape: Shape tuple (tuple of integers)
        :return: List of output shape tuples where the size of the list is equal to the number of tasks
        """
        assert input_shape is not None and len(input_shape) >= 2

        output_shape = list(input_shape)
        output_shape[-1] = self.units
        output_shape = tuple(output_shape)

        return [output_shape for _ in range(self.num_tasks)]
    def get_config(self):
        """
        Method for returning the configuration of the MMoE layer.

        :return: Config dictionary
        """
        config = {
            'units': self.units,
            'num_experts': self.num_experts,
            'num_tasks': self.num_tasks,
            'use_expert_bias': self.use_expert_bias,
            'use_gate_bias': self.use_gate_bias,
            'expert_activation': activations.serialize(self.expert_activation),
            'gate_activation': activations.serialize(self.gate_activation),
            'expert_bias_initializer': initializers.serialize(self.expert_bias_initializer),
            'gate_bias_initializer': initializers.serialize(self.gate_bias_initializer),
            'expert_bias_regularizer': regularizers.serialize(self.expert_bias_regularizer),
            'gate_bias_regularizer': regularizers.serialize(self.gate_bias_regularizer),
            'expert_bias_constraint': constraints.serialize(self.expert_bias_constraint),
            'gate_bias_constraint': constraints.serialize(self.gate_bias_constraint),
            'expert_kernel_initializer': initializers.serialize(self.expert_kernel_initializer),
            'gate_kernel_initializer': initializers.serialize(self.gate_kernel_initializer),
            'expert_kernel_regularizer': regularizers.serialize(self.expert_kernel_regularizer),
            'gate_kernel_regularizer': regularizers.serialize(self.gate_kernel_regularizer),
            'expert_kernel_constraint': constraints.serialize(self.expert_kernel_constraint),
            'gate_kernel_constraint': constraints.serialize(self.gate_kernel_constraint),
            'activity_regularizer': regularizers.serialize(self.activity_regularizer)
        }
        base_config = super(MMoE, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
2. MMoE with task towers
To build the multi-objective model, we add a separate task tower (a small NN per task) on top of the MMoE layer:
import tensorflow as tf
from tensorflow.keras.initializers import VarianceScaling
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# num_features is the input feature dimension; output_info is a list of
# (num_classes, task_name) pairs, e.g. [(2, 'income'), (2, 'marital')]
# in the census-income demo this example is drawn from.

# Set up the input layer
input_layer = Input(shape=(num_features,))

# Set up MMoE layer
mmoe_layers = MMoE(
    units=4,
    num_experts=3,
    num_tasks=2
)(input_layer)

output_layers = []

# Build one tower layer per task from the MMoE layer outputs
for index, task_layer in enumerate(mmoe_layers):
    tower_layer = Dense(
        units=8,
        activation='relu',
        kernel_initializer=VarianceScaling())(task_layer)
    output_layer = Dense(
        units=output_info[index][0],
        name=output_info[index][1],
        activation='softmax',
        kernel_initializer=VarianceScaling())(tower_layer)
    output_layers.append(output_layer)

# Compile model
model = Model(inputs=[input_layer], outputs=output_layers)
adam_optimizer = Adam()
model.compile(
    loss={'income': 'binary_crossentropy', 'marital': 'binary_crossentropy'},
    optimizer=adam_optimizer,
    metrics=['accuracy', tf.keras.metrics.AUC(curve='ROC'), tf.keras.metrics.AUC(curve='PR')]
)

# Print out model architecture summary
model.summary()
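To sanity-check the wiring, here is a minimal, hypothetical training call on random data; the shapes assume the two 2-class softmax heads above and that num_features is defined.

import numpy as np

x = np.random.rand(256, num_features).astype('float32')
# One-hot dummy labels matching the 2-unit softmax outputs
y_income = tf.keras.utils.to_categorical(np.random.randint(0, 2, size=256), num_classes=2)
y_marital = tf.keras.utils.to_categorical(np.random.randint(0, 2, size=256), num_classes=2)
model.fit(x, {'income': y_income, 'marital': y_marital}, epochs=1, batch_size=32)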
References:
- Multi-objective optimization and its applications at large companies (with code), from the Recommendation & Computational Advertising series, ShowMeAI (showmeai.tech)
- Ma et al., Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts, KDD 2018 (acm.org)
- MMoE multi-task learning model: introduction and source-code walkthrough, Zhihu (zhihu.com)