# DBMTL

### 简介

DBMTL构建了多个目标之间的贝叶斯网络，显式建模了多个目标之间可能存在的因果关系，通过对不同任务间的贝叶斯关系来同时优化场景中的多个指标。
![dbmtl.png](../../images/models/dbmtl.png)

底层的shared layer和specific layer是通过hard parameter sharing方式来人工配置的，而google的MMoE是基于soft parameter sharing来实现不同任务底层特征和网络共享，并在Youtube场景中取得了不错的效果。因此DBMTL同样支持将shared layer和specific layer模块替换成MMoE模块，即通过task gate的方式在多组expert参数中加权组合出对应task的feature。

![dbmtl_mmoe.png](../../images/models/dbmtl_mmoe.png)

在多模态（图像、视频、文本）推荐场景，DBMTL支持使用[CMBF模型](cmbf.md)或[UNITER模型](uniter.md)作为底层的`shared layer`，以便充分利用多模态特征，取到更好的推荐效果。

### 配置说明

#### DBTML

```protobuf
model_config {
  model_class: "DBMTL"
  feature_groups {
    group_name: "all"
    feature_names: "user_id"
    feature_names: "cms_segid"
    ...
    feature_names: "tag_brand_list"
    wide_deep: DEEP
  }
  dbmtl {
    bottom_dnn {
      hidden_units: [1024, 512, 256]
    }
    task_towers {
      tower_name: "ctr"
      label_name: "clk"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "cvr"
      label_name: "buy"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_tower_names: ["ctr"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 5e-6
}
```

- model_class: 'DBMTL', 不需要修改
- feature_groups: 配置一个名为'all'的feature_group。
- dbmtl: dbmtl相关的参数
  - experts
    - expert_name
    - dnn deep part的参数配置
      - hidden_units: dnn每一层的channel数目，即神经元的数目
  - task_towers 根据任务数配置task_towers
    - tower_name
    - dnn deep part的参数配置
      - hidden_units: dnn每一层的channel数目，即神经元的数目
    - 默认为二分类任务，即num_class默认为1，weight默认为1.0，loss_type默认为CLASSIFICATION，metrics_set为auc
    - 注：label_fields需与task_towers一一对齐。
  - embedding_regularization: 对embedding部分加regularization，防止overfit

#### DBMTL Based On Backbone

```protobuf
model_config {
  model_name: "DBMTL"
  model_class: "MultiTaskModel"
  feature_groups {
    group_name: "all"
    feature_names: "user_id"
    feature_names: "cms_segid"
    ...
    feature_names: "tag_brand_list"
    wide_deep: DEEP
  }
  backbone {
    blocks {
      name: "mask_net"
      inputs {
        feature_group_name: "all"
      }
      keras_layer {
        class_name: 'MaskNet'
        masknet {
          mask_blocks {
            aggregation_size: 512
            output_size: 256
          }
          mask_blocks {
            aggregation_size: 512
            output_size: 256
          }
          mask_blocks {
            aggregation_size: 512
            output_size: 256
          }
          mlp {
            hidden_units: [512, 256]
          }
        }
      }
    }
  }
  model_params {
    task_towers {
      tower_name: "ctr"
      label_name: "clk"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "cvr"
      label_name: "buy"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_tower_names: ["ctr"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 5e-6
}
```

该案例添加了一个额外的`MaskNet`层，为了展示以组件化方式搭建模型的灵活性。

- model_name: 任意自定义字符串，仅有注释作用

- model_class: 'MultiTaskModel', 不需要修改, 通过组件化方式搭建的多目标排序模型都叫这个名字

- backbone: 通过组件化的方式搭建的主干网络，[参考文档](../component/backbone.md)

  - blocks: 由多个`组件块`组成的一个有向无环图（DAG），框架负责按照DAG的拓扑排序执行个`组件块`关联的代码逻辑，构建TF Graph的一个子图
  - name/inputs: 每个`block`有一个唯一的名字（name），并且有一个或多个输入(inputs)和输出
  - keras_layer: 加载由`class_name`指定的自定义或系统内置的keras layer，执行一段代码逻辑；[参考文档](../component/backbone.md#keraslayer)
  - masknet: MaskNet模型的参数，详见[参考文档](../component/component.md#id4)

- 其余与dbmtl一致

#### DBMTL+MMOE

```protobuf
model_config {
  model_class: "DBMTL"
  feature_groups {
    group_name: "all"
    feature_names: "user_id"
    feature_names: "cms_segid"
    ...
    feature_names: "tag_brand_list"
    wide_deep: DEEP
  }
  dbmtl {
    bottom_dnn {
      hidden_units: [1024]
    }
    expert_dnn {
      hidden_units: [256, 128, 64, 32]
    }
    num_expert: 8
    task_towers {
      tower_name: "ctr"
      label_name: "clk"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "cvr"
      label_name: "buy"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      dnn {
        hidden_units: [256, 128, 64, 32]
      }
      relation_tower_names: ["ctr"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 5e-6
}
```

- dbmtl
  - expert_dnn: MMOE的专家DNN配置
    - hidden_units: dnn每一层的channel数目，即神经元的数目
  - expert_num: 专家DNN的数目
  - 其余与dbmtl一致

DBMTL模型每个塔的输出名为："logits\_" / "probs\_" / "y\_" + tower_name
其中，logits/probs/y对应: sigmoid之前的值/概率/回归模型的预测值
DBMTL模型每个塔的指标为：指标名+ "\_" + tower_name

#### DBMTL+CMBF

多模态、多目标推荐模型

```protobuf
model_config: {
  model_class: 'DBMTL'
  feature_groups: {
    group_name: 'image'
    feature_names: 'embedding'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'general'
    feature_names: 'user_id'
    feature_names: 'movie_id'
    feature_names: 'gender'
    feature_names: 'age'
    feature_names: 'occupation'
    feature_names: 'zip_id'
    feature_names: 'movie_year_bin'
    feature_names: 'score_year_diff'
    feature_names: 'score_time'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'text'
    feature_names: 'title'
    feature_names: 'genres'
    wide_deep: DEEP
  }
  dbmtl {
    bottom_cmbf {
      multi_head_num: 2
      image_multi_head_num: 2
      text_multi_head_num: 2
      image_feature_patch_num: 8
      image_head_size: 32
      text_head_size: 8
      image_self_attention_layer_num: 2
      text_self_attention_layer_num: 2
      cross_modal_layer_num: 3
      image_cross_head_size: 8
      text_cross_head_size: 16
      max_position_embeddings: 16
      use_token_type: true
    }
    task_towers {
      tower_name: "classify"
      label_name: "label"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      metrics_set: {
        gauc {
          uid_field: 'user_id'
        }
      }
      dnn {
        hidden_units: [256, 128, 64]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "rating"
      label_name: "rating"
      loss_type: L2_LOSS
      metrics_set: {
        mean_squared_error {}
      }
      dnn {
        hidden_units: [256, 128, 64]
      }
      relation_tower_names: ["classify"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 1e-6
}
```

- dbmtl
  - bottom_cmbf: 跨模态融合模型CMBF的相关配置，请参考[CMBF模型](cmbf.md)配置。
    - image_feature_dim: 在单模态学习模块之前做图像特征维度调整，调整到该参数指定的维度
    - multi_head_num: 跨模态融合模块中的 head 数量，默认为1
    - image_multi_head_num: 图像单模态学习模块中的 head 数量，默认为1
    - text_multi_head_num: 文本单模态学习模块中的 head 数量，默认为1
    - image_head_size: 单模态学习模块中的图像tower，multi-headed self-attention的每个head的size
    - text_head_size: 单模态学习模块中的文本tower，multi-headed self-attention的每个head的size
    - image_feature_patch_num: \[可选，默认值为1\] 当只有一个image feature时生效，表示该图像特征是一个复合embedding，维度为`image_feature_patch_num * embedding_size`。
    - image_self_attention_layer_num: 单模态学习模块中的图像tower，multi-headed self-attention的层数
    - text_self_attention_layer_num: 单模态学习模块中的文本tower，multi-headed self-attention的层数
    - cross_modal_layer_num: 跨模态融合模块的层数，建议设在1到5之间，默认为1
    - image_cross_head_size: 跨模模态学习模块中的图像tower，multi-headed attention的每个head的size
    - text_cross_head_size: 跨模模态学习模块中的文本tower，multi-headed attention的每个head的size
    - attention_probs_dropout_prob: self/cross attention模块attention权重的dropout概率
    - hidden_dropout_prob: multi-headed attention模块中FC layer的dropout概率
    - use_token_type: bool，default is false；是否使用token type embedding区分不同的text sequence feature
    - use_position_embeddings: bool, default is true；是否为文本序列添加位置编码
    - max_position_embeddings: 文本序列的最大位置，当`use_position_embeddings`为true时，必须配置；并且必须大于或等于所有特征配置`max_seq_len`的最大值
    - text_seq_emb_dropout_prob: 文本序列embedding的dropout概率
    - other_feature_dnn: \[可选\] 其他特征的MLP网络配置
  - 其余与dbmtl一致

#### DBMTL+UNITER

多模态、多目标推荐模型

```protobuf
model_config: {
  model_class: 'DBMTL'
  feature_groups: {
    group_name: 'image'
    feature_names: 'embedding'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'general'
    feature_names: 'user_id'
    feature_names: 'movie_id'
    feature_names: 'gender'
    feature_names: 'age'
    feature_names: 'occupation'
    feature_names: 'zip_id'
    feature_names: 'movie_year_bin'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'other'
    feature_names: 'score_year_diff'
    feature_names: 'score_time'
    wide_deep: DEEP
  }
  feature_groups: {
    group_name: 'text'
    feature_names: 'title'
    feature_names: 'genres'
    wide_deep: DEEP
  }
  dbmtl {
    bottom_uniter {
      hidden_size: 512
      num_attention_heads: 4
      num_hidden_layers: 2
      intermediate_size: 512
      hidden_act: 'swish'
      max_position_embeddings: 16
      hidden_dropout_prob: 0.1
      attention_probs_dropout_prob: 0
      other_feature_dnn: {
        hidden_units: 256
        hidden_units: 128
      }
    }
    task_towers {
      tower_name: "classify"
      label_name: "label"
      loss_type: CLASSIFICATION
      metrics_set: {
        auc {}
      }
      metrics_set: {
        gauc {
          uid_field: 'user_id'
        }
      }
      dnn {
        hidden_units: [256, 128, 64]
      }
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    task_towers {
      tower_name: "rating"
      label_name: "rating"
      loss_type: L2_LOSS
      metrics_set: {
        mean_squared_error {}
      }
      dnn {
        hidden_units: [256, 128, 64]
      }
      relation_tower_names: ["classify"]
      relation_dnn {
        hidden_units: [32]
      }
      weight: 1.0
    }
    l2_regularization: 1e-6
  }
  embedding_regularization: 1e-6
}
```

- dbmtl
  - bottom_uniter: 多模态融合模型UNITER的相关配置，请参考[UNITER模型](uniter.md)配置。

    - num_attention_heads: 跨模态融合模块中的 head 数量
    - hidden_size: transformer模块的隐层单元数，必须是`num_attention_heads`的整数倍。
    - num_hidden_layers: transformer模块中的self-attention层数，建议设在1到5之间，默认为1
    - intermediate_size: transformer模块中MLP层的神经元个数
    - hidden_act: transformer模块中MLP层的激活函数，目前支持：'relu'、'swish'、'gelu'、'tanh'
    - attention_probs_dropout_prob: self/cross attention模块attention权重的dropout概率
    - hidden_dropout_prob: multi-headed attention模块中FC layer的dropout概率
    - use_position_embeddings: bool, default is true；是否为文本序列添加位置编码
    - max_position_embeddings: 文本序列的最大位置，当`use_position_embeddings`为true时，必须配置；并且必须大于或等于所有特征配置`max_seq_len`的最大值
    - other_feature_dnn: \[可选\] 其他特征的MLP网络配置

  - 其余与dbmtl一致

### 示例Config

- [DBMTL_demo.config](https://easyrec.oss-cn-beijing.aliyuncs.com/config/dbmtl.config)
- [DBMTL_MMOE_demo.config](https://easyrec.oss-cn-beijing.aliyuncs.com/config/dbmtl_mmoe.config)
- [DBMTL_Backbone_demo.config](https://github.com/alibaba/EasyRec/blob/master/samples/model_config/dbmtl_backbone_on_taobao.config)
- [DBMTL_CMBF_demo.config](https://github.com/alibaba/EasyRec/blob/master/samples/model_config/dbmtl_cmbf_on_movielens.config)
- [DBMTL_UNITER_demo.config](https://github.com/alibaba/EasyRec/blob/master/samples/model_config/dbmtl_uniter_on_movielens.config)

### 参考论文

[DBMTL](https://dl.acm.org/doi/pdf/10.1145/3219819.3220007)