# Local Tutorial

### 安装EasyRec

我们提供了`本地Anaconda安装`和`Docker镜像启动`两种方式。

有技术问题可加钉钉群：37930014162

#### 本地Anaconda安装

温馨提示：**在搭载Apple M系列芯片的MacBook上必须使用TensorFlow 2.5或更高版本**，安装方法请查看TF官方文档。

Demo实验中使用的环境为 `python=3.6.8` + `tensorflow=1.12.0`

```bash
conda create -n py36_tf12 python=3.6.8
conda activate py36_tf12
pip install tensorflow==1.12.0
pip install tensorflow_probability==0.5.0
```

注意：必须要安装`tensorflow_probability`包，需要根据tensorflow的版本安装对应版本的`tensorflow_probability`包。

常见版本对应关系：

| TensorFlow版本 | TensorFlowProbability版本 |
| ------------ | ----------------------- |
| 1.12         | 0.5.0                   |
| 1.15         | 0.8.0                   |
| 2.5.0        | 0.13.0                  |
| 2.6.0        | 0.14.0                  |
| 2.7.0        | 0.15.0                  |
| 2.8.0        | 0.16.0                  |
| 2.10         | 0.18.0                  |
| 2.11         | 0.19.0                  |
| 2.12         | 0.20.0                  |

其他版本对应关系请查看链接：[Releases · tensorflow/probability](https://github.com/tensorflow/probability/releases)。

```bash
git clone https://github.com/alibaba/EasyRec.git
cd EasyRec
bash scripts/init.sh
python setup.py install
```

#### Docker镜像启动

Docker的环境为`python=3.6.9` + `tensorflow=1.15.5`

##### 方法一：拉取已上传的镜像（推荐）

```bash
git clone https://github.com/alibaba/EasyRec.git
cd EasyRec
docker pull mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.8.5
docker run -td --network host -v /local_path/EasyRec:/docker_path/EasyRec mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-0.8.5
docker exec -it <CONTAINER_ID> bash
```

可选镜像：

- mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-paitf1.12-0.8.5 \[只能跑在DLC环境\]
- mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-paitf1.15-0.8.5
- mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15.5-0.8.5
- mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15.5-gpu-0.8.5
- mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py39-tf2.11-0.8.5
- mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py38-tf2.12-0.8.5
- mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py38-tf2.12-gpu-0.8.5

##### 方法二：自行构建Docker镜像

我们提供四个版本的tensorflow镜像构建示例，对应的脚本路径如下：

- scripts/build_docker_tf112.sh
- scripts/build_docker_tf115.sh
- scripts/build_docker_tf210.sh
- scripts/build_docker_tf212.sh

默认使用`tensorflow 1.15`的版本，示例脚本如下，请根据需要替换脚本路径：

```bash
git clone https://github.com/alibaba/EasyRec.git
cd EasyRec
bash scripts/build_docker.sh
sudo docker run -td --network host -v /local_path:/docker_path mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easyrec/easyrec:py36-tf1.15-<easyrec_version>
sudo docker exec -it <CONTAINER_ID> bash
```

注：\<easyrec_version>需匹配当前EasyRec版本。

### 输入数据:

输入一般是csv格式的文件。

#### 示例数据(点击下载)

- train: [dwd_avazu_ctr_deepmodel_train.csv](http://easyrec.oss-cn-beijing.aliyuncs.com/data/dwd_avazu_ctr_deepmodel_train.csv)
- test: [dwd_avazu_ctr_deepmodel_test.csv](http://easyrec.oss-cn-beijing.aliyuncs.com/data/dwd_avazu_ctr_deepmodel_test.csv)
- 示例:

```
1,10,1005,0,85f751fd,c4e18dd6,50e219e0,0e8e4642,b408d42a,09481d60,a99f214a,5deb445a, f4fffcd0,1,0,2098,32,5,238,0,56,0,5
```

- **Note: csv文件不需要有header!!!**

### 启动命令:

#### 配置文件:

[dwd_avazu_ctr_deepmodel_local.config](https://easyrec.oss-cn-beijing.aliyuncs.com/config/DeepFM/dwd_avazu_ctr_deepmodel_local.config), 配置文件采用prototxt格式

#### GPU单机单卡:

```bash
CUDA_VISIBLE_DEVICES=0 python -m easy_rec.python.train_eval --pipeline_config_path dwd_avazu_ctr_deepmodel_local.config
```

- --pipeline_config_path: 训练用的配置文件
- --continue_train: 是否继续训

#### GPU PS训练

- ps跑在CPU上
- master跑在GPU:0上
- worker跑在GPU:1上
- Note: 本地只支持ps, master, worker模式，不支持ps, chief, worker, evaluator模式

```bash
wget https://easyrec.oss-cn-beijing.aliyuncs.com/scripts/train_2gpu.sh
sh train_2gpu.sh dwd_avazu_ctr_deepmodel_local.config
```

#### 评估:

- **Note: 本示例仅仅展示流程，效果无参考价值。**

```bash
CUDA_VISIBLE_DEVICES=0 python -m easy_rec.python.eval --pipeline_config_path dwd_avazu_ctr_deepmodel_local.config
```

#### 导出:

```bash
CUDA_VISIBLE_DEVICES='' python -m easy_rec.python.export --pipeline_config_path dwd_avazu_ctr_deepmodel_local.config --export_dir dwd_avazu_ctr_export
```

#### CPU训练/评估/导出

不指定CUDA_VISIBLE_DEVICES即可，例如：

```bash
 python -m easy_rec.python.train_eval --pipeline_config_path dwd_avazu_ctr_deepmodel_local.config
```

### 配置文件:

#### 输入输出

```protobuf
# 训练文件和测试文件
train_input_path: "dwd_avazu_ctr_deepmodel_train.csv"
eval_input_path: "dwd_avazu_ctr_deepmodel_test.csv"
# 模型保存路径
model_dir: "experiments/easy_rec/"
```

#### 数据相关

数据配置具体见：[数据](../feature/data.md)

```protobuf
# 数据相关的描述
data_config {
  # 字段之间的分隔符
  separator: ","
  # 和csv或者odps table里面字段一一对应
  input_fields: {
    input_name: "label"
    input_type: FLOAT
    default_val:""
  }
  ...
  input_fields: {
    input_name: "site_id"
    input_type: STRING
    default_val:""
  }
  input_fields: {
    input_name: "site_domain"
    input_type: STRING
    default_val:""
  }
}
```

#### 特征相关

特征配置具体见：[特征](../feature/feature.md)

```protobuf
feature_config: {
  features: {
    input_names: "hour"
    # 特征类型
    feature_type: IdFeature
    # embedding向量的dimension
    embedding_dim: 16
    # hash_bucket大小，通过tf.strings.to_hash_bucket将hour字符串映射到0-49的Id
    hash_bucket_size: 50
  }
  features: {
    input_names: "c1"
    feature_type: IdFeature
    embedding_dim: 16
    hash_bucket_size: 10
  }
  ...
  features: {
    input_names: "site_category"
    feature_type: IdFeature
    embedding_dim: 16
    hash_bucket_size: 100
  }
  features: {
    input_names: "app_id"
    feature_type: IdFeature
    embedding_dim: 32
    hash_bucket_size: 10000
  }
  ...
  features: {
    input_names: "c15"
    feature_type: IdFeature
    embedding_dim: 16
    hash_bucket_size: 500
  }
  features: {
    input_names: "c16"
    feature_type: IdFeature
    embedding_dim: 16
    hash_bucket_size: 500
  }
  ...
  features: {
    input_names: "c20"
    feature_type: IdFeature
    embedding_dim: 16
    hash_bucket_size: 500
  }
  features: {
    input_names: "c21"
    feature_type: IdFeature
    embedding_dim: 16
    hash_bucket_size: 500
  }
}
```

#### 训练相关

训练配置具体见：[训练](../train.md)

```protobuf
# 训练相关的参数
train_config {
  # 每200轮打印一行log
  log_step_count_steps: 200
  # 优化器相关的参数
  optimizer_config: {
    adam_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.0001
          decay_steps: 100000
          decay_factor: 0.5
          min_learning_rate: 0.0000001
        }
      }
    }
    use_moving_average: false
  }
  # 使用SyncReplicasOptimizer进行分布式训练(同步模式)
  sync_replicas: true
  # num_steps = total_sample_num * num_epochs / batch_size / num_workers
  num_steps:1000
}
```

#### 评估相关

评估配置具体见：[评估](../eval.md)

```protobuf
eval_config {
  # 仅仅评估1000个样本，这里是为了示例速度考虑，实际使用时需要删除
  num_examples: 1000
  metrics_set: {
    # metric为auc
    auc {}
  }
}
```

#### 模型相关

```protobuf
model_config:{
  model_class: "MultiTower"
  feature_groups: {
    group_name: "item"
    feature_names: "c1"
    feature_names: "banner_pos"
    feature_names: "site_id"
    feature_names: "site_domain"
    feature_names: "site_category"
    feature_names: "app_id"
    feature_names: "app_domain"
    feature_names: "app_category"
    wide_deep:DEEP
  }
  feature_groups: {
    group_name: "user"
    feature_names: "device_id"
    feature_names: "device_ip"
    feature_names: "device_model"
    feature_names: "device_type"
    feature_names: "device_conn_type"
    wide_deep:DEEP
  }
  feature_groups: {
    group_name: "user_item"
    feature_names: "hour"
    feature_names: "c14"
    feature_names: "c15"
    feature_names: "c16"
    feature_names: "c17"
    feature_names: "c18"
    feature_names: "c19"
    feature_names: "c20"
    feature_names: "c21"
    wide_deep:DEEP
  }

  multi_tower {
    towers {
      input: "item"
      dnn {
        hidden_units: [384, 320, 256, 192, 128]
      }
    }
    towers {
      input: "user"
      dnn {
        hidden_units: [384, 320, 256, 192, 128]
      }
    }
    towers {
      input: "user_item"
      dnn {
        hidden_units: [384, 320, 256, 192, 128]
      }
    }
    final_dnn {
      hidden_units: [256, 192, 128, 64]
    }
    l2_regularization: 0.0
  }
  embedding_regularization: 0.0
}

```

#### 参考手册

[EasyRecConfig参考手册](../reference.md)