# EMR Tutorial

## 输入数据:

输入一般是csv格式的文件。 如下所示，列之间用,分割

- 示例数据:
  - train: [dwd_avazu_ctr_deepmodel_train.csv](http://easyrec.oss-cn-beijing.aliyuncs.com/data/dwd_avazu_ctr_deepmodel_train.csv)
  - test: [dwd_avazu_ctr_deepmodel_test.csv](http://easyrec.oss-cn-beijing.aliyuncs.com/data/dwd_avazu_ctr_deepmodel_test.csv)
  - 示例:

```
1,10,1005,0,85f751fd,c4e18dd6,50e219e0,0e8e4642,b408d42a,09481d60,a99f214a,5deb445a, f4fffcd0,1,0,2098,32,5,238,0,56,0,5
```

- **Note: csv文件不需要有header!!!**

## 创建DataScience集群:

[DataScience集群](https://help.aliyun.com/document_detail/170836.html)参考

## Copy data to HDFS

```bash
hadoop fs -mkdir -p hdfs://emr-header-1:9000/user/easy_rec/data/
hadoop fs -put dwd_avazu_ctr_deepmodel_train.csv hdfs://emr-header-1:9000/user/easy_rec/data/
hadoop fs -put dwd_avazu_ctr_deepmodel_test.csv hdfs://emr-header-1:9000/user/easy_rec/data/
```

## 训练:

- 配置文件: [dwd_avazu_ctr_deepmodel.config](https://easyrec.oss-cn-beijing.aliyuncs.com/config/emr/dwd_avazu_ctr_deepmodel.config) \*\* \*\* 配置文件采用prototxt格式，内容解析见[配置文件](#Qgqxc)
- 使用el_submit提交训练任务，**el_submit**相关参数请参考[**tf_on_yarn**](../tf_on_yarn.md)

### 开源TF模式

`el_submit -yaml train.tf.yaml` 配置文件内容如下

```bash
app:
    app_type: tensorflow-ps
    app_name: easyrec_tf_train
    mode: local
    exit_mode: true
    verbose: true
    files: dwd_avazu_ctr_deepmodel.config
    command: python -m easy_rec.python.train_eval --pipeline_config_path dwd_avazu_ctr_deepmodel.config --continue_train
    wait_time: 8
    hook: /usr/local/dstools/bin/hooks.sh

resource:
    ps_num: 1
    ps_cpu: 1
    ps_memory: 10g
    ps_mode_arg:
    worker_num: 2
    worker_cpu: 6
    worker_gpu: 1
    worker_memory: 10g
    worker_mode_arg:

```

### Paitf模式

**使用Paitf需要token授权， 请联系产品架构团队索取token**
`el_submit -yaml train.paitf.yaml `配置文件内容如下

```bash
app:
    app_type: tensorflow-ps
    app_name: easyrec_paitf_train
    mode: docker-pai
    mode_arg: paitf:1.12-gpu
    token: AAAAAAAAAAAAAAABBBBBBBBBBBBBBB==
    exit_mode: true
    verbose: true
    files: dwd_avazu_ctr_deepmodel.config
    command: python -m easy_rec.python.train_eval --pipeline_config_path dwd_avazu_ctr_deepmodel.config --continue_train
    hook: /usr/local/dstools/bin/hooks.sh
    wait_time: 8

resource:
    ps_num: 1
    ps_cpu: 1
    ps_memory: 10g
    ps_mode_arg:  paitf:1.12-cpu
    worker_num: 1
    worker_cpu: 6
    worker_gpu: 1
    worker_memory: 10g
    worker_mode_arg: paitf:1.12-gpu

```

- [查看任务日志](../emr_yarn_log.md)

## 评估:

- 使用el_submit提交评估任务，**el_submit**相关参数请参考[**tf_on_yarn**](../tf_on_yarn.md)
- **Note: 本示例仅仅展示流程，效果无参考价值。**

### 开源TF模式

`el_submit -yaml eval.tf.yaml `配置文件内容如下

```bash
app:
    app_type: standalone
    app_name: easyrec_tf_eval
    mode: local
    exit_mode: true
    verbose: true
    files: dwd_avazu_ctr_deepmodel.config
    command: python -m easy_rec.python.eval --pipeline_config_path dwd_avazu_ctr_deepmodel.config
    wait_time: 8
    hook: /usr/local/dstools/bin/hooks.sh

resource:
    worker_num: 1
    worker_cpu: 6
    worker_gpu: 1
    worker_memory: 10g
    worker_mode_arg:

```

### Paitf模式

**使用Paitf需要token授权， 请联系产品架构团队索取token**
`el_submit -yaml eval.paitf.yaml `配置文件内容如下

```bash
app:
    app_type: standalone
    app_name: easyrec_paitf_eval
    mode: docker-pai
    mode_arg: paitf:1.12-gpu
    token: AAAAAAAAAAAAAAABBBBBBBBBBBBBBB==
    exit_mode: true
    verbose: true
    files: dwd_avazu_ctr_deepmodel.config
    command: python -m easy_rec.python.eval --pipeline_config_path dwd_avazu_ctr_deepmodel.config
    wait_time: 8
    hook: /usr/local/dstools/bin/hooks.sh

resource:
    worker_num: 1
    worker_cpu: 6
    worker_gpu: 1
    worker_memory: 10g
    worker_mode_arg: paitf:1.12-gpu

```

## 导出:

- 使用el_submit提交导出任务, **el_submit**相关参数请参考[**tf_on_yarn**](https://help.aliyun.com/document_detail/93031.html)

--pipeline_config_path: EasyRec配置文件
--export_dir: 导出模型目录 
--checkpoint_path: 指定checkpoint，默认不指定，不指定则使用model_dir下面最新的checkpoint

### 开源TF模式

`el_submit -yaml export.tf.yaml `配置文件内容如下

```bash
app:
    app_type: standalone
    app_name: easyrec_tf_export
    mode: local
    exit_mode: true
    verbose: true
    files: dwd_avazu_ctr_deepmodel.config
    command: python -m easy_rec.python.export --pipeline_config_path dwd_avazu_ctr_deepmodel.config --export_dir hdfs://emr-header-1:9000/user/easy_rec/experiment/export
    wait_time: 8
    hook: /usr/local/dstools/bin/hooks.sh

resource:
    worker_num: 1
    worker_cpu: 6
    worker_gpu: 1
    worker_memory: 10g
    worker_mode_arg:

```

### Paitf模式

**使用Paitf需要token授权， 请联系产品架构团队索取token**
`el_submit -yaml export.paitf.yaml `配置文件内容如下

```bash
app:
    app_type: standalone
    app_name: easyrec_paitf_export
    mode: docker-pai
    mode_arg: paitf:1.12-gpu
    token: AAAAAAAAAAAAAAABBBBBBBBBBBBBBB==
    exit_mode: true
    verbose: true
    files: dwd_avazu_ctr_deepmodel.config
    command: python -m easy_rec.python.export --pipeline_config_path dwd_avazu_ctr_deepmodel.config --export_dir hdfs://emr-header-1:9000/user/easy_rec/experiment/export
    wait_time: 8
    hook: /usr/local/dstools/bin/hooks.sh

resource:
    worker_num: 1
    worker_cpu: 6
    worker_gpu: 1
    worker_memory: 10g
    worker_mode_arg: paitf:1.12-gpu

```

### 查看导出结果

```bash
hadoop fs -ls hdfs://emr-header-1:9000/user/easy_rec/experiment/export
```

## 部署到Pai-EAS服务

### 1. 导出模型到savedmodel， 并压缩成tar包

```
hadoop fs -get /user/easy_rec/experiment/export/mazeng/1606721697 savedmodel
tar zcvf savedmodel.tar.gz savedmodel
```

### 2. 配置AK， 部署eas服务需要

```
eascmd64 config -i AAAAAAAAAA -k BBBBBBBBBBB -e pai-eas.cn-beijing.aliyuncs.com
```

### 3. 上传模型压缩包，获取oss url

```
eascmd64 upload savedmodel.tar.gz
```

### 4. 部署到线上服务（慎行）

```
eascmd64 create pmml.json
```

pmml.json配置文件内容如下, easyrec是基于tensorflow/paitf的， 因此processor需选择tensorflow相关的

```bash
{
  "name": "demo0",
  "generate_token": "true",
  "model_path": "oss://eas-model-beijing/166408185111/savedmodel.tar.gz",
  "processor": "tensorflow_cpu_1.14",
  "metadata": {
    "instance": 1,
    "eas.enabled_model_verification": false,
    "cpu": 1
  }
}
```

### 5. 构造服务请求

参考 [https://help.aliyun.com/document_detail/111055.html](https://help.aliyun.com/document_detail/111055.html)

#### 1) 获取模型input output信息

```
curl http://pai-eas-vpc.cn-beijing.aliyuncs.com/api/predict/mnist_saved_model_example | python -mjson.tool
```

#### 2) python版

参考 [https://github.com/pai-eas/eas-python-sdk](https://github.com/pai-eas/eas-python-sdk)

```
#!/usr/bin/env python

from eas_prediction import PredictClient
from eas_prediction import StringRequest
from eas_prediction import TFRequest

if __name__ == '__main__':
    client = PredictClient('http://1828488879222746.cn-beijing.pai-eas.aliyuncs.com', 'mnist_saved_model_example')
    client.set_token('AAAAAAAAAAAAAAABBBBBBBBBBBBBBB==')
    client.init()

    #request = StringRequest('[{}]')
    req = TFRequest('predict_images')
    req.add_feed('images', [1, 784], TFRequest.DT_FLOAT, [1] * 784)
    for x in range(0, 1000000):
        resp = client.predict(req)
        print(resp)

```

#### 3) 其他语言版

参考 [https://help.aliyun.com/document_detail/111055.html](https://help.aliyun.com/document_detail/111055.html)

### 配置文件:

#### 输入输出

```protobuf
# 训练表和测试数据
train_input_path: "hdfs://emr-header-1:9000/user/easy_rec/data/dwd_avazu_ctr_deepmodel_train.csv"
eval_input_path: "hdfs://emr-header-1:9000/user/easy_rec/data/dwd_avazu_ctr_deepmodel_test.csv"
# 模型保存路径
model_dir: "hdfs://emr-header-1:9000/user/easy_rec/experiment/"
```

#### 数据相关

```protobuf
# 数据相关的描述
data_config {
  separator: ","
  input_fields: {
    input_name: "label"
    input_type: FLOAT
    default_val:""
  }
  input_fields: {
    input_name: "hour"
    input_type: STRING
    default_val:""
  }
  input_fields: {
    input_name: "c1"
    input_type: STRING
    default_val:""
  }
  ...
  input_fields: {
    input_name: "c20"
    input_type: STRING
    default_val:""
  }
  input_fields: {
    input_name: "c21"
    input_type: STRING
    default_val:""
  }

  label_fields: "label"

  batch_size: 1024
  prefetch_size: 32
  input_type: CSVInput
}
```

#### 特征相关

```protobuf
feature_config:{
  features: {
    input_names: "hour"
    feature_type: IdFeature
    embedding_dim: 16
    hash_bucket_size: 50
  }
  features: {
    input_names: "c1"
    feature_type: IdFeature
    embedding_dim: 16
    hash_bucket_size: 10
  }
  ...
  features: {
    input_names: "c20"
    feature_type: IdFeature
    embedding_dim: 16
    hash_bucket_size: 500
  }
  features: {
    input_names: "c21"
    feature_type: IdFeature
    embedding_dim: 16
    hash_bucket_size: 500
  }
}

```

#### 训练相关

```protobuf
# 训练相关的参数
train_config {
  # 每200轮打印一行log
  log_step_count_steps: 200
  # 优化器相关的参数
  optimizer_config: {
    adam_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.0001
          decay_steps: 100000
          decay_factor: 0.5
          min_learning_rate: 0.0000001
        }
      }
    }
    use_moving_average: false
  }
  # 使用SyncReplicasOptimizer进行分布式训练(同步模式)
  sync_replicas: true
  # num_steps = total_sample_num * num_epochs / batch_size / num_workers
  num_steps: 2000
}
```

#### 评估相关

```protobuf
eval_config {
  # 仅仅评估1000个样本，这里是为了示例速度考虑，实际使用时需要删除
  num_examples: 1000
  metrics_set: {
    # metric为auc
    auc {}
  }
}
```

#### 模型相关

```protobuf
model_config:{
  model_class: "MultiTower"
  feature_groups: {
    group_name: "item"
    feature_names: "c1"
    feature_names: "banner_pos"
    feature_names: "site_id"
    feature_names: "site_domain"
    feature_names: "site_category"
    feature_names: "app_id"
    feature_names: "app_domain"
    feature_names: "app_category"
    wide_deep:DEEP
  }
  feature_groups: {
    group_name: "user"
    feature_names: "device_id"
    feature_names: "device_ip"
    feature_names: "device_model"
    feature_names: "device_type"
    feature_names: "device_conn_type"
    wide_deep:DEEP
  }
  feature_groups: {
    group_name: "user_item"
    feature_names: "hour"
    feature_names: "c14"
    feature_names: "c15"
    feature_names: "c16"
    feature_names: "c17"
    feature_names: "c18"
    feature_names: "c19"
    feature_names: "c20"
    feature_names: "c21"
    wide_deep:DEEP
  }

  multi_tower {
    towers {
      input: "item"
      dnn {
        hidden_units: [384, 320, 256, 192, 128]
      }
    }

    towers {
      input: "user"
      dnn {
        hidden_units: [384, 320, 256, 192, 128]
      }
    }

    towers {
      input: "user_item"
      dnn {
        hidden_units: [384, 320, 256, 192, 128]
      }
    }

    final_dnn {
      hidden_units: [256, 192, 128, 64]
    }
    l2_regularization: 0.0
  }
  embedding_regularization: 0.0
}
```

#### Config下载

[dwd_avazu_ctr_deepmodel.config](http://easyrec.oss-cn-beijing.aliyuncs.com/config/emr/dwd_avazu_ctr_deepmodel.config)

#### ExcelConfig下载

ExcelConfig比Config更加简明

- [dwd_avazu_ctr_deepmodel.xls](http://easyrec.oss-cn-beijing.aliyuncs.com/data/dwd_avazu_ctr_deepmodel.xls)
- [ExcelConfig 转 Config](../feature/excel_config.md)

### 参考手册

- [EasyRecConfig参考手册](../reference.md)

- [TF on EMR参考手册](../tf_on_yarn.md)

- [DataScience集群手册](https://help.aliyun.com/document_detail/170836.html)

- [EMR Tensorboard](../emr_tensorboard.md)