Protocol Documentation
easy_rec/python/protos/autoint.proto
AutoInt
Field | Type | Label | Description |
multi_head_num |
uint32 |
required |
The number of heads Default: 1 |
multi_head_size |
uint32 |
required |
The dimension of heads |
interacting_layer_num |
uint32 |
required |
The number of interacting layers Default: 1 |
l2_regularization |
float |
required |
Default: 0.0001 |
easy_rec/python/protos/backbone.proto
BackboneTower
Field | Type | Label | Description |
packages |
BlockPackage |
repeated |
a few sub DAGs |
blocks |
Block |
repeated |
a few blocks generating a DAG |
concat_blocks |
string |
repeated |
the names of output blocks |
top_mlp |
MLP |
optional |
optional top mlp layer |
Block
Field | Type | Label | Description |
name |
string |
required |
|
inputs |
Input |
repeated |
the input names of feature groups or other blocks |
input_concat_axis |
int32 |
optional |
Default: -1 |
merge_inputs_into_list |
bool |
optional |
|
extra_input_fn |
string |
optional |
|
layers |
Layer |
repeated |
sequential layers |
input_layer |
InputLayer |
optional |
|
lambda |
Lambda |
optional |
|
keras_layer |
KerasLayer |
optional |
|
recurrent |
RecurrentLayer |
optional |
|
repeat |
RepeatLayer |
optional |
|
BlockPackage
a package of blocks for reuse; e.g. call in a contrastive learning manner
Field | Type | Label | Description |
name |
string |
required |
package name |
blocks |
Block |
repeated |
a few blocks generating a DAG |
concat_blocks |
string |
repeated |
the names of output blocks |
Input
Field | Type | Label | Description |
feature_group_name |
string |
optional |
|
block_name |
string |
optional |
|
package_name |
string |
optional |
|
use_package_input |
bool |
optional |
|
input_fn |
string |
optional |
|
input_slice |
string |
optional |
|
ignore_input |
bool |
optional |
Default: false |
reset_input |
InputLayer |
optional |
|
package_input |
string |
optional |
|
package_input_fn |
string |
optional |
|
InputLayer
Field | Type | Label | Description |
do_batch_norm |
bool |
optional |
|
do_layer_norm |
bool |
optional |
|
dropout_rate |
float |
optional |
|
feature_dropout_rate |
float |
optional |
|
only_output_feature_list |
bool |
optional |
|
only_output_3d_tensor |
bool |
optional |
|
output_2d_tensor_and_feature_list |
bool |
optional |
|
output_seq_and_normal_feature |
bool |
optional |
|
wide_output_dim |
uint32 |
optional |
|
concat_seq_feature |
bool |
optional |
Default: true |
Lambda
Field | Type | Label | Description |
expression |
string |
required |
|
Layer
RecurrentLayer
Field | Type | Label | Description |
num_steps |
uint32 |
required |
Default: 1 |
fixed_input_index |
uint32 |
optional |
|
keras_layer |
KerasLayer |
required |
|
RepeatLayer
Field | Type | Label | Description |
num_repeat |
uint32 |
required |
Default: 1 |
output_concat_axis |
int32 |
optional |
by default, outputs the list of multiple outputs |
keras_layer |
KerasLayer |
required |
|
input_slice |
string |
optional |
|
input_fn |
string |
optional |
|
easy_rec/python/protos/cmbf.proto
CMBF
Field | Type | Label | Description |
config |
CMBFTower |
required |
|
final_dnn |
DNN |
required |
|
CMBFTower
Field | Type | Label | Description |
multi_head_num |
uint32 |
required |
The number of heads of cross modal fusion layer Default: 1 |
image_multi_head_num |
uint32 |
required |
The number of heads of image feature learning layer Default: 1 |
text_multi_head_num |
uint32 |
required |
The number of heads of text feature learning layer Default: 1 |
text_head_size |
uint32 |
required |
The dimension of text heads |
image_head_size |
uint32 |
required |
The dimension of image heads Default: 64 |
image_feature_patch_num |
uint32 |
required |
The number of patches of the image feature; takes effect when there is only one image feature Default: 1 |
image_feature_dim |
uint32 |
required |
Reduce the image feature to this dimension before the single modal learning module Default: 0 |
image_self_attention_layer_num |
uint32 |
required |
The number of self attention layers for image features Default: 0 |
text_self_attention_layer_num |
uint32 |
required |
The number of self attention layers for text features Default: 1 |
cross_modal_layer_num |
uint32 |
required |
The number of cross modal layers Default: 1 |
image_cross_head_size |
uint32 |
required |
The dimension of image cross modal heads |
text_cross_head_size |
uint32 |
required |
The dimension of text cross modal heads |
hidden_dropout_prob |
float |
required |
Dropout probability for hidden layers Default: 0 |
attention_probs_dropout_prob |
float |
required |
Dropout probability of the attention probabilities Default: 0 |
use_token_type |
bool |
required |
Whether to add embeddings for different text sequence features Default: false |
use_position_embeddings |
bool |
required |
Whether to add position embeddings for the position of each token in the text sequence Default: true |
max_position_embeddings |
uint32 |
required |
Maximum sequence length that might ever be used with this model Default: 0 |
text_seq_emb_dropout_prob |
float |
required |
Dropout probability for text sequence embeddings Default: 0.1 |
other_feature_dnn |
DNN |
optional |
dnn layers for other features |
easy_rec/python/protos/collaborative_metric_learning.proto
CoMetricLearningI2I
easy_rec/python/protos/data_source.proto
Field | Type | Label | Description |
category_path |
string |
repeated |
support gfile.Glob |
dense_path |
string |
repeated |
|
label_path |
string |
repeated |
|
DatahubServer
Field | Type | Label | Description |
akId |
string |
required |
|
akSecret |
string |
required |
|
endpoint |
string |
required |
|
project |
string |
required |
|
topic |
string |
required |
|
offset_info |
string |
optional |
in json format: {"0":{"cursor": ""}, "1":{"cursor":""}} |
offset_time |
string |
optional |
offset_time can be in one of two formats:
1: %Y%m%d %H:%M:%S "20220508 12:00:00"
2: %s "1651982400" |
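The two offset_time formats can be normalized to a single epoch value. The following is a minimal sketch, not EasyRec's own parsing code: `parse_offset_time` and `ASSUMED_TZ` are hypothetical names, and the UTC+8 zone is an assumption chosen because it is the zone in which the two example values in the table denote the same instant.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the formatted variant is interpreted in UTC+8, the zone in
# which "20220508 12:00:00" and "1651982400" denote the same instant.
ASSUMED_TZ = timezone(timedelta(hours=8))


def parse_offset_time(value, tz=ASSUMED_TZ):
    """Return the offset as Unix epoch seconds for either accepted format."""
    value = value.strip()
    if value.isdigit():
        # Format 2 (%s): the string already holds epoch seconds.
        return int(value)
    # Format 1 (%Y%m%d %H:%M:%S): wall-clock time in the assumed zone.
    parsed = datetime.strptime(value, "%Y%m%d %H:%M:%S")
    return int(parsed.replace(tzinfo=tz).timestamp())
```

Passing a different `tz` changes only how the formatted variant is read; epoch strings are returned unchanged.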
KafkaServer
Field | Type | Label | Description |
server |
string |
required |
|
topic |
string |
required |
|
group |
string |
required |
|
offset_info |
string |
optional |
in json format: {'0':10, '1':20} |
offset_time |
string |
optional |
offset_time can be in one of two formats:
1: %Y%m%d %H:%M:%S '20220508 12:00:00'
2: %s '1651982400' |
config_global |
string |
repeated |
kafka global config, such as: fetch.max.bytes=1024 |
config_topic |
string |
repeated |
kafka topic config, such as: max.partition.fetch.bytes=1024 |
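The string-valued KafkaServer fields can be prepared and checked before they go into a config. This is an illustrative sketch only: `parse_kafka_configs` and `build_offset_info` are hypothetical helpers, and how EasyRec itself reads these strings is not shown in this reference. Note that `json.dumps` emits double-quoted keys; whether the single-quoted form shown in the table is accepted verbatim is not stated here.

```python
import json


def parse_kafka_configs(entries):
    """Split repeated 'key=value' strings into a dict of string settings."""
    settings = {}
    for entry in entries:
        key, sep, value = entry.partition("=")
        if not sep or not key.strip():
            raise ValueError("expected key=value, got %r" % entry)
        settings[key.strip()] = value.strip()
    return settings


def build_offset_info(offsets):
    """Serialize {partition: offset} as JSON with string partition keys."""
    return json.dumps({str(p): int(o) for p, o in sorted(offsets.items())})
```

For example, `parse_kafka_configs(["fetch.max.bytes=1024"])` yields a dict with one entry whose value is kept as a string, since the table does not say how values are typed.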
easy_rec/python/protos/dataset.proto
DatasetConfig
Field | Type | Label | Description |
batch_size |
uint32 |
optional |
mini batch size to use for training and evaluation. Default: 32 |
auto_expand_input_fields |
bool |
optional |
set auto_expand_input_fields to true to
auto_expand field[1-21] to field1, field2, ..., field21 Default: false |
label_fields |
string |
repeated |
label fields, normally only one field is used.
For multiple target models such as MMOE
multiple label_fields will be set. |
label_sep |
string |
repeated |
label separator |
label_dim |
uint32 |
repeated |
label dimensions, which need to be set when
some labels have dimension > 1 |
shuffle |
bool |
optional |
whether to shuffle data Default: true |
shuffle_buffer_size |
int32 |
optional |
shuffle buffer for better performance; even if a shuffle buffer is set,
it is suggested to do a full data shuffle before training,
especially when the performance of models is not good. Default: 32 |
num_epochs |
uint32 |
optional |
The number of times a data source is read. If set to zero, the data source
will be reused indefinitely. Default: 0 |
prefetch_size |
uint32 |
optional |
Number of decoded batches to prefetch. Default: 32 |
shard |
bool |
optional |
shard dataset to 1/num_workers in distributed mode;
this param is no longer used Default: false |
file_shard |
bool |
optional |
shard by file, not by sample, valid only for CSVInput Default: false |
input_type |
DatasetConfig.InputType |
required |
|
separator |
string |
optional |
separator of column features, only used for CSVInput*
not used in OdpsInput*
binary separators are supported:
CTRL+A could be set as '\001'
CTRL+B could be set as '\002'
CTRL+C could be set as '\003'
for RTPInput and OdpsRTPInput it is usually set
to '\002' Default: , |
num_parallel_calls |
uint32 |
optional |
parallel preprocessing of raw data; avoid using too small
or too large numbers (suggested to be smaller than
the number of cores) Default: 8 |
selected_cols |
string |
optional |
only used for OdpsInput/OdpsInputV2/OdpsRTPInput, comma separated;
for RTPInput, selected_cols uses indices as column names,
such as '1,2,4', where 1 and 2 are label columns,
4 is the feature column, and columns 0 and 3 are not used |
selected_col_types |
string |
optional |
selected col types, only used for OdpsInput/OdpsInputV2
to avoid setting data types incorrectly |
input_fields |
DatasetConfig.Field |
repeated |
the input fields must match the number and
order of the columns in csv files or odps tables |
rtp_separator |
string |
optional |
for RTPInput only Default: ; |
ignore_error |
bool |
optional |
ignore some data errors
it is not suggested to set this parameter Default: false |
pai_worker_queue |
bool |
optional |
whether to use pai global shuffle queue, only for OdpsInput,
OdpsInputV2, OdpsRTPInputV2 Default: false |
pai_worker_slice_num |
int32 |
optional |
Default: 100 |
chief_redundant |
bool |
optional |
if true, one worker will duplicate the data of the chief node
and undertake the gradient computation of the chief node Default: false |
sample_weight |
string |
optional |
input field for sample weight |
data_compression_type |
string |
optional |
the compression type of tfrecord |
n_data_batch_tfrecord |
uint32 |
optional |
n data for one feature in tfrecord |
with_header |
bool |
optional |
csv files may optionally have a header;
in that case, input_name must match the header name,
and the number and the order of input_fields
may not be the same as that in csv files. Default: false |
feature_fields |
string |
repeated |
|
negative_sampler |
NegativeSampler |
optional |
|
negative_sampler_v2 |
NegativeSamplerV2 |
optional |
|
hard_negative_sampler |
HardNegativeSampler |
optional |
|
hard_negative_sampler_v2 |
HardNegativeSamplerV2 |
optional |
|
negative_sampler_in_memory |
NegativeSamplerInMemory |
optional |
|
eval_batch_size |
uint32 |
optional |
Default: 4096 |
drop_remainder |
bool |
optional |
Default: false |
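Two DatasetConfig behaviours described above lend themselves to a small illustration: expanding a ranged name such as field[1-21] when auto_expand_input_fields is true, and splitting a row on a binary separator such as '\001'. The sketch below is an assumption about the semantics, not EasyRec's implementation; `expand_input_name` and `split_row` are hypothetical names.

```python
import re

# Matches names such as "field[1-21]": a prefix followed by an inclusive range.
_RANGE_PATTERN = re.compile(r"^(?P<prefix>.*)\[(?P<start>\d+)-(?P<end>\d+)\]$")


def expand_input_name(name):
    """Expand 'field[1-21]' into ['field1', ..., 'field21']."""
    match = _RANGE_PATTERN.match(name)
    if match is None:
        return [name]  # Plain names are returned unchanged.
    start, end = int(match.group("start")), int(match.group("end"))
    if start > end:
        raise ValueError("empty range in %r" % name)
    return [match.group("prefix") + str(i) for i in range(start, end + 1)]


def split_row(line, separator=","):
    """Split one CSV-style row on the configured column separator."""
    return line.rstrip("\n").split(separator)
```

In Python source, the CTRL+A separator written as '\001' in a config corresponds to the single character "\x01".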
DatasetConfig.Field
Field | Type | Label | Description |
input_name |
string |
required |
|
input_type |
DatasetConfig.FieldType |
required |
Default: STRING |
default_val |
string |
optional |
|
input_dim |
uint32 |
optional |
Default: 1 |
input_shape |
uint32 |
optional |
Default: 1 |
user_define_fn |
string |
optional |
user-defined function for label. eg: tf.math.log1p, remap_lbl |
user_define_fn_path |
string |
optional |
user-defined function path. eg: /samples/demo_script/process_lbl.py |
user_define_fn_res_type |
DatasetConfig.FieldType |
optional |
output field type of user-defined function. |
ignore_val |
string |
optional |
ignore value |
HardNegativeSampler
Weighted Random Sampling ItemID not in Batch and Sampling Hard Edge
Field | Type | Label | Description |
user_input_path |
string |
required |
user data path
userid weight |
item_input_path |
string |
required |
item data path
itemid weight attrs |
hard_neg_edge_input_path |
string |
required |
hard negative edge path
userid itemid weight |
num_sample |
uint32 |
required |
number of negative samples |
num_hard_sample |
uint32 |
required |
max number of hard negative samples |
attr_fields |
string |
repeated |
field names of attrs in train data or eval data |
item_id_field |
string |
required |
field name of item_id in train data or eval data |
user_id_field |
string |
required |
field name of user_id in train data or eval data |
attr_delimiter |
string |
optional |
Default: : |
num_eval_sample |
uint32 |
optional |
Default: 0 |
field_delimiter |
string |
optional |
only works on DataScience/Local Default: |
HardNegativeSamplerV2
Weighted Random Sampling ItemID not with Edge and Sampling Hard Edge
Field | Type | Label | Description |
user_input_path |
string |
required |
user data path
userid weight |
item_input_path |
string |
required |
item data path
itemid weight attrs |
pos_edge_input_path |
string |
required |
positive edge path
userid itemid weight |
hard_neg_edge_input_path |
string |
required |
hard negative edge path
userid itemid weight |
num_sample |
uint32 |
required |
number of negative samples |
num_hard_sample |
uint32 |
required |
max number of hard negative samples |
attr_fields |
string |
repeated |
field names of attrs in train data or eval data |
item_id_field |
string |
required |
field name of item_id in train data or eval data |
user_id_field |
string |
required |
field name of user_id in train data or eval data |
attr_delimiter |
string |
optional |
Default: : |
num_eval_sample |
uint32 |
optional |
Default: 0 |
field_delimiter |
string |
optional |
only works on DataScience/Local Default: |
NegativeSampler
Weighted Random Sampling ItemID not in Batch
Field | Type | Label | Description |
input_path |
string |
required |
sample data path
itemid weight attrs |
num_sample |
uint32 |
required |
number of negative samples |
attr_fields |
string |
repeated |
field names of attrs in train data or eval data |
item_id_field |
string |
required |
field name of item_id in train data or eval data |
attr_delimiter |
string |
optional |
Default: : |
num_eval_sample |
uint32 |
optional |
Default: 0 |
field_delimiter |
string |
optional |
only works on DataScience/Local Default: |
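The idea behind "Weighted Random Sampling ItemID not in Batch" can be sketched in a few lines. This is a simplified stand-in rather than the sampler EasyRec uses: `load_items` and `sample_negatives` are hypothetical names, rows are assumed to be whitespace-separated in the "itemid weight attrs" layout, and sampling with replacement is an assumption.

```python
import random


def load_items(lines):
    """Parse 'itemid weight attrs' rows into (item_id, weight) pairs."""
    items = []
    for line in lines:
        parts = line.split()
        items.append((parts[0], float(parts[1])))
    return items


def sample_negatives(items, batch_item_ids, num_sample, rng=random):
    """Draw num_sample item ids by weight, excluding ids present in the batch."""
    excluded = set(batch_item_ids)
    candidates = [(i, w) for i, w in items if i not in excluded]
    if not candidates:
        return []
    ids = [i for i, _ in candidates]
    weights = [w for _, w in candidates]
    # random.choices samples with replacement, proportional to the weights.
    return rng.choices(ids, weights=weights, k=num_sample)
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible.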
NegativeSamplerInMemory
Field | Type | Label | Description |
input_path |
string |
required |
sample data path
itemid weight attrs |
num_sample |
uint32 |
required |
number of negative samples |
attr_fields |
string |
repeated |
field names of attrs in train data or eval data |
item_id_field |
string |
required |
field name of item_id in train data or eval data |
attr_delimiter |
string |
optional |
Default: : |
num_eval_sample |
uint32 |
optional |
Default: 0 |
field_delimiter |
string |
optional |
only works on DataScience/Local Default: |
NegativeSamplerV2
Weighted Random Sampling ItemID not with Edge
Field | Type | Label | Description |
user_input_path |
string |
required |
user data path
userid weight |
item_input_path |
string |
required |
item data path
itemid weight attrs |
pos_edge_input_path |
string |
required |
positive edge path
userid itemid weight |
num_sample |
uint32 |
required |
number of negative samples |
attr_fields |
string |
repeated |
field names of attrs in train data or eval data |
item_id_field |
string |
required |
field name of item_id in train data or eval data |
user_id_field |
string |
required |
field name of user_id in train data or eval data |
attr_delimiter |
string |
optional |
Default: : |
num_eval_sample |
uint32 |
optional |
Default: 0 |
field_delimiter |
string |
optional |
only works on DataScience/Local Default: |
DatasetConfig.FieldType
Name | Number | Description |
INT32 |
0 |
|
INT64 |
1 |
|
STRING |
2 |
|
FLOAT |
4 |
|
DOUBLE |
5 |
|
BOOL |
6 |
|
DatasetConfig.InputType
Name | Number | Description |
CSVInput |
10 |
csv format input, can be used locally or on hdfs;
supports .gz compression (but not .tar.gz files) |
CSVInputV2 |
11 |
@Deprecated |
CSVInputEx |
12 |
extended csv format, allow quote in fields |
OdpsInput |
2 |
@Deprecated, has a memory leak problem |
OdpsInputV2 |
3 |
odps input, used on pai |
DataHubInput |
15 |
|
OdpsInputV3 |
9 |
|
RTPInput |
4 |
|
RTPInputV2 |
5 |
|
OdpsRTPInput |
601 |
|
OdpsRTPInputV2 |
602 |
|
TFRecordInput |
7 |
|
BatchTFRecordInput |
14 |
|
DummyInput |
8 |
for debugging performance bottlenecks of
input pipelines |
KafkaInput |
13 |
|
HiveInput |
16 |
|
HiveRTPInput |
17 |
|
HiveParquetInput |
18 |
|
ParquetInput |
19 |
All features are packed into one field for fast copying to gpu,
and there is no feature preprocessing step; it is assumed that
features are preprocessed before training.
Requirements: python3 and tf2.x due to multiprocessing spawn and
RaggedTensor apis. |
ParquetInputV2 |
20 |
Features are not packed, and are preprocessed separately.
Requirements: python3 and tf2.x due to multiprocessing spawn and
RaggedTensor apis. |
ParquetInputV3 |
21 |
c++ version of the parquet dataset, which is currently only available
with deeprec. |
CriteoInput |
1001 |
|
easy_rec/python/protos/dbmtl.proto
DBMTL
Field | Type | Label | Description |
bottom_cmbf |
CMBFTower |
optional |
shared bottom cmbf layer |
bottom_uniter |
UniterTower |
optional |
shared bottom uniter layer |
bottom_dnn |
DNN |
optional |
shared bottom dnn layer |
expert_dnn |
DNN |
optional |
mmoe expert dnn layer definition |
num_expert |
uint32 |
optional |
number of mmoe experts Default: 0 |
task_towers |
BayesTaskTower |
repeated |
bayes task tower |
l2_regularization |
float |
optional |
l2 regularization Default: 0.0001 |
easy_rec/python/protos/dcn.proto
CrossTower
Field | Type | Label | Description |
input |
string |
required |
|
cross_num |
uint32 |
required |
The number of cross layers Default: 3 |
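To make cross_num concrete, the sketch below stacks cross layers using the formulation from the original DCN design, x_{l+1} = x_0 * (x_l . w_l) + b_l + x_l. It is plain Python for illustration; that EasyRec's CrossTower uses exactly this form is an assumption, and `cross_layer` and `cross_tower` are hypothetical names.

```python
def cross_layer(x0, xl, weight, bias):
    """One DCN cross layer: x0 * (xl . weight) + bias + xl, element-wise."""
    scalar = sum(a * w for a, w in zip(xl, weight))
    return [x0_i * scalar + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, bias, xl)]


def cross_tower(x0, weights, biases):
    """Stack len(weights) cross layers; cross_num corresponds to that length."""
    xl = list(x0)
    for weight, bias in zip(weights, biases):
        xl = cross_layer(x0, xl, weight, bias)
    return xl
```

Each layer keeps the input dimension, so stacking more layers raises the order of the feature crosses without widening the output.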
DCN
Field | Type | Label | Description |
deep_tower |
Tower |
required |
|
cross_tower |
CrossTower |
required |
|
final_dnn |
DNN |
required |
|
l2_regularization |
float |
required |
Default: 0.0001 |
easy_rec/python/protos/deepfm.proto
DeepFM
Field | Type | Label | Description |
dnn |
DNN |
required |
|
final_dnn |
DNN |
optional |
|
wide_output_dim |
uint32 |
optional |
Default: 1 |
wide_regularization |
float |
optional |
deprecated Default: 0.0001 |
dense_regularization |
float |
optional |
deprecated Default: 0.0001 |
l2_regularization |
float |
optional |
Default: 0.0001 |
easy_rec/python/protos/dlrm.proto
DLRM
Field | Type | Label | Description |
top_dnn |
DNN |
required |
|
bot_dnn |
DNN |
required |
|
arch_interaction_op |
string |
optional |
options are: dot and cat Default: dot |
arch_interaction_itself |
bool |
optional |
whether a feature will interact with itself Default: false |
arch_with_dense_feature |
bool |
optional |
whether to include dense features after interaction Default: false |
l2_regularization |
float |
optional |
Default: 1e-05 |
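The difference between the dot and cat options, and the effect of arch_interaction_itself, can be illustrated as follows. The sketch follows the reference DLRM design (pairwise dot products over the lower triangle); whether EasyRec orders or selects the pairs the same way is an assumption, and the function names are hypothetical.

```python
def dot_interaction(vectors, interact_itself=False):
    """Pairwise dot products between feature vectors of equal length."""
    outputs = []
    for i in range(len(vectors)):
        # Include the diagonal (i == j) only when self-interaction is enabled.
        upper = i + 1 if interact_itself else i
        for j in range(upper):
            outputs.append(sum(a * b for a, b in zip(vectors[i], vectors[j])))
    return outputs


def cat_interaction(vectors):
    """'cat' alternative: concatenate the vectors without interacting them."""
    return [value for vector in vectors for value in vector]
```

With n feature vectors, the dot variant produces n(n-1)/2 values, or n(n+1)/2 when self-interaction is enabled.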
easy_rec/python/protos/dnn.proto
DNN
Field | Type | Label | Description |
hidden_units |
uint32 |
repeated |
hidden units for each layer |
dropout_ratio |
float |
repeated |
ratio of dropout |
activation |
string |
optional |
activation function Default: tf.nn.relu |
use_bn |
bool |
optional |
use batch normalization Default: true |
MLP
Field | Type | Label | Description |
hidden_units |
uint32 |
repeated |
hidden units for each layer |
dropout_ratio |
float |
repeated |
ratio of dropout |
activation |
string |
optional |
activation function Default: relu |
use_bn |
bool |
optional |
use batch normalization Default: true |
use_final_bn |
bool |
optional |
Default: true |
final_activation |
string |
optional |
Default: relu |
use_bias |
bool |
optional |
Default: false |
initializer |
string |
optional |
kernel_initializer Default: he_uniform |
use_bn_after_activation |
bool |
optional |
|
use_final_bias |
bool |
optional |
Default: false |
easy_rec/python/protos/dropoutnet.proto
DropoutNet
Field | Type | Label | Description |
user_content |
DNN |
required |
|
user_preference |
DNN |
required |
|
item_content |
DNN |
required |
|
item_preference |
DNN |
required |
|
user_tower |
DNN |
required |
|
item_tower |
DNN |
required |
|
l2_regularization |
float |
required |
Default: 0 |
user_dropout_rate |
float |
required |
Default: 0 |
item_dropout_rate |
float |
required |
Default: 0.5 |
softmax_loss |
SoftmaxCrossEntropyWithNegativeMining |
optional |
|
easy_rec/python/protos/dssm.proto
DSSM
Field | Type | Label | Description |
user_tower |
DSSMTower |
required |
|
item_tower |
DSSMTower |
required |
|
l2_regularization |
float |
required |
Default: 0.0001 |
simi_func |
Similarity |
optional |
Default: COSINE |
scale_simi |
bool |
optional |
add a layer for scaling the similarity Default: true |
item_id |
string |
optional |
|
ignore_in_batch_neg_sam |
bool |
required |
Default: false |
DSSMTower
Field | Type | Label | Description |
id |
string |
required |
|
dnn |
DNN |
required |
|
easy_rec/python/protos/easy_rec_model.proto
DummyModel
for input performance test
EasyRecModel
KD
for knowledge distillation
Field | Type | Label | Description |
loss_name |
string |
optional |
|
pred_name |
string |
required |
|
pred_is_logits |
bool |
optional |
default to be logits Default: true |
soft_label_name |
string |
required |
for CROSS_ENTROPY_LOSS, soft_label must be logits instead of probs |
label_is_logits |
bool |
optional |
default to be logits Default: true |
loss_type |
LossType |
required |
currently only supports CROSS_ENTROPY_LOSS and L2_LOSS |
loss_weight |
float |
optional |
Default: 1 |
temperature |
float |
optional |
only for loss_type == CROSS_ENTROPY_LOSS Default: 1 |
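The role of temperature in a cross entropy distillation loss can be sketched with the textbook formulation: both teacher and student logits are divided by the temperature before the softmax. This is a generic illustration, not necessarily the exact loss EasyRec computes (for instance, any extra scaling by the temperature is omitted); `softmax` and `distill_cross_entropy` are hypothetical helpers.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def distill_cross_entropy(student_logits, teacher_logits, temperature=1.0):
    """Cross entropy between softened teacher and student distributions."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))
```

A higher temperature flattens both distributions, so the student is also pushed to match the teacher's relative scores on less likely classes.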
ModelParams
configure backbone network common parameters
EasyRecModel.LossWeightStrategy
Name | Number | Description |
Fixed |
0 |
|
Uncertainty |
1 |
|
Random |
2 |
|
easy_rec/python/protos/esmm.proto
ESMM
Field | Type | Label | Description |
groups |
Tower |
repeated |
|
ctr_tower |
TaskTower |
required |
|
cvr_tower |
TaskTower |
required |
|
l2_regularization |
float |
required |
Default: 0.0001 |
easy_rec/python/protos/eval.proto
AUC
Field | Type | Label | Description |
num_thresholds |
uint32 |
optional |
Default: 200 |
Accuracy
AvgPrecisionAtTopK
Field | Type | Label | Description |
topk |
uint32 |
optional |
Default: 5 |
EvalConfig
Message for configuring EasyRecModel evaluation jobs (eval.py).
Field | Type | Label | Description |
num_examples |
uint32 |
optional |
Number of examples to process for evaluation. Default: 0 |
eval_interval_secs |
uint32 |
optional |
How often to run evaluation. Default: 300 |
max_evals |
uint32 |
optional |
Maximum number of times to run evaluation. If set to 0, will run forever. Default: 0 |
save_graph |
bool |
optional |
Whether the TensorFlow graph used for evaluation should be saved to disk. Default: false |
metrics_set |
EvalMetrics |
repeated |
Type of metrics to use for evaluation.
possible values: |
eval_online |
bool |
optional |
Evaluation online with batch forward data of training Default: false |
EvalMetrics
GAUC
Field | Type | Label | Description |
uid_field |
string |
required |
uid field name |
reduction |
string |
optional |
reduction method for auc of different users
* "mean": simple mean of different users
* "mean_by_sample_num": weighted mean with sample num of different users
* "mean_by_positive_num": weighted mean with positive sample num of different users Default: mean |
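The three reduction modes differ only in how per-user AUC values are weighted. The sketch below shows this in plain Python under stated assumptions: users whose samples contain only one class are skipped (their AUC is undefined), ties count as half a win, and the names `auc` and `gauc` are hypothetical rather than EasyRec functions.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC by pairwise comparison; a positive/negative tie counts as 0.5."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        return None  # AUC is undefined without both classes.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def gauc(uids, labels, scores, reduction="mean"):
    """Group samples by uid, compute per-user AUC, then combine them."""
    groups = defaultdict(lambda: ([], []))
    for uid, label, score in zip(uids, labels, scores):
        groups[uid][0].append(label)
        groups[uid][1].append(score)
    total, weight_sum = 0.0, 0.0
    for user_labels, user_scores in groups.values():
        user_auc = auc(user_labels, user_scores)
        if user_auc is None:
            continue  # Assumption: users lacking a class are skipped.
        if reduction == "mean":
            weight = 1.0
        elif reduction == "mean_by_sample_num":
            weight = float(len(user_labels))
        elif reduction == "mean_by_positive_num":
            weight = float(sum(user_labels))
        else:
            raise ValueError("unknown reduction: %s" % reduction)
        total += weight * user_auc
        weight_sum += weight
    return total / weight_sum if weight_sum else None
```

The same grouping logic applies to SessionAUC if the session id is used as the grouping key in place of the uid.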
Max_F1
MeanAbsoluteError
MeanSquaredError
Precision
Recall
RecallAtTopK
Field | Type | Label | Description |
topk |
uint32 |
optional |
Default: 5 |
RootMeanSquaredError
SessionAUC
Field | Type | Label | Description |
session_id_field |
string |
required |
session id field name |
reduction |
string |
optional |
reduction: reduction method for auc of different sessions
* "mean": simple mean of different sessions
* "mean_by_sample_num": weighted mean with sample num of different sessions
* "mean_by_positive_num": weighted mean with positive sample num of different sessions Default: mean |
easy_rec/python/protos/export.proto
ExportConfig
Message for configuring exporting models.
Field | Type | Label | Description |
batch_size |
int32 |
optional |
batch size used for exported model, -1 indicates batch_size is None
which is only supported by classification model right now, while
other models support static batch_size Default: -1 |
exporter_type |
string |
optional |
type of exporter [final | latest | best | none] when train_and_evaluation
final: performs a single export at the end of training
latest: regularly exports the serving graph and checkpoints
best: export the best model according to best_exporter_metric
none: do not perform export Default: final |
best_exporter_metric |
string |
optional |
the metric used to determine the best checkpoint Default: auc |
metric_bigger |
bool |
optional |
whether a bigger metric value is better Default: true |
enable_early_stop |
bool |
optional |
enable early stop Default: false |
early_stop_func |
string |
optional |
custom early stop function, format:
early_stop_func(eval_results, early_stop_params)
return True if should stop |
early_stop_params |
string |
optional |
custom early stop parameters |
max_check_steps |
int32 |
optional |
early stop max check steps Default: 10000 |
multi_placeholder |
bool |
optional |
each feature has a placeholder Default: true |
exports_to_keep |
int32 |
optional |
number of exports to keep, only for exporter_type in [best, latest] Default: 1 |
multi_value_fields |
MultiValueFields |
optional |
multi value field list |
auto_multi_value |
bool |
optional |
auto analyze multi value fields Default: false |
placeholder_named_by_input |
bool |
optional |
whether placeholders are named by input Default: false |
filter_inputs |
bool |
optional |
filter out inputs, only keep effective ones Default: true |
export_features |
bool |
optional |
export the original feature values as string Default: false |
export_rtp_outputs |
bool |
optional |
export the outputs required by RTP Default: false |
asset_files |
string |
repeated |
export asset files |
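A custom early stop function only needs to follow the early_stop_func(eval_results, early_stop_params) shape and return True when training should stop. The sketch below is hypothetical in its details: it assumes eval_results is a list of metric dicts ordered oldest to newest and that early_stop_params is a JSON string, neither of which is confirmed by this reference.

```python
import json


def early_stop_func(eval_results, early_stop_params):
    """Return True when the metric has not improved for `patience` evaluations.

    Assumptions (not confirmed by the table): eval_results is a list of
    metric dicts ordered oldest to newest, and early_stop_params is a JSON
    string such as '{"metric": "auc", "patience": 2}'.
    """
    params = json.loads(early_stop_params) if early_stop_params else {}
    metric = params.get("metric", "auc")
    patience = max(1, int(params.get("patience", 3)))
    values = [result[metric] for result in eval_results if metric in result]
    if len(values) <= patience:
        return False  # Not enough history to decide.
    best_before = max(values[:-patience])
    # Stop if none of the last `patience` evaluations beat the earlier best.
    return max(values[-patience:]) <= best_before
```

This variant treats a bigger metric as better; a metric such as loss would need the comparison reversed.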
MultiValueFields
Field | Type | Label | Description |
input_name |
string |
repeated |
|
easy_rec/python/protos/feature_config.proto
AttentionCombiner
EVParams
Field | Type | Label | Description |
filter_freq |
uint64 |
optional |
Default: 0 |
steps_to_live |
uint64 |
optional |
Default: 0 |
use_cache |
bool |
optional |
use embedding cache, only for sok hybrid embedding Default: false |
init_capacity |
uint64 |
optional |
for sok hybrid key value embedding Default: 8388608 |
max_capacity |
uint64 |
optional |
Default: 16777216 |
FeatureConfig
Field | Type | Label | Description |
feature_name |
string |
optional |
|
input_names |
string |
repeated |
input field names: must be included in DatasetConfig.input_fields |
feature_type |
FeatureConfig.FeatureType |
required |
Default: IdFeature |
embedding_name |
string |
optional |
|
embedding_dim |
uint32 |
optional |
Default: 0 |
hash_bucket_size |
uint64 |
optional |
Default: 0 |
num_buckets |
uint64 |
optional |
for categorical_column_with_identity Default: 0 |
boundaries |
double |
repeated |
only for raw features |
separator |
string |
optional |
separator within features Default: | |
kv_separator |
string |
optional |
delimiter to separate key from value |
seq_multi_sep |
string |
optional |
delimiter to separate sequence multi-values |
max_seq_len |
uint32 |
optional |
truncate sequence data to max_seq_len |
vocab_file |
string |
optional |
|
vocab_list |
string |
repeated |
|
shared_names |
string |
repeated |
other fields that share this config |
lookup_max_sel_elem_num |
int32 |
optional |
lookup max select element number Default: 10 |
max_partitions |
int32 |
optional |
max_partitions Default: 1 |
combiner |
string |
optional |
combiner Default: sum |
initializer |
Initializer |
optional |
embedding initializer |
precision |
int32 |
optional |
number of digits kept after the decimal point when formatting float/double to string;
scientific format is not used.
by default, converting float/double to string is not allowed Default: -1 |
min_val |
double |
optional |
normalize raw feature to [0-1] Default: 0 |
max_val |
double |
optional |
Default: 0 |
normalizer_fn |
string |
optional |
normalization function for raw features:
such as: tf.math.log1p |
raw_input_dim |
uint32 |
optional |
raw feature of multiple dimensions Default: 1 |
sequence_combiner |
SequenceCombiner |
optional |
sequence feature combiner |
sub_feature_type |
FeatureConfig.FeatureType |
optional |
sub feature type for sequence feature Default: IdFeature |
sequence_length |
uint32 |
optional |
sequence length Default: 1 |
expression |
string |
optional |
for expr feature |
ev_params |
EVParams |
optional |
embedding variable params |
combo_join_sep |
string |
optional |
for combo feature:
if not set, use cross_column
otherwise, the input features are first joined
and then passed to categorical_column |
combo_input_seps |
string |
repeated |
separator for each input
if not set, combo inputs will not be split |
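Several FeatureConfig fields describe simple transforms on raw values: min_val and max_val normalize a raw feature to [0, 1], boundaries discretize it, and separator and kv_separator split multi-value features. The sketch below illustrates those transforms in plain Python. The helper names are hypothetical, clipping out-of-range values is an assumption, and the bucket convention (a value equal to a boundary goes to the higher bucket) mirrors TensorFlow's bucketized columns rather than anything stated here.

```python
import bisect


def min_max_normalize(value, min_val, max_val):
    """Scale a raw value into [0, 1], clipping values outside the range."""
    if max_val <= min_val:
        raise ValueError("max_val must be greater than min_val")
    scaled = (value - min_val) / (max_val - min_val)
    return min(1.0, max(0.0, scaled))


def bucketize(value, boundaries):
    """Map a raw value to a bucket index given sorted boundaries."""
    # bisect_right places a value equal to a boundary in the higher bucket.
    return bisect.bisect_right(boundaries, value)


def split_tags(raw, separator="|", kv_separator=None):
    """Split a multi-value feature; optionally split each item into key/value."""
    items = [item for item in raw.split(separator) if item]
    if kv_separator is None:
        return items
    return [tuple(item.split(kv_separator, 1)) for item in items]
```

With k boundaries, `bucketize` returns an index in the range 0 to k, so a downstream embedding would need k + 1 rows.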
FeatureConfigV2
FeatureGroupConfig
Field | Type | Label | Description |
group_name |
string |
optional |
|
feature_names |
string |
repeated |
|
wide_deep |
WideOrDeep |
optional |
Default: DEEP |
sequence_features |
SeqAttGroupConfig |
repeated |
|
negative_sampler |
bool |
optional |
Default: false |
MultiHeadAttentionCombiner
SeqAttGroupConfig
Field | Type | Label | Description |
group_name |
string |
optional |
|
seq_att_map |
SeqAttMap |
repeated |
|
tf_summary |
bool |
optional |
Default: false |
seq_dnn |
DNN |
optional |
|
allow_key_search |
bool |
optional |
Default: false |
need_key_feature |
bool |
optional |
Default: true |
allow_key_transform |
bool |
optional |
Default: false |
transform_dnn |
bool |
optional |
Default: false |
SeqAttMap
Field | Type | Label | Description |
key |
string |
repeated |
|
hist_seq |
string |
repeated |
|
aux_hist_seq |
string |
repeated |
|
SequenceCombiner
TextCnnCombiner
Field | Type | Label | Description |
filter_sizes |
uint32 |
repeated |
|
num_filters |
uint32 |
repeated |
|
FeatureConfig.FeatureType
Name | Number | Description |
IdFeature |
0 |
|
RawFeature |
1 |
|
TagFeature |
2 |
|
ComboFeature |
3 |
|
LookupFeature |
4 |
|
SequenceFeature |
5 |
|
ExprFeature |
6 |
|
FeatureConfig.FieldType
Name | Number | Description |
INT32 |
0 |
|
INT64 |
1 |
|
STRING |
2 |
|
FLOAT |
4 |
|
DOUBLE |
5 |
|
BOOL |
6 |
|
WideOrDeep
Name | Number | Description |
DEEP |
0 |
|
WIDE |
1 |
|
WIDE_AND_DEEP |
2 |
|
easy_rec/python/protos/fm.proto
FM
Field | Type | Label | Description |
use_variant |
bool |
optional |
|
l2_regularization |
float |
optional |
Default: 0.0001 |
easy_rec/python/protos/hive_config.proto
HiveConfig
Field | Type | Label | Description |
host |
string |
required |
hive master's ip |
port |
uint32 |
required |
hive port Default: 10000 |
username |
string |
required |
hive username Default: admin |
database |
string |
required |
hive database Default: default |
table_name |
string |
required |
|
easy_rec/python/protos/hyperparams.proto
ConstantInitializer
Field | Type | Label | Description |
consts |
float |
repeated |
|
GlorotNormalInitializer
Initializer
Proto with one-of field for initializers.
L1L2Regularizer
Configuration proto for L1L2 Regularizer.
Field | Type | Label | Description |
scale_l1 |
float |
optional |
Default: 1 |
scale_l2 |
float |
optional |
Default: 1 |
L1Regularizer
Configuration proto for L1 Regularizer.
Field | Type | Label | Description |
scale |
float |
optional |
Default: 1 |
L2Regularizer
Configuration proto for L2 Regularizer.
Field | Type | Label | Description |
scale |
float |
optional |
Default: 1 |
RandomNormalInitializer
Configuration proto for random normal initializer. See
https://www.tensorflow.org/api_docs/python/tf/random_normal_initializer
Field | Type | Label | Description |
mean |
float |
optional |
Default: 0 |
stddev |
float |
optional |
Default: 1 |
Regularizer
Proto with one-of field for regularizers.
TruncatedNormalInitializer
Configuration proto for truncated normal initializer. See
https://www.tensorflow.org/api_docs/python/tf/truncated_normal_initializer
Field | Type | Label | Description |
mean |
float |
optional |
Default: 0 |
stddev |
float |
optional |
Default: 1 |
easy_rec/python/protos/keras_layer.proto
KerasLayer
easy_rec/python/protos/layer.proto
AutoDisEmbedding
Field | Type | Label | Description |
embedding_dim |
uint32 |
required |
|
num_bins |
uint32 |
required |
|
keep_prob |
float |
required |
Default: 0.8 |
temperature |
float |
required |
|
output_3d_tensor |
bool |
optional |
|
output_tensor_list |
bool |
optional |
|
Bilinear
Field | Type | Label | Description |
type |
string |
required |
Default: interaction |
use_plus |
bool |
required |
Default: true |
num_output_units |
uint32 |
required |
|
FiBiNet
Field | Type | Label | Description |
bilinear |
Bilinear |
optional |
|
senet |
SENet |
required |
|
mlp |
MLP |
optional |
|
GateNN
Field | Type | Label | Description |
output_dim |
uint32 |
optional |
|
hidden_dim |
uint32 |
optional |
|
activation |
string |
optional |
activation function Default: relu |
use_bn |
bool |
optional |
use batch normalization Default: false |
dropout_rate |
float |
optional |
|
HighWayTower
Field | Type | Label | Description |
input |
string |
optional |
|
emb_size |
uint32 |
required |
|
activation |
string |
required |
Default: gelu |
dropout_rate |
float |
optional |
|
MMoELayer
Field | Type | Label | Description |
num_task |
uint32 |
required |
number of tasks |
expert_mlp |
MLP |
optional |
mmoe expert mlp layer definition |
num_expert |
uint32 |
optional |
number of mmoe experts |
MaskBlock
Field | Type | Label | Description |
reduction_factor |
float |
optional |
|
output_size |
uint32 |
optional |
|
aggregation_size |
uint32 |
optional |
|
input_layer_norm |
bool |
optional |
Default: false |
projection_dim |
uint32 |
optional |
|
MaskNet
Field | Type | Label | Description |
mask_blocks |
MaskBlock |
repeated |
|
use_parallel |
bool |
required |
Default: true |
mlp |
MLP |
optional |
|
input_layer_norm |
bool |
optional |
Default: true |
PPNet
Field | Type | Label | Description |
mlp |
MLP |
required |
|
gate_params |
GateNN |
required |
|
mode |
string |
required |
run mode: eager, lazy Default: eager |
full_gate_input |
bool |
optional |
Default: true |
PeriodicEmbedding
Field | Type | Label | Description |
embedding_dim |
uint32 |
required |
|
sigma |
float |
required |
|
add_linear_layer |
bool |
optional |
Default: true |
linear_activation |
string |
optional |
Default: relu |
output_3d_tensor |
bool |
optional |
|
output_tensor_list |
bool |
optional |
|
SENet
Field | Type | Label | Description |
reduction_ratio |
uint32 |
required |
Default: 4 |
num_squeeze_group |
uint32 |
optional |
Default: 2 |
use_skip_connection |
bool |
optional |
Default: true |
use_output_layer_norm |
bool |
optional |
Default: true |
easy_rec/python/protos/loss.proto
BinaryFocalLoss
Field | Type | Label | Description |
gamma |
float |
required |
Default: 2 |
alpha |
float |
optional |
|
ohem_ratio |
float |
optional |
Default: 1 |
label_smoothing |
float |
optional |
Default: 0 |
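As a rough illustration of how gamma and alpha shape this loss, the sketch below evaluates the textbook binary focal loss, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), for a single prediction. It assumes the standard formulation; ohem_ratio and label_smoothing are not modeled, and the function name is hypothetical rather than part of EasyRec.

```python
import math


def binary_focal_loss(prob, label, gamma=2.0, alpha=None):
    """Textbook binary focal loss for one example (illustrative only).

    prob:  predicted probability of the positive class, in (0, 1)
    label: 1 for a positive example, 0 for a negative one
    """
    # p_t is the probability assigned to the true class.
    p_t = prob if label == 1 else 1.0 - prob
    # (1 - p_t) ** gamma down-weights examples that are already easy.
    loss = -((1.0 - p_t) ** gamma) * math.log(p_t)
    if alpha is not None:
        # alpha weights positives, (1 - alpha) weights negatives.
        alpha_t = alpha if label == 1 else 1.0 - alpha
        loss *= alpha_t
    return loss
```

With gamma set to 0 and alpha unset, the function reduces to plain cross entropy; larger gamma values shrink the contribution of well-classified examples.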
CircleLoss
Field | Type | Label | Description |
margin |
float |
required |
Default: 0.25 |
gamma |
float |
required |
Default: 32 |
F1ReweighedLoss
Field | Type | Label | Description |
f1_beta_square |
float |
required |
Default: 1 |
label_smoothing |
float |
required |
Default: 0 |
JRCLoss
Field | Type | Label | Description |
session_name |
string |
required |
|
alpha |
float |
optional |
Default: 0.5 |
same_label_loss |
bool |
optional |
Default: true |
loss_weight_strategy |
string |
required |
Default: fixed |
Loss
MultiSimilarityLoss
Field | Type | Label | Description |
alpha |
float |
required |
Default: 2 |
beta |
float |
required |
Default: 50 |
lamb |
float |
required |
Default: 1 |
eps |
float |
required |
Default: 0.1 |
PairwiseFocalLoss
Field | Type | Label | Description |
gamma |
float |
required |
Default: 2 |
alpha |
float |
optional |
|
hinge_margin |
float |
optional |
Default: 1 |
session_name |
string |
optional |
|
ohem_ratio |
float |
optional |
Default: 1 |
temperature |
float |
optional |
Default: 1 |
PairwiseLogisticLoss
Field | Type | Label | Description |
temperature |
float |
required |
Default: 1 |
session_name |
string |
optional |
|
hinge_margin |
float |
optional |
Default: 1 |
ohem_ratio |
float |
optional |
Default: 1 |
PairwiseLoss
Field | Type | Label | Description |
margin |
float |
required |
Default: 0 |
session_name |
string |
optional |
|
temperature |
float |
optional |
Default: 1 |
SoftmaxCrossEntropyWithNegativeMining
Field | Type | Label | Description |
num_negative_samples |
uint32 |
required |
|
margin |
float |
required |
Default: 0 |
gamma |
float |
required |
Default: 1 |
coefficient_of_support_vector |
float |
required |
Default: 1 |
LossType
Name | Number | Description |
CLASSIFICATION |
0 |
|
L2_LOSS |
1 |
|
SIGMOID_L2_LOSS |
2 |
|
CROSS_ENTROPY_LOSS |
3 |
cross-entropy loss / log loss |
SOFTMAX_CROSS_ENTROPY |
4 |
|
CIRCLE_LOSS |
5 |
|
MULTI_SIMILARITY_LOSS |
6 |
|
SOFTMAX_CROSS_ENTROPY_WITH_NEGATIVE_MINING |
7 |
|
PAIR_WISE_LOSS |
8 |
|
F1_REWEIGHTED_LOSS |
9 |
|
BINARY_FOCAL_LOSS |
10 |
|
PAIRWISE_FOCAL_LOSS |
11 |
|
PAIRWISE_LOGISTIC_LOSS |
12 |
|
JRC_LOSS |
13 |
|
easy_rec/python/protos/mind.proto
Capsule
Field | Type | Label | Description |
max_k |
uint32 |
optional |
max number of high capsules Default: 5 |
max_seq_len |
uint32 |
required |
max behaviour sequence length |
high_dim |
uint32 |
required |
high capsule embedding vector dimension |
num_iters |
uint32 |
optional |
number of EM iterations Default: 3 |
routing_logits_scale |
float |
optional |
routing logits scale Default: 20 |
routing_logits_stddev |
float |
optional |
routing logits initial stddev Default: 1 |
squash_pow |
float |
optional |
squash power Default: 1 |
scale_ratio |
float |
optional |
output ratio Default: 1 |
const_caps_num |
bool |
optional |
whether to use a constant number of interests;
by default the number is log(seq_len) Default: false |
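The interplay of max_k and const_caps_num can be sketched as follows. This is a minimal sketch, assuming the base-2 logarithm and the clamping used in the MIND paper, K' = max(1, min(K, log2(seq_len))); the table above only says log(seq_len), so the base and the rounding are assumptions, and the function name is hypothetical.

```python
import math


def num_interest_capsules(seq_len, max_k=5, const_caps_num=False):
    """Number of high-level capsules for one user (illustrative only)."""
    if const_caps_num:
        # Constant mode: always use max_k interests.
        return max_k
    # Assumption: base-2 log, truncated to an integer, as in the MIND paper.
    dynamic_k = int(math.log2(max(seq_len, 1)))
    return max(1, min(max_k, dynamic_k))
```

Under these assumptions a user with 8 behaviours gets 3 interests, while long sequences are capped at max_k.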
MIND
Field | Type | Label | Description |
pre_capsule_dnn |
DNN |
optional |
preprocessing dnn before entering capsule layer |
user_dnn |
DNN |
required |
dnn layers applied on user_context (non-sequence features) |
concat_dnn |
DNN |
required |
concat user and capsule dnn |
user_seq_combine |
MIND.UserSeqCombineMethod |
optional |
method to combine several user sequences
such as item_ids, category_ids Default: SUM |
item_dnn |
DNN |
required |
dnn layers applied on item features |
capsule_config |
Capsule |
required |
|
simi_pow |
float |
optional |
similarity power; the paper says the bigger
the better Default: 10 |
simi_func |
Similarity |
optional |
Default: COSINE |
scale_simi |
bool |
optional |
add a layer for scaling the similarity Default: true |
l2_regularization |
float |
required |
Default: 0.0001 |
time_id_fea |
string |
optional |
|
item_id |
string |
optional |
|
ignore_in_batch_neg_sam |
bool |
optional |
Default: false |
max_interests_simi |
float |
optional |
if smaller than 1.0, a loss is added to
limit the maximal interest similarities, but
in experiments such a loss leads to a low hit rate. Default: 1 |
MIND.UserSeqCombineMethod
Name | Number | Description |
CONCAT |
0 |
|
SUM |
1 |
|
easy_rec/python/protos/mmoe.proto
ExpertTower
Field | Type | Label | Description |
expert_name |
string |
required |
|
dnn |
DNN |
required |
|
MMoE
Field | Type | Label | Description |
experts |
ExpertTower |
repeated |
deprecated: original mmoe experts config |
expert_dnn |
DNN |
optional |
mmoe expert dnn layer definition |
num_expert |
uint32 |
optional |
number of mmoe experts Default: 0 |
task_towers |
TaskTower |
repeated |
task tower |
l2_regularization |
float |
required |
l2 regularization Default: 0.0001 |
easy_rec/python/protos/multi_tower.proto
BSTTower
Field | Type | Label | Description |
input |
string |
required |
|
seq_len |
uint32 |
required |
Default: 5 |
multi_head_size |
uint32 |
required |
Default: 4 |
DINTower
Field | Type | Label | Description |
input |
string |
required |
|
dnn |
DNN |
required |
|
MultiTower
Field | Type | Label | Description |
towers |
Tower |
repeated |
|
final_dnn |
DNN |
required |
|
l2_regularization |
float |
required |
Default: 0.0001 |
din_towers |
DINTower |
repeated |
|
bst_towers |
BSTTower |
repeated |
|
easy_rec/python/protos/multi_tower_recall.proto
MultiTowerRecall
Field | Type | Label | Description |
user_tower |
RecallTower |
required |
|
item_tower |
RecallTower |
required |
|
l2_regularization |
float |
required |
Default: 0.0001 |
final_dnn |
DNN |
required |
|
ignore_in_batch_neg_sam |
bool |
required |
Default: false |
RecallTower
Field | Type | Label | Description |
dnn |
DNN |
required |
|
easy_rec/python/protos/optimizer.proto
AdagradOptimizer
Configuration message for the AdagradOptimizer
See: https://www.tensorflow.org/api_docs/python/tf/train/AdagradOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
initial_accumulator_value |
float |
optional |
Default: 0.1 |
AdamAsyncOptimizer
Only available on pai-tf, which has better performance than AdamOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
beta1 |
float |
optional |
Default: 0.9 |
beta2 |
float |
optional |
Default: 0.999 |
AdamAsyncWOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
weight_decay |
float |
optional |
Default: 1e-06 |
beta1 |
float |
optional |
Default: 0.9 |
beta2 |
float |
optional |
Default: 0.999 |
AdamOptimizer
Configuration message for the AdamOptimizer
See: https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
beta1 |
float |
optional |
Default: 0.9 |
beta2 |
float |
optional |
Default: 0.999 |
AdamWOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
weight_decay |
float |
optional |
Default: 1e-06 |
beta1 |
float |
optional |
Default: 0.9 |
beta2 |
float |
optional |
Default: 0.999 |
ConstantLearningRate
Configuration message for a constant learning rate.
Field | Type | Label | Description |
learning_rate |
float |
optional |
Default: 0.002 |
CosineDecayLearningRate
Configuration message for a cosine decaying learning rate as defined in
utils/learning_schedules.py
Field | Type | Label | Description |
learning_rate_base |
float |
optional |
Default: 0.002 |
total_steps |
uint32 |
optional |
Default: 4000000 |
warmup_learning_rate |
float |
optional |
Default: 0.0002 |
warmup_steps |
uint32 |
optional |
Default: 10000 |
hold_base_rate_steps |
uint32 |
optional |
Default: 0 |
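The fields above can be read as a linear warmup, an optional hold at the base rate, and then a cosine decay to zero. The sketch below assumes the schedule follows the cosine-decay-with-warmup helper from the TensorFlow Object Detection API's learning_schedules.py; the version referenced by this proto may differ in details, and the function name is hypothetical.

```python
import math


def cosine_decay_with_warmup(step,
                             learning_rate_base=0.002,
                             total_steps=4000000,
                             warmup_learning_rate=0.0002,
                             warmup_steps=10000,
                             hold_base_rate_steps=0):
    """Learning rate at a given global step (illustrative only)."""
    if step < warmup_steps:
        # Linear ramp from warmup_learning_rate up to learning_rate_base.
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    if step <= warmup_steps + hold_base_rate_steps:
        # Optional plateau at the base rate before decay starts.
        return learning_rate_base
    if step > total_steps:
        return 0.0
    decay_span = float(total_steps - warmup_steps - hold_base_rate_steps)
    progress = (step - warmup_steps - hold_base_rate_steps) / decay_span
    return 0.5 * learning_rate_base * (1.0 + math.cos(math.pi * progress))
```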
ExponentialDecayLearningRate
Configuration message for an exponentially decaying learning rate.
See https://www.tensorflow.org/versions/master/api_docs/python/train/decaying_the_learning_rate#exponential_decay
Field | Type | Label | Description |
initial_learning_rate |
float |
optional |
Default: 0.002 |
decay_steps |
uint32 |
optional |
Default: 4000000 |
decay_factor |
float |
optional |
Default: 0.95 |
staircase |
bool |
optional |
Default: true |
burnin_learning_rate |
float |
optional |
Default: 0 |
burnin_steps |
uint32 |
optional |
Default: 0 |
min_learning_rate |
float |
optional |
Default: 0 |
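A minimal sketch of the core schedule, assuming it matches TensorFlow's exponential decay, lr = initial_learning_rate * decay_factor^(step / decay_steps), with the result floored at min_learning_rate. The burn-in fields are deliberately not modeled because their exact handling is not described here; the function name is hypothetical.

```python
import math


def exponential_decay(step,
                      initial_learning_rate=0.002,
                      decay_steps=4000000,
                      decay_factor=0.95,
                      staircase=True,
                      min_learning_rate=0.0):
    """Learning rate at a given global step (illustrative only)."""
    exponent = step / decay_steps
    if staircase:
        # Staircase mode decays in discrete jumps every decay_steps steps.
        exponent = math.floor(exponent)
    lr = initial_learning_rate * decay_factor ** exponent
    # Assumption: min_learning_rate acts as a lower bound on the result.
    return max(lr, min_learning_rate)
```

With the default staircase setting the rate stays at 0.002 for the first 4,000,000 steps and then drops by 5% at each further multiple of decay_steps.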
FtrlOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
optional float learning_rate = 1 [default=1e-4]; |
learning_rate_power |
float |
optional |
Default: -0.5 |
initial_accumulator_value |
float |
optional |
Default: 0.1 |
l1_reg |
float |
optional |
Default: 0 |
l2_reg |
float |
optional |
Default: 0 |
l2_shrinkage_reg |
float |
optional |
Default: 0 |
LazyAdamOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
beta1 |
float |
optional |
Default: 0.9 |
beta2 |
float |
optional |
Default: 0.999 |
LearningRate
Configuration message for optimizer learning rate.
ManualStepLearningRate
Configuration message for a manually defined learning rate schedule.
ManualStepLearningRate.LearningRateSchedule
Field | Type | Label | Description |
step |
uint32 |
optional |
|
learning_rate |
float |
optional |
Default: 0.002 |
MomentumOptimizer
Configuration message for the MomentumOptimizer
See: https://www.tensorflow.org/api_docs/python/tf/train/MomentumOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
momentum_optimizer_value |
float |
optional |
Default: 0.9 |
MomentumWOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
weight_decay |
float |
optional |
Default: 1e-06 |
momentum_optimizer_value |
float |
optional |
Default: 0.9 |
Optimizer
Top level optimizer message.
PolyDecayLearningRate
Configuration message for a poly decaying learning rate.
See https://www.tensorflow.org/api_docs/python/tf/train/polynomial_decay.
Field | Type | Label | Description |
learning_rate_base |
float |
required |
|
total_steps |
int64 |
required |
|
power |
float |
required |
|
end_learning_rate |
float |
optional |
Default: 0 |
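For reference, the polynomial decay documented at the TensorFlow link above computes (base - end) * (1 - step / total_steps)^power + end, with the step clipped to total_steps. The sketch below restates that formula in plain Python using this message's field names; the function name is hypothetical and the cycle option of the TensorFlow function is not modeled.

```python
def polynomial_decay(step, learning_rate_base, total_steps, power,
                     end_learning_rate=0.0):
    """Learning rate at a given global step (illustrative only)."""
    # After total_steps the rate stays at end_learning_rate.
    step = min(step, total_steps)
    remaining = 1.0 - step / total_steps
    return ((learning_rate_base - end_learning_rate) * remaining ** power
            + end_learning_rate)
```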
RMSPropOptimizer
Configuration message for the RMSPropOptimizer
See: https://www.tensorflow.org/api_docs/python/tf/train/RMSPropOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
momentum_optimizer_value |
float |
optional |
Default: 0.9 |
decay |
float |
optional |
Default: 0.9 |
epsilon |
float |
optional |
Default: 1 |
Field | Type | Label | Description |
learning_rate_base |
float |
required |
|
hidden_size |
int32 |
required |
|
warmup_steps |
int32 |
required |
|
step_scaling_rate |
float |
optional |
Default: 1 |
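The table above carries no message heading in this listing. Its fields (learning_rate_base, hidden_size, warmup_steps) match the inputs of the warmup schedule from the Transformer paper, lr = base * hidden_size^-0.5 * min(step^-0.5, step * warmup_steps^-1.5). The sketch below assumes that schedule; whether the proto uses exactly this form is an assumption, step_scaling_rate is not modeled, and the function name is hypothetical.

```python
def transformer_warmup_lr(step, learning_rate_base, hidden_size, warmup_steps):
    """Transformer-style warmup schedule (illustrative only)."""
    step = max(step, 1)  # avoid raising 0 to a negative power
    # Linear increase for warmup_steps steps, then inverse-square-root decay.
    scale = min(step ** -0.5, step * warmup_steps ** -1.5)
    return learning_rate_base * hidden_size ** -0.5 * scale
```

Under this assumption the rate peaks at step == warmup_steps, where both branches of the min coincide.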
easy_rec/python/protos/pdn.proto
PDN
Field | Type | Label | Description |
user_dnn |
DNN |
required |
encode user info |
item_dnn |
DNN |
required |
encode target item info |
u2i_dnn |
DNN |
required |
encode u2i seq info |
trigger_dnn |
DNN |
required |
produce trigger score |
i2i_dnn |
DNN |
required |
encode trigger item seqs to target item co-occurrence info |
sim_dnn |
DNN |
required |
produce sim score |
direct_user_dnn |
DNN |
optional |
direct net user_dnn |
direct_item_dnn |
DNN |
optional |
direct net item_dnn |
simi_func |
Similarity |
optional |
for direct net, similar to DSSM Default: COSINE |
scale_simi |
bool |
optional |
for direct net Default: true |
bias_dnn |
DNN |
optional |
bias net dnn |
item_id |
string |
optional |
|
l2_regularization |
float |
optional |
Default: 1e-06 |
easy_rec/python/protos/pipeline.proto
EasyRecConfig
Field | Type | Label | Description |
train_input_path |
string |
optional |
|
kafka_train_input |
KafkaServer |
optional |
|
datahub_train_input |
DatahubServer |
optional |
|
hive_train_input |
HiveConfig |
optional |
|
binary_train_input |
BinaryDataInput |
optional |
|
parquet_train_input |
string |
optional |
|
eval_input_path |
string |
optional |
|
kafka_eval_input |
KafkaServer |
optional |
|
datahub_eval_input |
DatahubServer |
optional |
|
hive_eval_input |
HiveConfig |
optional |
|
binary_eval_input |
BinaryDataInput |
optional |
|
parquet_eval_input |
string |
optional |
|
model_dir |
string |
required |
|
train_config |
TrainConfig |
optional |
train config, including optimizer, weight decay, num_steps and so on |
eval_config |
EvalConfig |
optional |
|
data_config |
DatasetConfig |
optional |
|
feature_configs |
FeatureConfig |
repeated |
for compatibility |
feature_config |
FeatureConfigV2 |
optional |
|
model_config |
EasyRecModel |
required |
recommendation model config |
export_config |
ExportConfig |
optional |
|
fg_json_path |
string |
optional |
JSON file [RTP FG] that defines input data and features:
* In easy_rec.python.utils.fg_util.load_fg_json_to_config,
data_config and feature_config are generated
based on fg_json.
* After generation, a prefix '!' is added:
fg_json_path = '!' + fg_json_path
This indicates that the config update is already done and should not
be repeated, which makes the load_fg_json_to_config
function reentrant.
This step runs before edit_config_json so that it can take effect. |
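The '!' prefix described for fg_json_path is an idempotency marker: once the config has been generated, the path itself records that fact, so a second call becomes a no-op. The sketch below shows the pattern on a plain dict; load_fg_once and generate_fn are hypothetical names for illustration and are not the actual load_fg_json_to_config implementation.

```python
def load_fg_once(config, generate_fn):
    """Run generate_fn at most once per config (illustrative only).

    config:      dict with an optional 'fg_json_path' entry
    generate_fn: hypothetical callback that fills in data/feature config
    """
    path = config.get("fg_json_path", "")
    if not path or path.startswith("!"):
        # Nothing to load, or the update was already applied.
        return config
    generate_fn(config, path)
    # Mark the config as processed so repeated calls do nothing.
    config["fg_json_path"] = "!" + path
    return config
```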
easy_rec/python/protos/ple.proto
ExtractionNetwork
Field | Type | Label | Description |
network_name |
string |
required |
|
expert_num_per_task |
uint32 |
required |
number of experts per task |
share_num |
uint32 |
optional |
number of shared experts;
no need to configure this for the last extraction_network |
task_expert_net |
DNN |
required |
dnn network of experts per task |
share_expert_net |
DNN |
optional |
dnn network of the shared experts;
no need to configure this for the last extraction_network |
PLE
Field | Type | Label | Description |
extraction_networks |
ExtractionNetwork |
repeated |
extraction network |
task_towers |
TaskTower |
repeated |
task tower |
l2_regularization |
float |
optional |
l2 regularization Default: 0.0001 |
easy_rec/python/protos/predict.proto
ContextFeatures
context features
Field | Type | Label | Description |
features |
PBFeature |
repeated |
|
PBFeature
PBRequest
PBRequest specifies the request for aggregator
PBRequest.ContextFeaturesEntry
PBRequest.UserFeaturesEntry
PBResponse
PBResponse specifies the response for aggregator
PBResponse.ContextFeaturesEntry
PBResponse.GenerateFeaturesEntry
PBResponse.ItemFeaturesEntry
PBResponse.RawFeaturesEntry
PBResponse.ResultsEntry
PBResponse.TfOutputsEntry
Results
return results
Field | Type | Label | Description |
scores |
double |
repeated |
|
StatusCode
Name | Number | Description |
OK |
0 |
|
INPUT_EMPTY |
1 |
|
EXCEPTION |
2 |
|
easy_rec/python/protos/rocket_launching.proto
RocketLaunching
Field | Type | Label | Description |
share_dnn |
DNN |
required |
|
booster_dnn |
DNN |
required |
|
light_dnn |
DNN |
required |
|
l2_regularization |
float |
optional |
Default: 0.0001 |
feature_based_distillation |
bool |
optional |
Default: false |
feature_distillation_function |
Similarity |
optional |
COSINE = 0; EUCLID = 1; Default: COSINE |
easy_rec/python/protos/seq_encoder.proto
BSTEncoder
Field | Type | Label | Description |
hidden_size |
uint32 |
required |
Size of the encoder layers and the pooler layer |
num_hidden_layers |
uint32 |
required |
Number of hidden layers in the Transformer encoder |
num_attention_heads |
uint32 |
required |
Number of attention heads for each attention layer in the Transformer encoder |
intermediate_size |
uint32 |
required |
The size of the "intermediate" (i.e. feed-forward) layer in the Transformer encoder |
hidden_act |
string |
required |
The non-linear activation function (function or string) in the encoder and pooler.
"gelu", "relu", "tanh" and "swish" are supported. Default: gelu |
hidden_dropout_prob |
float |
required |
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler Default: 0.1 |
attention_probs_dropout_prob |
float |
required |
The dropout ratio for the attention probabilities Default: 0.1 |
max_position_embeddings |
uint32 |
required |
The maximum sequence length that this model might ever be used with Default: 512 |
use_position_embeddings |
bool |
required |
Whether to add position embeddings for the position of each token in the text sequence Default: true |
initializer_range |
float |
required |
The stddev of the truncated_normal_initializer for initializing all weight matrices Default: 0.02 |
output_all_token_embeddings |
bool |
required |
Whether to output all token embeddings; if set to false, only the first token embedding is output Default: true |
target_item_position |
string |
required |
The position of target item (i.e. head, tail, ignore) Default: head |
reserve_target_position |
bool |
required |
Whether to preserve a position for target Default: true |
DINEncoder
Field | Type | Label | Description |
attention_dnn |
MLP |
required |
din attention layer |
need_target_feature |
bool |
required |
whether to keep target item feature Default: true |
attention_normalizer |
string |
required |
option: softmax, sigmoid Default: softmax |
SequenceAugment
Field | Type | Label | Description |
mask_rate |
float |
required |
Fraction of the original sequence length to mask Default: 0.6 |
crop_rate |
float |
required |
Fraction of the original sequence kept after cropping Default: 0.2 |
reorder_rate |
float |
required |
Fraction of the original sequence length to reorder Default: 0.6 |
easy_rec/python/protos/simi.proto
Similarity
Name | Number | Description |
COSINE |
0 |
|
INNER_PRODUCT |
1 |
|
EUCLID |
2 |
|
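For orientation, the three enum values correspond to the usual vector measures sketched below in plain Python. How EasyRec turns the Euclidean distance into a similarity score (for example by negating it) is not stated here, so the sketch only returns the raw distance; the function names are hypothetical.

```python
import math


def inner_product(u, v):
    """INNER_PRODUCT: sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """COSINE: inner product of the L2-normalized vectors."""
    norm = math.sqrt(inner_product(u, u)) * math.sqrt(inner_product(v, v))
    return inner_product(u, v) / norm


def euclid_distance(u, v):
    """EUCLID: raw Euclidean distance (smaller means more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```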
easy_rec/python/protos/simple_multi_task.proto
SimpleMultiTask
Field | Type | Label | Description |
task_towers |
TaskTower |
repeated |
|
l2_regularization |
float |
required |
Default: 0.0001 |
easy_rec/python/protos/tf_predict.proto
ArrayProto
Protocol buffer representing an array
Field | Type | Label | Description |
dtype |
ArrayDataType |
|
Data Type. |
array_shape |
ArrayShape |
|
Shape of the array. |
float_val |
float |
repeated |
DT_FLOAT. |
double_val |
double |
repeated |
DT_DOUBLE. |
int_val |
int32 |
repeated |
DT_INT32, DT_INT16, DT_INT8, DT_UINT8. |
string_val |
bytes |
repeated |
DT_STRING. |
int64_val |
int64 |
repeated |
DT_INT64. |
bool_val |
bool |
repeated |
DT_BOOL. |
ArrayShape
Dimensions of an array
Field | Type | Label | Description |
dim |
int64 |
repeated |
|
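An ArrayProto stores its values as a flat repeated field (float_val, int_val, and so on) together with an ArrayShape. The sketch below rebuilds a nested list from such a pair, assuming row-major order as in TensorFlow's TensorProto; that ordering is an assumption, and reshape_flat is a hypothetical helper rather than part of the generated code.

```python
def reshape_flat(flat_values, dims):
    """Rebuild nested lists from flat values and ArrayShape.dim (illustrative)."""
    expected = 1
    for d in dims:
        expected *= d
    if expected != len(flat_values):
        raise ValueError("shape %r does not match %d values"
                         % (list(dims), len(flat_values)))

    def build(offset, level):
        if level == len(dims):
            return flat_values[offset]
        # Number of flat elements covered by one index at this level.
        stride = 1
        for d in dims[level + 1:]:
            stride *= d
        return [build(offset + i * stride, level + 1)
                for i in range(dims[level])]

    return build(0, 0)
```

For example, six values with dim [2, 3] become two rows of three elements, and an empty dim list yields a scalar.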
PredictRequest
PredictRequest specifies which TensorFlow model to run, as well as
how inputs are mapped to tensors and how outputs are filtered before
returning to user.
Field | Type | Label | Description |
signature_name |
string |
|
A named signature to evaluate. If unspecified, the default signature
will be used |
inputs |
PredictRequest.InputsEntry |
repeated |
Input tensors.
Names of input tensor are alias names. The mapping from aliases to real
input tensor names is expected to be stored as named generic signature
under the key "inputs" in the model export.
Each alias listed in a generic signature named "inputs" should be provided
exactly once in order to run the prediction. |
output_filter |
string |
repeated |
Output filter.
Names specified are alias names. The mapping from aliases to real output
tensor names is expected to be stored as named generic signature under
the key "outputs" in the model export.
Only tensors specified here will be run/fetched and returned, with the
exception that when none is specified, all tensors specified in the
named signature will be run/fetched and returned. |
debug_level |
int32 |
|
|
PredictRequest.InputsEntry
PredictResponse
Response for PredictRequest on successful run.
PredictResponse.OutputsEntry
ArrayDataType
Name | Number | Description |
DT_INVALID |
0 |
Not a legal value for DataType. Used to indicate a DataType field
has not been set. |
DT_FLOAT |
1 |
Data types that all computation devices are expected to be
capable to support. |
DT_DOUBLE |
2 |
|
DT_INT32 |
3 |
|
DT_UINT8 |
4 |
|
DT_INT16 |
5 |
|
DT_INT8 |
6 |
|
DT_STRING |
7 |
|
DT_COMPLEX64 |
8 |
Single-precision complex |
DT_INT64 |
9 |
|
DT_BOOL |
10 |
|
DT_QINT8 |
11 |
Quantized int8 |
DT_QUINT8 |
12 |
Quantized uint8 |
DT_QINT32 |
13 |
Quantized int32 |
DT_BFLOAT16 |
14 |
Float32 truncated to 16 bits. Only for cast ops. |
DT_QINT16 |
15 |
Quantized int16 |
DT_QUINT16 |
16 |
Quantized uint16 |
DT_UINT16 |
17 |
|
DT_COMPLEX128 |
18 |
Double-precision complex |
DT_HALF |
19 |
|
DT_RESOURCE |
20 |
|
DT_VARIANT |
21 |
Arbitrary C++ data types |
easy_rec/python/protos/tower.proto
BayesTaskTower
Field | Type | Label | Description |
tower_name |
string |
required |
task name for the task tower |
label_name |
string |
optional |
label for the task, default is label_fields by order |
metrics_set |
EvalMetrics |
repeated |
metrics for the task |
loss_type |
LossType |
optional |
loss for the task Default: CLASSIFICATION |
num_class |
uint32 |
optional |
num_class for multi-class classification loss Default: 1 |
dnn |
DNN |
optional |
task specific dnn |
relation_tower_names |
string |
repeated |
related tower names |
relation_dnn |
DNN |
optional |
relation dnn |
weight |
float |
optional |
training loss weights Default: 1 |
task_space_indicator_label |
string |
optional |
label name for indicating the sample space for the task tower |
in_task_space_weight |
float |
optional |
the loss weight for sample in the task space Default: 1 |
out_task_space_weight |
float |
optional |
the loss weight for sample out the task space Default: 1 |
losses |
Loss |
repeated |
multiple losses |
use_sample_weight |
bool |
required |
whether to use sample weight in this tower Default: true |
TaskTower
Field | Type | Label | Description |
tower_name |
string |
required |
task name for the task tower |
label_name |
string |
optional |
label for the task, default is label_fields by order |
metrics_set |
EvalMetrics |
repeated |
metrics for the task |
loss_type |
LossType |
optional |
loss for the task Default: CLASSIFICATION |
num_class |
uint32 |
optional |
num_class for multi-class classification loss Default: 1 |
dnn |
DNN |
optional |
task specific dnn |
weight |
float |
optional |
training loss weights Default: 1 |
task_space_indicator_label |
string |
optional |
label name for indicating the sample space for the task tower |
in_task_space_weight |
float |
optional |
the loss weight for sample in the task space Default: 1 |
out_task_space_weight |
float |
optional |
the loss weight for sample out the task space Default: 1 |
losses |
Loss |
repeated |
multiple losses |
use_sample_weight |
bool |
required |
whether to use sample weight in this tower Default: true |
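One plausible reading of weight, task_space_indicator_label, in_task_space_weight and out_task_space_weight is sketched below: each sample's loss weight is the tower weight multiplied by the in-space or out-of-space factor, depending on the indicator label. The exact combination used by EasyRec is not stated in this table, so treat the formula and the function name as assumptions.

```python
def task_loss_weight(indicator, weight=1.0,
                     in_task_space_weight=1.0,
                     out_task_space_weight=1.0):
    """Per-sample loss weight for one task tower (illustrative only).

    indicator: value of task_space_indicator_label for the sample;
               assumed here to mean "in the task space" when > 0.
    """
    in_space = 1.0 if indicator > 0 else 0.0
    space_weight = (in_space * in_task_space_weight
                    + (1.0 - in_space) * out_task_space_weight)
    return weight * space_weight
```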
Tower
Field | Type | Label | Description |
input |
string |
required |
|
dnn |
DNN |
required |
|
easy_rec/python/protos/train.proto
IncrementSaveConfig
IncrementSaveConfig.Datahub
IncrementSaveConfig.Datahub.Consumer
Field | Type | Label | Description |
offset |
int64 |
optional |
Default: 0 |
timeout |
int32 |
optional |
Default: 600 |
IncrementSaveConfig.File
Field | Type | Label | Description |
incr_save_dir |
string |
optional |
Default: incr_save |
relative |
bool |
optional |
relative to model_dir Default: true |
mount_path |
string |
optional |
for online inference, set storage.mount_path to mount_path;
otherwise the online service will fail Default: /home/admin/docker_ml/workspace/incr_save/ |
IncrementSaveConfig.Kafka
IncrementSaveConfig.Kafka.Consumer
Field | Type | Label | Description |
config_topic |
string |
optional |
|
config_global |
string |
optional |
|
offset |
int64 |
optional |
Default: 0 |
timeout |
int32 |
optional |
Default: 600 |
TrainConfig
Message for configuring EasyRecModel training jobs (train.py).
Next id: 25
Field | Type | Label | Description |
optimizer_config |
Optimizer |
repeated |
optimizer options |
gradient_clipping_by_norm |
float |
optional |
If greater than 0, clips gradients by this value. Default: 0 |
num_steps |
uint32 |
optional |
Number of steps to train the models: if 0, will train the model
indefinitely. Default: 0 |
fine_tune_checkpoint |
string |
optional |
Checkpoint to restore variables from. |
fine_tune_ckpt_var_map |
string |
optional |
|
sync_replicas |
bool |
optional |
Whether to synchronize replicas during training.
If so, a SyncReplicasOptimizer is built Default: true |
sparse_accumulator_type |
string |
optional |
only take effect on pai-tf when sync_replicas is set,
options are:
raw, hash, multi_map, list, parallel
in general, multi_map runs faster than other options. Default: multi_map |
startup_delay_steps |
float |
optional |
Number of training steps between replica startup.
This flag must be set to 0 if sync_replicas is set to true. Default: 15 |
save_checkpoints_steps |
uint32 |
optional |
Step interval for saving checkpoint Default: 1000 |
save_checkpoints_secs |
uint32 |
optional |
Seconds interval for saving checkpoint |
keep_checkpoint_max |
uint32 |
optional |
Max checkpoints to keep Default: 10 |
save_summary_steps |
uint32 |
optional |
Save summaries every this many steps. Default: 1000 |
log_step_count_steps |
uint32 |
optional |
How often (in steps) the global step/sec and the loss are logged during training. Default: 10 |
is_profiling |
bool |
optional |
profiling or not Default: false |
force_restore_shape_compatible |
bool |
optional |
if variable shape is incompatible, clip or pad variables in checkpoint Default: false |
train_distribute |
DistributionStrategy |
optional |
DistributionStrategy, available values are 'mirrored' and 'collective' and 'ess'
- mirrored: MirroredStrategy, single machine and multiple devices;
- collective: CollectiveAllReduceStrategy, multiple machines and multiple devices. Default: NoStrategy |
num_gpus_per_worker |
int32 |
optional |
Number of gpus per machine Default: 1 |
summary_model_vars |
bool |
optional |
summary model variables or not Default: false |
protocol |
string |
optional |
distribute training protocol [grpc++ | star_server]
grpc++: https://help.aliyun.com/document_detail/173157.html?spm=5176.10695662.1996646101.searchclickresult.3ebf450evuaPT3
star_server: https://help.aliyun.com/document_detail/173154.html?spm=a2c4g.11186623.6.627.39ad7e3342KOX4 |
inter_op_parallelism_threads |
int32 |
optional |
inter_op_parallelism_threads Default: 0 |
intra_op_parallelism_threads |
int32 |
optional |
intra_op_parallelism_threads Default: 0 |
tensor_fuse |
bool |
optional |
tensor fusion on PAI-TF Default: false |
write_graph |
bool |
optional |
write graph into graph.pbtxt and summary or not Default: true |
freeze_gradient |
string |
repeated |
match variable patterns to freeze |
incr_save_config |
IncrementSaveConfig |
optional |
increment save config |
enable_oss_stop_signal |
bool |
optional |
enable oss stop signal
stop training by creating OSS_STOP_SIGNAL under model_dir Default: false |
dead_line |
string |
optional |
stop training after dead_line time, format:
20220508 23:59:59 |
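The dead_line format shown above, 20220508 23:59:59, maps directly to the strptime pattern "%Y%m%d %H:%M:%S". The sketch below parses such a value and checks whether a given time has passed it; it assumes naive local time, and past_dead_line is a hypothetical helper rather than EasyRec's own check.

```python
from datetime import datetime

# Pattern matching the documented example "20220508 23:59:59".
DEAD_LINE_FORMAT = "%Y%m%d %H:%M:%S"


def past_dead_line(dead_line, now):
    """Return True if `now` is at or after the configured dead_line."""
    if not dead_line:
        # An unset dead_line never stops training.
        return False
    return now >= datetime.strptime(dead_line, DEAD_LINE_FORMAT)
```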
DistributionStrategy
Name | Number | Description |
NoStrategy |
0 |
use old SyncReplicasOptimizer for ParameterServer training |
PSStrategy |
1 |
PSStrategy with multiple gpus on one node does not work
on pai-tf; it only works on TF >=1.15 |
MirroredStrategy |
2 |
only works on PaiTF or TF >=1.15
single worker, multiple gpu mode |
CollectiveAllReduceStrategy |
3 |
Deprecated |
ExascaleStrategy |
4 |
currently not working well |
MultiWorkerMirroredStrategy |
5 |
multi worker multi gpu mode
see tf.distribute.experimental.MultiWorkerMirroredStrategy |
HorovodStrategy |
6 |
use horovod strategy |
SokStrategy |
7 |
support kv embedding, support kv embedding shard |
EmbeddingParallelStrategy |
8 |
support embedding shard, requires horovod |
easy_rec/python/protos/uniter.proto
Uniter
Field | Type | Label | Description |
config |
UniterTower |
required |
|
final_dnn |
DNN |
required |
|
UniterTower
Field | Type | Label | Description |
hidden_size |
uint32 |
required |
Size of the encoder layers and the pooler layer |
num_hidden_layers |
uint32 |
required |
Number of hidden layers in the Transformer encoder |
num_attention_heads |
uint32 |
required |
Number of attention heads for each attention layer in the Transformer encoder |
intermediate_size |
uint32 |
required |
The size of the "intermediate" (i.e. feed-forward) layer in the Transformer encoder |
hidden_act |
string |
required |
The non-linear activation function (function or string) in the encoder and pooler.
"gelu", "relu", "tanh" and "swish" are supported. Default: gelu |
hidden_dropout_prob |
float |
required |
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler Default: 0.1 |
attention_probs_dropout_prob |
float |
required |
The dropout ratio for the attention probabilities Default: 0.1 |
max_position_embeddings |
uint32 |
required |
The maximum sequence length that this model might ever be used with Default: 512 |
use_position_embeddings |
bool |
required |
Whether to add position embeddings for the position of each token in the text sequence Default: true |
initializer_range |
float |
required |
The stddev of the truncated_normal_initializer for initializing all weight matrices Default: 0.02 |
other_feature_dnn |
DNN |
optional |
dnn layers for other features |
easy_rec/python/protos/variational_dropout.proto
VariationalDropoutLayer
Field | Type | Label | Description |
regularization_lambda |
float |
optional |
regularization coefficient lambda Default: 0.01 |
embedding_wise_variational_dropout |
bool |
optional |
variational_dropout dimension Default: false |
easy_rec/python/protos/wide_and_deep.proto
WideAndDeep
Field | Type | Label | Description |
wide_output_dim |
uint32 |
required |
Default: 1 |
dnn |
DNN |
required |
|
final_dnn |
DNN |
optional |
if set, the outputs of the dnn and wide parts are concatenated and
passed to the final_dnn; otherwise, they are summed |
l2_regularization |
float |
optional |
Default: 0.0001 |
Scalar Value Types
.proto Type | Notes | C++ Type | Java Type | Python Type |
double |
|
double |
double |
float |
float |
|
float |
float |
float |
int32 |
Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. |
int32 |
int |
int |
int64 |
Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. |
int64 |
long |
int/long |
uint32 |
Uses variable-length encoding. |
uint32 |
int |
int/long |
uint64 |
Uses variable-length encoding. |
uint64 |
long |
int/long |
sint32 |
Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. |
int32 |
int |
int |
sint64 |
Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. |
int64 |
long |
int/long |
fixed32 |
Always four bytes. More efficient than uint32 if values are often greater than 2^28. |
uint32 |
int |
int |
fixed64 |
Always eight bytes. More efficient than uint64 if values are often greater than 2^56. |
uint64 |
long |
int/long |
sfixed32 |
Always four bytes. |
int32 |
int |
int |
sfixed64 |
Always eight bytes. |
int64 |
long |
int/long |
bool |
|
bool |
boolean |
boolean |
string |
A string must always contain UTF-8 encoded or 7-bit ASCII text. |
string |
String |
str/unicode |
bytes |
May contain any arbitrary sequence of bytes. |
string |
ByteString |
str |
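The notes on variable-length encoding in the table above can be made concrete with a small encoder. The sketch below implements protobuf's base-128 varint and the ZigZag mapping used by sint32, following the public Protocol Buffers encoding rules; the helper names are hypothetical and only 32-bit signed inputs are handled.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value):
    """int32: negative values are sign-extended to 64 bits before encoding."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def zigzag32(value):
    """Map signed to unsigned: 0, -1, 1, -2, ... become 0, 1, 2, 3, ..."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def encode_sint32(value):
    """sint32: ZigZag first, then varint."""
    return encode_varint(zigzag32(value))
```

Encoding -1 this way takes ten bytes as int32 but a single byte as sint32, which is why sint32 is recommended for fields that are often negative. Likewise, values of 2^28 and above need five varint bytes, one more than fixed32's constant four.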