Protocol Documentation
easy_rec/python/protos/autoint.proto
AutoInt
Field | Type | Label | Description |
multi_head_num |
uint32 |
required |
The number of heads Default: 1 |
multi_head_size |
uint32 |
required |
The dimension of heads |
interacting_layer_num |
uint32 |
required |
The number of interacting layers Default: 1 |
l2_regularization |
float |
required |
Default: 0.0001 |
easy_rec/python/protos/collaborative_metric_learning.proto
CoMetricLearningI2I
easy_rec/python/protos/data_source.proto
DatahubServer
KafkaServer
easy_rec/python/protos/dataset.proto
DatasetConfig
Field | Type | Label | Description |
batch_size |
uint32 |
optional |
mini batch size to use for training and evaluation. Default: 32 |
auto_expand_input_fields |
bool |
optional |
set auto_expand_input_fields to true to expand
a pattern such as field[1-21] into field1, field2, ..., field21 Default: false |
label_fields |
string |
repeated |
label fields, normally only one field is used.
For multiple target models such as MMOE
multiple label_fields will be set. |
label_sep |
string |
repeated |
label separator |
label_dim |
uint32 |
repeated |
label dimensions, which need to be set when
any label has dimension > 1 |
shuffle |
bool |
optional |
whether to shuffle data Default: true |
shuffle_buffer_size |
int32 |
optional |
shuffle buffer size for better performance; even when a shuffle buffer is set,
it is suggested to do a full data shuffle before training,
especially when the performance of models is not good. Default: 32 |
num_epochs |
uint32 |
optional |
The number of times a data source is read. If set to zero, the data source
will be reused indefinitely. Default: 0 |
prefetch_size |
uint32 |
optional |
Number of decoded batches to prefetch. Default: 32 |
shard |
bool |
optional |
shard dataset to 1/num_workers in distributed mode Default: false |
input_type |
DatasetConfig.InputType |
required |
|
separator |
string |
optional |
separator of column features, only used for CSVInput*
not used in OdpsInput*
binary separators are supported:
CTRL+A could be set as '\001'
CTRL+B could be set as '\002'
CTRL+C could be set as '\003'
for RTPInput and OdpsRTPInput it is usually set
to '\002' Default: , |
num_parallel_calls |
uint32 |
optional |
parallel preprocessing of raw data; avoid using too small
or too large numbers (suggested to be smaller than
the number of cores) Default: 8 |
selected_cols |
string |
optional |
only used for OdpsInput/OdpsInputV2/OdpsRTPInput, comma separated;
for RTPInput, selected_cols use indices as column names
such as '1,2,4', where 1,2 are label columns, and
4 is the feature column; columns 0 and 3 are not used |
selected_col_types |
string |
optional |
selected col types, only used for OdpsInput/OdpsInputV2
to avoid setting data types incorrectly |
input_fields |
DatasetConfig.Field |
repeated |
the input fields must be the same number and in the
same order as data in csv files or odps tables |
rtp_separator |
string |
optional |
for RTPInput only Default: ; |
ignore_error |
bool |
optional |
ignore some data errors
it is not suggested to set this parameter Default: false |
pai_worker_queue |
bool |
optional |
whether to use pai global shuffle queue, only for OdpsInput,
OdpsInputV2, OdpsRTPInputV2 Default: false |
pai_worker_slice_num |
int32 |
optional |
Default: 100 |
chief_redundant |
bool |
optional |
if true, one worker will duplicate the data of the chief node
and undertake the gradient computation of the chief node Default: false |
sample_weight |
string |
optional |
input field for sample weight |
data_compression_type |
string |
optional |
the compression type of tfrecord |
n_data_batch_tfrecord |
uint32 |
optional |
n data for one feature in tfrecord |
with_header |
bool |
optional |
csv files may optionally have a header;
in that case, input_name must match the header name,
and the number and the order of input_fields
may differ from those in the csv files. Default: false |
negative_sampler |
NegativeSampler |
optional |
|
negative_sampler_v2 |
NegativeSamplerV2 |
optional |
|
hard_negative_sampler |
HardNegativeSampler |
optional |
|
hard_negative_sampler_v2 |
HardNegativeSamplerV2 |
optional |
|
eval_batch_size |
uint32 |
optional |
Default: 4096 |
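As an illustration of auto_expand_input_fields and of the binary separators described above, the following Python sketch expands a name pattern such as field[1-21] and splits a CTRL+A delimited row. The helpers expand_field_pattern and split_row are hypothetical names used only for this example; the expansion logic inside EasyRec may differ.

```python
import re


def expand_field_pattern(name):
    """Expand 'field[1-21]' into ['field1', ..., 'field21'] (hypothetical helper)."""
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", name)
    if match is None:
        return [name]  # nothing to expand
    prefix, start, end, suffix = match.groups()
    return [f"{prefix}{i}{suffix}" for i in range(int(start), int(end) + 1)]


def split_row(line, separator=","):
    """Split one raw text row into column values using the configured separator."""
    return line.rstrip("\n").split(separator)


fields = expand_field_pattern("field[1-21]")
# '\001' in the config corresponds to the CTRL+A character, '\x01' in Python
row = split_row("1\x01a\x01b\n", separator="\x01")
```

Here fields holds the 21 expanded names and row holds the three column values of the sample line.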
DatasetConfig.Field
HardNegativeSampler
Weighted Random Sampling ItemID not in Batch and Sampling Hard Edge
Field | Type | Label | Description |
user_input_path |
string |
required |
user data path
userid weight |
item_input_path |
string |
required |
item data path
itemid weight attrs |
hard_neg_edge_input_path |
string |
required |
hard negative edge path
userid itemid weight |
num_sample |
uint32 |
required |
number of negative samples |
num_hard_sample |
uint32 |
required |
max number of hard negative samples |
attr_fields |
string |
repeated |
field names of attrs in train data or eval data |
item_id_field |
string |
required |
field name of item_id in train data or eval data |
user_id_field |
string |
required |
field name of user_id in train data or eval data |
attr_delimiter |
string |
optional |
Default: : |
num_eval_sample |
uint32 |
optional |
Default: 0 |
HardNegativeSamplerV2
Weighted Random Sampling ItemID not with Edge and Sampling Hard Edge
Field | Type | Label | Description |
user_input_path |
string |
required |
user data path
userid weight |
item_input_path |
string |
required |
item data path
itemid weight attrs |
pos_edge_input_path |
string |
required |
positive edge path
userid itemid weight |
hard_neg_edge_input_path |
string |
required |
hard negative edge path
userid itemid weight |
num_sample |
uint32 |
required |
number of negative samples |
num_hard_sample |
uint32 |
required |
max number of hard negative samples |
attr_fields |
string |
repeated |
field names of attrs in train data or eval data |
item_id_field |
string |
required |
field name of item_id in train data or eval data |
user_id_field |
string |
required |
field name of user_id in train data or eval data |
attr_delimiter |
string |
optional |
Default: : |
num_eval_sample |
uint32 |
optional |
Default: 0 |
NegativeSampler
Weighted Random Sampling ItemID not in Batch
Field | Type | Label | Description |
input_path |
string |
required |
sample data path
itemid weight attrs |
num_sample |
uint32 |
required |
number of negative samples |
attr_fields |
string |
repeated |
field names of attrs in train data or eval data |
item_id_field |
string |
required |
field name of item_id in train data or eval data |
attr_delimiter |
string |
optional |
Default: : |
num_eval_sample |
uint32 |
optional |
Default: 0 |
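The idea of weighted random sampling of item IDs that are not in the current batch can be sketched as follows. This is a minimal sketch, not the EasyRec implementation: the function name sample_negatives is hypothetical, and sampling with replacement is an assumption, since the documentation above does not say which variant is used.

```python
import random


def sample_negatives(item_weights, batch_item_ids, num_sample, seed=None):
    """Draw num_sample item ids by weight, excluding ids present in the batch."""
    rng = random.Random(seed)
    exclude = set(batch_item_ids)
    candidates = [(i, w) for i, w in item_weights.items() if i not in exclude]
    if not candidates:
        return []
    ids, weights = zip(*candidates)
    # assumption: sampling is done with replacement
    return rng.choices(ids, weights=weights, k=num_sample)


item_weights = {"i1": 1.0, "i2": 3.0, "i3": 0.5, "i4": 2.0}
negatives = sample_negatives(item_weights, batch_item_ids=["i2"], num_sample=5, seed=0)
```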
NegativeSamplerV2
Weighted Random Sampling ItemID not with Edge
Field | Type | Label | Description |
user_input_path |
string |
required |
user data path
userid weight |
item_input_path |
string |
required |
item data path
itemid weight attrs |
pos_edge_input_path |
string |
required |
positive edge path
userid itemid weight |
num_sample |
uint32 |
required |
number of negative samples |
attr_fields |
string |
repeated |
field names of attrs in train data or eval data |
item_id_field |
string |
required |
field name of item_id in train data or eval data |
user_id_field |
string |
required |
field name of user_id in train data or eval data |
attr_delimiter |
string |
optional |
Default: : |
num_eval_sample |
uint32 |
optional |
Default: 0 |
DatasetConfig.FieldType
Name | Number | Description |
INT32 |
0 |
|
INT64 |
1 |
|
STRING |
2 |
|
FLOAT |
4 |
|
DOUBLE |
5 |
|
BOOL |
6 |
|
DatasetConfig.InputType
Name | Number | Description |
CSVInput |
10 |
csv format input, could be used in local or hdfs |
CSVInputV2 |
11 |
Deprecated |
CSVInputEx |
12 |
extended csv format, allow quote in fields |
OdpsInput |
2 |
Deprecated, has a memory leak problem |
OdpsInputV2 |
3 |
odps input, used on pai |
DataHubInput |
15 |
|
OdpsInputV3 |
9 |
|
RTPInput |
4 |
|
RTPInputV2 |
5 |
|
OdpsRTPInput |
6 |
|
OdpsRTPInputV2 |
16 |
|
TFRecordInput |
7 |
|
BatchTFRecordInput |
14 |
|
DummyInput |
8 |
used to debug performance bottlenecks of
input pipelines |
KafkaInput |
13 |
|
HiveInput |
17 |
|
easy_rec/python/protos/dbmtl.proto
DBMTL
Field | Type | Label | Description |
bottom_dnn |
DNN |
optional |
shared bottom dnn layer |
expert_dnn |
DNN |
optional |
mmoe expert dnn layer definition |
num_expert |
uint32 |
optional |
number of mmoe experts Default: 0 |
task_towers |
BayesTaskTower |
repeated |
bayes task tower |
l2_regularization |
float |
optional |
l2 regularization Default: 0.0001 |
easy_rec/python/protos/dcn.proto
CrossTower
Field | Type | Label | Description |
input |
string |
required |
|
cross_num |
uint32 |
required |
The number of cross layers Default: 3 |
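For orientation, the cross layer commonly associated with DCN computes x_{l+1} = x_0 * (x_l . w_l) + b_l + x_l, and cross_num would then be the number of such layers stacked. The sketch below assumes that standard formulation; whether EasyRec's CrossTower matches it term for term is not confirmed by this documentation, and the function names are hypothetical.

```python
def cross_layer(x0, xl, w, b):
    """One cross layer: x_{l+1} = x0 * (xl . w) + b + xl."""
    scale = sum(xi * wi for xi, wi in zip(xl, w))  # scalar product xl^T w
    return [x0_i * scale + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


def cross_tower(x0, weights, biases):
    """Stack len(weights) cross layers; cross_num corresponds to that length."""
    xl = x0
    for w, b in zip(weights, biases):
        xl = cross_layer(x0, xl, w, b)
    return xl


x0 = [1.0, 2.0]
out = cross_tower(x0, weights=[[0.5, 0.5]] * 3, biases=[[0.0, 0.0]] * 3)
```

The output keeps the dimension of the input, which is why cross layers can be stacked without changing the tower width.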
DCN
Field | Type | Label | Description |
deep_tower |
Tower |
required |
|
cross_tower |
CrossTower |
required |
|
final_dnn |
DNN |
required |
|
l2_regularization |
float |
required |
Default: 0.0001 |
easy_rec/python/protos/deepfm.proto
DeepFM
Field | Type | Label | Description |
dnn |
DNN |
required |
|
final_dnn |
DNN |
optional |
|
wide_output_dim |
uint32 |
optional |
Default: 1 |
wide_regularization |
float |
optional |
deprecated Default: 0.0001 |
dense_regularization |
float |
optional |
deprecated Default: 0.0001 |
l2_regularization |
float |
optional |
Default: 0.0001 |
easy_rec/python/protos/dlrm.proto
DLRM
Field | Type | Label | Description |
top_dnn |
DNN |
required |
|
bot_dnn |
DNN |
required |
|
arch_interaction_op |
string |
optional |
options are: dot and cat Default: dot |
arch_interaction_itself |
bool |
optional |
whether a feature will interact with itself Default: false |
arch_with_dense_feature |
bool |
optional |
whether to include dense features after interaction Default: false |
l2_regularization |
float |
optional |
Default: 1e-05 |
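The difference between the dot and cat options of arch_interaction_op, and the effect of arch_interaction_itself, can be sketched as below. This is an illustrative sketch with hypothetical function names; it assumes all feature vectors share the same dimension and does not reproduce EasyRec's actual tensor code.

```python
from itertools import combinations, combinations_with_replacement


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def interact(features, op="dot", interact_itself=False):
    """Combine equally sized feature vectors with the 'dot' or 'cat' operation."""
    if op == "cat":
        return [v for vec in features for v in vec]  # plain concatenation
    if op != "dot":
        raise ValueError("op must be 'dot' or 'cat'")
    # with interact_itself, each vector is also paired with itself
    pairs = combinations_with_replacement if interact_itself else combinations
    return [dot(a, b) for a, b in pairs(features, 2)]


feats = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
pairwise = interact(feats)
with_self = interact(feats, interact_itself=True)
```

With three feature vectors, dot yields 3 pairwise terms, or 6 when self-interaction is enabled.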
easy_rec/python/protos/dnn.proto
DNN
Field | Type | Label | Description |
hidden_units |
uint32 |
repeated |
hidden units for each layer |
dropout_ratio |
float |
repeated |
ratio of dropout |
activation |
string |
optional |
activation function Default: tf.nn.relu |
use_bn |
bool |
optional |
use batch normalization Default: true |
easy_rec/python/protos/dropoutnet.proto
DropoutNet
Field | Type | Label | Description |
user_content |
DNN |
required |
|
user_preference |
DNN |
required |
|
item_content |
DNN |
required |
|
item_preference |
DNN |
required |
|
user_tower |
DNN |
required |
|
item_tower |
DNN |
required |
|
l2_regularization |
float |
required |
Default: 0 |
user_dropout_rate |
float |
required |
Default: 0 |
item_dropout_rate |
float |
required |
Default: 0.5 |
softmax_loss |
SoftmaxCrossEntropyWithNegativeMining |
optional |
|
easy_rec/python/protos/dssm.proto
DSSM
Field | Type | Label | Description |
user_tower |
DSSMTower |
required |
|
item_tower |
DSSMTower |
required |
|
l2_regularization |
float |
required |
Default: 0.0001 |
simi_func |
Similarity |
optional |
Default: COSINE |
scale_simi |
bool |
optional |
add a layer for scaling the similarity Default: true |
DSSMTower
Field | Type | Label | Description |
id |
string |
required |
|
dnn |
DNN |
required |
|
easy_rec/python/protos/eas_serving.proto
Config
Field | Type | Label | Description |
column_delim |
string |
|
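For example, for the input feature "1005,109;0;93eaba74", semicolons separate the columns,
commas separate the multiple features within each column, and underscores separate a feature name from its value. |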
feature_delim |
string |
|
|
hash |
string |
|
Specifies the string hash bucketing algorithm. Two algorithms are supported: HarmHash (corresponding to tf.strings.to_hash_bucket_fast())
and SipHash (corresponding to tf.strings.to_hash_bucket_strong()) |
embeddings |
Config.EmbeddingsEntry |
repeated |
embedding_name to embedding |
embedding_max_norm |
Config.EmbeddingMaxNormEntry |
repeated |
Specifies the maximum L2-norm of the embedding lookup result |
embedding_combiner |
Config.EmbeddingCombinerEntry |
repeated |
Specifies the embedding combiner strategy; sum, mean and sqrtn are supported |
model |
Model |
|
|
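The two-level delimiter scheme used by column_delim and feature_delim can be sketched in a few lines of Python. The function name parse_features is hypothetical, and the delimiter values ';' and ',' are taken from the example string "1005,109;0;93eaba74" rather than from documented defaults.

```python
def parse_features(raw, column_delim=";", feature_delim=","):
    """Split a raw request string into columns, then each column into features."""
    return [column.split(feature_delim) for column in raw.split(column_delim)]


columns = parse_features("1005,109;0;93eaba74")
```

For the sample string this gives three columns, the first of which carries two features.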
Config.EmbeddingCombinerEntry
Config.EmbeddingMaxNormEntry
Config.EmbeddingsEntry
Embedding
Field | Type | Label | Description |
partition_num |
int32 |
|
Specifies the total number of partitions this embedding is split into |
parts |
EmbeddingPart |
repeated |
|
EmbeddingPart
Field | Type | Label | Description |
embedding_part_path |
string |
|
Specifies the path of the EmbeddingPartData (*.pb) file |
partition_id |
int32 |
|
Specifies the index of this embedding part among all parts |
shape |
int64 |
repeated |
Specifies the shape of this embedding part (can be read from EmbeddingPartData) |
deploy_strategy |
string |
|
Deployment strategy of the embedding part; local deployment (local) and remote deployment (remote) are supported |
EmbeddingPartData
Field | Type | Label | Description |
shape |
int64 |
repeated |
Shape of the embedding |
data |
float |
repeated |
Data |
Model
Field | Type | Label | Description |
model_path |
string |
|
Specifies the path of the model, used to load the model |
model_signature_name |
string |
|
Specifies the name of the model's signature |
model_inputs |
ModelInput |
repeated |
model input description |
easy_rec/python/protos/easy_rec_model.proto
DummyModel
for input performance test
EasyRecModel
Field | Type | Label | Description |
model_class |
string |
required |
|
feature_groups |
FeatureGroupConfig |
repeated |
the input layers; each layer produces a group of features |
dummy |
DummyModel |
optional |
|
wide_and_deep |
WideAndDeep |
optional |
|
deepfm |
DeepFM |
optional |
|
multi_tower |
MultiTower |
optional |
|
fm |
FM |
optional |
|
dcn |
DCN |
optional |
|
autoint |
AutoInt |
optional |
|
dlrm |
DLRM |
optional |
|
dssm |
DSSM |
optional |
|
mind |
MIND |
optional |
|
dropoutnet |
DropoutNet |
optional |
|
metric_learning |
CoMetricLearningI2I |
optional |
|
mmoe |
MMoE |
optional |
|
esmm |
ESMM |
optional |
|
dbmtl |
DBMTL |
optional |
|
simple_multi_task |
SimpleMultiTask |
optional |
|
ple |
PLE |
optional |
|
rocket_launching |
RocketLaunching |
optional |
|
seq_att_groups |
SeqAttGroupConfig |
repeated |
|
embedding_regularization |
float |
optional |
implemented in easy_rec/python/model/easy_rec_estimator
add regularization to all variables with "embedding_weights:"
in name Default: 0 |
loss_type |
LossType |
optional |
Default: CLASSIFICATION |
num_class |
uint32 |
optional |
Default: 1 |
use_embedding_variable |
bool |
optional |
Default: false |
kd |
KD |
repeated |
|
restore_filters |
string |
repeated |
filter variables matching any pattern in restore_filters
common filters are Adam, Momentum, etc. |
variational_dropout |
VariationalDropoutLayer |
optional |
|
losses |
Loss |
repeated |
|
KD
for knowledge distillation
Field | Type | Label | Description |
loss_name |
string |
optional |
|
pred_name |
string |
required |
|
pred_is_logits |
bool |
optional |
whether pred is given as logits Default: true |
soft_label_name |
string |
required |
for CROSS_ENTROPY_LOSS, soft_label must be logits instead of probs |
label_is_logits |
bool |
optional |
whether soft_label is given as logits Default: true |
loss_type |
LossType |
required |
currently only CROSS_ENTROPY_LOSS and L2_LOSS are supported |
loss_weight |
float |
optional |
Default: 1 |
temperature |
float |
optional |
only for loss_type == CROSS_ENTROPY_LOSS Default: 1 |
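To show what temperature does when loss_type is CROSS_ENTROPY_LOSS and both pred and soft_label are logits, here is a minimal sketch of a temperature-softened cross entropy. It follows the usual knowledge distillation formulation; the function names are hypothetical, and any extra scaling EasyRec may apply (for example by loss_weight) is left out.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_cross_entropy(student_logits, teacher_logits, temperature=1.0):
    """Cross entropy between temperature-softened teacher and student outputs."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))


same = kd_cross_entropy([2.0, 0.5], [2.0, 0.5], temperature=2.0)
diff = kd_cross_entropy([0.5, 2.0], [2.0, 0.5], temperature=2.0)
```

A higher temperature flattens both distributions, so the student is pushed to match the teacher's relative preferences rather than only its top class.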
easy_rec/python/protos/esmm.proto
ESMM
Field | Type | Label | Description |
groups |
Tower |
repeated |
|
ctr_tower |
TaskTower |
required |
|
cvr_tower |
TaskTower |
required |
|
l2_regularization |
float |
required |
Default: 0.0001 |
easy_rec/python/protos/eval.proto
AUC
Field | Type | Label | Description |
num_thresholds |
uint32 |
optional |
Default: 200 |
Accuracy
AvgPrecisionAtTopK
Field | Type | Label | Description |
topk |
uint32 |
optional |
Default: 5 |
EvalConfig
Message for configuring EasyRecModel evaluation jobs (eval.py).
Field | Type | Label | Description |
num_examples |
uint32 |
optional |
Number of examples to process during evaluation. Default: 0 |
eval_interval_secs |
uint32 |
optional |
How often to run evaluation. Default: 300 |
max_evals |
uint32 |
optional |
Maximum number of times to run evaluation. If set to 0, will run forever. Default: 0 |
save_graph |
bool |
optional |
Whether the TensorFlow graph used for evaluation should be saved to disk. Default: false |
metrics_set |
EvalMetrics |
repeated |
Type of metrics to use for evaluation.
possible values: |
eval_online |
bool |
optional |
Evaluate online, using the forward-pass data of training batches Default: false |
EvalMetrics
GAUC
Field | Type | Label | Description |
uid_field |
string |
required |
uid field name |
reduction |
string |
optional |
reduction method for auc of different users
* "mean": simple mean of different users
* "mean_by_sample_num": weighted mean with sample num of different users
* "mean_by_positive_num": weighted mean with positive sample num of different users Default: mean |
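The three reduction options can be made concrete with a small pure-Python sketch of a grouped AUC. This is an illustration rather than EasyRec's implementation: the function names are hypothetical, and skipping users whose samples are all positive or all negative is an assumption.

```python
from collections import defaultdict


def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gauc(uids, labels, scores, reduction="mean"):
    groups = defaultdict(lambda: ([], []))
    for uid, label, score in zip(uids, labels, scores):
        groups[uid][0].append(label)
        groups[uid][1].append(score)
    total, weight_sum = 0.0, 0.0
    for user_labels, user_scores in groups.values():
        user_auc = auc(user_labels, user_scores)
        if user_auc is None:
            continue  # assumption: single-class users are skipped
        if reduction == "mean":
            weight = 1.0
        elif reduction == "mean_by_sample_num":
            weight = float(len(user_labels))
        elif reduction == "mean_by_positive_num":
            weight = float(sum(user_labels))
        else:
            raise ValueError("unknown reduction: " + reduction)
        total += weight * user_auc
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


uids = ["a", "a", "a", "b", "b"]
labels = [1, 0, 0, 1, 0]
scores = [0.5, 0.1, 0.8, 0.7, 0.3]
```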
Max_F1
MeanAbsoluteError
MeanSquaredError
Precision
Recall
RecallAtTopK
Field | Type | Label | Description |
topk |
uint32 |
optional |
Default: 5 |
RootMeanSquaredError
SessionAUC
Field | Type | Label | Description |
session_id_field |
string |
required |
session id field name |
reduction |
string |
optional |
reduction: reduction method for auc of different sessions
* "mean": simple mean of different sessions
* "mean_by_sample_num": weighted mean with sample num of different sessions
* "mean_by_positive_num": weighted mean with positive sample num of different sessions Default: mean |
easy_rec/python/protos/export.proto
ExportConfig
Message for configuring exporting models.
Field | Type | Label | Description |
batch_size |
int32 |
optional |
batch size used for exported model, -1 indicates batch_size is None
which is only supported by classification model right now, while
other models support static batch_size Default: -1 |
exporter_type |
string |
optional |
type of exporter [final | latest | best | none] when running train_and_evaluation
final: performs a single export at the end of training
latest: regularly exports the serving graph and checkpoints
best: exports the best model according to best_exporter_metric
none: do not perform export Default: final |
best_exporter_metric |
string |
optional |
the metric used to determine the best checkpoint Default: auc |
metric_bigger |
bool |
optional |
whether a bigger metric value is better Default: true |
enable_early_stop |
bool |
optional |
enable early stop Default: false |
early_stop_func |
string |
optional |
custom early stop function, format:
early_stop_func(eval_results, early_stop_params)
return True if should stop |
early_stop_params |
string |
optional |
custom early stop parameters |
max_check_steps |
int32 |
optional |
early stop max check steps Default: 10000 |
multi_placeholder |
bool |
optional |
each feature has a placeholder Default: true |
exports_to_keep |
int32 |
optional |
number of exports to keep, only for exporter_type in [best, latest] Default: 1 |
multi_value_fields |
MultiValueFields |
optional |
multi value field list |
placeholder_named_by_input |
bool |
optional |
is placeholder named by input Default: false |
filter_inputs |
bool |
optional |
filter out inputs, only keep effective ones Default: true |
export_features |
bool |
optional |
export the original feature values as string Default: false |
export_rtp_outputs |
bool |
optional |
export the outputs required by RTP Default: false |
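A custom function matching the early_stop_func(eval_results, early_stop_params) format might look like the sketch below. Two points are assumptions not confirmed by this documentation: that eval_results maps a global step to a dict of metric values, and that early_stop_params is a string such as "auc,3" carrying a metric name and a patience. The sketch also assumes that a bigger metric value is better.

```python
def early_stop_func(eval_results, early_stop_params):
    """Return True when the metric has not improved for `patience` evaluations.

    Assumed inputs: eval_results is {step: {metric_name: value}} and
    early_stop_params is a string "metric_name,patience".
    """
    metric, patience = early_stop_params.split(",")
    patience = int(patience)
    values = [eval_results[step][metric] for step in sorted(eval_results)]
    if len(values) <= patience:
        return False  # not enough evaluations yet
    best_before = max(values[:-patience])
    return max(values[-patience:]) <= best_before


stalled = {100: {"auc": 0.70}, 200: {"auc": 0.75}, 300: {"auc": 0.74}, 400: {"auc": 0.75}}
improving = {100: {"auc": 0.70}, 200: {"auc": 0.72}, 300: {"auc": 0.74}}
```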
MultiValueFields
Field | Type | Label | Description |
input_name |
string |
repeated |
|
easy_rec/python/protos/feature_config.proto
AttentionCombiner
FeatureConfig
Field | Type | Label | Description |
feature_name |
string |
optional |
|
input_names |
string |
repeated |
input field names: must be included in DatasetConfig.input_fields |
feature_type |
FeatureConfig.FeatureType |
required |
Default: IdFeature |
embedding_name |
string |
optional |
|
embedding_dim |
uint32 |
optional |
Default: 0 |
hash_bucket_size |
uint64 |
optional |
Default: 0 |
num_buckets |
uint64 |
optional |
for categorical_column_with_identity Default: 0 |
boundaries |
double |
repeated |
only for raw features |
separator |
string |
optional |
separator within features Default: | |
kv_separator |
string |
optional |
delimiter separating key from value |
seq_multi_sep |
string |
optional |
delimiter separating sequence multi-values |
vocab_file |
string |
optional |
|
vocab_list |
string |
repeated |
|
shared_names |
string |
repeated |
other fields that share this config |
lookup_max_sel_elem_num |
int32 |
optional |
max number of elements selected by lookup Default: 10 |
max_partitions |
int32 |
optional |
max_partitions Default: 1 |
combiner |
string |
optional |
combiner Default: mean |
initializer |
Initializer |
optional |
embedding initializer |
precision |
int32 |
optional |
number of digits kept after the decimal point when formatting float/double as string;
scientific format is not used.
By default, converting float/double to string is not allowed Default: -1 |
min_val |
double |
optional |
normalize raw feature to [0-1] Default: 0 |
max_val |
double |
optional |
Default: 0 |
raw_input_dim |
uint32 |
optional |
raw feature of multiple dimensions Default: 1 |
sequence_combiner |
SequenceCombiner |
optional |
sequence feature combiner |
sub_feature_type |
FeatureConfig.FeatureType |
optional |
sub feature type for sequence feature Default: IdFeature |
sequence_length |
uint32 |
optional |
sequence length Default: 1 |
expression |
string |
optional |
for expr feature |
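For raw features, min_val/max_val and boundaries can be pictured with the following sketch. The helper names are hypothetical; clipping out-of-range values and skipping normalization when max_val is not greater than min_val are assumptions. The bucket convention (left-closed intervals, len(boundaries) + 1 buckets) follows the usual bucketized-column behaviour.

```python
from bisect import bisect_right


def normalize(value, min_val, max_val):
    """Min-max scale a raw value into [0, 1], clipping values outside the range."""
    if max_val <= min_val:
        return value  # assumption: normalization is skipped when the range is unset
    scaled = (value - min_val) / (max_val - min_val)
    return min(1.0, max(0.0, scaled))


def bucketize(value, boundaries):
    """Return the bucket index of value; there are len(boundaries) + 1 buckets."""
    return bisect_right(boundaries, value)


boundaries = [0.0, 1.0, 2.0]
bucket = bucketize(0.5, boundaries)
```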
FeatureConfigV2
FeatureGroupConfig
MultiHeadAttentionCombiner
SeqAttGroupConfig
Field | Type | Label | Description |
group_name |
string |
optional |
|
seq_att_map |
SeqAttMap |
repeated |
|
tf_summary |
bool |
optional |
Default: false |
seq_dnn |
DNN |
optional |
|
allow_key_search |
bool |
optional |
Default: false |
SeqAttMap
Field | Type | Label | Description |
key |
string |
repeated |
|
hist_seq |
string |
repeated |
|
SequenceCombiner
TextCnnCombiner
Field | Type | Label | Description |
filter_sizes |
uint32 |
repeated |
|
num_filters |
uint32 |
repeated |
|
FeatureConfig.FeatureType
Name | Number | Description |
IdFeature |
0 |
|
RawFeature |
1 |
|
TagFeature |
2 |
|
ComboFeature |
3 |
|
LookupFeature |
4 |
|
SequenceFeature |
5 |
|
ExprFeature |
6 |
|
FeatureConfig.FieldType
Name | Number | Description |
INT32 |
0 |
|
INT64 |
1 |
|
STRING |
2 |
|
FLOAT |
4 |
|
DOUBLE |
5 |
|
BOOL |
6 |
|
WideOrDeep
Name | Number | Description |
DEEP |
0 |
|
WIDE |
1 |
|
WIDE_AND_DEEP |
2 |
|
easy_rec/python/protos/fm.proto
FM
Field | Type | Label | Description |
l2_regularization |
float |
optional |
Default: 0.0001 |
easy_rec/python/protos/hive_config.proto
HiveConfig
Field | Type | Label | Description |
host |
string |
required |
hive master's ip |
port |
uint32 |
required |
hive port Default: 10000 |
username |
string |
required |
hive username |
database |
string |
required |
hive database Default: default |
table_name |
string |
required |
|
hash_fields |
string |
required |
|
limit_num |
uint32 |
optional |
Default: 0 |
fetch_size |
uint32 |
required |
Default: 512 |
easy_rec/python/protos/hyperparams.proto
ConstantInitializer
Field | Type | Label | Description |
consts |
float |
repeated |
|
GlorotNormalInitializer
Initializer
Proto with one-of field for initializers.
L1L2Regularizer
Configuration proto for L1L2 Regularizer.
Field | Type | Label | Description |
scale_l1 |
float |
optional |
Default: 1 |
scale_l2 |
float |
optional |
Default: 1 |
L1Regularizer
Configuration proto for L1 Regularizer.
Field | Type | Label | Description |
scale |
float |
optional |
Default: 1 |
L2Regularizer
Configuration proto for L2 Regularizer.
Field | Type | Label | Description |
scale |
float |
optional |
Default: 1 |
RandomNormalInitializer
Configuration proto for random normal initializer. See
https://www.tensorflow.org/api_docs/python/tf/random_normal_initializer
Field | Type | Label | Description |
mean |
float |
optional |
Default: 0 |
stddev |
float |
optional |
Default: 1 |
Regularizer
Proto with one-of field for regularizers.
TruncatedNormalInitializer
Configuration proto for truncated normal initializer. See
https://www.tensorflow.org/api_docs/python/tf/truncated_normal_initializer
Field | Type | Label | Description |
mean |
float |
optional |
Default: 0 |
stddev |
float |
optional |
Default: 1 |
easy_rec/python/protos/layer.proto
HighWayTower
Field | Type | Label | Description |
input |
string |
required |
|
emb_size |
uint32 |
required |
|
easy_rec/python/protos/loss.proto
CircleLoss
Field | Type | Label | Description |
margin |
float |
required |
Default: 0.25 |
gamma |
float |
required |
Default: 32 |
Loss
Field | Type | Label | Description |
loss_type |
LossType |
required |
|
weight |
float |
required |
Default: 1 |
MultiSimilarityLoss
Field | Type | Label | Description |
alpha |
float |
required |
Default: 2 |
beta |
float |
required |
Default: 50 |
lamb |
float |
required |
Default: 1 |
eps |
float |
required |
Default: 0.1 |
SoftmaxCrossEntropyWithNegativeMining
Field | Type | Label | Description |
num_negative_samples |
uint32 |
required |
|
margin |
float |
required |
Default: 0 |
gamma |
float |
required |
Default: 1 |
coefficient_of_support_vector |
float |
required |
Default: 1 |
LossType
Name | Number | Description |
CLASSIFICATION |
0 |
|
L2_LOSS |
1 |
|
SIGMOID_L2_LOSS |
2 |
|
CROSS_ENTROPY_LOSS |
3 |
crossentropy loss/log loss |
SOFTMAX_CROSS_ENTROPY |
4 |
|
CIRCLE_LOSS |
5 |
|
MULTI_SIMILARITY_LOSS |
6 |
|
SOFTMAX_CROSS_ENTROPY_WITH_NEGATIVE_MINING |
7 |
|
PAIR_WISE_LOSS |
8 |
|
easy_rec/python/protos/mind.proto
Capsule
Field | Type | Label | Description |
max_k |
uint32 |
optional |
max number of high capsules Default: 5 |
max_seq_len |
uint32 |
required |
max behaviour sequence length |
high_dim |
uint32 |
required |
high capsule embedding vector dimension |
num_iters |
uint32 |
optional |
number of EM iterations Default: 3 |
routing_logits_scale |
float |
optional |
routing logits scale Default: 20 |
routing_logits_stddev |
float |
optional |
routing logits initial stddev Default: 1 |
MIND
Field | Type | Label | Description |
pre_capsule_dnn |
DNN |
optional |
preprocessing dnn before entering capsule layer |
user_dnn |
DNN |
required |
dnn layers applied on the concatenated results of
capsule output and user_context (non-sequence features) |
user_seq_combine |
MIND.UserSeqCombineMethod |
optional |
method to combine several user sequences
such as item_ids, category_ids Default: SUM |
item_dnn |
DNN |
required |
dnn layers applied on item features |
capsule_config |
Capsule |
required |
|
simi_pow |
float |
optional |
similarity power; the paper says the bigger
the better Default: 10 |
simi_func |
Similarity |
optional |
Default: COSINE |
l2_regularization |
float |
required |
Default: 0.0001 |
MIND.UserSeqCombineMethod
Name | Number | Description |
CONCAT |
0 |
|
SUM |
1 |
|
easy_rec/python/protos/mmoe.proto
ExpertTower
Field | Type | Label | Description |
expert_name |
string |
required |
|
dnn |
DNN |
required |
|
MMoE
Field | Type | Label | Description |
experts |
ExpertTower |
repeated |
deprecated: original mmoe experts config |
expert_dnn |
DNN |
optional |
mmoe expert dnn layer definition |
num_expert |
uint32 |
optional |
number of mmoe experts Default: 0 |
task_towers |
TaskTower |
repeated |
task tower |
l2_regularization |
float |
required |
l2 regularization Default: 0.0001 |
easy_rec/python/protos/multi_tower.proto
BSTTower
Field | Type | Label | Description |
input |
string |
required |
|
seq_len |
uint32 |
required |
Default: 5 |
multi_head_size |
uint32 |
required |
Default: 4 |
DINTower
Field | Type | Label | Description |
input |
string |
required |
|
dnn |
DNN |
required |
|
MultiTower
Field | Type | Label | Description |
towers |
Tower |
repeated |
|
final_dnn |
DNN |
required |
|
l2_regularization |
float |
required |
Default: 0.0001 |
din_towers |
DINTower |
repeated |
|
bst_towers |
BSTTower |
repeated |
|
easy_rec/python/protos/optimizer.proto
AdagradOptimizer
Configuration message for the AdagradOptimizer
See: https://www.tensorflow.org/api_docs/python/tf/train/AdagradOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
AdamAsyncOptimizer
Only available on pai-tf, which has better performance than AdamOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
beta1 |
float |
optional |
Default: 0.9 |
beta2 |
float |
optional |
Default: 0.999 |
AdamAsyncWOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
weight_decay |
float |
optional |
Default: 1e-06 |
beta1 |
float |
optional |
Default: 0.9 |
beta2 |
float |
optional |
Default: 0.999 |
AdamOptimizer
Configuration message for the AdamOptimizer
See: https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
beta1 |
float |
optional |
Default: 0.9 |
beta2 |
float |
optional |
Default: 0.999 |
AdamWOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
weight_decay |
float |
optional |
Default: 1e-06 |
beta1 |
float |
optional |
Default: 0.9 |
beta2 |
float |
optional |
Default: 0.999 |
ConstantLearningRate
Configuration message for a constant learning rate.
Field | Type | Label | Description |
learning_rate |
float |
optional |
Default: 0.002 |
CosineDecayLearningRate
Configuration message for a cosine decaying learning rate as defined in
utils/learning_schedules.py
Field | Type | Label | Description |
learning_rate_base |
float |
optional |
Default: 0.002 |
total_steps |
uint32 |
optional |
Default: 4000000 |
warmup_learning_rate |
float |
optional |
Default: 0.0002 |
warmup_steps |
uint32 |
optional |
Default: 10000 |
hold_base_rate_steps |
uint32 |
optional |
Default: 0 |
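The interplay of the five parameters above can be sketched as a plain Python function. It follows the cosine schedule with linear warmup and an optional hold phase as commonly written in utils/learning_schedules.py of the TensorFlow object detection code base; that EasyRec's version behaves identically is an assumption, and the function name is hypothetical.

```python
import math


def cosine_decay_lr(step, learning_rate_base=0.002, total_steps=4000000,
                    warmup_learning_rate=0.0002, warmup_steps=10000,
                    hold_base_rate_steps=0):
    if step > total_steps:
        return 0.0
    if warmup_steps > 0 and step < warmup_steps:
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step  # linear warmup
    if step <= warmup_steps + hold_base_rate_steps:
        return learning_rate_base  # hold phase at the base rate
    decay_span = total_steps - warmup_steps - hold_base_rate_steps
    progress = (step - warmup_steps - hold_base_rate_steps) / decay_span
    return 0.5 * learning_rate_base * (1.0 + math.cos(math.pi * progress))


start = cosine_decay_lr(0)
peak = cosine_decay_lr(10000)
end = cosine_decay_lr(4000000)
```

With the defaults, the rate climbs from warmup_learning_rate to learning_rate_base over the first warmup_steps steps and then follows half a cosine period down to zero at total_steps.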
ExponentialDecayLearningRate
Configuration message for an exponentially decaying learning rate.
See https://www.tensorflow.org/versions/master/api_docs/python/train/decaying_the_learning_rate#exponential_decay
Field | Type | Label | Description |
initial_learning_rate |
float |
optional |
Default: 0.002 |
decay_steps |
uint32 |
optional |
Default: 4000000 |
decay_factor |
float |
optional |
Default: 0.95 |
staircase |
bool |
optional |
Default: true |
burnin_learning_rate |
float |
optional |
Default: 0 |
burnin_steps |
uint32 |
optional |
Default: 0 |
min_learning_rate |
float |
optional |
Default: 0 |
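A minimal sketch of how these parameters could combine is given below. It uses the standard exponential decay formula, initial_learning_rate * decay_factor ^ (step / decay_steps), with staircase flooring the exponent. The burn-in handling (use burnin_learning_rate for the first burnin_steps steps, falling back to the initial rate when it is 0) and the function name are assumptions, not a transcription of EasyRec's code.

```python
def exponential_decay_lr(step, initial_learning_rate=0.002, decay_steps=4000000,
                         decay_factor=0.95, staircase=True,
                         burnin_learning_rate=0.0, burnin_steps=0,
                         min_learning_rate=0.0):
    if step < burnin_steps:
        # assumption: a zero burn-in rate means "use the initial rate"
        return burnin_learning_rate or initial_learning_rate
    exponent = step / decay_steps
    if staircase:
        exponent = step // decay_steps  # decay in discrete jumps
    decayed = initial_learning_rate * decay_factor ** exponent
    return max(decayed, min_learning_rate)


first = exponential_decay_lr(0)
after_one_period = exponential_decay_lr(4000000)
```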
FtrlOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
optional float learning_rate = 1 [default=1e-4]; |
learning_rate_power |
float |
optional |
Default: -0.5 |
initial_accumulator_value |
float |
optional |
Default: 0.1 |
l1_reg |
float |
optional |
Default: 0 |
l2_reg |
float |
optional |
Default: 0 |
l2_shrinkage_reg |
float |
optional |
Default: 0 |
LearningRate
Configuration message for optimizer learning rate.
ManualStepLearningRate
Configuration message for a manually defined learning rate schedule.
ManualStepLearningRate.LearningRateSchedule
Field | Type | Label | Description |
step |
uint32 |
optional |
|
learning_rate |
float |
optional |
Default: 0.002 |
MomentumOptimizer
Configuration message for the MomentumOptimizer
See: https://www.tensorflow.org/api_docs/python/tf/train/MomentumOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
momentum_optimizer_value |
float |
optional |
Default: 0.9 |
MomentumWOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
weight_decay |
float |
optional |
Default: 1e-06 |
momentum_optimizer_value |
float |
optional |
Default: 0.9 |
Optimizer
Top level optimizer message.
PolyDecayLearningRate
Configuration message for a poly decaying learning rate.
See https://www.tensorflow.org/api_docs/python/tf/train/polynomial_decay.
Field | Type | Label | Description |
learning_rate_base |
float |
required |
|
total_steps |
int64 |
required |
|
power |
float |
required |
|
end_learning_rate |
float |
optional |
Default: 0 |
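The polynomial decay formula referenced above is (learning_rate_base - end_learning_rate) * (1 - step / total_steps) ^ power + end_learning_rate, with the step clamped to total_steps. The sketch below restates that formula in plain Python; the function name is hypothetical and the sample values are illustrative only.

```python
def poly_decay_lr(step, learning_rate_base, total_steps, power, end_learning_rate=0.0):
    step = min(step, total_steps)  # the rate stays at end_learning_rate afterwards
    remaining = 1.0 - step / total_steps
    return (learning_rate_base - end_learning_rate) * remaining ** power + end_learning_rate


halfway = poly_decay_lr(500, learning_rate_base=0.01, total_steps=1000, power=1.0)
```

With power set to 1 the schedule is a straight line from learning_rate_base to end_learning_rate.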
RMSPropOptimizer
Configuration message for the RMSPropOptimizer
See: https://www.tensorflow.org/api_docs/python/tf/train/RMSPropOptimizer
Field | Type | Label | Description |
learning_rate |
LearningRate |
optional |
|
momentum_optimizer_value |
float |
optional |
Default: 0.9 |
decay |
float |
optional |
Default: 0.9 |
epsilon |
float |
optional |
Default: 1 |
Field | Type | Label | Description |
learning_rate_base |
float |
required |
|
hidden_size |
int32 |
required |
|
warmup_steps |
int32 |
required |
|
step_scaling_rate |
float |
optional |
Default: 1 |
easy_rec/python/protos/pipeline.proto
Top
EasyRecConfig
easy_rec/python/protos/ple.proto
Top
ExtractionNetwork
Field | Type | Label | Description |
network_name |
string |
required |
|
expert_num_per_task |
uint32 |
required |
number of experts per task |
share_num |
uint32 |
optional |
number of shared experts.
Not needed for the last extraction_network |
task_expert_net |
DNN |
required |
dnn network of experts per task |
share_expert_net |
DNN |
optional |
dnn network of the shared experts.
Not needed for the last extraction_network |
PLE
Field | Type | Label | Description |
extraction_networks |
ExtractionNetwork |
repeated |
extraction network |
task_towers |
TaskTower |
repeated |
task tower |
l2_regularization |
float |
optional |
l2 regularization Default: 0.0001 |
easy_rec/python/protos/rocket_launching.proto
Top
RocketLaunching
Field | Type | Label | Description |
share_dnn |
DNN |
required |
|
booster_dnn |
DNN |
required |
|
light_dnn |
DNN |
required |
|
l2_regularization |
float |
optional |
Default: 0.0001 |
feature_based_distillation |
bool |
optional |
Default: false |
feature_distillation_function |
Similarity |
optional |
COSINE = 0; INNER_PRODUCT = 1; EUCLID = 2; Default: COSINE |
easy_rec/python/protos/simi.proto
Top
Similarity
Name | Number | Description |
COSINE |
0 |
|
INNER_PRODUCT |
1 |
|
EUCLID |
2 |
|
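The three enum values name standard vector measures. The sketch below shows their textbook definitions in plain Python. How EasyRec applies them, and in particular how a Euclidean distance is turned into a similarity or a loss, is not described here, so the function names and the lookup table are purely illustrative.

```python
import math


def cosine(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def inner_product(a, b):
    """Unnormalized dot product of a and b."""
    return sum(x * y for x, y in zip(a, b))


def euclid(a, b):
    """Euclidean distance between a and b (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical lookup keyed by the enum numbers listed in the table.
SIMILARITY_BY_NUMBER = {0: cosine, 1: inner_product, 2: euclid}
```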
easy_rec/python/protos/simple_multi_task.proto
Top
SimpleMultiTask
Field | Type | Label | Description |
task_towers |
TaskTower |
repeated |
|
l2_regularization |
float |
required |
Default: 0.0001 |
easy_rec/python/protos/tower.proto
Top
BayesTaskTower
Field | Type | Label | Description |
tower_name |
string |
required |
task name for the task tower |
label_name |
string |
optional |
label for the task, default is label_fields by order |
metrics_set |
EvalMetrics |
repeated |
metrics for the task |
loss_type |
LossType |
optional |
loss for the task Default: CLASSIFICATION |
num_class |
uint32 |
optional |
num_class for multi-class classification loss Default: 1 |
dnn |
DNN |
optional |
task specific dnn |
relation_tower_names |
string |
repeated |
related tower names |
relation_dnn |
DNN |
optional |
relation dnn |
weight |
float |
optional |
training loss weights Default: 1 |
task_space_indicator_label |
string |
optional |
label name for indicating the sample space for the task tower |
in_task_space_weight |
float |
optional |
the loss weight for samples inside the task space Default: 1 |
out_task_space_weight |
float |
optional |
the loss weight for samples outside the task space Default: 1 |
TaskTower
Field | Type | Label | Description |
tower_name |
string |
required |
task name for the task tower |
label_name |
string |
optional |
label for the task, default is label_fields by order |
metrics_set |
EvalMetrics |
repeated |
metrics for the task |
loss_type |
LossType |
optional |
loss for the task Default: CLASSIFICATION |
num_class |
uint32 |
optional |
num_class for multi-class classification loss Default: 1 |
dnn |
DNN |
optional |
task specific dnn |
weight |
float |
optional |
training loss weights Default: 1 |
task_space_indicator_label |
string |
optional |
label name for indicating the sample space for the task tower |
in_task_space_weight |
float |
optional |
the loss weight for samples inside the task space Default: 1 |
out_task_space_weight |
float |
optional |
the loss weight for samples outside the task space Default: 1 |
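One way to read task_space_indicator_label, in_task_space_weight and out_task_space_weight together is as a per-sample loss weight chosen by the indicator label. The sketch below assumes that an indicator value greater than 0 marks a sample as belonging to the task space. That threshold and the function name are assumptions, not taken from the EasyRec source.

```python
def task_space_sample_weights(indicator_labels,
                              in_task_space_weight=1.0,
                              out_task_space_weight=1.0):
    """Pick a loss weight per sample from its task-space indicator label.

    Assumption: indicator > 0 means the sample lies inside the task space.
    """
    return [
        in_task_space_weight if indicator > 0 else out_task_space_weight
        for indicator in indicator_labels
    ]


def weighted_loss_sum(per_sample_losses, sample_weights):
    """Sum of per-sample losses after applying the sample weights."""
    return sum(l * w for l, w in zip(per_sample_losses, sample_weights))
```

With both weights left at their default of 1, every sample contributes equally, which matches training without a task-space indicator.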
Tower
Field | Type | Label | Description |
input |
string |
required |
|
dnn |
DNN |
required |
|
easy_rec/python/protos/train.proto
Top
TrainConfig
Message for configuring EasyRecModel training jobs (train.py).
Next id: 25
Field | Type | Label | Description |
optimizer_config |
Optimizer |
repeated |
optimizer options |
gradient_clipping_by_norm |
float |
optional |
If greater than 0, clips gradients by this value. Default: 0 |
num_steps |
uint32 |
optional |
Number of steps to train the models: if 0, will train the model
indefinitely. Default: 0 |
fine_tune_checkpoint |
string |
optional |
Checkpoint to restore variables from. |
fine_tune_ckpt_var_map |
string |
optional |
|
sync_replicas |
bool |
optional |
Whether to synchronize replicas during training.
If so, a SyncReplicasOptimizer is built. Default: true |
sparse_accumulator_type |
string |
optional |
only take effect on pai-tf when sync_replicas is set,
options are:
raw, hash, multi_map, list, parallel
in general, multi_map runs faster than other options. Default: multi_map |
startup_delay_steps |
float |
optional |
Number of training steps between replica startup.
This flag must be set to 0 if sync_replicas is set to true. Default: 15 |
save_checkpoints_steps |
uint32 |
optional |
Step interval for saving checkpoint Default: 1000 |
save_checkpoints_secs |
uint32 |
optional |
Seconds interval for saving checkpoint |
keep_checkpoint_max |
uint32 |
optional |
Max checkpoints to keep Default: 10 |
save_summary_steps |
uint32 |
optional |
Save summaries every this many steps. Default: 1000 |
log_step_count_steps |
uint32 |
optional |
The frequency, in steps, at which global step/sec and the loss are logged during training. Default: 10 |
is_profiling |
bool |
optional |
profiling or not Default: false |
force_restore_shape_compatible |
bool |
optional |
if variable shape is incompatible, clip or pad variables in checkpoint Default: false |
train_distribute |
DistributionStrategy |
optional |
DistributionStrategy, available values are 'mirrored', 'collective' and 'ess'
- mirrored: MirroredStrategy, single machine and multiple devices;
- collective: CollectiveAllReduceStrategy, multiple machines and multiple devices. Default: NoStrategy |
num_gpus_per_worker |
int32 |
optional |
Number of gpus per machine Default: 1 |
summary_model_vars |
bool |
optional |
summary model variables or not Default: false |
protocol |
string |
optional |
distribute training protocol [grpc++ | star_server]
grpc++: https://help.aliyun.com/document_detail/173157.html?spm=5176.10695662.1996646101.searchclickresult.3ebf450evuaPT3
star_server: https://help.aliyun.com/document_detail/173154.html?spm=a2c4g.11186623.6.627.39ad7e3342KOX4 |
inter_op_parallelism_threads |
int32 |
optional |
inter_op_parallelism_threads Default: 0 |
intra_op_parallelism_threads |
int32 |
optional |
intra_op_parallelism_threads Default: 0 |
tensor_fuse |
bool |
optional |
tensor fusion on PAI-TF Default: false |
write_graph |
bool |
optional |
write graph into graph.pbtxt and summary or not Default: true |
freeze_gradient |
string |
repeated |
match variable patterns to freeze |
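Two constraints are stated in the descriptions above: startup_delay_steps must be 0 when sync_replicas is true, and sparse_accumulator_type accepts only the five listed options. The sketch below checks both against a plain dictionary of settings. The helper name and the dictionary representation are hypothetical, and whether EasyRec itself enforces these rules at runtime is not stated here.

```python
# Defaults copied from the TrainConfig table above.
TRAIN_CONFIG_DEFAULTS = {
    "sync_replicas": True,
    "startup_delay_steps": 15,
    "sparse_accumulator_type": "multi_map",
}

# Options listed in the sparse_accumulator_type description.
SPARSE_ACCUMULATOR_TYPES = {"raw", "hash", "multi_map", "list", "parallel"}


def check_train_config(overrides):
    """Return a list of problems found in a TrainConfig-like dict."""
    config = dict(TRAIN_CONFIG_DEFAULTS, **overrides)
    problems = []
    if config["sync_replicas"] and config["startup_delay_steps"] != 0:
        problems.append(
            "startup_delay_steps must be 0 when sync_replicas is true")
    if config["sparse_accumulator_type"] not in SPARSE_ACCUMULATOR_TYPES:
        problems.append(
            "unknown sparse_accumulator_type: %s"
            % config["sparse_accumulator_type"])
    return problems
```

Taken literally, the documented defaults (sync_replicas true, startup_delay_steps 15) already trip the first rule, so it may be worth setting startup_delay_steps to 0 explicitly when sync_replicas is enabled.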
DistributionStrategy
Name | Number | Description |
NoStrategy |
0 |
use old SyncReplicasOptimizer for ParameterServer training |
PSStrategy |
1 |
PSStrategy with multiple gpus on one node does not work
on pai-tf; it works only on TF >=1.15 |
MirroredStrategy |
2 |
works only on PaiTF or TF >=1.15;
single worker, multiple gpu mode |
CollectiveAllReduceStrategy |
3 |
Deprecated |
ExascaleStrategy |
4 |
currently not working well |
MultiWorkerMirroredStrategy |
5 |
multi worker multi gpu mode
see tf.distribute.experimental.MultiWorkerMirroredStrategy |
easy_rec/python/protos/variational_dropout.proto
Top
VariationalDropoutLayer
Field | Type | Label | Description |
regularization_lambda |
float |
optional |
regularization coefficient lambda Default: 0.01 |
embedding_wise_variational_dropout |
bool |
optional |
variational_dropout dimension Default: false |
easy_rec/python/protos/wide_and_deep.proto
Top
WideAndDeep
Field | Type | Label | Description |
wide_output_dim |
uint32 |
required |
Default: 1 |
dnn |
DNN |
required |
|
final_dnn |
DNN |
optional |
if set, the output of dnn and wide part are concatenated and
passed to the final_dnn; otherwise, they are summed |
l2_regularization |
float |
optional |
Default: 0.0001 |
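The two ways of combining the wide and deep outputs described for final_dnn can be sketched as follows, assuming the case without final_dnn is an element-wise sum of two equally sized vectors. The function name, the list-based vectors and the callable standing in for final_dnn are illustrative only.

```python
def combine_wide_and_deep(wide_out, deep_out, final_dnn=None):
    """Combine the wide and deep outputs of one sample.

    final_dnn is a hypothetical callable standing in for the configured
    final_dnn network. If it is given, the two outputs are concatenated and
    passed through it; otherwise they are added element-wise.
    """
    if final_dnn is not None:
        return final_dnn(deep_out + wide_out)  # list concatenation
    if len(wide_out) != len(deep_out):
        raise ValueError("element-wise sum needs outputs of equal size")
    return [w + d for w, d in zip(wide_out, deep_out)]
```

Under this reading, the summed variant only makes sense when wide_output_dim equals the output size of dnn, while the concatenated variant places no such restriction on the two sizes.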
Scalar Value Types
.proto Type | Notes | C++ Type | Java Type | Python Type |
double |
|
double |
double |
float |
float |
|
float |
float |
float |
int32 |
Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. |
int32 |
int |
int |
int64 |
Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. |
int64 |
long |
int/long |
uint32 |
Uses variable-length encoding. |
uint32 |
int |
int/long |
uint64 |
Uses variable-length encoding. |
uint64 |
long |
int/long |
sint32 |
Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. |
int32 |
int |
int |
sint64 |
Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. |
int64 |
long |
int/long |
fixed32 |
Always four bytes. More efficient than uint32 if values are often greater than 2^28. |
uint32 |
int |
int |
fixed64 |
Always eight bytes. More efficient than uint64 if values are often greater than 2^56. |
uint64 |
long |
int/long |
sfixed32 |
Always four bytes. |
int32 |
int |
int |
sfixed64 |
Always eight bytes. |
int64 |
long |
int/long |
bool |
|
bool |
boolean |
boolean |
string |
A string must always contain UTF-8 encoded or 7-bit ASCII text. |
string |
String |
str/unicode |
bytes |
May contain any arbitrary sequence of bytes. |
string |
ByteString |
str |
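The notes on int32, sint32 and fixed32 follow from how protobuf encodes integers on the wire. The sketch below implements base-128 varint encoding and the ZigZag mapping used by sint32 in plain Python, so the byte counts can be compared directly. The function names are illustrative; this is not the protobuf library API.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value):
    """int32: negative values are sign-extended to 64 bits before encoding."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def zigzag32(value):
    """Map signed 32-bit integers to unsigned: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


def encode_sint32(value):
    """sint32: ZigZag mapping followed by varint encoding."""
    return encode_varint(zigzag32(value))
```

Encoding -1 as int32 takes ten bytes, while sint32 needs one. A uint32 value of 2^28 or more needs five varint bytes, which is why fixed32 (always four bytes) is recommended for such values.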