easy_rec.python.utils¶

easy_rec.python.utils.compat¶

easy_rec.python.utils.compat.in_python2()[source]¶

easy_rec.python.utils.compat.in_python3()[source]¶

easy_rec.python.utils.config_util¶

Functions for reading and updating configuration files, for example during hyperparameter tuning or automatic feature expansion.

easy_rec.python.utils.config_util.search_pipeline_config(directory)[source]¶

easy_rec.python.utils.config_util.get_configs_from_pipeline_file(pipeline_config_path, auto_expand=True)[source]¶

Reads config from a file containing pipeline_pb2.EasyRecConfig.

Parameters:

pipeline_config_path – Path to pipeline_pb2.EasyRecConfig text proto.

Returns:

Dictionary of configuration objects. Keys are model, train_config, train_input_config, eval_config, eval_input_config. Values are the corresponding config objects.

easy_rec.python.utils.config_util.auto_expand_share_feature_configs(pipeline_config)[source]¶

easy_rec.python.utils.config_util.auto_expand_names(input_name)[source]¶

Auto expand field[1-3] to field1, field2, field3.

Parameters:: input_name – a string pattern like field[1-3]
Returns:: a string list of the expanded names

Todo

could be extended to support more complicated patterns
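The expansion can be illustrated with a small standalone sketch. This is not the EasyRec implementation: the helper name expand_names_sketch is hypothetical, and returning a name without a [m-n] range unchanged is an assumption.

```python
import re


def expand_names_sketch(input_name):
    """Expand a pattern such as field[1-3] into field1, field2, field3.

    Hypothetical helper for illustration, not the EasyRec function itself.
    """
    match = re.match(r"^(.*)\[(\d+)-(\d+)\](.*)$", input_name)
    if match is None:
        # Assumption: a name without a [m-n] range is returned unchanged.
        return [input_name]
    prefix, start, end, suffix = match.groups()
    # The range is inclusive on both ends.
    return [f"{prefix}{i}{suffix}" for i in range(int(start), int(end) + 1)]


print(expand_names_sketch("field[1-3]"))  # ['field1', 'field2', 'field3']
```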

easy_rec.python.utils.config_util.create_pipeline_proto_from_configs(configs)[source]¶

Creates a pipeline_pb2.EasyRecConfig from configs dictionary.

This function performs the inverse operation of create_configs_from_pipeline_proto().

Parameters:: configs – Dictionary of configs. See get_configs_from_pipeline_file().
Returns:: A fully populated pipeline_pb2.EasyRecConfig.

easy_rec.python.utils.config_util.save_pipeline_config(pipeline_config, directory, filename='pipeline.config')[source]¶

Saves a pipeline config text file to disk.

Parameters:

pipeline_config – A pipeline_pb2.EasyRecConfig.
directory – The model directory into which the pipeline config file will be saved.
filename – pipeline config filename

easy_rec.python.utils.config_util.edit_config(pipeline_config, edit_config_json)[source]¶

Update params specified by automl.

Parameters:

pipeline_config – EasyRecConfig
edit_config_json – JSON describing the config edits to apply

easy_rec.python.utils.config_util.save_message(protobuf_message, filename)[source]¶

Saves a pipeline config text file to disk.

Parameters:

protobuf_message – A pipeline_pb2.TrainEvalPipelineConfig.
filename – pipeline config filename

easy_rec.python.utils.config_util.add_boundaries_to_config(pipeline_config, tables)[source]¶

easy_rec.python.utils.config_util.get_compatible_feature_configs(pipeline_config)[source]¶

easy_rec.python.utils.config_util.parse_time(time_data)[source]¶

Parse time string to timestamp.

Parameters:: time_data – a time string in one of two formats: ‘%Y%m%d %H:%M:%S’ or ‘%s’ (seconds since the epoch)
Returns:: timestamp
Return type:: int
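A minimal sketch of how the two accepted formats could be handled is shown below. The helper name parse_time_sketch is hypothetical, and interpreting the date string as UTC is an assumption; the actual function may use the local time zone.

```python
import calendar
import time


def parse_time_sketch(time_data):
    """Convert a time string to an integer timestamp in seconds.

    Hypothetical helper for illustration, not the EasyRec function itself.
    """
    try:
        parsed = time.strptime(time_data, "%Y%m%d %H:%M:%S")
    except ValueError:
        # Not in the date format, so treat it as seconds since the epoch.
        return int(time_data)
    # Assumption: the date string is interpreted as UTC.
    return calendar.timegm(parsed)


print(parse_time_sketch("20240101 00:00:00"))  # 1704067200
print(parse_time_sketch("1700000000"))         # 1700000000
```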

easy_rec.python.utils.config_util.search_fg_json(directory)[source]¶

easy_rec.python.utils.config_util.get_input_name_from_fg_json(fg_json)[source]¶

easy_rec.python.utils.config_util.get_train_input_path(pipeline_config)[source]¶

easy_rec.python.utils.config_util.get_eval_input_path(pipeline_config)[source]¶

easy_rec.python.utils.config_util.get_model_dir_path(pipeline_config)[source]¶

easy_rec.python.utils.config_util.set_train_input_path(pipeline_config, train_input_path)[source]¶

easy_rec.python.utils.config_util.set_eval_input_path(pipeline_config, eval_input_path)[source]¶

easy_rec.python.utils.config_util.process_data_path(data_path, hive_util)[source]¶

easy_rec.python.utils.config_util.process_neg_sampler_data_path(pipeline_config)[source]¶

easy_rec.python.utils.config_util.parse_extra_config_param(extra_args, edit_config_json)[source]¶

easy_rec.python.utils.config_util.process_multi_file_input_path(sampler_config_input_path)[source]¶

easy_rec.python.utils.estimator_utils¶

easy_rec.python.utils.estimator_utils.tensor_log_format_func(tensor_dict)[source]¶

class easy_rec.python.utils.estimator_utils.ExitBarrierHook(num_worker, is_chief, model_dir)[source]¶

Bases: SessionRunHook

ExitBarrier to make sure master and workers exit at the same time.

After training finishes, the master still has to run evaluation and model export, so it exits a little later than the workers.

__init__(num_worker, is_chief, model_dir)[source]¶

begin()[source]¶: Count the number of workers and masters, and set up the barrier queue.

after_create_session(session, coord)[source]¶

Clean up the queue after the session is created.

If the parameter server (ps) did not exit after the previous run, the elements enqueued by that run may remain in the queue.

end(session)[source]¶: Exit only after all workers and the master have each enqueued an element.
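The exit-barrier idea can be sketched without TensorFlow. The example below is only an analogy under stated assumptions: it uses Python threads and a queue.Queue in place of distributed tasks and a shared TensorFlow queue, and the function name run_with_exit_barrier is hypothetical.

```python
import queue
import threading
import time


def run_with_exit_barrier(num_worker, work_seconds):
    """Run num_worker tasks that only exit once every task has finished.

    Each task enqueues one token when its work is done, then waits until
    the queue holds one token per task. Thread-based analogy only.
    """
    barrier_queue = queue.Queue()
    exit_sizes = []
    lock = threading.Lock()

    def task(index):
        time.sleep(work_seconds[index])  # simulated training work
        barrier_queue.put(index)         # signal "I am done"
        while barrier_queue.qsize() < num_worker:
            time.sleep(0.01)             # wait for the slower tasks
        with lock:
            exit_sizes.append(barrier_queue.qsize())

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_worker)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return exit_sizes


# Every task sees a full queue before it exits, even the fastest one.
print(run_with_exit_barrier(3, [0.0, 0.02, 0.05]))  # [3, 3, 3]
```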

class easy_rec.python.utils.estimator_utils.EvaluateExitBarrierHook(num_worker, is_chief, model_dir, metric_ops=None)[source]¶

Bases: SessionRunHook

ExitBarrier to make sure master and workers exit at the same time.

After training finishes, the master still has to run evaluation and model export, so it exits a little later than the workers.

__init__(num_worker, is_chief, model_dir, metric_ops=None)[source]¶

begin()[source]¶: Count the number of workers and masters, and set up the barrier queue.

after_create_session(session, coord)[source]¶

Clean up the queue after the session is created.

If the parameter server (ps) did not exit after the previous run, the elements enqueued by that run may remain in the queue.

end(session)[source]¶: Exit only after all workers and the master have each enqueued an element.

class easy_rec.python.utils.estimator_utils.ProgressHook(num_steps, filename, is_chief)[source]¶

Bases: SessionRunHook

__init__(num_steps, filename, is_chief)[source]¶

Initializes a ProgressHook.

Parameters:

num_steps – total train steps
filename – progress file name
is_chief – is chief worker or not

before_run(run_context)[source]¶

Called before each call to run().

You can return from this call a SessionRunArgs object indicating ops or tensors to add to the upcoming run() call. These ops/tensors will be run together with the ops/tensors originally passed to the original run() call. The run args you return can also contain feeds to be added to the run() call.

The run_context argument is a SessionRunContext that provides information about the upcoming run() call: the originally requested op/tensors, the TensorFlow Session.

At this point graph is finalized and you can not add ops.

Parameters:: run_context – A SessionRunContext object.
Returns:: None or a SessionRunArgs object.

after_run(run_context, run_values)[source]¶

Called after each call to run().

The run_values argument contains results of requested ops/tensors by before_run().

The run_context argument is the same one sent to the before_run call. run_context.request_stop() can be called to stop the iteration.

If session.run() raises any exceptions then after_run() is not called.

Parameters:

run_context – A SessionRunContext object.
run_values – A SessionRunValues object.

end(session)[source]¶

Called at the end of session.

The session argument can be used in case the hook wants to run final ops, such as saving a last checkpoint.

If session.run() raises exception other than OutOfRangeError or StopIteration then end() is not called. Note the difference between end() and after_run() behavior when session.run() raises OutOfRangeError or StopIteration. In that case end() is called but after_run() is not called.

Parameters:: session – A TensorFlow Session that will be soon closed.

class easy_rec.python.utils.estimator_utils.CheckpointSaverHook(checkpoint_dir, save_secs=None, save_steps=None, saver=None, checkpoint_basename='model.ckpt', scaffold=None, listeners=None, write_graph=True, data_offset_var=None, increment_save_config=None)[source]¶

Bases: CheckpointSaverHook

Saves checkpoints every N steps or seconds.

__init__(checkpoint_dir, save_secs=None, save_steps=None, saver=None, checkpoint_basename='model.ckpt', scaffold=None, listeners=None, write_graph=True, data_offset_var=None, increment_save_config=None)[source]¶

Initializes a CheckpointSaverHook.

Parameters:

checkpoint_dir – str, base directory for the checkpoint files.
save_secs – int, save every N secs.
save_steps – int, save every N steps.
saver – Saver object, used for saving.
checkpoint_basename – str, base name for the checkpoint files.
scaffold – Scaffold, used to get the saver object.
listeners – List of CheckpointSaverListener subclass instances. Used for callbacks that run immediately before or after this hook saves the checkpoint.
write_graph – whether to save graph.pbtxt.
data_offset_var – data offset variable.
increment_save_config – parameters for saving incremental checkpoints.

Raises:

ValueError – One of save_steps or save_secs should be set.
ValueError – At most one of saver or scaffold should be set.
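The "every N steps or seconds" behaviour can be sketched as a small trigger object. This is an illustration only: the class name SaveTriggerSketch is hypothetical, and two details are assumptions rather than documented behaviour, namely that exactly one of save_steps and save_secs must be set and that the first call always triggers a save.

```python
import time


class SaveTriggerSketch:
    """Decide whether a checkpoint is due, by step count or elapsed time."""

    def __init__(self, save_secs=None, save_steps=None):
        # Assumption: exactly one of the two intervals must be provided.
        if (save_secs is None) == (save_steps is None):
            raise ValueError("Exactly one of save_steps or save_secs should be set.")
        self._save_secs = save_secs
        self._save_steps = save_steps
        self._last_step = None
        self._last_time = None

    def should_save(self, step, now=None):
        now = time.time() if now is None else now
        if self._last_step is None:
            due = True  # assumption: the first call always saves
        elif self._save_steps is not None:
            due = step >= self._last_step + self._save_steps
        else:
            due = now >= self._last_time + self._save_secs
        if due:
            # Remember when the last save happened.
            self._last_step, self._last_time = step, now
        return due


trigger = SaveTriggerSketch(save_steps=100)
saved_at = [step for step in range(0, 301, 50) if trigger.should_save(step)]
print(saved_at)  # [0, 100, 200, 300]
```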

after_create_session(session, coord)[source]¶

Called when new TensorFlow session is created.

This is called to signal the hooks that a new session has been created. This has two essential differences with the situation in which begin is called:

When this is called, the graph is finalized and ops can no longer be added to the graph.
This method will also be called as a result of recovering a wrapped session, not only at the beginning of the overall session.

Parameters:

session – A TensorFlow Session that has been created.
coord – A Coordinator object which keeps track of all threads.

before_run(run_context)[source]¶

Called before each call to run().

You can return from this call a SessionRunArgs object indicating ops or tensors to add to the upcoming run() call. These ops/tensors will be run together with the ops/tensors originally passed to the original run() call. The run args you return can also contain feeds to be added to the run() call.

The run_context argument is a SessionRunContext that provides information about the upcoming run() call: the originally requested op/tensors, the TensorFlow Session.

At this point graph is finalized and you can not add ops.

Parameters:: run_context – A SessionRunContext object.
Returns:: None or a SessionRunArgs object.

after_run(run_context, run_values)[source]¶

Called after each call to run().

The run_values argument contains results of requested ops/tensors by before_run().

The run_context argument is the same one sent to the before_run call. run_context.request_stop() can be called to stop the iteration.

If session.run() raises any exceptions then after_run() is not called.

Parameters:

run_context – A SessionRunContext object.
run_values – A SessionRunValues object.

end(session)[source]¶

Called at the end of session.

The session argument can be used in case the hook wants to run final ops, such as saving a last checkpoint.

If session.run() raises exception other than OutOfRangeError or StopIteration then end() is not called. Note the difference between end() and after_run() behavior when session.run() raises OutOfRangeError or StopIteration. In that case end() is called but after_run() is not called.

Parameters:: session – A TensorFlow Session that will be soon closed.

class easy_rec.python.utils.estimator_utils.NumpyCheckpointRestoreHook(ckpt_path, name2var_map)[source]¶

Bases: SessionRunHook

Restore variables from a numpy checkpoint.

__init__(ckpt_path, name2var_map)[source]¶

Initializes a NumpyCheckpointRestoreHook.

Parameters:

ckpt_path – numpy checkpoint path to restore from
name2var_map – var name in numpy ckpt to variable map

begin()[source]¶

Called once before using the session.

When called, the default graph is the one that will be launched in the session. The hook can modify the graph by adding new operations to it. After the begin() call the graph will be finalized and the other callbacks cannot modify the graph anymore. A second call of begin() on the same graph should not change the graph.

after_create_session(session, coord)[source]¶

Called when new TensorFlow session is created.

This is called to signal the hooks that a new session has been created. This has two essential differences with the situation in which begin is called:

When this is called, the graph is finalized and ops can no longer be added to the graph.
This method will also be called as a result of recovering a wrapped session, not only at the beginning of the overall session.

Parameters:

session – A TensorFlow Session that has been created.
coord – A Coordinator object which keeps track of all threads.

class easy_rec.python.utils.estimator_utils.IncompatibleShapeRestoreHook(incompatible_shape_var_map)[source]¶

Bases: SessionRunHook

Restore variable with incompatible shapes.

__init__(incompatible_shape_var_map)[source]¶

Initializes an IncompatibleShapeRestoreHook.

Parameters:: incompatible_shape_var_map – a mapping of variables with incompatible shapes, from real variable to temp variable. The real variable is the one used in the model; the temp variable is the one restored from the checkpoint.

begin()[source]¶

Called once before using the session.

When called, the default graph is the one that will be launched in the session. The hook can modify the graph by adding new operations to it. After the begin() call the graph will be finalized and the other callbacks cannot modify the graph anymore. A second call of begin() on the same graph should not change the graph.

after_create_session(session, coord)[source]¶

Called when new TensorFlow session is created.

This is called to signal the hooks that a new session has been created. This has two essential differences with the situation in which begin is called:

When this is called, the graph is finalized and ops can no longer be added to the graph.
This method will also be called as a result of recovering a wrapped session, not only at the beginning of the overall session.

Parameters:

session – A TensorFlow Session that has been created.
coord – A Coordinator object which keeps track of all threads.

class easy_rec.python.utils.estimator_utils.MultipleCheckpointsRestoreHook(ckpt_paths)[source]¶

Bases: SessionRunHook

Restore variables from multiple checkpoints.

SEP = ';'¶

__init__(ckpt_paths)[source]¶

Initializes a MultipleCheckpointsRestoreHook.

Parameters:

ckpt_paths – multiple checkpoint paths, separated by ;

begin()[source]¶

Called once before using the session.

When called, the default graph is the one that will be launched in the session. The hook can modify the graph by adding new operations to it. After the begin() call the graph will be finalized and the other callbacks cannot modify the graph anymore. A second call of begin() on the same graph should not change the graph.

after_create_session(session, coord)[source]¶

Called when new TensorFlow session is created.

This is called to signal the hooks that a new session has been created. This has two essential differences with the situation in which begin is called:

When this is called, the graph is finalized and ops can no longer be added to the graph.
This method will also be called as a result of recovering a wrapped session, not only at the beginning of the overall session.

Parameters:

session – A TensorFlow Session that has been created.
coord – A Coordinator object which keeps track of all threads.

class easy_rec.python.utils.estimator_utils.OnlineEvaluationHook(metric_dict, output_dir)[source]¶

Bases: SessionRunHook

__init__(metric_dict, output_dir)[source]¶

end(session)[source]¶

Called at the end of session.

The session argument can be used in case the hook wants to run final ops, such as saving a last checkpoint.

If session.run() raises exception other than OutOfRangeError or StopIteration then end() is not called. Note the difference between end() and after_run() behavior when session.run() raises OutOfRangeError or StopIteration. In that case end() is called but after_run() is not called.

Parameters:: session – A TensorFlow Session that will be soon closed.

easy_rec.python.utils.estimator_utils.parse_tf_config()[source]¶

easy_rec.python.utils.estimator_utils.get_task_index_and_num()[source]¶

easy_rec.python.utils.estimator_utils.get_ckpt_version(ckpt_path)[source]¶

Get checkpoint version from ckpt_path.

Parameters:: ckpt_path – such as xx/model.ckpt-2000 or xx/model.ckpt-2000.meta
Returns:: ckpt_version, such as 2000
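The mapping from path to version can be sketched with a regular expression. The helper name ckpt_version_sketch is hypothetical, and returning the version as an int and raising ValueError for unrecognised paths are assumptions.

```python
import re


def ckpt_version_sketch(ckpt_path):
    """Extract the step number from a path such as xx/model.ckpt-2000.

    Hypothetical helper for illustration, not the EasyRec function itself.
    """
    match = re.search(r"model\.ckpt-(\d+)", ckpt_path)
    if match is None:
        # Assumption: unrecognised paths are reported as an error.
        raise ValueError("not a checkpoint path: %s" % ckpt_path)
    return int(match.group(1))


print(ckpt_version_sketch("xx/model.ckpt-2000"))       # 2000
print(ckpt_version_sketch("xx/model.ckpt-2000.meta"))  # 2000
```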

easy_rec.python.utils.estimator_utils.get_latest_checkpoint_from_checkpoint_path(checkpoint_path, ignore_ckpt_error)[source]¶

easy_rec.python.utils.estimator_utils.latest_checkpoint(model_dir)[source]¶

Find the latest checkpoint under a directory.

Parameters:: model_dir – model directory
Returns:: model_path, such as xx/model.ckpt-2000
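One way to picture the lookup is to scan the directory for checkpoint index files and keep the one with the highest step. This is a sketch under assumptions: the helper name latest_checkpoint_sketch is hypothetical, and relying on model.ckpt-*.index files is an assumption about the on-disk layout, not a description of the actual implementation.

```python
import glob
import os
import re
import tempfile


def latest_checkpoint_sketch(model_dir):
    """Return the checkpoint prefix with the highest step, or None."""
    best_step, best_path = -1, None
    for index_file in glob.glob(os.path.join(model_dir, "model.ckpt-*.index")):
        match = re.search(r"model\.ckpt-(\d+)\.index$", index_file)
        # Compare steps numerically so that 2000 ranks above 900.
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            best_path = index_file[:-len(".index")]
    return best_path


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as model_dir:
    for step in (100, 900, 2000):
        open(os.path.join(model_dir, "model.ckpt-%d.index" % step), "w").close()
    latest = latest_checkpoint_sketch(model_dir)

print(os.path.basename(latest))  # model.ckpt-2000
```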

easy_rec.python.utils.estimator_utils.get_trained_steps(model_dir)[source]¶

easy_rec.python.utils.estimator_utils.master_to_chief()[source]¶

easy_rec.python.utils.estimator_utils.chief_to_master()[source]¶

easy_rec.python.utils.estimator_utils.is_ps()[source]¶

easy_rec.python.utils.estimator_utils.is_chief()[source]¶

easy_rec.python.utils.estimator_utils.is_master()[source]¶

easy_rec.python.utils.estimator_utils.is_evaluator()[source]¶

easy_rec.python.utils.estimator_utils.has_hvd()[source]¶

easy_rec.python.utils.estimator_utils.has_sok()[source]¶

easy_rec.python.utils.estimator_utils.init_hvd()[source]¶

easy_rec.python.utils.estimator_utils.init_sok()[source]¶

easy_rec.python.utils.estimator_utils.get_available_gpus()[source]¶

easy_rec.python.utils.hpo_util¶

easy_rec.python.utils.hpo_util.get_all_eval_result(event_file_pattern)[source]¶

Get the best eval result from event files.

Parameters:: event_file_pattern – Absolute pattern of event files.
Returns:: The best eval result.

easy_rec.python.utils.hpo_util.save_eval_metrics(model_dir, metric_save_path, has_evaluator=True)[source]¶

Save evaluation metrics.

Parameters:

model_dir – train model directory
metric_save_path – metric saving path
has_evaluator – evaluation is done on a separate evaluator, not on master.

easy_rec.python.utils.hpo_util.kill_old_proc(tmp_dir, platform='pai')[source]¶

easy_rec.python.utils.io_util¶

IO utils.

easy_rec.python.utils.io_util.http_read(url, timeout=600, max_retry=5)[source]¶

Read data from url with maximum retry.

Parameters:

url – http url to be read
timeout – specifies a timeout in seconds for blocking operations.
max_retry – http max retry times.
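The retry behaviour can be sketched as a simple loop. This is not the EasyRec implementation: the name http_read_sketch and the opener parameter are hypothetical (the latter exists only so the example can run without network access), and raising RuntimeError after the last attempt is an assumption.

```python
import io
import urllib.request


def http_read_sketch(url, timeout=600, max_retry=5, opener=urllib.request.urlopen):
    """Read the body at url, retrying up to max_retry times on I/O errors."""
    last_error = None
    for _ in range(max_retry):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except OSError as error:  # URLError and socket timeouts are OSErrors
            last_error = error
    # Assumption: give up with an exception once all attempts have failed.
    raise RuntimeError("failed to read %s after %d attempts" % (url, max_retry)) from last_error


# Demo with a fake opener that fails twice and then succeeds.
calls = []


def flaky_opener(url, timeout):
    calls.append(url)
    if len(calls) < 3:
        raise OSError("simulated network failure")
    return io.BytesIO(b"ok")


data = http_read_sketch("http://example.invalid/data", opener=flaky_opener)
print(data, len(calls))  # b'ok' 3
```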

easy_rec.python.utils.io_util.download(oss_or_url, dst_dir='')[source]¶

Download file.

Parameters:

oss_or_url – http or oss path
dst_dir – destination directory

Returns:

dst_file, the local path of the downloaded file

easy_rec.python.utils.io_util.create_module_dir(dst_dir)[source]¶

easy_rec.python.utils.io_util.download_resource(resource_path, dst_dir='easy_rec_user_resources')[source]¶

Download user resource.

Parameters:

resource_path – http or oss path
dst_dir – destination directory

easy_rec.python.utils.io_util.download_and_uncompress_resource(resource_path, dst_dir='easy_rec_user_resources')[source]¶

Download user resource and uncompress it if necessary.

Parameters:

resource_path – http or oss path
dst_dir – download destination directory

easy_rec.python.utils.io_util.oss_has_t_mode(target_file)[source]¶: Test whether the current environment supports t-mode writes to OSS.

easy_rec.python.utils.io_util.fix_oss_dir(path)[source]¶: Make sure that the OSS directory path ends with /.

easy_rec.python.utils.io_util.save_data_to_json_path(json_path, data)[source]¶

easy_rec.python.utils.io_util.read_data_from_json_path(json_path)[source]¶

easy_rec.python.utils.io_util.convert_tf_flags_to_argparse(flags)[source]¶

Convert tf.app.flags.FLAGS to argparse.ArgumentParser.

Parameters:: flags – tf.app.flags.FLAGS
Returns:: the configured ArgumentParser object
Return type:: argparse.ArgumentParser

easy_rec.python.utils.io_util.filter_unknown_args(flags, args)[source]¶: Filter unknown args.

easy_rec.python.utils.load_class¶

Tools for loading classes.

easy_rec.python.utils.load_class.python_file_to_module(python_file)[source]¶

easy_rec.python.utils.load_class.load_by_path(path)[source]¶

Load functions or modules or classes.

Parameters:: path – path to modules or functions or classes, such as: tf.nn.relu
Returns:: modules or functions or classes
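Resolving a dotted path to a module, function, or class can be sketched with importlib. The helper name load_by_path_sketch is hypothetical, and the strategy of trying the longest importable module prefix first is an assumption; the actual function may treat aliases such as tf specially.

```python
import importlib


def load_by_path_sketch(path):
    """Resolve a dotted path such as 'math.sqrt' to the object it names."""
    parts = path.split(".")
    # Try the longest module prefix first, then shorter ones.
    for split in range(len(parts), 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            continue
        # The remaining parts are attributes of the imported module.
        for attr in parts[split:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError("cannot resolve %s" % path)


sqrt = load_by_path_sketch("math.sqrt")
print(sqrt(9.0))  # 3.0
```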

easy_rec.python.utils.load_class.check_class(cls, impl_cls, function_names=None)[source]¶

Check implemented class is valid according to template class.

If a function signature differs, an exception is raised.

Parameters:

cls – class which declares functions that need users to implement
impl_cls – user implemented class
function_names – if not None, only these functions and their signatures are checked
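A signature check of this kind can be sketched with inspect.signature. The names below (check_class_sketch, Template, GoodImpl, BadImpl) are hypothetical, comparing only parameter names is a simplification, and raising TypeError is an assumption, since the exception type is not stated above.

```python
import inspect


def check_class_sketch(cls, impl_cls, function_names=None):
    """Raise if impl_cls changes the parameter list of a template method."""
    if function_names is None:
        # Default: check every public function declared on the template.
        function_names = [
            name for name, _ in inspect.getmembers(cls, inspect.isfunction)
            if not name.startswith("_")
        ]
    for name in function_names:
        expected = list(inspect.signature(getattr(cls, name)).parameters)
        actual = list(inspect.signature(getattr(impl_cls, name)).parameters)
        if expected != actual:
            raise TypeError(
                "%s.%s has parameters %s, expected %s"
                % (impl_cls.__name__, name, actual, expected))


class Template:
    def build(self, features, labels):
        raise NotImplementedError


class GoodImpl(Template):
    def build(self, features, labels):
        return features


class BadImpl(Template):
    def build(self, features):  # drops the labels parameter
        return features


check_class_sketch(Template, GoodImpl)  # passes silently
```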

easy_rec.python.utils.load_class.import_pkg(pkg_info, prefix_to_remove=None)[source]¶

Import package.

Parameters:

pkg_info – pkgutil.ModuleInfo object
prefix_to_remove – the package prefix to be removed

easy_rec.python.utils.load_class.auto_import(user_path=None)[source]¶

Auto import python files so that register_xxx decorator will take effect.

By default, files in the pre-defined directories are imported, and all files under user_path are imported recursively.

Parameters:: user_path – directory or file that stores user-defined Python code; by default, only the current directory is searched

easy_rec.python.utils.load_class.register_class(class_map, class_name, cls)[source]¶

easy_rec.python.utils.load_class.get_register_class_meta(class_map, have_abstract_class=True)[source]¶

easy_rec.python.utils.load_class.load_keras_layer(name)[source]¶

Load keras layer class.

Parameters:: name – keras layer name
Returns:: (layer_class, is_customize)

easy_rec.python.utils.odps_util¶

Common functions used for odps input.

easy_rec.python.utils.odps_util.is_type_compatiable(odps_type, input_type)[source]¶: Check that odps_type is compatible with input_type.

easy_rec.python.utils.odps_util.odps_type_to_input_type(odps_type)[source]¶: Convert odps_type to the corresponding input_type.

easy_rec.python.utils.odps_util.check_input_field_and_types(data_config)[source]¶

Check compatibility of input in data_config.

Check that data_config.input_fields are compatible with data_config.selected_cols and data_config.selected_types.

Parameters:: data_config – instance of DatasetConfig

easy_rec.python.utils.odps_util.odps_type_2_tf_type(odps_type)[source]¶

easy_rec.python.utils.pai_util¶

easy_rec.python.utils.pai_util.is_on_pai()[source]¶

easy_rec.python.utils.pai_util.set_on_pai()[source]¶

easy_rec.python.utils.pai_util.download(url)[source]¶

easy_rec.python.utils.pai_util.process_config(configs, task_index=0, worker_num=1)[source]¶

Download config and select config for the worker.

Parameters:

configs – config paths, separated by ‘,’
task_index – worker index
worker_num – total number of workers

easy_rec.python.utils.pai_util.test()[source]¶

easy_rec.python.utils.restore_filter¶

Define filters for restore.

class easy_rec.python.utils.restore_filter.Logical(value)[source]¶

Bases: Enum

An enumeration.

AND = 1¶

OR = 2¶

class easy_rec.python.utils.restore_filter.Filter[source]¶

Bases: object

__init__()[source]¶

abstract keep(var_name)[source]¶

Keep the var or not.

Parameters:: var_name – input name of the var
Returns:: True if the var will be kept, else False

class easy_rec.python.utils.restore_filter.KeywordFilter(pattern, exclusive=False)[source]¶

Bases: Filter

__init__(pattern, exclusive=False)[source]¶

Init KeywordFilter.

Parameters:

pattern – keyword to be matched
exclusive – if True, var_name should include the pattern; otherwise, var_name should not include the pattern

keep(var_name)[source]¶

Keep the var or not.

Parameters:: var_name – input name of the var
Returns:: True if the var will be kept, else False

class easy_rec.python.utils.restore_filter.CombineFilter(filters, logical=Logical.AND)[source]¶

Bases: Filter

__init__(filters, logical=Logical.AND)[source]¶

Init CombineFilter.

Parameters:

filters – a set of filters to be combined
logical – logical and/or combination of the filters

keep(var_name)[source]¶

Keep the var or not.

Parameters:: var_name – input name of the var
Returns:: True if the var will be kept, else False
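Combining filters with a logical AND or OR can be sketched as follows. The classes here are standalone stand-ins, not the EasyRec classes: ContainsFilter is a hypothetical filter that keeps names containing a keyword, and CombineFilterSketch assumes that AND means every filter keeps the variable and OR means at least one does.

```python
from enum import Enum


class Logical(Enum):
    AND = 1
    OR = 2


class ContainsFilter:
    """Hypothetical filter: keep variable names that contain a keyword."""

    def __init__(self, keyword):
        self._keyword = keyword

    def keep(self, var_name):
        return self._keyword in var_name


class CombineFilterSketch:
    """Combine several filters with a logical AND or OR."""

    def __init__(self, filters, logical=Logical.AND):
        self._filters = list(filters)
        self._logical = logical

    def keep(self, var_name):
        results = (f.keep(var_name) for f in self._filters)
        if self._logical == Logical.AND:
            return all(results)  # every filter must keep the variable
        return any(results)      # at least one filter keeps the variable


filters = [ContainsFilter("embedding"), ContainsFilter("user")]
both = CombineFilterSketch(filters, Logical.AND)
either = CombineFilterSketch(filters, Logical.OR)
print(both.keep("user_embedding"), both.keep("item_embedding"))      # True False
print(either.keep("item_embedding"), either.keep("dnn/dense/kernel"))  # True False
```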

class easy_rec.python.utils.restore_filter.ScopeDrop(scope_name)[source]¶

Bases: object

Drops the scope prefix when restoring variables from a checkpoint.

__init__(scope_name)[source]¶

update(var_name)[source]¶

easy_rec.python.utils.shape_utils¶

Utils used to manipulate tensor shapes.

easy_rec.python.utils.shape_utils.merge_shape(t, shape_list)[source]¶

Merge static shape info into tensor.

Parameters:

t – the input tensor, assuming the rank is at least 1.
shape_list – a list of shape values, with the same length as t.get_shape()

Returns:

the tensor t with shape updated

easy_rec.python.utils.shape_utils.pad_tensor(t, length)[source]¶

Pads the input tensor with 0s along the first dimension up to the length.

Parameters:

t – the input tensor, assuming the rank is at least 1.
length – a tensor of shape [1] or an integer, indicating the first dimension of the input tensor t after padding, assuming length >= t.shape[0].

Returns:

padded_t, the padded tensor, whose first dimension is length. If length is an integer, the first dimension of padded_t is set to length statically.

easy_rec.python.utils.shape_utils.clip_tensor(t, length)[source]¶

Clips the input tensor along the first dimension up to the length.

Parameters:

t – the input tensor, assuming the rank is at least 1.
length – a tensor of shape [1] or an integer, indicating the first dimension of the input tensor t after clipping, assuming length <= t.shape[0].

Returns:

clipped_t, the clipped tensor, whose first dimension is length. If length is an integer, the first dimension of clipped_t is set to length statically.

easy_rec.python.utils.shape_utils.pad_or_clip_tensor(t, length)[source]¶

Pad or clip the input tensor along the first dimension.

Parameters:

t – the input tensor, assuming the rank is at least 1.
length – a tensor of shape [1] or an integer, indicating the first dimension of the input tensor t after processing.

Returns:

processed_t, the processed tensor, whose first dimension is length. If length is an integer, the first dimension of the processed tensor is set to length statically.
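The pad-or-clip rule along the first dimension can be illustrated on a plain Python list. This is an analogy only: the helper name pad_or_clip_sketch is hypothetical, and it operates on a one-dimensional list rather than a tensor.

```python
def pad_or_clip_sketch(values, length, pad_value=0):
    """Return values clipped or zero-padded so that len(result) == length."""
    clipped = values[:length]                 # clip when the input is too long
    missing = length - len(clipped)           # 0 when nothing needs padding
    return clipped + [pad_value] * missing    # pad when the input is too short


print(pad_or_clip_sketch([1, 2, 3], 5))  # [1, 2, 3, 0, 0]
print(pad_or_clip_sketch([1, 2, 3], 2))  # [1, 2]
```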

easy_rec.python.utils.shape_utils.pad_nd(tensor, output_shape)[source]¶

Pad given tensor to the output shape.

Parameters:

tensor – Input tensor to pad or clip.
output_shape – A list of integers / scalar tensors (or None for dynamic dim) representing the size to pad or clip each dimension of the input tensor.

Returns:

Input tensor padded and clipped to the output shape.

easy_rec.python.utils.shape_utils.pad_or_clip_nd(tensor, output_shape)[source]¶

Pad or Clip given tensor to the output shape.

Parameters:

tensor – Input tensor to pad or clip.
output_shape – A list of integers / scalar tensors (or None for dynamic dim) representing the size to pad or clip each dimension of the input tensor.

Returns:

Input tensor padded and clipped to the output shape.

easy_rec.python.utils.shape_utils.combined_static_and_dynamic_shape(tensor)[source]¶

Returns a list containing static and dynamic values for the dimensions.

Returns a list of static and dynamic values for shape dimensions. This is useful to preserve static shapes when available in reshape operation.

Parameters:: tensor – A tensor of any type.
Returns:: A list of size tensor.shape.ndims containing integers or a scalar tensor.
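The combination rule can be sketched on plain lists: use the static value where it is known, and fall back to the dynamic value otherwise. The helper name combine_static_and_dynamic is hypothetical, and the strings below are stand-ins for the scalar tensors that a real dynamic shape would contain.

```python
def combine_static_and_dynamic(static_shape, dynamic_shape):
    """Prefer static dimensions; use the dynamic entry where static is None."""
    return [
        static if static is not None else dynamic
        for static, dynamic in zip(static_shape, dynamic_shape)
    ]


# None marks a dimension that is unknown at graph construction time.
static_shape = [None, 128, 64]
# Placeholders for scalar tensors, for illustration only.
dynamic_shape = ["dyn_dim0", "dyn_dim1", "dyn_dim2"]
print(combine_static_and_dynamic(static_shape, dynamic_shape))
# ['dyn_dim0', 128, 64]
```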

easy_rec.python.utils.shape_utils.check_min_image_dim(min_dim, image_tensor)[source]¶

Checks that the image width/height are greater than some number.

This function is used to check that the width and height of an image are above a certain value. If the image shape is static, this function will perform the check at graph construction time. Otherwise, if the image shape varies, an Assertion control dependency will be added to the graph.

Parameters:

min_dim – The minimum number of pixels along the width and height of the image.
image_tensor – The image tensor to check size for.

Returns:

If image_tensor has dynamic size, returns image_tensor with an Assert control dependency. Otherwise returns image_tensor.

Raises:

ValueError – if image_tensor's width or height is smaller than min_dim.

easy_rec.python.utils.shape_utils.assert_shape_equal(shape_a, shape_b)[source]¶

Asserts that shape_a and shape_b are equal.

If the shapes are static, raises a ValueError when the shapes mismatch.

If the shapes are dynamic, raises a tf InvalidArgumentError when the shapes mismatch.

Parameters:

shape_a – a list containing shape of the first tensor.
shape_b – a list containing shape of the second tensor.

Returns:

Either a tf.no_op() when the shapes are all static, or a tf.assert_equal() op when the shapes are dynamic.

Raises:

ValueError – When shapes are both static and unequal.

easy_rec.python.utils.shape_utils.assert_shape_equal_along_first_dimension(shape_a, shape_b)[source]¶

Asserts that shape_a and shape_b are the same along the 0th-dimension.

If the shapes are static, raises a ValueError when the shapes mismatch.

If the shapes are dynamic, raises a tf InvalidArgumentError when the shapes mismatch.

Parameters:

shape_a – a list containing shape of the first tensor.
shape_b – a list containing shape of the second tensor.

Returns:

Either a tf.no_op() when the shapes are all static, or a tf.assert_equal() op when the shapes are dynamic.

Raises:

ValueError – When shapes are both static and unequal.

easy_rec.python.utils.shape_utils.assert_box_normalized(boxes, maximum_normalized_coordinate=1.1)[source]¶

Asserts the input box tensor is normalized.

Parameters:

boxes – a tensor of shape [N, 4] where N is the number of boxes.
maximum_normalized_coordinate – Maximum coordinate value to be considered as normalized, default to 1.1.

Returns:

a tf.Assert op which fails when the input box tensor is not normalized.

Raises:

ValueError – When the input box tensor is not normalized.

easy_rec.python.utils.shape_utils.get_shape_list(tensor, expected_rank=None, name=None)[source]¶

Returns a list of the shape of tensor, preferring static dimensions.

Parameters:

tensor – A tf.Tensor object to find the shape of.
expected_rank – (optional) int. The expected rank of tensor. If this is specified and the tensor has a different rank, an exception will be thrown.
name – Optional name of the tensor for the error message.

Returns:

A list of dimensions of the shape of tensor. All static dimensions will be returned as python integers, and dynamic dimensions will be returned as tf.Tensor scalars.

easy_rec.python.utils.shape_utils.assert_rank(tensor, expected_rank, name=None)[source]¶

Raises an exception if the tensor rank is not of the expected rank.

Parameters:

tensor – A tf.Tensor to check the rank of.
expected_rank – Python integer or list of integers, expected rank.
name – Optional name of the tensor for the error message.

Raises:

ValueError – If the expected shape doesn’t match the actual shape.

easy_rec.python.utils.shape_utils.truncate_sequence(seq_emb, seq_len, limited_len)[source]¶

easy_rec.python.utils.shape_utils.pad_or_truncate_sequence(seq_emb, seq_len, fixed_len)[source]¶

easy_rec.python.utils.static_shape¶

Helper functions to access TensorShape values.

The rank 4 tensor_shape must be of the form [batch_size, height, width, depth].

easy_rec.python.utils.static_shape.get_batch_size(tensor_shape)[source]¶

Returns batch size from the tensor shape.

Parameters:: tensor_shape – A rank 4 TensorShape.
Returns:: An integer representing the batch size of the tensor.

easy_rec.python.utils.static_shape.get_height(tensor_shape)[source]¶

Returns height from the tensor shape.

Parameters:: tensor_shape – A rank 4 TensorShape.
Returns:: An integer representing the height of the tensor.

easy_rec.python.utils.static_shape.get_width(tensor_shape)[source]¶

Returns width from the tensor shape.

Parameters:: tensor_shape – A rank 4 TensorShape.
Returns:: An integer representing the width of the tensor.

easy_rec.python.utils.static_shape.get_depth(tensor_shape)[source]¶

Returns depth from the tensor shape.

Parameters:: tensor_shape – A rank 4 TensorShape.
Returns:: An integer representing the depth of the tensor.
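The [batch_size, height, width, depth] convention used by these accessors can be sketched with a plain list standing in for a rank 4 TensorShape. The helper name split_rank4_shape is hypothetical, and raising ValueError for other ranks is an assumption.

```python
def split_rank4_shape(tensor_shape):
    """Name the four dimensions of a [batch_size, height, width, depth] shape."""
    if len(tensor_shape) != 4:
        # Assumption: shapes of any other rank are rejected.
        raise ValueError("expected a rank 4 shape, got rank %d" % len(tensor_shape))
    batch_size, height, width, depth = tensor_shape
    return {
        "batch_size": batch_size,
        "height": height,
        "width": width,
        "depth": depth,
    }


dims = split_rank4_shape([32, 224, 112, 3])
print(dims["batch_size"], dims["height"], dims["width"], dims["depth"])
# 32 224 112 3
```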

easy_rec.python.utils.test_utils¶

Contains functions which are convenient for unit testing.

easy_rec.python.utils.test_utils.get_hdfs_tmp_dir(test_dir)[source]¶: Create a randomly named directory in HDFS.

easy_rec.python.utils.test_utils.proc_wait(proc, timeout=1200)[source]¶

easy_rec.python.utils.test_utils.get_tmp_dir()[source]¶

easy_rec.python.utils.test_utils.clear_all_tmp_dirs()[source]¶

easy_rec.python.utils.test_utils.set_gpu_id(gpu_id_str)[source]¶

easy_rec.python.utils.test_utils.get_available_gpus()[source]¶

easy_rec.python.utils.test_utils.run_cmd(cmd_str, log_file, env=None)[source]¶: Run a shell cmd.

easy_rec.python.utils.test_utils.RunAsSubprocess(f)[source]¶

Function decorator to run a function in a subprocess.

Use it when a function starts a tf session, because TensorFlow GPU memory is not released until the process exits.

easy_rec.python.utils.test_utils.clean_up(test_dir)[source]¶

easy_rec.python.utils.test_utils.clean_up_hdfs(test_dir)[source]¶

easy_rec.python.utils.test_utils.test_datahub_train_eval(pipeline_config_path, odps_oss_config, test_dir, process_pipeline_func=None, total_steps=50, post_check_func=None)[source]¶

easy_rec.python.utils.test_utils.test_single_train_eval(pipeline_config_path, test_dir, process_pipeline_func=None, hyperparam_str='', total_steps=50, post_check_func=None, check_mode=False, fine_tune_checkpoint=None, extra_cmd_args=None, timeout=-1)[source]¶

easy_rec.python.utils.test_utils.test_single_pre_check(pipeline_config_path, test_dir)[source]¶

easy_rec.python.utils.test_utils.test_single_predict(test_dir, input_path, output_path, saved_model_dir)[source]¶

easy_rec.python.utils.test_utils.test_feature_selection(pipeline_config)[source]¶

easy_rec.python.utils.test_utils.yaml_replace(train_yaml_path, pipline_config_path, test_pipeline_config_path, test_export_dir=None)[source]¶

easy_rec.python.utils.test_utils.test_hdfs_train_eval(pipeline_config_path, train_yaml_path, test_dir, process_pipeline_func=None, hyperparam_str='', total_steps=2000)[source]¶

easy_rec.python.utils.test_utils.test_hdfs_eval(pipeline_config_path, eval_yaml_path, test_dir, process_pipeline_func=None, hyperparam_str='')[source]¶

easy_rec.python.utils.test_utils.test_hdfs_export(pipeline_config_path, export_yaml_path, test_dir, process_pipeline_func=None, hyperparam_str='')[source]¶

easy_rec.python.utils.test_utils.get_ports_base(num_worker)[source]¶

easy_rec.python.utils.test_utils.test_distributed_train_eval(pipeline_config_path, test_dir, total_steps=50, num_evaluator=0, edit_config_json=None, use_hvd=False, fit_on_eval=False, num_epoch=0)[source]¶

easy_rec.python.utils.test_utils.test_distribute_eval_test(cur_eval_path, test_dir)[source]¶

easy_rec.python.utils.test_utils.test_distributed_eval(pipeline_config_path, checkpoint_path, test_dir, total_steps=50, num_evaluator=0)[source]¶