easy_rec.python.utils¶
easy_rec.python.utils.compat¶
easy_rec.python.utils.config_util¶
Functions for reading and updating configuration files,
for example for hyperparameter tuning or automatic feature expansion.
- easy_rec.python.utils.config_util.get_configs_from_pipeline_file(pipeline_config_path, auto_expand=True)[source]¶
Reads config from a file containing pipeline_pb2.EasyRecConfig.
- Parameters
pipeline_config_path – Path to pipeline_pb2.EasyRecConfig text proto.
- Returns
Dictionary of configuration objects. Keys are model, train_config, train_input_config, eval_config, eval_input_config. Values are the corresponding config objects.
- easy_rec.python.utils.config_util.auto_expand_names(input_name)[source]¶
Auto expand field[1-3] to field1, field2, field3.
- Parameters
input_name – a string pattern like field[1-3]
- Returns
a string list of the expanded names
Todo
could be extended to support more complicated patterns
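The expansion described above can be illustrated with a small, self-contained sketch. This is not the EasyRec implementation: the helper name expand_names and its handling of names that contain no range are assumptions made for the example.

```python
import re


def expand_names(pattern):
    """Expand a name such as 'field[1-3]' into ['field1', 'field2', 'field3'].

    Illustrative stand-in only; not the EasyRec implementation.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        # Assumption: a name without a range is returned unchanged.
        return [pattern]
    prefix, start, end, suffix = match.groups()
    # The range is inclusive on both ends, as in field[1-3] -> field1..field3.
    return [f"{prefix}{i}{suffix}" for i in range(int(start), int(end) + 1)]


expanded = expand_names("field[1-3]")
```

Only a single numeric range per name is handled here, which matches the simple pattern shown in the docstring.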
- easy_rec.python.utils.config_util.create_pipeline_proto_from_configs(configs)[source]¶
Creates a pipeline_pb2.EasyRecConfig from configs dictionary.
This function performs the inverse operation of create_configs_from_pipeline_proto().
- Parameters
configs – Dictionary of configs. See get_configs_from_pipeline_file().
- Returns
A fully populated pipeline_pb2.EasyRecConfig.
- easy_rec.python.utils.config_util.save_pipeline_config(pipeline_config, directory, filename='pipeline.config')[source]¶
Saves a pipeline config text file to disk.
- Parameters
pipeline_config – A pipeline_pb2.EasyRecConfig.
directory – The model directory into which the pipeline config file will be saved.
filename – name of the pipeline config file
- easy_rec.python.utils.config_util.edit_config(pipeline_config, edit_config_json)[source]¶
Updates the parameters specified by AutoML.
- Parameters
pipeline_config – the EasyRecConfig to update
edit_config_json – JSON describing the parameter edits to apply
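The idea of updating a config from a JSON description can be sketched on a plain nested dictionary. Everything in this sketch is an assumption: the dotted-key syntax, the helper name apply_edits, and the field names are hypothetical and are not taken from the documentation above, which does not specify the JSON format.

```python
import json


def apply_edits(config, edit_json):
    """Apply {"a.b.c": value} style edits to a nested dict, in place."""
    edits = json.loads(edit_json)
    for dotted_key, value in edits.items():
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node[name]  # raises KeyError if the path does not exist
        node[leaf] = value
    return config


# Hypothetical config contents, used only to exercise the helper.
config = {"train_config": {"num_steps": 1000, "optimizer": {"learning_rate": 0.01}}}
apply_edits(config, '{"train_config.optimizer.learning_rate": 0.001}')
```

Keys that are not named in the JSON are left untouched, which is the behaviour a tuning tool usually needs when it varies a handful of parameters per trial.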
easy_rec.python.utils.estimator_utils¶
- class easy_rec.python.utils.estimator_utils.ExitBarrierHook(num_worker, is_chief, model_dir)[source]¶
Bases:
tensorflow.python.training.session_run_hook.SessionRunHook
Exit barrier that makes sure the master and the workers exit at the same time.
After training finishes, the master still has to run evaluation and export the model, so it exits a little later than the workers.
- class easy_rec.python.utils.estimator_utils.EvaluateExitBarrierHook(num_worker, is_chief, model_dir, metric_ops=None)[source]¶
Bases:
tensorflow.python.training.session_run_hook.SessionRunHook
Exit barrier that makes sure the master and the workers exit at the same time.
After training finishes, the master still has to run evaluation and export the model, so it exits a little later than the workers.
- class easy_rec.python.utils.estimator_utils.ProgressHook(num_steps, filename, is_chief)[source]¶
Bases:
tensorflow.python.training.session_run_hook.SessionRunHook
- __init__(num_steps, filename, is_chief)[source]¶
Initializes a ProgressHook.
- Parameters
num_steps – total train steps
filename – progress file name
is_chief – is chief worker or not
- before_run(run_context)[source]¶
Called before each call to run().
You can return from this call a SessionRunArgs object indicating ops or tensors to add to the upcoming run() call. These ops/tensors will be run together with the ops/tensors originally passed to the original run() call. The run args you return can also contain feeds to be added to the run() call.
The run_context argument is a SessionRunContext that provides information about the upcoming run() call: the originally requested op/tensors, the TensorFlow Session.
At this point the graph is finalized and you cannot add ops.
- Parameters
run_context – A SessionRunContext object.
- Returns
None or a SessionRunArgs object.
- after_run(run_context, run_values)[source]¶
Called after each call to run().
The run_values argument contains results of requested ops/tensors by before_run().
The run_context argument is the same one sent to the before_run() call. run_context.request_stop() can be called to stop the iteration.
If session.run() raises any exceptions then after_run() is not called.
- Parameters
run_context – A SessionRunContext object.
run_values – A SessionRunValues object.
- end(session)[source]¶
Called at the end of session.
The session argument can be used in case the hook wants to run final ops, such as saving a last checkpoint.
If session.run() raises an exception other than OutOfRangeError or StopIteration, end() is not called. Note the difference between end() and after_run() when session.run() raises OutOfRangeError or StopIteration: in that case end() is called but after_run() is not.
- Parameters
session – A TensorFlow Session that will soon be closed.
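The callback order described for before_run(), after_run() and end() can be shown with a pure-Python stand-in that does not use TensorFlow. The names RecordingHook and run_loop are hypothetical, and the loop is a simplified model of a monitored session, not its actual implementation.

```python
class RecordingHook:
    """Stand-in hook that records which callbacks were invoked."""

    def __init__(self):
        self.calls = []

    def before_run(self, step):
        self.calls.append(("before_run", step))

    def after_run(self, step, result):
        self.calls.append(("after_run", step))

    def end(self):
        self.calls.append(("end",))


def run_loop(hook, step_fn, max_steps):
    """Drive step_fn and call the hook in the order described above."""
    for step in range(max_steps):
        hook.before_run(step)
        try:
            result = step_fn(step)
        except StopIteration:
            # Input exhausted: after_run() is skipped, but end() still runs.
            break
        # Any other exception propagates, so neither after_run() nor end() runs.
        hook.after_run(step, result)
    hook.end()


def step_fn(step):
    # Simulate the input running out on the third step.
    if step == 2:
        raise StopIteration
    return step * 10


hook = RecordingHook()
run_loop(hook, step_fn, max_steps=5)
```

After the run, hook.calls shows before_run() for the third step followed directly by end(), with no matching after_run().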
- class easy_rec.python.utils.estimator_utils.CheckpointSaverHook(checkpoint_dir, save_secs=None, save_steps=None, saver=None, checkpoint_basename='model.ckpt', scaffold=None, listeners=None, write_graph=True)[source]¶
Bases:
tensorflow.python.training.basic_session_run_hooks.CheckpointSaverHook
Saves checkpoints every N steps or seconds.
- __init__(checkpoint_dir, save_secs=None, save_steps=None, saver=None, checkpoint_basename='model.ckpt', scaffold=None, listeners=None, write_graph=True)[source]¶
Initializes a CheckpointSaverHook.
- Parameters
checkpoint_dir – str, base directory for the checkpoint files.
save_secs – int, save every N secs.
save_steps – int, save every N steps.
saver – Saver object, used for saving.
checkpoint_basename – str, base name for the checkpoint files.
scaffold – Scaffold, used to get the saver object.
listeners – List of CheckpointSaverListener subclass instances. Used for callbacks that run immediately before or after this hook saves the checkpoint.
write_graph – whether to save graph.pbtxt.
- Raises
ValueError – One of save_steps or save_secs should be set.
ValueError – At most one of saver or scaffold should be set.
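The two ValueError conditions listed above can be written as a small standalone check. This is a sketch of the documented conditions only; the helper name validate_saver_args is hypothetical, and whether passing both save_secs and save_steps is accepted is not stated above, so the sketch does not reject it.

```python
def validate_saver_args(save_secs=None, save_steps=None, saver=None, scaffold=None):
    """Raise ValueError for the argument combinations documented as invalid."""
    if save_secs is None and save_steps is None:
        raise ValueError("One of save_steps or save_secs should be set.")
    if saver is not None and scaffold is not None:
        raise ValueError("At most one of saver or scaffold should be set.")


# A valid combination passes silently.
validate_saver_args(save_steps=100)
```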
- after_create_session(session, coord)[source]¶
Called when new TensorFlow session is created.
This is called to signal the hooks that a new session has been created. This has two essential differences with the situation in which begin is called:
- When this is called, the graph is finalized and ops can no longer be added
to the graph.
- This method will also be called as a result of recovering a wrapped
session, not only at the beginning of the overall session.
- Parameters
session – A TensorFlow Session that has been created.
coord – A Coordinator object which keeps track of all threads.
- before_run(run_context)[source]¶
Called before each call to run().
You can return from this call a SessionRunArgs object indicating ops or tensors to add to the upcoming run() call. These ops/tensors will be run together with the ops/tensors originally passed to the original run() call. The run args you return can also contain feeds to be added to the run() call.
The run_context argument is a SessionRunContext that provides information about the upcoming run() call: the originally requested op/tensors, the TensorFlow Session.
At this point the graph is finalized and you cannot add ops.
- Parameters
run_context – A SessionRunContext object.
- Returns
None or a SessionRunArgs object.
- class easy_rec.python.utils.estimator_utils.NumpyCheckpointRestoreHook(ckpt_path, name2var_map)[source]¶
Bases:
tensorflow.python.training.session_run_hook.SessionRunHook
Restore variable from numpy checkpoint.
- __init__(ckpt_path, name2var_map)[source]¶
Initializes a NumpyCheckpointRestoreHook.
- Parameters
ckpt_path – numpy checkpoint path to restore from
name2var_map – var name in numpy ckpt to variable map
- begin()[source]¶
Called once before using the session.
When called, the default graph is the one that will be launched in the session. The hook can modify the graph by adding new operations to it. After the begin() call the graph will be finalized and the other callbacks cannot modify the graph anymore. A second call of begin() on the same graph should not change the graph.
- after_create_session(session, coord)[source]¶
Called when new TensorFlow session is created.
This is called to signal the hooks that a new session has been created. This has two essential differences with the situation in which begin is called:
- When this is called, the graph is finalized and ops can no longer be added
to the graph.
- This method will also be called as a result of recovering a wrapped
session, not only at the beginning of the overall session.
- Parameters
session – A TensorFlow Session that has been created.
coord – A Coordinator object which keeps track of all threads.
- class easy_rec.python.utils.estimator_utils.IncompatibleShapeRestoreHook(incompatible_shape_var_map)[source]¶
Bases:
tensorflow.python.training.session_run_hook.SessionRunHook
Restore variable with incompatible shapes.
- __init__(incompatible_shape_var_map)[source]¶
Initializes an IncompatibleShapeRestoreHook.
- Parameters
incompatible_shape_var_map – mapping of variables with incompatible shapes, from the real variable to a temp variable. The real variable is the one used in the model; the temp variable is the one restored from the checkpoint.
- begin()[source]¶
Called once before using the session.
When called, the default graph is the one that will be launched in the session. The hook can modify the graph by adding new operations to it. After the begin() call the graph will be finalized and the other callbacks cannot modify the graph anymore. A second call of begin() on the same graph should not change the graph.
- after_create_session(session, coord)[source]¶
Called when new TensorFlow session is created.
This is called to signal the hooks that a new session has been created. This has two essential differences with the situation in which begin is called:
- When this is called, the graph is finalized and ops can no longer be added
to the graph.
- This method will also be called as a result of recovering a wrapped
session, not only at the beginning of the overall session.
- Parameters
session – A TensorFlow Session that has been created.
coord – A Coordinator object which keeps track of all threads.
- class easy_rec.python.utils.estimator_utils.MultipleCheckpointsRestoreHook(ckpt_paths)[source]¶
Bases:
tensorflow.python.training.session_run_hook.SessionRunHook
Restore variables from multiple checkpoints.
- SEP = ';'¶
- __init__(ckpt_paths)[source]¶
Initializes a MultipleCheckpointsRestoreHook.
- Parameters
ckpt_paths – multiple checkpoint paths, separated by ;
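Splitting the ckpt_paths string on the SEP character can be sketched as follows. The helper name split_ckpt_paths and the example paths are hypothetical, and trimming whitespace and dropping empty entries are assumptions rather than documented behaviour.

```python
SEP = ";"


def split_ckpt_paths(ckpt_paths):
    """Split a ';'-separated string of checkpoint paths into a list."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    return [path.strip() for path in ckpt_paths.split(SEP) if path.strip()]


# Hypothetical paths, used only for illustration.
paths = split_ckpt_paths("exp_a/model.ckpt-100; exp_b/model.ckpt-200;")
```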
- begin()[source]¶
Called once before using the session.
When called, the default graph is the one that will be launched in the session. The hook can modify the graph by adding new operations to it. After the begin() call the graph will be finalized and the other callbacks cannot modify the graph anymore. A second call of begin() on the same graph should not change the graph.
- after_create_session(session, coord)[source]¶
Called when new TensorFlow session is created.
This is called to signal the hooks that a new session has been created. This has two essential differences with the situation in which begin is called:
- When this is called, the graph is finalized and ops can no longer be added
to the graph.
- This method will also be called as a result of recovering a wrapped
session, not only at the beginning of the overall session.
- Parameters
session – A TensorFlow Session that has been created.
coord – A Coordinator object which keeps track of all threads.
- class easy_rec.python.utils.estimator_utils.OnlineEvaluationHook(metric_dict, output_dir)[source]¶
Bases:
tensorflow.python.training.session_run_hook.SessionRunHook
- end(session)[source]¶
Called at the end of session.
The session argument can be used in case the hook wants to run final ops, such as saving a last checkpoint.
If session.run() raises an exception other than OutOfRangeError or StopIteration, end() is not called. Note the difference between end() and after_run() when session.run() raises OutOfRangeError or StopIteration: in that case end() is called but after_run() is not.
- Parameters
session – A TensorFlow Session that will soon be closed.
- easy_rec.python.utils.estimator_utils.get_ckpt_version(ckpt_path)[source]¶
Get checkpoint version from ckpt_path.
- Parameters
ckpt_path – checkpoint path, such as xx/model.ckpt-2000 or xx/model.ckpt-2000.meta
- Returns
the checkpoint version, such as 2000
- Return type
ckpt_version
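Extracting the version from paths of the form shown above can be sketched with a regular expression. The helper name parse_ckpt_version is hypothetical, and returning an int and raising ValueError on a non-matching path are assumptions; the documentation only gives 2000 as an example result.

```python
import os
import re


def parse_ckpt_version(ckpt_path):
    """Return the step number embedded in a path like 'xx/model.ckpt-2000'."""
    name = os.path.basename(ckpt_path)
    match = re.search(r"model\.ckpt-(\d+)", name)
    if match is None:
        # Assumption: an unrecognized path is treated as an error.
        raise ValueError(f"no checkpoint version found in {ckpt_path!r}")
    return int(match.group(1))


version = parse_ckpt_version("xx/model.ckpt-2000.meta")
```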
easy_rec.python.utils.hpo_util¶
- easy_rec.python.utils.hpo_util.get_all_eval_result(event_file_pattern)[source]¶
Get the best eval result from event files.
- Parameters
event_file_pattern – Absolute pattern of event files.
- Returns
The best eval result.
easy_rec.python.utils.io_util¶
IO utils.
- easy_rec.python.utils.io_util.http_read(url, timeout=600, max_retry=5)[source]¶
Read data from url with maximum retry.
- Parameters
url – http url to be read
timeout – specifies a timeout in seconds for blocking operations.
max_retry – http max retry times.
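A read with a bounded number of retries can be sketched with the standard library. This is not the EasyRec implementation: the helper name read_with_retry and its opener parameter are hypothetical, and treating max_retry as the total number of attempts is an assumption.

```python
import urllib.request


def read_with_retry(url, timeout=600, max_retry=5, opener=urllib.request.urlopen):
    """Read the body at url, trying up to max_retry times before giving up."""
    if max_retry < 1:
        raise ValueError("max_retry must be at least 1")
    last_error = None
    for _ in range(max_retry):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except OSError as error:
            # URLError and socket timeouts are both subclasses of OSError.
            last_error = error
    # Every attempt failed: surface the most recent error to the caller.
    raise last_error
```

The opener parameter exists only so the loop can be exercised without network access; by default the sketch calls urllib.request.urlopen.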
- easy_rec.python.utils.io_util.download(oss_or_url, dst_dir='')[source]¶
Download file.
- Parameters
oss_or_url – http or oss path
dst_dir – destination directory
- Returns
local path for the downloaded file
- Return type
dst_file
- easy_rec.python.utils.io_util.download_resource(resource_path, dst_dir='easy_rec_user_resources')[source]¶
Download user resource.
- Parameters
resource_path – http or oss path
dst_dir – destination directory
- easy_rec.python.utils.io_util.download_and_uncompress_resource(resource_path, dst_dir='easy_rec_user_resources')[source]¶
Download user resource and uncompress it if necessary.
- Parameters
resource_path – http or oss path
dst_dir – download destination directory
easy_rec.python.utils.load_class¶
Tools for loading classes.
- easy_rec.python.utils.load_class.load_by_path(path)[source]¶
Load functions or modules or classes.
- Parameters
path – path to modules or functions or classes, such as: tf.nn.relu
- Returns
modules or functions or classes
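Resolving a dotted path to a module, function, or class can be sketched with importlib. The helper name resolve is hypothetical and this is not the EasyRec implementation; in particular, it does not handle aliases such as tf in the example path above, so the sketch uses standard-library paths instead.

```python
import importlib


def resolve(path):
    """Resolve a dotted path such as 'os.path.join' to the object it names."""
    parts = path.split(".")
    # Try the longest importable module prefix, then walk the remaining attributes.
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"cannot resolve {path!r}")


sqrt = resolve("math.sqrt")
```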
- easy_rec.python.utils.load_class.check_class(cls, impl_cls, function_names=None)[source]¶
Check implemented class is valid according to template class.
If a function signature is not the same, an exception will be raised.
- Parameters
cls – class which declares functions that need users to implement
impl_cls – user implemented class
function_names – if not None, only these functions and their signatures are checked
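A signature comparison of this kind can be sketched with inspect.signature. The helper name check_signatures, the example classes, and the choice of TypeError are assumptions; the documentation above only says that an exception is raised.

```python
import inspect


def check_signatures(template_cls, impl_cls, function_names=None):
    """Raise TypeError if impl_cls does not match template_cls's function signatures."""
    if function_names is None:
        # Assumption: by default every public function of the template is checked.
        function_names = [
            name
            for name, _ in inspect.getmembers(template_cls, inspect.isfunction)
            if not name.startswith("_")
        ]
    for name in function_names:
        impl = getattr(impl_cls, name, None)
        if impl is None:
            raise TypeError(f"{impl_cls.__name__} does not implement {name}()")
        expected = inspect.signature(getattr(template_cls, name))
        actual = inspect.signature(impl)
        if expected != actual:
            raise TypeError(f"{name}{actual} does not match template {name}{expected}")


class Template:
    def build(self, features, labels):
        raise NotImplementedError


class GoodImpl:
    def build(self, features, labels):
        return len(features)


class BadImpl:
    def build(self, features):
        return len(features)


check_signatures(Template, GoodImpl)  # passes silently
```

Calling check_signatures(Template, BadImpl) raises TypeError because build() is missing the labels parameter.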
- easy_rec.python.utils.load_class.import_pkg(pkg_info, prefix_to_remove=None)[source]¶
Import package.
- Parameters
pkg_info – pkgutil.ModuleInfo object
prefix_to_remove – the package prefix to be removed
- easy_rec.python.utils.load_class.auto_import(user_path=None)[source]¶
Auto import python files so that register_xxx decorator will take effect.
By default, files in the pre-defined directories are imported, and all files under user_path are imported recursively.
- Parameters
user_path – directory or file that stores user-defined Python code; by default only the current directory is searched
easy_rec.python.utils.odps_util¶
Common functions used for odps input.
easy_rec.python.utils.pai_util¶
easy_rec.python.utils.restore_filter¶
Define filters for restore.
- class easy_rec.python.utils.restore_filter.Logical(value)[source]¶
Bases:
enum.Enum
An enumeration.
- AND = 1¶
- OR = 2¶
- class easy_rec.python.utils.restore_filter.KeywordFilter(pattern, exclusive=False)[source]¶
- class easy_rec.python.utils.restore_filter.CombineFilter(filters, logical=Logical.AND)[source]¶
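The filter classes are listed above without a description, so the following sketch is only one plausible reading of their constructor arguments. The class names KeywordFilterSketch and CombineFilterSketch, the keep() method, and the meaning given to exclusive are all assumptions, not documented behaviour.

```python
from enum import Enum


class Logical(Enum):
    AND = 1
    OR = 2


class KeywordFilterSketch:
    """Keep variable names containing pattern (or not containing it if exclusive)."""

    def __init__(self, pattern, exclusive=False):
        self.pattern = pattern
        self.exclusive = exclusive

    def keep(self, var_name):
        found = self.pattern in var_name
        return not found if self.exclusive else found


class CombineFilterSketch:
    """Combine several filters with a logical AND or OR."""

    def __init__(self, filters, logical=Logical.AND):
        self.filters = filters
        self.logical = logical

    def keep(self, var_name):
        results = [f.keep(var_name) for f in self.filters]
        return all(results) if self.logical is Logical.AND else any(results)


# Keep embedding variables but skip optimizer slots (hypothetical variable names).
combined = CombineFilterSketch(
    [KeywordFilterSketch("embedding"), KeywordFilterSketch("Adam", exclusive=True)]
)
```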
easy_rec.python.utils.shape_utils¶
Utils used to manipulate tensor shapes.
- easy_rec.python.utils.shape_utils.merge_shape(t, shape_list)[source]¶
Merge static shape info into tensor.
- Parameters
t – the input tensor, assuming the rank is at least 1.
shape_list – a list of dimension sizes, the same length as t.get_shape()
- Returns
the tensor t with shape updated
- easy_rec.python.utils.shape_utils.pad_tensor(t, length)[source]¶
Pads the input tensor with 0s along the first dimension up to the length.
- Parameters
t – the input tensor, assuming the rank is at least 1.
length – a tensor of shape [1] or an integer, indicating the first dimension of the input tensor t after padding, assuming length >= t.shape[0].
- Returns
the padded tensor, whose first dimension is length. If the length is an integer, the first dimension of padded_t is set to length statically.
- Return type
padded_t
- easy_rec.python.utils.shape_utils.clip_tensor(t, length)[source]¶
Clips the input tensor along the first dimension up to the length.
- Parameters
t – the input tensor, assuming the rank is at least 1.
length – a tensor of shape [1] or an integer, indicating the first dimension of the input tensor t after clipping, assuming length <= t.shape[0].
- Returns
the clipped tensor, whose first dimension is length. If the length is an integer, the first dimension of clipped_t is set to length statically.
- Return type
clipped_t
- easy_rec.python.utils.shape_utils.pad_or_clip_tensor(t, length)[source]¶
Pad or clip the input tensor along the first dimension.
- Parameters
t – the input tensor, assuming the rank is at least 1.
length – a tensor of shape [1] or an integer, indicating the first dimension of the input tensor t after processing.
- Returns
the processed tensor, whose first dimension is length. If the length is an integer, the first dimension of the processed tensor is set to length statically.
- Return type
processed_t
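The pad-or-clip behaviour along the first dimension can be illustrated on a plain Python list standing in for a rank-1 tensor. The helper name pad_or_clip is hypothetical and the sketch does not use TensorFlow.

```python
def pad_or_clip(values, length, pad_value=0):
    """Return values padded with pad_value, or clipped, to exactly length items."""
    if len(values) >= length:
        # Clip: keep only the first `length` entries.
        return values[:length]
    # Pad: append zeros until the first dimension equals `length`.
    return values + [pad_value] * (length - len(values))


padded = pad_or_clip([1, 2, 3], 5)
clipped = pad_or_clip([1, 2, 3], 2)
```

In both cases the result has exactly length entries, which is what lets downstream code rely on a fixed first dimension.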
- easy_rec.python.utils.shape_utils.pad_nd(tensor, output_shape)[source]¶
Pad given tensor to the output shape.
- Parameters
tensor – Input tensor to pad or clip.
output_shape – A list of integers / scalar tensors (or None for dynamic dim) representing the size to pad or clip each dimension of the input tensor.
- Returns
Input tensor padded and clipped to the output shape.
- easy_rec.python.utils.shape_utils.pad_or_clip_nd(tensor, output_shape)[source]¶
Pad or Clip given tensor to the output shape.
- Parameters
tensor – Input tensor to pad or clip.
output_shape – A list of integers / scalar tensors (or None for dynamic dim) representing the size to pad or clip each dimension of the input tensor.
- Returns
Input tensor padded and clipped to the output shape.
- easy_rec.python.utils.shape_utils.combined_static_and_dynamic_shape(tensor)[source]¶
Returns a list containing static and dynamic values for the dimensions.
Returns a list of static and dynamic values for shape dimensions. This is useful to preserve static shapes when available in reshape operation.
- Parameters
tensor – A tensor of any type.
- Returns
A list of size tensor.shape.ndims containing integers or a scalar tensor.
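The rule of preferring static dimensions and falling back to dynamic ones can be sketched without TensorFlow. In this stand-in, a static shape is a list whose unknown dimensions are None, and the dynamic shape is a list of placeholder objects; the helper name combine_shapes is hypothetical.

```python
def combine_shapes(static_shape, dynamic_shape):
    """Use the static size where it is known, otherwise the dynamic value."""
    return [
        static if static is not None else dynamic
        for static, dynamic in zip(static_shape, dynamic_shape)
    ]


# Strings stand in for the scalar tensors a real dynamic shape would contain.
combined_shape = combine_shapes([None, 128, 3], ["dyn0", "dyn1", "dyn2"])
```

Only the unknown batch dimension ends up dynamic, so a later reshape can still see 128 and 3 as plain integers.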
- easy_rec.python.utils.shape_utils.check_min_image_dim(min_dim, image_tensor)[source]¶
Checks that the image width/height are greater than some number.
This function is used to check that the width and height of an image are above a certain value. If the image shape is static, this function will perform the check at graph construction time. Otherwise, if the image shape varies, an Assertion control dependency will be added to the graph.
- Parameters
min_dim – The minimum number of pixels along the width and height of the image.
image_tensor – The image tensor to check size for.
- Returns
If image_tensor has dynamic size, returns image_tensor with an Assert control dependency. Otherwise returns image_tensor.
- Raises
ValueError – if image_tensor's width or height is smaller than min_dim.
- easy_rec.python.utils.shape_utils.assert_shape_equal(shape_a, shape_b)[source]¶
Asserts that shape_a and shape_b are equal.
If the shapes are static, raises a ValueError when the shapes mismatch.
If the shapes are dynamic, raises a tf InvalidArgumentError when the shapes mismatch.
- Parameters
shape_a – a list containing shape of the first tensor.
shape_b – a list containing shape of the second tensor.
- Returns
Either a tf.no_op() when the shapes are all static, or a tf.assert_equal() op when the shapes are dynamic.
- Raises
ValueError – When shapes are both static and unequal.
- easy_rec.python.utils.shape_utils.assert_shape_equal_along_first_dimension(shape_a, shape_b)[source]¶
Asserts that shape_a and shape_b are the same along the 0th-dimension.
If the shapes are static, raises a ValueError when the shapes mismatch.
If the shapes are dynamic, raises a tf InvalidArgumentError when the shapes mismatch.
- Parameters
shape_a – a list containing shape of the first tensor.
shape_b – a list containing shape of the second tensor.
- Returns
Either a tf.no_op() when the shapes are all static, or a tf.assert_equal() op when the shapes are dynamic.
- Raises
ValueError – When shapes are both static and unequal.
- easy_rec.python.utils.shape_utils.assert_box_normalized(boxes, maximum_normalized_coordinate=1.1)[source]¶
Asserts the input box tensor is normalized.
- Parameters
boxes – a tensor of shape [N, 4] where N is the number of boxes.
maximum_normalized_coordinate – Maximum coordinate value to be considered as normalized, default to 1.1.
- Returns
a tf.Assert op which fails when the input box tensor is not normalized.
- Raises
ValueError – When the input box tensor is not normalized.
- easy_rec.python.utils.shape_utils.get_shape_list(tensor, expected_rank=None, name=None)[source]¶
Returns a list of the shape of tensor, preferring static dimensions.
- Parameters
tensor – A tf.Tensor object to find the shape of.
expected_rank – (optional) int. The expected rank of tensor. If this is specified and the tensor has a different rank, an exception will be thrown.
name – Optional name of the tensor for the error message.
- Returns
A list of dimensions of the shape of tensor. All static dimensions will be returned as python integers, and dynamic dimensions will be returned as tf.Tensor scalars.
- easy_rec.python.utils.shape_utils.assert_rank(tensor, expected_rank, name=None)[source]¶
Raises an exception if the tensor rank is not of the expected rank.
- Parameters
tensor – A tf.Tensor to check the rank of.
expected_rank – Python integer or list of integers, expected rank.
name – Optional name of the tensor for the error message.
- Raises
ValueError – If the expected shape doesn’t match the actual shape.
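The rank check, including the option of passing either one integer or a list of acceptable ranks, can be sketched on a plain shape list. The helper name check_rank is hypothetical, and it takes a shape list rather than a tf.Tensor so that the example runs without TensorFlow.

```python
def check_rank(shape, expected_rank, name=None):
    """Raise ValueError if len(shape) is not one of the expected ranks."""
    if isinstance(expected_rank, int):
        allowed = {expected_rank}
    else:
        allowed = set(expected_rank)
    actual = len(shape)
    if actual not in allowed:
        label = name or "tensor"
        raise ValueError(
            f"{label} has rank {actual}, expected one of {sorted(allowed)}"
        )


check_rank([32, 128], expected_rank=2)
check_rank([32, 10, 128], expected_rank=[2, 3], name="sequence_features")
```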
easy_rec.python.utils.static_shape¶
Helper functions to access TensorShape values.
The rank 4 tensor_shape must be of the form [batch_size, height, width, depth].
- easy_rec.python.utils.static_shape.get_batch_size(tensor_shape)[source]¶
Returns batch size from the tensor shape.
- Parameters
tensor_shape – A rank 4 TensorShape.
- Returns
An integer representing the batch size of the tensor.
- easy_rec.python.utils.static_shape.get_height(tensor_shape)[source]¶
Returns height from the tensor shape.
- Parameters
tensor_shape – A rank 4 TensorShape.
- Returns
An integer representing the height of the tensor.
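Given the [batch_size, height, width, depth] layout stated at the top of this module, the accessors amount to indexing a rank 4 shape. The sketch below uses a plain list in place of a TensorShape; the helper names batch_size_of and height_of, and the explicit rank check, are assumptions.

```python
def _require_rank4(shape):
    """Reject shapes that are not of the form [batch_size, height, width, depth]."""
    if len(shape) != 4:
        raise ValueError(f"expected a rank 4 shape, got rank {len(shape)}")


def batch_size_of(shape):
    _require_rank4(shape)
    return shape[0]


def height_of(shape):
    _require_rank4(shape)
    return shape[1]


image_batch_shape = [8, 224, 320, 3]
batch = batch_size_of(image_batch_shape)
height = height_of(image_batch_shape)
```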
easy_rec.python.utils.test_utils¶
Contains functions which are convenient for unit testing.
- easy_rec.python.utils.test_utils.get_hdfs_tmp_dir(test_dir)[source]¶
Create a randomly named directory in HDFS.
- easy_rec.python.utils.test_utils.RunAsSubprocess(f)[source]¶
Function decorator to run a function in a subprocess.
Use it when a function starts a TF session, because TensorFlow GPU memory is not released until the process exits.
- easy_rec.python.utils.test_utils.test_datahub_train_eval(pipeline_config_path, test_dir, process_pipeline_func=None, hyperparam_str='', total_steps=50, post_check_func=None)[source]¶
- easy_rec.python.utils.test_utils.test_single_train_eval(pipeline_config_path, test_dir, process_pipeline_func=None, hyperparam_str='', total_steps=50, post_check_func=None)[source]¶
- easy_rec.python.utils.test_utils.yaml_replace(train_yaml_path, pipline_config_path, test_pipeline_config_path, test_export_dir=None)[source]¶
- easy_rec.python.utils.test_utils.test_hdfs_train_eval(pipeline_config_path, train_yaml_path, test_dir, process_pipeline_func=None, hyperparam_str='', total_steps=2000)[source]¶
- easy_rec.python.utils.test_utils.test_hdfs_eval(pipeline_config_path, eval_yaml_path, test_dir, process_pipeline_func=None, hyperparam_str='')[source]¶