使用 Keras 数据生成器进行流式训练时出现 Tensor 尺寸不匹配错误-Python教程-PHP中文网

使用 Keras 数据生成器进行流式训练时出现 Tensor 尺寸不匹配错误

霞舞

发布： 2025-07-12 17:22:25

原创

491人浏览过

使用 keras 数据生成器进行流式训练时出现 tensor 尺寸不匹配错误

本文旨在解决在使用 Keras 数据生成器进行流式训练时，由于图像尺寸不当导致 Tensor 尺寸不匹配的问题。通过分析错误信息和模型结构，找出图像尺寸与模型层数之间的关系，并提供修改图像尺寸的解决方案，确保模型训练的顺利进行。

在使用 Keras 进行深度学习模型训练时，特别是处理大规模数据集时，通常会采用数据生成器来流式加载数据，以避免内存溢出。然而，在使用生成器进行训练的过程中，有时会遇到 Tensor 尺寸不匹配的错误，例如 InvalidArgumentError: All dimensions except 3 must match. Input 1 has shape [5 25 25 32] and doesn't match input 0 with shape [5 24 24 64]。这种错误通常不是生成器本身的问题，而是由于图像尺寸与模型结构之间的不匹配导致的。

问题分析

该错误信息表明，在模型的某个连接层（例如 concatenate 层）中，需要连接的两个 Tensor 的尺寸不一致。具体来说，除了第三个维度（通常是通道数）之外，其他维度必须完全匹配。

通常，这种问题出现在使用了下采样（例如 MaxPooling2D）和上采样（例如 Conv2DTranspose）的网络结构中，例如 U-Net。如果输入图像的尺寸不是 2 的若干次幂，或者不是模型中下采样倍数的整数倍，那么在经过多次下采样和上采样后，Tensor 的尺寸可能会出现非整数的缩放，从而导致连接时尺寸不匹配。

解决方案

解决此类问题的关键在于确保输入图像的尺寸与模型的下采样倍数相匹配。以下是一些可能的解决方案：

调整图像尺寸： 这是最直接的解决方案。将输入图像的尺寸调整为模型下采样倍数的整数倍。例如，如果模型中最大的下采样倍数为 16，那么可以将图像尺寸调整为 16 的倍数，例如 256x256、512x512 等。

import tensorflow as tf

# 调整图像尺寸的函数
def resize_image(image, target_size):
  """
  将图像调整到指定尺寸。

  Args:
    image: 输入图像 Tensor。
    target_size: 目标尺寸，例如 (256, 256)。

  Returns:
    调整后的图像 Tensor。
  """
  resized_image = tf.image.resize(image, target_size)
  return resized_image

# 在数据生成器中使用调整图像尺寸的函数
def __data_generation(self, subset_pair_id_list):
    'subdivides each image into an array of multiple images'
    # Initialization
    normalized_input_frames, normalized_gt_frames = get_normalized_input_and_gt_dataframes(
        channel = self.channel, 
        pairs_for_training = self.pairs,
        pair_ids=subset_pair_id_list,
        input_normalizing_function_name = self.input_normalizing_function_name,
        prediction_size=self.prediction_size
    )

    # 调整图像尺寸
    target_size = (256, 256)  # 示例尺寸
    normalized_input_frames = resize_image(normalized_input_frames, target_size)
    normalized_gt_frames = resize_image(normalized_gt_frames, target_size)

    print("\t\t\t~~~In data generation: input shape: {}, gt shape: {}".format(normalized_input_frames.shape, normalized_gt_frames.shape))

    return normalized_input_frames, gt_frames

登录后复制

修改模型结构： 如果无法修改图像尺寸，可以考虑修改模型结构，例如调整下采样和上采样的层数，或者使用 padding 方式来处理尺寸不匹配的问题。但这通常需要对模型结构有深入的理解，并且可能会影响模型的性能。

使用 tf.image.pad_to_bounding_box 进行填充： 如果调整尺寸会损失重要信息，可以考虑使用填充的方式。tf.image.pad_to_bounding_box 可以将图像填充到指定大小，并在之后裁剪到原始大小。

import tensorflow as tf

def pad_and_crop(image, target_height, target_width):
    """Pad the image to ensure dimensions are multiples of 16, then crop back."""
    height = tf.shape(image)[0]
    width = tf.shape(image)[1]

    # Calculate padding needed
    height_padding = tf.maximum(0, (target_height - height))
    width_padding = tf.maximum(0, (target_width - width))

    # Calculate padding before and after
    height_pad_before = height_padding // 2
    height_pad_after = height_padding - height_pad_before
    width_pad_before = width_padding // 2
    width_pad_after = width_padding - width_pad_before

    # Pad the image
    padded_image = tf.pad(image, [[height_pad_before, height_pad_after], [width_pad_before, width_pad_after], [0, 0]], mode='REFLECT')

    # Crop back to original size
    cropped_image = padded_image[height_pad_before:height_pad_before + height, width_pad_before:width_pad_before + width]

    return cropped_image

# 在数据生成器中使用填充和裁剪函数
def __data_generation(self, subset_pair_id_list):
    'subdivides each image into an array of multiple images'
    # Initialization
    normalized_input_frames, normalized_gt_frames = get_normalized_input_and_gt_dataframes(
        channel = self.channel, 
        pairs_for_training = self.pairs,
        pair_ids=subset_pair_id_list,
        input_normalizing_function_name = self.input_normalizing_function_name,
        prediction_size=self.prediction_size
    )

    # 填充和裁剪图像
    target_height = 256  # 示例高度，需是16的倍数
    target_width = 256   # 示例宽度，需是16的倍数
    normalized_input_frames = pad_and_crop(normalized_input_frames, target_height, target_width)
    normalized_gt_frames = pad_and_crop(normalized_gt_frames, target_height, target_width)

    print("\t\t\t~~~In data generation: input shape: {}, gt shape: {}".format(normalized_input_frames.shape, normalized_gt_frames.shape))

    return normalized_input_frames, gt_frames

登录后复制

注意事项