INADE个人讲解和理解-人工智能-PHP中文网

INADE基于SPADE改进，是一种条件归一化方法，旨在提升语义图像生成多样性。它结合语义分割（提供普遍性）与实例分割（提供特殊性），通过统一噪声采样协调各归一化层，避免不一致。编码器生成相关噪声辅助训练，解决了现有方法中同语义实例风格趋同的问题，实现语义级和实例级多样性。

☞☞☞AI 智能聊天, 问答助手, AI 智能搜索, 免费无限量使用 DeepSeek R1 模型☜☜☜

inade个人讲解和理解 - php中文网

INADE讲解（主要偏向于个人理解及其看法）

1. 论文题目：Diverse Semantic Image Synthesis via Probability Distribution Modeling

2. 原项目地址：https://github.com/tzt101/INADE

首先这篇论文也是主要基于一个SPADE的论文，这里还是强推FutureSI项目的spade讲解，但是呢这个SPADE被它改成了INADE，同样是一个conditional normal，之所以这么改，原作者目的是提高一个图像生成的多样性。后面会具体讲解

这里描述一下它对于语义生成任务的看法：

条件归一化，无论是空间自适应[30]还是类自适应[37]，都已被证明有助于语义图像的合成。语义条件调制能够在很大程度上防止重复规范化引起的语义信息的“冲刷”效应。

这里就是说SPADE很好用

然而，考虑到标准化仅以语义图为条件，并且仅使用全局随机性（global randomness）来多样化图像样式[30]，因此要实现具有语义级甚至实例级多样性的有希望的生成结果仍然存在挑战。

这里认为对于模型只输入一个语义分割信息，明显不够，靠只输入这玩意要求模型有太好的效果多样性，几乎不可能

语义级的多样性是由[51]通过Group conv 来实现的，但是使用这种卷积通过实例映射切断了将其扩展到实例级多样性的可能性。

这里说明GroupNet是不行的，这里这个改进方案GroupDNet我在我的项目论文解读一篇关于语义生成论文（要求控制单独语义生成）提到，这是另一篇论文啊。

最近在实例感知合成[41,6]方面的研究主要集中在更好的对象边界上，而不是每个单独实例的多样性和真实性。由于缺乏适当的实例条件作用，现有的方法倾向于将具有相同语义标签的实例收敛成相似的风格，这严重损害了生成的多样性。

与第二点有点呼应的味道

这个时候它讲了一下这个INADE添加实例分割的想法来源：

实例级多样性的关键是确定地决定特定语义标签的一般特征的统一语义级分布和引入语义分布模型所允许覆盖的多样性的实例级随机性的适当组合。

这里我认为就是普遍性与特殊性的合理结合，语义分割提供普遍性，实例分割提供特殊性。就比如两只比翼双飞的鸟，都是鸟（普遍性），但是2个不同个体（特殊性）。如果没有实例分割输入，那就退化输入语义分割。

考虑到生成网络包含多个条件归一化层，一个统一的采样解决方案仍然是协调所有这些层的关键。一种直接的方法，即对每个归一化层进行独立的随机抽样，可能会引入不一致性，并导致多样性被严重中和。因此，在本文中，我们提出了一种实例自适应调制采样方法，该方法可以在channel不相等的多个归一化层上实现一致的实例采样。

说白了就是在INADE输入一个贯穿deocder的noise，强调了一下这个noise这个融合的idea很牛逼，需要吹一下

Cursor
一个新的IDE，使用AI来帮助您重构、理解、调试和编写代码。

847

查看详情

生成器架构

INADE个人讲解和理解 - php中文网

首先这个INADE它为了提高这个多样性主要想法是什么呢，它比起SPADE多了一个实例分割的输入，见这张图。 INADE个人讲解和理解 - php中文网

m就是语义分割输入，p是实例分割输入，这应该是一张照片上的两个鸟，语义分割把它们用同一标识，但是实例分割进行了区分。

INADE数学公式表示

INADE个人讲解和理解 - php中文网

看不懂没关系，直接看代码就行,毕竟论文会讲故事，代码不会讲故事

这是INADE，实际pytorch代码：

class ILADE(nn.Module):
    def __init__(self, config_text, norm_nc, label_nc, noise_nc):
        super().__init__()
        self.norm_nc = norm_nc        assert config_text.startswith('spade')
        parsed = re.search('spade(\D+)(\d)x\d', config_text)
        param_free_norm_type = str(parsed.group(1))        if param_free_norm_type == 'instance':
            self.param_free_norm = nn.InstanceNorm2d(norm_nc, affine=False)        elif param_free_norm_type == 'syncbatch':
            self.param_free_norm = SynchronizedBatchNorm2d(norm_nc, affine=False)        elif param_free_norm_type == 'batch':
            self.param_free_norm = nn.BatchNorm2d(norm_nc, affine=False)        else:            raise ValueError('%s is not a recognized param-free norm type in SPADE'
                             % param_free_norm_type)        # wights and bias for each class
        self.weight = nn.Parameter(torch.Tensor(label_nc, norm_nc,2))
        self.bias = nn.Parameter(torch.Tensor(label_nc, norm_nc,2))
        self.reset_parameters()
        self.fc_noise = nn.Linear(noise_nc, norm_nc)    def reset_parameters(self):
        nn.init.uniform_(self.weight)
        nn.init.zeros_(self.bias)    def forward(self, x, segmap, input_instances=None, noise=None):
        # Part 1. generate parameter-free normalized activations
        # noise is [B, inst_nc, 2, noise_nc], 2 is for scale and bias
        normalized = self.param_free_norm(x)        # Part 2. scale the segmentation mask and instance mask
        segmap = F.interpolate(segmap, size=x.size()[2:], mode='nearest')
        input_instances = F.interpolate(input_instances, size=x.size()[2:], mode='nearest')        # the segmap is concate with instance map
        inst_map = torch.unsqueeze(segmap[:,-1,:,:],1)
        segmap = segmap[:,:-1,:,:]        # Part 3. class affine with noise
        noise_size = noise.size() # [B,inst_nc,2,noise_nc]
        noise_reshape = noise.view(-1, noise_size[-1]) # reshape to [B*inst_nc*2,noise_nc]
        noise_fc = self.fc_noise(noise_reshape) # [B*inst_nc*2, norm_nc]
        noise_fc = noise_fc.view(noise_size[0],noise_size[1],noise_size[2],-1)        # create weigthed instance noise for scale
        class_weight = torch.einsum('ic,nihw->nchw', self.weight[...,0], segmap)
        class_bias = torch.einsum('ic,nihw->nchw', self.bias[...,0], segmap)        # init_noise = torch.randn([x.size()[0], input_instances.size()[1], self.norm_nc], device=x.get_device())
        instance_noise = torch.einsum('nic,nihw->nchw', noise_fc[:,:,0,:], input_instances)
        scale_instance_noise = class_weight*instance_noise+class_bias        # create weighted instance noise for bias
        class_weight = torch.einsum('ic,nihw->nchw', self.weight[..., 1], segmap)
        class_bias = torch.einsum('ic,nihw->nchw', self.bias[..., 1], segmap)        # init_noise = torch.randn([x.size()[0], input_instances.size()[1], self.norm_nc], device=x.get_device())
        instance_noise = torch.einsum('nic,nihw->nchw', noise_fc[:,:,1,:], input_instances)
        bias_instance_noise = class_weight * instance_noise + class_bias

        out = scale_instance_noise * normalized + bias_instance_noise        return out

登录后复制

下面是我写的paddle版本

In [1]

import paddleimport paddle.nn as nnimport paddle.nn.functional as F'''
在这里有一个einsum相信大家也不用一般，至少我不用啊，哈哈
在这里的用法，我举例一下
instance_noise = paddle.einsum('nic,nihw->nchw', noise_fc[:,:,0,:], input_instances)#[B,instance_nc,norm_nc] [B,instance_nc,h,w] ->[B,norm_nc,h,w]
 noise_fc[:,:,0,:]，input_instances这两个tensor.shape分别为[B,instance_nc,norm_nc] [B,instance_nc,h,w] 
 经过了上述的这里einsum操作，就得到shape为[B,norm_nc,h,w]的tensor，那这里很明显就是相当于在nn.linear放在第1维进行的那种感觉无bias，矩阵乘法
'''class INADE(nn.Layer):
    def __init__(self, norm_nc = 64, label_nc = 46, noise_nc = 108):
        super().__init__()

        self.param_free_norm = nn.InstanceNorm2D(norm_nc,weight_attr=False, bias_attr=False)        # wights and bias for each class
        weight = self.create_parameter([label_nc,norm_nc,2], default_initializer = paddle.nn.initializer.Uniform())#随机均匀分布初始化函数
        self.add_parameter("weight", weight)

        bias = self.create_parameter([label_nc,norm_nc,2],default_initializer = paddle.nn.initializer.Constant())
        self.add_parameter("bias", bias)
        self.fc_noise = nn.Linear(noise_nc, norm_nc)    def forward(self, x, segmap, input_instances=None, noise=None):
        # Part 1. generate parameter-free normalized activations
        # noise is [B, inst_nc, 2, noise_nc], 2 is for scale and bias
        normalized = self.param_free_norm(x)        # Part 2. scale the segmentation mask and instance mask
        segmap = F.interpolate(segmap, size=x.shape[2:], mode='nearest')
        input_instances = F.interpolate(input_instances, size=x.shape[2:], mode='nearest')        # the segmap is concate with instance map
        inst_map = paddle.unsqueeze(segmap[:,-1,:,:],1)# 后面就不用了
        segmap = segmap[:,:-1,:,:]        # Part 3. class affine with noise
        noise_size = noise.shape # [B,inst_nc,2,noise_nc]
        noise_reshape = noise.reshape([-1, noise_size[-1]]) # reshape to [B*inst_nc*2,noise_nc]
        noise_fc = self.fc_noise(noise_reshape) # [B*inst_nc*2, norm_nc]
        noise_fc = noise_fc.reshape([noise_size[0],noise_size[1],noise_size[2],-1])#[B,instance_nc,2,norm_nc]
        print("noise_fc",noise_fc.shape)        # create weigthed instance noise for scale
        class_weight = paddle.einsum('ic,nihw->nchw', self.weight[...,0], segmap)#[label_nc, norm_nc] [b,label_nc,h,w] ->#[B,norm_nc,h,w]
        print("class_weight",class_weight.shape)
        class_bias = paddle.einsum('ic,nihw->nchw', self.bias[...,0], segmap)#[label_nc, norm_nc] [b,label_nc,h,w] ->#[B,norm_nc,h,w]
        # init_noise = torch.randn([x.size()[0], input_instances.size()[1], self.norm_nc], device=x.get_device())
        instance_noise = paddle.einsum('nic,nihw->nchw', noise_fc[:,:,0,:], input_instances)#[B,instance_nc,norm_nc] [B,instance_nc,h,w] ->[B,norm_nc,h,w]
        scale_instance_noise = class_weight*instance_noise+class_bias        # create weighted instance noise for bias
        class_weight = paddle.einsum('ic,nihw->nchw', self.weight[..., 1], segmap)
        class_bias = paddle.einsum('ic,nihw->nchw', self.bias[..., 1], segmap)        # init_noise = torch.randn([x.size()[0], input_instances.size()[1], self.norm_nc], device=x.get_device())
        instance_noise = paddle.einsum('nic,nihw->nchw', noise_fc[:,:,1,:], input_instances)#[B,instance_nc,norm_nc]  #[B,instance_nc,h,w]
        bias_instance_noise = class_weight * instance_noise + class_bias

        out = scale_instance_noise * normalized + bias_instance_noise        return out
x = paddle.randn([3,64,50,50])
segmap = paddle.randn([3,47,66,66])
inst = paddle.randn([3,72,50,50])
noise = paddle.randn([3,72,2,108])
INADE()(x,segmap,inst,noise).shape

登录后复制

W0222 11:03:38.883304   183 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0222 11:03:38.890153   183 device_context.cc:465] device: 0, cuDNN Version: 7.6.

登录后复制

noise_fc [3, 72, 2, 64]
class_weight [3, 64, 50, 50]

登录后复制

[3, 64, 50, 50]

登录后复制

另外这个noise上面也说了，整个generator都是使用同一个noise.

这里还有一个细节就是，这个noise要和这个输入的图片信息有关系，相当于这个z是包含信息的。这样方便训练，嗯。这里代码实现很复杂，实在有兴趣自己看原项目，因为这里我不能用到我的项目上我就没有太关心。加油.

开个玩笑啊，其实这里这个noise的设计具体落实到代码里面其实是很重要的，因为其中第一点是可以通过noise来进行模型生成多样性的增加和控制，然后另外就是这个noise开始训练的时候必须要和原图信息有关联，这样好训练，不然如果这个Noise直接初始化就很难训练。

接下来我就直接从代码的角度去分析实际pytorch代码中的主体实践。

一些用到的API介绍

nn.Unfold

INADE个人讲解和理解 - php中文网

import paddleimport paddle.nn as nn

x = paddle.randn((100,3,224,224)) 
unfold = nn.Unfold(kernel_sizes=[3, 3])
result = unfold(x) #result.shape = [100,3*3*3,(224-3+1)*(224-3+1)]print(result.shape)

登录后复制

paddle.clip

INADE个人讲解和理解 - php中文网

In [1]

'''
该代码块是重新构造一个encoder里面用到的卷积封装，这个encoder就是为了训练时候构造这个noise用的
'''import paddleimport paddle.nn.functional as Fimport paddle.nn as nnclass InstanceAwareConv2d(nn.Layer):
    def __init__(self, fin = 64, fout = 128, kw = 3, stride=1, padding=1):
        super().__init__()
        self.kw = kw
        self.stride = stride
        self.padding = padding
        self.fin = fin
        self.fout = fout
        self.unfold = nn.Unfold(kw, strides = stride, paddings = padding)

        weight = self.create_parameter([fout, fin, kw, kw], default_initializer = paddle.nn.initializer.Uniform())#随机均匀分布初始化函数
        self.add_parameter("weight", weight)

        bias = self.create_parameter([fout],default_initializer = paddle.nn.initializer.Constant())
        self.add_parameter("bias", bias)    def forward(self, x, instances, check=False):
        N,C,H,W = x.shape        # cal the binary mask from instance map
        instances = F.interpolate(instances, x.shape[2:], mode='nearest') # [n,1,h,w]
        inst_unf = self.unfold(instances)        # print("inst_unf",inst_unf.shape)
        # substract the center pixel
        center = paddle.unsqueeze(inst_unf[:, self.kw * self.kw // 2, :], axis=1)#因为instance的channel为1,所以这个channel的center 为 self.kw * self.kw // 2
        # print("center",center.shape)
        mask_unf = inst_unf - center        # clip the absolute value to 0~1
        mask_unf = paddle.abs(mask_unf)
        mask_unf = paddle.clip(mask_unf, 0, 1)
        mask_unf = 1.0 - mask_unf # [n,k*k,L]
        # print("mask_unf",mask_unf.shape)#mask_unf [4, 9, 65536]
        # # multiply mask_unf and x
        x_unf = self.unfold(x)  # [n,c*k*k,L]
        # print("x_unf",x_unf.shape) #x_unf [4, 64*9, 65536]
        x_unf = x_unf.reshape([N, C, -1, x_unf.shape[-1]]) # [n,c,k*k,L]
        # print("x_unf",x_unf.shape) #[4, 64, 9, 65536]
        mask = paddle.unsqueeze(mask_unf,1) # [n,1,k*k,L]
        mask_x = mask * x_unf # [n,c,k*k,L]
        mask_x = mask_x.reshape([N,-1,mask_x.shape[-1]]) # [n,c*k*k,L]
        # # conv operation
        weight = self.weight.reshape([self.fout,-1]) # [fout, c*k*k]
        out = paddle.einsum('cm,nml->ncl', weight, mask_x)        # print("out",out.shape)#[4, 128, 65536]
        # # x_unf = torch.unsqueeze(x_unf, 1)  # [n,1,c*k*k,L]
        # # out = torch.mul(masked_weight, x_unf).sum(dim=2, keepdim=False) # [n,fout,L]
        bias = paddle.unsqueeze(paddle.unsqueeze(self.bias,0),-1) # [1,fout,1]
        out = out + bias        # print("out",out.shape)#[4, 128, 65536]
        out = out.reshape([N,self.fout,H//self.stride,W//self.stride])        # # print('weight:',self.weight[0,0,...])
        # # print('bias:',self.bias)

        if check:
            out2 = nn.functional.conv2d(x, self.weight, self.bias, stride=self.stride, padding=self.padding)            print((out-out2).abs().max())        return out

x = paddle.randn([4,64,256,256])
y = paddle.randn([4,1,256,256])
InstanceAwareConv2d()(x,y).shape

登录后复制

W0222 16:13:26.634137   145 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0222 16:13:26.638576   145 device_context.cc:465] device: 0, cuDNN Version: 7.6.

登录后复制

[4, 128, 256, 256]

登录后复制

In [ ]

# Encoder构造import paddle 
import paddle.nn as nnimport numpy as npclass Encoder_OPT:
    def __init__(self):
        super().__init__()
        self.ngf = 64
        self.semantic_nc = 46
        self.no_instance = True
        self.noise_nc = 108opt = Encoder_OPT()class instanceAdaptiveEncoder(nn.Layer):
    def __init__(self, opt):
        super().__init__()
        self.opt = opt
        kw = 3
        pw = int(np.ceil((kw - 1.0) / 2))
        ndf = opt.ngf
        conv_layer = InstanceAwareConv2d

        self.layer1 = conv_layer(3, ndf, kw, stride=2, padding=pw)
        self.norm1 = nn.InstanceNorm2D(ndf)
        self.layer2 = conv_layer(ndf * 1, ndf * 2, kw, stride=2, padding=pw)
        self.norm2 = nn.InstanceNorm2D(ndf * 2)
        self.layer3 = conv_layer(ndf * 2, ndf * 4, kw, stride=2, padding=pw)
        self.norm3 = nn.InstanceNorm2D(ndf * 4)
        self.layer4 = conv_layer(ndf * 4, ndf * 8, kw, stride=2, padding=pw)
        self.norm4 = nn.InstanceNorm2D(ndf * 8)
        
        self.middle = conv_layer(ndf * 8, ndf * 4, kw, stride=1, padding=pw)
        self.norm_middle = nn.InstanceNorm2D(ndf * 4)
        self.up1 = conv_layer(ndf * 8, ndf * 2, kw, stride=1, padding=pw)
        self.norm_up1 = nn.InstanceNorm2D(ndf * 2)
        self.up2 = conv_layer(ndf * 4, ndf * 1, kw, stride=1, padding=pw)
        self.norm_up2 = nn.InstanceNorm2D(ndf)
        self.up3 = conv_layer(ndf * 2, ndf, kw, stride=1, padding=pw)
        self.norm_up3 = nn.InstanceNorm2D(ndf)

        self.up = nn.Upsample(scale_factor=2, mode='bilinear')
        self.class_nc = opt.semantic_nc if opt.no_instance else opt.semantic_nc-1

        self.scale_conv_mu = conv_layer(ndf, opt.noise_nc, kw, stride=1, padding=pw)
        self.scale_conv_var = conv_layer(ndf, opt.noise_nc, kw, stride=1, padding=pw)
        self.bias_conv_mu = conv_layer(ndf, opt.noise_nc, kw, stride=1, padding=pw)
        self.bias_conv_var = conv_layer(ndf, opt.noise_nc, kw, stride=1, padding=pw)

        self.actvn = nn.LeakyReLU(0.2, False)
        self.opt = opt    def instAvgPooling(self, x, instances):
        inst_num = instances.shape[1]        for i in range(inst_num):
            inst_mask = paddle.unsqueeze(instances[:,i,:,:], 1) # [n,1,h,w]
            pixel_num = paddle.sum(paddle.sum(inst_mask, axis=2, keepdim = True), axis=3, keepdim=True)
            pixel_num[pixel_num == 0] = 1 #防止某一个instance的label 某一行或某一列没有，防止后续步骤中作为除数报错
            feat = x * inst_mask#只需要该label的x信息，inst_mask为0或1.
            feat = paddle.sum(paddle.sum(feat, axis =2, keepdim=True), axis=3, keepdim=True) / pixel_num            if i == 0:
                out = paddle.unsqueeze(feat[:,:,0,0],1) # [n,1,c]
            else:
                out = paddle.concat([out,paddle.unsqueeze(feat[:,:,0,0],1)],1)            # inst_pool_feats.append(feat[:,:,0,0]) # [n, 64]
        return out #shape = [n,inst_num,c]

    def forward(self, x, input_instances):
        # instances [n,1,h,w], input_instances [n,inst_nc,h,w] 注意一下这个shape
        instances = paddle.argmax(input_instances, 1, keepdim=True).astype("float32")        print("instance",instances.shape)

        x1 = self.actvn(self.norm1(self.layer1(x,instances)))
        x2 = self.actvn(self.norm2(self.layer2(x1,instances)))
        x3 = self.actvn(self.norm3(self.layer3(x2,instances)))
        x4 = self.actvn(self.norm4(self.layer4(x3,instances)))        print("x1",x1.shape,"x2",x2.shape,"x3",x3.shape,"x4",x4.shape) #x1 [4, 64, 128, 128] x2 [4, 128, 64, 64] x3 [4, 256, 32, 32] x4 [4, 512, 16, 16]
        y = self.up(self.actvn(self.norm_middle(self.middle(x4,instances))))
        y1 = self.up(self.actvn(self.norm_up1(self.up1(paddle.concat([y,x3],1),instances))))
        y2 = self.up(self.actvn(self.norm_up2(self.up2(paddle.concat([y1, x2], 1),instances))))
        y3 = self.up(self.actvn(self.norm_up3(self.up3(paddle.concat([y2, x1], 1),instances))))        print("y",y.shape,"y1",y1.shape,"y2",y2.shape,"y3",y3.shape)# y [4, 256, 32, 32] y1 [4, 128, 64, 64] y2 [4, 64, 128, 128] y3 [4, 64, 256, 256]
        scale_mu = self.scale_conv_mu(y3,instances)
        scale_var = self.scale_conv_var(y3,instances)
        bias_mu = self.bias_conv_mu(y3,instances)
        bias_var = self.bias_conv_var(y3,instances)

        scale_mus = self.instAvgPooling(scale_mu,input_instances)
        scale_vars = self.instAvgPooling(scale_var,input_instances)
        bias_mus = self.instAvgPooling(bias_mu,input_instances)
        bias_vars = self.instAvgPooling(bias_var,input_instances)        return scale_mus, scale_vars, bias_mus, bias_vars #shape都为[batch_size,instance_nc,noise_num]encoder = instanceAdaptiveEncoder(opt)
x = paddle.randn([4,3,256,256])
input_instances = paddle.randn([4,72,256,256])
encoder(x,input_instances)

登录后复制

In [5]

class Encoder_OPT:
    def __init__(self):
        super().__init__()
        self.ngf = 64
        self.semantic_nc = 2
        self.no_instance = True
        self.noise_nc = 108opt = Encoder_OPT()def instance_encode_z(real_image, input_instances):
    s_mus, s_logvars, b_mus, b_logvars = instanceAdaptiveEncoder(opt)(real_image,input_instances)
    z = [s_mus,paddle.exp(0.5 * s_logvars),b_mus,paddle.exp(0.5 * b_logvars)]    return z, s_mus, s_logvars, b_mus, b_logvars

instance_nc = 2real_image = paddle.randn([4,3,256,256])
input_instances = paddle.randn([4,instance_nc,256,256])
z, s_mus, s_logvars, b_mus, b_logvars = instance_encode_z(real_image,input_instances)#s_mus, s_logvars, b_mus, b_logvars这四个return是为了计算KLDLOSS

登录后复制

instance [4, 1, 256, 256]
x1 [4, 64, 128, 128] x2 [4, 128, 64, 64] x3 [4, 256, 32, 32] x4 [4, 512, 16, 16]
y [4, 256, 32, 32] y1 [4, 128, 64, 64] y2 [4, 64, 128, 128] y3 [4, 64, 256, 256]

登录后复制

In [7]

# KLD_loss = (KLDLoss(s_mus, s_logvars)+KLDLoss(b_mus, b_logvars)) * .opt.lambda_kld / 2instance_nc = 2noise_nc = 108noise = paddle.randn([x.shape[0], instance_nc, 2,noise_nc])def pre_process_noise( noise, z):
    '''
    noise: [n,inst_nc,2,noise_nc], z_i [n,inst_nc,noise_nc]
    z: [s_mus,torch.exp(0.5 * s_logvars),b_mus,torch.exp(0.5 * b_logvars)]
    '''
    s_noise = paddle.unsqueeze(noise[:,:,0,:].multiply(z[1])+z[0],2)
    b_noise = paddle.unsqueeze(noise[:,:,1,:].multiply(z[3])+z[2],2)    return paddle.concat([s_noise,b_noise],2)

noise = pre_process_noise(noise,z)#这个时候得到的noise才是贯穿decoder，其中INADE的一个输入的noiseprint(noise.shape) #[4,instance_nc,2,noise]

登录后复制

[4, 2, 2, 108]

登录后复制

那么train的时候这个noise可以得到了，但是decoder是基于一个很小的特征图逐渐上采样的，那么这个特征图作者采取得到的方法是直接randn加linear再reshape，初始化完全是没有任何信息的。这样的方法我看来是不错的，因为实际测试的时候decoder输入的就是标准正太分布，这样就防止了训练和测试的时候输入不一致的问题，防止模型依靠这个初始特征图的结构信息.（这个特征图的处理我正在实验，好像不太好训练）

batch_size = 4z = paddle.randn(batch_size, z_dim,
                    dtype=torch.float32, device=input.get_device())
x = nn.Linear(opt.z_dim, 16 * 64 * sw * sh)(z)
x = x.reshape(-1, 16 * 64, self.sh, self.sw)

登录后复制

以上就是INADE个人讲解和理解的详细内容，更多请关注php中文网其它相关文章！