
Pix2Pix GAN (CVPR 2017)

image-20210627163327806

1. Motivation

Definition of image-to-image translation

  • We define automatic image-to-image translation as the task of translating one possible representation of a scene into another.
  • Our goal in this paper is to develop a common framework for all these problems.

The loss function has to be designed carefully: if only Euclidean distance is used, the generated images tend to be blurry.

  • This is because Euclidean distance is minimized by averaging all plausible outputs, which causes blurring.
  • Earlier papers have focused on specific applications, and it has remained unclear how effective image-conditional GANs can be as a general-purpose solution for image-to-image translation.
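The averaging argument can be checked on a toy case. The sketch below is illustrative only: the two one-dimensional "images" are made-up data standing in for two equally plausible sharp outputs, and `expected_sq_error` is a hypothetical helper, not something from the paper.

```python
import numpy as np

# Two equally plausible targets for the same input (toy data, assumed):
# a sharp edge at column 2 versus a sharp edge at column 4.
target_a = np.array([0, 0, 1, 1, 1, 1], dtype=float)
target_b = np.array([0, 0, 0, 0, 1, 1], dtype=float)

def expected_sq_error(pred):
    """Squared Euclidean error of `pred`, averaged over both plausible targets."""
    return 0.5 * np.sum((pred - target_a) ** 2) + 0.5 * np.sum((pred - target_b) ** 2)

# Pixel-wise average of the two sharp targets: the edge becomes a soft ramp.
blurry = 0.5 * (target_a + target_b)

err_a = expected_sq_error(target_a)      # commit to sharp output A
err_b = expected_sq_error(target_b)      # commit to sharp output B
err_blurry = expected_sq_error(blurry)   # hedge with the blurry average

print(err_a, err_b, err_blurry)
```

The blurry average scores a lower expected error than either sharp candidate, which is why a pure Euclidean loss rewards blur.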

2. Contribution

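  • The first contribution is showing that a conditional GAN (cGAN) works as a unified approach across many tasks and gives reasonable results.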
  • Our primary contribution is to demonstrate that on a wide variety of problems, conditional GANs produce reasonable results.
  • The second contribution is a simple framework.
  • Our second contribution is to present a simple framework sufficient to achieve good results, and to analyze the effects of several important architectural choices.

Unlike earlier work, Pix2Pix GAN uses a U-Net in the generator G and a PatchGAN classifier in the discriminator D.

  • Unlike past work, for our generator we use a “U-Net”-based architecture.
  • And for our discriminator we use a convolutional “PatchGAN” classifier, which only penalizes structure at the scale of image patches.

3. Method

image-20210628193929793

3.1 Objective

The cGAN objective can be written as Equation 1, where x is the condition:

image-20210628194244007
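Written out, the conditional GAN objective as published in the Pix2Pix paper is the following, where x is the input image (the condition), y is the target image, and z is the noise. G tries to minimize this objective while D tries to maximize it.

```latex
\mathcal{L}_{cGAN}(G, D) =
  \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big]
```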

The final objective is:

image-20210628194949001
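As published in the Pix2Pix paper, the final objective adds an L1 reconstruction term to the conditional GAN objective of Equation 1, written here as \mathcal{L}_{cGAN}(G, D). The weight \lambda balances the two terms; its value is not stated in these notes.

```latex
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\, \lVert y - G(x, z) \rVert_1 \,\big]

G^{*} = \arg\min_{G} \max_{D} \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)
```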

For the noise z, the authors use dropout:

  • Instead, for our final models, we provide noise only in the form of dropout, applied on several layers of our generator at both training and test time.
image-20210628194928724
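A minimal sketch of what "noise only in the form of dropout" means in practice: dropout stays switched on at test time, so the same input can yield different outputs. The one-layer `toy_generator`, its fixed weights, and the 0.5 dropout rate are all assumptions made for illustration, not the paper's network.

```python
import numpy as np

def dropout(features, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = rng.random(features.shape) >= rate
    return features * keep / (1.0 - rate)

def toy_generator(x, rng, rate=0.5):
    """Hypothetical one-layer generator: linear map, ReLU, then dropout.

    Dropout is applied on every call, i.e. at training and at test time,
    so it is the only source of randomness in the output.
    """
    weights = np.full((x.shape[-1], 4), 0.25)  # fixed weights, illustration only
    hidden = np.maximum(x @ weights, 0.0)
    return dropout(hidden, rate, rng)

rng = np.random.default_rng(0)
x = np.ones((1, 8))                 # the same conditioning input for both calls
sample_1 = toy_generator(x, rng)
sample_2 = toy_generator(x, rng)
print(sample_1)
print(sample_2)
```

Each call draws a fresh dropout mask, so repeated calls with the same x are separate stochastic samples rather than one deterministic output.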

3.2 Network architecture

The input and output images differ in surface appearance, but both can be understood as renderings of the same underlying structure.

  • In addition, for the problems we consider, the input and output differ in surface appearance, but both are renderings of the same underlying structure.
  • Therefore, structure in the input is roughly aligned with structure in the output.

For the generator, the authors follow U-Net and use skip connections:

  • To give the generator a means to circumvent the bottleneck for information like this, we add skip connections, following the general shape of a “U-Net”.
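The skip-connection idea can be sketched without any learned weights. In the toy below, average pooling and nearest-neighbour upsampling stand in for the real convolutional layers (an assumption made to keep the example self-contained), and each decoder level concatenates the matching encoder features along the channel axis.

```python
import numpy as np

def downsample(x):
    """Halve height and width by 2x2 average pooling. x has shape (C, H, W)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample(x):
    """Double height and width by nearest-neighbour repetition."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def tiny_unet(x):
    """Three-level encoder-decoder with skip connections (no learned weights)."""
    e1 = downsample(x)                                  # (C, H/2, W/2)
    e2 = downsample(e1)                                 # (C, H/4, W/4)
    e3 = downsample(e2)                                 # (C, H/8, W/8), bottleneck
    d2 = np.concatenate([upsample(e3), e2], axis=0)     # skip from e2: 2C channels
    d1 = np.concatenate([upsample(d2), e1], axis=0)     # skip from e1: 3C channels
    return upsample(d1)                                 # back to (3C, H, W)

x = np.arange(3 * 8 * 8, dtype=float).reshape(3, 8, 8)
out = tiny_unet(x)
print(out.shape)
```

The last C channels of the output come straight from `e1`, so that information reaches the output without passing through the 1x1 bottleneck `e3`.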

3.3 Markovian discriminator (PatchGAN)

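The authors point out that although L1 and L2 losses make the generated images blurry and fail to capture high-frequency detail, they do capture the low frequencies accurately. The GAN discriminator therefore only needs to model high-frequency structure, while the L1 term takes care of the low frequencies.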

  • Although these losses fail to encourage high-frequency crispness, in many cases they nonetheless accurately capture the low frequencies.

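To model high-frequency structure, it is enough to restrict attention to local image patches. The authors therefore design a PatchGAN discriminator, which only penalizes structure at the patch scale: it classifies each N×N patch of the image as real or fake, is run convolutionally across the image, and averages the per-patch responses to give the final output of D.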
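A rough sketch of how the patch size N relates to a convolutional discriminator: N is the receptive field of one output unit, and the image-level decision is the mean of the per-patch scores. The layer list below (five 4x4 convolutions with strides 2, 2, 2, 1, 1) is an assumed configuration used only for illustration; these notes do not specify the layers.

```python
import numpy as np

def receptive_field(layers):
    """Receptive field, in input pixels, of one output unit of a conv stack.

    `layers` is a list of (kernel_size, stride) pairs ordered from input to output.
    """
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf

def patch_decision(score_map):
    """Average the per-patch real/fake scores into one image-level score."""
    return float(np.mean(score_map))

# Assumed layer configuration (hypothetical, not taken from the text above).
patch_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
n = receptive_field(patch_layers)
print(n)  # each output unit judges an n x n patch of the input

# Toy 2x2 map of per-patch scores (1 = real, 0 = fake).
scores = np.array([[0.0, 1.0], [1.0, 1.0]])
print(patch_decision(scores))
```

Because each output unit only sees an N×N window, pixels further apart than one patch are judged independently, which is the sense in which the discriminator is called Markovian.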

4. Experiment

4.1 Dataset

image-20210628212414111

4.2 Evaluation metrics

  • We employ two tactics. First, we run “real vs. fake” perceptual studies on Amazon Mechanical Turk (AMT).

  • Second, we measure whether or not our synthesized cityscapes are realistic enough that off-the-shelf recognition system can recognize the objects in them.

  • AMT perceptual studies

  • “FCN-score”
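For the FCN-score, a pretrained segmentation network labels the generated images and its predictions are compared with the ground-truth label maps using standard segmentation metrics: per-pixel accuracy, per-class accuracy, and class IoU. The sketch below assumes the predicted label map is already available as an integer array; `segmentation_scores` is a hypothetical helper and the 2x2 arrays are toy data.

```python
import numpy as np

def segmentation_scores(pred, label, num_classes):
    """Per-pixel accuracy, mean per-class accuracy and mean class IoU.

    `pred` and `label` are integer arrays of the same shape with values in
    [0, num_classes). Rows of the confusion matrix index the true class.
    """
    flat = num_classes * label.ravel() + pred.ravel()
    conf = np.bincount(flat, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes).astype(float)
    tp = np.diag(conf)
    true_count = conf.sum(axis=1)            # pixels that truly belong to each class
    pred_count = conf.sum(axis=0)            # pixels predicted as each class
    union = true_count + pred_count - tp
    per_pixel_acc = tp.sum() / conf.sum()
    per_class_acc = np.mean(tp / np.maximum(true_count, 1))  # guard empty classes
    class_iou = np.mean(tp / np.maximum(union, 1))
    return per_pixel_acc, per_class_acc, class_iou

label = np.array([[0, 0], [1, 1]])   # toy ground-truth label map
pred = np.array([[0, 1], [1, 1]])    # toy prediction on a generated image
acc, class_acc, iou = segmentation_scores(pred, label, num_classes=2)
print(acc, class_acc, iou)
```

Higher scores mean the recognition network finds the same objects in the synthesized image that the label map says should be there.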

4.3 Analysis of the generator architecture

image-20210628213539180 image-20210628213533548

4.4 Analysis of the objective function

image-20210628212654986

4.5 From PixelGANs to PatchGANs to ImageGANs

image-20210628214924897 image-20210628223522272

4.6 Semantic segmentation

image-20210628220151258 image-20210628223803959