Adversarially-Sandwiched VAEs for Inverse Problems
Harshit Gupta, EPFL STI LIB
Meeting • 02 October 2018
Abstract

One of the main challenges of inverse problems is modelling (or learning) the data prior. Recently, neural-network-based generative models have shown an impressive ability to model (or estimate) this data distribution. These methods use a latent-variable-based parametrisation of the estimated distribution, which is well suited to real-world signals. In this talk, we will first briefly discuss the two pillars of generative modelling: Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs).

In GANs, a generator produces samples from latent variables, and a discriminator is trained to distinguish these generated (fake) samples from the real ones; the generator, in turn, is trained to produce realistic-looking samples so as to fool the discriminator. This scheme is equivalent to minimising the Jensen-Shannon divergence (JSD) between the actual and the estimated distributions. However, GANs have several drawbacks: they are hard to train, they lack an encoding architecture that produces a latent representation of the data, and, more importantly, they do not explicitly provide the estimated likelihood of the data. VAEs, by contrast, are encoder-decoder networks that are much easier to train and that explicitly estimate a lower bound on the likelihood. They are trained by maximising this lower bound on the estimated log-likelihood of the data, which is equivalent to minimising the Kullback-Leibler divergence (KLD) between the actual and the estimated distributions. However, unlike the JSD, the KLD is an asymmetric divergence and may lead to inferior results.

Finally, I will propose a new scheme for training VAEs in which an upper and a lower bound on the log-likelihood are used to sandwich it. For a given sample from the decoder, a discriminator (or adversary) decides whether the estimated likelihood of the sample is higher or lower than the actual likelihood. In the former case, an upper bound on the likelihood is minimised; in the latter, a lower bound is maximised. We show that this scheme, like GANs, is equivalent to minimising an upper bound on the JSD between the actual and the estimated distributions and that it reaches its global minimum if and only if the two are equal.
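For concreteness, here is a minimal sketch of the standard GAN objective mentioned above (the notation is assumed here: $p_{\mathrm{data}}$ denotes the data distribution, $p_z$ the latent prior, $G$ the generator, and $D$ the discriminator):

\[
\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\big[\log D(x)\big] \;+\; \mathbb{E}_{z \sim p_z}\!\big[\log\!\big(1 - D(G(z))\big)\big].
\]

For the optimal discriminator, this value function equals $2\,\mathrm{JSD}(p_{\mathrm{data}}\,\|\,p_G) - \log 4$, where $p_G$ is the distribution of $G(z)$; this is why training the generator against an optimal discriminator amounts to minimising the JSD.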
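Similarly, a sketch of the VAE lower bound (the evidence lower bound), assuming the usual notation with encoder $q_\phi(z\mid x)$, decoder $p_\theta(x\mid z)$, and latent prior $p(z)$:

\[
\log p_\theta(x) \;\geq\; \mathbb{E}_{q_\phi(z\mid x)}\!\big[\log p_\theta(x\mid z)\big] \;-\; \mathrm{KL}\!\big(q_\phi(z\mid x)\,\|\,p(z)\big).
\]

Averaging the left-hand side over the data and maximising it in $\theta$ is, up to a constant, equivalent to minimising $\mathrm{KL}(p_{\mathrm{data}}\,\|\,p_\theta)$, the asymmetric divergence discussed above.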
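Finally, for reference, the JSD targeted by the proposed sandwiching scheme is the symmetrised divergence

\[
\mathrm{JSD}(p\,\|\,q) \;=\; \tfrac{1}{2}\,\mathrm{KL}\!\Big(p\,\Big\|\,\tfrac{p+q}{2}\Big) \;+\; \tfrac{1}{2}\,\mathrm{KL}\!\Big(q\,\Big\|\,\tfrac{p+q}{2}\Big),
\]

which vanishes if and only if $p = q$; this is consistent with the claim that the sandwiched objective attains its global minimum exactly when the estimated distribution matches the actual one.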