Graduate Student Presentations

What's in the Image? Explorable Decoding of Compressed Images (Oral @ CVPR 2021)

Yuval Bahat, Tomer Michaeli

The ever-growing amount of visual content captured on a daily basis necessitates the use of lossy compression methods in order to save storage space and transmission bandwidth. While extensive research efforts are devoted to improving compression techniques, every method inevitably discards information. Especially at low bit rates, this information often corresponds to semantically meaningful visual cues, so that decompression involves significant ambiguity. In spite of this fact, existing decompression algorithms typically produce only a single output, and do not allow the viewer to explore the set of images that map to the given compressed code. In this work we propose the first image decompression method to facilitate user-exploration of the diverse set of natural images that could have given rise to the compressed input code, thus granting users the ability to determine what could and what could not have been there in the original scene. Specifically, we develop a novel deep-network based decoder architecture for the ubiquitous JPEG standard, which allows traversing the set of decompressed images that are consistent with the compressed JPEG file. To allow for simple user interaction, we develop a graphical user interface comprising several intuitive exploration tools, including an automatic tool for examining specific solutions of interest. We exemplify our framework on graphical, medical and forensic use cases, demonstrating its wide range of potential applications.
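The notion of "consistency with the compressed JPEG file" that the abstract relies on can be illustrated with a toy check: an 8x8 block of a candidate decoded image is consistent iff quantizing its DCT coefficients reproduces the coefficients stored in the file. This is a minimal sketch of that constraint only; the function names and the simplified quantization are illustrative and not from the paper.

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II matrix, the transform JPEG applies to 8x8 blocks.
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n)) * np.sqrt(2.0 / n)
    C[0] /= np.sqrt(2)
    return C

def is_consistent(block, q_table, stored):
    # A candidate decoded 8x8 block is consistent with the compressed file
    # iff quantizing its DCT coefficients reproduces the stored coefficients.
    C = dct_matrix()
    coeffs = C @ (block.astype(float) - 128.0) @ C.T
    return np.array_equal(np.round(coeffs / q_table), stored)
```

The set of all blocks passing this check is exactly the ambiguity set that the paper's decoder lets the user traverse.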

Spatially-Adaptive Pixelwise Networks for Fast Image Translation (CVPR 2021)

Tamar Rott Shaham, Michaël Gharbi, Richard Zhang, Eli Shechtman, Tomer Michaeli

We introduce a new generator architecture, aimed at fast and efficient high-resolution image-to-image translation. We design the generator to be an extremely lightweight function of the full-resolution image. In fact, we use pixel-wise networks; that is, each pixel is processed independently of others, through a composition of simple affine transformations and nonlinearities. We take three important steps to equip such a seemingly simple function with adequate expressivity. First, the parameters of the pixel-wise networks are spatially varying, so they can represent a broader function class than simple 1x1 convolutions. Second, these parameters are predicted by a fast convolutional network that processes an aggressively low-resolution representation of the input. Third, we augment the input image by concatenating a sinusoidal encoding of spatial coordinates, which provides an effective inductive bias for generating realistic novel high-frequency image content. As a result, our model is up to 18x faster than state-of-the-art baselines. We achieve this speedup while generating comparable visual quality across different image resolutions and translation domains.
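The core mechanism can be sketched in a few lines: a per-pixel affine layer whose parameters vary spatially, applied to the input concatenated with a sinusoidal encoding of pixel coordinates. This is an illustrative toy (shapes and function names are assumptions, not the paper's code):

```python
import numpy as np

def sinusoidal_encoding(h, w, n_freqs=3):
    # sin/cos of the (y, x) coordinate grid at several frequencies.
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    feats = []
    for k in range(n_freqs):
        for grid in (ys, xs):
            feats += [np.sin(2**k * np.pi * grid), np.cos(2**k * np.pi * grid)]
    return np.stack(feats, axis=-1)  # (h, w, 4 * n_freqs)

def pixelwise_layer(x, W, b):
    # Spatially varying affine layer + ReLU: each pixel has its own weight
    # matrix, so this is strictly more expressive than a shared 1x1 conv.
    # x: (h, w, c_in), W: (h, w, c_out, c_in), b: (h, w, c_out)
    return np.maximum(np.einsum("hwoc,hwc->hwo", W, x) + b, 0.0)
```

In the paper's setting, `W` and `b` would be predicted by a fast convolutional network run on an aggressively downsampled copy of the input; here they are simply arrays.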

Sparsity Aware Normalization for GANs (AAAI 2021)

Idan Kligvasser, Tomer Michaeli

Generative adversarial networks (GANs) are known to benefit from regularization or normalization of their critic (discriminator) network during training. In this work, we analyze the popular spectral normalization scheme, find a significant drawback and introduce sparsity aware normalization (SAN), a new alternative approach for stabilizing GAN training. As opposed to other normalization methods, our approach explicitly accounts for the sparse nature of the feature maps in convolutional networks with ReLU activations. We illustrate the effectiveness of our method through extensive experiments with a variety of network architectures. As we show, sparsity is particularly dominant in critics used for image-to-image translation settings. In these cases our approach improves upon existing methods, in fewer training epochs and with smaller capacity networks, while requiring practically no computational overhead.

Contrastive Divergence Learning is a Time Reversal Adversarial Game (Spotlight @ ICLR 2021)

Omer Yair, Tomer Michaeli

Recently there has been increasing interest in energy-based models as a tool for modeling the underlying probabilities of given data sets. Although these models hold great potential, training them is very challenging, and many different methods have been proposed for the task. One of the classical methods for training such models is the contrastive divergence (CD) algorithm. This algorithm has been used successfully over the years in a wide range of domains; however, despite its empirical success, open questions remain regarding its validity. The primary source of difficulty is an unjustified approximation used in the derivation of the algorithm. In this talk, we will present an alternative derivation that does not require any approximation and views CD as an adversarial learning procedure.
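As a refresher on what CD does, here is a toy version for the illustrative energy E(x; θ) = ½(x − θ)², which is not from the talk: the positive phase evaluates the energy gradient on data, and the negative phase evaluates it on samples obtained by starting MCMC *at the data* and running only a few steps — the shortcut whose justification the talk revisits.

```python
import numpy as np

def cd_grad(theta, data, rng, n_steps=1, step=0.1):
    # Contrastive-divergence gradient estimate for the toy energy
    # E(x; theta) = 0.5 * (x - theta)^2 (illustrative, not from the talk).
    # Negative samples start at the data -- the hallmark of CD -- and take
    # a few Langevin steps toward the model distribution.
    x = data.copy()
    for _ in range(n_steps):
        x = x - step * (x - theta) + np.sqrt(2 * step) * rng.standard_normal(x.shape)
    dE = lambda s: -(s - theta)  # dE/dtheta
    return dE(data).mean() - dE(x).mean()
```

Descending this estimate pulls θ toward the data mean, i.e., toward the maximum-likelihood solution for this toy model.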

Learning Optimal Wavefront Shaping for Multi-channel Imaging (ICCP 2021)

Elias Nehme, Boris Ferdman, Lucien E. Weiss, Tal Naor, Daniel Freedman, Tomer Michaeli, Yoav Shechtman

Fast acquisition of depth information is crucial for accurate 3D tracking of moving objects. Snapshot depth sensing can be achieved by wavefront coding, in which the point-spread function (PSF) is engineered to vary distinctively with scene depth by altering the detection optics. In low-light applications, such as 3D localization microscopy, the prevailing approach is to condense signal photons into a single imaging channel with phase-only wavefront modulation to achieve a high pixel-wise signal-to-noise ratio. Here we show that this paradigm is generally suboptimal and can be significantly improved upon by employing multi-channel wavefront coding, even in low-light applications. We demonstrate our multi-channel optimization scheme on 3D localization microscopy in densely labelled live cells where detectability is limited by overlap of modulated PSFs. At extreme densities, we show that a split-signal system, with end-to-end learned phase masks, doubles the detection rate and reaches improved precision compared to the current state-of-the-art, single-channel design. We implement our method using a bifurcated optical system, experimentally validating our approach by snapshot volumetric imaging and 3D tracking of fluorescently labelled subcellular elements in dense environments.
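The relationship between a phase mask and the PSF it produces can be sketched with scalar Fourier optics: the incoherent PSF is the squared magnitude of the Fourier transform of the pupil function. This toy (illustrative names; not the paper's learned, multi-channel pipeline) shows the forward model that end-to-end phase-mask learning differentiates through:

```python
import numpy as np

def psf_from_phase_mask(phase, pad=4):
    # Scalar Fourier optics: the incoherent PSF is the squared magnitude of
    # the Fourier transform of the pupil function, aperture * exp(i * phase).
    n = phase.shape[0]
    yy, xx = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
    pupil = (xx**2 + yy**2 <= 1.0) * np.exp(1j * phase)  # circular aperture
    field = np.fft.fftshift(np.fft.fft2(pupil, s=(pad * n, pad * n)))
    psf = np.abs(field) ** 2
    return psf / psf.sum()
```

Engineering the wavefront means choosing `phase` (in the paper, jointly across channels and by end-to-end learning) so that the resulting PSF varies distinctively with defocus.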

Symmetric Positive Semi-definite Riemannian Geometry & Applications

Almog Lahav, Or Yair, Ronen Talmon

Symmetric positive semi-definite (SPSD) matrices are common data features in contemporary data analysis. Notable examples of such features are (low rank) covariance matrices, various kernel matrices, and graph Laplacians.
We present new results on the Riemannian geometry of SPSD matrices, leading to a convenient framework for developing data analysis methods that rely on these useful data features. In addition, we propose an algorithm for Domain Adaptation and demonstrate its performance in applications to real data.

Unsupervised Acoustic Condition Monitoring with Riemannian Geometry (MLSP 2020)

Pavel Lifshits, Ronen Talmon

We present an unsupervised method for acoustic condition monitoring. Our method relies on the Riemannian geometry of symmetric and positive-definite (SPD) matrices. Specifically, SPD matrices enable us to build features for multi-channel data, which naturally encode the mutual relationships between the channels. By exploiting the Riemannian geometry of SPD matrices, we show that these features encompass informative comparisons. The proposed anomaly score is then based on a one-class SVM applied to the proposed features and their induced Riemannian distance. We test the proposed method on two benchmarks and show that it achieves state-of-the-art results. In addition, we demonstrate the robustness of the proposed method to noise and to low sampling rates.
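The Riemannian distance underlying such comparisons is, in the standard affine-invariant geometry of SPD matrices, d(A, B) = ‖log(A^(−1/2) B A^(−1/2))‖_F. A minimal sketch of this standard construction (the helper names are illustrative, not the paper's code):

```python
import numpy as np

def _spd_fun(S, f):
    # Apply a scalar function to an SPD matrix through its eigendecomposition.
    w, V = np.linalg.eigh(S)
    return (V * f(w)) @ V.T

def riemannian_distance(A, B):
    # Affine-invariant distance: d(A, B) = ||log(A^{-1/2} B A^{-1/2})||_F.
    Ais = _spd_fun(A, lambda w: 1.0 / np.sqrt(w))
    return np.linalg.norm(_spd_fun(Ais @ B @ Ais, np.log), "fro")
```

A key property is invariance to any invertible congruence A ↦ G A Gᵀ, which for covariance features means insensitivity to a common linear mixing of the channels.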

Voice Activity Detection for Transient Noisy Environment Based on Diffusion Nets

Amir Ivry, Baruch Berdugo, Israel Cohen

We address voice activity detection in acoustic environments of transients and stationary noises, which often occur in real-life scenarios. We exploit unique spatial patterns of speech and non-speech audio frames by independently learning their underlying geometric structure. This process is done through a deep encoder–decoder-based neural network architecture. This structure involves an encoder that maps spectral features with temporal information to their low-dimensional representations, which are generated by applying the diffusion maps method. The encoder feeds a decoder that maps the embedded data back into the high-dimensional space. A deep neural network, which is trained to separate speech from non-speech frames, is obtained by concatenating the decoder to the encoder, resembling the known diffusion nets architecture. Experimental results show enhanced performance compared to competing voice activity detection methods. The improvement is achieved in accuracy, robustness, and generalization ability. Our model runs in real time and can be integrated into audio-based communication systems. We also present a batch algorithm that obtains an even higher accuracy for offline applications.
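The diffusion-maps embedding that the encoder is trained to reproduce can be sketched in its standard form: Gaussian affinities between frames are row-normalized into a Markov transition matrix, and its leading nontrivial eigenvectors give the low-dimensional coordinates. This is a generic sketch of diffusion maps (parameter names are illustrative), not the paper's trained network:

```python
import numpy as np

def diffusion_maps(X, eps, n_dims=2):
    # Gaussian affinities -> Markov transition matrix -> leading nontrivial
    # eigenvectors as low-dimensional coordinates.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)
    deg = K.sum(axis=1)
    A = K / np.sqrt(np.outer(deg, deg))  # symmetric conjugate of the Markov matrix
    w, V = np.linalg.eigh(A)
    w, V = w[::-1], V[:, ::-1]           # sort eigenvalues descending
    psi = V / np.sqrt(deg)[:, None]      # right eigenvectors of the Markov matrix
    return w[1:n_dims + 1] * psi[:, 1:n_dims + 1]
```

In the diffusion-nets scheme the abstract describes, the encoder learns to map spectral features to such coordinates directly, avoiding the need to recompute the eigendecomposition for unseen frames.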

Source Localization With Feedback Beamforming

Itay Yehezkel Karo, Tsvi G. Dvorkind, Israel Cohen

Target localization using array processing is a highly active research field with a wide variety of applications. Several approaches to localization exist. The oldest and most basic is beamforming, in which the arena is constantly scanned in search of objects of interest; the spatial response of the array is called the beampattern. Focusing on the beamforming approach, and specifically on the beampattern itself, the analogy between the spatial response (beampattern) of a uniform linear array (ULA) and the frequency response of a finite impulse response (FIR) filter is well known. In this work, motivated by the known efficiency advantages of infinite impulse response (IIR) filters over their FIR counterparts, we search for the spatial version of the IIR architecture. It turns out that merely integrating a transmitter into the array (not necessarily a ULA) generates a controllable spatial loop that is mathematically analogous to an IIR filter. We call this architecture the “feedback beamformer”. Performance analysis confirms that an infinite (under ideal scenarios) improvement is achievable, which may be interpreted as a virtual increase of the array’s aperture. We find that the feedback beamformer is sensitive to the target’s range, where the high sensitivity is closely related to the high carrier frequencies of typical applications. We then present a more sophisticated “dual-frequency feedback beamformer”, using the same resources, which extracts the spatial information from the frequency gap. This architecture features high-directivity beamforming, with high performance also at low signal-to-noise ratio (SNR) and low range sensitivity.
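The FIR analogy the abstract builds on is concrete: the beampattern of a ULA with element weights w_n equals the frequency response of an FIR filter with taps w_n, evaluated at the spatial frequency ω = 2π(d/λ)sin θ. A minimal numerical sketch (helper names are illustrative):

```python
import numpy as np

def fir_response(taps, omega):
    # |H(e^{j omega})| of an FIR filter with the given taps.
    n = np.arange(len(taps))
    return np.abs(np.exp(-1j * np.outer(omega, n)) @ taps)

def ula_beampattern(weights, d_over_lambda, theta):
    # ULA spatial response at angles theta: identical to the FIR response
    # at the spatial frequency omega = 2*pi*(d/lambda)*sin(theta).
    return fir_response(weights, 2 * np.pi * d_over_lambda * np.sin(theta))
```

The feedback beamformer closes a transmit loop around the array, making the effective spatial response rational (IIR-like) rather than polynomial (FIR-like), which is the source of the claimed aperture gain.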
*Student of Israel Cohen