Author: Luisa Verdoliva
University Federico II of Naples, Italy
The tutorial will focus on deepfake detection. First, it will present the most reliable supervised approaches, based both on deep learning and on handcrafted features (e.g., corneal specular highlights, heart variations, landmark locations), together with the main datasets used in this field. The most interesting directions for gaining generalization and robustness will be described, such as one-class learning, few-shot learning and incremental learning. In this context, the concepts of camera fingerprints and artificial fingerprints will be introduced. Then, identity-based methods, aimed at detecting both face swapping and facial reenactment, will be presented. Multimodal approaches that detect audio-visual inconsistencies will also be considered. Results on challenging datasets and realistic scenarios, such as the spreading of manipulated images and videos over social networks, will be presented. In addition, the robustness of such methods to adversarial attacks will be analyzed. The tutorial will mostly consider deepfake videos, but will also include examples of images fully generated by generative adversarial networks (GANs).
T3: Stochastic Bayesian methods for imaging inverse problems: from Monte Carlo to score-matching and deep learning
Authors: Valentin De Bortoli (1), Julie Delon (2), Marcelo Pereyra (3)
(1) CNRS and ENS Paris, France
(2) Université de Paris, France
(3) Heriot-Watt University and the Maxwell Institute for Mathematical Sciences, Edinburgh, UK
The tutorial is structured in three parts of two hours each, organised as follows:
• The first part of this tutorial will introduce the Bayesian statistical framework and key concepts of Bayesian analysis and computation in the context of imaging. We first introduce the Bayesian modelling paradigm and then quickly progress to fundamental concepts of Bayesian decision theory that are relevant to imaging sciences, such as point estimation and uncertainty quantification analyses, hierarchical and empirical approaches to calibrate unknown model parameters, and model selection. This is then followed by an introduction to efficient Bayesian computation approaches. We pay special attention to methods based on the overdamped Langevin stochastic differential equation, to proximal Markov chain Monte Carlo algorithms, and to stochastic approximation methods that intimately combine ideas from stochastic optimisation and Langevin sampling. These computation techniques are illustrated with a series of imaging experiments where they are used to perform some of the advanced Bayesian analyses previously introduced.
• The second part of this tutorial is devoted to Plug-and-Play methods and Tweedie-based approaches. In the Bayesian framework introduced in the first part, image models are used as priors or regularisers and combined with explicit likelihood functions to define posterior distributions. These posterior distributions can be used to derive Maximum A Posteriori (MAP) estimators, leading to optimization problems that may be convex or not, but are well studied and understood. Sampling schemes can also be used to explore these posterior distributions, to derive Minimum Mean Square Error (MMSE) estimators, quantify uncertainty or perform other advanced inferences. While research on inverse problems has focused for many years on explicit image models (either directly in the image space, or in a transformed space), an important trend nowadays is to use implicit image models encoded by denoising neural networks. These denoising networks can be seen in particular as approximating the gradient or the proximal operator of the log-prior on natural images, and can therefore be used in many classical optimization or sampling schemes. These methods, commonly known as Plug & Play (PnP), open the way to restoration algorithms that exploit more powerful and accurate prior models for natural images, but raise novel challenges and questions on the corresponding posterior distributions and their resulting estimators. The goal of this part is to introduce these Plug & Play approaches, to provide some perspectives, and to present recent developments on these questions.
• The third part of this tutorial is devoted to score-based generative modelling for inverse problems. The models used here to sample from a given posterior distribution are adapted from methods developed for generative modelling, i.e. the task of generating new samples from a data distribution. Score-based generative modelling is a recently developed approach to this task and exhibits state-of-the-art performance on several image synthesis problems. These methods can be roughly described as follows. First, noise is incrementally added to the data to obtain an easy-to-sample distribution. Then, we learn the time-reversed denoising dynamics using a neural network. When initialized at the easy-to-sample distribution, these dynamics yield a generative model. They can be analyzed through the lens of stochastic analysis. In particular, it is useful to describe these processes as Stochastic Differential Equations (SDEs). The time-reversed SDE is a diffusion whose drift depends on the logarithmic gradients of the perturbed data distributions, i.e. the Stein scores. These scores are computed by leveraging score-matching methods, in particular the Tweedie identity, as well as neural network approximations. These generative models can be conditioned on observed data and give rise to efficient solvers for inverse problems. We will draw connections between these machine-learning models and the PnP methods introduced in image processing, and present applications to some classical inverse problems in image processing.
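As a rough illustration of how the Tweedie identity and Langevin sampling mentioned above fit together, here is a minimal sketch on a one-dimensional toy problem. Everything in it is an assumption made for illustration: the Gaussian prior, the exact closed-form denoiser standing in for a learned network, the function names, and the step sizes. It is not taken from the tutorial material.

```python
import math
import random

def mmse_denoiser(z, prior_var, eps):
    """Exact MMSE denoiser for a N(0, prior_var) signal observed in N(0, eps) noise.
    In a Plug & Play setting this role would be played by a denoising network."""
    return z * prior_var / (prior_var + eps)

def tweedie_score(z, prior_var, eps):
    """Tweedie identity: score of the eps-smoothed prior equals (D(z) - z) / eps."""
    return (mmse_denoiser(z, prior_var, eps) - z) / eps

def ula_posterior_samples(y, noise_var, prior_var, eps=0.05, step=0.01,
                          n_iter=60000, burn_in=5000, seed=0):
    """Unadjusted Langevin algorithm for the toy model y = x + N(0, noise_var)."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for k in range(n_iter):
        grad_lik = (y - x) / noise_var                 # gradient of the log-likelihood
        grad_prior = tweedie_score(x, prior_var, eps)  # denoiser-based prior score
        noise = math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        x = x + step * (grad_lik + grad_prior) + noise
        if k >= burn_in:
            samples.append(x)
    return samples

y, noise_var, prior_var = 1.5, 0.5, 1.0
samples = ula_posterior_samples(y, noise_var, prior_var)

# Closed-form posterior of the unsmoothed Gaussian model, for comparison.
post_var = 1.0 / (1.0 / prior_var + 1.0 / noise_var)
post_mean = post_var * y / noise_var
est_mean = sum(samples) / len(samples)
print(f"posterior mean: exact {post_mean:.3f}, Langevin estimate {est_mean:.3f}")
```

The estimate is only approximate: the prior is smoothed by `eps`, the discretisation adds a bias, and the Monte Carlo average is noisy.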
Author: Ivan V. Bajić
Simon Fraser University, Canada
Visual content is increasingly being used for more than human viewing. For example, traffic video is automatically analyzed to count vehicles, detect traffic violations, estimate traffic intensity, and recognize license plates; images uploaded to social media are automatically analyzed to detect and recognize people, organize images into thematic collections, and so on; visual sensors on autonomous vehicles analyze captured signals to help the vehicle navigate, avoid obstacles and collisions, and optimize its movement. The sheer amount of visual content used for purposes other than human viewing demands rethinking the traditional approaches to image and video compression.
This tutorial is about techniques for compressing images and video for multiple purposes besides human viewing. We will start the first part of the tutorial by reviewing early attempts at tackling multi-task usage of compressed visual content. We will discuss several representative problems in “compressed-domain” image and video analysis, such as interest-point detection, face and person detection, saliency detection, and object tracking. We will briefly mention several MPEG standards for encoding features related to image and video analysis, such as Compact Descriptors for Visual Search (CDVS) and Compact Descriptors for Video Analysis (CDVA).
The second part of the tutorial is devoted to recent learning-based image and video compression methods, which offer much more flexibility for multi-task compression. We will review some basic concepts from information theory that will help in appreciating the subsequent material. We will then present several recent Deep Neural Network (DNN) models for image and video compression and show how they might be used in multi-task compression. We will also discuss task scalability and privacy in the context of multi-task compression. Finally, recent standardization activities related to multi-task compression, such as JPEG AI and MPEG Video Coding for Machines (VCM), will be reviewed.
Authors: Zhu Li (1), Zhan Ma (2), Shan Liu (3), Xiaozhong Xu (3), Xiang Zhang (3)
(1) University of Missouri, Kansas City, USA
(2) Nanjing University, Jiangsu, China
(3) Tencent Media Lab, USA
Abstract: Point cloud data arises from 3D sensing and capturing for autonomous driving, navigation and smart-city applications, as well as from VR/AR playback and immersive visual communication applications. Recent advances in sensor technologies and algorithms, especially LiDAR and 77 GHz mmWave radar systems and ultra-high-resolution RGB camera arrays, have brought point cloud acquisition and processing closer to wide adoption in real-world applications. Given that point cloud data often present an excessive amount of random, unstructured points in 3D space, efficient compression of point clouds is highly desirable for their successful deployment, especially for networked services. In this tutorial we will review the latest advances in point cloud processing and compression, for both standard-based and learning-based frameworks, including advanced 3D motion models, deep-learning-based deblocking, end-to-end learning-based compression of point clouds, as well as QoE metrics. The tutorial is based on a series of recent publications listed in the reference.
The tutorial is organized into the following 3 sections:
Multiscale Sparse Tensor based Point Cloud Geometry Compression: this section presents a unified Point Cloud Geometry (PCG) compression method that processes voxelized PCG as multiscale sparse tensors, referred to as SparsePCGC. The proposed SparsePCGC adopts a multiscale representation to compress scale-wise MP-POVs (Most-Probable Positively-Occupied Voxels) by extensively and flexibly exploiting cross-scale and same-scale correlations. The compression efficiency therefore depends heavily on the accuracy of the estimated occupancy probability for each MP-POV, i.e., p(MP-POV). We first design Sparse Convolution based Neural Networks (SparseCNN) that stack sparse convolutions and voxel sampling to best characterize and embed spatial correlations. We then develop a SparseCNN based Occupancy Probability Approximation (SOPA) model to estimate the p(MP-POV)s, either in a single-stage manner using only the cross-scale correlation, or in a multi-stage manner by exploiting, stage by stage, the correlation among same-scale neighbors. In addition, we suggest a SparseCNN based Local Neighborhood Embedding (SLNE) that aggregates local variations as a spatial prior in the feature attributes to improve the SOPA. Our unified approach not only shows state-of-the-art performance in both lossless and lossy compression modes across a variety of datasets, including dense object PCGs (8iVFB, Owlii, MUVB) and sparse LiDAR PCGs (KITTI, Ford), when compared with the standardized MPEG G-PCC and other prevalent learning-based compression schemes, but also has low complexity, which is attractive for practical applications.
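To make concrete why the accuracy of the estimated occupancy probability drives compression efficiency, the sketch below computes the ideal arithmetic-coding cost of a few binary occupancy flags under two probability models. The flags, the probabilities, and the function name are made-up illustrative values, not SparsePCGC outputs.

```python
import math

def occupancy_bits(occupancy, probs):
    """Ideal arithmetic-coding cost, in bits, of binary occupancy flags
    when the coder is driven by the estimated probabilities p(occupied)."""
    total = 0.0
    for occ, p in zip(occupancy, probs):
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
        total += -math.log2(p) if occ else -math.log2(1.0 - p)
    return total

occupancy = [1, 0, 1, 1, 0, 0, 1, 0]                     # hypothetical voxel flags
uninformed = [0.5] * len(occupancy)                      # no context at all
accurate = [0.9, 0.1, 0.8, 0.9, 0.2, 0.1, 0.9, 0.1]      # a well-calibrated estimate

bits_uninformed = occupancy_bits(occupancy, uninformed)
bits_accurate = occupancy_bits(occupancy, accurate)
print(f"uninformed model: {bits_uninformed:.2f} bits, accurate model: {bits_accurate:.2f} bits")
```

With a uniform model each flag costs exactly one bit; the sharper the (correct) probability estimate, the fewer bits the entropy coder needs.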
Point Cloud Upscaling and Dynamic Point Cloud Compression: in this section, a point cloud upscaling scheme built on a sparse convolutional network backbone and a dynamic point cloud coding scheme based on motion compensation are introduced. Due to the increased popularity of augmented and virtual reality experiences, the interest in capturing high-resolution real-world point clouds has never been higher. Recovering the details and irregularities of point cloud geometry that can be lost during the capturing, processing, and compression pipeline is a very challenging problem. Current upsampling methods suffer from several weaknesses, especially on dense, real-world, photo-realistic point clouds. In this work, we present a novel geometry upsampling technique, PU-Dense, which can process a diverse set of point clouds including synthetic mesh-based point clouds, real-world high-resolution point clouds, real-world indoor LiDAR scanned objects, as well as outdoor dynamically acquired LiDAR-based point clouds. PU-Dense utilizes a UNet-like architecture with a highly efficient sparse convolutional network backbone that learns the point cloud geometry occupancy at multiple scales. The architecture is memory efficient and driven by a binary voxel occupancy classification loss that allows it to process high-resolution dense point clouds with millions of points at inference time. Qualitative and quantitative experimental results show that PU-Dense outperforms the current state of the art by a large margin while having much lower inference time complexity. For dynamic point cloud geometry compression, modeling dynamic geometry with non-uniformly sampled points is a key challenge. We developed a solution that utilizes sparse convolutions with hierarchical multiscale 3D feature learning to encode the current frame using the previous frame.
We employ convolution on target coordinates to map the latent representation of the previous frame to the downsampled coordinates of the current frame, in order to predict the current frame’s feature embedding. The residual of the predicted features is then coded with a learned entropy model and an arithmetic coder. The solution is adopted in the recent MPEG AI-based point cloud compression experiments and it outperforms both G-PCC and V-PCC, achieving more than 91% BD-Rate (Bjøntegaard Delta Rate) reduction against G-PCC, more than 62% BD-Rate reduction against the V-PCC intra-frame encoding mode, and more than 52% BD-Rate savings against the V-PCC P-frame-based inter-frame encoding mode using HEVC.
Video-based point cloud/mesh compression and optimizations: this section covers video-based point cloud and mesh compression algorithms in state-of-the-art standards, as well as some encoder optimization techniques that enable real-time implementations in a wide range of applications. Video-based representation methodologies map 3D objects onto 2D planes by either orthogonal projection along the surface normal (mostly for 3D point clouds) or a 2D bijective, topology-preserving mapping (a.k.a. UV parameterization, for 3D meshes). These 2D planes may include an occupancy map, a geometry map and attribute maps that can be coded by existing video codecs, and the 3D objects can be easily reconstructed from the decoded video frames on the decoder side. The overall coding performance is largely determined by the 3D-to-2D mapping method, because the mapping may break the continuity present in the original 3D space. A good mapping algorithm tries to maximize the spatial and temporal consistency of the 2D planes while minimizing the distortion introduced by the 3D-to-2D mapping. This talk will briefly cover these key technologies in MPEG V-PCC and the emerging V-Mesh standards, and introduce some examples of how to optimize them for the best speed and coding efficiency in real products.
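The projection idea can be sketched in a few lines. The toy example below projects integer points along a single fixed axis onto one image plane and builds an occupancy map and a geometry (depth) map. This is a deliberate simplification and an assumption for illustration: actual video-based codecs segment the cloud into patches and project each along its own direction, and the function names here are hypothetical.

```python
def project_to_maps(points, width, height):
    """Project integer (x, y, z) points along the z axis onto an x-y image plane.
    Returns an occupancy map and a geometry map holding the nearest depth."""
    occupancy = [[0] * width for _ in range(height)]
    depth = [[0] * width for _ in range(height)]
    for x, y, z in points:
        if not occupancy[y][x] or z < depth[y][x]:
            depth[y][x] = z  # keep the point nearest to the projection plane
        occupancy[y][x] = 1
    return occupancy, depth

def reconstruct(occupancy, depth):
    """Rebuild the 3D points that survived the projection."""
    return sorted((x, y, depth[y][x])
                  for y, row in enumerate(occupancy)
                  for x, occ in enumerate(row) if occ)

points = [(0, 0, 5), (1, 0, 4), (1, 0, 7), (2, 1, 3)]
occ_map, geo_map = project_to_maps(points, width=3, height=2)
recovered = reconstruct(occ_map, geo_map)
print(occ_map, geo_map, recovered)
```

The point (1, 0, 7) is hidden behind (1, 0, 4) and is lost in this single projection, which is a small instance of the distortion a mapping can introduce.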
Zhu Li is a Professor of Electrical & Computer Engineering with the Dept of CSEE, University of Missouri, Kansas City, US. He directs the NSF I/UCRC Center for Big Learning at UMKC. He received his PhD in Electrical & Computer Engineering from Northwestern University in 2004. He was an AFRL summer faculty fellow with the US Air Force Academy in 2016-18, 2020 and 2022, Sr. Staff Researcher/Sr. Manager with Samsung Research America's Multimedia Core Standards Research Lab in Dallas from 2012 to 2015, Sr. Staff Researcher at FutureWei from 2010 to 2012, Assistant Professor with the Dept of Computing, The Hong Kong Polytechnic University from 2008 to 2010, and a Principal Staff Research Engineer with the Multimedia Research Lab (MRL), Motorola Labs, Schaumburg, Illinois, from 2000 to 2008. His research interests include image/video analysis, compression, and communication, and the associated optimization and machine learning problems. He has 50+ issued or pending patents and 180+ publications in book chapters, journals, conference proceedings and standards contributions in these areas. He is the Associate Editor-in-Chief for IEEE Trans on Circuits & System for Video Tech, 2020~, and has served or is serving as Associate Editor for IEEE Trans on Image Processing (2019~), IEEE Trans on Multimedia (2015-18), and IEEE Trans on Circuits & System for Video Tech (2016~19). He received a Best Paper Award from IEEE Int’l Conf on Multimedia & Expo (ICME) at Toronto, 2006, and a Best Paper Award from IEEE Int’l Conf on Image Processing (ICIP) at San Antonio, 2007.
Zhan Ma is a Full Professor in the School of Electronic Science and Engineering at Nanjing University, Jiangsu, 210093, China. He received the B.S. and M.S. degrees from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 2004 and 2006 respectively, and the Ph.D. degree from New York University, New York, in 2011. From 2011 to 2014, he was with Samsung Research America, Dallas, TX, and Futurewei Technologies, Inc., Santa Clara, CA, respectively. His current research focuses on learnt visual data compression and networking, and computational imaging. He has 20 issued patents and 60+ publications in IEEE Transactions and conferences. He is an Associate Editor for IEEE Circuits and Systems Magazine, 2020~, and serves as area chair, special session chair, etc. for a number of multimedia processing related conferences. He is a co-recipient of 2018 ACM SIGCOMM Student Research Competition Finalist, 2018 PCM Best Paper Finalist, 2019 IEEE Broadcast Technology Society Best Paper Award, 2020 IEEE MMSP/JPEG Imaging Coding Grand Challenge Best Performing Solution, and 2020 SPIE ICMV Camera Illumination Estimation Contest First Prize.
Shan Liu received the B.Eng. degree in electronic engineering from Tsinghua University, and the M.S. and Ph.D. degrees in electrical engineering from the University of Southern California. She is a Distinguished Scientist and General Manager at Tencent, where she leads teams of Tencent Media Lab, Tencent Video and Tencent Games to build technologies and products serving over a billion users. She was formerly Director of the Media Technology Division at MediaTek USA, and was also formerly with MERL and Sony, among others. She has been a long-time contributor to international standards and has had numerous technical proposals adopted into various standards, such as VVC, HEVC, OMAF, DASH, MMT and PCC. She served as co-Editor of the H.265/HEVC SCC and H.266/VVC standards. She also served on the Editorial Board of IEEE Transactions on Circuits and Systems for Video Technology (2018-2021) and received the Best AE Award in 2019 and 2020. She is a vice chair of the IEEE Data Compression Standards Committee. She is an APSIPA Distinguished Industry Leader. She is a Fellow of the IEEE. She holds more than 500 granted US patents and has published more than 100 peer-reviewed papers and one book. Her research interests include audio-visual, volumetric, immersive and emerging media compression, intelligence, transport and systems.
Xiaozhong Xu has been a Principal Researcher and Senior Manager of Multimedia Technologies at Tencent Media Lab, Palo Alto, CA, USA, since 2017. He was with MediaTek USA Inc., San Jose, CA, USA, as a Senior Staff Engineer and Department Manager of Multimedia Technology Development from 2013 to 2017. Prior to that, he worked for Zenverge (acquired by NXP in 2014), a semiconductor company focusing on multi-channel video transcoding ASIC design, from 2011 to 2013. He also held technical positions at Thomson Corporate Research (now Technicolor) and Mitsubishi Electric Research Laboratories. His research interests lie in the general area of multimedia, including visual data coding, processing, quality assessment and transmission. He has been an active participant in multimedia data coding standardization activities for over fifteen years. He has successfully contributed to various standards including H.264/AVC and its extensions, AVS1 and AVS3 (China), HEVC and its extensions, MPEG-5 EVC and the most recent H.266/VVC standard. He has served as core experiment (CE) coordinator, ad-hoc group (AHG) chair and specification editor in various international and national video coding standards. He is also an active participant in volumetric visual data coding standards such as MPEG-VMesh, AoMedia-VVM and AVS-PCC. Xiaozhong Xu received the B.S. and Ph.D. degrees in electronics engineering from Tsinghua University, Beijing, China, and the M.S. degree in electrical and computer engineering from the Polytechnic School of Engineering, New York University, NY, USA.
Xiang Zhang received the B.S. degree in computer science from the Harbin Institute of Technology, Harbin, China, in 2013, and the Ph.D. degree in computer application technology from Peking University, Beijing, China, in 2019. He was a visiting student in the Information Processing Laboratory, University of Washington, Seattle, WA, USA, in 2017. He is currently a senior researcher with Tencent Media Lab, Palo Alto, CA, USA. He has authored and co-authored over 40 technical articles in refereed journals and proceedings in the areas of image/video coding, quality assessment and analysis. He has also made contributions to the MPEG, JPEG and AVS standard committees. His current research interests include volumetric data compression, image/video compression and 3D vision.
Author: C.-C. Jay Kuo
University of Southern California, USA
There has been a rapid development of artificial intelligence and machine learning technologies in the last decade. The core lies in a large amount of annotated training data and deep learning networks. Representative deep learning networks include the convolutional neural network, the recurrent neural network, the long short-term memory network, the transformer, etc. Although deep learning networks have made great impacts in various application domains such as computer vision, natural language processing, autonomous driving, robotics navigation, etc., they have several inherent shortcomings. They are mathematically intractable, vulnerable to adversarial attacks, and demand a huge amount of annotated training data. Furthermore, their training is computationally intensive because of the use of backpropagation for end-to-end network optimization.
There is an emerging concern that deep learning technologies are not friendly to the environment, since their carbon footprint contributes to global warming and climate change. As sustainability has become critical to human civilization, one priority in science and engineering is to preserve our environment for future generations. In the field of artificial intelligence, it is urgent to investigate new learning paradigms that are competitive with deep learning in performance yet have a significantly lower carbon footprint. Professor C.-C. Jay Kuo has worked towards this goal since 2014. He has published a sequence of influential papers along this direction (see the recent publication list) and coined the term “green learning” for this emerging field. By definition, green learning demands low power consumption in both training and inference. Besides, it has several attractive characteristics: small model sizes, fewer training samples, mathematical transparency, ease of incremental learning, etc. It is particularly attractive for mobile/edge computing.
I organized two tutorials on this topic at ICIP 2020 and ICIP 2021, respectively, to promote the importance of this emerging area. It has received more attention recently. I focused on the evolution of convolution layers to the unsupervised feature learning module in green learning at ICIP 2020. I presented the unsupervised feature learning module from the filter bank theory viewpoint and some application examples such as face biometrics and point cloud classification, segmentation and registration. For ICIP 2022, I will add two new learning modules and introduce new applications.
Authors: Jiayi Ma (1) and Xiao-Ping Zhang (2)
(1) Wuhan University, China
(2) Ryerson University, Canada
As the most extensive information carrier, images drive current artificial intelligence towards a better understanding of the world. However, a single type of image can hardly describe the imaging scene completely, which hinders deep learning technology in high-level semantic inference. In this context, many engineering, medical, remote sensing, environmental, national defense, and civilian applications need to combine information from various types of images to make more precise decisions. As a result, image fusion technology came into being. According to the differences among source images, typical image fusion scenarios can be divided into multi-modality image fusion, digital photography image fusion, and remote sensing image fusion. The diversity of source images and the complexity of the fusion scenario both pose new challenges to the development of algorithms. This tutorial will provide a basic understanding of image fusion as well as a comprehensive analysis of state-of-the-art solutions.
In the first part of the tutorial, a comprehensive overview of the problem will be given for three major categories: multi-modal image fusion, digital photography image fusion, and remote sensing image fusion. We will discuss different aspects of the above categories by considering the imaging principles, application areas, basic technology pipeline, datasets, and evaluation criteria. In the second, third, and fourth parts of the tutorial, we will focus on details of representative state-of-the-art solutions in each category for a deeper understanding of how to design successful image fusion systems. Moreover, we will also present comparative analyses of state-of-the-art solutions based on various pipelines to demonstrate intuitively the strengths of the different pipelines. In the last part of the tutorial, current challenges and future work in image fusion will be considered, such as non-registered image fusion, task-oriented image fusion, cross-resolution image fusion, real-time image fusion, and fusion quality assessment.
Author: Antonin Chambolle
Université Paris-Dauphine, CNRS, France
The goal of this tutorial is to review saddle-point methods for convex problems in optimization, which have been developed over almost 15 years. Most of the material which will be presented is not very new (except maybe some results on the computation of optimal transportation and some recent applications of the non-linear setting).
The tutorial will start by describing a few examples of (basic) optimization tasks for image reconstruction (based on elementary Bayesian models, such as segmentation, deblurring, medical imaging, Wasserstein distances or barycenters). These problems will be modeled as non-smooth convex minimization problems, involving for instance l1 norms, the Total Variation, etc.
Then, we will introduce, first in a Euclidean setting, the proximal map of a convex function, and introduce standard elementary splitting methods for solving composite minimization problems. This will lead to the introduction of the “PDHG” or stabilized Arrow-Hurwicz method as in . It will also be related to the proximal-point algorithm, following a remark of He and Yuan (2012). Before this, a very quick introduction to convex conjugacy will be necessary.
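For readers who have not seen a primal-dual iteration written out, the following is a minimal sketch of PDHG applied to one-dimensional total-variation denoising, min_u 0.5*||u - f||^2 + lam*||Du||_1, with D the forward-difference operator. The test signal, the value of lam, the step sizes (chosen so that tau*sigma*||D||^2 < 1, using ||D||^2 <= 4) and all function names are assumptions made for illustration; the tutorial's own examples may differ.

```python
def grad(u):
    """Forward differences: (Du)_i = u[i+1] - u[i]."""
    return [u[i + 1] - u[i] for i in range(len(u) - 1)]

def grad_adjoint(p, n):
    """Adjoint of grad, so that <Du, p> = <u, D^T p>."""
    out = [0.0] * n
    for i, pi in enumerate(p):
        out[i] -= pi
        out[i + 1] += pi
    return out

def tv_energy(u, f, lam):
    """Primal objective 0.5*||u - f||^2 + lam*||Du||_1."""
    fidelity = 0.5 * sum((ui - fi) ** 2 for ui, fi in zip(u, f))
    return fidelity + lam * sum(abs(d) for d in grad(u))

def pdhg_tv_denoise(f, lam, n_iter=500, tau=0.45, sigma=0.45):
    n = len(f)
    u = list(f)
    u_bar = list(f)
    p = [0.0] * (n - 1)
    for _ in range(n_iter):
        # Dual step: the prox of the conjugate of lam*|.|_1 is the projection onto [-lam, lam].
        p = [max(-lam, min(lam, pi + sigma * di)) for pi, di in zip(p, grad(u_bar))]
        # Primal step: the prox of tau*0.5*||. - f||^2 has a closed form.
        kt_p = grad_adjoint(p, n)
        u_new = [(ui - tau * ki + tau * fi) / (1.0 + tau)
                 for ui, ki, fi in zip(u, kt_p, f)]
        # Over-relaxation with theta = 1.
        u_bar = [2.0 * un - uo for un, uo in zip(u_new, u)]
        u = u_new
    return u

f = [0.0, 0.2, -0.1, 0.1, 1.0, 1.2, 0.9, 1.1]  # noisy step signal (made up)
lam = 0.3
u = pdhg_tv_denoise(f, lam)
print("energy of input :", round(tv_energy(f, f, lam), 4))
print("energy of output:", round(tv_energy(u, f, lam), 4))
```

Both proximal maps are available in closed form here, which is what makes the splitting attractive: the non-smooth term is handled entirely through a pointwise projection in the dual variable.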
In a second part, we will describe some extensions. First, we will try to explain how an O(1/N^2) acceleration can be obtained using varying steps and relaxation. This is the trickiest part, as it is a bit too technical for a 3-hour tutorial in this context, and one will need to find a simple way to introduce the main tricks which make the acceleration work without losing the audience. One will also focus on explaining the meaning of the rates which are obtained in terms of primal-dual gap or energies (a common error being to substitute in such estimates the test point (x,y) with a saddle-point (x∗,y∗), which in non-smooth problems gives an absolutely irrelevant criterion of optimality).
The other improvements and extensions we will recall are the explicit schemes of Condat (2013) and Vũ (2013), the step adaptation of Goldstein et al (2013), the linesearch variant of Malitsky and Pock (2016), and the generalization to smooth/non-smooth convex-concave coupling (Boţ et al, 2021). We will very quickly mention some stochastic extensions such as “APPROX” of Fercoq and Richtárik (2013), and maybe also , yet without details.
In the third section, we will introduce the non-linear setting for optimization in Banach spaces (or simply, finite dimensional optimisation with non-Euclidean norms). The idea is to review the definition of Bregman distances and the proximal Bregman algorithms. We will then show (without too many details, as it is identical to the Euclidean setting) that the theoretical results on the algorithm transfer to this setting with almost no difference, as shown in  (including acceleration in case a function is relatively strongly convex, yet this seems not widely useful). This will be illustrated at the end of the lecture with two applications:
• a comparison between the rates of convergence for solving (approximately) optimal transportation (assignment) problems, using the Euclidean and the entropy settings;
• the extension of primal-dual algorithms to problems min_u F(Ku) + G(u) with F smooth or G strongly convex, using instead of the “prox” the gradient of F (or G∗), as suggested by Lan and Zhou (2017). In that case, the notion of relative strong convexity is essential and one recovers in this way variants of the Nesterov/Tseng accelerated methods. A possibility is also, if time permits, to address the interesting issue of differentiating a loss with respect to the parameters of the algorithm, and in particular the coupling operator K between the primal and dual variables. A Piggyback method has been analysed in [6, 1], based on results on inexact algorithms , and is quite practical for problems where these parameters need to be learned.
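For reference, the standard definition of a Bregman distance and the two instances relevant to the Euclidean and entropy settings can be written as follows. The symbols ψ (a differentiable, strictly convex generating function), g, τ and x̄ are notation introduced here for illustration and may differ from the tutorial's own.

```latex
% Bregman distance generated by \psi
D_\psi(x, y) = \psi(x) - \psi(y) - \langle \nabla\psi(y),\, x - y \rangle

% Euclidean setting
\psi(x) = \tfrac{1}{2}\|x\|_2^2
\quad\Longrightarrow\quad
D_\psi(x, y) = \tfrac{1}{2}\|x - y\|_2^2

% Entropy setting (x, y with positive entries)
\psi(x) = \sum_i x_i \log x_i
\quad\Longrightarrow\quad
D_\psi(x, y) = \sum_i \Big( x_i \log\frac{x_i}{y_i} - x_i + y_i \Big)

% Bregman proximal step replacing the Euclidean prox
\hat{x} = \arg\min_x \; g(x) + \frac{1}{\tau}\, D_\psi(x, \bar{x})
```

With the Euclidean choice the last line is the usual proximal map; with the entropy choice the distance is the generalized Kullback-Leibler divergence, which is the natural geometry for transport plans with positive entries.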
We also hope to have time to discuss some numerical experiments towards the end, and at least to consider one example in particular and explain in detail how it is solved.
[1] Lea Bogensperger, Antonin Chambolle, and Thomas Pock. Convergence of a Piggyback-style method for the differentiation of solutions of standard saddle-point problems. Working paper or preprint, January 2022.
[2] Antonin Chambolle and Juan Pablo Contreras. Computational optimal transport using accelerated Bregman primal-dual algorithms. Preprint, 2022.
[3] Antonin Chambolle, Matthias J. Ehrhardt, Peter Richtárik, and Carola-Bibiane Schönlieb. Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications. SIAM J. Optim., 28(4):2783–2808, 2018.
[4] Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40:120–145, 2011.
[5] Antonin Chambolle and Thomas Pock. On the ergodic convergence rates of a first-order primal–dual algorithm. Math. Program., 159(1-2, Ser. A):253–287, 2016.
[6] Antonin Chambolle and Thomas Pock. Learning consistent discretizations of the total variation. SIAM J. Imaging Sci., 14(2):778–813, 2021.
[7] Julian Rasch and Antonin Chambolle. Inexact first-order primal-dual algorithms. Comput. Optim. Appl., 76(2):381–430, 2020.
Authors: Stanley Chan and Nicholas Chimitt
Purdue University, USA
Imaging through atmospheric turbulence is one of the fastest growing topics in computational photography, image processing, and computer vision. The challenge of doing research in this field is the steep learning curve of optics, which beginners often find difficult to manage. As the community grows, a tutorial on the subject presented in the context of image processing is not only timely, but also serves a pressing demand given the lack of an alternative. The proposed tutorial will be taught by researchers in computational photography with a strong track record in image processing and optics journals. The objective of the tutorial is to bridge the knowledge gap for participants in a number of upcoming major research programs such as IARPA’s BRIAR (launched) and CVPR 2022’s UG2+ challenge on turbulence.
The proposed tutorial aims at providing a working knowledge of the simulation and principles of imaging through turbulence, with the only requirement being familiarity with basic Electrical Engineering principles. The tutorial uses an appropriate balance of theory and programming to suit the ICIP audience, using live Python demos to give the audience some familiarity with the concepts. Python code will be available for download and will contain multiple tunable parameters, with suggested inputs, so that those in attendance may change parameters and become accustomed to these concepts through experience while following along.
The course is designed for three hours. Each hour will cover one sub-topic: Fourier optics, atmospheric turbulence simulation, and reconstruction.
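As a small taste of the Fourier optics sub-topic, the sketch below computes a one-dimensional point spread function as the squared magnitude of the discrete Fourier transform of a complex pupil, and shows that a linear phase (tilt) across the aperture shifts the image. It is an illustrative toy under stated assumptions (1D aperture, a deterministic tilt instead of a random turbulent phase, hypothetical names and sizes) and is not the tutorial's downloadable code.

```python
import cmath
import math

def psf_1d(pupil, phase):
    """Incoherent PSF: squared magnitude of the DFT of pupil * exp(i * phase),
    normalised to unit sum."""
    n = len(pupil)
    field = [a * cmath.exp(1j * ph) for a, ph in zip(pupil, phase)]
    psf = []
    for x in range(n):
        amp = sum(field[k] * cmath.exp(-2j * math.pi * k * x / n) for k in range(n))
        psf.append(abs(amp) ** 2)
    total = sum(psf)
    return [v / total for v in psf]

n = 32
pupil = [1.0 if k < 8 else 0.0 for k in range(n)]         # open aperture of 8 samples
flat = [0.0] * n                                          # no aberration
shift = 3                                                 # tilt strength, in pixels
tilt = [2.0 * math.pi * shift * k / n for k in range(n)]  # linear phase ramp

psf_flat = psf_1d(pupil, flat)
psf_tilt = psf_1d(pupil, tilt)
peak_flat = psf_flat.index(max(psf_flat))
peak_tilt = psf_tilt.index(max(psf_tilt))
print("peak without tilt:", peak_flat, "| peak with tilt:", peak_tilt)
```

In a turbulence simulator the phase would be a random field drawn anew for each frame and image location, so the PSF both shifts and blurs; `shift` is one of the parameters worth changing to see the effect.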