Industry Demos



Video Quality Metrics and Their Industry Applications


Video quality metrics and automated video quality assessment have gained momentum in the video industry, thanks to the continuous accuracy improvements and complexity reductions of available tools. Given the diverse spectrum of video applications and their ecosystems, it is valuable to share experiences and lessons learned on how companies apply these tools in building and operating products and services. This panel brings together experts from industry to discuss the following topics:

  • How video streaming and conferencing companies build quality metrics into their workflow and how they meet their business needs

  • How codec vendors use automated quality assessment to optimize their encoder performance

  • The challenges and opportunities of the traditional psychovisual-based approach vs. the neural net-based approach

  • Next unsolved impactful problems in perceptual quality assessment

Prospective invitees to the panel include, but are not limited to, Meta, Google, Zoom, Amazon, Snap, Disney, Twitter, Harmonic, Beamr, WaveOne, SSIMWAVE and Visionular.

Organizers: Zhi Li, Lukas Krasula, Anne Aaron, Netflix


Reliability and Interpretability of Vision Models


In Progress


End-to-End Delivery of VVC Multicast Services over 5G Mobile Network


This industrial demo showcases an end-to-end live video delivery chain leveraging Versatile Video Coding (VVC) and multicast-ROUTE over a 5G radio access network. The VVC encoder, provided by Ateme, achieves live encoding of a complete OTT ladder, from SD to 4K, packaged into CMAF using low-latency chunks and published on a local origin server. The multicast server, deployed upstream of the base station, is provided by GPAC and performs ROUTE encapsulation of the CMAF services pushed by the encoder to the origin server. The multicast bitstreams are ingested by an Amarisoft Callbox providing a 4G-LTE and 5G-NR core network and Radio Access Network (RAN), enabling delivery of LTE-Broadcast or unicast services to smartphones. Playback of the services is achieved on a 5G smartphone running both a multicast client and a VVC-compatible player, in an interactive manner (dynamic quality selection). The multicast client is provided by GPAC, and the VVC decoding library is Fraunhofer HHI's VVdeC, optimized for ARM. The demonstration highlights how these emerging technologies can be deployed together to enable next-generation video services over 5G mobile networks.

Link to website:

 

Short bio:

  • Thibaud Biatek received the Ph.D. degree in signal and image processing from the Institut National des Sciences Appliquées, Rennes, France, in 2016. From 2013 to 2017, he was a Doctoral and Post-Doctoral Fellow with TDF, Cesson-Sévigné, France. From 2017 to 2019, he was a Video Coding Expert with TDF, where he was involved in MPEG and DVB standardization activities. In 2019, he was a Senior Engineer with Qualcomm working on VVC standardization. Since 2020, he has been Director of Technology and Standards with ATEME, working on partnership projects and multimedia standards, contributing to MPEG, DVB, 5GMAG, AOM and 3GPP groups. His research interests include compression, processing, and delivery of audiovisual signals over broadcast and broadband networks.
  • Christophe Burdinat is the director of technologies and standards at Ateme, Paris, France. With more than 15 years of experience in the mobile broadcast industry, from DVB-H/ATSC-M/H/ISDB-Tmm to LTE-Broadcast and 5G-Broadcast, he drives, in his different roles, product development, collaborative R&D projects, and standardization management. He is a regular delegate and contributor at 3GPP, Digital Video Broadcasting (DVB), and the Streaming Video Alliance (SVA). His expertise covers content delivery protocols, mission-critical/public safety services over mobile networks, multicast adaptive bitrate (ABR), DVB-I, 5G Broadcast, and over-the-top (OTT) service distribution.
  • Mickaël Raulet is CTO at ATEME, where he drives research and innovation through various collaborative R&D projects. He represents ATEME in several standardization bodies: ATSC, DVB, 3GPP, ISO/IEC, ITU, MPEG, DASH-IF, CMAF-IF, SVA and UHDForum. He is the author of numerous patents and more than 100 conference and journal papers. In 2006, he received his Ph.D. in electronics and signal processing from INSA, in collaboration with Mitsubishi Electric ITE (Rennes, France).
  • Adam Wieckowski received the M.Sc. degree in computer engineering from the Technical University of Berlin, Berlin, Germany, in 2014. In 2016, he joined the Fraunhofer Institute for Telecommunications – Heinrich Hertz Institute, Berlin, as a Research Assistant. He worked on the development of the software that later became the test model for VVC development, and made several technical contributions during the standardization of VVC. Since 2019, he has been a Project Manager coordinating the technical development of decoder and encoder solutions for the VVC standard.
  • Benjamin Bross received the Dipl.-Ing. degree in electrical engineering from RWTH Aachen University, Aachen, Germany, in 2008. In 2009, he joined the Fraunhofer Institute for Telecommunications – Heinrich Hertz Institute, Berlin, Germany, where he is currently heading the Video Coding Systems group at the Video Coding & Applications Department; in 2011, he became a part-time lecturer at the HTW University of Applied Sciences Berlin. Since 2010, Benjamin has been actively involved in the ITU-T VCEG | ISO/IEC MPEG video coding standardization processes as a technical contributor, coordinator of core experiments, and chief editor of the High Efficiency Video Coding (HEVC) standard [ITU-T H.265 | ISO/IEC 23008-2] and the new Versatile Video Coding (VVC) standard [ITU-T H.266 | ISO/IEC 23090-3]. In addition to his involvement in standardization, his group develops standard-compliant software implementations, including an HEVC live software encoder currently deployed in broadcast for HD and UHD TV channels and, most recently, the open and optimized VVC software implementations VVenC and VVdeC. Benjamin Bross is an author or co-author of several fundamental HEVC- and VVC-related publications, and an author of two book chapters on HEVC and inter-picture prediction techniques in HEVC. He received the IEEE Best Paper Award at the 2013 IEEE International Conference on Consumer Electronics – Berlin, the SMPTE Journal Certificate of Merit in 2014, and an Emmy Award at the 69th Engineering Emmy Awards in 2017 as part of the Joint Collaborative Team on Video Coding for its development of HEVC.
  • Jean Le Feuvre received his Ingénieur (M.Sc.) degree in Telecommunications in 1999 from TELECOM Bretagne. He has been involved in MPEG standardization since 2000, first for his NYC-based startup Avipix, llc, and joined TELECOM Paris in 2005 as a Research Engineer within the Image, Data and Signal Department. His main research topics cover multimedia authoring, delivery, and rendering systems in broadcast, broadband, and home networking environments. He is the project leader and maintainer of GPAC, a multimedia framework based on standard technologies (MPEG, W3C, IETF). He is the author of many scientific contributions (peer-reviewed journal articles, conference papers, book chapters, patents) in the field and is editor of several ISO standards.

Live is Life: Efficient Two-pass Per-title Encoding for Adaptive Live Streaming


According to the Bitmovin Video Developer Report 2021, live streaming at scale has the highest scope for innovation in video streaming services. Currently, no open-source implementations are available that can predict video complexity for live streaming applications. In this light, we demonstrate the functions of the VCA software (https://vca.itec.aau.at): we show the accuracy of the complexity features analyzed by VCA using heatmaps and showcase the speed of the analysis. VCA achieves an analysis speed of about 370 fps, compared to roughly 5 fps for the reference SITI implementation, and is therefore fast enough for live streaming applications.
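For context on what the reference SITI implementation measures, the spatial information (SI) and temporal information (TI) features defined in ITU-T P.910 can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration of the classic measures, not the VCA implementation; the function names are ours:

```python
import numpy as np
from scipy import ndimage

def spatial_information(frame):
    """SI per ITU-T P.910: std-dev of the Sobel-filtered luma frame."""
    f = frame.astype(np.float64)
    sob_h = ndimage.sobel(f, axis=0)
    sob_v = ndimage.sobel(f, axis=1)
    return np.sqrt(sob_h ** 2 + sob_v ** 2).std()

def temporal_information(frame, prev_frame):
    """TI per ITU-T P.910: std-dev of the luma frame difference."""
    return (frame.astype(np.float64) - prev_frame.astype(np.float64)).std()
```

The per-sequence SI/TI values are then the maxima of these per-frame measures; fast analyzers such as VCA compute comparable complexity features with heavily optimized (e.g. SIMD, multi-threaded) kernels rather than per-pixel Python.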

In the demo, we also showcase an application of VCA in detail: optimized CRF prediction for adaptive streaming, presented at ICIP'22 (Paper ID: 2030). This scheme improves the compression efficiency of conventional ABR encoding for live streaming.

Link to website:

 

Short bio:

  • Hadi Amirpour is a postdoctoral research fellow at the Christian Doppler (CD) Laboratory ATHENA, based at the University of Klagenfurt. He received his Ph.D. in computer science from the University of Klagenfurt in 2022. He received two B.Sc. degrees, in Electrical and Biomedical Engineering, and his M.Sc. degree in Electrical Engineering from the K. N. Toosi University of Technology. He was involved in the project EmergIMG, a Portuguese consortium on emerging imaging technologies, funded by the Portuguese funding agency and H2020. His research interests include video streaming, image and video compression, quality of experience, emerging 3D imaging technology, and medical image analysis. Further information at https://hadiamirpour.github.io.

 

Contributors:

  • Hadi Amirpour, University of Klagenfurt, Austria (hadi.amirpour@aau.at)
  • Vignesh V Menon, University of Klagenfurt, Austria (vignesh.menon@aau.at)
  • Christian Feldmann, Bitmovin, Austria (christian.feldmann@bitmovin.com)
  • Christian Timmerer, Bitmovin, Austria (christian.timmerer@bitmovin.com)

Object Detection at GPU speeds on a Consumer Laptop: CPU-Accelerated Sparse Networks


In this demo, we will demonstrate the power of compound sparsity for model compression and inference speedup in CV and NLP applications. The DeepSparse engine, which is optimized for executing sparse graphs on CPU hardware, will be used to run live object detection (YOLOv5) at 80 FPS on a consumer laptop. We will show participants how custom models for most tasks can be sparsified using the open-source library SparseML, applying compound sparsity while maintaining 99% of the dense model's baseline accuracy. The techniques used combine structured and unstructured pruning (to 90%+ sparsity), quantization, and knowledge distillation. In addition, we will demonstrate an array of off-the-shelf sparse models running in real time, including object segmentation (YOLACT) and sentiment analysis (HuggingFace BERT).
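To give a concrete sense of the unstructured pruning step mentioned above, the sketch below zeros out the smallest-magnitude entries of a weight tensor until a target sparsity is reached. This is a generic NumPy illustration of magnitude pruning, not the SparseML API; in practice the sparsity is ramped up gradually during fine-tuning rather than applied in one shot:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Unstructured magnitude pruning: zero the smallest-magnitude
    entries so that roughly `sparsity` of the tensor is zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned
```

Engines like DeepSparse exploit the resulting zero pattern to skip work at inference time, which is where the CPU speedup comes from.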

Link to the website:

 

Short bio:

  • Konstantin Gulin is a Machine Learning Engineer at Neural Magic working on bringing sparse computation to the forefront of industry. With prior experience in applying machine learning to remote sensing (NASA) and space mission simulation (The Aerospace Corporation), he’s turned his focus to enabling effective model deployment in even the most constrained environments. He’s passionate about technology and ethical engineering and strives for the thoughtful advancement of AI.
  • Damian is an engineer, roboticist, software developer, and problem solver. He has previous experience in autonomous driving (Argo AI), AI in industrial robotics (Arrival), and building machines that build machines (Tesla). He currently works at Neural Magic, focusing on the sparse future of AI computation, and works towards unlocking creative and economic potential with intelligent robotics while avoiding the uprising of sentient machines.

POKAIOK: A Standalone Machine Learning Cognitive Assistant for Visual Inspection in Production Lines


The POKAIOK system automatically detects assemblies on the production line and raises alerts when anomalies are detected. The algorithms that analyze the images are trained at the edge, from only a few images, using a user-friendly graphical interface. No technical knowledge is required.

Link to the website:

 

Short bio:

  • Pierre Besset is a French engineer who obtained his M.Sc. in Mechatronic Systems from Lancaster University in 2013 and his Ph.D. in industrial robotics from Arts et Métiers in 2017. Since 2018, Pierre has worked on computer vision and anomaly detection for production and assembly lines. Since 2020, he has been head of R&D at BUAWEI.

A demonstration of neural-coded video decoding on mobile devices


Demo setup: We show an inter-frame neural video decoder running on a commercial mobile phone, decompressing high-definition videos in real time while maintaining high visual quality. The demo interface displays technical information, including frames/s, MB/s, etc., during playback. A companion video with narration is available at https://www.youtube.com/watch?v=WUnlSHenr08.

Research and technical information: A presentation about the demo describes the research work (including ICIP 2022 paper #1480, “Optimized learned entropy coding parameters for practical neural-based image and video compression”) and the techniques and tools used to convert research results into software that can exploit the neural acceleration hardware already available in commercial mobile devices and support its requirements for low-precision weights and arithmetic. This is also covered in the paper “MobileCodec: Neural Inter-frame Video Compression on Mobile Devices,” Proc. 2022 ACM Multimedia Systems Conf. (https://dl.acm.org/doi/10.1145/3524273.3532906, https://arxiv.org/abs/2207.08338).
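As a rough illustration of the low-precision requirement mentioned above, mobile neural accelerators typically expect 8-bit integer weights. The sketch below shows plain symmetric per-tensor int8 quantization; it is a generic example, not the MobileCodec scheme, and the function names are ours:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8.
    Returns the quantized tensor and the scale needed to dequantize."""
    scale = float(np.abs(weights).max()) / 127.0
    if scale == 0.0:
        return np.zeros_like(weights, dtype=np.int8), 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to an approximation of the original floats."""
    return q.astype(np.float32) * scale
```

The rounding error per weight is bounded by half a quantization step (scale / 2), which is why networks usually need quantization-aware training or calibration to preserve rate-distortion performance at 8 bits.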

Link to the website:

 

Short bio:

  • Amir Said is a principal engineer at Qualcomm AI Research in San Diego, CA, USA. His research interests include signal processing and machine learning for multimedia applications. While at Qualcomm Technologies, he participated in the development of conventional video codecs such as H.266/VVC, and now studies the use of machine learning and neural networks for video compression and their efficient implementation in mobile devices. Dr. Said is a Fellow of the IEEE, a recipient of two IEEE Best Journal Paper awards, and an elected member of IEEE Signal Processing Society Technical Committees. He has been regularly attending and publishing at IEEE ICIP since 1999 (https://scholar.google.com/citations?user=iRTzPLoAAAAJ&hl=en), and was glad to have had opportunities to help in its organization and support.

Image Processing Online


This demo is not proposed by a company, but it is partly the result of industrial interaction and is directed toward dual industrial and academic use. IPOL is a research journal of image processing and image analysis that emphasizes the role of mathematics as a source for algorithm design and the reproducibility of research. Each article contains a text on an algorithm and its source code, with an online demonstration facility and an archive of experiments. Text and source code are peer-reviewed, and the demonstration is controlled. IPOL is an Open Science and Reproducible Research journal.

In this demo, we plan to interact with:

  • Industry practitioners: startups in particular may be interested in exploring the more than 250 papers and online demos at IPOL, which cover most of the classic image and video processing problems. The new IPOL installation supports keyword search, allowing users to very quickly find the relevant papers, which speeds up online experimentation.
  • Academics: we will be ready to answer any request by visitors to test the relevant papers directly online on their own images and to illustrate how this can speed up industrial development. On demand, we will also illustrate for interested users the progress reached on certain classic image processing problems by deep learning algorithms, as well as novel applications of deep learning that we have installed online (about 40 deep learning online demos will be available).

 

Link to the website:

 

Short bio:

  • Jean-Michel Morel is a professor of mathematics at École Normale Supérieure Paris-Saclay. He works on theories and algorithms for the restoration and automatic analysis of digital images and video. In 2011, his team founded Image Processing On Line (www.ipol.im), the first journal publishing reproducible algorithms in online executable articles. IPOL has collaborators in 15 universities, and its public archives contain 300,000 online experiments.
  • Yanhao Li is a laureate of the Shanghai Jiao Tong University – Mines ParisTech double-degree program, with a Bachelor's and Master's in Electronics and Communication Engineering. He then completed a research master's degree at Jiao Tong and is currently preparing a Ph.D. on deepfake detection at École Normale Supérieure Paris-Saclay under the supervision of Rafael Grompone, Miguel Colom, and Jean-Michel Morel.

Codex: A Computer-Vision-Based Mobile App For Coding Education


In this demo, the educational Codex app is presented. Through the computer-vision-based app, users can arrange physical coding blocks to complete a variety of coding challenges. The Codex app connects the physical and digital worlds of learning and is currently aimed at K-8 students.

Link to the website: TBA

Short bio:

  • My name’s Nathan Elias and I’m a high school student at the Liberal Arts and Science Academy (LASA). I’m really interested in deep learning because I feel it can have meaningful impacts across many domains. I’ve been exploring the world of machine learning throughout high school, and I hope to continue pursuing this amazing field!

InvasiveAI: A Mobile App For Deep-Learning-Based Invasive Species Detection and Growth Prediction


In this demo, the invasive species detecting InvasiveAI app is presented. Through a smartphone camera, users can automatically detect hundreds of invasive species in the field. Furthermore, the InvasiveAI app also enables users to view future projections of invasive species. The app is being utilized by agricultural workers and citizen scientists worldwide.

Link to the website:

 

Short bio:

  • My name’s Nathan Elias and I’m a high school student at the Liberal Arts and Science Academy (LASA). I’m really interested in deep learning because I feel it can have meaningful impacts across many domains. I’ve been exploring the world of machine learning throughout high school, and I hope to continue pursuing this amazing field!