Integrating Image Processing with Large-Scale Vision/Language Models for Advanced Visual Understanding Workshop
(ICIP 2024)
Sunday 27 October
Introduction
This workshop aims to bridge the gap between conventional image processing techniques and the latest advancements in large-scale models (LLMs and LVLMs). In recent years, the integration of large-scale models into image processing tasks has shown significant promise in improving visual object understanding and image classification.
This workshop will provide a platform for researchers and practitioners to explore the synergies between conventional image processing methods and cutting-edge large language models and large vision-language models, fostering innovation and collaboration in the field.
Our objectives are as follows:
Explore the foundations of image processing techniques with large-scale models.
Investigate the current landscape of large-scale language/vision models and their capabilities.
Discuss challenges and opportunities in integrating large-scale models with image processing to enhance visual understanding.
Showcase practical examples and case studies where the combined approach has yielded superior results.
This workshop is designed for researchers, academics, and industry professionals working in the fields of image processing, computer vision, multimedia processing and natural language processing. Participants should have a basic understanding of image processing concepts and an interest in exploring innovative approaches for visual understanding.
The workshop will consist of paper presentations by leading experts in image processing and large-scale language/vision models. Participants will have the opportunity to engage in discussions, exchange insights, and collaborate on potential research projects.
Call for Papers
Prospective participants are invited to submit full papers (in ICIP format) related to the workshop themes. Topics of interest include but are not limited to:
Cross-Modal Fusion: Exploring methodologies for seamlessly integrating image processing with large-scale language/vision models to enhance visual understanding.
Object Detection and Recognition with Large-scale models: Investigating novel approaches for object detection and recognition by leveraging the capabilities of large-scale language/vision models.
Image Classification and Annotation: Discussing advancements in image classification and annotation using large-scale models, focusing on improving accuracy and efficiency.
Multimodal Sensor Fusion: Examining strategies for incorporating multisensory data into image understanding tasks with the aid of large-scale models.
Semantic Segmentation with Large-scale Models: Exploring how large-scale models can contribute to semantic segmentation tasks, particularly in understanding complex visual scenes.
Cross-Domain Visual Understanding: Addressing challenges and opportunities in applying large-scale models for visual understanding across different domains or modalities.
Visual Question Answering (VQA) Systems: Enhancing VQA systems through the integration of image processing techniques and LLMs to enable more accurate and context-aware responses.
Text-Image Linking and Alignment: Investigating methods for establishing meaningful connections between text and image data using LLMs, facilitating richer visual understanding.
Submission
We encourage submissions of up to 4 pages, excluding references and acknowledgements. Submissions should be in the ICIP format. Accepted papers will be published in the ICIP proceedings and IEEE Xplore.
Submission URL: https://cmsworkshops.com/ICIP2024/papers/submission.asp?Type=WS&ID=8
Important Dates
Paper Submission Deadline: May 9, 2024
Paper Acceptance Notification: June 6, 2024
Final Submission Deadline: June 19, 2024
Author Registration Deadline: July 11, 2024
Organizers
1. Yong Man Ro (KAIST, South Korea), e-mail address: ymro@kaist.ac.kr
Prof. Yong Man Ro earned his Ph.D. degree from the Department of Electrical Engineering at KAIST. He conducted research at various institutions, including Columbia University and the University of California at Irvine and at Berkeley. He also served as a visiting professor at the University of Toronto. He is currently a full professor in the School of Electrical Engineering and an ICT endowed chair professor at KAIST. He is also the director of the Center for Applied Research in Artificial Intelligence, the Image Video System Lab, and the Integrated Vision and Language Lab at KAIST. Prof. Ro has received notable recognition, including the Young Investigator Finalist Award from ISMRM in 1992 and the Scientist of the Year Award (Korea) in 2003. He has served as an associate editor for IEEE Signal Processing Letters and currently serves in the same role for IEEE Transactions on Circuits and Systems for Video Technology. He is also an IVMSP committee member in the IEEE Signal Processing Society. Moreover, he has played key roles in organizing numerous international conferences, including serving as the organizing chair/program chair of MMM 2020/PCM 2015 and IWDW 2004. He has also curated several special sessions, such as "Explainable Deep Neural Networks for Image/Video Processing" at ICIP 2021 and 2022, "Digital Photo Album Technology" at AIRS 2005, "Social Media" at DSP 2009, and "Human 3D Perception and 3D Video Assessments" at DSP 2011. Prof. Ro's recent research interests span various AI areas, including deep learning in computer vision and image processing; multimodal learning; the integration of vision, speech, and language for AI; and explainable and robust AI. His scholarly output includes over 520 peer-reviewed papers published in international journals and conferences.
2. Hak Gu Kim (Chung-Ang University, South Korea), e-mail address: hakgukim@cau.ac.kr
Prof. Hak Gu Kim received his Ph.D. degree from the Department of Electrical Engineering at Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2019. From 2019 to 2021, he was a postdoctoral researcher at École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland. He is currently an assistant professor at the Graduate School of Advanced Imaging Science, Multimedia & Films (GSAIM) at Chung-Ang University, South Korea. He served as the tutorial chair for the 2023 IEEE International Conference on Electronics, Information, and Communication (ICEIC). His research interests include deep learning and machine learning in 2D/3D/VR image and video processing and computer vision, human visual perception, multi-modal learning, and vulnerability of deep neural networks for convergence of AI and reality.
3. Nikolaos (Nikos) Boulgouris (Brunel University London, United Kingdom), e-mail address: nikolaos.boulgouris@brunel.ac.uk
Prof. Nikolaos (Nikos) Boulgouris is an academic with the Department of Electronic and Computer Engineering of Brunel University London. From 2004 to 2010, he was an academic member of staff with King's College London, and prior to that he was a researcher with the Department of Electrical and Computer Engineering of the University of Toronto, Canada. He has published more than 100 papers in international journals and conferences and has participated in numerous national and international research consortia. Dr. Boulgouris was on the organizing committee of six major IEEE conferences, and served as Technical Program Chair for the 2018 IEEE International Conference on Image Processing (ICIP). He served as Senior Area Editor for the IEEE Transactions on Image Processing and as Associate Editor for the IEEE Transactions on Circuits and Systems for Video Technology, from which he received the 2017 Best Associate Editor Award. He also served as Associate Editor for the IEEE Transactions on Image Processing and the IEEE Signal Processing Letters. He was co-editor of the book Biometrics: Theory, Methods, and Applications, which was published in the Wiley-IEEE Press Series on Computational Intelligence, and guest co-editor for two journal special issues. From 2020 to 2022, he served as an elected member of the IEEE Multimedia Signal Processing Technical Committee (MMSP-TC). From 2014 to 2019, he served as an elected member of the IEEE Image, Video, and Multidimensional Signal Processing Technical Committee (IVMSP-TC). Dr. Boulgouris is a Senior Member of the IEEE and a Fellow of the Higher Education Academy.
Supported by
This workshop is supported by the Center for Applied Research in Artificial Intelligence (CARAI).