Seminar Computer Vision WS'24/25
Seminar
Prof. Dr.-Ing. Martin Eisemann
Hörerkreis: Bachelor & Master
Kontakt: seminarcv@cg.cs.tu-bs.de
Modul: INF-STD-66, INF-STD-68
Vst.Nr.: 4216031, 4216032
Topic: Recent research in Visual Computing
Latest News
Schedule for the final talks on the 30.01.2025:
09:00 Examining the Use of VR as a Study Aid for University Students with ADHD
09:30 Detecting distracted students in educational VR environments using machine learning on eye gaze data
10:00 The effect of a virtual reality based intervention on processing speed and working memory in individuals with ADHD—A pilot-study
10:45 CAT3D: Create Anything in 3D with Multi-View Diffusion Models
11:15 Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes
13:00 KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
13:30 Neuralangelo: High-Fidelity Neural Surface Reconstruction
14:15 NeRF as a Non-Distant Environment Emitter in Physics-based Inverse Rendering
14:45 Neural Gaussian Scale-Space Fields
(old) Schedule for the fundamentals talks on the 05.12.2024:
09:00 NN and Architectures
09:40 VR Brain
10:10 NeRFs
10:40 Data Analysis Techniques
Content
In this seminar we discuss current research results in computer vision, visual computing, and image/video processing. Each participant's task is to understand a research topic and explain it to the other participants. In a block seminar in the middle of the semester, the background knowledge required for the final talks is presented in oral presentations; at the end of the semester, each participant presents their research topic in a final talk. The final talk must be rehearsed beforehand in front of another student, and their suggestions for improvement must be incorporated.
Participants
The course is aimed at bachelor's and master's students from the fields of computer science (Informatik), IST, business informatics (Wirtschaftsinformatik), and data science.
Registration takes place centrally via Stud.IP. The number of participants is initially limited to 8 students but can be extended at the kickoff if necessary.
Important Dates
All dates listed here must be adhered to. Attendance at all events is mandatory.
- 04.07.2024: Registration via Stud.IP
- 22.10.2024, 10:30-12:00: Kickoff Meeting (G30, ICG)
- 28.10.2024: End of the deregistration period
- 05.11.2024, 10:30-12:00, G30 (ICG): Gather topics for fundamentals talk
- 04.12.2024: Submission of presentation slides for fundamentals talk (please use the following naming scheme: Lastname_FundamentalsPresentation_SeminarCV.pdf)
- 05.12.2024, 09:00 - 12:00, G30 (ICG): Fundamentals presentations, Block
- By 22.01.2025: Trial presentation for the final talk (between tandem partners from the fundamentals talks)
- 29.01.2025: Submission of presentation slides for final talk (ALL participants!) (please use the following naming scheme: Lastname_FinalPresentation_SeminarCV.pdf)
- 30.01.2025, 09:00 - 15:00, G30 (ICG): Presentations - Block Event Part 1
Registered students may deregister until two weeks after the start of lectures at the latest. To deregister successfully, contact the seminar supervisor.
Submissions are made by email to seminarcv@cg.cs.tu-bs.de and your advisor, and, where applicable, to your tandem partner. Unless otherwise communicated, submissions are due by 11:59 pm on the submission day.
If you have any questions about the event, please contact seminarcv@cg.cs.tu-bs.de.
Format
- The topics for the final talks will be distributed amongst the participants during the Kickoff event.
- The topics for the fundamentals talks will be distributed amongst the participants during the second meeting.
- The topics will be presented in approximately 20-minute talks, each followed by a discussion; see Important Dates.
- For the on-site talks, either an institute laptop or your own laptop can be used. If you want to use an institute laptop, contact seminarcv@cg.cs.tu-bs.de in time, at least two weeks before the presentations; in this case, the presentation slides must be provided at least one week before the talk.
- The presentations will be given on site. Should they, for some reason, take place online, Big Blue Button will be used as the platform; students then need their own PC with a microphone, and video transmission during their own talk is desirable. If these requirements cannot be met, contact seminarcv@cg.cs.tu-bs.de in time.
- The language for the presentations can be either German or English.
- Giving the presentations is a mandatory requirement for passing the course.
Files and Templates
- Kickoff-Slides
- Slide-Template (optional usage)
Topics
- Examining the Use of VR as a Study Aid for University Students with ADHD
(Cuber et al.) CHI '24
Attention-deficit/hyperactivity disorder (ADHD) is a neurodevelopmental condition characterized by patterns of inattention and impulsivity, which lead to difficulties maintaining concentration and motivation while completing academic tasks. University settings, characterized by a high student-to-staff ratio, make treatments relying on human monitoring challenging. One potential replacement is Virtual Reality (VR) technology, which has shown potential to enhance learning outcomes and promote flow experience. In this study, we investigate the usage of VR with 27 university students with ADHD in an effort to improve their performance in completing homework, including an exploration of automated feedback via a technology probe.
https://dl.acm.org/doi/pdf/10.1145/3613904.3643021
Advisor: Anika Jewst
- Detecting distracted students in educational VR environments using machine learning on eye gaze data
(Asish et al.) Computers & Graphics
Virtual Reality has been found useful to improve engagement and retention level of students, for some topics, compared to traditional learning tools such as books, and videos. However, a student could still get distracted and disengaged due to a variety of factors including stress, mind-wandering, unwanted noise, and external alerts. Student eye gaze data could be useful for detecting these distracted students. Gaze data-based visualizations have been proposed in the past to help a teacher monitor distracted students. However, it is not practical for a teacher to monitor a large number of student indicators while teaching. To help filter students based on distraction level, we propose an automated system based on machine learning to classify students based on their distraction level.
https://www.sciencedirect.com/science/article/pii/S0097849322001856
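The idea of classifying distraction from gaze statistics can be illustrated with a toy sketch. The features (mean fixation duration and saccade rate), group means, and spreads below are made-up assumptions, and the nearest-centroid rule is only a stand-in illustration, not the classifier used by Asish et al.:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-student gaze features: [mean fixation duration (s), saccade rate (1/s)].
# Assumption for this sketch: distracted students show shorter fixations and more saccades.
focused    = rng.normal([0.35, 1.5], [0.05, 0.3], size=(50, 2))
distracted = rng.normal([0.15, 3.0], [0.05, 0.3], size=(50, 2))

X = np.vstack([focused, distracted])
y = np.array([0] * 50 + [1] * 50)  # 0 = focused, 1 = distracted

# Nearest-centroid classifier: assign each sample to the closer class mean.
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def classify(samples):
    d = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

accuracy = (classify(X) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

With well-separated synthetic groups this trivially reaches near-perfect accuracy; the paper's contribution lies in extracting such discriminative features from real eye-tracking streams and training proper classifiers on them.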
Advisor: Anika Jewst
- The effect of a virtual reality based intervention on processing speed and working memory in individuals with ADHD—A pilot-study
(Cunha et al.) Front. Virtual Real.
Attention deficit hyperactivity disorder (ADHD) is a neurodevelopmental disorder that manifests in children and adults and is characterized by high levels of inattention, hyperactivity, and impulsivity that often lead to multiple behavioral problems. For a considerable period, attention problems were considered to be the main neurological deficits underlying ADHD. However, current evidence states that executive function deficits are the central components of ADHD. This study aimed to evaluate the effectiveness of a virtual reality based intervention in processing speed and working memory in 25 students with ADHD symptomatology.
https://www.frontiersin.org/articles/10.3389/frvir.2023.1108060/full
Advisor: Anika Jewst
- CAT3D: Create Anything in 3D with Multi-View Diffusion Models
(Gao et al.) Preprint May 2024
CAT3D introduces a novel approach to 3D scene creation by simulating real-world capture with a multi-view latent diffusion model, significantly reducing the number of images required for high-quality 3D (re)construction. This model generates highly consistent novel views from any input configuration, which are then used in a robust NeRF-based 3D reconstruction pipeline, enabling 3D scene creation from even single image input.
https://cat3d.github.io/
Advisor: Jannis Möller
- Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes
(Takikawa et al.) CVPR 2021
NGLOD introduces a novel neural representation for real-time rendering of high-fidelity 3D shapes using signed distance functions (SDFs). This method significantly enhances rendering speed by 2-3 orders of magnitude compared to previous techniques, while maintaining high-quality geometry reconstruction. It achieves this through an octree-based feature volume and efficient algorithms for querying necessary levels of detail, making it an important advancement for real-time neural feature-volume based graphics applications.
https://research.nvidia.com/labs/toronto-ai/nglod/
Advisor: Jannis Möller
- KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs
(Reiser et al.) ICCV 2021
KiloNeRF advances the field of Neural Radiance Fields (NeRF) by drastically improving rendering speeds. It achieves this by employing thousands of small Multi-Layer Perceptrons (MLPs) instead of a single large one, thus enabling real-time rendering without compromising visual quality. This approach accelerates rendering by three orders of magnitude compared to traditional NeRF models, making high-quality, real-time 3D scene synthesis feasible on modern hardware.
https://creiser.github.io/kilonerf/
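The core idea, partitioning the scene into a uniform grid and giving each cell its own tiny MLP, can be sketched in a few lines. The grid resolution, layer sizes, and random (untrained) weights below are illustrative assumptions only; the real KiloNeRF uses trained, distilled networks, positional encodings, and a much finer grid:

```python
import numpy as np

rng = np.random.default_rng(1)
RES = 4          # grid resolution per axis (KiloNeRF uses far finer grids)
HIDDEN = 32      # tiny hidden width, just for illustration

# One independent tiny MLP per grid cell: 3 -> HIDDEN -> 4 (RGB + density).
n_cells = RES ** 3
W1 = rng.normal(0.0, 0.1, (n_cells, HIDDEN, 3))
b1 = np.zeros((n_cells, HIDDEN))
W2 = rng.normal(0.0, 0.1, (n_cells, 4, HIDDEN))
b2 = np.zeros((n_cells, 4))

def query(points):
    """Evaluate each 3D point in [0,1)^3 with the tiny MLP owning its cell."""
    idx = np.minimum((points * RES).astype(int), RES - 1)
    cell = (idx[:, 0] * RES + idx[:, 1]) * RES + idx[:, 2]  # flat cell index
    h = np.maximum(np.einsum('nij,nj->ni', W1[cell], points) + b1[cell], 0.0)  # ReLU
    return np.einsum('nij,nj->ni', W2[cell], h) + b2[cell]

out = query(rng.random((1000, 3)))
print(out.shape)  # (1000, 4)
```

The speedup comes from each query touching only one very small network instead of one large one; the batched `einsum` over gathered per-cell weights mimics how such lookups can be evaluated in parallel.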
Advisor: Jannis Möller
- Neuralangelo: High-Fidelity Neural Surface Reconstruction
(Li et al.) CVPR 2023
Neural surface reconstruction has been shown to be powerful for recovering dense 3D surfaces via image-based neural rendering. However, current methods struggle to recover detailed structures of real-world scenes. To address the issue, we present Neuralangelo, which combines the representation power of multi-resolution 3D hash grids with neural surface rendering. Our approach is enabled by two key ingredients: (1) numerical gradients for computing higher-order derivatives as a smoothing operation and (2) coarse-to-fine optimization on the hash grids controlling different levels of details. Even without auxiliary depth, Neuralangelo can effectively recover dense 3D surface structures from multi-view images with a fidelity that significantly surpasses previous methods, enabling detailed large-scale scene reconstruction from RGB video captures.
https://research.nvidia.com/labs/dir/neuralangelo/paper.pdf
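The first ingredient, numerical gradients, is easy to illustrate: central differences evaluate the field at offset points, so the finite step size acts like a small smoothing kernel on the gradient. A minimal sketch on an analytic sphere SDF (Neuralangelo applies this to a learned hash-grid field, where the smoothing effect matters):

```python
import numpy as np

def sdf_sphere(p, r=0.5):
    """Signed distance to a sphere of radius r centered at the origin."""
    return np.linalg.norm(p, axis=-1) - r

def numerical_gradient(f, p, eps=1e-3):
    """Central-difference gradient of a scalar field at points p (shape (N, 3)).
    The step eps controls how much neighborhood information flows into the
    gradient -- the smoothing effect the paper exploits for hash grids."""
    g = np.zeros_like(p)
    for i in range(p.shape[-1]):
        d = np.zeros(p.shape[-1])
        d[i] = eps
        g[..., i] = (f(p + d) - f(p - d)) / (2.0 * eps)
    return g

p = np.array([[0.3, 0.4, 0.0]])
n = numerical_gradient(sdf_sphere, p)
print(n)  # approx. the unit surface normal [0.6, 0.8, 0.0]
```

For an exact SDF the analytic and numerical gradients agree up to O(eps^2); the point of the paper is that for a discrete hash-grid field, the numerical version additionally averages over neighboring grid cells.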
Advisor: Fabian Friederichs
- NeRF as a Non-Distant Environment Emitter in Physics-based Inverse Rendering
(Ling et al.) SIGGRAPH 2024
Physics-based inverse rendering enables joint optimization of shape, material, and lighting based on captured 2D images. To ensure accurate reconstruction, using a light model that closely resembles the captured environment is essential. Although the widely adopted distant environmental lighting model is adequate in many cases, we demonstrate that its inability to capture spatially varying illumination can lead to inaccurate reconstructions in many real-world inverse rendering scenarios. To address this limitation, we incorporate NeRF as a non-distant environment emitter into the inverse rendering pipeline. Additionally, we introduce an emitter importance sampling technique for NeRF to reduce the rendering variance. Through comparisons on both real and synthetic datasets, our results demonstrate that our NeRF-based emitter offers a more precise representation of scene lighting, thereby improving the accuracy of inverse rendering.
https://dl.acm.org/doi/pdf/10.1145/3641519.3657404
Advisor: Fabian Friederichs
- Neural Gaussian Scale-Space Fields
(Mujkanovic et al.) SIGGRAPH 2024
Gaussian scale spaces are a cornerstone of signal representation and processing, with applications in filtering, multiscale analysis, anti-aliasing, and many more. However, obtaining such a scale space is costly and cumbersome, in particular for continuous representations such as neural fields. We present an efficient and lightweight method to learn the fully continuous, anisotropic Gaussian scale space of an arbitrary signal. Based on Fourier feature modulation and Lipschitz bounding, our approach is trained self-supervised, i.e., training does not require any manual filtering. Our neural Gaussian scale-space fields faithfully capture multiscale representations across a broad range of modalities, and support a diverse set of applications. These include images, geometry, light-stage data, texture anti-aliasing, and multiscale optimization.
https://neural-gaussian-scale-space-fields.mpi-inf.mpg.de/paper.pdf
Advisor: Fabian Friederichs
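The principle behind this last topic can be sketched in closed form: Gaussian filtering multiplies the signal component at frequency f by exp(-2 pi^2 sigma^2 f^2), so attenuating Fourier features by exactly this factor reproduces the blurred signal analytically. The two-frequency test signal below is a hand-picked assumption for illustration; the paper itself learns this behaviour self-supervised with a Lipschitz-bounded network over modulated Fourier features:

```python
import numpy as np

def fourier_features(x, freqs, sigma=0.0):
    """Fourier features of x, attenuated exactly as a Gaussian blur of
    standard deviation sigma attenuates each frequency: the Gaussian's
    Fourier transform damps frequency f by exp(-2*pi^2*sigma^2*f^2)."""
    damp = np.exp(-2.0 * (np.pi * sigma * freqs) ** 2)
    ang = 2.0 * np.pi * freqs[None, :] * x[:, None]
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=1) * np.tile(damp, 2)

x = np.linspace(0.0, 1.0, 256, endpoint=False)
freqs = np.array([2.0, 32.0])            # one low, one high frequency
coeffs = np.array([1.0, 1.0, 0.0, 0.0])  # signal = sin(4*pi*x) + sin(64*pi*x)

sharp = fourier_features(x, freqs, sigma=0.0) @ coeffs    # unfiltered signal
smooth = fourier_features(x, freqs, sigma=0.05) @ coeffs  # high frequency nearly gone
print(np.abs(sharp).max(), np.abs(smooth).max())
```

At sigma = 0.05 the f = 32 component is damped to essentially zero while the f = 2 component survives at about 82% amplitude, so `smooth` is the low-pass-filtered signal without ever running a discrete convolution.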
Useful Resources
Example of a good presentation (video on the website under the Presentation section; note how little text is needed and how much is visualized to create an intuitive understanding).
General writing tips for scientific papers (mainly intended for writing scientific articles, but also useful for summaries).