- 👋 Hi, I'm Teo (Timothy) Wu, a final-year PhD candidate at Nanyang Technological University 🇸🇬. Resume, Homepage
- 🌱 I'm currently working on the project Q-Future: Visual Evaluation with Foundation Models 📹
- See my top repos:
- 🧑‍🏫🧑‍🏫 Co-Instruct: Extending multi-modality foundation models to multi-image comparison for low-level vision and visual quality assessment. See its 📃paper and 🖥️repository~
- ✍️ Q-Align: A text-guided syllabus to teach multi-modality foundation models the most accurate visual rating ever! See its 📃paper and 🖥️repository~
- 🧑‍🏫 CVPR 2024, Q-Instruct: A dataset and a model zoo for multi-modality foundation models with improved abilities on low-level vision and visual quality assessment! See its 📃paper and 🖥️repository~
- 🌟 ICLR 2024 Spotlight, Q-Bench: A benchmark for multi-modality LLMs on low-level vision and visual quality assessment! See its 📃paper and 🖥️repository~
- 🔥 ICCV 2023, DOVER. TL;DR: the SOTA NR-VQA method, which predicts disentangled aesthetic and technical quality. A Colab demo is available.
- 🧰 TPAMI 2023 & ECCV 2022, End-to-End VQA Toolbox (FAST-VQA). TL;DR: an end-to-end video quality assessment toolbox that lets you develop your own methods; the official repo for FAST-VQA and FasterVQA!
- 📫 Reach me by e-mail: realtimothyhwu@gmail.com / haoning001@e.ntu.edu.sg, or on Twitter
- Google Scholar
🔥 News
- 2024.02.27: 🎉🎉 Q-Instruct was accepted by CVPR 2024! In my last year as a PhD student, I finally have a paper at this conference.
- 2024.01.16: 🎉🎉 Q-Bench was accepted by ICLR 2024 as a Spotlight paper!
- 2023.09.10: 🎉🎉 The extension of FAST-VQA (FasterVQA) was accepted by TPAMI (IF: 23.6)!
- 2023.07.27: 🎉🎉 I passed my PhD qualifying examination; now I am a PhD candidate!
- 2023.07.26: 🎉🎉 MaxVQA was accepted by ACM MM 2023 as an Oral paper (CCF-A)!
- 2023.07.14: 🎉🎉 DOVER was accepted by ICCV 2023 (CCF-A)!
- 2023.07.11: 🎉🎉 BUONA-VISTA was selected for oral presentation at ICME 2023 (CCF-B)!
- 2023.03.12: 🎉🎉 BUONA-VISTA was accepted by ICME 2023 (CCF-B)!
- 2023.02.28: 🎉🎉 DisCoVQA was accepted by TCSVT (IF: 8.4)!
- 2022.07.07: 🎉🎉 FAST-VQA was accepted by ECCV 2022 (CCF-B)!
📑 First-authored Projects
Preprints
Towards Open-ended Visual Quality Comparison
- We propose the first open-source LMM that can answer open-range questions on visual quality comparison and provide detailed reasoning. It learns from a merged variant of the Q-Instruct dataset and GPT-4V outputs on unlabeled image groups, and its results on simple tasks are even better than those of GPT-4V (one of its teachers)!
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels
- We prove that discrete text-defined rating levels (excellent/good/fair/poor/bad) are more effective than the scores themselves in teaching multi-modality foundation models to score.
- Without any additional design, it reaches state-of-the-art performance on IQA, IAA and VQA with remarkable gains! It also unifies the three tasks (mixing 5 training datasets) in one model (OneAlign)!
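To make the recipe concrete, here is a minimal sketch of the level-to-score conversion as I read it (not the official implementation; the five level words and the [1, 5] value range come from the description above, everything else is assumed):

```python
import torch

# Q-Align-style level-to-score conversion (a minimal sketch, not the official code).
# The LMM is prompted to rate the image, and we read the logits it assigns to
# the five rating-level words at the answer position.
LEVELS = ["excellent", "good", "fair", "poor", "bad"]
LEVEL_VALUES = torch.tensor([5.0, 4.0, 3.0, 2.0, 1.0])  # text levels -> numbers

def logits_to_score(level_logits: torch.Tensor) -> float:
    """Convert logits over the five rating levels into a scalar score in [1, 5]."""
    probs = torch.softmax(level_logits, dim=-1)  # close-set softmax over the levels only
    return float((probs * LEVEL_VALUES).sum())

# Example: a model that mostly says "good", with some probability on "fair".
print(logits_to_score(torch.tensor([1.2, 3.1, 2.0, -0.5, -1.0])))  # ~3.8
```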
Publications
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
- We propose the Q-Instruct, a 200K instruction-tuning dataset for low-level vision, derived from a self-collected database of 58K real human feedbacks on image quality.
- The Q-Instruct significantly improves the low-level perceptual accuracy of MLLMs, and we have released the model zoo. Give it a try!
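For illustration, a single record in such an instruction-tuning dataset could look like the sketch below; the field names and contents are hypothetical, not the official schema (see the repository for the real format).

```python
# A made-up Q-Instruct-style record (illustrative only, not the official schema):
# one low-level question about an image, paired with an answer grounded in
# human feedback on that image's quality.
sample = {
    "image": "images/00001.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nHow is the sharpness of this image?"},
        {"from": "gpt", "value": "The image is heavily blurred, especially around "
                                 "the main subject, so its sharpness is poor."},
    ],
}
```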
Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-Level Vision
- We construct the Q-Bench, a benchmark to examine the progress of MLLMs on low-level visual abilities. Anticipating these large foundation models to be general-purpose intelligence that can ultimately relieve human efforts, we propose that MLLMs should achieve three important and distinct abilities: perception of low-level visual attributes, language description of low-level visual information, and image quality assessment (IQA).
- Submit your model at our project page to compete with existing ones!
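One common way to score such close-set perception questions, sketched here under my own assumptions (this is not the official evaluation harness), is to compare the probabilities the MLLM assigns to the candidate option tokens:

```python
import torch

# Close-set multiple-choice scoring (illustrative, not the official harness):
# run one forward pass, take the logits at the answer position, and pick the
# option whose token the model finds most probable.
def pick_option(answer_logits: torch.Tensor, option_token_ids: list[int]) -> int:
    log_probs = torch.log_softmax(answer_logits, dim=-1)
    scores = [log_probs[tok].item() for tok in option_token_ids]
    return max(range(len(scores)), key=scores.__getitem__)  # index of best option

# Toy example: a 3-token vocabulary where options "A"/"B" map to ids 0 and 1.
print(pick_option(torch.tensor([0.2, 1.5, -0.3]), option_token_ids=[0, 1]))  # -> 1
```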
Neighbourhood Representative Sampling for Efficient End-to-End Video Quality Assessment
- Consisting of fragments and FANet, the proposed FrAgment Sample Transformer for VQA (FAST-VQA) enables efficient end-to-end deep VQA and learns effective video-quality-related representations. It improves state-of-the-art accuracy by around 10% while reducing FLOPs by 99.5% on 1080P high-resolution videos.
- In our extension, we propose a unified scheme, spatial-temporal grid mini-cube sampling (St-GMS), and the most efficient NR-VQA method at present, FasterVQA. FasterVQA achieves significantly better performance than existing approaches on all VQA benchmarks while requiring only 1/1612 of the FLOPs of the current state-of-the-art.
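To give a flavor of the fragment idea, the toy sampler below crops one raw-resolution patch per grid cell and stitches the patches into a small "fragment" frame; the official St-GMS sampler in the repo additionally handles the temporal dimension and patch alignment, so treat this as a simplification:

```python
import torch

def sample_fragments(frame: torch.Tensor, grid: int = 7, patch: int = 32) -> torch.Tensor:
    """Crop one patch per grid cell and stitch them into a fragment frame."""
    c, h, w = frame.shape
    cell_h, cell_w = h // grid, w // grid
    rows = []
    for i in range(grid):
        row = []
        for j in range(grid):
            # Random top-left corner inside cell (i, j); patches keep raw resolution,
            # preserving the local textures that quality perception depends on.
            y = i * cell_h + int(torch.randint(0, cell_h - patch + 1, (1,)))
            x = j * cell_w + int(torch.randint(0, cell_w - patch + 1, (1,)))
            row.append(frame[:, y:y + patch, x:x + patch])
        rows.append(torch.cat(row, dim=2))
    return torch.cat(rows, dim=1)  # (C, grid * patch, grid * patch)

# A 1080p frame collapses into a 224x224 fragment map (7 x 32 = 224).
print(sample_fragments(torch.rand(3, 1080, 1920)).shape)  # torch.Size([3, 224, 224])
```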
Towards Explainable Video Quality Assessment: A Database and a Language-Prompted Approach
- We collect over two million human opinions on 13 dimensions of quality-related factors to establish the multi-dimensional Maxwell database. Furthermore, we propose the MaxVQA, a language-prompted VQA approach that modifies CLIP to better capture important quality issues as observed in our analyses.
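The language-prompted part can be pictured as antonym prompt pairs, one per perceptual dimension, whose CLIP text embeddings act as the positive and negative poles of that axis. The pairs below are illustrative stand-ins, not the exact wording or dimension list used by MaxVQA:

```python
# Illustrative antonym prompt pairs for language-prompted quality dimensions.
# (Stand-ins only: the actual 13 Maxwell dimensions and prompt texts differ.)
DIMENSION_PROMPTS = {
    "sharpness": ("a sharp image", "a fuzzy image"),
    "exposure":  ("a well-exposed image", "a poorly exposed image"),
    "noise":     ("a clean image", "a noisy image"),
    # ... one pair per remaining quality dimension
}
```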
Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives
- The proposed Disentangled Objective Video Quality Evaluator (DOVER) reaches state-of-the-art performance (0.91 SRCC on KoNViD-1k, 0.89 SRCC on LSVQ, 0.89 SRCC on YouTube-UGC) on the UGC-VQA problem. More importantly, our subjective studies construct the first aesthetic-and-technical VQA database, the DIVIDE-3k, proving that UGC-VQA is jointly affected by the two perspectives.
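A minimal sketch of how two disentangled views could be fused into one overall score (the interface and weights are my assumptions; the official fusion is in the DOVER repo):

```python
import torch

def overall_quality(aesthetic: torch.Tensor, technical: torch.Tensor,
                    w_aes: float = 0.5, w_tech: float = 0.5) -> torch.Tensor:
    """Fuse per-branch scores for a batch of videos (weights are illustrative).
    Each branch's outputs are z-normalized first so the two views are comparable."""
    aes = (aesthetic - aesthetic.mean()) / (aesthetic.std() + 1e-8)
    tech = (technical - technical.mean()) / (technical.std() + 1e-8)
    return w_aes * aes + w_tech * tech

print(overall_quality(torch.rand(8), torch.rand(8)).shape)  # torch.Size([8])
```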
Exploring Opinion-unaware Video Quality Assessment with Semantic Affinity Criterion
GitHub, ArXiv (Short Version), Extension (Under Review for TIP)
- Robustly predicts quality without training on any MOS scores (see the sketch after this list).
- Localized quality prediction via CLIP.
- Given a small set of MOS-labelled videos, it can be robustly and efficiently fine-tuned on them.
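A minimal sketch of the semantic-affinity criterion, assuming the OpenAI CLIP package and generic anchor prompts (the paper's exact prompts, temporal pooling and localization machinery are omitted):

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

# Zero-shot quality via semantic affinity (illustrative, not the official code):
# a frame's quality is read off from how much closer it sits to a "high quality"
# text anchor than to a "low quality" one in CLIP space. No MOS labels are used.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
anchors = clip.tokenize(["a high quality photo", "a low quality photo"]).to(device)

@torch.no_grad()
def affinity_quality(image_path: str) -> float:
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(anchors)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    logits = 100.0 * image_feat @ text_feat.T           # scaled cosine similarities
    return torch.softmax(logits, dim=-1)[0, 0].item()   # P("high quality") in [0, 1]
```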
DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment
- Leveraging the prominent time-series modeling ability of transformers, we propose a novel and effective transformer-based VQA method that tackles temporal distortions with a content-related temporal attention mechanism. Without extra data for pre-training, DisCoVQA reaches state-of-the-art performance on several VQA benchmarks and up to 10% better generalization than existing methods.
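A toy temporal head in this spirit is sketched below; the hyper-parameters and pooling are invented for illustration, and the real DisCoVQA architecture differs in its details:

```python
import torch
import torch.nn as nn

class TemporalQualityHead(nn.Module):
    """Toy DisCoVQA-flavored head (illustrative): self-attention across time
    lets transient distortions be weighed against surrounding content before
    frame scores are pooled into one video-level quality score."""
    def __init__(self, dim: int = 768, heads: int = 8, layers: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(dim, 1)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, time, dim) features from any frame-level backbone
        contextualized = self.encoder(frame_feats)
        return self.head(contextualized).mean(dim=1).squeeze(-1)  # (batch,)

print(TemporalQualityHead()(torch.rand(2, 16, 768)).shape)  # torch.Size([2])
```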
📖 Education
- 2023.07.27 - 2024.07.31 (expected), PhD Candidate, S-Lab, Nanyang Technological University, supervised by Prof. Weisi Lin. Research topic: Consistent and Robust Video Quality Assessment.
- 2021.08.10 - 2023.07.27, PhD Student, S-Lab, Nanyang Technological University, supervised by Prof. Weisi Lin.
- 2017.09 - 2021.07, B.S. in Computer Science, EECS, Peking University.
📚 Teaching
- 2023.02 - 2023.05, Teaching Assistant for SC1015: Introduction to Data Science and Artificial Intelligence.
- 2023.08 - 2024.05, Teaching Assistant for SC3000: Artificial Intelligence.
💻 Internships
- 2023.05 - 2023.08, TikTok Pte Ltd, Singapore, Singapore.
- 2022.01 - 2022.08, SenseTime Research, Beijing, China.
- 2021.05 - 2021.08, AiFi Inc, CA, USA.
- 2020.06 - 2021.03, Megvii Research, Beijing, China.