πŸ”₯ News

  • 2024.02.27: πŸŽ‰πŸŽ‰ Q-Instruct was accepted by CVPR2024! In my last year as a PhD student, I finally have a paper at this conference.
  • 2024.01.16: πŸŽ‰πŸŽ‰ Q-Bench was accepted by ICLR2024 as a Spotlight paper!
  • 2023.09.10: πŸŽ‰πŸŽ‰ The extension of FAST-VQA (FasterVQA) was accepted by TPAMI (IF: 23.600)!
  • 2023.07.27: πŸŽ‰πŸŽ‰ I passed my PhD Qualification Examination and am now a PhD candidate!
  • 2023.07.26: πŸŽ‰πŸŽ‰ MaxVQA was accepted by ACMMM2023 as an Oral paper (CCF-A)!
  • 2023.07.14: πŸŽ‰πŸŽ‰ DOVER was accepted by ICCV2023 (CCF-A)!
  • 2023.07.11: πŸŽ‰πŸŽ‰ BUONA-VISTA was selected for oral presentation at ICME2023 (CCF-B)!
  • 2023.03.12: πŸŽ‰πŸŽ‰ BUONA-VISTA was accepted by ICME2023 (CCF-B)!
  • 2023.02.28: πŸŽ‰πŸŽ‰ DisCoVQA was accepted by TCSVT2023 (IF: 8.400)!
  • 2022.07.07: πŸŽ‰πŸŽ‰ FAST-VQA was accepted by ECCV2022 (CCF-B)!

πŸ“ First-authored Projects

Preprints

Preprint

Towards Open-ended Visual Quality Comparison

  • We propose the first open-source LMM that can answer open-ended questions on visual quality comparison and provide detailed reasoning. It learns from a merged variant of the Q-Instruct dataset and GPT-4V outputs on unlabeled image groups, and on simple comparison tasks it even outperforms GPT-4V (one of its teachers)!
  • Open in Huggingface Spaces
Preprint

Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

  • We prove that discrete text-defined rating levels (excellent/good/fair/poor/bad) are more effective than numerical scores themselves in teaching multi-modality foundation models to score (see the inference sketch below).
  • Without any additional design, it reaches state-of-the-art performance on IQA, IAA, and VQA with remarkable gains! It also unifies these three tasks (mixing five training datasets) into one model (OneAlign)!
  • Open in Huggingface Spaces
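To make the idea concrete, below is a minimal, hypothetical sketch (not the released implementation) of how the discrete levels can be turned back into a scalar score at inference: the LMM's probabilities over the five level tokens are averaged with level weights 5 (excellent) down to 1 (bad). The helper name and example logits are illustrative.

```python
import torch

# Five text-defined rating levels and their weights (excellent -> 5, ..., bad -> 1).
LEVELS = ["excellent", "good", "fair", "poor", "bad"]
LEVEL_WEIGHTS = torch.tensor([5.0, 4.0, 3.0, 2.0, 1.0])

def levels_to_score(level_logits: torch.Tensor) -> float:
    """Convert an LMM's logits for the five level tokens (shape (5,)) into a scalar score.

    A softmax restricted to the five level tokens gives a distribution over levels;
    its expected level weight is the predicted score in [1, 5].
    """
    probs = torch.softmax(level_logits, dim=-1)
    return float((probs * LEVEL_WEIGHTS).sum())

# Hypothetical logits favouring "good": the score lands around 4.
print(levels_to_score(torch.tensor([1.2, 2.5, 0.3, -1.0, -2.0])))
```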

Publications

CVPR 2024

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

  • We propose Q-Instruct, a 200K instruction-tuning dataset for low-level vision, derived from a self-collected database of 58K real human feedbacks on image quality (the Q-Pathway).
  • Q-Instruct significantly improves the low-level perceptual accuracy of MLLMs, and we have released the model zoo. Give it a try!
  • Open in Huggingface Spaces
ICLR 2024 (Spotlight)

Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-Level Vision

  • We construct Q-Bench, a benchmark to examine the progress of MLLMs on low-level visual abilities. Anticipating that these large foundation models will serve as general-purpose intelligence that can ultimately relieve human effort, we propose that MLLMs should achieve three important and distinct abilities: perception of low-level visual attributes, language description of low-level visual information, and image quality assessment (IQA).
  • Submit your model at our project page to compete with existing ones!
TPAMI 2023 (IF: 23.600), ECCV 2022

Neighbourhood Representative Sampling for Efficient End-to-End Video Quality Assessment

GitHub, ArXiv

  • Consisting of fragments and FANet, the proposed FrAgment Sample Transformer for VQA (FAST-VQA) enables efficient end-to-end deep VQA and learns effective video-quality-related representations. It improves state-of-the-art accuracy by around 10% while reducing FLOPs by 99.5% on 1080P high-resolution videos.
  • In the extension, we propose a unified sampling scheme, spatial-temporal grid mini-cube sampling (St-GMS), and the most efficient NR-VQA method to date, FasterVQA. FasterVQA achieves significantly better performance than existing approaches on all VQA benchmarks while requiring only 1/1612 of the FLOPs of the previous state-of-the-art. A toy sketch of the spatial fragment sampling idea follows this list.
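Below is a toy sketch (not the released code) of the spatial fragment sampling idea behind FAST-VQA/FasterVQA: partition a frame into a grid, randomly crop one small patch per cell, and splice the crops into a compact fragment map. The 7×7 grid of 32×32 patches is the common setting for 1080P inputs; the temporal alignment of fragments across frames is omitted here.

```python
import numpy as np

def sample_fragments(frame, grid=7, patch=32, rng=None):
    """Sample one patch x patch crop from each cell of a grid x grid partition of
    `frame` (H, W, C) and splice the crops into a (grid*patch, grid*patch, C) map."""
    rng = rng or np.random.default_rng()
    h, w, c = frame.shape
    cell_h, cell_w = h // grid, w // grid
    out = np.zeros((grid * patch, grid * patch, c), dtype=frame.dtype)
    for i in range(grid):
        for j in range(grid):
            # Random top-left corner inside cell (i, j), keeping the crop within the cell.
            y = i * cell_h + rng.integers(0, max(cell_h - patch, 1))
            x = j * cell_w + rng.integers(0, max(cell_w - patch, 1))
            out[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = \
                frame[y:y + patch, x:x + patch]
    return out

# Example: a 1080P frame is reduced to a 224x224 fragment map (7x7 grid of 32x32 patches).
frame = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
print(sample_fragments(frame).shape)  # (224, 224, 3)
```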
ACMMM 2023 (Oral)

Towards Explainable Video Quality Assessment: A Database and a Language-Prompted Approach

GitHub, ArXiv

  • We collect over two million human opinions on 13 dimensions of quality-related factors to establish the multi-dimensional Maxwell database. Furthermore, we propose MaxVQA, a language-prompted VQA approach that modifies CLIP to better capture the important quality issues observed in our analyses.
ICCV 2023

Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives

GitHub, ArXiv

  • The proposed Disentangled Objective Video Quality Evaluator (DOVER) reaches state-of-the-art performance (0.91 SRCC on KoNViD-1k, 0.89 SRCC on LSVQ, 0.89 SRCC on YouTube-UGC) on the UGC-VQA problem. More importantly, our subjective studies construct the first aesthetic-and-technical VQA database, DIVIDE-3k, demonstrating that UGC-VQA is jointly affected by the two perspectives.
ICME 2023 (Oral)

Exploring Opinion-unaware Video Quality Assessment with Semantic Affinity Criterion

GitHub, ArXiv (Short Version), Extension (Under Review for TIP)

  • Robustly predicts quality without training on any MOS scores.
  • Enables localized quality prediction via CLIP (a toy zero-shot sketch follows this list).
  • Given a small set of MOS-labelled videos, it can be robustly and efficiently fine-tuned on them.
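As an illustration of the CLIP-based, opinion-unaware idea above, here is a toy zero-shot sketch using the OpenAI `clip` package: quality is read off as the softmax affinity of an image towards a positive prompt against a negative one. The prompt pair, the single-frame setting, and the file name are illustrative simplifications of the actual video-level method.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Antonym prompt pair: quality is read as the affinity towards the positive prompt.
prompts = clip.tokenize(["a high quality photo", "a low quality photo"]).to(device)
image = preprocess(Image.open("frame.png")).unsqueeze(0).to(device)  # any video frame

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(prompts)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    # Softmax over the two prompts: the probability of "high quality" acts as the score.
    affinity = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

print(f"zero-shot quality score: {affinity[0, 0].item():.3f}")
```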
TCSVT 2023 (IF: 8.400)

DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment

GitHub, ArXiv

  • Building on the prominent time-series modeling ability of transformers, we propose a novel and effective transformer-based VQA method that tackles temporal distortions with a content-related temporal attention mechanism. Without extra data for pre-training, DisCoVQA reaches state-of-the-art performance on several VQA benchmarks and shows up to 10% better generalization ability than existing methods.

πŸ“– Education

  • 2023.07.27 - 2024.07.31 (expected), PhD Candidate, S-Lab, Nanyang Technological University, supervised by Prof. Weisi Lin. Research Topic: Consistent and Robust Video Quality Assessment.
  • 2021.08.10 - 2023.07.27, PhD Student, S-Lab, Nanyang Technological University, supervised by Prof. Weisi Lin.
  • 2017.09 - 2021.07, B.S. in Computer Science, EECS, Peking University.

πŸ“– Teaching

  • 2023.02 - 2023.05, Teaching Assistant for SC1015: Introduction to Data Science and Artificial Intelligence.
  • 2023.08 - 2024.05, Teaching Assistant for SC3000: Artificial Intelligence.

πŸ’» Internships

  • 2023.05 - 2023.08, TikTok Pte Ltd, Singapore, Singapore.
  • 2022.01 - 2022.08, Sensetime Research, Beijing, China.
  • 2021.05 - 2021.08, AiFi Inc, CA, USA.
  • 2020.06 - 2021.03, Megvii Research, Beijing, China.