• 👋 Hi, I’m Teo Wu (officially Haoning Wu), a final-year PhD candidate at Nanyang Technological University 🇸🇬, supervised by Prof. Weisi Lin. Prior to that, I obtained my B.S. degree in computer science from Peking University, PR China.

  • I am currently focusing on multi-modal LLM (MLLM) pre-training and evaluation (longer context & better instruction-following). See our LongVideoBench, the first video benchmark for MLLMs whose results provably improve with more input frames (>=256).

  • 🌱 I have also been the lead of the project Q-Future: Visual Evaluation with MLLMs📹, in which 6 first-authored papers have been accepted by top conferences and journals including ICML, ICLR, TPAMI, CVPR, ECCV and ACMMM. The flagship scorer, OneAlign, has been downloaded more than 238K times on HuggingFace (as of Jul 25, 2024); a hedged usage sketch follows after this list.

  • Prior to MLLMs, my PhD topic was video quality assessment, a traditional area that aims to gauge quality scores (and more) for videos. Among my 6 papers published in that area (in ECCV, ICCV, TPAMI, etc.), the two representative works are FAST-VQA and DOVER, which rank 1st and 2nd in stars among all projects on this topic.

  • 📫 Reach me by e-mail: realtimothyhwu@gmail.com / haoning001@e.ntu.edu.sg, or on Twitter.
  • Google Scholar
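
A minimal usage sketch for OneAlign via HuggingFace Transformers. This is an assumption-laden illustration, not an official snippet: it assumes the `q-future/one-align` checkpoint exposes a custom `score()` helper through `trust_remote_code`, and that the `task_`/`input_` argument names match the public model card; check the model card of your version for the exact interface.

```python
# Hedged sketch: load OneAlign from HuggingFace and score one image.
# Assumption: the checkpoint ships a custom .score() method via remote code;
# the argument names below follow the public model card and may change.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "q-future/one-align",
    trust_remote_code=True,      # loads the custom scoring logic shipped with the repo
    torch_dtype=torch.float16,
    device_map="auto",
)

image = Image.open("example.jpg")
# task_ may be "quality" or "aesthetics" (assumption based on the model card).
score = model.score([image], task_="quality", input_="image")
print(score)
```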

🔥 News

  • 2024.08.08: 🎉🎉 The extension of Q-Bench (Q-Bench+, on image pairs) is accepted by TPAMI!
  • 2024.07.16: 🎉🎉 4 papers from Q-Future are accepted by ACMMM (1 first-authored, 3 Oral)!
  • 2024.07.02: 🎉🎉 Co-Instruct is accepted by ECCV2024 as an Oral paper!
  • 2024.05.02: 🎉🎉 Q-Align is accepted by ICML2024 (score: 7765)!
  • 2024.02.27: 🎉🎉 Q-Instruct is accepted by CVPR2024!
  • 2024.01.16: 🎉🎉 Q-Bench is accepted by ICLR2024 as a Spotlight paper!
  • 2023.09.10: 🎉🎉 The extension of FAST-VQA (FasterVQA) is accepted by TPAMI!
  • 2023.07.26: 🎉🎉 MaxVQA is accepted by ACMMM2023 as an Oral paper!
  • 2023.07.14: 🎉🎉 DOVER is accepted by ICCV2023!
  • 2023.02.28: 🎉🎉 DisCoVQA is accepted by TCSVT, the first paper of my PhD career!
  • 2022.07.07: 🎉🎉 FAST-VQA is accepted by ECCV2022!

πŸ“ Selected Publications

Preprints

arXiv 2024

LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding

  • Authors: me, Dongxu, Bei Chen, Junnan
  • TL;DR: A benchmark for long videos that truly requires long inputs (i.e., hundreds of frames). It is the first non-synthetic benchmark (6.7K human-crafted MCQs) for long-context multi-modal understanding.

Accepted Papers

ECCV 2024 (Oral)

Towards Open-ended Visual Quality Comparison

  • Authors: me, Hanwei, Zicheng, et al.
  • TL;DR: The first work for MLLMs on open-ended visual quality comparison, built by bootstrapping comparisons from human annotations and LLMs, plus distilling GPT-4V. It is a very early attempt at multi-image MLLMs. The proposed dataset has been integrated into LLaVA-Interleave and Mantis.
ICML 2024

Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

  • Authors: me, Zicheng, et al.
  • TL;DR: The best visual quality and aesthetic scorer so far (disclaimer: as of Jul 25, 2024), and a better way to train LMMs for scoring. It fine-tunes an LMM with discrete text-defined rating levels (e.g. bad, poor, fair, good, excellent) and, at inference, extracts the weighted average of the level-token probabilities as a continuous score (see the sketch below). SOTA on 12 datasets.
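
A minimal sketch of the inference-time score extraction described above, assuming we already have the LMM's logits at the position where it predicts the rating word, restricted to the five level tokens; the level names and 1-5 weights are illustrative.

```python
# Sketch of Q-Align-style inference: close-set softmax over the five rating-level
# tokens, then a weighted average to obtain a continuous score.
# Assumption: `level_logits` holds the LMM's logits for the five level tokens only.
import torch

LEVELS = ["bad", "poor", "fair", "good", "excellent"]
WEIGHTS = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])  # ITU-style 1-5 rating scale

def levels_to_score(level_logits: torch.Tensor) -> torch.Tensor:
    """Map logits over the five level tokens to a continuous quality score."""
    probs = torch.softmax(level_logits, dim=-1)   # probability of each text level
    return (probs * WEIGHTS).sum(dim=-1)          # expected rating in [1, 5]

# Example: a response leaning towards "good"
logits = torch.tensor([0.1, 0.3, 1.0, 2.5, 1.5])
print(levels_to_score(logits))  # prints a continuous score of roughly 3.8
```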
CVPR 2024

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

  • Authors: me, Zicheng, Erli, et al.
  • TL;DR: The first low-level visual instruction-tuning dataset. It uses GPT to generate QA pairs from human-annotated quality descriptions.
ICLR 2024 (Spotlight) & TPAMI 2024

Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-Level Vision

  • Authors: me, Zicheng, Erli, et al.
  • TL;DR: The first-things-first of Q-Future, defining three important tasks: low-level description (similar to captioning), low-level question answering, and scoring. It has been extended to a multi-image version (Q-Bench+) as well as to AI-generated images (A-Bench).