Data, benchmarks, and demos for the Workshop on Human–Scene Interaction.
Given speech, a 3D target coordinate, and a scene, generate SMPL-X pointing gestures. Submissions are scored on three axes: temporal alignment, spatial accuracy, and referent recall.
📄 Challenge Paper · 📦 Data & Baseline · 🎯 Interactive Demo
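As a rough illustration of the spatial-accuracy axis, the sketch below computes the angular error between a pointing ray and the ray toward the target. This is a minimal assumption-laden example, not the official scorer: the joint pair used to define the ray (elbow → wrist) and the error formulation are placeholders chosen for illustration.

```python
import numpy as np

def pointing_angular_error(elbow, wrist, target):
    """Angle (degrees) between the elbow->wrist ray and the wrist->target ray.

    A hypothetical spatial-accuracy measure; the challenge's actual metric
    and choice of SMPL-X joints may differ.
    """
    ray = wrist - elbow
    to_target = target - wrist
    cos = np.dot(ray, to_target) / (np.linalg.norm(ray) * np.linalg.norm(to_target))
    # Clip to guard against floating-point drift outside [-1, 1].
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Example: forearm extended along +x toward a target on the same ray -> 0 degrees.
elbow = np.array([0.0, 1.4, 0.0])
wrist = np.array([0.5, 1.4, 0.0])
target = np.array([2.0, 1.4, 0.0])
print(pointing_angular_error(elbow, wrist, target))  # 0.0
```

A per-clip score could then average this error over the frames flagged as the gesture's stroke phase.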
| Resource | Description |
|---|---|
| MM-Conv | ~2K pointing clips from naturalistic VR dialogue with 3D scene graphs |
| SGS-HSI | 1,138 synthetic single-target pointing clips |
| OmniControl-PT baseline | Reference baseline (code & weights coming soon) |
| Milestone | Date |
|---|---|
| Challenge opens | May 5, 2026 |
| Submission deadline | July 7, 2026 |
| Results announced | July 31, 2026 |
| Workshop | October 2026 |
Jonas Beskow (KTH) · Rishabh Dabral (MPI) · Anna Deichler (KTH) · Fethiye Irmak Doğan (Cambridge) · Anindita Ghosh (MPI) ·