Dídac Surís

Dídac Surís Coll-Vinent

Research Scientist at Meta Superintelligence Labs

About Me

I am a Research Scientist at Meta Superintelligence Labs. I obtained my PhD from Columbia University, where I was a Microsoft PhD fellow and worked under the supervision of Professor Carl Vondrick.

My interests include multimodal machine learning, video representations, and self-supervised learning, and more generally any area of artificial intelligence that involves using all the available information in an intelligent way.

Before moving to New York, I was a researcher at the Vector Institute, working in Professor Sanja Fidler's lab. Before that, I was a visiting student at CSAIL-MIT, working in Professor Antonio Torralba's lab, and I completed both my undergraduate and Master's degrees in telecommunications at UPC, in Barcelona. I have also been lucky to work at Telefonica with Joan Serrà, at Adobe with Justin Salamon and Bryan Russell, and at Meta (FAIR) with Yale Song and Lorenzo Torresani.

In my free time, I like to play the classical guitar. Maybe (probably not) some day I'll upload a recording of me playing. I was also interviewed by the Columbia CS department about my experience as a PhD student.

Publications

My publications are also listed in my Google Scholar profile.

Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Dídac Surís, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane Momeni, Rishi Hazra, Shuangrui Ding, Sagar Vaze, Francois Porcher, Feng Li, Siyuan Li, Aishwarya Kamath, Ho Kei Cheng, Piotr Dollár, Nikhila Ravi, Kate Saenko, Pengchuan Zhang, Christoph Feichtenhofer
SAM 3: Segment Anything with Concepts NEW! arXiv preprint, 2025.
[BibTeX] [PDF] [Website] [Demo] [Blog] [Code]

@article{carion2025sam3,
    title={SAM 3: Segment Anything with Concepts},
    author={Nicolas Carion and Laura Gustafson and Yuan-Ting Hu and Shoubhik Debnath and Ronghang Hu and D\'idac Sur\'is and Chaitanya Ryali and Kalyan Vasudev Alwala and Haitham Khedr and Andrew Huang and Jie Lei and Tengyu Ma and Baishan Guo and Arpit Kalla and Markus Marks and Joseph Greer and Meng Wang and Peize Sun and Roman R{\"a}dle and Triantafyllos Afouras and Effrosyni Mavroudi and Katherine Xu and Tsung-Han Wu and Yu Zhou and Liliane Momeni and Rishi Hazra and Shuangrui Ding and Sagar Vaze and Francois Porcher and Feng Li and Siyuan Li and Aishwarya Kamath and Ho Kei Cheng and Piotr Doll{\'a}r and Nikhila Ravi and Kate Saenko and Pengchuan Zhang and Christoph Feichtenhofer},
    journal={arXiv preprint arXiv:2511.16719},
    year={2025}
}

Dante Francisco Wasmuht, Otto Brookes, Maximillian Schall, Pablo Palencia, Chris Beirne, Tilo Burghardt, Majid Mirmehdi, Hjalmar Kühl, Mimi Arandjelovic, Sam Pottie, Peter Bermant, Brandon Asheim, Yi Jin Toh, Adam Elzinga, Jason Holmberg, Andrew Whitworth, Eleanor Flatt, Laura Gustafson, Chaitanya Ryali, Yuan-Ting Hu, Baishan Guo, Andrew Westbury, Kate Saenko, Dídac Surís
The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification NEW! arXiv preprint, 2025.
[BibTeX] [PDF] [Website]

@article{wasmuht2025safari,
    title={The SA-FARI Dataset: Segment Anything in Footage of Animals for Recognition and Identification},
    author={Dante Francisco Wasmuht and Otto Brookes and Maximillian Schall and Pablo Palencia and Chris Beirne and Tilo Burghardt and Majid Mirmehdi and Hjalmar K{\"u}hl and Mimi Arandjelovic and Sam Pottie and Peter Bermant and Brandon Asheim and Yi Jin Toh and Adam Elzinga and Jason Holmberg and Andrew Whitworth and Eleanor Flatt and Laura Gustafson and Chaitanya Ryali and Yuan-Ting Hu and Baishan Guo and Andrew Westbury and Kate Saenko and D\'idac Sur\'is},
    journal={arXiv preprint arXiv:2511.15622},
    year={2025}
}

Ege Ozguroglu, Ruoshi Liu, Dídac Surís, Dian Chen, Achal Dave, Pavel Tokmakov, Carl Vondrick
pix2gestalt: Amodal Segmentation by Synthesizing Wholes Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[BibTeX] [PDF] [Website]

@article{ozguroglu2024pix2gestalt,
    title={pix2gestalt: Amodal Segmentation by Synthesizing Wholes},
    author={Ege Ozguroglu and Ruoshi Liu and D\'idac Sur\'is and Dian Chen and Achal Dave and Pavel Tokmakov and Carl Vondrick},
    journal={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2024}
}

Dídac Surís*, Sachit Menon*, Carl Vondrick
ViperGPT: Visual Inference via Python Execution for Reasoning International Conference on Computer Vision (ICCV) - ORAL PRESENTATION, 2023.
[BibTeX] [PDF] [Website]

@article{surismenon2023vipergpt,
    title={ViperGPT: Visual Inference via Python Execution for Reasoning},
    author={D\'idac Sur\'is and Sachit Menon and Carl Vondrick},
    journal={Proceedings of IEEE International Conference on Computer Vision (ICCV)},
    year={2023}
}

Purva Tendulkar, Dídac Surís, Carl Vondrick
FLEX: Full-Body Grasping Without Full-Body Grasps Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
[BibTeX] [PDF] [Website]

@inproceedings{tendulkar2022flex,
    title={FLEX: Full-Body Grasping Without Full-Body Grasps},
    author={Tendulkar, Purva and Sur\'is, D\'idac and Vondrick, Carl},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2023}
}

Dídac Surís, Carl Vondrick
Representing Spatial Trajectories as Distributions Conference on Neural Information Processing Systems (NeurIPS), 2022.
[BibTeX] [PDF] [Website] [5min Video Presentation]

@article{suris2022trajectories,
    title={Representing Spatial Trajectories as Distributions},
    author={Sur\'is, D\'idac and Vondrick, Carl},
    journal={Advances in Neural Information Processing Systems 35 (NeurIPS)},
    year={2022}
}

Dídac Surís, Carl Vondrick, Bryan Russell and Justin Salamon
It's Time for Artistic Correspondence in Music and Video Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
[BibTeX] [PDF] [Website] [5min Video Presentation]

@article{suris2022musicforvideo,
    title={It's Time for Artistic Correspondence in Music and Video},
    author={Sur\'is, D\'idac and Vondrick, Carl and Russell, Bryan and Salamon, Justin},
    journal={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2022}
}

Dídac Surís, Dave Epstein and Carl Vondrick
Globetrotter: Connecting Languages by Connecting Images Conference on Computer Vision and Pattern Recognition (CVPR) - ORAL PRESENTATION, 2022.
[BibTeX] [PDF] [Code and model] [Website] [5min Video Presentation]

@article{suris2022globetrotter,
    title={Globetrotter: Connecting Languages by Connecting Images},
    author={Sur\'is, D\'idac and Epstein, Dave and Vondrick, Carl},
    journal={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2022}
}

Basile Van Hoorick, Purva Tendulkar, Dídac Surís, Dennis Park, Simon Stent and Carl Vondrick
Revealing Occlusions with 4D Neural Fields Conference on Computer Vision and Pattern Recognition (CVPR) - ORAL PRESENTATION, 2022.
[BibTeX] [PDF] [Code and models] [Website] [5min Video Presentation]

@article{vanhoorick2022revealing,
    title={Revealing Occlusions with 4D Neural Fields},
    author={Van Hoorick, Basile and Tendulkar, Purva and Sur\'is, D\'idac and Park, Dennis and Stent, Simon and Vondrick, Carl},
    journal={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2022}
}

Dídac Surís*, Ruoshi Liu* and Carl Vondrick
Learning the Predictability of the Future Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
[BibTeX] [PDF] [Code and model] [Website] [Press release] [Video Presentations (1h) (15min) (5min)]

@InProceedings{suris2021hyperfuture,
    title={Learning the Predictability of the Future},
    author={Sur\'is, D\'idac and Liu, Ruoshi and Vondrick, Carl},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2021}
}

Dídac Surís*, Dave Epstein, Heng Ji, Shih-Fu Chang and Carl Vondrick
Learning to Learn Words from Visual Scenes European Conference on Computer Vision (ECCV), 2020.
[BibTeX] [PDF] [Code and model] [Video Presentation] [Website]

@Article{Suris2020learning,
    author = {D. Sur\'is and D. Epstein and H. Ji and S. Chang and C. Vondrick},
    title = {Learning to Learn Words from Visual Scenes},
    journal = {European Conference on Computer Vision (ECCV)},
    year = {2020}
}

Dídac Surís*, Adrià Recasens*, David Bau, David Harwath, James Glass and Antonio Torralba
Learning Words by Drawing Images Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[BibTeX] [PDF] [Code] [Website]

@Article{Suris2019,
    author = {D. Sur\'is and A. Recasens and D. Bau and D. Harwath and J. Glass and A. Torralba},
    title = {Learning Words by Drawing Images},
    journal = {Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2019}
}

David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba and James Glass
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input European Conference on Computer Vision (ECCV), 2018 (ORAL PRESENTATION).
[BibTeX] [PDF] [Code and data] [Video Presentation] [MIT News]

@Article{Harwath2018,
    author = {D. Harwath and A. Recasens and D. Sur\'is and G. Chuang and A. Torralba and J. Glass},
    title = {Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input},
    journal = {European Conference on Computer Vision (ECCV)},
    year = {2018}
}

Joan Serrà, Dídac Surís, Marius Miron and Alexandros Karatzoglou
Overcoming catastrophic forgetting with hard attention to the task International Conference on Machine Learning (ICML), 2018 (LONG TALK).
[BibTeX] [PDF] [Code] [Video Presentation (21:20)] [Tech World News]

@Article{Serra2018,
    author = {J. Serr\`a and D. Sur\'is and M. Miron and A. Karatzoglou},
    title = {Overcoming catastrophic forgetting with hard attention to the task},
    journal = {International Conference on Machine Learning (ICML)},
    year = {2018}
}

Dídac Surís, Amanda Duarte, Amaia Salvador, Jordi Torres and Xavier Giró-i-Nieto
Cross-modal Embeddings for Video and Audio Retrieval European Conference on Computer Vision Workshops (ECCV Workshops), 2018.
[BibTeX] [PDF]

@Article{Suris2018,
    author = {D. Sur\'is and A. Duarte and A. Salvador and J. Torres and X. Gir\'o-i-Nieto},
    title = {Cross-modal Embeddings for Video and Audio Retrieval},
    journal = {European Conference on Computer Vision Workshops (ECCV Workshops)},
    year = {2018}
}

Dídac Surís, Adrian Agustin and Josep Vidal
Delay minimization in dynamic and scalable multi-operator wireless backhauling IEEE International Conference on Communications Workshops (ICC Workshops), 2017.
[BibTeX] [PDF]

@Article{Suris2017,
    author = {D. Sur\'is and A. Agustin and J. Vidal},
    title = {Delay minimization in dynamic and scalable multi-operator wireless backhauling},
    journal = {IEEE International Conference on Communications Workshops (ICC Workshops)},
    year = {2017}
}

Resume

Research Experience

  • Research Scientist at Meta Superintelligence Labs
  • PhD Student at Columbia University, in Carl Vondrick's group - Microsoft PhD Fellow
  • Internship at Meta (FAIR) - Research on large language models for video understanding
  • Internship at Adobe Research - Research at the intersection of music and video
  • Research Assistant at Vector Institute, in Sanja Fidler's group
  • Visiting Student at CSAIL-MIT, in Antonio Torralba's lab
  • Internship at Telefonica Research - Developed a method to avoid catastrophic forgetting in neural networks

Awards & Fellowships

  • Microsoft PhD Fellow
  • "la Caixa" fellowship to carry out graduate studies in North America
  • Best academic record awards in both BSc and MSc
  • Master's degree excellence grant from the Catalunya-La Pedrera Foundation

Education

  • PhD in Computer Science - Columbia University
  • MSc in Telecommunications - UPC BarcelonaTech
  • BSc in Telecommunications - UPC BarcelonaTech