Vladimir Iashin
/vla-DEE-meer/ /YA-sheen/

I'm a deep learning engineer at Revolut, where I research and develop foundation models for fintech applications.

Earlier, I was a postdoctoral researcher at the Visual Geometry Group (VGG) at the University of Oxford, where I worked with Andrew Zisserman on multimodal computer vision, with a focus on combining video, audio, and text.

Prior to that, I completed my PhD (with distinction) in EECS at Tampere University, supervised by Esa Rahtu. Even earlier, I did an MSc (with distinction) in Applied Mathematics and Computer Science, along with a BSc in Economics from HSE University.

In my free time, I enjoy hitting the gym and searching for terroir in coffee.

[Google Scholar] • [GitHub] • [X/Twitter] • [LinkedIn]

Selected Publications

PRAGMA: Revolut Foundation Model
Maxim Ostroukhov, Ruslan Mikhailov, Vladimir Iashin, and others
Technical Report, 2026
[Paper]

Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder
Vladimir Iashin, Horace Lee, Dan Schofield, and Andrew Zisserman
Computer Vision for Ecological and Biodiversity Monitoring Workshop, ICIP, 2025
[Project Page] • [Code] • [Paper]

SAGANet: Video Object Segmentation-aware Audio Generation
Ilpo Viertola, Vladimir Iashin, Esa Rahtu
GCPR, 2025 (Oral)
[Project Page] • [Code] • [Paper]

Temporally Aligned Audio for Video with Autoregression
Ilpo Viertola, Vladimir Iashin, Esa Rahtu
ICASSP, 2025 (Oral)
[Project Page] • [Code] • [Paper]

Synchformer: Efficient Synchronization from Sparse Cues
Vladimir Iashin, Weidi Xie, Esa Rahtu, and Andrew Zisserman
ICASSP, 2024
[Project Page] • [Code] • [Paper]

Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors
Vladimir Iashin, Weidi Xie, Esa Rahtu, and Andrew Zisserman
BMVC, 2022 (Spotlight)
[Project Page] • [Code] • [Paper] • [Presentation]

Taming Visually Guided Sound Generation
Vladimir Iashin and Esa Rahtu
BMVC, 2021 (Oral)
[Project Page] • [Code] • [Paper] • [Presentation]

A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer
Vladimir Iashin and Esa Rahtu
BMVC, 2020
[Project Page] • [Code] • [Paper] • [Presentation]

Multi-modal Dense Video Captioning
Vladimir Iashin and Esa Rahtu
Multimodal Learning Workshop, CVPR, 2020
[Project Page] • [Code] • [Paper] • [Presentation]

Community Service

Reviewer
AAAI 2023, CVPR 2022, ICCV 2021, TPAMI 2020, and conference workshops in 2024 & 2021

Misc

Video Features
Extracts visual features and optical flow frames from raw videos, with multi-GPU acceleration, through a user-friendly and flexible API.

Object Detector
Discover the contents of your uploaded image effortlessly.
The detector is based on YOLOv3 and implemented in PyTorch. The computation originally ran on a cloud server as a Flask application (see the Note); it is now hosted on HuggingFace Spaces with a Gradio UI.
[Code] • [Note]

ITC Wiki
During my PhD studies, I organized a crowd-sourced wiki that answers common internal questions, such as how to set up remote access to office GPU machines.

IDE Customization
A note about VSCode customization.