I'm a deep learning engineer at Revolut, where I research and develop foundation models for fintech applications.
Earlier, I was a postdoctoral researcher at the Visual Geometry Group (VGG) at the University of Oxford, where I worked with Andrew Zisserman on multimodal computer vision, with a focus on combining video, audio, and text.
Prior to that, I completed my PhD (with distinction) in EECS at Tampere University, supervised by Esa Rahtu. Even earlier, I did an MSc (with distinction) in Applied Mathematics and Computer Science, along with a BSc in Economics from HSE University.
In my free time, I enjoy hitting the gym and searching for terroir in coffee.
[Google Scholar] • [GitHub] • [X/Twitter] • [LinkedIn]
PRAGMA: Revolut Foundation Model
Maxim Ostroukhov, Ruslan Mikhailov, Vladimir Iashin, and others
Technical Report, 2026
[Paper]
Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder
Vladimir Iashin, Horace Lee, Dan Schofield, and Andrew Zisserman
Computer Vision for Ecological and Biodiversity Monitoring Workshop, ICIP, 2025
[Project Page] •
[Code] •
[Paper]
SAGANet: Video Object Segmentation-aware Audio Generation
Ilpo Viertola, Vladimir Iashin, and Esa Rahtu
GCPR, 2025 (Oral)
[Project Page] •
[Code] •
[Paper]
Temporally Aligned Audio for Video with Autoregression
Ilpo Viertola, Vladimir Iashin, and Esa Rahtu
ICASSP, 2025 (Oral)
[Project Page] •
[Code] •
[Paper]
Synchformer: Efficient Synchronization from Sparse Cues
Vladimir Iashin, Weidi Xie, Esa Rahtu, and Andrew Zisserman
ICASSP, 2024
[Project Page] •
[Code] •
[Paper]
Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors
Vladimir Iashin, Weidi Xie, Esa Rahtu, and Andrew Zisserman
BMVC, 2022 (Spotlight)
[Project Page] •
[Code] •
[Paper] •
[Presentation]
Taming Visually Guided Sound Generation
Vladimir Iashin and Esa Rahtu
BMVC, 2021 (Oral)
[Project Page] •
[Code] •
[Paper] •
[Presentation]
A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer
Vladimir Iashin and Esa Rahtu
BMVC, 2020
[Project Page] •
[Code] •
[Paper] •
[Presentation]
Multi-modal Dense Video Captioning
Vladimir Iashin and Esa Rahtu
Multimodal Learning Workshop, CVPR, 2020
[Project Page] •
[Code] •
[Paper] •
[Presentation]
Reviewer
AAAI 2023, CVPR 2022, ICCV 2021, TPAMI 2020, and conference workshops in 2024 & 2021
Video Features
Extracts features and optical flow frames from raw videos,
with multi-GPU acceleration, through a flexible and user-friendly API.
Object Detector
Discover the contents of your uploaded image effortlessly.
The detector is based on YOLOv3 and implemented in PyTorch.
The computation runs on a cloud server with a Flask application (see the Note),
now hosted on HuggingFace Spaces with a Gradio UI.
[Code] •
[Note]
ITC Wiki
During my PhD studies, I organized a crowd-sourced wiki that answers common internal questions,
such as how to set up remote access to the office GPU machines.
IDE Customization
A note on VS Code customization.