Biomedical Imaging Group

Self-Supervised Learning of Molecular Diffusion Using Motion-Informed Vision Transformer—MiViT

E. Silly, J. Requejo-Isidro, D. Sage

Proceedings of the Single-Molecule Localization Microscopy Symposium (SMLMS'25), Bonn, Federal Republic of Germany, August 27-29, 2025, pp. 97


Estimating the diffusion of molecules from image-based single-particle tracking (SPT) is essential for probing subcellular states. The diffusion coefficient (D) is typically derived from the mean square displacement (MSD) of sub-pixel localizations; however, motion during exposure produces blurry, blob-like shapes that degrade localization precision and diffusion accuracy. Previous work [Park, 2023] has shown that convolutional neural networks (CNN) can infer D directly from small image patches centered on the localization, but the lack of temporal context limits their performance. We propose a Motion-Informed Vision Transformer (MiViT) that directly regresses D from time-series image patches, capturing both spatial and temporal features. Trajectory features [Kæstel-Hansen, 2024] are computed to form a temporal token, which is concatenated with CNN-encoded shape tokens. The resulting spatiotemporal tokens are then processed through self-attention layers within a transformer architecture. To train without labeled data, we use self-supervised learning on simulated sequences of Brownian diffusing particles generated under imaging conditions, aligned with the ANDI challenge [Muñoz-Gil, 2021]. On 10,000 synthetic samples, MiViT reduces the error in the estimation of D (mean squared error: 1.41 for MSD, 0.76 for CNN, and 0.57 for our method). The transformer architecture captures long-range dependencies and temporal structure more effectively, especially under noise. Our approach could generalize across various experimental conditions, and the results demonstrate the benefit of spatiotemporal self-attention in MiViT models. The method is currently validated only on synthetic data; further work is needed to evaluate robustness under real acquisition variability. Our pilot study suggests that high frame rates are not strictly necessary; improved image quality at lower frame rates may yield more informative diffusion estimates.
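For readers unfamiliar with the MSD baseline mentioned above, the following is a minimal sketch of how D can be recovered from a simulated two-dimensional Brownian track. It is not the authors' simulation or evaluation code: the function names, the number of lags, and the parameter values are assumptions chosen for illustration, and the sketch ignores localization noise and motion blur, which are precisely the effects the abstract identifies as limiting this estimator. It relies on the standard relation MSD(τ) = 4Dτ for free diffusion in two dimensions.

```python
import numpy as np


def simulate_brownian_2d(n_steps, d_coef, dt, rng):
    """Positions (n_steps + 1, 2) of a free 2D Brownian particle starting at the origin."""
    # Each coordinate increment is Gaussian with variance 2 * D * dt.
    steps = rng.normal(0.0, np.sqrt(2.0 * d_coef * dt), size=(n_steps, 2))
    return np.vstack([np.zeros((1, 2)), np.cumsum(steps, axis=0)])


def msd(track, max_lag):
    """Time-averaged mean square displacement for lags 1..max_lag (in frames)."""
    return np.array([
        np.mean(np.sum((track[lag:] - track[:-lag]) ** 2, axis=1))
        for lag in range(1, max_lag + 1)
    ])


def estimate_d_from_msd(track, dt, max_lag=4):
    """Estimate D by fitting MSD(tau) = 4 * D * tau with a line through the origin."""
    taus = dt * np.arange(1, max_lag + 1)
    values = msd(track, max_lag)
    slope = np.sum(taus * values) / np.sum(taus ** 2)  # least squares, zero intercept
    return slope / 4.0


if __name__ == "__main__":
    # Hypothetical parameters: D in arbitrary units^2 per second, dt in seconds.
    rng = np.random.default_rng(0)
    track = simulate_brownian_2d(n_steps=20000, d_coef=0.5, dt=0.01, rng=rng)
    print("estimated D:", estimate_d_from_msd(track, dt=0.01))
```

With noise-free positions and a long track, the estimate should fall close to the simulated value. On short tracks or with blurred, noisy localizations the fit degrades, which is the regime where the abstract reports image-based estimators performing better.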

@INPROCEEDINGS(http://bigwww.epfl.ch/publications/silly2501.html,
AUTHOR="Silly, E. and Requejo-Isidro, J. and Sage, D.",
TITLE="Self-Supervised Learning of Molecular Diffusion Using
	Motion-Informed Vision Transformer---{MiViT}",
BOOKTITLE="Proceedings of the Single-Molecule Localization Microscopy
	Symposium ({SMLMS'25})",
YEAR="2025",
editor="",
volume="",
series="",
pages="97",
address="Bonn, Federal Republic of Germany",
month="August 27-29",
organization="",
publisher="",
note="")
© 2025 Universitätsgesellschaft Bonn. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from Universitätsgesellschaft Bonn. This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.