Yanis Labrak

Postdoctoral Researcher

Idiap Research Institute

Switzerland 🇨🇭

💼 I am a Postdoctoral Researcher at the Idiap Research Institute (Switzerland), specializing in Speech and Natural Language Processing (NLP). I previously completed my PhD at the Avignon University CS Research Lab (LIA) in collaboration with Zenidoc.

🔬 My research focuses on advanced Machine Learning, specifically Large Language Models (LLMs) and Speech Recognition tailored for complex, domain-specific applications like Healthcare.

🔭 Currently, I am investigating the multimodal integration of speech into textual LLMs. My goal is to leverage massive textual knowledge to enable seamless End-to-End performance, replacing traditional cascaded systems with more robust, unified architectures.

Spaces on HuggingFace

Models on HuggingFace

Connect with me 😃

Interests

Speech processing
Multi-modality (speech-text)
Autoregressive modeling
Synthetic data generation
Healthcare domains

Education

PhD in Computer Science, 2025
Avignon University

MSc in Computer Science, 2022
Avignon University

BSc in Computer Science, 2020
Avignon University

Research Demo

Interactive demonstration of End-to-End synthetic clinical data generation using SDialog.

Why Synthetic Data?

Real clinical conversations are protected by strict privacy laws (GDPR/HIPAA) and are expensive to annotate. By simulating persona-driven, multi-agent dialogues, we can generate limitless, privacy-safe training data to build robust healthcare AI models while actively controlling for fairness and bias.


1. Persona & Environment Definition

Agent A (Doctor)
Role: Cardiologist
Tone: Empathetic

Agent B (Patient)
Profile: 55yo Male
State: Post-medication

Environment: Cardiology Consultation Room, Follow-up visit.

2. Multi-Agent Turn-by-Turn Generation

Dr. Smith: Good morning. How have you been feeling on the new Lisinopril dosage?

Patient: The dizziness stopped, but my blood pressure is still 135/85.

Dr. Smith: I see. Let's increase it to 20mg daily. Have you noticed any dry cough?

Patient: No cough, but I do feel a bit fatigued in the afternoons.
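The two steps above (persona definition, then turn-by-turn generation) can be sketched in a few lines of plain Python. This is a minimal illustration, not SDialog's API: the `Persona` class, the `scripted` responder, and the `simulate` loop are hypothetical names, and the scripted replies stand in for what would be LLM calls in a real pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]  # (speaker name, utterance)


@dataclass
class Persona:
    """Hypothetical persona record; fields mirror the demo card, not SDialog's schema."""
    name: str
    role: str
    traits: Dict[str, str] = field(default_factory=dict)


# A responder maps (persona, dialogue history) to the next utterance.
Responder = Callable[[Persona, List[Turn]], str]


def scripted(lines: List[str]) -> Responder:
    """Stand-in for an LLM call: replays fixed lines in order."""
    remaining = iter(lines)

    def respond(persona: Persona, history: List[Turn]) -> str:
        return next(remaining)

    return respond


def simulate(agents: List[Tuple[Persona, Responder]], max_turns: int) -> List[Turn]:
    """Alternate between agents, appending one utterance per turn."""
    history: List[Turn] = []
    for turn in range(max_turns):
        persona, respond = agents[turn % len(agents)]
        history.append((persona.name, respond(persona, history)))
    return history


doctor = Persona("Dr. Smith", "Cardiologist", {"tone": "Empathetic"})
patient = Persona("Patient", "55yo Male", {"state": "Post-medication"})

dialog = simulate(
    [
        (doctor, scripted([
            "Good morning. How have you been feeling on the new Lisinopril dosage?",
            "I see. Let's increase it to 20mg daily. Have you noticed any dry cough?",
        ])),
        (patient, scripted([
            "The dizziness stopped, but my blood pressure is still 135/85.",
            "No cough, but I do feel a bit fatigued in the afternoons.",
        ])),
    ],
    max_turns=4,
)

for speaker, text in dialog:
    print(f"{speaker}: {text}")
```

Passing the full history to each responder is what would let a model-backed agent condition its reply on earlier turns; the scripted stand-in simply ignores it.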

3. Evaluation & Audio Synthesis

Metrics:
[Clinical] Coherence: 0.98 | Med-Accuracy: 0.95
[Safety] Toxicity: 0.00 | Bias: 0.02
[Linguistics] Fluency: 0.97 | Lexical-Diversity: 0.88

4. Downstream Tasks (NER, SLU, Sum)

NER:
BP is 135/85. Increase Lisinopril to 20mg. No cough, mild fatigue.

SLU:
[Intent: Update_Prescription] [Slot: Med=Lisinopril, Dose=20mg]
[Intent: Report_Symptom] [Slot: Symptom=Fatigue]
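For downstream use, the bracketed SLU annotations shown above can be turned into structured records. The sketch below assumes the format is exactly `[Intent: NAME] [Slot: key=value, ...]` as displayed in the demo; the `parse_slu` function and the regular expression are illustrative, not part of any toolkit.

```python
import re
from typing import Dict, List

# Assumed annotation shape, taken from the demo output:
#   [Intent: NAME] [Slot: key=value, key=value]
FRAME_RE = re.compile(
    r"\[Intent:\s*(?P<intent>[^\]]+)\]\s*\[Slot:\s*(?P<slots>[^\]]*)\]"
)


def parse_slu(annotation: str) -> List[Dict]:
    """Parse bracketed intent/slot annotations into a list of dicts."""
    frames = []
    for match in FRAME_RE.finditer(annotation):
        slots = {}
        for pair in match.group("slots").split(","):
            if "=" in pair:
                key, value = pair.split("=", 1)
                slots[key.strip()] = value.strip()
        frames.append({"intent": match.group("intent").strip(), "slots": slots})
    return frames


annotation = """
[Intent: Update_Prescription] [Slot: Med=Lisinopril, Dose=20mg]
[Intent: Report_Symptom] [Slot: Symptom=Fatigue]
"""

frames = parse_slu(annotation)
for frame in frames:
    print(frame)
```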

Summary:
BP remains elevated (135/85). Lisinopril increased to 20mg. Patient denies cough but reports afternoon fatigue.

Experience

Postdoctoral Researcher

Idiap Research Institute

Jan 2026 – Present Martigny, Switzerland

Member of the Speech & Audio Processing group under the supervision of Petr Motlicek.
Researching the transition "From Bias to Fairness" to build the next generation of Trustworthy AI systems.
Developing robust speech and language models with a focus on ethical AI, bias mitigation, and transparency in large-scale multimodal systems.
Focus Areas: Trustworthy AI, Speech Processing, Multimodal LLMs, Fairness & Ethics.

Visiting Researcher

Brno University of Technology (JSALT 2025)

June 2025 – August 2025 Brno, Czechia

Member of the JSALT 2025 "Play your Part" team.
Collaborated on Synthetic Dialog Generation and Analysis using LLMs.
Contributed to the open-source development of sdialog.

Research Intern

Deezer Research

March 2024 – Sept 2024 Paris, France

Research subject: AI Lyrics Detection
Applied research on detecting AI-generated lyrics, used to filter daily ingested tracks and identify fraudulent content.
Explored the capabilities of lyrics-based text embeddings to guide content recommendation in production.
Tech Stack: Docker, Python, PyTorch, Google Cloud Platform, BigQuery, Kafka, CUDA, Flask, SQL.

Senior Research Scientist

Zenidoc

Sep 2020 – Sep 2025 Marseille, France

Collect industry partners' needs and study the technical feasibility of projects
Manage the technical deployment of solutions developed in academia
Promote state-of-the-art technologies and identify new applications
Transfer technical knowledge to technical teams
Supervise annotation phases for named-entity recognition, part-of-speech tagging, and document classification
Manage in-house technology benchmarking

PhD Student

Avignon University - LIA / Nantes University - LS2N

Aug 2022 – Sept 2025 Avignon, France

Thesis title: Language Models at the Crossroads of Text and Speech for Healthcare Applications
Organizer of the 2023 edition of the French NLP shared task DEFT (DÉfi Fouille de Textes) in Paris. The shared task was articulated around our medical multiple-choice question answering dataset FrenchMedMCQA and brought together 6 academic and industrial teams for a total of 34 participants.
Successful release of our open-source models (more than 110k downloads/month at peak, multiple industry applications).
Presented my research in person to the French National Research Agency (my research is core to the MALADES project).
My work was featured in multiple news articles and a wide range of podcasts; I gave 3 interviews (CNRS, L'Usine Digitale, and ActuIA).
Teaching: Parallel Programming for first-year MSc students (C, C++, CUDA); End-to-End Software Development for first-year MSc students (Python, Flask, Dart, Flutter, JavaScript, Node.js, Tesseract OCR); and Web & Database Architecture for second-year BSc students (PHP, JavaScript, jQuery, CSS).

Research Assistant

Avignon University - LIA

Aug 2020 – Sept 2022 Avignon, France

De-identification of electronic health records using deep neural networks
Automatic Named Entity Recognition (NER) for structuring medical records and extracting dosage (posology) entities
Recommendation system for ICD-10 and CCAM coding based on the patient's medical record (operative report, anesthesia report)
Introduced part-of-speech tagging into an E2E ASR system via dual decoding to reduce semantic and grammatical errors in medical transcriptions.

Recent Publications

Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization.
Yanis Labrak, David Grünert, Séverin Baroudi, Jiyun Chun, Pawel Cyrta, Sergio Burdisso, Ahmed Hassoon, David Liu, Adam Rothschild, Reed Van Deusen, Petr Motlicek, Andrew Perrault, Ricard Marxer, Thomas Schaaf (2026).

Under Review @ Interspeech 2026 Paper (Coming Soon)

Doctor or Patient? Synergizing Diarization and ASR for Code-Switched Hinglish Medical Conditions Extraction.
Séverin Baroudi, Yanis Labrak, Shashi Kumar, Joonas Kalda, Sergio Burdisso, Pawel Cyrta, Juan Ignacio Alvarez-Trejos, Petr Motlicek, Hervé Bredin, Ricard Marxer (2026).

Under Review @ Interspeech 2026 Paper

SDialog: A Python Toolkit for End-to-End Agent Building, User Simulation, Dialog Generation, and Evaluation.
Sergio Burdisso, Séverin Baroudi, Yanis Labrak, David Grünert, Pawel Cyrta, Yiyang Chen, Srikanth Madikeri, Esaú Villatoro-Tello, Thomas Schaaf, Ricard Marxer, Petr Motlicek (2026).

EACL 2026 Paper

Language Models at the Crossroads of Text and Speech for Healthcare Applications.
Yanis Labrak (2025).

Ph.D. Thesis, Université d'Avignon

Late Fusion and Multi-Level Fission Amplify Cross-Modal Transfer in Text-Speech LMs.
Santiago Cuervo, Adel Moumen, Yanis Labrak, Sameer Khurana, Antoine Laurent, et al. (2025).

ArXiv 2025 Paper

An Empirical Analysis of Discrete Unit Representations in Speech Language Modeling Pre-training.
Yanis Labrak, Richard Dufour and Mickaël Rouvier (2025).

TSD 2025 Paper

Detecting Synthetic Lyrics with Few-Shot Inference.
Yanis Labrak, Elena V. Epure and Gabriel Meseguer-Brocal (2024).

TrustNLP @ NAACL 2025 Paper

Zero-Shot End-To-End Spoken Question Answering In Medical Domain.
Yanis Labrak, Adel Moumen, Mickael Rouvier and Richard Dufour (2024).

Interspeech 2024 Paper Code HuggingFace

BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains
Yanis Labrak, Adrien Bazoge, Richard Dufour, Mickael Rouvier, Emmanuel Morin, et al. (2024).

ACL 2024 Paper Code HuggingFace

DrBenchmark: A Large Language Understanding Evaluation Benchmark for French Biomedical Domain.
Yanis Labrak, Adrien Bazoge, Oumaima El Khettari, Mickael Rouvier, Pacome Constant dit Beaufils, Natalia Grabar, Béatrice Daille, Solen Quiniou, Emmanuel Morin, Pierre-Antoine Gourraud and Richard Dufour (2024).

LREC-COLING 2024 Paper Code HuggingFace

A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks.
Yanis Labrak, Mickael Rouvier and Richard Dufour (2024).

LREC-COLING 2024 Paper

How Important Is Tokenization in French Medical Masked Language Models?
Yanis Labrak, Adrien Bazoge, Béatrice Daille, Mickael Rouvier and Richard Dufour (2024).

LREC-COLING 2024 Paper Code HuggingFace

DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains.
Yanis Labrak, Adrien Bazoge, Richard Dufour, Mickael Rouvier, Emmanuel Morin, et al. (2023).

ACL 2023 Honorable Mention Paper Code HuggingFace

BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing
Jason Alan Fries, Leon Weber, Natasha Seelam, Gabriel Altay, Debajyoti Datta, ..., Yanis Labrak, et al. (2022)

NeurIPS 2022 - Datasets & Benchmarks Paper Code HuggingFace

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, ..., Yanis Labrak, et al. (2022)

ArXiv 2022 Paper HuggingFace

FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain.
Yanis Labrak, Adrien Bazoge, Richard Dufour, Béatrice Daille, Pierre-Antoine Gourraud, et al. (2022).

LOUHI @ EMNLP 2022 Paper Code

Projects

Apr 1, 2026

SDialog

SDialog is an MIT-licensed open-source toolkit for building, simulating, and evaluating LLM-based conversational agents end-to-end. It aims to bridge agent construction → user simulation → dialog generation → evaluation in a single reproducible workflow, so you can generate reliable, controllable dialog systems or data at scale. It standardizes a Dialog schema and offers persona‑driven multi‑agent simulation with LLMs, composable orchestration, built‑in metrics, and mechanistic interpretability.

Dec 25, 2021

HugsVision

Dec 31, 2020

DeskMap

A from-scratch implementation of a desktop map browser in the style of Google Maps, written in Java with canvas rendering. On top of it, I implemented an A* variant that optimizes trip planning across multiple transport modes (bike, foot, car, and real-time buses of Nantes Métropole) while respecting traffic regulations (one-way streets, ...).
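The idea of mode-aware A* routing can be illustrated with a small Python sketch. This is not the project's Java code: the toy network, the per-mode speeds, and the `a_star` function are assumptions made for illustration, and mode switches are treated as free for simplicity. One-way rules are modeled by listing an edge in one direction only for the restricted mode.

```python
import heapq
import math
from typing import List, Optional, Set, Tuple

# Assumed average speeds in km/h (illustrative values only).
SPEED_KMH = {"foot": 5.0, "bike": 15.0, "car": 40.0}

# Hypothetical toy network: node -> (x_km, y_km).
COORDS = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0), "D": (2.0, 1.0)}

# Directed edges: (source, target, length_km, modes allowed on that edge).
EDGES = [
    ("A", "B", 1.0, {"foot", "bike", "car"}),
    ("B", "A", 1.0, {"foot", "bike"}),  # cars may only drive A -> B (one-way)
    ("B", "C", 1.0, {"foot", "bike"}),
    ("C", "D", 1.0, {"foot", "car"}),
    ("A", "C", 1.5, {"foot"}),          # footpath shortcut
]

Step = Tuple[str, str, str]  # (from node, to node, mode used)


def a_star(start: str, goal: str, modes: Set[str]) -> Optional[Tuple[float, List[Step]]]:
    """Return (travel time in hours, steps) for the fastest route, or None."""
    graph = {}
    for src, dst, km, edge_modes in EDGES:
        usable = edge_modes & modes
        if usable:
            fastest = max(usable, key=SPEED_KMH.get)
            graph.setdefault(src, []).append((dst, km / SPEED_KMH[fastest], fastest))

    top_speed = max(SPEED_KMH[m] for m in modes)

    def heuristic(node: str) -> float:
        # Straight-line distance at top speed never overestimates travel time.
        return math.dist(COORDS[node], COORDS[goal]) / top_speed

    frontier = [(heuristic(start), 0.0, start, [])]
    best = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, hours, mode in graph.get(node, []):
            new_cost = cost + hours
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(
                    frontier,
                    (new_cost + heuristic(nxt), new_cost, nxt, path + [(node, nxt, mode)]),
                )
    return None


print(a_star("A", "D", {"foot"}))
print(a_star("B", "A", {"car"}))
```

Restricting the allowed modes changes which edges exist at all, so the same search handles both "walking only" queries and regulation checks such as the one-way street between A and B.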
