What were you doing before Jungle Program?

I was an epidemiologist and co-head of two work packages, ‘Data innovation’ and ‘Methods and scientific advice’, at the French Constances cohort. The goal of the Constances project is to build a large epidemiological cohort that contributes to the development of epidemiological research. It is meant to be an infrastructure open to the research community and a source of public health information. Constances was also designed as a public health and surveillance tool, thanks to its particularly exhaustive system for collecting and following up a wide variety of data from a large representative sample of the adult population covered by the General Social Security scheme.

Constances | Improving tomorrow's health
Constances is a large epidemiological cohort made up of a representative sample of 200,000 adults attending the Health Screening Centres (Centres d’examens de santé, CES) of the French Social Security system. Constances will be open to the research community and to public health professionals.

Could you tell us more about your research in epidemiology?

Before joining the Constances cohort, I worked during my Ph.D. on the impact of agricultural activities on neurodegenerative diseases, using the massive data of the French health insurance system.

I am currently working on hypertension and cardiovascular diseases. This research aims to understand the leading risk factors and to identify the best care pathways for optimal medical care, using Constances' massive database linked to the French health insurance databases (SNDS).

How do you imagine epidemiology in 5 years?

I think it’s not possible to give a global answer to this question. Epidemiology is a multidisciplinary science with several fields (social epidemiology, genetic epidemiology, clinical epidemiology…). Each area has its expertise, needs, and challenges.

In my specific field, clinical epidemiology applied to hypertension and cardiovascular risk factors, I expect massive data and machine learning to make a significant contribution within 5 years.

Do you have a favorite anecdote or a key discovery moment from your epidemiology research that you’d like to share?

During my thesis, my first scientific work highlighted that pesticide use in agriculture is probably associated with a higher proportion of Parkinson's disease in the general population of some areas, and not only among agricultural workers. The scientific paper was accepted the day before the Christmas holidays, after more than 6 months of review.

What aspects of being an epidemiologist and a data scientist do you find most fulfilling?

Being able, sometimes, to provide a simple answer to a complex question using massive data.

What do you think are key challenges epidemiology as a whole is facing today?

Massive databases open up new possibilities for epidemiology. But they also pose new challenges to the discipline, such as mastering the latest analysis methods while maintaining high data quality for reliable and reproducible results. Last but not least are the ethical considerations regarding anonymization and personal data protection.

What do you think are the biggest challenges for the training of future epidemiologists?

Epidemiology is multidisciplinary: it requires technical skills in data analysis as well as knowledge of the field of interest (medical, biological, sociological, etc.). Beyond that, awareness, rigor, and attention to data quality seem to me to be of primary importance.

What made you learn more about Data Science?

I needed to work on massive data from the French health insurance system and the Constances cohort.

Should every epidemiologist learn to code?

Every microbiologist should be able to use a microscope. Likewise, I believe that all epidemiologists should know how to code.

How is Jungle Program changing the way you think about data science?

It has shown me the importance of gaining skills in data visualisation with Python and in report automation using Jupyter notebooks.

What is your impression of the Jungle Program?

The data science course is one of the best programs I have attended. It covers both theoretical and practical aspects. The teachers are very competent and available. In addition, the small number of students in each group allows for a direct and effective exchange.

Jungle Program – Data Science Live Course
Join a 6-week training to practice and gain strong foundations in data science and machine learning.

What’s your favorite part of the Live course?

Working on practical examples and being able to interact directly with the trainers.

What advice do you have for epidemiologists looking to start their journey in Data Science?

Objectively, the Jungle Program is a perfect fit for any epidemiologist who wishes to quickly acquire a solid basis in Data Science (with Python).

What question keeps you awake at night?

The Covid pandemic has several other lasting effects on physical and mental health. In the Constances cohort, we are part of the SAPRIS study, which aims to better identify and understand the various impacts of the pandemic.

COVID-19 and Confinement: A Large-Scale French Survey of Social Challenges and Health
In order to better understand the epidemiological and social issues raised by the exceptional prevention measures put in place to fight the Covid-19 pandemic, and in particular the lockdown…

Could you share with us your 3 favorite projects or research combining data science & epidemiology?

The first one is the current thesis of Rayan Charrier. This project uses dynamic models and trajectory classification to define clusters from multivariate longitudinal care trajectories with continuous values. The first application uses data on the dispensing of antihypertensive medicines in France, drawn from the French cohort Constances and the SNDS (Système National des Données de Santé - the French administrative health care database).

theses.fr – Rayan Charrier, Modèles dynamiques et classification de trajectoires. Application à des données épidémiologiques
The aim of this thesis is to develop innovative statistical models of time series, suited to multivariate longitudinal care trajectories with continuous values, with a view to classifying these trajectories as meaningfully as possible. The…


The second and third projects use machine learning in R: one aims to provide the first algorithm for typing diabetes (type 1 and 2), and the other to predict obesity, both using data from the Constances cohort linked to the French administrative health care database.

Do you fancy using data science in your career, but have always brushed off the thought of learning it and didn't know where to start?

🚀 Start learning Data Science