NOTE: JuliaHealth contributor @kosuri-indu wrote a great series on patient-level prediction workflows in Julia! She can’t post links to Discourse yet, so I am posting her work below!
Introduction
Hello everyone! I’m Kosuri L Indu, a student and open-source contributor with a strong interest in health data, machine learning, and the Julia programming language. Over the past few months, I have been working on building a patient-level prediction (PLP) pipeline using clinical data in the OMOP Common Data Model (CDM) format, and I have documented my journey in a three-part blog series.
Patient-level prediction (PLP) refers to using historical clinical data to predict individual patient outcomes, such as whether a patient with hypertension might develop diabetes. It’s a powerful tool for personalized medicine, and building these pipelines in Julia showcases how performant, flexible, and open Julia can be for real-world health data science.
Blog Posts
Through this series, I have tried to share the process in a simple, approachable way, from asking the right questions to building models and reflecting on the results.
Below are short summaries of each post, along with links to the full versions.
This post walks through how to translate a clinical question, like predicting diabetes onset in hypertensive patients, into a structured cohort definition using the OMOP CDM. It explains how I used Julia tools to define and extract cohorts, and discusses the key concepts along the way.
Here, I dive into how the raw clinical data was processed into a machine learning-ready format. It covers feature extraction, handling missing values, normalization, encoding, data splitting, and training ML models using the MLJ.jl ecosystem.
In the final post, I reflect on the challenges I faced, like low model performance and data limitations, and outline what I learned. I also share ideas for how the pipeline can be improved and extended, including visualization and cohort quality tools.
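To give a flavour of the preprocessing steps covered in the second post (handling missing values, normalization, encoding, and data splitting), here is a minimal sketch in plain Julia. It is not the code from the series: the toy table, variable names, and split size are all hypothetical, and it uses only the standard library rather than MLJ.jl.

```julia
using Statistics, Random

# Hypothetical toy feature table, one entry per patient (made-up values,
# not data from the blog series).
ages   = [54.0, 61.0, missing, 47.0, 70.0, 58.0]
sexes  = ["F", "M", "F", "F", "M", "M"]
labels = [0, 1, 0, 0, 1, 1]   # 1 = developed diabetes (illustrative only)

# 1. Handle missing values: impute with the mean of the observed ages.
age_mean    = mean(skipmissing(ages))
ages_filled = Float64[ismissing(a) ? age_mean : a for a in ages]

# 2. Normalize: rescale to zero mean and unit variance.
ages_scaled = (ages_filled .- mean(ages_filled)) ./ std(ages_filled)

# 3. Encode: one-hot encode the categorical column (one column per level).
sex_levels = sort(unique(sexes))
onehot     = [Float64(s == l) for s in sexes, l in sex_levels]

# 4. Split: assemble the feature matrix and make a shuffled train/test split.
X       = hcat(ages_scaled, onehot)
idx     = shuffle(MersenneTwister(42), 1:length(labels))
n_train = 4   # assumed split size, chosen only for this toy example
train_idx, test_idx = idx[1:n_train], idx[n_train+1:end]

X_train, y_train = X[train_idx, :], labels[train_idx]
X_test,  y_test  = X[test_idx, :],  labels[test_idx]
```

In a real pipeline these steps would typically be expressed with the transformers and resampling tools of an ML framework, and the imputation and scaling statistics would be computed on the training rows only to avoid leaking information from the test set.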
Conclusion
A big shoutout to Jacob S Zelko (@TheCedarPrince) for being an incredible mentor and guide throughout this journey. Your support, feedback, and encouragement truly made all the difference.
And to everyone reading, if you are working with healthcare data or exploring patient-level prediction in Julia, I hope this series offers something helpful or sparks fresh ideas. I’m always happy to connect, so feel free to share your thoughts, feedback, or questions!
~ Kosuri L Indu