Development of a predictive model for the side effects of liraglutide

Article information

Cardiovasc Prev Pharmacother. 2022;4(2):87-93
Publication date (electronic) : 2022 April 27
doi :
1Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea
2Division of Endocrinology and Metabolism, Department of Internal Medicine, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
Corresponding to Hun-Sung Kim, MD, PhD Department of Medical Informatics, College of Medicine, The Catholic University of Korea, 222, Banpo-daero, Seocho-gu, Seoul 06591, Korea. Email:
Received 2022 February 4; Revised 2022 April 6; Accepted 2022 April 19.



Liraglutide, a drug used for the management of obesity, has many known side effects. In this study, we developed a predictive model for the occurrence of liraglutide-related side effects using data from electronic medical records (EMRs).


This study included 237 patients from Seoul St. Mary's Hospital and Eunpyeong St. Mary's Hospital who were prescribed liraglutide. An endocrinologist obtained medical data through an EMR chart review. Model performance was evaluated using the mean of the area under the receiver operating characteristic curve (AUROC) with a 95% confidence interval (CI).


A predictive model was developed for patients who were prescribed liraglutide. However, the missing rate of many variables ranged from 37.1% to 75.5%, and the AUROC of the developed predictive model was 0.630 (95% CI, 0.551–0.708). Prior antiobesity medication use was significantly less common among patients who experienced side effects than among those who did not (20.7% vs. 41.4%, P=0.003). The odds of side effects were 2.389 times higher in patients with diabetes than in patients without diabetes (odds ratio, 2.389; 95% CI, 1.115–5.174).


This study did not successfully develop a predictive model for liraglutide-related side effects, primarily due to issues related to missing data. When prescribing antiobesity drugs, detailed records and basic blood tests are expected to be essential. Further large-scale studies on liraglutide-related side effects are needed after obtaining high-quality data.


Obesity is a crucial problem worldwide [1]. In Korea, the prevalence of obesity increased from 29.7% to 32.4% from 2009 to 2015, accompanied by a concomitant increase in the prevalence of abdominal obesity from 18.4% to 20.8% [2]. The major complications of obesity include chronic diseases, such as diabetes, hypertension, fatty liver disease, cardiovascular disease, and depression [3,4]. Obesity is a chronic disease that requires strict management because the obesity-related social and economic burdens are increasing with the growth of medical expenses associated with obesity [5].

Liraglutide, a glucagon-like peptide-1 receptor agonist (GLP-1 RA), has been approved for obesity treatment [6]. Liraglutide was originally approved as a treatment for type 2 diabetes and later emerged as a treatment option for obesity. Although both contain liraglutide, Victoza (Novo Nordisk, Bagsvaerd, Denmark), often used as a treatment for diabetes, is available through insurance benefits, whereas Saxenda (Novo Nordisk) is not covered by insurance. However, the price of Saxenda is different, and data on its usage in Korea are unavailable since it is imported. The usage of Saxenda is not well understood even in reviews of claims data from the Health Insurance Review and Assessment Service, the results of which are sent to the National Health Insurance Service [7]. Analyses of electronic medical records (EMRs) of university hospitals are advantageous for determining the side effects of a drug, since EMRs contain properly recorded data [8,9]. In this study, we aimed to develop a predictive model for the occurrence of side effects of Saxenda injections using EMR data from a university hospital.


Ethical statements

This study was approved by the Institutional Review Board of the Catholic University of Korea (No. KC21RNSI0831). The requirement for informed consent was waived due to the retrospective nature of the study. All data were stored on an encrypted computer of the principal investigator in an encrypted file that was only accessible to the principal investigator. The predictive model was converted to an anonymized file.

Study population

Patients who were prescribed liraglutide (Saxenda) and whose baseline weights were recorded at Seoul St. Mary’s Hospital and Eunpyeong St. Mary’s Hospital between 2014 and 2019 were included in this study. Patients’ demographic information, baseline body information, medical history, previous use of antiobesity drugs, and baseline laboratory test results at the time of the first liraglutide prescription were used as candidate predictors in the model. Demographic information included age, sex, height, and weight. Baseline body information included body mass index (BMI), skeletal muscle mass, body fat mass, percent body fat, waist-hip ratio, systolic blood pressure, and diastolic blood pressure. Medical history included previous history of hypertension, diabetes mellitus, fatty liver disease, thyroid disease, gastrointestinal disease, psychiatric disease, or skin allergy. Previous use of antiobesity drugs covered the following: lorcaserin, a combination of bupropion and naltrexone, orlistat, or another GLP-1 RA, such as exenatide, dulaglutide, or lixisenatide. Baseline laboratory testing included serum glucose, glycated hemoglobin, blood urea nitrogen, creatinine, glomerular filtration rate, total bilirubin, aspartate transaminase, alanine transaminase, alkaline phosphatase, γ-glutamyl transpeptidase, creatine phosphokinase, total cholesterol, triglyceride, high-density lipoprotein cholesterol, and low-density lipoprotein cholesterol levels. All data were extracted through a direct EMR chart review by an endocrinologist with over 10 years of experience.

Predictive model output

The model predicted the occurrence of side effects (including digestive, nervous system, and pruritic side effects) within 7 months of Saxenda administration using patient information at the time of prescription.

Missing data

Variables with a high missing rate (>45%) were excluded. Missing values in the remaining variables were imputed using multiple imputation by chained equations (MICE) with random forests [10,11].
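The exclusion rule described above can be illustrated with a minimal sketch. It is not the authors' code: the column names, the toy values, and the helper names `missing_rate` and `drop_sparse_columns` are hypothetical, and missing entries are assumed to be stored as `None`. The imputation step that follows in the study is not shown.

```python
from typing import Dict, List, Optional

Column = List[Optional[float]]


def missing_rate(values: Column) -> float:
    """Fraction of entries that are missing (stored as None)."""
    return sum(v is None for v in values) / len(values)


def drop_sparse_columns(table: Dict[str, Column], threshold: float = 0.45) -> Dict[str, Column]:
    """Keep only columns whose missing rate does not exceed the threshold."""
    return {name: col for name, col in table.items() if missing_rate(col) <= threshold}


# Hypothetical toy data: 4 patients, 3 candidate predictors (not study data).
toy = {
    "bmi": [30.1, 28.4, None, 33.0],              # 25% missing, kept
    "creatinine": [0.9, None, None, 1.1],         # 50% missing, dropped
    "waist_hip_ratio": [None, None, None, 0.9],   # 75% missing, dropped
}
kept = drop_sparse_columns(toy)
print(sorted(kept))  # ['bmi']
```

Filtering before imputation keeps the imputation model from having to reconstruct variables that are mostly unobserved.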

Feature selection

A stepwise backward feature elimination technique with stratified 10-fold cross-validation was used for feature selection [12,13]. At each step, the least important feature, as ranked by a support vector machine, was removed until one feature remained. Finally, the subset of features that optimized the average area under the receiver operating characteristic curve (AUROC) across the 10 folds was used to develop the predictive model.
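The elimination loop described above can be sketched generically as follows. This is an illustration under stated assumptions rather than the study's implementation: `cv_score` stands in for the cross-validated AUROC of a feature subset and `least_important` stands in for the support vector machine ranking, and both are replaced here by a hypothetical toy scoring rule with made-up feature names and weights.

```python
from typing import Callable, List, Sequence, Tuple


def backward_elimination(
    features: Sequence[str],
    cv_score: Callable[[Sequence[str]], float],
    least_important: Callable[[Sequence[str]], str],
) -> Tuple[List[str], float]:
    """Drop one feature per step until one remains; return the best-scoring subset."""
    current = list(features)
    best_subset, best_score = list(current), cv_score(current)
    while len(current) > 1:
        current.remove(least_important(current))
        score = cv_score(current)
        if score > best_score:  # ties keep the larger subset
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Hypothetical stand-ins (not study data): each feature adds some signal,
# and every extra feature costs a small penalty.
signal = {"prior_medication": 0.08, "diabetes": 0.05, "noise_a": 0.0, "noise_b": 0.0}


def toy_cv_score(subset: Sequence[str]) -> float:
    return 0.5 + sum(signal[f] for f in subset) - 0.01 * len(subset)


def toy_least_important(subset: Sequence[str]) -> str:
    return min(subset, key=lambda f: signal[f])


best_subset, best_score = backward_elimination(list(signal), toy_cv_score, toy_least_important)
print(best_subset)  # ['prior_medication', 'diabetes']
```

In this toy setting the two uninformative features are removed. If the score had peaked with every feature included, the full set would have been returned.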

Development and evaluation of the predictive model

The eXtreme Gradient Boosting (XGBoost) technique was used to develop a Saxenda side effect prediction model [14]. Stratified 10-fold cross-validation, which is a helpful procedure for estimating model performance on a small dataset [15], was used to train and evaluate the predictive model. The dataset was randomly divided into 10 subparts of approximately equal size. Nine subparts were used for training the model, and the remaining subpart was used for evaluation. This process was repeated 10 times so that each subpart was used once for evaluation. The Shapley value was used to measure feature contributions to the model prediction [16,17].
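As a rough illustration of the stratified splitting described above, the sketch below deals the samples of each class into 10 folds so that every fold has a similar class ratio. It is a simplified assumption about how such a split can be made, not the procedure used in the study. The class counts (179 and 58) are taken from the reported cohort, the function name `stratified_folds` and the seed are hypothetical, and model fitting is omitted.

```python
import random
from typing import List


def stratified_folds(labels: List[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to k folds, keeping the class ratio similar in every fold."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices round-robin within each class
    return folds


labels = [1] * 179 + [0] * 58  # class sizes as reported for the 237 patients
folds = stratified_folds(labels)

for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A model would be fitted on train_idx and evaluated on test_idx here.
    assert len(train_idx) + len(test_idx) == len(labels)

print([len(f) for f in folds])
```

Because 237 is not divisible by 10, the folds cannot be exactly equal in size; in this sketch they differ by at most two samples.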

Statistical analysis

Continuous variables were described as means with standard deviations, and categorical variables were described as frequencies with percentages. The t-test was used for continuous variables, and the chi-square test was used for categorical variables for comparisons between groups with and without side effects. The model performance was evaluated using the mean of the AUROC with a 95% confidence interval (CI), along with the means of sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and accuracy across the 10 folds of the stratified 10-fold cross-validation. We conducted multivariate logistic regression to investigate the associations between predictors and side effect outcomes. A P-value <0.05 was considered to indicate statistical significance for all tests. For statistical analysis and modeling, R ver. 4.0.3 (The R Foundation, Vienna, Austria) and Python ver. 3.8.5 (Python Software Foundation, Wilmington, DE, USA) were used.
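The AUROC summary described above can be sketched as follows. The AUROC is computed from its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case), and the per-fold values are summarized as a mean with a 95% CI. How the CI was actually derived is not stated in the text, so the t-based interval with 9 degrees of freedom (critical value 2.262) is an assumption, and the per-fold AUROC values below are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_with_ci(values: Sequence[float], t_crit: float = 2.262) -> Tuple[float, float, float]:
    """Mean and t-based 95% CI; 2.262 is the two-sided critical value for 9 df (10 folds)."""
    mean = statistics.mean(values)
    half_width = t_crit * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


# Three of the four positive-negative pairs are ranked correctly.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75

# Hypothetical per-fold AUROCs (not the study's values).
fold_aurocs = [0.55, 0.71, 0.62, 0.48, 0.69, 0.66, 0.58, 0.74, 0.60, 0.67]
mean, lower, upper = mean_with_ci(fold_aurocs)
print(f"{mean:.3f} (95% CI, {lower:.3f}-{upper:.3f})")
```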


In total, 237 patients were included in the study, excluding those whose body weights were not recorded in EMRs. Side effects occurred in 75.5% (179 of 237 patients), and no side effects occurred in 24.5% (58 of 237 patients).

The missing rate for BMI was 1.7% (4 of 237 patients), but for other baseline information, such as skeletal muscle mass, body fat mass, percent body fat, and waist-hip ratio, the missing rate was 75.5% (179 of 237 patients). The missing rate for blood tests varied from 37.1% to 60.8% (Table S1). For further analysis, 20 variables were selected, excluding those with a missing rate of ≥45%.

Table 1 shows the differences across 20 variables according to the presence or absence of side effects after Saxenda injections. The mean age was 43±13 years, and 74.3% of patients (176 of 237) were women. Their average BMI was 30.7±5.1 kg/m2, and there was no significant difference in BMI between patients with and without side effects. A history of antiobesity medication use was significantly less common among patients with side effects than among those without side effects (20.7% vs. 41.4%, P=0.003). Other laboratory tests showed no significant associations with whether patients experienced side effects.
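The comparison of prior antiobesity medication use can be checked from the counts in Table 1 (37 of 179 patients with side effects vs. 24 of 58 without). The sketch below applies a chi-square test to the 2x2 table. The use of the Yates continuity correction is an assumption, since the text does not state whether it was applied, and the function name `chi_square_2x2` is hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = True) -> Tuple[float, float]:
    """Chi-square test (1 degree of freedom) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(0.0, diff - 0.5)  # continuity correction (assumed)
            stat += diff ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: side effect (+), side effect (-); columns: prior medication yes, no.
stat, p = chi_square_2x2(37, 179 - 37, 24, 58 - 24)
print(f"chi-square = {stat:.2f}, P = {p:.4f}")
```

With the continuity correction, the P-value rounds to the 0.003 listed in Table 1.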

Baseline characteristics of the study population

After excluding variables with over 45% of missing data, backward feature elimination was performed on the remaining 20 variables. The AUROC was computed for each number of selected variables, and because it was optimized when all variables were retained, all variables were selected (Fig. S1). The average AUROC obtained using the side effects prediction model was 0.630 (95% CI, 0.551–0.708), the average sensitivity was 0.423, the average specificity was 0.760, the average PPV was 0.368, the average NPV was 0.805, and the average accuracy was 0.679 (Fig. 1).

Fig. 1.

Area under the receiver operating characteristic curve (AUROC) of the side effect prediction model. CI, confidence interval.

Fig. 2 shows the six variables that had the greatest influence in the side effects prediction model, using 10 model predictions. A previous history of antiobesity medication intake had the strongest influence in the predictive model. Imputation was conducted for laboratory tests due to the high missing data rate, and creatinine levels were found to have a strong influence on the occurrence of side effects.

Fig. 2.

Contribution of features to the model predictions.
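To make the notion of a feature contribution concrete, the sketch below computes exact Shapley values for a toy model by averaging each feature's marginal contribution over all feature orderings. This brute-force enumeration is a substitute for the tree-specific algorithm described in the cited references [16,17], which is what would be practical for an XGBoost model. The risk values in `predicted_risk` are hypothetical and are not estimates from this study.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    features: Sequence[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values: mean marginal contribution of each feature over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        seen: FrozenSet[str] = frozenset()
        for f in order:
            totals[f] += value(seen | {f}) - value(seen)
            seen = seen | {f}
    return {f: total / len(orderings) for f, total in totals.items()}


def predicted_risk(known: FrozenSet[str]) -> float:
    """Hypothetical toy model: predicted risk given which features are 'switched on'."""
    risk = 0.20
    if "prior_medication" in known:
        risk -= 0.10
    if "diabetes" in known:
        risk += 0.15
    if "prior_medication" in known and "diabetes" in known:
        risk += 0.04  # interaction term, shared equally between the two features
    return risk


features = ["prior_medication", "diabetes", "bmi"]
phi = shapley_values(features, predicted_risk)
print({name: round(v, 3) for name, v in phi.items()})
```

The contributions sum to the difference between the prediction with all features and the baseline prediction, and a feature that never changes the prediction (here `bmi`) receives a contribution of zero.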

Logistic regression analysis showed that patients with diabetes had significantly higher odds of developing side effects than patients without diabetes (odds ratio [OR], 2.389; 95% CI, 1.115–5.174; P=0.025) (Table 2). The odds of side effects were also higher in women (OR, 2.143; 95% CI, 0.920–5.432; P=0.090) and in patients with gastrointestinal disease (OR, 10.822; 95% CI, 1.003–254.964; P=0.065), but these associations did not reach statistical significance. Hypothyroidism was excluded from the OR analysis, since no patient without side effects had a history of it.
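The relationship between an odds ratio, its 95% CI, and the P-value can be approximated by back-calculating the standard error on the log-odds scale. The sketch below does this for two rows of Table 2. It assumes a Wald-type interval that is symmetric on the log scale, which the text does not confirm, so the results are approximations of the reported P-values. The function name `wald_from_ci` is hypothetical.

```python
import math
from typing import Tuple


def wald_from_ci(
    odds_ratio: float, lower: float, upper: float, z_crit: float = 1.96
) -> Tuple[float, float, float]:
    """Back-calculate SE, z statistic, and two-sided P-value from an OR and its 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P-value under the standard normal
    return se, z, p


# Odds ratios and 95% CIs as listed in Table 2.
rows = {
    "Diabetes mellitus": (2.389, 1.115, 5.174),
    "Female sex": (2.143, 0.920, 5.432),
}
results = {name: wald_from_ci(*values) for name, values in rows.items()}
for name, (se, z, p) in results.items():
    print(f"{name}: SE = {se:.3f}, z = {z:.2f}, P = {p:.3f}")
```

Under this approximation, only the diabetes estimate falls below the 0.05 threshold, in line with its CI excluding 1, whereas the CI for female sex includes 1.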

Logistic regression analysis


The monitoring of drug side effects plays an important role in evaluating the safety of drugs on the market, which is a public health concern [18]. Therefore, a growing number of clinical studies using EMR data are being performed [19,20]. An advantage of EMR studies is that they can easily extract a large amount of data from a long period of time at a relatively low cost; therefore, EMR-based clinical research has been conducted with various study designs [8,9]. Since most side effects of liraglutide are subjective, a cohort study with a large population is advantageous for measuring the incidence of these adverse effects [21]. EMR data, in which subjective symptoms of patients are well documented, are particularly suitable for this purpose. However, a consequence of the reliance upon chart reviews in this study is that the side effects experienced by the patients may not have all been due to liraglutide. Furthermore, it is difficult to confirm or completely rule out minor side effects from liraglutide use.

In this study, the AUROC of the side effect prediction model after Saxenda prescriptions was low (0.630). The excessive amount of missing data may have been one of the main reasons for the model’s poor performance. In particular, 75.5% of patients had missing data on skeletal muscle, body fat mass, body fat percentage, and abdominal fat percentage before obesity medication use, which would be helpful for follow-up. The data recorded in the EMRs were not as well documented as expected, and in many cases, baseline tests were not performed prior to prescribing liraglutide. Since EMR data are not generated for research purposes, it was expected that the missing rate would be high [22], and the inability to include other variables is a major cause of the low prediction rate.

In our study, XGBoost was used to develop a predictive model, considering its good support for explainability even when missing values are expected; furthermore, it has shown favorable results with longitudinal healthcare data [23]. The researchers conducted 10 model predictions to increase the AUROC score; however, the final results did not meet the expectations. In addition, there were no significant differences in patient characteristics in relation to the occurrence of side effects after Saxenda administration. Since the researchers performed imputation due to the high missing data rate, it is most likely that these values influenced the Shapley value analysis, contributing to low reliability.

Nevertheless, the most influential factor in the predictive model was the prior use of antiobesity medications. Patients with no prior antiobesity medication use were more likely to experience side effects from Saxenda. Additionally, patients with a prior history of antiobesity medication use who experienced side effects may not have been included in the study because they did not continue their use of antiobesity medications. Patients who have taken antiobesity medications in the past and have experienced side effects may not have reported relatively minor side effects. Conversely, patients taking Saxenda for the first time may have reported any and all minor side effects. These possibilities suggest that patient compliance may have affected the results of the study, which is also an important characteristic of real-world data [8]. Therefore, careful interpretation of these results is necessary.

It is also worth noting that people with diabetes were 2.4 times more likely to experience side effects. Liraglutide was developed for the treatment of diabetes as a GLP-1 RA [24]. Therefore, it is most suitable for patients with obesity and diabetes; however, more side effects occurred in patients with diabetes. A possible explanation for this might be that patients without diabetes who are trying to lose weight may be strongly motivated to endure minor side effects. Since Saxenda is administered via injection, patients with diabetes taking Saxenda tend to have poor compliance [25], which may also be relevant for the high observed incidence of side effects in this group.

The researchers attempted to create a predictive model of Saxenda side effects; however, the accuracy of the model was low, and no successful model was ultimately developed. In the future, the successful development of such models will require the analysis of a large amount of EMR data with a low missing rate [22]. When prescribing Saxenda, measurements of various laboratory tests and baseline body information are needed. Many areas need to be seriously considered for the development of predictive models in retrospective cohort studies. It is important to gather diverse and well-organized data with a minimal amount of missing real-world data. Ultimately, this will also help in patient management. Future attempts to develop a successful predictive model will require high-quality data.


Table S1.

Missing rate of variables used in the development of the algorithm.

Fig. S1.

Area under the receiver operating characteristic curve (AUROC) based on selected variables.


Supplementary materials are available at


Conflicts of interest

The authors have no conflicts of interest to declare.


This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (Ministry of Science and ICT) (No. NRF-2021R1G1A1091471).

Author contributions

Conceptualization: JM, JS, HSK; Data curation: HSK; Formal Analysis: HSK; Funding acquisition: HSK; Investigation: HSK; Methodology: HSK; Project administration: HSK; Resources: HSK; Software: HSK; Supervision: HSK; Validation: HSK; Visualization: JM, JS, HSK; Writing–original draft: JM, JS, HSK; Writing–review & editing: JM, JS, HSK.

All authors read and approved the final manuscript.


1. Kyle TK, Dhurandhar EJ, Allison DB. Regarding obesity as a disease: evolving policies and their implications. Endocrinol Metab Clin North Am 2016;45:511–20.
2. Seo MH, Kim YH, Han K, Jung JH, Park YG, Lee SS, et al. Prevalence of obesity and incidence of obesity-related comorbidities in Koreans based on National Health Insurance Service health checkup data 2006-2015. J Obes Metab Syndr 2018;27:46–52.
3. Bray GA, Fruhbeck G, Ryan DH, Wilding JP. Management of obesity. Lancet 2016;387:1947–56.
4. Global BMI Mortality Collaboration, Di Angelantonio E, Bhupathiraju ShN, Wormser D, Gao P, Kaptoge S, et al. Body-mass index and all-cause mortality: individual-participant-data meta-analysis of 239 prospective studies in four continents. Lancet 2016;388:776–86.
5. Kang JH, Jeong BG, Cho YG, Song HR, Kim KA. Socioeconomic costs of overweight and obesity in Korean adults. J Korean Med Sci 2011;26:1533–40.
6. Mehta A, Marso SP, Neeland IJ. Liraglutide for weight management: a critical review of the evidence. Obes Sci Pract 2017;3:3–14.
7. Kyoung DS, Kim HS. Understanding and utilizing claim data from the Korean National Health Insurance Service (NHIS) and Health Insurance Review & Assessment (HIRA) database for research. J Lipid Atheroscler 2021;11:e1.
8. Kim HS, Kim JH. Proceed with caution when using real world data and real world evidence. J Korean Med Sci 2019;34:e28.
9. Kim HS, Lee S, Kim JH. Real-world evidence versus randomized controlled trial: clinical research based on electronic medical records. J Korean Med Sci 2018;33:e213.
10. Van Buuren S, Karin O. Flexible multivariate imputation by MICE [Internet] Leiden: The Netherlands Organization for Applied Scientific Research (TNO); 1999. [cited 2022 Jan 02]. Available from:
11. Van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate imputation by chained equations in R. J Stat Softw 2011;45:1–67.
12. Saeys Y, Inza I, Larranaga P. A review of feature selection techniques in bioinformatics. Bioinformatics 2007;23:2507–17.
13. Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn 2002;46:389–422.
14. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016 Aug 13-17; San Francisco, CA. New York: Association for Computing Machinery; 2016. p. 785–94.
15. Raschka S. Model evaluation, model selection, and algorithm selection in machine learning [Preprint]. Posted 2018 Nov 13. arXiv 1811.12808.
16. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. In: von Luxburg U, Guyon I, Bengio S, Wallach H, Fergus R, editors. Proceedings of the 31st International Conference on Neural Information Processing Systems; 2017 Dec 4-9; Long Beach, CA. Red Hook: Curran Associates Inc.; 2017. p. 4768–77.
17. Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell 2020;2:56–67.
18. Dart RC. Monitoring risk: post marketing surveillance and signal detection. Drug Alcohol Depend 2009;105 Suppl 1:S26–32.
19. Ko S, Kim H, Shinn J, Byeon SJ, Choi JH, Kim HS. Estimation of sodium-glucose cotransporter 2 inhibitor-related genital and urinary tract infections via electronic medical record-based common data model. J Clin Pharm Ther 2021;46:975–83.
20. Kim H, Lee SH, Lee H, Yim HW, Cho JH, Yoon KH, et al. Blood glucose levels and bodyweight change after dapagliflozin administration. J Diabetes Investig 2021;12:1594–602.
21. Gladman DD, Farewell VT. Longitudinal cohort studies. J Rheumatol Suppl 2005;72:30–2.
22. Kim HS, Kim DJ, Yoon KH. Medical big data is not yet available: why we need realism rather than exaggeration. Endocrinol Metab (Seoul) 2019;34:349–54.
23. Yelin I, Snitser O, Novich G, Katz R, Tal O, Parizade M, et al. Personal clinical history predicts antibiotic resistance of urinary tract infections. Nat Med 2019;25:1143–52.
24. Nuffer WA, Trujillo JM. Liraglutide: a new option for the treatment of obesity. Pharmacotherapy 2015;35:926–34.
25. Edelman SV, Polonsky WH. Type 2 diabetes in the real world: the elusive nature of glycemic control. Diabetes Care 2017;40:1425–32.

Table 1.

Baseline characteristics of the study population

Characteristic Side effect (+) (n=179) Side effect (–) (n=58) P-value
Age (yr) 43.8±12.8 41.8±12.8 0.312
Sex 0.401
 Male 49 (27.4) 12 (20.7)
 Female 130 (72.6) 46 (79.3)
Body mass index (kg/m2) 30.6±5.2 30.9±4.9 0.736
Personal medical history
 Hypertension 54 (30.2) 18 (31.0) >0.999
 Diabetes mellitus 55 (30.7) 23 (39.7) 0.273
 Fatty liver 15 (8.4) 6 (10.3) 0.848
 Thyroid disease 3 (1.7) 0 0.752
 Gastrointestinal disease 2 (1.1) 3 (5.2) 0.180
 Psychiatric disease 14 (7.8) 4 (6.9) >0.999
 Skin allergy 1 (0.6) 1 (1.7) 0.986
Past medication history (total) 37 (20.7) 24 (41.4) 0.003
 Lorcaserin 9 (5.0) 7 (12.1) 0.12
 Combination of bupropion and naltrexone 8 (4.5) 9 (15.5) 0.011
 Orlistat 4 (2.2) 1 (1.7) >0.999
 Another GLP-1 RA 2 (1.1) 3 (5.2) 0.180
Glucose (mg/dL) 116±40 110±26 0.289
Creatinine (mg/dL) 2.4±7.8 1.5±5.4 0.426
Glomerular filtration rate (mL/min/1.73 m2) 88.2±34.5 87.6±40.6 0.927
Aspartate transaminase (IU/L) 35.9±36.1 31±29 0.421
Alanine transaminase (IU/L) 44±43 42±48 0.799

Values are presented as number (%) for categorical variables and mean±standard deviation for continuous variables.

GLP-1 RA, glucagon-like peptide-1 receptor agonist.

Table 2.

Logistic regression analysis

Variable Odds ratio 95% CI P-value
Female sex 2.143 0.920–5.432 0.090
Age 0.978 0.948–1.007 0.134
Body mass index 0.993 0.924–1.063 0.843
Personal medical history
 Hypertension 1.028 0.429–2.422 0.951
 Diabetes mellitus 2.389 1.115–5.174 0.025
 Fatty liver 0.917 0.251–2.950 0.888
 Gastrointestinal disease 10.822 1.003–254.964 0.065
 Psychiatric disease 0.887 0.227–2.865 0.850
 Skin allergy 5.149 0.194–136.438 0.258
Past medication history (total) 1.588 0.576–4.148 0.354
 Lorcaserin 1.952 0.520–7.370 0.318
 Combination of bupropion and naltrexone 2.862 0.781–10.782 0.113
 Orlistat 0.668 0.030–5.967 0.743
 Another GLP-1 RA 5.228 0.702–47.974 0.109

CI, confidence interval; GLP-1 RA, glucagon-like peptide-1 receptor agonist.