Cardiovascular risk assessment using ASCVD risk score in fibromyalgia: a single-centre, retrospective study using “traditional” case control methodology and “novel” machine learning

In autoimmune inflammatory rheumatological diseases, routine cardiovascular risk assessment is becoming more important. As an increased cardiovascular disease (CVD) risk is recognized in patients with fibromyalgia (FM), combining a traditional CVD risk assessment tool with a machine learning (ML) predictive model could help to identify non-traditional CVD risk factors. This was a retrospective case–control study conducted at a quaternary care center in India. Female patients diagnosed with FM as per the 2016 modification of the American College of Rheumatology 2010/2011 diagnostic criteria were enrolled; healthy age- and gender-matched controls were obtained from the Non-communicable disease Initiatives and Research at AMrita (NIRAM) study database. First, FM cases and healthy controls were age-stratified into three categories: 18–39 years, 40–59 years, and ≥ 60 years. The 10-year and lifetime CVD risks were calculated in both cases and controls using the ASCVD calculator. The Pearson chi-square test and Fisher's exact test were used to compare the ASCVD risk scores of FM patients and controls across the age categories. Second, ML predictive models of CVD risk in FM patients were developed. A random forest algorithm was used, with ASCVD 10-year and lifetime risk as the target measures. The predictive performance of the ML models was assessed by accuracy, F1-score, and area under the receiver operating characteristic curve (AUC). From the final predictive models, we assessed the risk factors that had the highest weightage for CVD risk in FM. A total of 139 FM cases and 1820 controls were enrolled in the study. FM patients in the age group 40–59 years had increased lifetime CVD risk compared to the control group (OR = 1.56, p = 0.043). However, CVD risk was not associated with FM disease severity or disease duration in the conventional statistical analysis. The ML model for 10-year ASCVD risk had an accuracy of 95% with an F1-score of 0.67 and an AUC of 0.825. The ML model for lifetime ASCVD risk had an accuracy of 72% with an F1-score of 0.79 and an AUC of 0.713. In addition to the traditional risk factors for CVD, FM disease severity parameters were important contributors in the ML predictive models. FM patients in the 40–59 years age group had increased lifetime CVD risk in our study. Although FM disease severity was not associated with high CVD risk in the conventional statistical analysis of the data, it was among the highest contributors to the ML predictive model for CVD risk in FM patients. This highlights that ML can potentially help to bridge the gap in identifying non-linear risk factors.


Introduction
Fibromyalgia (FM) is a debilitating condition that presents with generalized widespread pain and is often associated with sleep disturbances, fatigue, and cognitive dysfunction [1]. The prevalence rates of FM range from 1 to 3% in the general population [2].
While no single pathological process explains the development of FM, three key mechanisms are an increased central nervous system response to peripheral pain stimulation via amplification of signalling, abnormal ascending pain pathways, and small fiber neuropathy [3,4]. It is well known that patients with other diseases involving altered neuronal pathways, such as depression, are more likely to develop cardiovascular disease (CVD). Major depression is more prevalent in people who have suffered an acute myocardial infarction, with up to 18-20% of patients meeting the criteria for depression on structured interviews [5]. However, this is not just a cause-effect relationship in which a major cardiac event leads to the secondary development of depression. Across the world, patients with depression are more likely to have a cardiac event. A cohort study in a South Asian population reported that depression increased the risk of developing CVD by 41% in men and 48% in women [6]. Western populations showed similar results, where a combination of depressive symptoms and stress in low-income persons was associated with an increased risk of incident CVD [7].
While depression and CVD are well studied, the interrelationship between FM and CVD is less described in the literature. Findings that FM is associated with dysfunction of the autonomic nervous system and the inflammatory system, both of which play a part in the pathophysiology of CVD, lend weight to a causal association [8].
A meta-analysis of the key studies on cytokines in FM has shown elevated interleukin-6 (IL-6) levels in FM patients, which resembles a state of chronic mild inflammation [9]. IL-6 has a downstream effect on lipid metabolism, and anti-inflammatory therapy with canakinumab (a human anti-IL-1β monoclonal antibody) has been studied in a randomized controlled trial for CVD prevention [10,11]. Thus, it is also possible that the neuro-immunological changes linked with FM may themselves be a risk factor for CVD. CVD risk assessment in a Taiwanese cohort has shown findings suggestive of the same [12]. To the best of our knowledge, regional data on this are lacking from India.
Non-invasive CVD risk assessment tools are well described and validated in many cohorts. The most recent of these, put forward by the American Heart Association (AHA) and the American College of Cardiology, is the Atherosclerotic Cardiovascular Disease (ASCVD) risk assessment tool. Unlike the more traditionally used Framingham Risk Score, the ASCVD risk score can also be applied to other races and predicts both the 10-year and lifetime ASCVD risk of an individual [13]. The ASCVD risk calculator helps to predict the future risk of CVD and is currently recommended as a guide for the initiation of statin therapy [13]. As there is a paucity of studies assessing CVD risk in FM patients, this study adopted this tool to quantify the CVD risk.
As FM is a disease that is more common in females, an increased CVD risk in the affected females would be a relevant observation [14]. The identification of any elevated CVD risk as compared to the general population would help in stratifying such patients for interventions and primary prevention strategies.
Machine learning (ML) is the process by which the large computing capacity of modern computers is applied to analyse the varied statistical relations within a data set for an outcome prediction [15]. In recent years, many studies have shown the ability of ML to make accurate predictions in medicine [16].
There are two main techniques of machine learning. The first is supervised learning, where the goal is to predict a known output or target using classified/sorted data. The other is unsupervised learning, where there are no outputs to predict; rather, the aim is to find naturally occurring patterns or groupings within the unsorted data using massive statistical computing power. Both types have found applications in medicine, with supervised machine learning methods currently being the more commonly used. This is largely because it is easier to create datasets that allow communication between researching physicians and data analysts. One of the best-known ML projects launched by the Microsoft Corporation (USA) is Project Hanover, which uses supervised ML and aims to help in drug research in oncology [17].
ML tools are increasingly used to understand the complex interplay of the various factors associated with diseases. Recently, Adamichou et al. used ML tools to develop an algorithm to aid the diagnosis of systemic lupus erythematosus [18].
The primary objective of our study was to assess the CVD risk in female FM patients using the ASCVD risk score calculator and compare it with that of the general population.
A secondary objective of our study was to develop a supervised machine learning (ML) algorithm for studying the interactions between characteristics of the FM patients and ASCVD risk categories.

Study design (Fig. 1)
This was a retrospective case-control study conducted in the rheumatology outpatient department (OPD) of a quaternary care center in the southern Indian state of Kerala. Electronic medical records of FM patients were accessed after the study was approved by the institutional ethics committee.
We included female FM patients diagnosed as per the 2016 modified 2010/2011 ACR criteria who had documented parameters needed for risk score calculation with the ASCVD risk calculator (done within 6 months of diagnosis), had a baseline average pain score of more than 5 on a visual analog scale, and visited the rheumatology OPD from April 2017 to April 2020 [1]. We excluded FM patients who had any associated autoimmune disease or primary psychiatric diagnosis, were breastfeeding, or were taking any concomitant medications that could affect the fasting lipid profile.
Data for healthy control participants were obtained, after age and gender matching, from the datasets of a population-based cross-sectional study conducted at the same quaternary center by Menon et al., the Non-communicable disease Initiatives and Research at AMrita (NIRAM) study [19].
The ASCVD risk score was calculated in both FM patients and controls. The details collected for ASCVD risk score calculation included age (in years), height (in meters), weight (in kilograms), and co-morbid conditions like hypertension, diabetes mellitus, and dyslipidemia. FM disease severity was assessed using the Fibromyalgia Impact Questionnaire-Revised (FIQR) and the average pain score on the Brief Pain Inventory (BPI) form. Patients with a FIQR score of more than 60 at baseline were considered to have severe disease. The fasting lipid profile was measured using the AU2700 plus analyzer (Beckman Coulter, USA).
Based on age, the study participants were categorized into three strata: 18-39 years, 40-59 years, and 60 years or older. Both the 10-year and lifetime ASCVD risk scores were calculated for each study participant in the 40-59 years age group. For participants younger than 40 years, only lifetime risk could be calculated, whereas for participants aged 60 years or older, only the 10-year risk could be calculated. The ASCVD lifetime and 10-year risk scores were further categorized into low risk and high risk based on the standard cut-offs. A 10-year CVD risk of less than 7.5% was considered low, while 7.5% or more was considered high. A lifetime CVD risk of less than 39% was considered low, while 39% or more was considered high. The FM patients were also stratified based on disease severity and disease duration to study the association of these parameters with ASCVD risk scores.
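The stratification and cut-off rules described above can be expressed compactly in code. The following is a minimal sketch, not the study's actual procedure: the function name `categorize_ascvd` and its return format are hypothetical, and it assumes the three strata of 18-39, 40-59, and 60 years or older, with the risk percentages already obtained from the ASCVD calculator.

```python
from typing import Dict, Optional

TEN_YEAR_CUTOFF = 7.5   # percent; at or above this is "high" (from the text)
LIFETIME_CUTOFF = 39.0  # percent; at or above this is "high" (from the text)


def categorize_ascvd(
    age: int,
    ten_year_pct: Optional[float] = None,
    lifetime_pct: Optional[float] = None,
) -> Dict[str, Optional[str]]:
    """Assign an age stratum and low/high risk labels (hypothetical helper).

    The risk percentages are assumed to come from the ASCVD calculator;
    this function only applies the strata and cut-offs.
    """
    if age < 18:
        raise ValueError("strata start at 18 years")
    if age <= 39:
        stratum = "18-39"
    elif age <= 59:
        stratum = "40-59"
    else:
        stratum = ">=60"

    result: Dict[str, Optional[str]] = {
        "stratum": stratum,
        "ten_year": None,
        "lifetime": None,
    }
    # 10-year risk is not calculated for the youngest stratum.
    if stratum != "18-39" and ten_year_pct is not None:
        result["ten_year"] = "high" if ten_year_pct >= TEN_YEAR_CUTOFF else "low"
    # Lifetime risk is not calculated for the oldest stratum.
    if stratum != ">=60" and lifetime_pct is not None:
        result["lifetime"] = "high" if lifetime_pct >= LIFETIME_CUTOFF else "low"
    return result


example = categorize_ascvd(45, ten_year_pct=7.5, lifetime_pct=38.9)
```

A risk label is left as `None` where the corresponding score is not applicable for that stratum, which keeps "not calculated" distinct from "low".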
To address the secondary objective of the study, a supervised ML algorithm was developed to study the interactions between FM disease characteristics and ASCVD risk scores. A random forest classifier ML algorithm was used for the development and training of the model to help predict the ASCVD risk category of FM patients using baseline clinical features and biochemical parameters. Baseline characteristics of higher importance for high CVD risk in FM patients were identified using this ML algorithm.

Statistical methods
We used SPSS Version 20.0 (IBM Corporation, USA) for statistical analysis. All continuous variables were summarized using mean (SD) or median (IQR). Kolmogorov-Smirnov test was used to ascertain whether the data were normally distributed. Categorical variables were expressed in counts (%). We used the Pearson Chi-Square test or Fisher's exact test for categorical variables and independent sample t-test or Mann-Whitney test for continuous variables. Anaconda-Python 3.0 (USA) software was used for ML analysis.

Machine learning (ML) model preparation and evaluation of the model
The 10-year and lifetime ASCVD risk statuses (high risk versus low risk) were the target variables to be predicted in their respective models. The inputs were all other variables mentioned above; cumulatively, these included 21 parameters for each patient: age, height, weight, body mass index (BMI), presence of co-morbidities (hypothyroidism, diabetes mellitus, hypertension, and dyslipidemia), systolic and diastolic blood pressures, smoking status, BPI (total and average) score, FIQR score, ESR, CRP levels, and lipid profile (total cholesterol, LDL, HDL, and triglyceride levels). To apply the ML risk algorithms, the data were imported into Python 3.0 in the JupyterLab coding environment. For the ML modeling, the Scikit-learn Python library, which has been developed specifically for data science and ML, was used. The FM patient data set was split into training and validation subgroups. The training subgroup was derived from a random sampling of 80% of the FM patients and was imported into a random forest classifier algorithm to derive an ML model for the 10-year CVD risk and the lifetime CVD risk, respectively; the validation subgroup comprised the remaining 20%. The validation subgroup data were then used to test the performance of the trained models. Each model's hyperparameters were tuned manually (the n-estimators setting) using fivefold cross-validation on the training subgroup to determine the values that led to the best performance. The best-performing model was then summarized using accuracy, F1-score, and area under the curve (AUC) for both ML models. For both the 10-year risk and the lifetime risk random forest classifier models, the feature importances and their weightage were calculated in addition to model training.
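The workflow described above (80/20 split, random forest, fivefold cross-validation over the number of trees, then accuracy, F1-score, AUC, and feature importances) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' code: the data are synthetic, the feature names are placeholders, `fit_risk_model` is a hypothetical helper, `GridSearchCV` stands in for the manual tuning the text describes, and the stratified split is an added assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split


def fit_risk_model(X, y, feature_names, seed=0):
    """Train and validate a random forest for a binary high/low risk target."""
    # 80% training, 20% validation; stratification is an assumption made here
    # so that both classes appear in the small validation subgroup.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Fivefold cross-validation over n_estimators on the training subgroup only.
    # The candidate values are illustrative, not taken from the paper.
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid={"n_estimators": [50, 100, 200]},
        cv=5,
        scoring="f1",
    )
    search.fit(X_train, y_train)
    model = search.best_estimator_

    pred = model.predict(X_val)
    prob_high = model.predict_proba(X_val)[:, 1]  # probability of class 1
    metrics = {
        "accuracy": accuracy_score(y_val, pred),
        "f1": f1_score(y_val, pred),
        "auc": roc_auc_score(y_val, prob_high),
    }
    importances = dict(zip(feature_names, model.feature_importances_))
    return model, metrics, importances


# Synthetic stand-in data: 139 rows, 4 placeholder features.
rng = np.random.default_rng(0)
names = ["age", "total_cholesterol", "fiqr", "noise"]
X = rng.normal(size=(139, len(names)))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=139) > 0).astype(int)

model, metrics, importances = fit_risk_model(X, y, names)
```

Tuning on the training subgroup alone keeps the validation subgroup untouched until the final performance check, which is the point of holding it out.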

Results
A total of 139 FM cases and 1820 age- and gender-matched healthy controls were enrolled in the study. Table 1 shows the baseline characteristics of study participants. Table 2 shows the age-stratified distribution of ASCVD risk scores between the FM cases and healthy controls. The lifetime risk of CVD was significantly higher in the FM cases of the 40-59 years age group than in the age- and gender-matched healthy controls (OR = 1.56, 95% CI = 1.01-2.42, p = 0.043). Table 3 shows ASCVD risk scores of FM patients stratified by FM disease severity and disease duration. No difference in CVD risk was noted with FM disease severity or duration (p > 0.05).
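For readers less familiar with how an odds ratio and its confidence interval arise from a 2x2 table, a minimal sketch follows. The counts are hypothetical (the cell counts of Table 2 are not reproduced here), the function name is illustrative, and the Woolf (log-normal) interval is an assumption, since the text does not state which interval method SPSS applied.

```python
import math


def odds_ratio_ci(cases_high, cases_low, controls_high, controls_low, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    cells = (cases_high, cases_low, controls_high, controls_low)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (cases_high * controls_low) / (cases_low * controls_high)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT the study data.
or_value, ci_low, ci_high = odds_ratio_ci(40, 60, 300, 700)
```

An interval whose lower bound lies above 1, as reported for the 40-59 years group, indicates that the higher odds in cases are unlikely to be due to chance alone at the chosen level.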

Machine learning (ML) analysis and features importance
Of the 139 FM patients enrolled, 110 were aged 40 years or older, and their 10-year ASCVD risk was calculated. An ML model was derived using the Scikit-learn random forest classifier algorithm. The best model was selected after hyperparameter tuning (manually with the n-estimators, using fivefold cross-validation) and achieved a precision of 96% and a recall of 95%, with an F1 score of 0.95. The AUC of this 10-year ASCVD ML model was 0.825 and is shown in Fig. 2A. Lifetime ASCVD risk was calculated for the 126 FM patients younger than 60 years. Using the Scikit-learn random forest classifier algorithm again, an ML model was derived. The best model achieved a precision of 72% and a recall of 72%, with an F1 score of 0.79. The AUC of the lifetime ASCVD ML model was 0.713 and is shown in Fig. 2B. The feature importances in the individual models are shown in Table 4.
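As a reminder of how the reported metrics relate to one another, the sketch below derives accuracy, precision, recall, and F1 from the four cells of a binary confusion matrix. The counts are hypothetical and the function name is illustrative. Note that when metrics are averaged across classes (for example, weighted averaging in Scikit-learn), the averaged F1 need not equal the harmonic mean of the averaged precision and recall; the text does not state which averaging was used.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive (high-risk) class."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical validation-subgroup counts, NOT the study data.
scores = classification_metrics(tp=8, fp=2, fn=2, tn=10)
```

Accuracy alone can look favourable when one risk class dominates, which is why precision, recall, and F1 are reported alongside it.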

Discussion
In our study, the conventional case-control design showed that female FM patients of age group 40-59 years had increased lifetime ASCVD risk. Nevertheless, ASCVD risk was not increased in female FM patients of the age groups of 18-39 years and more than 60 years. Also, this conventional study design failed to show any association of ASCVD risk with FM disease severity and disease duration.
These results are in concordance with the study done in Taiwan, which also showed that FM is a risk factor for CVD [12]. This finding of increased CVD risk in patients with FM is analogous to the CVD risk in a related condition, major depressive disorder [20]. Depression has been demonstrated in a meta-analysis to be an independent risk factor for the onset of a wide range of cardiovascular disorders [21].
FM pathogenesis is multifaceted and our understanding continues to evolve, but evidence supporting the role of inflammation in FM etiopathogenesis is increasing. Pro-inflammatory cytokines, reactive oxygen species, and inflammatory factors are considered to play a distinct role in the inflammatory response in FM patients [22]. Serum CRP levels in FM patients are reported to be higher than in the control population and are known to improve with exercise and dietary changes [23]. This low-grade inflammation could be the common link between FM and CVD.
CVD risk is multi-factorial, but among the parameters used in the ASCVD risk calculator, dyslipidemia was strikingly more frequent in fibromyalgia patients than in the control population in this study (Table 1A). This finding is consistent with various studies across the globe [24,25]. A low-grade subclinical inflammatory process with an elevation of an inflammatory cytokine (interleukin-6) is well described in fibromyalgia patients [26]. The role of IL-6 in lipid metabolism is well described, and this could be the common mechanism that links the two diseases and explains our observation [10]. FM patients have lower physical fitness, which could be another reason for increased CVD risk, as suggested in the al-Ándalus Project [27].
Since the conventional statistical analysis showed a significantly high ASCVD risk in FM patients aged 40-59 years but failed to show any association of ASCVD risk with FM disease severity and disease duration, ML algorithms were developed and trained to understand the non-linear relationships of CVD risk in FM patients. The application of ML models showed some interesting relationships among the risk factors assessed, which were overlooked by the traditional study design. Notably, the ML models displayed the importance of FM disease severity for ASCVD risk status, in addition to other traditional CVD risk features like dyslipidemia. The high-importance features of both the 10-year and lifetime ASCVD risk ML models were age, weight, total cholesterol, FIQR score, and triglyceride level. Other features included BMI and BPI (average pain) score for the 10-year ASCVD ML model, while the lifetime ASCVD risk ML model revealed height and low-density lipoprotein levels, in addition to the aforementioned five parameters. Both models showed excellent validity after training. This demonstrates the ability of ML to identify non-linear relationships between the disease datasets (FM characteristics) and the target variable (ASCVD risk score) and to improve prediction models.
The application of ML algorithms for the prediction of cardiovascular diseases is not novel. Data from a large cohort of 378,000 patients from a family practice database in England were analyzed using machine learning, which was shown to significantly improve the accuracy of cardiovascular risk prediction when compared to 10-year ASCVD risk scores [28].
While the use of ML to analyze general population data for CVD risk prediction is increasingly studied, our study explores the novel application of ML in FM for studying the role of non-linear predictors of ASCVD risk (Table 4). In addition to the high feature importance of traditional risk factors (age, BMI, and fasting lipid profile), our ML model identified that FM disease severity also had high feature importance for increasing CVD risk in female FM patients, which is an interesting finding.
Given the widespread prevalence of heart disease, the importance of characterizing the "at-risk" population for CVD cannot be overstated. In addition to the ASCVD risk score, carotid intimal medial thickness could be used as a surrogate marker for predicting CVD risk in future studies, as recommended by a meta-analysis report by Willeit et al. [29].
The external validity of our study could be limited as it was an observational study from a single center. Further prospective studies are required to establish causation, but as this study shows excellent internal validity, it can be used to guide future research. This study highlights a probable novel non-linear FM disease severity association with the ASCVD risk score that could help better stratify an individual FM patient for CVD risk.

Conclusion
Traditional risk factors like obesity, diabetes mellitus, hypertension, and dyslipidemia are the main pillars of CVD risk. In our study, the conventional case-control design showed that premenopausal female patients with FM could represent a new CVD risk group. The novel supervised ML study design identified FM disease severity as a feature of high importance for CVD risk, which was missed by the conventional study design. More definitive evidence can be gathered by future prospective studies employing non-invasive surrogate markers of CVD risk like carotid intimal medial thickness and coronary artery calcium scoring.