Machine learning of microvolt-level 12-lead electrocardiogram can help distinguish takotsubo syndrome and acute anterior myocardial infarction

Background Qualitative differences in 12-lead electrocardiograms (ECG) at onset have been reported in patients with takotsubo syndrome (TTS) and acute anterior myocardial infarction (Ant-AMI). We aimed to distinguish these diseases using a machine learning (ML) approach applied to microvolt-level quantitative measurements. Methods We enrolled 56 consecutive patients with sinus rhythm TTS (median age, 77 years; 16 men), and 1-to-1 random matching with Ant-AMI patients was performed based on age and sex. The ECG in the emergency room was evaluated using an automated system (ECAPs12c; Nihon-Koden). Statistical and ML predictive models for TTS were constructed using clinical features and ECG parameters. Results Statistically significant differences were observed in 25 parameters; the V1 ST level at the J point (V1 STJ) showed the lowest P value (P < .001). V1 STJ ≤+18 μV showed the highest accuracy for TTS (0.773). The highest area under the receiver operating characteristic curve (AUROC) was shown by the aVR ST level at 1/16th of the preceding R-R interval after the J point (aVR STmid: 0.727). In contrast, the light gradient boosting machine (model_LGBM) and extra tree classifier (model_ET) showed higher accuracy (model_LGBM: 0.842, model_ET: 0.831) and AUROC (model_LGBM: 0.868, model_ET: 0.896) than the statistical models. V1 STJ had high feature importance and Shapley additive explanation values in the 2 ML models. Conclusion ML applied to automated microvolt-level ECG measurements showed the possibility of distinguishing between TTS and Ant-AMI, which may be a clinically useful ECG-based discriminator.


Introduction
Takotsubo syndrome (TTS) and acute anterior myocardial infarction (Ant-AMI) show seemingly similar clinical features at onset, and distinguishing between the 2 diseases without emergent cardiac catheterization is difficult. 1 Although TTS, a diagnosis of exclusion, can be managed noninvasively with appropriate medical therapy, Ant-AMI is often managed invasively, with ST-elevation myocardial infarction (STEMI) cases requiring emergent revascularization. In cases of non-ST-elevation AMI (NSTEMI) without key clinical symptoms and signs of STEMI, noninvasive methods are desirable, especially in ambulances, clinics, and hospitals that cannot perform emergent cardiac catheterization. STEMI cases should be transferred immediately to hospitals with cardiac catheterization laboratories, but even in such hospitals, emergent catheterization is sometimes difficult to perform for various reasons: advanced age, dementia, frailty, poor physical status, and/or social problems. Twelve-lead electrocardiogram (ECG) is a fundamental examination that can be performed on arrival and has a time advantage over other examinations such as high-sensitivity troponin. Initial ECG on arrival can be useful to triage patients, but the ECGs of the 2 diseases at onset show very similar patterns. The difference in ECG has been studied by several investigators. 2,3 The difference in ST-T change has also been well studied, and several leads (eg, V1, aVR, inferior leads) were reported to have important roles in distinguishing between the diseases. 2,3 Although these reports demonstrated good accuracy (0.95, Kosuge and Kimura 2; 0.66-0.86, Jim and colleagues 3), external validity was not confirmed.
Machine learning (ML) applied to ECG has been developed in several cardiac diseases, and some investigators have reported methods for diagnosing myocardial infarction using a convolutional neural network on vector data of ECG. 4,5 However, the accuracy (0.81, Makimoto and colleagues 4 ) or area under the receiver operating characteristic curve (AUROC; 0.85-0.88, Cho and colleagues 5 ) was not higher than that of conventional ST-level examination. Moreover, there are no studies on the diagnosis of TTS using ML applied to ECG.
We aimed to build a predictive model for TTS by ML, not with ECG vector data but with table data of conventional 12-lead ECG parameters, and to elucidate the ECG parameters with high feature importance in the ML models.

Study patients and ECG
We enrolled 56 consecutive patients with sinus rhythm and apical ballooning-type TTS at Yokohama Minami Kyosai Hospital from 2013 to 2021. In all cases, cardiac catheterization was performed. In TTS cases, the absence of coronary stenosis and the presence of left ventricular apical ballooning were confirmed; in Ant-AMI cases, stenosis/occlusion of the left anterior descending coronary artery was confirmed. The diagnosis of TTS was based on Mayo's criteria, 1 and other diseases that mimicked AMI (acute myocarditis/pericarditis and vasospastic angina) were excluded by the absence of inflammation or by an acetylcholine provocation test (several days after admission). The diagnosis of AMI (STEMI and NSTEMI) was based on the fourth universal definition of AMI. 6 From our AMI database, patients of the same age and sex as each TTS case were extracted, and 1-to-1 random matching was performed. Finally, 112 patients (median age, 77 years [interquartile range, 67-84 years]; 32 men) were enrolled.
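The 1-to-1 random matching on age and sex can be illustrated with a minimal sketch. The record layout (`id`, `age`, `sex`), the function name `match_one_to_one`, and the toy patients are hypothetical and are not taken from the study database; exact-age matching is assumed here because the text describes patients of the same age and sex.

```python
import random
from collections import defaultdict

def match_one_to_one(cases, pool, seed=0):
    """For each case, randomly draw one unused control with the same age and sex."""
    rng = random.Random(seed)
    # Group the candidate controls by the matching key (age, sex).
    by_key = defaultdict(list)
    for ctrl in pool:
        by_key[(ctrl["age"], ctrl["sex"])].append(ctrl)
    # Shuffle each group so that pop() returns a random member.
    for candidates in by_key.values():
        rng.shuffle(candidates)
    pairs = []
    for case in cases:
        candidates = by_key[(case["age"], case["sex"])]
        if candidates:  # cases without an eligible control stay unmatched
            pairs.append((case, candidates.pop()))
    return pairs

# Hypothetical toy records, for illustration only.
tts = [{"id": "T1", "age": 77, "sex": "F"}, {"id": "T2", "age": 68, "sex": "M"}]
ami = [
    {"id": "A1", "age": 77, "sex": "F"},
    {"id": "A2", "age": 77, "sex": "F"},
    {"id": "A3", "age": 68, "sex": "M"},
    {"id": "A4", "age": 80, "sex": "M"},
]
pairs = match_one_to_one(tts, ami)
for case, ctrl in pairs:
    print(case["id"], "matched with", ctrl["id"])
```

Each control is used at most once, so the matched groups have equal size, as in the 56 versus 56 design.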
ECG on arrival (in the emergency room) in both groups was measured at the μV level using an automated system (ECAPs12c; Nihon-Koden, Tokyo, Japan). 7 ECG variables, ST levels, T-wave amplitude, and other fundamental parameters were preselected, as explained in Figure 1, and those parameters were measured from 10-second waveforms. The ST levels of each lead were measured automatically at 3 points: (1) ST level at the J point (STJ), which was recorded at the end of the QRS complex, measured with respect to the baseline; (2) the middle of the ST level (STmid), which is the ST level at the point 1/16th of the preceding R-R interval after the J point; and (3) the end of the ST level (STend), which is the ST level at the point 2/16th of the preceding R-R interval after the J point.

Figure 1 Explanation of measurement on 1 beat of the electrocardiogram (ECG). All the parameters were measured automatically. The left figure shows a schema of the measurement, and the right figure shows a real ECG wave and real results. The ST level was measured at 3 points: (1) ST level at the J point (STJ), which was recorded at the end of the QRS complex as measured in μV with respect to the baseline; (2) the middle of the ST level (STmid), which is the ST level at the point of 1/16th of the preceding R-R interval after the J point; and (3) the end of the ST level (STend), which is the ST level at the point of 2/16th of the preceding R-R interval after the J point. The T-wave amplitude was defined as the absolute distance from the apex of the T wave to the baseline. TTS = takotsubo syndrome.
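The 3 ST measurement points can be expressed as sample offsets from the J point. The following is a minimal sketch, not the ECAPs12c algorithm: the function names, the 500-Hz sampling rate, the synthetic signal, and the baseline value are all assumptions made for illustration.

```python
def st_sample_points(j_index, rr_samples):
    """Sample indices of STJ, STmid (J + RR/16), and STend (J + 2*RR/16)."""
    step = rr_samples / 16  # 1/16th of the preceding R-R interval, in samples
    return {
        "STJ": j_index,
        "STmid": j_index + round(step),
        "STend": j_index + round(2 * step),
    }

def st_levels_uv(signal_uv, baseline_uv, j_index, rr_samples):
    """ST levels in microvolts, measured with respect to the baseline."""
    points = st_sample_points(j_index, rr_samples)
    return {name: signal_uv[i] - baseline_uv for name, i in points.items()}

# Assumed example: preceding R-R interval of 800 ms sampled at 500 Hz = 400 samples.
rr_samples = 400
points = st_sample_points(j_index=100, rr_samples=rr_samples)

# Synthetic ramp standing in for one lead (values in microvolts).
signal_uv = [float(i) for i in range(200)]
levels = st_levels_uv(signal_uv, baseline_uv=10.0, j_index=100, rr_samples=rr_samples)
print(points)
print(levels)
```

Because the offsets scale with the preceding R-R interval, STmid and STend move closer to the J point at higher heart rates.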

KEY FINDINGS
Takotsubo syndrome (TTS) and acute anterior myocardial infarction (Ant-AMI) show similar clinical features, especially in electrocardiogram (ECG) at onset.
Automated ECG measurement provides microvolt-level ST changes at the J point (STJ), at 1/16th of the preceding R-R interval after the J point (STmid), and at 2/16th (STend) in each lead. V1 STJ ≤+18 μV showed the highest accuracy, and aVR STmid showed the highest area under the receiver operating characteristic curve (AUROC) for TTS.
Diagnostic performance of machine learning (light gradient boosting machine, extra tree classifier) on table data of ECG parameters demonstrated higher accuracy and AUROC for TTS than those of statistical models, and V1 STJ played an essential role in building both models.

The ethics committee of Yokohama Minami Kyosai Hospital approved the study protocol, and written informed consent was obtained from all participants prior to the study.

Statistical analysis for characteristics of patients
Numeric variables are displayed as the median value (interquartile range: 25%-75% value), and the Mann-Whitney test was used to compare the TTS and Ant-AMI groups. Fisher exact test was used to evaluate differences in categorical variables, and Holm's multiple comparison was used as a post hoc test.
Statistical significance was set at P < .05. All statistical analyses were performed using EZR (Saitama Medical Center, Jichi Medical University, Saitama, Japan), 8 a graphical user interface for R (The R Foundation for Statistical Computing, Vienna, Austria). 9
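Holm's multiple comparison, used above as the post hoc test, can be sketched as a step-down adjustment of the raw P values. This is a minimal illustration of the standard Holm procedure, not the EZR implementation; the function name and the example P values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted P values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending raw P
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest P value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Adjusted values must not decrease along the ordering.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw P values from 3 pairwise comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
print(adjusted)
```

A comparison is then called significant when its adjusted value is below the chosen threshold (P < .05 in this study).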

Predictive model construction by statistical method
Fifty-six pairs of cases were randomly split into 80% and 20% (45 pairs vs 11 pairs); the 45 pairs (90 cases) were set as the prediction data and the 11 pairs (22 cases) as the test data. Statistical predictive models were constructed based on these 90 cases, as described below. Univariate logistic regression analysis for TTS was performed using the prediction data, and significant predictors were extracted. A multivariate logistic regression analysis was not performed because of the multicollinearity of many pairs of parameters. A receiver operating characteristic (ROC) curve analysis was performed, and the cutoff value was calculated based on the Youden index. The statistical predictive model consisted of an assessment of whether a parameter in each case was higher/lower than the cutoff value (named the cutoff value model). A confusion matrix was created, and the diagnostic performance (accuracy; sensitivity, ie, recall; and positive predictive value, ie, precision) was evaluated. From the analysis of the predictive model, the propensity score (PS) of each predictor was calculated, and the PS formula for each predictor was constructed (the ROC curve model) from a, the coefficient of the predictor, and b, the intercept, both calculated by logistic regression analysis. The model was applied to the test data, and the AUROC was measured using ROC curve analysis.
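The cutoff value model and the PS formula can be sketched in a few lines. The example assumes that the PS takes the standard logistic form PS = 1 / (1 + exp(-(a·x + b))), where x is the predictor value; this form, the function names, and the toy values are assumptions for illustration and are not the study data or the EZR output.

```python
import math

def youden_cutoff(values, labels):
    """Cutoff for the rule 'value <= cutoff predicts TTS' that maximizes
    the Youden index (sensitivity + specificity - 1)."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v <= c and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v > c and y == 0)
        j = tp / pos + tn / neg - 1
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j

def diagnostic_performance(values, labels, cutoff):
    """Confusion-matrix metrics (assumes at least one positive prediction)."""
    pred = [1 if v <= cutoff else 0 for v in values]
    tp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(pred, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(pred, labels) if p == 0 and y == 0)
    recall = tp / (tp + fn)        # sensitivity
    precision = tp / (tp + fp)     # positive predictive value
    return {
        "accuracy": (tp + tn) / len(labels),
        "recall": recall,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }

def propensity_score(x, a, b):
    """Assumed logistic form: a = coefficient of the predictor, b = intercept."""
    return 1.0 / (1.0 + math.exp(-(a * x + b)))

# Hypothetical ST levels in microvolts; label 1 = TTS, 0 = Ant-AMI.
values = [-10, 0, 5, 18, 12, 20, 30, 55, 70, 8]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
cutoff, j = youden_cutoff(values, labels)
perf = diagnostic_performance(values, labels, cutoff)
print(cutoff, round(j, 3), perf)
```

With a negative coefficient a, the PS decreases as the ST level rises, which matches a rule in which lower values favor TTS.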

Predictive model construction and validation by ML
Among the ML methods for ECG data, we did not use conventional deep learning procedures using 1-dimensional data (vector data) 10 because it was difficult to explain the feature importance in the model. To secure explainability, we adopted a novel method that used an ensemble learning procedure with conventional ECG parameters (eg, ST level, T-wave amplitude) as table data.
Eleven ML models were built using PyCaret, an open-source wrapper over several ML libraries in Python that provides a low-code environment. 11 After screening of the 11 models, we adopted 2 excellent ensemble learning methods: an extra tree classifier (model_ET) and a light gradient boosting machine (model_LGBM), 12,13 which use many random decision trees to build a majority vote-like system. A brief explanation of the ensemble learning is provided in Supplemental File 1. All 112 cases were randomly split 10 times into 80% of the data for ML training (90 cases) and 20% of the data for validation (22 cases), and 10-fold random cross-validation was performed. Both models were tuned to obtain the highest accuracy with optimization of the hyperparameters. The best number of features was searched by recursive feature elimination with cross-validation on PyCaret.

Table 1 footnote. ST elevation/depression was defined as at least 0.1 mV ST deviation at the J point, and T-wave inversion as at least -0.1 mV amplitude, judged on automated measurement. Diagnosis of ST-elevation acute anterior myocardial infarction was based on the fourth universal definition of acute myocardial infarction: briefly, ST elevation at the J point in 2 contiguous leads with the cut point ≥1 mm in leads V1-V4 other than leads V2-V3, where the following cut points apply: ≥2 mm in men ≥40 years; ≥2.5 mm in men <40 years; or ≥1.5 mm in women regardless of age. Fisher exact test was performed, and as post hoc analysis, data underwent Holm's multiple comparison. P < .05 was considered significant.
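The repeated random 80%/20% splitting of the 112 cases can be sketched as follows. This illustrates the splitting idea only and is not PyCaret's internal procedure, which may stratify or fold the data differently; the function names and the placeholder scoring function are hypothetical.

```python
import random

def repeated_random_splits(n_cases, n_repeats=10, train_fraction=0.8, seed=0):
    """Return n_repeats random (train_indices, validation_indices) pairs."""
    rng = random.Random(seed)
    n_train = round(n_cases * train_fraction)  # 112 * 0.8 -> 90
    splits = []
    for _ in range(n_repeats):
        idx = list(range(n_cases))
        rng.shuffle(idx)
        splits.append((sorted(idx[:n_train]), sorted(idx[n_train:])))
    return splits

def average_over_splits(splits, score_fn):
    """Average a user-supplied score over all splits."""
    scores = [score_fn(train, valid) for train, valid in splits]
    return sum(scores) / len(scores)

splits = repeated_random_splits(n_cases=112)

# Placeholder score: fraction of cases held out; a real score_fn would
# fit a model on `train` and return its accuracy on `valid`.
held_out = average_over_splits(splits, lambda train, valid: len(valid) / 112)
print(len(splits), len(splits[0][0]), len(splits[0][1]), round(held_out, 3))
```

Averaging over the 10 splits reduces the dependence of the reported accuracy and AUROC on any single random partition of a small data set.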
The cross-validated model was finalized on PyCaret (the results of hyperparameter tuning are displayed in Supplemental File 6). The feature importance of the models was ranked to estimate the contribution of the predictors to the ML models. In addition to feature importance, the SHAP (SHapley Additive exPlanations) method was introduced on data for ML model training. 14,15 The theory of SHAP is based on "the game optimal Shapley values," 14 and the summary plot of SHAP combines feature importance with feature effects. The red and blue points indicate TTS and Ant-AMI, respectively. On the x-axis, the Shapley value of each feature is displayed (defined as the SHAP value), in which a large (right side) value corresponds to a positive contribution to the model. A brief explanation of the SHAP method is provided in Supplemental File 2.
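The Shapley value underlying SHAP can be illustrated by exact enumeration on a toy model. The sketch below averages each feature's marginal contribution over all feature orderings; the value function, its weights, and the 3 feature names are hypothetical, and the SHAP library uses faster model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import permutations

def shapley_values(value_fn, features):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(frozenset(present))
            present.add(f)
            totals[f] += value_fn(frozenset(present)) - before
    return {f: total / len(orderings) for f, total in totals.items()}

def toy_value(present):
    """Hypothetical model output when only the features in `present` are known."""
    score = 0.0
    if "V1_STJ" in present:
        score += 0.3
    if "aVR_STmid" in present:
        score += 0.1
    if "V1_STJ" in present and "aVR_STmid" in present:
        score += 0.2  # interaction shared between the 2 features
    return score  # "DM" never changes the output

features = ["V1_STJ", "aVR_STmid", "DM"]
phi = shapley_values(toy_value, features)
print(phi)
```

The values sum to the difference between the full-model output and the empty-set output, which is the additivity property that lets a SHAP summary plot attribute each prediction to individual features.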

Results
The characteristics of all 112 patients and the comparison of prediction data (n = 90) and test data (n = 22) are displayed in Supplemental File 2. There were no significant differences between the 2 groups. Table 1 shows the qualitative ST elevation/depression in leads V1-V4 and the number of cases diagnosed as anterior STEMI. Although there were significant differences in V1 ST elevation (P < .001) and V4 ST depression (P = .029) at the J point, the number of cases diagnosed with anterior STEMI did not differ between the 2 diseases (TTS, 28; Ant-AMI, 31; P = .566). Table 2 shows a comparison of TTS and Ant-AMI in the prediction data (n = 90). Hyperlipidemia and diabetes were significantly less frequent in TTS than in Ant-AMI. Among ECG parameters, heart rate and several ST levels (leads I/II/aVR/aVF/V1/V2/V5/V6) demonstrated significant differences. Univariate logistic regression analysis identified 25 significant predictors (asterisk [*] in Table 2). Multivariate logistic regression analysis was not performed because of the many significant correlations/confounding/multicollinearity among the variables.
The diagnostic performances of the statistical predictive models are presented in Table 3, and those of the ML models in Table 4. Among the statistical predictors, V1 STJ ≤+18 μV showed the highest accuracy of 0.773 (in test data), and aVR STmid had the highest AUROC (0.727, in test data). The results of recursive feature elimination with cross-validation are shown in Supplemental File 4. In model_LGBM, the best number of features was 16, 24, and 25; and in model_ET, 25. As a result, we adopted all 25 features to construct the ML models. Compared with the statistical predictive models, model_LGBM and model_ET had higher accuracy (0.842 and 0.831, respectively) and AUROC (0.868 and 0.896, respectively). Figure 2 shows a comparison of ST levels. The STJ of TTS in leads I/II/aVF/V5/V6 was higher than that in Ant-AMI, and the STJ of TTS in leads aVR/V1/V2 was lower than that in Ant-AMI. The STmid and STend showed similar results to STJ, but V5 STmid, V5 STend, and I STend were not significant predictors. The representative ECG waveforms are shown in Figure 3. From Table 3, the representative ECG characteristics of TTS compared with Ant-AMI were as follows: no ST elevation in V1 STJ (≤+18 μV) and ST depression in aVR STmid (-10 μV). The representative and visible ECG features of TTS were summarized as no ST elevation in V1 and ST depression in aVR.
The important features of the 2 ML models are shown in Supplemental File 5. In model_LGBM, V1 STJ showed the highest feature importance. Conversely, in model_ET, not only V1 STJ but also diabetes and hyperlipidemia showed high feature importance. The SHAP values of the models are shown in Figure 4, which showed a pattern similar to that in Supplemental File 5. V1 STJ showed the highest feature [...]

Table 2 footnote. Numeric variables are displayed as median [interquartile range: 25%, 75%], and categorical variables are displayed as n (%). STJ, STmid, STend, and T wave are expressed in μV and are explained in Figure 1. CK and CK-MB were the maximum values during the acute phase, and they were not analyzed by logistic regression. Statistical comparison methods and abbreviations are explained in the Table 1 footnote. In logistic regression, BNP was analyzed per 100 pg/mL, and the OR and 95% CI are displayed per 100 pg/mL; WBC was analyzed per 100 counts/mm3, ST levels per 10 μV, and T-wave amplitude per 100 μV. P < .05 was considered significant; significant values are denoted by an asterisk (*).

Discussion
We built predictive models for TTS and Ant-AMI using an automated ECG system with μV-level measurements. The ST levels in several leads were significant predictors, and we were able to provide clinically useful cutoff values for them, as shown in Table 3. Among them, V1 STJ ≤+18 μV showed the highest accuracy, and aVR STmid showed the highest AUROC in test data. In contrast, ML predictive models demonstrated higher accuracy and AUROC than statistical models. In the ML models, V1 STJ played an important role in model building.

ST level as a predictor of TTS
In addition to ST levels in V1, ST levels in I/II/aVR/aVF/V2/V6 were significant predictors of TTS.
The significance of lead V2 can be explained by the importance of the nearest lead, V1, which is well known. Kosuge and colleagues 2 explained that V1 is located in both the right ventricular anterior region and the right paraseptal region; however, abnormalities of wall contraction in TTS rarely extend to the right ventricular region, compared with AMI. The significance of the other leads can be explained as follows:

Lead aVR
Several investigators found more ST depression in TTS than in Ant-AMI. 2,16,17 Ant-AMI causes greater injury than TTS. In the present study, the CK-MB level in Ant-AMI was higher than that in TTS. Lead aVR is known as a "cavity lead" and allows for visualization of the left ventricle (LV); therefore, the aVR lead can help determine the total damage of the LV. 18

Lead II/aVF
ST elevation was found in inferior leads in 33%-50% of TTS cases in large series. 19,20 Jim and colleagues 3 emphasized the importance of ST elevation in the inferior leads as a new criterion for TTS diagnosis. They described that the inferior myocardium is universally affected in typical TTS, theoretically expressed as inferior ST-segment elevation. Compared with TTS, Ant-AMI tends to show large LV damage, in which the injury vectors of opposing walls cancel each other out, and simultaneous ischemia in both the lateral and inferior walls reduces the ST-segment changes in their respective leads.

Lead V6/I
Inoue and colleagues 21 reported more prevalent V6 ST elevation in TTS than in Ant-AMI. Ogura and colleagues 20 reported that the ratio of ΣST elevation of V4-V6 to ΣST elevation of V1-V3 was a significant predictor of TTS. There have been no reports regarding lead I in TTS. However, these differences can easily be understood from the perspective of right ventricular (RV) involvement. Chia and colleagues 22 reported ST depression of the lateral leads as a sign of right ventricular ischemia. Both lead V1 ST elevation and lead I/V6 ST depression can show RV involvement of Ant-AMI.

Footnotes to Tables 3 and 4. Initially, a receiver operating characteristic (ROC) curve analysis was performed, the area under the ROC curve (AUROC) was measured, and the cutoff value was calculated by the Youden index. The AUROCs of hyperlipidemia (HL) and diabetes (DM) were not evaluated because they were binary categorical variables. The statistical predictive model consisted of 2 methods: an assessment of whether a parameter of each case had a higher/lower value than the cutoff (named the cutoff value model); and the propensity score (PS) of each predictor, calculated on the prediction data, from which the PS formula for each predictor was constructed (named the ROC curve model), which was applied to the test data, with the AUROC measured by ROC curve analysis. Confusion matrix was prepared from the model, and diagnostic performance [...] Table 2 footnote. The diagnostic performance was expressed as accuracy (Acc), sensitivity (Sens; named as recall), positive predictive value (PPV; named as precision [Prec.]), and F1 score (harmonic mean of recall and Prec.). Ten times random cross-validation was performed, and the average of results was displayed.

Machine learning of ECG to distinguish between TTS and Ant-AMI
Although the accuracy was not very high for each of the 25 significant predictors, their aggregation created excellent predictive ML models with the algorithms model_ET and model_LGBM, which are ensemble learning models and use decision trees. 12,13 Model_ET splits tree nodes not by searching for the best-fit cut point but by drawing cut points at random, which are then evaluated by the Gini coefficient or entropy; consequently, model_ET can show high performance, especially in the presence of noisy features. 12,23 Therefore, model_ET has an advantage if the importance of each variable is not very high. In the present study, the accuracy of 22 significant predictors was limited, and model_ET was suitable for building a predictive model.

Figure 4. Each red and blue point shows a case with takotsubo syndrome and acute anterior myocardial infarction, respectively. On the y-axis, features are sorted based on their importance; color shows the feature value from low (blue) to high (red). The SHAP value is displayed on the x-axis, wherein the left side (minus value) shows negative impact and the right side (plus value) shows positive impact. Abbreviations as in Figure 1 and Table 2.

Model_LGBM is frequently used and is known to exhibit higher diagnostic performance, especially on table data, compared to other ensemble ML methods. 12,24 The advantages of model_LGBM are as follows: (1) Model_LGBM uses the boosting method, which is a series data composition, instead of bagging (bootstrap aggregating; used in the random forest method). Consequently, the learning speed is faster than that of the parallel data composition of bagging. (2) The decision trees of model_LGBM use a leaf-wise tree growth method, which is much faster than the level-wise tree growth method (used, eg, in XGBoost). (3) Fine-tuning of hyperparameters can be performed more easily in model_LGBM than in other ML models, which improves the accuracy of the model.
Therefore, model_LGBM can perform excellently under the condition of many low-importance parameters, such as in the present study.
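The difference between a best-fit split and the random split used by extra trees can be sketched with the Gini impurity. This is a simplified single-feature illustration, not the scikit-learn or PyCaret implementation; the function names and toy values are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a set of binary labels (0 = pure, 0.5 = evenly mixed)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def split_impurity(values, labels, threshold):
    """Size-weighted Gini impurity after splitting at `threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_fit_split(values, labels):
    """Conventional tree: search all observed cut points for the lowest impurity."""
    candidates = sorted(set(values))[:-1]
    return min(candidates, key=lambda t: split_impurity(values, labels, t))

def random_split(values, rng):
    """Extra-trees style: draw the cut point uniformly within the feature range."""
    return rng.uniform(min(values), max(values))

# Hypothetical feature values; label 1 = TTS, 0 = Ant-AMI.
values = [1.0, 2.0, 3.0, 4.0]
labels = [1, 1, 0, 0]
best = best_fit_split(values, labels)
drawn = random_split(values, random.Random(0))
print(best, split_impurity(values, labels, best))
print(drawn, split_impurity(values, labels, drawn))
```

A single random cut is usually worse than the best-fit cut, but averaging many such randomized trees reduces variance, which is why the approach can tolerate noisy, individually weak features.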
The SHAP method is novel and can show both feature importance and the direction of correlation (positive or negative). In statistical comparison by multivariate analysis, it is difficult to compare the importance of all parameters when there are pairs with significant correlation, confounding, or multicollinearity. However, in ML (especially ensemble learning models using decision trees), most models are adjusted by internal regularization, and their predictive value is not affected. The explainability of features is somewhat weakened when there is significant correlation, confounding, or multicollinearity; however, in contrast to statistical models, such parameters need not be excluded because of internal regularization. 25

Study limitations
This study was performed with a small sample size of patients; therefore, several limitations were inevitable: no external validation (using separate external test data), no ECG variations, and no validation in other similar clinical populations (acute pericarditis, inferior AMI, atypical TTS, non-sinus rhythm, etc). The precision of the study became relatively low because of the small sample size of test data. We did not use deep learning as the ML method because of the sample size. Because STEMI and NSTEMI cases are treated differently, it would have been ideal to analyze the 2 groups separately. Combining multiple leads seems to produce good results, but in our preliminary data, it induced overfitting to the prediction data, and we could not show the usefulness of the combination. Although V1 STJ was essential in both ML models, other important features of the models were not the same; hence, the diagnosis by the 2 models might differ in several patients. An automated ECG system (ECAPs12c) with μV-level measurement demonstrated a higher diagnostic performance; however, this system is not commonly used worldwide.

Conclusion
ML on the parameters of the automated ECG system with μV-level measurement showed superior diagnostic performance compared to conventional single ECG parameters in distinguishing between TTS and Ant-AMI. Although the results of the present study were limited by the small sample size, it may be a clinically useful ECG-based discriminator.