Cluster analysis of long COVID in Japan and association of its trajectory of symptoms and quality of life
  1. Fumimaro Ito1,
  2. Hideki Terai1,2,
  3. Masahiro Kondo3,4,
  4. Ryo Takemura3,
  5. Ho Namkoong5,
  6. Takanori Asakura6,7,
  7. Shotaro Chubachi1,
  8. Keita Masuzawa7,
  9. Sohei Nakayama7,
  10. Yusuke Suzuki7,
  11. Mizuha Hashiguchi1,
  12. Junko Kagyo8,
  13. Tetsuya Shiomi8,
  14. Naoto Minematsu9,
  15. Tadashi Manabe1,10,
  16. Takahiro Fukui10,
  17. Yohei Funatsu10,
  18. Hidefumi Koh10,
  19. Katsunori Masaki1,
  20. Keiko Ohgino1,
  21. Jun Miyata1,
  22. Ichiro Kawada1,
  23. Makoto Ishii11,
  24. Yasunori Sato12 and
  25. Koichi Fukunaga1
  1. Division of Pulmonary Medicine, Department of Medicine, Keio University School of Medicine, Tokyo, Japan
  2. Keio Cancer Center, Keio University School of Medicine Graduate School of Medicine, Shinjuku-ku, Japan
  3. Biostatistics Unit, Clinical and Translational Research Center, Keio University Hospital, Tokyo, Japan
  4. Graduate School of Health Management, Keio University, Kanagawa, Japan
  5. Department of Infectious Diseases, Keio University School of Medicine, Tokyo, Japan
  6. Department of Clinical Medicine (Laboratory of Bioregulatory Medicine), Kitasato University School of Pharmacy, Tokyo, Japan
  7. Department of Respiratory Medicine, Kitasato University Kitasato Institute Hospital, Tokyo, Japan
  8. Department of Internal Medicine, Keiyu Hospital, Kanagawa, Japan
  9. Department of Internal Medicine, Hino Municipal Hospital, Tokyo, Japan
  10. Division of Pulmonary Medicine, Department of Internal Medicine, Tachikawa Hospital, Tokyo, Japan
  11. Department of Respiratory Medicine, Nagoya University Graduate School of Medicine, Nagoya, Japan
  12. Department of Preventive Medicine and Public Health, Keio University School of Medicine, Tokyo, Japan

  Correspondence to Dr Hideki Terai; hidekit926{at}gmail.com; Dr Ryo Takemura; rtakemura{at}keio.jp

Abstract

Background Multiple prolonged symptoms observed in patients who recovered from COVID-19 are defined as long COVID. Although diverse phenotypic combinations are possible, they remain unclear. This study aimed to perform a cluster analysis of long COVID in Japan and clarify the association between its characteristics and background factors and quality of life (QOL).

Methods This multicentre prospective cohort study collected data on symptoms and QOL after COVID-19 in patients diagnosed from January 2020 to February 2021. This study included 935 patients aged ≥18 years with COVID-19 at 26 participating medical facilities. Hierarchical cluster analysis was performed using 24 long COVID symptoms at 3 months after diagnosis.

Results Participants were divided into the following five clusters: numerous symptoms across multiple organs (cluster 1, n=54); no or minor symptoms (cluster 2, n=546); taste and olfactory disorders (cluster 3, n=76); fatigue, psychoneurotic symptoms and dyspnoea (low prevalence of cough and sputum) (cluster 4, n=207) and fatigue and dyspnoea (high prevalence of cough and sputum) (cluster 5, n=52). Cluster 1 included elderly patients with severe symptoms, while cluster 3 included young females with mild symptoms. No significant differences were observed in the comorbidities. Cluster 1 showed the most impaired QOL, followed by clusters 4 and 5; these differences, as well as the composition of symptoms, were maintained over 1 year.

Conclusions We classified patients with long COVID, who had diverse characteristics, into five clusters. Future analysis of these different pathologies could result in individualised treatment of long COVID.

Trial registration number The study protocol is registered at UMIN clinical trials registry (UMIN000042299).

  • COVID-19
  • infection control

Data availability statement

Data are available on reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • Prior studies have shown that long COVID presents with various symptoms and is a disease with a diverse mix of phenotypes rather than a single phenotype. Several clusters have been reported, but the relationship between symptoms over time and quality of life is not sufficiently clear.

WHAT THIS STUDY ADDS

  • Long COVID with diverse symptoms was classified into five characterised clusters. The characteristic symptoms of the clusters formed by symptoms at 3 months after diagnosis were maintained over 1 year. Quality of life improved over time in all clusters, but the differences between clusters in the degree of impairment were maintained over time. Clusters with different characteristics may be caused by different mechanisms, and characterising them could lead to personalised treatment.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • Clustering patients by long COVID symptoms allowed complex phenotypes to be classified into several features. The symptoms comprising the clusters were maintained over time. Future analysis of the pathologies of these phenotypes may lead to individualised treatment for long COVID.

Introduction

COVID-19, which is an infectious disease caused by SARS-CoV-2, has spread worldwide since the outbreak in December 2019.1 2 Although COVID-19 is a critical global health concern, the mortality rate attributed to COVID-19 has decreased with the development of diagnostic techniques and therapeutic agents.3 However, with the increase in the number of patients with COVID-19, post-COVID-19 sequelae have emerged as another concern. These residual symptoms observed in patients recovering from COVID-19 are commonly called ‘long COVID’4 5 or ‘postacute sequelae of COVID-19’.6

The WHO has developed a clinical case definition of long COVID; fatigue, shortness of breath and cognitive dysfunction are notable common symptoms.7 In addition to these symptoms, the disease causes a variety of other symptoms, including respiratory, ear, nose, and throat, digestive, musculoskeletal, skin, systemic, and mood disorders; the symptoms reported by each patient are diverse.4 5 The disease concept of long COVID has been established; however, the symptoms vary and suggest a combination of different phenotypes, which is not well defined.

Several cluster analyses of long COVID have been reported.8–13 In those previous studies, several long COVID clusters with different phenotypes were reported, including a fatigue cluster, respiratory cluster, psychiatric symptom cluster, and taste and smell disorder cluster. Other studies have reported associations with background factors and quality of life (QOL), but they mainly analysed a single time point and did not fully evaluate how the cluster’s symptoms and QOL change over time. In addition, the definition of long COVID and the time from onset to investigate the long COVID symptoms vary by study.

Therefore, in this study, we performed a cluster analysis of long COVID symptoms in Japanese patients to evaluate the relationship between symptoms and background factors, acute course, and QOL. We also evaluated how the prevalence of symptoms and QOL outcomes changed over time for each cluster.

Methods

Study procedures and participants

This study is part of the exploratory analyses derived from the long COVID research project, designed as a multicentre, interactive cohort study, and the details of the study procedure have been described elsewhere.14 15 A prospective nationwide observational study in Japan was conducted on patients aged ≥18 years. Patients were admitted and discharged with a confirmed diagnosis of COVID-19 by SARS-CoV-2 PCR or antigen testing from January 2020 to February 2021 at 26 participating medical institutions in Japan. Study descriptions and consent forms were mailed to potential research candidates from Keio University Hospital, the principal institution. Subsequently, study invitations were mailed from 26 participating institutions that had obtained ethical review committee approval and permission to conduct the study.

The severity of COVID-19 during hospitalisation was classified according to the Ministry of Health, Labour and Welfare in Japan guidelines (Clinical management of patients with COVID-19: a guide for frontline healthcare workers. Version 2.1)16 as follows: mild, peripheral oxygen saturation (SpO2) above 96%, with no respiratory symptoms or coughing only, no shortness of breath; moderate I, SpO2 between 93% and 95%, with shortness of breath and findings of pneumonia; moderate II, SpO2 below 93% and need for oxygen supplementation and severe, requiring admission to the intensive care unit or mechanical ventilation. Patients were stratified by age according to COVID-19 diagnosis into young (18–40 years), middle aged (41–64 years) and old (>65 years).
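The two stratification rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the study's code: the function names are hypothetical, the symptom and imaging criteria are left out, and the boundary handling (SpO2 of exactly 96% counted as mild, age of exactly 65 years counted as old) is an assumption, because the text does not assign those values to a category.

```python
def classify_severity(spo2, needs_oxygen=False, icu_or_ventilation=False):
    """Map acute-phase findings to the four severity grades named in the text.

    Assumptions (not stated in the source): SpO2 is an integer percentage,
    a value of exactly 96% is treated as mild, and symptom or imaging
    criteria are ignored.
    """
    if icu_or_ventilation:
        return "severe"
    if needs_oxygen or spo2 < 93:
        return "moderate II"
    if spo2 <= 95:
        return "moderate I"
    return "mild"


def age_group(age):
    """Stratify age at diagnosis into young, middle aged or old.

    Assumption: an age of exactly 65 years is counted as old.
    """
    if age < 18:
        raise ValueError("the study enrolled patients aged 18 years or older")
    if age <= 40:
        return "young"
    if age <= 64:
        return "middle aged"
    return "old"
```

For example, under these assumptions `classify_severity(94)` returns `"moderate I"` and `age_group(52)` returns `"middle aged"`.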

An invitation to participate in this research was mailed to all patients at each institution, and those who provided consent were requested to complete a questionnaire on paper or a smartphone application at 3, 6 and 12 months after diagnosis. In our study, we considered individuals who reported experiencing symptoms for at least 3 months after the initial onset as having long COVID symptoms.

Questionnaire

In this study, the following symptoms were identified after being diagnosed with COVID-19: fever, cough, sputum, breathlessness (dyspnoea), hypersensitivity to sound, light and smell (sensory hypersensitivity), weakness and fatigue (fatigue), hair loss (alopecia), joint pain (arthralgia), muscle pain (myalgia), difficulty with muscle strength (muscle weakness), head pain (headache), sore throat, ringing in the ears (tinnitus), loss of consciousness (unconsciousness), abdominal pain, diarrhoea, bumps or redness on the skin (rash), numbness, eye-related symptoms (eye pain, itching, foreign body sensation, redness, watery eyes and eye discharge), memory loss and speechlessness (memory impairment), reduced ability to think and concentrate (poor concentration), and sleeping, taste, and olfactory disorders. Other symptoms that were not present in the 24 representative symptoms described above could be noted in an optional comment section in the questionnaire.

Patients were asked about the presence of each symptom at the time of hospitalisation and at 3, 6 and 12 months after diagnosis. Medical information covering 168 clinical survey items during medical treatment at each facility was obtained using an electronic data capture system.

Study population

The flow diagram of this study is shown in figure 1. A total of 1066 patients answered a longitudinal questionnaire on long COVID symptoms at least once (at 3, and/or 6, and/or 12 months). We analysed the data of 935 patients who answered the longitudinal questionnaire at 3 months after diagnosis.

Figure 1

Flow diagram of the study. Patients for whom clinical data were known and who answered a longitudinal questionnaire at 3 months after diagnosis were analysed.

QOL scores

Patients were scored with questionnaires according to the internationally authorised scoring system for health-related QOL (EuroQOL 5 Dimensions 5 Level (EQ-5D-5L)17 and Short Form-8 (SF-8)18). The Visual Analogue Scale (VAS), a component of EQ-5D-5L, and the Physical Component Summary (PCS) and Mental Component Summary (MCS) of SF-8 were also analysed.

Mood symptoms of patients were measured using the Hospital Anxiety and Depression Scale (HADS), a 14-item self-report questionnaire that contains two subscales measuring anxiety (HADS-A) and depression (HADS-D).19 20 Scores for each subscale range from 0 (no distress) to 21 (maximum distress).
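The 0 to 21 range follows from each HADS subscale having seven items scored from 0 to 3. A minimal Python sketch of the subscale sum is given below; the function name is hypothetical, and it assumes the caller has already separated the anxiety and depression items and applied any reverse scoring.

```python
def hads_subscale_score(item_scores):
    """Sum the seven item scores of one HADS subscale (anxiety or depression).

    Assumes each item has already been scored 0-3, so the result lies
    between 0 (no distress) and 21 (maximum distress).
    """
    if len(item_scores) != 7:
        raise ValueError("each HADS subscale has exactly seven items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item score must be 0, 1, 2 or 3")
    return sum(item_scores)
```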

Statistical analyses

Data are presented as frequencies and proportions for categorical variables. Hierarchical cluster analysis using the 24 variable symptoms at 3 months mentioned above was performed using the Ward’s minimum-variance method.21 22 The results are graphically depicted using a dendrogram. Data were compared among groups using analysis of variance and χ2 tests. A generalised estimating equation (GEE) with logit link and binomial distribution was performed to estimate the association between the clusters classified according to long COVID symptoms and QOL scores at 3, 6 or 12 months. GEE models were fitted using the unstructured working correlation matrix. We included months and clusters (the reference was cluster 2, which had no or few long COVID symptoms) and their interaction terms in the model. A p value <0.05 was considered statistically significant. All the statistical analyses were performed using the JMP V.16 software (SAS Institute) and SAS V.9.4 (SAS Institute).
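To illustrate the idea behind Ward's minimum-variance method, the following Python sketch repeatedly merges the two clusters whose union gives the smallest increase in within-cluster sum of squares. It is a naive teaching sketch, not the JMP procedure used in the study: the function name and the toy data (six hypothetical patients, four binary symptoms) are invented for illustration, and the stopping rule is simply a requested number of clusters.

```python
from itertools import combinations


def ward_clusters(rows, n_clusters):
    """Agglomerate rows (lists of 0/1 symptom flags) with Ward's criterion.

    Returns one cluster label per row. Naive implementation intended only
    for small illustrative inputs.
    """
    if not 1 <= n_clusters <= len(rows):
        raise ValueError("n_clusters must be between 1 and the number of rows")

    # Each cluster is stored as (member row indices, centroid).
    clusters = [([i], [float(v) for v in row]) for i, row in enumerate(rows)]

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        (members_a, centre_a), (members_b, centre_b) = a, b
        sq_dist = sum((x - y) ** 2 for x, y in zip(centre_a, centre_b))
        n_a, n_b = len(members_a), len(members_b)
        return n_a * n_b / (n_a + n_b) * sq_dist

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: merge_cost(clusters[pair[0]], clusters[pair[1]]),
        )
        (members_i, centre_i), (members_j, centre_j) = clusters[i], clusters[j]
        n = len(members_i) + len(members_j)
        centroid = [
            (len(members_i) * x + len(members_j) * y) / n
            for x, y in zip(centre_i, centre_j)
        ]
        merged = (members_i + members_j, centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    labels = [0] * len(rows)
    for label, (members, _) in enumerate(clusters):
        for member in members:
            labels[member] = label
    return labels


# Toy data (invented): rows are patients, columns are four binary symptoms.
toy_rows = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
toy_labels = ward_clusters(toy_rows, 3)
```

In the toy data, patients with identical symptom patterns end up in the same cluster, which is the behaviour a dendrogram cut at three clusters would show.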

Patient and public involvement

Patients or the public were not involved in the design, or conduct, or reporting, or dissemination plans of our research.

Results

Baseline characteristics of the patients

Baseline clinical characteristics of the patients are shown in table 1. A total of 935 patients (men, n=595, 63.6%) were enrolled in this study. Middle aged (41–64 years) was the most common age group (n=433, 46.4%), followed by the old (over 65 years, n=327, 35.0%) and young (18–40 years, n=174, 18.6%) patient groups. COVID-19 acute severity was classified as severe (n=95, 10.5%), moderate II (n=216, 23.7%), moderate I (n=389, 42.7%) and mild (n=189, 20.7%). A total of 292 patients (32.0%) required oxygen supplementation during hospitalisation. A total of 328 patients (36.7%) had a history of smoking. The most common comorbidities were hypertension (n=316, 34.1%), diabetes mellitus (n=157, 17.0%), prior cardiovascular disease (n=59, 6.4%) and cancer (n=59, 6.4%).

Table 1

Baseline characteristics of study population and each cluster

We investigated the presence of 24 symptoms of long COVID; data are shown in table 2. The most common symptoms were fatigue (n=192, 20.5%), dyspnoea (n=128, 13.7%), muscle weakness (n=111, 11.9%), alopecia (n=104, 11.1%) and poor concentration (n=102, 10.9%).

Table 2

Symptoms of the study population and each cluster at 3 months after diagnosis

Comparison of the clinical symptoms and baseline characteristics among the clusters

We performed Ward’s hierarchical cluster analysis based on 24 symptoms at 3 months reported as long COVID symptoms.4–6 Five clusters with different distributions in the variables were identified, as visualised in figure 2. Table 2 shows the long COVID symptoms of each cluster at 3 months after diagnosis. Cluster 1 (n=54) included the population having numerous symptoms across multiple organs. Cluster 2 (n=546) included the population with no or minor symptoms. Cluster 3 (n=76) included the population with taste and olfactory disorders. Cluster 4 (n=207) included the population with fatigue, psychoneurotic symptoms (such as alopecia, headache, poor concentration and sleeping disorders) and dyspnoea (low prevalence of cough and sputum). Cluster 5 (n=52) included the population with fatigue and dyspnoea (high prevalence of cough and sputum).

Figure 2

Dendrogram illustrating the results of the cluster analysis of 935 patients with COVID-19 using Ward’s hierarchical clustering method.

Baseline characteristics of each cluster are shown in table 1. Cluster 1 was characterised by a high proportion of old patients with severe symptoms, cluster 3 by a high proportion of young women with mild symptoms and cluster 4 by a high proportion of middle-aged patients. Cluster 2 was not characterised by age, sex or acute severity, and cluster 5 tended to have a proportion of patients with severe symptoms. Although no significant differences were observed in the smoking history or comorbidities among the clusters, a trend towards a higher prevalence of diabetes and respiratory diseases, such as chronic obstructive pulmonary disease and asthma, was observed in cluster 1, whereas a trend towards a lower prevalence of hypertension and diabetes was observed in cluster 3.
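Comparisons of categorical characteristics across clusters, such as those above, rest on the χ2 test named in the statistical methods. As a hedged illustration of the arithmetic only, the following Python sketch computes the Pearson χ2 statistic and its degrees of freedom for a contingency table. The function name is hypothetical and the counts are invented toy numbers, not study data; the p value lookup is omitted.

```python
def chi_square_statistic(table):
    """Return (Pearson chi-square statistic, degrees of freedom).

    `table` is a list of rows of observed counts. Every row and column
    total is assumed to be greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Toy counts (invented): rows are two groups, columns are yes/no.
toy_statistic, toy_dof = chi_square_statistic([[10, 20], [20, 10]])
```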

Comparison of laboratory findings in the acute phase and characteristics among the clusters

We evaluated whether the laboratory findings and characteristics in the acute phase (COVID-19 onset) were associated with the clusters. Online supplemental table 1 shows a comparison of the laboratory findings at hospitalisation among the five clusters, and online supplemental table 2 shows the progress characteristics during hospitalisation for COVID-19. Cluster 1 had cases with higher lactate dehydrogenase (LDH), C reactive protein (CRP) and haemoglobin A1c (HbA1c) levels than those in other clusters. Cluster 3 demonstrated cases with symptoms of mild acute severity with low white cell count (WCC), LDH and CRP values, and high albumin levels. Cluster 5 demonstrated cases with acute severe symptoms with high WCC and CRP values similar to those of cluster 1. Clusters 2 and 4 did not show characteristic laboratory findings.

There were no significant differences between the clusters in terms of imaging findings or complications regarding the course of hospitalisation. In terms of treatment, compared with other clusters, cluster 1 had a higher proportion of cases receiving oxygen therapy, intubation and medications, such as lopinavir–ritonavir, remdesivir, nafamostat mesylate, anticoagulants and steroids, reflecting the severity of the acute phase. Conversely, in cluster 3, which had many patients with mild illnesses, oxygen therapy and intubation were less common, and the proportion of patients receiving therapeutic agents was also lower.

Trajectory of symptoms for each cluster

We performed a cluster analysis based on the presence of 24 symptoms at 3 months after diagnosis and evaluated how these symptoms progressed at 6 and 12 months after diagnosis (figure 3). A total of 935 patients were enrolled in this study; 857 patients responded to the questionnaire at 6 months, and 719 patients answered it at 12 months after diagnosis. The incidence of symptoms decreased over time in each cluster. The characteristic symptoms of each cluster were maintained at 6 and 12 months following diagnosis (cluster 1, numerous symptoms across multiple organs; cluster 2, no or minor symptoms; cluster 3, taste and olfactory disorders; cluster 4, fatigue, psychoneurotic symptoms, and dyspnoea without cough and sputum and cluster 5, fatigue and dyspnoea with cough and sputum).

Figure 3

Heatmap of the symptom prevalence differences comparing five clusters. Heatmap shows the prevalence of the symptoms of each cluster at 3, 6, and 12 months following diagnosis.
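The values behind such a heatmap are per-cluster symptom prevalences at each time point. A minimal Python sketch of that tabulation follows. The record layout, function name and toy records are hypothetical, and it assumes the denominator is the number of respondents in that cluster at that time point, which the text does not state explicitly.

```python
from collections import defaultdict


def symptom_prevalence(records, symptom):
    """Return {(cluster, month): percentage of respondents reporting symptom}.

    Each record is a dict with 'cluster', 'month' and a boolean per symptom.
    Assumption: only patients who answered at a given month appear as
    records for that month.
    """
    counts = defaultdict(lambda: [0, 0])  # [reported, respondents]
    for record in records:
        key = (record["cluster"], record["month"])
        counts[key][1] += 1
        if record[symptom]:
            counts[key][0] += 1
    return {
        key: 100.0 * reported / respondents
        for key, (reported, respondents) in counts.items()
    }


# Toy records (invented), not study data.
toy_records = [
    {"cluster": 3, "month": 3, "taste_disorder": True},
    {"cluster": 3, "month": 3, "taste_disorder": True},
    {"cluster": 3, "month": 12, "taste_disorder": True},
    {"cluster": 3, "month": 12, "taste_disorder": False},
]
toy_prevalence = symptom_prevalence(toy_records, "taste_disorder")
```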

Comparison of the QOL outcomes among the clusters

We assessed the QOL scores for each cluster. Online supplemental table 3 shows the values of each QOL score 3 months after diagnosis. Regardless of the type of QOL score, cluster 1 consistently had poorer scores. Cluster 2, which had no or few long COVID symptoms, had a higher QOL score, whereas cluster 3, mainly comprising taste and smell disorders, had a similar QOL score to that of cluster 2. Clusters 4 and 5 had comparable scores, although mental QOL scores tended to be lower in cluster 4.

We evaluated how these QOL scores fluctuate over time (figure 4). The number of respondents decreased at 6 and 12 months (6 months, n=857; 12 months, n=719), necessitating the use of GEE to evaluate the results. Statistical significance was determined by comparing each cluster to cluster 2, which had no or few long COVID symptoms. Across all clusters, QOL scores tended to remain similar or improve over time. Clusters 1 and 4 consistently scored significantly worse than cluster 2 at all time points (3, 6 and 12 months). Cluster 3 had no QOL scores that were significantly worse than those of cluster 2 at all time points. In contrast, cluster 5 exhibited certain QOL scores (EQ-5D-5L, SF-8 PCS and HADS-A) that were significantly worse than those of cluster 2 at all time points. The clusters identified based on long COVID symptoms at 3 months remained consistent at 6 and 12 months concerning the degree of QOL as well as the transition of symptoms.

Figure 4

Comparison of the quality of life (QOL) outcomes among the five clusters. (A) EQ-5D-5L. (B) EQ-5D-5L VAS. (C) SF-8 PCS. (D) SF-8 MCS. (E) HADS-A. (F) HADS-D. To analyse the association between the clusters and QOL scores at 3, 6 or 12 months, we performed a generalised estimating equation with an unstructured working correlation matrix. Data are presented as mean±SD. Statistical analysis evaluated differences from cluster 2. Clusters that differed significantly at all time points (3, 6 and 12 months) are shown in the figure with *p<0.05 and **p<0.005.
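One way to write down the mean model that the GEE description implies is given below. This is a sketch under assumptions: the symbols are introduced here for illustration, month is treated as a categorical factor with 3 months as the reference (the text names only cluster 2 as the reference), and how each QOL score was scaled for the logit link is not stated in the text.

```latex
\operatorname{logit}\bigl(\mathbb{E}[Y_{it}]\bigr)
  = \beta_0
  + \sum_{k \in \{1,3,4,5\}} \beta^{C}_{k}\,\mathbf{1}[c_i = k]
  + \sum_{s \in \{6,12\}} \beta^{M}_{s}\,\mathbf{1}[t = s]
  + \sum_{k \in \{1,3,4,5\}} \sum_{s \in \{6,12\}}
      \beta^{CM}_{ks}\,\mathbf{1}[c_i = k]\,\mathbf{1}[t = s]
```

Here Y_{it} is the QOL score of patient i at month t, c_i is that patient's cluster, and the repeated measurements within a patient are linked through the unstructured working correlation matrix. Under this parameterisation, the cluster and interaction coefficients together express how cluster k differs from cluster 2 at each time point.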

Discussion

We performed a cluster analysis of long COVID in Japan and evaluated each cluster’s characteristics, including acute symptoms and association with QOL. To our knowledge, there are few reports of studies assessing changes in symptoms and QOL over time for each cluster.

We identified the following five clusters of long COVID symptoms: cluster 1, having numerous symptoms across multiple organs; cluster 2, no or minor symptoms; cluster 3, taste and olfactory disorders; cluster 4, fatigue, psychoneurotic symptoms (such as alopecia, headache, poor concentration and sleeping disorders) and dyspnoea (low prevalence of cough and sputum) and cluster 5, fatigue and dyspnoea (high prevalence of cough and sputum).

Cluster 1 was a cluster with a variety of symptoms across multiple organs and had more long COVID symptoms than the other clusters. In previous reports, patients who required oxygen or intubation/ventilation were more likely to have long COVID symptoms than those who did not,23 24 and the patients who were older or had more acute symptoms were reported to be at a higher risk for residual long COVID symptoms.25–28 This cluster had a higher proportion of elderly patients with severe symptoms in the acute phase, consistent with findings of previous reports of a cluster with more long COVID symptoms. Cluster 1 demonstrated cases with higher HbA1c, LDH and CRP on performing blood tests on admission. However, whether these inflammatory markers are associated with long COVID is debatable.29–34 Regardless of whether HbA1c, LDH and CRP were possible biomarkers for long COVID, they were related to the acute phase severity factors,35 36 which could indirectly indicate that cluster 1 had more severe cases. There were no significant differences between the clusters in terms of comorbidities, but there was a trend towards more hypertension and diabetes in cluster 1. In this study population, younger patients were significantly less likely to have hypertension (p<0.0001) and diabetes (p<0.0001), which may have influenced the results because cluster 1 has more elderly patients. Cluster 1 also had significantly lower QOL scores than the other clusters. Acute phase severity and the presence of any long COVID symptom are reportedly associated with a lower health-related QOL.37–39 Thus, acute phase severity and high prevalence of long COVID symptoms in cluster 1 could be associated with a lower QOL.

Cluster 2 included the population with no or minor long COVID symptoms. This cluster accounted for approximately 30% of moderate II to severe COVID-19 onset; thus, it was not a cluster with a concentration of groups with mild acute disease. Therefore, the patients in this cluster could have certain factors pointing to a good prognosis of long COVID symptoms. However, this cluster showed no characteristic trends concerning age, sex, acute severity, smoking history or comorbidity, and no characteristic background was identified. Data from the UK’s Office for National Statistics indicated that about 3% of people without coronavirus infection also have symptoms consistent with long COVID.40 In cluster 2, there are a few cases with slight residual symptoms, but the frequency of symptoms is similar to that of the healthy cohort, and this cluster is considered to be similar to the population that has recovered from COVID-19.

Cluster 3 included the population with taste and olfactory disorders. Compared with the other clusters, it had a higher proportion of young females with mild acute COVID-19. The percentage of patients with mild acute severity was higher in cluster 3 than in cluster 2. Taste disorders are more common in younger and milder cases,41 and olfactory disorders are more common in women42; the results of this study were consistent with those of previous reports. The QOL scores were similar to those of cluster 2. Some reports have suggested that taste and smell disorders are associated with lower QOL43 44; however, their impact could be milder than that of neuropsychiatric and respiratory symptoms.

Cluster 4 included the population of patients with fatigue, psychoneurotic symptoms (such as alopecia, headache, poor concentration and sleeping disorders), and dyspnoea (low prevalence of cough and sputum). Cluster 4 had a large proportion of middle-aged patients; however, no significant characteristics were observed in the other background factors. Nevertheless, based on the number of patients in this study and previous reports,4–13 the typical long COVID symptoms may be classified as cluster 4.

Cluster 5 included the population of patients with fatigue and dyspnoea (high prevalence of cough and sputum). Unlike cluster 4, it was characterised by a high frequency of symptoms of cough and sputum, indicating a pure respiratory symptom cluster. In the acute phase, inflammatory reactions demonstrated by high WCC and CRP levels, and the high frequency of ground glass opacities and consolidation on CT were observed, which suggest that patients with high inflammation and lower respiratory tract symptoms in the acute phase are more likely to have respiratory long COVID symptoms.

This study is unique in that it evaluated long COVID symptoms at multiple time points. Although the clusters were created with variable symptoms at a single time point (at 3 months after diagnosis), it was possible to monitor the clusters subsequently according to the prevalence of symptoms. As noted in previous reports,45–47 the prevalence of long COVID symptoms decreases over time, and this was also observed in the present study. Interestingly, the symptom component groups that characterised the clusters were maintained at 6 and 12 months. The composition of symptoms per cluster does not change much over time, and symptom combinations at 3 months could be predictive of long-term long COVID symptoms. In addition, although several cluster analysis studies have been reported, the evaluated time period and long COVID definitions vary from study to study (online supplemental table 4). The finding that the symptoms comprising the clusters did not change significantly over time may be useful in comparing and integrating these studies.

QOL scores showed a trend towards improvement over time in each cluster (figure 4). Because cluster 2 is about as symptomatic as the previously reported healthy cohort,40 we tested whether each of the other clusters differed significantly from cluster 2. Clusters 1 and 4, which had significantly poorer QOL scores at 3 months compared with other clusters, improved over time but were still significantly poorer than cluster 2 at 6 and 12 months. The presence of long COVID symptoms has been reported to be associated with lower QOL,43 44 and the improvement in QOL over time may be due to a decrease in symptoms after the illness, but clusters 1 and 4, where symptoms remained even at 6 and 12 months, did not recover to the extent of cluster 2. In cluster 5, EQ-5D-5L Visual Analogue Scale and SF-8 MCS were worse at 12 months than at 6 months, a different trend from that of the other clusters. Cluster 5 exhibited an increased prevalence of fatigue and dyspnoea at 12 months compared with 6 months, as depicted in figure 3. Additionally, this cluster displayed a marginally higher prevalence of COPD in patients’ background, as shown in table 1. These elevated symptoms, combined with the presence of underlying disease, may have contributed to the observed variations in QOL scores.

The concept of long COVID was proposed; however, the symptoms were diverse, and different clusters could be formed by combining symptoms. Previous reports of cluster analysis have identified multiorgan multisymptomatic phenotypes, taste and olfactory disorder clusters, fatigue and neuropsychiatric symptom clusters, and respiratory clusters (online supplemental table 4), which are similar to clusters 1, 3, 4 and 5 of this study, respectively. Potential mechanisms have been proposed as causes of long COVID, such as viral persistence, autoimmunity, reactivation of latent virus and organ damage caused by infection.48 49 In an experimental animal model using golden hamsters, SARS-CoV-2 was more likely to affect the olfactory bulb and the olfactory epithelium than influenza A.50 Considering these factors, it is suggested that cluster 1 may reflect organ damage due to systemic inflammation, while the symptoms in cluster 3 may arise through different mechanisms, such as nerve damage.

A previous study has demonstrated a higher prevalence of long COVID in historical variants compared with alpha, delta and omicron variants.51 Thus, long COVID could vary not only based on the host factors but also the viral variant. This study covers the periods of the first, second and third epidemic waves in Japan. All the periods included historical variants with minor differences (first wave: B.1.1.114, second wave: B.1.1.284 and third wave: B.1.1.214). No significant association was observed between the epidemic waves and clusters in this study. Vaccination has been reported to be associated with acute incidence and severity.52 COVID-19 vaccination had not started in Japan during the patient recruitment period of this study, and none of the patients had been vaccinated at the time of COVID-19 onset. In another study analysing some of the cases in this study,53 vaccination did not appear to have a clinically important effect on patients with long COVID symptoms.

In the future, more detailed background information concerning the pathophysiology of each cluster is warranted.

This study has several limitations. First, no information was available concerning the presence or absence of symptoms prior to infection with COVID-19. Second, although we analysed 24 symptoms with high frequency as long COVID, we were not able to include all the symptoms of each individual in the analysis. Third, the severity of each symptom of long COVID was not collected, and the relationship between each cluster and QOL was not evaluated in terms of severity. Fourth, a healthy control population of COVID-19 non-infected individuals was not included in this study.

Conclusion

We identified patients with long COVID exhibiting diverse characteristics and classified them into five clusters based on their symptom profiles. The composition of symptoms that characterised each cluster remained consistent over time. Additionally, the QOL outcomes varied according to the specific symptoms present in each cluster. Analysing these distinct pathologies could potentially lead to individualised treatment of long COVID.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants and was approved by the Ethics Committee of Keio University School of Medicine (approval number: #20200243). Additionally, approval and permission to conduct the study were obtained from each participating institutional ethical review committee. The study protocol is registered at UMIN clinical trials registry (UMIN000042299). Participants gave informed consent to participate in the study before taking part.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • FI and MK are joint first authors.

  • Contributors All authors contributed to the patient recruitment. FI, HT, MK and RT were involved in the initial conception. All authors were involved in the planning of the data analyses and interpretation of the data analyses. FI, HT, MK and RT analysed the data. FI drafted the manuscript. All authors contributed to the interpretation, reviewing and editing of the first draft, and subsequently agreed on the final manuscript. All authors meet the ICMJE authorship criteria. HT and RT are responsible for the overall content.

  • Funding This research was funded by the Health Labor Science Special Research Project (20CA2054) and supported by AMED (JP20nk0101612, JP20fk0108415, JP20fk0108452, JP21fk0108553, JP21fk0108431, JP22fk0108510, JP21fk0108563, JP21fk0108573, JP22fk0108573, JP22fk0108513, and JP22wm0325031) and JST PRESTO (JPMJPR21R7).

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.