Patients
Patients were recruited from the outpatient clinic of the Rheumatology Department at Fundación Santa Fe de Bogotá University Hospital, Colombia. Stratified random sampling was used to select patients in different disease activity states (according to the most recent clinical record) from a previously established, ongoing cohort of 820 patients. The sample size was calculated for a desired correlation coefficient of 0.6, a population correlation coefficient of 0.8, a power of 0.8, and a confidence level of 0.95. All patients invited to participate fulfilled the 2010 American College of Rheumatology (ACR)/European League Against Rheumatism (EULAR) classification criteria [10] and were at least 18 years old. Patients with a history of trauma, septic arthritis, joint replacement or synovectomy, joint deformity, and/or crystal arthropathy were excluded.
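The text does not state which formula was used for the sample size calculation. As an illustration only, the following minimal sketch assumes a two-sided test based on Fisher's z-transformation, comparing a reference correlation of 0.6 against an expected correlation of 0.8. The function name `sample_size_for_correlation` and its parameters are hypothetical, and the result is not necessarily the sample size used in the study.

```python
import math
from statistics import NormalDist


def sample_size_for_correlation(r_null, r_expected, power=0.8, confidence=0.95):
    """Approximate n needed to distinguish two correlation coefficients.

    Assumption (not stated in the source): a two-sided test on Fisher's
    z-transformed correlation, whose standard error is 1 / sqrt(n - 3).
    """
    alpha = 1.0 - confidence
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)                # quantile for the power
    # Effect size on the Fisher z scale (atanh is the Fisher transformation).
    effect = abs(math.atanh(r_expected) - math.atanh(r_null))
    n = ((z_alpha + z_beta) / effect) ** 2 + 3
    return math.ceil(n)


if __name__ == "__main__":
    print(sample_size_for_correlation(0.6, 0.8, power=0.8, confidence=0.95))
```

Under these assumptions, a smaller gap between the two coefficients or a higher power increases the required number of patients.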
The following data were registered at baseline: age, gender, treatment, erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), rheumatoid factor (RF), global health assessed by the patient (GH), Clinical Disease Activity Index (CDAI), and Simplified Disease Activity Index (SDAI).
Training session dedicated to standardization
The training session was designed to standardize the assessment of tenderness and swelling in 28 joints (bilateral shoulders, elbows, wrists, knees, metacarpophalangeal (MCP) joints, and proximal interphalangeal (PIP) joints).
Three rheumatologists, with 10, 13, and 15 years of experience in the clinical examination of RA patients, attended three sessions (separated by 1–2 weeks). Each session was divided into three steps: (1) an individual joint assessment performed by each rheumatologist (blinded to both the clinical data and the other rheumatologists' findings); (2) a 20-min discussion of practice observations, difficulties, limitations, and facilitating factors during the physical examination, in order to reach agreement on uniform criteria and technique; and (3) an individual joint reassessment based on the unified criteria. In the third session, disease activity indices (DAS28-ESR, SDAI, and CDAI) were calculated individually by each rheumatologist.
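The three indices are not defined in the text. As a minimal sketch, they can be computed from the joint counts and laboratory values using their standard published definitions. The function and parameter names are hypothetical, and the units are assumptions taken from those definitions rather than from this study: ESR in mm/h, GH on a 0–100 mm scale, CRP in mg/dL, and patient and physician global assessments on 0–10 cm scales. The physician global assessment is required by the SDAI and CDAI, although it is not listed among the baseline data.

```python
import math


def das28_esr(tjc28, sjc28, esr_mm_h, gh_mm):
    """DAS28-ESR from tender/swollen 28-joint counts, ESR (mm/h), GH (0-100 mm)."""
    return (0.56 * math.sqrt(tjc28)
            + 0.28 * math.sqrt(sjc28)
            + 0.70 * math.log(esr_mm_h)   # natural logarithm of the ESR
            + 0.014 * gh_mm)


def cdai(tjc28, sjc28, patient_global_cm, physician_global_cm):
    """CDAI: plain sum of joint counts and both global assessments (0-10 cm)."""
    return tjc28 + sjc28 + patient_global_cm + physician_global_cm


def sdai(tjc28, sjc28, patient_global_cm, physician_global_cm, crp_mg_dl):
    """SDAI: the CDAI components plus CRP expressed in mg/dL."""
    return cdai(tjc28, sjc28, patient_global_cm, physician_global_cm) + crp_mg_dl


if __name__ == "__main__":
    # Hypothetical patient, for illustration only.
    print(round(das28_esr(tjc28=4, sjc28=4, esr_mm_h=20, gh_mm=50), 2))
    print(cdai(4, 4, 5.0, 4.0))
    print(sdai(4, 4, 5.0, 4.0, crp_mg_dl=1.0))
```

Because each rheumatologist contributes their own tender and swollen joint counts, the same patient can receive three different values of each index, which is what makes the indices suitable for an interobserver comparison.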
Ultrasound assessment
Twenty minutes after the third step of the last training session, each patient was instructed to proceed to the ultrasound (US) assessment room. The US examination was performed on 10 joints (bilateral wrists and 2nd to 5th MCPs) by a radiologist with 15 years of experience and training in musculoskeletal radiology, who was blinded to the physical examination data, using a GE (General Electric) LOGIQ e ultrasound machine with a 6–13 MHz multifrequency linear transducer. Findings were systematically graded according to the Outcome Measures in Rheumatology Clinical Trial (OMERACT) guidelines, with synovitis evaluated as synovial hypertrophy, defined as abnormal, nondisplaceable, and poorly compressible hypoechoic intra-articular tissue [11]. PIP joints were considered potential confounders because of the possible overlap with tenosynovitis and were therefore not assessed.
Synovitis was graded using the scoring method originally introduced by Szkudlarek et al., which is widely used in studies of this kind [12,13,14] and is currently supported by the EULAR-OMERACT ultrasound taskforce [15, 16]: 0 = absence of synovial thickening; 1 (mild) = minimal synovial thickening filling the angle between the periarticular bones, without bulging over the line linking the tops of the corresponding bones; 2 (moderate) = synovial thickening bulging over the line linking the tops of the periarticular bones, but without extension along the bone diaphysis; 3 (severe) = synovial thickening bulging over the line linking the tops of the periarticular bones, with extension to at least one bone diaphysis. Normal distances between bone and joint capsule were taken from the average population values proposed by Schmidt et al. [17].
Statistical analysis
Statistical analysis was performed using Stata software, version 15.0. Continuous variables were summarized as mean and standard deviation (SD) when normally distributed, or as median and interquartile range (IQR) otherwise. Dichotomous variables were presented as absolute counts and percentages.
Interobserver agreement and concordance were assessed with Cohen's kappa (for every possible pair of observers, i.e., observer A vs B, observer A vs C, and observer B vs C), Fleiss' kappa (across the three observers), and the overall percentage agreement (percentage of observed exact agreements). The relative strength of agreement was described according to the following ranges of the kappa (κ) coefficient: < 0.00 = poor, 0.00–0.20 = slight, 0.21–0.40 = fair, 0.41–0.60 = moderate, 0.61–0.80 = substantial, and 0.81–1.00 = almost perfect [18]. Pearson linear correlation coefficients were also calculated for tender and swollen joint counts.
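The agreement measures described above can be sketched in plain Python as follows. This is an illustrative implementation of the textbook definitions, not the Stata routines used in the study. All function names and the example ratings are hypothetical (1 = joint rated as swollen, 0 = not swollen).

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two observers rating the same joints."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each observer's marginal category frequencies.
    p_exp = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / n ** 2
    if p_exp == 1.0:
        return float("nan")  # undefined when both observers use one category
    return (p_obs - p_exp) / (1.0 - p_exp)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one row per joint, one rating per observer."""
    n_subjects, n_raters = len(ratings), len(ratings[0])
    categories = sorted({r for row in ratings for r in row})
    counts = [[row.count(c) for c in categories] for row in ratings]
    # Per-joint agreement, then its mean across joints.
    p_i = [(sum(k * k for k in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement from the pooled category proportions.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters)
           for j in range(len(categories))]
    p_exp = sum(p * p for p in p_j)
    if p_exp == 1.0:
        return float("nan")
    return (p_bar - p_exp) / (1.0 - p_exp)


def percent_agreement(ratings):
    """Percentage of joints on which all observers gave the same rating."""
    return 100.0 * sum(len(set(row)) == 1 for row in ratings) / len(ratings)


def strength(kappa):
    """Label a kappa value using the ranges quoted in the text [18]."""
    if kappa < 0.0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


if __name__ == "__main__":
    # Hypothetical ratings of four joints by observers A, B, and C.
    ratings = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
    observers = list(zip(*ratings))  # one tuple of ratings per observer
    for (i, a), (j, b) in combinations(enumerate(observers), 2):
        k = cohen_kappa(a, b)
        print(f"Observer {'ABC'[i]} vs {'ABC'[j]}: kappa = {k:.2f} ({strength(k)})")
    print(f"Fleiss' kappa = {fleiss_kappa(ratings):.2f}")
    print(f"Overall agreement = {percent_agreement(ratings):.0f}%")
```

Pairwise Cohen's kappa shows which pair of observers diverges most, whereas Fleiss' kappa summarizes agreement across all three at once. Both correct the raw percentage agreement for the agreement expected by chance.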