Testing the PROMIS® Depression measures for monitoring depression in a clinical sample outside the US

doi:10.1016/j.jpsychires.2015.06.009

Journal of Psychiatric Research

Volume 68, September 2015, Pages 140-150

Highlights

• PROMIS Depression shows excellent measurement properties in a clinical sample from Spain.
• Scores detect depression with an ability similar to that of other common depression measures.
• Scores discriminate between major depression and comorbid anxiety disorders.
• Results support the usefulness of PROMIS Depression for cross-national comparisons.
• PROMIS Depression can be used for monitoring depression in clinical settings.

Abstract

The Patient Reported Outcomes Measurement Information System (PROMIS) was devised to facilitate assessment of patient self-reported health status, taking advantage of Item Response Theory. We aimed to assess measurement properties of the PROMIS Depression item bank and an 8-item static short form in a Spanish clinical sample. A three-month follow-up study of patients with active mood/anxiety symptoms (n = 218) was carried out. We assessed model unidimensionality (confirmatory item factor analysis), reliability (internal consistency and item information curves), and validity (convergent-discriminant validity with correlations; known-groups validity with comparison of means and effect sizes; and criterion validity with receiver operating characteristic (ROC) analysis). We also assessed 3-month responsiveness to change (Cohen's effect sizes (d) in stable and recovered patients). The unidimensional model showed adequate fit (CFI = 0.97, RMSEA = 0.08). Information curves indicated reliabilities over 0.90 throughout most of the score continuum. As expected, we observed high correlations with external self-reported depression measures, and moderate correlations with self-reported anxiety and clinical measures. The item bank showed an increasing severity gradient from no disorder (mean = 48, SE = 0.6) to depression with comorbid anxiety (mean = 55.8, SE = 0.4). PROMIS detected depressive disorder with high accuracy according to the area under the curve (AUC = 0.89). Both formats, item bank and short form, were highly responsive to change in recovered patients (d > 0.7) and showed small changes in stable patients (d < 0.2). The good metric properties of the Spanish PROMIS Depression measures provide further evidence of their adequacy for monitoring depression levels of patients in clinical settings. This double check of quality (within countries and populations) supports the ability of PROMIS measures to provide fair comparisons across languages and countries in specific clinical populations.
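The reliability figure quoted from the information curves can be related to measurement error with a short calculation. The sketch below is illustrative only and is not taken from the study: it assumes the latent trait is scaled to unit variance and that scores are reported as T-scores with a standard deviation of 10, and the function names are hypothetical.

```python
import math


def reliability_from_information(information):
    """Conditional reliability at one score level.

    Assumes the latent trait has variance 1, so the error variance is
    1 / information and reliability is 1 minus that error variance.
    """
    return 1.0 - 1.0 / information


def t_score_standard_error(information, t_sd=10.0):
    """Standard error on the T-score metric (assumed SD of 10)."""
    return t_sd / math.sqrt(information)


# Under these assumptions, test information of 10 corresponds to a
# reliability of 0.90 and a standard error of about 3.2 T-score points.
print(reliability_from_information(10.0))
print(t_score_standard_error(10.0))
```

Under these assumptions, a reliability above 0.90 means the test information exceeds 10 at that point of the score continuum.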

Introduction

Certain areas of medicine have a sustained interest in the development of Patient Reported Outcome (PRO) instruments (Black and Jenkinson, 2009). This interest has been accompanied by a proliferation of condition-specific instruments, causing a fragmentation of measures that hampers comparability across studies, settings, or pathologies. In response, the Patient-Reported Outcomes Measurement Information System (PROMIS®) (Cella et al., 2007) was devised in the US as a publicly available measurement system of self-reported health based on a domain-specific approach without attributions to specific conditions or treatments (Cella et al., 2010). PROMIS focuses on comparability between health states and populations through the application of item response theory (IRT), a psychometric method for item calibration that allows a common metric for different populations, a broader range of scores, and greater precision in individual measures compared to classical test theory methods. IRT properties make alternative administration forms possible: full item banks, static short forms, or dynamic computer adaptive testing (CAT), which selects items in real time targeted to the examinee's specific level of ability or impairment, reducing the number of questions needed and the respondent burden without a substantial loss of precision (Hambleton et al., 1991; Van der Linden and Glas, 2000). However, administration burden is increased, as CAT requires computerized support in applications.
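The idea of selecting items "in real time" can be illustrated with a small sketch. It is not the PROMIS implementation: it assumes Samejima's graded response model (the text above does not name the IRT model), uses maximum-information selection as one common CAT rule, and the item names and parameters are invented for illustration.

```python
import math


def cumulative_probs(theta, a, thresholds):
    """P(response >= k) at each category boundary, padded with 1 and 0."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def item_information(theta, a, thresholds):
    """Fisher information of one graded-response item at trait level theta."""
    cum = cumulative_probs(theta, a, thresholds)
    total = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]  # probability of category k
        slope = cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1])
        if p_k > 0.0:
            total += slope ** 2 / p_k
    return a ** 2 * total


def next_item(theta, bank, administered):
    """Pick the unadministered item that is most informative at theta."""
    candidates = {n: p for n, p in bank.items() if n not in administered}
    return max(candidates, key=lambda n: item_information(theta, *candidates[n]))


# Hypothetical two-item bank: (discrimination, ordered thresholds).
HYPOTHETICAL_BANK = {
    "mild_item": (2.0, [-2.0, -1.0, 0.0, 1.0]),
    "severe_item": (2.0, [1.0, 2.0, 3.0, 4.0]),
}

print(next_item(2.5, HYPOTHETICAL_BANK, set()))
print(next_item(-0.5, HYPOTHETICAL_BANK, set()))
```

In a full CAT the trait estimate would be updated after each response and the loop would stop once a precision target or item limit is reached; those steps are omitted here.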

The international extension of PROMIS is currently underway (Alonso et al., 2013) with PROMIS domains being culturally adapted into several languages (Patient-reported outcomes measurement information system, 2015a). To support the usefulness of PROMIS® for cross-national comparisons, it is important to demonstrate that PROMIS measures are valid, reliable and responsive to change when used outside the US. The assessment of cross-cultural differential functioning at the item (DIF) and test (DTF) level is also crucial to ensure that items are similarly understood and the measures are unbiased across different subpopulations, most importantly, countries, cultures and conditions.

A case of particular importance is emotional disturbance and depression, constructs that negatively influence the course of health (Anderson et al., 2001; Scott et al., 2009) and that have been recommended as main outcomes to assess the impact of treatments for various specific conditions (Turk et al., 2003). Efforts have been made to develop item banks for CAT depression instruments (Fliege et al., 2005; Forkmann et al., 2013; Gardner et al., 2004; Gibbons et al., 2008; Gibbons et al., 2012). Among them, the PROMIS system includes a depression domain as part of the overall health profile; it is also the only IRT-based depression measure available in Spanish. An interesting feature of PROMIS Depression is that it does not include items regarding somatic symptoms (e.g. sleep problems, appetite disturbances), unlike other commonly used depression measures (Beck et al., 1996; Spitzer et al., 1999). Thus PROMIS avoids potential confounding effects when assessing patients with comorbid physical conditions. Another advantage of PROMIS measures is that they are designed to be population-independent and sensitive not only to prevalence but also to a wide range of severity levels. The dimensional approach also avoids difficulties related to changes in the consensus criteria of categorical nosologies, a problem known to have a great impact on patients' clinical status when the mandatory criteria for a disorder are modified (Pereda and Forero, 2012). Additionally, it can provide valuable information on real or biased cross-national differences in the epidemiology of depression (Forero et al., 2014b; Weissman et al., 1996).

In order to gain evidence about their usefulness, PROMIS Depression attributes should be tested in clinical environments in different languages. Of greatest concern is the evaluation of construct validity and responsiveness in patient samples relevant to the construct of interest. PROMIS Depression has shown good results in patients with major depression (Pilkonis et al., 2014) and other conditions (Amtmann et al., 2014). However, the psychometric properties of the PROMIS Depression measures in Spanish or other language versions have not been evaluated so far.

This study aimed to test the measurement properties of the Spanish version of PROMIS® Depression in patients seeking mental health care at different care levels in Spain. Specifically, our objectives were to: a) confirm the measurement model and unidimensionality of the PROMIS Depression item bank; b) assess reliability, construct-related validity and responsiveness to change of the item bank and the 8-item static short form.

Section snippets

Selection of the sample

This study was conducted as part of the Inventory of Depression and Anxiety Symptoms (INSAyD) project (Olariu et al., 2014), a prospective study designed to provide brief and easy-to-use tools for diagnosing and assessing severity of mood and anxiety disorders, based on DSM-IV-TR symptom criteria, in a sample of primary care and specialized mental health patients seeking help for active symptoms of mood or anxiety. Patients were invited to participate from October 2011 to February 2013. Three

Results

Out of 244 patients invited, 96.7% were interviewed (8 did not meet inclusion criteria and 3 refused to participate). Of them, 15 did not provide information on self-reported scales including PROMIS. Among the 218 participants who completed baseline self-reported measures, 47 (19.8%) were lost to follow up after 3 months and one was excluded. Additionally, 20 (8.3%) did not respond to the PROMIS depression item bank at follow up. The baseline analysis was carried out with these 218 individuals.

Discussion

This study assesses the psychometric properties of the Spanish version of the PROMIS Depression measures in a sample of individuals with common mental disorders. In this first study evaluating the performance of the Spanish PROMIS Depression in a clinical sample it was shown to be reliable, valid and responsive. Both the item bank and the short form were able to discriminate between MDE and frequently comorbid disorders while capturing aggravation due to comorbidity. Our results are comparable

Conclusions

Results indicate good reliability, construct validity and responsiveness of the Spanish PROMIS Depression item bank and the 8-item static short form, thus supporting PROMIS as a good measure of depression state levels. The fact that these results are found in a clinical sample demonstrates its ability to monitor depression in clinical settings in spite of not having been designed as a clinical diagnostic instrument. Given that it is part of a broader assessment of different health outcomes

Financial disclosure and acknowledgments

We would like to thank the participating patients and health care centers who made this project possible. This study was supported by grants from Instituto de Salud Carlos III FEDER (grant references: FEDER PI10/00530; FEDER PI13/00506). Gemma Vilagut was supported by Fondo de Investigación Sanitaria, ISCIII (ECA07/059).

References (57)

D. Cella et al.
The patient-reported outcomes measurement information system (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005-2008
J. Clin. Epidemiol.
(2010)

M.W. Enns et al.
Confirmatory factor analysis of the Beck Anxiety and Depression Inventories in patients with major depression
J. Affect Disord.
(1998)

C.G. Forero et al.
Differential item and test functioning methodology indicated that item response bias was not a substantial cause of country differences in mental well-being
J. Clin. Epidemiol.
(2014)

T. Forkmann et al.
Adaptive screening for depression–recalibration of an item bank for the assessment of depression in persons with mental and somatic diseases and evaluation in a simulated computer-adaptive test environment
J. Psychosom. Res.
(2013)

L.V. Hedges et al.
Estimation of a single effect size: parametric and non-parametric methods

A. Lobo et al.
Validación de las versiones en español de la Montgomery-Asberg depression y la Hamilton anxiety rating scale para la evaluación de la depresión y de la ansiedad
Med. Clin. Barc.
(2002)

J.V. Luciano et al.
Psychometric properties of the twelve item world health organization disability assessment schedule II (WHO-DAS II) in Spanish primary care patients with a first major depressive episode
J. Affect Disord.
(2010)

P.A. Pilkonis et al.
Validation of the depression item bank from the patient-reported outcomes measurement information system (PROMIS) in a three-month observational study
J. Psychiatr. Res.
(2014)

D.C. Turk et al.
Core outcome domains for chronic pain clinical trials: IMMPACT recommendations
Pain
(2003)

J. Alonso et al.
The case for an international patient-reported outcomes measurement information system (PROMIS®) initiative
Health Qual. Life Outcomes
(2013)

D. Amtmann et al.
Comparing CESD-10, PHQ-9, and PROMIS depression instruments in individuals with multiple sclerosis
Rehabil. Psychol.
(2014)

R.J. Anderson et al.
The prevalence of comorbid depression in adults with diabetes: a meta-analysis
Diabetes Care
(2001)

J.S. Bajaj et al.
PROMIS computerised adaptive tests are dynamic instruments to measure health-related quality of life in patients with cirrhosis
Aliment. Pharmacol. Ther.
(2011)

A.T. Beck et al.
An inventory for measuring clinical anxiety: psychometric properties
J. Consult Clin. Psychol.
(1988)

A.T. Beck et al.
Comparison of Beck Depression Inventories -IA and -II in psychiatric outpatients
J. Pers. Assess.
(1996)

P.M. Bentler
Alpha, dimension-free, and model-based internal consistency reliability
Psychometrika
(2009)

N. Black et al.
Measuring patients' experiences and outcomes
BMJ
(2009)

M. Browne et al.
Alternative ways of assessing model fit

I.M. Cameron et al.
Psychometric comparison of PHQ-9 and HADS for measuring depression severity in primary care
Br. J. Gen. Pract.
(2008)

D. Cella et al.
The patient-reported outcomes measurement information system (PROMIS): progress of an NIH roadmap cooperative group during its first two years
Med. Care
(2007)

J. Cohen
Statistical Power Analysis for the Behavioral Sciences
(1988)

M. Comeche et al.
Cuestionarios, inventarios y escalas. Ansiedad, depresión y habilidades sociales
(1995)

H. Correia
PROMIS® Instrument Development and Validation Scientific Standards Version 2.0. Appendix 14
(2013)

C. Diez-Quevedo et al.
Validation and utility of the patient health questionnaire in diagnosing mental disorders in 1003 general hospital Spanish inpatients
Psychosom. Med.
(2001)

H.F. Fischer et al.
Screening for mental disorders in heart failure patients using computer-adaptive tests
Qual. Life Res.
(2014)

H. Fliege et al.
Development of a computer-adaptive test for depression (D-CAT)
Qual. Life Res.
(2005)

C.G. Forero et al.
Towards a biopsychosocial nosology of mental illness: challenges and opportunities for psychiatric epidemiology
J. Epidemiol. Community Health
(2014)

W. Gardner et al.
Computerized adaptive measurement of depression: a simulation study
BMC Psychiatry
(2004)

¹: INSAyD Investigators: Jordi Alonso, Carlos García Forero, Gemma Vilagut, Pilar Álvarez, José-Ignacio Castro-Rodriguez, Luis Miguel Martín-López, Maite Campillo, Lina Abellanas, Carrie Garnier, Maria Rosa Más, Marta Reinoso, Gabriela Barbaglia, Miquel A. Fullana, Alberto Maydeu, Anna Brown.
