Maldonado (2009) IQ MR Atkins death penalty decision: A psychometric miscarriage of justice?

I'm no longer shocked by what passes as credible psychological/psychometric evidence or testimony in some Atkins IQ MR death penalty court decisions.  Another such decision (Maldonado, 2009) has come to my attention.  A link to a PDF copy of the decision is now available in the Court Decisions section of this blog. 

Readers can gather all the relevant background information re: the case by reading the entire decision.  I intend to focus primarily on the psychological interpretation and psychometric issues in the decision that are troubling. 

But first, I present a few quotes (from the Maldonado record) that capture the confusion and uncertainty surrounding many Atkins decisions. This situation is producing considerable variability in the quality of the psychological assessment data reported and interpreted, and the common occurrence of "dueling expert witnesses".

Quotes from the record:
Because the Supreme Court did not establish a bright-line test to identify mental retardation, the Atkins inquiry has become a fact-intensive question that heavily relies on the opinions provided by mental-health experts. This case, like most involving Atkins claims, requires consideration of testimony from competing experts who disagree about the nature of mental retardation, the means by which it may be identified, the manner in which it manifests in a criminal defendant’s life, and the psychological profession’s role in making the legal decision of whether mental capacity precludes execution. (p.32)

A “welter of uncertainty” followed the Atkins decision because “[t]he Supreme Court neither conclusively defined mental retardation nor provided guidance on how its ruling should be applied to prisoners already convicted of capital murder.” Bell v. Cockrell, 310 F.3d 330, 332 (5th Cir. 2002). Accordingly, federal courts have approached the implementation of Atkins with some trepidation. (p.36)

After reading a number of Atkins rulings (see the Court Decisions section of this blog), I could not agree more with these statements.  The courts appear ill-equipped to handle the complex psychological measurement issues presented.  These issues are, at times, confounded by the inclusion of data from dubious procedures, by interpretations of test scores that are not grounded in any solid empirical research, and by deference to a single intelligence battery (the WAIS series) as the "gold standard."  Even when a more appropriate instrument (or a combination of the WAIS-III/IV and other measures) has been administered, its results are summarily dismissed based on personal opinion rather than sound theory or empirical research.

Below are some of the troubling psychological/psychometric issues I see in the Maldonado (2009) decision:

Lack of English language proficiency:
  "A theme developed by both parties is that Maldonado’s lack of English proficiency has made his exact intellectual capacity difficult to gauge." (p.38).
  • As a result, the prosecution psychologist administered the WAIS-III "through a translator" (p.42).  The translator-administered WAIS-III resulted in Verbal, Performance, and Full Scale scores of 74, 74, and 72, respectively.  A defense psychologist captures the essence of my reaction to the use of a translator-administered WAIS-III.  “The accepted practice in the evaluation on Spanish-speakers is to communicate with the client in Spanish without the use of translators. In addition, tests should be scientifically translated and validated and the most appropriate norms available should be applied” (p.49).  I agree.  I am unaware of any professionally established and endorsed procedure for the translated administration of the English-normed WAIS-III.  Such a procedure violates a cornerstone of the science of individual intelligence testing: standardized test administration.

Use of non-empirical clinical judgement procedures to upwardly adjust IQ scores.  On page 53 of the record, it is indicated that the prosecution's psychological expert believed that the translated English-normed WAIS-III scores needed to be upwardly adjusted due to Maldonado's educational and cultural background.  Additional support came from the finding of a Verbal IQ score of 83 on the WAIS Español (p.56).  Psychologists are appropriately trained to recognize the potential impact of such environmental variables when interpreting scores.  However, this psychologist upwardly adjusted the scores to a specific IQ score estimate ("It’s around the 80s, I guess, if you had to pin me down. Around the 80s; somewhere in there" - p.48), and this expert "conceded that only 'clinical judgment,' not any statistical formula or established methodology, informed how much to alter an IQ score because of cultural and educational factors" (p. 53).

My concern with this procedure mirrors the testimony of the defense experts in the case.  Adjusting obtained IQ scores, either up or down, based on an n=1 professional's clinical judgement, in the absence of any scientifically established procedure for adjusting IQ scores, is troubling and is not consistent with accepted psychological assessment practices or standards.  In fact, this IQ adjustment procedure sounds similar to a notable empirical effort (in the late 1970s and early 1980s) to produce IQ scores that better reflected a person's social-cultural background.  Jane Mercer's SOMPA (System of Multicultural Pluralistic Assessment) was a valiant effort to adjust Wechsler IQ scores for African-American individuals based on their social-cultural knowledge and history.  The result was a new score called Estimated Learning Potential (ELP).  From the start, SOMPA was controversial and eventually was found to be flawed for many reasons (see Hellfinger, 1987; also Jirsa, 1983).  If a reasonably conceived theoretical and empirical IQ adjustment procedure (i.e., SOMPA), which was intended to account for a person's social-cultural background, was found to be flawed, how can a specific n=1 psychologist be endowed with unique insights that allow for the invoking of an unspecified personal algorithm to make IQ score adjustments?  This is indeed troubling.  Also, to the best of my knowledge, SOMPA is no longer around and is not, or is seldom, used.  Race-based or adjusted norms have not been recognized as an acceptable professional psychological assessment practice for over twenty years!

Dismissal of the BAT-R intelligence results.  Maldonado had also been administered the Woodcock-Muñoz Bateria-R (“Bateria-R”), the Spanish-language counterpart of the Woodcock-Johnson Test of Cognitive Abilities--Revised [conflict of interest notice:  I am not a co-author of the BAT-R or WJ-R, but was a paid measurement consultant on the WJ-R project and have since become a coauthor of the subsequent edition, the WJ III].  Maldonado obtained a Broad Cognitive Ability (BCA) score, which is analogous to the full-scale score from other intelligence batteries, of 61.  Of all the cognitive tests administered, this is the only comprehensive intelligence battery that was administered in Maldonado's native language and where his performance is compared against appropriate US-equated Spanish norms [note: the WAIS Español was also administered in his native language and makes comparisons against Spanish norms, but only a portion, the Verbal section, was administered].  Also, as previously noted at this blog, the WJ-R/BAT-R and WJ III/BAT III provide for the most comprehensive assessment of intellectual functioning as per the consensus model of human intelligence (CHC theory) among serious intelligence scholars.  This is what I've termed the "Atkins MR death penalty IQ test-theory gap."

Why were the BAT-R scores dismissed? 

The prosecution expert "testified that the AAMR has not cited the Bateria-R as a predicate test to the evaluation of mental retardation and “[i]t’s not well suited for that purpose, although you can use it for that”" (p. 70).  He also stated that the "Bateria-R test score was especially suspect because it was inconsistent with Dr. _____'s administration of the WAIS Español in which Maldonado scored well above the range for mental retardation." (p.70). Furthermore, this expert "opined that the usefulness of the test was impaired because it 'is used by school psychologists to diagnose learning disabilities and it measures a lot of things like visual and auditory processing. It really measures very little in terms of general intelligence.' " (p.70).  As a result, "the state habeas court dismissed the Bateria-R score because it “is not one of the tests the AAMR cites for mental retardation evaluation,” but instead “is generally used by school psychologists to diagnose learning disabilities” and, in fact, “is not very relevant for establishing general intellectual functioning, so it is not well-suited for determination of the first prong . . . to determine mental retardation”" (p.70-71).

There are many problems with the reasons given for dismissing the most culturally appropriate (for Maldonado) and comprehensive measure of intelligence (BAT-R). 

First, dismissing an instrument because it is used primarily by a particular set of psychologists (school psychologists) is nonsensical and, frankly, condescending.  School psychologists typically give many more intelligence tests than psychologists working in adult settings and use these instruments to diagnose mental retardation. School psychologists, in many respects, have more intimate familiarity with intelligence testing than most other professional psychologists.

Second, the previously mentioned national expert panel that examined the Dx of MR at the same cut-point as Atkins cases (for SSA benefits) indicated that most intelligence tests would be moving towards measuring the Cattell-Horn and Carroll Gf-Gc models of intelligence (now collectively referred to as CHC theory; also see McGrew, 2009), and instruments based on this model are very relevant to the Dx of MR.  The English version of the WJ III (a CHC-based revision of the WJ-R) was listed as one of the approved instruments by the expert panel...which should also implicitly argue for use of the Spanish-language versions.  The prosecution witness, who appears stuck in the land where the WAIS is the "gold standard," appeared unaware of the recent advancements in understanding the psychometric nature of human intelligence, which has converged on the CHC model of intelligence.  This is particularly ironic given that both John Horn (of Cattell-Horn) and Jack Carroll served as theoretical consultants on the WJ-R...which was the foundation of the BAT-R.

Third, stating that the BAT-R (and WJ-R by implication) is not a respected measure of general intellectual functioning reflects a complete lack of awareness of the CHC foundation of the instruments, as well as published research (including the WJ-R and BAT-R technical manuals and bulletins).  If CHC theory is the consensus model of psychometric intelligence, then the only battery administered to Maldonado that measured most of the model, which in most conceptualizations has general intelligence (g) at the apex, should have been given serious weight.  I, and others, heard Dr. Arthur Jensen, the most prominent psychometric expert on g, state at an ISIR conference in Nashville, TN, during a discussion of a presentation in front of the entire audience, that (at the time) he considered the WJ-R (and, thus, the BAT-R by implicit endorsement) the best available intelligence battery for measuring g.  I will return to this point in future posts as I've been analyzing WAIS-III data together with the WJ batteries (as well as other accepted intelligence batteries, e.g., KAIT, K-ABC, SB-IV) to evaluate the g-ness of each battery when jointly analyzed.

Fourth, the prosecution psychologist used the WAIS Español Verbal score as evidence that the BAT-R was not accurate.  The problem with this logic is that the WAIS Verbal scale is known to be an excellent measure of crystallized intelligence/comprehension-knowledge (Gc), only one of the 7-8 major domains in the CHC model of intelligence.  Conversely, the BAT-R includes indicators from seven of the major CHC broad ability domains, only one of which is Gc.  No attempt was made to compare the WAIS Gc score with the corresponding Gc measure(s) on the BAT-R.  They might have been similar...or not.  More importantly, you cannot compare a measure of Gc to one that includes Gc and fluid reasoning (Gf), visual-spatial processing (Gv), short-term memory (Gsm), long-term retrieval (Glr), auditory processing (Ga), and processing speed (Gs).
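
To make the last point concrete, here is a rough sketch (in Python) of why a single-domain score and a multi-domain composite are not interchangeable.  Everything in it is an assumption made for illustration: the seven domain scores are hypothetical (they are NOT Maldonado's actual BAT-R cluster scores, which are not reported above), the domains are weighted equally, and the average intercorrelation of 0.5 is an arbitrary placeholder.  It uses the standard formula for the standard deviation of a sum of equally weighted, correlated standardized scores; it is not the BAT-R's actual scoring procedure.

```python
from math import sqrt


def composite_standard_score(scores, mean_r, mean=100.0, sd=15.0):
    """Equally weighted composite of correlated standard scores.

    scores : component standard scores (mean 100, SD 15 metric)
    mean_r : assumed average intercorrelation among the components
    """
    k = len(scores)
    # Convert each component to a z-score and sum them.
    z_sum = sum((s - mean) / sd for s in scores)
    # SD of a sum of k standardized variables with average correlation mean_r.
    sum_sd = sqrt(k + k * (k - 1) * mean_r)
    # Re-express the standardized sum on the IQ metric.
    return mean + sd * z_sum / sum_sd


# Hypothetical profile: relatively stronger Gc, weaker in the other domains.
# These numbers are invented for illustration only.
profile = {"Gc": 83, "Gf": 70, "Gv": 72, "Gsm": 68, "Glr": 71, "Ga": 74, "Gs": 69}

# mean_r=0.5 is an assumed placeholder, not a published value.
composite = composite_standard_score(list(profile.values()), mean_r=0.5)

print(f"Gc alone: {profile['Gc']}")
print(f"Seven-domain composite: {composite:.1f}")
```

Under these assumptions the composite falls well below the Gc score, and even below the lowest single domain score, with no inconsistency in the data at all.  A Gc score in the 80s and a much lower broad composite can both be accurate descriptions of the same person, which is why the comparison that matters is Gc against Gc.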

I could go on and comment on other issues, such as the use of only the Verbal scale of the Spanish WAIS-III, as well as the whole area of adaptive behavior, but these are issues for another post, possibly by a guest blogger.

