Thursday, March 31, 2011

More on relevance of Daubert Standard to Atkins MR/ID death penalty cases

Ten days ago I made a post about an interesting Law Review Article dealing with scientific evidence and the Daubert standard.

Interestingly, a non-MR/ID court decision came to my attention almost simultaneously. The case is Milward & Milward v Acuity Specialty Products Group.

I am no lawyer and am not qualified to comment on the relevance of this court decision to Atkins MR/ID cases. However, a couple of trusted and respected colleagues, who have extensive experience in law or in psychology and law, sent me their reactions along with permission to share them. Their comments are reproduced below (with minor editing for format and readability) without commentary from me.

Commentator # 1

The federal appeals court in Boston decided a case this week dealing with causation and stated that, “In this mode of reasoning [used by the expert], the use of scientific judgment is necessary,” and that, “No matter what methodology is used, ‘an evaluation of data and scientific evidence to determine whether an inference of causation is appropriate requires judgment and interpretation.’” Pgs. 12-13.

There was no discussion whatsoever about “error rate.”

While disease causation and ID/MR assessment are far apart in the world of expert testimony, the court’s discussion is relevant in either situation, IMHO. With the ID/MR cases, since courts must decide whether the person is ID/MR, and can only do so with the use of expert testimony, experts have to be allowed to give their opinions. There are some situations where the calculation of an accurate error rate is just not practical. But my argument is that there still needs to be some sort of definitive guidance in these areas, both legally and psychologically. It shouldn't be like the Wild West.

Commentator # 2

That is a great opinion regarding admissibility. I read it and found many places that are great teaching points on admissibility, especially for those of us testifying in an area often referred to as "soft science." I cut and pasted some of the more interesting passages below. The entire opinion is really well written, and the court clearly took time to reason through cases like Kumho, Joiner, and Daubert and examine the total picture rather than focusing on the narrow findings that came from Daubert (or at least the way many courts interpret the findings in Daubert).


These factors "do not constitute a 'definitive checklist or test.'" Kumho Tire Co. v. Carmichael, 526 U.S. 137, 150 (1999) (emphasis omitted) (quoting Daubert, 509 U.S. at 593). Given that "there are many different kinds of experts, and many different kinds of expertise," these factors "may or may not be pertinent in assessing reliability, depending on the nature of the issue, the expert's particular expertise, and the subject of his testimony." Id.

Exactly what is involved in "reliability" was not and could not have been filled out by Daubert. Rather, the answers must come from developing case law in adjudicating individual controversies. "[T]he question of admissibility 'must be tied to the facts of a particular case.'" Beaudette v. Louisville Ladder, Inc., 462 F.3d 22, 25-26 (1st Cir. 2006) (quoting Kumho Tire, 526 U.S. at 150).

Although Daubert stated that trial courts should focus "on principles and methodology, not on the conclusions that they generate," Daubert, 509 U.S. at 595, the Court subsequently clarified that this focus "need not completely pretermit judicial consideration of an expert's conclusions," Ruiz-Troche, 161 F.3d at 81 (citing Joiner, 522 U.S. at 146). In Joiner, the Court explained that "conclusions and methodology are not entirely distinct from one another" and "nothing in either Daubert or the Federal Rules of Evidence requires a district court to admit opinion evidence that is connected to existing data only by the ipse dixit of the expert." Joiner, 522 U.S. at 146. Expert testimony may be excluded if there is "too great an analytical gap between the data and the opinion proffered." Id. "[T]rial judges may evaluate the data offered to support an expert's bottom-line opinions to determine if that data provides adequate support to mark the expert's testimony as reliable." Ruiz-Troche, 161 F.3d at

This does not mean that trial courts are empowered "to determine which of several competing scientific theories has the best provenance." Id. at 85. "Daubert does not require that a party who proffers expert testimony carry the burden of proving to the judge that the expert's assessment of the situation is correct." Id. The proponent of the evidence must show only that "the expert's conclusion has been arrived at in a scientifically sound and methodologically reliable fashion." Id.; see also United States v. Vargas, 471 F.3d 255, 265 (1st Cir. 2006). The object of Daubert is "to make certain that an expert, whether basing testimony on professional studies or personal experience, employs in the courtroom the same level of intellectual rigor that characterizes the practice of an expert in the relevant field." Kumho Tire, 526 U.S. at 152.

So long as an expert's scientific testimony rests upon "'good grounds,' based on what is known," Daubert, 509 U.S. at 590, it should be tested by the adversarial process, rather than excluded for fear that jurors will not be able to handle the scientific complexities, id. at 596. "Vigorous cross-examination, presentation of contrary evidence, and careful instruction on the burden of proof are the traditional and appropriate means of attacking shaky but admissible evidence."

Atkins MR/ID Court Decision: Jones v McNeil (Fl, 2011)

Thanks (again) to Kevin Foley for sending me another Atkins decision. Jones v McNeil (Fl, 2011) is now available and is being added to the Atkins Court Decision Blogroll.

Why IQ composite scores often are higher or lower than the subtest scores: Awesome video explanation

This past week Dr. Joel Schneider and I released a paper called "'Just say no' to averaging IQ subtest scores." The report generated considerable discussion on a number of professional listservs.

One small portion of the paper explained why composite/cluster scores from IQ tests often are higher (or lower) than the arithmetic mean of the tests that comprise the composite. This observation often baffles test users.

I would urge those who have pondered this question to read that section of the report. And THEN, be prepared to be blown away by an instructional video Joel posted at his blog, where he leads you through a visual-graphic explanation of the phenomenon. Don't be scared by the geometry or some of the terms. Just sit back, relax, and recognize that, even if all the technical stuff is not your cup of tea, there is an explanation for this score phenomenon. And when colleagues ask, just refer them to Joel's blog.

It is brilliant and worth a view, even if you are not a quantitatively oriented thinker.


Wednesday, March 30, 2011

FYiPOST: Robinson on Coercive Indoctrination and "Rotten Social Background"

Recently posted to SSRN: "Are We Responsible for Who We Are? The Challenge for Criminal Law Theory in the Defenses of Coercive Indoctrination and 'Rotten Social Background'" U of Penn Law School, Public Law Research Paper No. 11-10 PAUL H....

FYiPOST: Pardo & Patterson on Neuroscience and Retributivism

Michael S. Pardo (pictured) and Dennis Patterson (University of Alabama School of Law and European University Institute) have posted Neuroscientific Challenges to Retributivism (THE FUTURE OF PUNISHMENT, Thomas Nadelhoffer, ed., Oxford University Press, Forthcoming) on SSRN. Here is the abstract:...

Sunday, March 27, 2011

FYiPOST: Top-Ten Recent SSRN Downloads

in criminal law and procedure ejournals are here. The usual disclaimers apply. Rank Downloads Paper Title 1 482 An Equilibrium-Adjustment Theory of the Fourth Amendment Orin S. Kerr, George Washington University - Law School, Date posted to database: January 26,...

IAP Applied Psychometrics 101 Report #10: "Just say no" to averaging IQ subtest scores

Should psychologists engage in the practice of calculating simple arithmetic averages of two or more scaled or standard scores from different subtests (pseudo-composites) within or across different IQ batteries? Dr. Joel Schneider and I (Dr. Kevin McGrew) say "no."

Do psychologists who include simple pseudo-composite scores in their reports, or make interpretations and recommendations based on such scores, have a professional responsibility to alert recipients of psychological reports (e.g., lawyers, the courts, parents, special education staff, other mental health practitioners, etc.) of the potential amount of error in their statements when simple pseudo-composite scores are the foundation of some of their statements? We believe "yes."

Simple pseudo-composite scores, in contrast to norm-based scores (i.e., composite scores with norms provided by test publishers/authors, such as the Wechsler Verbal Comprehension Index), contain significant sources of error. Although they have intuitive appeal, this appeal cloaks hidden sources of error in the scores, with the amount of error being a function of a combination of psychometric variables.

IAP Applied Psychometrics 101 Report #10 addresses the psychometric issues involved in pseudo-composite scores.

In the report we offer recommendations and resources that allow users to calculate psychometrically sound pseudo-composites when they are deemed important and relevant to the interpretation of a person's assessment results.

Finally, understanding the sources of error in simple pseudo-composite scores provides an opportunity for practitioners to understand the paradoxical phenomenon frequently observed in practice where norm-based or psychometrically sound pseudo-composite scores are often higher (or lower) than the subtest scores that comprise the composite. The "total does not equal the average of the parts" phenomenon is explained conceptually, statistically, and via an interesting visual explanation based on trigonometry.
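The "total does not equal the average of the parts" effect can be reproduced with a few lines of arithmetic. The sketch below is an illustration, not the report's own procedure. It assumes subtest scores on the usual mean-100, SD-15 metric; the function name `composite_score` and the subtest correlation of 0.60 are hypothetical. It sums the subtest z-scores and rescales that sum by its standard deviation, which is the square root of the sum of all entries in the subtest correlation matrix.

```python
import math


def composite_score(scores, corr, mean=100.0, sd=15.0):
    """Composite standard score from subtest standard scores.

    scores: subtest scores on a metric with the given mean and SD.
    corr:   square correlation matrix among the subtests (hypothetical here).
    """
    k = len(scores)
    # Convert each subtest score to a z-score.
    z = [(s - mean) / sd for s in scores]
    # Variance of a sum of z-scores = sum of every entry in the correlation matrix.
    var_of_sum = sum(corr[i][j] for i in range(k) for j in range(k))
    # Rescale the summed z-scores so the composite itself has SD 1.
    z_composite = sum(z) / math.sqrt(var_of_sum)
    return mean + sd * z_composite


# Illustrative values only: two subtests, both scored 70, assumed to correlate .60.
subtests = [70, 70]
r = [[1.0, 0.6],
     [0.6, 1.0]]

print(sum(subtests) / len(subtests))          # simple average: 70.0
print(round(composite_score(subtests, r), 1))  # composite: roughly 66.5
```

Because the two subtests are imperfectly correlated, scoring 70 on both is rarer than scoring 70 on either one alone, so the composite lands further from the mean than the simple average does. Only when the subtests correlate perfectly do the two values coincide.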


The publishers and authors of intelligence test batteries provide norm-based composite scores based on two or more individual subtests. In practice, clinicians frequently form hypotheses based on combinations of tests for which norm-based composite scores are not available. In addition, with the emergence of Cattell-Horn-Carroll (CHC) theory as the consensus psychometric theory of intelligence, clinicians are now more frequently “crossing batteries” to form composites intended to represent broad or narrow CHC abilities. Beyond simple “eye-balling” of groups of subtests, clinicians at times compute the arithmetic average of subtest scaled or standard scores (pseudo-composites). This practice suffers from serious psychometric flaws and can lead to incorrect diagnoses and decisions. The problems with pseudo-composite scores are explained and recommendations made for the proper calculation of special composite scores.

Friday, March 25, 2011

FYiPOST: "Neuroscientific Challenges to Retributivism"

The title of this post is the title of this notable book chapter by Professors Michael Pardo and Dennis Patterson available via SSRN. Here is the abstract:

We examine two recent challenges to retribution-based justifications for criminal punishment based on neuroscientific evidence.  The first seeks to undermine retributivism because of the brain activity of subjects engaged in punishment decisions for retributive (as opposed to consequentialist) reasons.  This challenge proceeds by linking retributivism with deontological moral theories and the brain activity correlated with deontological moral judgments.  The second challenge seeks to undermine retributivism by exposing, through neuroscientific information, the purportedly implausible foundation on which retributivism depends: one based on free will and folk psychology.

We conclude that neither challenge succeeds.  The first challenge fails, in part, because the brain activity of punishers does not provide the appropriate criteria for whether judgments regarding criminal punishment are justified or correct.  Moreover, retributivism does not necessarily depend on the success or failure of any particular moral theory.  The second challenge fails because neuroscience does not undermine the conceptions of free will or folk psychology on which retributivism depends.  Along the way, we point out a number of faulty inferences and problematic assumptions and presuppositions involved in these challenges to retributivism.

Thursday, March 24, 2011

FYiPOST: The Death Penalty from an International Perspective

A recent book by Sanaz Alasti, "Cruel and Unusual Punishment: Comparative Perspective in International Conventions, the United States and Iran," explores the question of what constitutes cruel and unusual punishment on an international level. The book reviews current practices in both Iran and the United States, focusing on the death penalty and the harshness of such practices as corporal punishment, long terms of imprisonment, and inflexible laws mandating punishment. Punishments are particularly examined in light of the Universal Declaration of Human Rights. Sanaz Alasti is a Fellow at Harvard Law School, and has written numerous books and articles on various aspects of comparative criminal justice and penology.

(S. Alasti, "Cruel and Unusual Punishment: Comparative Perspective in International Conventions, the United States and Iran," Vandeplas Publishing 2009).

FYiPOST: Neuroscience in courtrooms

Monday, March 21, 2011

FYiPOST: The Dana Foundation on the NYC Law and the Brain Conference

Nicky Penttila at the Dana Foundation wrote up a brief description of some of the presentations at the recent Law and the Brain Conference in New York City.

Law Review Article: Haug & Baird (2011)-Essay: Finding the Error in Daubert

Thanks to Kevin Foley for sending this thought-provoking Hastings Law Review Article on the Daubert standard by Haug and Baird (2011) re: scientific evidence. As an applied psychometrician, I find it interesting that the three types of error described mirror the three general classes of unreliability/error we measurement folks address in test development. The article will be added to the ICDP Law Review Article blogroll.

The authors focus on the "known rate of error" factor of Daubert, and they suggest a new test.

"If an expert can account for the measurement error, the random error,
and the systematic error in his evidence, then he ought to be permitted
to testify. On the other hand, if he should fail to account for any one or
more of these three types of error, then his testimony ought not be

FYiPOST: Notable new forthcoming book on juve crime and punishment

Professors Christopher Slobogin and Mark Fondacaro have a new forthcoming book on juvenile justice which is titled "Juveniles at Risk: A Plea for Preventive Justice." This is the abstract the authors have now posted on SSRN:


This book is especially timely in the wake of the Supreme Court's work last year in Graham v. Florida.  Though Graham involved constitutional limits on punishment, the ruling should be viewed by legislatures as a call to begin re-thinking the modern approach to juvenile crime and punishment more broadly.

Complete info at link below

Sunday, March 20, 2011

FYiPOST: Top-Ten Recent SSRN Downloads

in criminal law and procedure ejournals are here. The usual disclaimers apply. Rank Downloads Paper Title 1 468 An Equilibrium-Adjustment Theory of the Fourth Amendment Orin S. Kerr, George Washington University - Law School, Date posted to database: January 26,...

Research brief: Another study (German) supporting validity of IQ Flynn effect

Yet another article supporting the scientific fact of the Flynn effect for IQ scores (IQ norm obsolescence). This article will be added to the online Flynn Effect archive next time it is updated.


Saturday, March 19, 2011

FYiPOST: Ethics and the Brain - Columbia, MO

The University of Missouri presents the 7th Annual Life Sciences & Society Symposium, Ethics & the Brain March 19-20, 2011.

Adam Kolber (Brooklyn Law) will present The Experiential Future of the Law on March 19. mw

Thursday, March 17, 2011

FYiPOST: University of Missouri Neuroethics Symposium

The University of Missouri's Seventh Annual Life Sciences and Society Symposium is this weekend. The title is "Ethics and the Brain." I'm told that more than 500 people have registered for the free event. As usual, blog readers in attendance...

Tuesday, March 15, 2011

Does the WAIS-III measure the same intellectual abilities in MR/ID individuals?

I have had a number of people send me copies of this article (see abstracts and journal info below), especially those who do work related to Dx of MR/ID in Atkins death penalty cases.

The abstract is self-explanatory--the authors conclude that the WAIS-III four-factor structure is not validated in an MR/ID population. I can hear a lawyer now--"so Dr. __________, according to MacLean et al. the WAIS-III doesn't measure the same abilities in individuals with MR/ID, so aren't your results questionable?"

A close read of the article suggests the results should be taken with a serious grain of salt. In fact, the discussion is primarily a discussion of the various methodological and statistical reasons why the published 4-factor model may not have fit.

As is often the case when dealing with samples of convenience (the authors' own words), especially samples of individuals at the lower end of the ability continuum, the variables often show significant problems with non-normality and skew. This is present in this sample. Given that we are dealing with SEM-based statistics, the problem is actually one of not meeting the assumption of multivariate normality. The variables also showed restricted SDs, that is, a restricted range of talent, a condition that dampens correlations in a matrix.
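The dampening effect of restricted range can be made concrete with the classic formula for direct range restriction on one variable. The sketch below is an illustration under stated assumptions, not an analysis of the study's data: it assumes a linear, homoscedastic relation and selection on only one of the two variables, and the function name `restricted_correlation` and the example values are hypothetical.

```python
import math


def restricted_correlation(r_full, u):
    """Expected correlation after direct range restriction on one variable.

    r_full: correlation in the unrestricted (full-range) population.
    u:      ratio of restricted SD to unrestricted SD (0 < u <= 1).
    Assumes linearity and homoscedasticity of the regression.
    """
    return (r_full * u) / math.sqrt(1.0 - r_full ** 2 + (r_full ** 2) * (u ** 2))


# Illustrative values only: a full-range correlation of .60 observed in a
# sample whose SD is half that of the general population.
print(round(restricted_correlation(0.60, 0.5), 2))  # roughly 0.35
```

When every correlation in a matrix shrinks this way, the factors have less shared variance to explain, which is one reason factor models can fit poorly in low-variability samples.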

While doing extensive modeling research at the Institute for Community Integration at the University of Minnesota, an institute devoted to individuals with MR/ID/DD, I was constantly faced with data sets with these problems. As a result, I was constantly faced with model fit statistics that were much lower than the standard acceptable rules of thumb for model fit statistics...which reflected the less-than-robust statistical and distributional properties of such sample data. The best way to overcome the resultant low model fits (after trying transformations of the variables to different scales) was to compare the fit of competing models. The best fitting model, when compared to competing models, may still show a relatively poor absolute fit value (when compared to the standard rules of thumb), but by demonstrating that it was the best when compared to alternatives, the case could be made that it was still the best possible model given the constraints of the sample data.

This leads to the MAJOR flaw of this study. Although the authors discuss the sample problems above, they only tested one model...the WAIS-III four-factor model. They then looked at the absolute value of the fit statistics and concluded that the 4-factor model was not a good fit. I see this as a major flaw. Since the standard rules-of-thumb for absolute magnitude of fit stats may no longer hold in samples with statistical and distributional problems, they should have specified competing models (e.g., two-factor; CHC-model, single factor, etc.) and then compared the relative model fit statistics before rendering a conclusion.
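Comparing relative fit can be as simple as ranking competing models on an information criterion. The sketch below is only an illustration of the idea. It uses one common SEM form of the AIC (model chi-square plus twice the number of free parameters), and every chi-square value in it is made up for the example. None of the numbers come from MacLean et al. or any real data set, and the helper names `aic` and `rank_models` are hypothetical.

```python
def aic(chi_square, n_free_params):
    """One common SEM form of the AIC: chi-square + 2 * free parameters."""
    return chi_square + 2 * n_free_params


def rank_models(models):
    """Return model names sorted from best (lowest AIC) to worst.

    models: dict mapping a model name to (chi_square, n_free_params).
    """
    return sorted(models, key=lambda name: aic(*models[name]))


# Invented fit values for three competing models of 13 subtests.
# Chi-squares are hypothetical; parameter counts assume 13 loadings,
# 13 residual variances, and the factor covariances.
candidates = {
    "single-factor": (412.0, 26),
    "two-factor": (355.0, 27),
    "four-factor": (298.0, 32),
}

for name in rank_models(candidates):
    print(name, aic(*candidates[name]))
```

In a case like this invented one, the four-factor model could still miss the usual absolute cutoffs while clearly outperforming the alternatives, which is the comparison the study did not report.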

Finally, as the authors correctly point out, the current results, even with the flaws above, may simply reflect the well-established finding that the differentiation of cognitive abilities is less for lower functioning individuals, and more for higher functioning individuals. This is Spearman's Law of Diminishing Returns (SLODR).

Bottom line for the blogmaster--I judge the authors' conclusions to be overstated for the reasons noted above, particularly the failure to compare the 4-factor model to alternative models. It is very possible that the 4-factor model may be the best fitting model given the statistical and distributional constraints of the underlying sample data.


Intellectual assessment is central to the process of diagnosing an intellectual disability and the assessment process needs to be valid and reliable. One fundamental aspect of validity is that of measurement invariance, i.e. that the assessment measures the same thing in different populations. There are reasons to believe that measurement invariance of the Wechsler scales may not hold for people with an intellectual disability. Many of the issues which may influence factorial invariance are common to all versions of the scales. The present study, therefore, explored the factorial validity of the WAIS-III as used with people with an intellectual disability. Confirmatory factor analysis was used to assess goodness of fit of the proposed four factor model using 13 and 11 subtests. None of the indices used suggested a good fit for the model, indicating a lack of factorial validity and suggesting a lack of measurement invariance of the assessment with people with an intellectual disability. Several explanations for this and implications for other intellectual assessments were discussed.

Sunday, March 13, 2011

Atkins MR/ID death penalty decisions: Albarran v AL, Johnson v MO, Ybarra v NV

Time to push out the backlog. I had intended to read the three following decisions and make some introductory comments related to each, but time does not permit. The following are three recent Atkins MR/ID death penalty decisions that will be added to the Court Decisions blogroll.

Johnson v Missouri (2011, 2008)

Albarran v Alabama (2011)

Ybarra v Nevada (2011)


Saturday, March 12, 2011

Friday, March 11, 2011

FYiPOST: Psychology, Public Policy, and Law - Volume 17, Issue 1

A meta-analytic review of competency to stand trial research.
Page 1-53
Pirelli, Gianni; Gottdiener, William H.; Zapf, Patricia A.

Discrimination in the 21st century: Are science and the law aligned?
Page 54-75
King, Eden B.; Dunleavy, Dana G.; Dunleavy, Eric M.; Jaffer, Salman; Morgan, Whitney Botsford; Elder, Katie; Graebner, Raluca

Experts' and novices' abilities to detect children's high-stakes lies of omission.
Page 76-98
Nysse-Carris, Kari L.; Bottoms, Bette L.; Salerno, Jessica M.

Seventy-two tests of the sequential lineup superiority effect: A meta-analysis and policy discussion.
Page 99-139
Steblay, Nancy K.; Dysart, Jennifer E.; Wells, Gary L.

The role of eyewitness identification evidence in felony case dispositions.
Page 140-159
Flowe, Heather D.; Mehta, Amrita; Ebbesen, Ebbe B.

Thursday, March 10, 2011

FYiPOST: Age of onset component of MR/ID and Atkins DP and SSA

Does age of onset in legal contexts mean chronological or developmental age? This is a live issue in the death penalty context because in Atkins v. Virginia, the United States Supreme Court found it cruel and unusual punishment under the Eighth Amendment to execute mentally retarded offenders. While the Atkins Court cited favorably to clinical definitions of mental retardation, it left open to the states to define what mental retardation means for purposes of the death penalty.

Click link below for rest of post and links

FYiPOST: Orthogonal higher order structure and confirmatory factor analysis of the French Wechsler Adult Inte

According to the most widely accepted Cattell–Horn–Carroll (CHC) model of intelligence measurement, each subtest score of the Wechsler Intelligence Scale for Adults (3rd ed.; WAIS–III) should reflect both 1st- and 2nd-order factors (i.e., 4 or 5 broad abilities and 1 general factor). To disentangle the contribution of each factor, we applied a Schmid–Leiman orthogonalization transformation (SLT) to the standardization data published in the French technical manual for the WAIS–III. Results showed that the general factor accounted for 63% of the common variance and that the specific contributions of the 1st-order factors were weak (4.7%–15.9%). We also addressed this issue by using confirmatory factor analysis. Results indicated that the bifactor model (with 1st-order group and general factors) better fit the data than did the traditional higher order structure. Models based on the CHC framework were also tested. Results indicated that a higher order CHC model showed a better fit than did the classical 4-factor model; however, the WAIS bifactor structure was the most adequate. We recommend that users do not discount the Full Scale IQ when interpreting the index scores of the WAIS–III because the general factor accounts for the bulk of the common variance in the French WAIS–III. The 4 index scores cannot be considered to reflect only broad ability because they include a strong contribution of the general factor. (PsycINFO Database Record (c) 2011 APA, all rights reserved)
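For readers unfamiliar with the Schmid-Leiman transformation mentioned in the abstract, the core arithmetic for a single subtest is short. The sketch below is a simplified illustration that assumes one second-order general factor and a subtest loading on only one first-order factor. The function name `schmid_leiman` and the loadings are hypothetical and are not taken from the French WAIS-III manual.

```python
import math


def schmid_leiman(first_order_loading, second_order_loading):
    """Split a subtest's first-order loading into general and group parts.

    first_order_loading:  subtest loading on its first-order (broad) factor.
    second_order_loading: that broad factor's loading on the general factor.
    Returns (loading on general factor, loading on residualized group factor).
    """
    general = first_order_loading * second_order_loading
    group = first_order_loading * math.sqrt(1.0 - second_order_loading ** 2)
    return general, group


# Hypothetical loadings: subtest loads .80 on its broad factor,
# and that broad factor loads .90 on g.
g_loading, group_loading = schmid_leiman(0.80, 0.90)
print(round(g_loading, 2), round(group_loading, 2))  # roughly 0.72 and 0.35
```

The squared general and group loadings add back up to the squared first-order loading, so the transformation repartitions the subtest's common variance rather than changing its total. With values like these hypothetical ones, most of that variance is attributed to the general factor, which is the pattern the abstract describes.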

FYIPOST: Measurement invariance of neuropsychological tests in diverse older persons.

Objective: Comparability of meaning of neuropsychological test results across ethnic, linguistic, and cultural groups is important for clinicians challenged with assessing increasing numbers of older ethnic minorities. We examined the dimensional structure of a neuropsychological test battery in linguistically and demographically diverse older adults. Method: The Spanish and English Neuropsychological Assessment Scales (SENAS), developed to provide psychometrically sound measures of cognition for multiethnic and multilingual applications, was administered to a community dwelling sample of 760 Whites, 443 African Americans, 451 English-speaking Hispanics, and 882 Spanish-speaking Hispanics. Cognitive function spanned a broad range from normal to mildly impaired to demented. Multiple group confirmatory factor analysis was used to examine equivalence of the dimensional structure for the SENAS across the groups defined by language and ethnicity. Results: Covariance among 16 SENAS tests was best explained by five cognitive dimensions corresponding to episodic memory, semantic memory/language, spatial ability, attention/working memory, and verbal fluency. Multiple Group confirmatory factor analysis supported a common dimensional structure in the diverse groups. Measures of episodic memory showed the most compelling evidence of measurement equivalence across groups. Measurement equivalence was observed for most but not all measures of semantic memory/language and spatial ability. Measures of attention/working memory defined a common dimension in the different groups, but results suggest that scores are not strictly comparable across groups. Conclusions: These results support the applicability of the SENAS for use with multiethnic and bilingual older adults, and more broadly, provide evidence of similar dimensions of cognition in the groups represented in the study. (PsycINFO Database Record (c) 2011 APA, all rights reserved)

Wednesday, March 9, 2011

Intelligent IQ testing@sbkaufman, 3/9/11 5:14 PM

Scott Barry Kaufman (@sbkaufman)
3/9/11 5:14 PM
Intelligent Testing via @huffingtonpost

