2. Previous Literature Reviews

Qualitative Literature Reviews
Limitations of Qualitative Literature Reviews
Sampling biases
Subjectivity and imprecision

Quantitative Literature Reviews

In America, starting at the end of the 1970s, researchers began examining the psychological correlates of CSA in earnest. Soon, numerous such studies had been published. This in turn occasioned a new kind of research, which consisted of reviewing and synthesizing the available studies--that is, conducting literature reviews. Many literature reviews have appeared over the last 15 years. These reviews have not been unanimous in their conclusions, although a good many of them have favored the assumptions of causality, pervasiveness, intensity, and equivalence of harm, thus supporting popular impressions of CSA. Two basic types of reviews have been done: qualitative and quantitative. We’ll examine each type now.

Qualitative Literature Reviews

The first type of review is qualitative, in which a researcher gathers a set of studies and summarizes in narrative fashion what they seem to be saying. The researcher tells the reader in words and descriptions, rather than mathematically, his or her interpretation of the findings of the studies taken as a whole.

The authors of these qualitative reviews have typically concluded that CSA is associated with a wide range of psychological problems, including anger, depression, anxiety, eating disorders, alcohol and drug abuse, low self-esteem, relationship difficulties, inappropriate sexual behavior, aggression, self-mutilation, suicide, dissociation, and posttraumatic stress disorder, among others. They more often than not have assumed that CSA caused these problems, and have stated or implied that most persons with CSA experiences will be afflicted. Some have taken pains to emphasize that boys are just as badly affected as girls. One group of researchers called it a myth that boys are less affected than girls. Another researcher dismissed as an "exercise in futility" efforts to determine whether boys or girls are more adversely affected by CSA, and concluded that CSA "has pronounced deleterious effects on its victims, regardless of their gender." Not all reviewers, however, have agreed with these conclusions. Some have pointed to the need for caution when inferring causality, noting that CSA is so consistently confounded with family environment problems that it really is not possible to say whether the poorer adjustment found in CSA subjects compared to control subjects is the result of the CSA or poor family background. A number of reviewers have argued that CSA outcomes are variable, rather than pervasively negative.

For example, Constantine, in one of the earliest reviews, found that negative outcomes were often absent in CSA persons in nonclinical samples. He concluded that there is no inevitable outcome or set of reactions, and that responses to CSA are mediated by nonsexual factors, such as the young person’s perceived willingness when participating in the sexual encounter. And finally, a few reviewers have noted that boys tend to react much more positively or neutrally than girls.

Limitations of Qualitative Literature Reviews

What can we conclude from the qualitative literature reviews as a whole regarding popular assumptions about CSA? Not much, actually, for several reasons. First, their conclusions have been inconsistent from one review to the next. Second, and even more importantly, these reviews have generally suffered from sampling biases. Third, they have been vulnerable to biases stemming from subjectivity and imprecision.

Sampling biases

All of these qualitative reviews except for one (which we’ll discuss later in this presentation) were based primarily on clinical or legal samples. A fair number of them were based exclusively or nearly exclusively on these samples. Clinical and legal samples of persons with CSA cannot be assumed to be representative of the population of persons with a history of CSA. This is an extremely important principle that is worth elaborating on.

"Proof" that masturbation caused mental disease was once based on observing that institutionalized psychiatric patients masturbate. "Proof" that homosexuality was a mental disorder was once based on psychiatric and prison samples. When nonclinical samples were examined, a much different and much more benign view of masturbation and homosexuality emerged. By analogy, we must also examine CSA in nonclinical populations to be able to infer whether it is generally harmful, and if so, to what degree.

Some reviews of CSA have been based on a large number of clinical samples, emboldening the reviewers to conclude that CSA is highly destructive. But bigger numbers do not necessarily bring us closer to valid knowledge. To see why, consider this famous example. In 1936 in the U.S., the Republican candidate Alf Landon ran against the Democratic candidate Franklin Roosevelt for president. Two weeks before the election, Literary Digest magazine sent out about 10,000,000 postcards asking people whom they would vote for. They got about 2,400,000 responses, voting 57% for Landon and 43% for Roosevelt. The actual election produced roughly the opposite result: Roosevelt won with about 61% of the vote. What went wrong? The magazine got its sample from car registrations and telephone directories. In 1936, during the height of the Depression, people with cars and phones were likely to have had money, and such people tended to vote Republican. Thus, their sample was biased. The fact that they got such a huge number of responses (about 2.4 million) did not compensate for sample bias. A representative sample of 1000, which is typically used today, is far better at reaching valid results. The principle is, sample size will never compensate for sample bias.
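For those who like to see the principle in action, here is a minimal simulation sketch. The support levels (61% in the whole population, 40% among the people a biased poll can reach) and the sample sizes are hypothetical numbers chosen only for illustration; they are not the historical poll data.

```python
import random

def poll_share(n, p_support, rng):
    """Share of n simulated respondents who say they support the candidate."""
    return sum(rng.random() < p_support for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical values, chosen only to illustrate the principle.
TRUE_SUPPORT = 0.61             # support in the whole population
SUPPORT_IN_BIASED_FRAME = 0.40  # support among people the biased poll reaches

# A huge sample drawn only from the biased frame.
biased_estimate = poll_share(200_000, SUPPORT_IN_BIASED_FRAME, rng)
# A small sample drawn from the whole population.
representative_estimate = poll_share(1_000, TRUE_SUPPORT, rng)

print(f"true support:                {TRUE_SUPPORT:.3f}")
print(f"biased sample, n=200,000:    {biased_estimate:.3f}")
print(f"representative sample, n=1,000: {representative_estimate:.3f}")
```

The huge biased sample gives a very precise estimate of the wrong quantity, while the small representative sample lands near the true value.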

The findings of 150 clinical studies are not nearly as informative as the findings of one representative study. The focus on clinical and legal samples represents a major failing of most qualitative reviews.

Drawing conclusions from clinical and legal samples is problematic not only because these samples are not representative of the general population, but also because data coming from these samples are vulnerable to being invalid.

One problem has to do with the beliefs of the therapist. If a therapist is convinced, as many once were, that homosexuality causes maladjustment, then the therapist will be unmotivated to search for other potential causes of a homosexual patient’s maladjustment. In this way, the therapist’s belief in pathology is maintained. The same argument can be applied to CSA. In one famous example of this, psychiatrist Fred Berlin evaluated the president of American University, who had just been arrested for making obscene phone calls. Berlin heard from his patient that he had experienced incest with his mother at age 11, but also that he had been severely beaten at random times throughout his entire childhood. Berlin, convinced as he was of the power of CSA to create pathology, fixated on the incest as the cause of his patient’s current problems, and then used this case as just another example of how devastating CSA is. But, given the confound of much more prominent and pervasive physical abuse, his conclusions seem dubious at best.

The point of this example is that the psychiatrist’s beliefs in the harmfulness of CSA were strengthened by selective attention to evidence, which is not scientifically valid. This is not to argue that CSA is never the cause of a patient’s maladjustment, but that a therapist’s expectancies can substantially inflate the perception that CSA causes maladjustment.

Subjectivity and imprecision

Qualitative reviews are entirely narrative and therefore susceptible to the reviewers' own subjective interpretations. Reviewers who are convinced that CSA is a major cause of adult psychopathology may fall prey to confirmation bias--that is, they note and describe study findings indicating harmful effects, but ignore or pay less attention to findings indicating nonnegative or positive outcomes, thus confirming their initial belief. By analogy, people who believe in astrology are very impressed when their horoscope’s prediction comes true, but quickly forget the vast majority of cases when it doesn’t. By means of this confirmation bias, they become convinced of the predictive validity of astrology. An example of confirmation bias in CSA research is that of Mendel, who reviewed a study consisting of two separate college samples of males. In one sample, no associations were found between CSA and adjustment problems. In the second, smaller sample, some associations were found. Mendel ignored the results from the first sample, but used the second to argue that CSA causes maladjustment. This selective attention to confirming results has been a serious problem in many of the qualitative reviews.

Another problem has to do with precision. In the example just discussed, Mendel used the confirming sample to argue that CSA causes depression, anxiety, and so on. What he did not report was that the association in that sample between CSA and symptoms was small. This is very important information, because it is not valid to conclude from these results that CSA produces intense effects, as Mendel did. In these qualitative literature reviews, this has been a constant problem: studies show small but statistically significant differences, and reviewers inflate the findings by claiming serious effects. What is needed is for reviewers to deal with the statistics precisely; otherwise, they are prone to exaggerate the results if they already believe CSA is highly destructive.

Quantitative Literature Reviews

To avoid the problems of qualitative reviews, by the mid-1990s a few researchers began doing quantitative reviews. These reviews were based on a statistical procedure called meta-analysis. In meta-analysis the researcher collects a number of studies that have compared the adjustment of CSA subjects with control subjects. Then the researcher takes the statistics reported in each study that compared the two groups and converts them into a common statistic. Finally, the researcher averages all these values to see what the studies collectively are saying about the association between CSA and adjustment.
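The three steps just described can be sketched in a few lines. This is only an illustration under stated assumptions: the common statistic is taken to be the correlation r, the conversion from a standardized mean difference d assumes equal group sizes, the averaging is a simple sample-size-weighted mean, and the three studies are made-up numbers. The published meta-analyses may have used different conversions and weighting.

```python
import math

def r_from_t(t, df):
    """Convert a two-group t statistic (with df degrees of freedom) to r."""
    return t / math.sqrt(t * t + df)

def r_from_d(d):
    """Convert a standardized mean difference d to r (assumes equal group sizes)."""
    return d / math.sqrt(d * d + 4)

def weighted_mean_r(studies):
    """Average effect sizes, weighting each (r, n) pair by its sample size n."""
    total_n = sum(n for _, n in studies)
    return sum(r * n for r, n in studies) / total_n

# Hypothetical studies, each reporting a different statistic.
studies = [
    (r_from_t(2.0, 96), 98),   # study reporting t = 2.0 with df = 96
    (r_from_d(0.30), 250),     # study reporting d = 0.30
    (0.05, 400),               # study already reporting r = 0.05
]

mean_r = weighted_mean_r(studies)
print(f"average effect size r = {mean_r:.3f}")
```

Weighting by sample size is one common design choice: it lets larger studies, whose estimates are more stable, count for more in the average.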

The common value derived from each study in the meta-analyses we’ll be discussing is called an effect size, which tells you how big the difference is between CSA and control subjects in terms of their adjustment. This is different from saying that the two groups showed a statistically significant difference, because such a difference could be very small or quite big. The effect size tells us whether the difference is small or big. If you save one guilder at store A compared to store B on a 1000 guilder item, there’s a difference, but it’s quite small. If you save 200 guilders, then that’s something. As a shopper, you want to know how much you’ll save by going to store A, not simply whether you’ll save. This is the spirit of effect size analysis.
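To make the distinction between "statistically significant" and "big" concrete, here is a small sketch. It assumes two equal-sized groups, scores with a standard deviation of 1, and a normal approximation for the test; the two scenarios are hypothetical.

```python
import math

def two_group_z_test(d, n_per_group):
    """Return (z, two-sided p) for a standardized mean difference d
    between two equal-sized groups, using a normal approximation."""
    z = d * math.sqrt(n_per_group / 2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Tiny difference, very large groups: significant but small.
z_small, p_small = two_group_z_test(d=0.05, n_per_group=10_000)
# Large difference, very small groups: big but not significant.
z_large, p_large = two_group_z_test(d=0.50, n_per_group=20)

print(f"d = 0.05, n = 10,000 per group: z = {z_small:.2f}, p = {p_small:.4f}")
print(f"d = 0.50, n = 20 per group:     z = {z_large:.2f}, p = {p_large:.4f}")
```

The significance test answers "is there any difference?", while the effect size d answers "how much?", which is the shopper's question.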

For ease of presentation, given that many of you are not familiar with statistics, we will report effect sizes in the following way. Imagine that we have a group of people, some of whom had CSA and some of whom did not. Now, you can imagine that there is a lot of variation in both groups in terms of how well the different individuals are adjusted. Some will be very well adjusted, others moderately so, others not too well, and a few will be seriously maladjusted. If CSA had a very strong effect on adjustment, then CSA should account for at least 50% of the adjustment variability among all of the subjects. If CSA had a strong effect, it should account for at least 25%. If CSA had a medium effect, it should account for about 10%. And if CSA had only a small effect, it should account for about 1% of the adjustment variability.
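For those who do know some statistics, the percentage of variability accounted for is the squared correlation, r squared, expressed as a percentage. The sketch below turns the benchmarks just given into a small helper. The function names are our own, and deciding between "medium" and "small" by whichever benchmark (10% or 1%) is nearer is an assumption made for illustration, since the benchmarks were given only as approximate values.

```python
def variance_explained_pct(r):
    """Percentage of adjustment variability accounted for by a correlation r."""
    return 100 * r * r

def describe_effect(pct):
    """Label a percentage of variance using the benchmarks from the talk."""
    if pct >= 50:
        return "very strong"
    if pct >= 25:
        return "strong"
    # "About 10%" is medium and "about 1%" is small; pick the nearer one
    # (an assumption made for this sketch).
    return "medium" if abs(pct - 10) < abs(pct - 1) else "small"

for r in (0.1, 0.3, 0.5, 0.75):
    pct = variance_explained_pct(r)
    print(f"r = {r:.2f} -> {pct:.1f}% of variability ({describe_effect(pct)})")
```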

One researcher, by the name of Jumper, in 1995 included student, community, and clinical samples in her meta-analysis of the relation between CSA and adjustment. She averaged the effect sizes separately for each sample type. Once some errors she made are corrected, her results show that CSA accounted for 0.8% of the adjustment variation in the student samples, 2.25% in the community samples, and 7.3% in the clinical samples. In other words, CSA was related to adjustment, but the relationship was small in the nonclinical samples and medium in the clinical samples.

In 1996, another group of researchers published a second meta-analysis. They computed average effect sizes separately for nonclinical and clinical samples. The amount of variability accounted for by CSA was 1.4% for the nonclinical samples and 3.6% for the clinical samples.
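Readers used to seeing effect sizes as correlations can convert these percentages back. The sketch below assumes each reported percentage is a squared correlation and takes the positive square root; the dictionary labels are our own shorthand for the figures quoted above.

```python
import math

# Percentages of adjustment variability quoted in the talk.
reported_pct = {
    "Jumper 1995, student": 0.8,
    "Jumper 1995, community": 2.25,
    "Jumper 1995, clinical": 7.3,
    "1996 review, nonclinical": 1.4,
    "1996 review, clinical": 3.6,
}

# Implied correlation, assuming percentage = 100 * r**2.
implied_r = {label: math.sqrt(pct / 100) for label, pct in reported_pct.items()}

for label, r in implied_r.items():
    print(f"{label:26s} {reported_pct[label]:5.2f}%  ->  r = {r:.2f}")
```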

These two quantitative reviews improved over the qualitative reviews in several ways. First, they avoided subjective interpretations. Second, they included large numbers of nonclinical samples. Third, they analyzed the clinical and nonclinical samples separately. The overall picture is this. Clinical samples are clearly different from nonclinical samples. This empirically demonstrates that it is not appropriate to generalize from clinical reports of CSA to the general population. Additionally, although CSA is related to poorer adjustment in nonclinical samples, the association is small. This means that claims that CSA pervasively produces lasting, severe psychological injury are vastly overstated.

These two quantitative reviews, which incidentally were the only published meta-analyses up until a year ago, have some important weaknesses. These weaknesses ultimately provided the rationale for conducting our own meta-analyses.

First, very few male samples were examined--none in the second review.

Second, no analyses were presented to address whether the associations found between CSA and adjustment were caused by the CSA, as opposed to other factors such as poor family environment.

Third, no results were provided to indicate the pervasiveness of effects. That is, if CSA did have an effect, did it affect 100% of persons with CSA or 50% or 10% or some other percentage?

And fourth, no results were provided on the subjects’ reactions to their sexual experience. It is possible that some or even many did not react negatively. Popular assumptions do not allow for this possibility, but objective science must inquire, because such information speaks directly to the validity of popular assumptions about CSA.

To improve over these two meta-analyses, we conducted two of our own. We conducted these meta-analyses to test the popular assumption that, in the general population, CSA causes intense harm, which occurs pervasively and is equally negative for boys and girls. Since we were interested in CSA in the general population, we focused exclusively on nonclinical samples. This focus is justified because the two meta-analyses just discussed demonstrated that clinical samples do not generalize to the general population, as is true in most domains of behavior. To know the nature of CSA, to test whether CSA per se is harmful, it is people in the general population who have to be examined.