Analytical Report on the Stroop Effect Experiment


The replication crisis in psychology, in which a large-scale study found that only about a third of psychological research could be replicated, highlighted the need to assess the replicability of the Stroop experiment in order to test its reliability. A computerized Stroop experiment was used to replicate studies that used the Stroop task. As required for an undergraduate assignment, the scores of a single participant were compared to the mean scores of the sample. Sixty-eight participants of mixed gender were asked to respond to the colour of a colour-word by pressing the corresponding key on a keyboard. Response time (RT) and accuracy were measured to assess the inhibition of interference in the incongruent condition. The results showed that, compared to the congruent condition, RT increased and accuracy decreased in the incongruent condition. The results were consistent with the findings of earlier Stroop experiments that used the same measurements. However, reliability was not supported because the procedure of one specific experiment was not followed. Therefore, replication of a single study failed.


The Stroop task is used to measure performers' ability to inhibit interference in cognitive processes. Although several designs of the Stroop task have been developed, participants are commonly asked to name the colour in which a colour-word is written. Response times (RT) are predicted to increase in incongruent conditions, suggesting that participants automatically read the word before naming the colour and must inhibit that cognitive process.

Inhibition of cognitive processes is necessary for normal mental functioning. Interference is frequently suppressed in cases where it, for example, distracts or is not needed. Failure of such suppression can be used to detect neurological or cognitive disabilities (Wöstmann et al., 2013). In the Stroop task, the inhibition of interference is demonstrated and measured by the delayed response time in the incongruent condition compared to the congruent condition.

Since it was first conducted, the Stroop test has been commonly used and, because of its reliability, has been regarded as a scientifically attractive assessment method. The reliability of the Stroop task has strong support, as results are consistent across research. Eide, Kemp, Silberstein, Nathan and Stough (2002) suggested that the test-retest reliability of RT in an emotional Stroop task was high. Siegrist (1997) also argued that the reliability of experimental designs varies depending on the stimuli that cause the interference. Furthermore, Vora, Varghese, Weisenbach and Bhatt (2016) supported the reliability of computerized designs in a study that examined the validity and reliability of a computerized Stroop test.

However, Nosek (2015) highlighted a replication crisis in psychological research, in which only about a third of the 100 studies examined were replicable (Open Science Collaboration, 2015). The findings caused concern about the scientific basis of psychology and drew attention to the need to replicate existing research in order to test its reliability.

The main aim of this report was to replicate the Stroop task using a single-trial computerized design. Participants were exposed to congruent and incongruent colour-words, and accuracy and response time were measured to assess the interference. It was hypothesized that the incongruent condition would produce slower RTs and lower accuracy than the congruent condition. These predictions were based on prior research, which found that RT increased and accuracy decreased in the incongruent condition.



The experiment was a computerized single-trial study maintained by vsoch. A repeated-measures design was used and data were collected from one sample. The dependent variables, RT and accuracy, were used to measure inhibition of interference in two conditions. The independent variable was the congruency of the stimuli: in one condition the word and the ink colour were congruent, and in the other they were incongruent. Participants' performance in the congruent condition was compared with their performance in the incongruent condition.


There were 76 participants in the study, and their genders were not identified. Data collected from eight participants were excluded due to extreme scores. Therefore, only the scores of 68 participants were used to report the results. All participants were first-year undergraduate psychology students from Trinity College Dublin who were required to undertake the test during a seminar as part of the course module.


The material was a Stroop experiment retrieved from an experiment library. The test was run on iMac desktop computers. The data collected were copied into Microsoft Excel version 15.37 and organized. SPSS Statistics was used to analyse the organized data.


The experiment was retrieved manually by searching for 'Stroop' in the search function of the experiment library. The experiment maintained by vsoch was downloaded and conducted in a laboratory room at Trinity College Dublin. Participants were informed of the estimated duration (8 minutes) and given instructions for the test. In the congruent condition, subjects were exposed to stimuli without a second source of interference that required inhibition. In this condition, colour-words (green, blue and red) were written in their corresponding colour (for example, 'RED' written in red ink). The incongruent condition introduced interference by using colour-words that did not correspond to the ink colour (for example, 'GREEN' written in blue ink). Participants were asked to respond to the printed colour of the word by pressing the key corresponding to that colour (G = green, B = blue and R = red).

Once the instructions had been read, the subjects completed one incongruent practice trial. In this trial, participants were told to respond faster if their response time was slower than expected and were informed whether the key pressed for each coloured word was accurate. The subjects were informed when the practice trial was finished and that the actual experiment was about to commence. The participants pressed Enter to start the test and, once it commenced, were exposed to congruent and incongruent combinations. Participants were again informed whether the key they pressed was correct.

Once the test was completed, the data were downloaded and entered into a Microsoft Excel file. The data were cleaned so that only the number of trials, accuracy and RT for the congruent and incongruent conditions were kept. The data were copied into SPSS Statistics, and each participant's mean accuracy and mean RT were sent to the tutors responsible for the seminar. The data from the two seminar groups were collected and combined in a Microsoft Excel (version 15.37) file.

The gathered data were then analysed as follows. Accuracy and response time (RT) were measured to assess the inhibition of secondary interference by comparing scores in the congruent and incongruent conditions. Outliers were removed according to the extremity of the scores (any RT above 2000 ms and any accuracy above 1 was trimmed).
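The trimming rule can be illustrated with a short Python sketch. The report's analysis was carried out in Excel and SPSS, so this is only an illustration of the stated cut-offs, not the procedure actually used. The class name, field names and sample values are hypothetical, and the sketch assumes that a score was excluded if it broke either cut-off.

```python
from dataclasses import dataclass

RT_CUTOFF_MS = 2000.0  # RT cut-off stated in the report
MAX_ACCURACY = 1.0     # accuracy is a proportion, so values above 1 are invalid


@dataclass
class ParticipantScore:
    """One participant's summary scores (hypothetical structure)."""
    participant_id: int
    mean_rt_ms: float
    accuracy: float


def is_outlier(score: ParticipantScore) -> bool:
    """Assumption: a score is trimmed if it exceeds either cut-off."""
    return score.mean_rt_ms > RT_CUTOFF_MS or score.accuracy > MAX_ACCURACY


def trim_outliers(scores: list[ParticipantScore]) -> list[ParticipantScore]:
    """Keep only the scores that pass both cut-offs."""
    return [s for s in scores if not is_outlier(s)]


# Made-up values for illustration only; these are not the study's data.
sample = [
    ParticipantScore(1, 812.0, 0.96),
    ParticipantScore(2, 2450.0, 0.90),  # RT above 2000 ms
    ParticipantScore(3, 905.0, 1.20),   # accuracy above 1
]
kept = trim_outliers(sample)
print([s.participant_id for s in kept])  # -> [1]
```

Scores lying exactly on a cut-off (RT of 2000 ms or accuracy of 1) are kept here, because the report describes trimming only values above the limits.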


Mean accuracy was 95% in the congruent condition and 88% in the incongruent condition. The standard deviation (SD) was larger in the incongruent condition (SD = 0.10182) than in the congruent condition (SD = 0.06756).

Mean RT increased by 114.35 ms from the congruent condition (M = 799.29 ms) to the incongruent condition (M = 913.64 ms). The standard deviations (SD = 193.88 and SD = 191.86, respectively) suggest a small deviation of scores in both conditions.
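As a rough sketch of how such descriptive statistics are obtained, the Python snippet below computes the mean and SD per condition and the difference between the condition means. The RT lists are made-up values rather than the study's data, and the use of the sample SD (dividing by n - 1) is an assumption about how the reported SDs were calculated. The final line applies the same subtraction to the two means reported above.

```python
from statistics import mean, stdev

# Hypothetical per-participant mean RTs in ms (not the study's data).
congruent_rt = [700.0, 820.0, 760.0, 910.0]
incongruent_rt = [810.0, 930.0, 880.0, 1020.0]


def describe(values: list[float]) -> tuple[float, float]:
    """Return (mean, sample SD) for one condition; stdev divides by n - 1."""
    return mean(values), stdev(values)


m_con, sd_con = describe(congruent_rt)
m_inc, sd_inc = describe(incongruent_rt)

# Interference effect: how much slower the incongruent condition is on average.
interference_ms = m_inc - m_con
print(m_con, m_inc, interference_ms)  # -> 797.5 910.0 112.5

# The same subtraction applied to the means reported in the text.
reported_effect_ms = round(913.64 - 799.29, 2)
print(reported_effect_ms)  # -> 114.35
```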

Participant 44 showed an increase in RT in the incongruent condition compared to the congruent condition (from 534 ms to 707 ms). Accuracy was lower in the incongruent condition than in the congruent condition, but only by 0.02. Z-scores were computed from the raw accuracy and RT scores in both conditions. Participant 44's accuracy was higher than the mean accuracy of the whole sample in both conditions (z = 0.8 and z = 0.99). Participant 44's RT was also faster than the sample mean, with z-scores of z = -1.3 and z = -1.07.
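The RT z-scores can be checked with a minimal Python sketch that applies the standard formula z = (raw score - sample mean) / SD to the values reported above. The function and variable names are hypothetical, and the sketch assumes that the reported SDs of 193.88 and 191.86 belong to the congruent and incongruent conditions, respectively.

```python
def z_score(raw: float, sample_mean: float, sample_sd: float) -> float:
    """Distance of a raw score from the sample mean, in SD units."""
    return (raw - sample_mean) / sample_sd


# Participant 44's RTs and the sample statistics reported in the text (ms).
z_rt_congruent = z_score(534, 799.29, 193.88)
z_rt_incongruent = z_score(707, 913.64, 191.86)

print(round(z_rt_congruent, 2))    # -> -1.37
print(round(z_rt_incongruent, 2))  # -> -1.08
```

These values agree with the reported z = -1.3 and z = -1.07 if the report truncated the decimals instead of rounding them. A negative z-score means the participant responded faster than the sample mean.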


This study aimed to assess whether a computerized Stroop task was replicable. Previous research has used the Stroop task to assess inhibition of interference in cognitive processes. As predicted, the findings were consistent with the results of prior research on this topic: RT increased and accuracy decreased in the incongruent condition (Stroop, 1935). The results suggested that interference had to be inhibited in the incongruent condition because two cognitive processes were active at the same time, thereby demonstrating the Stroop phenomenon.

Furthermore, when examining the single data point that this assignment required, the hypothesis was also supported because RT increased and accuracy decreased in the incongruent condition. Participant 44 responded more quickly and with greater accuracy than the sample in both conditions. This suggested that participant 44 was better at inhibiting the interference of a secondary cognitive process than the sample on average (Wöstmann et al., 2013). The results of the whole sample and of participant 44 individually enhance our understanding of interference in cognition. This Stroop experiment implied that reading the word was the primary process and that naming the colour was the secondary one. It was suggested that this was due to the highly practised ability to read, as a result of which reading comes more naturally than reacting to a colour. This was inferred because RT increased in the incongruent condition compared to the congruent condition, in which reading the word did not conflict with the secondary process of naming the colour (Schadler & Thissen, 1981). Additionally, the consistency of the results with other single-trial computerized tests supported reliability. However, for an accurate assessment of reliability, the study would have to replicate a task that used the same experimental design, follow its exact procedure and be administered under the same test conditions (Shaffer & Kipp, 2014). As the results of this study were only compared to the general findings of research using the Stroop experiment, the evidence for reliability was insufficient because no specific experiment was replicated. The findings, therefore, do not support the reliability that the study aimed to assess.

A further limitation of the study was that the test conditions during the experiment were not controlled. As it was conducted in a lab class at Trinity College Dublin, participants were exposed to several distractions, for example noise, and some may not have fully understood the task. These distractions could explain the larger SD of accuracy in the incongruent condition, although a smaller SD was in any case expected in the congruent condition because of the absence of a secondary cognitive process (Lurquin, McFadden, & Harbke, 2014). Furthermore, distractions could also explain the eight outliers that were removed from the raw data, which reduced the number of participants from 76 to 68. However, removing outliers in a Stroop experiment may be problematic because meaningful data might be removed. For example, extreme scores could be due to the poor processing speed of the performer. Nevertheless, the removed outliers were judged to be due to distractions or human error and were therefore removed on a legitimate basis (Heathcote et al., 1991).

Given the replication crisis that currently threatens psychology, future research should focus on replicating specific studies in order to establish the reliability of Stroop experiments. Although this study found the expected increase in RT, which supports the reliability of the Stroop task, more careful assessment is needed to establish the reliability of individual experiments that use the Stroop task.


When first confronted with the Stroop task, it was perceived as undemanding. However, as the experiment commenced and inhibition of interference was required, some difficulty became apparent. An additional difficulty was that participants were required to press the right key, so they may have been exposed to three cognitive processes instead of two. In an attempt to suppress the primary output (reading the word), the focus on the screen was shifted from the word to the colour. This did ease the task, possibly because reading the word became more difficult than seeing the colour, which transformed the secondary cognitive process (reacting to the colour) into the primary one. Finally, this assignment was beneficial in various ways. Firstly, an increased understanding of how to structure a research paper was attained, and writing in the past tense and the passive voice became easier over time. Secondly, the author of this assignment gained an improved understanding of why computing z-scores can be beneficial, especially when comparing a single score with the whole sample. However, even as the understanding of the function of z-scores increased, it also became clearer why one would not usually compare a single data point to the whole sample, as doing so does not improve understanding of the Stroop effect across a population.


  • Diener, E., & Biswas-Diener, R. (2015). The replication crisis in psychology. In R. Biswas-Diener & E. Diener (Eds.), Noba Textbook Series: Psychology. Champaign, IL: DEF publishers.
  • Din, N. C., Chia, E., & Meng, T. (2019). Computerized Stroop Tests: A Review. Journal of Psychology & Psychotherapy, 9(1), 2161–0487.
  • Eide, P., Kemp, A., Silberstein, R. B., Nathan, P. J., & Stough, C. (2002). Test-retest reliability of the emotional Stroop task: Examining the paradox of measurement change. Journal of Psychology: Interdisciplinary and Applied, 136(5), 514–520.
  • Erdodi, L. A., Sagar, S., Seke, K., Zuccato, B. G., Schwartz, E. S., & Roth, R. M. (2018). The Stroop test as a measure of performance validity in adults clinically referred for neuropsychological assessment. Psychological Assessment, 30(6), 755–766.
  • Heathcote, A., Popiel, S. J., & Mewhort, D. J. K. (1991). Analysis of response time distributions: an example using the Stroop task. Psychological Bulletin, 109(2), 340–347.
  • Lavoie, M. E., & Charlebois, P. (1994). The discriminant validity of the Stroop color and word test: Toward a cost-effective strategy to distinguish subgroups of disruptive preadolescents. Psychology in the Schools, 31(2), 98–107.
  • Lee, C., Landre, N., & Sweet, J. J. (2019). Performance validity on the Stroop color and word test in a mixed forensic and patient sample. Clinical Neuropsychologist.
  • Lurquin, J. H., McFadden, S. L., & Harbke, C. R. (2014). An electrophysiological investigation of the effects of social rejection on self control. Journal of Social Psychology, 154(3), 186–197.
  • Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
  • Salo, R., Henik, A., & Robertson, L. C. (2001). Interpreting Stroop interference: An analysis of differences between task versions. Neuropsychology, 15(4), 462–471.
  • Scarpina, F., & Tagini, S. (2017). The Stroop color and word test. Frontiers in Psychology, 8(APR), 1–8.
  • Schadler, M., & Thissen, D. M. (1981). The development of automatic word recognition and reading skill. In Memory & Cognition (Vol. 9).
  • Shaffer, D. R., & Kipp, K. (2014). Introduction to developmental psychology and its research strategies. In J. Perkins (Ed.), Developmental Psychology Childhood & Adolescence (9th ed., p. 12). Wandsworth: Cengage Learning.
  • Siegrist, M. (1997). Test-retest reliability of different versions of the Stroop test. Journal of Psychology: Interdisciplinary and Applied, 131(3), 299–306.
  • Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18(6), 643–662.
  • Verhaeghen, P., & De Meersman, L. (1998). Aging and the Stroop effect: A meta-analysis. Psychology and Aging, 13(1), 120–126.
  • Vora, J. P., Varghese, R., Weisenbach, S. L., & Bhatt, T. (2016). Test-retest reliability and validity of a custom-designed computerized neuropsychological cognitive test battery in young healthy adults. Journal of Psychology and Cognition, 1(1), 11–19.
  • Wöstmann, N. M., Aichert, D. S., Costa, A., Rubia, K., Möller, H. J., & Ettinger, U. (2013). Reliability and plasticity of response inhibition and interference control. Brain and Cognition, 81(1), 82–94. 
16 August 2021