Successfully dealing with faking on a self-report personality test
Faking on self-report personality tests is common and a serious drawback of such tests. Many approaches have been tried to counteract this source of error; see, e.g., recent papers in the Journal of Applied Psychology (Bangerter, Roulin, & König, 2012; Fan et al., 2012).
The UPP test (Sjöberg, 2010/2012) is a self-report personality test and as such it is vulnerable to faking in high-stakes testing situations. However, this test uses a simple but powerful methodology for correcting test scores for faking. It separately measures two social desirability (SD) dimensions, one overt (similar to the classical Crowne-Marlowe scale; Crowne & Marlowe, 1960) and one covert. The covert scale uses items similar to conventional personality items but selected for their strong correlation with the overt scale. The two scales are highly correlated and give similar results when used to correct test scales for faking.
The correction procedure uses regression models where each test scale in turn is the dependent variable and the SD scales are independent variables. It is necessary to fit a new model for each test scale because the different scales are related to SD in different ways, correlations varying widely. The corrected test scales are the residuals in these regression models.
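As a minimal sketch of the procedure just described (not the UPP implementation itself), the following Python code regresses one test scale on two SD scales and keeps the residuals as the corrected scale. The data are simulated, and the function name `correct_for_sd`, the variable names, and the coefficients used to generate the data are illustrative assumptions, not values from the study.

```python
import numpy as np


def correct_for_sd(scale, sd_overt, sd_covert):
    """Return the residuals of `scale` regressed on the two SD scales.

    An intercept is included, so the residuals have mean zero and are
    uncorrelated with both SD scales in the sample used for fitting.
    """
    scale = np.asarray(scale, dtype=float)
    design = np.column_stack([np.ones_like(scale), sd_overt, sd_covert])
    coef, *_ = np.linalg.lstsq(design, scale, rcond=None)
    return scale - design @ coef


# Simulated data (assumption): two correlated SD scales and one test scale
# that is partly driven by them.
rng = np.random.default_rng(0)
n = 500
sd_overt = rng.normal(size=n)
sd_covert = 0.8 * sd_overt + 0.6 * rng.normal(size=n)
trait = rng.normal(size=n)
emotional_stability = trait + 0.5 * sd_overt + 0.3 * sd_covert

# A separate model would be fitted for each test scale; only one is shown.
corrected = correct_for_sd(emotional_stability, sd_overt, sd_covert)

r_before = np.corrcoef(emotional_stability, sd_overt)[0, 1]
r_overt = np.corrcoef(corrected, sd_overt)[0, 1]
r_covert = np.corrcoef(corrected, sd_covert)[0, 1]
print(f"r with overt SD before correction: {r_before:.3f}")
print(f"r with overt SD after correction:  {r_overt:.3f}")
print(f"r with covert SD after correction: {r_covert:.3f}")
```

In this sketch the zero correlation after correction holds by construction in the fitting sample; whether the correction removes real faking effects is an empirical question.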
This procedure yields corrected test scales that correlate zero with SD. So far, so good, but does it actually work? In other words, can it be validated on empirical data? One way to validate it is to study groups tested under different levels of involvement, from incumbents, for whom test results have no consequences, to applicants, for whom the consequences are very important. In a recent study of applicants to the officers’ training program in the Swedish Army, I had a chance to study this question using the UPP test and its SD scales. (Previous studies had given similar results.) Data were available for five groups:
C. Applicants (low consequences of test results)
D. Applicants (moderate consequences)
E. Applicants (high-stakes testing)
I expected increasing SD scale values in the order A – E. I also expected test scales to have the same rank order, if they were sensitive to SD, such as emotional stability. Finally, I expected the group differences in emotional stability to vanish if the test data were corrected for faking using the two SD scales (and a multiple regression model). For the results, see Figs. 1 and 2 below, and Table 1.
Fig. 1. Means of SD scales
Fig. 2. Means of emotional stability before and after SD correction
Table 1. Mean values of emotional stability (standardized scales) for uncorrected and corrected data, with effect sizes and one-way ANOVAs of group differences.
|Group||Before correction||Corrected for SD|
|C. Applicants (low consequences of test results)||0.43||0.28|
|D. Applicants (moderate consequences)||0.56||0.06|
|E. Applicants (high-stakes testing)||0.73||0.11|
|Effect size (eta2)||0.147||0.006|
|One-way ANOVA||F(4,1638) = 70.693, p < 0.0005||F(4,1828) = 2.763, p = 0.026|
Note that the effect size decreased to about 5 % of its uncorrected value (eta2 fell from 0.147 to 0.006).
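The effect sizes in Table 1 can be checked against the reported F statistics. The sketch below assumes the standard one-way ANOVA relation eta2 = F × df_between / (F × df_between + df_within); the function name `eta_squared_from_f` is illustrative, and the inputs are the F values and degrees of freedom as printed in the table.

```python
def eta_squared_from_f(f_value, df_between, df_within):
    """Eta squared for a one-way ANOVA, recovered from F and its degrees of freedom."""
    explained = f_value * df_between
    return explained / (explained + df_within)


# F values and degrees of freedom as reported in Table 1.
eta2_before = eta_squared_from_f(70.693, 4, 1638)
eta2_after = eta_squared_from_f(2.763, 4, 1828)

# Share of the uncorrected effect size that remains after SD correction.
remaining = eta2_after / eta2_before

print(f"eta2 before correction: {eta2_before:.3f}")
print(f"eta2 after correction:  {eta2_after:.3f}")
print(f"remaining share:        {remaining:.1%}")
```

Rounded to three decimals, both recomputed values agree with the tabled effect sizes.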
In other work on leader effectiveness, using 360-degree feedback as the criterion, I found that the validities of the test scales increased after correction for SD by the same method (Sjöberg, Bergman, Lornudd, & Sandahl, 2011); see Fig. 3.
Fig. 3. Validities of uncorrected and corrected personality scales
In conclusion, a simple method of correcting for faking was found to remove about 95 % of the variance due to SD in test responses, and the same method increased the validity of the test scores against an external criterion.
It is often argued that SD scales really measure “personality”, such as need for approval, and not a tendency to distort responses. However, the present results strongly refute this view. It is very plausible that different levels of consequences of testing should lead to different levels of motivation for impression management, but unlikely that they should result in different levels of some personality dimension such as need for approval.
- Bangerter, A., Roulin, N., & König, C. J. (2012). Personnel selection as a signaling game. Journal of Applied Psychology, 97, 719-738. doi:10.1037/a0026078
- Crowne, D. P., & Marlowe, D. (1960). A new scale of social desirability independent of psychopathology. Journal of Consulting Psychology, 24, 349-354.
- Fan, J., Gao, D., Carroll, S. A., Lopez, F. J., Tian, T. S., & Meng, H. (2012). Testing the efficacy of a new procedure for reducing faking on personality tests within selection contexts. Journal of Applied Psychology, 97, 866-880. doi:10.1037/a0026655
- Sjöberg, L. (2010/2012). A third generation personality test (SSE/EFI Working Paper Series in Business Administration No. 2010:3). Stockholm: Stockholm School of Economics.
- Sjöberg, L., Bergman, D., Lornudd, C., & Sandahl, C. (2011). Sambandet mellan ett personlighetstest och 360-graders bedömningar av chefer i hälso- och sjukvården. (Relationship between a personality test and 360 degrees judgments of health care managers). Stockholm: Karolinska Institute, Institutionen för lärande, informatik, management och etik (LIME).