
Replication Paper: “The corrosive effect of corruption on trust in politicians: Evidence from a natural experiment”

1. Introduction to the Study

1.1. Summary

In their article “The corrosive effect of corruption on trust in politicians: Evidence from a natural experiment”, Macarena Ares and Enrique Hernández (2018) exploit the fact that the Spanish Bárcenas corruption scandal was disclosed in 2013 while fieldwork for the European Social Survey (ESS) was under way. Because 230 interviews had already been completed before the corruption information, which implicated politicians of the ruling People’s Party (PP), was released, the setting works as a quasi-natural experiment that induces an exogenous (i.e., unconfounded) shock to trust in politicians (the dependent variable). The paper relies on the as-if-random assignment of interviewees to interview dates before and after the scandal broke on January 31st, 2013. It then applies a comparison of means, OLS estimation (with controls for a potential reachability bias), and additional robustness checks, such as matching and a Regression Discontinuity Design. According to Ares and Hernández, all of these approaches provide evidence that (a) the scandal significantly reduces trust in politicians, (b) the effect decreases as time passes, and (c) partisan preferences did not affect the loss of trust.

1.2. Criticism

I criticize three major points about the paper:
  1. Although the argument in favour of this natural experiment is convincing, the authors assign respondents to the treatment and control groups solely by interview date: observations after Jan 31, 2013 form the treatment group and those before the scandal release form the control group. This approach does not account for earlier, though minor, press reports on the scandal, which might already have affected trust in politicians for some control group observations. Furthermore, the assignment assumes that all interviewees after the threshold had heard of the scandal. Although the authors present information on the reach of the scandal, the survey design does not let them account for individual knowledge of the scandal, nor do they account for news consumption or general interest in politics. While their claim that respondents knew of the scandal is theoretically well defended, there is no empirical check of the assignment assumption (for example, a fuzzy RDD).

  2. In addition, time series data require accounting for serial dependence of the trust variable (i.e., autoregression). The authors control for autoregression neither in the OLS nor in the RDD approach, nor do they explain why autoregression would (or would not) apply to their time series data (as suggested by Hausman/Rapson, 2017).

  3. Important: The authors’ coding of the covariate for voting for the PP (“election_winner”) raises serious doubts about the validity of the findings. As explained in the data preparation section (2.3.), the authors recode missing values as non-PP voters, probably to avoid a further reduction in sample size. Yet this specification attributes information to the missing values (i.e., no information = not voting for PP) without any justification. This is particularly important because, once the variable is correctly specified (missing values stay missing), the results of the study turn insignificant and do not withstand most robustness checks.
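
The serial dependence raised in point 2 could be inspected with a rough diagnostic before any model is fitted. The following is a minimal sketch on simulated toy data (the data frame `toy` and its columns are hypothetical and are not the ESS variables): it aggregates trust to daily means and computes their lag-1 autocorrelation with `acf()`. It is only meant to illustrate the idea and is no substitute for a full treatment of serial dependence.

```r
# Toy data: 21 interview days with 20 simulated respondents each (hypothetical values)
set.seed(1)
toy <- data.frame(time = rep(-10:10, each = 20))
toy$trust <- rnorm(nrow(toy), mean = 3, sd = 2)

# Aggregate to one mean trust value per interview day (days are sorted ascending)
daily_mean <- as.numeric(tapply(toy$trust, toy$time, mean))

# Lag-1 autocorrelation of the daily means (element 1 of $acf is lag 0)
rho1 <- acf(daily_mean, lag.max = 1, plot = FALSE)$acf[2]
rho1
```

A value of `rho1` far from zero would suggest that neighbouring interview days are not independent, which is the situation in which the standard errors of the OLS and RDD estimates deserve a closer look.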


2. Replication of Ares/Hernández 2018

2.1. Used packages

library(stargazer)
library(ggplot2)
library(readstata13)
library(dplyr)
library(fastDummies)
library(knitr)
library(kableExtra)
library(KernSmooth)
library(ggalt)
library(ggthemes)
library(cem)
library(rdd)
library(plm)


2.2. Load Data

Data was loaded.


2.3. Preparation of data

## Treatment variables (exposure to scandal)
# Create dummy, treatment variable called "D_exposure_barcenas"
# Alternative D_exposure_barcenas variable (for the whole survey to study decay of D_exposure_barcenas)

## Create running variable "time", centered around scandal release
# Code time for Jan and Feb
# Time variable for the whole survey fieldwork period (Mar, Apr, May)

## Covariate variables
# Rename dummies for party voted for
# Create corrected election winner variable (coded 1 for PP voters, 0 for other voters; NAs are kept as NA), see notes*
# Election winner alternative (with NA coded as 0, as in the original paper, see notes*)
# Recode activity variable to fewer employment categories
# Create and rename dummy variables for employment
# Reinsert NAs that "dummy_cols" deleted
# Rename data

## Region variable/dummies
# Encode regions from string to numeric
# Create dummy separating regions where only treated units are present (0) from regions where both treatment and control units are present (1, common support)

Notes: The authors misspecified the election winner variable: they code it 1 for PP voters and 0 for all other values. They thereby turn the 716 NAs of the variable into 0s, even though these values are missing and we cannot say whether the respondents voted for the PP or not. It is highly dubious to assign observations without information to either dummy group (0 or 1), especially as the results are highly sensitive to these observations (the following analysis shows no significance for the authors’ findings once the variable is correctly specified). To account for this misspecification, I report both the original variable (“election_winner_alt”) and a corrected version in which NAs are not coded as zeros (“election_winner”) in the following.
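
The difference between the two codings can be illustrated with a minimal sketch. The vote vector below is a made-up toy example (it is not the ESS party variable, and the party labels are only placeholders); the two derived variables follow the naming used in this replication.

```r
# Toy vote vector (hypothetical values): party voted for, NA = no information
vote <- c("PP", "PSOE", NA, "PP", NA, "IU", NA, "PSOE")

# Corrected coding: 1 = PP voter, 0 = other party, NA stays NA
election_winner <- ifelse(vote == "PP", 1, 0)

# Original coding: every NA is turned into 0 (treated as a non-PP voter)
election_winner_alt <- ifelse(is.na(election_winner), 0, election_winner)

# Share of PP voters and number of usable observations under each coding
mean(election_winner, na.rm = TRUE)   # 2 of 5 valid answers = 0.4
sum(!is.na(election_winner))          # 5
mean(election_winner_alt)             # 2 of 8 observations = 0.25
sum(!is.na(election_winner_alt))      # 8
```

Recoding the NAs inflates the sample size and lowers the share of PP voters at the same time, which is the pattern visible in the "Election winner" rows of Table 1.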

2.4. Table 1 - Two-sample t-tests for treatment vs control covariates

# Relevant covariate for t.test (including correct and incorrect election winner variable)
ttest_var <- c("eduyrs", "gndr", "agea", "emp_paid_Work", "emp_in_education", "emp_unemployed", "emp_out_of_labour_market", "election_winner", "election_winner_alt")

# calculating and creating t.test matrix
ttest_matrix <- matrix(1,nrow=4, ncol=1) #auxiliary t.test matrix

for (i in ttest_var) {
  ttest_res <- t.test(ESSdat[[i]][ESSdat$D_exposure_barcenas == 1], ESSdat[[i]][ESSdat$D_exposure_barcenas == 0], na.rm=TRUE)
  ttest_matrix <- cbind(ttest_matrix, c(ttest_res$estimate, ttest_res$p.value, sum(!is.na(ESSdat[[i]][ESSdat$D_exposure_barcenas>=0]))))
} #Run t.test for covariates and store values in matrix

ttest_df <- as.data.frame(ttest_matrix[,2:10]) #drop initialization column of matrix / convert to DF
names(ttest_df) <- c("Years of Education", "Gender", "Age", "In paid work", "In education", "Unemployed", "Out of the labour market", "Election winner (corrected values)", "Election winner (original values)") #rename column names
rownames(ttest_df) <- c("Treatment", "Control", "p-value", "Valid N") #rename rownames
ttest_df <- t(ttest_df) #switch rows and columns
ttest_df <- round(ttest_df, digits = 2) #round results to two digits
# Create table for t.test results

ttest_df %>%
  kable(caption="Table 1: Two-sample t-tests", align = "l") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "responsive"), full_width=F, position="left") %>%
  add_header_above(c("Variables" = 1, "Mean" = 2, " "=2), align="left") %>% pack_rows("Employment status:", 4,7, bold=FALSE, hline_before = FALSE, hline_after = FALSE)
Table 1: Two-sample t-tests
                                      Mean
Variables                             Treatment  Control  p-value  Valid N
Years of Education                    12.57      13.08    0.27     1409
Gender                                1.51       1.53     0.59     1428
Age                                   46.99      49.39    0.08     1428
Employment status:
  In paid work                        0.45       0.35     0.01     1422
  In education                        0.09       0.11     0.38     1422
  Unemployed                          0.16       0.14     0.46     1422
  Out of the labour market            0.29       0.39     0.00     1422
Election winner (corrected values)    0.41       0.44     0.49     922
Election winner (original values)     0.27       0.27     0.81     1428


2.5. Figure 2 - Descriptive evidence

# Data preparation (cleaning for NA in trstplt)
ESSdat_cleaned <- subset(ESSdat, subset = !is.na(ESSdat$trstplt))

# Plot distributions of observations and smoothed Trust in politicians
ggplot(ESSdat) + 
  geom_histogram(aes(x=time), binwidth=1, color="darkgrey", fill="lightgrey") +  
  geom_smooth(aes(x=time, y=trstplt * (100/2.4), shape=as.factor(D_exposure_barcenas), color=as.factor(D_exposure_barcenas)), method="glm", formula= y~poly(x, 5), position="identity") + 
  scale_x_continuous(name = "Days to/since scandal (Jan 31 = 0)") + 
  scale_y_continuous(name = "Number of Respondents", position="right", sec.axis = sec_axis(~./(100/2.4), name = "Trust in Politicians")) + 
  theme(legend.title=element_blank(), legend.position = "bottom") +
  scale_colour_discrete(labels=c(" Trust in Politicians (D=0) ", " Trust in Politicians (D=1) "))+
  labs(title = "Figure 2: Change in trust in Politicians at Time of Bárcenas scandal", 
       caption = "Note: Trust in politicians is measured on a 0-10 scale, where 10 indicates complete trust.\n The Trust in Politicians graphs are estimated with a \nGLM quintic polynomial to resemble the authors' modelling.")


The adequacy of plotting the distribution of observations together with the interrupted trust estimate can be questioned. The combination conveys little to no additional information compared to two separate figures. I nevertheless decided to follow the original paper and combine graph and bars.

2.6. Table 2 - Regression Analysis

## Regression setup (with corrected election winner variable) 

# LM with treatment on trust
reg1 <- lm(trstplt ~ D_exposure_barcenas, data = ESSdat)

# Region-fixed effects model with treatment and covariates on trust
reg2 <- plm(trstplt ~ D_exposure_barcenas + election_winner + gndr + eduyrs + agea + as.factor(employment), index="region", data = ESSdat, model="within")

# Region-fixed effects model with treatment, interaction and covariates on trust
reg3 <- plm(trstplt ~ D_exposure_barcenas*election_winner + gndr + eduyrs + agea + as.factor(employment), index="region", data = ESSdat, model="within")


## Regression setup (with incorrect/original election winner variable) 

# LM with treatment on trust
reg1_alt <- lm(trstplt ~ D_exposure_barcenas, data = ESSdat)

# Region-fixed effects model with treatment and covariates on trust
reg2_alt <- plm(trstplt ~ D_exposure_barcenas + election_winner_alt + gndr + eduyrs + agea + as.factor(employment), index="region", data = ESSdat, model="within")

# Region-fixed effects model with treatment, interaction and covariates on trust
reg3_alt <- plm(trstplt ~ D_exposure_barcenas*election_winner_alt + gndr + eduyrs + agea + as.factor(employment), index="region", data = ESSdat, model="within")
# Plotting all six regressions (corrected and original election_winner variable)
stargazer::stargazer(reg1_alt, reg2_alt, reg3_alt, reg1, reg2, reg3, header=FALSE, type="html",
                     title = "OLS regression models. Dependent variable: trust in politicians",
                     style="asr",
                     report="vc*s",
                     digits = 3,
                     keep = c("D_exposure_barcenas", 
                              "election_winner", 
                              "election_winner_alt", 
                              "gndr", 
                              "eduyrs", 
                              "agea", 
                              "employment", 
                              "Constant"),
                     omit.stat = c("ser", "f", "adj.rsq"),
                     notes = "Standard errors in parentheses", 
                     notes.align = "l",
                     covariate.labels = c("D (Exposure to the Bárcenas scandal)",
                                          "Election winner (wrong)", 
                                          "Election winner (correct)", 
                                          "Female", "Years of education", 
                                          "Age", "Employment: In Education (Ref: In paid work)", 
                                          "Employment: Unemployed (Ref: In paid work)", 
                                          "Employment: Out of the labour market (Ref: In paid work)", 
                                          "Employment: Other (Ref: In paid work)", 
                                          "Election winner*D (wrong)",
                                          "Election winner*D (correct)"),
                     add.lines = list(c("Region Fixed Effects", "No", "Yes", "Yes", "No", "Yes", "Yes")),
                     model.names = FALSE,
                     model.numbers = TRUE,
                     dep.var.labels.include = FALSE,
                     column.labels   = c("Incorrect Election Winner variable", "Corrected Election Winner variable"),
                     column.separate = c(3, 3)
                     )
OLS regression models. Dependent variable: trust in politicians
                                                          Incorrect Election Winner variable  Corrected Election Winner variable
                                                          (1)         (2)         (3)         (4)         (5)         (6)
D (Exposure to the Bárcenas scandal)                      -0.483**    -0.451**    -0.410*     -0.483**    -0.321      -0.070
                                                          (0.151)     (0.168)     (0.193)     (0.151)     (0.211)     (0.270)
Election winner (wrong)                                               0.306*      0.430
                                                                      (0.130)     (0.311)
Election winner (correct)                                                                                 0.410**     0.884*
                                                                                                          (0.142)     (0.349)
Female                                                                -0.158      -0.157                  -0.245      -0.247
                                                                      (0.112)     (0.112)                 (0.136)     (0.136)
Years of education                                                    0.027**     0.027**                 0.029*      0.031*
                                                                      (0.010)     (0.010)                 (0.012)     (0.012)
Age                                                                   0.009       0.009                   0.021***    0.021***
                                                                      (0.005)     (0.005)                 (0.006)     (0.006)
Employment: In Education (Ref: In paid work)                          0.698**     0.699**                 0.366       0.364
                                                                      (0.228)     (0.228)                 (0.330)     (0.330)
Employment: Unemployed (Ref: In paid work)                            -0.360*     -0.359*                 -0.299      -0.292
                                                                      (0.168)     (0.169)                 (0.212)     (0.212)
Employment: Out of the labour market (Ref: In paid work)              0.286       0.284                   0.024       0.017
                                                                      (0.175)     (0.175)                 (0.213)     (0.213)
Employment: Other (Ref: In paid work)                                 -0.565      -0.562                  -1.290*     -1.265
                                                                      (0.526)     (0.526)                 (0.650)     (0.650)
Election winner*D (wrong)                                                         -0.149
                                                                                  (0.339)
Election winner*D (correct)                                                                                           -0.563
                                                                                                                      (0.379)
Constant                                                  2.224***                            2.224***
                                                          (0.138)                             (0.138)
Region Fixed Effects                                      No          Yes         Yes         No          Yes         Yes
N                                                         1,414       1,392       1,392       1,414       906         906
R2                                                        0.007       0.038       0.038       0.007       0.051       0.053
*p < .05; **p < .01; ***p < .001
Standard errors in parentheses



2.7. Figure 3 - Effect of scandal over time (with corrected variable)

## Calculation of effect of scandal over time (with corrected "election winner" variable).

# Create auxiliary dataframes and vectors
coeffs_lm_df <- data.frame(1)
coeffs_lm_pval <- NULL
time_lm <- NULL

# Calculate linear models with changing time intervals
for (i in 0:19) {
  add_days <- i*5
  temp_subset <- subset(ESSdat, subset = ESSdat$time_whole < 7 + add_days)
  temp_lm <- plm(trstplt ~ treatment1 + election_winner + gndr + eduyrs + agea + as.factor(employment), index="region",model="within", data = temp_subset)
  coeffs_lm_df <- rbind(coeffs_lm_df, temp_lm$coefficients[1])
  time_lm <- append(time_lm, 7+add_days)
  coeffs_lm_pval <- append(coeffs_lm_pval, summary(temp_lm)$coefficients[1,4])
}

# Bind results into plottable data.frame
coeffs_lm_df <- coeffs_lm_df[-1,]
coeffs_lm_df <- as.data.frame(cbind(coeffs_lm_df, coeffs_lm_pval, time_lm))
colnames(coeffs_lm_df) <- c("Treatment_effect", "pvalues", "time")
coeffs_lm_df$significant <- with(coeffs_lm_df, ifelse(pvalues < 0.05, 1, 0))
## Plotting of scandal effect over time (with corrected variable).

ggplot(data = coeffs_lm_df) + 
  geom_smooth(aes(x=time, y=Treatment_effect), stat ="smooth", method = "loess",  color="#696969") +
  geom_point(aes(x=time, y=Treatment_effect, shape=as.factor(significant))) +
  scale_y_reverse() + 
  labs(title="Change in Treatment effect over Time", subtitle = "with corrected 'election_winner' variable", caption="\nNote: The x-axis indicates the last day included in the treatment group.\n The graph is estimated with a 'LOESS' function to smooth volatility.") + xlab("Days since the Scandal (Jan 31)") + ylab("Treatment effect (Coefficient D)") + 
  theme(legend.position='bottom') +
  scale_colour_calc() +
  scale_shape_discrete(name=" Coefficient D ", breaks=c("1","0"), labels=c(" Significant at 5%-level ", " Not significant at 5%-level "))



2.8. Figure 3 - Effect of scandal over time (with incorrect/original variable)

To validate the calculation, I reproduce the original results with the incorrect “election_winner” variable. To save space, I do not show the code, which exactly replicates the previous two chunks with the variable “election_winner” replaced by “election_winner_alt”.
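
Instead of duplicating the chunks, the substitution described above could also be handled by building the model formula from the variable name. The following is only a sketch of that idea: `build_formula` is a hypothetical helper that is not part of the original code, and the covariate names are taken from the chunk in section 2.7.

```r
# Hypothetical helper: build the model formula for a given election-winner variable
build_formula <- function(winner_var, treatment_var = "treatment1") {
  reformulate(
    termlabels = c(treatment_var, winner_var, "gndr", "eduyrs", "agea",
                   "as.factor(employment)"),
    response = "trstplt"
  )
}

f_corrected <- build_formula("election_winner")      # NAs kept as NA
f_original  <- build_formula("election_winner_alt")  # NAs coded as 0
f_original
```

Either formula could then be passed to `plm()` inside the loop in place of the hard-coded formula, so that both versions of Figure 3 come from the same code path.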



2.9. Table A7: Coarsened Exact Matching (with corrected “election_winner” variable)

### Preparation for Matching

## All regions

# Pick relevant covariates for matching
matching_covar1 <- c("trstplt", "D_exposure_barcenas", "eduyrs", "gndr", "agea", "emp_paid_Work", "emp_in_education", "emp_unemployed", "emp_out_of_labour_market", "emp_other", "election_winner")

# Reduce data set to relevant variables
ESSdat_reduced_allregions <- subset(ESSdat, subset=TRUE, select=matching_covar1)

# Clean data set from NA 
ESSdat_reduced_allregions <- na.omit(ESSdat_reduced_allregions)

# Measuring Imbalance (L1 statistic)
L1_pre_allregions <- imbalance(group = ESSdat_reduced_allregions$D_exposure_barcenas, data = ESSdat_reduced_allregions, drop = c("trstplt", "D_exposure_barcenas"))

# Matching (CEM) & ATT 
match_out_allregions <- cem(treatment = "D_exposure_barcenas", data = ESSdat_reduced_allregions, drop="trstplt", keep.all = TRUE, eval.imbalance = TRUE)
ATT_allregions <- att(match_out_allregions, trstplt~D_exposure_barcenas, data=ESSdat_reduced_allregions)

# Calculate Imbalance Change
L1_post_allregions <- round(as.numeric(match_out_allregions$imbalance$L1[1]) - as.numeric(L1_pre_allregions$L1[1]), digits=3)

###########

## Only regions with Common Support

# Pick relevant covariates
matching_covar2 <- c("trstplt", "D_exposure_barcenas", "eduyrs", "gndr", "agea", "emp_paid_Work", "emp_in_education", "emp_unemployed", "emp_out_of_labour_market", "emp_other", "election_winner", "region_treatment")

# Reduce data set to relevant variables 
ESSdat_reduced_CSregions <- subset(ESSdat, subset=TRUE, select=matching_covar2)

# Clean data set from NA and select only regions with common support
ESSdat_reduced_CSregions <- na.omit(ESSdat_reduced_CSregions)
ESSdat_reduced_CSregions <- subset(ESSdat_reduced_CSregions, ESSdat_reduced_CSregions$region_treatment == 1)

# Measuring Imbalance (L1 statistic)
L1_pre_CSregions <- imbalance(group = ESSdat_reduced_CSregions$D_exposure_barcenas, data = ESSdat_reduced_CSregions, drop = c("trstplt", "D_exposure_barcenas"))

# Matching (CEM) & ATT 
match_out_CSregions <- cem(treatment = "D_exposure_barcenas", data = ESSdat_reduced_CSregions, drop="trstplt", keep.all = TRUE, eval.imbalance = TRUE)
ATT_CSregions <- att(match_out_CSregions, trstplt~D_exposure_barcenas, data=ESSdat_reduced_CSregions)

# Calculate Imbalance change
L1_post_CSregions <- round(as.numeric(match_out_CSregions$imbalance$L1[1]) - as.numeric(L1_pre_CSregions$L1[1]), digits=3)
### Plotting the Matching Results

# All regions: Pick relevant model values and store them into vector
ATT_allregions_vec <- round(with(ATT_allregions, c(att.model[1,2], att.model[4,2], att.model[2,2], tab[2,1], tab[2,2], tab[3,1]+tab[3,2])), digits=3)
ATT_allregions_vec <- append(ATT_allregions_vec, c(match_out_allregions$n.strata, L1_post_allregions, "All"))

# CS regions: Pick relevant model values and store them into vector
ATT_CSregions_vec <- round(with(ATT_CSregions, c(att.model[1,2], att.model[4,2], att.model[2,2], tab[2,1], tab[2,2], tab[3,1]+tab[3,2])), digits=3)
ATT_CSregions_vec <- append(ATT_CSregions_vec, c(match_out_CSregions$n.strata, L1_post_CSregions, "Surveyed in treatment \n and control"))

# Bind DF for all and CS regions (incl. renaming)
ATT_df <- as.data.frame(cbind(ATT_allregions_vec, ATT_CSregions_vec))
rownames(ATT_df) <- c("SATT", "p-value", "Standard error", "Number of matched observations in control", "Number of matched observations in treatment", "Number of observations pruned", "Number of strata", "Change in L1 statistic after matching", "Regions included")
colnames(ATT_df) <- c("Model-13", "Model-14")

# Output table
ATT_df %>%
  kable(caption="Table A7: Coarsened exact matching summary results. Dependent variable: Trust in politicians. With corrected 'election_winner' variable.", align = "l") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "responsive"), full_width=F, position="left") %>%
  footnote(general = "p-value < 0.001: significant at 0.1% level \n p-value < 0.01: significant at 1% level \n p-value < 0.05: significant at 5% level")
Table A7: Coarsened exact matching summary results. Dependent variable: Trust in politicians. With corrected ‘election_winner’ variable.
                                             Model-13  Model-14
SATT                                         -0.308    -0.28
p-value                                      0.18      0.252
Standard error                               0.229     0.244
Number of matched observations in control    115       105
Number of matched observations in treatment  333       245
Number of observations pruned                458       296
Number of strata                             293       246
Change in L1 statistic after matching        -0.175    -0.139
Regions included                             All       Surveyed in treatment and control
Note:
p-value < 0.001: significant at 0.1% level
p-value < 0.01: significant at 1% level
p-value < 0.05: significant at 5% level


2.10. Table A7: Coarsened Exact Matching (with incorrect/original “election_winner” variable)

To validate the calculation, I reproduce the original results with the incorrect “election_winner” variable. The individual values differ from the published ones because the estimation approach in R differs slightly from the one in Stata (ATT estimation instead of cem weighting), but the direction of the effect with the original “election_winner” variable is similar and equally significant. To save space, I do not show the code for the incorrect variable, as it exactly replicates the previous two chunks with the variable “election_winner” replaced by “election_winner_alt”.

Table A7: Coarsened exact matching summary results. Dependent variable: Trust in politicians. With incorrect/original ‘election_winner’ variable.
                                             Model-13  Model-14
SATT                                         -0.618    -0.559
p-value                                      0.001     0.004
Standard error                               0.187     0.194
Number of matched observations in control    182       179
Number of matched observations in treatment  592       477
Number of observations pruned                618       361
Number of strata                             382       303
Change in L1 statistic after matching        -0.154    -0.128
Regions included                             All       Surveyed in treatment and control
Note:
p-value < 0.001: significant at 0.1% level
p-value < 0.01: significant at 1% level
p-value < 0.05: significant at 5% level


2.11. Table A9: RDD summary results

Here, I do not need to differentiate between the corrected and the original “election_winner” variable, as the RDD does not include covariates in its estimate.

## RDD estimation

# Running an RDD with the forcing variable time and the threshold time = 0
RD_est <- RDestimate(trstplt ~ time, data = ESSdat, cutpoint = 0, kernel = "triangular")
## Plotting RDD estimates

# Bind DF with RDD results
RDD_df <- data.frame(1,1,1) # initialize df
RDD_df <- rbind(RDD_df, RD_est$est, RD_est$p, RD_est$se, RD_est$bw) #Pick interesting estimates
RDD_df[-6,] <- round(RDD_df[-6,], digits=3) #round values to three digits
RDD_df <- rbind(RDD_df, c("Triangular", "Triangular", "Triangular"), as.character(RD_est$obs)) #include Kernel information
RDD_df <- RDD_df[-1,] # Drop initialization row

# Adjust names of rows and columns
rownames(RDD_df) <- c("RD estimate", "p-value", "Standard error", "Bandwidth", "Kernel type", "Number of observations")
colnames(RDD_df) <- c("Model-17 (Standard BW)", "Model-18 (Half BW)", "Model-19 (Double BW)")

# Kable table for RDD estimation
RDD_df %>%
  kable(caption="Table A9: RDD summary results. Dependent variable: Trust in politicians", align = "l") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "responsive"), full_width=F, position="left") %>%
  footnote(general = "Standard error clustered by region \n p-value < 0.001: significant at 0.1% level \n p-value < 0.01: significant at 1% level \n p-value < 0.05: significant at 5% level \n p-value < 0.1: significant at 10% level")
Table A9: RDD summary results. Dependent variable: Trust in politicians
                        Model-17 (Standard BW)  Model-18 (Half BW)  Model-19 (Double BW)
RD estimate             -1.13                   -0.993              -0.662
p-value                 0.041                   0.214               0.068
Standard error          0.553                   0.799               0.362
Bandwidth               4.334                   2.167               8.668
Kernel type             Triangular              Triangular          Triangular
Number of observations  322                     206                 709
Note:
Standard error clustered by region
p-value < 0.001: significant at 0.1% level
p-value < 0.01: significant at 1% level
p-value < 0.05: significant at 5% level
p-value < 0.1: significant at 10% level


Note: The results of the authors’ RDD should be treated with caution, as their table clearly misreports the number of observations involved in the RDD estimates. A constant number of observations across models with different bandwidths would mean that all observations lie within the smallest bandwidth, so that widening the bandwidth adds no observations. The earlier plots of the data show observations beyond the smallest bandwidth and thus refute this. The reported numbers of observations are therefore incorrect, and this error might also affect the model estimates. The authors also fail to address the problem of autoregression that arises when modelling an RD design with time as the forcing variable, as advised in Hausman/Rapson (2017). Conclusions drawn from the RDD should therefore be treated with caution.
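
The argument about the number of observations can be made concrete with a short sketch. It uses a simulated running variable (`time_toy` is hypothetical and is not the ESS data) and simply counts the observations whose distance to the cutpoint lies within each bandwidth. The standard bandwidth is the value reported in Table A9 above; the exact rule that `RDestimate()` applies at the bandwidth boundary may differ from this rough count.

```r
# Toy running variable: days to/since the scandal for 500 simulated respondents
set.seed(42)
time_toy <- sample(-30:30, size = 500, replace = TRUE)

# Standard, half and double bandwidth
bw <- 4.334
bandwidths <- c(half = bw / 2, standard = bw, double = 2 * bw)

# Count observations within each bandwidth around the cutpoint (time = 0)
n_in_bw <- sapply(bandwidths, function(h) sum(abs(time_toy) <= h))
n_in_bw
```

As long as respondents are interviewed on days beyond the smallest bandwidth, the count has to grow with the bandwidth, which is why identical numbers of observations across the three models are implausible.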


3. Testing an alternative hypothesis

3.1. Idea and Hypothesis

As testing the effect of the corruption scandal on trust in politicians is a rather obvious, though nevertheless interesting, analysis, I decided to look at potential spill-over and concomitant effects of the scandal on social, political, and institutional trust. The underlying hypothesis (H1) is that the corruption scandal reduces overall trust levels, regardless of whether the trust is social, institutional, or political in nature.

3.2. Preliminary checks for significant changes at the threshold

In a preliminary step, I ran t-tests and visualizations on the following types of trust and the respective variables (in parentheses) in the ESS6 data set:

Social trust:
  • Most people can be trusted or you can’t be too careful (ppltrst)
  • Most people try to take advantage of you, or try to be fair (pplfair)
  • Most of the time people helpful or mostly looking out for themselves (pplhlp)
Institutional trust:
  • Trust in the legal system (trstlgl)
  • Trust in the police (trstplc)
Non-Spanish political trust (spillover effects):
  • Trust in the European Parliament (trstep)
  • Trust in the United Nations (trstun)

Only the “Trust in Police” variable showed a significant and large change after the release of the scandal. To save space, I only report the preliminary testing for the “Trust in Police” variable. The other variables were tested in the same way.

## Preliminary check: t.test (for Trust in Police as an example)

# calculating and creating t.test matrix
ttest_matrix <- matrix(1,nrow=4, ncol=1) #auxiliary t.test matrix

ttest_res <- t.test(trstplc ~ D_exposure_barcenas, data = ESSdat) #run t.test (formula interface: estimates are ordered D=0, then D=1)
ttest_matrix <- cbind(ttest_matrix, c(ttest_res$estimate, ttest_res$p.value, sum(!is.na(ESSdat[["trstplc"]][ESSdat$D_exposure_barcenas>=0])))) #assign values of interest to matrix

ttest_df <- as.data.frame(ttest_matrix[,2]) #drop initialization column of matrix / convert to DF
names(ttest_df) <- c("Trust in Police") #rename column names
rownames(ttest_df) <- c("Control", "Treatment", "p-value", "Valid N") #rename rownames (control mean comes first)
ttest_df <- t(ttest_df) #switch rows and columns
ttest_df <- round(ttest_df, digits = 3) #round results to three digits

# plotting nicely-formatted t.test

ttest_df %>%
  kable(caption="Table 1: Two-sample t-tests", align = "l") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "responsive"), full_width=F, position="left") %>%
  add_header_above(c("Variable" = 1, "Mean" = 2, " "=2), align="left")
Table 1: Two-sample t-tests
                   Mean
Variable           Control  Treatment  p-value  Valid N
Trust in Police    6.236    5.723      0.005    1296


## Preliminary check: visualization of discontinuity (for Trust in Police as an example)

# Data preparation (cleaning for NA in trstplc)
ESSdat_cleaned <- subset(ESSdat, subset = !is.na(ESSdat$trstplc))

# Plot distributions of observations and smoothed Trust in Police
ggplot(ESSdat) + 
  geom_histogram(aes(x=time), binwidth=1, color="darkgrey", fill="lightgrey") +  
  geom_smooth(aes(x=time, y=trstplc * (100/9), shape=as.factor(D_exposure_barcenas), color=as.factor(D_exposure_barcenas)), method="loess", position="identity") + 
  scale_x_continuous(name = "Days to/since scandal (Jan 31 = 0)") + 
  scale_y_continuous(name = "Number of respondents", position="right", sec.axis = sec_axis(~./(100/9), name = "Trust in Police")) + 
  theme(legend.title=element_blank(), legend.position = "bottom") +
  scale_colour_discrete(labels=c(" Trust in Police (D=0) ", " Trust in Police (D=1) "))+
  labs(title = "Change in Trust in Police at time of Bárcenas scandal", 
       caption = "Note: Trust in Police is measured on a 0-10 scale, where 10 indicates complete trust.\n The Trust in Police graphs are estimated with a LOESS function.")

The preliminary results of the t-test as well as the graph suggest that the scandal significantly affects trust in the police in Spain. The previously moderate level of 6.24 points on a 0-10 scale drops to a post-treatment mean of 5.72. The preliminary effect therefore amounts to -0.51.
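
How such a preliminary effect is read off a t-test can be sketched on simulated data. The data frame `toy` below is hypothetical (the group means are chosen only to resemble the values above). With the formula interface, `t.test()` reports the group means in the order of the grouping levels, so the D = 0 (control) mean comes first and the effect is the second estimate minus the first.

```r
# Toy data: 50 control (D = 0) and 50 treated (D = 1) respondents, simulated values
set.seed(7)
toy <- data.frame(D = rep(c(0, 1), each = 50))
toy$trust <- rnorm(100, mean = ifelse(toy$D == 1, 5.7, 6.2), sd = 2)

# Welch two-sample t-test via the formula interface
res <- t.test(trust ~ D, data = toy)
names(res$estimate)   # "mean in group 0" "mean in group 1"

# Preliminary effect: treatment mean minus control mean
effect <- unname(res$estimate[2] - res$estimate[1])
effect
```

Labelling the two estimates by their names rather than by position avoids mixing up the treatment and control means when the results are copied into a table.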

3.3. Model results

3.3.1. Covariate balance

To support the ignorability assumption (i.e., that the treatment is truly exogenous and hence independent of other variables), I tabulate potentially imbalanced covariates in the following. The list extends the previously used demographic covariates by all variables captured by the ESS that are potentially related to trust in the police. The election_winner variable is dropped, as there is no theoretical relationship between voting for the ruling party and trusting the police. The additional variables and their theoretical connection to the outcome variable are:

  • Position on the political left-right scale (0: left, 10: right): The political left opposes state authority, the political right supports it.
  • Trust in other people (0: no trust at all, 10: maximum trust): The less you trust other people, the more you rely on and trust the government to guarantee your safety.
  • Trust in the legal system (0: no trust at all, 10: maximum trust): Due to the close interaction between the legal system and police forces, trust in legal institutions relates to trust in governmental authority.
  • Immigrants make a country a worse or better place to live (0: worse, 10: better): Opposing immigrants increases the wish for safety and strong police protection.
  • Having been the victim of a crime in the past 5 years (1: yes, 2: no): Having been the victim of a crime could either reduce trust in the police (because of their inability to protect) or increase trust, as one is more scared of becoming a victim again.
  • Feeling safe when walking alone in local area after nightfall (1: very safe, 4: very unsafe): The less safe you feel, the more you require governmental authorities to maintain safety; or vice versa, because you mistrust the police, you feel unsafe.
  • Being a member of a group discriminated against in Spain (1: yes, 2: no): Being a member of a discriminated group might reduce trust in the police to effectively protect you against harassment.


# Relevant covariate for t.test
ttest_var <- c("eduyrs", "gndr", "agea", "emp_paid_Work", "emp_in_education", "emp_unemployed", "emp_out_of_labour_market", "lrscale", "ppltrst", "trstlgl", "imwbcnt", "crmvct", "aesfdrk", "dscrgrp")

# Calculate the t-tests and collect the results in a matrix
ttest_matrix <- matrix(1, nrow = 4, ncol = 1) # auxiliary matrix; its first column is only a placeholder

for (i in ttest_var) {
  # t.test() drops missing values itself, so no na.rm argument is needed
  ttest_res <- t.test(ESSdat[[i]][ESSdat$D_exposure_barcenas == 1], ESSdat[[i]][ESSdat$D_exposure_barcenas == 0])
  ttest_matrix <- cbind(ttest_matrix, c(ttest_res$estimate, ttest_res$p.value, sum(!is.na(ESSdat[[i]][ESSdat$D_exposure_barcenas>=0]))))
} # run a t-test per covariate and store both group means, the p-value, and the valid N

ttest_df <- as.data.frame(ttest_matrix[,2:15]) # drop the placeholder column of the matrix / convert to DF
names(ttest_df) <- c("Years of Education", "Gender", "Age", "In paid work", "In education", "Unemployed", "Out of the labour market", "Position on political left-right scale", "Trusting people", "Legal Trust", "Immigrant Goodness", "Victim of crime", "Feeling safe", "Member of discriminated group") #rename column names
rownames(ttest_df) <- c("Treatment", "Control", "p-value", "Valid N") #rename rownames
ttest_df <- t(ttest_df) #switch rows and columns
ttest_df <- round(ttest_df, digits = 2) #round results to two digits
# Create table for t.test results

ttest_df %>%
  kable(caption="Table 1: Two-sample t-tests", align = "l") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "responsive"), full_width=F, position="left") %>%
  add_header_above(c("Variables" = 1, "Mean" = 2, " "=2), align="left") %>% pack_rows("Employment status:", 4,7, bold=FALSE, hline_before = FALSE, hline_after = FALSE)
Table 1: Two-sample t-tests

Variables                                 Mean (Treatment)   Mean (Control)   p-value   Valid N
Years of Education                        12.57              13.08            0.27      1409
Gender                                    1.51               1.53             0.59      1428
Age                                       46.99              49.39            0.08      1428
Employment status:
  In paid work                            0.45               0.35             0.01      1422
  In education                            0.09               0.11             0.38      1422
  Unemployed                              0.16               0.14             0.46      1422
  Out of the labour market                0.29               0.39             0.00      1422
Position on political left-right scale    4.42               4.78             0.05      1330
Trusting people                           5.07               5.24             0.29      1427
Legal Trust                               3.48               3.64             0.36      1398
Immigrant Goodness                        5.22               5.70             0.01      1397
Victim of crime                           1.70               1.64             0.08      1426
Feeling safe                              2.01               2.05             0.50      1412
Member of discriminated group             1.94               1.94             0.99      1420

The covariate table shows that treatment and control groups are largely balanced, although some covariates differ between the groups. To account for structural differences in the individual positioning on the left-right scale, in opinions towards immigrants, and in having been a victim of crime, I include these factors as well as the usual demographics (including employment status, which also differs between the groups) in the OLS estimation.
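The construction of such a balance table can be sketched on simulated data. The data frame `sim` and the helper `balance_row` below are hypothetical stand-ins for illustration, not the ESS data or the exact code used for Table 1; the sketch only shows how group means, the p-value of a Welch t-test, and the number of valid observations end up in one row per covariate.

```r
# Simulated stand-in for the survey data (hypothetical values, not the ESS)
set.seed(1)
n <- 200
sim <- data.frame(
  treated = rbinom(n, 1, 0.8),                    # 1 = interviewed after the cutoff
  agea    = round(rnorm(n, mean = 48, sd = 17)),
  eduyrs  = round(rnorm(n, mean = 13, sd = 5))
)
sim$eduyrs[sample(n, 10)] <- NA                   # some item non-response

# One Welch t-test per covariate; t.test() drops missing values on its own
balance_row <- function(v) {
  res <- t.test(sim[[v]][sim$treated == 1], sim[[v]][sim$treated == 0])
  c(Treatment = unname(res$estimate[1]),
    Control   = unname(res$estimate[2]),
    p_value   = res$p.value,
    Valid_N   = sum(!is.na(sim[[v]])))
}

# One row per covariate, one column per statistic
balance <- t(sapply(c("agea", "eduyrs"), balance_row))
print(round(balance, 2))
```

Building the rows with `sapply` avoids the placeholder column that an incrementally grown matrix needs, but the resulting table carries the same information.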

3.3.2. OLS estimation

The first regression estimates the treatment effect on trust in the police without controls. The second one includes the covariates and accounts for region fixed effects, as not all regions of Spain contain both treatment and control group observations (no common support).

## Regression setup 

# LM with treatment on trust
reg1_police <- lm(trstplc ~ D_exposure_barcenas, data = ESSdat)

# Region-fixed effects model with treatment and covariates on trust
reg2_police <- plm(trstplc ~ D_exposure_barcenas + lrscale + imwbcnt + crmvct + gndr + eduyrs + agea + as.factor(employment), index="region", data = ESSdat, model="within")
# Report both regressions in one table
stargazer::stargazer(reg1_police, reg2_police, header=FALSE, type="html",
                     title = "OLS regression models. Dependent variable: trust in police",
                     style="asr",
                     report="vc*s",
                     digits = 3,
                     keep = c("D_exposure_barcenas", 
                              "lrscale", 
                              "imwbcnt",
                              "crmvct",
                              "gndr", 
                              "eduyrs", 
                              "agea", 
                              "employment", 
                              "Constant"),
                     omit.stat = c("ser", "f", "adj.rsq"),
                     notes = "Standard errors in parentheses", 
                     notes.align = "l",
                     covariate.labels = c("D (Exposure to the Bárcenas scandal)",
                                          "Position on left-right scale", 
                                          "Immigrant Goodness", 
                                          "Victim of crime",
                                          "Female", "Years of education", 
                                          "Age", "Employment: In Education (Ref: In paid work)", 
                                          "Employment: Unemployed (Ref: In paid work)", 
                                          "Employment: Out of the labour market (Ref: In paid work)", 
                                          "Employment: Other (Ref: In paid work)"),
                     add.lines = list(c("Region Fixed Effects", "No", "Yes")),
                     model.names = FALSE,
                     model.numbers = TRUE,
                     dep.var.labels.include = FALSE,
                     column.labels   = c("LM w/o covariates", "PLM w/ covariates and FE")
                     )
OLS regression models. Dependent variable: trust in police

                                                            LM w/o covariates   PLM w/ covariates and FE
                                                            (1)                 (2)
D (Exposure to the Bárcenas scandal)                        -0.513**            -0.395*
                                                            (0.178)             (0.194)
Position on left-right scale                                                    0.190***
                                                                                (0.027)
Immigrant Goodness                                                              0.111***
                                                                                (0.028)
Victim of crime                                                                 0.262
                                                                                (0.143)
Female                                                                          -0.248
                                                                                (0.130)
Years of education                                                              0.011
                                                                                (0.012)
Age                                                                             0.022***
                                                                                (0.006)
Employment: In Education (Ref: In paid work)                                    0.149
                                                                                (0.264)
Employment: Unemployed (Ref: In paid work)                                      -0.470*
                                                                                (0.196)
Employment: Out of the labour market (Ref: In paid work)                        -0.126
                                                                                (0.205)
Employment: Other (Ref: In paid work)                                           -0.259
                                                                                (0.628)
Constant                                                    6.236***
                                                            (0.163)
Region Fixed Effects                                        No                  Yes
N                                                           1,417               1,283
R2                                                          0.006               0.086

* p < .05; ** p < .01; *** p < .001
Standard errors in parentheses


The OLS results show a significant effect of the Bárcenas scandal on trust in the police, even when accounting for covariate imbalance with regard to being a crime victim, the stance towards immigration, and the positioning on the political left-right scale. In the second model, which accounts for covariates and region fixed effects, the release of the scandal information reduced trust in the police by about 0.4 points on the 0 to 10 scale.
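What the region fixed effects in the second model do can be illustrated with a small simulation. The sketch below uses made-up data and base R only (all variable names and values are assumptions for illustration): it shows that demeaning outcome and treatment within each region, which is the idea behind a "within" estimator, yields the same treatment coefficient as an OLS regression with one dummy per region.

```r
# Simulated data (hypothetical values, not the ESS)
set.seed(42)
n <- 300
region  <- factor(sample(paste0("R", 1:6), n, replace = TRUE))
treated <- rbinom(n, 1, 0.7)
region_shift <- rnorm(6)[as.integer(region)]          # region-specific trust level
trust <- 6 - 0.4 * treated + region_shift + rnorm(n)  # assumed effect of -0.4

# (a) Least-squares dummy variables: one intercept shift per region
lsdv <- lm(trust ~ treated + region)

# (b) Within transformation: subtract the region mean from each variable
demean <- function(x, g) x - ave(x, g)
within_fit <- lm(demean(trust, region) ~ 0 + demean(treated, region))

# Both approaches return the same treatment coefficient
print(c(lsdv   = unname(coef(lsdv)["treated"]),
        within = unname(coef(within_fit)[1])))
```

Only the point estimates coincide in this sketch; the standard errors of the naive demeaned regression do not correct the degrees of freedom for the estimated region means.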

3.4. Robustness check

To check the robustness of the finding, I run an RDD around the threshold with the Imbens-Kalyanaraman optimal bandwidth. I plot the graph estimated with a LOESS function and report the results of the RD estimate.

## Robustness check: RD estimate

RD_est_trstplc <- RDestimate(trstplc ~ time, data = ESSdat, cutpoint = 0, kernel = "triangular")
plot(RD_est_trstplc)

## Plotting RDD estimates

# Bind DF with RDD results
RDD_df <- data.frame(1,1,1) # initialize df
RDD_df <- rbind(RDD_df, RD_est_trstplc$est, RD_est_trstplc$p, RD_est_trstplc$se, RD_est_trstplc$bw) # pick the estimates of interest
RDD_df[-6,] <- round(RDD_df[-6,], digits=3) # round values to three digits
RDD_df <- rbind(RDD_df, c("Triangular", "Triangular", "Triangular"), as.character(RD_est_trstplc$obs)) # add kernel type and number of observations of this model
RDD_df <- RDD_df[-1,] # drop initialization row

# Adjust names of rows and columns
rownames(RDD_df) <- c("RD estimate", "p-value", "Standard error", "Bandwidth", "Kernel type", "Number of observations")
colnames(RDD_df) <- c("Model-17 (Standard BW)", "Model-18 (Half BW)", "Model-19 (Double BW)")

# Kable table for RDD estimation
RDD_df %>%
  kable(caption="Table A9: RDD summary results. Dependent variable: Trust in police", align = "l") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "responsive"), full_width=F, position="left") %>%
  footnote(general = "Standard errors clustered by region \n p-value < 0.001: significant at 0.1% level \n p-value < 0.01: significant at 1% level \n p-value < 0.05: significant at 5% level \n p-value < 0.1: significant at 10% level")
Table A9: RDD summary results. Dependent variable: Trust in police

                          Model-17 (Standard BW)   Model-18 (Half BW)   Model-19 (Double BW)
RD estimate               -1.147                   -1.525               -1.137
p-value                   0.024                    0.066                0.003
Standard error            0.509                    0.829                0.384
Bandwidth                 5.761                    2.881                11.523
Kernel type               Triangular               Triangular           Triangular
Number of observations    322                      206                  709

Note:
Standard errors clustered by region
p-value < 0.001: significant at 0.1% level
p-value < 0.01: significant at 1% level
p-value < 0.05: significant at 5% level
p-value < 0.1: significant at 10% level

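The RD estimates underline the robustness of the finding: the effect is significant at the 5% level for the optimal bandwidth and at the 1% level for the double-optimal bandwidth, while it is significant only at the 10% level for half the optimal bandwidth.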
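The logic of a sharp RD estimate with a triangular kernel can be sketched in base R. The example below is an illustration on simulated data, not the implementation of the rdd package: the function `rd_local_linear`, the simulated jump of -1, and the bandwidths of 6, 3, and 12 days are all assumptions chosen to mirror the standard, half, and double bandwidth columns.

```r
# Simulated data (hypothetical values, not the ESS)
set.seed(7)
n <- 2000
time  <- runif(n, -30, 30)                  # running variable: days relative to the cutoff
after <- as.numeric(time >= 0)              # 1 = observed at or after the cutoff
trust <- 6 + 0.01 * time - 1 * after + rnorm(n, sd = 0.5)  # assumed jump of -1

# Local linear regression on both sides of the cutoff (cutoff fixed at 0)
rd_local_linear <- function(y, x, bw) {
  w    <- pmax(0, 1 - abs(x) / bw)          # triangular kernel: weight falls linearly to 0 at bw
  keep <- w > 0                             # only observations inside the bandwidth
  d    <- as.numeric(x >= 0)
  fit  <- lm(y ~ d + x + d:x, weights = w, subset = keep)
  c(estimate = unname(coef(fit)["d"]),      # jump in the outcome at the cutoff
    n_obs    = sum(keep))
}

# Estimate for a standard, half, and double bandwidth
results <- sapply(c(standard = 6, half = 3, double = 12),
                  function(b) rd_local_linear(trust, time, b))
print(round(results, 3))
```

Halving the bandwidth uses fewer observations closer to the cutoff, which typically widens the standard error, while doubling it adds observations at the cost of relying more on the linear approximation.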

3.5. Summary and Interpretation of Findings

This analysis expands the research of Ares and Hernández (2018) on the effect of the Bárcenas corruption scandal on trust in politicians by scrutinizing spillover and concomitant effects on social, institutional, and non-Spanish political trust. Based on round 6 of the European Social Survey, the only observable concomitant effect is a reduction of trust in the police. This finding withstands covariate conditioning and robustness checks.

Without further empirical analysis, it is difficult to single out one explanation for the presented finding. (1) One possible reason for the concomitant effect on trust in the police is that the scandal undermines trust in the executive branch of the state, i.e., the government and the authorities controlled by it, such as the police. Trust in the police would therefore also suffer from the scandal release. (2) An alternative explanation centres on how the information came to light: the scandal was uncovered by the newspaper El País and not by anti-corruption authorities (i.e., subunits of the police). People may therefore trust governmental authorities less to detect corruption and maintain safety.


4. Literature

  • Ares, M., and Hernández, E. (2018). The corrosive effect of corruption on trust in politicians: Evidence from a natural experiment. Research & Politics.
  • Hausman, C., and Rapson, D. (2017). Regression Discontinuity in Time: Considerations for Empirical Applications. Energy Institute at Haas, Haas Working Paper 282, Berkeley, USA.