Extra Empirical Project 2: The politics of carbon taxation

Working in R

These code downloads have been constructed as supplements to the full Doing Economics projects. You’ll need to download the data before running the code that follows.

Download the code

To download the code chunks used in this project, right-click on the download link and select ‘Save Link As…’. You’ll need to save the code download to your working directory, and open it in RStudio.

Don’t forget to also download the data into your working directory by following the steps in this project.

Getting started in R

For this project you will need the following packages:

tidyverse, to help with data manipulation
readxl, to import an Excel spreadsheet.

If you need to install either of these packages, run the following code:

install.packages(c("readxl", "tidyverse"))

You can import the libraries now, or when they are used in the R walk-throughs.

library(readxl)
library(tidyverse)

Part 1: Measuring and explaining public support for carbon taxation

In this part, we will analyse survey data on public support for carbon taxation in the UK. We will summarize how support for carbon taxes is distributed and how it is associated with the survey respondents’ demographic characteristics and beliefs.

First, download the survey data and documentation:

Download the data, which is a simplified version of the dataset from the article ‘Unequal treatment perceptions and rural backlashes against carbon taxation’ by Hope, Limberg, and Steinebach (2026). Also download their article for reference.
Read the Data dictionary tab in the spreadsheet. Familiarize yourself with the definitions of the variables in the dataset and check that each variable listed in the Data dictionary is also in the Data tab.

R walk-through 1 Importing data into R

We start by setting our working directory using the setwd command. This command tells R where your code and data files are stored. In the code below, replace ‘YOURFILEPATH’ with the full file path of the folder in which you have saved the code chunks file. Note that you have to use forward slashes (‘/’) rather than backslashes (‘\’), and that the path must be enclosed in straight quotation marks. If you don’t know how to find the path to your working folder, see the Technical Reference section.
setwd("YOURFILEPATH")
Since our data is in Excel format, we use the read_excel function to import the data into R. We run this command twice: once to import the data dictionary (which we will call var_info) and once to import the survey data (which we will call dat). We use the sheet option to tell R which tab in the Excel file to import.
var_info <- read_excel("dataset_hope-et-al_simplified.xlsx",
  sheet = "Data dictionary")
dat <- read_excel("dataset_hope-et-al_simplified.xlsx", sheet = "Data")
To check that the data has been imported correctly, you can use the head function to view the first six rows of the dataset, and confirm that the columns and values match those in the Excel file.
head(dat)
## # A tibble: 6 × 9
##   respondent_id   age neighbou…¹ commute parti…² treat…³ carbo…⁴ unequ…⁵ carbo…⁶
##           <dbl> <dbl> <chr>      <chr>   <chr>     <dbl>   <dbl>   <dbl>   <dbl>
## 1             1    21 Suburban   Walk    Labour…       1       2       7       3
## 2             2    24 Suburban   Car     Labour…       1       3       4       6
## 3             3    21 Urban      Car     Labour…       0       2       6       3
## 4             4    54 Suburban   <NA>    Labour…       1       2       7       4
## 5             5    48 Suburban   Public… Labour…       1       4      10       8
## 6             6    46 Suburban   Car     Libera…       0       4       7       7
## # … with abbreviated variable names ¹neighbourhood, ²partisanship, ³treatment,
## #   ⁴carbon_tax_support, ⁵unequal_treatment, ⁶carbon_tax_unfairness
Before working with the data, we use the str function to check that the data is formatted correctly.
str(dat)
## tibble [2,997 × 9] (S3: tbl_df/tbl/data.frame)
##  $ respondent_id        : num [1:2997] 1 2 3 4 5 6 7 8 9 10 ...
##  $ age                  : num [1:2997] 21 24 21 54 48 46 66 30 28 21 ...
##  $ neighbourhood        : chr [1:2997] "Suburban" "Suburban" "Urban" "Suburban" ...
##  $ commute              : chr [1:2997] "Walk" "Car" "Car" NA ...
##  $ partisanship         : chr [1:2997] "Labour Party" "Labour Party" "Labour Party" "Labour Party" ...
##  $ treatment            : num [1:2997] 1 1 0 1 1 0 1 0 0 0 ...
##  $ carbon_tax_support   : num [1:2997] 2 3 2 2 4 4 5 3 3 3 ...
##  $ unequal_treatment    : num [1:2997] 7 4 6 7 10 7 8 10 9 9 ...
##  $ carbon_tax_unfairness: num [1:2997] 3 6 3 4 8 7 10 7 7 4 ...
R correctly recognizes variables that are numbers (num), such as respondent_id and age, and variables that are words (chr), such as partisanship.
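The earlier check that every variable listed in the Data dictionary also appears in the Data tab can be done in R by comparing names with setdiff. The sketch below uses small made-up stand-ins for var_info and dat, and it assumes that the dictionary stores variable names in a column called variable. That column name is a guess, so replace it with the name shown by names(var_info) in your own session.

```r
# Toy stand-ins (hypothetical): the real var_info and dat come from read_excel
var_info_toy <- data.frame(
  variable = c("respondent_id", "age", "treatment"),  # assumed column name
  description = c("ID", "Age in years", "1 = treatment group")
)
dat_toy <- data.frame(
  respondent_id = 1:3,
  age = c(21, 24, 21),
  treatment = c(1, 1, 0)
)

# Variables listed in the dictionary but missing from the data (should be empty)
missing_from_data <- setdiff(var_info_toy$variable, names(dat_toy))
missing_from_data

# Variables in the data that the dictionary does not describe (should be empty)
undocumented <- setdiff(names(dat_toy), var_info_toy$variable)
undocumented
```

If both results are empty character vectors, the two tabs list the same variables.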

Likert scale: A numerical scale (usually ranging from 1–5 or 1–7) used to measure attitudes or opinions, with each number representing the individual’s level of agreement or disagreement with a particular statement.

Attitudes towards carbon taxation are assessed on a Likert scale. In this case, the scale measured the level of support for a specific policy (on a 5-point scale running from 1 for ‘strongly support’ to 5 for ‘strongly oppose’). This is a common approach in survey research assessing people’s preferences for economic policies.

Find the survey question used to ask about carbon tax preferences in Part A of the supplementary material for the article. What step do the authors take to try to ensure that they receive accurate information about respondents’ support for the policy?

Use the data you have imported to answer the following questions:

Each respondent in the dataset has been assigned an ID (recorded as respondent_id in the spreadsheet). How many respondents are there in the dataset?

In the survey, respondents are randomly assigned to the treatment or control group. Use the treatment variable in the spreadsheet. How many respondents are in the treatment group and how many are in the control group? (Hint: Respondents in the treatment group are given a value of 1 and respondents in the control group are given a value of 0. So, you can highlight the column for the treatment variable and use the ‘Sum’ reported on the grey bar at the bottom of the Excel spreadsheet to find out the number of respondents in the treatment group.)

R walk-through 2 Making a frequency table

We use the count function (part of the tidyverse package) to count the number of respondents in the control and treatment groups, which are recorded in the variable treatment. The pipe operator %>% passes the result of one command on to the next, so it can be used to link multiple commands together.
dat %>%
  count(treatment)
## # A tibble: 2 × 2
##   treatment     n
##       <dbl> <int>
## 1         0  1516
## 2         1  1481
There are 1,516 respondents in the control group (treatment = 0) and 1,481 respondents in the treatment group (treatment = 1).
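Because treatment is coded as 0 or 1, these counts can be cross-checked with a few base R commands, mirroring the ‘Sum’ hint given for the spreadsheet. The sketch below uses a made-up data frame called toy rather than the survey data; with the real data you would replace toy with dat.

```r
# Toy data (made up for illustration): 5 respondents, 3 of them treated
toy <- data.frame(
  respondent_id = 1:5,
  treatment = c(1, 0, 1, 1, 0)
)

n_respondents <- nrow(toy)                       # one row per respondent
n_unique_ids <- length(unique(toy$respondent_id))  # confirms no ID is duplicated
n_treated <- sum(toy$treatment)                  # summing a 0/1 variable counts the 1s
n_control <- n_respondents - n_treated           # everyone else is in the control group

c(n_respondents, n_unique_ids, n_treated, n_control)
```

This shortcut relies on treatment having no missing values; if it did, sum would return NA unless na.rm = TRUE is added.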

On page 7 of the article, the authors describe how they recode the variable for carbon tax support for their empirical analysis. How do they recode the variable? What might be the advantages and disadvantages of doing this?

dummy variable (indicator variable): A variable that takes the value 1 if a certain condition is met, and 0 otherwise.

Binary variables are dichotomous—they can only take one of two possible values or categories (for example, ‘yes’ and ‘no’ or ‘true’ and ‘false’). One way to simplify variables in a dataset to make them easier to analyse is to transform them into binary variables. When a binary variable is created that only takes the values of 0 or 1, it is referred to as a dummy variable (also known as an indicator variable).

We will now create dummy variables.

Create a dummy variable for carbon tax support that takes a value of 1 if the variable carbon_tax_support is 1 or 2 (that is, respondents ‘strongly support’ or ‘support’ carbon taxation) and 0 otherwise. When creating this variable, missing data (blank cells that indicate which respondents did not answer the carbon tax support question in the survey) should still be coded as missing (that is, NA).

Create four more dummy variables that will be used in the analysis for this project. Give each of these new variables an informative name:

a dummy variable indicating whether respondents are aged 40 or above (coded as 1) or under 40 years of age (coded as 0)

a dummy variable indicating whether respondents commute by car (1) or by other means (0)

a dummy variable indicating whether respondents live in a rural area (1) or a non-rural area (0) (Note: people living in ‘urban’ and ‘suburban’ areas should be coded as 0)

a dummy variable indicating whether respondents have an unequal_treatment value of 8 or above (1) or below 8 (0) (Note: Part 2 of the project discusses the meaning of this variable in more depth).

R walk-through 3 Creating dummy variables

Here we use the mutate function (within the tidyverse package) to create new columns in our dat dataset, one for each dummy variable. We use the case_when function to specify the condition(s) that R should use to assign values, and the is.na function to code any missing data (NA) as NA_real_ (numerical missing data).
# Create carbon tax dummy (1 if support is 1 or 2)
dat <- dat %>%
  mutate(
    carbon_tax_support_dummy = case_when(
      carbon_tax_support == 1 ~ 1,
      carbon_tax_support == 2 ~ 1,
      is.na(carbon_tax_support) ~ NA_real_,
      TRUE ~ 0
    )
  )
In the code above, we set the values of our carbon tax support dummy variable (called carbon_tax_support_dummy) to equal 1 if the variable carbon_tax_support equals 1 or 2. Otherwise, this dummy variable should equal 0 (TRUE ~ 0).
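As a cross-check on the case_when logic, the same dummy can be built from a single logical comparison. This is a minimal sketch on made-up data, not the approach used in the project, and it assumes that carbon_tax_support only takes the values 1 to 5 or NA. The column names dummy_case_when and dummy_compact are hypothetical.

```r
library(dplyr)

# Made-up responses, including one missing answer
toy <- data.frame(carbon_tax_support = c(1, 2, 3, 5, NA))

toy <- toy %>%
  mutate(
    # Version from the walk-through
    dummy_case_when = case_when(
      carbon_tax_support == 1 ~ 1,
      carbon_tax_support == 2 ~ 1,
      is.na(carbon_tax_support) ~ NA_real_,
      TRUE ~ 0
    ),
    # Compact alternative: a comparison involving NA returns NA,
    # and as.numeric() turns TRUE/FALSE into 1/0
    dummy_compact = as.numeric(carbon_tax_support <= 2)
  )

toy
```

Both columns should agree row by row, with the missing answer staying missing in each.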

We repeat the same steps to create dummy variables for age (age_dummy), commuting by car (car_dummy), rural neighbourhood (rural), and unequal treatment perceptions (unequal_treatment_dummy).
# Create age dummy (1 if aged 40 or above)
dat <- dat %>%
  mutate(
    age_dummy = case_when(
      age >= 40 ~ 1,
      is.na(age) ~ NA_real_,
      TRUE ~ 0
    )
  )

# Create car dummy (1 if people commute by car)
dat <- dat %>%
  mutate(
    car_dummy = case_when(
      commute == "Car" ~ 1,
      is.na(commute) ~ NA_real_,
      TRUE ~ 0
    )
  )

# Create rural dummy (1 if people live in a rural neighbourhood)
dat <- dat %>%
  mutate(
    rural = case_when(
      neighbourhood == "Rural" ~ 1,
      is.na(neighbourhood) ~ NA_real_,
      TRUE ~ 0
    )
  )

#Create unequal treatment perceptions dummy (1 if value is 8 and above)
dat <- dat %>%
  mutate(
    unequal_treatment_dummy = case_when(
      unequal_treatment >= 8 ~ 1,
      is.na(unequal_treatment) ~ NA_real_,
      TRUE ~ 0
    )
  )

We will now use the dataset to explore public support for carbon taxation in the UK. For the rest of Part 1, we will only use data from the control group (as we want to look at baseline support without any influence from the treatment in the experiment, which will be discussed further in Part 2 of the project).

Create a new data frame called Control that only contains data for the control group (in other words, only for the respondents whose value for the variable treatment is 0).

For Questions 7–12, use the data in the dataframe Control.

We will start by using the original carbon tax variable with all five answer categories to see how carbon tax support is distributed. We will then turn to our dummy variable for carbon tax support to help simplify the remainder of the analysis.

Create a frequency table (like Figure 1) that shows the number and percentage of respondents in each of the five answer categories for the variable carbon_tax_support.

Carbon tax support	Number of respondents	Percentage of respondents
Strongly oppose
Oppose
Neither support nor oppose
Support
Strongly support

Figure 1 The distribution of carbon tax support in the UK.

Use the data from the frequency table in Question 7 to create a column chart showing the percentage of respondents in each category of carbon tax support.

R walk-through 4 Making a frequency table and column chart on a subset of data

We apply the filter function to select observations in the control group (treatment == 0) and save these in a new dataframe called Control.
Control <- dat %>%
  filter(treatment == 0)
Using the Control dataframe, we create a frequency table (called freq_table) with three tidyverse functions. First, we use filter to remove the missing data in the carbon_tax_support variable. Second, we use count to count the number of respondents in each answer category. Third, we use mutate to create a new column called Percentage that contains the percentages.
# Frequency table for carbon tax support (control group only)
freq_table <- Control %>%
  filter(!is.na(carbon_tax_support)) %>%
  count(carbon_tax_support) %>%
  mutate(Percentage = (n / sum(n)) * 100)
We now use ggplot to make a column chart (geom_col()) with carbon_tax_support as the horizontal (x) variable and Percentage as the vertical (y) variable.
#Create column chart
ggplot(freq_table, aes(x = carbon_tax_support, y = Percentage)) +
  geom_col() +
  labs(
  title = "Share of respondents by carbon tax support",
  x = "Carbon tax support",
  y = "Percentage of respondents"
  )

Figure 2 Share of respondents by carbon tax support.
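The frequency table behind a chart like Figure 2 can be checked on a small made-up example, where the counts are easy to verify by eye. The data frame toy and the table freq_toy below are hypothetical and are not drawn from the survey.

```r
library(dplyr)

# Made-up responses on a 1-5 scale, with one missing answer
toy <- data.frame(carbon_tax_support = c(1, 2, 2, 3, 4, 4, 4, 5, NA))

freq_toy <- toy %>%
  filter(!is.na(carbon_tax_support)) %>%   # drop the missing answer
  count(carbon_tax_support) %>%            # one row per answer category
  mutate(Percentage = n / sum(n) * 100)    # share of non-missing answers

freq_toy

# The percentages should add up to 100
sum(freq_toy$Percentage)
```

A quick check that the percentages sum to 100 is a simple way to confirm that missing values were removed before the shares were calculated.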

Use your chart from Question 8 to discuss the extent of support for carbon taxation in the UK. (For example, how does the percentage of respondents who support or strongly support carbon taxation compare with the percentage of respondents who oppose or strongly oppose carbon taxation?)

Use the dummy variable for carbon tax support to do the following:

Calculate the average of this dummy variable. How does this average relate to the table you created in Question 7?

Provide an interpretation of the average of the carbon tax support dummy variable.

R walk-through 5 Calculating the average of a dummy variable

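The average of a dummy variable is a proportion (between 0 and 1) that can be multiplied by 100 to give the percentage of respondents for which the variable equals 1. We use the count and mutate functions, just like in R walk-through 4, and store the results in freq_table_dummy. Note that the code below is applied to the full dataset dat (all 2,997 respondents across both groups), as the counts in the output show; to restrict the calculation to the control group, replace dat with Control.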
freq_table_dummy <- dat %>%
  count(carbon_tax_support_dummy) %>%
  mutate(Percentage = (n / sum(n)) * 100)
freq_table_dummy
## # A tibble: 3 × 3
##   carbon_tax_support_dummy     n Percentage
##                      <dbl> <int>      <dbl>
## 1                        0  1623      54.2 
## 2                        1  1318      44.0 
## 3                       NA    56       1.87
R creates a separate row for the missing data (56 observations). To exclude these observations, we apply the filter function before creating the frequency table.
freq_table_dummy <- dat %>%
  filter(!is.na(carbon_tax_support)) %>%
  count(carbon_tax_support_dummy) %>%
  mutate(Percentage = (n / sum(n)) * 100)
freq_table_dummy
## # A tibble: 2 × 3
##   carbon_tax_support_dummy     n Percentage
##                      <dbl> <int>      <dbl>
## 1                        0  1623       55.2
## 2                        1  1318       44.8
The counts (n) remain the same, but the percentages are now calculated over the non-missing data only.

1,318 respondents (44%, or 44.8% if excluding the missing data) support the carbon tax.
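The link between the average of a dummy variable and a percentage can be verified directly with the mean function. The sketch below uses a made-up vector called support_dummy, and then repeats the arithmetic with the two counts reported in the output above.

```r
# Made-up dummy variable with one missing value
support_dummy <- c(1, 0, 0, 1, 1, NA, 0, 1)

# Mean of the dummy, ignoring the missing value
avg <- mean(support_dummy, na.rm = TRUE)

# Share of non-missing respondents with a value of 1
share <- sum(support_dummy == 1, na.rm = TRUE) / sum(!is.na(support_dummy))

avg
share

# Same arithmetic with the counts from the output above (1,318 ones, 1,623 zeros)
share_output <- 1318 / (1318 + 1623)
round(share_output * 100, 1)
```

The mean and the share are the same number, which is why the average of the dummy can be read as the proportion of respondents who support the tax.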

conditional mean: An average of a variable, taken over a subgroup of observations that satisfy certain conditions, rather than all observations.

We will now use our carbon tax support dummy variable and other variables in the dataset to explore how support for carbon taxes varies across different groups in the UK. Specifically, we will calculate the average of the carbon tax dummy variable for different subgroups in the dataset (in other words, we will calculate conditional means for the carbon tax dummy variable).

Create the following tables:

a table showing how average carbon tax support differs for respondents under 40, and those aged 40 and over

a table showing how average carbon tax support differs for respondents who commute by car and those who do not

a table showing how average carbon tax support differs for respondents living in rural areas and non-rural areas

a table showing how average carbon tax support differs for respondents who support different political parties.

R walk-through 6 Creating summary tables for different groups

Using the Control dataframe, we first use group_by to specify the groups for which the calculations will be done separately (age_dummy in this case), then use filter to remove rows with missing data for age and carbon tax support. Finally, we use the summarise function to calculate the mean (mean()) and store it in a new variable called mean_carbon_tax_support_dummy.

# Carbon tax support by age group (1 = aged 40 or above)
Control %>%
  group_by(age_dummy) %>%
  filter(!is.na(age_dummy))%>%
  filter(!is.na(carbon_tax_support_dummy)) %>%
  summarise(mean_carbon_tax_support_dummy = mean(carbon_tax_support_dummy))

## # A tibble: 2 × 2
##   age_dummy mean_carbon_tax_support_dummy
##       <dbl>                         <dbl>
## 1         0                         0.528
## 2         1                         0.413

Now we repeat the same steps for the other dummy variables.

#Carbon tax support by commuting by car (1 = car)
Control %>%
  group_by(car_dummy) %>%
  filter(!is.na(car_dummy))%>%
  filter(!is.na(carbon_tax_support_dummy)) %>%
  summarise(mean_carbon_tax_support_dummy = mean(carbon_tax_support_dummy))

## # A tibble: 2 × 2
##   car_dummy mean_carbon_tax_support_dummy
##       <dbl>                         <dbl>
## 1         0                         0.552
## 2         1                         0.407

#Carbon tax support by rural status (1 = rural)
Control %>%
  group_by(rural) %>%
  filter(!is.na(rural))%>%
  filter(!is.na(carbon_tax_support_dummy)) %>%
  summarise(mean_carbon_tax_support_dummy = mean(carbon_tax_support_dummy))

## # A tibble: 2 × 2
##   rural mean_carbon_tax_support_dummy
##   <dbl>                         <dbl>
## 1     0                         0.468
## 2     1                         0.431

#Carbon tax support by partisanship
Control %>%
  group_by(partisanship) %>%
  filter(!is.na(partisanship))%>%
  filter(!is.na(carbon_tax_support_dummy)) %>%
  summarise(mean_carbon_tax_support_dummy = mean(carbon_tax_support_dummy))

## # A tibble: 11 × 2
##    partisanship                     mean_carbon_tax_support_dummy
##    <chr>                                                    <dbl>
##  1 Conservative Party                                       0.343
##  2 Democratic Unionist Party                                0.75 
##  3 Green Party of England and Wales                         0.690
##  4 Labour Party                                             0.552
##  5 Liberal Democrats                                        0.529
##  6 Other                                                    0.5  
##  7 Plaid Cymru                                              0.333
##  8 Prefer not to say                                        0.262
##  9 Reform UK                                                0.167
## 10 Scottish National Party                                  0.617
## 11 Sinn Féin                                                0.375

Use the tables in Question 11 to describe how support for carbon taxation in the UK varies across population subgroups. Suggest two other variables (not included in the dataset) that might be associated with people’s support for carbon taxes.

Part 2: Explaining rural backlashes against carbon taxation

Note

You will need to complete Part 1 before starting Part 2.

In Part 2 of the project, we focus on explaining rural backlashes against carbon taxation. In recent years, there have been several high-profile examples of this phenomenon, including the 2018–2020 ‘Gilets Jaunes’ protests in France, which were sparked by a proposed rise in fuel taxes, and the mobilization of rural communities in British Columbia in Canada to fight the introduction of a new carbon tax. If governments hope to build broad-based support for carbon taxation, then it is important to understand better why these communities resist carbon taxes so fiercely.

That is the research question at the centre of Hope, Limberg, and Steinebach’s (2026) article ‘Unequal treatment perceptions and rural backlashes against carbon taxation’. In the article, the authors argue that rural backlashes against carbon taxation are not only driven by the direct costs borne by rural communities, but also by fairness considerations. People living in rural areas may oppose carbon taxes on the grounds that these taxes unfairly punish rural communities that are already disadvantaged and marginalized compared with the urban centres of economic and political power. Hence, the article argues that underlying resentments at unequal treatment by the government are an important reason for rural backlashes against carbon taxes.

We will use the simplified version of the dataset from the article to explore the empirical support for the authors’ argument. We will also learn about information provision survey experiments and how they can be utilized to test causal arguments about what drives people’s beliefs and policy preferences.

For this part of the project, we will add to the dataset that you worked on for Part 1 of the project. We will only use data from the rural respondents in the dataset, as this is the subgroup that we are interested in investigating further.

Before you begin the tasks below, look at the Data dictionary tab in the original Excel file and familiarize yourself with how the survey measures respondents’ perceptions of unequal treatment by the government and respondents’ perceptions of the unfairness of carbon taxes. Think about how we can interpret high values for these two variables.

We start by selecting all the data for the rural respondents in the control group. Create a new dataframe called rural_control that only contains data for rural respondents in the control group.

Create the following tables:

a table showing how average carbon tax unfairness perceptions (carbon_tax_unfairness) differ for rural respondents in the control group who perceive a high degree of unequal treatment (8 or above on the 0–10 scale) compared with those who do not (Hint: Use the dummy variable for unequal treatment that you created in Part 1 Question 5 of the project.)

a table showing how average carbon tax support differs for rural respondents in the control group who perceive a lot of unequal treatment (8 or above on the 0–10 scale) and those who do not.

R walk-through 7 Creating summary tables on a subset of data

First, we use filter to select only the rows in the Control dataframe that have rural respondents, and call this dataframe rural_control.

rural_control <- Control %>%
  filter(rural == 1)

To create the summary tables, we use the group_by, filter, is.na, and summarise functions just like in R walk-through 6 from Part 1. Within the mean function, the na.rm = T option tells R to exclude any missing data from the calculation.

# Perception of carbon tax unfairness among rural respondents, by unequal treatment perceptions (1 if 8 or above)
rural_control %>%
  group_by(unequal_treatment_dummy) %>%
  filter(!is.na(unequal_treatment_dummy))%>%
  filter(!is.na(carbon_tax_unfairness)) %>%
  summarise(mean_unfairness = mean(carbon_tax_unfairness, na.rm = T))

## # A tibble: 2 × 2
##   unequal_treatment_dummy mean_unfairness
##                     <dbl>           <dbl>
## 1                       0            5.13
## 2                       1            6.20

# Support for carbon taxation among rural respondents, by unequal treatment perceptions (1 if 8 or above)
rural_control %>%
  group_by(unequal_treatment_dummy) %>%
  filter(!is.na(unequal_treatment_dummy))%>%
  filter(!is.na(carbon_tax_support_dummy)) %>%
  summarise(mean_carbon_tax_support_dummy = mean(carbon_tax_support_dummy, na.rm = T))

## # A tibble: 2 × 2
##   unequal_treatment_dummy mean_carbon_tax_support_dummy
##                     <dbl>                         <dbl>
## 1                       0                         0.472
## 2                       1                         0.405
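To describe the size of the gaps between the two groups, it can help to compute the differences in the conditional means. The sketch below simply copies the four rounded values from the tables above, so the results inherit that rounding; the variable names are hypothetical.

```r
# Conditional means copied from the two tables above (rounded values)
unfairness_low  <- 5.13   # unequal_treatment_dummy = 0
unfairness_high <- 6.20   # unequal_treatment_dummy = 1
support_low     <- 0.472  # unequal_treatment_dummy = 0
support_high    <- 0.405  # unequal_treatment_dummy = 1

# Difference in average unfairness perceptions (points on the survey scale)
diff_unfairness <- unfairness_high - unfairness_low

# Difference in average support, expressed in percentage points
diff_support_pp <- (support_high - support_low) * 100

diff_unfairness
diff_support_pp
```

A positive first difference and a negative second difference indicate that rural respondents who perceive high unequal treatment rate carbon taxes as more unfair and support them less, on average.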

Use the tables from Question 2 to describe the relationship between perceptions of unequal treatment and a) carbon tax unfairness perceptions, and b) support for carbon taxation.

correlation: A measure of how closely related two variables are. Two variables are correlated if knowing the value of one variable provides information on the likely value of the other, for example high values of one variable being commonly observed along with high values of the other variable. Correlation can be positive or negative. It is negative when high values of one variable are observed with low values of the other. Correlation does not mean that there is a causal relationship between the variables. Example: When the weather is hotter, purchases of ice cream are higher. Temperature and ice cream sales are positively correlated. On the other hand, if purchases of hot beverages decrease when the weather is hotter, we say that temperature and hot beverage sales are negatively correlated.
causation: A direction from cause to effect, establishing that a change in one variable produces a change in another. While a correlation gives an indication of whether two variables move together (either in the same or opposite directions), causation means that there is a mechanism that explains this association. Example: We know that higher levels of CO₂ in the atmosphere lead to a greenhouse effect, which warms the Earth’s surface. Therefore we can say that higher CO₂ levels are the cause of higher surface temperatures.
information provision survey experiment: A research methodology where survey respondents are randomly assigned to receive different information. Researchers then look at how the information provided affects respondents’ beliefs and preferences. Information provision survey experiments are a useful tool for testing causal arguments about what drives people’s economic policy preferences.

Correlation and causation are distinct concepts: a correlation between two variables does not necessarily mean that there is a causal relationship between them (Part 1.3 of Empirical Project 1 discusses these concepts in more detail). So, we need more evidence to determine whether there is a causal relationship between the variables you summarized in Question 2.

In the article, the authors carry out an information provision survey experiment to examine the causal relationship between unequal treatment perceptions and lower support for carbon taxation.

The survey respondents were randomly assigned to the control or treatment group when beginning the survey. They first answered some questions about their demographic characteristics (such as ethnicity, level of education, and household income). After this, the treatment group was shown some information, whereas the control group was not. The survey then asked respondents about their beliefs and policy preferences related to carbon taxation. The full survey that respondents completed is included in the supplementary material for the article.

Since the treatment and control group were randomly assigned, any differences in beliefs and policy preferences between these two groups would reflect the effect of the information provided to the treatment group. The authors wanted to test whether perceptions of unequal treatment affected carbon tax support for rural respondents, so they provided information that would particularly strengthen perceptions of unequal treatment among rural respondents.
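The logic of random assignment can be illustrated with a small simulation. This is a hedged sketch, not part of the article’s analysis: the sample size, the age distribution, and the seed are all made up. It shows that when assignment is random, a pre-existing characteristic such as age ends up with a very similar average in both groups.

```r
set.seed(123)  # arbitrary seed so the simulation is reproducible

n <- 10000  # hypothetical number of respondents

# Simulated characteristic measured before treatment (made-up distribution)
age <- rnorm(n, mean = 45, sd = 15)

# Random assignment: each respondent has a 50% chance of being treated
treatment <- rbinom(n, size = 1, prob = 0.5)

mean_control <- mean(age[treatment == 0])
mean_treated <- mean(age[treatment == 1])

# With random assignment this difference should be close to zero
mean_treated - mean_control
```

Because the groups are balanced on characteristics like this one, a later difference in beliefs between the groups is unlikely to be explained by those characteristics.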

Figure 3 shows the information provided to respondents in the treatment group. It highlights the highly uneven distribution of government spending on transport (per person) across regions in the UK. London stands out as the region with the highest per capita government spending on transport by far. The level spent in London is almost double the amount spent across the whole UK. Crucially, London is the largest urban area in the UK and the seat of political power. The information therefore particularly highlights unequal treatment by the UK government along urban–rural lines.

Figure 3 The information provided to the treatment group in the experiment.

In the remaining tasks, we will follow the approach used in the article by comparing average values for our key variables for rural respondents in the treatment and control groups. Your results for Questions 4–9 will look similar to Figures 5 and 6 of the article, but not exactly alike: the authors control for other respondent characteristics (which is like taking a conditional mean), whereas your results will show the unconditional mean.

Create a new dataframe called rural that contains rural respondents in the control and treatment group.

Create the following column charts:

a chart showing how average unequal treatment perceptions differ for rural respondents in the control and treatment groups

a chart showing how average carbon tax unfairness perceptions differ for rural respondents in the control and treatment groups

a chart showing how average carbon tax support differs for rural respondents in the control and treatment groups.

R walk-through 8 Creating column charts

We use filter to create a new dataframe called rural that contains all rows that satisfy the condition rural == 1.
rural <- dat %>%
  filter(rural == 1)
To create the column charts, we first use the group_by and summarise functions to calculate the mean separately for each group (treatment and control). Then we ‘pipe’ this output into the ggplot function, using the factor function to tell R to treat the treatment variable as categorical, so that each group gets a separate column.
rural %>%
  group_by(treatment) %>%
  summarise(mean_unequal_treatment = mean(unequal_treatment, na.rm = T)) %>%
  ggplot(., aes(x = factor(treatment), y = mean_unequal_treatment, fill = treatment)) +
  geom_col(width = 0.3) +
  labs(
    title = "Average unequal treatment perceptions",
    x = "Treatment group",
    y = "Average perceived unequal treatment"
  ) +
  theme(legend.position = "none")

Figure 4 Average unequal treatment perceptions.

We repeat the same steps for unfairness perceptions (carbon_tax_unfairness) and carbon tax support (carbon_tax_support_dummy).
# Unfairness perception by treatment group
rural %>%
  group_by(treatment) %>%
  summarise(mean_carbon_tax_unfairness = mean(carbon_tax_unfairness, na.rm = T)) %>%
  ggplot(., aes(x = factor(treatment), y = mean_carbon_tax_unfairness, fill = treatment)) +
  geom_col(width = 0.3) +
  labs(
    title = "Average carbon tax unfairness perception",
    x = "Treatment group",
    y = "Average carbon tax unfairness perception"
  ) +
  theme(legend.position = "none")

# Carbon tax support by treatment group
rural %>%
  group_by(treatment) %>%
  summarise(mean_carbon_tax_support_dummy = mean(carbon_tax_support_dummy, na.rm = T)) %>%
  ggplot(., aes(x = factor(treatment), y = mean_carbon_tax_support_dummy, fill = treatment)) +
  geom_col(width = 0.3) +
  labs(
    title = "Average carbon tax support",
    x = "Treatment group",
    y = "Average carbon tax support"
  ) +
  theme(legend.position = "none")

Figure 5 Average carbon tax unfairness perception.

Figure 6 Average carbon tax support.
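In the charts above, the horizontal axis shows the raw codes 0 and 1. One possible refinement, which is not part of the original walk-through, is to convert treatment into a labelled factor before plotting, so that the axis reads ‘Control’ and ‘Treatment’ instead. The sketch below shows the conversion on a made-up vector; treatment_label is a hypothetical name.

```r
# Made-up treatment codes (0 = control, 1 = treatment)
treatment <- c(0, 1, 1, 0)

# Attach readable labels to the two codes
treatment_label <- factor(
  treatment,
  levels = c(0, 1),
  labels = c("Control", "Treatment")
)

treatment_label
levels(treatment_label)
```

In the plotting code, a column created this way (for example with mutate) could then be used in place of factor(treatment) inside aes.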

Use the arrange function (from the tidyverse package) to sort the treatment variable in ascending order. This will move all the observations for the control group to the top of the dataframe, and all the observations for the treatment group below.

We will now conduct a formal statistical test to assess how likely it is that the observed differences between the treatment and control group are due to chance (variation that naturally occurs when sampling from the whole UK population) or due to the information treatment (systematic changes in the treatment group’s beliefs and preferences).

p-value: The probability of observing data at least as extreme as the data collected if a particular hypothesis about the population is true. The p-value ranges from 0 to 1: the lower the probability (the lower the p-value), the less likely it is to observe the given data, and therefore the less compatible the data are with the hypothesis.

Calculate the p-value for the difference in means between the control and treatment groups for:

unequal treatment perceptions

carbon tax unfairness perceptions

carbon tax support.

R walk-through 9 Calculating p-values for differences in means

First, we use the arrange function to sort the data according to treatment.

rural <- rural %>%
  arrange(treatment, .by_group=TRUE)

head(rural$treatment)

## [1] 0 0 0 0 0 0

tail(rural$treatment)

## [1] 1 1 1 1 1 1

To calculate the p-value for unequal_treatment, we use pull to extract the data for that variable only, and create separate vectors to store the data for the control and treatment groups (p1_c and p1_t respectively). We then use these vectors as inputs in the t.test function. The t.test function stores the p-value as p.value.

# Create vectors for control and treatment group and run t test, unequal treatment perceptions
p1_c <- rural %>%
  filter(treatment == 0) %>%
  pull(unequal_treatment)

p1_t <- rural %>%
  filter(treatment == 1) %>%
  pull(unequal_treatment)

t1 <- t.test(x = p1_c, y = p1_t)
t1$p.value

## [1] 0.0233547
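
By default, t.test does not assume that the two groups have equal variances (this version is known as Welch's t-test). To see where the p-value comes from, the sketch below recomputes it by hand. The vectors control and treated are simulated for illustration and are not the survey data.

```r
# Simulated scores for two groups (not the survey data)
set.seed(1)
control <- rnorm(300, mean = 7.7, sd = 2.1)
treated <- rnorm(280, mean = 8.0, sd = 1.8)

# Standard error of the difference in means (variances not assumed equal)
v_c <- var(control) / length(control)
v_t <- var(treated) / length(treated)
se_diff <- sqrt(v_c + v_t)

# Welch t-statistic and its approximate degrees of freedom
t_stat <- (mean(control) - mean(treated)) / se_diff
df <- (v_c + v_t)^2 /
  (v_c^2 / (length(control) - 1) + v_t^2 / (length(treated) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_stat), df)

# Compare with the built-in function
p_builtin <- t.test(x = control, y = treated)$p.value
c(manual = p_manual, builtin = p_builtin)
```

The two values should agree, which shows that the p-value depends only on the difference in means, the spread within each group, and the group sizes.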

We repeat these steps for carbon_tax_unfairness and carbon_tax_support_dummy.

# t-test for carbon tax unfairness perceptions
p2_c <- rural %>%
  filter(treatment == 0) %>%
  pull(carbon_tax_unfairness)

p2_t <- rural %>%
  filter(treatment == 1) %>%
  pull(carbon_tax_unfairness)

t2 <- t.test(x = p2_c, y = p2_t)
t2$p.value

## [1] 0.08197306

# t-test for carbon tax support
p3_c <- rural %>%
  filter(treatment == 0) %>%
  pull(carbon_tax_support_dummy)

p3_t <- rural %>%
  filter(treatment == 1) %>%
  pull(carbon_tax_support_dummy)

t3 <- t.test(x = p3_c, y = p3_t)
t3$p.value
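
Repeating the filter and pull steps for each variable makes it easy to reuse the wrong vectors by accident. One way to reduce that risk is to loop over the variable names. The following is a sketch only: p_value_by_group is a hypothetical helper, and toy is simulated data standing in for rural.

```r
# Toy data standing in for `rural` (simulated, for illustration only)
set.seed(2)
toy <- data.frame(
  treatment = rep(c(0, 1), each = 50),
  unequal_treatment = rnorm(100, mean = 8, sd = 2),
  carbon_tax_unfairness = rnorm(100, mean = 7, sd = 2),
  carbon_tax_support_dummy = rbinom(100, size = 1, prob = 0.4)
)

outcomes <- c("unequal_treatment", "carbon_tax_unfairness",
              "carbon_tax_support_dummy")

# p_value_by_group() is a hypothetical helper: it builds the formula
# `outcome ~ treatment` and returns the p-value of the two-sample t-test
p_value_by_group <- function(outcome, data) {
  t.test(reformulate("treatment", response = outcome), data = data)$p.value
}

# One p-value per outcome, named after the variable it belongs to
p_values <- sapply(outcomes, p_value_by_group, data = toy)
p_values
```

Because each p-value is labelled with its variable name, it is easier to spot a result that has been attached to the wrong outcome.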

How do the p-values differ across the three variables? What can this tell us about the statistical significance of the treatment effects found in the experiment? (In other words, how likely is it that the observed differences between treatment and control groups are due to chance?) (Hint: see the discussion on interpreting p-values in Part 2.3 of Empirical Project 2.)

Extension: Calculate a 95% confidence interval for each of the variables in Question 7 and create a new chart showing the differences in means with their corresponding confidence intervals. (You can either show all three outcomes on the same chart, with carbon tax support expressed as a proportion instead of a percentage, or you can make three separate charts.) Provide an interpretation of these confidence intervals and compare them across the three variables.

R walk-through 10 Calculating confidence intervals and adding them to a chart

We will use unequal_treatment as an example; the steps for the other variables are exactly the same (just change the variable name in the code below).

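The t.test function calculates confidence intervals automatically, along with other information (including the p-value, which we used in R walk-through 9). The interval is stored as conf.int, and indexing it with [1:2] extracts the lower and upper bounds. Because we passed the control group as x and the treatment group as y, the interval refers to the difference in means (control minus treatment).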
# Confidence interval for unequal_treatment
t1$conf.int[1:2]
## [1] -0.6670331 -0.0487137
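
The sketch below illustrates two properties of this interval: it is centred on the difference in sample means, and a 95% interval excludes zero exactly when the p-value is below 0.05. The data are simulated for illustration and are not the survey data.

```r
# Simulated data (not the survey data)
set.seed(3)
control <- rnorm(300, mean = 7.7, sd = 2.1)
treated <- rnorm(280, mean = 8.1, sd = 1.8)

tt <- t.test(x = control, y = treated)

# Difference in sample means, in the same order as the interval (x minus y)
diff_means <- mean(control) - mean(treated)

ci <- tt$conf.int[1:2]
ci

# The interval is centred on the difference in means
all.equal(mean(ci), diff_means)

# A 95% interval excludes zero exactly when the p-value is below 0.05
excludes_zero <- ci[1] > 0 || ci[2] < 0
c(excludes_zero = excludes_zero, p_below_0.05 = tt$p.value < 0.05)
```

The confidence interval and the p-value therefore give consistent answers about statistical significance at the 5% level, while the interval also shows the range of plausible effect sizes.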

standard error: A measure of the degree to which the sample mean deviates from the population mean. It is calculated by dividing the standard deviation of the sample by the square root of the number of observations.

To plot the confidence intervals, R needs the amount to add to and subtract from each mean, rather than the interval endpoints themselves. The easiest way is to calculate the standard error of the sample mean and multiply it by 1.96 (m_err = $1.96 \times \text{standard deviation}/\sqrt{\text{number of observations}}$), where 1.96 is the factor required to get a 95% confidence interval (assuming a normal distribution). The confidence interval is then [mean_m − m_err, mean_m + m_err].
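
As a rough illustration of this calculation, the sketch below computes the margin of error for one simulated group and compares the resulting interval with the one reported by a one-sample t.test, which uses a factor from the t distribution instead of 1.96. The vector scores is simulated and is not the survey data.

```r
# Simulated scores for one group (not the survey data)
set.seed(4)
scores <- rnorm(300, mean = 8, sd = 2)

n <- length(scores)
se <- sd(scores) / sqrt(n)   # standard error of the sample mean

# Margin of error using the normal approximation (factor 1.96)
m_err <- 1.96 * se
ci_normal <- c(mean(scores) - m_err, mean(scores) + m_err)

# The exact factor from the t distribution is slightly larger than 1.96
t_factor <- qt(0.975, df = n - 1)
ci_t <- mean(scores) + c(-1, 1) * t_factor * se

# A one-sample t.test reports the t-based interval
ci_builtin <- t.test(scores)$conf.int[1:2]

rbind(ci_normal, ci_t, ci_builtin)
```

With a few hundred observations the two factors are very close, so the normal approximation gives almost the same interval as the t-based one.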

We use group_by and summarise to create a summary table showing the number of observations, the mean, the standard deviation, and the margin of error (1.96 times the standard error) for the treatment and control groups. We arrange the rows by the mean of unequal_treatment (desc sorts from highest to lowest), and save the final result in table_stats.
table_stats <- rural %>%
  group_by(treatment) %>%
  summarise(obs = length(unequal_treatment),
    mean_m = mean(unequal_treatment, na.rm = TRUE),
    sd_m = sd(unequal_treatment, na.rm = TRUE),
    # Margin of error; dividing by (obs - 1) instead of obs makes
    # almost no difference in samples of this size
    m_err = 1.96 * sqrt(sd_m^2 / (obs - 1))) %>%
  arrange(desc(mean_m))

table_stats
## # A tibble: 2 × 5
##   treatment   obs mean_m  sd_m m_err
##       <dbl> <int>  <dbl> <dbl> <dbl>
## 1         1   290   8.05  1.82 0.210
## 2         0   323   7.69  2.08 0.227
Now we can use this information to make a bar chart.
ggplot(table_stats, aes(y = mean_m, x = treatment)) +
  # Draw the bars with black outlines
  geom_bar(stat = "identity", colour = "black", width = .5,
    position = position_dodge()) +
  ylab("Mean of unequal treatment perceptions") + xlab("") +
  # Add the confidence intervals as thin error bars
  geom_errorbar(aes(ymin = mean_m - m_err, ymax = mean_m + m_err),
    size = .6, width = .2, position = position_dodge(.2))

Figure 7 Mean of unequal treatment perceptions, with 95% confidence intervals.

Information provision survey experiments are an increasingly widely used research methodology in economics. The article ‘Designing information provision experiments’ by Haaland et al. (2023) in the Journal of Economic Literature reviews the existing literature and discusses how best to design this type of experiment. Use this article to answer the following questions:

What are some of the strengths and weaknesses of information provision survey experiments?

If you were going to re-run the experiment in Hope et al. (2026), what changes would you make to improve the experimental design?

Use a generative-AI tool to (i) find some strengths and weaknesses of survey experiments that are not mentioned in the Haaland et al. (2023) article and (ii) critique the design of the Hope et al. (2026) experiment. Use the answers provided by the AI tool to revise and enhance your answers to Questions 10(a) and 10(b).
