This project dives into data about OPEC (Organization of the Petroleum Exporting Countries) crude oil production and OECD (Organization for Economic Co-operation and Development) countries from 1960 to 2022. Including providing political context of OPEC and OECD, exploring their respective roles and significance in the global oil trade. The data source for this project is from the official OPEC website, which includes data such as crude oil production, demand, spot prices, refinery throughput, refinery capacity, and OPEC production quotas by country.
Founded in 1960, OPEC comprises a group of petroleum-exporting nations. OPEC was created with the primary aim of asserting collective control over their oil resources and global oil trade. OPEC's founding members included Iran, Iraq, Kuwait, Saudi Arabia, and Venezuela. Over the years, the organization has expanded to include several other member nations, totaling 13.
It is important to focus on OPEC member nations due to their global importance as petroleum exporters. Alongside OECD countries, who are reliant on oil imports to meet their energy needs, and have historically been affected by changes in OPEC production quotas and embargoes. This by fixing production and suppliers, OPECs behaves like as cartel-style supplier.
# Installing library requirements.
!pip install pandas numpy scipy matplotlib sklearn --quiet
DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621 DEPRECATION: Configuring installation scheme with distutils config files is deprecated and will no longer work in the near future. If you are using a Homebrew or Linuxbrew Python, please see discussion at https://github.com/Homebrew/homebrew-core/issues/76621 WARNING: There was an error checking the latest version of pip.
# Import library requirements.
import pandas as pd
import warnings
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import zscore
import re
from pandas.plotting import scatter_matrix
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.impute import SimpleImputer
from sklearn.metrics import precision_score, recall_score
warnings.simplefilter(action='ignore', category=FutureWarning)
While OPEC consists of major oil-producing nations such as Saudi Arabia, Iran, Iraq, and Venezuela, among others. OECD comprises economically advanced nations, primarily in Europe and North America, which heavily rely on oil imports to fulfill their energy requirements. Currently there are 38 member countries in OECD including the United States and United Kingdom.
The primary difference between OPEC and free market nation in OECD is how OPEC acts as a cartel-style supplier. In economics, a cartel supplier is defined "a formal agreement between a group of producers of a good or service to control supply or to regulate or manipulate prices" (investopedia.com).
# List of OPEC countries in 2023.
opec_countries = pd.read_csv('data/countries/opec.csv', header=None, squeeze=True)
opec_countries.head(13)
0 Algeria 1 Angola 2 Equatorial Guinea 3 Gabon 4 I.R.Iran 5 Iraq 6 Kuwait 7 Libya 8 Nigeria 9 Congo 10 Saudi Arabia 11 United Arab Emirates 12 Venezuela Name: 0, dtype: object
# List of OECD countries in 2023.
oecd_countries = pd.read_csv('data/countries/oecd.csv', header=None, squeeze=True)
oecd_countries.head()
0 Australia 1 Austria 2 Belgium 3 Canada 4 Chile Name: 0, dtype: object
This imports crude oil production and demand for countries world wide (including all OPEC members) from 1960 to 2020. Crude Oil refers to the fossil fuel that exits in Earths geological formations. They will be stored in the project as Pandas DataFrames, along with a third DataFrame that describes countries domestic crude oil demand deficits per year.
$\text{Deficit}_{\text{year}} = \text{Production}_{\text{year}} - \text{Consumption}_{\text{year}}$
# Importing oil production dataset.
oil_production_df = pd.read_csv("data/opec_upstream/world_oil_production.csv", index_col="Index")
oil_production_df.replace('na', np.nan, inplace=True)
oil_production_df = oil_production_df.astype(float)
# Replacing incorrectly labeled countries.
replace = {'IR Iran': 'I.R.Iran', 'Saudi Arabia1': 'Saudi Arabia', 'Kuwait1': 'Kuwait', 'Syrian Arab Rep.': 'Syria', 'Uzbekistan`': 'Uzbekistan'}
oil_production_df = oil_production_df.rename(index=replace)
# Importing oil demand dataset.
oil_demand_df = pd.read_csv("data/opec_downstream/world_oil_demand.csv", index_col="Index")
oil_demand_df.replace('na', np.nan, inplace=True)
oil_demand_df = oil_demand_df.astype(float)
# Merging oil production and oil demand datasets; One-to-one matching needed to compute oil deficit.
oil_deficit_df = oil_demand_df.merge(oil_production_df, left_index=True, right_index=True, how='inner')
# Iterating over years of oil production and oil demand to compute oil deficit.
for year in range(1960, 2023):
year_x, year_y = f"{year}_x", f"{year}_y"
oil_deficit_df[year] = oil_deficit_df[year_x] - oil_deficit_df[year_y]
oil_deficit_df = oil_deficit_df.drop(columns=[year_x, year_y])
# Transposing DataFrames to set years as index.
oil_production_df = oil_production_df.transpose()
oil_demand_df = oil_demand_df.transpose()
oil_deficit_df = oil_deficit_df.transpose()
# Converting year index to datetime object.
oil_production_df.index = pd.to_datetime(oil_production_df.index, format='%Y')
oil_demand_df.index = pd.to_datetime(oil_demand_df.index, format='%Y')
oil_deficit_df.index = pd.to_datetime(oil_deficit_df.index, format='%Y')
# Display DataFrame
oil_deficit_df.head()
Index | Canada | Chile | Mexico | United States | United Kingdom | Australia | China | India | Indonesia | Malaysia | ... | Egypt | Equatorial Guinea | Gabon | Libya | Nigeria | Russia | Azerbaijan | Kazakhstan | OPEC | OECD |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1960-01-01 | 290.164 | 24.395 | -181.090 | 3189.7 | 918.239 | 269.000 | 105.50 | 143.806 | -309.6 | 9.0 | ... | 37.684 | 0.0 | -12.4 | 4.0 | -1.4 | -1010.1 | NaN | -4.1 | -7773.752195 | 7966.45244 |
1961-01-01 | 213.230 | 23.284 | -181.544 | 3213.1 | 969.721 | 287.000 | 100.85 | 153.615 | -313.3 | 16.0 | ... | 28.730 | 0.0 | -11.9 | -14.2 | -27.0 | -1214.3 | NaN | -5.8 | -8407.232041 | 8562.18744 |
1962-01-01 | 194.771 | 20.082 | -181.161 | 3522.0 | 1063.617 | 297.000 | 96.55 | 138.397 | -328.4 | 22.0 | ... | 13.908 | 0.0 | -13.4 | -176.3 | -50.5 | -1398.8 | NaN | -7.4 | -9527.366548 | 9691.92063 |
1963-01-01 | 213.598 | 19.499 | -154.292 | 3947.3 | 1155.410 | 336.000 | 120.15 | 143.370 | -315.0 | 29.0 | ... | -1.710 | 0.0 | -13.7 | -435.8 | -54.5 | -1488.7 | NaN | -7.2 | -10522.296582 | 11105.58335 |
1964-01-01 | 234.151 | 21.190 | -120.369 | 4106.7 | 1276.202 | 349.163 | 138.35 | 131.591 | -318.6 | 36.0 | ... | -14.546 | 0.0 | -17.0 | -855.4 | -96.2 | -1671.7 | NaN | 1.7 | -11910.261359 | 12455.20195 |
5 rows × 37 columns
Because our two datasets do not have a perfect one-to-one matching, deficits cannot be computed for every country. This mean exploratory data analysis will be for all OPEC member countries and a limited number of OECD member countries. In addition here are the missing years for each country.
print('Countries in oil production DataFrame:', len(oil_production_df.columns))
print('Countries in oil demand DataFrame:', len(oil_demand_df.columns))
print('Countries in oil deficit DataFrame:', len(oil_deficit_df.columns))
oil_deficit_df.isna().sum()
Countries in oil production DataFrame: 48 Countries in oil demand DataFrame: 61 Countries in oil deficit DataFrame: 37
Index Canada 0 Chile 0 Mexico 0 United States 0 United Kingdom 0 Australia 0 China 0 India 0 Indonesia 0 Malaysia 0 Thailand 0 Vietnam 0 Argentina 0 Brazil 0 Colombia 0 Ecuador 0 Venezuela 0 I.R.Iran 0 Iraq 0 Kuwait 0 Qatar 0 Saudi Arabia 0 Syria 0 United Arab Emirates 0 Algeria 0 Angola 0 Congo 0 Egypt 0 Equatorial Guinea 0 Gabon 0 Libya 0 Nigeria 0 Russia 0 Azerbaijan 30 Kazakhstan 0 OPEC 0 OECD 0 dtype: int64
# Function graphs oil production, oil demand and oil deficit for a given list of countries.
def line_plot_production(cols, title=None):
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 8))
oil_production_df[cols].plot(kind='line', legend=False, ax=ax1)
ax1.set_xlabel('Year')
ax1.set_ylabel('Crude Oil (1000 barrels/day)')
ax1.set_title('Domestic Production')
oil_demand_df[cols].plot(kind='line', legend=False, ax=ax2)
ax2.set_xlabel('Year')
ax2.set_ylabel('Crude Oil (1000 barrels/day)')
ax2.set_title('Domestic Demand')
oil_deficit_df[cols].plot(kind='line', legend=False, ax=ax3)
ax3.set_xlabel('Year')
ax3.set_ylabel('Crude Oil (1000 barrels/day)')
ax3.set_title('Domestic Deficit')
ax3.axhline(y=0, color='black', linestyle='--')
lines, labels = ax1.get_legend_handles_labels()
fig.legend(lines, labels, loc='center', bbox_to_anchor=(0.5, -0.1), ncol=3)
fig.suptitle(title, fontsize=20, y=1.02)
plt.tight_layout()
# Function plots the distributions of oil production, oil demand and oil deficit for a given list of countries.
def box_plot_production(cols, title=None):
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 8))
# Generating x-axis labels.
x_labels = list(oil_production_df.index.year)
x_labels = [label if i % 5 == 0 else '' for i, label in enumerate(x_labels)]
oil_production_df[cols].transpose().boxplot(ax=ax1, showfliers=False)
ax1.set_title('Domestic Production')
ax1.set_ylabel('Crude Oil (1000 barrels/day)')
ax1.set_xlabel('Year')
ax1.grid(False, axis='x')
ax1.set_xticklabels(x_labels)
ax1.set_ylim(0, 5000)
oil_demand_df[cols].transpose().boxplot(ax=ax2, showfliers=False)
ax2.set_title('Domestic Demand')
ax2.set_ylabel('Crude Oil (1000 barrels/day)')
ax2.set_xlabel('Year')
ax2.grid(False, axis='x')
ax2.set_xticklabels(x_labels)
ax2.set_ylim(0, 5000)
oil_deficit_df[cols].transpose().boxplot(ax=ax3, showfliers=False)
ax3.set_title('Domestic Deficit')
ax3.set_ylabel('Crude Oil (1000 barrels/day)')
ax3.set_xlabel('Year')
ax3.axhline(y=0, color='red', linestyle='--')
ax3.grid(False, axis='x')
ax3.set_xticklabels(x_labels)
fig.suptitle(title, fontsize=20, y=1.02)
plt.tight_layout()
top_oil_producing_countries = ["United States", "Saudi Arabia", "Russia", "Canada", "China", "OPEC", "OECD"]
line_plot_production(top_oil_producing_countries, title="Top Crude Oil Producing Countries Including OPEC and OECD Metrics")
box_plot_production(opec_countries, title="OPEC Member Countries Crude Oil Metrics")
# Collecting OECD countries with accessible data.
oecd_sample_countries = oecd_countries[oecd_countries.isin(oil_deficit_df.columns)]
box_plot_production(oecd_sample_countries, "Sample OECD Members Countries Crude Oil Metrics")
The visualization show how OPEC countries have a negative crude oil deficit, while the median sample of OECD have a positive crude oil deficit. It is also show how large economies United States (OECD member) Canada (OECD member), Saudi Arabia (OPEC member) and Russia (OEPC+ member) can influence the distributions.
This imports crude oil refinery utilization and capacities for countries world wide (including all OPEC members) from 1980 to 2022. A Oil Refinery processes crude oil into various refined products, including gasoline, diesel, and petrochemicals. They will be stored in the project as Pandas DataFrames, along with a third DataFrame that describes countries utilization rates of oil refineries per year.
$\text{Utilization}_{\text{year}} = \frac{\text{Refinery Throughput}_{\text{year}}}{\text{Refinery Capacity}_{\text{year}}}$
# Importing oil production dataset.
refinery_capacity_df = pd.read_csv("data/opec_downstream/world_refinery_capacity.csv", index_col="Index")
refinery_capacity_df.replace('na', np.nan, inplace=True)
refinery_capacity_df.replace('n.a.', np.nan, inplace=True)
refinery_capacity_df = refinery_capacity_df.astype(float)
# Replacing incorrectly labeled countries.
replace = {'United States2': 'United States',
'South Korea2': 'South Korea',
'Venezuela3': 'Venezuela',
'I.R.Iran2': 'I.R.Iran',
'Qatar2': 'Qatar',
'Saudi Arabia2': 'Saudi Arabia',
'United Arab Emirates2': 'United Arab Emirates',
'Algeria2': 'Algeria'
}
refinery_capacity_df = refinery_capacity_df.rename(index=replace)
# Importing oil demand dataset.
refinery_throughput_df = pd.read_csv("data/opec_downstream/world_refinery_throughput.csv", index_col="Index")
refinery_throughput_df.replace('na', np.nan, inplace=True)
refinery_throughput_df.replace('n.a.', np.nan, inplace=True)
refinery_throughput_df = refinery_throughput_df.astype(float)
# Merging oil production and oil demand datasets; One-to-one matching needed to compute oil deficit.
refinery_utilization_df = refinery_capacity_df.merge(refinery_throughput_df, left_index=True, right_index=True, how='inner')
# Iterating over years of oil production and oil demand to compute oil deficit.
for year in range(1980, 2023):
year_x, year_y = f"{year}_x", f"{year}_y"
refinery_utilization_df[year] = refinery_utilization_df[year_y] / refinery_utilization_df[year_x]
refinery_utilization_df = refinery_utilization_df.drop(columns=[year_x, year_y])
# Fix rates for countries where throughput do not align perfectly.
refinery_utilization_df = refinery_utilization_df.clip(upper=100)
# Transposing DataFrames to set years as index.
refinery_capacity_df = refinery_capacity_df.transpose()
refinery_throughput_df = refinery_throughput_df.transpose()
refinery_utilization_df = refinery_utilization_df.transpose()
# Converting year index to datetime object.
refinery_capacity_df.index = pd.to_datetime(refinery_capacity_df.index, format='%Y')
refinery_throughput_df.index = pd.to_datetime(refinery_throughput_df.index, format='%Y')
refinery_utilization_df.index = pd.to_datetime(refinery_utilization_df.index, format='%Y')
# Display.
refinery_utilization_df.head()
Index | Algeria | Angola | Argentina | Australia | Azerbaijan | Belarus | Belgium | Brazil | Bulgaria | Canada | ... | Spain | Thailand | Turkey | Turkmenistan | Ukraine | United Arab Emirates | United Kingdom | United States | Venezuela | Vietnam |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1980-01-01 | 0.479626 | 0.803859 | 0.703216 | 0.786999 | 0.663333 | 0.975641 | 0.633019 | 0.778894 | 0.851986 | 0.912803 | ... | 0.705460 | 0.864407 | 0.715847 | 0.669231 | 0.871139 | 0.800000 | 0.621653 | 0.781096 | 0.673269 | NaN |
1981-01-01 | 0.651528 | 0.778816 | 0.706140 | 0.736552 | 0.555556 | 0.982051 | 0.554717 | 0.752511 | 0.873646 | 0.848464 | ... | 0.659634 | 0.892655 | 0.748634 | 0.669231 | 0.842012 | 0.414815 | 0.574662 | 0.747401 | 0.636820 | NaN |
1982-01-01 | 0.874363 | 0.654206 | 0.681287 | 0.753103 | 0.555556 | 0.983333 | 0.688705 | 0.729445 | 0.891697 | 0.747366 | ... | 0.628591 | 0.887006 | 0.721030 | 0.597544 | 0.848191 | 0.614815 | 0.597545 | 0.748339 | 0.691286 | NaN |
1983-01-01 | 0.802207 | 0.809969 | 0.661743 | 0.696552 | 0.556944 | 0.985897 | 0.639118 | 0.705226 | 0.909747 | 0.796965 | ... | 0.638374 | 0.920904 | 0.701717 | 0.597544 | 0.815159 | 0.611111 | 0.669634 | 0.730731 | 0.691286 | NaN |
1984-01-01 | 0.872241 | 0.872274 | 0.639586 | 0.701245 | 0.555556 | 0.984615 | 0.764415 | 0.749129 | 0.927798 | 0.788094 | ... | 0.634840 | 0.892655 | 0.800429 | 0.597544 | 0.782676 | 0.788889 | 0.721209 | 0.776245 | 0.667063 | NaN |
5 rows × 56 columns
Because our second datasets is larger then out first dataset, we do not have perfect one-to-one matching for our third dataset. In addition here are the missing years for each country.
print('Countries in oil refinery throughput DataFrame:', len(refinery_throughput_df.columns))
print('Countries in oil refinery capacity:', len(refinery_capacity_df.columns))
print('Countries in oil refinery utilization:', len(refinery_utilization_df.columns))
refinery_utilization_df.isna().sum()
Countries in oil refinery throughput DataFrame: 76 Countries in oil refinery capacity: 56 Countries in oil refinery utilization: 56
Index Algeria 0 Angola 0 Argentina 0 Australia 0 Azerbaijan 0 Belarus 0 Belgium 0 Brazil 0 Bulgaria 0 Canada 0 Chile 0 China 0 Colombia 0 Congo 2 Croatia 0 Ecuador 0 Egypt 0 France 0 Gabon 0 Germany 0 I.R.Iran 0 India 0 Indonesia 0 Iraq 0 Italy 0 Japan 0 Kazakhstan 0 Kuwait 0 Libya 0 Malaysia 0 Mexico 0 Netherlands 0 New Zealand 0 Nigeria 0 OECD 0 OPEC 0 Pakistan 0 Philippines 0 Poland 0 Qatar 0 Romania 0 Russia 0 Saudi Arabia 0 Singapore 0 South Africa 0 South Korea 0 Spain 0 Thailand 0 Turkey 0 Turkmenistan 0 Ukraine 0 United Arab Emirates 0 United Kingdom 0 United States 0 Venezuela 0 Vietnam 14 dtype: int64
# Function plots refinery data.
def box_plot_refinery(cols, title=None):
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(24, 8))
# Generating x-axis labels.
x_labels = list(refinery_utilization_df.index.year)
x_labels = [label if i % 5 == 0 else '' for i, label in enumerate(x_labels)]
refinery_throughput_df[cols].transpose().boxplot(ax=ax1, showfliers=False)
ax1.set_title('Total Throughput')
ax1.set_ylabel('Crude Oil (1000 barrels/day)')
ax1.set_xlabel('Year')
ax1.grid(False, axis='x')
ax1.set_xticklabels(x_labels)
ax1.set_ylim(0, 4000)
refinery_capacity_df[cols].transpose().boxplot(ax=ax2, showfliers=False)
ax2.set_title('Total Capacity')
ax2.set_ylabel('Crude Oil (1000 barrels/day)')
ax2.set_xlabel('Year')
ax2.grid(False, axis='x')
ax2.set_xticklabels(x_labels)
ax2.set_ylim(0, 4000)
refinery_utilization_df[cols].transpose().boxplot(ax=ax3, showfliers=False)
ax3.set_title('Utilization')
ax3.set_ylabel('Rate (%)')
ax3.set_xlabel('Year')
ax3.grid(False, axis='x')
ax3.set_xticklabels(x_labels)
ax3.set_ylim(0, 1)
fig.suptitle(title, fontsize=20, y=1.02)
plt.tight_layout()
opec_sample_countries = opec_countries[opec_countries.isin(refinery_utilization_df.columns)]
box_plot_refinery(opec_sample_countries, title="Sample OPEC Countries Oil Refinery Metrics")
oecd_sample_countries = oecd_countries[oecd_countries.isin(refinery_utilization_df.columns)]
box_plot_refinery(oecd_sample_countries, title="Sample OECD Countries Oil Refinery Metrics")