Daily covid-19 deaths compared to average deaths over the last 10 years
In this blogpost we try to get an idea of how many extra deaths Belgium has due to covid-19, compared to the average number of deaths per day over the last 10 years.
# Import pandas for data wrangling and Altair for plotting
import pandas as pd
import altair as alt
The number of deaths per day from 2008 until 2018 can be obtained from Statbel, the Belgian federal bureau of statistics:
df = pd.read_excel('https://statbel.fgov.be/sites/default/files/files/opendata/bevolking/TF_DEATHS.xlsx')
# Take a quick look at the data
df.head()
df['Jaar'] = df['DT_DATE'].dt.year
df['Dag'] = df['DT_DATE'].dt.dayofyear
df_plot = df.groupby('Dag')['MS_NUM_DEATHS'].mean().to_frame().reset_index()
# Let's make a quick plot
alt.Chart(df_plot).mark_line().encode(x='Dag', y='MS_NUM_DEATHS').properties(width=600)
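One caveat with grouping on `dt.dayofyear`: in a leap year every date after 28 February gets a day number one higher than in other years, so the per-day averages mix slightly different calendar dates. The sketch below uses a made-up constant series (not the Statbel data) to show the same grouping and the leap-year shift. The column names mirror the ones above; the values and the name `toy` are assumptions for illustration only.

```python
import pandas as pd

# Synthetic stand-in for the Statbel table: one row per calendar day.
# 2011 is a normal year, 2012 is a leap year. The value 300 is made up.
dates = pd.date_range('2011-01-01', '2012-12-31', freq='D')
toy = pd.DataFrame({'DT_DATE': dates, 'MS_NUM_DEATHS': 300})
toy['Dag'] = toy['DT_DATE'].dt.dayofyear

# Average per day of the year, same pattern as the groupby above
toy_mean = toy.groupby('Dag')['MS_NUM_DEATHS'].mean().reset_index()

# 1 March gets a different day number in a leap year
dag_2011 = pd.Timestamp('2011-03-01').dayofyear
dag_2012 = pd.Timestamp('2012-03-01').dayofyear
print(dag_2011, dag_2012)

# Day 366 only exists in leap years, so it is averaged over fewer years
print(len(toy_mean))
```

For a first impression this shift of one day hardly matters, but it is worth keeping in mind when reading individual days off the chart.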
The Johns Hopkins University CSSE keeps track of the number of covid-19 deaths per day and country in a GitHub repository: https://github.com/CSSEGISandData/COVID-19. We can easily obtain this data by reading it from GitHub and selecting the rows for Belgium.
deaths_url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
deaths = pd.read_csv(deaths_url, sep=',')
Keep only the rows for Belgium
deaths_be = deaths[deaths['Country/Region'] == 'Belgium']
Inspect how the data is stored
deaths_be
Create a dataframe for plotting
df_deaths = pd.DataFrame(data={'Datum':pd.to_datetime(deaths_be.columns[4:]), 'Overlijdens':deaths_be.iloc[0].values[4:]})
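The JHU file is stored in a "wide" layout: four metadata columns followed by one column per date. The line above slices the columns by position; an alternative that some may find easier to read is `DataFrame.melt`. The sketch below runs on a tiny made-up row (the counts and coordinates are not real data) and `tidy` is a hypothetical name.

```python
import pandas as pd

# Synthetic row in the JHU "wide" layout: four metadata columns,
# then one column per date holding the cumulative count (made-up numbers).
wide = pd.DataFrame({
    'Province/State': [None],
    'Country/Region': ['Belgium'],
    'Lat': [50.8],
    'Long': [4.5],
    '3/10/20': [0],
    '3/11/20': [3],
    '3/12/20': [3],
})

# Turn the date columns into rows: one row per date
id_cols = ['Province/State', 'Country/Region', 'Lat', 'Long']
tidy = wide.melt(id_vars=id_cols, var_name='Datum', value_name='Overlijdens')

# The column labels use month/day/two-digit year
tidy['Datum'] = pd.to_datetime(tidy['Datum'], format='%m/%d/%y')
tidy = tidy[['Datum', 'Overlijdens']]
print(tidy)
```

Naming the metadata columns explicitly avoids relying on the position `[4:]`, which would silently break if a column were added to the file.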
Check for NaNs
df_deaths['Overlijdens'].isna().sum()
We need to do some type conversions. We cast 'Overlijdens' to integer. Next, we add the day of the year.
df_deaths['Overlijdens'] = df_deaths['Overlijdens'].astype(int)
df_deaths['Dag'] = df_deaths['Datum'].dt.dayofyear
Plot the average number of deaths per day of the year (2008-2018):
dead_2008_2018 = alt.Chart(df_plot).mark_line().encode(x='Dag', y='MS_NUM_DEATHS')
dead_2008_2018
The JHU series is cumulative, so we calculate the day-by-day change to get the new deaths per day
df_deaths['Nieuwe covid-19 Sterfgevallen'] = df_deaths['Overlijdens'].diff()
# Check types
df_deaths.info()
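A small sketch of what `diff()` does to a cumulative series, using made-up numbers. It shows why the first value becomes NaN and why the column turns into floats. Filling the first value with the first cumulative count is one possible choice, not something the analysis above does.

```python
import pandas as pd

# Made-up cumulative counts for five consecutive days
cumulative = pd.Series([0, 3, 3, 10, 14], name='Overlijdens')

# diff() leaves NaN in the first position because there is no previous day;
# the NaN also forces the result to a float dtype
daily = cumulative.diff()
print(daily.dtype)

# One option: treat the first cumulative value as that day's count
daily_filled = daily.fillna(cumulative.iloc[0]).astype(int)
print(daily_filled.tolist())
```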
Plot covid-19 deaths in Belgium according to JHU CSSE. The plot shows a tooltip if you hover over the points.
dead_covid = alt.Chart(df_deaths).mark_line(point=True).encode(
    x=alt.X('Dag', scale=alt.Scale(domain=(1, 110), clamp=True)),
    y='Nieuwe covid-19 Sterfgevallen',
    color=alt.ColorValue('red'),
    tooltip=['Dag', 'Nieuwe covid-19 Sterfgevallen'])
dead_covid
Now we add the average number of deaths per day over the last 10 years to the plot.
dead_2008_2018 + dead_covid
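Besides overlaying the two lines, the comparison can be made numerically by joining both tables on the day of the year. The sketch below uses made-up numbers in the same column layout; `pct_of_average` is a hypothetical column name. It expresses the covid-19 deaths as a percentage of the historical daily average, which is only a rough indicator and not a formal excess-mortality estimate.

```python
import pandas as pd

# Made-up numbers, only to show the join; 'Dag' is the day of the year
baseline = pd.DataFrame({
    'Dag': [90, 91, 92],
    'MS_NUM_DEATHS': [300.0, 295.0, 290.0],
})
covid = pd.DataFrame({
    'Dag': [91, 92, 93],
    'Nieuwe covid-19 Sterfgevallen': [100, 150, 200],
})

# An inner join keeps only the days present in both tables
both = baseline.merge(covid, on='Dag', how='inner')

# covid-19 deaths as a percentage of the historical daily average
both['pct_of_average'] = (
    100 * both['Nieuwe covid-19 Sterfgevallen'] / both['MS_NUM_DEATHS']
)
print(both)
```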
Take a quick look at the data table:
df.head()
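If the column 'DT_DATE' was read as a string, it has to be converted to a datetime; pd.to_datetime does this and leaves existing datetimes unchanged. We store the result as 'Datum' so we can add it to the tooltip.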
df['Datum'] = pd.to_datetime(df['DT_DATE'])
Now we are prepared to make the final graph. We use Altair's mark_errorband(extent='ci') to bootstrap a 95% confidence band around the average number of deaths per day.
line = alt.Chart(df).mark_line().encode(
    x=alt.X('Dag', scale=alt.Scale(
        domain=(1, 120),
        clamp=True
    )),
    y='mean(MS_NUM_DEATHS)'
)
# Bootstrapped 95% confidence interval
band = alt.Chart(df).mark_errorband(extent='ci').encode(
    x=alt.X('Dag', scale=alt.Scale(domain=(1, 120), clamp=True)),
    y=alt.Y('MS_NUM_DEATHS', title='Deaths per day'),
)
dead_covid = alt.Chart(df_deaths).mark_line(point=True).encode(
    x=alt.X('Dag', scale=alt.Scale(domain=(1, 120), clamp=True)),
    y='Nieuwe covid-19 Sterfgevallen',
    color=alt.ColorValue('red'),
    tooltip=['Dag', 'Nieuwe covid-19 Sterfgevallen', 'Datum']
)
(band + line + dead_covid).properties(width=1024, title='Average number of deaths over 10 years versus deaths from covid-19 in Belgium')
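To make the band in the chart above less of a black box, here is a minimal sketch of a percentile bootstrap for the mean, written with NumPy on made-up counts for a single day of the year. It illustrates the general idea behind a bootstrapped 95% confidence interval; the exact resampling details inside Altair/Vega-Lite may differ, and the names `counts`, `n_boot` and `boot_means` are assumptions.

```python
import numpy as np

# Fixed seed so the sketch is reproducible
rng = np.random.default_rng(0)

# Made-up death counts for one day of the year across 11 years
counts = np.array([310, 295, 320, 305, 290, 300, 315, 298, 307, 312, 301])

# Resample the years with replacement and record the mean of each resample
n_boot = 2000
samples = rng.choice(counts, size=(n_boot, counts.size), replace=True)
boot_means = samples.mean(axis=1)

# Percentile bootstrap: the middle 95% of the resampled means
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(round(lower, 1), round(upper, 1))
```

Note that this is an interval for the mean, so the band is narrower than the spread of the individual yearly values: a single year can fall outside the band without being unusual.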
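Sciensano, the Belgian institute for health, also publishes covid-19 mortality figures. We load them and sum the deaths per day:
df_sc = pd.read_csv('https://epistat.sciensano.be/Data/COVID19BE_MORT.csv')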
df_sc.head()
df_dead_day = df_sc.groupby('DATE')['DEATHS'].sum().reset_index()
df_dead_day['Datum'] = pd.to_datetime(df_dead_day['DATE'])
df_dead_day['Dag'] = df_dead_day['Datum'].dt.dayofyear
line = alt.Chart(df).mark_line().encode(
    x=alt.X('Dag', title='Day of the year', scale=alt.Scale(
        domain=(1, 120),
        clamp=True
    )),
    y='mean(MS_NUM_DEATHS)'
)
# Bootstrapped 95% confidence interval
band = alt.Chart(df).mark_errorband(extent='ci').encode(
    x=alt.X('Dag', scale=alt.Scale(domain=(1, 120), clamp=True)),
    y=alt.Y('MS_NUM_DEATHS', title='Deaths per day'),
)
dead_covid = alt.Chart(df_dead_day).mark_line(point=True).encode(
    x=alt.X('Dag', scale=alt.Scale(domain=(1, 120), clamp=True)),
    y='DEATHS',
    color=alt.ColorValue('red'),
    tooltip=['Dag', 'DEATHS', 'Datum']
)
(band + line + dead_covid).properties(width=750, title='Average number of deaths over 10 years versus deaths from covid-19 in Belgium')
Obviously, the data from 16, 17 and 18 April 2020 is not final yet. Also, the numbers are smaller than those from JHU.
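To put a number on the gap between the two sources, the daily series can be aligned on date and subtracted. The sketch below uses made-up counts (not the real JHU or Sciensano figures); the column names `jhu`, `sciensano` and `difference` are hypothetical.

```python
import pandas as pd

# Made-up daily counts standing in for the two sources
days = pd.to_datetime(['2020-04-14', '2020-04-15', '2020-04-16'])
jhu = pd.DataFrame({'Datum': days, 'jhu': [250, 280, 310]})
sciensano = pd.DataFrame({'Datum': days, 'sciensano': [240, 230, 120]})

# An outer join keeps days that appear in only one of the sources
compare = jhu.merge(sciensano, on='Datum', how='outer').sort_values('Datum')

# Positive values mean the first source reports more deaths for that day
compare['difference'] = compare['jhu'] - compare['sciensano']
print(compare)
```

A gap that widens towards the most recent days would be consistent with the latest figures still being incomplete.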
df_tot_sc = pd.read_excel('https://epistat.sciensano.be/Data/COVID19BE.xlsx')
df_tot_sc
We know that there are a lot of regional differences:
df_plot = df_tot_sc.groupby(['DATE', 'PROVINCE'])['CASES'].sum().reset_index()
df_plot
df_plot['DATE'] = pd.to_datetime(df_plot['DATE'])
base = alt.Chart(df_plot, title='Number of cases in Belgium per day and province').mark_line(point=True).encode(
    x=alt.X('DATE:T', title='Date'),
    y=alt.Y('CASES', title='Cases per day'),
    color='PROVINCE',
    tooltip=['DATE', 'CASES', 'PROVINCE']
).properties(width=600)
base
From the above graph we see a much lower number of cases in Luxembourg, Namur and Walloon Brabant.
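The same comparison can be read from a table instead of a chart by pivoting the grouped data to one column per province and summing over the period. The sketch below uses made-up rows in the same shape as the grouped table; the province labels and counts are illustrative only. Keep in mind that absolute counts are not adjusted for population size.

```python
import pandas as pd

# Made-up rows in the shape of the grouped table above
toy = pd.DataFrame({
    'DATE': pd.to_datetime(
        ['2020-04-01', '2020-04-01', '2020-04-02', '2020-04-02']),
    'PROVINCE': ['Antwerpen', 'Namur', 'Antwerpen', 'Namur'],
    'CASES': [120, 20, 140, 25],
})

# One row per day, one column per province
per_province = toy.pivot_table(
    index='DATE', columns='PROVINCE', values='CASES', aggfunc='sum')

# Total over the period, largest first
totals = per_province.sum().sort_values(ascending=False)
print(totals)
```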