# Import pandas for data wrangling and Altair for plotting
import pandas as pd
import altair as alt

The number of deadths per day from 2008 until 2018 can obtained from Statbel, the Belgium federal bureau of statistics:

df = pd.read_excel('https://statbel.fgov.be/sites/default/files/files/opendata/bevolking/TF_DEATHS.xlsx') # , skiprows=5, sheet_name=sheetnames
# Get a quick look to the data
df.head()
DT_DATE MS_NUM_DEATHS
0 2008-01-01 342
1 2008-01-02 348
2 2008-01-03 340
3 2008-01-04 349
4 2008-01-05 348
df['Jaar'] = df['DT_DATE'].dt.year
df['Dag'] = df['DT_DATE'].dt.dayofyear
df_plot = df.groupby('Dag')['MS_NUM_DEATHS'].mean().to_frame().reset_index()
# Let's make a quick plot
alt.Chart(df_plot).mark_line().encode(x='Dag', y='MS_NUM_DEATHS').properties(width=600)

The John Hopkings University CSSE keeps track of the number of covid-19 deadths per day and country in a github repository: https://github.com/CSSEGISandData/COVID-19. We can easily obtain this data by reading it from github and filter out the cases for Belgium.

deaths_url =  'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
deaths = pd.read_csv(deaths_url, sep=',')

Filter out Belgium

deaths_be = deaths[deaths['Country/Region'] == 'Belgium']

Inspect how the data is stored

deaths_be
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 ... 4/9/20 4/10/20 4/11/20 4/12/20 4/13/20 4/14/20 4/15/20 4/16/20 4/17/20 4/18/20
23 NaN Belgium 50.8333 4.0 0 0 0 0 0 0 ... 2523 3019 3346 3600 3903 4157 4440 4857 5163 5453

1 rows × 92 columns

Create dateframe for plotting

df_deaths = pd.DataFrame(data={'Datum':pd.to_datetime(deaths_be.columns[4:]), 'Overlijdens':deaths_be.iloc[0].values[4:]})

Check for Nan's

df_deaths['Overlijdens'].isna().sum()
0

We need to do some type convertions. We cast 'Overlijdens' to integer. Next, we add the number of the day.

df_deaths['Overlijdens'] = df_deaths['Overlijdens'].astype(int)
df_deaths['Dag'] = df_deaths['Datum'].dt.dayofyear

Plot the data:

dead_2008_2018 = alt.Chart(df_plot).mark_line().encode(x='Dag', y='MS_NUM_DEATHS')
dead_2008_2018

Calculate the day-by-day change

df_deaths['Nieuwe covid-19 Sterfgevallen'] = df_deaths['Overlijdens'].diff()
# Check types
df_deaths.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88 entries, 0 to 87
Data columns (total 4 columns):
 #   Column                         Non-Null Count  Dtype         
---  ------                         --------------  -----         
 0   Datum                          88 non-null     datetime64[ns]
 1   Overlijdens                    88 non-null     int32         
 2   Dag                            88 non-null     int64         
 3   Nieuwe covid-19 Sterfgevallen  87 non-null     float64       
dtypes: datetime64[ns](1), float64(1), int32(1), int64(1)
memory usage: 2.5 KB

Plot covid-19 deaths in Belgium according to JHU CSSE. The plot shows a tooltip if you hover over the points.

dead_covid= alt.Chart(df_deaths).mark_line(point=True).encode(
    x=alt.X('Dag',scale=alt.Scale(domain=(1, 110), clamp=True)),
    y='Nieuwe covid-19 Sterfgevallen', 
    color=alt.ColorValue('red'), 
    tooltip=['Dag', 'Nieuwe covid-19 Sterfgevallen'])
dead_covid

Now we add average deaths per day in the last 10 year to the plot.

dead_2008_2018 + dead_covid

Take quick look to the datatable:

df.head()
DT_DATE MS_NUM_DEATHS Jaar Dag
0 2008-01-01 342 2008 1
1 2008-01-02 348 2008 2
2 2008-01-03 340 2008 3
3 2008-01-04 349 2008 4
4 2008-01-05 348 2008 5

The column 'DT_DATE' is a string. We convert it to a datatime so we can add it to the tooltip.

df['Datum'] = pd.to_datetime(df['DT_DATE'])

Now we are prepared to make the final graph. We use the Altair mark_errorband(extend='ci') to bootstrap 95% confidence band around the average number of deaths per day.

line = alt.Chart(df).mark_line().encode(
    x=alt.X('Dag', scale=alt.Scale(
            domain=(1, 120),
            clamp=True
        )),
    y='mean(MS_NUM_DEATHS)'
)

# Bootstrapped 95% confidence interval
band = alt.Chart(df).mark_errorband(extent='ci').encode(
    x=alt.X('Dag', scale=alt.Scale(domain=(1, 120), clamp=True)),
    y=alt.Y('MS_NUM_DEATHS', title='Overlijdens per dag'),
)

dead_covid= alt.Chart(df_deaths).mark_line(point=True).encode(
    x=alt.X('Dag',scale=alt.Scale(domain=(1, 120), clamp=True)),
    y='Nieuwe covid-19 Sterfgevallen',
    color=alt.ColorValue('red'),
    tooltip=['Dag', 'Nieuwe covid-19 Sterfgevallen', 'Datum']
)

(band + line + dead_covid).properties(width=1024, title='Gemiddeld aantal overlijdens over 10 jaar versus overlijdens door covid-19 in Belgie')

Source date from sciensano

In this section, we compare the graph obtained with data obtained from sciensano.

df_sc = pd.read_csv('https://epistat.sciensano.be/Data/COVID19BE_MORT.csv')
df_sc.head()
DATE REGION AGEGROUP SEX DEATHS
0 2020-03-10 Brussels 85+ F 1
1 2020-03-11 Flanders 85+ F 1
2 2020-03-11 Brussels 75-84 M 1
3 2020-03-11 Brussels 85+ F 1
4 2020-03-12 Brussels 75-84 M 1
df_dead_day = df_sc.groupby('DATE')['DEATHS'].sum().reset_index()
df_dead_day['Datum'] = pd.to_datetime(df_dead_day['DATE'])
df_dead_day['Dag'] = df_dead_day['Datum'].dt.dayofyear
line = alt.Chart(df).mark_line().encode(
    x=alt.X('Dag', title='Dag van het jaar', scale=alt.Scale(
            domain=(1, 120),
            clamp=True
        )),
    y='mean(MS_NUM_DEATHS)'
)

# Bootstrapped 95% confidence interval
band = alt.Chart(df).mark_errorband(extent='ci').encode(
    x=alt.X('Dag', scale=alt.Scale(domain=(1, 120), clamp=True)),
    y=alt.Y('MS_NUM_DEATHS', title='Overlijdens per dag'),
)

dead_covid= alt.Chart(df_dead_day).mark_line(point=True).encode(
    x=alt.X('Dag',scale=alt.Scale(domain=(1, 120), clamp=True)),
    y='DEATHS',
    color=alt.ColorValue('red'),
    tooltip=['Dag', 'DEATHS', 'Datum']
)

(band + line + dead_covid).properties(width=750, title='Gemiddeld aantal overlijdens over 10 jaar versus overlijdens door covid-19 in Belgie')

Obviously, data form 16-17-18 April 2020 is not final yet. Also, the amounts are smaller then those from JHU.

Obtain more detail (for another blogpost...)

df_tot_sc = pd.read_excel('https://epistat.sciensano.be/Data/COVID19BE.xlsx')
df_tot_sc
DATE PROVINCE REGION AGEGROUP SEX CASES
0 2020-03-01 Brussels Brussels 10-19 M 1
1 2020-03-01 Brussels Brussels 10-19 F 1
2 2020-03-01 Brussels Brussels 20-29 M 1
3 2020-03-01 Brussels Brussels 30-39 F 1
4 2020-03-01 Brussels Brussels 40-49 F 1
... ... ... ... ... ... ...
6875 NaN OostVlaanderen Flanders NaN F 4
6876 NaN VlaamsBrabant Flanders 40-49 M 3
6877 NaN VlaamsBrabant Flanders 40-49 F 2
6878 NaN VlaamsBrabant Flanders 50-59 M 1
6879 NaN WestVlaanderen Flanders 50-59 M 3

6880 rows × 6 columns

We know that there are a lot of reional differences:

df_plot = df_tot_sc.groupby(['DATE', 'PROVINCE'])['CASES'].sum().reset_index()
df_plot
DATE PROVINCE CASES
0 2020-03-01 Brussels 6
1 2020-03-01 Limburg 1
2 2020-03-01 Liège 2
3 2020-03-01 OostVlaanderen 1
4 2020-03-01 VlaamsBrabant 6
... ... ... ...
505 2020-04-17 OostVlaanderen 44
506 2020-04-17 VlaamsBrabant 42
507 2020-04-17 WestVlaanderen 30
508 2020-04-18 Brussels 1
509 2020-04-18 Hainaut 1

510 rows × 3 columns

df_plot['DATE'] = pd.to_datetime(df_plot['DATE'])
base = alt.Chart(df_plot, title='Number of cases in Belgium per day and province').mark_line(point=True).encode(
    x=alt.X('DATE:T', title='Datum'),
    y=alt.Y('CASES', title='Cases per day'),
    color='PROVINCE',
    tooltip=['DATE', 'CASES', 'PROVINCE']
).properties(width=600)
base

From the above graph we see a much lower number of cases in Luxembourg, Namur, Waals Brabant.

!pwd
'pwd' is not recognized as an internal or external command,
operable program or batch file.
!dir
 Volume in drive C is Windows
 Volume Serial Number is 7808-E933

 Directory of C:\Users\lnh6dt5\AppData\Local\Temp\Mxt121\tmp\home_lnh6dt5\blog\_notebooks

19/04/2020  14:14    <DIR>          .
19/04/2020  14:14    <DIR>          ..
19/04/2020  10:37    <DIR>          .ipynb_checkpoints
19/04/2020  10:17            23.473 2020-01-28-Altair.ipynb
19/04/2020  10:34             9.228 2020-01-29-bullet-chart-altair.ipynb
19/04/2020  10:26            41.041 2020-02-15-breakins.ipynb
19/04/2020  09:43            30.573 2020-02-20-test.ipynb
19/04/2020  09:49             1.047 2020-04-18-first-test.ipynb
19/04/2020  14:14         1.237.674 2020-04-19-deads-last-ten-year-vs-covid.ipynb
19/04/2020  09:43    <DIR>          my_icons
19/04/2020  09:43               771 README.md
               7 File(s)      1.343.807 bytes
               4 Dir(s)  89.905.336.320 bytes free