Daily covid-19 Deaths compared to average deaths the last 10 years¶
"In this blogpost we try to get an idea of how many extra deaths we have in Belgium due to covid-19 compared to the average we had the last 10 years."
The number of deadths per day from 2008 until 2018 can obtained from Statbel, the Belgium federal bureau of statistics:
df = pd.read_excel('https://statbel.fgov.be/sites/default/files/files/opendata/bevolking/TF_DEATHS.xlsx') # , skiprows=5, sheet_name=sheetnames
DT_DATE | MS_NUM_DEATHS | |
---|---|---|
0 | 2008-01-01 | 342 |
1 | 2008-01-02 | 348 |
2 | 2008-01-03 | 340 |
3 | 2008-01-04 | 349 |
4 | 2008-01-05 | 348 |
# Let's make a quick plot
alt.Chart(df_plot).mark_line().encode(x='Dag', y='MS_NUM_DEATHS').properties(width=600)
The John Hopkings University CSSE keeps track of the number of covid-19 deadths per day and country in a github repository: https://github.com/CSSEGISandData/COVID-19. We can easily obtain this data by reading it from github and filter out the cases for Belgium.
deaths_url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
deaths = pd.read_csv(deaths_url, sep=',')
Filter out Belgium
Inspect how the data is stored
Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 4/9/20 | 4/10/20 | 4/11/20 | 4/12/20 | 4/13/20 | 4/14/20 | 4/15/20 | 4/16/20 | 4/17/20 | 4/18/20 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
23 | NaN | Belgium | 50.8333 | 4.0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 2523 | 3019 | 3346 | 3600 | 3903 | 4157 | 4440 | 4857 | 5163 | 5453 |
1 rows × 92 columns
Create dateframe for plotting
df_deaths = pd.DataFrame(data={'Datum':pd.to_datetime(deaths_be.columns[4:]), 'Overlijdens':deaths_be.iloc[0].values[4:]})
Check for Nan's
0
We need to do some type convertions. We cast 'Overlijdens' to integer. Next, we add the number of the day.
df_deaths['Overlijdens'] = df_deaths['Overlijdens'].astype(int)
df_deaths['Dag'] = df_deaths['Datum'].dt.dayofyear
Plot the data:
Calculate the day-by-day change
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88 entries, 0 to 87
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Datum 88 non-null datetime64[ns]
1 Overlijdens 88 non-null int32
2 Dag 88 non-null int64
3 Nieuwe covid-19 Sterfgevallen 87 non-null float64
dtypes: datetime64[ns](1), float64(1), int32(1), int64(1)
memory usage: 2.5 KB
Plot covid-19 deaths in Belgium according to JHU CSSE. The plot shows a tooltip if you hover over the points.
dead_covid= alt.Chart(df_deaths).mark_line(point=True).encode(
x=alt.X('Dag',scale=alt.Scale(domain=(1, 110), clamp=True)),
y='Nieuwe covid-19 Sterfgevallen',
color=alt.ColorValue('red'),
tooltip=['Dag', 'Nieuwe covid-19 Sterfgevallen'])
dead_covid
Now we add average deaths per day in the last 10 year to the plot.
Take quick look to the datatable:
DT_DATE | MS_NUM_DEATHS | Jaar | Dag | |
---|---|---|---|---|
0 | 2008-01-01 | 342 | 2008 | 1 |
1 | 2008-01-02 | 348 | 2008 | 2 |
2 | 2008-01-03 | 340 | 2008 | 3 |
3 | 2008-01-04 | 349 | 2008 | 4 |
4 | 2008-01-05 | 348 | 2008 | 5 |
The column 'DT_DATE' is a string. We convert it to a datatime so we can add it to the tooltip.
Now we are prepared to make the final graph. We use the Altair mark_errorband(extend='ci') to bootstrap 95% confidence band around the average number of deaths per day.
line = alt.Chart(df).mark_line().encode(
x=alt.X('Dag', scale=alt.Scale(
domain=(1, 120),
clamp=True
)),
y='mean(MS_NUM_DEATHS)'
)
# Bootstrapped 95% confidence interval
band = alt.Chart(df).mark_errorband(extent='ci').encode(
x=alt.X('Dag', scale=alt.Scale(domain=(1, 120), clamp=True)),
y=alt.Y('MS_NUM_DEATHS', title='Overlijdens per dag'),
)
dead_covid= alt.Chart(df_deaths).mark_line(point=True).encode(
x=alt.X('Dag',scale=alt.Scale(domain=(1, 120), clamp=True)),
y='Nieuwe covid-19 Sterfgevallen',
color=alt.ColorValue('red'),
tooltip=['Dag', 'Nieuwe covid-19 Sterfgevallen', 'Datum']
)
(band + line + dead_covid).properties(width=1024, title='Gemiddeld aantal overlijdens over 10 jaar versus overlijdens door covid-19 in Belgie')
Source date from sciensano¶
In this section, we compare the graph obtained with data obtained from sciensano.
DATE | REGION | AGEGROUP | SEX | DEATHS | |
---|---|---|---|---|---|
0 | 2020-03-10 | Brussels | 85+ | F | 1 |
1 | 2020-03-11 | Flanders | 85+ | F | 1 |
2 | 2020-03-11 | Brussels | 75-84 | M | 1 |
3 | 2020-03-11 | Brussels | 85+ | F | 1 |
4 | 2020-03-12 | Brussels | 75-84 | M | 1 |
df_dead_day = df_sc.groupby('DATE')['DEATHS'].sum().reset_index()
df_dead_day['Datum'] = pd.to_datetime(df_dead_day['DATE'])
df_dead_day['Dag'] = df_dead_day['Datum'].dt.dayofyear
line = alt.Chart(df).mark_line().encode(
x=alt.X('Dag', title='Dag van het jaar', scale=alt.Scale(
domain=(1, 120),
clamp=True
)),
y='mean(MS_NUM_DEATHS)'
)
# Bootstrapped 95% confidence interval
band = alt.Chart(df).mark_errorband(extent='ci').encode(
x=alt.X('Dag', scale=alt.Scale(domain=(1, 120), clamp=True)),
y=alt.Y('MS_NUM_DEATHS', title='Overlijdens per dag'),
)
dead_covid= alt.Chart(df_dead_day).mark_line(point=True).encode(
x=alt.X('Dag',scale=alt.Scale(domain=(1, 120), clamp=True)),
y='DEATHS',
color=alt.ColorValue('red'),
tooltip=['Dag', 'DEATHS', 'Datum']
)
(band + line + dead_covid).properties(width=750, title='Gemiddeld aantal overlijdens over 10 jaar versus overlijdens door covid-19 in Belgie')
Obviously, data form 16-17-18 April 2020 is not final yet. Also, the amounts are smaller then those from JHU.
Obtain more detail (for another blogpost...)¶
DATE | PROVINCE | REGION | AGEGROUP | SEX | CASES | |
---|---|---|---|---|---|---|
0 | 2020-03-01 | Brussels | Brussels | 10-19 | M | 1 |
1 | 2020-03-01 | Brussels | Brussels | 10-19 | F | 1 |
2 | 2020-03-01 | Brussels | Brussels | 20-29 | M | 1 |
3 | 2020-03-01 | Brussels | Brussels | 30-39 | F | 1 |
4 | 2020-03-01 | Brussels | Brussels | 40-49 | F | 1 |
... | ... | ... | ... | ... | ... | ... |
6875 | NaN | OostVlaanderen | Flanders | NaN | F | 4 |
6876 | NaN | VlaamsBrabant | Flanders | 40-49 | M | 3 |
6877 | NaN | VlaamsBrabant | Flanders | 40-49 | F | 2 |
6878 | NaN | VlaamsBrabant | Flanders | 50-59 | M | 1 |
6879 | NaN | WestVlaanderen | Flanders | 50-59 | M | 3 |
6880 rows × 6 columns
We know that there are a lot of reional differences:
DATE | PROVINCE | CASES | |
---|---|---|---|
0 | 2020-03-01 | Brussels | 6 |
1 | 2020-03-01 | Limburg | 1 |
2 | 2020-03-01 | Liège | 2 |
3 | 2020-03-01 | OostVlaanderen | 1 |
4 | 2020-03-01 | VlaamsBrabant | 6 |
... | ... | ... | ... |
505 | 2020-04-17 | OostVlaanderen | 44 |
506 | 2020-04-17 | VlaamsBrabant | 42 |
507 | 2020-04-17 | WestVlaanderen | 30 |
508 | 2020-04-18 | Brussels | 1 |
509 | 2020-04-18 | Hainaut | 1 |
510 rows × 3 columns
base = alt.Chart(df_plot, title='Number of cases in Belgium per day and province').mark_line(point=True).encode(
x=alt.X('DATE:T', title='Datum'),
y=alt.Y('CASES', title='Cases per day'),
color='PROVINCE',
tooltip=['DATE', 'CASES', 'PROVINCE']
).properties(width=600)
base
From the above graph we see a much lower number of cases in Luxembourg, Namur, Waals Brabant.
'pwd' is not recognized as an internal or external command,
operable program or batch file.
Volume in drive C is Windows
Volume Serial Number is 7808-E933
Directory of C:\Users\lnh6dt5\AppData\Local\Temp\Mxt121\tmp\home_lnh6dt5\blog\_notebooks
19/04/2020 14:14 <DIR> .
19/04/2020 14:14 <DIR> ..
19/04/2020 10:37 <DIR> .ipynb_checkpoints
19/04/2020 10:17 23.473 2020-01-28-Altair.ipynb
19/04/2020 10:34 9.228 2020-01-29-bullet-chart-altair.ipynb
19/04/2020 10:26 41.041 2020-02-15-breakins.ipynb
19/04/2020 09:43 30.573 2020-02-20-test.ipynb
19/04/2020 09:49 1.047 2020-04-18-first-test.ipynb
19/04/2020 14:14 1.237.674 2020-04-19-deads-last-ten-year-vs-covid.ipynb
19/04/2020 09:43 <DIR> my_icons
19/04/2020 09:43 771 README.md
7 File(s) 1.343.807 bytes
4 Dir(s) 89.905.336.320 bytes free