Skip to content

Estimating the effective reproduction number in Belgium with the RKI method

Using the Robert Koch Institute method with serial interval of 4.

Every day Bart Mesuere tweets a nice dashboard with current numbers about Covid-19 in Belgium. This was the tweet on Wednesday 20/11/04:

twitter: https://twitter.com/BartMesuere/status/1323881489864548352

It's nice to see that the effective reproduction number ($Re(t)$) is again below one. That means the power of virus is declining and the number of infection will start to lower. This occured first on Tuesday 2020/11/3:

twitter: https://twitter.com/BartMesuere/status/1323519613855059968

I estimated the $Re(t)$ earlier with rt.live model in this notebook. There the $Re(t)$ was still estimated to be above one. Michael Osthege replied with a simulation results with furter improved model:

twitter: https://twitter.com/theCake/status/1323211910481874944

In that estimation, the $Re(t)$ was also not yet heading below one at the end of october.

In this notebook, we will implement a calculation based on the method of the Robert Koch Institute. The method is described and programmed in R in this blog post.

In that blogpost there's a link to a website with estimations for most places in the world The estimation for Belgium is here

LSHTM

According to that calculation, $Re(t)$ is already below zero for some days.

Load libraries and data

import numpy as np
import pandas as pd
df_tests = pd.read_csv('https://epistat.sciensano.be/Data/COVID19BE_tests.csv', parse_dates=['DATE'])
df_cases = pd.read_csv('https://epistat.sciensano.be/Data/COVID19BE_CASES_AGESEX.csv', parse_dates=['DATE'])
df_cases
DATE PROVINCE REGION AGEGROUP SEX CASES
0 2020-03-01 Antwerpen Flanders 40-49 M 1
1 2020-03-01 Brussels Brussels 10-19 F 1
2 2020-03-01 Brussels Brussels 10-19 M 1
3 2020-03-01 Brussels Brussels 20-29 M 1
4 2020-03-01 Brussels Brussels 30-39 F 1
... ... ... ... ... ... ...
36279 NaT VlaamsBrabant Flanders 40-49 M 3
36280 NaT VlaamsBrabant Flanders 50-59 M 1
36281 NaT WestVlaanderen Flanders 20-29 F 1
36282 NaT WestVlaanderen Flanders 50-59 M 3
36283 NaT NaN NaN NaN NaN 1

36284 rows × 6 columns

Reformat data into Rtlive format

df_cases_per_day = (df_cases
   .dropna(subset=['DATE'])
   .assign(region='Belgium')
   .groupby(['region', 'DATE'], as_index=False)
   .agg(cases=('CASES', 'sum'))
   .rename(columns={'DATE':'date'})
   .set_index(["region", "date"])
)

What's in our basetable:

df_cases_per_day
cases
region date
Belgium 2020-03-01 19
2020-03-02 19
2020-03-03 34
2020-03-04 53
2020-03-05 81
... ...
2020-11-01 2660
2020-11-02 13345
2020-11-03 11167
2020-11-04 4019
2020-11-05 5

250 rows × 1 columns

Let's plot the number of cases in function of the time.

ax = df_cases_per_day.loc['Belgium'].plot(figsize=(18,6))
ax.set(ylabel='Number of cases', title='Number of cases for covid-19 and number of positives in Belgium');

png

We see that the last days are not yet complete. Let's cut off the last two days of reporting.

import datetime
from dateutil.relativedelta import relativedelta

Calculate the date two days ago:

datetime.date(2020, 11, 3)
datetime.date(2020, 11, 3)
# today_minus_two = datetime.date.today() + relativedelta(days=-2)
today_minus_two = datetime.date(2020, 11, 3) # Fix the day
today_minus_two.strftime("%Y-%m-%d")
'2020-11-03'

Replot the cases:

ax = df_cases_per_day.loc['Belgium'][:today_minus_two].plot(figsize=(18,6))
ax.set(ylabel='Number of cases', title='Number of cases for covid-19 and number of positives in Belgium');

png

Select the Belgium region:

region = 'Belgium'
df = df_cases_per_day.loc[region][:today_minus_two]
df
cases
date
2020-03-01 19
2020-03-02 19
2020-03-03 34
2020-03-04 53
2020-03-05 81
... ...
2020-10-30 15185
2020-10-31 6243
2020-11-01 2660
2020-11-02 13345
2020-11-03 11167

248 rows × 1 columns

Check the types of the columns:

df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 248 entries, 2020-03-01 to 2020-11-03
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   cases   248 non-null    int64
dtypes: int64(1)
memory usage: 3.9 KB

Robert Koch Institute method

A basic method to calculate the effective reproduction number is described (among others) in this blogpost. I included the relevant paragraph:

In a recent report (an der Heiden and Hamouda 2020) the RKI described their method for computing R as part of the COVID-19 outbreak as follows (p. 13): For a constant generation time of 4 days, one obtains R as the ratio of new infections in two consecutive time periods each consisting of 4 days. Mathematically, this estimation could be formulated as part of a statistical model:

$$y_{s+4} | y_{s} \sim Po(R \cdot y_{s}), s= 1,2,3,4$$

where $y_{1}, \ldots, y_{4}$ are considered as fixed. From this we obtain

$$\hat{R}{RKI} = \sum$$}^{4} y_{s+4} / \sum_{s=1}^{4} y_{s

Somewhat arbitrary, we denote by $Re(t)$ the above estimate for R when $s=1$ corresponds to time $t-8$, i.e. we assign the obtained value to the last of the 8 values used in the computation.

In Python, we define a lambda function that we apply on a rolling window. Since indexes start from zero, we calculate:

$$\hat{R}{RKI} = \sum$$}^{3} y_{s+4} / \sum_{s=0}^{3} y_{s

rt = lambda y: np.sum(y[4:])/np.sum(y[:4])
df.rolling(8).apply(rt)
cases
date
2020-03-01 NaN
2020-03-02 NaN
2020-03-03 NaN
2020-03-04 NaN
2020-03-05 NaN
... ...
2020-10-30 1.273703
2020-10-31 0.929291
2020-11-01 0.601838
2020-11-02 0.499806
2020-11-03 0.475685

248 rows × 1 columns

The first values are Nan because the window is in the past. If we plot the result, it looks like this:

ax = df.rolling(8).apply(rt).plot(figsize=(16,4), label='Re(t)')
ax.set(ylabel='Re(t)', title='Effective reproduction number estimated with RKI method')
ax.legend(['Re(t)']);

png

To avoid the spikes due to weekend reporting issue, I first applied a rolling mean on a window of 7 days:

ax = df.rolling(7).mean().rolling(8).apply(rt).plot(figsize=(16,4), label='Re(t)')
ax.set(ylabel='Re(t)', title='Effective reproduction number estimated with RKI method after rolling mean on window of 7 days')
ax.legend(['Re(t)']);

png

Interactive visualisation in Altair

import altair as alt

alt.Chart(df.rolling(7).mean().rolling(8).apply(rt).fillna(0).reset_index()).mark_line().encode(
    x=alt.X('date:T'),
    y=alt.Y('cases', title='Re(t)'),
    tooltip=['date:T', alt.Tooltip('cases', format='.2f')]
).transform_filter(
    alt.datum.date > alt.expr.toDate('2020-03-13')
).properties(
    width=600,
    title='Effective reproduction number in Belgium based on Robert-Koch Institute method'
)

Making the final visualisation in Altair

In the interactive Altair figure below, we show the $Re(t)$ for the last 14 days. We reduce the rolling mean window to three to see faster reactions.

#collapse

df_plot = df.rolling(7).mean().rolling(8).apply(rt).fillna(0).reset_index()
last_value = str(df_plot.iloc[-1]['cases'].round(2)) + ' ↓'
first_value = str(df_plot[df_plot['date'] == '2020-10-21'].iloc[0]['cases'].round(2)) # + ' ↑'
today_minus_15 = datetime.datetime.today() + relativedelta(days=-15)
today_minus_15_str = today_minus_15.strftime("%Y-%m-%d")

line = alt.Chart(df_plot).mark_line(point=True).encode(
    x=alt.X('date:T', axis=alt.Axis(title='Datum', grid=False)),
    y=alt.Y('cases', axis=alt.Axis(title='Re(t)', grid=False, labels=False, titlePadding=40)),
    tooltip=['date:T', alt.Tooltip('cases', title='Re(t)', format='.2f')]
).transform_filter(
    alt.datum.date > alt.expr.toDate(today_minus_15_str)
).properties(
    width=600,
    height=100
)

hline = alt.Chart(pd.DataFrame({'cases': [1]})).mark_rule().encode(y='cases')


label_right = alt.Chart(df_plot).mark_text(
    align='left', dx=5, dy=-10 , size=15
).encode(
    x=alt.X('max(date):T', title=None),
    text=alt.value(last_value),
)

label_left = alt.Chart(df_plot).mark_text(
    align='right', dx=-5, dy=-40, size=15
).encode(
    x=alt.X('min(date):T', title=None),
    text=alt.value(first_value),
).transform_filter(
    alt.datum.date > alt.expr.toDate(today_minus_15_str)
)

source = alt.Chart(
    {"values": [{"text": "Data source: Sciensano"}]}
).mark_text(size=12, align='left', dx=-57).encode(
    text="text:N"
)

alt.vconcat(line + label_left + label_right + hline, source).configure(
    background='#D9E9F0'
).configure_view(
    stroke=None, # Remove box around graph
).configure_axisY(
    ticks=False,
    grid=False,
    domain=False
).configure_axisX(
    grid=False,
    domain=False
).properties(title={
      "text": ['Effective reproduction number for the last 14 days in Belgium'], 
      "subtitle": [f'Estimation based on the number of cases until {today_minus_two.strftime("%Y-%m-%d")} after example of Robert Koch Institute with serial interval of 4'],
}
)
# .configure_axisY(
#     labelPadding=50,
# )

To check the calculation, here are the last for values for the number of cases after applying the mean window of 7:

df.rolling(7).mean().iloc[-8:-4]
cases
date
2020-10-27 16067.571429
2020-10-28 16135.857143
2020-10-29 15744.571429
2020-10-30 15218.000000

Those must be added together:

df.rolling(7).mean().iloc[-8:-4].sum()
cases    63166.0
dtype: float64

And here are the four values, starting four days ago:

df.rolling(7).mean().iloc[-4:]
cases
date
2020-10-31 14459.428571
2020-11-01 14140.428571
2020-11-02 13213.428571
2020-11-03 11641.428571

These are added together:

df.rolling(7).mean().iloc[-4:].sum()
cases    53454.714286
dtype: float64

And now we divide those two sums to get the $Re(t)$ of 2020-11-03:

df.rolling(7).mean().iloc[-4:].sum()/df.rolling(7).mean().iloc[-8:-4].sum()
cases    0.846258
dtype: float64

This matches (as expected) the value in the graph. Let's compare with three other sources:

  1. Alas it does not match the calculation reported by Bart Mesuere on 2020-11-03 based on the RKI model that reports 0.96:

twitter: https://twitter.com/BartMesuere/status/1323519613855059968

  1. Also, the more elaborated model from rtliveglobal is not yet that optimistic. Mind that model rtlive start estimating the $Re(t)$ from the number of tests instead of the number of cases. It might be that other reporting delays are involved.

  2. epiforecast.io is already below 1 since beginning of November.

Another possiblity is that I made somewhere a mistake. If you spot it, please let me know.


Comments