Why the Smartest People in AI Disagree

What their disagreement reveals about how organizations should prepare for what comes next


why_the_smartest_people_in_ai_disagree.jpg

The Question Behind the Question

Some of the smartest people working on AI disagree about where it is going. Not just on timelines, but on fundamentals. Some argue that progress will slow because we are hitting physical limits. Others believe that new breakthroughs will unlock systems far more general than anything we have today. Still others argue that the very idea of “superintelligence” is a distraction.

For a long time, I found this disagreement confusing. These are people with access to the same research, the same models, and often the same data. If anyone should agree about the future of AI, it should be them.

Over time, I started to suspect that the disagreement wasn’t really about technology.

It was about what counts as success.

When people talk about AI “winning,” they often mean different things. Sometimes they mean being first. Sometimes they mean being most capable. Sometimes they mean building something that looks impressive on a benchmark or in a demo. These goals are easy to measure, and they dominate public discussion.

They are also insufficient.

For me, advanced AI is only a success if three things are true:

  • its benefits are broadly shared rather than concentrated,
  • it makes human work more rewarding instead of hollowing it out,
  • and it can exist within real physical and ecological limits.

Once I started looking at the AI debate through this lens, many disagreements made more sense. People weren’t talking past each other because they misunderstood the technology. They were optimizing for different outcomes.

This essay is an attempt to understand those differences—not to predict who will be right about AGI, but to ask a more practical question: how should organizations act when the technology is powerful, the future is uncertain, and the consequences are unevenly distributed?


How Different Views of Success Shape Different Strategies

Once you define success this way, the disagreement around AI becomes easier to interpret. The question is no longer who is right about the future, but what each group is trying to optimize for.

This becomes especially clear when you look at a small number of influential voices—not as prophets, but as representatives of distinct strategies. Each is responding to the same technological reality. Each sees real risks. Where they diverge is in what they believe should be maximized, and what they believe must be constrained.

Understanding these differences matters, because organizations often copy the assumptions of the loudest or most successful players without realizing it. Before adopting their tools or their rhetoric, it is worth understanding the world they are implicitly trying to build.


Dan Wang: Speed, Scale, and the Logic of Competition

Dan Wang approaches the future of AI from a geopolitical angle. In his book Breakneck: China’s Quest to Engineer the Future, the central question is not what intelligence is, but how technological capability translates into national power.

Wang’s core observation is simple: China and the United States are locked in a competition where speed matters. Not just speed of invention, but speed of deployment. The advantage does not necessarily go to whoever builds the most elegant system, but to whoever can turn new capabilities into real-world infrastructure fastest.

China, as Wang describes it, excels at this. Once a technology is deemed strategically important, it can be rolled out at scale, embedded into institutions, and iterated on quickly. The United States, by contrast, tends to lead in early research but often struggles with coordination and diffusion.

What matters is that Wang’s argument does not depend on AGI arriving soon—or at all. Even narrow or imperfect AI systems can have enormous impact if they are widely deployed and tightly integrated into society.

This leads to a very specific definition of success: whoever aligns technology, institutions, and incentives most effectively will win.

From this perspective, questions about distribution, meaningful work, or sustainability are secondary. They may matter socially or politically, but they are not the primary drivers of the strategy.

Wang describes the world as it is. And it is against this reality—of competition, pressure, and uneven incentives—that the other perspectives react.


Tim Dettmers: The Case for Physical and Economic Limits

If Wang represents speed, Tim Dettmers represents constraint.

In his essay Why AGI Will Not Happen, Dettmers argues that the current AI strategy—relentless scaling through more compute, more energy, and more capital—runs into physical and economic limits much sooner than most narratives admit.

Computation is not abstract. It happens on chips that consume power, generate heat, and depend on complex supply chains. For a long time, progress felt almost free. Bigger models reliably worked better. Hardware improved predictably. Capital was abundant.

Dettmers argues that this era is ending. Linear gains now require exponential resources, and that trajectory cannot continue indefinitely.

This matters because “speed at all costs” assumes scaling is always available as an option. Dettmers challenges that assumption. If compute, energy, and money become binding constraints, then racing faster becomes a gamble rather than a strategy.

There is also an implicit sustainability argument here. Even if massive scaling were technically possible, it raises questions about environmental impact and opportunity cost.

Dettmers does not claim that AI development will stop. His point is more uncomfortable: the easiest path forward is narrowing, and organizations built on assumptions of unlimited growth may find themselves brittle.


Ilya Sutskever: Limits Matter, but Curves Can Still Bend

Where Dettmers sees constraints, Ilya Sutskever sees a fork in the road.

Sutskever has openly stated that the era of effortless scaling is ending, but he does not conclude that progress must therefore stall. Instead, he argues that limits signal the need for conceptual breakthroughs.

Past progress in AI has not come from scaling alone. Backpropagation, convolutional networks, transformers—each reshaped what scaling even meant. In hindsight they look obvious. At the time, they were not.

This belief explains his focus on long-term research and safety, most recently through Safe Superintelligence Inc.

What distinguishes this view is its combination of ambition and restraint. Sutskever takes the possibility of extremely powerful systems seriously—and precisely because of that, treats alignment and safety as prerequisites rather than afterthoughts.

For organizations, this suggests a different posture toward uncertainty: build the capacity to adapt, rather than optimizing prematurely for today’s dominant paradigm.


Yann LeCun: Questioning the Curve Itself

If Sutskever believes the curve can bend, Yann LeCun questions whether there is a single curve at all.

LeCun has long argued that the AGI and superintelligence debate rests on a flawed abstraction: the idea that intelligence is a single scalar quantity that can be increased and extrapolated.

In reality, intelligence is multi-dimensional. Systems can excel in some areas while remaining weak in others. Asking whether one system is “more intelligent” than another is often as misleading as asking whether a hammer is smarter than a screwdriver.

LeCun is particularly skeptical that scaling language models leads naturally to world understanding. Language, he argues, is a surface phenomenon. Much of human intelligence is grounded in perception and interaction with the physical world.

This reframing dissolves both runaway optimism and hard ceilings. If intelligence is not one-dimensional, there is no single curve to race along.

For organizations, the implication is quiet but radical: there is no finish line—only choices.


Synthesis: Strategy When the Future Is Powerful but Unclear

Taken together, these perspectives do not converge on a single prediction. They converge on something more useful: a way to think about action under uncertainty.

  • Wang reminds us that technology is deployed as soon as it exists.
  • Dettmers reminds us that scaling faces real limits.
  • Sutskever argues that breakthroughs can change the curve.
  • LeCun questions whether the curve metaphor even applies.

What unites them is this: the future will not be linear.

If inequality from AI is an organizational and institutional issue, then the most important choices are not technical. They are structural.

Three principles follow:

  1. Avoid irreversible bets.
  2. Preserve human agency where values are involved.
  3. Invest in understanding, not just usage.

These principles work whether progress accelerates, slows, or fragments.


Conclusion: What We Owe the Future

It is tempting to ask who will win the AI race. That question is simple, and it feels urgent. It is also the wrong one.

The systems we are building will be powerful whether or not we ever agree on what AGI means. What matters is not how impressive they become, but how they are woven into institutions, work, and daily life.

This essay itself was written with the help of AI. Not as a substitute for judgment, but as a tool for thinking. The responsibility for the conclusions—and for their consequences—remains human.

Used this way, AI does not diminish meaningful work. It supports it.

The future will not ask whether we were clever enough to build powerful machines.
It will ask whether we were wise enough to use them well.

About Vibe Coding

Impact of vibe coding on product design

Software development is changing rapidly, almost overnight. Indeed:

  1. OpenAI announced Codex on May 16, 2025, for their $200/month Pro users https://openai.com/index/introducing-codex/.
  2. Microsoft GitHub Copilot released its new coding agent on May 19, 2025. https://bsky.app/profile/github.com/post/3lpjxvgje7s2k
  3. Google announced a tool called Jules (jules.google.com) on May 20, 2025, making it available for free.
  4. Mistral released Devstral, an open-source model for coding agents, on May 21, 2025. https://mistral.ai/news/devstral

These new coding agents—along with Cursor, Lovable, Windsurf, V0, Bolt.new, and others—are all tools that support some form of “vibe coding” (a term coined by Karpathy for AI-assisted coding).

This gives rise to a lot of FUD (fear, uncertainty, and doubt) from the corporate gatekeepers. The short-term opportunity is this: in a design thinking approach, a “research prototype” that checks the basic hypotheses (who is this product for, what problem does the product solve) can be developed much faster using vibe coding.

Even with the expected “valley of disappointment” that may follow (users tend to overreact to the initial prototype, which will likely need to be rewritten from scratch), the chance of building a product that resonates with users is much higher in the end, and it will be ready sooner—provided the same good old software process is followed, from prototype to Minimum Viable Product (MVP) to Version 1 accepted by users.

The future AI ecosystem will be open


I was just reading the article The walled garden cracks: Nadella bets Microsoft’s Copilots—and Azure’s next act—on A2A/MCP interoperability and this is how I see what's happening in the AI landscape:

  • Anthropic: best user experience and defined MCP (the Model Context Protocol)
  • Google: best model on all leaderboards with Gemini and defined A2A (the Agent-to-Agent protocol)
  • Microsoft: let’s build an open AI ecosystem with MCP and A2A; released support for A2A and MCP in VS Code
  • DeepSeek: after pulling off DeepSeek V3, DeepSeek released the open model DeepSeek-Prover-V2, which tackles advanced theorem proving, achieving an 88.9% pass rate on the miniF2F-test benchmark (Olympiad/AIME-level theorems) and solving 49 out of 658 problems on the new PutnamBench. This means DeepSeek is cracking the reasoning part of LLMs.

And OpenAI:

Seeing all those signals, we conclude: the AI future will be open!

"Bar chart made in Altair with Financial Times style"

"#30DayChartChallenge #Day24 Themeday: Financial times"

  • image: images/barchart_FT_style_altair.png
import pandas as pd
import altair as alt

The #30DayChartChallenge Day 24 calls for Financial Times themed charts. The bar chart that I will try to reproduce in Altair was published in the article: "Financial warfare: will there be a backlash against the dollar?"

This is the graph (without the FT background) that we want to reproduce:

I digitized the heights of the bars with WebPlotDigitizer:

data = """Bar0, 3.23
Bar1, 1.27
Bar2, 1.02
Bar3, 0.570
Bar4, 0.553
Bar5, 0.497
Bar6, 0.467
Bar7, 0.440
Bar8, 0.420
Bar9, 0.413
Bar10, 0.317
Bar11, 0.0433"""

data_values = [float(x.split()[1]) for x in data.splitlines()]

I put the values into a Pandas dataframe:

source = pd.DataFrame({
    'label': ['China', 'Japan', 'Switzerland', 'India', 'Taiwan', 'Hong Kong', 'Russia', 'South Korea', 'Saudi Arabia', 'Singapore', 'Eurozone', 'US'],
    'val': data_values
})

Now we build the graph and alter its style to resemble the Financial Times style:

square = alt.Chart().mark_rect(width=80, height=5, color='black', xOffset=-112, yOffset=10)

bars = alt.Chart(source).mark_bar(color='#174C7F', size=30).encode(
    x=alt.X('val:Q', title='', axis=alt.Axis(tickCount=6, domain=False, labelColor='darkgray'), scale=alt.Scale(domain=[0, 3.0])),
    y=alt.Y('label:N', title='', sort=alt.EncodingSortField(
            field="val:Q",  # The field to use for the sort
            op="sum",  # The operation to run on the field prior to sorting
            order="ascending"  # The order to sort in
        ), axis=alt.Axis(domainColor='lightgray',
                         labelFontSize=18, labelColor='darkgray', labelPadding=5,
                         labelFontStyle='Bold',
                         tickSize=18, tickColor='lightgray'))
).properties(title={
      "text": ["The biggest holders of FX reserves", ], 
      "subtitle": ["Official foreign exchange reserve (Jan 2022, $tn)"],
      "align": 'left',
      "anchor": 'start'
    },
    width=700,
    height=512
)

source_text = alt.Chart(
    {"values": [{"text": "Source: IMF, © FT"}]}
).mark_text(size=12, align='left', dx=-140, color='darkgrey').encode(
    text="text:N"
)

# from https://stackoverflow.com/questions/57244390/has-anyone-figured-out-a-workaround-to-add-a-subtitle-to-an-altair-generated-cha
chart = alt.vconcat(
    square,
    bars,
    source_text
).configure_concat(
    spacing=0
).configure(
    background='#fff1e5',
).configure_view(
    stroke=None, # Remove box around graph
).configure_title(
    # font='metricweb',
    fontSize=22,
    fontWeight=400,
    subtitleFontSize=18,
    subtitleColor='darkgray',
    subtitleFontWeight=400,
    subtitlePadding=15,
    offset=80,
    dy=40
)

chart

Trying to use the official Financial Times fonts

The chart looks quite similar to the original. The biggest difference is the typography. The Financial Times uses its own Metric Web and Financier Display Web fonts, and Altair can only use fonts available in the browser.

The fonts could be made available via CSS:

@font-face {
    font-family: 'metricweb';
    src: url('https://www.ft.com/__origami/service/build/v2/files/o-fonts-assets@1.5.0/MetricWeb-Regular.woff2') format('woff2');
}
from IPython.display import HTML
from google.colab.output import _publish as publish
publish.css("""@font-face {
    font-family: 'metricweb', sans-serif;
    src: url('https://www.ft.com/__origami/service/build/v2/files/o-fonts-assets@1.5.0/MetricWeb-Regular.woff2') format('woff2');
}""")
square = alt.Chart().mark_rect(width=80, height=5, color='black', xOffset=-112, yOffset=10)

bars = alt.Chart(source).mark_bar(color='#174C7F', size=30).encode(
    x=alt.X('val:Q', title='', axis=alt.Axis(tickCount=6, domain=False), scale=alt.Scale(domain=[0, 3.0])),
    y=alt.Y('label:N', title='', sort=alt.EncodingSortField(
            field="val:Q",  # The field to use for the sort
            op="sum",  # The operation to run on the field prior to sorting
            order="ascending"  # The order to sort in
        ), axis=alt.Axis(domainColor='lightgray',
                         labelFontSize=18, labelColor='darkgray', labelPadding=5,
                         labelFontStyle='Bold',
                         tickSize=18, tickColor='lightgray'))
).properties(title={
      "text": ["The biggest holders of FX reserves", ], 
      "subtitle": ["Official foreign exchange reserve (Jan 2022, $tn)"],
      "align": 'left',
      "anchor": 'start'
    },
    width=700,
    height=512
)

source_text = alt.Chart(
    {"values": [{"text": "Source: IMF, © FT"}]}
).mark_text(size=12, align='left', dx=-140, color='darkgrey').encode(
    text="text:N"
)

# from https://stackoverflow.com/questions/57244390/has-anyone-figured-out-a-workaround-to-add-a-subtitle-to-an-altair-generated-cha
chart = alt.vconcat(
    square,
    bars,
    source_text
).configure_concat(
    spacing=0
).configure(
    background='#fff1e5',
).configure_view(
    stroke=None, # Remove box around graph
).configure_title(
    font='metricweb',
    fontSize=22,
    fontWeight=400,
    subtitleFont='metricweb',
    subtitleFontSize=18,
    subtitleColor='darkgray',
    subtitleFontWeight=400,
    subtitlePadding=15,
    offset=80,
    dy=40
)

chart

For the moment, the font does not look like Metric Web at all :-(

A second minor difference is the alignment of the 0.0 and 3.0 labels on the x-axis. In the original, those labels are centered. Altair aligns 0.0 to the left and 3.0 to the right.
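If you wanted to approximate the centered end labels, one possible tweak is to set labelAlign='center' on the x-axis, so that every tick label sits centered under its tick. This is an untested sketch with made-up demo data, not part of the original chart:

import pandas as pd
import altair as alt

# Untested sketch: centering every x-axis tick label via labelAlign should also
# center the 0.0 and 3.0 end labels, as in the FT original.
demo = pd.DataFrame({'label': ['A', 'B', 'C'], 'val': [0.5, 1.8, 2.9]})

alt.Chart(demo).mark_bar(color='#174C7F', size=30).encode(
    x=alt.X('val:Q', title='',
            axis=alt.Axis(tickCount=6, domain=False, labelAlign='center'),
            scale=alt.Scale(domain=[0, 3.0])),
    y=alt.Y('label:N', title='')
)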


"Reconstructing Economist graph with Altair"

"#30DayChartChallenge #altair #day12"

  • image: images/Economist_stye%3B_30dayschartchallenge_day12.png

In an Economist article "The metamorphosis: How Jeremy Corbyn took control of Labour", the following graph appeared:

Later, Sarah Leo, data visualiser at The Economist, improved the graph to:

The rationale behind this improvement is discussed in her article: 'Mistakes, we made a few'.

In this article, I show how visualisation library Altair can be used to reconstruct the improved graph.

import numpy as np
import pandas as pd
import altair as alt

Read the data for the graph into a Pandas dataframe:

df = pd.read_csv('http://infographics.economist.com/databank/Economist_corbyn.csv').dropna()

This is how the data looks:

df
Page Average number of likes per Facebook post 2016
0 Jeremy Corbyn 5210.0
1 Labour Party 845.0
2 Momentum 229.0
3 Owen Smith 127.0
4 Andy Burnham 105.0
5 Saving Labour 56.0

A standard bar graph in Altair gives this:

alt.Chart(df).mark_bar().encode(
    x='Average number of likes per Facebook post 2016:Q',
    y='Page:O'
)

The message of the graph is that Jeremy Corbyn has by far the most likes per Facebook post in 2016. There are a number of improvements possible:

The numbers on the x-axis are multiples of thousands. In the spirit of removing as much ink as possible, let's rescale the x-axis by a factor of 1000. The label 'Page' on the y-axis is superfluous. Let's remove it.

df['page1k'] = df['Average number of likes per Facebook post 2016']/1000.0

After scaling, the graph looks like this:

alt.Chart(df).mark_bar().encode(
    x=alt.X('page1k', title='Average number of likes per Facebook post 2016'),
    y=alt.Y('Page:O', title='')
)

A third improvement is to sort the bars from high to low. This supports the message that Jeremy Corbyn has the most likes.

alt.Chart(df).mark_bar().encode(
    x=alt.X('page1k:Q', title='Average number of likes per Facebook post 2016'),
    y=alt.Y('Page:O', title='', sort=alt.EncodingSortField(
            field="Average number of likes per Facebook post 2016:Q",  # The field to use for the sort
            op="sum",  # The operation to run on the field prior to sorting
            order="ascending"  # The order to sort in
        ))
)

Now we see that we have too many ticks on the x-axis. We can add a scale and format the x-axis labels as integers to cope with that. While adding markup for the x-axis, we also add orient='top', which moves the axis title and labels to the top of the graph.

alt.Chart(df).mark_bar().encode(
    x=alt.X('page1k:Q', title='Average number of likes per Facebook post 2016',
            axis=alt.Axis(title='Average number of likes per Facebook post 2016', orient="top", format='d', values=[1,2,3,4,5,6]),
            scale=alt.Scale(round=True, domain=[0,6])),
    y=alt.Y('Page:O', title='', sort=alt.EncodingSortField(
            field="Average number of likes per Facebook post 2016:Q",  # The field to use for the sort
            op="sum",  # The operation to run on the field prior to sorting
            order="ascending"  # The order to sort in
        ))
)

Now we want to remove the box around the graph, as it adds nothing extra. We do that by setting stroke to None in configure_view. We also adjust the x-axis title to make clear that the numbers are multiples of thousands.

alt.Chart(df).mark_bar().encode(
    x=alt.X('page1k:Q', title="Average number of likes per Facebook post 2016  ('000)",
            axis=alt.Axis(title='Average number of likes per Facebook post 2016', orient="top", format='d', values=[1,2,3,4,5,6]),
            scale=alt.Scale(round=True, domain=[0,6])),
    y=alt.Y('Page:O', title='', sort=alt.EncodingSortField(
            field="Average number of likes per Facebook post 2016:Q",  # The field to use for the sort
            op="sum",  # The operation to run on the field prior to sorting
            order="ascending"  # The order to sort in
        ))
).configure_view(
    stroke=None, # Remove box around graph
)

Next we try to left align the y-axis labels:

alt.Chart(df).mark_bar().encode(
    x=alt.X('page1k:Q',
            axis=alt.Axis(title="Average number of likes per Facebook post 2016  ('000)", orient="top", format='d', values=[1,2,3,4,5,6]),
            scale=alt.Scale(round=True, domain=[0,6])),
    y=alt.Y('Page:O', title='', sort=alt.EncodingSortField(
            field="Average number of likes per Facebook post 2016:Q",  # The field to use for the sort
            op="sum",  # The operation to run on the field prior to sorting
            order="ascending"  # The order to sort in
        ))
).configure_view(
    stroke=None, # Remove box around graph
).configure_axisY(
    labelPadding=70, 
    labelAlign='left'
)

Now, we apply the Economist style:

square = alt.Chart().mark_rect(width=50, height=18, color='#EB111A', xOffset=-105, yOffset=10)

bars = alt.Chart(df).mark_bar().encode(
    x=alt.X('page1k:Q',
            axis=alt.Axis(title="", orient="top", format='d', values=[1,2,3,4,5,6], labelFontSize=14),
            scale=alt.Scale(round=True, domain=[0,6])),
    y=alt.Y('Page:O', title='', sort=alt.EncodingSortField(
            field="Average number of likes per Facebook post 2016:Q",  # The field to use for the sort
            op="sum",  # The operation to run on the field prior to sorting
            order="ascending"  # The order to sort in
        ),
        # Based on https://stackoverflow.com/questions/66684882/color-some-x-labels-in-altair-plot
        axis=alt.Axis(labelFontSize=14, labelFontStyle=alt.condition('datum.value == "Jeremy Corbyn"', alt.value('bold'), alt.value('italic'))))
).properties(title={
      "text": ["Left Click", ], 
      "subtitle": ["Average number of likes per Facebook post\n", "2016, '000"],
      "align": 'left',
      "anchor": 'start'
    }
)

source = alt.Chart(
    {"values": [{"text": "Source: Facebook"}]}
).mark_text(size=12, align='left', dx=-120, color='darkgrey').encode(
    text="text:N"
)

# from https://stackoverflow.com/questions/57244390/has-anyone-figured-out-a-workaround-to-add-a-subtitle-to-an-altair-generated-cha
chart = alt.vconcat(
    square,
    bars,
    source
).configure_concat(
    spacing=0
).configure(
    background='#D9E9F0'
).configure_view(
    stroke=None, # Remove box around graph
).configure_axisY(
    labelPadding=110,
    labelAlign='left',
    ticks=False,
    grid=False
).configure_title(
    fontSize=22,
    subtitleFontSize=18,
    offset=30,
    dy=30
)

chart

The only thing I could not reproduce with Altair is the light bar around the first label and bar. For those final touches, I think it's better to export the graph and add them with a tool such as Inkscape or Illustrator.
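For completeness, here is a hedged sketch of that export step. It reuses the chart object defined above and assumes an SVG export engine such as vl-convert-python or altair_saver is installed; the filename is just an example:

# Sketch: save the final chart as SVG so the light highlight bar can be added
# by hand in Inkscape or Illustrator. Requires an SVG export engine
# (e.g. vl-convert-python or altair_saver) to be installed.
chart.save('economist_left_click.svg')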

Comparing Rt numbers for Belgium on 09-12-2020

Model | Based on | URL | Rt | Date
by Niel Hens | Cases | https://gjbex.github.io/DSI_UHasselt_covid_dashboard/ | 0.96 | 20-12-5
Cori et al. (2013) | Hospitalisations | https://covid-19.sciensano.be/sites/default/files/Covid19/COVID-19_Weekly_report_NL.pdf | 0.798 | 20-11-27 till 20-12-3
RKI | Hospitalisations | https://datastudio.google.com/embed/u/0/reporting/c14a5cfc-cab7-4812-848c-0369173148ab/page/ZwmOB | 0.97 | 20-12-09
rtlive | Cases | https://rtlive.de/global.html | 0.80 | 20-12-09
epiforecast | Cases and Deaths | https://epiforecasts.io/covid/posts/national/belgium/ | 0.5 | 20-12-07
Huisman et al. (2020) | Cases | https://ibz-shiny.ethz.ch/covid-19-re-international/ | 1.01 | 20-11-24
Huisman et al. (2020) | Hospitalisations | https://ibz-shiny.ethz.ch/covid-19-re-international/ | 0.84 | 20-11-24
RKI | Cases | https://twitter.com/BartMesuere/status/1336565641764089856 | 0.99 | 20-12-08
Deforche (2020) | Hospitalisations and Deaths | https://twitter.com/houterkabouter/status/1336582281994055680 | 0.85 | 20-12-09
SEIR | Hospitalisations and Deaths | https://twitter.com/vdwnico/status/1336557572254552065 | 1.5 | 20-12-09

Estimating the effective reproduction number in Belgium with the RKI method

Using the Robert Koch Institute method with a serial interval of 4.

Every day, Bart Mesuere tweets a nice dashboard with the current numbers about Covid-19 in Belgium. This was the tweet on Wednesday 2020/11/04:

twitter: https://twitter.com/BartMesuere/status/1323881489864548352

It's nice to see that the effective reproduction number ($Re(t)$) is again below one. That means the power of the virus is declining and the number of infections will start to drop. This first occurred on Tuesday 2020/11/03:

twitter: https://twitter.com/BartMesuere/status/1323519613855059968

I estimated the $Re(t)$ earlier with the rt.live model in this notebook. There, the $Re(t)$ was still estimated to be above one. Michael Osthege replied with simulation results from a further improved model:

twitter: https://twitter.com/theCake/status/1323211910481874944

In that estimation, the $Re(t)$ was also not yet heading below one at the end of October.

In this notebook, we will implement a calculation based on the method of the Robert Koch Institute. The method is described and programmed in R in this blog post.

In that blogpost there's a link to a website with estimations for most places in the world. The estimation for Belgium is here:

LSHTM

According to that calculation, $Re(t)$ has already been below one for some days.

Load libraries and data

import numpy as np
import pandas as pd
df_tests = pd.read_csv('https://epistat.sciensano.be/Data/COVID19BE_tests.csv', parse_dates=['DATE'])
df_cases = pd.read_csv('https://epistat.sciensano.be/Data/COVID19BE_CASES_AGESEX.csv', parse_dates=['DATE'])
df_cases
DATE PROVINCE REGION AGEGROUP SEX CASES
0 2020-03-01 Antwerpen Flanders 40-49 M 1
1 2020-03-01 Brussels Brussels 10-19 F 1
2 2020-03-01 Brussels Brussels 10-19 M 1
3 2020-03-01 Brussels Brussels 20-29 M 1
4 2020-03-01 Brussels Brussels 30-39 F 1
... ... ... ... ... ... ...
36279 NaT VlaamsBrabant Flanders 40-49 M 3
36280 NaT VlaamsBrabant Flanders 50-59 M 1
36281 NaT WestVlaanderen Flanders 20-29 F 1
36282 NaT WestVlaanderen Flanders 50-59 M 3
36283 NaT NaN NaN NaN NaN 1

36284 rows × 6 columns

Reformat data into Rtlive format

df_cases_per_day = (df_cases
   .dropna(subset=['DATE'])
   .assign(region='Belgium')
   .groupby(['region', 'DATE'], as_index=False)
   .agg(cases=('CASES', 'sum'))
   .rename(columns={'DATE':'date'})
   .set_index(["region", "date"])
)

What's in our basetable:

df_cases_per_day
cases
region date
Belgium 2020-03-01 19
2020-03-02 19
2020-03-03 34
2020-03-04 53
2020-03-05 81
... ...
2020-11-01 2660
2020-11-02 13345
2020-11-03 11167
2020-11-04 4019
2020-11-05 5

250 rows × 1 columns

Let's plot the number of cases as a function of time.

ax = df_cases_per_day.loc['Belgium'].plot(figsize=(18,6))
ax.set(ylabel='Number of cases', title='Number of cases for covid-19 and number of positives in Belgium');

png

We see that the last days are not yet complete. Let's cut off the last two days of reporting.

import datetime
from dateutil.relativedelta import relativedelta

Calculate the date two days ago:

datetime.date(2020, 11, 3)
datetime.date(2020, 11, 3)
# today_minus_two = datetime.date.today() + relativedelta(days=-2)
today_minus_two = datetime.date(2020, 11, 3) # Fix the day
today_minus_two.strftime("%Y-%m-%d")
'2020-11-03'

Replot the cases:

ax = df_cases_per_day.loc['Belgium'][:today_minus_two].plot(figsize=(18,6))
ax.set(ylabel='Number of cases', title='Number of cases for covid-19 and number of positives in Belgium');

png

Select the Belgium region:

region = 'Belgium'
df = df_cases_per_day.loc[region][:today_minus_two]
df
cases
date
2020-03-01 19
2020-03-02 19
2020-03-03 34
2020-03-04 53
2020-03-05 81
... ...
2020-10-30 15185
2020-10-31 6243
2020-11-01 2660
2020-11-02 13345
2020-11-03 11167

248 rows × 1 columns

Check the types of the columns:

df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 248 entries, 2020-03-01 to 2020-11-03
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   cases   248 non-null    int64
dtypes: int64(1)
memory usage: 3.9 KB

Robert Koch Institute method

A basic method to calculate the effective reproduction number is described (among others) in this blogpost. I included the relevant paragraph:

In a recent report (an der Heiden and Hamouda 2020) the RKI described their method for computing R as part of the COVID-19 outbreak as follows (p. 13): For a constant generation time of 4 days, one obtains R as the ratio of new infections in two consecutive time periods each consisting of 4 days. Mathematically, this estimation could be formulated as part of a statistical model:

$$y_{s+4} | y_{s} \sim Po(R \cdot y_{s}), s= 1,2,3,4$$

where $y_{1}, \ldots, y_{4}$ are considered as fixed. From this we obtain

$$\hat{R}_{RKI} = \sum_{s=1}^{4} y_{s+4} / \sum_{s=1}^{4} y_{s}$$

Somewhat arbitrary, we denote by $Re(t)$ the above estimate for R when $s=1$ corresponds to time $t-8$, i.e. we assign the obtained value to the last of the 8 values used in the computation.
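As a quick sanity check on this estimator (with my own toy numbers, not taken from the report): if each of the most recent four days has twice the cases of the corresponding day in the previous four-day block, the estimate should come out at exactly 2.

import numpy as np

# Toy example of the RKI ratio estimator: the last four days have twice the
# cases of the four days before, so the estimated R is 2.0.
y = np.array([10, 10, 10, 10, 20, 20, 20, 20])
r_hat = np.sum(y[4:]) / np.sum(y[:4])
print(r_hat)  # 2.0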

In Python, we define a lambda function that we apply on a rolling window. Since indexes start from zero, we calculate:

$$\hat{R}_{RKI} = \sum_{s=0}^{3} y_{s+4} / \sum_{s=0}^{3} y_{s}$$

rt = lambda y: np.sum(y[4:])/np.sum(y[:4])
df.rolling(8).apply(rt)
cases
date
2020-03-01 NaN
2020-03-02 NaN
2020-03-03 NaN
2020-03-04 NaN
2020-03-05 NaN
... ...
2020-10-30 1.273703
2020-10-31 0.929291
2020-11-01 0.601838
2020-11-02 0.499806
2020-11-03 0.475685

248 rows × 1 columns

The first values are NaN because the rolling window extends before the start of the data. If we plot the result, it looks like this:

ax = df.rolling(8).apply(rt).plot(figsize=(16,4), label='Re(t)')
ax.set(ylabel='Re(t)', title='Effective reproduction number estimated with RKI method')
ax.legend(['Re(t)']);

png

To avoid the spikes due to weekend reporting issues, I first apply a rolling mean over a window of 7 days:

ax = df.rolling(7).mean().rolling(8).apply(rt).plot(figsize=(16,4), label='Re(t)')
ax.set(ylabel='Re(t)', title='Effective reproduction number estimated with RKI method after rolling mean on window of 7 days')
ax.legend(['Re(t)']);

png

Interactive visualisation in Altair

import altair as alt

alt.Chart(df.rolling(7).mean().rolling(8).apply(rt).fillna(0).reset_index()).mark_line().encode(
    x=alt.X('date:T'),
    y=alt.Y('cases', title='Re(t)'),
    tooltip=['date:T', alt.Tooltip('cases', format='.2f')]
).transform_filter(
    alt.datum.date > alt.expr.toDate('2020-03-13')
).properties(
    width=600,
    title='Effective reproduction number in Belgium based on Robert-Koch Institute method'
)

Making the final visualisation in Altair

In the interactive Altair figure below, we show the $Re(t)$ for the last 14 days. We reduce the rolling mean window to three to see faster reactions.

#collapse

df_plot = df.rolling(7).mean().rolling(8).apply(rt).fillna(0).reset_index()
last_value = str(df_plot.iloc[-1]['cases'].round(2)) + ' ↓'
first_value = str(df_plot[df_plot['date'] == '2020-10-21'].iloc[0]['cases'].round(2)) # + ' ↑'
today_minus_15 = datetime.datetime.today() + relativedelta(days=-15)
today_minus_15_str = today_minus_15.strftime("%Y-%m-%d")

line = alt.Chart(df_plot).mark_line(point=True).encode(
    x=alt.X('date:T', axis=alt.Axis(title='Datum', grid=False)),
    y=alt.Y('cases', axis=alt.Axis(title='Re(t)', grid=False, labels=False, titlePadding=40)),
    tooltip=['date:T', alt.Tooltip('cases', title='Re(t)', format='.2f')]
).transform_filter(
    alt.datum.date > alt.expr.toDate(today_minus_15_str)
).properties(
    width=600,
    height=100
)

hline = alt.Chart(pd.DataFrame({'cases': [1]})).mark_rule().encode(y='cases')


label_right = alt.Chart(df_plot).mark_text(
    align='left', dx=5, dy=-10 , size=15
).encode(
    x=alt.X('max(date):T', title=None),
    text=alt.value(last_value),
)

label_left = alt.Chart(df_plot).mark_text(
    align='right', dx=-5, dy=-40, size=15
).encode(
    x=alt.X('min(date):T', title=None),
    text=alt.value(first_value),
).transform_filter(
    alt.datum.date > alt.expr.toDate(today_minus_15_str)
)

source = alt.Chart(
    {"values": [{"text": "Data source: Sciensano"}]}
).mark_text(size=12, align='left', dx=-57).encode(
    text="text:N"
)

alt.vconcat(line + label_left + label_right + hline, source).configure(
    background='#D9E9F0'
).configure_view(
    stroke=None, # Remove box around graph
).configure_axisY(
    ticks=False,
    grid=False,
    domain=False
).configure_axisX(
    grid=False,
    domain=False
).properties(title={
      "text": ['Effective reproduction number for the last 14 days in Belgium'], 
      "subtitle": [f'Estimation based on the number of cases until {today_minus_two.strftime("%Y-%m-%d")} after example of Robert Koch Institute with serial interval of 4'],
}
)
# .configure_axisY(
#     labelPadding=50,
# )

To check the calculation, here are the four values starting eight days ago, after applying the rolling mean window of 7:

df.rolling(7).mean().iloc[-8:-4]
cases
date
2020-10-27 16067.571429
2020-10-28 16135.857143
2020-10-29 15744.571429
2020-10-30 15218.000000

Those must be added together:

df.rolling(7).mean().iloc[-8:-4].sum()
cases    63166.0
dtype: float64

And here are the four values, starting four days ago:

df.rolling(7).mean().iloc[-4:]
cases
date
2020-10-31 14459.428571
2020-11-01 14140.428571
2020-11-02 13213.428571
2020-11-03 11641.428571

These are added together:

df.rolling(7).mean().iloc[-4:].sum()
cases    53454.714286
dtype: float64

And now we divide those two sums to get the $Re(t)$ of 2020-11-03:

df.rolling(7).mean().iloc[-4:].sum()/df.rolling(7).mean().iloc[-8:-4].sum()
cases    0.846258
dtype: float64

This matches (as expected) the value in the graph. Let's compare with three other sources:

  1. Alas, it does not match the calculation reported by Bart Mesuere on 2020-11-03 based on the RKI model, which reports 0.96:

twitter: https://twitter.com/BartMesuere/status/1323519613855059968

  2. Also, the more elaborate model from rtlive global is not yet that optimistic. Note that the rtlive model starts estimating the $Re(t)$ from the number of tests instead of the number of cases, so other reporting delays might be involved.

  3. epiforecasts.io has already been below 1 since the beginning of November.

Another possibility is that I made a mistake somewhere. If you spot it, please let me know.


My talk at Data Science Leuven

Links to the video and slides of the talk, and some words of thanks.

Talk material

On 23 April 2020, I was invited to give a talk at Data Science Leuven. I talked about how you can explore and explain the results of a clustering exercise. The target audience is data scientists who know the basics of clustering data and want to improve their skills.

The video recording is on YouTube:

youtube: https://youtu.be/hk0arqhcX9U?t=3570

You can see the slides here: slides

The talk itself is based on this notebook, which I published on this blog yesterday and used for the demo during the talk.

The host of the conference was Istvan Hajnal. He tweeted the following:

twitter: https://twitter.com/dsleuven/status/1253391470444371968

He also took the R out of my family name, NachteRgaele. Troubles with R, it's becoming the story of my life... 😂 Behind the scenes, Kris Peeters calmly took on the job of doing the live streaming. 👍 Almost PyData quality! Big thanks to the whole Data Science Leuven team, who do all this on a voluntary basis.

Standing on the shoulders of giants

This talk would not have been possible without the awesome Altair visualisation library made by Jake VanderPlas. Secondly, it builds upon the open-source SHAP library made by Scott Lundberg. Those two libraries have had a major impact on my daily work as a data scientist at Colruyt Group. They inspired me to try to give back to the open source community with this talk. 🤘

If you want to learn how to use Altair, I recommend the tutorial made by Vincent Warmerdam on his calmcode site: https://calmcode.io/altair/introduction.htm

I would also like to thank my colleagues at work, who endured the dry run of this talk and suggested trying to use a classifier to explain the clustering result. Top team!

Awesome fastpages

Finally, this blog is built with the awesome fastpages. I can now share a rendered Jupyter notebook, with working interactive demos, that can be opened in MyBinder or Google Colab with one click of a button. This means that readers can directly tinker with the code and methods discussed in the talk. All you need is a browser and an internet connection. So thank you Jeremy Howard, Hamel Husain, and the fastdotai team for pulling this off. Thank you Hamel Husain for your GitHub Actions. I can't say enough how awesome this all is.