Linear Relationships in the Transformer’s Positional Encoding

In June 2017, Vaswani et al. published the paper “Attention Is All You Need”, which describes the “Transformer” architecture, a purely attention-based sequence-to-sequence model. It can be applied to many tasks, such as language translation and text summarization. Since its publication, the paper has been cited more than one thousand times, and several excellent blog posts have been written on the topic; I recommend this one.

Vaswani et al. use positional encoding to inject information about a token’s position within a sentence into the model. The exact definition is given in section 3.5 of the paper (it is only a tiny aspect of the Transformer, as the red circle in the cover picture of this post indicates). After the definition, the authors state:

“We chose this function because we hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset $k$, $PE_{pos+k}$ can be represented as a linear function of $PE_{pos}$.”

But why is that? In this post I prove this linear relationship between relative positions in the Transformer’s positional encoding.
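As a rough sketch of where the linearity comes from (not necessarily the route the full post takes), consider a single sine/cosine pair of the encoding. The shorthand $\omega_i = 1/10000^{2i/d_{model}}$ is introduced here for brevity and is an assumption of this sketch; the paper writes the frequency out inline. The angle addition formulas then give

$$
\begin{pmatrix} \sin(\omega_i (pos + k)) \\ \cos(\omega_i (pos + k)) \end{pmatrix}
=
\begin{pmatrix} \cos(\omega_i k) & \sin(\omega_i k) \\ -\sin(\omega_i k) & \cos(\omega_i k) \end{pmatrix}
\begin{pmatrix} \sin(\omega_i \, pos) \\ \cos(\omega_i \, pos) \end{pmatrix}
$$

The matrix depends only on the offset $k$ and the frequency $\omega_i$, not on $pos$, so each pair of dimensions of $PE_{pos+k}$ is a linear function (a rotation) of the corresponding pair of $PE_{pos}$.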

Using TensorFlow’s Batch Normalization Correctly

The TensorFlow library’s layers API contains a function for batch normalization: tf.layers.batch_normalization. It is supposedly as easy to use as all the other tf.layers functions; however, it has some pitfalls. This post explains how to use tf.layers.batch_normalization correctly. It does not delve into what batch normalization is, which can be looked up in the paper “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” by Ioffe and Szegedy (2015).

Notes on “Haskell Programming – from first principles”

From November 13th, 2017 to June 9th, 2018, a friend and I worked our way through the 1285 pages of “Haskell Programming – from first principles” by Christopher Allen and Julie Moronuki. That’s more than six pages per day! While reading and discussing, I took a few notes here and there, which I want to publish in this post. Some of the sentences are taken directly from the book, which, by the way, I highly recommend to anyone who wants to learn Haskell.

Poker Heads-Up Pre-Flop Odds

In this article we define and publish the exact pre-flop probabilities for each possible combination of two hands in Texas Hold’em poker. An online tool makes the data visually accessible.

Making “Slice” Pointfree

Let the Haskell function slice be defined as

slice :: Int -> Int -> [a] -> [a]
slice from len xs = take len (drop from xs)

It takes two integral values, marking the beginning and the length of a sub-list, which is sliced out of the third parameter (a list). Applied to strings, this function may be known as substring. Here are some examples illustrating what it does:

Prelude> slice 2 3 [0..9]
[2,3,4]
Prelude> slice 2 10 [0..9]
[2,3,4,5,6,7,8,9]

The goal is to make slice pointfree, i.e. to write it as slice = ..., and thereby illustrate a systematic approach to doing so.
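One possible way to get there, sketched here as an illustration rather than as the derivation the full post uses, is to remove one argument at a time. The names sliceStep1, sliceStep2 and slicePointfree are made up for this sketch.

```haskell
-- Pointful original, as defined in the text.
slice :: Int -> Int -> [a] -> [a]
slice from len xs = take len (drop from xs)

-- Step 1: remove xs by writing the body as a composition.
sliceStep1 :: Int -> Int -> [a] -> [a]
sliceStep1 from len = take len . drop from

-- Step 2: remove len. (. drop from) post-composes its argument with
-- drop from, so ((. drop from) . take) len == take len . drop from.
sliceStep2 :: Int -> Int -> [a] -> [a]
sliceStep2 from = (. drop from) . take

-- Step 3: remove from. (. drop from) is flip (.) (drop from), and
-- f . take is (. take) f, which leaves a plain composition chain.
slicePointfree :: Int -> Int -> [a] -> [a]
slicePointfree = (. take) . flip (.) . drop
```

All four definitions should agree on every input; for instance, slicePointfree 2 3 [0..9] evaluates to [2,3,4], just like slice 2 3 [0..9]. Whether the final form is more readable than the original is debatable, which is part of what makes the exercise instructive.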

Digital-Piano Dashboard

I recently acquired a digital piano. Besides playing it, I have spent some time processing and visualizing its MIDI output data. What’s that? The digital piano has a USB port that streams data while somebody is playing. This data contains, among other things, information about which key is being pressed at a given time. The Digital-Piano Dashboard visualizes that information and renders a nice piano-key-push-history.

Haskell BNF Parser

This is a brief update about a project I have been working on lately. It’s my first bigger Haskell project, and it is about parsing a Backus-Naur form (BNF) expression and returning it in JSON format. More formally, this can be seen as compilation between two languages, namely BNF and JSON.
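To make the idea of "BNF in, JSON out" a little more concrete, here is a minimal sketch of the output half only. It assumes the grammar has already been parsed into a small syntax tree. The types Symbol and Rule, the helper functions, and the JSON field names are all hypothetical and chosen for illustration; they are not taken from the project.

```haskell
import Data.List (intercalate)

-- Hypothetical syntax tree for a BNF grammar (not the project's types).
data Symbol = Terminal String | NonTerminal String
  deriving (Show, Eq)

-- A rule: <name> ::= alternative1 | alternative2 | ...
-- where each alternative is a sequence of symbols.
data Rule = Rule String [[Symbol]]
  deriving (Show, Eq)

-- Render a JSON string literal; only quotes and backslashes are
-- escaped, which is enough for this sketch but not for all input.
jsonString :: String -> String
jsonString s = "\"" ++ concatMap escape s ++ "\""
  where
    escape '"'  = "\\\""
    escape '\\' = "\\\\"
    escape c    = [c]

-- Render already-serialized values as a JSON array.
jsonArray :: [String] -> String
jsonArray xs = "[" ++ intercalate "," xs ++ "]"

symbolToJson :: Symbol -> String
symbolToJson (Terminal t)    = "{\"terminal\":" ++ jsonString t ++ "}"
symbolToJson (NonTerminal n) = "{\"nonterminal\":" ++ jsonString n ++ "}"

ruleToJson :: Rule -> String
ruleToJson (Rule name alts) =
  "{\"name\":" ++ jsonString name
    ++ ",\"alternatives\":"
    ++ jsonArray (map (jsonArray . map symbolToJson) alts)
    ++ "}"
```

Under these assumptions, a rule such as <bit> ::= "0" | "1" would be represented as Rule "bit" [[Terminal "0"], [Terminal "1"]] and passed to ruleToJson. A real implementation would more likely use a JSON library than string concatenation.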


Corsairs3D

Corsairs3D is a riveting single-player game that puts you into a pirate’s shoes. Your ship sails around an island defended by brave guards, with the goal of collecting as many coins as possible. But watch out: the defense tower’s cannons are pointing your way. If you do not change course quickly enough, your ship might sink.

Connecting PyCharm to a TensorFlow Docker Container

This guide walks you through setting up PyCharm Professional and Docker, with the goal of developing TensorFlow applications in PyCharm while executing the code inside an encapsulated container. After completing the following steps, you will be able to write Python code in PyCharm and have the execution take place in a container, without any complication.