## The Role of Artificial Intelligence in Future Technology

“That’s me!” is what the world’s leading professional player, Lee Sedol of South Korea, would have proudly replied to someone asking “Who is the best player at the Chinese game of Go?” – until he was beaten in a five-game match in March 2016. His opponent was a computer program called “AlphaGo” (Silver et al., 2016). For the first time, a computer had exceeded human-level performance at playing Go. This was previously thought impossible because the number of valid sequences of moves is outrageously large: ~$250^{150}$, compared to ~$35^{80}$ for chess. The artificial player was not able to search this tree exhaustively; it had to mimic a human in that it assessed a given situation to make intelligent decisions – decisions more intelligent than the ones made by the human sitting across the table. Today, the best Go player is a computer. The machine still draws on its classic strengths, such as processing power, but it also imitates human behavior, and it is that ability which now makes it better than every human in this particular field. That can be considered a fundamental change.

## Matrix Types Cheat Sheet

In the field of linear algebra there is a variety of different matrix types. Each has its own definition and relevance. I had trouble finding a good overview online and thought I’d compile a list myself. This article lists a selection of matrix types along with their definitions, mostly based on the corresponding Wikipedia articles.

## Linear Relationships in the Transformer’s Positional Encoding

In June 2017, Vaswani et al. published the paper “Attention Is All You Need” describing the “Transformer” architecture, which is a purely attention-based sequence-to-sequence model. It can be applied to many tasks, such as language translation and text summarization. Since its publication, the paper has been cited more than one thousand times and several excellent blog posts have been written on the topic; I recommend this one.

Vaswani et al. use positional encoding to inject information about a token’s position within a sentence into the model. The exact definition is given in Section 3.5 of the paper (it is only a tiny aspect of the Transformer, as the red circle in the cover picture of this post indicates). After the definition, the authors state:

“We chose this function because we hypothesized it would allow the model to easily learn to attend by relative positions, since for any fixed offset $k$, $PE_{pos+k}$ can be represented as a linear function of $PE_{pos}$.”

But why is that? In this post I prove this linear relationship between relative positions in the Transformer’s positional encoding.
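As a compact sketch of the claim (not the full proof), take the sinusoidal definition from Section 3.5 of the paper and write $\omega_i = 1/10000^{2i/d_{model}}$. The shorthand $\omega_i$ and the matrix name $M_{i,k}$ are introduced here for illustration only. Each sine/cosine pair of the encoding is

$$
PE_{(pos,\,2i)} = \sin(\omega_i \cdot pos), \qquad PE_{(pos,\,2i+1)} = \cos(\omega_i \cdot pos),
$$

and the angle addition formulas give

$$
\begin{pmatrix} \sin(\omega_i (pos+k)) \\ \cos(\omega_i (pos+k)) \end{pmatrix}
=
\underbrace{\begin{pmatrix} \cos(\omega_i k) & \sin(\omega_i k) \\ -\sin(\omega_i k) & \cos(\omega_i k) \end{pmatrix}}_{M_{i,k}}
\begin{pmatrix} \sin(\omega_i \cdot pos) \\ \cos(\omega_i \cdot pos) \end{pmatrix}.
$$

The matrix $M_{i,k}$ depends only on the offset $k$ and the dimension index $i$, not on $pos$, so stacking these $2 \times 2$ blocks along the diagonal yields one linear map that takes $PE_{pos}$ to $PE_{pos+k}$.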

## Using TensorFlow’s Batch Normalization Correctly

The TensorFlow library’s layers API contains a function for batch normalization: tf.layers.batch_normalization. It is supposedly as easy to use as all the other tf.layers functions, but it has some pitfalls. This post explains how to use tf.layers.batch_normalization correctly. It does not delve into what batch normalization is, which can be looked up in the paper “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” by Ioffe and Szegedy (2015).

## Notes on “Haskell Programming – from first principles”

From November 13th, 2017 to June 9th, 2018, a friend and I worked our way through the 1285 pages of “Haskell Programming – from first principles” by Christopher Allen and Julie Moronuki. That’s more than six pages per day! While reading and discussing, I took a few notes here and there, which I want to publish in this post. Some of the sentences are taken directly from the book, which, by the way, I highly recommend to anyone who wants to learn Haskell.

## Making “Slice” Pointfree

Let the Haskell function slice be defined as

    slice :: Int -> Int -> [a] -> [a]
    slice from len xs = take len (drop from xs)

It takes two integral values, marking the beginning and the length of a sub-list, which is sliced out of the third parameter (a list). Applied to strings, this function may be known as substring. Here are some examples illustrating what it does:

    Prelude> slice 2 3 [0..9]
    [2,3,4]
    Prelude> slice 2 10 [0..9]
    [2,3,4,5,6,7,8,9]

The goal is to make slice pointfree, i.e. to write it as slice = ..., and thereby illustrate a systematic approach to doing so.
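One possible route to a pointfree slice is sketched here, removing one argument per step. This is a sketch and not necessarily the derivation used in the full post; the names slice1, slice2 and slice3 are hypothetical and exist only to keep the intermediate steps side by side.

```haskell
module Main where

-- Pointful starting point, as defined in the text.
slice :: Int -> Int -> [a] -> [a]
slice from len xs = take len (drop from xs)

-- Step 1: remove xs by writing the body as a composition.
slice1 :: Int -> Int -> [a] -> [a]
slice1 from len = take len . drop from

-- Step 2: remove len. "take len . drop from" is the section
-- (. drop from) applied to (take len).
slice2 :: Int -> Int -> [a] -> [a]
slice2 from = (. drop from) . take

-- Step 3: remove from. (. drop from) equals flip (.) (drop from),
-- so the whole right-hand side becomes a composition ending in drop.
slice3 :: Int -> Int -> [a] -> [a]
slice3 = (. take) . flip (.) . drop

main :: IO ()
main = do
  let variants = [slice, slice1, slice2, slice3]
  -- Every variant should agree with the examples from the text.
  print [f 2 3 [0 .. 9 :: Int] | f <- variants]
  print (slice3 2 10 [0 .. 9 :: Int])
```

The final form is short but hard to read, which is a common trade-off of pointfree style; the intermediate slice1 is often the most readable compromise.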

## Digital-Piano Dashboard

I have recently acquired a digital piano. Besides playing it, I have spent some time processing and visualizing its MIDI output data. What’s that? The digital piano has a USB port for data streaming (while somebody is playing). This data contains, among other things, information about which key is being pressed at a given time. The Digital-Piano Dashboard visualizes that information and renders a nice piano-key-push-history.
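To give an idea of what the key information in such a stream looks like, here is a minimal sketch that decodes a single three-byte MIDI channel message into a key event. It assumes the piano sends standard MIDI note-on/note-off messages; the type KeyEvent and the function decodeKeyEvent are hypothetical names for illustration and are not taken from the dashboard itself.

```haskell
module Main where

import Data.Bits ((.&.))
import Data.Word (Word8)

-- Hypothetical event type: key number plus velocity for a press,
-- key number only for a release.
data KeyEvent
  = KeyDown Word8 Word8
  | KeyUp Word8
  deriving (Eq, Show)

-- Decode one three-byte message [status, key, velocity].
-- The upper four bits of the status byte give the message kind:
-- 0x90 is note-on, 0x80 is note-off. A note-on with velocity 0
-- is conventionally treated as a note-off.
decodeKeyEvent :: [Word8] -> Maybe KeyEvent
decodeKeyEvent [status, key, velocity]
  | kind == 0x90 && velocity > 0 = Just (KeyDown key velocity)
  | kind == 0x90 || kind == 0x80 = Just (KeyUp key)
  where
    kind = status .&. 0xF0
decodeKeyEvent _ = Nothing

main :: IO ()
main = do
  print (decodeKeyEvent [0x90, 60, 100]) -- middle C pressed
  print (decodeKeyEvent [0x80, 60, 0])   -- middle C released
  print (decodeKeyEvent [0xB0, 64, 127]) -- control change, not a key event
```

A key-push history would then amount to pairing each KeyDown with the next KeyUp for the same key number and recording the timestamps in between.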