Personal blog and portfolio

Text generation by Markov Chains

text-generation, markov-chains

I was listening to the (great) podcast Idle Thumbs by Chris Remo and others. This is a largely gaming-related podcast, but at the end of every podcast, Chris will read out some interesting e-mails received by fans (the so-called "reader mail" section). At the very end of Episode 273, by a fan's request, they discuss this tumblr post where a computer supposedly generated a synopsis of a Batman episode from a large corpus of old episode synopses.

Then at the end of Episode 278, they discuss another post by the same author, this time a generated Yelp review of the Catacombs of Paris. Both of these stories are, besides funny, quite well structured. There are multiple references to earlier sentences and they form some kind of consistent story, at least, it seems to be that way.

(unintelligent) Markov chains...

The python script behind these posts reveils that generating these texts is actually a very manual process. It simply analyzes some corpus input file, generates Markov chains of words in a sentence, and lets the user pick from the best 20 suggestions. After picking a word, it's added to the output and a new list of 20 suggested words is shown. This allows the user to steer towards an interesting or funny story.

Let's try this myself.

Somewhat disappointed about the lack of intelligence in the script, I decided I should at least give it a go with my own corpus, so I collected the podcast descriptions from Idle Thumbs episodes 200 up to 279 (currently the latest released episode) and put them in a text file.

I tried the randomized mode, where the script would pick a number of random words from the list of suggestions, but this didn't yield very interesting results. I think this has to do with the small dataset and large variety of words, so the suggestions aren't very good except from the top few suggested words. I tried again but only picking the first suggestion unless this caused a loop and ended up with the following text (after formatting):

The time to the left and right. "I'm hungry", says the pilot over text chat.
You see he's named Nick. You're hungry too. Maybe there's somewhere to eat.
Maybe to two of you will be spoiled.

This week we take a look into the creation of this podcast. It is about
Metal Gear Solid V: The Phantom Pain, a game that stirs up memories of something
we loved dearly, long ago. Then we'll let you in on the sad space dad's craze that
has taken the gaming world by storm. If that isn't enough, stick with us for the
video game Downwell. And this is a perfect moment to preserve as a photo. You
try to clear up the shot by setting any enemies to hidden and find yourself
suddenly alone. on Idle Thumbs podcast descriptions

Note: The italic word video is the only word where I picked the second-best suggestion, since it would otherwise choose time, which started the entire sequence all over again.

I thought this was a somewhat interesting result (and surprisingly consistent). It could very well be a podcast description for an Idle Thumbs podcast (an arbitrary episode for comparison). I might give text generation a go myself some day.

Network: a HTTP and HTTPS Java library

java, telegram

My last post focussed on a thread-blocking rate-limiter for a Telegram Bot API framework I'm building. For this framework, I needed a way to make sending GET and POST requests very simple. If you look up any Java code snippets that do not require external libraries for making a simple GET request, you may see that this isn't exactly a neat and compact task. If you want to do more complex stuff, the code gets messy very quickly. Especially if you want to send POST requests with Content-Type: multipart/formdata, which is a complex way of saying you want to be able to upload files to a server. That's why I wrote a little library that hides the details and offers a simple interface.

Design goals

My design goals where as follows:

  • No external libraries. The less dependencies, the better.
  • Simple interface. I don't need to know about complexity that I don't care about as a user.
  • As few method calls as possible for the user. If I want to make just one request, I only want to call one method.
  • No callbacks. I will be making my API calls from various threads so I intend to just block the thread until I receive a response. A typical HTTP request takes a few hundred milliseconds, so this is acceptable to me.
  • I need some control over the timeout limits for making a connection and maintaining a connection because I intend to use long polling for receiving the updates.
  • Modular code. No dependencies regarding the framework, so I can re-use it in other projects.

Eclipse.. *sigh*

I had to re-write almost everything from scratch just after I finished. Eclipse decided to delete all my files while I was doing final refactoring. It offered me to undo 'my' mistake, but then couldn't find the files and just froze. The files were gone. Since it was only about 700 lines of code at that point, it took me just a little over an hour to reproduce it from memory. I have never seen Eclipse do anything like this, but I've had it running continiously for several days. I guess I learned why version management is important, even when you're not getting paid to keep your project management in order.

Code? Did you say CODE!?

The code is available on my Github profile. The README file contains some useful examples, but to illustrate how it applies to the framework, here are some specific examples to Telegram bots. Note that the API token I'm using here is the example token from the API documentation, so don't waste your time trying to use it.

NB: I have left out a bit of exception handling to make the code snippets more readable.

Example: Sending a text message to a chat

The code snippet below sends a text message saying Hello, world! to the chat with ID 1234567890.

By the way, if you need the ID of any chat, just add my bot @GetMyIDBot to the chat or forward a message from a specific chat to it. It will reply with a message containing the information you requested.

Connection con = new HttpsConnection("");
String basepath = "bot123456:ABC-DEF1234ghIkl-zyx57W2v1u123ew11/";

Map<String, String> fields = new HashMap<String, String>();
fields.put("chat_id", "1234567890");
fields.put("text", "Hello, world!"); + "sendMessage", fields);

Example: Sending a photo to a channel

The code snippet below sends a photo to a Telegram channel named @ExampleChannel, with a caption saying What a lovely photo!.

Connection con = new HttpsConnection("");
String basepath = "bot123456:ABC-DEF1234ghIkl-zyx57W2v1u123ew11/";

Map<String, String> fields = new HashMap<String, String>();
fields.put("chat_id", "@ExampleChannel");
fields.put("caption", "What a lovely photo!");

byte[] photoBytes = Files.readAllBytes(Paths.get("photo.jpg"));
InputFile photoFile = new InputFile("photo.jpg", "image/jpg", photoBytes); + "sendPhoto", fields, "photo", photoFile);

As you can see, sending requests is pretty simple, even if they're as complex as a multipart/formdata request.

A quick and simple Java rate limiter

java, telegram

For my current project (a Telegram Bot API framework) I had to figure out a way to limit the amount of outgoing API calls that bots would send. I did not want the bot programmer to have to deal with the error message Telegram sends you when you've passed your requests per second limit, nor with the timing of the API calls.

The API limits work as follows: bots can send up to 30 messages per second, with an additional limitation that they can only send 20 messages per minute for a specific group chat. This RateLimiter will be the basis of such a system.

The way my framework works is that every update is handled as a task in some thread pool. Whenever the bot sends a message, this thread is blocked until the API responds. To limit the outgoing requests, I will block the thread some more before sending, if necessary. This way, incoming updates can be handled without having to deal with asynchronous requests, callbacks, etc. Instead, the API calls just block and wait, then return the proper value.

Here's what I was able to come up with in about an hour. In terms of the finished lines of code, this is probably worth 10 minutes of programming, but could be useful to anyone facing the same problem.

package nl.jochemkuijpers.ratelimiter;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

 * A simple BlockingQueue based rate limiter.
 * Usage: call limit() to throttle the current thread (blocks)
 * and call tick() at regular intervals from a separate thread.
public class RateLimiter {
	private final long fillPeriod;
	private final BlockingQueue<Object> queue;
	private long timer;

	 * Create a simple blocking queue based rate limiter with a
	 * certain capacity and fill rate. Be careful when handling
	 * lots of requests with a high capacity as memory usage 
	 * scales with capacity.
	 * @param capacity
	 *            capacity before rate limiting kicks in
	 * @param rate
	 *            rate limit in allowed calls per second
	public RateLimiter(int capacity, double rate) {
		if (rate <= 0) {
			this.fillPeriod = Long.MAX_VALUE;
		} else {
			this.fillPeriod = (long) (1000000000L / rate);
		this.queue = new ArrayBlockingQueue<Object>(capacity);
		this.timer = System.nanoTime();

	 * Tick the rate limiter, advancing the timer and possibly
	 * unblocking calls to limit()
	public synchronized void tick() {
		long elapsedTime = System.nanoTime() - timer;
		int numToRemove = (int) (elapsedTime / fillPeriod);

		// advance timer
		timer += fillPeriod * numToRemove;

		List<Object> discardedObjects = new ArrayList<Object>(numToRemove);
		queue.drainTo(discardedObjects, numToRemove);

	 * A call to this method blocks when it is called too often
	 * (depleted capacity).
	 * @return false when interrupted, otherwise true
	public boolean limit() {
		try {
			queue.put(new Object());
		} catch (InterruptedException e) {
			return false;
		return true;

And here is an example on how to use it:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class Example {
	public static void main(String[] args) {
		// capacity of 10 and a rate of 1/second
		RateLimiter limiter = new RateLimiter(10, 1);

		// schedule rate limiter ticks every 100 milliseconds
		ScheduledExecutorService scheduler = Executors
		scheduler.scheduleAtFixedRate(new Runnable() {
			public void run() {
		}, 0, 100, TimeUnit.MILLISECONDS);
		// bark 100 times, but limit using the rate limiter
		for (int n = 0; n < 100; n += 1) {
			System.out.println("bark #" + n);

It works by utilising Java's standard library's BlockingQueue, which blocks when you try to add an object when it is full (only works with bounded implementations such as the ArrayBlockingQueue). It requires the tick method to be called at regular intervals. This should be done at a frequency in the same order of magnitude as the fill rate passed to the constructor. Example: If the fill rate is 10, you probably want to call tick every 100 milliseconds (or 50, or 150, whatever).

The drawback of this implementation is that it will store unused objects that literaly serve no purpose other than to have the BlockingQueue fill up and block when it reaches its capacity. These objects need to be disposed of by the garbage collector, and take up a bit of valuable memory space.

Since I will be using a small amount of RateLimiter objects combined with small capacities (less than a hundred), I think the current solution is fine for now. I'm posting this mainly because I think it's an interesting way to implement a rate limiter as it automatically handles the order of unblocking blocked threads.

I'm building a Telegram Bot API framework


I'm working on a Telegram Bot API framework that will eventually be released as an open-source project. Some of my Telegram bots already run on a prototype of this framework:

  • @ZombieBot - A simple bot that replies in zombie-speak.
  • @GetMyIDBot - This bot replies with your user ID, chat ID and various other bits of information. Useful for when you want to quickly find out a chat ID so you can have your bot send messages there.
  • @ScoreDevBot - Keeps track of the score in your group conversations. Currently in development.

Don't worry. I haven't forgotten about @SudokuBot. I will get to that once this project is finished, because of obvious reasons.

About me

I'm a 23-year-old Computer Science student at Eindhoven University of Technology.

More on the about page.



telegram, java, meta, text-generation, markov-chains

Other stuff