JochemKuijpers.nl Personal blog and portfolio

How to share your code through GitHub

github, git, programming

I recently got asked how to share a piece of code. The person asking explained that they did not know how to do this properly. I don't think there's one proper way, but there are certainly less convenient ways of sharing your code.

Small disclaimer: I'm largely self-taught in the use of git. I've used version control software (including git) during my Computer Science courses, and I often use git for my personal projects. I think I have a pretty good idea of how git works, good enough to explain the basics. Note that this is merely an introduction. I will not go into details about branches or other advanced operations.

I will start by explaining the terminology, then I will give a few examples on how to use git and GitHub.

So what is git, and what is GitHub?

Let's ignore GitHub for now. We'll first have a look at what problems git solves for us.

What problem does git solve?

When working on a program, you sometimes want to restore a previous version of your code, because you tried to fix something but accidentally broke a lot of other code. Without version control this is not possible: every change overwrites your old code files, and you end up rewriting it all from memory.

Another problem you may encounter when writing software is that multiple programmers may be working on the same files. Say you store your code in a shared network folder: you could overwrite changes another programmer made, just because you edited a small line of code somewhere else in the file.

There are all kinds of issues with multiple people trying to work on the same set of code files. Beginning programmers often move towards Dropbox, Google Drive or similar file sharing services, but may notice errors or duplicate files whenever two people edit the same file.

Git solves all of these problems by giving each programmer access to their own set of code files to edit. Git then records the changes made to these files, and applies them to the files of other programmers, though git doesn't do this automatically. The record of your current and all previous versions of your software is called a repository.

Index

When you're making changes to your local files, it is important that git knows which files are important to track. You need to specify to git which files it needs to track by adding them to the git index. You can do this at any point in time, even mid-project, but keep in mind that any file not in the index will not be tracked by git and that file will not be shared with the other programmers on your project.

Files you don't necessarily want to track in git are files that are different for each programmer, for example configuration files that hold absolute paths. Often, you also want to keep database passwords and such out of the git index. A good alternative is a text file (often called README.md) which describes that these files need to be set up manually by anyone using the code in this repository. You can prevent adding configuration files or other files you don't want in your git index by adding them to a file called .gitignore ('dot-git-ignore', indeed). More on that later.

Committing

Once you're done making changes, it is time to record these changes and group the changes in all the files together. This is called committing. This is where you write a small summary (called a commit message) of your changes so everyone knows why you made them.

The word 'commit' should be seen the way it's used in the sentence: "to commit to an idea". In other words, you are determined to share the current state of your code with other programmers. Implicitly, this means you agree that this is a good change to the source code. You commit to this change, so to speak. Often, a team will only accept working and compiling code. If your code does not compile, generates errors or does not work correctly yet, it's not time to commit your changes.

Exceptions to this rule may occur when setting up a new project. It may be helpful in the very beginning to set up a skeleton project that does not necessarily compile, but allows multiple programmers to start working on getting the initial working version.

When you commit, your changes are grouped into one big report. This report of changes, including a changelog and your name and e-mail address (the author of the commit), is called a commit. This commit is stored in your local repository.

Wait..? Aren't we sending our changes to other programmers? Why is it stored in our local repository? Read on.. :)

Pushing

This is where remotes and GitHub come in..

Every programmer on your team has their own local repository, tracking their own local changes and keeping a record of all commits ever made to that repository. To share and use each other's commits, you need some way to get your local commits to the repository of another programmer, and vice versa. You can do this by pushing your changes to a remote repository. Pushing here means uploading.

A remote is basically a normal repository, but accessible to everyone on your team and sometimes to the rest of the world as well (if the repository is a public one). Your repository may be a remote of someone else's repository. Often though, GitHub hosts the remote repository for all programmers. They all push to GitHub and pull from GitHub. This makes the GitHub repository a centralized repository. You could just as easily use one of your team members' repositories as a remote. GitHub is just an easy solution which, in addition, has a lot of nice features on their website (such as issue tracking, wikis, etc.).

There's an important issue here: What if your commit is not compatible with a commit made by someone else? For example, say you added a line of code to the function doSomething() but before your commit got to the remote repository, someone else removed the entire function and pushed their commit first. Now your commit is no longer valid and cannot be applied to the remote repository, because commits were made that you've never applied to your local repository. This is when your push gets rejected.

Pulling and merge conflict resolving

Just like we can push, we can also pull changes from the remote repository. This is how other programmers would get your commits to their repositories: they would pull from the remote repository after you pushed your commits there. Pulling means downloading here.

Let's stick with our scenario from the last section, where you wanted to push some commits, but got rejected. Remember how we will only commit working code (see the commit section)? We will have to fix this with another commit before we're allowed to push to the remote repository again. So we will pull all the changes that have been recorded in the remote repository since the last time. These changes are incompatible with ours, so this will create a merge conflict. Git will indicate which files have merge conflicts and we will have to manually inspect which changes were made in our version and in the remote version of the file.

The commit messages of the remote commits will be available to us to figure out how to resolve this merge conflict. This is why it is important to write a good commit message.

Once you have resolved the merge conflict, you can commit these changes and try to push them to the remote repository again. Be sure to check your local code actually compiles before committing. We don't want to break the remote repository everyone depends on!

Now that the remote repository contains your changes, other programmers can pull the remote repository to their own local repositories. This allows other programmers to work with our changes. In the same way, you should pull every time you start working on a new feature or bug, to ensure you are working with the latest version of the code.

Cloning

We have yet to discuss a small, but important, part of git. Sometimes, a new programmer joins the team. To set up their local repository, they can simply clone the remote repository. This means all previous changes to the repository are now also on the machine of this new programmer. It's a simple command, but it's important to know. This is how you can participate in other programmers' projects!

If you understand the above text, then you understand the basic concepts of git. Now take a look at the examples below and see if you can execute them yourself.

Example usage of git

Below are some examples on how to use git. I assume you have installed the command line program git. You can use GUI clients as well, which do largely the same thing, but as a programmer, a command line shouldn't scare you.

To name a few git GUIs you can use:

  • Git GUI (the download contains both command line and a graphical interface for git)
  • TortoiseGit (shell integration for Windows)
  • GitHub Desktop (streamlined but somewhat limited interface by GitHub)

If you're confident and you want to use the command line, be sure to navigate to the repository root directory. This is also called the working directory. It's the directory that holds all your files and the .git folder.

As a side note: You may or may not see files and folders starting with a dot. Under Linux, these files are hidden by default. Windows 10 seems to have copied that behaviour. If you can't see these files, that doesn't mean they're not there. As a developer, you'll probably want to see them to give you absolute control over what is in your git repository, so you should look up how to achieve that with your file explorer.

Create a local repository

Creating a local repository is simple. It's one single command:

git init [directory]

This will create a git repository (and a folder named .git) in the directory you specified, or in the current directory if you did not specify a directory (note: directory means folder. It's the same thing).

The .git folder contains all the repository data. You should not touch it: it holds everything your git repository needs to function properly. The parent folder, which holds the .git folder, is called the working directory. This is where you put all your code and other project files. You can include documentation here as well, but GitHub has a nice feature called Wikis, which is better suited for shared documentation.

You can find the git-init documentation here.

Adding files to the index

When adding files to the working directory, they're not immediately tracked by git. As I described in the index section above, you may not necessarily want to track every file. Configuration files and such are often left out. Most of the time, you only want to share the code files.

Adding files in the working directory is done using the following command:

git add <path>

Note that <path> is not optional. If you specify a directory here, all files in the directory are added to the index. You can combine this with the fact that a single dot refers to the current directory, in order to arrive at this command: git add . (that's right: "git, add, dot"). This adds all files in the working directory to the index, including hidden files.

You can find the git-add documentation here.

.gitignore

The one exception to git add is that all files that match one or more paths in the .gitignore file, are ignored by git when adding them to the index. If they're already in the index, matching to .gitignore will not remove them from the index.

For example, take the following files:

conf/database.ini
helloworld.cpp
build/executable.so

And take the following content of .gitignore:

conf
*.so

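If we executed git add . on this working directory, the only file from this list that gets added is helloworld.cpp, since the other two match one of the lines in the .gitignore file. (The .gitignore file itself is added as well, which is usually what you want, so everyone shares the same ignore rules.)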

You can find the gitignore documentation here.

Committing changes

Now imagine you added some files, deleted some others, made changes to even more files. Now you want to commit them.

You can do that using the following command:

git commit --all --message="Your commit message goes here"

This is a pretty long command, and you probably don't want to type it all the time. So here's exactly the same command, but shorter:

git commit -am "Your commit message goes here"

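This gathers all the changes to files that are already in the git index (modified and deleted files) and creates a commit out of them, with the commit message attached. Note that brand-new files are not picked up by --all; you still need to git add those first.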

Pushing to a remote repository

Let's push our previous commits to a remote repository.

There's a little bit of setup left to do if this is a repository that we created ourselves. First off, head to GitHub.com and log into your account. Now create a new repository.

GitHub should now show you the exact steps I'm about to tell you. So here's what you do:

Copy the HTTPS link GitHub provides. It looks like this: https://github.com/<username>/<repositoryname>.git

Now paste it into the following command:

git remote add origin <HTTPS link goes here>

Now that we've told git where the remote repository called origin can be found, all that's left to do is push our changes there:

git push -u origin master

This command pushes the local changes to the remote repository. I have not discussed branches, and I will not do that in this guide, but that's where the master part comes from. We're pushing the current changes to the master branch of the remote repository called origin. (Newly created GitHub repositories nowadays use main as the default branch name, so if your branch is called main, substitute that.) The -u part is only required when setting up. When the remote repository is already set up, you can leave it out and the full command becomes git push origin master.

A side-note regarding remote repositories

A remote is added automatically to our local repository when we clone an existing repository; by default git names it origin. The repository you are cloning from is usually called 'upstream', because all changes made there also affect us and any of the 'downstream' repositories that depend on our repository.

GitHub repositories are usually upstream from the developer's local repository.

You can find the git-push documentation here.

Pulling from the remote repository

To retrieve and apply commits from the remote repository, execute the following command:

git pull

That's it. In more advanced set-ups, you may need to specify more options.

You can find the git-pull documentation here.

Cloning an existing repository from GitHub

Let's say you've found a cool project on GitHub. Perhaps it's your own project, but you've never worked on it from this computer. Now we're going to clone this repository to your local machine so you can work on it and later on push your commits back to the repository.

Note that you need write access to the remote repository if you are going to push. In general, if you find a repository that is not yours, you can fork it on GitHub (basically copy it) to create your own remote repository with the same code. Then you can clone this repository onto your local machine, commit and push changes to it, since it is your remote repository. Then on GitHub, you can issue a pull request. This is a request which asks the maintainer of the original GitHub repository to look at the changes you've made and consider pulling changes from your GitHub fork into the original repository.

That's quite a mouthful.. Here's the command to clone a repository:

git clone https://github.com/<username>/<repositoryname>.git [directory]

This clones the specified repository into the directory. If you leave out the directory, git creates a new folder named after the repository inside the current directory and clones into that. You can find the HTTPS URL on the GitHub page of the repository you want to clone. Just click the green Clone button.

You can find the git-clone documentation here.

Reverting commits

We haven't really touched upon this, but since it was one of the reasons I made you read all of this, I thought I should share a bit of information.

Reverting commits is done using the git revert command. How you should go about this depends entirely on the commit you want to revert, but in general, reverting a commit that has been pushed to other repositories than your own, will always require a new commit to "undo" the changes. You're not actually changing the repository history. Instead, you apply new changes, that revert the repository to the old state of a certain commit.

You can find the git-revert documentation here.

Exercise

To test whether you understand the basics of git and GitHub, try to execute the following steps. You can use the command git status to inspect the current state of your repository between each step. I've uploaded my command line output here. In case you're not familiar with the syntax, the lines starting with dollar signs are commands entered by me. The other lines are output.

  1. Create a local repository somewhere on your computer
  2. Create two text documents: A.txt and B.txt
  3. Add only A.txt to the git index
  4. Create a commit, add a message stating which document is added.
  5. Create a new repository on GitHub.
  6. Add the remote to the local repository
  7. Push to the remote repository
  8. Delete the entire local repository
  9. Clone the remote repository

If you have done everything correctly, you should now only see one file named A.txt. B.txt has been lost because we never added it to the index.

If you have any questions that Google or the linked documentation cannot answer for you (and please do check those first!), feel free to contact me via the contact page. At some point in the future, I will implement a comment system on my website and you'll be able to ask questions there as well.

Text generation by Markov Chains

text-generation, markov-chains

I was listening to the (great) podcast Idle Thumbs by Chris Remo and others. This is a largely gaming-related podcast, but at the end of every episode, Chris will read out some interesting e-mails sent in by fans (the so-called "reader mail" section). At the very end of Episode 273, by a fan's request, they discuss this tumblr post where a computer supposedly generated a synopsis of a Batman episode from a large corpus of old episode synopses.

Then at the end of Episode 278, they discuss another post by the same author, this time a generated Yelp review of the Catacombs of Paris. Both of these stories are, besides funny, quite well structured. There are multiple references to earlier sentences and they form some kind of consistent story, at least, it seems to be that way.

(unintelligent) Markov chains...

The Python script behind these posts reveals that generating these texts is actually a very manual process. It simply analyzes some corpus input file, generates Markov chains of words in a sentence, and lets the user pick from the best 20 suggestions. After picking a word, it's added to the output and a new list of 20 suggested words is shown. This allows the user to steer towards an interesting or funny story.
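To make the idea a bit more concrete, here is a minimal sketch of a word-level Markov chain with ranked suggestions. It is only an illustration in Java and not the script discussed above: the class name MarkovSuggester and its methods are made up for this example, it only looks at the single previous word, and it assumes suggestions are ranked purely by how often a word follows another in the corpus.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; names and ranking rule are assumptions.
public class MarkovSuggester {
	// for every word: how often each other word directly follows it
	private final Map<String, Map<String, Integer>> followers = new HashMap<>();

	/** Count all adjacent word pairs in the corpus (split on whitespace). */
	public void train(String corpus) {
		String[] words = corpus.trim().split("\\s+");
		for (int i = 0; i + 1 < words.length; i += 1) {
			followers
				.computeIfAbsent(words[i], k -> new HashMap<>())
				.merge(words[i + 1], 1, Integer::sum);
		}
	}

	/** The most frequent followers of a word, best first (ties alphabetical). */
	public List<String> suggest(String word, int limit) {
		Map<String, Integer> counts = followers.getOrDefault(word, new HashMap<>());
		List<String> result = new ArrayList<>(counts.keySet());
		result.sort((a, b) -> {
			int byCount = Integer.compare(counts.get(b), counts.get(a));
			return byCount != 0 ? byCount : a.compareTo(b);
		});
		return result.subList(0, Math.min(limit, result.size()));
	}
}
```

After training on a corpus, calling suggest(lastWord, 20) gives the kind of list a user would pick the next word from. Always taking the first entry is the fully automatic variant, which can easily run into loops on a small corpus.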

Let me try this myself.

Somewhat disappointed about the lack of intelligence in the script, I decided I should at least give it a go with my own corpus, so I collected the podcast descriptions from Idle Thumbs episodes 200 up to 279 (currently the latest released episode) and put them in a text file.

I tried the randomized mode, where the script would pick a number of random words from the list of suggestions, but this didn't yield very interesting results. I think this has to do with the small dataset and large variety of words, so the suggestions aren't very good except for the top few suggested words. I tried again, picking only the first suggestion unless this caused a loop, and ended up with the following text (after formatting):

The time to the left and right. "I'm hungry", says the pilot over text chat.
You see he's named Nick. You're hungry too. Maybe there's somewhere to eat.
Maybe to two of you will be spoiled.

This week we take a look into the creation of this podcast. It is about
Metal Gear Solid V: The Phantom Pain, a game that stirs up memories of something
we loved dearly, long ago. Then we'll let you in on the sad space dad's craze that
has taken the gaming world by storm. If that isn't enough, stick with us for the
video game Downwell. And this is a perfect moment to preserve as a photo. You
try to clear up the shot by setting any enemies to hidden and find yourself
suddenly alone.

voicebox.py on Idle Thumbs podcast descriptions

Note: The italic word video is the only word where I picked the second-best suggestion, since it would otherwise choose time, which started the entire sequence all over again.

I thought this was a somewhat interesting result (and surprisingly consistent). It could very well be a podcast description for an Idle Thumbs podcast (an arbitrary episode for comparison). I might give text generation a go myself some day.

Network: an HTTP and HTTPS Java library

java, telegram

My last post focussed on a thread-blocking rate-limiter for a Telegram Bot API framework I'm building. For this framework, I needed a way to make sending GET and POST requests very simple. If you look up any Java code snippets that do not require external libraries for making a simple GET request, you may see that this isn't exactly a neat and compact task. If you want to do more complex stuff, the code gets messy very quickly, especially if you want to send POST requests with Content-Type: multipart/form-data, which is a complex way of saying you want to be able to upload files to a server. That's why I wrote a little library that hides the details and offers a simple interface.

Design goals

My design goals were as follows:

  • No external libraries. The fewer dependencies, the better.
  • Simple interface. I don't need to know about complexity that I don't care about as a user.
  • As few method calls as possible for the user. If I want to make just one request, I only want to call one method.
  • No callbacks. I will be making my API calls from various threads so I intend to just block the thread until I receive a response. A typical HTTP request takes a few hundred milliseconds, so this is acceptable to me.
  • I need some control over the timeout limits for making a connection and maintaining a connection because I intend to use long polling for receiving the updates.
  • Modular code. No dependencies regarding the framework, so I can re-use it in other projects.

Eclipse.. *sigh*

I had to rewrite almost everything from scratch just after I finished. Eclipse decided to delete all my files while I was doing final refactoring. It offered to undo 'my' mistake, but then couldn't find the files and just froze. The files were gone. Since it was only about 700 lines of code at that point, it took me just a little over an hour to reproduce it from memory. I have never seen Eclipse do anything like this, but I'd had it running continuously for several days. I guess I learned why version management is important, even when you're not getting paid to keep your project management in order.

Code? Did you say CODE!?

The code is available on my GitHub profile. The README file contains some useful examples, but to illustrate how it applies to the framework, here are some examples specific to Telegram bots. Note that the API token I'm using here is the example token from the API documentation, so don't waste your time trying to use it.

NB: I have left out a bit of exception handling to make the code snippets more readable.

Example: Sending a text message to a chat

The code snippet below sends a text message saying Hello, world! to the chat with ID 1234567890.

By the way, if you need the ID of any chat, just add my bot @GetMyIDBot to the chat or forward a message from a specific chat to it. It will reply with a message containing the information you requested.

Connection con = new HttpsConnection("api.telegram.org");
String basepath = "bot123456:ABC-DEF1234ghIkl-zyx57W2v1u123ew11/";

Map<String, String> fields = new HashMap<String, String>();
fields.put("chat_id", "1234567890");
fields.put("text", "Hello, world!");

con.post(basepath + "sendMessage", fields);

Example: Sending a photo to a channel

The code snippet below sends a photo to a Telegram channel named @ExampleChannel, with a caption saying What a lovely photo!.

Connection con = new HttpsConnection("api.telegram.org");
String basepath = "bot123456:ABC-DEF1234ghIkl-zyx57W2v1u123ew11/";

Map<String, String> fields = new HashMap<String, String>();
fields.put("chat_id", "@ExampleChannel");
fields.put("caption", "What a lovely photo!");

byte[] photoBytes = Files.readAllBytes(Paths.get("photo.jpg"));
InputFile photoFile = new InputFile("photo.jpg", "image/jpeg", photoBytes);

con.post(basepath + "sendPhoto", fields, "photo", photoFile);

As you can see, sending requests is pretty simple, even if they're as complex as a multipart/form-data request.
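For readers curious what such a request looks like on the wire, here is a rough sketch of how a multipart/form-data body can be assembled by hand. This is not the code from the library; the class MultipartSketch and its buildBody method are hypothetical names for this illustration, and it assumes the field names and file name need no escaping.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical illustration, not the library's implementation.
public class MultipartSketch {
	/**
	 * Build a multipart/form-data body from text fields and one file.
	 * The boundary must not occur inside any of the parts.
	 */
	public static byte[] buildBody(String boundary, Map<String, String> fields,
			String fileField, String fileName, String mimeType, byte[] fileBytes) {
		ByteArrayOutputStream out = new ByteArrayOutputStream();

		// one part per text field
		for (Map.Entry<String, String> field : fields.entrySet()) {
			write(out, "--" + boundary + "\r\n");
			write(out, "Content-Disposition: form-data; name=\""
					+ field.getKey() + "\"\r\n\r\n");
			write(out, field.getValue() + "\r\n");
		}

		// the file part carries a filename and its own content type
		write(out, "--" + boundary + "\r\n");
		write(out, "Content-Disposition: form-data; name=\"" + fileField
				+ "\"; filename=\"" + fileName + "\"\r\n");
		write(out, "Content-Type: " + mimeType + "\r\n\r\n");
		out.write(fileBytes, 0, fileBytes.length);

		// closing boundary has two extra dashes at the end
		write(out, "\r\n--" + boundary + "--\r\n");
		return out.toByteArray();
	}

	private static void write(ByteArrayOutputStream out, String text) {
		byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
		out.write(bytes, 0, bytes.length);
	}
}
```

The request itself would then be sent with the header Content-Type: multipart/form-data; boundary=<the same boundary>, so the server knows where each part starts and ends.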

A quick and simple Java rate limiter

java, telegram

For my current project (a Telegram Bot API framework) I had to figure out a way to limit the number of outgoing API calls that bots would send. I did not want the bot programmer to have to deal with the error message Telegram sends you when you've passed your requests per second limit, nor with the timing of the API calls.

The API limits work as follows: bots can send up to 30 messages per second, with an additional limitation that they can only send 20 messages per minute for a specific group chat. This RateLimiter will be the basis of such a system.

The way my framework works is that every update is handled as a task in some thread pool. Whenever the bot sends a message, this thread is blocked until the API responds. To limit the outgoing requests, I will block the thread some more before sending, if necessary. This way, incoming updates can be handled without having to deal with asynchronous requests, callbacks, etc. Instead, the API calls just block and wait, then return the proper value.

Here's what I was able to come up with in about an hour. In terms of the finished lines of code, this is probably worth 10 minutes of programming, but could be useful to anyone facing the same problem.

package nl.jochemkuijpers.ratelimiter;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * A simple BlockingQueue based rate limiter.
 * Usage: call limit() to throttle the current thread (blocks)
 * and call tick() at regular intervals from a separate thread.
 */
public class RateLimiter {
	private final long fillPeriod;
	private final BlockingQueue<Object> queue;
	private long timer;

	/**
	 * Create a simple blocking queue based rate limiter with a
	 * certain capacity and fill rate. Be careful when handling
	 * lots of requests with a high capacity as memory usage 
	 * scales with capacity.
	 * 
	 * @param capacity
	 *            capacity before rate limiting kicks in
	 * @param rate
	 *            rate limit in allowed calls per second
	 */
	public RateLimiter(int capacity, double rate) {
		if (rate <= 0) {
			this.fillPeriod = Long.MAX_VALUE;
		} else {
			this.fillPeriod = (long) (1000000000L / rate);
		}
		this.queue = new ArrayBlockingQueue<Object>(capacity);
		this.timer = System.nanoTime();
	}

	/**
	 * Tick the rate limiter, advancing the timer and possibly
	 * unblocking calls to limit()
	 */
	public synchronized void tick() {
		long elapsedTime = System.nanoTime() - timer;
		int numToRemove = (int) (elapsedTime / fillPeriod);

		// advance timer
		timer += fillPeriod * numToRemove;

		List<Object> discardedObjects = new ArrayList<Object>(numToRemove);
		queue.drainTo(discardedObjects, numToRemove);
	}

	/**
	 * A call to this method blocks when it is called too often
	 * (depleted capacity).
	 * 
	 * @return false when interrupted, otherwise true
	 */
	public boolean limit() {
		try {
			queue.put(new Object());
		} catch (InterruptedException e) {
			// restore the interrupt flag so callers can still detect it
			Thread.currentThread().interrupt();
			return false;
		}
		return true;
	}
}

And here is an example on how to use it:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class Example {
	public static void main(String[] args) {
		// capacity of 10 and a rate of 1/second
		final RateLimiter limiter = new RateLimiter(10, 1);

		// schedule rate limiter ticks every 100 milliseconds
		ScheduledExecutorService scheduler = Executors
				.newSingleThreadScheduledExecutor();
		
		scheduler.scheduleAtFixedRate(new Runnable() {
			@Override
			public void run() {
				limiter.tick();
			}
		}, 0, 100, TimeUnit.MILLISECONDS);
		
		// bark 100 times, but limit using the rate limiter
		for (int n = 0; n < 100; n += 1) {
			limiter.limit();
			System.out.println("bark #" + n);
		}

		// stop ticking, otherwise the scheduler thread keeps the JVM alive
		scheduler.shutdown();
	}
}

It works by utilising Java's standard library's BlockingQueue, which blocks when you try to add an object when it is full (only works with bounded implementations such as the ArrayBlockingQueue). It requires the tick method to be called at regular intervals. This should be done at a frequency in the same order of magnitude as the fill rate passed to the constructor. Example: If the fill rate is 10, you probably want to call tick every 100 milliseconds (or 50, or 150, whatever).
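The queue behaviour this relies on can be seen in isolation with a tiny sketch. The class name BoundedQueueDemo is made up for this illustration, and it uses offer() instead of put() so that the full queue shows up as a false return value rather than a blocked thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of a bounded queue filling up and being drained.
public class BoundedQueueDemo {
	/** Returns the result of each offer() so the steps can be inspected. */
	public static boolean[] run() {
		BlockingQueue<Object> queue = new ArrayBlockingQueue<Object>(2);

		boolean first = queue.offer(new Object());  // fits
		boolean second = queue.offer(new Object()); // fits, queue is now full
		boolean third = queue.offer(new Object());  // rejected; put() would block here

		// what a tick does: remove up to one element to free a slot
		List<Object> discarded = new ArrayList<Object>();
		queue.drainTo(discarded, 1);

		boolean fourth = queue.offer(new Object()); // fits again
		return new boolean[] { first, second, third, fourth };
	}
}
```

With a capacity of 2, the third offer is refused until a drain frees a slot. In the rate limiter, put() simply waits at that point instead of returning false.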

The drawback of this implementation is that it will store unused objects that literally serve no purpose other than to have the BlockingQueue fill up and block when it reaches its capacity. These objects need to be disposed of by the garbage collector, and take up a bit of valuable memory space.

Since I will be using a small number of RateLimiter objects combined with small capacities (less than a hundred), I think the current solution is fine for now. I'm posting this mainly because I think it's an interesting way to implement a rate limiter as it automatically handles the order of unblocking blocked threads.
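For comparison, the placeholder objects could be avoided by counting permits instead of storing elements. The sketch below swaps the BlockingQueue for a java.util.concurrent.Semaphore. It is a different technique from the class above, offered as an untested-in-production idea; the class name SemaphoreRateLimiter and the extra tryLimit() method are assumptions made for this example.

```java
import java.util.concurrent.Semaphore;

// Hypothetical alternative: same tick()/limit() idea, but with permits.
public class SemaphoreRateLimiter {
	private final int capacity;
	private final long fillPeriod;
	private final Semaphore permits;
	private long timer;

	public SemaphoreRateLimiter(int capacity, double rate) {
		this.capacity = capacity;
		this.fillPeriod = rate <= 0
				? Long.MAX_VALUE
				: Math.max(1L, (long) (1000000000L / rate));
		// fair semaphore: blocked threads are released in arrival order
		this.permits = new Semaphore(capacity, true);
		this.timer = System.nanoTime();
	}

	/** Hand back one permit per elapsed fill period, never above capacity. */
	public synchronized void tick() {
		long elapsedPeriods = (System.nanoTime() - timer) / fillPeriod;
		timer += fillPeriod * elapsedPeriods;

		int used = capacity - permits.availablePermits();
		int toRelease = (int) Math.min(elapsedPeriods, used);
		if (toRelease > 0) {
			permits.release(toRelease);
		}
	}

	/** Blocks when capacity is depleted; false when interrupted. */
	public boolean limit() {
		try {
			permits.acquire();
		} catch (InterruptedException e) {
			Thread.currentThread().interrupt();
			return false;
		}
		return true;
	}

	/** Non-blocking variant: false when the call would have been throttled. */
	public boolean tryLimit() {
		return permits.tryAcquire();
	}
}
```

Only tick() ever releases permits and it is synchronized, so the number of available permits cannot grow beyond the capacity. Memory use no longer scales with the capacity, at the cost of a little more bookkeeping.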

I'm building a Telegram Bot API framework

telegram

I'm working on a Telegram Bot API framework that will eventually be released as an open-source project. Some of my Telegram bots already run on a prototype of this framework:

  • @ZombieBot - A simple bot that replies in zombie-speak.
  • @GetMyIDBot - This bot replies with your user ID, chat ID and various other bits of information. Useful for when you want to quickly find out a chat ID so you can have your bot send messages there.
  • @ScoreDevBot - Keeps track of the score in your group conversations. Currently in development.

Don't worry. I haven't forgotten about @SudokuBot. I will get to that once this project is finished, for obvious reasons.

About me

I'm a 21-year-old Computer Science student at Eindhoven University of Technology.
