ctagd

So, network-related programs are one of my favorite things to code. During my third year of university, one of my modules (aptly named "Networking") had us pair up with another person and develop several networking-related programs, a list of which can be found at the bottom of this page. This is what sparked my interest in socket programming, and that brings me to today's topic: ctagd.

This article will focus on socket programming in C (i.e. using sys/socket.h, etc.). One of the frustrating things for me personally is the complexity needed to set up basic socket communication. Look, I understand why the complexity exists: the communication has been abstracted so that the library can achieve more. Let me illustrate some of this complexity with a few examples.

When you aim to do something simple (simple being just passing messages between sockets), at first glance the C code to accomplish this can seem very verbose. A good example of this is the complexity of this socket function for cmesg, and of the message passing/sending here. Granted, the cmesg code base isn't exactly very clear, but this is mostly because I was making the spec up as I went along.

First, let's highlight the base goal of ctagd (later on we shall expand upon it and add more features). Above all, we want it to be simple, that is, easy to use and easy to extend. Keeping it simple makes it easier to debug, and should also help performance, since we can spend more time on tuning than on debugging.

Trying to achieve this base goal naturally brings me to my first point: the function parameters used when setting up a socket connection (for both client and server) that are unneeded for what we want to achieve. There are easily 4 parameters that can be refactored out to reduce complexity. Ideally, all we want back when opening a socket is the socket file descriptor, which we use to identify and differentiate client connections from one another.

The easiest way to simplify this initialization process is to gather all of these parameters into a single struct (one for the server and one for the client). This struct essentially acts as our server/client settings, so adding a new server/client setting means we only have to update the appropriate struct and its init function. This means that when we extend ctagd, it won't break the init process of servers/clients that are already in production (i.e. we keep backward compatibility). This is exactly what ctagd does, and it can be found here.
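To make the settings-struct idea concrete, here is a minimal sketch of a server-side version. It is not ctagd's actual code: the struct name server_settings, its fields and the server_init function are hypothetical, and IPv4/TCP is assumed. The point is that the caller fills in one struct and gets back a listening socket file descriptor, with socket(), setsockopt(), bind() and listen() hidden inside.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical settings struct; ctagd's real field names may differ. */
struct server_settings {
    uint16_t port;   /* TCP port to listen on; 0 lets the kernel choose */
    int backlog;     /* listen() queue length; <= 0 means use a default */
    int reuse_addr;  /* non-zero sets SO_REUSEADDR before bind() */
};

/* Hides socket()/setsockopt()/bind()/listen() behind one call.
 * Returns a listening socket fd, or -1 on failure. */
int server_init(const struct server_settings *cfg)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    if (cfg->reuse_addr) {
        int one = 1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(cfg->port);

    int backlog = cfg->backlog > 0 ? cfg->backlog : 16;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would write something like `struct server_settings cfg = { .port = 8080, .backlog = 32 };` and pass `&cfg` to server_init. Because C zero-initializes any member a designated initializer leaves out, a later version could add a field (with zero meaning "default") without breaking existing call sites, which is the backward-compatibility property described above.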

Next we move on to message passing. Let's say we want to send 'Some String'; the "standard" C way is to do something like this:

/* Assume socket_fd is a valid socket file descriptor. */
const char *s = "Some String";
int flags = 0;
send(socket_fd, s, strlen(s), flags);

Now when reading in the message the "standard" C way would be something like this:

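/* Assume socket_fd is a valid socket file descriptor. */
char buff[129];
ssize_t n = read(socket_fd, buff, 128); /* May return fewer than 128 bytes. */

if (n < 0)
    n = 0; /* Treat a read error as an empty message in this example. */
buff[n] = '\0'; /* Null-terminate after the bytes actually read. */
printf("%s\n", buff); /* Print received message. */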

The eager reader will have noticed a limitation in the above code: we can only read up to 128 bytes of data. That is fine when we send 'Some String', but what happens if we were to send 256 bytes? The first read fills buff with at most 128 bytes of our message (leaving the last byte free for the null terminator), while the remaining bytes are still sitting in the socket waiting to be read. To get them we would then have to do something like:

/* Assume socket_fd is a valid socket file descriptor. */
char buff1[129];
char buff2[129];
ssize_t n1 = read(socket_fd, buff1, 128); /* Each read may return < 128 bytes. */
ssize_t n2 = read(socket_fd, buff2, 128);

buff1[n1 > 0 ? n1 : 0] = '\0'; /* Null-terminate after the bytes actually read. */
buff2[n2 > 0 ? n2 : 0] = '\0';
printf("%s%s\n", buff1, buff2); /* Full message only if both reads got 128 bytes. */

Now, before I continue: yes, I know we can just increase the size of our buffer and the number of bytes read to 256 to solve the problem, but that would defeat the purpose of what I'm trying to illustrate, namely variable message length. In these examples we know how long the messages are and hence can size the buffers appropriately. But what if we want variable-length messages whose length we do not know in advance? Hopefully this illustrates the problem. Normally, to solve this we first send a fixed-length control message announcing the length of the actual message, but this requires a lot of tedious setup. Now let me explain how ctagd constructs its messages to deal with this issue.

The concept of a message in ctagd is a simple one. The first byte of a message is the tag (so we can have 256 unique tags), followed by 4 bytes that specify the length of the payload (we call this field len), and lastly the payload itself, which is len bytes long. Hence a message has the form:

[byte:tag][4-bytes:len][len-bytes:payload]
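To make the layout concrete, here is a minimal sketch that packs a tag and payload into this format and parses the header back out. The post does not say which byte order ctagd uses for len, so big-endian (network byte order) is an assumption here, as are the names encode_frame and decode_header. With a 4-byte unsigned len, a single payload can be at most 2^32 - 1 bytes.

```c
#include <stdint.h>
#include <string.h>

#define SMSG_HEADER_LEN 5   /* 1 tag byte + 4 length bytes */

/* Write [tag][len][payload] into out, which must hold
 * SMSG_HEADER_LEN + len bytes. len is stored big-endian (an assumption:
 * the post does not state ctagd's byte order). Returns bytes written. */
size_t encode_frame(uint8_t tag, const void *payload, uint32_t len, uint8_t *out)
{
    out[0] = tag;
    out[1] = (uint8_t)(len >> 24);
    out[2] = (uint8_t)(len >> 16);
    out[3] = (uint8_t)(len >> 8);
    out[4] = (uint8_t)len;
    memcpy(out + SMSG_HEADER_LEN, payload, len);
    return SMSG_HEADER_LEN + (size_t)len;
}

/* Parse the 5-byte header back into the tag and the payload length. */
void decode_header(const uint8_t *in, uint8_t *tag, uint32_t *len)
{
    *tag = in[0];
    *len = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 8)  |  (uint32_t)in[4];
}
```

For example, encoding the payload "Hello, World!" (13 bytes) with tag '1' produces 5 + 13 = 18 bytes, starting with the header bytes 0x31 0x00 0x00 0x00 0x0D.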

This brings me to the concept of an smsg. An smsg is simply shorthand for struct message, which is exactly how we store a message to make working with its data easier. An smsg makes sending and receiving variable-length messages easier, for example:

Sending a message

  • Let's construct an smsg with tag '1' and message "Hello, World!", like so: create_smsg('1', "Hello, World!", smsg_pointer);
  • Invoke csend(socket, smsg_pointer); to send the smsg over socket.

Receiving an smsg

  • If we have queues enabled (which will be described later), we simply invoke smsg_pointer = recv_tag(some_tag); which will set smsg_pointer to point to the first smsg in that tag's queue.
  • If we have queues disabled, we simply invoke cfetch(socket, smsg_pointer); which will set smsg_pointer to point to the smsg that was read from the socket.

As you can see, this makes sending and receiving variable-length messages trivial, as cfetch/recv_tag and csend handle all the encoding of the smsg struct into bytes and the decoding of those bytes back into the struct. For another example, you can find a simple server using ctagd here and a simple client here.

Finally, we move on to the most recent feature, added in v2.0: queues. The idea behind queues is that each tagged message gets placed in its own queue based on its tag. We can then make a blocking recv based on tags. Both the server and the client have queues and they "look" the same, but the implementation is a little different.

The server queue is the more complex of the two. For each client connected to the server, a thread is spawned (dubbed the handler for that client) that reads messages sent from that client and places them in the appropriate queue based on their tag. This process is thread-safe because each tag's queue is protected by its own mutex. Hence two messages with different tags, received from different clients simultaneously, may be put in their queues at the same time without ambiguous behavior. But when two messages with the same tag arrive from different clients simultaneously, the mutex for that tag allows only one to be queued at a time: the first is queued, the mutex is unlocked, and then the next is queued. When we want to remove an smsg from a queue, we lock that queue (again based on its tag), meaning no smsg with the same tag can be added to it in the meantime, preventing ambiguous behavior.
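The locking scheme above can be sketched with one mutex per tag, plus a condition variable so that a recv_tag-style call can block until something arrives. The post does not say how ctagd implements the blocking receive, so the condition variable, the fixed array of 256 queues and the names queue_push and queue_pop are assumptions. Handler threads would call queue_push; because each push locks only its own tag's queue, pushes with different tags never wait on one another.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

struct node {
    void *data;
    struct node *next;
};

/* One FIFO per possible tag byte, each guarded by its own lock. */
struct tag_queue {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;   /* signalled whenever an item is pushed */
    struct node *head, *tail;
};

static struct tag_queue queues[256];

void queues_init(void)
{
    for (int i = 0; i < 256; i++) {
        pthread_mutex_init(&queues[i].lock, NULL);
        pthread_cond_init(&queues[i].nonempty, NULL);
        queues[i].head = queues[i].tail = NULL;
    }
}

/* Called by client handler threads. Only this tag's queue is locked, so
 * same-tag pushes are serialized while different tags proceed in parallel. */
int queue_push(uint8_t tag, void *data)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->data = data;
    n->next = NULL;

    struct tag_queue *q = &queues[tag];
    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Blocking receive by tag: sleeps until this tag's queue is non-empty.
 * Holding the lock while unlinking stops a concurrent push on this tag. */
void *queue_pop(uint8_t tag)
{
    struct tag_queue *q = &queues[tag];
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)
        pthread_cond_wait(&q->nonempty, &q->lock);
    struct node *n = q->head;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = NULL;
    pthread_mutex_unlock(&q->lock);

    void *data = n->data;
    free(n);
    return data;
}
```

pthread_cond_wait is called in a while loop because condition variables permit spurious wakeups; the loop rechecks that the queue is really non-empty before popping.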

In contrast, the client queue is far simpler. The client spawns a single thread which receives smsgs from the server and places them into the appropriate queue (based on each smsg's tag). We still need locks here, not for receiving multiple messages at once (this is not possible, since we are reading from a single socket), but to prevent an smsg from being removed whilst a new one is being added to the same queue, as this can result in ambiguous behavior (to the eager reader, this may in fact seem identical to the locks explained above).

Stay tuned for the next post, which will do a deep dive on the implementation of ctagd.

~ Skiqqy