Basic C TCP/IP Programming

In this article, I share some basic TCP/IP programming in the C language.  I am using FreeBSD because it is the platform I know best, but nothing prevents you from trying the source code elsewhere.  There are of course thousands of examples online.  To show that I am different, I will present my code in an unorthodox but effective way.  (I am a protestant, if you force me to answer.)  🙂

I use the term TCP/IP in the title in case I want to cover things like libibverbs another time.  In this article, when I use the word “network”, I refer to TCP/IP.

Header Files

Some header files are required, such as standard system call headers, data type definitions, string operations, etc.

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netdb.h>
#include <string.h>

System Calls and File Descriptors

In a Unix system, a lot of things are presented as file descriptors, and these include the network sockets we discuss here.  There are various system calls to get file descriptors of different types, such as open(2) for a named file, pipe(2) for inter-process communication, and socket(2) for a network socket.  Even so, once a file descriptor is prepared, the operations are similar: read(2) for withdrawing a message, write(2) for depositing a message, and close(2) for finishing after use.
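To make the "same operations on any descriptor" point concrete, here is a minimal sketch that uses pipe(2) instead of a socket.  The function name pipe_roundtrip is made up for this illustration; it assumes a short, non-empty message that fits in the pipe buffer.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: sends msg through a pipe and reads it back.
 * Returns the number of bytes read, or -1 on failure. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
        int fds[2];             /* fds[0] is the read end, fds[1] the write end */
        size_t len = strlen(msg);
        ssize_t n = -1;

        /* An empty message would make the read below wait forever. */
        if (len == 0 || outlen == 0 || pipe(fds) != 0)
                return -1;

        if (write(fds[1], msg, len) == (ssize_t)len) {
                n = read(fds[0], out, outlen - 1);
                if (n >= 0)
                        out[n] = '\0';  /* terminate so it can be printed */
        }

        close(fds[0]);
        close(fds[1]);
        return n;
}
```

The same read(2), write(2), and close(2) calls appear again in the socket code later; only the way the descriptor is obtained differs.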

Things are easier said than done.  To make a network connection, there are quite a few steps to go through.  Here I split them into two roles: a server that receives a connection, and a client that initiates a connection.

Server:

  1. socket(2) to create a network socket
  2. bind(2) to attach to a particular defined network port
  3. listen(2) to create a connection queue
  4. accept(2) to accept a connection request
  5. read(2) or write(2) to communicate
  6. close(2) to finish

Client:

  1. socket(2) to create a network socket
  2. connect(2) to connect to a particular address
  3. read(2) or write(2) to communicate
  4. close(2) to finish

Obtaining Socket Address

In order to bind or connect, one needs to provide a socket address.  It is a complex data structure.  Some writers tell readers to fill in the structure field by field, warn them to be cautious about endianness, etc.  I am lazy and just use the getaddrinfo(3) function.  It is also the most reliable method if you want to handle different network types.  It takes the hostname, port number, and optionally hints as input.  It generates a data structure and returns its pointer through a pointer to a pointer.  The most difficult part is filling in the hints, but these can be blindly copied.  For example, to get TCP over IPv4, we request SOCK_STREAM and AF_INET like…

memset(&hint, 0, sizeof(hint));
hint.ai_family = AF_INET;
hint.ai_socktype = SOCK_STREAM;
hint.ai_protocol = 0;
getaddrinfo(host, port, &hint, &info);

Given there are no errors, the info structure now contains the answer.  “info->ai_addr” points to the required sockaddr structure, and “info->ai_addrlen” holds its length.  Just in case you did not define the numbers, getaddrinfo(3) also helps by filling in “info->ai_family”, “info->ai_socktype”, and “info->ai_protocol” according to the hints.  These are useful in creating the socket.

Please note the address info can be a linked list when there are multiple options for the connection.  To be robust, one may want to try out all of them.  For demonstration, trying one is enough.  After use, the list can be cleaned up with the corresponding freeaddrinfo(3) function.
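For readers who do want the robust version, the following is a rough sketch of walking the linked list through “ai_next” and stopping at the first address that connects.  The name connect_any is invented for this example and is not part of the article's program.  Note that getaddrinfo(3) reports failure through its return value, which gai_strerror(3) turns into text.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: returns a connected descriptor, or -1 if every
 * candidate address failed. */
int connect_any(const char *host, const char *port)
{
        struct addrinfo hint;
        struct addrinfo *info = 0;
        struct addrinfo *p;
        int fd = -1;
        int rc;

        memset(&hint, 0, sizeof(hint));
        hint.ai_family = AF_INET;
        hint.ai_socktype = SOCK_STREAM;

        rc = getaddrinfo(host, port, &hint, &info);
        if (rc != 0) {
                fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
                return -1;
        }

        /* Try each candidate in turn until one connects. */
        for (p = info; p != 0; p = p->ai_next) {
                fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
                if (fd < 0)
                        continue;
                if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
                        break;          /* success: keep this descriptor */
                close(fd);
                fd = -1;
        }

        freeaddrinfo(info);
        return fd;
}
```

The caller owns the returned descriptor and is expected to close(2) it after use.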

Error Handling

Most system calls come with exceptional situations, like when a file is not found, or a network socket is not connectable.  It is good practice to check the return value of each system call.  The code then becomes very messy, and this is when people yell for the “exception” feature of a language: put the error handling code out of the normal execution!  Hold on, checking return values can be trivial with C macros.  I learned this trick from a famous book but I do not recall the name.

#define pt {fprintf(stderr, "%s:%d: ", __FILE__, __LINE__); perror("");}
#define ez(x) {if ((x) != 0) {pt; goto error;}}
#define ep(x) {if ((x) <= 0) {pt; goto error;}}
#define ezp(x) {if ((x) < 0) {pt; goto error;}}

Whenever I expect a call to return zero, I use ez (expect zero); likewise ep for expecting a positive number, and ezp for expecting zero or a positive number.  With these, the error checking can be much easier.  For example,

if (listen(fd, 1) != 0) {
        fprintf(stderr, "%s:%d: ", __FILE__, __LINE__);
        perror("");
        goto error;
}

can be replaced with

ez(listen(fd, 1));

At the end of the function, I just define a label to catch these errors and perform cleanup.  This is now even simpler than handling exceptions.  The actual cleanup code will be shown later.
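As a small preview of that pattern, here is a sketch on a plain file instead of a socket.  The function first_byte is hypothetical and exists only to show how the macros, the “error” label, and the “cleanup” label fit together; two of the macros are repeated so the block stands alone.

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

#define pt {fprintf(stderr, "%s:%d: ", __FILE__, __LINE__); perror("");}
#define ep(x) {if ((x) <= 0) {pt; goto error;}}
#define ezp(x) {if ((x) < 0) {pt; goto error;}}

/* Hypothetical example: reads the first byte of a file into *out.
 * Returns 0 on success and 1 on any failure. */
int first_byte(const char *path, char *out)
{
        int fd = -1;
        int error = 0;

        ezp(fd = open(path, O_RDONLY));  /* a descriptor is zero or positive */
        ep(read(fd, out, 1));            /* expect at least one byte */

cleanup:
        if (fd != -1) close(fd);
        return error;

error:
        error = 1;
        goto cleanup;
}
```

Both the success path and the failure path pass through “cleanup”, so the descriptor is closed exactly once either way.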

The Server Code

The server first prepares the hints and obtains the socket address accordingly.  Then it calls the socket, bind, listen, accept, and write system calls in order.  There are two file descriptors, one for binding to the particular port, and one (or more) for communicating with the clients.  After use, it cleans up cautiously.  If there is ever an error, the flow jumps to “error” immediately and the “error” variable is updated accordingly.

int server(const char* host, const char* port)
{
        int fd = -1;
        int fd2 = -1;
        int error = 0;
        struct addrinfo* info = 0;
        struct addrinfo  hint;
        char message[16] = "hello world";
        memset(&hint, 0, sizeof(hint));
        hint.ai_family = AF_INET;
        hint.ai_socktype = SOCK_STREAM;
        hint.ai_protocol = 0;
        hint.ai_flags = AI_PASSIVE;  // with a null host, ask for the wildcard address
        ez(getaddrinfo(host, port, &hint, &info));

        ezp(fd = socket(info->ai_family, info->ai_socktype, info->ai_protocol));
        ez(bind(fd, info->ai_addr, info->ai_addrlen));
        ez(listen(fd, 1));
        ezp(fd2 = accept(fd, 0, 0));
        ep(write(fd2, message, sizeof(message)));

cleanup:
        if (info != 0) freeaddrinfo(info);
        if (fd2 != -1) close(fd2);
        if (fd != -1) close(fd);
        return error;

error:
        error = 1;
        goto cleanup;
}

The Client Code

The client code is similar to the server, except it only has one file descriptor and it connects rather than binds, listens, and accepts.  After reading the message from the server, it prints it out and finishes.

int client(const char* host, const char* port)
{
        int fd = -1;
        int fd2 = -1;
        int error = 0;
        struct addrinfo* info = 0;
        struct addrinfo  hint;
        char message[16] = "";

        memset(&hint, 0, sizeof(hint));
        hint.ai_family = AF_INET;
        hint.ai_socktype = SOCK_STREAM;
        hint.ai_protocol = 0;
        ez(getaddrinfo(host, port, &hint, &info));
        ezp(fd = socket(info->ai_family, info->ai_socktype, info->ai_protocol));
        ez(connect(fd, info->ai_addr, info->ai_addrlen));
        // read at most 15 bytes so the zero-filled buffer stays null-terminated
        ep(read(fd, message, sizeof(message) - 1));
        printf("client received: %s\n", message);

cleanup:
        if (info != 0) freeaddrinfo(info);
        if (fd2 != -1) close(fd2);
        if (fd != -1) close(fd);
        return error;

error:
        error = 1;
        goto cleanup;
}
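One caveat about the single read(2) above: TCP is a byte stream, so a read may return fewer bytes than requested.  For a short message on localhost this rarely matters, but a more careful client could loop.  The following is only a sketch; read_full is a made-up name, not a system call.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling read(2) until len bytes arrive,
 * the peer closes the connection, or an error occurs.
 * Returns the number of bytes stored in buf, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
        char *p = buf;
        size_t got = 0;

        while (got < len) {
                ssize_t n = read(fd, p + got, len - got);
                if (n == 0)
                        break;                  /* end of stream */
                if (n < 0) {
                        if (errno == EINTR)
                                continue;       /* interrupted, try again */
                        return -1;
                }
                got += (size_t)n;
        }
        return (ssize_t)got;
}
```

It works on any descriptor, so it can be tried with a pipe before using it on a socket.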

Putting Them Together

Finally, we need a main function to run a server and a client at about the same time.  Here we use the fork(2) and waitpid(2) calls.  The client is delayed by 1 second so that the server is ready before the connection is attempted.  Pasting the code together is left as an exercise.  In short…

  1. The header files
  2. The error handling macros
  3. The server code
  4. The client code
  5. The main function

The main function is as follows.  After fork, there are two processes for the two roles.  The server is started with a null host and accepts a connection on port 8080.  The client targets localhost port 8080.

int main(int argc, char** argv)
{
        int status = 0;
        pid_t pid = fork();

        // Forked as a parent
        if (pid > 0) {
                server(0, "8080");
                waitpid(pid, &status, 0);
        }

        // Forked as a child
        if (pid == 0) {
                sleep(1);
                client("localhost", "8080");
        }

        // Something went wrong with forking
        if (pid < 0) {
                perror("fork");
        }
}

And the program execution is as simple as…

# clang net.c -o net
# ./net
client received: hello world
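The main function above throws away the results of server() and client().  If the parent should know whether the child succeeded, the status filled in by waitpid(2) can be decoded with the WIFEXITED and WEXITSTATUS macros.  Here is a minimal sketch with a hypothetical function child_exit_status, where the child simply exits with a given code instead of running client().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that exits with `code` and returns
 * the exit status seen by the parent, or -1 on failure. */
int child_exit_status(int code)
{
        int status = 0;
        pid_t pid = fork();

        if (pid < 0)
                return -1;              /* fork failed */
        if (pid == 0)
                _exit(code);            /* child: client() would run here */
        if (waitpid(pid, &status, 0) != pid)
                return -1;
        if (!WIFEXITED(status))
                return -1;              /* for example, killed by a signal */
        return WEXITSTATUS(status);
}
```

In the article's program, the child could pass the return value of client() to its exit call, and the parent could report it after waitpid(2) returns.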

Basic C Programming on FreeBSD

In this article, I share some programming techniques with the C programming language on FreeBSD.  As of FreeBSD 11.0, the basic installation comes with clang(1) tools for programming in C, C++, and Obj-C languages.

When I was a teaching assistant, I used to give my students a crash course in C programming.  They already knew C++ from their first year.  Once they understood the differences, they picked up C really quickly.

C is one of my favourite programming languages.  Its features are minimal yet comprehensive.  Because of its simplicity, it is also very popular in system implementation.  Once somebody understands it, he is able to read and understand a lot of system implementation work.  (Warning: he will also start to dislike some particular kernel implementations, because they are really badly written.)

Why is programming related to this blog?  Indeed, one of the long-term goals here is to build a distributed software transactional memory system and deploy it on the cloud.  Time will tell if I am capable of this accomplishment.

As a side note, in Cantonese, “C” has similar pronunciation as “詩” (poem), which is a popular character for feminine names, such as “李慧詩“.  It is quite romantic to say “寫詩” (write poems) in place of “programming in that low-level language”… unless one broke up with one of those girls.  Ah.  The weather today is no good; and life is really harsh!  Let’s carry (not Carrie) on.

The Euclidean Algorithm

Today, we take the Euclidean Algorithm as an example.  It is useful for finding the greatest common divisor (GCD) of two integers.  I am not a mathematician, so please let me jump to my conclusion, as copied from Wikipedia:

function gcd(a, b)
    while b ≠ 0
       t := b; 
       b := a mod b; 
       a := t; 
    return a;

In short:  There are two numbers.  The larger number is replaced with the remainder of dividing it by the smaller one.  The logic repeats until one number turns into nothing.  The remaining number is the answer.

The First Attempt

Open up your favourite text editor and write to a file “gcd.c”.  I use vi(1), but you can always use the easier ee(1).  There are two functions in this program.  The “gcd” function is responsible for computing the greatest common divisor.  The “main” function is the entry point.  It uses scanf(3) to input two integers, “a” and “b”, and then uses printf(3) to output the result.  In order to use these two functions, a header file “stdio.h” has to be included, as hinted by the manual pages.

There are “&” signs near the “a” and “b” and they mean pass-by-pointer.  Normally, variables are only pass-by-value.  The pass-by-pointer strategy allows the scanf(3) function to take the pointers and update the variables directly.  The arguments “argc” and “argv” are the argument count (a number) and the argument vector (array of strings, or array of arrays of characters).  We do not use the arguments so we can leave them untouched.

#include <stdio.h>

int gcd(int a, int b) {
        int c;
        while (b != 0) {
                c = a % b;
                a = b;
        }
}

int main(int argc, char** argv) {
        int a, int b;
        scanf("%d %d", &a, &b);
        printf("%d\n", gcd(a, b));
        return 0;
}
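As an aside on the pass-by-pointer idea used by scanf(3) in the listing above, here is a small sketch of a function that hands back two results through pointers.  The function divide is invented for this illustration and is not part of gcd.c.

```c
/* Hypothetical helper: stores the quotient and remainder of a / b through
 * the two pointers.  Returns 0 on success, or -1 when b is zero. */
int divide(int a, int b, int *quotient, int *remainder)
{
        if (b == 0)
                return -1;      /* nothing is written through the pointers */
        *quotient = a / b;      /* updates the caller's variable directly */
        *remainder = a % b;
        return 0;
}
```

A caller declares two int variables, say “q” and “r”, and passes “&q” and “&r”, just like the “&a” and “&b” given to scanf(3).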

Compilation Errors

Compile with the cc(1) command.  Like the way I demonstrated to my students, the first attempt is usually a failed one…

# cc gcd.c -o gcd
gcd.c:9:1: warning: control reaches end of non-void function
}
^
gcd.c:12:9: error: expected identifier or '('
        int a, int b;
               ^
gcd.c:12:8: error: expected ';' at end of declaration
        int a, int b;
              ^
              ;
1 warning and 2 errors generated.

The Second Attempt

It seems that a semicolon should be used to separate the declarations of the two variables.  (I know, comma works another way, but I am not going to tell…)

#include <stdio.h>

int gcd(int a, int b) {
        int c;
        while (b != 0) {
                c = a % b;
                a = b;
        }
}

int main(int argc, char** argv) {
        int a;  // <-- this line
        int b;  // <-- this line
        scanf("%d %d", &a, &b);
        printf("%d\n", gcd(a, b));
        return 0;
}

Runtime Error

Let’s compile and try again.  The program does compile despite the warning.  For drama's sake, ignore the warning and try.  In the smoke test, we are supposed to enter the two numbers into the standard input.  After inputting the two numbers, press enter a few times and …

# cc gcd.c -o gcd
gcd.c:9:1: warning: control reaches end of non-void function
}
^
1 warning generated.
# ./gcd
4 24

The program loops forever.  Press Ctrl-C to break out.

First Time Debugging

To debug, let us use the LLVM debugger. (This is a new standard feature in FreeBSD 11.0.  We used to have another debugger in the past.)  Issue the lldb command, run with command “run”, input some text, then break it with Ctrl-C.  Unlike last time, when we broke out to the shell, this time we end up seeing an assembly code dump.

# lldb gcd
(lldb) target create "gcd"
Current executable set to 'gcd' (x86_64).
(lldb) run
Process 93041 launching
Process 93041 launched: '/root/gcd' (x86_64)
4 24

^C
Process 93041 stopped
* thread #1: tid = 100067, 0x00000000004007ab gcd`gcd + 27, stop reason = signal SIGSTOP
    frame #0: 0x00000000004007ab gcd`gcd + 27
gcd`gcd:
->  0x4007ab <+27>: movl   %edx, -0x10(%rbp)
    0x4007ae <+30>: movl   -0xc(%rbp), %edx
    0x4007b1 <+33>: movl   %edx, -0x8(%rbp)
    0x4007b4 <+36>: jmp    0x40079a                  ; <+10>
(lldb) quit
Quitting LLDB will kill one or more processes. Do you really want to proceed: [Y/n] y

As part of my plan, you see nothing useful unless you understand assembly code.  Quit and say yes to confirm.  Then, we try compiling again with debug symbols.

# cc -g gcd.c -o gcd
gcd.c:9:1: warning: control reaches end of non-void function
}
^
1 warning generated.
# lldb gcd
(lldb) target create "gcd"
Current executable set to 'gcd' (x86_64).
(lldb) run
Process 93075 launching
Process 93075 launched: '/root/gcd' (x86_64)
4 24

^C
Process 93075 stopped
* thread #1: tid = 100096, 0x00000000004007ab gcd`gcd(a=24, b=24) + 27 at gcd.c:6, stop reason = signal SIGSTOP
    frame #0: 0x00000000004007ab gcd`gcd(a=24, b=24) + 27 at gcd.c:6
   3   int gcd(int a, int b) {
   4     int c;
   5     while (b != 0) {
-> 6       c = a % b;
   7       a = b;
   8     }
   9   }
(lldb)

To step through the program, use the command “print” to print a variable's content, and “next” to step to the next statement.  Repeating this a few times, we see the variable content does not change.  The condition to break out of the loop can never be satisfied.

(lldb) print b
(int) $0 = 24
(lldb) next
Process 93075 stopped
* thread #1: tid = 100096, 0x00000000004007ae gcd`gcd(a=24, b=24) + 30 at gcd.c:7, stop reason = step over
    frame #0: 0x00000000004007ae gcd`gcd(a=24, b=24) + 30 at gcd.c:7
   4     int c;
   5     while (b != 0) {
   6       c = a % b;
-> 7       a = b;
   8     }
   9   }
   10  
(lldb) print b
(int) $1 = 24
(lldb) next
Process 93075 stopped
* thread #1: tid = 100096, 0x00000000004007b4 gcd`gcd(a=24, b=24) + 36 at gcd.c:5, stop reason = step over
    frame #0: 0x00000000004007b4 gcd`gcd(a=24, b=24) + 36 at gcd.c:5
   2   
   3   int gcd(int a, int b) {
   4     int c;
-> 5     while (b != 0) {
   6       c = a % b;
   7       a = b;
   8     }
(lldb) print b
(int) $2 = 24
(lldb) next
Process 93075 stopped
* thread #1: tid = 100096, 0x00000000004007a4 gcd`gcd(a=24, b=24) + 20 at gcd.c:6, stop reason = step over
    frame #0: 0x00000000004007a4 gcd`gcd(a=24, b=24) + 20 at gcd.c:6
   3   int gcd(int a, int b) {
   4     int c;
   5     while (b != 0) {
-> 6       c = a % b;
   7       a = b;
   8     }
   9   }
(lldb) print b
(int) $3 = 24

The Third Attempt

In order to make the loop terminate, the variable “b” must change.  Reviewing the source code, we notice a line is missing.  Here is another iteration:

#include <stdio.h>

int gcd(int a, int b) {
        int c;
        while (b != 0) {
                c = a % b;
                a = b;
                b = c;  // <-- this line
        }
}

int main(int argc, char** argv) {
        int a;
        int b;
        scanf("%d %d", &a, &b);
        printf("%d\n", gcd(a, b));
        return 0;
}

Second Time Debugging

I am going to save time and tell you the code does not work.  It consistently returns “0” in my case.  Let us jump into the debugger again.  Because the program does not loop indefinitely this time, we need another way to stop the program before it finishes.  In this example, I used a “breakpoint set” command with the “--file” and “--line” options.  Then I used the “step” command to step inside the “gcd” function call.

# cc -g gcd.c -o gcd
gcd.c:9:1: warning: control reaches end of non-void function
}
^
1 warning generated.
# lldb gcd
(lldb) target create "gcd"
Current executable set to 'gcd' (x86_64).
(lldb) breakpoint set --file gcd.c --line 15
Breakpoint 1: where = gcd`main + 53 at gcd.c:15, address = 0x0000000000400805
(lldb) run
Process 93155 launching
Process 93155 launched: '/root/gcd' (x86_64)
4 24

^C
Process 93155 stopped
* thread #1: tid = 100085, 0x0000000000400805 gcd`main(argc=1, argv=0x00007fffffffeb60) + 53 at gcd.c:15, stop reason = breakpoint 1.1
    frame #0: 0x0000000000400805 gcd`main(argc=1, argv=0x00007fffffffeb60) + 53 at gcd.c:15
   12  int main(int argc, char** argv) {
   13  int a; int b;
   14  scanf("%d %d", &a, &b);
-> 15  printf("%d\n", gcd(a, b));
   16  return 0;
   17  }
(lldb) print a
(int) $0 = 4
(lldb) print b
(int) $1 = 24
(lldb) step
Process 93155 stopped
* thread #1: tid = 100085, 0x000000000040079a gcd`gcd(a=4, b=24) + 10 at gcd.c:5, stop reason = step in
    frame #0: 0x000000000040079a gcd`gcd(a=4, b=24) + 10 at gcd.c:5
   2   
   3   int gcd(int a, int b) {
   4     int c;
-> 5     while (b != 0) {
   6       c = a % b;
   7       a = b;
   8       b = c;
(lldb) next
Process 93155 stopped
* thread #1: tid = 100085, 0x00000000004007a4 gcd`gcd(a=4, b=24) + 20 at gcd.c:6, stop reason = step in
    frame #0: 0x00000000004007a4 gcd`gcd(a=4, b=24) + 20 at gcd.c:6
   3   int gcd(int a, int b) {
   4     int c;
   5     while (b != 0) {
-> 6       c = a % b;
   7       a = b;
   8       b = c;
   9     }
(lldb) next
Process 93155 stopped
* thread #1: tid = 100085, 0x00000000004007ae gcd`gcd(a=4, b=24) + 30 at gcd.c:7, stop reason = step in
    frame #0: 0x00000000004007ae gcd`gcd(a=4, b=24) + 30 at gcd.c:7
   4     int c;
   5     while (b != 0) {
   6       c = a % b;
-> 7       a = b;
   8       b = c;
   9     }
   10  }
(lldb) next
Process 93155 stopped
* thread #1: tid = 100085, 0x00000000004007b4 gcd`gcd(a=24, b=24) + 36 at gcd.c:8, stop reason = step over
    frame #0: 0x00000000004007b4 gcd`gcd(a=24, b=24) + 36 at gcd.c:8
   5     while (b != 0) {
   6       c = a % b;
   7       a = b;
-> 8       b = c;
   9     }
   10  }
   11  
(lldb) next
Process 93155 stopped
* thread #1: tid = 100085, 0x00000000004007ba gcd`gcd(a=24, b=4) + 42 at gcd.c:5, stop reason = step over
    frame #0: 0x00000000004007ba gcd`gcd(a=24, b=4) + 42 at gcd.c:5
   2   
   3   int gcd(int a, int b) {
   4     int c;
-> 5     while (b != 0) {
   6     c = a % b;
   7     a = b;
   8     b = c;
(lldb) next
Process 93155 stopped
* thread #1: tid = 100085, 0x00000000004007a4 gcd`gcd(a=24, b=4) + 20 at gcd.c:6, stop reason = step over
    frame #0: 0x00000000004007a4 gcd`gcd(a=24, b=4) + 20 at gcd.c:6
   3   int gcd(int a, int b) {
   4     int c;
   5     while (b != 0) {
-> 6       c = a % b;
   7       a = b;
   8       b = c;
   9     }
(lldb) next
Process 93155 stopped
* thread #1: tid = 100085, 0x00000000004007ae gcd`gcd(a=24, b=4) + 30 at gcd.c:7, stop reason = step over
    frame #0: 0x00000000004007ae gcd`gcd(a=24, b=4) + 30 at gcd.c:7
   4     int c;
   5     while (b != 0) {
   6       c = a % b;
-> 7       a = b;
   8       b = c;
   9     }
   10  }
(lldb) next
Process 93155 stopped
* thread #1: tid = 100085, 0x00000000004007b4 gcd`gcd(a=4, b=4) + 36 at gcd.c:8, stop reason = step over
    frame #0: 0x00000000004007b4 gcd`gcd(a=4, b=4) + 36 at gcd.c:8
   5     while (b != 0) {
   6       c = a % b;
   7       a = b;
-> 8       b = c;
   9     }
   10  }
   11  
(lldb) next
Process 93155 stopped
* thread #1: tid = 100085, 0x00000000004007ba gcd`gcd(a=4, b=0) + 42 at gcd.c:5, stop reason = step over
    frame #0: 0x00000000004007ba gcd`gcd(a=4, b=0) + 42 at gcd.c:5
   2   
   3   int gcd(int a, int b) {
   4     int c;
-> 5     while (b != 0) {
   6       c = a % b;
   7       a = b;
   8       b = c;
(lldb) next
Process 93155 stopped
* thread #1: tid = 100085, 0x00000000004007bf gcd`gcd(a=4, b=0) + 47 at gcd.c:10, stop reason = step over
    frame #0: 0x00000000004007bf gcd`gcd(a=4, b=0) + 47 at gcd.c:10
   7       a = b;
   8       b = c;
   9     }
-> 10  }
   11  
   12  int main(int argc, char** argv) {
   13  int a; int b;
(lldb) next
Process 93155 stopped
* thread #1: tid = 100085, 0x0000000000400813 gcd`main(argc=1, argv=0x00007fffffffeb60) + 67 at gcd.c:15, stop reason = step over
    frame #0: 0x0000000000400813 gcd`main(argc=1, argv=0x00007fffffffeb60) + 67 at gcd.c:15
   12  int main(int argc, char** argv) {
   13    int a; int b;
   14    scanf("%d %d", &a, &b);
-> 15    printf("%d\n", gcd(a, b));
   16    return 0;
   17  }

We see the values of variables “a” and “b” decrease along the timeline.  Pay attention to the transition from the “gcd” function to the “main” function.  There is no return value.  This is why we kept having the warning.  Compiler warnings are often more useful than programmers think.  Sometimes, I am amazed that open source software packages produce so many warnings during compilation and still work.

The Final Version

We need to return the answer in the “gcd” function.  Here is the corrected code.

#include <stdio.h>

int gcd(int a, int b) {
        int c;
        while (b != 0) {
                c = a % b;
                a = b;
                b = c;
        }
        return a;  // <-- this line
}

int main(int argc, char** argv) {
        int a;
        int b;
        scanf("%d %d", &a, &b);
        printf("%d\n", gcd(a, b));
        return 0;
}

The Final Testing

# cc gcd.c -o gcd
# ./gcd
0 5
5
# ./gcd
5 0
5
# ./gcd
1 9
1
# ./gcd
9 1
1
# ./gcd
4 36
4
# ./gcd
36 4
4
# ./gcd
24 36
12
# ./gcd
36 24
12
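Typing the numbers by hand gets tedious.  As a sketch of how the same cases could be checked automatically, the function below repeats the final “gcd” and compares it against the expected answers from the session above.  The name count_gcd_failures is made up for this illustration.

```c
#include <stdio.h>

/* The final gcd from the article, repeated so the sketch is self-contained. */
int gcd(int a, int b)
{
        int c;
        while (b != 0) {
                c = a % b;
                a = b;
                b = c;
        }
        return a;
}

/* Hypothetical helper: runs the cases from the manual test session and
 * returns how many of them produced an unexpected answer. */
int count_gcd_failures(void)
{
        /* Each row is: a, b, expected gcd. */
        static const int cases[][3] = {
                {0, 5, 5}, {5, 0, 5}, {1, 9, 1}, {9, 1, 1},
                {4, 36, 4}, {36, 4, 4}, {24, 36, 12}, {36, 24, 12},
        };
        int failures = 0;
        size_t i;

        for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
                int got = gcd(cases[i][0], cases[i][1]);
                if (got != cases[i][2]) {
                        printf("gcd(%d, %d) = %d, expected %d\n",
                            cases[i][0], cases[i][1], got, cases[i][2]);
                        failures++;
                }
        }
        return failures;
}
```

A main function that returns the failure count would let the shell report success or failure through the exit status.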