In this article, we go through the user-space code of GEOM Gate in FreeBSD, which is responsible for exporting block devices to other hosts.
About GEOM Gate
How GEOM Gate Works
The mechanism comprises three parts. On the server side, a daemon (ggated) listens on the network and acts on the target disk accordingly. On the client side, a kernel module (geom_gate.ko) lets user-space code simulate a block device, and a user-space client (ggatec) bridges the remote server daemon and the kernel module.
One of the laziest ways to read FreeBSD source code casually is through GitHub or the official SVN repository. Both include the source code of the kernel and the user space, and you can browse the code easily with any decent web browser, even when you are on the road with a tablet. The latter is official and comes with commit information, yet it has no syntax highlighting. So you decide.
If you have a FreeBSD installation with the “src.txz” distribution file installed, you can read the code directly in the directory /usr/src.
The Gate Server
The server code is located at /usr/src/sbin/ggate/ggated/ggated.c. When initiated (main, line 951), it reads the arguments (line 964), ensures no other instance of itself is running (line 1012), binds itself to a network port (line 1031), and starts listening (line 1043).
When a client connects, it is accepted (line 1052) and a handshake follows (line 1061). The handshake function (line 835) performs a few checks, such as the protocol version (line 866), and then proceeds to launch the connection (line 937).
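For readers less familiar with network servers, the setup just described is the standard TCP prologue. In outline it might look like the sketch below; the function name is mine, and the real ggated adds option parsing, pidfile handling, and thorough error checks.

    #include <stdint.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    /* A minimal sketch of the bind-and-listen steps. */
    static int
    server_listen(uint16_t port)
    {
        struct sockaddr_in sin;
        int sfd;

        sfd = socket(AF_INET, SOCK_STREAM, 0);    /* obtain a TCP socket */
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(port);               /* bind to the port    */
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(sfd, (struct sockaddr *)&sin, sizeof(sin));
        listen(sfd, 5);                           /* start listening     */
        return (sfd);                             /* ready for accept(2) */
    }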
To launch the facilities that handle the new connection (line 529), it creates two locks (line 547) and two conditions (line 552), one lock-and-condition pair for each direction. The execution then splits (line 574) into three threads, namely the receive thread (line 623), the disk thread (line 688), and the send thread (line 769), as sketched below.
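In skeletal form, the arrangement looks like this; all identifiers here are mine, not the ones in ggated.c, and the thread bodies are sketched in the following paragraphs.

    #include <pthread.h>

    static pthread_mutex_t inqueue_mtx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  inqueue_cond = PTHREAD_COND_INITIALIZER;
    static pthread_mutex_t outqueue_mtx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  outqueue_cond = PTHREAD_COND_INITIALIZER;

    static void *recv_thread(void *arg);   /* socket -> inward queue            */
    static void *disk_thread(void *arg);   /* inward queue -> disk -> outward   */
    static void *send_thread(void *arg);   /* outward queue -> socket           */

    /* The execution splits into three cooperating threads per connection. */
    static void
    connection_launch(void *conn)
    {
        pthread_t tid;

        pthread_create(&tid, NULL, recv_thread, conn);
        pthread_create(&tid, NULL, disk_thread, conn);
        pthread_create(&tid, NULL, send_thread, conn);
    }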
The receive thread is responsible for receiving requests and forwarding them to the disk thread. First of all, it allocates a new data structure (line 637) and receives a request from the network socket (line 638). According to the request size, it creates a new buffer to hold the data (line 658). If the request is a write request, it separately receives the data to be written (line 666). It then acquires the inward lock (line 677), inserts the request information into the inward queue (line 679), and raises the inward condition in case the disk thread is idle and sleeping (line 680). Finally, it releases the inward lock (line 682) and loops again for another request. If the lock and the condition puzzle you, do not worry; they will become clear when you read the disk thread as well.
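Continuing the sketch above (with inqueue_mtx and inqueue_cond as declared there), the producer side of the hand-off might look like this; the request record and its fields are illustrative.

    #include <sys/queue.h>
    #include <sys/types.h>
    #include <stdint.h>

    enum { REQ_READ, REQ_WRITE };

    struct req {                         /* illustrative request record  */
        TAILQ_ENTRY(req) entry;
        uint64_t         seq;            /* request identifier           */
        int              cmd;            /* REQ_READ or REQ_WRITE        */
        off_t            offset;
        size_t           length;
        char            *data;           /* write payload or read buffer */
    };
    static TAILQ_HEAD(, req) inqueue = TAILQ_HEAD_INITIALIZER(inqueue);

    /* Producer side: hand one received request over to the disk thread. */
    static void
    inqueue_put(struct req *rq)
    {
        pthread_mutex_lock(&inqueue_mtx);        /* acquire the inward lock */
        TAILQ_INSERT_TAIL(&inqueue, rq, entry);  /* enqueue the request     */
        pthread_cond_signal(&inqueue_cond);      /* wake the disk thread    */
        pthread_mutex_unlock(&inqueue_mtx);      /* release the inward lock */
    }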
The disk thread is responsible for receiving requests, executing them, and forwarding the results to the send thread. Since it shares the inward queue with the receive thread, the first thing it does is acquire the inward lock (line 702). This ensures the two threads do not conflict over the same queue. If there are requests in the queue, it can of course continue; but the thread has to sleep if there is nothing to work on. In that case, the disk thread waits on the condition (line 705). The wait subroutine gives up the lock until the condition is raised (by the receive thread, in this context). Once the queue is no longer empty, a request is withdrawn from the queue (line 708). Since the thread now holds the data for this round, the lock can be released (line 709). Some checks are applied to the request: the file boundary check (line 716), the block alignment check (line 717), and so on. The thread executes the pread (line 729) or the pwrite operation on the data (line 733). The p variants of the read and write operations take the offset in one shot, so no separate seek system call is needed. If it is a write request, the data can be discarded once it is written to the disk (line 736). (If it is a read request, the data has to be kept, for the obvious reason.) Whether it is a write or a read operation, it is time to return the write completion signal, or the read data, to the requester. Similar to how the receive thread shares data with the disk thread, the disk thread acquires the outward lock (line 755), inserts into the outward queue (line 757), raises the outward condition (line 758), and releases the outward lock (line 760).
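The consumer side of the same pattern, together with the disk I/O step, might be sketched as follows; diskfd and the helper names are again mine.

    #include <unistd.h>

    /* Consumer side: take one request off the inward queue, sleeping
     * on the condition while the queue is empty. */
    static struct req *
    inqueue_get(void)
    {
        struct req *rq;

        pthread_mutex_lock(&inqueue_mtx);
        while (TAILQ_EMPTY(&inqueue))
            /* Atomically gives up the lock and sleeps; the lock is
             * re-acquired when the receive thread signals. */
            pthread_cond_wait(&inqueue_cond, &inqueue_mtx);
        rq = TAILQ_FIRST(&inqueue);
        TAILQ_REMOVE(&inqueue, rq, entry);   /* withdraw one request     */
        pthread_mutex_unlock(&inqueue_mtx);  /* data in hand; let it go  */
        return (rq);
    }

    /* Execute the request against the exported file or device. */
    static void
    execute(int diskfd, struct req *rq)
    {
        /* The p variants take the offset directly, so there is no
         * separate lseek(2) call. */
        if (rq->cmd == REQ_READ)
            pread(diskfd, rq->data, rq->length, rq->offset);
        else
            pwrite(diskfd, rq->data, rq->length, rq->offset);
    }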
The send thread is responsible for sending the results back to the requesting client. Similar to how the disk thread receives data from the receive thread, the send thread acquires the outward lock (line 783), waits if the outward queue is empty (line 786), withdraws the information (line 790), and releases the outward lock (line 791). It sends the result header first (line 801). If there is data accompanying the result (that is, it is not a write request; the data of write requests has been discarded already), the thread sends the data in a second operation (line 808) and then discards it (line 816). It then loops again.
That is all of the gate server. How does the gate client talk with it?
The Gate Client
The gate client code is located at /usr/src/sbin/ggate/ggatec/ggatec.c, and it corresponds to /sbin/ggatec(8) when you use it. Let us focus on how a connection is established with the server. The establishment code starts in a switch structure (line 606) of the main function. Firstly, it loads the GEOM Gate kernel module (line 609, referring to ggate.c line 213) and opens it (line 610, referring to ggate.c line 176). Afterwards, it jumps to the subroutine that creates the client (line 613).
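In spirit, the module-loading helper does something like the following sketch; the module names are what I recall from ggate.c, so treat them as an assumption.

    #include <sys/param.h>
    #include <sys/linker.h>
    #include <sys/module.h>
    #include <err.h>

    /* Load geom_gate.ko if the GEOM class is not already present. */
    static void
    load_gate_module(void)
    {
        if (modfind("g_gate") == -1) {           /* is the class there?  */
            if (kldload("geom_gate") == -1)      /* no: pull the module  */
                err(1, "geom_gate module not available");
        }
    }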
In the client creation subroutine (g_gatec_create, line 442), it obtains the block device information from the server (line 446) and registers a new block device with the gate kernel module (line 462, referring to ggate.c line 192). Supposing all this goes smoothly, it daemonises (line 469) and enters the loop (line 470).
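The registration step boils down to one control message on the GEOM Gate control device. A minimal sketch, assuming the structure and command names declared in sys/geom/gate/g_gate.h (the function name and the omission of most fields and error handling are mine):

    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <fcntl.h>
    #include <string.h>
    #include <geom/gate/g_gate.h>

    /* Ask the kernel module to create a new ggate device. */
    static int
    gate_create(int unit, off_t mediasize, unsigned sectorsize)
    {
        struct g_gate_ctl_create ggioc;
        int fd;

        fd = open("/dev/" G_GATE_CTL_NAME, O_RDWR);  /* the control node */
        memset(&ggioc, 0, sizeof(ggioc));
        ggioc.gctl_version = G_GATE_VERSION;
        ggioc.gctl_unit = unit;                /* desired ggateN number  */
        ggioc.gctl_mediasize = mediasize;      /* as told by the server  */
        ggioc.gctl_sectorsize = sectorsize;
        ioctl(fd, G_GATE_CMD_CREATE, &ggioc);
        return (fd);
    }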
The loop (g_gatec_loop, line 420) is a mini procedure that keeps calling the start procedure (line 426) and reconnecting (line 429). Every time it disconnects, it sends a G_GATE_CMD_CANCEL control message to the kernel module (line 437), indicating to the kernel that the pending operations have to be cancelled. The start procedure (g_gate_start, line 394) is the final initialisation step. Similar to its counterpart in the server, it spawns some threads; but it does not need locks or conditions. (I know why, but I decide to leave it as a mental exercise.)
When functioning, the gate client comprises two major moving parts, namely the send thread (send_thread, line 92) and the receive thread (recv_thread, line 186).
The send thread is responsible for decoding requests from the kernel and sending them to the gate server. First of all, it initialises a data structure to hold the disk request (line 102). It then resets the data length and the error status (lines 107, 108) for a new request, and indicates it is ready for a request with a G_GATE_CMD_START control message (line 109). This control message is special in that it blocks the send thread until there is a request or an error. Supposing there are no errors, the thread copies the required information (line 140), including the sequence number (line 148), and eventually sends the request (line 154). If it is a read request, only the request itself needs to be sent; if it is a write request, the data to be written is sent separately (line 167). Since the send thread is only for sending requests, it has no obligation to check the result of these operations. Each request from the kernel is identified by its sequence number, sent to the server, and eventually returned to the receive thread.
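Putting those steps together, one iteration of the send thread might look like the sketch below. I am assuming the g_gate_ctl_io structure and command names as declared in sys/geom/gate/g_gate.h; the function name, unit, buf and bufsize are mine.

    #include <sys/ioctl.h>
    #include <string.h>
    #include <geom/gate/g_gate.h>

    /* Fetch one I/O request from the kernel module. */
    static void
    fetch_request(int ctlfd, int unit, char *buf, size_t bufsize)
    {
        struct g_gate_ctl_io ggio;

        memset(&ggio, 0, sizeof(ggio));
        ggio.gctl_version = G_GATE_VERSION;
        ggio.gctl_unit = unit;
        ggio.gctl_data = buf;          /* where write payloads arrive  */
        ggio.gctl_length = bufsize;    /* reset for the new request    */
        ggio.gctl_error = 0;
        /* Blocks in the kernel until a request (or an error) shows up. */
        ioctl(ctlfd, G_GATE_CMD_START, &ggio);
        /* ggio.gctl_seq, gctl_cmd, gctl_offset and gctl_length now
         * describe the request to forward to the server. */
    }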
The receive thread is responsible for receiving results from the gate server and reporting them back to the kernel. Repeatedly, it receives messages from the server (line 200) and copies the information back according to the kernel module’s data structure (line 214), including the request sequence number, the operation kind, the offset, the length, and any error incurred. If it is a read request, the data is received separately (line 211). Once the data structure is filled up nicely, it sends a G_GATE_CMD_DONE control message to the kernel module, and is paused until the kernel finishes copying the necessary data (line 237). The thread then clears the data structure and restarts the loop. That is it.
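The completion step is then a single control message, continuing the sketch above; the function name here is mine.

    /* Report one completed request back to the kernel module. */
    static void
    complete_request(int ctlfd, struct g_gate_ctl_io *ggio)
    {
        /* For reads, ggio->gctl_data has already been filled from the
         * socket; for writes there is no payload to return. */
        ioctl(ctlfd, G_GATE_CMD_DONE, ggio);
        /* The call returns once the kernel has copied what it needs. */
    }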
Allocation and deallocation policy: as we saw in the server code, some buffers are allocated in the receive thread and deallocated in either the disk thread or the send thread. To me (after nine years studying computer science at a university), it is understandable. To others, it can be a bit difficult. The bottom line is, such a design should be well documented, and the pointers should be zeroed immediately upon deallocation, as sketched below.
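Applying the rule to the illustrative struct req from the server sketch:

    #include <stdlib.h>

    /* Free in the consuming thread and zero the pointer at once, so a
     * stray later use faults loudly instead of corrupting freed memory. */
    static void
    req_discard_data(struct req *rq)
    {
        free(rq->data);
        rq->data = NULL;
    }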
Reversed ordering: as we read the code, the functions are laid out in reverse order of use; for example, the main function is usually at the bottom. This is because the C language used to be handled by one-pass compilers, where a symbol can be used only after it has been defined or declared. This ordering is typical when we read this type of program code.
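A tiny example of definition before use, which is why main ends up at the bottom:

    /* The helper sits above, so main can call it without a separate
     * prototype; a one-pass compiler has already seen it by then. */
    static void
    helper(void)
    {
        /* ... work ... */
    }

    int
    main(void)
    {
        helper();    /* already known to the compiler at this point */
        return (0);
    }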
Number of threads: since the inward and outward queues are well protected, a server could run multiple disk threads to serve one client. Conversely, there can be at most one send thread and one receive thread per connection, since the ordering of network packets matters.
Event-based handling: by modern standards, people prefer event-based handling with fewer threads. Applied to the server code, it would keep the system load constant at peak use, regardless of the number of connections. The select(2) system call can be used to tell which network sockets are ready, as in the sketch below. Nevertheless, without event-based handling, the code is more readable in its current form.
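A hedged sketch of what that alternative could look like: one thread multiplexing all the sockets. The helpers accept_new_client and service_client are hypothetical stand-ins for the real work.

    #include <sys/select.h>

    static void accept_new_client(int listenfd);   /* hypothetical */
    static void service_client(int fd);            /* hypothetical */

    static void
    event_loop(int listenfd, int *clientfd, int nclients)
    {
        fd_set rfds;
        int i, maxfd;

        for (;;) {
            FD_ZERO(&rfds);
            FD_SET(listenfd, &rfds);
            maxfd = listenfd;
            for (i = 0; i < nclients; i++) {
                FD_SET(clientfd[i], &rfds);
                if (clientfd[i] > maxfd)
                    maxfd = clientfd[i];
            }
            /* One blocking call tells us which sockets are ready. */
            if (select(maxfd + 1, &rfds, NULL, NULL, NULL) <= 0)
                continue;
            if (FD_ISSET(listenfd, &rfds))
                accept_new_client(listenfd);
            for (i = 0; i < nclients; i++)
                if (FD_ISSET(clientfd[i], &rfds))
                    service_client(clientfd[i]);
        }
    }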
Copying: it would save a lot of processing power if the operating system could copy the data directly between the GEOM Gate buffer or the disk buffer and the network buffer. But it seems this is not possible without a major revamp of the system.