GEOM Gate Code Reading


In this article, we go through the user-space code of GEOM Gate in FreeBSD, which is responsible for exporting block devices to other hosts.

About GEOM Gate

GEOM Gate is a mechanism that allows block devices on one host to appear on another.  It can do interesting things, from sharing an optical disc drive to synchronising two disks on two hosts in real time.

How GEOM Gate Works

The mechanism comprises three parts.  On the server side, there is a daemon (ggated) that listens to the network and acts on the target disk accordingly.  On the client side, there is some code in a kernel module (geom_gate.ko) so that user code can simulate a block device, and a user-space client (ggatec) bridges the remote server daemon and the kernel module.

Source Code

One of the laziest ways to read FreeBSD source code casually is from GitHub or the official SVN repository.  They include the source code of both the kernel space and the user space.  You can browse the code easily with any decent web browser, even when you are on the road with a tablet.  The latter is official and comes with commit information, yet it has no syntax highlighting.  So you decide.

If you have a FreeBSD installation with the “src.txz” distribution file installed, you can read the code directly in the directory /usr/src.

The Gate Server

The gate server code is located at /usr/src/sbin/ggate/ggated/ggated.c, and it corresponds to /sbin/ggated(8) when you use it.

When started (main, line 951), it reads the arguments (line 964), ensures no other instance of itself is running (line 1012), binds itself to a network port (line 1031), and starts listening (line 1043).

When a client connects, it is accepted (line 1052) and handshaken (line 1061).  The handshake function (line 835) does a few checks, such as the version check (line 866), and proceeds to launch the connection (line 937).

To launch the facilities that handle the new connection (line 529), it creates two pairs of locks (line 547) and condition variables (line 552).  The execution then splits (line 574) into three threads, namely the receive thread (line 623), the disk thread (line 688), and the send thread (line 769).

The receive thread is responsible for receiving requests and forwarding them to the disk thread.  First of all, it allocates a new data structure (line 637).  It then receives a request from the network socket (line 638).  According to the request size, it creates a new buffer to hold the data (line 658).  If the request is a write request, it separately receives the data to be written (line 666).  Later, it acquires the inward lock (line 677).  It inserts the request information into the inward queue (line 679).  It raises the inward condition in case the disk thread is idle and sleeping (line 680).  Finally, it releases the inward lock (line 682) and loops again for another request.  If you do not understand the lock and the condition yet, no worries; you will once you have read the disk thread as well.

The disk thread is responsible for taking requests, executing them, and forwarding the results to the send thread.  Since it shares the inward queue with the receive thread, the first thing it does is acquire the inward lock (line 702).  This ensures the two threads do not conflict on the same queue.  If there are requests in the queue, of course it can continue; but the thread has to sleep if there is nothing to work on.  In that case, the disk thread waits on the condition (line 705).  The wait subroutine gives up the lock until the condition is raised (by the receive thread, in this context).  Once the queue is no longer empty, a request can be withdrawn from it (line 708).  Since the thread has obtained the data for this round, the lock can be released (line 709).  Some checks are then done on the request: the file boundary check (line 716), the block alignment check (line 717), and so on.  The thread executes the pread operation (line 729) or the pwrite operation (line 733) on the data.  The “p” versions of the read and write operations allow specifying the offset in one shot, so there is no need to seek in a separate system call.  If it is a write request, the data can be discarded once it is written to the disk (line 736).  (If it is a read request, the data has to be kept, for the obvious reason.)  Whether it is a write or a read operation, it is then time to return the write completion status, or the read data, to the requester.  Similar to how the receive thread shares data with the disk thread, the disk thread acquires the outward lock (line 755), inserts into the outward queue (line 757), raises the outward condition (line 758), and releases the outward lock (line 760).

The send thread is responsible for sending the results back to the requesting client.  Similar to how the disk thread receives data from the receive thread, the send thread acquires the outward lock (line 783), waits if the outward queue is empty (line 786), withdraws the information (line 790), and releases the outward lock (line 791).  The send thread sends the result header first (line 801).  If there is data accompanying the result (that is, it is not a write request; data of write requests has been discarded already), the thread sends the data in a second operation (line 808) and discards it (line 816).  It then loops again.

That is all of the gate server.  How does the gate client talk to it?

The Gate Client

The gate client code is located at /usr/src/sbin/ggate/ggatec/ggatec.c, and it corresponds to /sbin/ggatec(8) when you use it.  Let us focus on how a connection is established with the server.  The establishment code starts in a switch structure (line 606) of the main function.  Firstly, it loads the GEOM gate kernel module (line 609; see ggate.c line 213) and opens it (line 610; see ggate.c line 176).  Afterwards, it jumps to the subroutine that creates the client (line 613).

In the client creation subroutine (g_gatec_create, line 442), it obtains block device information from the server (line 446) and registers a new block device with the gate kernel module (line 462; see ggate.c line 192).  Supposing all this goes smoothly, it daemonises (line 469) and enters the loop (line 470).

The loop (g_gatec_loop, line 420) is a mini procedure that keeps calling the start procedure (line 426) and reconnecting (line 429).  Every time the connection drops, it sends a G_GATE_CMD_CANCEL control message to the kernel module (line 437), telling the kernel that the pending operations have to be cancelled.  The start procedure (g_gate_start, line 394) is the final initialisation step.  Similar to its counterpart in the server, it spawns some threads; but it does not need locks or conditions.  (I know why, but I decided to leave it as a mental exercise.)

When functioning, the gate client comprises two major moving parts, namely the send thread (send_thread, line 92) and the receive thread (recv_thread, line 186).

The send thread is responsible for decoding requests from the kernel and sending them to the gate server.  First of all, it initialises a data structure to hold the disk request (line 102).  It then resets the data length and error status (lines 107, 108) for a new request.  It indicates it is ready for a request with a G_GATE_CMD_START control message (line 109).  Being a special system call, this blocks the send thread until there is a request or an error.  Supposing there are no errors, the thread copies the required information (line 140), including the sequence number (line 148), and eventually sends the request (line 154).  If it is a read request, the request header alone suffices; if it is a write request, the data to be written is sent separately (line 167).  Since the send thread is only for sending requests, it has no obligation to check the results of these operations.  Each request from the kernel is identified by its sequence number; it is sent to the server, and eventually the response returns to the receive thread.

The receive thread is responsible for receiving results from the gate server and reporting them back to the kernel.  Repeatedly, it receives messages from the server (line 200) and copies back the information according to the kernel module’s data structure (line 214), including the request sequence number, the operation kind, the offset, the length, and any errors incurred.  If it is a read request, the data is received separately (line 211).  Once the data structure is filled up nicely, it sends a G_GATE_CMD_DONE control message to the kernel module.  It then pauses until the kernel finishes copying the necessary data (line 237).  The thread then clears the data structure and restarts the loop.  That is it.


Allocation and deallocation policy: as we saw in the server code, some buffers are allocated in the receive thread and deallocated in either the disk thread or the send thread.  To me (with nine years studying computer science at a university), it is understandable.  To some others, it can be a bit difficult.  The bottom line is, such a design should be documented well, and the pointers should be zeroed immediately upon deallocation.

Reversed ordering: as we read the code, the functions are laid out in reverse order; for example, the main function is usually at the bottom.  This is because C used to be handled by one-pass compilers, where a symbol can be used only after it is defined or declared.  This is typical when we read this type of program code.

Number of threads: since the inward and outward queues are well protected, there can be multiple disk threads in a server handling one client.  Conversely, there can be at most one send thread and one receive thread per connection, since network packet ordering is important.

Event-based handling: by modern standards, people prefer event-based handling with fewer threads.  If applied to the server code, it would generate a constant amount of system load at peak use, regardless of the number of connections.  The select(2) system call can be used to hint which network socket is ready.  Nevertheless, without event-based handling, the code is more readable in its current form.

Copying: it would save a lot of processing power if the operating system could copy the data directly from the GEOM gate buffer and the disk buffer to the network buffer.  But it seems that is not possible without a major revamp of the system.


Little Status Update…


In case you wonder why there have not been any new articles… I am fine, but I was sick for a while.  I am writing something big and it will be ready in a week or so.  Stay tuned and good luck.  🙂

Network Booting FreeBSD


Installing an operating system is easy for a computer or two.  What about installing it for many of them?  In some situations, one wants to save the money and time of buying that many boot devices and maintaining the disk images separately.  Network boot comes in handy.  It is relatively easy with FreeBSD, especially since we do not need a RAM disk image just to load drivers.

The idea is, we have one computer holding the disk image, and we let other computers obtain the image from there and start their journeys.  We first let these computers obtain their network information.  Together with the network configuration, these computers also get the location of the network boot loader.  With the network boot loader, they will be able to mount a network file system.

I tried to avoid GPL software.  Otherwise I would have tried something more integrated.  Anyway, huge vulnerabilities were discovered in that piece of software recently…


I will assume you have your cloud environment, with a network isolated from elsewhere.  For example, in VirtualBox, create a “host-only” network and plug in your computers (vtnet1).  To make dynamic IP address allocation possible, one will want to enable promiscuous mode on the interfaces.  Of course, in addition, one will need an outgoing “NAT” network for internet access (vtnet0).  I leave the other details as the reader’s exercise.

Operating System Files

Similar to setting up a full-blown jail or installing without an installer, one needs to deploy a directory to hold the operating system files.  In this exercise, I randomly picked the directory “/compute”.

# mkdir /compute
# tar Jxf base.txz -C /compute
# tar Jxf kernel.txz -C /compute

Dynamic Host Configuration

Typical computers nowadays can be configured in the BIOS to boot from the network.  The first thing such computers do is, of course, obtain their network configuration.  Therefore, the dynamic host configuration server is the first thing we need to set up.  In this practice, I used the OpenBSD DHCP daemon.  The usage is similar to that described in the handbook.

# pkg install dhcpd

The dhcpd.conf(5) file “/usr/local/etc/dhcpd.conf” is as follows.  In short, it allows IP addresses to be allocated for 7200 seconds per lease.  For the subnet, a range of addresses can be allocated to the computers dynamically, and a default gateway is assigned via the routers option.  More importantly, when a computer boots, it should obtain the file “/boot/pxeboot” (from the DHCP server, by default) and mount the given NFS location as the root file system.

default-lease-time 7200;
max-lease-time 7200;
subnet netmask {
  range;
  option routers;
  option root-path "";
  filename "/boot/pxeboot";
}

And the rc.conf(5) file “/etc/rc.conf” is appended as follows.  As you guessed it right, just one statement enables the DHCP server.

dhcpd_enable="YES"

Trivial File Transfer

By no means do I claim the file transfer protocol is trivial.  The so-called “trivial file transfer” is just another protocol for transferring files without complicated handshakes.  Here I used the package “tftp-hpa”.

# pkg install tftp-hpa

By default, it uses the directory “/usr/local/tftp”.  But we can be lazy.  The rc.conf(5) gets appended as follows:

tftpd_flags="-s /compute"

If you are concerned about security, you should make a dedicated directory (like the default /usr/local/tftp) and copy the file “/compute/boot/pxeboot” there.  Right, one file will do.

Network File System

To let the system boot, we prepare a network file system and make /compute accessible.  The exports(5) file “/etc/exports” looks as follows:

/compute -network -alldirs -maproot=root

And the rc.conf(5) is appended as follows.  You may want to refer here if you want to run it behind a firewall.

nfs_server_enable="YES"
rpcbind_enable="YES"

Starting the Services

As a friendly reminder, you will need to enable and start the services: the DHCP daemon, the TFTP daemon, and the network file system.  You will also need proper firewall rules to allow the network traffic.  In particular, you will need UDP ports 67, 68, and 69 for DHCP and TFTP to work.

How it Works

DHCP Requests: At the beginning, the computer tries to obtain its network configuration by broadcasting requests.  If you are stuck here, fix the DHCP daemon.

[Screenshot: 2017-10-20, 12:21 a.m.]

TFTP for Boot Loader: After obtaining the network configuration, the computer tries to obtain the boot loader with TFTP.  If you are stuck here, fix the TFTP configurations.

[Screenshot: 2017-10-20, 12:23 a.m.]

Network File System: With the first stage boot loader ready, it tries to mount the network file system.  If you are stuck here, check the network file system configurations.

[Screenshot: 2017-10-20, 12:22 a.m.]

The Loader: When you see this screen, most of the services (DHCP, TFTP, and NFS) have already been used.  If you are stuck beyond this point, read the error message, and good luck…

[Screenshot: 2017-10-20, 12:23 a.m.]

Getting FreeBSD Jail to Run


In this short article, I discuss the way to run a full-blown FreeBSD jail, with some common tunings to make it more “normal”.


I assume the host has two network interfaces: “vtnet0” is Internet-facing and “vtnet1” is the intranet, where the jail will be running.  This way, the jail does not have direct access to the Internet.  I also assume we set up the jail at “/root/myjail”.  You can pick any other location you want.

Installing a Jail

To run an operating system, one needs to install it on some media.  In the context of a jail, we install it to a directory.  The FreeBSD handbook tells you to install from source, but that is not necessary since the BSD system files are mostly tarballs.  Download what you require.  For a minimal system, you will need “base.txz”.  Hmm, no, “kernel.txz” is not required for the jail.

# mkdir /root/myjail
# tar Jxvf base.txz -C /root/myjail

Running a Jail

To run a jail, in particular, with networking, try the following command.

# jail -c path=/root/myjail name=myjail \
  interface=vtnet1 ip4.addr= \
  exec.start="/bin/sh /etc/rc"

Where are We?

[Screenshot: 2017-10-15, 11:00 p.m.]

Here is what it looks like.  No, we are not inside the jail.  The so-called resource configuration script completed execution, and we are back at the shell of the host.

Attaching to Jail

Attaching to a jail is simple.  We can simply start one more process in the jail.  (There is no requirement that all processes originate from the same process tree.)

# jexec myjail /bin/tcsh

When finished, you can get out by hanging up the shell (Ctrl-D).  That’s right, you won’t kill the jail by doing this.

Intranet Connection

Inside the jail, try pinging other places, like the host IP…

# ping
ping: ssend socket: Operation not permitted

To allow ping, you need to make the following change in the host.  If you want to allow this from the beginning, the “allow.raw_sockets=1” option can be passed when the jail is started next time.

# jail -m name=myjail allow.raw_sockets=1

Internet Connection

The Intranet now works.  What about going to the Internet?

# ping
PING ( 56 data bytes

It seems there is something wrong with the routing.  Modify the PF rules in the host.  Put this line before the first block / pass statement.  (I assume you have a PF firewall installed.)  After you reload the rules (“service pf reload”), it should work.

nat pass on vtnet0 from vtnet1:network to any -> (vtnet0)

Domain Name Resolution

Let’s randomly pick a domain name to ping.

# ping
ping: cannot resolve Host name lookup failure

Ask your network administrators whether there are suggested name servers.  If you have nothing in mind, use anything that works.  Out of laziness, I use the shortest IP addresses I can recite.  Do this inside the jail:

# cat >> /etc/resolv.conf << EOF
nameserver
EOF

Or, in the host:

# cat >> /root/myjail/etc/resolv.conf << EOF
nameserver
EOF

Listing Processes

System tools like “ps” do not work by default:

# ps
ps: empty file: invalid argument

I think there could be better ways, but the simplest is to mount the device file system:

# mount -t devfs devfs /root/myjail/dev

Please note it is not necessarily a good idea if you host people you don’t trust.

Network File System with Firewall in FreeBSD


The network file system nfs(8) in FreeBSD is built on top of the rpc(3) infrastructure, where the rpcbind(8) daemon is responsible for binding the services for the clients.  Together with the companion services rpc.lockd(8), rpc.statd(8), and mountd(8), providing the whole service behind a firewall is tricky since the ports are different every time.  In addition, sometimes the services are better made available to the intranet only, not the internet.

Server Flags

Here is how to lock these services to fixed ports and non-wildcard addresses in rc.conf(5).  In the example, I assumed a fixed IP address to provide the service:

rpc_lockd_flags="-h -p 2632"
rpc_statd_flags="-h -p 2633"
mountd_flags="-h -p 2634"
nfs_server_flags="-h -t -u"

Network Ports to Open

Here are the ports to open for the aforementioned services, both UDP and TCP.  The firewall can be adjusted accordingly:

111   rpcbind(8)
2049  nfsd(8)
2632  rpc.lockd(8)
2633  rpc.statd(8)
2634  mountd(8)

Installing FreeBSD without the Installer


Here are the steps to install FreeBSD without an installer (here is the one using the installer, and here is the one appending packages later on).  First of all, one will need a minimal boot medium (maybe a USB flash drive) to boot into a shell environment.

Creation of the volumes, boot sector and boot code:

Assume you have a fresh SAS drive named /dev/da0 and you want FreeBSD to be there.  You first create a GPT partition scheme and then three partitions.  The first holds the boot code.  The second is the swap.  The third is the root file system.  The latter two are aligned to 1-megabyte boundaries.

Usually, people put the root file system second and the swap third.  I insist on doing the reverse, since it allows me to expand the root file system when the disk expands.

gpart create -s gpt /dev/da0
gpart add -t freebsd-boot -s 512K /dev/da0
gpart add -t freebsd-swap -s 2047M -a 1M /dev/da0
gpart add -t freebsd-ufs -a 1M /dev/da0
gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 /dev/da0
newfs -U /dev/da0p3

Network connection:

Making a network connection to the outside is easy if your cables are already plugged in.  Assume your network card is vtnet0…  (The numbers need to be replaced.)

ifconfig vtnet0 inet netmask

Extract the minimum installation files:

Here, I assume you have your “.txz” files ready.  If not, you may want to find them on the same subnet (an existing computer, for example).  Afterwards, we change root into the destination as if we had already booted the computer with it.

mount /dev/da0p3 /mnt
cat kernel.txz | tar -Jxvf - -C /mnt
cat base.txz | tar -Jxvf - -C /mnt
chroot /mnt

File system table:

Even if you want to skip all the upcoming steps, this one is the last important one.  A proper file system table is crucial for FreeBSD to boot.  (This is unlike some other operating systems, where the whereabouts of the root file system are hardcoded in some strange place.)

cat > /etc/fstab << EOF
/dev/da0p2 none swap sw 0 0
/dev/da0p3 /    ufs  rw 1 1
EOF

Other configuration files:

Up to your taste, modify as many as you can…

cat > /etc/rc.conf << EOF
ifconfig_vtnet1="inet netmask"
EOF

cat > /boot/loader.conf << EOF
EOF

cat > /etc/sysctl.conf << EOF
EOF

Switch off.  Remove the boot media.  Boot again.

Simple Experiment with Jails and Resource Control


While it has been good practice to have one computer doing one role, it has also been asked whether a computer can be split into multiple functions, as if it were a group of computers.  One would immediately answer virtual machines, but those can be too heavy and complicated, like having virtual machines inside the virtual machines you ordered from a public cloud.  Another would say containers and cgroups in L…, em, what?  In other operating systems, we have more battle-proven things, like zones in Solaris and jails in FreeBSD.  Jails in FreeBSD had not been too attractive to me because they obviously lacked resource control, so out-of-control processes could starve the others.  Thankfully, since FreeBSD 9.0, rctl(8) has made the world a fairer place.

How simple is it to make a jail, try resource control, and gain confidence that it works?  The suggested method in the FreeBSD handbook is to make a directory and reinstall FreeBSD there with the appropriate distribution files.  It is a generic method, but we can be more inspiring.  Let’s begin.  In this article, we will write a dummy program, instantiate it in a jail, and fiddle with the resource control.

System Setup

First of all, in order to use the resource control, add the following line to the loader.conf(5) file /boot/loader.conf.  The file could be missing on a fresh installation; create a new one if that is the case.  After this, reboot to make it effective.

kern.racct.enable=1

The Program and Compilation

Let us do some programming.  We want a dummy program that consumes some memory and processing power.  Here is my randomly crafted C program.  It deliberately runs some nonsense loops and leaks some memory.  Once every 10 seconds, it allocates 16 MB of memory (malloc) and uses 1 MB of it (memset).  (The remaining 15 MB is lazily allocated and forgotten.)  By watching the lines printed, you get a sense of how fast the process runs.  The longer it runs, the more memory it consumes.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(int argc, char** argv) {
  time_t then, now;
  int i, j, allocated;
  void* region;

  i = j = allocated = 0;
  then = now = time(NULL);
  while (1) {
    while (now - then < 10) {
      for (i = 0; i < (1 << 25); ++i)  /* nonsense busy loop */
        j *= i;
      printf("now: %ld\n", (long)now);
      now = time(NULL);
    }
    then = now;
    region = malloc(1 << 24);           /* allocate 16 MB */
    memset(region, j % 23, (1 << 20));  /* touch 1 MB of it */
    allocated += 16;
    printf("allocated: %d\n", allocated);
  }
  return 0;
}

Usually, the program can be compiled simply with the clang(1) command.  In order to compile for the jail in the laziest way, call clang with the “-static” option.  The executable is significantly larger.  Note down the absolute path of the directory so you can do the next step.

# ls
resource.c
# clang resource.c -o resource -static -Wall
# ls
resource resource.c
# ldd resource
ldd: resource: not a dynamic ELF executable
# pwd

Running inside a Jail

Then, it’s time to start the jail.  The simplest way is to create a bare jail(1), rooted at ssching’s home directory, and execute the executable we compiled.

# jail -c -u ssching path=/home/ssching \
  exec.start='./resource' name=myjail

Open another terminal and issue the top(1) command.  Press the “j” key once and you will see a process with a non-zero jail number.  We see the process is consuming 100% of a CPU and its memory usage is increasing.

2648    14  ... 8288K 872K CPU1  1 0:09 100.67% resource

Stopping a Jail

Stopping a jail, attached or not, can be as simple as the “-r” command.  Upon execution, all the processes in the jail are killed.

# jail -r myjail

Processor Resource Control

Open yet another terminal and issue the following commands.

# rctl -a jail:myjail:pcpu:deny=50

The usage becomes a bit tamer, but it fluctuates a lot from time to time.

2648    14 ...  392M 25544K CPU3  3 3:52  50.83% resource

Memory Resource Control

Similarly, the virtual memory usage or the physical memory usage can be constrained as follows.  The virtual memory counts the total memory allocated (SIZE in the top display); the physical memory counts the actual memory used (RES in the top display).

# rctl -a jail:myjail:vmemoryuse:deny=240M
# rctl -a jail:myjail:memoryuse:deny=120M

If the memory usage goes beyond what is allowed, the system refuses to allocate more memory.  Given the simple design of the program above, malloc then returns NULL, the unchecked memset dereferences it, and the program breaks with signal 11.

now: 1505838082
now: 1505838083
jail: ./resource: exited on signal 11

Resource Control Rules

To check what resource control rules are applied, a simple “rctl” command will do.

# rctl 

Clearing Resource Rules

Clearing existing rules can be as simple as using the “-r” command, for example:

# rctl -r jail:myjail:memoryuse
# rctl -r jail:myjail
# rctl -r jail:

Actual Resource Used

Last but not least, the resource usage of a jail can be listed.

# rctl -u jail:myjail

Other Thoughts

In fact, resource control can also be applied per process, per user, per group, etc.  The command is mostly similar; just replace “jail” with “process”, “user”, “group”, etc.  Say a computer is shared among a few friends.  Even if they do not want to be in jails (who wants to?  Pun intended…), their resource usage can be constrained and accounted for by the command.

In addition to processor and memory usage, resource control can also put limits on other items, like disk reads / writes (hint: use the action “throttle” instead of “deny”), which is useful when IOPS are metered in a public cloud…

In the next article, we will explore how to put a complicated program, with dynamic linked libraries, into a jail.