How setting the TZ environment variable avoids thousands of system calls

In our previous blog post about Linux system calls, we explained the different system call methods and highlighted a very interesting method in particular: the vDSO system call method.

The purpose of the vDSO system call method was to create a way for certain, very frequently used system calls (like gettimeofday, time, clock_gettime, etc) to avoid needing to actually enter the kernel and cause a context switch from user land to kernel land. The result of this method is that certain system calls, like those listed, can be used by programs at much, much lower cost.

What would happen if every time you called one of these fast vDSO system calls (like time) you also called a normal system call like, say stat, which does not pass through the vDSO?

If you did that, you’d essentially be negating some of the performance improvement you were meant to gain by the vDSO optimization in the first place; you’d be making a slow system call very often.

It turns out that this situation happens rather often with a pair of functions that are commonly used together:

  • time: A vDSO-enabled system call used to obtain the number of seconds since the epoch, and
  • localtime: A glibc provided function which converts the output of time to a local time in the user’s timezone. This is not a system call, but internally localtime can make a system call in some cases.

The time vDSO-enabled system call and the localtime function from glibc are often used together in applications either directly by the programmer, or at a lower level unbeknownst to the programmer for formatting dates and times for everything from log messages to SQL queries. This pair is commonly used in Ruby on Rails.

It turns out that the localtime function in glibc will check if the TZ environment variable is set. If it is not set (the two Ubuntus I’ve tested do not set it), then glibc will use the stat system call every time localtime is called.

In other words: your system supports calling the time system call via the Linux kernel’s vDSO to avoid the cost of switching to the kernel. But, as soon as your program calls time, it calls localtime immediately after, which invokes a system call anyway. So, you’ve eliminated one system call with the vDSO, but replaced it with another.

Let’s see a sample program that shows this behavior, how to use strace to detect this, and finally how to prevent it by setting the TZ environment variable.

Sample program showing the issue

Let’s start by creating a simple test program that reproduces this issue:

#include <time.h>
#include <stdio.h>

int
main(int argc, char *argv[]) {
  int i = 0;
  time_t timep;

  printf("Greetings!\n");

  for(i=0; i<10; i++) {
    time(&timep);
    localtime(&timep);
  }

  printf("Godspeed, dear friend!\n");
  return 0;
}

You can compile this program by simply running gcc -o test test.c. As you can see, this program simply calls time and localtime in a loop 10 times.

Verifying this with strace

Every single call to the localtime glibc function will generate a system call to stat. Don’t believe me? Let’s use strace to prove it using the program shown above:

$ gcc -o test test.c
$ strace -ttT ./test
... truncated output ...
23:55:16.957457 write(1, "Greetings!\n", 11) = 11 <0.000033>
23:55:16.957560 open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 3 <0.000013>
23:55:16.957650 fstat(3, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000007>
23:55:16.957723 fstat(3, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000006>
23:55:16.957797 read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 127 <0.000011>
23:55:16.957904 lseek(3, -71, SEEK_CUR) = 56 <0.000008>
23:55:16.957975 read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 71 <0.000008>
23:55:16.958048 close(3)                = 0 <0.000009>
23:55:16.958130 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009>
23:55:16.958215 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009>
23:55:16.958296 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000010>
23:55:16.958375 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009>
23:55:16.958454 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009>
23:55:16.958533 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009>
23:55:16.958611 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009>
23:55:16.958690 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009>
23:55:16.958769 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000009>
23:55:16.958848 write(1, "Godspeed, dear friend!\n", 23) = 23 <0.000020>

In the strace output above we see a few things:

  1. The /etc/localtime file is opened.
  2. fstat is called twice, followed by two reads to the file to pull in the timezone data.
  3. Next, 9 calls to stat are made passing /etc/localtime over and over.

Notice that the strace output does not show the call to time. This is expected. System calls made via the vDSO do not appear in strace output. To see them, you’d need to use ltrace instead.

What’s going on here is that the first call to localtime in glibc opens and reads the contents of /etc/localtime. All subsequent calls to localtime internally call stat, but they do this to ensure that the timezone file has not changed.

On many systems /etc/localtime is a symlink to a timezone file. It is conceivable that a program might be running when the /etc/localtime symlink is updated. If this were to happen, glibc would notice this when localtime is called and re-read the file before doing any time conversions.

Preventing extraneous system calls

Many production systems use the UTC timezone and don’t ever need (or want) to change that. For this use case, there’s no reason to stat the /etc/localtime file or symlink over and over and over when localtime is called. The timezone is never going to change from UTC (or if it does, the application can just be restarted).

The easiest way to prevent these stat calls is to set the TZ environment variable. When the TZ environment variable is set glibc will:

  1. Notice you’ve told it explicitly what timezone file to use for your program.
  2. Read the file and cache it internally.
  3. Never stat or read that file path again, as long as the value of the TZ environment variable is left unchanged.

Let’s set the TZ variable and check with strace:

$ TZ=:/etc/localtime strace -ttT ./test
00:14:53.274564 write(1, "Greetings!\n", 11) = 11 <0.000051>
00:14:53.274721 open("/etc/localtime", O_RDONLY|O_CLOEXEC) = 3 <0.000038>
00:14:53.274873 fstat(3, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000026>
00:14:53.275001 fstat(3, {st_mode=S_IFREG|0644, st_size=127, ...}) = 0 <0.000026>
00:14:53.275129 read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 127 <0.000028>
00:14:53.275257 lseek(3, -71, SEEK_CUR) = 56 <0.000027>
00:14:53.275379 read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 71 <0.000033>
00:14:53.275512 close(3)                = 0 <0.000029>
00:14:53.275657 write(1, "Godspeed, dear friend!\n", 23) = 23 <0.000048>

As you can see, simply setting TZ causes glibc to read /etc/localtime a single time and never again. The same program runs with 9 fewer system calls all because of a single environment variable.

Effect on production systems

The effect of setting TZ on a production system will depend mostly on how often localtime is called. This is application specific and can vary with request load. All that said, if you aren’t changing your timezone often (or ever), it may be worth simply eliminating these unnecessary calls even if your system isn’t making many of them.

On my test system with a real life app (not the example shown above):

  • Without setting TZ during normal operations yields approximately: 14,925 calls to stat over a 30 second period (or roughly 497 stats per second).
  • With TZ set during the same time period results in 8 calls to stat over a 30 second period.

So, I’ve eliminated on the order of 10,000 extra system calls (and their associated context switches) without changing anything other than an environment variable. Pretty cool.

Conclusion

Using strace and questioning why patterns of system calls emerge from your infrastructure can help you better understand exactly what your systems are doing and why. You may even be able to remove unncessary system calls and save some system resources, too.

 

 

source: https://blog.packagecloud.io/eng/2017/02/21/set-environment-variable-save-thousands-of-system-calls/

Attachments
There are no attachments for this article.
Related Articles RSS Feed
Linux Kernel /etc/sysctl.conf Security Hardening
Viewed 1769 times since Fri, Aug 3, 2018
10 Linux DIG Command Examples for DNS Lookup
Viewed 395 times since Sun, Sep 30, 2018
Linux An introduction to swap space on Linux systems
Viewed 320 times since Thu, Jan 23, 2020
LVM: Reduce root PV/VG
Viewed 797 times since Sat, Jun 2, 2018
LVM: Remove an existing Volume Group
Viewed 655 times since Sat, Jun 2, 2018
SYS: Configure a local repository. local repo
Viewed 616 times since Mon, Oct 29, 2018
SSH: Execute Remote Command or Script – Linux
Viewed 394 times since Mon, Feb 18, 2019
HowTo: Send Email from an SMTP Server using the Command Line
Viewed 420 times since Mon, Feb 18, 2019
RHEL: iSCSI target/initiator configuration on RHEL7
Viewed 1293 times since Sat, Jun 2, 2018
RHCS6: Reduce a Global Filesystem 2 (GFS2)
Viewed 589 times since Sun, Jun 3, 2018