In a previous post I explained how to count the processes that are checked against the nproc limit when fork() or clone() is called. There is another limit in /etc/security/limits.conf – or in /etc/security/limits.d – that is displayed by ‘ulimit -n’. It’s the number of open files – ‘nofile’ – and here again we need to know what kind of files are counted.
nofile
‘nofile’ is another limit that may not be easy to monitor, because if you just count the lines of the ‘lsof’ output you will include a lot of lines which are not file descriptors. So how can we count the number of file descriptors in a process?
lsof
‘lsof’ is a utility that shows all the open files. Let’s take an example:
I get the pid of my pmon process:
[oracle@VM211 ulimit]$ ps -edf | grep pmon
oracle 10586 1 0 19:21 ? 00:00:02 ora_pmon_DEMO
oracle 15494 15290 0 22:12 pts/1 00:00:00 grep pmon
And I list the open files for that process
[oracle@VM211 ulimit]$ lsof -p 10586
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NAME
ora_pmon_ 10586 oracle cwd DIR 252,0 4096 /app/oracle/product/12.1/dbs
ora_pmon_ 10586 oracle rtd DIR 252,0 4096 /
ora_pmon_ 10586 oracle txt REG 252,0 322308753 /app/oracle/product/12.1/bin/oracle
ora_pmon_ 10586 oracle mem REG 0,17 4194304 /dev/shm/ora_DEMO_150175744_0
ora_pmon_ 10586 oracle mem REG 0,17 4194304 /dev/shm/ora_DEMO_150208513_0
ora_pmon_ 10586 oracle mem REG 0,17 4194304 /dev/shm/ora_DEMO_150208513_1
ora_pmon_ 10586 oracle mem REG 0,17 4194304 /dev/shm/ora_DEMO_150208513_2
ora_pmon_ 10586 oracle mem REG 0,17 4194304 /dev/shm/ora_DEMO_150208513_3
ora_pmon_ 10586 oracle mem REG 0,17 4194304 /dev/shm/ora_DEMO_150208513_4
ora_pmon_ 10586 oracle mem REG 0,17 4194304 /dev/shm/ora_DEMO_150208513_5
...
ora_pmon_ 10586 oracle mem REG 252,0 1135194 /app/oracle/product/12.1/lib/libskgxp12.so
ora_pmon_ 10586 oracle mem REG 252,0 6776936 /app/oracle/product/12.1/lib/libcell12.so
ora_pmon_ 10586 oracle mem REG 252,0 14597 /app/oracle/product/12.1/lib/libodmd12.so
ora_pmon_ 10586 oracle 0r CHR 1,3 0t0 /dev/null
ora_pmon_ 10586 oracle 1w CHR 1,3 0t0 /dev/null
ora_pmon_ 10586 oracle 2w CHR 1,3 0t0 /dev/null
ora_pmon_ 10586 oracle 3r CHR 1,3 0t0 /dev/null
ora_pmon_ 10586 oracle 4r REG 252,0 1233408 /app/oracle/product/12.1/rdbms/mesg/oraus.msb
ora_pmon_ 10586 oracle 5r DIR 0,3 0 /proc/10586/fd
ora_pmon_ 10586 oracle 6u REG 252,0 1544 /app/oracle/product/12.1/dbs/hc_DEMO.dat
ora_pmon_ 10586 oracle 7u REG 252,0 24 /app/oracle/product/12.1/dbs/lkDEMO_SITE1
ora_pmon_ 10586 oracle 8r REG 252,0 1233408 /app/oracle/product/12.1/rdbms/mesg/oraus.msb
I’ve removed hundreds of lines with FD=mem and size=4M. I’m in AMM with memory_target=800M and the SGA is implemented as granules in /dev/shm. With lsof, we see all of them. And with a large memory_target we can have thousands of them (even though the granule size becomes 16M when memory_target is larger than 1GB). But don’t worry, they don’t count towards the ‘nofile’ limit. Only ‘real’ file descriptors are counted – those with a numeric FD.
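To make the "numeric FD only" rule concrete, here is a minimal sketch that counts the descriptors of a single process from lsof-style output. The function name count_numeric_fds is my own, and the sketch assumes that the FD is the fourth column, as in the listing above (some lsof versions add a TID column when they list threads, which would shift it).

```shell
# Count only the lsof lines whose FD column (4th field) starts with a digit.
# Reads lsof-style output on stdin, e.g.: lsof -p <pid> | count_numeric_fds
count_numeric_fds() {
  awk '$4 ~ /^[0-9]+/ { n++ } END { print n + 0 }'
}

# A few lines in the format of the pmon listing:
# cwd and mem are not descriptors, 0r / 1w / 4r are.
sample='ora_pmon_ 10586 oracle cwd DIR 252,0 4096 /app/oracle/product/12.1/dbs
ora_pmon_ 10586 oracle mem REG 0,17 4194304 /dev/shm/ora_DEMO_150175744_0
ora_pmon_ 10586 oracle 0r CHR 1,3 0t0 /dev/null
ora_pmon_ 10586 oracle 1w CHR 1,3 0t0 /dev/null
ora_pmon_ 10586 oracle 4r REG 252,0 1233408 /app/oracle/product/12.1/rdbms/mesg/oraus.msb'

printf '%s\n' "$sample" | count_numeric_fds
```

On this sample it prints 3: the cwd and mem lines are ignored.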
So, if you want to know the processes that are near the limit, you can use the following:
[oracle@VM211 ulimit]$ lsof | awk '$4 ~ /[0-9]+[rwu -].*/{p[$1"\t"$2"\t"$3]=p[$1"\t"$2"\t"$3]+1}END{for (i in p) print p[i],i}' | sort -n | tail
15 ora_dmon_ 10634 oracle
16 ora_dbw0_ 10608 oracle
16 ora_mmon_ 10626 oracle
16 ora_rsm0_ 10722 oracle
16 tnslsnr 9785 oracle
17 automount 1482 root
17 dbus-daem 1363 dbus
20 rpc.mount 1525 root
21 ora_lgwr_ 10610 oracle
89 master 1811 root
The idea is to filter the output of lsof and use awk to keep only the numeric file descriptors, and aggregate per process. Then, we sort them and show the highest counts. Here the Postfix master process has 89 files open, followed by the log writer with 21.
You can get the same information from the /proc filesystem, where file handles are in /proc/<pid>/fd:
for p in /proc/[0-9]* ; do echo $(ls $p/fd | wc -l) $(cat $p/cmdline) ; done | sort -n | tail
15 ora_dmon_DEMO
16 ora_dbw0_DEMO
16 ora_mmon_DEMO
16 ora_rsm0_DEMO
16 /app/oracle/product/12.1/bin/tnslsnrLISTENER-inherit
17 automount--pid-file/var/run/autofs.pid
17 dbus-daemon--system
20 rpc.mountd
21 ora_lgwr_DEMO
89 /usr/libexec/postfix/master
Same result, much quicker, and with more information about the process. This is the way I prefer, but remember that if you want to see all processes, you should be logged in as root.
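Since the goal is to find processes that are near their limit, it can help to print the limit next to the count. The following is only a sketch under a few assumptions: the function name fd_usage_report is mine, the soft limit is read from the ‘Max open files’ line of /proc/<pid>/limits, and the NUL separators of /proc/<pid>/cmdline are turned into spaces so that the arguments stay readable.

```shell
# Print "<open fds> <soft nofile limit> <pid> <command line>" for every
# process whose /proc/<pid>/fd directory we are allowed to read.
fd_usage_report() {
  local p n soft cmd
  for p in /proc/[0-9]*; do
    # Skip processes owned by someone else (unless root) or already gone.
    [ -r "$p/fd" ] || continue
    n=$(ls "$p/fd" 2>/dev/null | wc -l)
    # The limits file has a line: "Max open files  <soft>  <hard>  files"
    soft=$(awk '/^Max open files/ { print $4 }' "$p/limits" 2>/dev/null)
    # cmdline is NUL-separated: replace the NULs with spaces.
    cmd=$(tr '\0' ' ' 2>/dev/null < "$p/cmdline")
    echo "$n ${soft:-?} ${p#/proc/} $cmd"
  done
}

# Highest descriptor counts last, as in the one-liner above.
fd_usage_report | sort -n | tail
```

Comparing the first two columns shows directly how far each process is from its own soft limit, which may differ from the limit of the shell you run the check from.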
The proof
As I did for nproc, I have written a small C program that opens files (passed as arguments) for a few seconds, so that I’m sure I’m counting the right things.
And I encourage you to do the same on a test system and let me know if your results differ. Here is the source: openfiles.zip
First, I set my nofile limit to only 10
ulimit -n 10
Then, let’s open 7 files. In addition to stdin, stdout and stderr we will have 10 file handles:
[oracle@VM211 ulimit]$ ./openfiles myfile1.tmp myfile2.tmp myfile3.tmp myfile4.tmp myfile5.tmp myfile6.tmp myfile7.tmp &
open file 1 of 7 getrlimit nofile: soft= 10 hard= 10 myfile1.tmp
open file 2 of 7 getrlimit nofile: soft= 10 hard= 10 myfile2.tmp
open file 3 of 7 getrlimit nofile: soft= 10 hard= 10 myfile3.tmp
open file 4 of 7 getrlimit nofile: soft= 10 hard= 10 myfile4.tmp
open file 5 of 7 getrlimit nofile: soft= 10 hard= 10 myfile5.tmp
open file 6 of 7 getrlimit nofile: soft= 10 hard= 10 myfile6.tmp
open file 7 of 7 getrlimit nofile: soft= 10 hard= 10 myfile7.tmp
I was able to open those 7 files. Then I check lsof:
[oracle@VM211 ulimit]$ lsof | grep openfiles
openfiles 21853 oracle cwd DIR 0,24 380928 9320 /tmp/ulimit
openfiles 21853 oracle rtd DIR 252,0 4096 2 /
openfiles 21853 oracle txt REG 0,24 7630 9494 /tmp/ulimit/openfiles
openfiles 21853 oracle mem REG 252,0 156928 1579400 /lib64/ld-2.12.so
openfiles 21853 oracle mem REG 252,0 1926800 1579401 /lib64/libc-2.12.so
openfiles 21853 oracle 0u CHR 136,1 0t0 4 /dev/pts/1
openfiles 21853 oracle 1u CHR 136,1 0t0 4 /dev/pts/1
openfiles 21853 oracle 2u CHR 136,1 0t0 4 /dev/pts/1
openfiles 21853 oracle 3r REG 0,24 0 9487 /tmp/myfile1.tmp
openfiles 21853 oracle 4r REG 0,24 0 9488 /tmp/myfile2.tmp
openfiles 21853 oracle 5r REG 0,24 0 9489 /tmp/myfile3.tmp
openfiles 21853 oracle 6r REG 0,24 0 9490 /tmp/myfile4.tmp
openfiles 21853 oracle 7r REG 0,24 0 9491 /tmp/myfile5.tmp
openfiles 21853 oracle 8r REG 0,24 0 9492 /tmp/myfile6.tmp
openfiles 21853 oracle 9r REG 0,24 0 9493 /tmp/myfile7.tmp
We see our 10 file handles, and this proves that only numeric FDs are counted when checking the nofile limit of 10. You see stdin, stdout and stderr as FD 0, 1 and 2, and then my 7 files opened read-only.
Let’s try to open one more file:
[oracle@VM211 ulimit]$ ./openfiles myfile1.tmp myfile2.tmp myfile3.tmp myfile4.tmp myfile5.tmp myfile6.tmp myfile7.tmp myfile8.tmp
open file 1 of 8 getrlimit nofile: soft= 10 hard= 10 myfile1.tmp
open file 2 of 8 getrlimit nofile: soft= 10 hard= 10 myfile2.tmp
open file 3 of 8 getrlimit nofile: soft= 10 hard= 10 myfile3.tmp
open file 4 of 8 getrlimit nofile: soft= 10 hard= 10 myfile4.tmp
open file 5 of 8 getrlimit nofile: soft= 10 hard= 10 myfile5.tmp
open file 6 of 8 getrlimit nofile: soft= 10 hard= 10 myfile6.tmp
open file 7 of 8 getrlimit nofile: soft= 10 hard= 10 myfile7.tmp
open file 8 of 8 getrlimit nofile: soft= 10 hard= 10 myfile8.tmp
fopen() number 8 failed with errno= 24
Here the limit is reached and the fopen() call fails with errno 24 (EMFILE, ‘Too many open files’) because we reached nofile=10.
Threads
When counting the processes for the nproc limit, we have seen that threads must be counted as processes. For the nofile limit we don’t need to detail the threads because all threads share the file descriptor table.
Recommended values
Currently this is what is set on Oracle Linux 6 for 11gR2 (in /etc/security/limits.conf):
oracle soft nofile 1024
oracle hard nofile 65536
For 12c, these are set in /etc/security/limits.d/oracle-rdbms-server-12cR1-preinstall.conf which overrides /etc/security/limits.conf:
oracle soft nofile 1024
oracle hard nofile 65536
Do you think it’s a bit low? Just for information, here is what is set in the ODA X4-2:
oracle soft nofile 131072
In any case, it is a good idea to check if you are reaching the limit, and the above scripts on lsof or /proc should help with that.
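One more point about the soft and hard values: the soft limit is the one that is enforced, but a process is allowed to raise its own soft limit up to the hard limit without any privilege. With soft=1024 and hard=65536, a session that needs more descriptors can therefore go up to 65536 by itself. Here is a small sketch of that; the function name raise_soft_to_hard is mine, and it works in a subshell so that the calling shell keeps its setting.

```shell
# Show the soft and hard nofile limits of the current shell.
echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"

# Raise the soft limit to the hard limit inside a subshell and print the
# resulting soft limit. The calling shell is left untouched.
raise_soft_to_hard() {
  ( ulimit -Sn "$(ulimit -Hn)" && ulimit -Sn )
}

raise_soft_to_hard
```

The value printed by the last command should be the same as ‘ulimit -Hn’. Going above the hard limit is another story: that requires root.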