Tips to Solve Linux & Unix Systems Hard Disk Problems

Want to diagnose corrupt disk issues on a server? Want to find out why you are getting “disk full” messages on screen? Want to learn how to solve full, corrupt, and failed disk issues? Try these eight tips to diagnose hard disk drive problems on a Linux or Unix server.

#1 – Error: No space left on device

When a disk is full on a Unix-like system, you get this error message on screen. For example, running the fallocate command on a system that has run out of disk space fails with it.

The first step is to run the df command to find out information about total space and available space on a file system including partitions:
$ df
OR try human readable output format:
$ df -h
Sample outputs:

From the df command output it is clear that /dev/sda10 has 4.0 GB of total space, all 4.0 GB of which is used.
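To watch for this condition from a script, the Use% column can be read out of df. The following is a minimal sketch, assuming a POSIX shell and awk; the function name fs_usage_pct and the 90 percent threshold in the usage comment are illustrative choices, not something the steps above prescribe.

```shell
# Print the "Use%" figure (without the % sign) for the file system
# that holds the given path. -P forces the one-line-per-file-system
# POSIX layout, so the data is always on line 2.
fs_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Example (threshold of 90 is an arbitrary choice):
#   [ "$(fs_usage_pct /ftpusers)" -ge 90 ] && echo "/ftpusers is nearly full"
```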

Fixing the problem when the disk is full

  1. Compress uncompressed log and other files using the gzip, bzip2, or tar command.
  2. Delete unwanted files using the rm command on a Unix-like system.
  3. Move files to another system or an external hard disk using the rsync command.
  4. Find out the largest directories or files eating disk space on a Unix-like system.
  5. Truncate a particular file. This is useful for log files.
  6. Find large files that are open but have been deleted on Linux or Unix, then remove or truncate them.
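The six fixes can be tried safely before touching real data. The sketch below runs them inside a throwaway directory created with mktemp; every file name in it is made up for the demo, backup.example.com is a placeholder host, and the rsync and lsof steps are left as comments because they need a remote machine or root access.

```shell
#!/bin/sh
# Walk through the six fixes inside a scratch directory so nothing
# real is touched. All file names here are invented for the demo.
work=$(mktemp -d)
mkdir "$work/archive"
printf 'old log line\n'  > "$work/app.log.1"
printf 'live log line\n' > "$work/app.log"
printf 'scratch data\n'  > "$work/old.tmp"

# 1. Compress a rotated log (gzip replaces it with app.log.1.gz).
gzip "$work/app.log.1"

# 2. Delete a file that is no longer needed.
rm -- "$work/old.tmp"

# 3. Move files to another machine, deleting each local copy once sent.
#    Commented out because backup.example.com is a placeholder host.
# rsync -a --remove-source-files "$work/archive/" user@backup.example.com:/backup/archive/

# 4. List the five biggest files and directories (sizes in KiB).
du -a -k "$work" | sort -rn | head -n 5

# 5. Truncate a log in place without removing the file itself.
: > "$work/app.log"

# 6. List files that are deleted but still held open (link count < 1).
#    Needs lsof, and usually root to see every process.
# lsof +L1
#    On Linux, such a file can then be emptied through /proc, where PID
#    and FD stand for the values shown in the lsof output:
# : > /proc/PID/fd/FD
```

Remove the scratch directory with rm -rf "$work" when you are done experimenting.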

#2 – Is the file system in read-only mode?

You may get an error like the following when you try to create or save a file:
$ cat > file
-bash: file: Read-only file system

Run mount command to find out if the file system is mounted in read-only mode:
$ mount
$ mount | grep '/ftpusers'
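The same check can be scripted. This is a minimal sketch, assuming a Linux system where /proc/mounts is available; the function name is_readonly and its optional second argument are hypothetical conveniences added here for illustration.

```shell
# is_readonly MOUNTPOINT [MOUNTS_FILE]
# Succeeds (exit 0) when the mount point's option list contains "ro".
# Reads /proc/mounts by default (Linux only); the optional second
# argument lets the function be tried on a saved copy of that file.
# If a mount point appears more than once, the last entry wins.
is_readonly() {
    awk -v mp="$1" '
        $2 == mp {
            ro = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") ro = 1
        }
        END { exit ro ? 0 : 1 }
    ' "${2:-/proc/mounts}"
}

# Example: report the state of the /ftpusers mount point.
if is_readonly /ftpusers; then
    echo "/ftpusers is mounted read-only"
fi
```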

To fix this problem, remount the file system in read-write mode on a Linux-based system:
# mount -o remount,rw /ftpusers/tmp
Another example, from my FreeBSD 9.x server, remounts / in rw mode (the -u flag updates a file system that is already mounted):
# mount -u -o rw /dev/ad0s1a /

#3 – Am I running out of inodes?

Sometimes the df command reports that there is enough free space, but the system claims the file system is full. In that case, check the inodes, the structures that identify each file and its attributes on a file system, using the following command:
$ df -i
$ df -i /ftpusers/

Sample outputs:

So /ftpusers has 6,250,496 total inodes but only 11,568 are used. You are free to create another 6,238,928 files on the /ftpusers partition. If 100% of your inodes are used, try the following options:

  • Find unwanted files and delete them or move them to another server. Each file uses one inode regardless of its size, so directories holding many small files matter most.
  • Find unwanted large files and delete them or move them to another server.
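Inode exhaustion is easy to miss, so it can help to flag it automatically. The sketch below filters df -i output for file systems at or above a threshold. It assumes the usual six-column layout (Filesystem, Inodes, IUsed, IFree, IUse%, Mounted on); the function name inode_alert and the default limit of 90 are illustrative.

```shell
# Read `df -i` output on stdin and print every mount point whose
# inode usage is at or above the limit (default 90 percent).
# Rows whose IUse% is not a number (shown as "-") are skipped.
inode_alert() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            if (pct ~ /^[0-9]+$/ && pct + 0 >= limit)
                print $6, pct "%"
        }
    '
}

# Example:
#   df -i | inode_alert 90
```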

#4 – Is my hard drive dying?

I/O errors in a log file (such as /var/log/messages) indicate that something is wrong with the hard disk and that it may be failing. You can check the hard disk for errors using the smartctl command, a control and monitoring utility for SMART disks under Linux and UNIX-like operating systems.
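As a rough sketch of how smartctl is typically used: -H asks for the overall health verdict and -a prints everything the drive reports. The device name /dev/sda is an assumption, and smart_verdict is a hypothetical helper that only pulls the verdict line out of the -H report (the line wording it looks for is what smartctl prints for ATA and SCSI drives, as far as I know).

```shell
# Print the overall verdict from `smartctl -H` output read on stdin.
# ATA drives report "... self-assessment test result: PASSED",
# SCSI drives report "SMART Health Status: OK".
# Exits non-zero when no verdict line is found.
smart_verdict() {
    awk -F': *' '
        /overall-health self-assessment test result|SMART Health Status/ {
            print $2
            found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# Typical use, as root (/dev/sda is an assumed device name):
#   smartctl -H /dev/sda | smart_verdict
#   smartctl -a /dev/sda     # full attribute table and error log
```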

You can also use “Disk Utility” to get the same information.

#5 – Are my hard drive and server too hot?

High temperatures can cause a server to function poorly, and can result in a server shutdown or damage to the file system and disk, so you need to keep the server and its disks at a proper temperature. Use the hddtemp or smartctl utility to find out the temperature of your hard disk on a Linux or Unix-based system. Both read the data from S.M.A.R.T. on drives that support this feature; only modern hard drives have a temperature sensor. hddtemp supports reading S.M.A.R.T. information from SCSI drives too, and it can work as a simple command-line tool or as a daemon to get information from all servers.

The smartctl command can report the drive temperature too.
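A minimal sketch of both approaches follows. It assumes an ATA drive at /dev/sda that exposes a Temperature_Celsius attribute; drive_temp is a hypothetical helper that extracts the raw value from the attribute table printed by smartctl -A.

```shell
# Print the raw Temperature_Celsius value from `smartctl -A` output
# read on stdin. The raw value is the tenth column of the ATA
# attribute table; drives that lack the attribute print nothing.
drive_temp() {
    awk '$2 == "Temperature_Celsius" { print $10 }'
}

# Typical use, as root (/dev/sda is an assumed device name):
#   smartctl -A /dev/sda | drive_temp
#   hddtemp /dev/sda
```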

 

How do I get the CPU temperature?

You can use a Linux hardware monitoring tool such as lm_sensors to get the CPU temperature on a Linux-based system.
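lm_sensors provides the sensors command (after a one-time sensors-detect run as root). The sketch below trims its report down to per-core readings. It assumes the chip driver labels its lines "Core N:", as Intel's coretemp driver does; other drivers use different labels, and cpu_temps is a hypothetical helper name.

```shell
# Read `sensors` output on stdin and print one "Core N: TEMP" line per
# CPU core, with the temperature reduced to a bare number.
# Only lines labelled "Core N:" are recognised.
cpu_temps() {
    awk '/^Core [0-9]+:/ { gsub(/[^-0-9.]/, "", $3); print $1, $2, $3 }'
}

# Typical use:
#   sensors | cpu_temps
```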

 

#6 – Dealing with corrupted file systems

A file system on a server may get corrupted due to a hard reboot or some other error such as bad blocks. You can repair corrupted file systems with the fsck command.
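Running fsck on a mounted file system can make the damage worse, so a small guard is worth having. The following is a sketch under stated assumptions: Linux with /proc/mounts, a device name such as /dev/sdb1 chosen purely for illustration, and a hypothetical wrapper name fsck_if_unmounted.

```shell
# fsck_if_unmounted DEVICE [MOUNTS_FILE]
# Refuse to check a device that appears in the mount table; otherwise
# run fsck and answer "yes" to every repair prompt (-y).
fsck_if_unmounted() {
    dev=$1
    mounts=${2:-/proc/mounts}
    if awk -v d="$dev" '$1 == d { found = 1 } END { exit found ? 0 : 1 }' "$mounts"; then
        echo "refusing: $dev is mounted; unmount it first" >&2
        return 1
    fi
    fsck -y "$dev"
}

# Typical use, as root (device name is an assumption):
#   umount /dev/sdb1
#   fsck_if_unmounted /dev/sdb1
```

The root file system cannot be unmounted while the server is running, so it is usually checked from single-user mode or a rescue environment instead.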

See “how to survive a Linux filesystem failure” for more info.

#7 – Dealing with software RAID on Linux

To find the current status of a Linux software RAID array, type the following command:
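On Linux the kernel reports software RAID state in /proc/mdstat, and mdadm --detail gives a per-array view. The sketch below adds a small filter for degraded arrays; md_degraded is a hypothetical helper, /dev/md0 is an assumed array name, and only arrays named mdN are recognised.

```shell
# Print the name of every md array whose status map shows a missing
# member, e.g. [U_] instead of [UU]. Reads mdstat-format text on stdin.
md_degraded() {
    awk '
        /^md[0-9]+ :/     { name = $1 }
        /\[[U_]*_[U_]*\]/ { if (name != "") print name }
    '
}

# Typical use:
#   cat /proc/mdstat
#   md_degraded < /proc/mdstat
#   mdadm --detail /dev/md0
```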

When a hard drive fails, you need to replace it, and you must remove the correct failed drive. In this example, I’m going to replace /dev/sdb (the 2nd hard drive of a RAID 6 array). It is not necessary to take the storage offline to repair the RAID on Linux, but this only works if your server supports hot-swappable hard disks.
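Because pulling the wrong member can destroy the array, it helps to print the plan and read it before running anything. The sketch below only prints the usual mdadm sequence (mark failed, remove, re-add) and does not execute it. The array name /dev/md0 and the member /dev/sdb1 in the usage comment are assumptions; substitute the names your own /proc/mdstat shows.

```shell
# Print (do not run) the mdadm steps for swapping one member of an array.
# Usage: md_replace_plan ARRAY MEMBER
md_replace_plan() {
    array=$1
    member=$2
    cat <<EOF
mdadm --manage $array --fail $member
mdadm --manage $array --remove $member
# ... hot-swap the drive and partition the new one to match ...
mdadm --manage $array --add $member
cat /proc/mdstat
EOF
}

# Example (names are assumptions):
#   md_replace_plan /dev/md0 /dev/sdb1
```

Once the new member is added, the rebuild progress appears in /proc/mdstat.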

See our tips on increasing RAID sync speed on Linux for more information.

#8 – Dealing with hardware RAID

You can use the smartctl command or a vendor-specific command to find out the status of the RAID and the disks behind your controller.
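With smartctl, disks behind a controller are addressed through the -d option, for example -d megaraid,N for LSI MegaRAID or -d 3ware,N for 3ware cards. The sketch below loops over the first few disks of a MegaRAID controller. The controller type, the device /dev/sda, and the disk count are all assumptions; the SMARTCTL variable is a hypothetical hook that lets you preview the commands by setting it to echo.

```shell
# Health-check the first COUNT physical disks behind a MegaRAID controller.
# Usage: megaraid_health DEVICE COUNT
# Set SMARTCTL=echo to print the smartctl arguments instead of running it.
megaraid_health() {
    dev=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        ${SMARTCTL:-smartctl} -H -d "megaraid,$i" "$dev"
        i=$((i + 1))
    done
}

# Typical use, as root (device and count are assumptions):
#   megaraid_health /dev/sda 4
```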

See your vendor-specific documentation to replace a failed disk.
