By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
Protecting data includes creating and managing backups. A backup, often called an archive, is a copy of data that can be restored sometime in the future should the data be destroyed or become corrupted. Backing up your data is a critical activity, but even more important is planning your backups. These plans include choosing backup types, determining the right compression methods to employ, and identifying which utilities will serve your organization's data needs best. You may also need to transfer your backup files over the network. In this case, ensuring that the archive is secure during transit is critical, as is validating its integrity once it arrives at its destination. All of these topics concerning protecting your data files are covered in this guide.

Key Topics:
  Understanding Backup Types
  Looking at Compression Methods
  Comparing Archive and Restore Utilities
  Securing Offsite/Off-System Backups
  Checking Backup Integrity

Understanding Backup Types

There are different classifications for data backups. Understanding these various categories is vital for developing your backup plan. The following backup types are the most common:
  System image
  Full
  Incremental
  Differential
  Snapshot
  Snapshot clone
Each of these backup types is explored in this section. Their advantages and disadvantages are included.
System Image- A system image is a copy of the operating system binaries, configuration files, and anything else you need to boot the Linux system. Its purpose is to quickly restore your system to a bootable state. Sometimes called a clone, these backups are not normally used to recover individual files or directories, and in the case of some backup utilities, you cannot do so.

Full- A full backup is a copy of all the data, regardless of its modification date. This backup type's primary advantage is that it takes a lot less time than other types to restore a system's data. However, not only does it take longer to create a full backup compared to the other types, it also requires more storage. It needs no other backup types to restore a system fully.

Incremental- An incremental backup only makes a copy of data that has been modified since the last backup operation (of any type). Typically, a file's modified timestamp is compared to the timestamp of the last backup. It takes a lot less time to create this backup type than the other types, and it requires a lot less storage space. However, the data restoration time for this backup type can be significant. Imagine that you performed a full backup on Monday and incremental backups on Tuesday through Friday. On Saturday the disk crashes and must be replaced. After the disk is replaced, you will have to restore the data using Monday's backup and then continue to restore data, in order, using the incremental backups created on Tuesday through Friday. This is very time-consuming and will cause significant delays in getting your system back in operation. Therefore, to keep restore times reasonable, a full backup should still be completed periodically.

Differential- A differential backup makes a copy of all data that has changed since the last full backup. It could be considered a good balance between full and incremental backups.
This backup type takes less time to create than a full backup but potentially more time than an incremental backup. It requires less storage space than a full backup but more space than a plain incremental backup. Also, it takes a lot less time to restore using differential backups than incremental backups, because only the full backup and the latest differential backup are needed. As with incremental backups, a full backup should still be completed periodically.

Snapshot- A snapshot backup is considered a hybrid approach, and it is a slightly different flavor of backup. First a full (typically read-only) copy of the data is made to backup media. Then pointers, such as hard links, are employed to create a reference table linking the backup data with the original data. The next time a backup is made, instead of a full backup, an incremental backup is made (only modified or new files are copied to the backup media), and the pointer reference table is copied and updated. This saves space because only modified files and the updated pointer reference table need to be stored for each additional backup.

Note: Another snapshot flavor is the split-mirror snapshot, where the data is kept on a mirrored storage device. When a backup is run, a copy of all the data is created, not just new or modified data.
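The hard-link idea behind a snapshot backup can be tried on a scratch directory. The following is a minimal sketch, not how any particular backup product implements it: it assumes GNU cp (for the -l and --remove-destination options), and the directory names data, snap.0, and snap.1 are made up for illustration.

```shell
set -eu

work=$(mktemp -d)
mkdir "$work/data"
echo "one" > "$work/data/a.txt"
echo "two" > "$work/data/b.txt"

# First backup: a full copy of the data.
cp -a "$work/data" "$work/snap.0"

# A file changes after the first backup.
echo "changed" > "$work/data/b.txt"

# Second backup: hard-link every file from the previous snapshot,
# which copies no file data ...
cp -al "$work/snap.0" "$work/snap.1"
# ... then replace only the modified file. --remove-destination unlinks
# the old hard link first, so snap.0 keeps its original content.
cp -a --remove-destination "$work/data/b.txt" "$work/snap.1/b.txt"

# Unchanged files share one inode across snapshots; changed files do not.
ls -li "$work"/snap.*/
```

In the ls -li output, a.txt shows the same inode number in both snapshot directories, while b.txt shows two different ones. Each snapshot directory still looks like a complete copy of the data.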
With a snapshot backup, you can go back to any point in time and do a full system restore from that point. It also uses a lot less space than the other backup types. In essence, snapshots simulate multiple full backups per day without taking up the same space or requiring the same processing power as a full backup type would. The rsync utility (described later in this guide) uses this method.

Snapshot Clone- Another variation of a snapshot backup is a snapshot clone. Once a snapshot is created, such as an LVM snapshot, it is copied, or cloned. Snapshot clones are useful in high data I/O environments. When performing the cloning, you minimize any adverse performance impacts to production data I/O because the clone backup takes place on the snapshot and not on the original data.
While not all snapshots are writable, snapshot clones are typically modifiable. If you are using LVM, you can mount these snapshot clones on a different system. Thus, a snapshot clone is useful in disaster recovery scenarios.

Your particular server environment as well as data protection needs will dictate which backup method to employ. Most likely you need a combination of the preceding types to properly protect your data.

Looking at Compression Methods

Backing up data can potentially consume large amounts of additional disk or media space. Depending on the backup types you employ, you can reduce this consumption via data compression utilities.

The following popular utilities are available on Linux:
  gzip
  bzip2
  xz
  zip
The advantages and disadvantages of each of these data compression methods are explored in this section.
gzip- The gzip utility was developed in 1992 as a replacement for the old compress program. Using the Lempel-Ziv (LZ77) algorithm to achieve text-based file compression rates of 60–70 percent, gzip has long been a popular data compression utility. To compress a file, simply type gzip followed by the file's name. The original file is replaced by a compressed version with a .gz file extension. To reverse the operation, type gunzip followed by the compressed file's name.

bzip2- Developed in 1996, the bzip2 utility offers higher compression rates than gzip but takes slightly longer to perform the data compression. The bzip2 utility employs multiple layers of compression techniques and algorithms. Until 2013, this data compression utility was used to compress the Linux kernel for distribution. To compress a file, simply type bzip2 followed by the file's name. The original file is replaced by a compressed version with a .bz2 file extension. To reverse the operation, type bunzip2 followed by the compressed file's name, which decompresses (inflates) the data.

Note: Originally there was a bzip utility program. However, in its layered approach, a patented data compression algorithm was employed. Thus, bzip2 was created to replace it and uses the Huffman coding algorithm instead, which is patent free.

xz- Developed in 2009, the xz data compression utility quickly became very popular among Linux administrators. It boasts a higher default compression rate than bzip2 and gzip via the LZMA2 compression algorithm. However, with certain xz command options, you can employ the legacy LZMA compression algorithm, if needed or desired. In 2013, the xz compression utility replaced bzip2 for compressing the Linux kernel for distribution. To compress a file, simply type xz followed by the file's name. The original file is replaced by a compressed version with an .xz file extension. To reverse the operation, type unxz followed by the compressed file's name.
zip- The zip utility has the ability to operate on multiple files. If you have ever created a zip file on a Windows operating system, then you've used this file format. Multiple files are packed together in a single file, often called a folder or an archive file, and then compressed. Another difference from the other Linux compression utilities is that zip does not replace the original file(s). Instead, it places a copy of the file(s) into the archive file. To archive and compress files with zip, type zip followed by the final archive file's name, which traditionally ends in a .zip extension. After the archive file, type one or more files you desire to place into the compressed archive, separating them with a space. The original files remain intact, but a copy of them is placed into the compressed zip archive file. To reverse the operation, type unzip followed by the compressed archive file's name.

It's helpful to see a side-by-side comparison of the various compression utilities using their defaults.

List: Comparing the various Linux compression utilities

    # cp /var/log/wtmp wtmp
    # cp wtmp wtmp1
    # cp wtmp wtmp2
    # cp wtmp wtmp3
    # cp wtmp wtmp4
    # ls -lh wtmp?
    -rw-r--r--. 1 root root 210K Oct 9 19:54 wtmp1
    -rw-r--r--. 1 root root 210K Oct 9 19:54 wtmp2
    -rw-r--r--. 1 root root 210K Oct 9 19:54 wtmp3
    -rw-r--r--. 1 root root 210K Oct 9 19:54 wtmp4
    # gzip wtmp1
    # bzip2 wtmp2
    # xz wtmp3
    # zip wtmp4.zip wtmp4
      adding: wtmp4 (deflated 96%)
    #
    # ls -lh wtmp?.*
    -rw-r--r--. 1 root root 7.7K Oct 9 19:54 wtmp1.gz
    -rw-r--r--. 1 root root 6.2K Oct 9 19:54 wtmp2.bz2
    -rw-r--r--. 1 root root 5.2K Oct 9 19:54 wtmp3.xz
    -rw-r--r--. 1 root root 7.9K Oct 9 19:55 wtmp4.zip
    # ls wtmp?
    wtmp4
In the above List, first the /var/log/wtmp file is copied to the local directory using super user privileges. Four copies of this file are then made. Using the ls -lh command, you can see in human-readable format that the wtmp files are 210K in size. Next, the various compression utilities are employed. Notice that when using the zip command, you must give it the name of the archive file, wtmp4.zip, and follow it with any file names. In this case, only wtmp4 is put into the zip archive. After the files are compressed with the various utilities, another ls -lh command is issued. Notice the various file extension names as well as the files' compressed sizes. You can see that the xz program produces the highest compression of this file, because its file is the smallest in size. The final ls command shows that only wtmp4 remains, because zip is the only utility that leaves the original file in place.

Note: The gzip, bzip2, xz, and zip utilities let you trade speed for compression via the -# option. The # is a number from 1 to 9, where 1 is the fastest but lowest compression and 9 is the slowest but highest compression method. The level applies only when compressing; decompression needs no level. The gzip, xz, and zip utilities use -6 as the default compression level, while bzip2, where the number selects the block size, defaults to -9. The zip utility also accepts -0 to store files without compressing them. It is a good idea to review these level specifications in each utility's man page, since useful but subtle differences exist.

There are many compression methods. However, when you use a compression utility along with an archive and restore program for data backups, it is vital that you use a lossless compression method. A lossless compression is just as it sounds: no data is lost. The gzip, bzip2, xz, and zip utilities provide lossless compression. Obviously it is important not to lose data when doing backups.

Comparing Archive and Restore Utilities

There are several programs you can employ for managing backups. Some of the more popular products are Amanda, Bacula, Bareos, Duplicity, and BackupPC. Yet often these GUI and/or web-based programs have command-line utilities at their core.
Our focus here is on those command-line utilities:
  cpio
  dd
  tar

Copying with cpio

The cpio utility's name stands for “copy in and out.” It gathers together file copies and stores them in an archive file. The program has several useful options.

TABLE: The cpio command's commonly used options
To create an archive using the cpio utility, you have to generate a list of files and then pipe them into the command.

List: Employing cpio to create an archive

    $ ls Project4?.txt
    Project42.txt Project43.txt Project44.txt Project45.txt Project46.txt
    $ ls Project4?.txt | cpio -ov > Project4x.cpio
    Project42.txt
    Project43.txt
    Project44.txt
    Project45.txt
    Project46.txt
    59 blocks
    $ ls Project4?.*
    Project42.txt  Project44.txt  Project46.txt
    Project43.txt  Project45.txt  Project4x.cpio
Using the ? wildcard and the ls command, various text files within the present working directory are displayed first in the List above. This command is then used again, and its STDOUT is piped as STDIN to the cpio utility. (Read “Searching and Analyzing Text” if you need a refresher on STDOUT and STDIN.) The options used with the cpio command are -ov, which create an archive containing copies of the listed files and display each file's name as it is copied into the archive. The archive file used is named Project4x.cpio. Though not necessary, it is considered good form to use the .cpio extension on cpio archive files.

Note: The file list piped into the cpio utility can come from other commands as well. For example, suppose you want to create a cpio archive for any files within the virtual directory system owned by the JKirk user account. You can use the find / -user JKirk command and pipe it into the cpio utility in order to create the archive file. This is a handy feature.

You can view the files stored within a cpio archive fairly easily. Just employ the cpio command again, and use its -itv options and the -I option to designate the archive file.

List: Using cpio to list an archive's contents

    $ cpio -itvI Project4x.cpio
    -rw-r--r-- 1 Christin Christin 29900 Aug 19 17:37 Project42.txt
    -rw-rw-r-- 1 Christin Christin 0 Aug 19 18:07 Project43.txt
    -rw-rw-r-- 1 Christin Christin 0 Aug 19 18:07 Project44.txt
    -rw-rw-r-- 1 Christin Christin 0 Aug 19 18:07 Project45.txt
    -rw-rw-r-- 1 Christin Christin 0 Aug 19 18:07 Project46.txt
Though not displayed in the cpio Lists above, the cpio utility stores each file's path name exactly as it was piped in, so absolute directory references are preserved when the file list contains them. Thus, it is often used to create system image and full backups. To restore files from an archive, employ just the -ivI options. However, because cpio maintains the files' absolute paths, this can be tricky if you need to restore the files to another directory location. To do this, you need to use the --no-absolute-filenames option.

List: Using cpio to restore files to a different directory location

    $ ls -dF Projects
    Projects/
    $ mv Project4x.cpio Projects/
    $ cd Projects
    /home/Christine/Answers/Projects
    Project4x.cpio
    $ cpio -iv --no-absolute-filenames -I Project4x.cpio
In the above list, the Project4x.cpio archive file is moved into a preexisting subdirectory, Projects. By stripping the absolute path names from the archived files via the --no-absolute-filenames option, you restore the files to a new directory location. If you wanted to restore the files to their original location, simply leave that option off and just use the other cpio switches. Archiving with tar The tar utility's name stands for tape archiver, and it is popular for creating data backups. As with cpio, with the tar command, the selected files are copied and stored in a single file. This file is called a tar archive file. If this archive file is compressed using a data compression utility, the compressed archive file is called a tarball. The tar program has several useful options. TABLE: The tar command's commonly used tarball creation options
To create an archive using the tar utility, you have to add a few arguments to the options and the command.

List: Using tar to create an archive file

    $ tar -cvf Project4x.tar Project4?.txt
    Project43.txt

In the above List, three options are used. The -c option creates the tar archive. The -v option displays the filenames as they are placed into the archive file. Finally, the -f option designates the archive filename, which is Project4x.tar. Though not required, it is considered good form to use the .tar extension on tar archive files. The command's last argument designates the files to copy into this archive.
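The create step can be tried safely in a scratch directory. This is a minimal sketch under stated assumptions: the three sample files and their contents are invented for illustration, and only the -c, -f, and -t options already discussed in this guide are used.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# Create a few sample files to archive (contents are placeholders).
for n in 42 43 44; do
    echo "notes for project $n" > "Project$n.txt"
done

# -c creates the archive and -f names it; the shell expands the ? wildcard
# into the matching file names before tar runs.
tar -cf Project4x.tar Project4?.txt

# -t lists the archive's members without extracting them.
tar -tf Project4x.tar
```

Listing the archive right after creating it is a quick way to confirm that the wildcard matched the files you expected.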
Tip: You can also use old-style tar command options. For this style, you remove the single dash from the beginning of the tar option. For example, -c becomes c. Keep in mind that additional old-style tar command options must not have spaces between them. Thus, tar cvf is valid, but tar c v f is not.

If you are backing up lots of files or large amounts of data, it is a good idea to employ a compression utility. This is easily accomplished by adding an additional switch to your tar command options.

List: Using tar to create a tarball

    $ tar -zcvf Project4x.tar.gz Project4?.txt
    $ ls Project4x.tar.gz
    Project4x.tar.gz
Notice that the tarball filename has the .tar.gz file extension. It is considered good form to use the .tar extension and tack on an indicator showing the compression method that was used. However, you can shorten it to .tgz if desired.

There is a useful variation of this command to create both full and incremental backups. A simple example helps to explain this concept.

List: Using tar to create a full backup

    $ tar -g FullArchive.snar -Jcvf Project42.txz Project4?.txt
    $ ls FullArchive.snar Project42.txz
    FullArchive.snar Project42.txz
Notice the -g option. The -g option creates a file, called a snapshot file, FullArchive.snar. The .snar file extension indicates that the file is a tarball snapshot file. The snapshot file contains metadata used in association with tar commands for creating full and incremental backups. The snapshot file contains file timestamps, so the tar utility can determine if a file has been modified since it was last backed up. The snapshot file is also used to identify any files that are new or determine if files have been deleted since the last backup.

The previous example created a full backup of the designated files along with the metadata snapshot file, FullArchive.snar. Now the same snapshot file will be used to help determine if any files have been modified, are new, or have been deleted to create an incremental backup.

List: Using tar to create an incremental backup

    $ echo "Answer to everything" >> Project42.txt
    $ tar -g FullArchive.snar -Jcvf Project42_Inc.txz Project4?.txt
    $ ls Project42_Inc.txz
    Project42_Inc.txz
In the above List, the file Project42.txt is modified. Again, the tar command uses the -g option and points to the previously created FullArchive.snar snapshot file. This time, the metadata within FullArchive.snar shows the tar command that the Project42.txt file has been modified since the previous backup. Therefore, the new tarball only contains the Project42.txt file, and it is effectively an incremental backup. You can continue to create additional incremental backups using the same snapshot file as needed.

Note: The tar command views full and incremental backups in levels. A full backup is one that includes all the files indicated, and it is considered a level 0 backup. The first tar incremental backup after a full backup is considered a level 1 backup. The second tar incremental backup is considered a level 2 backup, and so on.

Whenever you create data backups, it is a good practice to verify them.

TABLE: The tar command's commonly used archive verification options
Backup verification can take several different forms. You might ensure that the desired files (sometimes called members) are included in your backup by using the -v option on the tar command in order to watch the files being listed as they are included in the archive file.

You can also verify that desired files are included in your backup after the fact. Use the -t option to list tarball or archive file contents.

List: Using tar to list a tarball's contents

    $ tar -tf Project4x.tar.gz

You can verify files within an archive file by comparing them against the current files. The option to accomplish this task is the -d option.

List: Using tar to compare tarball members to external files

    $ tar -df Project4x.tar.gz
    Project42.txt: Mod time differs
    Project42.txt: Size differs

Another good practice is to verify your backup automatically immediately after the tar archive is created. This is easily accomplished by tacking on the -W option.

List: Using tar to verify backed-up files automatically

    $ tar -Wcvf ProjectVerify.tar Project4?.txt
    Verify Project42.txt
    Verify Project43.txt
    Verify Project44.txt
    Verify Project45.txt
    Verify Project46.txt
You cannot use the -W option if you employ compression to create a tarball. However, you could create and verify the archive first and then compress it in a separate step. You can also use the -W option when you extract files from a tar archive. This is handy for instantly verifying files restored from archives. Be aware that several options used to create the backup, such as -g and -W, can also be used when restoring data. TABLE: The tar command's commonly used file restore options
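The remark that -g can also be used when restoring is easiest to see with a full backup followed by one incremental backup. The sketch below assumes GNU tar; the directory and file names (data, state.snar, full.tar, inc1.tar, restore) are hypothetical. Passing /dev/null as the snapshot file during extraction is the form the GNU tar manual describes for restoring incremental archives.

```shell
set -eu

work=$(mktemp -d)
cd "$work"
mkdir data
echo "v1" > data/a.txt
echo "v1" > data/b.txt

# Level 0 (full) backup; tar records file metadata in state.snar.
tar -g state.snar -cf full.tar data

# Wait so the changes below carry clearly newer timestamps.
sleep 1
echo "v2" > data/a.txt      # modify an existing file
echo "new" > data/c.txt     # add a new file

# Level 1 (incremental) backup: only a.txt and c.txt are stored.
tar -g state.snar -cf inc1.tar data
tar -tf inc1.tar

# Restore in order: the full backup first, then each incremental.
mkdir restore
tar -g /dev/null -xf full.tar -C restore
tar -g /dev/null -xf inc1.tar -C restore
cat restore/data/a.txt
```

The restore order matters. Extracting the incremental archive before the full one would let the older copy of a.txt overwrite the newer one.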
Extracting files from an archive or tarball is fairly simple using the tar utility.

List: Using tar to extract files from a tarball

    $ mkdir Extract
    $ mv Project4x.tar.gz Extract/
    $ cd Extract
    $ tar -zxvf Project4x.tar.gz
    $ ls
    Project43.txt Project45.txt Project4x.tar.gz
In the above List, a new subdirectory, Extract, is created. The tarball created earlier in the List “Using tar to create a tarball” is moved to the new subdirectory, and then the files are restored from the tarball. If you compare the tar command used in this listing to the one used to create the tarball, you'll notice that here the -x option was substituted for the -c option. Also notice that the tarball is not removed after a file extraction, so you can use it again and again, as needed.

Note: The tar command has many additional capabilities, such as using tar backup parameters and/or the ability to create backup and restore shell scripts. Take a look at the GNU tar website, www.gnu.org/software/tar/manual, to learn more about this popular command-line backup utility.

Since the tar utility is the tape archiver, you can also place your tarballs or archive files on tape, if desired. After mounting and properly positioning your tape, simply substitute your SCSI tape device filename, such as /dev/st0 or /dev/nst0, in place of the archive or tarball filename within your tar command.

Duplicating with dd

The dd utility allows you to back up nearly everything on a disk, including the old Master Boot Record (MBR) partitions some older Linux distributions still employ. It's primarily used to create low-level copies of an entire hard drive or partition. It is often used in digital forensics for creating system images, for copying damaged disks, and for wiping partitions.

The command itself is fairly straightforward. The basic syntax structure for the dd utility is as follows:

    dd if=input-device of=output-device [OPERANDS]

The output-device is either an entire drive or a partition. The input-device is the same. Just make sure that you get the right device for of and the right one for if; otherwise you may unintentionally wipe data.

Besides the of and if, there are a few other arguments (called operands) that can assist in dd operations.

TABLE: The dd command's commonly used operands
The status=LEVEL operand needs a little more explanation. LEVEL can be set to one of the following:
  none only displays error messages.
  noxfer does not display final transfer statistics.
  progress displays periodic transfer statistics.
It is usually easier to understand the dd utility through examples.

List: Using dd to copy an entire disk

    # lsblk
    NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    sdb      8:16   0    4M  0 disk
    └─sdb1   8:17   0    4M  0 part
    sdc      8:32   0    1G  0 disk
    └─sdc1   8:33   0 1023M  0 part
    # dd if=/dev/sdb of=/dev/sdc status=progress
    8192+0 records in
    8192+0 records out
    4194304 bytes (4.2 MB) copied, 0.232975 s, 18.0 MB/s

In the above List, the lsblk command is used first. When copying disks via the dd utility, make sure the drives are not mounted anywhere in the virtual directory structure. The two drives involved in this operation, /dev/sdb and /dev/sdc, are not mounted. With the dd command, the if operand is used to indicate the disk we wish to copy, which is the /dev/sdb drive. The of operand indicates that the /dev/sdc disk will hold the copied data. Also, status=progress will display periodic transfer statistics.
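Because a wrong of operand on a real device destroys data, it can help to rehearse dd on ordinary files first. The following sketch makes that assumption throughout: source.img and copy.img are hypothetical regular files standing in for disks, so no super user privileges or spare drives are needed. It also assumes a dd that supports the status operand described above.

```shell
set -eu

work=$(mktemp -d)

# Stand-in for a source disk: 64 KiB of random data in a regular file.
# bs sets the block size in bytes and count the number of blocks.
dd if=/dev/urandom of="$work/source.img" bs=1024 count=64 status=none

# Copy it the same way a disk is copied, only with file paths for if and of.
dd if="$work/source.img" of="$work/copy.img" bs=4096 status=none

# cmp exits 0 only when the two images are byte-for-byte identical.
cmp "$work/source.img" "$work/copy.img" && echo "copy verified"
```

The same cmp check works against real devices after a disk-to-disk copy, provided both are unmounted and you compare only as many bytes as the source holds.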
You can also create a system image backup using a dd command similar to the one shown in the List “Using dd to copy an entire disk,” with a few needed modifications. The basic steps are as follows:
  1. Shut down your Linux system.
  2. Attach the necessary spare drives. You'll need one drive the same size or larger for each system drive.
  3. Boot the system using a live CD, DVD, or USB so that you can either keep the system's drives unmounted or unmount them prior to the backup operation.
  4. For each system drive, issue a dd command, specifying the drive to back up with the if operand and the spare drive with the of operand.
  5. Shut down the system, and remove the spare drives containing the system image.
  6. Reboot your Linux system.

If you have a disk you are getting rid of, you can also use the dd command to zero out the disk.

List: Using dd to zero an entire disk

    # dd if=/dev/zero of=/dev/sdc status=progress
    1061724672 bytes (1.1 GB) copied, 33.196299 s, 32.0 MB/s
    dd: writing to '/dev/sdc': No space left on device
    2097153+0 records in
    2097152+0 records out
    1073741824 bytes (1.1 GB) copied, 34.6304 s, 31.0 MB/s
The if=/dev/zero operand uses the zero device file to write zeros to the disk. The “No space left on device” message is expected here, because dd keeps writing until the disk is full. You need to perform this operation at least 10 times to thoroughly wipe the disk. You can also employ the /dev/random and/or the /dev/urandom device files to put random data onto the disk. This particular task can take a long time to run for large disks. It is still better to physically shred any disks that will no longer be used by your company.

Replicating with rsync

The rsync utility, which is also covered in “Managing Files, Directories, and Text,” is known for speed. With this program, you can copy files locally or remotely, and it is wonderful for creating backups. Before exploring the rsync program, it is a good idea to review a few of the commonly used options. This guide contains the more commonly used rsync options.

There are a few additional switches that help with secure data transfers via the rsync utility:
  The -e, or --rsh, option changes the program to use for communication between a local and remote connection. The default is OpenSSH.
  The -z, or --compress, option compresses the file data during the transfer.
The -a, or --archive, option is the equivalent of using the -rlptgoD options and does the following:
  Directs rsync to copy files from the directory's contents and for any subdirectory within the original directory tree, consecutively copying their contents as well (recursively).
  Preserves the following items:
    Device files (only if run with super user privileges)
    File group
    File modification time
    File ownership (only if run with super user privileges)
    File permissions
    Special files
    Symbolic links
It's fairly simple to conduct an rsync backup locally. The most popular options, -avh, allow you to back up files to a local location quickly.

List: Using rsync to back up files locally

    $ ls -sh *.tar
    40K Project4x.tar  40K ProjectVerify.tar
    $ mkdir TarStorage
    $ rsync -avh *.tar TarStorage/
    sending incremental file list
    Project4x.tar
    ProjectVerify.tar

    sent 82.12K bytes  received 54 bytes  164.35K bytes/sec
    total size is 81.92K  speedup is 1.00
    $ ls TarStorage
    Project4x.tar  ProjectVerify.tar
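A local backup like the one above can be repeated, and rsync then transfers only what changed. The sketch below is an illustration under assumptions: the src and backup directory names and the file contents are made up, and the script simply skips itself when rsync is not installed. The -n (dry run) and -i (itemize changes) options are used to preview a run before performing it.

```shell
set -eu

# Skip quietly on systems where rsync is not installed.
command -v rsync >/dev/null || { echo "rsync is not installed; skipping"; exit 0; }

work=$(mktemp -d)
mkdir "$work/src"
echo "alpha" > "$work/src/a.txt"
echo "beta"  > "$work/src/b.txt"

# First run copies everything. The trailing slash on src/ means
# "the contents of src", not the directory itself.
rsync -a "$work/src/" "$work/backup/"

# Change one file, then preview the next run without copying anything.
echo "beta v2" > "$work/src/b.txt"
rsync -ain "$work/src/" "$work/backup/"

# Run it for real: only b.txt differs, so only b.txt is transferred.
rsync -a "$work/src/" "$work/backup/"
```

Previewing with -n is a cheap safeguard before pointing rsync at a backup location that already holds data.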
Where the rsync utility really shines is with protecting files as they are backed up over a network. For a secure remote copy to work, you need the OpenSSH service up and running on the remote system. In addition, the rsync utility must be installed on both the local and remote machines.

List: Using rsync to back up files remotely

    $ rsync -avP -e ssh *.tar [email protected]:~
    [email protected]'s password:
    40,960 100% 7.81MB/s 0:00:00 (xfr#1, to-chk=1/2)
    ProjectVerify.tar
    40,960 100% 39.06MB/s 0:00:00 (xfr#2, to-chk=0/2)

    sent 82,121 bytes  received 54 bytes  18,261.11 bytes/sec
    total size is 81,920  speedup is 1.00
Notice that the -avP options are used with the rsync utility. These options not only set the copy mode to archive but will provide detailed information as the file transfers take place. The important switch to notice in this listing is the -e option. This option determines that OpenSSH is used for the transfer and effectively creates an encrypted tunnel so that anyone sniffing the network cannot see the data flowing by. The *.tar in the command simply selects what local files are to be copied to the remote machine. The last argument in the rsync command specifies the following:
  The user account (user1) located at the remote system to use for the transfer.
  The remote system's IPv4 address, but a hostname can be used instead.
  Where the files are to be placed. In this case, it is the home directory, indicated by the ~ symbol.
Notice also in that last argument that there is a needed colon (:) between the IPv4 address and the directory symbol. If you do not include this colon, you will copy the files to a new file named [email protected]~ in the local directory.

Note: The rsync utility uses OpenSSH by default. However, it's good practice to use the -e option. This is especially true if you are using any ssh command options, such as designating an OpenSSH key to employ or using a different port than the default port of 22.

The rsync utility can be handy for copying large files to remote media. If you have a fast CPU but a slow network connection, you can speed things up even more by employing the rsync -z option to compress the data for transfer. This is not using gzip compression but instead applying compression via the zlib compression library. You can find out more about zlib at https://zlib.net.

Securing Offsite/Off-System Backups

In business, data is money. Thus it is critical not only to create data archives but also to protect them. There are a few additional ways to secure your backups when they are being transferred to remote locations.

Besides rsync, you can use the scp utility, which is based on the Secure Copy Protocol (SCP). Also, the sftp program, which is based on the SSH File Transfer Protocol (SFTP), is a means for securely transferring archives. We'll cover both utilities in the following sections.

Copying Securely via scp

The scp utility is geared for quickly transferring files in a noninteractive manner between two systems on a network. This program employs OpenSSH. It is best used for small files that you need to securely copy on the fly, because if it gets interrupted during its operation, it cannot pick back up where it left off. For larger files or more extensive numbers of files, it is better to employ either the rsync or the sftp utility.

There are some rather useful scp options.

TABLE: The scp command's commonly used copy options
Performing a secure copy of files from a local system to a remote system is rather simple. You do need the OpenSSH service up and running on the remote system.

List: Using scp to copy files securely to a remote system

    $ scp Project42.txt [email protected]:~
    Project42.txt 100% 29KB 20.5MB/s 00:00

Notice that to accomplish this task, no scp command options are employed. The -v option gives a great deal of information that is not needed in this case.

Note: The scp utility will overwrite any remote files with the same name as the one being transferred without asking or even displaying a message stating that fact. You need to be careful when copying files using scp that you don't tromp on any existing files.

A handy way to use scp is to copy files from one remote machine to another remote machine.

List: Using scp to copy files securely from/to a remote system

    $ ip addr show | grep 192 | cut -d" " -f6
    192.168.0.101/24
    $ scp [email protected]:Project42.txt [email protected]:~
    [email protected]'s password:
    Project42.txt 100% 29KB 4.8MB/s 00:00
    Connection to 192.168.0.104 closed.
In the listing above, the current machine's IPv4 address is first checked using the ip addr show command. Next the scp utility is employed to copy the Project42.txt file from one remote machine to another. Of course, you must have OpenSSH running on these machines and have a user account you can log into on each of them as well.

Transferring Securely via sftp

The sftp utility will also allow you to transfer files securely across the network. However, it is designed for a more interactive experience. With sftp, you can create directories as needed, immediately check on transferred files, determine the remote system's present working directory, and so on. Like scp, this program employs OpenSSH. To get a feel for how this interactive utility works, it's good to see a simple example.

List: Using sftp to access a remote system
$ sftp [email protected]
[email protected]'s password:
Connected to 192.168.0.104.
sftp>
sftp> bye
In the listing above, the sftp utility is used with a username and a remote host's IPv4 address. Once the user account's correct password is entered, the sftp utility's prompt is shown. At this point, you are connected to the remote system. At the prompt you can enter sftp commands, including help to see a display of all the possible commands and, as shown in the listing, bye to exit the utility. Once you have exited the utility, you are no longer connected to the remote system.

Before using the sftp interactive utility, it's helpful to know some of the more common commands.

TABLE: The sftp command's commonly used commands
It can be a little tricky the first few times you use the sftp utility if you have never used an interactive FTP program in the past.

List: Using sftp to copy a file to a remote system
sftp> ls
Desktop Documents Downloads Music Pictures Public Templates Videos
sftp> lls
AccountAudit.txt Grades.txt Project43.txt ProjectVerify.tar
err.txt Life Project44.txt TarStorage
Everything NologinAccts.txt Project45.txt Universe
Extract Project42_Inc.txz Project46.txt
FullArchive.snar Project42.txt Project4x.tar
Galaxy Project42.txz Projects
sftp> put Project4x.tar
Uploading Project4x.tar to /home/Christine/Project4x.tar
Project4x.tar 100% 40KB 15.8MB/s 00:00
sftp> ls
Desktop Documents Downloads Music Pictures Project4x.tar Public Templates Videos
sftp> exit

In the listing above, after the connection to the remote system is made, the ls command is used in the sftp utility to see the files in the remote user's directory. The lls command is used to see the files within the local user's directory. Next the put command is employed to send the Project4x.tar archive file to the remote system. There is no need to issue the progress command because progress reports are turned on by default. Once the upload is completed, another ls command is used to see if the file is now on the remote system, and it is.

Real World Scenario

Backup Rule of Three

Businesses need to have several archives in order to properly protect their data. The Backup Rule of Three is typically good for most organizations, and it dictates that you should have three archives of all your data. One archive is stored remotely to prevent natural disasters or other catastrophic occurrences from destroying all your backups. The other two archives are stored locally, but each is on a different media type. You hear about the various statistics concerning companies that go out of business after a significant data loss.
A scarier statistic would be the number of system administrators who lose their jobs after such a data loss because they did not have proper archival and restoration procedures in place.

The rsync, scp, and sftp utilities all provide a means to securely copy files. However, when determining what utilities to employ for your various archival and retrieval plans, keep in mind that no single utility works effectively in every backup case. For example, generally speaking, rsync is better to use than scp in backups because it provides more options. However, if you just have a few files that need secure copying, scp works well. The sftp utility works well for any interactive copying, yet scp is faster because sftp is designed to acknowledge every packet sent across the network. You will most likely need to employ all of these utilities in some way throughout your company's backup plans.

Checking Backup Integrity

Securely transferring your archives is not enough. You need to consider the possibility that the archives could become corrupted during transfer. Ensuring a backup file's integrity is fairly easy. A few simple utilities can help.

Digesting an MD5 Algorithm

The md5sum utility is based on the MD5 message digest algorithm. MD5 was originally created to be used in cryptography. It is no longer used in such capacities due to various known vulnerabilities. However, it is still well suited to checking a file's integrity against accidental corruption.

List: Using md5sum to check the original file
$ md5sum Project4x.tar
efbb0804083196e58613b6274c69d88c  Project4x.tar

List: Using md5sum to check the uploaded file
192.168.0.104/24

The md5sum utility produces a 128-bit hash value. You can see from the results in the two listings that the hash values match. This indicates no file corruption occurred during its transfer.

Warning: A malicious attacker can create two files that have the same MD5 hash value.
However, at this point in time, an attacker cannot produce a modified file that matches the MD5 hash value of a file that was never under the attacker's control. Therefore, it is imperative that you have checks in place to ensure that your original backup file was not created by a third-party malicious user. An even better solution is to use a stronger hash algorithm.

Securing Hash Algorithms

The Secure Hash Algorithms (SHA) are a family of hash functions. Though typically used for cryptography purposes, they can also be used to verify an archive file's integrity. Several utilities implement these various algorithms on Linux. The quickest way to find them is to use the method shown below. Keep in mind that your particular distribution may store them in the /bin directory instead.

List: Looking at the SHA utility names
$ ls -1 /usr/bin/sha???sum
/usr/bin/sha224sum
/usr/bin/sha256sum
/usr/bin/sha384sum
/usr/bin/sha512sum
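Because the directory can differ between distributions, a PATH lookup is another way to locate these utilities. The following is a minimal sketch, assuming a POSIX shell; the function name locate_sha_utils is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Sketch: report where each SHA checksum utility lives, wherever PATH finds it.
# locate_sha_utils is a hypothetical helper name, not a standard command.
locate_sha_utils() {
    for bits in 224 256 384 512; do
        util="sha${bits}sum"
        # command -v prints the resolved path and fails if nothing is found
        if location=$(command -v "$util"); then
            echo "$util: $location"
        else
            echo "$util: not found on PATH"
        fi
    done
}

locate_sha_utils
```

The function prints one line per utility, so it works the same whether the programs are installed in /usr/bin, /bin, or somewhere else on the search path.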
Each utility includes the SHA message digest it employs within its name. Therefore, sha384sum uses the SHA-384 algorithm. These utilities are used in a similar manner to the md5sum command.

List: Using sha512sum to check the original file
$ sha224sum Project4x.tar
c36f1632cd4966967a6daa787cdf1a2d6b4ee55924e3993c69d9e9d0  Project4x.tar
$ sha512sum Project4x.tar
6d2cf04ddb20c369c2bcc77db294eb60d401fb443d3277d76a17b477000efe46c00478cdaf25ec6fc09833d2f8c8d5ab910534ff4b0f5bccc63f88a992fa9eb3  Project4x.tar
Notice the different hash value lengths produced by the sha224sum and sha512sum commands. The sha512sum utility uses the SHA-512 algorithm, which is the best to use for security purposes and is typically employed to hash salted passwords in the /etc/shadow file on Linux. You can use these SHA utilities just as the md5sum program was used in the two md5sum listings to check an archive file's integrity. That way, you can detect backup corruption as well as any malicious modifications to the file.

Providing appropriate archival and retrieval of files is critical. Understanding your business and data needs is part of the backup planning process. As you develop your plans, look at integrity issues, archive space availability, privacy needs, and so on. Once rigorous plans are in place, you can rest assured that your data is protected.

Important Exam Questions:

1. Describe the different backup types.
A system image backup takes a complete copy of the files the operating system needs to operate. This allows a restore to take place, which will get the system back up and running. The full, incremental, and differential backups are tied together in how data is backed up and restored. Snapshots and snapshot clones are also closely related and provide the opportunity to achieve rigorous backups in high I/O environments.

2. Summarize compression methods.
The different utilities, gzip, bzip2, xz, and zip, provide different levels of lossless data compression. Each one's compression level is tied to how fast it operates. Reducing the size of archive data files is needed not only for backup storage but also for increasing transfer speeds across the network.

3. Compare the various archive/restore utilities.
The assorted command-line utilities each have their own strengths in creating data backups and restoring files. While cpio is one of the oldest, it allows various files throughout the system to be gathered and put into an archive.
The tar utility has long been used with tape media but provides rigorous and flexible archiving and restoring features, which make it still very useful in today's environment. The dd utility shines when it comes to making system images of an entire disk. Finally, not only is rsync very fast, but it also allows encrypted transfers of data across a network for remote backup storage.

4. Explain the needs when storing backups on other systems.
To move an archive across the network to another system, it is important to provide data security. Thus, OpenSSH is often employed. In addition, once an archive file arrives at its final destination, it is critical to ensure that no data corruption has occurred during the transfer. Therefore, tools such as md5sum and sha512sum are used.
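The integrity check described in this guide can be sketched as a small shell workflow: record a SHA-512 digest before the transfer, then let sha512sum -c recompute and compare it afterward. This is only an illustration, assuming a Linux system with the sha512sum and mktemp commands available. The function names write_checksum and verify_checksum, the .sha512 file suffix, and the scratch directory are hypothetical, and a local file stands in for a real network transfer.

```shell
#!/bin/sh
# Sketch only: helper names, the .sha512 suffix, and the paths are hypothetical.

# Record the archive's SHA-512 digest in a checksum file beside it.
# Intended to run on the source system before the transfer.
write_checksum() {
    archive=$1
    ( cd "$(dirname "$archive")" &&
      sha512sum "$(basename "$archive")" > "$(basename "$archive").sha512" )
}

# Recompute the digest and compare it with the recorded one.
# Intended to run on the destination system after the transfer.
# Returns 0 when the digests match and non-zero otherwise.
verify_checksum() {
    archive=$1
    ( cd "$(dirname "$archive")" &&
      sha512sum -c "$(basename "$archive").sha512" >/dev/null 2>&1 )
}

# Demonstration in a scratch directory.
workdir=$(mktemp -d)
printf 'example archive contents\n' > "$workdir/Project4x.tar"
write_checksum "$workdir/Project4x.tar"

if verify_checksum "$workdir/Project4x.tar"; then
    echo "intact"
fi

# Simulate corruption in transit by appending one byte to the archive.
printf 'x' >> "$workdir/Project4x.tar"
if ! verify_checksum "$workdir/Project4x.tar"; then
    echo "corrupted"
fi

rm -rf "$workdir"
```

Run as written, the demonstration prints intact and then corrupted. In practice the checksum file would travel with the archive (for example, via scp or rsync), and only verify_checksum would run on the receiving system.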