LPI 102-500 – 103.3: Basic file management Part 2
This lesson is about the tar command. Tar is a so-called archiving tool, and the name is short for tape archive. Originally tar was used to back up to tape drives. With the help of tar, you can pack several files together and store them in one file. Contrary to popular belief, this does not mean that the files are compressed. You can use compression with tar, but you don't have to. You can create an archive from several files or directories, either compressed or at original size. And of course you can also unpack such archives again.
Using tar can seem a little bit complicated if you don't do it regularly. You usually have to remember two, three or four different options, depending on what exactly you want to do. Let's first create a so-called tarball, that is, a file with the extension .tar. We have a couple of files and a subfolder here, and I would now like to create an archive from file1 and file2. So we use the following command: tar -cf file.tar file1 file2. What does this command mean? The option c stands for create, so create a tar file. The option f stands for file.
So: create the file with the name file.tar and use the two files file1 and file2. Basically, take file1 and file2 and pack them into an archive called file.tar. It is important here that f is the last option in the group, because tar interprets whatever comes directly after the f as the archive file name. So the file name must always come right after the f. We see that we now have a tar file here with the name file.tar. If I run the whole thing again and just swap cf for fc, so tar -fc file.tar file1 file2, then we see that we get an error message here.
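As a small sketch of the point about option order, the following shell lines create the archive with -cf and then try the swapped -fc form. The scratch directory and the variable name swapped are assumptions made for this illustration; the file names are the ones used in the lesson.

```shell
set -eu
work=$(mktemp -d)      # hypothetical scratch directory
cd "$work"
touch file1 file2

# c = create, f = the next word is the archive name
tar -cf file.tar file1 file2

# Swapped order: tar takes "c" as the archive name and is left without an operation
if tar -fc file.tar file1 file2 2>/dev/null; then
    swapped=accepted
else
    swapped=rejected
fi
echo "tar -fc was $swapped"
```

The error message itself is hidden here with 2>/dev/null; remove that part to read what your tar version says.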
With fc, tar assumes that the file name comes directly after the f, so it takes the c as the file name and no longer sees it as the create option. So, we have now created file.tar from file1 and file2. To check the content of the tar file, we use the t option in conjunction with f: tar -tf file.tar, and then we can see which files this archive contains. Even after a tar archive has been written, we can add files later, so we don't have to unpack and repack everything first. For that we have the option r, which we again use in conjunction with f. Let's try it out with tar -rf, our tarball file.tar, and the file we want to add, maybe file22: tar -rf file.tar file22. We check again with tar -tf file.tar, and we see that the archive now has three files, the last one being file22.
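The create, list and append steps can be sketched together as follows. This is only an illustration in a scratch directory; the variable names before and after are made up for the example.

```shell
set -eu
work=$(mktemp -d)      # hypothetical scratch directory
cd "$work"
touch file1 file2 file22

tar -cf file.tar file1 file2     # c = create the archive
before=$(tar -tf file.tar)       # t = list the archive contents
tar -rf file.tar file22          # r = append to the existing archive
after=$(tar -tf file.tar)

printf 'before append:\n%s\n' "$before"
printf 'after append:\n%s\n' "$after"
```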
Of course we can work with wildcards here as well. For example, you can pack the contents of an entire folder into a tar file. With tar -cf we want to create a file with the name file2.tar, and then we use the wildcard ./* here. The period and slash stand for the folder we are currently in, and the asterisk, or star, simply means use all files and all subfolders of this folder. So: take all the files and all the subfolders and put them in a tarball called file2.tar, that is, tar -cf file2.tar ./*. Now we have file2.tar here, and we check it with tar -tf file2.tar. And here we see file1, file2, file22, file4711, file4712, our file.tar, and the subfolder test.
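A minimal sketch of the wildcard variant, assuming a scratch directory with two files and one subfolder (the name inner is invented for the example):

```shell
set -eu
work=$(mktemp -d)      # hypothetical scratch directory
cd "$work"
mkdir test
touch file1 file2 test/inner

# The shell expands ./* to ./file1 ./file2 ./test before tar starts
tar -cf file2.tar ./*
listing=$(tar -tf file2.tar)
printf '%s\n' "$listing"
```

Because the shell expands the wildcard before tar creates file2.tar, the new archive does not end up inside itself.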
So this command was successful. To unpack an archive again, basically just swap the c, which means create, for the x, which means extract. Again: for creating a tarball we use cf, and for extracting xf. The file has now been unpacked, which may not be so clear to see here, because the files already existed. That's why we create a new folder with the name test2 and move file2.tar into the test2 folder. So let's try it again there. You can see that the files are not there yet. Then tar -xf file2.tar, and now you see that all the files are unpacked. In connection with tar, people also like to use the option v, which stands for verbose, so that we are shown what exactly happened, for example tar -xvf file.tar. So we now unpack file.tar, and tar tells us that these three files have been unpacked. If we did this without v, we would get no output at all. As I said at the beginning, we also have the option of compressing the tar file. We have three different compression methods available.
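Here is a short sketch of extracting in a separate folder, so that the unpacked files are easy to spot. The scratch directory is an assumption; the folder name test2 follows the lesson.

```shell
set -eu
work=$(mktemp -d)      # hypothetical scratch directory
cd "$work"
touch file1 file2
tar -cf file2.tar file1 file2

mkdir test2
mv file2.tar test2/
cd test2

# x = extract, v = verbose (names each member as it is unpacked), f = archive name
tar -xvf file2.tar
ls
```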
We created a tar file a moment ago. As a reminder, that was tar -cf with the archive name, and this time I will choose file3.tar, and we use file1 and file2. You remember this from the beginning of this video. So we want to create file3.tar from file1 and file2. We now simply add z, which is the option for gzip compression. I'm not sending the command off yet, because even if this file is compressed with gzip, we can in principle choose the file extension freely. If I ran the command as I have written it now, nobody would be able to tell from the name that this is a compressed archive. Therefore you should choose an appropriate file extension so that this is evident. In this case I would use .tar.gz. Then we know that we have a tarball that is compressed with gzip. I also add the v option to get detailed output.
And now I send this command, tar -cvzf file3.tar.gz file1 file2, and we have a file called file3.tar.gz. If you compare the size of file.tar with the size of file3.tar.gz, you can see the difference. Here we have 20,480 bytes, and here we only have 176 bytes. Both of these files have the same content, but this one is compressed with gzip and this one is not. Even if the file is compressed, we can look at the content. As a reminder, we used tar -tf and then file.tar. In this case we also add the z option for gzip compression: tar -tzf file3.tar.gz.
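The size comparison can be reproduced with a sketch like the one below. The exact byte counts depend on the tar version and the files, so none are claimed here; the variable names are invented for the example.

```shell
set -eu
work=$(mktemp -d)      # hypothetical scratch directory
cd "$work"
touch file1 file2

tar -cf file.tar file1 file2          # plain archive
tar -czf file3.tar.gz file1 file2     # z = compress with gzip

plain_size=$(wc -c < file.tar)
gz_size=$(wc -c < file3.tar.gz)
listing=$(tar -tzf file3.tar.gz)      # list the compressed archive

echo "file.tar:     $plain_size bytes"
echo "file3.tar.gz: $gz_size bytes"
printf '%s\n' "$listing"
```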
And here you see we have file1 and file2. How do we proceed if we want to unpack the archive again? We just swap the c for an x: tar -xvzf file3.tar.gz. The z is added here too, the f has to be at the end, and the archive has been extracted. Instead of gzip you can also use bzip2 or xz. These are both compression methods. The corresponding options are a lowercase j for bzip2 and a capital J for xz. Maybe an example for bzip2: tar with c and f, so create a file with the extension .tar.bz2, and we want to use file1 and file2. And here again we can use the v option if we want to.
We don't need to, but in this case I want to. So I use the v option for verbose, detailed output, and I have to use the j option for bzip2 compression, which gives tar -cvjf. And here you see the two files. And maybe at the end of this video we do another example for xz: tar -cv, the capital J for xz compression, and the f of course at the end, then file7.tar.xz for example, then file1 and file2, so tar -cvJf file7.tar.xz file1 file2. And now we have created it. It may be confusing at first, of course, with all the options.
And if you haven't used tar for a long time, you usually have to look at the man page again to remember what the options were. Again, it is important that when an archive is created, c is used for create and f for file, with the archive name directly after the f. If compression is to be used, you also add the corresponding option: z for gzip compression, a lowercase j for bzip2 compression, and a capital J for xz compression. And if you want, you can add the v option for detailed output. If an archive is to be unpacked, you simply use x for extract instead of c, and the rest remains the same.
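To put the three compression options side by side, here is a sketch that assumes gzip, bzip2 and xz are all installed. The archive names starting with demo and the folder unpacked are invented for the example.

```shell
set -eu
work=$(mktemp -d)      # hypothetical scratch directory
cd "$work"
touch file1 file2

tar -czf demo.tar.gz  file1 file2    # z = gzip
tar -cjf demo.tar.bz2 file1 file2    # j = bzip2
tar -cJf demo.tar.xz  file1 file2    # J = xz

mkdir unpacked
cd unpacked
tar -xJf ../demo.tar.xz              # x instead of c, the rest stays the same
ls
```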
Now we come to cpio, which is short for copy in and copy out. It is an archiving program similar to tar. cpio archives are often given the extension .cpio, just as tar archives are given the extension .tar. With cpio, too, files can be combined into an archive, either with or without compression, and various compression methods can be used. However, cpio works a little differently from tar. There are three general modes to keep in mind. The first is the copy-out mode, which is started with cpio and the option o. In copy-out mode, cpio copies files into an archive. A list of files is read in via standard input, often from find or ls, and those files are then copied into an archive. So you copy files out of the file system in order to pack them into an archive, hence copy-out mode. There is also the copy-in mode, which is initiated with the i option. Here cpio copies files from an archive into the file system. Displaying the content of a cpio archive also falls under the copy-in mode.
Third, there is the copy-pass mode, which is initiated with the p option. Here cpio copies files from one directory tree to another. This combines copy-out and copy-in without an archive actually being packed and unpacked in between. In principle, it is nothing more than a normal copy command. So let's look at an example. We list the contents of the directory in which we are currently located with ls, and in this case pass the result through the pipe to cpio. We use cpio in copy-out mode, and we want to copy the files into an archive file, content.cpio for example: ls | cpio -o > content.cpio. So again, we want to copy files out of the file system and archive them in a file, which we call content.cpio. We can see that this differs significantly from tar, even if the functionality is basically the same. We first use ls here to pass a few file names to cpio.
We then give the o option for copy-out mode, followed by the greater-than symbol. This redirects the archive that cpio writes to standard output into the file. We see that we have now created the content.cpio file. To list the contents of this archive, we use the copy-in mode, so cpio -i. Note that we are now using a less-than symbol, so that cpio reads the archive from the file. If I ran that now, the archive would be unpacked. So before we do this, we add the option t, which only shows us the content, as was also the case with tar: cpio -it < content.cpio. And here we see which files and folders this file contains.
Now let's run the command to unpack: cpio in copy-in mode, the less-than symbol, and then content.cpio, so cpio -i < content.cpio. And we see that cpio does not perform this action, because the corresponding files already exist. By the way, we used ls in the command that packed these files and folders into this file. As you have seen throughout my course, I always use ll, which is an alias for ls with the options l and a. If we try the same command with ll, you will see that it does not work, because cpio expects plain file names, one per line, and not a long listing. So you have to use ls for this to work. But that only as a quick note. Instead of ls, we can for example use find to send the file names to cpio: find . | cpio -o > packed2.cpio. This basically says: find everything that is in the current folder, pass the results to cpio in copy-out mode, and pack it into a file with the name packed2.cpio.
And now we have a file packed2.cpio. We can check its content with cpio, the option i for copy-in mode, and t: cpio -it < packed2.cpio. And we see the content of this file here. Finally, let's take a look at a command in copy-pass mode, for example find . | cpio -pd /tmp/test. What does this command do? It searches for files and subdirectories in the current directory. Once find has found them, the result is not displayed on the screen but passed on to cpio through the pipe. cpio uses the copy-pass mode, the p option, and copies the corresponding files and directories to the /tmp/test folder. The d option here ensures that the /tmp/test directory is created if it does not already exist. We take a look at the path /tmp/test, and we have the content here. Ultimately, this command is nothing more than a simple copy command. So we could use cp -r * /tmp/test, for example, and we would get the same result here.
- gzip, gunzip, bzip2, bunzip2, xz, unxz
We used gzip in the last few lessons on tar and cpio. We want to go into a little more detail here. Gzip is probably the most widely used compression method on Linux, and files compressed with gzip usually have the file extension .gz. Somewhat unusual is that gzip only ever compresses a single file and cannot combine several files into one archive. So let's try it out: gzip file1, for example, and we see that there is now a new file, file1.gz. Let's try to compress two files with gzip: gzip file2 file22. And we see what gzip did: it compressed each file individually but did not group them together. So we have file2.gz and file22.gz.
This is also the reason why the .tar.gz file extension is so widespread. With tar, you can combine the files into a single file and then compress that file with gzip. As you may have noticed, the original file is deleted after it is compressed. So you see, file1, file2 and file22 don't exist anymore. The same applies when unpacking, as we will see in a moment: the compressed file is then deleted. With the k option, for keep, we can stop this.
So let's try it out with gzip -k and then file4711, for example. And we see we still have the file file4711 here. In addition, there is the file file4711.gz. To unpack a gzip file, use the d option, which stands for decompress. So for example gzip -d and then file1.gz. And here we have file1 back, and file1.gz has been deleted accordingly. If we add the k option, so for example gzip -dk, d for decompress and k for keep, and then file2.gz, then we see that file2 has been unpacked and file2.gz is still there. Another way to unpack a .gz file is the gunzip command. It's basically the same as gzip with the d option.
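The gzip steps so far can be replayed with this sketch. It assumes a gzip version that supports the k option; the file contents are placeholders.

```shell
set -eu
work=$(mktemp -d)      # hypothetical scratch directory
cd "$work"
for name in file1 file2 file22 file4711; do
    printf 'some text\n' > "$name"
done

gzip file1              # file1 becomes file1.gz, the original is removed
gzip file2 file22       # two separate files: file2.gz and file22.gz
gzip -k file4711        # k = keep the original next to file4711.gz

gzip -d file1.gz        # d = decompress, file1.gz is removed
gzip -dk file2.gz       # decompress and keep file2.gz
gunzip file22.gz        # same as gzip -d
ls
```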
So for example gunzip file22.gz, and here too the file is extracted. And because we did not use the k option, file22.gz is deleted and file22 is available again. Using bzip2 is very similar to gzip, but a different algorithm is used for compression. The syntax is as follows, for example: bzip2 file1. And we see we now have file1.bz2, and just as with gzip, the original file, file1, is deleted here too. You can suppress this behavior with the k option.
With bzip2, too, several files are created if you want to compress several files at once. And here too, the option to unpack is called d, for decompress. So bzip2 -dk file1.bz2, for example, and we have our file1 again, and file1.bz2 has remained. Instead of bzip2 with the option d, you can use the bunzip2 command, which is the same as bzip2 with the d option.
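The same pattern with bzip2, as a sketch that assumes bzip2 is installed and uses placeholder file contents:

```shell
set -eu
work=$(mktemp -d)      # hypothetical scratch directory
cd "$work"
printf 'some text\n' > file1
printf 'more text\n' > file2

bzip2 file1             # file1 becomes file1.bz2, the original is removed
bzip2 -dk file1.bz2     # d = decompress, k = keep file1.bz2

bzip2 file2
bunzip2 file2.bz2       # same as bzip2 -d, file2.bz2 is removed
ls
```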
So bunzip2 and then file1.bz2, and we get an error message: file1 already exists, and accordingly nothing was unpacked. We delete file1 beforehand and try it again, so rm file1 and then bunzip2 file1.bz2 once more. And now the file has been unpacked without any problems. We come to the last compression method required for the exam. This is xz, which again works very similarly. The command is xz file1, and now we have a file file1.xz here. The original file is also deleted here. With the option k this can also be prevented, so xz -k, maybe with file22, and we get file22.xz while file22 remains.
xz, too, compresses each file individually and behaves like gzip and bzip2 when you try to compress several files at the same time, namely that several compressed files are created. If you want to unpack an .xz file, choose the d option for decompress. So xz -d and then file22.xz. And again, the same problem as before: the file already exists. So let's delete file22 first, and now let's try it again. And yes, it works. And of course file22.xz has been deleted, because we haven't used the k option. So we have three different compression tools, but all three work the same way, and the options we use are always the same.
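The xz steps, including the refusal to overwrite an existing file, can be sketched as follows. This assumes xz is installed; the variable first_try is invented for the example.

```shell
set -eu
work=$(mktemp -d)      # hypothetical scratch directory
cd "$work"
printf 'some text\n' > file22

xz -k file22            # k = keep: file22 and file22.xz both exist now

# Unpacking while file22 still exists is refused
if xz -d file22.xz 2>/dev/null; then
    first_try=unpacked
else
    first_try=refused
fi
echo "first attempt: $first_try"

rm file22
xz -d file22.xz         # works now, and file22.xz is removed
ls
```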
- File Globbing
Now we come to the topic of file globbing. I mentioned it briefly in another lesson, but without explicitly using the term file globbing. You could loosely translate it as lumps or clumps of files. We can use various so-called wildcards to search for specific files or to list only specific files. I have prepared a folder here with a few files; let me show them to you. As you see, there is no content in the files. It's not important here. Now I want to display all files with the extension .txt. I can use the asterisk as a wildcard. So, in principle, ls *.txt, and then these files are displayed to me. Of course the asterisk can be used as a wildcard in all sorts of ways.
The asterisk only says that the system should match any characters, or even no characters at all. So if we are looking for files whose names begin with the word test, we can for example also write ls test*, with the asterisk, or star, whichever you prefer to say. And then we are also shown files that begin with test but do not have a .txt extension. There is not only the asterisk, of course, but a few more options. For example, the question mark. Let's choose the same example as above, ls ?.txt, and here we get no results at all. Why? While the asterisk stands for any character and any number of characters, the question mark stands for any character, but only for exactly one.
So we need a file with the extension .txt that has only one character as its name. We can quickly create such a file, maybe with touch a.txt. And if we now run ls ?.txt, we are shown the corresponding result. Of course we can use several question marks when searching. For example ls ?????.txt, with five question marks, and then we get a result again, because, as I said, every question mark stands for exactly one character, and here we have five characters. Now let's imagine that we know there is a file lying dormant on the hard drive that we are looking for. We know that its name begins with the word test, but we no longer know exactly what it is called. At least we still know that it was a CSV file. So how do we go about it? There are of course always several options, but one could for example be this: ls [tT]est*.csv. That looks a bit cryptic now, but it's actually not that difficult.
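The difference between the asterisk and the question mark can be tried out with a sketch like this one. The file names are chosen for the illustration and are not the ones from the prepared folder.

```shell
set -eu
work=$(mktemp -d)      # hypothetical scratch directory
cd "$work"
touch test1.txt test2.txt test4 readme.txt a.txt

all_txt=$(ls *.txt)         # any name ending in .txt
prefix=$(ls test*)          # any name starting with test, with or without .txt
one_char=$(ls ?.txt)        # exactly one character before .txt
five_chars=$(ls ?????.txt)  # exactly five characters before .txt

echo "one character:   $one_char"
echo "five characters:" $five_chars
```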
The square brackets indicate that we are looking for a name that begins with either a lowercase t or a capital T. This is followed by the text est. So we are looking for Test with a capital letter and test with a small letter. Then we have an asterisk here, so anything that comes after test, and the ending is .csv. In this case the system shows us two files, test48.csv and testfile47.csv. In this example we have used two different letters in the square brackets, a small t and a capital T. Of course you can use any characters and any number of them here, for example also digits or whole ranges of digits. For example ls test[1-4].txt, and then it shows us the corresponding files: test1, test2 and test3. I thought a test4 does not exist, but as you see, test4 is available here; it just has no .txt ending.
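Square brackets with a character list and with a range, as a sketch. The capital T in Testfile47.csv is an assumption made so that both alternatives in the brackets have something to match.

```shell
set -eu
work=$(mktemp -d)      # hypothetical scratch directory
cd "$work"
touch test48.csv Testfile47.csv other.csv
touch test1.txt test2.txt test3.txt test4

csv_count=$(ls [tT]est*.csv | grep -c .)   # first character is t or T
range=$(ls test[1-4].txt)                  # one digit from 1 to 4, then .txt

echo "csv matches: $csv_count"
printf '%s\n' "$range"
```

test4 is not listed by the second pattern because it has no .txt ending.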
So because of that only these three files, test1, test2 and test3, are shown. You can also specify in the square brackets which digit or which character should not be matched. We do this with the caret, the little roof symbol (^). On a German keyboard layout the key is at the top left; I don't know if it is the same on US, UK or other keyboard layouts. So for example ls test[^12].txt, and you see only test3.txt is displayed here, because test1 and test2 are excluded by this sign. There are also curly brackets. Whole words that you are looking for can be placed here, separated by commas. At least two words must always be given, for example ls {table,chair}*. And we see that chair cannot be accessed, because we don't have a file with the name chair. But underneath, and it can easily be overlooked, there is a file that begins with table.
So it's the file table.txt. Of course you can also combine a few things. An example of that would be ls with {table,test}, then [^12], and then an extension pattern such as .[Tt][Xx]?. This command says the following: list files with the word table or test, but these must not have a 1 or a 2 at that position in the file name, and the file extension should start with a capital or small t, followed by a capital or small x, followed by any third character. We only have one result here, test3.txt, and the search pattern is very cumbersome. However, the main thing here is to show what file globbing is and how it works, and I hope that came across somewhat in this video.
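Negation, curly brackets and a combined pattern can be sketched as below. This needs bash, because curly brackets are expanded by bash and not by every shell. The combined pattern is only one possible reading of the command described above, and the variable names are invented.

```shell
#!/usr/bin/env bash
set -eu
work=$(mktemp -d)      # hypothetical scratch directory
cd "$work"
touch test1.txt test2.txt test3.txt table.txt

negated=$(ls test[^12].txt)     # any single character except 1 or 2

# Braces always produce both words, so ls complains about the missing chair*
braces=$(ls {table,chair}* 2>/dev/null || true)

# One possible reading of the combined example from the lesson
combined=$(ls {table,test}[^12].[Tt][Xx]? 2>/dev/null || true)

echo "negated:  $negated"
echo "braces:   $braces"
echo "combined: $combined"
```

The error messages from ls are hidden with 2>/dev/null; remove that part to see the "cannot access" line mentioned in the lesson.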