Chapter 2. Dealing with Files and Filesystems

Introduction

Now that you’re a bit more comfortable with the Unix environment, it’s time to tackle some commands. It’s funny how some of the most useful commands on a Unix system have gained themselves a reputation for being user-unfriendly. Do find, grep, sed, tr, or mount make you shudder? If not, remember that you still have novice users who are intimidated by—and therefore aren’t gaining the full potential of—these commands.

This chapter also addresses some useful filesystem manipulations. Have you ever inadvertently blown away a portion of your directory structure? Would you like to manipulate /tmp or your swap partition? Do your Unix systems need to play nicely with Microsoft systems? Might you consider ghosting your BSD system? If so, this chapter is for you.

Find Things

Finding files in Unix can be an exercise in frustration for a novice user. Here’s how to soften the learning curve.

Remember the first time you installed a Unix system? Once you successfully booted to a command prompt, I bet your first thought was, “Now what?” or possibly, “Okay, where is everything?” I’m also pretty sure your first foray into man find wasn’t all that enlightening.

How can you as an administrator make it easier for your users to find things? First, introduce them to the built-in commands. Then, add a few tricks of your own to soften the learning curve.

Finding Program Paths

Every user should become aware of the three w’s: which, whereis, and whatis. (Personally, I’d like to see some why and when commands, but that’s another story.)

Use which to find the path to a program. Suppose you’ve just installed xmms and wonder where it went:

% which xmms
/usr/X11R6/bin/xmms

Better yet, if you were finding out the pathname because you wanted to use it in a file, save yourself a step:

% echo `which xmms` >> somefile

Remember to use the backticks (`), often found on the far left of the keyboard on the same key as the tilde (~). If you instead use the single quote (') character, usually located on the right side of the keyboard on the same key as the double quote ("), your file will contain the echoed string which xmms instead of the desired path.

The user’s current shell will affect how which’s switches work. Here is an example from the C shell:

% which -a xmms
-a: Command not found.
/usr/X11R6/bin/xmms

% which which
which: shell built-in command.

This is a matter of which which the user is using. Here, the user used the which which is built into the C shell and doesn’t support the options used by the which utility. Where then is that which? Try the whereis command:

% whereis -b which
which: /usr/bin/which

Here, I used -b to search only for the binary. Without any switches, whereis will display the binary, the manpage path, and the path to the original sources.

If your users prefer to use the real which command instead of the shell version and if they are only interested in seeing binary paths, consider adding these lines to /usr/share/skel/dot.cshrc [Hack #9]:

alias which     /usr/bin/which -a
alias whereis   whereis -b

The -a switch will list all binaries with that name, not just the first binary found.

Finding Commands

How do you proceed when you know what it is that you want to do, but have no clue which commands are available to do it? I know I clung to the whatis command like a life preserver when I was first introduced to Unix. For example, when I needed to know how to set up PPP:

% whatis ppp
i4bisppp(4)              - isdn4bsd synchronous PPP over ISDN B-channel network driver
ng_ppp(4)                - PPP protocol netgraph node type
ppp(4)                   - point to point protocol network interface
ppp(8)                   - Point to Point Protocol (a.k.a. user-ppp)
pppctl(8)                - PPP control program
pppoed(8)                - handle incoming PPP over Ethernet connections
pppstats(8)              - print PPP statistics

On the days I had time to satisfy my curiosity, I tried this variation:

% whatis "(1)"

That will show all of the commands that have a manpage in section 1. If you’re rusty on your manpage sections, whatis intro should refresh your memory.

Finding Words

The previous commands are great for finding binaries and manpages, but what if you want to find a particular word in one of your own text files? That requires the notoriously user-unfriendly find command. Let’s be realistic. Even with all of your Unix experience, you still have to dig into either the manpage or a good book whenever you need to find something. Can you really expect novice users to figure it out?

To start with, the regular old invocation of find will find filenames, but not the words within those files. We need a judicious use of grep to accomplish that. Fortunately, find’s -exec switch allows it to run other utilities, such as grep, on each file it finds.

Start off with a find command that looks like this:

% find . -type f -exec grep "word" {} \;

This invocation says to start in the current directory (.), look through files, not directories (-type f), while running the grep command (-exec grep) in order to search for the word word. Note that the syntax of the -exec switch always resembles:

-exec command with_its_parameters {} \;

What happens if I search the files in my home directory for the word alias?

% find . -type f -exec grep "alias" {} \;
alias h                history 25
alias j                jobs -l
Antialiasing=true
Antialiasing arguments=-sDEVICE=x11 -dTextAlphaBits=4 -dGraphicsAlphaBits=2 
-dMaxBitmap=10000000
(proc-arg 0 "antialiasing" "Apply antialiasing (TRUE/FALSE)")
(proc-arg 0 "antialiasing" "Apply antialiasing (TRUE/FALSE)")

While it’s nice to see that find successfully found the word alias in my home directory, there’s one slight problem. I have no idea which file or files contained my search expression! However, adding /dev/null to that command will fix that:

% find . -type f -exec grep "alias" /dev/null {} \;
./.cshrc:alias h                history 25
./.cshrc:alias j                jobs -l
./.kde/share/config/kghostviewrc:Antialiasing=true
./.kde/share/config/kghostviewrc:Antialiasing arguments=-sDEVICE=x11 
-dTextAlphaBits=4 -dGraphicsAlphaBits=2 -dMaxBitmap=10000000
./.gimp-1.3/pluginrc:        (proc-arg 0 "antialiasing" "Apply antialiasing (TRUE/FALSE)")
./.gimp-1.3/pluginrc:        (proc-arg 0 "antialiasing" "Apply antialiasing (TRUE/FALSE)")

Why did adding nothing, /dev/null, automagically cause the name of the file to appear next to the line that contains the search expression? Is it because Unix is truly amazing? After all, it does allow even the state of nothingness to be expressed as a filename.

Actually, it works because grep will list the filename whenever it searches multiple files. When you just use {}, find will pass each filename it finds one at a time to grep. Since grep is searching only one file, it assumes you already know the name of that file. When you use /dev/null {}, find actually passes grep two files, /dev/null along with whichever file find happens to be working on. Since grep is now searching two files, it’s nice enough to tell you which of the files contained the search string. We already know /dev/null won’t contain anything, so we just convinced grep to give us the name of the other file.
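
The effect is easy to reproduce in isolation. The following is a minimal sketch, assuming mktemp is available for a throwaway directory; the filename one.txt is made up for the illustration.

```shell
#!/bin/sh
# Show that grep prefixes matches with a filename only when it is
# given more than one file to search.
demo_dir=$(mktemp -d)    # scratch directory (assumption: mktemp exists)
printf 'alias h history 25\n' > "$demo_dir/one.txt"

# One file named on the command line: grep prints the bare line.
single=$(grep "alias" "$demo_dir/one.txt")

# Two files (one of them the always-empty /dev/null): grep adds the name.
paired=$(grep "alias" /dev/null "$demo_dir/one.txt")

echo "$single"
echo "$paired"
rm -rf "$demo_dir"
```

The first echo shows the matching line alone; the second shows the same line prefixed with the path to one.txt.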

That’s pretty handy. Now let’s make it friendly. Here’s a very simple script called fstring:

% more ~/bin/fstring
#!/bin/sh
# script to find a string
# replaces $1 with user's search string
find . -type f -exec grep "$1" /dev/null {} \;

That $1 is a positional parameter. This script expects the user to give one parameter: the word the user is searching for. When the script executes, the shell will replace "$1" with the user’s search string. So, the script is meant to be run like this:

% fstring word_to_search

If you’re planning on using this script yourself, you’ll probably remember to include a search string. If you want other users to benefit from the script, you may want to include an if statement to generate an error message if the user forgets the search string:

#!/bin/sh
# script to find a string
# replaces $1 with user's search string
# or gives error message if user forgets to include search string
# -n with a quoted "$1" stays correct even if the string contains spaces
if test -n "$1"
then
   find . -type f -exec grep "$1" /dev/null {} \;
else
   echo "Don't forget to include the word you would like to search for" >&2
   exit 1
fi

Don’t forget to make your script executable with chmod +x and to place it in the user’s path. /usr/local/bin is a good location for other users to benefit.
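
If your find supports the POSIX -exec ... {} + form (an assumption worth checking on older systems), a variation can hand grep many filenames at once instead of starting grep once per file. This is only a sketch; the function name fstring_batch is invented for the example.

```shell
#!/bin/sh
# fstring_batch: hypothetical variant of fstring that uses find's
# "{} +" form, which passes filenames to grep in batches.
fstring_batch() {
    if [ -z "$1" ]; then
        echo "usage: fstring_batch word_to_search" >&2
        return 1
    fi
    # /dev/null still guarantees a filename prefix, even when a
    # batch happens to contain just one file.
    find . -type f -exec grep -- "$1" /dev/null {} +
}
```

Run it as fstring_batch alias from the directory you want to search. The output format is the same as that of the original script.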

Get the Most Out of grep

You may not know where its odd name originated, but you can’t argue the usefulness of grep.

Have you ever needed to find a particular file and thought, “I don’t recall the filename, but I remember some of its contents”? The oddly named grep command does just that, searching inside files and reporting on those that contain a given piece of text.

Finding Text

Suppose you wish to search your shell scripts for the text $USER. Try this:

% grep -s '$USER' *
add-user:if [ "$USER" != "root" ]; then
bu-user:  echo "  [-u user] - override $USER as the user to backup"
bu-user:if [ "$user" = "" ]; then user="$USER"; fi
del-user:if [ "$USER" != "root" ]; then
mount-host:mounted=$(df | grep "$ALM_AFP_MOUNT/$USER")
.....
mount-user:  echo "  [-u user] - override $USER as the user to backup"
mount-user:if [ "$user" = "" ]; then user="$USER"; fi

In this example, grep has searched through all files in the current directory, displaying each line that contained the text $USER. Use single quotes around the text to prevent the shell from interpreting special characters. The -s option suppresses error messages when grep encounters a directory.

Perhaps you only want to know the name of each file containing the text $USER. Use the -l option to create that list for you:

% grep -ls '$USER' *
add-user
bu-user
del-user
mount-host
mount-user

Searching by Relevance

What if you’re more concerned about how many times a particular string occurs within a file? That’s known as a relevance search. Use a command similar to:

% grep -sc '$USER' * | grep -v ':0$' | sort -t : -k 2 -nr
mount-host:6
mount-user:2
bu-user:2
del-user:1
add-user:1

How does this magic work? The -c flag lists each file with a count of matching lines, but it unfortunately includes files with zero matches. To counter this, I piped the output from grep into a second grep, this time searching for ':0$' (a count of zero at the end of the line) and using a second option, -v, to reverse the sense of the search by displaying lines that don’t match. The second grep reads from the pipe instead of a file, searching the output of the first grep.

For a little extra flair, I sorted the subsequent output by the second field of each line with sort -k 2, assuming a field separator of colon (-t :), comparing the counts as numbers (-n) so that 10 sorts above 9, and using -r to reverse the sort into descending order.
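
The pipeline can be wrapped in a small function for reuse. This is a sketch under a few assumptions: the name rank_matches is invented, filenames are assumed not to contain colons, and the counts are sorted numerically so that a file with 12 matches ranks above one with 3.

```shell
#!/bin/sh
# rank_matches PATTERN FILE...: hypothetical helper that lists files by
# their number of matching lines, highest count first.
rank_matches() {
    pattern=$1
    shift
    # /dev/null forces "name:count" output even for a single file;
    # ':0$' drops lines whose count is zero;
    # sort -n compares the counts as numbers rather than as text.
    grep -sc -- "$pattern" "$@" /dev/null |
        grep -v ':0$' |
        sort -t : -k 2 -nr
}
```

For example, rank_matches '$USER' * reproduces the search above for the current directory.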

Document Extracts

Suppose you wish to search a set of documents and extract a few lines of text centered on each occurrence of a keyword. This time we are interested in the matching lines and their surrounding context, but not in the filenames. Use a command something like this:

% grep -rhiw -A4 -B4 'preferences' *.txt > research.txt
% more research.txt

This grep command searches all files with the .txt extension for the word preferences. It performs a recursive search (-r) to include all subdirectories, hides (-h) the filename in the output, matches in a case-insensitive (-i) manner, and matches preferences as a complete word but not as part of another word (-w). The -A4 and -B4 options display the four lines immediately after and before the matched line, to give the desired context. Finally, I’ve redirected the output to the file research.txt.

You could also send the output straight to the vim text editor with:

% grep -rhiw -A4 -B4 'preferences' *.txt | vim -
Vim: Reading from stdin...

Tip

vim can be installed from /usr/ports/editors/vim.

Specifying vim - tells vim to read stdin (in this case the piped output from grep) instead of a file. Type :q! to exit vim.

To search files for several alternatives, use the -e option to introduce extra search patterns:

% grep -e 'text1' -e 'text2' *

Tip

Q. How did grep get its odd name?

A. grep was written as a standalone program to simulate a commonly performed command available in the ancient Unix editor ed. The command in question searched an entire file for lines containing a regular expression and displayed those lines. The command was g/re/p: globally search for a regular expression and print the line.

Using Regular Expressions

To search for text that is more vaguely specified, use a regular expression. grep understands both basic and extended regular expressions, though it must be invoked as either egrep or grep -E when given an extended regular expression. The text or regular expression to be matched is usually called the pattern.

Suppose you need to search for lines that end in a space or tab character. Try this command (to insert a tab, press Ctrl-V and then Ctrl-I, shown as <tab> in the example):

% grep -n '[ <tab>]$' test-file
2:ends in space 
3:ends in tab

I used the [...] construct to form a regular expression listing the characters to match: space and tab. The expression matches exactly one space or one tab character. $ anchors the match to the end of a line. The -n flag tells grep to include the line number in its output.

Alternatively, use:

% grep -n '[[:blank:]]$' test-file
2:ends in space 
3:ends in tab

Regular expressions provide many preformed character groups of the form [[:description:]]. Example groups include all control characters, all digits, or all alphanumeric characters. See man re_format for details.
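
To try the character class without hunting for a suitable file, you can build the test data on the fly. A minimal sketch, assuming mktemp is available; the sample lines are made up:

```shell
#!/bin/sh
tmp=$(mktemp)
# Three lines: no trailing blank, a trailing space, a trailing tab.
printf 'no trailing blank\nends in space \nends in tab\t\n' > "$tmp"

# Keep only the line numbers of the lines that end in a blank.
trailing=$(grep -n '[[:blank:]]$' "$tmp" | cut -d : -f 1 | tr '\n' ' ')
echo "lines with trailing blanks: $trailing"
rm -f "$tmp"
```

Only lines 2 and 3 are reported, since [[:blank:]] covers both the space and the tab.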

We can modify a previous example to search for either “preferences” or “preference” as a complete word, using an extended regular expression such as this:

% egrep -rhiw -A4 -B4 'preferences?' *.txt > research.txt

The ? symbol specifies zero or one of the preceding character, making the s of preferences optional. Note that I use egrep because ? is available only in extended regular expressions. If you wish to search for the ? character itself, escape it with a backslash, as in \?.

An alternative method uses an expression of the form ( string1 | string2 ), which matches either one string or the other:

% egrep -rhiw -A4 -B4 'preference(s|)' *.txt > research.txt

As a final example, use this to seek out all bash, tcsh, or sh shell scripts:

% egrep '^#\!/bin/(ba|tc|)sh[[:blank:]]*$' *

The caret (^) character at the start of a regular expression anchors it to the start of the line (much as $ at the end anchors it to the end). (ba|tc|) matches ba, tc, or nothing. The * character specifies zero or more of [[:blank:]], allowing trailing whitespace but nothing else. Note that the ! character must be escaped as \! to avoid shell interpretation in tcsh (but not in bash).

Tip

Here’s a handy tip for debugging regular expressions: if you don’t pass a filename to grep, it will read standard input, allowing you to enter lines of text to see which match. grep will echo back only matching lines.
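
The same idea works non-interactively: feed candidate lines to grep on standard input and count the matches. The sketch below is written for /bin/sh, where ! needs no escaping. It uses (ba|tc)? in place of the empty alternative (ba|tc|), on the assumption that not every regex implementation accepts an empty branch. The sample lines are made up.

```shell
#!/bin/sh
# Pattern under test: optional "ba" or "tc", then "sh", then only blanks.
shebang_re='^#!/bin/(ba|tc)?sh[[:blank:]]*$'

# Candidate lines, one per argument; only the first three should match.
count=$(printf '%s\n' '#!/bin/sh' '#!/bin/bash  ' '#!/bin/tcsh' \
                      '#!/bin/ksh' '#!/bin/sh -e' |
        grep -Ec "$shebang_re")
echo "matching lines: $count"
```

The ksh line fails because of the k, and the last line fails because -e is not whitespace.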

Combining grep with Other Commands

grep works well with other commands. For example, to display all tcsh processes:

% ps axww | grep -w 'tcsh'
saruman 10329  0.0  0.2    6416  1196  p1  Ss  Sat01PM  0:00.68 -tcsh (tcsh)
saruman 11351  0.0  0.2    6416  1300 std  Ss  Sat07PM  0:02.54 -tcsh (tcsh)
saruman 13360  0.0  0.0    1116     4 std  R+  10:57PM  0:00.00 grep -w tcsh
%

Notice that the grep command itself appears in the output. To prevent this, use:

% ps axww | grep -w '[t]csh'
saruman 10329  0.0  0.2    6416  1196  p1  Ss  Sat01PM  0:00.68 -tcsh (tcsh)
saruman 11351  0.0  0.2    6416  1300 std  Ss  Sat07PM  0:02.54 -tcsh (tcsh)
%

I’ll let you figure out how this works.

Manipulate Files with sed

If you’ve ever had to change the formatting of a file, you know that it can be a time-consuming process.

Why waste your time making manual changes to files when Unix systems come with many tools that can very quickly make the changes for you?

Removing Blank Lines

Suppose you need to remove the blank lines from a file. This invocation of grep will do the job:

% grep -v '^$' letter1.txt > tmp ; mv tmp letter1.txt

The pattern ^$ anchors to both the start and the end of a line with no intervening characters—the regexp definition of a blank line. The -v option reverses the search, printing all nonblank lines, which are then written to a temporary file, and the temporary file is moved back to the original.

Warning

grep must never output to the same file it is reading, or the file will end up empty.

You can rewrite the preceding example in sed as:

% sed '/^$/d' letter1.txt > tmp ; mv tmp letter1.txt

'/^$/d' is actually a sed script. sed’s normal mode of operation is to read each line of input, process it according to the script, and then write the processed line to standard output. In this example, /^$/ is a regular expression matching a blank line, and the trailing d is a sed function that deletes the line. Blank lines are deleted and all other lines are printed. Again, the results are redirected to a temporary file, which is then moved back over the original file.

Searching with sed

sed can also do the work of grep:

% sed -n '/$USER/p' *

This command will yield the same results as:

% grep '$USER' *

The -n (no-print, perhaps) option prevents sed from outputting each line. The pattern /$USER/ matches lines containing $USER, and the p function prints matched lines to standard output, overriding -n.
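
A quick way to convince yourself that the two commands agree is to run both against the same sample. This sketch assumes mktemp is available; the sample lines are invented.

```shell
#!/bin/sh
sample=$(mktemp)
# Single quotes keep the shell from expanding $USER in the sample text.
printf '%s\n' 'if [ "$USER" != "root" ]; then' 'echo done' > "$sample"

with_sed=$(sed -n '/$USER/p' "$sample")
with_grep=$(grep '$USER' "$sample")

[ "$with_sed" = "$with_grep" ] && echo "same output: $with_sed"
rm -f "$sample"
```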

Replacing Existing Text

One of the most common uses for sed is to perform a search and replace on a given string. For example, to change all occurrences of 2003 into 2004 in a file called date, include the two search strings in the format 's/oldstring/newstring/', like so:

% sed 's/2003/2004/' date
Copyright 2004
...
This was written in 2004, but it is no longer 2003.
...

Almost! Notice that the last 2003 remains unchanged. This is because without the g (global) flag, sed will change only the first occurrence on each line. This command will give the desired result:

% sed 's/2003/2004/g' date

Search and replace takes other flags too. To output only changed lines, use:

% sed -n 's/2003/2004/gp' date

Note the use of the -n flag to suppress normal output and the p flag to print changed lines.

Multiple Transformations

Perhaps you need to perform two or more transformations on a file. You can do this in a single run by specifying a script with multiple commands:

% sed 's/2003/2004/g;/^$/d' date

This performs both substitution and blank line deletion. Use a semicolon to separate the two commands.
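
Here is a self-contained sketch of the same two-command script, fed from printf rather than a file so it can be pasted into any Bourne-style shell. The input text is made up.

```shell
#!/bin/sh
# Substitute every 2003 and delete the blank line in a single pass.
cleaned=$(printf 'Copyright 2003\n\nWritten in 2003, not 2003.\n' |
          sed 's/2003/2004/g;/^$/d')
echo "$cleaned"
```

The output has two lines, both with every 2003 replaced, and the blank line is gone.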

Here is a more complex example that translates HTML tags of the form <font> into PHP bulletin board tags of the form [font]:

% cat index.html
<title>hello
</title>

% sed 's/<\(.*\)>/[\1]/g' index.html
[title]hello
[/title]

How did this work? The script searched for an HTML tag using the pattern '<.*>'. Angle brackets match literally. In a regular expression, a dot (.) represents any character and an asterisk (*) means zero or more of the previous item. Escaped parentheses, \( and \), capture the matched pattern lying between them and place it in a numbered buffer. In the replace string, \1 refers to the contents of the first buffer. Thus the text between the angle brackets in the search string is captured into the first buffer and written back inside square brackets in the replace string. sed takes full advantage of the power of regular expressions to copy text from the pattern to its replacement.

% cat index1.html
<title>hello</title>

% sed 's/<\(.*\)>/[\1]/g' index1.html
[title>hello</title]

This time the same command fails because the pattern .* is greedy and grabs as much as it can, matching up to the second >. To prevent this behavior, we need to match zero or more of any character except <. Recall that [...] is a regular expression that lists characters to match, but if the first character is the caret (^), the match is reversed. Thus the regular expression [^<] matches any single character other than <. I can modify the previous example as follows:

% sed 's/<\([^<]*\)>/[\1]/g' index1.html
[title]hello[/title]
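
The greedy and bounded patterns can be compared side by side. This is a small sketch written for /bin/sh; the one-line input stands in for index1.html.

```shell
#!/bin/sh
line='<title>hello</title>'

# .* is greedy: the capture runs from the first < to the last >.
greedy=$(printf '%s\n' "$line" | sed 's/<\(.*\)>/[\1]/g')

# [^<]* cannot cross another <, so each tag is captured separately.
bounded=$(printf '%s\n' "$line" | sed 's/<\([^<]*\)>/[\1]/g')

echo "greedy:  $greedy"
echo "bounded: $bounded"
```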

Remember, grep will perform a case-insensitive search if you provide the -i flag. Standard sed, unfortunately, does not have such an option (some implementations, such as GNU sed, add an I flag to the s command as an extension). To search for title in a portable, case-insensitive manner, form regular expressions using [...], each listing a character of the word in both upper- and lowercase forms:

% sed 's/[Tt][Ii][Tt][Ll][Ee]/title/g' title.html

Format Text at the Command Line

Combine basic Unix tools to become a formatting expert.

Don’t let the syntax of the sed command scare you off. sed is a powerful utility capable of handling most of your formatting needs. For example, have you ever needed to add or remove comments from a source file? Perhaps you need to shuffle some text from one section to another.

In this hack, I’ll demonstrate how to do that. I’ll also show some handy formatting tricks using two other built-in Unix commands, tr and col.

Adding Comments to Source Code

sed allows you to specify an address range using a pattern, so let’s put this to use. Suppose we want to comment out a block of text in a source file by adding // to the start of each line we wish to comment out. We might use a text editor to mark the block with bc-start and bc-end:

% cat source.c
  if (tTd(27, 1))
    sm_dprintf("%s (%s, %s) aliased to %s\n",
        a->q_paddr, a->q_host, a->q_user, p);
  bc-start
    if (bitset(EF_VRFYONLY, e->e_flags))
  {
    a->q_state = QS_VERIFIED;
    return;
  }
  bc-end
  message("aliased to %s", shortenstring(p, MAXSHORTSTR));

and then apply a sed script such as:

% sed '/bc-start/,/bc-end/s/^/\/\//' source.c

to get:

if (tTd(27, 1))
    sm_dprintf("%s (%s, %s) aliased to %s\n",
        a->q_paddr, a->q_host, a->q_user, p);
  //bc-start
  //  if (bitset(EF_VRFYONLY, e->e_flags))
  //  {
  //      a->q_state = QS_VERIFIED;
  //      return;
  //  }
  //bc-end
message("aliased to %s", shortenstring(p, MAXSHORTSTR));

The script used search and replace to add // to the start of all lines (s/^/\/\//) that lie between the two markers (/bc-start/,/bc-end/). This will apply to every block in the file between the marker pairs. Note that in the sed script, the / character has to be escaped as \/ so it is not mistaken for a delimiter.
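
If the run of slashes is hard to read, note that sed's s function accepts a delimiter other than /. The sketch below uses | so that // can be written literally. The five-line input is a made-up stand-in for source.c.

```shell
#!/bin/sh
# Comment out everything from bc-start through bc-end, using | as the
# delimiter of the s function so the slashes need no escaping.
commented=$(printf 'keep\nbc-start\nx = 1;\nbc-end\nkeep too\n' |
            sed '/bc-start/,/bc-end/s|^|//|')
echo "$commented"
```

Only the three lines inside the marked block gain a leading //. The first and last lines pass through untouched.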

Removing Comments

When we need to delete the comments and the two bc- lines (let’s assume that the edited contents were copied back to source.c), we can use a script such as:

% sed '/bc-start/d;/bc-end/d;/bc-start/,/bc-end/s/^\/\///' source.c

Oops! My first attempt won’t work. The bc- lines must be deleted after they have been used as address ranges. Trying again we get:

% sed '/bc-start/,/bc-end/s/^\/\///;/bc-start/d;/bc-end/d' source.c

If you want to leave the two bc- marker lines in but comment them out, use this piece of trickery:

% sed '/bc-start/,/bc-end/{/^\/\/bc-/\!s/\/\///;}' source.c

to get:

if (tTd(27, 1))
    sm_dprintf("%s (%s, %s) aliased to %s\n",
        a->q_paddr, a->q_host, a->q_user, p);
  //bc-start
if (bitset(EF_VRFYONLY, e->e_flags))
{

    a->q_state = QS_VERIFIED;
    return;

}
  //bc-end
message("aliased to %s", shortenstring(p, MAXSHORTSTR));

Note that in the bash shell you must use:

% sed '/bc-start/,/bc-end/{/^\/\/bc-/!s/\/\///;}' source.c

because the bang character (!) does not need to be escaped as it does in tcsh.

What’s with the curly braces? They prevent a common mistake. You may imagine that this example:

% sed -n '/$USER/p;p' *

prints each line containing $USER twice because of the p;p commands. It doesn’t, though, because the second p is not restrained by the /$USER/ line address and therefore applies to every line. To print twice just those lines containing $USER, use:

% sed -n '/$USER/p;/$USER/p' *

or:

% sed -n '/$USER/{p;p;}' *

The construct {...} introduces a function list that applies to the preceding line address or range.
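
Counting output lines makes the difference visible. In this sketch the two-line input is made up, and only its first line contains the text $USER.

```shell
#!/bin/sh
# Two input lines; only the first contains the text $USER.
input='user is $USER
plain line'

# The second p has no address, so it also prints the non-matching line.
loose=$(printf '%s\n' "$input" | sed -n '/$USER/p;p' | wc -l | tr -d ' ')

# Both p commands are grouped under the /$USER/ address.
grouped=$(printf '%s\n' "$input" | sed -n '/$USER/{p;p;}' | wc -l | tr -d ' ')

echo "loose: $loose lines, grouped: $grouped lines"
```

The ungrouped form emits three lines (the match twice, plus the plain line once), while the grouped form emits only the two copies of the matching line.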

A line address followed by ! (or \! in the tcsh shell) reverses the address range, and so the function (list) that follows is applied to all lines not matching. The net effect is to remove // from all lines that don’t start with //bc- but that do lie within the bc- markers.

Using the Holding Space to Mark Text

sed reads input into the pattern space, but it also provides a buffer (called the holding space) and functions to move text from one space to the other. All other functions (such as s and d) operate on the pattern space, not the holding space.

Check out this sed script:

% cat case.script 
# Sed script for case insensitive search
#
# copy pattern space to hold space to preserve it
h
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
# use a regular expression address to search for lines containing:
/test/ {
i\
vvvv
a\
^^^^
}
# restore the original pattern space from the hold space
x;p

First, I have written the script to a file instead of typing it in on the command line. Lines starting with # are comments and are ignored. Other lines specify a sed command, and commands are separated by either a newline or ; character. sed reads one line of input at a time and applies the whole script file to each line. The following functions are applied to each line as it is read:

h: Copies the pattern space (the line just read) into the holding space.
y/ABC/abc/: Operates on the pattern space, translating A to a, B to b, and C to c and so on, ensuring the line is all lowercase.
/test/ {...}: Matches the line just read if it includes the text test (whatever the original case, because the line is now all lowercase) and then applies the list of functions that follow. This example inserts text before (i) and appends text after (a) the matched line to highlight it.
x: Exchanges the pattern and hold space, thus restoring the original contents of the pattern space.
p: Prints the pattern space.

Here is the test file:

% cat case
This contains text         Hello
that we want to            TeSt
search for, but in         test
a case insensitive         XXXX 
manner using the sed       TEST
editor.                    Bye bye.
%

Here are the results of running our sed script on it:

% sed -n -f case.script case
This contains text         Hello
vvvv
that we want to            TeSt
^^^^
vvvv
search for, but in         test
^^^^
a case insensitive         XXXX 
vvvv
manner using the sed       TEST
^^^^
editor.                    Bye bye.

Notice the vvvv and ^^^^ markers around lines that contain test.

Translating Case

The tr command can translate one character to another. To change the contents of case into all lowercase and write the results to file lower-case, we could use:

% tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' < case > lower-case

tr works with standard input and output only, so to read and write files we must use redirection.

Translating Characters

To translate carriage return characters into newline characters, we could use:

% tr \\r \\n < cr > lf

where cr is the original file and lf is a new file containing line feeds in place of carriage returns. \n represents a line feed character, but we must escape the backslash character in the shell, so we use \\n instead. Similarly, a carriage return is specified as \\r.

Removing Duplicate Line Feeds

tr can also squeeze multiple consecutive occurrences of a particular character into a single occurrence. For example, to remove duplicate line feeds from the lines file:

% tr -s \\n < lines > tmp ; mv tmp lines

Here we use the tmp file trick again because tr, like grep and sed, will trash the input file if it is also the output file.
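
An alternative to doubling the backslash is to quote the escape sequence, which hands \r and \n to tr untouched. The following sketch uses printf to generate made-up input, so no files are overwritten.

```shell
#!/bin/sh
# '\r' and '\n' in single quotes reach tr unchanged, so tr itself
# interprets the escape sequences.
unix_text=$(printf 'one\rtwo\rthree\r' | tr '\r' '\n')
echo "$unix_text"

# -s squeezes each run of newlines down to a single newline.
squeezed=$(printf 'one\n\n\ntwo\n\nthree\n' | tr -s '\n')
echo "$squeezed"
```

Both commands print the words one, two, and three on separate lines with nothing in between.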

Deleting Characters

tr can also delete selected characters. If, for instance, you hate vowels, run your documents through this:

% tr -d aeiou < file

Translating Tabs to Spaces

To translate tabs into multiple spaces, use col with the -x flag:

% cat tabs
col     col     col

% od -x tabs
0000000     636f    6c09    636f    6c09    636f    6c0a    0a00        
0000015

% col -x < tabs > spaces
% cat spaces
col     col     col

% od -x spaces
0000000     636f    6c20    2020    2020    636f    6c20    2020    2020
0000020     636f    6c0a    0a00                                        
0000025

In this example I have used od -x to octal dump in hexadecimal the contents of the before and after files, which shows more clearly that the translation has worked. (09 is the code for Tab and 20 is the code for Space.)

Delimiter Dilemma

Deal with double quotation marks in delimited files.

Importing data from a delimited text file into an application is usually painless. Even if you need to change the delimiter from one character to another (from a comma to a colon, for example), you can choose from many tools that perform simple character substitution with great ease.

However, one common situation is not solved as easily: many business applications export data into a space- or comma-delimited file, enclosing individual fields in double quotation marks. These fields often contain the delimiter character. Importing such a file into an application that processes only one delimiter (PostgreSQL for example) may result in an incorrect interpretation of the data. This is one of those situations where the user should feel lucky if the process fails.

One solution is to write a script that tracks the use of double quotes to determine whether it is working within a text field. This is doable by creating a variable that acts as a text/nontext switch for the character substitution process. The script should change the delimiter to a more appropriate character, leave the delimiters that were enclosed in double quotes unchanged, and remove the double quotes. Rather than make the changes to the original datafile, it’s safer to write the edited data to a new file.

Attacking the Problem

The following algorithm meets our needs:

Create the switch variable and assign it the value of 1, meaning “nontext”. We’ll declare the variable tswitch and define it as tswitch = 1.
Create a variable for the delimiter and define it. We’ll use the variable delim with a space as the delimiter, so delim = ' '.
Decide on a better delimiter. We’ll use the tab character, so new_delim = '\t'.
Open the datafile for reading.
Open a new file for writing.

Now, for every character in the datafile:

Read a character from the datafile.
If the character is a double quotation mark, tswitch = tswitch * -1.
If the character equals the character in delim and tswitch equals 1, write new_delim to the new file.
If the character equals that in delim and tswitch equals -1, write the value of delim to the new file.
If the character is anything else, write the character to the new file.

The Code

The Python script redelim.py implements the preceding algorithm. It prompts the user for the original datafile and a name for the new datafile. The delim and new_delim variables are hardcoded, but those are easily changed within the script.

This script copies a space-delimited text file with text values in double quotes to a new, tab-delimited file without the double quotes. The advantage of using this script is that it leaves spaces that were within double quotes unchanged.

There are no command-line arguments for this script. The script will prompt the user for source and destination file information.

You can redefine the variables for the original and new delimiters, delim and new_delim, in the script as needed.

#!/usr/bin/env python3
# redelim.py: change text file delimiters (written for Python 3).

print("Change text file delimiters.")

# Ask user for source and target files.
sourcefile = input('Please enter the path and name of the source file: ')
targetfile = input('Please enter the path and name of the target file: ')

# The variable 'tswitch' acts as a text/non-text switch that reminds python
# whether it is working within a text or non-text data field.
tswitch = 1

# If the source delimiter that you want to change is not a space,
# redefine the variable delim in the next line.
delim = ' '

# If the new delimiter that you want to use is not a tab,
# redefine the variable new_delim in the next line.
new_delim = '\t'

# Open files for reading and writing; 'with' closes both files on exit.
with open(sourcefile, 'r') as source, open(targetfile, 'w') as dest:
    for charn in source.read():
        if tswitch == 1:
            # Outside quotes: swap delimiters and drop the opening quote.
            if charn == delim:
                dest.write(new_delim)
            elif charn == '"':
                tswitch = tswitch * -1
            else:
                dest.write(charn)
        elif tswitch == -1:
            # Inside quotes: copy everything except the closing quote.
            if charn == '"':
                tswitch = tswitch * -1
            else:
                dest.write(charn)

Use of redelim.py assumes that you have installed Python, which is available through the ports collection or as a binary package. The script uses only Python’s built-in functions, so there is nothing extra to install.

Hacking the Hack

If you prefer working with Perl, DBD::AnyData is another good solution to this problem.
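
If you would rather stay with Python, the standard csv module can do similar work. Note that this is a different technique from the tswitch loop: it applies csv's own quoting rules, so its output may differ from redelim.py on unusual input (for example, a field that itself contains a tab would be quoted again). A minimal sketch, with redelimit_csv as a made-up name:

```python
import csv
import io


def redelimit_csv(text, delim=" ", new_delim="\t"):
    """Re-delimit text using the csv module instead of a hand-written switch."""
    # The reader strips the double quotes and keeps quoted spaces intact.
    reader = csv.reader(io.StringIO(text), delimiter=delim, quotechar='"')
    out = io.StringIO()
    # The writer joins the fields with the new delimiter.
    writer = csv.writer(out, delimiter=new_delim, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


print(repr(redelimit_csv('101 "Main Street" 84101\n')))
```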

DOS Floppy Manipulation

Bring simplicity back to using floppies.

If you’re like many Unix users, you originally came from a Windows background. Remember your initial shock the first time you tried to use a floppy on a Unix system? Didn’t Windows seem so much simpler? Forever gone seemed the days when you could simply insert a floppy, copy some files over, and remove the disk from the drive. Instead, you were expected to plunge into the intricacies of the mount command, only to discover that you didn’t even have the right to use the floppy drive in the first place!

There are several ways to make using floppies much, much easier on your FreeBSD system. Let’s start by taking stock of the default mechanisms for managing floppies.

Mounting a Floppy

Suppose I have formatted a floppy on a Windows system, copied some files over, and now want to transfer those files to my FreeBSD system. In reality, that floppy is a storage medium. Since it is storing files, it needs a filesystem in order to keep track of the locations of those files. Because that floppy was formatted on a Windows system, it uses a filesystem called FAT12.

In Unix, a filesystem can’t be accessed until it has been mounted. This means you have to use the mount command before you can access the contents of that floppy. While this may seem strange at first, it actually gives Unix more flexibility. An administrator can mount and unmount filesystems as they are needed. Note that I used the word administrator. Regular users don’t have this ability, by default. We’ll change that shortly.

Unix also has the additional flexibility of being able to mount different filesystems. In Windows, a floppy will always contain the FAT12 filesystem. BSD understands floppies formatted with either FAT12 or UFS, the Unix File System. As you might expect from the name, the UFS filesystem is assumed unless you specify otherwise.

For now, become the superuser and let’s pick apart the default invocation of the mount command:

% su
Password:
# mount -t msdos /dev/fd0 /mnt
#

I used the type (-t) switch to indicate that this floppy was formatted from an msdos-based system. I could have used the mount_msdosfs command instead:

# mount_msdosfs /dev/fd0 /mnt

Both commands take two arguments. The first indicates the device to be mounted. /dev/fd0 represents the first (0) floppy drive (fd) device (/dev).

The second argument represents the mount point. A mount point is simply an empty directory that acts as a pointer to the mounted filesystem. Your FreeBSD system comes with a default mount point called /mnt. If you prefer, create a different mount point with a more useful name. Just remember to keep that directory empty so it will be available as a mount point, because any files in your mount point will become hidden and inaccessible when you mount a device over it.

Tip

This can be a feature in itself if you have a filesystem that should always be mounted. Place a README file in /mnt/important_directory containing: “If you can see this file, contact the administrator at this number . . . .”

In this example, I’ll create a mount point called /floppy, which I’ll use in the rest of the examples in this hack:

# mkdir /floppy

Common Error Messages

This is a good place to explain some common error messages. Trust me, I experienced them all before I became proficient at this whole mount business. At the time, I wished for a listing of error messages so I could figure out what I had done wrong and how to fix it.

Let’s take a look at the output of this command:

# mount /dev/fd0 /mnt
mount: /dev/fd0 on /mnt: incorrect super block

Remember my first mount command? I know it worked, as I just received my prompt back. I know this command didn’t work, because mount instead wrote me a message explaining why it did not do what I asked.

That error message isn’t actually as bad as it sounds. I forgot to include the type switch, meaning mount assumed I was using UFS. Since this is a FAT12 floppy, it simply didn’t understand the filesystem.

This error message also looks particularly nasty:

fd0: hard error cmd=read fsbn 0 of 0-3 (No status)
msdosfs: /dev/fd0: Input/output error

If you get that one, quickly reach down and push in the floppy before anyone else notices. You forgot to insert it into the bay.

Here’s another error message:

msdosfs: /dev/fd0: Operation not permitted

Oops. Looks like I didn’t become the superuser before trying that mount command.

How about this one:

mount: /floppy: No such file or directory

Looks like I forgot to make that mount point first. A mkdir /floppy should fix that one.
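
Here is the listing I once wished for, written as a small lookup table. It is a sketch built only from the four messages above; the names KNOWN_ERRORS and diagnose are hypothetical, and the substring matching is deliberately naive.

```python
# Each entry pairs a fragment of an error message with its likely cause.
KNOWN_ERRORS = [
    ("incorrect super block",
     "Filesystem type missing; add -t msdos for a FAT12 floppy."),
    ("Input/output error",
     "No readable floppy in the drive; insert it and try again."),
    ("Operation not permitted",
     "Not the superuser; use su before mounting."),
    ("No such file or directory",
     "Mount point missing; create it with mkdir."),
]


def diagnose(message):
    """Return a hint for the first known fragment found in message."""
    for fragment, hint in KNOWN_ERRORS:
        if fragment in message:
            return hint
    return "Unrecognized message; see man mount."


print(diagnose("mount: /dev/fd0 on /mnt: incorrect super block"))
```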

The one error message you do not want to see is a system panic followed by a reboot. It took me a while to break myself of the habit of just ejecting a floppy once I had copied over the files I wanted. That’s something you just don’t do in Unix land.

You must first warn your operating system that you have finished using a filesystem before you physically remove it from the computer. Otherwise, when it goes out looking for a file, it will panic when it realizes that it has just disappeared off of the edge of the universe! (Well, the computer’s universe anyway.) Put yourself in your operating system’s shoes for a minute. The user entrusted something important to your care. You blinked for just a split second and it was gone, nowhere to be found. You’d panic too!

Managing the Floppy

How do you warn your operating system that the universe has shrunk? You unmount the floppy before you eject it from the floppy bay. Note that the actual command used is missing the first n and is instead spelled umount:

# umount /floppy

Also, the only argument is the name of your mount point. In this example, it’s /floppy.

How can you tell if a floppy is mounted? The disk free command will tell you:

# df
Filesystem  1K-blocks    Used   Avail Capacity  Mounted on
/dev/ad0s1a    257838   69838  167374    29%    /
devfs               1       1       0   100%    /dev
/dev/ad0s1e    257838     616  236596     0%    /tmp
/dev/ad0s1f  13360662 2882504 9409306    23%    /usr
/dev/ad0s1d    257838   28368  208844    12%    /var
/dev/fd0         1424       1    1423     0%    /floppy

as will the mount command with no arguments:

# mount
/dev/ad0s1a on / (ufs, local)
devfs on /dev (devfs, local)
/dev/ad0s1e on /tmp (ufs, local, soft-updates)
/dev/ad0s1f on /usr (ufs, local, soft-updates)
/dev/ad0s1d on /var (ufs, local, soft-updates)
/dev/fd0 on /floppy  (msdosfs, local)

This system currently has a floppy /dev/fd0 mounted on /floppy, meaning you’ll need to issue the umount command before ejecting the floppy.
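
If you want a script to make that check for you, the output of mount is regular enough to parse. The sketch below assumes every line has the "device on mountpoint (options)" shape shown above; parse_mount is a hypothetical helper and SAMPLE is a shortened copy of my output.

```python
import re

# device, the word "on", mount point, then a parenthesized option list
MOUNT_LINE = re.compile(r"^(?P<device>\S+) on (?P<point>\S+)\s+\((?P<opts>[^)]*)\)")


def parse_mount(output):
    """Return {mount_point: (device, [options])} from mount output."""
    table = {}
    for line in output.splitlines():
        m = MOUNT_LINE.match(line.strip())
        if m:
            opts = [o.strip() for o in m.group("opts").split(",")]
            table[m.group("point")] = (m.group("device"), opts)
    return table


SAMPLE = """\
/dev/ad0s1a on / (ufs, local)
devfs on /dev (devfs, local)
/dev/fd0 on /floppy  (msdosfs, local)
"""

mounted = parse_mount(SAMPLE)
print("/floppy" in mounted)    # True: remember to umount before ejecting
print(mounted["/floppy"])
```

On a live system you could feed it subprocess.run(["mount"], capture_output=True, text=True).stdout instead of SAMPLE.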

Several other filesystems are also mounted, yet I only used the mount command on my floppy drive. When did they get mounted and how? The answer is in /etc/fstab, which controls which filesystems to mount at boot time. Here’s my /etc/fstab; it’s pretty similar to the earlier output from df:

# more /etc/fstab
# Device     Mountpoint          FStype       Options    Dump  Pass#
/dev/ad0s1b  none                swap         sw         0     0
/dev/ad0s1a  /                   ufs          rw         1     1
/dev/ad0s1e  /tmp                ufs          rw         2     2
/dev/ad0s1f  /usr                ufs          rw         2     2
/dev/ad0s1d  /var                ufs          rw         2     2
/dev/acd0    /cdrom              cd9660       ro,noauto  0     0
proc         /proc               procfs       rw         0     0
linproc      /compat/linux/proc  linprocfs    rw         0     0

Each mountable filesystem has its own line in this file. Each has its own unique mount point and its filesystem type listed. See how the /cdrom mount point has the options ro,noauto instead of rw? The noauto tells your system not to mount your CD-ROM at bootup. That is a good thing—if there’s no CD in the bay at boot time, the kernel will either give an error message or pause for a few seconds, looking for that filesystem.
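
To make the six columns concrete, here is a rough sketch that splits fstab lines into named fields and lists the entries marked noauto. It assumes every entry has all six fields, as mine do; parse_fstab is a hypothetical name and SAMPLE is an excerpt of the file above.

```python
def parse_fstab(text):
    """Split fstab text into a list of dicts, one per filesystem entry."""
    fields = ("device", "mountpoint", "fstype", "options", "dump", "passno")
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        if len(parts) != len(fields):
            continue                           # this sketch skips short lines
        entry = dict(zip(fields, parts))
        entry["options"] = entry["options"].split(",")
        entries.append(entry)
    return entries


SAMPLE = """\
# Device     Mountpoint          FStype       Options    Dump  Pass#
/dev/ad0s1b  none                swap         sw         0     0
/dev/ad0s1a  /                   ufs          rw         1     1
/dev/acd0    /cdrom              cd9660       ro,noauto  0     0
"""

entries = parse_fstab(SAMPLE)
not_at_boot = [e["mountpoint"] for e in entries if "noauto" in e["options"]]
print(not_at_boot)
```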

However, you can mount a data CD-ROM at any time by simply typing:

# mount /cdrom

That command was shorter than the usual mount command for one reason: there was an entry for /cdrom in /etc/fstab. That means you can shorten the command to mount a floppy by creating a similar entry for /floppy. Simply add this line to /etc/fstab:

/dev/fd0    /floppy    msdos    rw,noauto    0    0

Test your change by inserting a floppy and issuing this command:

# mount /floppy

If you receive an error, check /etc/fstab for a typo and try again.

Allowing Regular Users to Mount Floppies

Now that the superuser can quickly mount floppies, let’s give regular users this ability. First, we have to change the default setting of the vfs.usermount variable:

# sysctl vfs.usermount=1
vfs.usermount: 0 -> 1

By changing the default 0 to a 1, we’ve just enabled regular users to mount filesystems. However, don’t worry about your users running amok with this new freedom: the devices themselves are still owned by root. Check out the permissions on the floppy device:

# ls -l /dev/fd0
crw-r-----  1 root  operator   9,  0 Nov 28 08:31 /dev/fd0

If you’d like any user to have the right to mount a floppy, change the permissions so everyone has read and write access:

# chmod 666 /dev/fd0

Tip

Now, if you don’t want every user to have this right, you could create a group, add the desired users to that group, and assign that group permissions to /dev/fd0.
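
If permission strings are hard to read, Python's stat.filemode translates an octal mode into the form that ls -l prints. The sketch below only formats numbers; it does not touch /dev/fd0. The 660 value for the group approach is my assumption about what you would choose, and describe is a made-up helper name.

```python
import stat


def describe(mode_bits):
    """Show mode_bits the way ls -l would for a character device."""
    return stat.filemode(stat.S_IFCHR | mode_bits)


print(describe(0o640))   # the default shown by ls -l above
print(describe(0o666))   # after chmod 666: everyone can read and write
print(describe(0o660))   # assumed group-only alternative
```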

You’re almost there. The only kicker is that the user has to own the mount point. The best place to put a user’s mount point is in his home directory. So, logged in as your usual user account:

% mkdir ~/floppy

Now, do you think the mount command will recognize that new mount point?

% mount ~/floppy
mount: /home/dru/floppy: unknown special file or file system

Oh boy. Looks like we’re back to square one, doesn’t it? Remember, that entry in /etc/fstab only refers to root’s mount point, so I can’t use that shortcut to refer to my own mount point. While it’s great to have the ability to use the mount command, I’m truly too lazy to have to type out mount -t msdos /dev/fd0 ~/floppy, let alone remember it.

Thank goodness for aliases. Try adding these lines to the alias section of your ~/.cshrc file:

alias mf    mount -t msdos /dev/fd0 ~/floppy
alias uf    umount ~/floppy

Now you simply need to type mf whenever you want to mount a floppy and uf when it’s time to unmount the floppy. Or perhaps you’ll prefer to create a keyboard shortcut [Hack #4].

Formatting Floppies

Now that you can mount and unmount floppies with the best of them, it’s time to learn how to format them. Again, let’s start with the default invocations required to format a floppy, then move on to some ways to simplify the process.

When you format a floppy on a Windows or DOS system, several events occur:

The floppy is low-level formatted, marking the tracks and sectors onto the disk.
A filesystem is installed onto the floppy, along with two copies of its FAT table.
You are given the opportunity to give the floppy a volume label.

The same process also has to occur when you format a floppy on a FreeBSD system. On a 5.x system, the order goes like this:

% fdformat -f 1440 /dev/fd0
Format 1440K floppy `/dev/fd0'? (y/n): y
Processing ----------------------------------------

% bsdlabel -w /dev/fd0 fd1440

% newfs_msdos /dev/fd0
/dev/fd0: 2840 sectors in 355 FAT12 clusters (4096 bytes/cluster)
bps=512 spc=8 res=1 nft=2 rde=512 sec=2880 mid=0xf0 spf=2 spt=18 hds=2 hid=0

First, notice that we don’t use the mount command. You can’t mount a filesystem before you have a filesystem! (You do have to have the floppy in the drive, though.) Take a look at the three steps:

fdformat does the low-level format.
bsdlabel creates the volume label.
newfs_msdos installs the FAT12 filesystem.
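
The numbers that newfs_msdos prints can be checked by hand. As I read its output, bps is bytes per sector, spc is sectors per cluster, res is reserved sectors, nft is the number of FATs, rde is root directory entries, sec is total sectors, and spf is sectors per FAT. The 32-byte directory entry size is a property of FAT that does not appear in the output, so treat it as an assumption of this sketch.

```python
# Values copied from the newfs_msdos output above.
bps, spc, res, nft, rde, sec, spf = 512, 8, 1, 2, 512, 2880, 2
DIR_ENTRY_BYTES = 32                                # assumed FAT entry size

root_dir_sectors = rde * DIR_ENTRY_BYTES // bps     # sectors for the root directory
overhead = res + nft * spf + root_dir_sectors       # sectors not available for data
clusters = (sec - overhead) // spc                  # whole clusters that fit
data_sectors = clusters * spc

print(clusters, data_sectors)     # 355 2840
print(bps * spc)                  # 4096 bytes per cluster
print(sec * bps // 1024)          # 1440 KB, the size passed to fdformat
```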

If I see the following error message when I try to mount the floppy, I’ll realize that I forgot that third step:

% mf 
msdosfs: /dev/fd0: Invalid argument

Because my mf mount floppy alias uses the msdos filesystem, it will complain if the floppy isn’t formatted with FAT12.

Automating the Format Process

Any three-step process is just begging to be put into a shell script. I like to keep these scripts under ~/bin. If you don’t have this directory yet, create it. Then create a script called ff (for format floppy):

% cd
% mkdir bin
% cd bin
% vi ff
#!/bin/sh
#this script formats a floppy with FAT12
#that floppy can also be used on a Windows system

# first, remind the user to insert the floppy
echo "Please insert the floppy and press enter"
read pathname

# then, proceed with the three format steps

fdformat -f 1440 /dev/fd0
bsdlabel -w /dev/fd0 fd1440
newfs_msdos /dev/fd0
echo "Format complete."

Note that this script is basically those three commands, with comments thrown in so I remember what the script does. The only new part is the read pathname line. I added it to force the user to press Enter before the script proceeds.
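
One weakness of ff is that it carries on to the next command even if a step fails. If that worries you, here is a sketch of the same three steps in Python that stops at the first failure. It assumes the same commands are in your path; format_floppy and its run parameter are hypothetical names, and run exists only so the function can be tried without a floppy drive.

```python
import subprocess


def format_floppy(device="/dev/fd0", run=subprocess.run):
    """Run the three format steps in order, stopping at the first failure."""
    steps = [
        ["fdformat", "-f", "1440", device],    # low-level format
        ["bsdlabel", "-w", device, "fd1440"],  # write the label
        ["newfs_msdos", device],               # install the FAT12 filesystem
    ]
    for cmd in steps:
        # check=True raises CalledProcessError on a nonzero exit status,
        # so later steps never run against a half-formatted floppy.
        run(cmd, check=True)
    return steps
```

Calling format_floppy( ) with no arguments runs the real commands, so try it only with a scratch floppy in the drive.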

Remember to make the script executable:

% chmod +x ff

I’ll then return to my home directory and see how it works. Since I use the C shell, I’ll use the rehash command to make the shell aware that there is a new executable in my path:

% cd
% rehash
% ff
Please insert the floppy and press enter

Format 1440K floppy `/dev/fd0'? (y/n): y
Processing ----------------------------------------
/dev/fd0: 2840 sectors in 355 FAT12 clusters (4096 bytes/cluster)
bps=512 spc=8 res=1 nft=2 rde=512 sec=2880 mid=0xf0 spf=2 spt=18 hds=2 hid=0
Format complete.

Not too bad. I can now manipulate floppies with my own custom mf, uf, and ff commands.

Access Windows Shares Without a Server

Share files between Windows and FreeBSD with a minimum of fuss.

You’ve probably heard of some of the Unix utilities available for accessing files residing on Microsoft systems. For example, FreeBSD provides the mount_smbfs and smbutil utilities to mount Windows shares and view or access resources on a Microsoft network. However, both of those utilities have a caveat: they require an SMB server. The assumption is that somewhere in your network there is at least one NT or 2000 Server.

Not all networks have the budget or the administrative expertise to allow for commercial server operating systems. Sure, you can install and configure Samba, but isn’t that overkill for, say, a home or very small office network? Sometimes you just want to share some files between a Windows 9x system and a Unix system. It’s a matter of using the right-sized tool for the job. You don’t bring in a backhoe to plant flowers in a window box.

Installing and Configuring Sharity-Light

If your small network contains a mix of Microsoft and Unix clients, consider installing Sharity-Light on the Unix systems. This application allows you to mount a Windows share from a Unix system. FreeBSD provides a port for this purpose (see the Sharity-Light web site for other supported platforms):

# cd /usr/ports/net/sharity-light
# make install clean

Since Sharity-Light is a command-line utility, you should be familiar with UNC or the Universal Naming Convention. UNC is how you refer to Microsoft shared resources from the command line. A UNC looks like \\NetBIOSname\sharename. It starts with double backslashes, then contains the NetBIOS name of the computer to access and the name of the share on that computer.

Before using Sharity-Light, you need to know the NetBIOS names of the computers you wish to access. If you have multiple machines running Microsoft operating systems, the quickest way to view each system’s name is with nbtstat. From one of the Windows systems, open a command prompt and type:

C:\> nbtstat -A 192.168.2.10

       NETBIOS Remote Machine Name Table

   Name        Type        Status
-----------------------------------------
LITTLE_WOLF  <00> UNIQUE    Registered
<snip>

Repeat for each IP address in your network. Your output will be several lines long, but the entry (usually the first) containing <00> is the one with the name you’re interested in. In this example, LITTLE_WOLF is the NetBIOS name associated with 192.168.2.10.

Tip

Even though nbtstat /? indicates that -A is used to view a remote system, it also works with the IP address of the local system. This allows you to check all of the IP addresses in your network from the same system.

Once you know which IP addresses are associated with which NetBIOS names, you’ll need to add that information to /etc/hosts on your Unix systems:

# more /etc/hosts
127.0.0.1          localhost
192.168.2.95       genisis        #this system
192.168.2.10       little_wolf    #98 system sharing cygwin2
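
With more than a handful of machines, you may want to generate those lines rather than type them. The sketch below pulls the <00> UNIQUE name out of saved nbtstat output and formats an /etc/hosts line. The function names are hypothetical, and lowercasing the name simply follows my own hosts file; it is a habit, not a requirement.

```python
import re


def netbios_name(nbtstat_output):
    """Return the name on the first <00> UNIQUE line, or None."""
    for line in nbtstat_output.splitlines():
        m = re.match(r"\s*(\S+)\s+<00>\s+UNIQUE", line)
        if m:
            return m.group(1)
    return None


def hosts_line(ip, nbtstat_output):
    """Format an /etc/hosts entry for ip, or None if no name was found."""
    name = netbios_name(nbtstat_output)
    if name is None:
        return None
    return "{:<18} {}".format(ip, name.lower())


SAMPLE = """\
   Name        Type        Status
-----------------------------------------
LITTLE_WOLF  <00> UNIQUE    Registered
"""

print(hosts_line("192.168.2.10", SAMPLE))
```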

You’ll also need to know the names of the shares you wish to access. Again, from a Microsoft command prompt, repeat this command for each NetBIOS name and make note of your results:

C:\> net view \\little_wolf
Shared resources at \\LITTLE_WOLF

Sharename     Type       Comment
---------------------------------------
CYGWIN2      Disk
The command was completed successfully.

Here the computer known as LITTLE_WOLF has only one share, the CYGWIN2 directory.

Finally, you’ll need a mount point on your Unix system, so you might as well give it a useful name. Since the typical floppy mount point is /floppy and the typical CD mount point is /cdrom, let’s use /windows:

# mkdir /windows

Accessing Microsoft Shares

Once you know the names of your computers and shares, using Sharity-Light is very easy. As the superuser, mount the desired share:

# shlight //little_wolf/cygwin2 /windows
Password: 
Using port 49923 for NFS.

Tip

Watch your slashes. Microsoft uses the backslash (\) at the command line, whereas Unix and Sharity-Light use the forward slash (/).
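
If you keep a list of UNC names copied from Windows, a one-line conversion saves retyping them. This is a small sketch; unc_to_shlight is a made-up name, and it does nothing beyond swapping the slashes.

```python
def unc_to_shlight(unc):
    """Turn a UNC name with backslashes into the form shlight expects."""
    if not unc.startswith("\\\\"):
        raise ValueError("not a UNC name: %r" % unc)
    return unc.replace("\\", "/")


print(unc_to_shlight(r"\\little_wolf\cygwin2"))   # //little_wolf/cygwin2
```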

Note that I was prompted for a password because Windows 9x and ME users have the option of password protecting their shares. This particular share did not have a password, so I simply pressed Enter.

Tip

Adding -n to the previous command will forgo the password prompt. Type shlight -h to see all available options.

However, if the share is on a Windows NT Workstation, 2000 Pro, or XP system, you must provide a username and password valid on that system. The syntax is:

# shlight //2000pro/cdrom /windows -U username -P password

Once the share is mounted, it works like any other mount point. Depending on the permissions set on the share, you should be able to browse that shared directory, copy over or add files, and modify files. When you’re finished using the share, unmount it:

$ unshlight /windows

Deal with Disk Hogs

Fortunately, you no longer have to be a script guru or a find wizard just to keep up with what is happening on your disks.

Think for a moment. What types of files are you always chasing after so they don’t waste resources? Your list probably includes temp files, core files, and old logs that have already been archived. Did you know that your system already contains scripts capable of cleaning out those files? Yes, I’m talking about your periodic scripts.

Periodic Scripts

You’ll find these scripts in the following directory on a FreeBSD system:

% ls /etc/periodic/daily | grep clean
100.clean-disks
110.clean-tmps
120.clean-preserve
130.clean-msgs
140.clean-rwho
150.clean-hoststat

Are you using these scripts? To find out, look at your /etc/periodic.conf file. What, you don’t have one? That means you’ve never tweaked your default configurations. If that’s the case, copy over the sample file and take a look at what’s available:

# cp /etc/defaults/periodic.conf /etc/periodic.conf
# more /etc/periodic.conf

daily_clean_disks

Let’s start with daily_clean_disks. This script is ideal for finding and deleting files with certain file extensions. You’ll find it about two pages into periodic.conf, in the Daily options section, where you may note that it’s not enabled by default. Fortunately, configuring it is a heck of a lot easier than using cron to schedule a complex find statement.

Warning

Before you enable any script, test it first, especially if it’ll delete files based on pattern-matching rules. Back up your system first!

For example, suppose you want to delete old logs with the .bz2 extension. If you’re not careful when you craft your daily_clean_disks_files line, you may end up inadvertently deleting all files with that extension. Any user who has just compressed some important data will be very miffed when she finds that her data has mysteriously disappeared.

Let’s test this scenario. I’d like to prune all .core files and any logs older than .0.bz2. I’ll edit that section of /etc/periodic.conf like so:

# 100.clean-disks
daily_clean_disks_enable="YES"                     # Delete files daily
daily_clean_disks_files="*.[1-9].bz2 *.core"       # delete old logs, cores
daily_clean_disks_days=1                           # on a daily basis
daily_clean_disks_verbose="YES"                    # Mention files deleted

Notice my pattern-matching expression for the .bz2 files. My expression matches any filename (*) followed by a dot and a number from one to nine (.[1-9]), followed by another dot and the .bz2 extension.

Now I’ll verify that my system has been backed up, and then manually run that script. As this script is fairly resource-intensive, I’ll do this test when the system is under a light load:

# /etc/periodic/daily/100.clean-disks

Cleaning disks:
/usr/ports/distfiles/MPlayer-0.92.tar.bz2
/usr/ports/distfiles/gnome2/libxml2-2.6.2.tar.bz2
/usr/ports/distfiles/gnome2/libxslt-1.1.0.tar.bz2

Darn. Looks like I inadvertently nuked some of my distfiles. I’d better be a bit more explicit in my matching pattern. I’ll try this instead:

# delete old logs, cores
daily_clean_disks_files="messages.[1-9].bz2 *.core"       

# /etc/periodic/daily/100.clean-disks

Cleaning disks:
/var/log/messages.1.bz2
/var/log/messages.2.bz2
/var/log/messages.3.bz2
/var/log/messages.4.bz2

That’s a bit better. It didn’t delete /var/log/messages or /var/log/messages.0.bz2, which I like to keep on disk. Remember, always test your pattern matching before scheduling a deletion script. If you keep the verbose line at YES, the script will report the names of files it deletes.
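
You can also rehearse a pattern without deleting anything. The sketch below uses Python's fnmatch module, whose shell-style globbing is close to, but not guaranteed to be identical to, what the periodic script does. The filenames are hypothetical, and matches is a made-up helper.

```python
from fnmatch import fnmatchcase


def matches(patterns, filename):
    """True if filename matches any space-separated pattern in patterns."""
    return any(fnmatchcase(filename, p) for p in patterns.split())


broad = "*.[1-9].bz2 *.core"
narrow = "messages.[1-9].bz2 *.core"

for name in ["messages.0.bz2", "messages.3.bz2", "thesis.2.bz2", "a.out.core"]:
    print(name, matches(broad, name), matches(narrow, name))
```

The hypothetical thesis.2.bz2 is caught by the broad pattern but spared by the narrow one, which is exactly the kind of surprise worth finding before the script runs for real.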

daily_clean_tmps

The other cleaning scripts are quite straightforward to configure. Take daily_clean_tmps, for example:

# 110.clean-tmps
daily_clean_tmps_enable="NO"                   # Delete stuff daily
daily_clean_tmps_dirs="/tmp"                   # Delete under here
daily_clean_tmps_days="3"                      # If not accessed for
daily_clean_tmps_ignore=".X*-lock quota.user quota.group" # Don't delete
                                                          # these
daily_clean_tmps_verbose="YES"                 # Mention files deleted

This is a quick way to clean out any temporary directories. Again, you get to choose the locations of those directories. Here is a quick way to find out which directories named tmp are on your system:

# find / -type d -name tmp
/tmp
/usr/tmp
/var/spool/cups/tmp
/var/tmp

That command asks find to start at root (/) and look for any directories (-type d) named tmp (-name tmp). If I wanted to clean those daily, I’d configure that section like so:

# 110.clean-tmps

# Delete stuff daily
daily_clean_tmps_enable="YES"                        
daily_clean_tmps_dirs="/tmp /usr/tmp /var/spool/cups/tmp /var/tmp"        

# If not accessed for
daily_clean_tmps_days="1"                            

# Don't delete these
daily_clean_tmps_ignore=".X*-lock quota.user quota.group" 

# Mention files deleted
daily_clean_tmps_verbose="YES"

Again, I immediately test that script after saving my changes:

# /etc/periodic/daily/110.clean-tmps

Removing old temporary files:
  /var/tmp/gconfd-root

This script will not delete any locked files or temporary files currently in use. This is an excellent feature and yet another reason to run this script on a daily basis, preferably at a time when few users are on the system.
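
If you would like a preview before enabling the script, the following dry run lists files that have not been accessed for a given number of days and deletes nothing. It is a sketch under a simplifying assumption: it looks only at access time, while the real script's tests may be stricter. The name stale_files and its parameters are my own.

```python
import fnmatch
import os
import time


def stale_files(dirs, days, ignore=(), now=None):
    """List files under dirs whose last access is more than days old."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400            # 86400 seconds in a day
    found = []
    for top in dirs:
        for root, _subdirs, files in os.walk(top):
            for name in files:
                # Skip names matching an ignore pattern such as .X*-lock.
                if any(fnmatch.fnmatchcase(name, pat) for pat in ignore):
                    continue
                path = os.path.join(root, name)
                try:
                    if os.lstat(path).st_atime < cutoff:
                        found.append(path)
                except OSError:
                    pass                   # the file vanished while we looked
    return sorted(found)
```

For example, stale_files(["/tmp", "/var/tmp"], 3, ignore=[".X*-lock", "quota.user", "quota.group"]) mirrors the default settings shown earlier.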

daily_clean_preserve

Moving on, the next script is daily_clean_preserve:

# 120.clean-preserve
daily_clean_preserve_enable="YES"              # Delete files daily
daily_clean_preserve_days=7                    # If not modified for
daily_clean_preserve_verbose="YES"             # Mention files deleted

What exactly is preserve? The answer is in man hier. Use the manpage search function (the / key) to search for the word preserve:

# man hier
                  /preserve
       preserve/ temporary home of files preserved after an accidental 
                 death of an editor; see ex(1)

Now that you know what the script does, see if the default settings are suited for your environment. This script is run daily, but keeps preserved files until they are seven days old.

The last three clean scripts deal with cleaning out old files from msgs, rwho and sendmail’s hoststat cache. See man periodic.conf for more details.

Incidentally, you don’t have to wait until it is time for periodic to do its thing; you can manually run any periodic script at any time. You’ll find them all in subdirectories of /etc/periodic/.

Limiting Files

Instead of waiting for a daily process to clean up any spills, you can tweak several knobs to prevent these files from being created in the first place. For example, the C shell itself provides limits, any of which are excellent candidates for a customized dot.cshrc file [Hack #9].

To see the possible limits and their current values:

% limit
cputime         unlimited
filesize        unlimited
datasize        524288 kbytes
stacksize       65536 kbytes
coredumpsize    unlimited
memoryuse       unlimited
vmemoryuse      unlimited
descriptors     4557 
memorylocked    unlimited
maxproc         2278 
sbsize          unlimited

You can test a limit by typing it at the command line; it will remain for the duration of your current shell. If you like the limit, make it permanent by adding it to .cshrc. For example:

% limit filesize 2k
% limit | grep filesize
filesize     2 kbytes

will set the maximum file size that can be created to 2 KB. The limit command supports both k for kilobytes and m for megabytes. Do note that this limit does not affect the total size of the area available to store files, just the size of a newly created file. See the Quotas section of the FreeBSD Handbook if you intend to limit disk space usage.

Having created a file limit, you’ll occasionally want to exceed it. For example, consider decompressing a file:

% uncompress largefile.Z
Filesize limit exceeded

% unlimit filesize
% uncompress largefile.Z
%

The unlimit command will allow me to override the file-size limit temporarily (for the duration of this shell). If you really do want to force your users to stick to limits, read man limits.

Now back to shell limits. If you don’t know what a core file is, you probably don’t need to collect them. Sure, periodic can clean those files out for you, but why make them in the first place? Core files are large. You can limit their size with:

limit coredumpsize 1m

That command will limit a core file to 1 MB, or 1024 KB. To prevent core files completely, set the size to 0:

limit coredumpsize 0

If you’re interested in the rest of the built-in limits, you’ll find them in man tcsh. Searching for coredumpsize will take you to the right spot.
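
The same kind of limit can be set from inside a program. As a rough analog of limit coredumpsize 0, this sketch uses Python's resource module to lower the soft core-file limit for the current process and its children only; it does not change your shell's settings. disable_core_dumps is a hypothetical name.

```python
import resource


def disable_core_dumps():
    """Set this process's soft core-file size limit to 0."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Lowering the soft limit is always allowed; the hard limit is left alone.
    resource.setrlimit(resource.RLIMIT_CORE, (0, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


print(disable_core_dumps()[0])   # 0
```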

The Other BSDs

The preceding discussion is based on FreeBSD. Other BSD systems ship with similar scripts that do identical tasks, but they are kept in a single file instead of in a separate directory.

NetBSD

For daily, weekly, and monthly tasks, NetBSD uses the /etc/daily, /etc/weekly, and /etc/monthly scripts, whose behavior is controlled with the /etc/daily.conf, /etc/weekly.conf, and /etc/monthly.conf configuration files. For more information about them, read man daily.conf, man weekly.conf, and man monthly.conf.

OpenBSD

OpenBSD uses three scripts, /etc/daily, /etc/weekly, and /etc/monthly. You can learn more about them by reading man daily.

Manage Temporary Files and Swap Space

Add more temporary or swap space without repartitioning.

When you install any operating system, it’s important to allocate sufficient disk space to hold temporary and swap files. Ideally, you already know the optimum sizes for your system so you can partition your disk accordingly during the install. However, if your needs change or you wish to optimize your initial choices, your solution doesn’t have to be as drastic as a repartition—and reinstall—of the system.

Tip

man tuning has some practical advice for guesstimating the appropriate size of swap and your other partitions.

Clearing /tmp

Unless you specifically chose otherwise when you partitioned your disk, the installer created a /tmp filesystem for you:

% grep tmp /etc/fstab
/dev/ad0s1e    /tmp    ufs    rw    2    2

% df -h /tmp
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/ad0s1e   252M   614K   231M     0%    /tmp

Here I searched /etc/fstab for the /tmp filesystem. This particular filesystem is 256 MB in size. Only a small portion contains temporary files.

Tip

The df (disk free) command will always show you a number lower than the actual partition size. This is because eight percent of the filesystem is reserved to prevent users from inadvertently overflowing a filesystem. See man tunefs for details.
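
In the df output above, the reserve shows up most directly in the Avail column. A quick check of the arithmetic, assuming the eight percent is taken from the Size figure as df reports it (expected_avail is a made-up helper):

```python
def expected_avail(size_mb, used_mb, reserve=0.08):
    """Estimate df's Avail column from Size, Used, and the reserved fraction."""
    return size_mb * (1 - reserve) - used_mb


# /tmp from the df -h output: 252M in size, 614K used
print(round(expected_avail(252, 614 / 1024)))   # 231, matching Avail
```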

It’s always a good idea to clean out /tmp periodically so it doesn’t overflow with temporary files. Consider taking advantage of the built-in periodic script /etc/periodic/daily/110.clean-tmps [Hack #20].

You can also clean out /tmp when the system reboots by adding this line to /etc/rc.conf:

clear_tmp_enable="YES"

Moving /tmp to RAM

Another option is to move /tmp off of your hard disk and into RAM. This has the built-in advantage of automatically clearing the filesystem when you reboot, since the contents of RAM are volatile. It also offers a performance boost, since RAM access time is much faster than disk access time.

Before moving /tmp, ensure you have enough RAM to support your desired /tmp size. This command will show the amount of installed RAM:

% dmesg | grep memory
real memory  = 335462400 (319 MB)
avail memory = 320864256 (306 MB)

Also check that your kernel configuration file contains device md (or memory disk). The GENERIC kernel does; if you’ve customized your kernel, double-check that you still have md support:

% grep -w md /usr/src/sys/i386/conf/CUSTOM
device        md    # Memory "disks"

Changing the /tmp line in /etc/fstab as follows will mount a 64 MB /tmp in RAM:

md /tmp mfs rw,-s64m 0 0

Next, unmount /tmp (which is currently mounted on your hard drive) and remount it using the new entry in /etc/fstab:

# umount /tmp
# mount /tmp

# df -h /tmp
Filesystem    Size   Used  Avail Capacity  Mounted on
/dev/md0       63M   8.0K    58M     0%    /tmp

Notice that the filesystem is now md0, the first memory disk, instead of ad0s1e, a partition on the first IDE hard drive.

Creating a Swap File on Disk

Swap is different from /tmp. It’s not a storage area for temporary files; instead, it is an area of disk that the kernel uses to move data out of RAM when memory runs short. A sufficient swap size can greatly improve the performance of a busy system. Also, if your system contains multiple drives, this swapping process will be much more efficient if each drive has its own swap partition.

The initial install created a swap filesystem for you:

% grep swap /etc/fstab
/dev/ad0s1b    none     swap    sw    0    0

% swapinfo
Device          1K-blocks     Used    Avail Capacity  Type
/dev/ad0s1b        639688       68   639620     0%    Interleaved

Note that the swapinfo command displays the size of your swap areas in 1 KB blocks. If you prefer to see that output in MB, try the swapctl command with the -lh flags (which make the listing more human-readable):

% swapctl -lh
Device:       1048576-blocks      Used:
/dev/ad0s1b          624          0

To add a swap area, first determine which area of disk space to use. For example, you may want to place a 128 MB swapfile on /usr. You’ll first need to use dd to create this as a file full of null (or zero) bytes. Here I’ll create a 128 MB swapfile as /usr/swap0:

# dd if=/dev/zero of=/usr/swap0 bs=1024k count=128
128+0 records in
128+0 records out
134217728 bytes transferred in 4.405036 secs (30469156 bytes/sec)

Next, change the permissions on this file. Remember, you don’t want users storing data here; this file is for the system:

# chmod 600 /usr/swap0
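The two steps combine naturally into one helper. Here is a minimal sketch that you can try safely in a scratch directory before pointing it at /usr. The name make_swapfile is hypothetical, and the block size is written out in bytes, which should behave the same as the bs=1024k used above.

```shell
# make_swapfile PATH SIZE_MB
# Create a zero-filled file of SIZE_MB megabytes that only its owner can read.
make_swapfile() {
    # bs is given in bytes: 1048576 bytes = 1 MB.
    dd if=/dev/zero of="$1" bs=1048576 count="$2" 2>/dev/null &&
        chmod 600 "$1"
}

# Try it with a small 2 MB file in a scratch directory first.
scratch=$(mktemp -d "${TMPDIR:-/tmp}/swapdemo.XXXXXX")
make_swapfile "$scratch/swap0" 2
ls -l "$scratch/swap0"
wc -c < "$scratch/swap0"
rm -rf "$scratch"
```

For the example in this hack, the equivalent call would be make_swapfile /usr/swap0 128, run as the superuser.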

Since this is really a file on an existing filesystem, you can’t mount your swapfile in /etc/fstab. However, you can tell the system to find it at boot time by adding this line to /etc/rc.conf:

swapfile="/usr/swap0"

To start using the swapfile now without having to reboot the system, use mdconfig:

# mdconfig -a -t vnode -f /usr/swap0 -u 1 && swapon /dev/md1

The -a flag attaches the memory disk. -t vnode sets the type of memory disk to one that is backed by a file rather than by RAM. The -f flag sets the name of that file: /usr/swap0.

The unit number -u 1 must match the name of the memory disk /dev/md1. Since this system already has /tmp mounted on /dev/md0, I chose to mount swap on /dev/md1. && swapon tells the system to enable that swap device, but only if the mdconfig command succeeded.

swapctl should now show the new swap area:

% swapctl -lh
Device:       1048576-blocks      Used:
/dev/ad0s1b          624          0
/dev/md1             128          0
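If you just want the totals, a little awk will add up the columns. The following sketch assumes the three-column layout shown above; swap_totals is a name I invented, and the sample text is a copy of that listing.

```shell
# swap_totals: read `swapctl -lh`-style text on stdin and print
# "total used" in MB. Assumes device, size, used columns as shown above.
swap_totals() {
    awk '$1 ~ /^\/dev\// { total += $2; used += $3 }
         END { printf "%d %d\n", total, used }'
}

swap_totals <<'EOF'
Device:       1048576-blocks      Used:
/dev/ad0s1b          624          0
/dev/md1             128          0
EOF
# prints: 752 0
```

On a live system, pipe the real command into it: swapctl -lh | swap_totals.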

Monitoring Swap Changes

Whenever you make changes to swap or are considering increasing swap, use systat to monitor how your swapfiles are being used in real time:

% systat -swap

The output will show the names of your swap areas and how much of each is currently in use. It will also include a visual indicating what percentage of swap contains data.

OpenBSD Differences

You can make this hack work on OpenBSD, as long as you remember that the RAM disk device is rd and its configuration tool is rdconfig. Read the relevant manpages, and you’ll be hacking away.

Recreate a Directory Structure Using mtree

Prevent or recover from rm disasters.

Someday the unthinkable may happen. You’re doing some routine maintenance and are distracted by a phone call or perhaps another employee’s question. A moment later, you’re faced with the awful realization that your fingers typed either a rm * or a rm -R in the wrong place, and now a portion of your system has evaporated into nothingness.

Painful thought, isn’t it? Let’s pause for a moment to catch our breath and examine a few ways to prevent such a scenario from happening in the first place.

Close your eyes and think back to when you were a fresh-faced newbie and were introduced to the omnipotent rm command. Return to the time when you actually read man rm and first discovered the -i switch. “What a great idea,” you thought, “to be prompted for confirmation before irretrievably deleting a file from disk.” However, you soon discovered that this switch can be a royal PITA. Face it, it’s irritating to deal with the constant question of whether you’re sure you want to remove a file when you just issued the command to remove that file.

Necessary Interaction

Fortunately, there is a way to request confirmation only when you’re about to do something as rash as rm *. Simply make a file called -i. Well, actually, it’s not quite that simple. The touch command will complain if you try this:

% touch -i
touch: illegal option -- i
usage: touch [-acfhm] [-r file] [-t [[CC]Y]MMDDhhmm[.SS]] file ...

You see, to touch, -i looks like an -i switch, which it doesn’t have. That’s actually part of the magic. The reason why we want to make a file called -i in the first place is to fool rm: when you type rm *, the shell will expand * into all of the files in the directory. One of those files will be named -i, and, voila, you’ve just given the interactive switch to rm.

So, how do we convince touch to make this file? Use this command instead:

% touch ./-i

The ./ makes the argument a path instead of a switch. Since the argument no longer begins with a dash, touch doesn’t try to read it as an option; it simply sees the name of a file called -i in “this directory.”

In order for this to be effective, you need to create a file called -i in every directory that you would like to protect from an inadvertent rm *.
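Here’s a sketch of how you might seed several directories at once and then preview what rm will actually receive, without deleting anything. The helper name protect_dirs is hypothetical. The preview forces the C locale because the trick relies on -i sorting ahead of your other filenames; sort order depends on the locale, so it’s worth running the same echo test in your own environment.

```shell
# protect_dirs DIR...: drop an empty file named -i into each directory.
protect_dirs() {
    for dir in "$@"; do
        # The directory prefix keeps touch from reading -i as an option.
        touch "$dir/-i" || return 1
    done
}

# Preview the expansion with echo; nothing is removed here.
scratch=$(mktemp -d "${TMPDIR:-/tmp}/rmdemo.XXXXXX")
touch "$scratch/notes.txt" "$scratch/report.txt"
protect_dirs "$scratch"
( cd "$scratch" && LC_ALL=C && export LC_ALL && echo rm * )
# prints: rm -i notes.txt report.txt
rm -rf "$scratch"
```

Putting echo in front of a risky command is a cheap habit that shows you exactly what the shell is about to hand over.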

An alternative method is to take advantage of the rmstar shell variable found in the tcsh shell. This method will always prompt for confirmation of a rm *, regardless of your current directory, as long as you always use tcsh. Since the default shell for the superuser is tcsh, add this line to /root/.cshrc:

set rmstar

Tip

This is also a good line to add to /usr/share/skel/dot.cshrc [Hack #9] .

If you want to take advantage of the protection immediately, force the shell to reread its configuration file:

# source /root/.cshrc

Using mtree

Now you know how to protect yourself from rm *. Unfortunately, neither method will save you from a rm -R. If you do manage to blow away a portion of your directory structure, how do you fix the mess with a minimum of fuss, fanfare, and years of teasing from your coworkers? Sure, you can always restore from backup, but that means filling in a form in triplicate, carrying it with you as you walk to the other side of the building where backups are stored, and sheepishly handing it over to the clerk in charge of tape storage.

Fortunately for a hacker, there is always more than one way to skin a cat, or in this case, to save your skin. That directory structure had to be created in the first place, which means it can be recreated.

When you installed FreeBSD, it created a directory structure for you. The utility responsible for this feat is called mtree.

To see which directory structures were created with mtree:

% ls /etc/mtree/
./                    BSD.root.dist           BSD.x11-4.dist
../                   BSD.sendmail.dist       BSD.x11.dist
BSD.include.dist      BSD.usr.dist
BSD.local.dist        BSD.var.dist

Each of these files is in ASCII text, meaning you can read, and more interestingly, edit their contents. If you’re a hacker, I know what you’re thinking. Yes, you can edit a file to remove the directories you don’t want and to add other directories that you do.

Let’s start with a simpler example. Say you’ve managed to blow away /var. To recreate it:

# mtree -deU -f /etc/mtree/BSD.var.dist -p /var

where:

-d: Ignores everything except directory files.
-e: Doesn’t complain if there are extra files.
-U: Recreates the original ownerships and permissions.
-f /etc/mtree/BSD.var.dist: Specifies how to create the directory structure; this is an ASCII text file if you want to read up ahead of time on what exactly is going to happen.
-p /var: Specifies where to create the directory structure; if you don’t specify, it will be placed in the current directory.

When you run this command, the recreated directories will be echoed to standard output so you can watch as they are created for you. A few seconds later, you can:

% ls /var
./            crash/          heimdal/        preserve/       yp/
../           cron/           lib/            run/
account/      db/             log/            rwho/
at/           empty/          mail/           spool/
backups/      games/          msgs/

That looks a lot better, but don’t breathe that sigh of relief quite yet. You still have to recreate all of your log files. Yes, /var/log is still glaringly empty. Remember, mtree creates a directory structure, not all of the files within that directory structure. If you have a directory structure containing thousands of files, you’re better off grabbing your backup tape.

There is hope for /var/log, though. Rather than racking your brain for the names of all of the missing log files, do this instead:

% more /etc/newsyslog.conf
# configuration file for newsyslog
# $FreeBSD: src/etc/newsyslog.conf,v 1.42 2002/09/21 12:07:35 markm Exp $
#
# Note: some sites will want to select more restrictive protections than the
# defaults.  In particular, it may be desirable to switch many of the 644
# entries to 640 or 600.  For example, some sites will consider the
# contents of maillog, messages, and lpd-errs to be confidential.  In the
# future, these defaults may change to more conservative ones.
#
# logfilename           [owner:group]    mode count size when  [ZJB] [/pid_file] [sig_num]
/var/log/cron                            600  3     100  *      J
/var/log/amd.log                         644  7     100  *      J
/var/log/auth.log                        600  7     100  *      J
/var/log/kerberos.log                    600  7     100  *      J
/var/log/lpd-errs                        644  7     100  *      J
/var/log/xferlog                         600  7     100  *      J
/var/log/maillog                         640  7     *    @T00   J
/var/log/sendmail.st                     640  10    *    168    B
/var/log/messages                        644  5     100  *      J
/var/log/all.log                         600  7     *    @T00   J
/var/log/slip.log        root:network    640  3     100  *      J
/var/log/ppp.log         root:network    640  3     100  *      J
/var/log/security                        600  10    100  *      J
/var/log/wtmp                            644  3     *    @01T05 B
/var/log/daily.log                       640  7     *    @T00   J
/var/log/weekly.log                      640  5     1    $W6D0  J
/var/log/monthly.log                     640  12    *    $M1D0  J
/var/log/console.log                     600  5     100  *      J

There you go, all of the default log names and their permissions. Simply touch the required files and adjust their permissions accordingly with chmod.
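With that many files, a short loop beats typing each touch and chmod by hand. The following is a sketch, not a polished tool: recreate_logs is a made-up name, the optional second argument is a prefix I added so you can rehearse in a scratch directory, and the parsing assumes only the simple layout shown above (a path, an optional owner:group, then the mode).

```shell
# recreate_logs CONF [ROOT]
# Create each log file named in CONF with its listed mode. ROOT is an
# optional path prefix (an addition for this sketch) used for dry runs.
recreate_logs() {
    conf=$1
    root=${2:-}
    while read -r path f2 f3 rest; do
        # Only lines that start with an absolute path describe log files.
        case $path in /*) ;; *) continue ;; esac
        # owner:group is optional; when present, it comes before the mode.
        case $f2 in
            *:*) owner=$f2; mode=$f3 ;;
            *)   owner=;    mode=$f2 ;;
        esac
        touch "$root$path" && chmod "$mode" "$root$path" || continue
        if [ -n "$owner" ]; then
            chown "$owner" "$root$path" 2>/dev/null ||
                echo "could not chown $root$path to $owner" >&2
        fi
    done < "$conf"
}

# Rehearse with a few sample lines in a scratch directory.
scratch=$(mktemp -d "${TMPDIR:-/tmp}/logdemo.XXXXXX")
mkdir -p "$scratch/var/log"
cat > "$scratch/newsyslog.conf" <<'EOF'
# logfilename           [owner:group]    mode count size when  [ZJB]
/var/log/cron                            600  3     100  *      J
/var/log/messages                        644  5     100  *      J
/var/log/ppp.log         root:network    640  3     100  *      J
EOF
recreate_logs "$scratch/newsyslog.conf" "$scratch"
ls -l "$scratch/var/log"
rm -rf "$scratch"
```

Once the rehearsal looks right, running recreate_logs /etc/newsyslog.conf as the superuser would apply the same steps to the real /var/log.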

Customizing mtree

Let’s get a little fancier and hack the mtree hack. If you want to be able to create a homegrown directory structure, start by perusing the instructions in /usr/src/etc/mtree/README.

The one rule to keep in mind is don’t use tabs. Instead, use four spaces for indentation. Here is a simple example:

% more MY.test.dist
#home grown test directory structure
/set type=dir uname=test gname=test mode=0755
.
    test1
    ..
    test2
        subdir2a
        ..
        subdir2b
            subsubdir2c    mode=01777
            ..
        ..
    ..
..

Note that you can specify different permissions on different parts of the directory structure.
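Before applying a homegrown file, it can be worth checking it against the no-tabs rule and making sure every .. has a directory to climb out of. Here’s a rough lint for that. It is only a sketch: check_spec is a hypothetical name, and it assumes every entry is a directory, as is the case when the file begins with /set type=dir and no line overrides the type.

```shell
# check_spec FILE: fail if FILE contains a tab or a ".." with no parent.
# Assumes every named entry is a directory (as with "/set type=dir").
check_spec() {
    awk '
        index($0, "\t") { printf "line %d: contains a tab\n", NR; bad = 1 }
        NF == 0 { next }                          # blank line
        { first = substr($1, 1, 1) }
        first == "#" || first == "/" { next }     # comment or /set line
        $1 == ".." {
            if (depth == 0) { printf "line %d: .. has no parent\n", NR; bad = 1 }
            else depth--
            next
        }
        { depth++ }                               # a name enters a directory
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# A small sample spec, written to a scratch file for the demonstration.
spec=$(mktemp "${TMPDIR:-/tmp}/spec.XXXXXX")
cat > "$spec" <<'EOF'
#home grown test directory structure
/set type=dir uname=test gname=test mode=0755
.
    test1
    ..
    test2
        subdir2a
        ..
    ..
..
EOF
if check_spec "$spec"; then echo "spec looks sane"; else echo "spec needs fixing"; fi
rm -f "$spec"
```

The check passes quietly on a well-formed file and names the offending line otherwise.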

Next, I’ll apply this file to my current directory:

# mtree -deU -f MY.test.dist

and check out the results:

# ls -F
test1/
test2/
# ls -F test1
#
# ls -F test2
subdir2a/
subdir2b/
# ls -F test2/subdir2b
subsubdir2c/

As you can see, mtree can be a real timesaver if you need to create custom directory structures when you do installations. Simply take a few moments to create a file containing the directory structure and its permissions. You’ll gain the added bonus of having a record of the required directory structure.

Ghosting Systems

Do you find yourself installing multiple systems, all containing the same operating system and applications? As an IT instructor, I’m constantly installing systems for my next class or trying to fix the ramifications of a misconfiguration from a previous class.

As any system administrator can attest to, ghosting or hard drive-cloning software can be a real godsend. Backups are one thing; they retain your data. However, an image is a true timesaver—it’s a copy of the operating system itself, along with any installed software and all of your configurations and customizations.

I haven’t always had the luxury of a commercial ghosting utility at hand. As you can well imagine, I’ve tried every homegrown and open source ghosting solution available. I started with various invocations of dd, gzip, ssh, and dump, but kept running across the same fundamental problem: it was easy enough to create an image, but inconvenient to deploy that image to a fresh hard drive. It was doable in the labs that used removable drives, but, otherwise, I had to open up a system, cable in the drive to be deployed, copy the image, and recable the drive into its own system.

Forget the wear and tear on the equipment; that solution wasn’t working out to be much of a timesaver! What I really needed was a floppy that contained enough intelligence to go out on the network and retrieve and restore an image. I tried several open source applications and found that Ghost For Unix, g4u, best fit the bill.

Creating the Ghost Disk

You’re about two minutes away from creating a bootable g4u floppy. Simply download g4u-1.12fs from http://theatomicmoose.ca/g4u/ and copy it to a floppy:

# cat g4u-1.12fs > /dev/fd0

Your only other requirement is a system with a drive capable of holding your images. It can be any operating system, as long as it has an installed FTP server. If it’s a FreeBSD system, you can configure an FTP server through /stand/sysinstall. Choose Configure from the menu, then Networking. Use your spacebar to choose Anon FTP.

Choose Yes to the configuration message and accept the defaults by tabbing to OK. The welcome message is optional. Exit sysinstall once you’re finished.

You’ll then need to remove the remark (#) in front of the FTP line in /etc/inetd.conf, so it looks like this:

ftp   stream   tcp   nowait   root   /usr/libexec/ftpd    ftpd -l
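If you’re setting up more than one image server, a sed one-liner can remove that remark for you. This sketch assumes the commented entry starts with #ftp at the beginning of the line; enable_ftp is an invented name, and the sample lines are for illustration only. Entries for other protocols, such as tcp6, are deliberately left alone.

```shell
# enable_ftp: filter inetd.conf text on stdin, uncommenting the
# "ftp stream tcp" entry and leaving every other line untouched.
enable_ftp() {
    sed 's/^#\(ftp[[:space:]]\{1,\}stream[[:space:]]\{1,\}tcp[[:space:]]\)/\1/'
}

# Sample input for illustration; a real inetd.conf has many more lines.
printf '%s\n' \
    '#ftp   stream   tcp    nowait   root   /usr/libexec/ftpd    ftpd -l' \
    '#ftp   stream   tcp6   nowait   root   /usr/libexec/ftpd    ftpd -l' \
    '#telnet stream  tcp    nowait   root   /usr/libexec/telnetd telnetd' | enable_ftp
```

As with any edit to a system file, write the result to a new file and compare it with the original before copying it over /etc/inetd.conf.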

If inetd is already running, inform it of the configuration change using killall -1 inetd. Otherwise, start inetd by simply typing inetd. To ensure the service is running:

# sockstat | grep 21
root   inetd   22433  4  tcp4   *:21     *:*

In this listing, the local system is listening for requests on port 21, and there aren’t any current connections listed in the remote address section (*:*).
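One caveat: grep 21 matches those digits anywhere on the line, including inside a process ID such as 22433. If that gets noisy, a stricter filter is easy to sketch. The function name listening_on is made up, it assumes the local address is the sixth column as in the listing above, and the second sample line is invented to show a false match being rejected.

```shell
# listening_on PORT: read sockstat-style text on stdin and print only the
# lines whose local address (sixth column) ends in :PORT.
listening_on() {
    awk -v port="$1" '$6 ~ (":" port "$")'
}

# The first line is from the listing above; the second is an invented
# example whose PID happens to contain "21".
printf '%s\n' \
    'root   inetd     22433  4  tcp4   *:21     *:*' \
    'root   sendmail  2100   4  tcp4   *:25     *:*' | listening_on 21
```

Only the inetd line survives, whereas a plain grep 21 would have printed both.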

g4u requires a username and a password before it will create or retrieve an image. The default account is install, but you can specify another user account when you use g4u. To create the install account on a FreeBSD FTP server:

# pw useradd install -m -s /bin/csh

Tip

Make sure that the shell you give this user is listed in /etc/shells or FTP authentication will fail.
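A quick way to check is to look for the shell on a line of its own. Here’s a minimal sketch; shell_allowed is a hypothetical helper, and the demonstration uses a scratch file with sample contents so it runs anywhere. It expects an exact match, so a line with trailing spaces or a comment would not count.

```shell
# shell_allowed SHELL [FILE]: succeed if SHELL appears on a line by itself
# in FILE (default /etc/shells).
shell_allowed() {
    grep -qxF -- "$1" "${2:-/etc/shells}"
}

# Sample shells file for the demonstration.
shells=$(mktemp "${TMPDIR:-/tmp}/shells.XXXXXX")
printf '%s\n' /bin/sh /bin/csh /bin/tcsh > "$shells"
for candidate in /bin/csh /usr/local/bin/bash; do
    if shell_allowed "$candidate" "$shells"; then
        echo "$candidate: listed"
    else
        echo "$candidate: not listed, so FTP logins would fail"
    fi
done
rm -f "$shells"
```

On the FTP server itself, shell_allowed /bin/csh with no second argument checks the real /etc/shells.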

Then, use passwd install to give this account a password you will remember.

Creating an Image

Before you create an image, fully configure a test system. For example, in my security lab, I usually install the latest release of FreeBSD, add my customized /etc/motd and shell prompt, configure X, and install and configure the applications students will use during their labs.

It’s a good idea to know ahead of time how large the hard drive is on the test system and how it has been partitioned. There are several ways to find out on a FreeBSD system, depending upon how good you are at math. One way is to go back into /stand/sysinstall and choose Configure then Fdisk. The first long line will give the size of the entire hard drive:

Disk name:       ad0
DISK Geometry:   19885 cyls/16 heads/63 sectors = 20044080 sectors (9787MB)

Press q to exit this screen. If you then type fdisk at the command line, you’ll see the size of your partitions:

# fdisk
<snip>
The data for partition 1 is:
sysid 165 (0xa5), (FreeBSD/NetBSD/386BSD)
    start 63, size 4095441 (1999 Meg), flag 80 (active)
<snip>
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>

This particular system has a 9787 MB hard drive that has one 1999 MB partition containing FreeBSD.
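If your math is rusty, let the shell do it. This sketch assumes 512-byte sectors, so 2,048 sectors make up one megabyte; sectors_to_mb is a name invented for the example.

```shell
# sectors_to_mb SECTORS: convert a count of 512-byte sectors to whole MB.
# Dividing by 2048 (1048576 / 512) also avoids overflowing small integers.
sectors_to_mb() {
    echo $(( $1 / 2048 ))
}

sectors_to_mb 20044080   # the whole disk: prints 9787
sectors_to_mb 4095441    # the FreeBSD partition: prints 1999
```

Both results agree with the figures that sysinstall and fdisk printed in parentheses.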

Tip

Whenever you’re using any ghosting utility, create an image using the smallest hard drive size that you have available, but which is also large enough to hold your desired data. This will reduce the size of the image and prevent the problems associated with trying to restore an image to a smaller hard drive.

Once you’re satisfied with your system, insert the floppy and reboot.

g4u will probe for hardware and configure the NIC using DHCP. Once it’s finished, you’ll be presented with this screen:

Welcome to g4u Harddisk Image Cloning V1.12!

* To upload disk-image to FTP, type:    uploaddisk serverIP [image] [disk]
* To upload partition to FTP, type:     uploadpart serverIP [image] [disk+part]
* To install harddisk from FTP, type:   slurpdisk  serverIP [image] [disk]
* To install partition from FTP, type:  slurppart  serverIP [image] [disk+part]
* To copy disks locally, type:          copydisk disk0 disk1

[disk] defaults to wd0 for first IDE disk, [disk+part] defaults to wd0d 
for the whole first IDE disk. Use wd1 for second IDE disk, sd0 for first 
SCSI disk, etc. Default image for slurpdisk is 'rwd0d.gz'. Run 'dmesg' to 
see boot messages, 'disks' for recognized disks, 'parts <disk>' for list 
of (BSD-type!) partitions on disk '<disk>" (wd0, ...), run any other 
commands without args to see usage message.

Creating the image is as simple as invoking uploaddisk with the IP address of the FTP server. If you wish, include a useful name for the image; in this example, I’ll call the image securitylab.gz:

# uploaddisk 192.168.2.95 securitylab.gz

( cat $tmpfile ; dd progress=1 if=/dev/rwd0d bs=1m | gzip -9 ) | ftp -n
tmpfile:
open 192.168.2.95
user install
bin
put - securitylab.gz
bye
5
4
3
2
1
working...
Connected to 192.168.2.95.
220 genisis FTP server (Version 6.00LS) ready.
331 Password required for install.
Password: 
type_password_here

230 User install logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
200 Type set to I.
remote: securitylab.gz
227 Entering Passive Mode (192,168,2,95,192,1)
150 Opening BINARY mode data connection for 'securitylab.gz'.
...................

This will take a while. How long depends upon the size of the drive and the speed of your network. When it is finished, you’ll see a summary:

9787+1 records in
9787+1 records out
10262568960 bytes transferred in 6033.533 secs (1700921 bytes/sec)
226 Transfer complete.
3936397936 bytes sent in 1:40:29 (637.58 KB/s)
221 Goodbye.
#

You can also check out the size of the image on the FTP server:

% du -h ~install/securitylab.gz
3.7G /home/install/securitylab.gz

That’s not too bad. It took just over an hour and a half to compress that 9 GB drive to a 3.7 GB image. The g4u web site also has some hints for further reducing the size of the image or increasing the speed of the transfer.
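To compare one imaging run against another, it helps to boil the summary down to a compression ratio and a pair of transfer rates. Here’s a rough sketch using the byte counts and elapsed time from the summary above, with the time rounded to whole seconds; image_stats is a made-up name.

```shell
# image_stats RAW_BYTES IMAGE_BYTES SECONDS
# Report how small the image is relative to the disk, and the average
# rates at which the disk was read and the image was sent.
image_stats() {
    awk -v raw="$1" -v img="$2" -v secs="$3" 'BEGIN {
        printf "image is %.1f%% of the raw disk\n", img / raw * 100
        printf "disk read at %.1f MB/s, image sent at %.2f MB/s\n",
               raw / secs / 1048576, img / secs / 1048576
    }'
}

# Figures from the transfer summary above.
image_stats 10262568960 3936397936 6033
```

The first line of output reports that the image is 38.4% of the raw disk, which is a handy baseline when you try the size-reduction hints on the g4u web site.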

Tip

If you use images on a regular basis, consider upgrading hubs or older switches to 100 Mbps switches. This can speed up your transfer rates significantly.

It’s also possible to create an image of each particular filesystem, but I find it easier just to image a fairly small drive. This is because an image of the entire drive includes the master boot record (MBR) and the desired partitioning scheme.

Deploying the Image

When you wish to install the image, use the floppy to boot the system that will receive the image. Once you receive the prompt, specify the IP address of the FTP server and the name of the image:

# slurpdisk 192.168.2.95 securitylab.gz

It doesn’t matter what was previously on that drive. Since the MBR is recreated, the new drive will just contain the imaged data. Once the deployment is finished, simply reboot the system without the floppy.

Tip

If the new drive is bigger than the image, you’ll have free space left over on the drive that you can partition with a partitioning utility. Remember, don’t try to deploy an image to a smaller drive!
