Now that you’re a bit more comfortable with the Unix environment, it’s time to tackle some commands. It’s funny how some of the most useful commands on a Unix system have gained themselves a reputation for being user-unfriendly. Do find, grep, sed, tr, or mount make you shudder? If not, remember that you still have novice users who are intimidated by, and therefore aren’t gaining the full potential of, these commands.
This chapter also addresses some useful filesystem manipulations. Have you ever inadvertently blown away a portion of your directory structure? Would you like to manipulate /tmp or your swap partition? Do your Unix systems need to play nicely with Microsoft systems? Might you consider ghosting your BSD system? If so, this chapter is for you.
Finding files in Unix can be an exercise in frustration for a novice user. Here’s how to soften the learning curve.
Remember the first time you installed a Unix system? Once you successfully booted to a command prompt, I bet your first thought was, “Now what?” or possibly, “Okay, where is everything?” I’m also pretty sure your first foray into man find wasn’t all that enlightening.
How can you as an administrator make it easier for your users to find things? First, introduce them to the built-in commands. Then, add a few tricks of your own to soften the learning curve.
Every user should become aware of the three w’s: which, whereis, and whatis. (Personally, I’d like to see some why and when commands, but that’s another story.)
Use which to find the path to a program. Suppose you’ve just installed xmms and wonder where it went:
% which xmms
/usr/X11R6/bin/xmms
Better yet, if you were finding out the pathname because you wanted to use it in a file, save yourself a step:
% echo `which xmms` >> somefile
Remember to use the backticks (`), often found on the far left of the keyboard on the same key as the tilde (~). If you instead use the single quote (') character, usually located on the right side of the keyboard on the same key as the double quote ("), your file will contain the echoed string which xmms instead of the desired path.
The user’s current shell will affect how which’s switches work. Here is an example from the C shell:
% which -a xmms
-a: Command not found.
/usr/X11R6/bin/xmms
% which which
which: shell built-in command.
This is a matter of which which the user is using. Here, the user ran the which that is built into the C shell, which doesn’t support the options used by the which utility. Where, then, is that which? Try the whereis command:
% whereis -b which
which: /usr/bin/which
Here, I used -b to search only for the binary. Without any switches, whereis will display the binary, the manpage path, and the path to the original sources.
If your users prefer to use the real which command instead of the shell version, and if they are only interested in seeing binary paths, consider adding these lines to /usr/share/skel/dot.cshrc [Hack #9]:
alias which /usr/bin/which -a
alias whereis whereis -b
The -a switch will list all binaries with that name, not just the first binary found.
How do you proceed when you know what it is that you want to do, but have no clue which commands are available to do it? I know I clung to the whatis command like a life preserver when I was first introduced to Unix. For example, when I needed to know how to set up PPP:
% whatis ppp
i4bisppp(4) - isdn4bsd synchronous PPP over ISDN B-channel network driver
ng_ppp(4) - PPP protocol netgraph node type
ppp(4) - point to point protocol network interface
ppp(8) - Point to Point Protocol (a.k.a. user-ppp)
pppctl(8) - PPP control program
pppoed(8) - handle incoming PPP over Ethernet connections
pppstats(8) - print PPP statistics
On the days I had time to satisfy my curiosity, I tried this variation:
% whatis "(1)"
That will show all of the commands that have a manpage in section 1. If you’re rusty on your manpage sections, whatis intro should refresh your memory.
The previous commands are great for finding binaries and manpages, but what if you want to find a particular word in one of your own text files? That requires the notoriously user-unfriendly find command. Let’s be realistic. Even with all of your Unix experience, you still have to dig into either the manpage or a good book whenever you need to find something. Can you really expect novice users to figure it out?
To start with, the regular old invocation of find will find filenames, but not the words within those files. We need a judicious use of grep to accomplish that. Fortunately, find’s -exec switch allows it to run other utilities, such as grep, on each file it finds.
Start off with a find command that looks like this:
% find . -type f -exec grep "word" {} \;
This invocation says to start in the current directory (.), look through files, not directories (-type f), while running the grep command (-exec grep) in order to search for the word word. Note that the syntax of the -exec switch always resembles:
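-exec command with_its_parameters {} \;
find replaces {} with the name of each file it finds. The semicolon marks the end of the command, and it is escaped with a backslash so that the shell passes it to find instead of treating it as a command separator.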
What happens if I search the files in my home directory for the word alias?
% find . -type f -exec grep "alias" {} \;
alias h history 25
alias j jobs -l
Antialiasing=true
Antialiasing arguments=-sDEVICE=x11 -dTextAlphaBits=4 -dGraphicsAlphaBits=2 -dMaxBitmap=10000000
(proc-arg 0 "antialiasing" "Apply antialiasing (TRUE/FALSE)")
(proc-arg 0 "antialiasing" "Apply antialiasing (TRUE/FALSE)")
While it’s nice to see that find successfully found the word alias in my home directory, there’s one slight problem. I have no idea which file or files contained my search expression! However, adding /dev/null to that command will fix that:
% find . -type f -exec grep "alias" /dev/null {} \;
./.cshrc:alias h history 25
./.cshrc:alias j jobs -l
./.kde/share/config/kghostviewrc:Antialiasing=true
./.kde/share/config/kghostviewrc:Antialiasing arguments=-sDEVICE=x11 -dTextAlphaBits=4 -dGraphicsAlphaBits=2 -dMaxBitmap=10000000
./.gimp-1.3/pluginrc: (proc-arg 0 "antialiasing" "Apply antialiasing (TRUE/FALSE)")
./.gimp-1.3/pluginrc: (proc-arg 0 "antialiasing" "Apply antialiasing (TRUE/FALSE)")
Why did adding nothing, /dev/null, automagically cause the name of the file to appear next to the line that contains the search expression? Is it because Unix is truly amazing? After all, it does allow even the state of nothingness to be expressed as a filename.
Actually, it works because grep will list the filename whenever it searches multiple files. When you just use {}, find will pass each filename it finds one at a time to grep. Since grep is searching only one filename, it assumes you already know the name of that file. When you use /dev/null {}, find actually passes grep two files: /dev/null along with whichever file find happens to be working on.
Since grep is now searching two files, it’s nice enough to tell you which of the files contained the search string. We already know /dev/null won’t contain anything, so we just convinced grep to give us the name of the other file.
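To make grep’s rule concrete, here is a minimal Python sketch of a grep-like search that prints filenames only when it searches more than one file. The name mini_grep is hypothetical. It matches plain substrings rather than regular expressions, and it takes file contents from a dictionary instead of reading real files, with an empty string standing in for /dev/null.

```python
def mini_grep(pattern, files):
    """Return the lines containing `pattern`, grep style.

    `files` maps each filename to its contents. As with grep, filenames
    are shown only when more than one file is searched.
    """
    show_names = len(files) > 1
    results = []
    for name, text in files.items():
        for line in text.splitlines():
            if pattern in line:  # plain substring match, not a regex
                results.append(f"{name}:{line}" if show_names else line)
    return results


cshrc = "alias h history 25\nset history = 100\nalias j jobs -l\n"

# Like -exec grep "alias" {} \;  (one file, so no filename prefix)
print(mini_grep("alias", {"./.cshrc": cshrc}))
# Like -exec grep "alias" /dev/null {} \;  (two files, so prefixes appear)
print(mini_grep("alias", {"/dev/null": "", "./.cshrc": cshrc}))
```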
That’s pretty handy. Now let’s make it friendly. Here’s a very simple script called fstring:
% more ~/bin/fstring
#!/bin/sh
# script to find a string
# replaces $1 with user's search string
find . -type f -exec grep "$1" /dev/null {} \;
That $1 is a positional parameter. This script expects the user to give one parameter: the word the user is searching for. When the script executes, the shell will replace "$1" with the user’s search string. So, the script is meant to be run like this:
% fstring word_to_search
If you’re planning on using this script yourself, you’ll probably remember to include a search string. If you want other users to benefit from the script, you may want to include an if statement to generate an error message if the user forgets the search string:
#!/bin/sh
# script to find a string
# replaces $1 with user's search string
# or gives error message if user forgets to include search string
if test -n "$1"
then
    find . -type f -exec grep "$1" /dev/null {} \;
else
    echo "Don't forget to include the word you would like to search for" >&2
    exit 1
fi
Don’t forget to make your script executable with chmod +x and to place it in the user’s path. /usr/local/bin is a good location if you want other users to benefit.
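Users who would rather skip the find syntax altogether might like a rough Python analog of fstring. In this sketch, os.walk plays the part of find . -type f, a substring test plays the part of grep, and the path is always printed, which is what the /dev/null trick arranges. The name find_string is hypothetical. The search is for plain text rather than a regular expression, and symbolic links and unreadable files are skipped.

```python
import os


def find_string(word, top="."):
    """Yield 'path:line' for every line containing `word` in every regular
    file under `top`, much like the fstring shell script."""
    if not word:
        raise ValueError(
            "Don't forget to include the word you would like to search for")
    for dirpath, _dirnames, filenames in os.walk(top):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # -type f: regular files only; skip symlinks, FIFOs, sockets
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, errors="replace") as fh:
                    for line in fh:
                        if word in line:
                            text = line.rstrip("\n")
                            yield f"{path}:{text}"
            except OSError:
                continue  # unreadable file: skip it quietly
```

Because find_string is a generator, it does nothing until you iterate over it, for example with for hit in find_string("alias"): print(hit). The empty-word check likewise fires on the first iteration, not on the call itself.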
man which
man whereis
man whatis
man find
man grep
You may not know where its odd name originated, but you can’t argue the usefulness of grep.
Have you ever needed to find a particular file and thought, “I don’t recall the filename, but I remember some of its contents”? The oddly named grep command does just that, searching inside files and reporting on those that contain a given piece of text.
Suppose you wish to search your shell scripts for the text $USER. Try this:
% grep -s '$USER' *
add-user:if [ "$USER" != "root" ]; then
bu-user: echo " [-u user] - override $USER as the user to backup"
bu-user:if [ "$user" = "" ]; then user="$USER"; fi
del-user:if [ "$USER" != "root" ]; then
mount-host:mounted=$(df | grep "$ALM_AFP_MOUNT/$USER")
.....
mount-user: echo " [-u user] - override $USER as the user to backup"
mount-user:if [ "$user" = "" ]; then user="$USER"; fi
In this example, grep has searched through all files in the current directory, displaying each line that contained the text $USER.
Use single quotes around the text to prevent the shell from interpreting special characters. The -s option suppresses error messages when grep encounters a directory.
Perhaps you only want to know the name of each file containing the text $USER. Use the -l option to create that list for you:
% grep -ls '$USER' *
add-user
bu-user
del-user
mount-host
mount-user
What if you’re more concerned about how many times a particular string occurs within a file? That’s known as a relevance search. Use a command similar to:
% grep -sc '$USER' * | grep -v ':0' | sort -k 2 -t : -r
mount-host:6
mount-user:2
bu-user:2
del-user:1
add-user:1
How does this magic work? The -c flag lists each file with a count of matching lines, but it unfortunately includes files with zero matches. To counter this, I piped the output from grep into a second grep, this time searching for ':0' and using a second option, -v, to reverse the sense of the search by displaying lines that don’t match. The second grep reads from the pipe instead of a file, searching the output of the first grep.
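For a little extra flair, I sorted the subsequent output by the second field of each line with sort -k 2, assuming a field separator of colon (-t :) and using -r to reverse the sort into descending order. This sort compares the counts as text, so if any count might reach 10 or more, add -n to sort numerically.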
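For comparison, here is the same relevance search sketched in Python. The name relevance and the sample file contents are illustrative assumptions. Like grep -c, it counts matching lines rather than individual matches. Unlike the pipeline above, it sorts the counts as numbers.

```python
import re


def relevance(pattern, files):
    """Rank files by how many of their lines match `pattern`.

    `files` maps a filename to its contents. Files with no matching
    lines are dropped, and the rest are sorted from most to fewest
    matching lines.
    """
    regex = re.compile(pattern)
    counts = {}
    for name, text in files.items():
        # Like grep -c: count matching lines, not individual matches.
        n = sum(1 for line in text.splitlines() if regex.search(line))
        if n:  # the job done by grep -v ':0'
            counts[name] = n
    # Like sort -r on the count, but numeric, so 10 ranks above 9.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


files = {
    "add-user": 'if [ "$USER" != "root" ]; then\n',
    "notes": "nothing to see here\n",
    "mount-host": 'echo $USER\nmounted=$(df | grep "$USER")\n',
}
# re.escape keeps "$" from being treated as an end-of-line anchor.
print(relevance(re.escape("$USER"), files))
```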
Suppose you wish to search a set of documents and extract a few lines of text centered on each occurrence of a keyword. This time we are interested in the matching lines and their surrounding context, but not in the filenames. Use a command something like this:
% grep -rhiw -A4 -B4 'preferences' *.txt > research.txt
% more research.txt
This grep command searches all files in the current directory with the .txt extension for the word preferences. It performs a recursive search (-r), so any directory named on the command line is searched along with its subdirectories. It also hides (-h) the filename in the output, matches in a case-insensitive (-i) manner, and matches preferences as a complete word but not as part of another word (-w). The -A4 and -B4 options display the four lines immediately after and before the matched line, to give the desired context. Finally, I’ve redirected the output to the file research.txt.
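Here is a rough Python approximation of what -i, -w, -B4, and -A4 do together. The name grep_context is hypothetical. \b word boundaries stand in for grep’s -w test, and the -- separator lines that grep prints between non-adjacent groups are left out.

```python
import re


def grep_context(word, lines, before=4, after=4):
    """Return lines surrounding each case-insensitive, whole-word match
    of `word`, roughly like grep -iw -B<before> -A<after>."""
    regex = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    keep = set()
    for i, line in enumerate(lines):
        if regex.search(line):
            # The match plus up to `before` lines above and `after` below
            start = max(0, i - before)
            stop = min(len(lines), i + after + 1)
            keep.update(range(start, stop))
    # Overlapping windows are merged, so each line appears only once.
    return [lines[i] for i in sorted(keep)]
```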
You could also send the output straight to the vim text editor with:
% grep -rhiw -A4 -B4 'preferences' *.txt | vim -
Vim: Reading from stdin...
Specifying vim - tells vim to read stdin (in this case the piped output from grep) instead of a file. Type :q! to exit vim.
To search files for several alternatives, use the -e option to introduce extra search patterns:
% grep -e 'text1' -e 'text2' *
Q. How did grep get its odd name?
A. grep was written as a standalone program to simulate a commonly performed command available in the ancient Unix editor ed. The command in question searched an entire file for lines containing a regular expression and displayed those lines. The command was g/re/p: globally search for a regular expression and print the line.
To search for text that is more vaguely specified, use a regular expression. grep understands both basic and extended regular expressions, though it must be invoked as either egrep or grep -E when given an extended regular expression. The text or regular expression to be matched is usually called the pattern.
Suppose you need to search for lines that end in a space or tab character. Try this command (to insert a tab, press Ctrl-V and then Ctrl-I, shown as <tab> in the example):
% grep -n '[ <tab>]$' test-file
2:ends in space
3:ends in tab
I used the [...] construct to form a regular expression listing the characters to match: space and tab. The expression matches exactly one space or one tab character. $ anchors the match to the end of a line. The -n flag tells grep to include the line number in its output.
Alternatively, use:
% grep -n '[[:blank:]]$' test-file
2:ends in space
3:ends in tab
Regular expressions provide many preformed character groups of the form [[:description:]]. Example groups include all control characters, all digits, or all alphanumeric characters. See man re_format for details.
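Python’s re module has no POSIX bracket classes such as [[:blank:]], so a sketch of the same trailing-whitespace search has to spell out the set as [ \t]. The function name is hypothetical.

```python
import re

# [ \t] spells out what [[:blank:]] means: one space or one tab.
TRAILING_BLANK = re.compile(r"[ \t]$")


def lines_ending_in_blank(text):
    """Return (line_number, line) pairs, like grep -n '[[:blank:]]$'."""
    return [
        (number, line)
        for number, line in enumerate(text.split("\n"), start=1)
        if TRAILING_BLANK.search(line)
    ]


sample = "no trailing blank\nends in space \nends in tab\t\n"
print(lines_ending_in_blank(sample))
```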
We can modify a previous example to search for either “preferences” or “preference” as a complete word, using an extended regular expression such as this:
% egrep -rhiw -A4 -B4 'preferences?' *.txt > research.txt
The ? symbol specifies zero or one of the preceding character, making the s of preferences optional. Note that I use egrep because ? is available only in extended regular expressions. If you wish to search for the ? character itself, escape it with a backslash, as in \?.
An alternative method uses an expression of the form (string1|string2), which matches either one string or the other:
% egrep -rhiw -A4 -B4 'preference(s|)' *.txt > research.txt
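A quick Python check confirms that the two spellings accept the same words, and that escaping ? turns it into a literal question mark. Python’s re syntax always behaves like extended regular expressions here, so there is no egrep distinction. \b word boundaries approximate -w, and the sample strings are made up.

```python
import re

# egrep 'preferences?'     -> the ? makes the final s optional
OPTIONAL_S = re.compile(r"\bpreferences?\b", re.IGNORECASE)
# egrep 'preference(s|)'   -> either "s" or the empty string
ALTERNATION = re.compile(r"\bpreference(s|)\b", re.IGNORECASE)
# An escaped question mark matches a literal "?"
LITERAL_QUESTION = re.compile(r"preferences\?")


def matching(regex, samples):
    """Return the samples in which `regex` finds a match."""
    return [s for s in samples if regex.search(s)]


samples = ["Preference", "preferences", "preferenced", "edit preferences?"]
print(matching(OPTIONAL_S, samples))
print(matching(ALTERNATION, samples))
print(matching(LITERAL_QUESTION, samples))
```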
As a final example, use this to seek out all bash, tcsh, or sh shell scripts:
% egrep '^#\!/bin/(ba|tc|)sh[[:blank:]]*$' *
The caret (^) character at the start of a regular expression anchors it to the start of the line (much as $ at the end anchors it to the end). (ba|tc|) matches ba, tc, or nothing. The * character specifies zero or more of [[:blank:]], allowing trailing whitespace but nothing else. Note that the ! character must be escaped as \! to avoid shell interpretation in tcsh (but not in bash).
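The same pattern works almost unchanged in Python, which makes it easy to test before trusting it on a directory of scripts. In this sketch, [ \t] replaces [[:blank:]] (Python’s re module lacks POSIX classes) and no shell escaping of ! is needed. The function name is hypothetical.

```python
import re

# '^#!/bin/(ba|tc|)sh[[:blank:]]*$' with [ \t] in place of [[:blank:]]
SHEBANG = re.compile(r"^#!/bin/(ba|tc|)sh[ \t]*$")


def is_shell_script(first_line):
    """True if `first_line` is a #!/bin/sh, /bin/bash, or /bin/tcsh line."""
    return SHEBANG.search(first_line) is not None


for line in ["#!/bin/sh", "#!/bin/bash  ", "#!/bin/tcsh", "#!/bin/csh", "#!/bin/sh -x"]:
    print(repr(line), is_shell_script(line))
```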
grep works well with other commands. For example, to display all tcsh processes:
% ps axww | grep -w 'tcsh'
saruman 10329 0.0 0.2 6416 1196 p1 Ss Sat01PM 0:00.68 -tcsh (tcsh)
saruman 11351 0.0 0.2 6416 1300 std Ss Sat07PM 0:02.54 -tcsh (tcsh)
saruman 13360 0.0 0.0 1116 4 std R+ 10:57PM 0:00.00 grep -w tcsh
%
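Notice that the grep command itself appears in the output. To prevent this, put one letter of the pattern in brackets. The pattern [t]csh still matches the text tcsh, but the grep process’s own command line now contains [t]csh, which the pattern does not match: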
% ps axww | grep -w '[t]csh'
saruman 10329 0.0 0.2 6416 1196 p1 Ss Sat01PM 0:00.68 -tcsh (tcsh)
saruman 11351 0.0 0.2 6416 1300 std Ss Sat07PM 0:02.54 -tcsh (tcsh)
%
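You can verify the bracket trick in Python. This sketch reuses the ps lines shown above, simulates the line for the running grep process (an assumption about what ps would print), and approximates grep -w with \b word boundaries. The helper names are hypothetical.

```python
import re

PS_LINES = [
    "saruman 10329 0.0 0.2 6416 1196 p1 Ss Sat01PM 0:00.68 -tcsh (tcsh)",
    "saruman 11351 0.0 0.2 6416 1300 std Ss Sat07PM 0:02.54 -tcsh (tcsh)",
]


def ps_snapshot(grep_args):
    """Simulated ps output while `grep -w <grep_args>` is itself running."""
    grep_line = f"saruman 13360 0.0 0.0 1116 4 std R+ 10:57PM 0:00.00 grep -w {grep_args}"
    return PS_LINES + [grep_line]


def grep_w(pattern, lines):
    """Keep lines where `pattern` matches as a whole word (like grep -w)."""
    regex = re.compile(r"\b(?:" + pattern + r")\b")
    return [line for line in lines if regex.search(line)]


print(len(grep_w("tcsh", ps_snapshot("tcsh"))))      # grep's own line matches too
print(len(grep_w("[t]csh", ps_snapshot("[t]csh"))))  # only the real shells
```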
man grep
man re_format (regular expressions)
If you’ve ever had to change the formatting of a file, you know that it can be a time-consuming process.
Why waste your time making manual changes to files when Unix systems come with many tools that can very quickly make the changes for you?
Suppose you need to remove the blank lines from a file. This invocation of grep will do the job:
% grep -v '^$' letter1.txt > tmp ; mv tmp letter1.txt
The pattern ^$ anchors to both the start and the end of a line with no intervening characters: the regexp definition of a blank line. The -v option reverses the search, printing all nonblank lines, which are then written to a temporary file, and the temporary file is moved back to the original.
You can rewrite the preceding example in sed as:
% sed '/^$/d' letter1.txt > tmp ; mv tmp letter1.txt
'/^$/d' is actually a sed script. sed’s normal mode of operation is to read each line of input, process it according to the script, and then write the processed line to standard output. In this example, the expression /^$/ is a regular expression matching a blank line, and the trailing d is a sed function that deletes the line. Blank lines are deleted and all other lines are printed. Again, the results are redirected to a temporary file, which is then moved back to the original file.
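For comparison, here is a Python sketch of the same delete-blank-lines job using the temporary-file pattern: write the result to a new file in the same directory, then rename it over the original, so the input is never truncated before it has been read. The function name is hypothetical. Copying the original file’s permissions with shutil.copymode is an extra precaution that the shell version lacks.

```python
import os
import re
import shutil
import tempfile

BLANK = re.compile(r"^$")  # the same regular expression as sed's /^$/


def delete_blank_lines(path):
    """Rough Python analog of: sed '/^$/d' path > tmp ; mv tmp path"""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # the "tmp" file
    try:
        with os.fdopen(fd, "w") as dst, open(path) as src:
            for line in src:
                if not BLANK.match(line.rstrip("\n")):
                    dst.write(line)  # like sed, output every nonblank line
        shutil.copymode(path, tmp)  # keep the original file's permissions
        os.replace(tmp, path)       # the "mv tmp path" step
    except BaseException:
        # Something failed before the rename: discard the partial copy.
        os.unlink(tmp)
        raise
```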
sed can also do the work of grep:
% sed -n '/$USER/p' *
This command will yield the same results as:
% grep '$USER' *
The -n (no-print, perhaps) option prevents sed from outputting each line. The pattern /$USER/ matches lines containing $USER, and the p function prints matched lines to standard output, overriding -n.
One of the most common uses for sed is to perform a search and replace on a given string. For example, to change all occurrences of 2003 into 2004 in a file called date, include the two search strings in the format 's/oldstring/newstring/', like so:
% sed 's/2003/2004/' date
Copyright 2004
...
This was written in 2004, but it is no longer 2003.
...
Almost! Notice that the last 2003 remains unchanged. This is because without the g (global) flag, sed will change only the first occurrence on each line. This command will give the desired result:
% sed 's/2003/2004/g' date
Search and replace takes other flags too. To output only changed lines, use:
% sed -n 's/2003/2004/gp' date
Note the use of the -n flag to suppress normal output and the p flag to print changed lines.
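The difference between the default substitution, the g flag, and -n with p can be sketched with Python’s re.subn, which also reports how many replacements it made. The function sed_sub and its parameter names are hypothetical, and Python’s regular expression syntax differs from sed’s in places.

```python
import re


def sed_sub(lines, old, new, replace_all=False, print_changed_only=False):
    """Rough analog of sed 's/old/new/' applied to a list of lines.

    replace_all=True behaves like the g flag; print_changed_only=True
    behaves like sed -n with the p flag (only changed lines come out).
    """
    out = []
    for line in lines:
        # count=0 replaces every match on the line; count=1 only the first
        changed, n = re.subn(old, new, line, count=0 if replace_all else 1)
        if n or not print_changed_only:
            out.append(changed)
    return out


date = ["Copyright 2003", "", "This was written in 2003, but it is no longer 2003."]
print(sed_sub(date, "2003", "2004"))                    # s/2003/2004/
print(sed_sub(date, "2003", "2004", replace_all=True))  # s/2003/2004/g
print(sed_sub(date, "2003", "2004", True, True))        # -n 's/2003/2004/gp'
```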
Perhaps you need to perform two or more transformations on a file. You can do this in a single run by specifying a script with multiple commands:
% sed 's/2003/2004/g;/^$/d' date
This performs both substitution and blank line deletion. Use a semicolon to separate the two commands.
Here is a more complex example that translates HTML tags of the form <font> into PHP bulletin board tags of the form [font]:
% cat index.html
<title>hello
</title>
% sed 's/<\(.*\)>/[\1]/g' index.html
[title]hello
[/title]
How did this work? The script searched for an HTML tag using the pattern '<.*>'. Angle brackets match literally. In a regular expression, a dot (.) represents any character and an asterisk (*) means zero or more of the previous item. Escaped parentheses, \( and \), capture the matched pattern lying between them and place it in a numbered buffer. In the replace string, \1 refers to the contents of the first buffer. Thus the text between the angle brackets in the search string is captured into the first buffer and written back inside square brackets in the replace string. sed takes full advantage of the power of regular expressions to copy text from the pattern to its replacement.
% cat index1.html
<title>hello</title>
% sed 's/<\(.*\)>/[\1]/g' index1.html
[title>hello</title]
This time the same command fails because the pattern .* is greedy and grabs as much as it can, matching up to the second >. To prevent this behavior, we need to match zero or more of any character except <. Recall that [...] is a regular expression that lists characters to match, but if the first character is the caret (^), the match is reversed. Thus the regular expression [^<] matches any single character other than <. I can modify the previous example as follows:
% sed 's/<\([^<]*\)>/[\1]/g' index1.html
[title]hello[/title]
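Python’s re module shows the same greedy behavior, which makes it easy to compare the two patterns side by side. The \( \) of sed become plain ( ) here, and \1 keeps its meaning. Python also offers a non-greedy .*?, which standard sed does not; it appears below only for comparison.

```python
import re

html = "<title>hello</title>"

# s/<\(.*\)>/[\1]/g : .* is greedy and runs to the LAST '>' on the line
greedy = re.sub(r"<(.*)>", r"[\1]", html)

# s/<\([^<]*\)>/[\1]/g : [^<]* cannot run past another '<'
tag_by_tag = re.sub(r"<([^<]*)>", r"[\1]", html)

# Python-only alternative: the non-greedy .*? stops at the FIRST '>'
lazy = re.sub(r"<(.*?)>", r"[\1]", html)

print(greedy)
print(tag_by_tag)
print(lazy)
```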
Remember, grep will perform a case-insensitive search if you provide the -i flag. sed, unfortunately, does not have such an option. To search for title in a case-insensitive manner, form regular expressions using [...], each listing a character of the word in both upper- and lowercase forms:
% sed 's/[Tt][Ii][Tt][Ll][Ee]/title/g' title.html
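Typing those bracket pairs by hand gets tedious for longer words. This hypothetical Python helper builds the pattern for you. Non-letters are copied unchanged, so a word containing regular expression metacharacters would still need escaping for sed.

```python
import re


def caseless_brackets(word):
    """Build a pattern such as [Tt][Ii][Tt][Ll][Ee] for tools without
    a case-insensitive option. Non-letters are copied unchanged."""
    return "".join(
        f"[{ch.upper()}{ch.lower()}]" if ch.isalpha() else ch for ch in word
    )


pattern = caseless_brackets("title")
print(pattern)
# The same pattern also works with Python's re module:
print(re.sub(pattern, "title", "<TITLE>Hello</Title>"))
```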
man grep
man sed
man re_format (regular expressions)
“sed & Regular Expressions” at http://main.rtfiber.com.tw/~changyj/sed/
Cool sed tricks at http://www.wagoneers.com/UNIX/SED/sed.html
The sed FAQ (http://doc.ddart.net/shell/sedfaq.htm)
The sed Script Archive (http://sed.sourceforge.net/grabbag/scripts/)
Combine basic Unix tools to become a formatting expert.
Don’t let the syntax of the sed command scare you off. sed is a powerful utility capable of handling most of your formatting needs. For example, have you ever needed to add or remove comments from a source file? Perhaps you need to shuffle some text from one section to another.
In this hack, I’ll demonstrate how to do that. I’ll also show some handy formatting tricks using two other built-in Unix commands, tr and col.
sed allows you to specify an address range using a pattern, so let’s put this to use. Suppose we want to comment out a block of text in a source file by adding // to the start of each line we wish to comment out. We might use a text editor to mark the block with bc-start and bc-end:
% cat source.c
    if (tTd(27, 1))
        sm_dprintf("%s (%s, %s) aliased to %s\n",
            a->q_paddr, a->q_host, a->q_user, p);
bc-start
    if (bitset(EF_VRFYONLY, e->e_flags))
    {
        a->q_state = QS_VERIFIED;
        return;
    }
bc-end
    message("aliased to %s", shortenstring(p, MAXSHORTSTR));
and then apply a sed script such as:
% sed '/bc-start/,/bc-end/s/^/\/\//' source.c
to get:
    if (tTd(27, 1))
        sm_dprintf("%s (%s, %s) aliased to %s\n",
            a->q_paddr, a->q_host, a->q_user, p);
//bc-start
//    if (bitset(EF_VRFYONLY, e->e_flags))
//    {
//        a->q_state = QS_VERIFIED;
//        return;
//    }
//bc-end
    message("aliased to %s", shortenstring(p, MAXSHORTSTR));
The script used search and replace to add // to the start of all lines (s/^/\/\//) that lie between the two markers (/bc-start/,/bc-end/). This will apply to every block in the file between the marker pairs. Note that in the sed script, the / character has to be escaped as \/ so it is not mistaken for a delimiter.
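A Python sketch shows what the range address /bc-start/,/bc-end/ means. The range opens on a line matching the first pattern and closes, inclusively, on the next line after it that matches the second; it can then reopen for later blocks. The function name and the substring tests (standing in for sed’s regular expressions) are assumptions of this sketch.

```python
def comment_block(lines, start="bc-start", end="bc-end", prefix="//"):
    """Prefix every line in each start..end range, like
    sed '/bc-start/,/bc-end/s/^/\\/\\//'. Substring tests stand in for
    sed's regular expression addresses."""
    out = []
    in_range = False
    for line in lines:
        if in_range:
            out.append(prefix + line)
            if end in line:  # the closing address ends the range
                in_range = False
        elif start in line:  # the opening address starts a range
            out.append(prefix + line)
            in_range = True
        else:
            out.append(line)
    return out


source = ["if (x)", "bc-start", "    return;", "bc-end", "done();"]
print("\n".join(comment_block(source)))
```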
When we need to delete the comments and the two bc- lines (let’s assume that the edited contents were copied back to source.c), we can use a script such as:
% sed '/bc-start/d;/bc-end/d;/bc-start/,/bc-end/s/^\/\///' source.c
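Oops! My first attempt won’t work. The bc- lines must be deleted after they have been used as address ranges. Once d deletes a marker line, sed moves on to the next line without running the rest of the script, so the range never starts. Trying again, we get: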
% sed '/bc-start/,/bc-end/s/^\/\///;/bc-start/d;/bc-end/d' source.c
If you want to leave the two bc- marker lines in but comment them out, use this piece of trickery:
% sed '/bc-start/,/bc-end/{/^\/\/bc-/\!s/\/\///;}' source.c
to get:
    if (tTd(27, 1))
        sm_dprintf("%s (%s, %s) aliased to %s\n",
            a->q_paddr, a->q_host, a->q_user, p);
//bc-start
    if (bitset(EF_VRFYONLY, e->e_flags))
    {
        a->q_state = QS_VERIFIED;
        return;
    }
//bc-end
    message("aliased to %s", shortenstring(p, MAXSHORTSTR));
Note that in the bash shell you must use:
% sed '/bc-start/,/bc-end/{/^\/\/bc-/!s/\/\///;}' source.c
because the bang character (!) does not need to be escaped as it does in tcsh.
What’s with the curly braces? They prevent a common mistake. You may imagine that this example:
% sed -n '/$USER/p;p' *
prints each line containing $USER twice because of the p;p commands. It doesn’t, though, because the second p is not restrained by the /$USER/ line address and therefore applies to every line. To print twice just those lines containing $USER, use:
% sed -n '/$USER/p;/$USER/p' *
or:
% sed -n '/$USER/{p;p;}' *
The construct {...} introduces a function list that applies to the preceding line address or range.
A line address followed by ! (or \! in the tcsh shell) reverses the address range, and so the function (list) that follows is applied to all lines not matching. The net effect is to remove // from all lines that don’t start with //bc- but that do lie within the bc- markers.
sed reads input into the pattern space, but it also provides a buffer (called the holding space) and functions to move text from one space to the other. All other functions (such as s and d) operate on the pattern space, not the holding space.
Check out this sed script:
% cat case.script
# Sed script for case insensitive search
#
# copy pattern space to hold space to preserve it
h
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
# use a regular expression address to search for lines containing:
/test/ {
i\
vvvv
a\
^^^^
}
# restore the original pattern space from the hold space
x;p
First, I have written the script to a file instead of typing it in on the command line. Lines starting with # are comments and are ignored. Other lines specify a sed command, and commands are separated by either a newline or ; character. sed reads one line of input at a time and applies the whole script file to each line. The following functions are applied to each line as it is read:
h
Copies the pattern space (the line just read) into the holding space.
y/ABC/abc/
Operates on the pattern space, translating A to a, B to b, C to c, and so on, ensuring the line is all lowercase.
/test/ {...}
Matches the line just read if it includes the text test (whatever the original case, because the line is now all lowercase) and then applies the list of functions that follow. This example inserts text before (i) and appends text after (a) the matched line to highlight it.
x
Exchanges the pattern and hold space, thus restoring the original contents of the pattern space.
p
Prints the pattern space.
Here is the test file:
% cat case
This contains text Hello
that we want to TeSt
search for, but in test
a case insensitive XXXX
manner using the sed TEST
editor. Bye bye.
%
Here are the results of running our sed script on it:
% sed -n -f case.script case
This contains text Hello
vvvv
that we want to TeSt
^^^^
vvvv
search for, but in test
^^^^
a case insensitive XXXX
vvvv
manner using the sed TEST
^^^^
editor. Bye bye.
Notice the vvvv and ^^^^ markers around lines that contain test.
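The flow of case.script can be mimicked in Python to make the roles of the two buffers explicit. In this sketch, a local variable plays the hold space and str.lower() plays the y command. Note that str.lower() also lowercases non-ASCII letters, which the y command above does not. The function name is hypothetical.

```python
def mark_matches(lines, word="test", before="vvvv", after="^^^^"):
    """Mimic: sed -n -f case.script, where case.script is
    h; y/ABC...Z/abc...z/; /test/ { i\\ ... a\\ ... }; x; p
    """
    out = []
    for line in lines:
        hold = line                # h: copy the pattern space to the hold space
        pattern = line.lower()     # y: lowercase the pattern space
        matched = word in pattern  # /test/: tested on the lowercase copy
        if matched:
            out.append(before)     # i\ text is written immediately
        pattern, hold = hold, pattern  # x: swap, restoring the original line
        out.append(pattern)        # p: print the pattern space
        if matched:
            out.append(after)      # a\ text is written at the end of the cycle
    return out
```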
The tr command can translate one character to another. To change the contents of case into all lowercase and write the results to file lower-case, we could use:
% tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz' < case > lower-case
tr works with standard input and output only, so to read and write files we must use redirection.
To translate carriage return characters into newline characters, we could use:
% tr \\r \\n < cr > lf
where cr is the original file and lf is a new file containing line feeds in place of carriage returns. \n represents a line feed character, but we must escape the backslash character in the shell, so we use \\n instead. Similarly, a carriage return is specified as \\r.
tr can also squeeze multiple consecutive occurrences of a particular character into a single occurrence. For example, to remove duplicate line feeds from the lines file:
% tr -s \\n < lines > tmp ; mv tmp lines
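Here we use the tmp file trick again because the shell truncates the output file before the command even starts reading. Redirecting the output of tr, grep, or sed back onto its own input file would therefore destroy the data.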
tr can also delete selected characters. If, for instance, you hate vowels, run your documents through this:
% tr -d aeiou < file
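Python’s str.translate covers the same three jobs: translating, deleting, and (with a small regular expression) squeezing. This is a sketch for comparison, and the table and function names are hypothetical.

```python
import re
import string

# tr 'ABC...Z' 'abc...z'
TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
# tr \\r \\n : carriage returns become line feeds
CR_TO_LF = str.maketrans("\r", "\n")
# tr -d aeiou : the third argument lists characters to delete
NO_VOWELS = str.maketrans("", "", "aeiou")


def squeeze(text, char):
    """tr -s char: collapse each run of `char` into a single one."""
    # A function replacement avoids backslash processing of `char`.
    return re.sub(re.escape(char) + "+", lambda _m: char, text)


print("Hello WORLD".translate(TO_LOWER))
print(repr("one\rtwo\r".translate(CR_TO_LF)))
print("education".translate(NO_VOWELS))
print(repr(squeeze("one\n\n\ntwo\n", "\n")))
```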
To translate tabs into multiple spaces, use col with its -x flag:
% cat tabs
col	col	col

% od -x tabs
0000000 636f 6c09 636f 6c09 636f 6c0a 0a00
0000015
% col -x < tabs > spaces
% cat spaces
col     col     col

% od -h spaces
0000000 636f 6c20 2020 2020 636f 6c20 2020 2020
0000020 636f 6c0a 0a00
0000025
In this example I have used od -x to dump the contents of the before and after files in hexadecimal, which shows more clearly that the translation has worked. (09 is the code for Tab and 20 is the code for Space.)
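Python’s str.expandtabs performs the same conversion, assuming the usual tab stops every eight columns, and printing the bytes in hex makes the before and after easy to compare, much like od. Unlike od -x, this sketch prints one byte at a time, so byte order never comes into it. The function names are hypothetical.

```python
def expand_tabs(text, tabsize=8):
    """Replace each tab with spaces up to the next tab stop, like col -x."""
    return text.expandtabs(tabsize)


def hex_bytes(text):
    """Show each byte of `text` in hexadecimal, one byte at a time."""
    return " ".join(f"{byte:02x}" for byte in text.encode("ascii"))


tabs = "col\tcol\tcol\n"
print(hex_bytes(tabs))
print(hex_bytes(expand_tabs(tabs)))
```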
Deal with double quotation marks in delimited files.
Importing data from a delimited text file into an application is usually painless. Even if you need to change the delimiter from one character to another (from a comma to a colon, for example), you can choose from many tools that perform simple character substitution with great ease.
However, one common situation is not solved as easily: many business applications export data into a space- or comma-delimited file, enclosing individual fields in double quotation marks. These fields often contain the delimiter character. Importing such a file into an application that processes only one delimiter (PostgreSQL for example) may result in an incorrect interpretation of the data. This is one of those situations where the user should feel lucky if the process fails.
One solution is to write a script that tracks the use of double quotes to determine whether it is working within a text field. This is doable by creating a variable that acts as a text/nontext switch for the character substitution process. The script should change the delimiter to a more appropriate character, leave the delimiters that were enclosed in double quotes unchanged, and remove the double quotes. Rather than make the changes to the original datafile, it’s safer to write the edited data to a new file.
The following algorithm meets our needs:
Create the switch variable and assign it the value of 1, meaning “nontext”. We’ll declare the variable tswitch and define it as tswitch = 1.
Create a variable for the delimiter and define it. We’ll use the variable delim with a space as the delimiter, so delim = ' '.
Decide on a better delimiter. We’ll use the tab character, so new_delim = '\t'.
Open the datafile for reading.
Open a new file for writing.
Now, for every character in the datafile:
Read a character from the datafile.
If the character is a double quotation mark, tswitch = tswitch * -1.
If the character equals the character in delim and tswitch equals 1, write new_delim to the new file.
If the character equals that in delim and tswitch equals -1, write the value of delim to the new file.
If the character is anything else, write the character to the new file.
The Python script redelim.py implements the preceding algorithm. It prompts the user for the original datafile and a name for the new datafile. The delim and new_delim variables are hardcoded, but those are easily changed within the script.
This script copies a space-delimited text file with text values in double quotes to a new, tab-delimited file without the double quotes. The advantage of using this script is that it leaves spaces that were within double quotes unchanged.
There are no command-line arguments for this script. The script will prompt the user for source and destination file information.
You can redefine the variables for the original and new delimiters, delim and new_delim, in the script as needed.
#!/usr/local/bin/python3
# redelim.py: copy a space-delimited file whose text fields are enclosed in
# double quotes to a new tab-delimited file, dropping the quotes.

print("Change text file delimiters.")

# Ask user for source and target files.
sourcefile = input('Please enter the path and name of the source file: ')
targetfile = input('Please enter the path and name of the target file: ')

# The variable 'tswitch' acts as a text/non-text switch that reminds Python
# whether it is working within a text or non-text data field.
tswitch = 1

# If the source delimiter that you want to change is not a space,
# redefine the variable delim in the next line.
delim = ' '

# If the new delimiter that you want to use is not a tab,
# redefine the variable new_delim in the next line.
new_delim = '\t'

# Open files for reading and writing; 'with' closes them when done.
with open(sourcefile, 'r') as source, open(targetfile, 'w') as dest:
    for charn in source.read():
        if charn == '"':
            # Entering or leaving a quoted text field; drop the quote itself.
            tswitch = tswitch * -1
        elif charn == delim and tswitch == 1:
            # Delimiter outside quotes: write the new delimiter instead.
            dest.write(new_delim)
        else:
            # Everything else, including delimiters inside quotes.
            dest.write(charn)
Use of redelim.py assumes that you have installed Python, which is available through the ports collection or as a binary package. The script needs nothing beyond Python’s standard installation.
If you prefer working with Perl, DBD::AnyData is another good solution to this problem.
The Python home page (http://www.python.org/)
Bring simplicity back to using floppies.
If you’re like many Unix users, you originally came from a Windows background. Remember your initial shock the first time you tried to use a floppy on a Unix system? Didn’t Windows seem so much simpler? Forever gone seemed the days when you could simply insert a floppy, copy some files over, and remove the disk from the drive. Instead, you were expected to plunge into the intricacies of the mount command, only to discover that you didn’t even have the right to use the floppy drive in the first place!
There are several ways to make using floppies much, much easier on your FreeBSD system. Let’s start by taking stock of the default mechanisms for managing floppies.
Suppose I have formatted a floppy on a Windows system, copied some files over, and now want to transfer those files to my FreeBSD system. In reality, that floppy is a storage medium. Since it is storing files, it needs a filesystem in order to keep track of the locations of those files. Because that floppy was formatted on a Windows system, it uses a filesystem called FAT12.
In Unix, a filesystem can’t be accessed until it has been
mounted. This means you have to use the mount
command before you can access the
contents of that floppy. While this may seem strange at first, it
actually gives Unix more flexibility. An administrator can mount and
unmount filesystems as they are needed. Note that I used the word
administrator. Regular users don’t have this
ability, by default. We’ll change that shortly.
Unix also has the additional flexibility of being able to
mount
different filesystems. In
Windows, a floppy will always contain the FAT12 filesystem. BSD understands floppies formatted
with either FAT12 or UFS, the Unix File System. As you might expect from the
name, the UFS filesystem is assumed unless you specify
otherwise.
For now, become the superuser and let’s pick apart the default
invocation of the mount
command:
% su
Password:
# mount -t msdos /dev/fd0 /mnt
#
I used the type (-t) switch to indicate that this floppy was formatted on an msdos-based system. I could have used the mount_msdosfs command instead:
# mount_msdosfs /dev/fd0 /mnt
Both commands take two arguments. The first indicates the device to be mounted. /dev/fd0 represents the first (0) floppy drive (fd) device (/dev).
The second argument represents the mount point. A mount point is simply an empty directory that acts as a pointer to the mounted filesystem. Your FreeBSD system comes with a default mount point called /mnt. If you prefer, create a different mount point with a more useful name. Just remember to keep that directory empty so it will be available as a mount point, because any files in your mount point will become hidden and inaccessible when you mount a device over it.
This can be a feature in itself if you have a filesystem that should always be mounted. Place a README file in /mnt/important_directory containing: “If you can see this file, contact the administrator at this number . . . .”
In this example, I’ll create a mount point called /floppy, which I’ll use in the rest of the examples in this hack:
# mkdir /floppy
This is a good place to explain some common error messages.
Trust me, I experienced them all before I became proficient at this
whole mount
business. At the time,
I wished for a listing of error messages so I could figure out what I
had done wrong and how to fix it.
Let’s take a look at the output of this command:
# mount /dev/fd0 /mnt
mount: /dev/fd0 on /mnt: incorrect super block
Remember my first mount
command? I know it worked, as I just received my prompt back. I know
this command didn’t work, because mount
instead wrote me a message explaining
why it did not do what I asked.
That error message isn’t actually as bad as it sounds. I forgot
to include the type switch, meaning mount
assumed I was using UFS. Since this is
a FAT12 floppy, it simply didn’t understand the filesystem.
This error message also looks particularly nasty:
fd0: hard error cmd=read fsbn 0 of 0-3 (No status)
msdosfs: /dev/fd0: Input/output error
If you get that one, quickly reach down and push in the floppy before anyone else notices. You forgot to insert it into the bay.
Here’s another error message:
msdosfs: /dev/fd0: Operation not permitted
Oops. Looks like I didn’t become the superuser before trying
that mount
command.
How about this one:
mount: /floppy: No such file or directory
Looks like I forgot to make that mount point first. A mkdir /floppy
should fix that one.
The one error message you do not want to see is a system panic followed by a reboot. It took me a while to break myself of the habit of just ejecting a floppy once I had copied over the files I wanted. That’s something you just don’t do in Unix land.
You must first warn your operating system that you have finished using a filesystem before you physically remove it from the computer. Otherwise, when it goes out looking for a file, it will panic when it realizes that it has just disappeared off of the edge of the universe! (Well, the computer’s universe anyway.) Put yourself in your operating system’s shoes for a minute. The user entrusted something important to your care. You blinked for just a split second and it was gone, nowhere to be found. You’d panic too!
How do you warn your operating system that the universe has
shrunk? You unmount the floppy before you eject it from the floppy
bay. Note that the actual command is missing the first n and is instead spelled umount:
# umount /floppy
Also, the only argument is the name of your mount point. In this example, it’s /floppy.
How can you tell if a floppy is mounted? The df (disk free) command will tell you:
# df
Filesystem 1K-blocks Used Avail Capacity Mounted on
/dev/ad0s1a 257838 69838 167374 29% /
devfs 1 1 0 100% /dev
/dev/ad0s1e 257838 616 236596 0% /tmp
/dev/ad0s1f 13360662 2882504 9409306 23% /usr
/dev/ad0s1d 257838 28368 208844 12% /var
/dev/fd0 1424 1 1423 0% /floppy
as will the mount
command
with no arguments:
# mount
/dev/ad0s1a on / (ufs, local)
devfs on /dev (devfs, local)
/dev/ad0s1e on /tmp (ufs, local, soft-updates)
/dev/ad0s1f on /usr (ufs, local, soft-updates)
/dev/ad0s1d on /var (ufs, local, soft-updates)
/dev/fd0 on /floppy (msdosfs, local)
This system currently has a floppy /dev/fd0 mounted on /floppy, meaning you’ll need to issue the
umount
command before ejecting the
floppy.
Several other filesystems are also mounted, yet I only used the
mount
command on my floppy drive.
When did they get mounted and how? The answer is in /etc/fstab, which controls which filesystems to mount at boot time. Here’s my /etc/fstab; it’s pretty similar to the earlier output from df:
# more /etc/fstab
# Device Mountpoint FStype Options Dump Pass#
/dev/ad0s1b none swap sw 0 0
/dev/ad0s1a / ufs rw 1 1
/dev/ad0s1e /tmp ufs rw 2 2
/dev/ad0s1f /usr ufs rw 2 2
/dev/ad0s1d /var ufs rw 2 2
/dev/acd0 /cdrom cd9660 ro,noauto 0 0
proc /proc procfs rw 0 0
linproc /compat/linux/proc linprocfs rw 0 0
Each mountable filesystem has its own line in this file. Each
has its own unique mount point and its filesystem type listed. See how
the /cdrom mount point has the options ro,noauto instead of rw? The noauto tells your system not to mount your
CD-ROM at bootup. That is a good thing—if there’s no CD in the bay at
boot time, the kernel will either give an error message or pause for a
few seconds, looking for that filesystem.
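If you’d like to see how those six columns break down, here is a minimal Python sketch that reads fstab-style text and reports whether each filesystem is mounted at boot or only on demand. It is an illustration only; FreeBSD itself reads the file through getfsent(3). The sketch assumes every entry has all six columns, as in the file above, and the names parse_fstab, FstabEntry, and SAMPLE are my own.

```python
# Sketch only: a toy reader for fstab-style text, not FreeBSD's getfsent(3).
# Assumes every entry has all six columns, as in the /etc/fstab shown above.
from typing import NamedTuple


class FstabEntry(NamedTuple):
    device: str
    mountpoint: str
    fstype: str
    options: list[str]
    dump: int
    passno: int


def parse_fstab(text: str) -> list[FstabEntry]:
    """Split each non-comment line into the six fstab columns."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        device, mountpoint, fstype, options, dump, passno = line.split()
        entries.append(FstabEntry(device, mountpoint, fstype,
                                  options.split(","), int(dump), int(passno)))
    return entries


SAMPLE = """\
# Device       Mountpoint  FStype  Options    Dump  Pass#
/dev/ad0s1a    /           ufs     rw         1     1
/dev/acd0      /cdrom      cd9660  ro,noauto  0     0
"""

if __name__ == "__main__":
    for entry in parse_fstab(SAMPLE):
        when = "only on demand" if "noauto" in entry.options else "at boot"
        print(f"{entry.mountpoint}: {entry.fstype}, mounted {when}")
```

The options column is split on commas because that is how fstab packs several options, such as ro and noauto, into one field.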
However, you can mount a data CD-ROM at any time by simply typing:
# mount /cdrom
That command was shorter than the usual mount
command for one reason: there was an
entry for /cdrom in /etc/fstab. That means you can shorten the
command to mount a floppy by creating a similar entry for /floppy. Simply add this line to /etc/fstab:
/dev/fd0 /floppy msdos rw,noauto 0 0
Test your change by inserting a floppy and issuing this command:
# mount /floppy
If you receive an error, check /etc/fstab for a typo and try again.
Now that the superuser can quickly mount floppies, let’s
give regular users this ability. First, we have to change the default
setting of the vfs.usermount
variable:
# sysctl vfs.usermount=1
vfs.usermount: 0 -> 1
By changing the default 0 to a 1, we’ve just allowed regular users to mount filesystems. This setting lasts only until the next reboot; to make it permanent, add the line vfs.usermount=1 to /etc/sysctl.conf. However, don’t worry about your users running amok with this new freedom; the devices themselves are still owned by root. Check out the permissions on the floppy device:
# ls -l /dev/fd0
crw-r----- 1 root operator 9, 0 Nov 28 08:31 /dev/fd0
If you’d like any user to have the right to mount a floppy, change the permissions so everyone has read and write access:
# chmod 666 /dev/fd0
Now, if you don’t want every user to have this right, you could create a group, add the desired users to that group, and assign that group permissions to /dev/fd0.
You’re almost there. The only kicker is that the user has to own the mount point. The best place to put a user’s mount point is in his home directory. So, logged in as your usual user account:
% mkdir ~/floppy
Now, do you think the mount
command will recognize that new mount point?
% mount ~/floppy
mount: /home/dru/floppy: unknown special file or file system
Oh boy. Looks like we’re back to square one, doesn’t it?
Remember, that entry in /etc/fstab only refers to root’s mount
point, so I can’t use that shortcut to refer to my own mount point.
While it’s great to have the ability to use the mount command, I’m truly too lazy to have to type out mount -t msdos /dev/fd0 ~/floppy, let alone remember it.
Thank goodness for aliases. Try adding these lines to the alias section of your ~/.cshrc file:
alias mf mount -t msdos /dev/fd0 ~/floppy
alias uf umount ~/floppy
Now you simply need to type mf
whenever you want to mount a floppy and
uf
when it’s time to unmount the
floppy. Or perhaps you’ll prefer to create a keyboard shortcut [Hack #4].
Now that you can mount and unmount floppies with the best of them, it’s time to learn how to format them. Again, let’s start with the default invocations required to format a floppy, then move on to some ways to simplify the process.
When you format a floppy on a Windows or DOS system, several events occur:
The floppy is low-level formatted, marking the tracks and sectors onto the disk.
A filesystem is installed onto the floppy, along with two copies of its FAT table.
You are given the opportunity to give the floppy a volume label.
The same process also has to occur when you format a floppy on a FreeBSD system. On a 5.x system, the order goes like this:
% fdformat -f 1440 /dev/fd0
Format 1440K floppy `/dev/fd0'? (y/n):y
Processing ----------------------------------------
% bsdlabel -w /dev/fd0 fd1440
% newfs_msdos /dev/fd0
/dev/fd0: 2840 sectors in 355 FAT12 clusters (4096 bytes/cluster)
bps=512 spc=8 res=1 nft=2 rde=512 sec=2880 mid=0xf0 spf=2 spt=18 hds=2 hid=0
First, notice that we don’t use the mount
command. You can’t mount
a filesystem before you have a
filesystem! (You do have to have the floppy in the drive, though.)
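Take a look at the three steps:
fdformat -f 1440 /dev/fd0 performs the low-level format, marking the tracks and sectors for a 1440 KB floppy onto the disk.
bsdlabel -w /dev/fd0 fd1440 writes a standard disk label, using the fd1440 entry for a 1.44 MB floppy.
newfs_msdos /dev/fd0 installs the FAT12 filesystem.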
If I see the following error message when I try to mount
the floppy, I’ll realize that I forgot
that third step:
% mf
msdosfs: /dev/fd0: Invalid argument
Because my mf (mount floppy) alias uses the msdos filesystem, it will complain if the floppy isn’t formatted with FAT12.
Any three-step process is just begging to be put into a
shell script. I like to keep these scripts under ~/bin. If you don’t have this directory
yet, create it. Then create a script called ff
(for format
floppy):
% cd
% mkdir bin
% cd bin
% vi ff
#!/bin/sh
# this script formats a floppy with FAT12
# that floppy can also be used on a Windows system
# first, remind the user to insert the floppy
echo "Please insert the floppy and press enter"
read pathname
# then, proceed with the three format steps
fdformat -f 1440 /dev/fd0
bsdlabel -w /dev/fd0 fd1440
newfs_msdos /dev/fd0
echo "Format complete."
Note that this script is basically those three commands, with
comments thrown in so I remember what the script does. The only new
part is the read
pathname
line. I added it to force the user
to press Enter before the script proceeds.
Remember to make the script executable:
% chmod +x ff
I’ll then return to my home directory and see how it works.
Since I use the C shell, I’ll use the rehash
command to
make the shell aware that there is a new executable in my path:
% cd
% rehash
% ff
Please insert the floppy and press enter
Format 1440K floppy `/dev/fd0'? (y/n):y
Processing ----------------------------------------
/dev/fd0: 2840 sectors in 355 FAT12 clusters (4096 bytes/cluster)
bps=512 spc=8 res=1 nft=2 rde=512 sec=2880 mid=0xf0 spf=2 spt=18 hds=2 hid=0
Format complete.
Not too bad. I can now manipulate floppies with my own custom mf, uf, and ff commands.
man fstab
man fdformat
man bsdlabel
man newfs
The Creating and Using Floppies section of the FreeBSD Handbook (http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/floppies.html)
The Mounting and Unmounting File Systems section of the FreeBSD Handbook (http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/mount-unmount.html)
Share files between Windows and FreeBSD with a minimum of fuss.
You’ve probably heard of some of the Unix utilities available for
accessing files residing on Microsoft systems. For example, FreeBSD
provides the mount_smbfs
and
smbutil
utilities to
mount Windows shares and view or access resources on a Microsoft
network. However, both of those utilities have a caveat: they require an
SMB server. The assumption is that somewhere in
your network there is at least one NT or 2000 Server.
Not all networks have the budget or the administrative expertise to allow for commercial server operating systems. Sure, you can install and configure Samba, but isn’t that overkill for, say, a home or very small office network? Sometimes you just want to share some files between a Windows 9x system and a Unix system. It’s a matter of using the right-sized tool for the job. You don’t bring in a backhoe to plant flowers in a window box.
If your small network contains a mix of Microsoft and Unix clients, consider installing Sharity-Light on the Unix systems. This application allows you to mount a Windows share from a Unix system. FreeBSD provides a port for this purpose (see the Sharity-Light web site for other supported platforms):
# cd /usr/ports/net/sharity-light
# make install clean
Since Sharity-Light is a command-line utility, you should be familiar with UNC, the Universal Naming Convention. UNC is how you refer to Microsoft shared resources from the command line. A UNC looks like \\NetBIOSname\sharename. It starts with double backslashes, then contains the NetBIOS name of the computer to access and the name of the share on that computer.
Before using Sharity-Light, you need to know the NetBIOS names
of the computers you wish to access. If you have multiple machines
running Microsoft operating systems, the quickest way to view each
system’s name is with nbtstat. From one of the Windows systems, open a command prompt and type:
C:\> nbtstat -A 192.168.2.10
NETBIOS Remote Machine Name Table
Name Type Status
-----------------------------------------
LITTLE_WOLF <00> UNIQUE Registered
<snip>
Repeat for each IP address in your network. Your output will be
several lines long, but the entry (usually the first) containing
<00>
is the one with the name
you’re interested in. In this example, LITTLE_WOLF
is the NetBIOS name associated
with 192.168.2.10.
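If you have many addresses to check, picking the name out of each listing gets tedious. This Python sketch pulls the <00> UNIQUE name out of nbtstat output that you’ve saved as text. The netbios_name function is hypothetical; it assumes the column layout shown above and a name without embedded spaces, and the sample adds a made-up WORKGROUP line. A workgroup name typically also appears with <00>, but as a GROUP entry, which is why the sketch checks for UNIQUE.

```python
# Sketch only: assumes nbtstat rows look like "NAME <00> UNIQUE Registered".
from typing import Optional


def netbios_name(nbtstat_output: str) -> Optional[str]:
    """Return the first name whose entry reads '<00> UNIQUE', or None."""
    for line in nbtstat_output.splitlines():
        fields = line.split()
        # Skip headers and dashes; keep only machine-name rows.
        if len(fields) >= 3 and fields[1] == "<00>" and fields[2] == "UNIQUE":
            return fields[0]
    return None


SAMPLE = """\
NETBIOS Remote Machine Name Table
Name Type Status
-----------------------------------------
LITTLE_WOLF <00> UNIQUE Registered
WORKGROUP <00> GROUP Registered
"""

if __name__ == "__main__":
    print(netbios_name(SAMPLE))  # LITTLE_WOLF
```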
Even though nbtstat /?
indicates that -A
is used to view
a remote system, it also works with the IP
address of the local system. This allows you to check all of the IP
addresses in your network from the same system.
Once you know which IP addresses are associated with which NetBIOS names, you’ll need to add that information to /etc/hosts on your Unix systems:
# more /etc/hosts
127.0.0.1 localhost
192.168.2.95 genisis #this system
192.168.2.10 little_wolf #98 system sharing cygwin2
You’ll also need to know the names of the shares you wish to access. Again, from a Microsoft command prompt, repeat this command for each NetBIOS name and make note of your results:
C:\> net view \\little_wolf
Shared resources at \\LITTLE_WOLF
Sharename Type Comment
---------------------------------------
CYGWIN2 Disk
The command was completed successfully.
Here the computer known as LITTLE_WOLF
has only one share, the
CYGWIN2 directory.
Finally, you’ll need a mount point on your Unix system, so you might as well give it a useful name. Since the typical floppy mount point is /floppy and the typical CD mount point is /cdrom, let’s use /windows:
# mkdir /windows
Once you know the names of your computers and shares, using Sharity-Light is very easy. As the superuser, mount the desired share:
# shlight //little_wolf/cygwin2 /windows
Password:
Using port 49923 for NFS.
Watch your slashes. Microsoft uses the backslash (\) at the command line, whereas Unix and Sharity-Light use the forward slash (/).
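Since it’s easy to get the slashes backwards, here is a small Python sketch that turns a UNC as Windows writes it into the form Sharity-Light takes. The function name unc_to_shlight is my own. Lowercasing the result is simply a choice to match the /etc/hosts entry and the shlight example above, and it relies on NetBIOS names not being case-sensitive.

```python
# Sketch only: a hypothetical helper, not part of Sharity-Light.
def unc_to_shlight(unc: str) -> str:
    r"""Convert a UNC such as \\LITTLE_WOLF\CYGWIN2 to //little_wolf/cygwin2."""
    if not unc.startswith("\\\\"):
        raise ValueError(f"not a UNC (should start with two backslashes): {unc!r}")
    host, sep, share = unc[2:].partition("\\")
    if not host or not sep or not share:
        raise ValueError(f"expected a computer name and a share name: {unc!r}")
    # Swap every remaining backslash for a forward slash, then lowercase.
    path = share.replace("\\", "/")
    return f"//{host}/{path}".lower()


if __name__ == "__main__":
    print(unc_to_shlight(r"\\LITTLE_WOLF\CYGWIN2"))  # //little_wolf/cygwin2
```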
Note that I was prompted for a password because Windows 9x and ME users have the option of password protecting their shares. This particular share did not have a password, so I simply pressed Enter.
Adding -n to the previous command will forgo the password prompt. Type shlight -h to see all available options.
However, if the share is on a Windows NT Workstation, 2000 Pro, or XP system, you must provide a username and password valid on that system. The syntax is:
# shlight //2000pro/cdrom /windows -U username -P password
Once the share is mounted, it works like any other mount point. Depending on the permissions set on the share, you should be able to browse that shared directory, copy over or add files, and modify files. When you’re finished using the share, unmount it:
# unshlight /windows
The Sharity-Light README and FAQ (/usr/local/share/doc/Sharity-Light/)
The Sharity-Light web site (http://www.obdev.at/products/sharity-light/index.html)
The Samba web site (http://www.samba.org/)
Fortunately, you no longer have to be a script guru or a find wizard just to keep up with what is happening on your disks.
Think for a moment. What types of files are you always chasing
after so they don’t waste resources? Your list probably includes temp
files, core files, and old logs that have already been archived. Did you
know that your system already contains scripts capable of cleaning out
those files? Yes, I’m talking about your periodic
scripts.
You’ll find these scripts in the following directory on a FreeBSD system:
% ls /etc/periodic/daily | grep clean
100.clean-disks
110.clean-tmps
120.clean-preserve
130.clean-msgs
140.clean-rwho
150.clean-hoststat
Are you using these scripts? To find out, look at your /etc/periodic.conf file. What, you don’t have one? That means you’ve never tweaked your default configurations. If that’s the case, copy over the sample file and take a look at what’s available:
# cp /etc/defaults/periodic.conf /etc/periodic.conf
# more /etc/periodic.conf
Let’s start with daily_clean_disks.
This script is ideal for finding and deleting files with certain file extensions. You’ll find it
about two pages into periodic.conf, in the Daily options
section, where you may note
that it’s not enabled by default. Fortunately, configuring it is a
heck of a lot easier than using cron
to schedule a complex find
statement.
Before you enable any script, test it first, especially if it’ll delete files based on pattern-matching rules. Back up your system first!
For example, suppose you want to delete old logs with the
.bz2 extension. If you’re not
careful when you craft your daily_clean_disks_files
line, you may
end up inadvertently deleting all files with
that extension. Any user who has just compressed some important
data will be very miffed when she finds that her data has
mysteriously disappeared.
Let’s test this scenario. I’d like to prune all .core files and any logs older than .0.bz2. I’ll edit that section of /etc/periodic.conf like so:
# 100.clean-disks
daily_clean_disks_enable="YES"                  # Delete files daily
daily_clean_disks_files="*.[1-9].bz2 *.core"    # delete old logs, cores
daily_clean_disks_days=1                        # on a daily basis
daily_clean_disks_verbose="YES"                 # Mention files deleted
Notice my pattern-matching expression for the .bz2 files. My expression matches any filename (*) followed by a dot and a number from one to nine (.[1-9]), followed by another dot and the .bz2 extension.
Now I’ll verify that my system has been backed up, and then manually run that script. As this script is fairly resource-intensive, I’ll do this test when the system is under a light load:
# /etc/periodic/daily/100.clean-disks
Cleaning disks:
/usr/ports/distfiles/MPlayer-0.92.tar.bz2
/usr/ports/distfiles/gnome2/libxml2-2.6.2.tar.bz2
/usr/ports/distfiles/gnome2/libxslt-1.1.0.tar.bz2
Darn. Looks like I inadvertently nuked some of my distfiles. I’d better be a bit more explicit in my matching pattern. I’ll try this instead:
# delete old logs, cores
daily_clean_disks_files="messages.[1-9].bz2 *.core"
# /etc/periodic/daily/100.clean-disks
Cleaning disks:
/var/log/messages.1.bz2
/var/log/messages.2.bz2
/var/log/messages.3.bz2
/var/log/messages.4.bz2
That’s a bit better. It didn’t delete /var/log/messages or /var/log/messages.0.bz2, which I like to
keep on disk. Remember, always test your pattern matching
before scheduling a deletion script. If you keep the verbose line at YES, the script will report the names of files it deletes.
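Before letting a pattern loose on your disks, you can preview which names it would catch. This Python sketch applies a space-separated list of globs to the last component of each path, the way find(1) applies -name, assuming that is how 100.clean-disks uses your patterns. It only checks names: the real script also considers how long ago each file was accessed. The would_delete function and the sample paths are made up for illustration.

```python
# Sketch only: previews name matches; it deletes nothing and ignores file ages.
import fnmatch
import os


def would_delete(paths, patterns):
    """Return the paths whose basename matches any glob in the
    space-separated `patterns` string (case-sensitive, like find -name)."""
    globs = patterns.split()
    return [p for p in paths
            if any(fnmatch.fnmatchcase(os.path.basename(p), g) for g in globs)]


CANDIDATES = [
    "/var/log/messages",
    "/var/log/messages.0.bz2",
    "/var/log/messages.1.bz2",
    "/var/log/messages.4.bz2",
    "/usr/home/dru/report.bz2",
    "/usr/home/dru/vi.core",
]

if __name__ == "__main__":
    for path in would_delete(CANDIDATES, "messages.[1-9].bz2 *.core"):
        print(path)
```

With the tightened pattern, messages and messages.0.bz2 survive, just as in the test run above.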
The other cleaning scripts are quite straightforward to configure. Take daily_clean_tmps, for example:
# 110.clean-tmps
daily_clean_tmps_enable="NO"                # Delete stuff daily
daily_clean_tmps_dirs="/tmp"                # Delete under here
daily_clean_tmps_days="3"                   # If not accessed for
daily_clean_tmps_ignore=".X*-lock quota.user quota.group"   # Don't delete these
daily_clean_tmps_verbose="YES"              # Mention files deleted
This is a quick way to clean out any temporary directories. Again, you get to choose the locations of those directories. Here is a quick way to find out which directories named tmp are on your system:
# find / -type d -name tmp
/tmp
/usr/tmp
/var/spool/cups/tmp
/var/tmp
That command asks find to start at root (/) and look for any directories (-type d) named tmp (-name tmp). If I wanted to clean those daily, I’d configure that section like so:
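To make that find command a bit less magical, here is roughly the same search as a Python sketch built on os.walk. It is an approximation: like -type d, it skips symbolic links that point to directories, but it doesn’t test the starting directory itself, and the find_dirs_named name is my own. The demo builds a tiny throwaway tree rather than searching the real /.

```python
# Sketch only: an approximation of `find top -type d -name name`.
import os
import tempfile


def find_dirs_named(top, name):
    """Yield directories below top called exactly `name`.
    Unreadable directories are silently skipped, as os.walk does by default."""
    for dirpath, dirnames, _filenames in os.walk(top):
        for d in dirnames:
            path = os.path.join(dirpath, d)
            # os.walk also lists symlinks to directories here; -type d wouldn't.
            if d == name and not os.path.islink(path):
                yield path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for sub in ("tmp", "var/tmp", "var/log"):
            os.makedirs(os.path.join(root, sub))
        for path in sorted(find_dirs_named(root, "tmp")):
            print(os.path.relpath(path, root))
```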
# 110.clean-tmps
# Delete stuff daily
daily_clean_tmps_enable="YES"
daily_clean_tmps_dirs="/tmp /usr/tmp /var/spool/cups/tmp /var/tmp"
# If not accessed for
daily_clean_tmps_days="1"
# Don't delete these
daily_clean_tmps_ignore=".X*-lock quota.user quota.group"
# Mention files deleted
daily_clean_tmps_verbose="YES"
Again, I immediately test that script after saving my changes:
# /etc/periodic/daily/110.clean-tmps
Removing old temporary files:
/var/tmp/gconfd-root
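This script skips any file that matches daily_clean_tmps_ignore (such as the X server’s lock files) and any file accessed within the last daily_clean_tmps_days days, so temporary files that are still in use are normally left alone. This is an excellent feature and yet another reason to run this script on a daily basis, preferably at a time when few users are on the system.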
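The age test at the heart of this script can be sketched in a few lines of Python. This is a dry run under my own assumptions: it lists only regular files, treats the day count as whole 24-hour periods since the last access, and never deletes anything. The real 110.clean-tmps uses find(1) and can also remove old directories, as the gconfd-root line above shows. Access times are only meaningful on filesystems that record them, and stale_files is a hypothetical name.

```python
# Sketch only: a dry-run imitation of the clean-tmps age test; deletes nothing.
import fnmatch
import os
import stat
import tempfile
import time


def stale_files(dirs, days, ignore, now=None):
    """List regular files under the space-separated `dirs` that have not been
    accessed for more than `days` days, skipping basenames that match any
    glob in the space-separated `ignore` string."""
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    patterns = ignore.split()
    stale = []
    for top in dirs.split():
        for dirpath, _dirnames, filenames in os.walk(top):
            for fname in filenames:
                if any(fnmatch.fnmatchcase(fname, p) for p in patterns):
                    continue  # like daily_clean_tmps_ignore
                path = os.path.join(dirpath, fname)
                try:
                    st = os.lstat(path)  # lstat doesn't update the access time
                except OSError:
                    continue  # the file vanished while we were looking
                if stat.S_ISREG(st.st_mode) and st.st_atime < cutoff:
                    stale.append(path)
    return stale


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old = time.time() - 5 * 24 * 60 * 60
        for name in ("old.txt", ".X0-lock", "new.txt"):
            path = os.path.join(tmp, name)
            open(path, "w").close()
            if name != "new.txt":
                os.utime(path, (old, old))  # pretend: untouched for 5 days
        print(stale_files(tmp, 3, ".X*-lock quota.user quota.group"))
```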
Moving on, the next script is daily_clean_preserve:
# 120.clean-preserve
daily_clean_preserve_enable="YES"     # Delete files daily
daily_clean_preserve_days=7           # If not modified for
daily_clean_preserve_verbose="YES"    # Mention files deleted
What exactly is preserve? The answer is in man hier. Use the manpage search function (the / key) to search for the word preserve:
# man hier
/preserve
preserve/ temporary home of files preserved after an accidental death of an editor; see ex(1)
Now that you know what the script does, see if the default settings are suited for your environment. This script is run daily, but keeps preserved files until they are seven days old.
The last three clean scripts deal with cleaning out old files from msgs, rwho, and sendmail’s hoststat cache. See man periodic.conf for more details.
Incidentally, you don’t have to wait until it is time for
periodic
to do its thing; you can
manually run any periodic script at any time. You’ll find them all
in subdirectories of /etc/periodic/.
Instead of waiting for a daily process to clean up any spills, you can tweak several knobs to prevent these files from being created in the first place. For example, the C shell itself provides limits, any of which are excellent candidates for a customized dot.cshrc file [Hack #9].
To see the possible limits and their current values:
% limit
cputime unlimited
filesize unlimited
datasize 524288 kbytes
stacksize 65536 kbytes
coredumpsize unlimited
memoryuse unlimited
vmemoryuse unlimited
descriptors 4557
memorylocked unlimited
maxproc 2278
sbsize unlimited
You can test a limit by typing it at the command line; it will remain for the duration of your current shell. If you like the limit, make it permanent by adding it to .cshrc. For example:
% limit filesize 2k
% limit | grep filesize
filesize 2 kbytes
will set the maximum file size that can be created to 2 KB. The
limit
command supports both
k
for kilobytes and m
for megabytes. Do note that this limit
does not affect the total size of the area available to store files,
just the size of a newly created file. See the Quotas section of the
FreeBSD Handbook if you intend to limit disk space usage.
Having created a file limit, you’ll occasionally want to exceed it. For example, consider decompressing a file:
% uncompress largefile.Z
Filesize limit exceeded
% unlimit filesize
% uncompress largefile.Z
%
The unlimit
command will
allow me to override the file-size limit temporarily (for the duration
of this shell). If you really do want to force your users to stick to limits, read man limits.
Now back to shell limits. If you don’t know what core files are, you probably don’t need to collect them.
Sure, periodic
can clean those
files out for you, but why make them in the first place? Core files
are large. You can limit their size with:
limit coredumpsize 1m
That command will limit a core file to 1 MB, or 1024 KB. To
prevent core files completely, set the size to 0:
limit coredumpsize 0
If you’re interested in the rest of the built-in limits, you’ll
find them in man tcsh. Searching for coredumpsize will take you to the right spot.
The preceding discussion is based on FreeBSD. Other BSD systems ship with similar scripts that perform the same tasks, but each set of tasks is kept in a single script instead of in a separate directory.
For daily, weekly, and monthly tasks, NetBSD uses the
/etc/daily, /etc/weekly, and /etc/monthly scripts, whose behavior is
controlled with the /etc/daily.conf, /etc/weekly.conf, and /etc/monthly.conf configuration files.
For more information about them, read man daily.conf, man weekly.conf, and man monthly.conf.
man periodic.conf
man limits
man tcsh
The Quotas section of the FreeBSD Handbook (http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/quotas.html)
Add more temporary or swap space without repartitioning.
When you install any operating system, it’s important to allocate sufficient disk space to hold temporary and swap files. Ideally, you already know the optimum sizes for your system so you can partition your disk accordingly during the install. However, if your needs change or you wish to optimize your initial choices, your solution doesn’t have to be as drastic as a repartition—and reinstall—of the system.
man tuning
has some practical
advice for guesstimating the appropriate size of swap and your other
partitions.
Unless you specifically chose otherwise when you partitioned your disk, the installer created a /tmp filesystem for you:
% grep tmp /etc/fstab
/dev/ad0s1e /tmp ufs rw 2 2
% df -h /tmp
Filesystem Size Used Avail Capacity Mounted on
/dev/ad0s1e 252M 614K 231M 0% /tmp
Here I searched /etc/fstab for the /tmp filesystem. This particular filesystem is 256 MB in size. Only a small portion contains temporary files.
The Avail figure from the df (disk free) command will always be lower than Size minus Used. This is because eight percent of the filesystem is reserved to prevent users from inadvertently overflowing a filesystem. See man tunefs for details.
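You can check the eight percent figure against the df output above with a little arithmetic. This Python snippet converts the rounded df -h numbers back to kilobytes and works out how much of the filesystem is reported as neither used nor available. Because -h rounds, expect an approximation rather than exactly eight percent; the function name is my own.

```python
# Sketch only: rough arithmetic on rounded `df -h` figures.
def reserved_fraction(size_kb, used_kb, avail_kb):
    """Share of the filesystem that df reports as neither used nor available."""
    return (size_kb - used_kb - avail_kb) / size_kb


# Figures from `df -h /tmp` above, converted to kilobytes.
SIZE_KB = 252 * 1024   # 252M
USED_KB = 614          # 614K
AVAIL_KB = 231 * 1024  # 231M

if __name__ == "__main__":
    print(f"{reserved_fraction(SIZE_KB, USED_KB, AVAIL_KB):.1%}")  # 8.1%
```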
It’s always a good idea to clean out /tmp periodically so it doesn’t overflow with temporary files. Consider taking advantage of the built-in periodic script /etc/periodic/daily/110.clean-tmps [Hack #20].
You can also clean out /tmp when the system reboots by adding this line to /etc/rc.conf:
clear_tmp_enable="YES"
Another option is to move /tmp off of your hard disk and into RAM. This has the built-in advantage of automatically clearing the filesystem when you reboot, since the contents of RAM are volatile. It also offers a performance boost, since RAM access time is much faster than disk access time.
Before moving /tmp, ensure you have enough RAM to support your desired /tmp size. This command will show the amount of installed RAM:
% dmesg | grep memory
real memory = 335462400 (319 MB)
avail memory = 320864256 (306 MB)
Also check that your kernel configuration file contains device md
(or memory disk). The GENERIC
kernel does; if you’ve customized
your kernel, double-check that you still have md
support:
% grep -w md /usr/src/sys/i386/conf/CUSTOM
device md # Memory "disks"
Changing the /tmp line in /etc/fstab as follows will mount a 64 MB /tmp in RAM:
md /tmp mfs rw,-s64m 0 0
Next, unmount /tmp (which is currently mounted on your hard drive) and remount it using the new entry in /etc/fstab:
# umount /tmp
# mount /tmp
# df -h /tmp
Filesystem Size Used Avail Capacity Mounted on
/dev/md0 63M 8.0K 58M 0% /tmp
Notice that the filesystem is now md0, the first memory disk, instead of ad0s1e, a partition on the first IDE hard drive.
Swap is different from /tmp. It’s not a storage area for temporary files; instead, it is disk space that the virtual memory system uses to page data out of RAM when memory runs short. A sufficient swap size can greatly improve your system’s performance. Also, if your system contains multiple drives, this paging will be much more efficient if each drive has its own swap partition.
The initial install created a swap filesystem for you:
% grep swap /etc/fstab
/dev/ad0s1b none swap sw 0 0
% swapinfo
Device 1K-blocks Used Avail Capacity Type
/dev/ad0s1b 639688 68 639620 0% Interleaved
Note that the swapinfo
command
displays the size of your swap files. If you prefer to see that output
in MB, try the swapctl
command with
the -lh
flags (which make the
listing more human):
% swapctl -lh
Device: 1048576-blocks Used:
/dev/ad0s1b 624 0
To add a swap area, first determine which area of disk space to
use. For example, you may want to place a 128 MB swapfile on /usr. You’ll first need to use dd
to create this as a file full of null (or
zero) bytes. Here I’ll create a 128 MB swapfile as /usr/swap0:
# dd if=/dev/zero of=/usr/swap0 bs=1024k count=128
128+0 records in
128+0 records out
134217728 bytes transferred in 4.405036 secs (30469156 bytes/sec)
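If the dd arguments look cryptic, this Python sketch does the same job: it writes count blocks of bs zero bytes, so the defaults of 1024 KB times 128 blocks come to the 134217728 bytes dd reported. The make_zero_file name is my own, and the demo writes a tiny throwaway file rather than /usr/swap0. Writing real zero blocks, rather than merely extending the file’s length, keeps the file from being sparse, which is one reason to fill it from /dev/zero.

```python
# Sketch only: a Python stand-in for `dd if=/dev/zero of=FILE bs=1024k count=128`.
import os
import tempfile


def make_zero_file(path, bs=1024 * 1024, count=128):
    """Write `count` blocks of `bs` zero bytes to path and return the
    number of bytes written."""
    block = bytes(bs)  # bytes(n) is n zero bytes
    written = 0
    with open(path, "wb") as f:
        for _ in range(count):
            written += f.write(block)
    return written


if __name__ == "__main__":
    print(1024 * 1024 * 128)  # 134217728, the total dd reported
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "swap0")
        print(make_zero_file(demo, bs=4096, count=3))  # 12288
```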
Next, change the permissions on this file. Remember, you don’t want users reading or storing data here; this file belongs to the system’s virtual memory:
# chmod 600 /usr/swap0
Since this is really a file on an existing filesystem, you can’t
mount
your swapfile in /etc/fstab. However, you can tell the
system to find it at boot time by adding this line to /etc/rc.conf:
swapfile="/usr/swap0"
To start using the swapfile now without having to reboot the
system, use mdconfig
:
# mdconfig -a -t vnode -f /usr/swap0 -u 1 && swapon /dev/md1
The -a flag attaches the memory disk. -t vnode specifies that the memory disk is backed by a file rather than by memory or swap. The -f flag sets the name of that file: /usr/swap0.
The unit number -u 1 creates the memory disk /dev/md1, so that is the device name you must give to swapon. Since this system already has /tmp mounted on /dev/md0, I chose unit 1 for swap. && swapon tells the system to enable that swap device, but only if the mdconfig command succeeded.
swapctl
should now show the
new swap partition:
% swapctl -lh
Device: 1048576-blocks Used:
/dev/ad0s1b 624 0
/dev/md1 128 0
Whenever you make changes to swap or are considering increasing
swap, use systat
to monitor
how your swapfiles are being used in real time:
% systat -swap
The output will show the names of your swap areas and how much of each is currently in use. It will also include a visual indicating what percentage of swap contains data.
You can make this hack work on OpenBSD, as long as you remember that the RAM disk
device is rd
and its configuration
tool is rdconfig
. Read the relevant
manpages, and you’ll be hacking away.
man tuning (practical advice on /tmp and swap)
man md
man mdconfig
man swapinfo
man swapctl
man systat
The FreeBSD Handbook entry on adding swap (http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/adding-swap-space.html)
Prevent or recover from rm disasters.
Someday the unthinkable may happen. You're doing some routine maintenance and are distracted by a phone call or perhaps another employee's question. A moment later, you're faced with the awful realization that your fingers typed either rm * or rm -R in the wrong place, and now a portion of your system has evaporated into nothingness.
Painful thought, isn’t it? Let’s pause for a moment to catch our breath and examine a few ways to prevent such a scenario from happening in the first place.
Close your eyes and think back to when you were a fresh-faced newbie and were introduced to the omnipotent rm command. Return to the time when you actually read man rm and first discovered the -i switch. "What a great idea," you thought, "to be prompted for confirmation before irretrievably deleting a file from disk." However, you soon discovered that this switch can be a royal PITA. Face it, it's irritating to deal with the constant question of whether you're sure you want to remove a file when you just issued the command to remove that file.
Fortunately, there is a way to request confirmation only when you're about to do something as rash as rm *. Simply make a file called -i. Well, actually, it's not quite that simple. touch will complain if you try this:
% touch -i
touch: illegal option -- i
usage: touch [-acfhm] [-r file] [-t [[CC]Y]MMDDhhmm[.SS]] file ...
You see, touch reads -i as a command-line switch, and touch doesn't have an -i switch. That's actually part of the magic. The reason we want a file called -i in the first place is to fool rm: when you type rm *, the shell expands * into all of the files in the directory. One of those files will be named -i, and, voila, you've just given the interactive switch to rm.
So, how do we get past the shell to make this file? Use this command instead:
% touch ./-i
The ./ prefix turns the argument into a path to a file in "this directory." Because the argument no longer begins with a dash, touch treats it as a filename rather than as an option. (touch -- -i works as well, since -- marks the end of the options.)
For this to be effective, you need to create a file called -i in every directory that you would like to protect from an inadvertent rm *.
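To see the trick without risking anything, the sketch below builds a throwaway directory and prints the argument list that rm * would receive after the shell expands the glob. demo_dash_i is a hypothetical name; it sets LC_ALL=C so that glob sorting is byte-wise and "-" (0x2d) sorts ahead of letters.

```shell
# demo_dash_i: hypothetical illustration. Nothing is deleted except the
# temporary directory it creates. The body runs in a subshell, so the cd
# and the LC_ALL setting do not leak into the caller's shell.
demo_dash_i() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    # ./-i is a path, not an option, so touch accepts it.
    touch ./-i important.txt notes.txt
    # Byte-wise sorting puts "-" ahead of letters in the glob expansion.
    LC_ALL=C
    export LC_ALL
    set -- *
    echo "rm * would run: rm $*"
    cd / && rm -rf "$dir"
)
```

Running demo_dash_i should print rm * would run: rm -i important.txt notes.txt. Note that the protection depends on -i being the first word after rm. FreeBSD's rm stops looking for options at the first filename, so a file whose name sorts ahead of -i would defeat it.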
An alternative method is to take advantage of the rmstar shell variable found in the tcsh shell. This method will always prompt for confirmation of an rm *, regardless of your current directory, as long as you always use tcsh. Since the default shell for the superuser is tcsh, add this line to /root/.cshrc:
set rmstar
This is also a good line to add to /usr/share/skel/dot.cshrc [Hack #9] .
If you want to take advantage of the protection immediately, force the shell to reread its configuration file:
# source /root/.cshrc
Now you know how to protect yourself from rm *. Unfortunately, neither method will save you from an rm -R. If you do manage to blow away a portion of your directory structure, how do you fix the mess with a minimum of fuss, fanfare, and years of teasing from your coworkers? Sure, you can always restore from backup, but that means filling in a form in triplicate, carrying it with you as you walk to the other side of the building where backups are stored, and sheepishly handing it over to the clerk in charge of tape storage.
Fortunately for a hacker, there is always more than one way to skin a cat, or in this case, to save your skin. That directory structure had to be created in the first place, which means it can be recreated.
When you installed FreeBSD, it created a directory structure for you. The utility responsible for this feat is called mtree. To see which directory structures were created with mtree:
% ls /etc/mtree/
./ BSD.root.dist BSD.x11-4.dist
../ BSD.sendmail.dist BSD.x11.dist
BSD.include.dist BSD.usr.dist
BSD.local.dist BSD.var.dist
Each of these files is plain ASCII text, meaning you can read and, more interestingly, edit its contents. If you're a hacker, I know what you're thinking. Yes, you can edit a file to remove the directories you don't want and to add other directories that you do.
Let’s start with a simpler example. Say you’ve managed to blow away /var. To recreate it:
# mtree -deU -f /etc/mtree/BSD.var.dist -p /var
where:
-d
Ignores everything except directory files.
-e
Doesn’t complain if there are extra files.
-U
Recreates the original ownerships and permissions.
-f /etc/mtree/BSD.var.dist
Specifies how to create the directory structure; this is an ASCII text file if you want to read up ahead of time on what exactly is going to happen.
-p /var
Specifies where to create the directory structure; if you don’t specify, it will be placed in the current directory.
When you run this command, the recreated directories will be echoed to standard output so you can watch as they are created for you. A few seconds later, you can:
% ls /var
./ crash/ heimdal/ preserve/ yp/
../ cron/ lib/ run/
account/ db/ log/ rwho/
at/ empty/ mail/ spool/
backups/ games/ msgs/
That looks a lot better, but don't breathe that sigh of relief quite yet. You still have to recreate all of your log files. Yes, /var/log is still glaringly empty. Remember, mtree creates a directory structure, not all of the files within that directory structure. If you have a directory structure containing thousands of files, you're better off grabbing your backup tape.
There is hope for /var/log, though. Rather than racking your brain for the names of all of the missing log files, do this instead:
% more /etc/newsyslog.conf
# configuration file for newsyslog
# $FreeBSD: src/etc/newsyslog.conf,v 1.42 2002/09/21 12:07:35 markm Exp $
#
# Note: some sites will want to select more restrictive protections than the
# defaults. In particular, it may be desirable to switch many of the 644
# entries to 640 or 600. For example, some sites will consider the
# contents of maillog, messages, and lpd-errs to be confidential. In the
# future, these defaults may change to more conservative ones.
#
# logfilename [owner:group] mode count size when [ZJB] [/pid_file] [sig_num]
/var/log/cron 600 3 100 * J
/var/log/amd.log 644 7 100 * J
/var/log/auth.log 600 7 100 * J
/var/log/kerberos.log 600 7 100 * J
/var/log/lpd-errs 644 7 100 * J
/var/log/xferlog 600 7 100 * J
/var/log/maillog 640 7 * @T00 J
/var/log/sendmail.st 640 10 * 168 B
/var/log/messages 644 5 100 * J
/var/log/all.log 600 7 * @T00 J
/var/log/slip.log root:network 640 3 100 * J
/var/log/ppp.log root:network 640 3 100 * J
/var/log/security 600 10 100 * J
/var/log/wtmp 644 3 * @01T05 B
/var/log/daily.log 640 7 * @T00 J
/var/log/weekly.log 640 5 1 $W6D0 J
/var/log/monthly.log 640 12 * $M1D0 J
/var/log/console.log 600 5 100 * J
There you go, all of the default log names and their permissions. Simply touch the required files and adjust their permissions accordingly with chmod (and, for entries such as ppp.log that list an owner:group, their ownership with chown).
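If you'd rather not type each touch and chmod by hand, the loop below reads newsyslog.conf and does it for you. recreate_logs is a hypothetical helper, not a FreeBSD command. It relies on the field order given in the file's header comment (logfilename, then an optional owner:group, then mode) and skips any line that doesn't start with an absolute path. The optional second argument is a prefix for every path, so you can try it in a scratch directory first.

```shell
# recreate_logs CONF [PREFIX]: hypothetical helper (not part of FreeBSD).
# Creates each log file named in CONF, sets its mode, and sets owner:group
# when one is given (that part needs root).
recreate_logs() (
    conf=$1
    prefix=${2:-}
    while read -r file f2 f3 rest; do
        # Only entries that start with an absolute path describe log files.
        case $file in /*) ;; *) continue ;; esac
        # The owner:group column is optional; recognize it by its colon.
        case $f2 in
            *:*) owner=$f2 mode=$f3 ;;
            *)   owner=    mode=$f2 ;;
        esac
        target=$prefix$file
        # touch creates a missing file and leaves an existing one's contents alone.
        touch "$target" || continue
        chmod "$mode" "$target"
        if [ -n "$owner" ]; then
            chown "$owner" "$target" 2>/dev/null ||
                echo "recreate_logs: could not chown $target to $owner" >&2
        fi
    done < "$conf"
)
```

As root, recreate_logs /etc/newsyslog.conf rebuilds every log listed above with its default permissions.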
Let's get a little fancier and hack the mtree hack. If you want to be able to create a homegrown directory structure, start by perusing the instructions in /usr/src/etc/mtree/README.
The one rule to keep in mind: don't use tabs. Instead, use four spaces for indentation. Here is a simple example:
% more MY.test.dist
#home grown test directory structure
/set type=dir uname=test gname=test mode=0755
.
    test1
    ..
    test2
        subdir2a
        ..
        subdir2b
            subsubdir2c mode=01777
            ..
        ..
    ..
..
Note that you can specify different permissions on different parts of the directory structure.
Next, I’ll apply this file to my current directory:
# mtree -deU -f MY.test.dist
and check out the results:
# ls -F
test1/ test2/
# ls -F test1
# ls -F test2
subdir2a/ subdir2b/
# ls -F test2/subdir2b
subsubdir2c/
As you can see, mtree can be a real timesaver if you need to create custom directory structures when you do installations. Simply take a few moments to create a file containing the directory structure and its permissions. You'll gain the added bonus of having a record of the required directory structure.
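To make the spec format's "current directory" bookkeeping concrete, here is a small interpreter for a tiny subset of it. mini_mtree is hypothetical and is not a substitute for mtree(8). It understands only /set mode=, "." for the root, directory entries with an optional mode=, and ".." to climb back up; uname, gname, and every other keyword are ignored. It can be handy on a system without mtree to check that the ".." lines in a spec are nested the way you intended.

```shell
# mini_mtree SPEC TARGETDIR: hypothetical sketch interpreting a tiny subset
# of the mtree spec format. Each directory entry is created and becomes
# the current directory; ".." returns to its parent.
mini_mtree() (
    set -f                  # keep words such as "*" in comments from globbing
    spec=$1
    root=$2
    cwd=$root
    depth=0
    defmode=0755
    while read -r name rest; do
        # Use mode= from this line if present, else the /set default.
        mode=$defmode
        for kw in $rest; do
            case $kw in mode=*) mode=${kw#mode=} ;; esac
        done
        case $name in
            ''|'#'*) ;;                         # blank line or comment
            /set) defmode=$mode ;;              # new default for later entries
            .) cwd=$root depth=1                # root: created, mode left as is
               mkdir -p "$cwd" ;;
            ..) depth=$((depth - 1))
                if [ "$depth" -gt 0 ]; then cwd=${cwd%/*}; fi ;;
            *) cwd=$cwd/$name depth=$((depth + 1))
               mkdir -p "$cwd" && chmod "$mode" "$cwd" ;;
        esac
    done < "$spec"
)
```

Running mini_mtree MY.test.dist /tmp/out against the example spec should produce the same tree as the ls -F listings above, with subsubdir2c under test2/subdir2b and its sticky bit set.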
man mtree
The Linux mtree port (http://www.wie-auch-immer.de/mtree/)
Do you find yourself installing multiple systems, all containing the same operating system and applications? As an IT instructor, I’m constantly installing systems for my next class or trying to fix the ramifications of a misconfiguration from a previous class.
As any system administrator can attest, ghosting or hard drive-cloning software can be a real godsend. Backups are one thing; they retain your data. However, an image is a true timesaver: it's a copy of the operating system itself, along with any installed software and all of your configurations and customizations.
I haven't always had the luxury of a commercial ghosting utility at hand. As you can well imagine, I've tried every homegrown and open source ghosting solution available. I started with various invocations of dd, gzip, ssh, and dump, but kept running across the same fundamental problem: it was easy enough to create an image, but inconvenient to deploy that image to a fresh hard drive. It was doable in the labs that used removable drives, but, otherwise, I had to open up a system, cable in the drive to be deployed, copy the image, and recable the drive into its own system.
Forget the wear and tear on the equipment; that solution wasn't working out to be much of a timesaver! What I really needed was a floppy that contained enough intelligence to go out on the network and retrieve and restore an image. I tried several open source applications and found that Ghost For Unix, g4u, best fit the bill.
You're about two minutes away from creating a bootable g4u floppy. Simply download g4u-1.12fs from http://theatomicmoose.ca/g4u/ and copy it to a floppy:
# cat g4u-1.12fs > /dev/fd0
Your only other requirement is a system with a drive capable of holding your images. It can run any operating system, as long as it has an installed FTP server. If it's a FreeBSD system, you can configure an FTP server through /stand/sysinstall. Choose Configure from the menu, then Networking. Use your spacebar to choose Anon FTP.
Choose Yes to the configuration message and accept the defaults by tabbing to OK. The welcome message is optional. Exit sysinstall once you're finished.
You'll then need to remove the comment character (#) in front of the FTP line in /etc/inetd.conf, so it looks like this:
ftp stream tcp nowait root /usr/libexec/ftpd ftpd -l
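If you prefer to make that edit non-interactively, a sed expression can do it. enable_ftp_line is a hypothetical helper. It prints the edited file to standard output instead of editing in place, both so you can review the change and because sed -i takes different arguments on BSD and GNU systems. It only uncomments the IPv4 tcp entry; the tcp6 line is left alone.

```shell
# enable_ftp_line FILE: hypothetical helper. Prints FILE with the leading "#"
# removed from the "ftp stream tcp ..." entry only (not the tcp6 entry).
enable_ftp_line() {
    sed 's/^#[[:space:]]*\(ftp[[:space:]][[:space:]]*stream[[:space:]][[:space:]]*tcp[[:space:]]\)/\1/' "$1"
}
```

For example, write the result to a temporary file with enable_ftp_line /etc/inetd.conf > /tmp/inetd.conf.new, compare it against the original with diff, and then copy it into place.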
If inetd is already running, inform it of the configuration change using killall -1 inetd. Otherwise, start inetd by simply typing inetd. To ensure the service is running:
# sockstat | grep 21
root inetd 22433 4 tcp4 *:21 *:*
In this listing, the local system is listening for requests on port 21, and there aren't any current connections listed in the remote address section (*:*).
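grep 21 also matches any line whose PID or port happens to contain "21". A tighter check compares the local-address column exactly. listening_on is a hypothetical helper, and it assumes the column order shown above, with the local address in field 6 and the remote address in field 7.

```shell
# listening_on PORT: hypothetical helper that reads sockstat output on stdin
# and prints only the sockets whose local address ends in :PORT and whose
# remote address is *:* (listening, no connection).
listening_on() {
    awk -v suffix=":$1" '
        length($6) > length(suffix) &&
        substr($6, length($6) - length(suffix) + 1) == suffix &&
        $7 == "*:*"'
}
```

Usage: sockstat | listening_on 21. A process with PID 2121 listening on port 80 is no longer reported.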
g4u requires a username and a password before it will create or retrieve an image. The default account is install, but you can specify another user account when you use g4u. To create the install account on a FreeBSD FTP server:
# pw useradd install -m -s /bin/csh
Make sure that the shell you give this user is listed in /etc/shells or FTP authentication will fail.
Then, use passwd install to give this account a password you will remember.
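Because ftpd refuses users whose shell isn't in /etc/shells, you can check the shell first. shell_is_listed is a hypothetical helper. It matches whole lines only, so a partial path such as /bin/cs does not count.

```shell
# shell_is_listed SHELL [FILE]: hypothetical helper. Succeeds only if SHELL
# appears as a complete line in FILE (default /etc/shells).
shell_is_listed() {
    grep -qxF "$1" "${2:-/etc/shells}"
}
```

For example: shell_is_listed /bin/csh && pw useradd install -m -s /bin/csh.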
Before you create an image, fully configure a test system. For example, in my security lab, I usually install the latest release of FreeBSD, add my customized /etc/motd and shell prompt, configure X, and install and configure the applications students will use during their labs.
It's a good idea to know ahead of time how large the hard drive is on the test system and how it has been partitioned. There are several ways to find out on a FreeBSD system, depending upon how good you are at math. One way is to go back into /stand/sysinstall and choose Configure, then Fdisk. The first long line will give the size of the entire hard drive:
Disk name: ad0 DISK Geometry: 19885 cyls/16 heads/63 sectors = 20044080 sectors (9787MB)
Press q to exit this screen.
If you then type fdisk at the command line, you'll see the size of your partitions:
# fdisk
<snip>
The data for partition 1 is:
sysid 165 (0xa5), (FreeBSD/NetBSD/386BSD)
start 63, size 4095441 (1999 Meg), flag 80 (active)
<snip>
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>
This particular system has a 9787 MB hard drive that has one 1999 MB partition containing FreeBSD.
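If you'd rather do the math yourself, the figures follow from the geometry. The sketch below assumes 512-byte sectors and takes MB as 2^20 bytes, truncated, which reproduces both screens. disk_size and part_size are hypothetical helper names.

```shell
# disk_size CYLS HEADS SECTORS_PER_TRACK: total sectors and size in MB.
disk_size() {
    sectors=$(( $1 * $2 * $3 ))
    echo "$sectors sectors ($(( sectors * 512 / 1048576 ))MB)"
}
# part_size SECTORS: size in MB of an fdisk partition, given its "size" field.
part_size() {
    echo "$(( $1 * 512 / 1048576 )) Meg"
}
```

disk_size 19885 16 63 gives 20044080 sectors (9787MB), and part_size 4095441 gives 1999 Meg. The full disk works out to 20044080 × 512 = 10262568960 bytes, which is the same byte count dd reports in the upload summary later on.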
Whenever you’re using any ghosting utility, create an image using the smallest hard drive size that you have available, but which is also large enough to hold your desired data. This will reduce the size of the image and prevent the problems associated with trying to restore an image to a smaller hard drive.
Once you’re satisfied with your system, insert the floppy and reboot.
g4u will probe for hardware and configure the NIC using DHCP. Once it's finished, you'll be presented with this screen:
Welcome to g4u Harddisk Image Cloning V1.12!
 * To upload disk-image to FTP, type: uploaddisk serverIP [image] [disk]
 * To upload partition to FTP, type: uploadpart serverIP [image] [disk+part]
 * To install harddisk from FTP, type: slurpdisk serverIP [image] [disk]
 * To install partition from FTP, type: slurppart serverIP [image] [disk+part]
 * To copy disks locally, type: copydisk disk0 disk1
[disk] defaults to wd0 for first IDE disk, [disk+part] defaults to wd0d for the whole first IDE disk. Use wd1 for second IDE disk, sd0 for first SCSI disk, etc.
Default image for slurpdisk is 'rwd0d.gz'.
Run 'dmesg' to see boot messages, 'disks' for recognized disks, 'parts <disk>' for list of (BSD-type!) partitions on disk '<disk>" (wd0, ...), run any other commands without args to see usage message.
Creating the image is as simple as invoking uploaddisk with the IP address of the FTP server. If you wish, include a useful name for the image; in this example, I'll call the image securitylab.gz:
# uploaddisk 192.168.2.95 securitylab.gz
( cat $tmpfile ; dd progress=1 if=/dev/rwd0d bs=1m | gzip -9 ) | ftp -n
tmpfile:
open 192.168.2.95
user install
bin
put - securitylab.gz
bye
5
4
3
2
1
working...
Connected to 192.168.2.95.
220 genisis FTP server (Version 6.00LS) ready.
331 Password required for install.
Password:
type_password_here
230 User install logged in.
Remote system type is UNIX.
Using binary mode to transfer files.
200 Type set to I.
remote: securitylab.gz
227 Entering Passive Mode (192,168,2,95,192,1)
150 Opening BINARY mode data connection for 'securitylab.gz'.
...................
This will take a while. How long depends upon the size of the drive and the speed of your network. When it is finished, you’ll see a summary:
9787+1 records in
9787+1 records out
10262568960 bytes transferred in 6033.533 secs (1700921 bytes/sec)
226 Transfer complete.
3936397936 bytes sent in 1:40:29 (637.58 KB/s)
221 Goodbye.
#
You can also check out the size of the image on the FTP server:
% du -h ~install/securitylab.gz
3.7G /home/install/securitylab.gz
That's not too bad. It took just over an hour and a half to compress that 9 GB drive to a 3.7 GB image. The g4u web site also has some hints for further reducing the size of the image or increasing the speed of the transfer.
If you use images on a regular basis, consider upgrading hubs or older switches to 100 Mbps switches. This can speed up your transfer rates significantly.
It's also possible to create an image of each particular filesystem, but I find it easier just to image a fairly small drive, because an image of the entire drive includes the master boot record (MBR) and the desired partitioning scheme.
When you wish to install the image, use the floppy to boot the system to receive the image. Once you receive the prompt, specify the name of the image and the IP address of the FTP server:
# slurpdisk 192.168.2.95 securitylab.gz
It doesn’t matter what was previously on that drive. Since the MBR is recreated, the new drive will just contain the imaged data. Once the deployment is finished, simply reboot the system without the floppy.
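The heart of uploaddisk is the dd | gzip pipeline it printed earlier. The sketch below runs that idea against ordinary files instead of /dev/rwd0d and an FTP server, together with a reverse pipeline for restoring. make_image and restore_image are hypothetical names. That slurpdisk works the same way in reverse is an assumption, not something taken from g4u's source. The sketch uses a numeric block size (1048576) instead of BSD dd's 1m so that GNU dd accepts it too.

```shell
# make_image SOURCE IMAGE.gz: read SOURCE in 1 MB blocks and compress it,
# in the style of the "dd ... | gzip -9" pipeline uploaddisk shows.
make_image() {
    dd if="$1" bs=1048576 2>/dev/null | gzip -9 > "$2"
}
# restore_image IMAGE.gz TARGET: decompress and write to TARGET. obs makes dd
# reassemble short reads from the pipe into full 1 MB output blocks.
restore_image() {
    gzip -dc "$1" | dd of="$2" obs=1048576 2>/dev/null
}
```

A round trip through both functions should leave cmp reporting no differences. Random test data will barely compress, but a real disk compresses much better: the upload above shrank 10262568960 bytes to 3936397936, roughly 38 percent of the original.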
The Ghost For Unix web site (http://www.feyrer.de/g4u/)