Extracting the Data

To write a script to extract data from Oracle and place it in a flat file, follow these steps:

  1. Formulate the query.

  2. Format the data.

  3. Spool the extract to a file.

  4. Make the script user-friendly.

The last step, making the script user-friendly, isn’t really necessary for a one-off effort. However, if it’s an extraction you are going to perform often, it’s worth taking a bit of time to make it easy and convenient to use.

Formulate the Query

The very first step in extracting data is to figure out just what data you need to extract. You need to develop a SQL query that will return the data you need. For the example in this chapter, we will just extract the employee data, so the query will look like this:

SELECT employee_id, 
       employee_name, 
       employee_hire_date, 
       employee_termination_date, 
       employee_billing_rate
  FROM employee;

Keep in mind that you can write queries that are much more complicated than those shown here. If necessary, you can join several tables together, or you can UNION several queries together.

Format the Data

The next step, once you have your query worked out, is to format the data to be extracted. The way to do this is to modify your query so it returns a single, long expression that combines the columns together in the format that you want in your output file. It’s often necessary to include text literals in the SELECT statement as part of this calculation. For example, if you want to produce a comma-delimited file, you will need to include those commas in your SELECT statement.

Be sure to keep in mind the ultimate destination of the data. If your purpose is to pull data for someone to load into a spreadsheet, you will probably want to use a comma-delimited format. If you are passing data to another application, you may find it easier to format the data in fixed-width columns. Dates require some extra thought. With Oracle’s built-in TO_CHAR function, you can format a date any way you want. Be sure, though, to use a format easily recognized by the application that needs to read that date.

Comma-delimited

To produce a comma-delimited text file, you need to do two things. First, you need to add commas between each field. Second, you need to enclose text fields within quotes. The final query looks like this:

SELECT TO_CHAR(employee_id) || ','
       || '"' || employee_name || '",' 
       || TO_CHAR(employee_hire_date,'MM/DD/YYYY') || ','
       || TO_CHAR(employee_termination_date,'MM/DD/YYYY') || ',' 
       || TO_CHAR(employee_billing_rate)
  FROM employee;

Oracle’s TO_CHAR function has been used to explicitly convert numeric fields to text strings. TO_CHAR is also used to convert date fields to text, and a date format string is included to get the dates into MM/DD/YYYY format. SQL’s concatenation operator, ||, is used to concatenate all the fields together into one long string, and you can see that commas are included between fields. The output from this query will look like this:

...
107,"Bohdan Khmelnytsky",01/02/1998,,45
108,"Pavlo Chubynsky",03/01/1994,11/15/1998,220
110,"Ivan Mazepa",04/04/1998,09/30/1998,84
111,"Taras Shevchenko",08/23/1976,,100
...

Notice that in addition to the commas, the employee_name field has been enclosed in quotes. This is done to accommodate the possibility that someone’s name will contain a comma. Most commercial programs that load comma-delimited data will allow text strings to be optionally enclosed in quotes.

You can use the same technique to generate tab-delimited data. Instead of a comma, use CHR(9) to put a tab character between each field. CHR( ) is an Oracle SQL function that converts an ASCII code into a character. The ASCII code uses the value 9 to represent a tab character.

Fixed-width

The easiest way to produce an output file with fixed-width columns is to use the SQL*Plus COLUMN command to format the output from a standard SQL query. The following example shows one way to dump the employee data in a fixed-width column format:

COLUMN employee_id FORMAT 09999 HEADING ''
COLUMN employee_name FORMAT A20 HEADING '' TRUNCATED
COLUMN employee_hire_date FORMAT A10 HEADING ''
COLUMN employee_termination_date FORMAT A10 HEADING ''
COLUMN employee_billing_rate FORMAT S099.99 HEADING ''
SELECT employee_id,
       employee_name,
       TO_CHAR(employee_hire_date,'MM/DD/YYYY') employee_hire_date,
       TO_CHAR(employee_termination_date,'MM/DD/YYYY')
            employee_termination_date,
       employee_billing_rate
  FROM employee;

Notice some things about the example:

  • The heading for each column has explicitly been set to a null string. That’s because, in the case of a numeric column, SQL*Plus will make the column wide enough to accommodate the heading. You don’t want that behavior. You want the format specification to control the column width.

  • Both numeric fields have been formatted to show leading zeros. Most programs that read fixed-width data, such as COBOL programs, will expect this.

  • The employee_billing_rate column will always display a leading sign character, either a “+” or a “-”. The employee_id field, on the other hand, displays a leading sign only for negative values. Positive employee_id values are output with a leading space. Neither field would ever be negative, and these two approaches were taken just for illustrative purposes.

  • Finally, notice that the TRUNCATED option was used to format the employee_name field. That’s because the employee_name can be up to 40 characters long in the database. If TRUNCATED were not specified, any names that happened to be longer than 20 characters would wrap to a second line — definitely not what you want to happen in a flat file.

Here’s how the output from the example will look:

...
 00107 Bohdan Khmelnytsky   01/02/1998            +045.00
 00108 Pavlo Chubynsky      03/01/1994 11/15/1998 +220.00
 00110 Ivan Mazepa          04/04/1998 09/30/1998 +084.00
 00111 Taras Shevchenko     08/23/1976            +100.00
...

Each column in the output is separated by one space, because that’s the SQL*Plus default. If you like, you can use the SET COLSEP command to change the number of spaces or eliminate them entirely. To run the columns together, you could eliminate the space between columns by setting the column separator to a null string. For example:

SET COLSEP ""

Now the output will look like this:

...
 00107Bohdan Khmelnytsky  01/02/1998          +045.00
 00108Pavlo Chubynsky     03/01/199411/15/1998+220.00
 00110Ivan Mazepa         04/04/199809/30/1998+084.00
 00111Taras Shevchenko    08/23/1976          +100.00
...

You can use any column separation string that you like, and you aren’t limited to just one character.

Tip

The command SET SPACE has the same effect as SET COLSEP “” — it eliminates the space between columns. Oracle considers SET SPACE to be an obsolete command, but some people still use it.

Using SQL*Plus to format fixed-width output works best when you have some control over the format expected by the destination of that data. If you are also writing the program to load the data somewhere else, of course you can code it to match what you can easily produce with SQL*Plus. Sometimes though, you need to match an existing format required by the destination; one you cannot change. Depending on your exact requirements, it may be easier to code one large expression in your SQL statement, and use Oracle’s built-in functions to gain more control over the output. The following example produces the same output as before, but without a leading sign character in the numeric fields:

SELECT LTRIM(TO_CHAR(employee_id,'09999'))
       || SUBSTR(RPAD(employee_name,20,' '),1,20)
       || TO_CHAR(employee_hire_date,'MM/DD/YYYY')
       || NVL(TO_CHAR(employee_termination_date,'MM/DD/YYYY'),'          ')
       || LTRIM(TO_CHAR(employee_billing_rate,'099.99'))
  FROM employee;

The output from this example looks like this:

00107Bohdan Khmelnytsky  01/02/1998          045.00
00108Pavlo Chubynsky     03/01/199411/15/1998220.00
00110Ivan Mazepa         04/04/199809/30/1998084.00
00111Taras Shevchenko    08/23/1976          100.00

Look carefully, and you will see that neither the employee_id field nor the employee_billing_rate field has a leading sign. Not only do they not have a leading sign, there isn’t even space left for one. The LTRIM function was used to remove it.

There is a wide variety of built-in Oracle functions available. Add to that the ability to write your own, and you should be able to generate output in any conceivable format.

DML

If you are extracting data from Oracle in order to move it to another database, and if the volume of data isn’t too high, you can use SQL*Plus to generate a file of INSERT statements. Here is a query to generate INSERT statements for each employee record:

SELECT 'INSERT INTO employee' || chr(10)
    || '  (employee_id, employee_name, employee_hire_date' || chr(10)
    || '  ,employee_termination_date, employee_billing_rate)' || chr(10)
    || '  VALUES (' || TO_CHAR(employee_id) || ',' || chr(10)
    || '          ''' || employee_name || ''',' || chr(10)
    || '          TO_DATE('''
                    || TO_CHAR(employee_hire_date,'MM/DD/YYYY')
                    || ''',''MM/DD/YYYY''),' || chr(10)
    || '          TO_DATE('''
                    || TO_CHAR(employee_termination_date,'MM/DD/YYYY')
                    || ''',''MM/DD/YYYY''),' || chr(10)
    || '          ' || TO_CHAR(employee_billing_rate) || '),'
  FROM employee;

As you can see, this type of query can get a bit hairy. You have to deal with nested, quoted strings; you have to concatenate everything together; and you have to place line breaks so the output at least looks decent. The doubled-up quotes you see in the above statement are there because single quotes are required in the final output. So, for example, the string "`'MM/DD/YYYY'')" will resolve to "`MM/DD/YYYY')" when the SELECT statement is executed. The SELECT statement just shown will produce the following INSERT statement for each employee record:

INSERT INTO employee
  (employee_id, employee_name, employee_hire_date
  ,employee_termination_date, employee_billing_rate)
  VALUES (111,
          'Taras Shevchenko',
          TO_DATE('08/23/1976','MM/DD/YYYY'),
          TO_DATE('','MM/DD/YYYY'),
          100);

This is not a technique I use very often, because it can be frustrating to get the SQL just right. I use it most often on code tables and other small tables with only two or three columns. I also use it sometimes when I’m sending data to a client. That way I just send one file, and my client doesn’t have to mess with SQL*Loader or Oracle’s Import utility.

DDL

Another twist on using SQL to write SQL is to generate commands that help you maintain your database. Such commands are referred to as Data Definition Language, or DDL, commands. Using SQL*Plus to generate DDL scripts can help in automating many database administration tasks, and is often well worth the effort. The following command, for example, generates CREATE PUBLIC SYNONYM commands for each table you own:

SELECT 'CREATE PUBLIC SYNONYM ' || table_name
       || ' for ' || user || '.' || table_name || ';'
  FROM USER_TABLES;

It’s not unusual to need public synonyms, and if you have a large number of tables, you can save yourself a lot of typing by letting SQL*Plus do the work for you. The output from the above command looks like this:

CREATE PUBLIC SYNONYM EMPLOYEE for JONATHAN.EMPLOYEE;
CREATE PUBLIC SYNONYM PROJECT for JONATHAN.PROJECT;
CREATE PUBLIC SYNONYM PROJECT_HOURS for JONATHAN.PROJECT_HOURS;

Once you have spooled the above commands to a file, you can then execute that file to create the synonyms. In addition to one-off tasks like creating synonyms, you can use SQL*Plus to generate DDL commands for use by ongoing maintenance tasks. Going beyond that, you can even use SQL*Plus to generate operating-system script files.

Spool the Extract to a File

Once you have your query worked out and the data formatted the way you need it, it’s time to spool your output to a file. In order to get a clean file, there are four things you must do.

  1. Set the linesize large enough to accommodate the longest possible line. Pay close attention to this if you are generating comma-delimited data. You need to allow for the case where each field is at its maximum size. Use the SET LINESIZE command for this.

  2. Turn off all pagination features. You can use SET PAGESIZE for this purpose. It turns off all column headings, page headings, page footers, page breaks, etc.

  3. Turn feedback off with the SET FEEDBACK OFF command.

  4. Use the SET TRIMSPOOL ON command to eliminate trailing spaces in the output data file. Use this for comma-delimited output and when you generate a file of SQL statements. Do not use this command if you are generating a file with fixed-width columns.

The following script generates a clean, comma-delimited file containing employee information:

--
--This script extracts data from the employee
--table and writes it to a text file in
--a comma-delimited format.
--

--Set the linesize to accommodate the longest possible line.
SET LINESIZE 136

--Turn off all page headings, column headings, etc.
SET PAGESIZE 0

--Turn off feedback
SET FEEDBACK OFF

--Eliminate trailing blanks at the end of a line.
SET TRIMSPOOL ON

SET TERMOUT OFF
SPOOL C:AEMP_DATA.CSV
SELECT TO_CHAR(employee_id) || ','
       || '"' || employee_name || '",'
       || TO_CHAR(employee_hire_date,'MM/DD/YYYY') || ','
       || TO_CHAR(employee_termination_date,'MM/DD/YYYY') || ','
       || TO_CHAR(employee_billing_rate)
  FROM employee;
SPOOL OFF

--Restore the default settings.
SET LINESIZE 80
SET PAGESIZE 24
SET FEEDBACK ON
SET TERMOUT ON

The SPOOL command is used to send the output to C:AEMP_DATA.CSV, and SET TERMOUT OFF is used to disable the display while the data is being written to the file. Here’s how you run this script:

SQL> @CEMPLOYEE_EXTRACT
SQL>

The resulting output file looks like this:

101,"Jonathan Gennick",11/15/1961,,169
102,"Jenny Gennick",09/16/1964,05/05/1998,135
104,"Jeff Gennick",12/29/1987,04/01/1998,99
105,"Horace Walker",06/15/1998,,121
107,"Bohdan Khmelnytsky",01/02/1998,,45
108,"Pavlo Chubynsky",03/01/1994,11/15/1998,220
110,"Ivan Mazepa",04/04/1998,09/30/1998,84
111,"Taras Shevchenko",08/23/1976,,100
112,"Hermon Goche",11/15/1961,04/04/1998,70
113,"Jacob Marley",03/03/1998,10/31/1998,300
211,"Taras Shevchenko",08/23/1976,,100

You can see that the output contains no page or column headings of any kind. It’s just plain, comma-delimited data, with one line for each record in the table.

Make Your Extract Script User-Friendly

There are at least two things you can do to improve on the extract script just shown in the previous section. First, it might be nice to display a brief message to remind the user of what the script does. This will serve to give the user confidence that he or she has indeed started the correct script. The following PROMPT commands, added at the beginning of the script, should serve the purpose:

PROMPT 
PROMPT This script creates a comma-delimited text file containing
PROMPT employee data. All records from the employee table will
PROMPT be output to this file.
PROMPT

It may not be much, but it always makes me feel better to have some indication that the script I am executing really does what I thought it did. For scripts that change data or do something difficult to reverse, I will often include a PAUSE command such as this:

PAUSE Press ENTER to continue, or ctrl-C to abort.

Another nice thing to do would be to prompt for the output filename. The following ACCEPT command will prompt the user for a filename, and put the user’s response into the variable named output_file:

ACCEPT output_file CHAR PROMPT 'Enter the output filename >'

You can then replace the filename in the SPOOL command with the substitution variable &output_file. Here is the modified, more user-friendly version of the extract script:

--
--This script extracts data from the employee
--table and writes it to a text file in
--a comma-delimited format.
--

--Tell the user what he is about to do.
PROMPT
PROMPT This script creates a comma-delimited text file containing
PROMPT employee data. All records from the employee table will
PROMPT be output to this file.
PROMPT

--Give the user a chance to bail out.
PAUSE Press ENTER to continue, or ctrl-C to abort.

--Ask the user for a filename.
ACCEPT output_file CHAR PROMPT 'Enter the output filename >'

--Set the linesize to accommodate the longest possible line.
SET LINESIZE 136

--Turn off all page headings, column headings, etc.
SET PAGESIZE 0

--Turn off feedback
SET FEEDBACK OFF

--Eliminate trailing blanks at the end of a line.
SET TRIMSPOOL ON

SET TERMOUT OFF
SPOOL &output_file
SELECT TO_CHAR(employee_id) || ','
       || '"' || employee_name || '",'
       || TO_CHAR(employee_hire_date,'MM/DD/YYYY') || ','
       || TO_CHAR(employee_termination_date,'MM/DD/YYYY') || ','
       || TO_CHAR(employee_billing_rate)
  FROM employee;
SPOOL OFF

--Restore the default settings.
SET LINESIZE 80
SET PAGESIZE 24
SET FEEDBACK ON
SET TERMOUT ON

When you run this modified script, it’s much more obvious what is going to happen. In addition, you have a chance to abort, and you can specify whatever filename you like for the output.

SQL> @C>EMPLOYEE_EXTRACT_FRIENDLY

This script creates a comma-delimited text file containing
employee data. All records from the employee table will
be output to this file.

Press ENTER to continue, or ctrl-C to abort.

Enter the output filename >C:AEMP_DATA.CSV
SQL>

Depending on your needs, you could even go further and allow the user to enter some selection criteria in order to specify which employee records to extract.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset