ORACLE External Tables
Problem: You have a table sitting inside a MySQL database and your aim is to move it into your ORACLE database. Sounds easy? Well, it ought to be a lot easier than it currently happens to be. Here, we look at a worked example that involves using ORACLE external tables.
Dump The Data To File
The first thoughts are to get the data that's sitting in the MySQL table into an easily transportable form. MySQL comes with a useful utility called mysqldump which, not surprisingly, can be used to dump data into flat file form. It comes bristling with very interesting and flexible options. One option in particular reads (figure 1.0):
Figure 1.0
$ mysqldump --help mysqldump Ver 10.10 Distrib 5.0.22, for redhat-linux-gnu (i386) By Igor Romanenko, Monty, Jani & Sinisa This software comes with ABSOLUTELY NO WARRANTY. This is free software, and you are welcome to modify and redistribute it under the GPL license Dumping definition and data mysql database or table Usage: mysqldump [OPTIONS] database [tables] OR mysqldump [OPTIONS] --databases [OPTIONS] DB1 [DB2 DB3...] OR mysqldump [OPTIONS] --all-databases [OPTIONS] : : : -V, --version Output version information and exit. -w, --where=name Dump only selected records; QUOTES mandatory! -X, --xml Dump a database as well formed XML.
That sounds like exactly what we need. Wrong! Unfortunately, the MySQL data can be subject to an unwanted transformation during the attempt to generate well-formed XML. For example, double quotes and the less-than symbol that appear in the data are transformed into their entity equivalents: " and < respectively. While the resulting output is indeed well-formed XML, it cannot be relied upon to be a verbatim dump of the source data.
In the end, a simple solution is just to use the tried-and-trusted delimited format dump with delimiters chosen as appropriate. Figure 1.1 shows an example shell script that generates a data dump file for each table in the chosen database.
Figure 1.1
#!/bin/bash
# In the output,
# the field separator is: ^_ (CTRL-Underscore {a.k.a. \u001f})
# the record separator is: ^_\n (CTRL-Underscore followed by LF)
#
# Remember to set the variables un, pw and db accordingly.
#
# Also remember to ensure that the directory /tmp/dump is writable
# by the MySQL server process.
mysqldump -Qlc \
--compatible=oracle \
--tab=/tmp/dump \
--fields-terminated-by="^_" \
--lines-terminated-by="^_\n" \
-u ${un} \
--password=${pw} \
${db}
Before casually proceeding, it is worth stopping to consider whether there are any issues deriving from character set differences between the MySQL and ORACLE database servers.
Accessing the Data Dumps
With the creation of a dump file, it is now feasible to import the data straight into an ORACLE table using SQL*Loader. By way of example, a SQL*Loader control file for performing such a task using one of the earlier-generated dump files is given in figure 1.2.
Figure 1.2
LOAD DATA INFILE '/tmp/dump/users.txt' "str X'1F0A'" APPEND PRESERVE BLANKS INTO TABLE users FIELDS TERMINATED BY X'1F' ( "uid" INTEGER EXTERNAL, name CHAR(60), pass CHAR(32), mail CHAR(64), "mode" INTEGER EXTERNAL, sort INTEGER EXTERNAL, threshold INTEGER EXTERNAL, theme CHAR(255), signature CHAR(255), created INTEGER EXTERNAL, "access" INTEGER EXTERNAL, login INTEGER EXTERNAL, status INTEGER EXTERNAL, timezone CHAR(8), language CHAR(12), picture CHAR(255), init CHAR(64), data CHAR(4000) )
SQL*Loader would quite happily lap up the data file and populate our ORACLE table for us. Job done!
The External Table Approach
One of the newer features that ORACLE brings to the table is the ability to treat external files as if they were actually native database tables. Given that possibility, it doesn't take much imagination to see that we can query the files' contents using SQL, apply filtering and validation, join them to reference data and populate downstream tables with the results. In this particular example, we can convince ORACLE to reach inside the files generated by mysqldump and directly populate our target tables. The following steps outline how we do this.
Incidentally, the word is that over the course of time, external tables will eventually become the "norm" with respect to loading foreign data into the database. Given the ability to access such data and manipulate it using familiar SQL operations, it is easy to see the advantage this approach has over using SQL*Loader. Unfortunately (for now at least), external tables cannot access LOB data and in such cases, we are forced to make do with using SQL*Loader.
A useful thing to be aware of is that SQL*Loader can generate the DDL for external tables using the specification given in a control file. Figure 1.3 shows how:
Figure 1.3
$ sqlldr / control=users.ctl external_table=generate_only
This will generate the DDL as shown in Figure 1.4 (which has been formatted a bit and some of the names tailored). The DDL can be found in the log file that SQL*Loader generates for the requested operation.
Figure 1.4
CREATE DIRECTORY mysqldump AS '/tmp/dump';
CREATE TABLE ext_users
(
"uid" NUMBER(38),
"NAME" VARCHAR2(60),
"PASS" VARCHAR2(32),
"MAIL" VARCHAR2(64),
"mode" NUMBER(38),
"SORT" NUMBER(38),
"THRESHOLD" NUMBER(38),
"THEME" VARCHAR2(255),
"SIGNATURE" VARCHAR2(255),
"CREATED" NUMBER(38),
"access" NUMBER(38),
"LOGIN" NUMBER(38),
"STATUS" NUMBER(38),
"TIMEZONE" VARCHAR2(8),
"LANGUAGE" VARCHAR2(12),
"PICTURE" VARCHAR2(255),
"INIT" VARCHAR2(64),
"DATA" VARCHAR2(4000)
)
ORGANIZATION EXTERNAL
(
TYPE ORACLE_LOADER
DEFAULT DIRECTORY mysqldump
ACCESS PARAMETERS
(
RECORDS DELIMITED BY 0x'1F0A' CHARACTERSET WE8ISO8859P1
BADFILE mysqldump:'users.bad'
LOGFILE 'users.log_xt'
READSIZE 1048576
FIELDS TERMINATED BY 0x'1F' NOTRIM
REJECT ROWS WITH ALL NULL FIELDS
(
"uid" CHAR(255) TERMINATED BY 0x'1F',
"NAME" CHAR(60) TERMINATED BY 0x'1F',
"PASS" CHAR(32) TERMINATED BY 0x'1F',
"MAIL" CHAR(64) TERMINATED BY 0x'1F',
"mode" CHAR(255) TERMINATED BY 0x'1F',
"SORT" CHAR(255) TERMINATED BY 0x'1F',
"THRESHOLD" CHAR(255) TERMINATED BY 0x'1F',
"THEME" CHAR(255) TERMINATED BY 0x'1F',
"SIGNATURE" CHAR(255) TERMINATED BY 0x'1F',
"CREATED" CHAR(255) TERMINATED BY 0x'1F',
"access" CHAR(255) TERMINATED BY 0x'1F',
"LOGIN" CHAR(255) TERMINATED BY 0x'1F',
"STATUS" CHAR(255) TERMINATED BY 0x'1F',
"TIMEZONE" CHAR(8) TERMINATED BY 0x'1F',
"LANGUAGE" CHAR(12) TERMINATED BY 0x'1F',
"PICTURE" CHAR(255) TERMINATED BY 0x'1F',
"INIT" CHAR(64) TERMINATED BY 0x'1F',
"DATA" CHAR(4000) TERMINATED BY 0x'1F'
)
)
LOCATION
(
'users.txt'
)
)
REJECT LIMIT UNLIMITED
That's a lot of DDL! The upshot is that the file can now be treated as though it was a native ORACLE table. We now have the opportunity to transfer its contents verbatim, transform them using complex SQL expressions and under constraints which SQL*Loader couldn't possibly handle.
For example:
Load up only brand new, never before seen users who have registered with non-blacklisted email addresses and who signed up within the last two months. The registration date is expressed as the number of seconds since the start of the Unix epoch (01-Jan-1970 00:00:00 UTC).
This could be taken care of using, for example, the SQL given in figure 1.5:
Figure 1.5
INSERT INTO users
(
"uid",
name,
pass,
mail,
"mode",
sort,
threshold,
theme,
signature,
created,
"access",
login,
status,
timezone,
language,
picture,
init,
data
)
SELECT
"uid",
name,
pass,
mail,
"mode",
sort,
threshold,
theme,
signature,
created,
"access",
login,
status,
timezone,
language,
picture,
init,
data
FROM ext_users
WHERE NOT EXISTS(SELECT 1 FROM users us2 WHERE us2."uid" = ext_users."uid")
AND mail NOT IN (SELECT email_address FROM blacklisted_addresses)
AND MONTHS_BETWEEN(SYSDATE, (TO_DATE('19700101', 'YYYYMMDD') + created/86400)) < 2
So, there you have it: data import using ORACLE's external tables feature. This is just the tip of the iceberg and it is clear that external tables provide a useful tool in the data processor's war chest.
