Here’s example code for migrating data from Oracle to Redshift using Python and AWS Glue:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from pyspark.sql import SparkSession
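# Set up Spark and Glue context
spark = SparkSession.builder.appName("OracleToRedshift").getOrCreate()
glueContext = GlueContext(spark.sparkContext)
job = Job(glueContext)
# Get command-line arguments (every key read later in the script must be listed here)
args = getResolvedOptions(sys.argv, [
    'JOB_NAME',
    'ORACLE_JDBC_URL', 'ORACLE_USERNAME', 'ORACLE_PASSWORD', 'ORACLE_TABLE',
    'REDSHIFT_JDBC_URL', 'REDSHIFT_USERNAME', 'REDSHIFT_PASSWORD', 'REDSHIFT_TABLE',
    'REDSHIFT_CONNECTION', 'S3_TEMP_DIR'
])
# Initialize the job before doing any work so that job.commit() has state to record
job.init(args['JOB_NAME'], args)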
# Extract data from Oracle
# The source database is selected by connection_type; a JDBC read takes no
# format argument and does not need the Redshift temporary directory
oracleDataSource = glueContext.create_dynamic_frame.from_options(
    connection_type="oracle",
    connection_options={
        "url": args['ORACLE_JDBC_URL'],
        "user": args['ORACLE_USERNAME'],
        "password": args['ORACLE_PASSWORD'],
        "dbtable": args['ORACLE_TABLE']
    }
)
# Transform data to match Redshift schema
# Each tuple is (source column, source type, target column, target type)
redshiftDataSource = oracleDataSource.apply_mapping([
    ("oracle_column1", "string", "redshift_column1", "string"),
    ("oracle_column2", "int", "redshift_column2", "int"),
    ("oracle_column3", "decimal(10,2)", "redshift_column3", "decimal(10,2)")
])
# Load data into Redshift
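# Assumption: REDSHIFT_CONNECTION names an existing Glue catalog connection.
# Glue can take the URL and credentials from that connection, and its
# documentation examples also pass a "database" key in connection_options.
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=redshiftDataSource,
    catalog_connection=args['REDSHIFT_CONNECTION'],
    connection_options={
        "url": args['REDSHIFT_JDBC_URL'],
        "user": args['REDSHIFT_USERNAME'],
        "password": args['REDSHIFT_PASSWORD'],
        "dbtable": args['REDSHIFT_TABLE']
    },
    redshift_tmp_dir=args['S3_TEMP_DIR']
)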
# Commit job
job.commit()
This code uses the AWS Glue ETL (Extract, Transform, Load) service to extract data from Oracle, transform it to match the Redshift schema, and load it into Redshift. The args variable contains the command-line arguments passed to the script: the JDBC URLs and credentials for both the Oracle and Redshift databases, the table names, and the S3 temporary directory that Glue uses to stage data on its way into Redshift.
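Glue hands job parameters to the script as --NAME value pairs, which is the form getResolvedOptions reads from sys.argv. As a rough sketch of how those parameters could be assembled before starting the job, the helper below builds a dict keyed by "--NAME" and fails early if something the script reads is missing. The names REQUIRED_PARAMS and to_glue_arguments are hypothetical and not part of AWS Glue; the parameter list is simply the set of keys the script above looks up in args.

```python
# Hypothetical helper (not part of AWS Glue): prepares job parameters in the
# "--NAME": "value" form that Glue passes to the script on the command line.
REQUIRED_PARAMS = [
    "ORACLE_JDBC_URL", "ORACLE_USERNAME", "ORACLE_PASSWORD", "ORACLE_TABLE",
    "REDSHIFT_JDBC_URL", "REDSHIFT_USERNAME", "REDSHIFT_PASSWORD", "REDSHIFT_TABLE",
    "REDSHIFT_CONNECTION", "S3_TEMP_DIR",
]


def to_glue_arguments(params):
    """Return a dict keyed by '--NAME'; raise if a required parameter is absent."""
    missing = [name for name in REQUIRED_PARAMS if name not in params]
    if missing:
        raise ValueError("Missing job parameters: " + ", ".join(missing))
    # Values are converted to strings because command-line arguments are strings
    return {"--" + name: str(value) for name, value in params.items()}
```

The resulting dict has the shape that boto3's start_job_run accepts in its Arguments parameter, so it could be passed along when the job is started from another script.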
The script first creates a Spark session and Glue context. It then uses the create_dynamic_frame.from_options() method to extract data from the Oracle database, and applies a mapping to transform the data to match the Redshift schema. Finally, it uses the write_dynamic_frame.from_jdbc_conf() method to load the transformed data into Redshift.
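The mapping step takes a list of four-element tuples: source column, source type, target column, target type. For tables with many columns it may be easier to generate that list than to type it by hand. The following is a minimal sketch under that assumption; ColumnMap and build_mappings are hypothetical names, and the column names are the placeholders from the script above.

```python
from typing import List, NamedTuple, Tuple


class ColumnMap(NamedTuple):
    """One column of the Oracle-to-Redshift mapping (hypothetical helper type)."""
    source: str
    source_type: str
    target: str
    target_type: str = ""  # empty string means "keep the source type"


def build_mappings(columns: List[ColumnMap]) -> List[Tuple[str, str, str, str]]:
    """Build the list of 4-tuples in the shape apply_mapping expects."""
    mappings = []
    seen_targets = set()
    for col in columns:
        # Two source columns writing to the same target is almost certainly a mistake
        if col.target in seen_targets:
            raise ValueError("Duplicate target column: " + col.target)
        seen_targets.add(col.target)
        target_type = col.target_type or col.source_type
        mappings.append((col.source, col.source_type, col.target, target_type))
    return mappings


# Same placeholder columns as in the script
mappings = build_mappings([
    ColumnMap("oracle_column1", "string", "redshift_column1"),
    ColumnMap("oracle_column2", "int", "redshift_column2"),
    ColumnMap("oracle_column3", "decimal(10,2)", "redshift_column3"),
])
```

The resulting mappings list could then be passed to apply_mapping in place of the hard-coded list.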