Stream data to Amazon S3 for real-time analytics using the Oracle GoldenGate S3 handler


Modern business applications rely on timely and accurate data, with increasing demand for real-time analytics. There is a growing need for efficient and scalable data storage solutions. Data is at times stored in different datasets and needs to be consolidated before meaningful and complete insights can be drawn. This is where replication tools help move the data from its source to the target systems in real time and transform it as necessary to help businesses with consolidation.

In this post, we provide a step-by-step guide for installing and configuring Oracle GoldenGate for streaming data from relational databases to Amazon Simple Storage Service (Amazon S3) for real-time analytics using the Oracle GoldenGate S3 handler.

Oracle GoldenGate for Oracle Database and Big Data adapters

Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, and high availability. It captures and applies transactional changes in real time, minimizing latency and keeping target systems synchronized with source databases. It supports data transformation, allowing modifications during replication, and works with various database systems, including SQL Server, MySQL, and PostgreSQL. GoldenGate supports flexible replication topologies such as unidirectional, bidirectional, and multi-master configurations. Before using GoldenGate, make sure you have reviewed and adhere to the license agreement.

Oracle GoldenGate for Big Data provides adapters that facilitate real-time data integration from different sources to big data services like Hadoop, Apache Kafka, and Amazon S3. You can configure the adapters to control the data capture, transformation, and delivery process based on your specific requirements to support both batch-oriented and real-time streaming data integration patterns.

GoldenGate provides specific tools called S3 event handlers to integrate with Amazon S3 for data replication. These handlers allow GoldenGate to read from and write data to S3 buckets. This option enables you to use Amazon S3 for GoldenGate deployments across on-premises, cloud, and hybrid environments.

Solution overview

The following diagram illustrates our solution architecture.

In this post, we walk you through the following high-level steps:

  1. Install the GoldenGate software on Amazon Elastic Compute Cloud (Amazon EC2).
  2. Configure GoldenGate for Oracle Database and extract data from the Oracle database to trail files.
  3. Replicate the data to Amazon S3 using the GoldenGate for Big Data S3 handler.

Prerequisites

You must have the following prerequisites in place:

Install the GoldenGate software on Amazon EC2

You need to run GoldenGate on EC2 instances. The instances must have adequate CPU, memory, and storage to handle the expected replication volume. For more details, refer to Operating System Requirements. After you determine the CPU and memory requirements, select a current generation EC2 instance type for GoldenGate.

Use the following formula to estimate the required trail space:

trail disk space = transaction log volume in 1 hour x number of hours down x 0.4
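As a worked example of this formula (the numbers are illustrative assumptions, not recommendations), roughly 10 GB of transaction log per hour and a 4-hour outage window work out as follows:

```shell
# Illustrative trail space estimate: log volume per hour x hours down x 0.4
log_gb_per_hour=10   # assumed transaction log volume in 1 hour, in GB
hours_down=4         # assumed number of hours the downstream process may be down
# Apply the 0.4 factor with integer math: multiply by 4, then divide by 10
trail_gb=$(( log_gb_per_hour * hours_down * 4 / 10 ))
echo "Estimated trail disk space: ${trail_gb} GB"
```

With these inputs the estimate is 16 GB; size the EBS volume of the EC2 instance accordingly, with headroom for growth.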

When the EC2 instance is up and running, download the following GoldenGate software from the Oracle GoldenGate Downloads page:

  • GoldenGate 21.3.0.0
  • GoldenGate for Big Data 21c

Use the following steps to upload and install the file from your local machine to the EC2 instance. Make sure that your IP address is allowed in the inbound rules of the security group of your EC2 instance before starting a session. For this use case, we install GoldenGate for Classic Architecture and Big Data. See the following code:

scp -i pem-key.pem 213000_fbo_ggs_Linux_x64_Oracle_shiphome.zip ec2-user@hostname:~/.
ssh -i pem-key.pem ec2-user@hostname
unzip 213000_fbo_ggs_Linux_x64_Oracle_shiphome.zip

Install GoldenGate 21.3.0.0

Complete the following steps to install GoldenGate 21.3 on an EC2 instance:

  1. Create a home directory to install the GoldenGate software, then change to the directory that contains the installer:
    mkdir /u01/app/oracle/product/OGG_DB_ORACLE
    cd /fbo_ggs_Linux_x64_Oracle_shiphome/Disk1
    
    ls -lrt
    total 8
    drwxr-xr-x. 4 oracle oinstall 187 Jul 29 2021 install
    drwxr-xr-x. 12 oracle oinstall 4096 Jul 29 2021 stage
    -rwxr-xr-x. 1 oracle oinstall 918 Jul 29 2021 runInstaller
    drwxrwxr-x. 2 oracle oinstall 25 Jul 29 2021 response

  2. Run runInstaller:
    [oracle@hostname Disk1]$ ./runInstaller
    Starting Oracle Universal Installer...
    Checking Temp space: must be greater than 120 MB.   Actual 193260 MB    Passed
    Checking swap space: must be greater than 150 MB.   Actual 15624 MB     Passed

A GUI window will pop up to install the software.

  3. Follow the instructions in the GUI to complete the installation process. Provide the directory path you created as the home directory for GoldenGate.

After the GoldenGate software installation is complete, you can create the GoldenGate processes that read the data from the source. First, you configure the OGG EXTRACT.

  4. Create an extract parameter file for the source Oracle database. The following code is the sample file content:
    [oracle@hostname Disk1]$ vi eabc.prm
    
    -- Extract group name
    EXTRACT EABC
    SETENV (TNS_ADMIN = "/u01/app/oracle/product/19.3.0/network/admin")
    
    -- Extract database user login
    
    USERID ggs_admin@mydb, PASSWORD "********"
    
    -- Local trail on the remote host
    EXTTRAIL /u01/app/oracle/product/OGG_DB_ORACLE/dirdat/ea
    IGNOREREPLICATES
    GETAPPLOPS
    TRANLOGOPTIONS EXCLUDEUSER ggs_admin
    TABLE scott.emp;

  5. Add the EXTRACT at the GoldenGate prompt by running the following command:
    GGSCI> ADD EXTRACT EABC, TRANLOG, BEGIN NOW

  6. After you add the EXTRACT, check the status of the running programs with the info all command.

You will see the EXTRACT status is in the STOPPED state, as shown in the following screenshot; this is expected.

  7. Start the EXTRACT process as shown in the following figure.

The status changes to RUNNING. The following are the different statuses:

  • STARTING – The process is starting.
  • RUNNING – The process has started and is running normally.
  • STOPPED – The process has stopped either normally (in a controlled manner) or due to an error.
  • ABENDED – The process has stopped in an uncontrolled manner. An abnormal end is known as ABEND.

This will start the extract process, and a trail file will be created in the location specified in the extract parameter file.

  8. You can verify this by using the command stats <<group_name>>, as shown in the following screenshot.
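Because the screenshots may not render here, the EXTRACT steps above correspond to a GGSCI session along the following lines (output abbreviated and illustrative; lag and checkpoint values will vary):

```
GGSCI> ADD EXTRACT EABC, TRANLOG, BEGIN NOW
GGSCI> START EXTRACT EABC
GGSCI> INFO ALL

Program     Status      Group       Lag at Chkpt    Time Since Chkpt
MANAGER     RUNNING
EXTRACT     RUNNING     EABC        00:00:00        00:00:05

GGSCI> STATS EABC
```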

Install GoldenGate for Big Data 21c

In this step, we install GoldenGate for Big Data on the same EC2 instance where we installed GoldenGate Classic Architecture.

  1. Create a directory to install the GoldenGate for Big Data software, copy and extract the .zip file, then create the subdirectories and start the manager:
    mkdir /u01/app/oracle/product/OGG_BIG_DATA
    
    unzip 214000_ggs_Linux_x64_BigData_64bit.zip
    tar -xvf ggs_Linux_x64_BigData_64bit.tar
    
    GGSCI> CREATE SUBDIRS
    GGSCI> EDIT PARAM MGR
    PORT 7801
    
    GGSCI> START MGR

This will start the MANAGER program. Now you can install the dependencies required for the REPLICAT to run.

  2. Go to /u01/app/oracle/product/OGG_BIG_DATA/DependencyDownloader and run the aws.sh file with the latest version of the aws-java-sdk. This script downloads the AWS SDK, which provides client libraries for connectivity to the AWS Cloud.
    [oracle@hostname DependencyDownloader]$ ./aws.sh 1.12.748

Configure the S3 handler

To configure a GoldenGate Replicat to send data to an S3 bucket, you need to set up a Replicat parameter file and a properties file that defines how data is handled and sent to Amazon S3.

AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the access key and secret access key of your IAM user, respectively. Don't hardcode credentials or security keys in the parameter and properties files. There are several methods available to achieve this, such as the following:

#!/bin/bash

# Use environment variables that are already set in the OS
export AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
export AWS_REGION="your_aws_region"

You can set these environment variables in your shell configuration file (e.g., .bashrc, .bash_profile, .zshrc) or use a secure method to set them temporarily:

export AWS_ACCESS_KEY_ID="your_access_key_id"
export AWS_SECRET_ACCESS_KEY="your_secret_access_key"
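For example, one pattern (a sketch only; the file location and credential values are placeholders) is to keep the exports in a separate file readable only by the GoldenGate OS user and source it in the shell that launches the Replicat:

```shell
# Hypothetical credentials file; chmod 600 keeps it readable only by its owner
cred_file=$(mktemp)
cat > "$cred_file" <<'EOF'
export AWS_ACCESS_KEY_ID="your_access_key_id"
export AWS_SECRET_ACCESS_KEY="your_secret_access_key"
export AWS_REGION="eu-west-1"
EOF
chmod 600 "$cred_file"

# Source the file before starting ggsci so the Replicat inherits the variables
. "$cred_file"
echo "Region set to: ${AWS_REGION}"
```

An even stronger option on EC2 is to avoid long-lived keys entirely and rely on an instance profile role, if your setup supports it.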

Configure the properties file

Create a properties file for the S3 handler. This file defines how GoldenGate will interact with your S3 bucket. Make sure that you have added the correct parameters as shown in the properties file.

The following code is an example of an S3 handler properties file (dirprm/reps3.properties):

[oracle@hostname dirprm]$ cat reps3.properties
gg.handlerlist=filewriter

gg.handler.filewriter.type=filewriter
gg.handler.filewriter.fileRollInterval=60s
gg.handler.filewriter.fileNameMappingTemplate=${tableName}${currentTimestamp}.json
gg.handler.filewriter.pathMappingTemplate=./dirout
gg.handler.filewriter.stateFileDirectory=./dirsta
gg.handler.filewriter.format=json
gg.handler.filewriter.finalizeAction=rename
gg.handler.filewriter.fileRenameMappingTemplate=${tableName}${currentTimestamp}.json
gg.handler.filewriter.eventHandler=s3

goldengate.userexit.writers=javawriter
#TODO Set S3 Event Handler - please update as needed
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=eu-west-1
gg.eventhandler.s3.bucketMappingTemplate=s3bucketname
gg.eventhandler.s3.pathMappingTemplate=${tableName}_${currentTimestamp}
gg.eventhandler.s3.accessKeyId=$AWS_ACCESS_KEY_ID
gg.eventhandler.s3.secretKey=$AWS_SECRET_ACCESS_KEY

gg.classpath=/u01/app/oracle/product/OGG_BIG_DATA/dirprm/:/u01/app/oracle/product/OGG_BIG_DATA/DependencyDownloader/dependencies/aws_sdk_1.12.748/*
gg.log=log4j
gg.log.level=DEBUG

#javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar -Daws.accessKeyId=my_access_key_id -Daws.secretKey=my_secret_key
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar

Configure the GoldenGate REPLICAT

Create the parameter file in /dirprm in the GoldenGate for Big Data home:

[oracle@hostname dirprm]$ vi rps3.prm
REPLICAT rps3
-- Command to add the REPLICAT
-- add replicat rps3, exttrail ./dirdat/tr/ea
SETENV(GGS_JAVAUSEREXIT_CONF = 'dirprm/reps3.properties')
TARGETDB LIBFILE libggjava.so SET property=dirprm/reps3.properties
REPORTCOUNT EVERY 1 MINUTES, RATE
MAP SCOTT.EMP, TARGET gg.handler.s3handler;

[oracle@hostname OGG_BIG_DATA]$ ./ggsci
GGSCI > add replicat rps3, exttrail ./dirdat/tr/ea
Replicat added.

GGSCI > info all
Program     Status      Group       Lag at Chkpt    Time Since Chkpt
MANAGER     RUNNING
REPLICAT    STOPPED     RPS3        00:00:00        00:00:39

GGSCI > start *
Sending START request to Manager ...
Replicat group RPS3 starting.

Now you have successfully started the Replicat. You can verify this by running the info and stats commands followed by the Replicat name, as shown in the following screenshot.

To confirm that the file has been replicated to an S3 bucket, open the Amazon S3 console and open the bucket you created. You can see that the table data has been replicated to Amazon S3 in JSON file format.

Best practices

Make sure that you are following the best practices on performance, compression, and security.

Consider the following best practices for performance:

The following are best practices for compression:

  • Enable compression for trail files to reduce storage requirements and improve network transfer performance.
  • Use GoldenGate's built-in compression capabilities or use file system-level compression tools.
  • Strike a balance between compression level and CPU overhead, because higher compression levels may impact performance.
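As an illustration of GoldenGate's built-in compression, the COMPRESS option of the RMTHOST parameter compresses trail data before it is sent over the network. The following is a sketch of a hypothetical data pump parameter file (the group name, host, port, and trail names are placeholders, not part of the setup above):

```
-- Hypothetical data pump; COMPRESS enables compression of outgoing trail data
EXTRACT PABC
RMTHOST target-host, MGRPORT 7801, COMPRESS
RMTTRAIL ./dirdat/rt
TABLE scott.emp;
```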

Finally, when implementing Oracle GoldenGate for streaming data to Amazon S3 for real-time analytics, it's essential to address various security considerations to protect your data and infrastructure. Follow the security best practices for Amazon S3 and the security options available for GoldenGate Classic Architecture.

Clean up

To avoid ongoing charges, delete the resources that you created as part of this post:

  1. Remove the S3 bucket and trail files if no longer needed, and stop the GoldenGate processes on Amazon EC2.
  2. Revert the changes that you made in the database (such as grants, supplemental logging, and archive log retention).
  3. To delete the entire setup, stop your EC2 instance.

Conclusion

In this post, we provided a step-by-step guide for installing and configuring GoldenGate Classic Architecture and GoldenGate for Big Data for streaming data from relational databases to Amazon S3. With these instructions, you can successfully set up an environment and take advantage of real-time analytics using the GoldenGate handler for Amazon S3, which we will explore further in an upcoming post.

If you have any comments or questions, leave them in the comments section.


About the Authors

Prasad Matkar is a Database Specialist Solutions Architect at AWS based in the EMEA region. With a focus on relational database engines, he provides technical assistance to customers migrating and modernizing their database workloads to AWS.

Arun Sankaranarayanan is a Database Specialist Solutions Architect based in London, UK. With a focus on purpose-built database engines, he assists customers in migrating and modernizing their database workloads to AWS.

Giorgio Bonzi is a Sr. Database Specialist Solutions Architect at AWS based in the EMEA region. With a focus on relational database engines, he provides technical assistance to customers migrating and modernizing their database workloads to AWS.
