Automating Reconciliation Using AWS Glue

KBX Digital
Mar 1


Automating reconciliation is crucial for businesses that need to keep track of their financial transactions and ensure accuracy in their reporting. As data volumes grow and reconciliation is expected to happen faster, many organizations are turning to AWS Glue, Amazon’s fully managed extract, transform, and load (ETL) service. In this article, we’ll explore how AWS Glue can be used to automate the reconciliation process.

What is AWS Glue?

AWS Glue is a fully managed, serverless ETL service that makes it easy to move data between data stores. With AWS Glue, you can convert data from one format to another, validate it, and transform it into the shape you need. You can also use AWS Glue to extract data from a variety of sources and load it into targets such as Amazon S3 or Amazon Redshift.

Why use AWS Glue for reconciliation?

Reconciliation (comparing records from two or more sources and investigating any differences) is a time-consuming, repetitive process that requires a lot of manual effort. Automating it with AWS Glue reduces the time and effort needed to reconcile data and improves the accuracy of your financial reporting. AWS Glue also gives you a central place to define, run, and track your reconciliation jobs, which makes these workflows easier to maintain.

Steps to Automate Reconciliation Using AWS Glue

The following steps outline the process of automating reconciliation using AWS Glue:

  1. Extract Data: The first step in automating reconciliation is to extract the data from your data sources. AWS Glue supports a wide range of data sources, including Amazon S3, Amazon Redshift, Amazon RDS, and more. You can extract data from these sources using AWS Glue’s built-in connectors.
  2. Transform Data: Once you have extracted the data, you need to transform it into a consistent format so that records from different sources can be compared. AWS Glue jobs run on Apache Spark, which lets you perform complex transformations on large datasets. You can use AWS Glue’s built-in transforms, such as field mapping, joins, and filters, or write your own code for tasks such as date conversion and string manipulation.
  3. Load Data: After transforming the data, you can load it into Amazon S3 or Amazon Redshift for further analysis. AWS Glue writes directly to these data stores, so the reconciled output is ready for reporting and review.
  4. Schedule Workflow: Once the extract, transform, and load steps are in place, you can schedule your reconciliation workflow to run automatically. AWS Glue triggers can start a job at a specific time, on a recurring schedule defined by a cron expression, or when another job completes.
  5. Monitor Workflow: The final step is to monitor the reconciliation workflow to ensure that it runs as expected. You can track job runs in the AWS Glue console and in Amazon CloudWatch, and use Amazon EventBridge rules to send notifications (for example, through Amazon SNS) when a run fails.
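As a sketch of the scheduling step, the function below builds the parameters for a scheduled AWS Glue trigger in the shape accepted by the boto3 Glue client’s `create_trigger` call. The trigger name, job name, and 02:00 UTC schedule are hypothetical examples; the code only builds and checks the request, and the actual API call is left as a comment so the sketch runs without AWS credentials.

```python
def build_scheduled_trigger(trigger_name: str, job_name: str, cron_fields: str) -> dict:
    """Build keyword arguments for boto3's glue.create_trigger (names are hypothetical)."""
    # Glue schedules use a six-field cron syntax wrapped in cron(...):
    # minutes hours day-of-month month day-of-week year (evaluated in UTC).
    if len(cron_fields.split()) != 6:
        raise ValueError("Glue cron expressions need exactly six fields")
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron_fields})",
        "Actions": [{"JobName": job_name}],  # the job(s) this trigger starts
        "StartOnCreation": True,  # activate the trigger as soon as it is created
    }


# Hypothetical example: run the reconciliation job every day at 02:00 UTC.
params = build_scheduled_trigger("nightly-reconciliation", "reconciliation-job", "0 2 * * ? *")

# With AWS credentials configured, the trigger could then be created with:
#   import boto3
#   boto3.client("glue").create_trigger(**params)
print(params["Schedule"])  # cron(0 2 * * ? *)
```

The `?` in the day-of-week field is needed because a cron expression cannot specify both day-of-month and day-of-week.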

Let’s now demonstrate the reconciliation process using two CSV files stored in Amazon S3 as the data sources.

Step 1: Create an S3 Bucket

In the AWS Management Console, create an S3 bucket with two folders: one for input and one for output. Upload the two CSV files to the input folder.
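If you prefer to script the upload, here is a minimal sketch assuming a boto3 S3 client, whose `upload_file(Filename, Bucket, Key)` method copies a local file to S3. The bucket name and file names in the usage line are hypothetical. Because S3 “folders” are only key prefixes, putting a file in the input folder simply means giving its key the `input/` prefix.

```python
from pathlib import Path
from typing import List


def upload_inputs(s3_client, bucket: str, local_files: List[str], prefix: str = "input/") -> List[str]:
    """Upload local CSV files under the input/ prefix and return the S3 keys used."""
    keys = []
    for local_path in local_files:
        # Keep only the file name, so ./data/ledger.csv becomes input/ledger.csv.
        key = prefix + Path(local_path).name
        s3_client.upload_file(local_path, bucket, key)
        keys.append(key)
    return keys


# Hypothetical usage with real credentials:
#   import boto3
#   upload_inputs(boto3.client("s3"), "my-reconciliation-bucket",
#                 ["ledger.csv", "bank_statement.csv"])
```

Passing the client in as a parameter keeps the function independent of how credentials are configured and makes it easy to test with a stand-in object.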

Step 2: Create an AWS Glue Job

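In the AWS Glue console, open the “Jobs” tab and click the “Add Job” button. Give the job a name, such as “reconciliation-job”, and select an IAM role that can read from and write to your bucket. Choose the input folder as the source and the output folder as the target where the resulting files will be stored.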
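The same job can also be defined in code. This sketch builds the keyword arguments for the boto3 Glue client’s `create_job` call; the role ARN, bucket, script path, and the `--input_path`/`--output_path` argument names are hypothetical, and the Glue version and worker settings are example values rather than recommendations.

```python
def build_job_definition(job_name: str, role_arn: str, bucket: str,
                         script_key: str = "scripts/reconcile.py") -> dict:
    """Build keyword arguments for boto3's glue.create_job (values are illustrative)."""
    return {
        "Name": job_name,
        "Role": role_arn,  # IAM role that can read input/ and write output/
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": f"s3://{bucket}/{script_key}",
            "PythonVersion": "3",
        },
        "DefaultArguments": {
            # Hypothetical custom arguments; a Glue script can read them
            # with awsglue.utils.getResolvedOptions.
            "--input_path": f"s3://{bucket}/input/",
            "--output_path": f"s3://{bucket}/output/",
        },
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
    }


job = build_job_definition(
    "reconciliation-job",
    "arn:aws:iam::123456789012:role/GlueReconciliationRole",  # hypothetical role
    "my-reconciliation-bucket",  # hypothetical bucket
)
# With credentials configured: boto3.client("glue").create_job(**job)
```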

Step 3: Define the Transformation

In the “Transform” section of the job, define the transformations to apply to the data. For example, you can join the two tables on a common field, such as a transaction ID, and then apply a filter to find the records that are missing from one file or whose values differ between the two.
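To make the join-and-filter idea concrete, here is a small local sketch in plain Python (standard library only, not the AWS Glue API) of what the transformation computes: a full outer join of the two files on a shared key, followed by a filter that sorts each key into matched, mismatched, or missing on one side. The column names `txn_id` and `amount` and the sample rows are hypothetical. In a Glue job, the same logic would typically be expressed as a Spark full outer join.

```python
import csv
import io
from decimal import Decimal


def load_rows(csv_text: str, key_field: str) -> dict:
    """Index CSV rows by the common key field (assumes keys are unique per file)."""
    return {row[key_field]: row for row in csv.DictReader(io.StringIO(csv_text))}


def reconcile(left_csv: str, right_csv: str, key_field: str, value_field: str) -> dict:
    """Full outer join on key_field, then filter keys into reconciliation buckets."""
    left = load_rows(left_csv, key_field)
    right = load_rows(right_csv, key_field)
    result = {"matched": [], "mismatched": [], "only_in_left": [], "only_in_right": []}
    for key in sorted(left.keys() | right.keys()):  # union of keys = full outer join
        if key not in right:
            result["only_in_left"].append(key)
        elif key not in left:
            result["only_in_right"].append(key)
        # Decimal compares amounts numerically and avoids float rounding surprises.
        elif Decimal(left[key][value_field]) == Decimal(right[key][value_field]):
            result["matched"].append(key)
        else:
            result["mismatched"].append(key)
    return result


ledger = """txn_id,amount
T1,100.00
T2,250.50
T3,75.00
"""
bank = """txn_id,amount
T1,100.0
T2,250.00
T4,30.00
"""
print(reconcile(ledger, bank, "txn_id", "amount"))
# {'matched': ['T1'], 'mismatched': ['T2'], 'only_in_left': ['T3'], 'only_in_right': ['T4']}
```

This sketch assumes each key appears at most once per file; with duplicate keys, later rows would silently replace earlier ones, so real data should be checked for duplicates before comparing.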

Step 4: Run the Job

Click the “Run” button in the top-right corner to start the reconciliation process. The AWS Glue job extracts the data from the two CSV files, applies the transformations, and stores the reconciled data in the target S3 location (the output folder).
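A run can also be started programmatically. This minimal sketch assumes a boto3 Glue client, whose `start_job_run` call returns a dictionary containing the new `JobRunId`. The `--input_path` and `--output_path` argument names are hypothetical; arguments passed here replace the job’s default arguments for this run only.

```python
def start_reconciliation_run(glue_client, job_name: str,
                             input_path: str, output_path: str) -> str:
    """Start a Glue job run and return its run ID."""
    response = glue_client.start_job_run(
        JobName=job_name,
        # Per-run arguments override the job's DefaultArguments for this run.
        Arguments={"--input_path": input_path, "--output_path": output_path},
    )
    return response["JobRunId"]


# Hypothetical usage with real credentials:
#   import boto3
#   run_id = start_reconciliation_run(boto3.client("glue"), "reconciliation-job",
#                                     "s3://my-reconciliation-bucket/input/",
#                                     "s3://my-reconciliation-bucket/output/")
```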

Step 5: Monitor the Job

Go back to the “Jobs” tab and monitor the status of the job run. If the run succeeds, the reconciled data will be available in the target S3 location.
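The status check can also be scripted. This sketch polls a job run through a boto3 Glue client’s `get_job_run` call, reading `JobRun["JobRunState"]` until the run reaches a terminal state. The 30-second interval and 120-poll limit are arbitrary assumptions, and the `sleep` parameter exists only so the function can be tested without actually waiting.

```python
import time

# Glue job run states after which the run will not change again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def wait_for_job_run(glue_client, job_name: str, run_id: str,
                     poll_seconds: float = 30.0, max_polls: int = 120,
                     sleep=time.sleep) -> str:
    """Poll get_job_run until the run reaches a terminal state; return that state."""
    for _ in range(max_polls):
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)  # still STARTING, RUNNING, WAITING, etc.
    raise TimeoutError(f"job run {run_id} did not finish after {max_polls} polls")
```

Only a `SUCCEEDED` result means the reconciled files should be in the output folder; any other terminal state is worth surfacing, for example by checking the run’s logs in CloudWatch.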

In this demonstration, we have shown how to automate the reconciliation process using AWS Glue and two CSV files stored in Amazon S3. With AWS Glue, you can simplify reconciliation and improve the accuracy of your financial reporting.

About the Author

Snowwin is a Software Developer at KBX Digital with experience in Data Engineering and Full Stack Development. He is also an avid traveller who likes to explore new things.

About KBX Digital

At KBX Digital, we use serverless technology to auto-scale microservices that serve millions of customers.