Data Engineering

Problem:

A large identity provider based in the DC Metro area collects data from multiple sources. They needed an efficient way to run a monthly batch job that reconciles records, collapses individual identities, and provides collated information to their clients. Their existing infrastructure and processes were unable to process the massive amounts of data.

RestonLogic Approach:

  • Our engineers and architects met with business leads, software and database architects, operations staff, and testers, and came away with a comprehensive picture of this mission-critical app and its workflow
  • We then worked through the dependencies, mapped the data flow, and developed several data processing utilities to clean and normalize the data
  • We developed AWS architecture diagrams based on the assessment phase
  • Using Terraform and Ansible as infrastructure and configuration management tools, we quickly built a POC to establish the baseline, i.e., the number of records processed per hour
  • The final architecture was successfully deployed within weeks and put into production
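
To make the cleaning, identity-collapsing, and baseline steps above more concrete, here is a minimal Python sketch. It is illustrative only and not the utilities built for this project: the field names (`name`, `email`, `source`), the choice of email as the matching key, and the helper names `normalize`, `collapse`, and `records_per_hour` are all assumptions made for the example.

```python
from collections import defaultdict


def normalize(record):
    """Clean one raw record: collapse whitespace and lowercase matching fields.

    The field names are hypothetical; real source schemas will differ.
    """
    return {
        "name": " ".join(record["name"].split()).lower(),
        "email": record["email"].strip().lower(),
        "source": record["source"],
    }


def collapse(records):
    """Collapse records that share a normalized email into one identity.

    Matching on email alone is an assumption made to keep the sketch short.
    """
    identities = defaultdict(lambda: {"names": set(), "sources": set()})
    for rec in map(normalize, records):
        identity = identities[rec["email"]]
        identity["names"].add(rec["name"])
        identity["sources"].add(rec["source"])
    return dict(identities)


def records_per_hour(record_count, elapsed_seconds):
    """Baseline throughput: records processed per hour."""
    return record_count * 3600 / elapsed_seconds


# Made-up sample data from two hypothetical sources.
sample = [
    {"name": "Jane  Doe", "email": " Jane.Doe@example.com", "source": "crm"},
    {"name": "jane doe", "email": "jane.doe@example.com ", "source": "billing"},
    {"name": "John Roe", "email": "john.roe@example.com", "source": "crm"},
]

identities = collapse(sample)
```

In this sketch the two Jane Doe records collapse into a single identity seen in both sources, and `records_per_hour` gives the kind of per-hour figure a POC could use as its baseline.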

Results:

  • The customer has completely migrated out of its data center and retired over 400 servers
  • Batch processing job time was reduced from 7 days to 3
  • The customer now has well-defined app profiles and can spin up clusters of different sizes to match incoming data
  • The success of the initial project led the customer to create a SaaS offering and a new revenue stream