Hadoop Developer
Simon

Professional Summary

  • 12+ years of experience in the IT industry spanning development, maintenance and support, and migration projects in Big Data, Java and Mainframe technologies.
  • Over 6 years of client-facing experience working onsite at client locations in the USA.
  • Excellent business knowledge and work experience with insurance (AIG), payments (VISA), and banking and finance firms.
  • Experience with the DevOps (development and operations) model.
  • Cloudera Certified Developer for Apache Hadoop (CCD-410).
  • Over 3.5 years of extensive experience in Big Data analytics, including Hadoop MapReduce, HDFS, Hive, Python, Sqoop, Cassandra, HBase, Flume, Kafka, Storm, Spark, Avro, GitHub, Nexus, Jenkins, Pig, etc.
  • Part of the core solution team; worked on multiple POCs and implemented the best-fit solution.
  • Experience with various Hadoop distributions such as Cloudera and Hortonworks.
  • Experience in ingesting streaming data into Hadoop using Spark, the Storm framework and Scala.
  • Working experience with the Denodo data virtualization tool.
  • Played a key role in the execution of POC projects.
  • Good experience in Hadoop cluster monitoring tools like Ambari.
  • Good experience in UNIX and shell scripts.
  • Good understanding of HDFS Designs, Daemons, federation and HDFS high availability (HA).
  • Good experience in Hive table design and loading data into Hive tables.
  • Good experience in Data analysis, ETL Development, Data warehousing.
  • Good exposure to data modeling, with expertise in creating star and snowflake schemas and fact and dimension tables.
  • Good experience on Agile and Scrum methodologies.
  • Coordinated work requests among team members: sizing, work allocation, status reporting, defect tracking, change management and issue clarification.
  • System study and analysis of business requirements, preparation of technical design, UTP and UTC, coding, unit testing, integration testing, system testing and implementation.
  • Added business value to applications by:
    • Restructuring application jobs, which substantially improved application availability.
    • Reducing job cost by running jobs during off-peak hours and improving DASD utilization.
    • Reusing existing services and functionality, decommissioning unused elements and building internal tools for data validation.
    • Reducing ticket volume through root cause analysis and preventive analysis.
  • Extensively supported client senior management by providing complex reports that helped them make critical decisions.
  • Worked with DBAs to understand the restrictions, standards and database architecture for all DB object-related activities.
  • Good working knowledge and extensive experience in performance tuning of DB2 components and batch programs.

Education

  • Executive M.B.A
  • MS in Information Systems
  • BS in Computer Applications

Certifications And Achievements

  • Certified in ITIL V3 Foundation.
  • Cloudera Certified Developer for Apache Hadoop (CCD-410).
  • Trained in Agile and Scrum methodologies.
  • Recipient of “Creative Mind” award.
  • Recipient of “10/10” award.
  • Recipient of “special appreciation” award.
  • Received the “Best project of the year award”.
  • Received the “Best project of the half year award”.

Technical Skills

Big Data Ecosystem
HDFS, Cassandra, HBase, Hadoop MapReduce, Hive, Pig, Flume, Hue, Spark, Scala, Kafka, Sqoop, Cloudera, ZooKeeper, Oozie, Solr, Azkaban.
Languages
COBOL, core Java, JCL, SQL, PL/SQL, C, C++, Web Services.
Methodologies
Agile, V-model.
Database
Oracle 10g, DB2, IMS DB, MySQL, NoSQL, Derby
IDE / Testing Tools
Eclipse
Operating System
Windows, UNIX, Linux.
Scripts
Python, Shell Scripting.
Others
Git, Nexus, Jenkins, Denodo, Rally, VSAM, Endevor, ChangeMan, Control-M, Jobtrac, Expediter, Easytrieve, QC, ManageNow, Infoman, MS Office, JHS, JMR.

Professional Experience

T-Mobile Bellevue, WA
Duration
Oct 2015 – Present
Role
Sr Hadoop Consultant/Lead
Responsibilities
Project: Integrated Data Warehouse

The goal of the project is to create an integrated data warehouse for T-Mobile. T-Mobile has multiple source systems for its prepaid and postpaid billing, which are being integrated into a single warehouse system. The Ericsson billing system is proposed to replace the existing systems, and a hybrid solution combining Hadoop and Teradata has been proposed.

  • Involved in source system analysis along with SMEs and data architects.
  • Extracted data from various source systems using DMF and BEAM to ingest it into the data lake.
  • Part of the solution architecture team; implemented optimization techniques including partitioning and bucketing.
  • Developed a strategy for full and incremental loads using Sqoop.
  • Worked closely with architects and the admin team to provide solutions to multiple teams.
  • Implemented the DevOps model.
  • Single point of contact for code deployment to QA and production.
  • Developed Spark SQL code for faster testing and processing of data.
  • Implemented extraction of data from Hive tables and HDFS files using Spark (see the sketch following this list).
  • Worked on Spark/Scala for large file processing.
  • Used CI/CD tools (GitHub, Nexus, Gerrit and Jenkins) for code deployment.
  • Monitored the Hadoop cluster and set up alerts using Ambari.
  • Extensively used Pig for transformations.
  • Worked with the Hortonworks team on configuration issues.
  • Implemented late arriving dimension (LAD) functionality in Hadoop.
  • Implemented optimization and performance tuning in Hive and Pig.
  • Extensively worked on history loads to bring data into Hadoop.
  • Created views in Hive and Hive tables on top of HBase tables.
  • Created HBase tables for loading the SCD Type 2 model.
  • Created Control-M jobs for daily and weekly schedules.
  • Created Hive tables for SCD Type 1 insert/upsert models.
  • Involved in creating Hive tables, loading them with data and writing Hive queries to execute test cases.
  • Exported test data into Teradata using BTEQ and Sqoop.
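
Illustrative sketch (not the project's actual code): a minimal Spark/Scala job, written against the Spark 2.x SparkSession API, showing the Spark SQL-over-Hive load pattern referenced above. All database, table and column names (stage_db.stg_billing_events, dw_db.billing_daily, subscriber_id, etc.) are hypothetical placeholders, not the real T-Mobile schema.

  import org.apache.spark.sql.SparkSession

  // Minimal sketch of the Spark SQL / Hive access pattern described above.
  // Database, table and column names are hypothetical placeholders.
  object BillingDailyLoad {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("BillingDailyLoad")
        .enableHiveSupport()        // allow Spark SQL to read/write Hive tables
        .getOrCreate()

      // Enable dynamic partitioning so target partitions are derived from the data.
      spark.sql("SET hive.exec.dynamic.partition=true")
      spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

      // Aggregate staged billing events and load a date-partitioned warehouse table
      // (the partition column must be the last column in the SELECT).
      spark.sql(
        """
          |INSERT OVERWRITE TABLE dw_db.billing_daily PARTITION (usage_dt)
          |SELECT subscriber_id,
          |       SUM(charge_amount)  AS total_charges,
          |       to_date(event_ts)   AS usage_dt
          |FROM   stage_db.stg_billing_events
          |GROUP  BY subscriber_id, to_date(event_ts)
        """.stripMargin)

      spark.stop()
    }
  }

A job like this would typically be packaged as a JAR and triggered via spark-submit from the Control-M schedules described above.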
Environment
Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, BEAM, Java, Eclipse, Spark, Scala, SQL, DB2, MySQL, Linux, Hortonworks, Oozie, Solr, Ambari, GitHub, Nexus, Jenkins, Gerrit, Python.
VISA, Foster City, USA
Duration
July 2014 - Sept 2015
Role
Sr Hadoop consultant
Responsibilities
CCDRi (Central Commercial Data Repository integrated) is one of the critical applications at VISA. It is a core operational data store holding financial, invoice, master, audit and subscription details. As part of the Hadoop migration, data is moved from DB2 to Hadoop, and outbound files are created for clients along with reports for top management.

  • Involved in review of functional and non-functional requirements.
  • Analyzed the source tables (350+).
  • Worked closely with the app support team to deploy code to the QA and production environments.
  • Designed and created the stage tables and main tables.
  • Designed Hive tables per business requirements.
  • Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
  • Worked extensively on the core and Spark SQL modules of Spark.
  • Implemented optimization and performance tuning in Hive and Pig.
  • Created various bulk-load and incremental-load scripts (see the sketch following this list).
  • Imported data using Sqoop to load data from DB2 into Hadoop on a regular basis.
  • Wrote and implemented Pig UDFs to preprocess the data and use it for analysis.
  • Installed and configured Pig for ETL jobs.
  • Created Hive tables and worked on them using HiveQL for data analysis to meet business requirements.
  • Implemented partitioning and bucketing in Hive.
  • Extensively used Pig for data cleansing.
  • Ran ad-hoc queries using Pig Latin and Hive.
  • Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Experience with versioning, change control and problem management.
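
Illustrative sketch (assumptions flagged): one common way to express the SCD Type 1 incremental merge behind the load scripts above in Spark/Scala, keeping only the latest version of each business key. Table and column names (main_db.invoice_master, stage_db.invoice_delta, invoice_id, load_ts) are hypothetical, not the actual CCDRi schema.

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.expressions.Window
  import org.apache.spark.sql.functions.{col, row_number}

  // Minimal sketch of an SCD Type 1 incremental merge; names are placeholders.
  object IncrementalMerge {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("IncrementalMerge")
        .enableHiveSupport()
        .getOrCreate()

      val base  = spark.table("main_db.invoice_master")   // current full snapshot
      val delta = spark.table("stage_db.invoice_delta")   // Sqoop'd incremental rows
                                                           // (schemas assumed identical)

      // Keep only the most recent version of each business key (Type 1 semantics).
      val latest = Window.partitionBy(col("invoice_id")).orderBy(col("load_ts").desc)
      val merged = base.union(delta)
        .withColumn("rn", row_number().over(latest))
        .filter(col("rn") === 1)
        .drop("rn")

      // Write the merged snapshot to a separate table; the surrounding load
      // scripts would swap it into place.
      merged.write.mode("overwrite").saveAsTable("main_db.invoice_master_merged")

      spark.stop()
    }
  }

Writing to a separate merged table and swapping it in avoids overwriting a table while it is still being read, which is the usual constraint with pre-ACID Hive tables.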
Environment
Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Python, Flume, Java, Eclipse, Spark, Scala, SQL, DB2, Linux, Cloudera, GitHub
American Express, Florida, USA
Duration
Nov 2012 - May 2014
Role
Hadoop Developer
Responsibilities
The goal of the project is to offer personalized deals to American Express customers by tracking customers' spending habits and using that data to present relevant offers from merchants. The project aims at creating a sophisticated offer-creation and campaign-management platform that supports multiple offer types, enabling merchants to target offers to pre-selected consumer profiles, with advanced personalization capabilities to ensure consumers receive highly relevant offers.

  • Involved in design and development phases of Software Development Life Cycle (SDLC) using Scrum methodology
  • Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer data and purchase histories into HDFS for analysis.
  • Implemented optimization and performance tuning in Hive and Pig.
  • Developed job flows in Oozie to automate the workflow for extraction of data from warehouses and weblogs.
  • Used Pig as ETL tool to do transformations, event joins, filters and some pre-aggregations before storing the data onto HDFS.
  • Optimized MapReduce code and Pig scripts; performed user interface analysis, performance tuning and analysis.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for dashboard reporting (see the sketch following this list).
  • Loaded the aggregated data onto DB2 for reporting on the dashboard.
  • Implemented Partitioning and bucketing in Hive.
  • Experience in managing and reviewing Hadoop log files.
  • Extensively used Pig for data cleansing.
  • Configured Flume to extract the data from the web server output files to load into HDFS.
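
Illustrative sketch: the kind of dashboard metric query run against the partitioned, bucketed data above. The project executed such queries directly in Hive on MapReduce; the query is wrapped in Spark/Scala here only so that all code sketches in this document use one language, and the table and column names (offers_db.offer_redemptions, txn_dt, merchant_id, etc.) are hypothetical.

  import org.apache.spark.sql.SparkSession

  // Minimal sketch of a dashboard metric over a date-partitioned table;
  // names are placeholders, not the project's real schema.
  object OfferMetrics {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("OfferMetrics")
        .enableHiveSupport()
        .getOrCreate()

      // Filtering on the partition column (txn_dt) lets the engine prune
      // partitions, so only the requested date range is scanned.
      val metrics = spark.sql(
        """
          |SELECT merchant_id,
          |       COUNT(DISTINCT card_member_id) AS unique_redeemers,
          |       SUM(redemption_amount)         AS total_redeemed
          |FROM   offers_db.offer_redemptions
          |WHERE  txn_dt BETWEEN '2013-01-01' AND '2013-01-31'
          |GROUP  BY merchant_id
        """.stripMargin)

      metrics.show(20)   // in practice this would feed a reporting table or extract
      spark.stop()
    }
  }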
Environment
JDK 1.6, CentOS, HDFS, MapReduce, Java, Eclipse, Hive, Pig, Sqoop, Python, Flume, ZooKeeper, Oozie, DB2, MySQL and HBase.
QA & SAC (Data Warehouse), Danske Bank
Duration
Jan 2011 to June 2012
Role
Data Analyst /Tech lead
Responsibilities
Technologies: Erwin r7.1, Informatica 7.1.3, Windows XP/NT/2000, SQL, Oracle 10g, DB2, MS Excel, MS Visio.

  • Worked on multiple projects with different business units, including insurance actuaries.
  • Gathered the various reporting requirements from the business analysts.
  • Gathered all the sales analysis report prototypes from business analysts across different business units; participated in JAD sessions discussing various reporting needs.
  • Reverse engineered the reports and identified the data elements (in the source systems), dimensions, facts and measures required for new report enhancements.
  • Conducted design discussions and meetings to arrive at the appropriate data mart at the lowest level of grain for each of the dimensions involved.
  • Designed a star schema for the detailed data marts and plan data marts involving conformed dimensions.
  • Created and maintained the data model repository per company standards.
  • Conducted design reviews with the business analysts and content developers to create a proof of concept for the reports.
  • Worked with the implementation team to ensure a smooth transition from the design phase to the implementation phase.
  • Worked closely with the ETL (SSIS) developers to explain the complex data transformation logic.
Corporate Reporting System (CRS), Chartis Insurance Company (formerly AIG)
Duration
Dec 2005 to Jan 2011
Role
Project Lead/Onsite Coordinator/Tech Lead
Responsibilities
Technologies: COBOL, DB2, JCL, Control-M, CA-7, Platinum, ChangeMan, ManageNow, QC, Expediter, JHS, Java, Web Services.

  • Coordinate with the client and offshore resources.
  • Requirement gathering and writing technical specifications.
  • Interacting with users to get the signoffs and change requests.
  • Impact analysis and estimation for the work request.
  • Providing value-added services to the application.
  • Review of design/approach documents and quality control activities.
  • Coding, unit testing, system testing and user acceptance testing.
  • Involved in handling development, enhancement and maintenance work requests.
  • Providing 24x7 production support for mainframe and mainframe-based applications on a Service Level Agreement (SLA) basis. Responsibilities include:
    • Monitoring daily, weekly, monthly, yearly and ad hoc batch cycles and jobs.
    • Resolving job failures, restarting jobs and force-completing jobs as needed.
    • Interacting with the production/operations team in case of job ABENDs and/or environment issues.
    • Interacting with the on-call DBA to resolve database issues.
Membership Application, Regence BlueCross BlueShield of Oregon
Duration
Oct 2004 to Dec 2005
Role
Team member
Responsibilities
Technologies: COBOL, IMS DB, JCL, Java, Web Services, MQ, CA-7, Platinum, Librarian, ManageNow, QC.

  • Production support and test region support activities.
  • Preparation of impact analysis documents.
  • Preparation of RCA documents and defect prevention.
  • Communicated with the client to gather requirements.
  • Analysis, detailed design and code reviews.
  • Coding, unit testing and integration testing.