Zhenming Yang

Data Engineer, Software Engineer

Profile

Software Data Engineer with strong experience in Big Data solutions, SQL/NoSQL databases, and building ETL/ELT pipelines using cloud services. Proficient in AWS cloud technologies, dbt for analytics engineering, Docker for containerization, and Apache Airflow for workflow orchestration. Skilled in Python, Java, SQL, and PowerShell, with a solid foundation in Computer Science concepts and software engineering best practices. Experienced in requirement gathering and SDLC, with a focus on scalable, efficient data solutions.


As a Green Card holder, I do not need any sponsorship in the future.

Skills

Data Engineering

Proven ability to work with cloud data platforms and multiple programming languages. Proficient in SQL and ETL processes. Extensive experience with data modeling and with large-scale OLAP management and operations. Skilled at creating varied ad-hoc visuals to meet stakeholders' requirements. Understands the value of data.

Software Engineering

Solid understanding of fundamental Computer Science concepts.

Technical Forte

  • On-prem/Cloud DB
  • AWS/Azure
  • PySpark/Snowflake/Kafka
  • ETL/ELT/Data Lake/Delta Lake
  • dbt/Docker/Airflow
  • PowerShell/R
  • Python/SQL/GraphQL
  • Data Visualizations
  • DevOps/Git

Work Experience

Capital Group Investment Group

Data Engineer

2021 Nov - 2024 Oct

    Developed AWS Glue jobs based on analysis of data from different sources, such as files in S3, CloudWatch/CloudTrail logs, and third-party APIs.
    Used Glue jobs and Lambda functions to handle data processing logic, invoked by different kinds of triggers such as SQS and Glue catalog events.
    Automated an ETL process using Glue jobs defined with AWS CDK; the process extracts CloudTrail logs into Parquet files and loads them into an Athena database.
    Initiated, developed, and managed an ETL pipeline. Reimplemented IRR, MoM, FundReturnRatio, and other Excel functions in PySpark. The pipeline included more than 20 Lambda and Glue jobs for data staging, logic processing, and dataset validation. All permissions and resource access were defined in AWS CDK.
    Persisted the ETL results to a PostgreSQL database via AWS Glue jobs. Partitioned and indexed the incoming monthly datasets with predefined stored procedures to improve performance.
    Managed and automated the Glue jobs in dbt, including data validations, SCD checks, and documentation generation. Modularized the intermediate steps and materialized them as tables. Also used seeds for ad-hoc analysis and macros for user-defined functions.
    Rendered PostgreSQL datasets as Tableau visuals and published them to different workspaces. Implemented role-level security to restrict data access for different requesters.
    Read and backfilled data from the Dremio data source in AWS Glue jobs, authenticating with a rotated PAT (personal access token). With optimized indexes, the API outperformed comparable APIs from other teams.
    Modified stored procedures and tuned the data-archive stored procedure by adjusting indexes and job schedules. Front-end queries against the PostgreSQL database now return in milliseconds rather than minutes.

Microsoft Office IP AntiSpam

Software Developer

2019 May - 2021 May

    Migrated the Cosmos uploader/downloader from legacy JScript logic to ScopeSDK, T-SQL, and PowerShell. Added alerts and pipeline monitoring to improve durability.
    Troubleshot issues and errors in ETL services. Built probes, monitors, and alerts for different components and services with C#, Scope, and PowerShell.
    Built and maintained a big data platform (ADF/ADLS) using Cosmos Scope scripts, following industry standards and best practices.
    Performed database administration on multiple database servers. Optimized the data pipeline by rewriting its logic and SQL schema, improving both performance and service durability. Reduced storage costs by up to 20% by archiving and re-indexing stale data, which also improved query performance.
    Optimized slow-running Scope queries by using Scope hints. Generated TB-level data on a daily basis. Implemented ad-hoc data status monitors and dashboards with Power BI and the Lens platform. Delivered weekly and monthly reports to end users.

Conagra Brands Inc. Cost to Serve

Data Engineer

2018 Nov - 2019 Mar

    Designed and implemented the end-to-end ETL (Extract, Transform, Load) process in SSIS packages, using various transactions to implement business rules and logic.
    Built Power BI reports and dashboards on a star schema. Implemented complex requirements in Power BI using DAX and the M language.
    Reduced refresh wait time by calling the Power BI API from PowerShell scripts; deployed and automated the scripts on Control-M servers.
    Created custom visuals with R libraries. Embedded the Power BI visuals in the Salesforce homepage using the PowerBI-JavaScript libraries.
    Gained Agile product management experience across the full cycle: creating stories, building Power BI reports and dashboards, and working through user acceptance testing (UAT).
    Held Scrum meetings as Scrum Master. Wrote documentation such as data mapping documents and detailed work descriptions for KT sessions and future development.

KHDHC Inc.

SQL ETL Developer

2013 Sep - 2015 Mar

    Created logical and physical data models from requirements using Erwin and Navicat.
    Improved the performance of the official website by optimizing slow-running queries, using indexes, partitions, and deadlock monitoring via SQL Profiler and DTA.
    Normalized database tables to avoid redundancy and DML anomalies.
    Developed complex SQL scripts and procedures for data profiling and auditing. Integrated C# logic into SSIS packages.
    Used Microsoft SSIS/SSAS/SSRS to handle ETL logic and create ad-hoc reports with both tabular and cube models.
    Developed various dashboards and stories using advanced Tableau features, including calculated fields, parameters, table calculations, row-level security, R integration, and dashboard actions, while working with cloud data in SSRS.

Other Projects

Kaggle Data Science projects

2017-2018

    Performed data preprocessing and feature engineering, including correlation analysis, Box-Cox transformation, encoding, etc.
    Built several lasso regression and XGBoost models and tuned their hyperparameters with cross-validation.
    Built an averaging ensemble over the predictions of all models.

Graduation Project: Diamond price prediction.

2017-2018

    Collected diamond data (570K+ records) from Blue Nile with a PhantomJS-based JavaScript crawler. Applied data mining steps (data labeling, data cleansing, one-hot encoding, PCA, etc.). Created a predictive model that predicts price with up to 93% accuracy under 10-fold cross-validation.
    Implemented multiple ML models (linear regression, decision tree, and random forest) in OpenR with the Intel MKL library.
    Developed an interactive, mobile-friendly GUI platform for presenting ML models with the Bootstrap framework and an R Shiny server.

Education

Oregon State University - Corvallis, Oregon

Master in Computer Science

Certificates

Microsoft Certified: Azure Database Administrator Associate

Microsoft Certified: Data Analyst Associate