Zhenming Yang

Data Engineer, Software Engineer

Profile

Data/Software Engineer with demonstrated experience in large-scale (Big) data solutions, including databases (SQL/NoSQL) and ETL/data pipelines built on cloud services. Solid understanding of fundamental Computer Science concepts. Familiar with requirements gathering and the SDLC. Hands-on programming experience in multiple languages, including Python, Java, SQL, and PowerShell. Excellent problem-solving skills and software engineering habits, including peer code reviews and unit testing.

Technical Forte

  • OLTP/OLAP
  • Azure ADF/ADL
  • AWS S3/Glue/Lambda/SNS
  • ETL/ELT/Data Modeling
  • PySpark/boto3
  • PowerShell/R
  • Java/Python
  • PowerBI/Tableau
  • DevOps/Git

Work Experience

Capital Group Investment Group

Data Engineer

2021-Present

    Created Glue jobs after analyzing data from different sources, such as third-party APIs, JSON or ZIP files in S3 buckets, Kinesis Firehose streams, and DynamoDB.
    Used Glue jobs and Lambda functions to handle data processing logic with different kinds of triggers, such as SQS and events.
    Developed and deployed Glue jobs with AWS CDK that extract CloudTrail logs into Parquet files and load them into Athena automatically.
    Troubleshot internal data platform issues with the DaaS team. Improvements included adding data types for completeness constraints and changing the dataset ingestion logic to support multiple datasets within the same ZIP file in the same S3 bucket.

Microsoft Office IP AntiSpam

Software Developer

2019-2021

    Migrated the Cosmos uploader/downloader from legacy JScript logic to ScopeSDK, T-SQL, and PowerShell. Added alerts and pipeline monitoring to improve durability.
    Troubleshot issues and errors in ETL services. Created probes, monitors, and alerts for different ETL components and services with C# and PowerShell.
    Optimized the data pipeline by rewriting its logic and SQL schema, improving both performance and service durability. Added alerts and visuals to monitor the status of the service.
    Built and maintained a big data platform (ADF/ADLS) in Cosmos Scope in line with industry standards and best practices.
    Performed DB admin activities on multiple DB servers. Troubleshot and tuned slow-running queries, improving them by up to 20%. Improved the efficiency of heavy-lift jobs and queries.
    Recovered customers' sensitive data from old DB backup files online, earning praise from the manager.
    Collaborated closely with cross-functional teams and created efficient data models and structures to support the organization’s data and reporting needs on Avocado, PowerBI, and Lens Explorer.
    Engaged with stakeholders to understand requirements; designed and automated the report/dashboard pipeline with pull/push triggers. Integrated processing steps and waiting time.

Conagra Brands Inc. Cost to Serve

Data Engineer

2018-2019

    Designed and implemented the entire ETL (Extract, Transform, Load) process, using various transactions within SSIS packages to apply business rules and logic.
    Built PowerBI reports and dashboards on a star schema. Implemented complex requirements in PowerBI using DAX and the M language.
    Reduced refresh waiting time by calling the PowerBI API from PowerShell scripts; deployed and automated the scripts on CtrlM servers.
    Gained agile product management experience across the full cycle: creating stories, building PowerBI reports and dashboards, and working through user acceptance testing (UAT).
    Held scrum meetings as scrum master. Wrote various documents, such as data mapping documents and work detail descriptions, for KT sessions and future development.

KHDHC Inc.

SQL ETL Developer

2013-2015

    Created logical and physical data models from requirements using Erwin and Navicat.
    Improved the performance of the official website by optimizing slow-running queries, using indexes, partitions, and deadlock monitoring via SQL Profiler and DTA.
    Normalized database tables to avoid redundancy and DML anomalies.
    Developed complex SQL scripts and procedures for data profiling and auditing purposes. Combined C# logic in SSIS packages.
    Used Microsoft SSIS/SSAS/SSRS to handle ETL logic and create ad-hoc reports with both tabular and cube models.
    Developed various dashboards and stories using advanced Tableau features, including calculated fields, parameters, table calculations, row-level security, R integration, and dashboard actions, while working with cloud data in SSRS.

Other Projects

Kaggle Data Science projects

2017-2018

    Performed data preprocessing and feature engineering, including correlation analysis, Box-Cox transformation, encoding, etc.
    Built several lasso regression and XGBoost models and tuned their hyperparameters with cross-validation.
    Built an ensemble model that averages the predictions of all models.

Graduation Project: Diamond price prediction.

2017-2018

    Collected diamond data (570K+ records) from BlueNiles with a PhantomJS-based JS crawler. Followed with data mining steps (data labeling, data cleansing, one-hot encoding, PCA, etc.). Created a predictive model that predicts price with up to 93% accuracy under 10-fold cross-validation.
    Implemented multiple ML models (linear regression, decision tree, and random forest) using OpenR and the Intel MKL library.
    Developed an interactive, mobile-friendly GUI platform for presenting ML models with the Bootstrap framework and an R Shiny server.

Education

Oregon State University - Corvallis, Oregon

Master in Computer Science

Certificates

Microsoft Certified: Azure Database Administrator Associate

Microsoft Certified: Data Analyst Associate