AI Engineer – Data Pipeline Specialist

June 17, 2025

CLASSIFICATION
Volunteer

LOCATION
Remote – Must be willing to collaborate across time zones as needed

REPORTS TO
CTO

About the Organization

AI for Vietnam (AIV) is a 501(c)(3) nonprofit dedicated to accelerating Vietnam’s AI transformation. Our mission is to integrate Vietnamese language and culture into global AI innovation, ensuring that AI solutions are inclusive, accessible, and beneficial to all. Through open collaboration, cutting-edge research, and a strong global network, AIV is building an ecosystem where local developers, researchers, and businesses can thrive in the AI era.

Position Summary

We’re looking for an AI Engineer specializing in data pipelines to join our team at AI for Vietnam. In this role, you’ll design and implement robust data infrastructure supporting our AI pretraining workflows, with a focus on metadata extraction, data cleaning, and filtering systems for our data portal. Your work will directly contribute to building AI systems that address unique challenges in the Vietnamese context.

Primary Responsibilities

  • Design and build scalable data pipelines for processing, cleaning, and filtering large datasets
  • Develop systems to automatically extract and organize metadata from diverse data sources
  • Implement quality control measures to ensure data integrity throughout the pipeline
  • Create efficient filtering mechanisms to identify and remove low-quality or problematic content
  • Optimize pipeline performance to handle increasingly large data volumes

Qualifications and Experience

Education: Bachelor’s degree in Computer Science, Data Science, or a related technical field

Experience:

  • 3+ years of experience building data pipelines for machine learning applications
  • Strong programming skills in Python and experience with data processing frameworks (e.g., Apache Spark, Beam, Airflow)
  • Experience with database technologies and data storage solutions
  • Knowledge of best practices for data cleaning, normalization, and preprocessing
  • Understanding of metadata extraction techniques and schema design 
  • Experience working with large language model training data
  • Knowledge of content filtering approaches for AI safety
  • Familiarity with distributed computing and parallel processing
  • Background in natural language processing or computer vision
  • Experience with Vietnamese language data processing

Skills

Technical skills: 

  • Machine learning
  • Big data frameworks: Apache Hadoop, Apache Spark
  • Data modeling: Creating efficient data models for storage and analysis
  • Data visualization: Presenting data insights through charts and graphs
  • Data analysis: Performing exploratory data analysis to identify patterns and trends
  • Software development practices: Agile methodologies, version control systems 

Other skills

  • Communication skills: Effectively communicating technical concepts to stakeholders
  • Teamwork: Collaborating with data scientists, analysts, and other engineers
  • Problem-solving: Identifying and resolving data quality issues
  • Analytical skills: Interpreting data and drawing meaningful conclusions
  • Architecture design: Designing scalable and robust data systems

Work Environment

  • Flexibility: This role may involve coordinating with international team members and volunteers across multiple time zones.
  • Remote / Hybrid Setup: Must be comfortable using digital collaboration tools (Slack, Zoom, etc.).

Why join AI for Vietnam?

  • Opportunity to work on cutting-edge AI systems with national impact
  • Collaborative team environment with experienced AI researchers
  • Professional development and learning opportunities
  • Regular collaboration opportunities with big tech companies and global AI experts to sharpen your skills
  • Chance to contribute to Vietnam’s growth as an AI innovation hub
  • Join the AI Alliance network
