AI Audio & Speech Engineer (Whisper Specialist)

Job updated 3 months ago

Job Description

About the Role:
We are seeking an experienced Machine Learning Engineer to architect and own our end-to-end speech processing pipeline. In this role, you will bridge the gap between signal processing and generative AI, managing the flow from raw audio input to highly accurate transcription. You will be the primary owner of our ASR stack, with a specific mandate to optimize OpenAI’s Whisper for a high-throughput, low-latency cloud production environment.

Key Responsibilities:

1. Production ASR Architecture: Design and build scalable serving infrastructure using tools like Triton Inference Server, TorchServe, or vLLM to handle concurrent audio streams with minimal latency.

2. Whisper Optimization: Push the limits of Whisper models in production. Implement distillation, INT8 quantization, and framework acceleration (e.g., Faster-Whisper, CTranslate2) to drastically reduce inference time without sacrificing accuracy.

3. Advanced Noise Cancellation: Develop and integrate deep learning-based noise suppression models to pre-process audio. Your goal is to ensure high fidelity and low WER (Word Error Rate) even in complex, noisy acoustic environments.

4. Model Fine-Tuning: Customize open-source ASR models (Whisper, Wav2Vec2, HuBERT) on our proprietary datasets to master domain-specific vocabulary, accents, and acoustics.

5. Tech Stack Evolution: Actively evaluate and integrate the latest open-source audio models from the Hugging Face ecosystem to keep our stack on the cutting edge.
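
For reference, the WER (Word Error Rate) metric named in item 3 is conventionally the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is only an illustration of that definition, not our internal evaluation code. The function name and the simple lowercase-and-split normalization are assumptions made for the example; real evaluations usually apply more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption for this sketch: text is normalized only by lowercasing
    and splitting on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("turn on the lights", "turn on the light"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.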



Requirements

Required Qualifications:

1. Deep expertise with Whisper: Proven experience deploying and optimizing Whisper and its variants (Distil-Whisper, Faster-Whisper).

2. Core ML Stack: Strong proficiency in Python and PyTorch.

3. Audio Ecosystem: Extensive experience with Hugging Face Transformers and open-source audio toolkits.

4. Signal Processing: Solid background in Speech Enhancement or Noise Suppression techniques to improve downstream ASR performance.

5. Inference Optimization: Knowledge of ONNX Runtime, TensorRT, or CTranslate2 for accelerating model inference.

Bonus Points:

1. Experience with Speaker Diarization (e.g., PyAnnote).

2. Experience serving models on AWS (SageMaker/EC2) or GCP.

3. Active contributions to open-source audio/AI projects.

1
3 years of experience required
Negotiable
Aiello Inc. 犀動智能
Artificial Intelligence / Machine Learning
51 ~ 200 people

About us

Company Snapshot: Welcome to Aiello Inc.! 🚀

At Aiello Inc., we're not just another tech startup – we're a dynamic force in the world of technology, pioneering the future of conversational AI voice services. Powered by cutting-edge machine learning (ML) models designed for natural language understanding (NLU), our offerings span from Omni-channel customer communication integration to adaptable and continuously optimized commercial NLU modules. We convert unstructured enterprise data into structured brilliance, assist in building business data analysis backends, create industry knowledge graphs, and conduct proprietary ASR optimization training.

Imagine a world where businesses seamlessly communicate, adapt, and optimize using our state-of-the-art Conversational AI modules.

Impact on the Industry:

Conversational AI isn't just a buzzword; it's a revolution transforming finance, retail, customer service, travel, and healthcare.

Over the past three years, the conversational AI market has witnessed explosive growth, reaching $5.1 billion, and projections anticipate it will skyrocket to $46.29 billion by 2028 (Data Bridge). Aiello Inc. stands at the forefront of that growth. Investors are nodding in approval, and Headline Asia recognizes us as a Top 5 Startup in the #Sound Industry (Japan and Taiwan edition).

Meet the A-Team:

Our dream team comprises global rockstars from Google, Qualcomm, MediaTek, and more. Originating in Taipei, we've spread our wings to Tokyo, Bangkok, and beyond. We're on a mission to turbocharge industries, break into the Asian scene, offer multilingual solutions, and conquer markets in Japan and Southeast Asia. Ready to join the international AI party with us?

The Aiello Essence: Where Vibe Meets Mindset

"Work hard, play hard" isn't just a motto; it's our way of life. Our team of tech enthusiasts values sharing AI tech knowledge, embracing change, and savoring those 'aha' moments. Aiello Inc. is on the lookout for passionate individuals ready to embark on an entrepreneurial journey and create the next big thing.

Guided by Aiello Principles - Our Collective Mindset:

  • Positive and Pursuing Excellence
  • Focus on Important and Right Things
  • Courage to Try and Embrace Change
  • Willingness to Share
  • Work Hard, Play Hard

Discover our super DNA – Be an Aielloer:

  • Business Focus: Aim for business bullseyes.
  • Problem-Solving and Decision Making: Troubleshoot like a boss.
  • Teamwork and Collaboration: Work seamlessly within a team.
  • Managing Through Change and Uncertainty: Thrive in chaos as change champions.
  • Dynamic Learning Mindset: Become a Learning Ninja, mastering the art of continuous learning.


Our Stellar Approach – The Aiello Way:

  • Worldwide Wizardry: Think global, act international.
  • Quality Crusaders: Set the gold standard.
  • Open-Door Magic: Our secret sauce? Openness!
  • Work + Fun Equation: Where challenge meets fun.
  • Innovation Incubator: A breeding ground for game-changers.