Deploy Hugging Face Models to AWS Lambda in 3 steps
🔗 Source: dev.to
Ever wanted to deploy a Hugging Face model to AWS Lambda but got stuck with container builds, cold starts, and model caching? Here's how to do it in under 5 minutes using Scaffoldly.
TL;DR
1. Create an EFS filesystem named `.cache` in AWS:
   - Go to the AWS EFS Console
   - Click "Create File System"
   - Name it `.cache`
   - Select any VPC (Scaffoldly will take care of the rest!)
2. Create your app from the `python-huggingface` branch:

   ```bash
   npx scaffoldly create app --template python-huggingface
   ```

3. Deploy it:

   ```bash
   cd my-app && npx scaffoldly deploy
   ```
That's it! You'll get a Hugging Face model running on Lambda (using openai-community/gpt2 as an example), complete with proper caching and container deployment.
Pro-Tip: For the EFS setup, you can customize it down to a Single AZ in Burstable mode for even more cost savings. Scaffoldly will match the Lambda Function to the EFS's VPC, Subnets, and Security Group.
✨ Check out the Live Demo and the example code!
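Once the deploy finishes, you can smoke-test the endpoint from Python. A minimal sketch, assuming a deployed function; the URL below is a placeholder for the Function URL that `npx scaffoldly deploy` prints:

```python
# quick_check.py -- smoke-test the deployed endpoint.
# Placeholder URL: substitute the Function URL from your own deployment.
import urllib.request

FUNCTION_URL = "https://your-function-url.lambda-url.us-east-1.on.aws/"

with urllib.request.urlopen(FUNCTION_URL, timeout=60) as resp:
    # The root route returns the generated text as plain text.
    print(resp.read().decode())
```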
The Problem
Deploying ML models to AWS Lambda traditionally involves:
- Building and managing Docker containers
- Figuring out model caching and storage
- Dealing with Lambda's size limits
- Managing cold starts
- Setting up API endpoints
It's a lot of infrastructure work when you just want to serve a model!
The Solution
Scaffoldly handles all this complexity with a simple configuration file. Here's a complete application that serves a Hugging Face model (using openai-community/gpt2 as an example):
```python
# app.py
from flask import Flask
from transformers import pipeline

app = Flask(__name__)

# Loaded once at startup; warm invocations reuse the pipeline.
generator = pipeline('text-generation', model='openai-community/gpt2')

@app.route("/")
def hello_world():
    output = generator("Hello, world,")
    return output[0]['generated_text']
```
```text
# requirements.txt
Flask ~= 3.0
gunicorn ~= 23.0
torch ~= 2.5
numpy ~= 2.1
transformers ~= 4.46
huggingface_hub[cli] ~= 0.26
```
```json
// scaffoldly.json
{
  "name": "python-huggingface",
  "runtime": "python:3.12",
  "handler": "localhost:8000",
  "files": ["app.py"],
  "packages": ["pip:requirements.txt"],
  "resources": ["arn::elasticfilesystem:::file-system/.cache"],
  "schedules": {
    "@immediately": "huggingface-cli download openai-community/gpt2"
  },
  "scripts": {
    "start": "gunicorn app:app"
  },
  "memorySize": 1024
}
```
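Before deploying, you can sanity-check the app locally. A minimal sketch using Flask's built-in test client (assumes the requirements above are installed; the first run will download the model into your local Hugging Face cache):

```python
# test_local.py -- exercise the route in-process, no server needed.
from app import app

with app.test_client() as client:
    resp = client.get("/")
    assert resp.status_code == 200
    print(resp.get_data(as_text=True))
```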
How It Works
Scaffoldly does some clever things behind the scenes:
- Smart Container Building:
  - Automatically creates a Docker container optimized for Lambda
  - Handles all Python dependencies, including PyTorch
  - Pushes to ECR without you writing any Docker commands
- Efficient Model Handling:
  - Uses Amazon EFS to cache the model files
  - Pre-downloads models after deployment for faster cold starts (see the sketch after this list)
  - Mounts the cache automatically in Lambda
- Lambda-Ready Setup:
  - Runs a proper WSGI server (gunicorn)
  - Creates a public Lambda Function URL
  - Proxies Function URL requests to gunicorn
  - Manages IAM roles and permissions
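To make the caching concrete, here's a sketch of what the `@immediately` download step accomplishes, written with `huggingface_hub` instead of the CLI. The `/mnt/.cache` mount path and `HF_HOME` layout are illustrative assumptions, not Scaffoldly's documented internals:

```python
# warm_cache.py -- pre-populate the shared model cache (illustrative only).
import os
from huggingface_hub import snapshot_download

# Assumption: the ".cache" EFS filesystem is mounted into the container and
# exposed to the Hugging Face libraries via the standard HF_HOME convention.
os.environ.setdefault("HF_HOME", "/mnt/.cache/huggingface")

# Fetches the model files once; later invocations find them already on EFS,
# so cold starts skip the network download.
local_dir = snapshot_download("openai-community/gpt2")
print(f"Model cached at: {local_dir}")
```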
What `deploy` looks like
Here's the output from an `npx scaffoldly deploy` command I ran on this example:
Real World Performance & Costs
✅ Costs: ~$0.20/day for AWS Lambda, ECR, and EFS
✅ Cold Start: ~20s for first request (model loading)
✅ Warm Requests: 5-20s (CPU-based inference)
While this setup uses CPU inference (which is slower than GPU), it's an incredibly cost-effective way to experiment with ML models or serve low-traffic endpoints.
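If you want to sanity-check these numbers against your own deployment, here's a rough timing sketch. The URL is a placeholder for your Function URL, and a true cold start only shows up after the function has been idle long enough for Lambda to recycle it:

```python
# measure_latency.py -- rough request timing against the Function URL.
import time
import urllib.request

FUNCTION_URL = "https://your-function-url.lambda-url.us-east-1.on.aws/"

for i in range(3):
    start = time.perf_counter()
    with urllib.request.urlopen(FUNCTION_URL, timeout=120) as resp:
        resp.read()
    print(f"Request {i + 1}: {time.perf_counter() - start:.1f}s")
```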
Customizing for Other Models
Want to use a different model? Just update two files:

1. Change the model in `app.py`:

   ```python
   generator = pipeline('text-generation', model='your-model-here')
   ```

2. Update the download in `scaffoldly.json`:

   ```json
   "schedules": {
     "@immediately": "huggingface-cli download your-model-here"
   }
   ```
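For example, here's a sketch of `app.py` swapped over to a sentiment-analysis pipeline. The model choice (`distilbert-base-uncased-finetuned-sst-2-english`) and the query-parameter interface are illustrative, not part of the template:

```python
# app.py -- illustrative variant: sentiment analysis instead of generation.
from flask import Flask, request
from transformers import pipeline

app = Flask(__name__)
classifier = pipeline('sentiment-analysis',
                      model='distilbert-base-uncased-finetuned-sst-2-english')

@app.route("/")
def classify():
    # e.g. GET /?text=I+love+serverless
    text = request.args.get("text", "Hello, world!")
    result = classifier(text)[0]
    return {"label": result["label"], "score": result["score"]}
```

Remember to mirror the same model id in the `"@immediately"` schedule so it gets pre-cached on EFS.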
Using Private or Gated Models
Scaffoldly supports private and gated models via the HF_TOKEN
environment variable. You can add your Hugging Face token in several ways:
-
Local Development: Add to your shell profile (
.bashrc
,.zprofile
, etc.):
export HF_TOKEN="hf_rH...A"
- CI/CD: Add as a GitHub Actions Secret:
# In your repository settings -> Secrets and Variables -> Actions
HF_TOKEN: hf_rH...A
The token will be automatically used for both downloading and accessing your private or gated models.
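To confirm the token is visible before deploying, a quick check with `huggingface_hub` (reads `HF_TOKEN` from the environment):

```python
# check_token.py -- verify HF_TOKEN works with the Hugging Face Hub.
import os
from huggingface_hub import whoami

# whoami() raises if the token is missing or invalid.
info = whoami(token=os.environ["HF_TOKEN"])
print(f"Authenticated as: {info['name']}")
```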
CI/CD Bonus
Scaffoldly even generates a GitHub Action for automated deployments:
```yaml
name: Scaffoldly Deploy
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: scaffoldly/scaffoldly@v1
        with:
          secrets: ${{ toJSON(secrets) }}
```
Try It Yourself
The complete example is available on GitHub:
scaffoldly/scaffoldly-examples#python-huggingface
And you can create your own copy of this example by running:
```bash
npx scaffoldly create app --template python-huggingface
```
You can see it running live (though responses might be slow due to CPU inference):
Live Demo
What's Next?
- Try deploying different Hugging Face models
- Join the Scaffoldly Community on Discord
- Check out other examples
- Star our repos if you found this useful!
  - The scaffoldly toolchain
  - The Scaffoldly Examples repository

Licenses
Scaffoldly is Open Source, and welcomes contributions from the community.
- The examples are licensed under the Apache-2.0 license.
- The scaffoldly toolchain is licensed under the FSL-1.1-Apache-2.0 license.
What other models do you want to run in AWS Lambda? Let me know in the comments!