How to archive images from MongoDB to DigitalOcean Spaces
Our case: we have a database with a bunch of images hosted in various other places. Ideally they should all be moved to one bucket on DigitalOcean, and each document in the database should be updated once its image is saved there.
To create a Python script that connects to MongoDB, pulls data from the 'midjourney' database and 'showcase' collection, and uploads the image files to a DigitalOcean Spaces bucket, you can use the following libraries:
pymongo for connecting to MongoDB
requests for downloading image files
boto3 for uploading image files to DigitalOcean Spaces
First, you need to install the required packages:
pip install pymongo requests boto3
import os
import requests
from pymongo import MongoClient
import boto3
# MongoDB setup
mongo_uri = "your_mongodb_connection_uri"
client = MongoClient(mongo_uri)
db = client["dbname"]  # e.g. "midjourney" in our case
collection = db["collection"]  # e.g. "showcase" in our case
# DigitalOcean Spaces setup
do_space_name = "your_space_name"
# Full endpoint URL for the region, e.g. "https://nyc3.digitaloceanspaces.com"
do_space_endpoint = "your_space_endpoint"
access_key = "your_do_spaces_access_key"
secret_key = "your_do_spaces_secret_key"
session = boto3.session.Session()
client_do = session.client("s3",
                           region_name="nyc3",  # change if your Space is in another region
                           endpoint_url=do_space_endpoint,
                           aws_access_key_id=access_key,
                           aws_secret_access_key=secret_key)
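# Function to download and upload image
def process_image(image_url, image_key):
    # Stream the download instead of loading the whole image into memory first
    response = requests.get(image_url, stream=True, timeout=30)
    response.raise_for_status()
    # Let urllib3 undo any gzip/deflate Content-Encoding before the upload
    response.raw.decode_content = True
    client_do.upload_fileobj(response.raw, do_space_name, image_key)
    print(f"Uploaded image: {image_key}")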
# Main logic
documents = collection.find({"image_uploaded": {"$ne": True}})
for document in documents:
    image_url = document["image_path"][0]
    image_key = os.path.basename(image_url)
    try:
        process_image(image_url, image_key)
        collection.update_one({"_id": document["_id"]}, {"$set": {"image_uploaded": True}})
        print(f"Updated document: {document['_id']}")
    except Exception as e:
        print(f"Error processing image {image_url}: {e}")
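One caveat with the script above: os.path.basename(image_url) keeps any query string in the key (for example cat.png?width=512), and two images with the same file name would overwrite each other in the bucket. A minimal sketch of one way around this, assuming you are fine with prefixing each key with the document's _id. The helper name make_image_key is hypothetical and not part of the original script:

```python
import posixpath
from urllib.parse import unquote, urlsplit


def make_image_key(image_url, doc_id):
    """Build an object key from the URL path only, prefixed with the document id."""
    # urlsplit separates the path from the query string and fragment.
    path = urlsplit(image_url).path
    filename = posixpath.basename(unquote(path))
    if not filename:
        raise ValueError(f"No file name in URL: {image_url}")
    # Prefixing with the id keeps keys unique even when file names repeat.
    return f"{doc_id}/{filename}"


print(make_image_key("https://cdn.example.com/a/b/cat.png?width=512", "642f1c"))
# prints: 642f1c/cat.png
```

In the main loop this would replace the os.path.basename call: image_key = make_image_key(image_url, document["_id"]).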
Make sure to replace the placeholders with your MongoDB connection URI, DigitalOcean Spaces access key, secret key, space name, and space endpoint. The script downloads each image from the URL in the first element of the document's image_path array (image_path[0]) and uploads it to the specified DigitalOcean Spaces bucket. After a successful upload, it sets image_uploaded to True on the document, so a re-run skips images that are already archived.
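Rather than pasting keys into the script, the placeholders could be read from environment variables. A minimal sketch of that idea: the variable names (MONGO_URI, DO_SPACE_NAME and so on) and the load_settings helper are assumptions made up for illustration, not something DigitalOcean or the original script defines:

```python
import os

# Hypothetical variable names; pick whatever fits your deployment.
REQUIRED_VARS = ("MONGO_URI", "DO_SPACE_NAME", "DO_SPACE_ENDPOINT",
                 "DO_SPACES_KEY", "DO_SPACES_SECRET")


def load_settings(env=None):
    """Return the required settings as a dict, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}
```

With this in place, settings = load_settings() at the top of the script would stop it before any download starts if a value is missing, and mongo_uri would become settings["MONGO_URI"].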
Works like a charm...
Don't thank me ;)