An easy way to clean up after experimentation: a Python Amazon S3 Object & Bucket cleanup script

Automate S3 Object & Bucket Deletion Programmatically with Python and Boto3

Introduction

Managing AWS S3 buckets sometimes involves deleting multiple buckets whose names match a specific pattern. To streamline this, we can use a Python script built on the Boto3 library. The script identifies and lists the buckets whose names match a given regular expression, and can optionally delete them, including all of their objects and object versions.

In this blog post, we’ll walk through the requirements, usage, and functionality of our S3 bucket deletion script.

Requirements

To use this script, you need the following:

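  1. Python 3.6+: The script itself only needs Python 3.6 or later (it uses f-strings), but current Boto3 releases require a newer Python 3 version, so check the Boto3 documentation for the minimum it supports.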

  2. Boto3: The AWS SDK for Python. Install it using pip:

    pip install boto3

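  3. AWS Credentials: Make sure your AWS credentials are configured, either with the AWS CLI (aws configure) or through environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY). The identity needs permission to list buckets (s3:ListAllMyBuckets), list and delete object versions (s3:ListBucketVersions, s3:DeleteObject, s3:DeleteObjectVersion), and delete buckets (s3:DeleteBucket).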
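Because every deletion is permanent, it is worth confirming which AWS account your credentials resolve to before running the script. The sketch below is a hypothetical helper (`describe_caller` is not part of the original script) that calls the STS `GetCallerIdentity` operation, which requires no IAM permissions. The client is passed in as a parameter, an assumption made so the function can be tried without touching AWS.

```python
def describe_caller(sts_client):
    """Summarize the identity that the configured credentials resolve to.

    `sts_client` is assumed to be a boto3 STS client (or any object with
    a compatible get_caller_identity() method).
    """
    identity = sts_client.get_caller_identity()
    return f"account {identity['Account']} as {identity['Arn']}"


# Usage (needs configured AWS credentials):
#   import boto3
#   print("Running against", describe_caller(boto3.client("sts")))
```

If the account ID printed is not the one you experimented in, stop before running the cleanup script.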

Code Overview

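The script takes two command-line arguments: a bucket name pattern (a Python regular expression, matched against the start of each bucket name) and an optional dry-run flag. It lists every S3 bucket whose name matches the pattern and, unless the dry-run flag is set, empties and deletes each one.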
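Because the script filters with `re.compile(...).match(...)`, the pattern is anchored at the start of the bucket name but not at the end. The `^` in the `'^sagemaker-'` example is therefore redundant (though harmless), and a pattern without `$` also matches longer names. The hypothetical `filter_buckets` helper below simply isolates the filtering line from `main()` so its behavior can be checked offline; the bucket names are made up for illustration.

```python
import re


def filter_buckets(names, bucket_pattern):
    """Return the bucket names the script's filter would select.

    Same logic as main(): re.compile(pattern).match(name), which anchors
    the pattern at the start of the name but not at the end.
    """
    pattern = re.compile(bucket_pattern)
    return [name for name in names if pattern.match(name)]


# Hypothetical bucket names, for illustration only
names = ["sagemaker-us-east-1-123456789012", "my-sagemaker-logs", "sagemaker", "data-lake"]

print(filter_buckets(names, "sagemaker-"))
# ['sagemaker-us-east-1-123456789012']
print(filter_buckets(names, "sagemaker"))
# ['sagemaker-us-east-1-123456789012', 'sagemaker']
print(filter_buckets(names, ".*sagemaker"))
# ['sagemaker-us-east-1-123456789012', 'my-sagemaker-logs', 'sagemaker']
print(filter_buckets(names, "sagemaker$"))
# ['sagemaker']
```

Note that a leading `.*` widens the match to names that merely contain the text, which can select far more buckets than intended; this is exactly the kind of surprise the dry-run mode is there to catch.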

Here’s the script:

import boto3
import re
import argparse

# Initialize the S3 client

s3 = boto3.client('s3')

def list_buckets():
    """List all S3 buckets"""
    response = s3.list_buckets()
    return [bucket['Name'] for bucket in response['Buckets']]

def empty_bucket(bucket_name):
    """Delete all objects and versions in an S3 bucket"""
    s3_resource = boto3.resource('s3')
    bucket = s3_resource.Bucket(bucket_name)

    # Delete all object versions and delete markers; the collection's
    # batch delete() sends one DeleteObjects request per page of versions
    bucket.object_versions.delete()

    print(f"Emptied bucket: {bucket_name}")

def delete_bucket(bucket_name):
    """Delete an S3 bucket after emptying it"""
    empty_bucket(bucket_name)
    
    # Now delete the bucket itself
    s3.delete_bucket(Bucket=bucket_name)
    print(f"Deleted bucket: {bucket_name}")

def main(bucket_pattern, dry_run):
    # Pattern to match bucket names
    pattern = re.compile(bucket_pattern)

    # List all buckets
    buckets = list_buckets()

    # Filter buckets that match the pattern
    buckets_to_delete = [bucket for bucket in buckets if pattern.match(bucket)]

    if dry_run:
        print("Dry-run mode: The following buckets would be emptied and deleted:")
        for bucket in buckets_to_delete:
            print(bucket)
    else:
        for bucket in buckets_to_delete:
            print(f"Emptying and deleting bucket: {bucket}")
            delete_bucket(bucket)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Delete S3 buckets matching a pattern.")
    parser.add_argument('--bucket-pattern', required=True, help="Pattern to match bucket names (e.g., '^sagemaker-')")
    parser.add_argument('--dry-run', action='store_true', help="Run in dry-run mode to list buckets without deleting them")
    
    args = parser.parse_args()
    main(bucket_pattern=args.bucket_pattern, dry_run=args.dry_run) 
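To see what emptying a versioned bucket involves at the API level, here is a sketch of a low-level alternative to `empty_bucket()`. It is a hypothetical variant, not the script's code: it pages through `ListObjectVersions`, removes both object versions and delete markers with `DeleteObjects` (which accepts at most 1,000 keys per request), and raises if S3 reports any key it could not delete. The S3 client is passed in as a parameter, an assumption made so the function can be exercised with a stub.

```python
def empty_bucket_with_client(s3_client, bucket_name, batch_size=1000):
    """Delete every object version and delete marker in a bucket.

    `s3_client` is assumed to be a boto3 S3 client. Returns the number of
    entries deleted. batch_size must not exceed 1,000, the DeleteObjects limit.
    """
    paginator = s3_client.get_paginator("list_object_versions")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket_name):
        # Both object versions and delete markers must go before DeleteBucket succeeds
        entries = page.get("Versions", []) + page.get("DeleteMarkers", [])
        targets = [{"Key": e["Key"], "VersionId": e["VersionId"]} for e in entries]
        for start in range(0, len(targets), batch_size):
            batch = targets[start:start + batch_size]
            response = s3_client.delete_objects(
                Bucket=bucket_name,
                Delete={"Objects": batch, "Quiet": True},  # Quiet: only failures are reported
            )
            # DeleteObjects reports per-key failures in the response instead of raising
            errors = response.get("Errors", [])
            if errors:
                raise RuntimeError(
                    f"{len(errors)} key(s) not deleted from {bucket_name}: {errors[0]}"
                )
            deleted += len(batch)
    return deleted


# Usage (needs AWS credentials; "my-test-bucket" is a placeholder name):
#   import boto3
#   count = empty_bucket_with_client(boto3.client("s3"), "my-test-bucket")
```

Raising on reported errors surfaces problems such as a missing s3:DeleteObjectVersion permission immediately, instead of later as a less informative "bucket not empty" failure from `delete_bucket`.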

How to Use the Clean-Up Script

  1. Dry-Run Mode: To list the buckets that match the pattern without deleting them, use the --dry-run flag.

    python script.py --bucket-pattern '^sagemaker-' --dry-run

    This command will output the names of all buckets starting with “sagemaker-” without deleting them.

  2. Execution Mode: To actually delete the matched buckets, omit the --dry-run flag.

    python script.py --bucket-pattern '^sagemaker-'

    In this example, the command empties and permanently deletes every bucket whose name starts with “sagemaker-”, without asking for confirmation.
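Execution mode acts immediately, and permanently deleted object versions cannot be recovered. One optional safeguard, sketched here as a hypothetical addition rather than part of the original script, is to print the matched buckets and require the operator to type how many will be deleted. The `input_fn` parameter is an assumption introduced so the prompt can be tested without a terminal.

```python
def confirm_deletion(buckets, input_fn=input):
    """Ask the operator to retype the number of matched buckets.

    Returns True only on an exact match; an empty match list is never
    treated as confirmed.
    """
    if not buckets:
        print("No buckets match the pattern; nothing to delete.")
        return False
    print("The following buckets will be emptied and permanently deleted:")
    for name in buckets:
        print(f"  {name}")
    answer = input_fn(f"Type {len(buckets)} to continue: ")
    return answer.strip() == str(len(buckets))


# Possible wiring in main(), replacing the plain `else:` branch:
#   elif confirm_deletion(buckets_to_delete):
#       for bucket in buckets_to_delete:
#           print(f"Emptying and deleting bucket: {bucket}")
#           delete_bucket(bucket)
```

Asking for the count rather than a simple "yes" forces the operator to actually read the list, which is where an overly broad pattern would show up.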

Conclusion

Automating the deletion of S3 buckets based on a name pattern can save time and reduce the risk of manual errors. By following this guide, you can easily set up and use a Python script to manage your S3 buckets more efficiently.

Feel free to customize the script according to your needs, and make sure to handle it with care, especially when running it in execution mode. Always test with the --dry-run option first to avoid unintended deletions.

Hope it helps! :)