How to Detect Base64 Encoding in Images

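This article introduces several ways to detect whether an image contains Base64-encoded data. Both approaches run OCR on the image and inspect the extracted text, so they apply to Base64 that appears as visible text in the image. Python code is shown below as an example.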

Method 1: Using a library

You can open the image with Python's PIL (Python Imaging Library), extract the text in it with pytesseract, and then check whether that text is Base64-encoded.

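from PIL import Image
import pytesseract
import base64

def detect_base64_encoding(image_path):
    # Open the image
    image = Image.open(image_path)
    # Extract text from the image via OCR (requires the Tesseract binary)
    text = pytesseract.image_to_string(image)
    # Remove the spaces and line breaks that OCR output usually contains
    candidate = "".join(text.split())
    # Check whether the text is Base64-encoded. validate=True rejects
    # characters outside the Base64 alphabet instead of silently skipping them.
    try:
        base64.b64decode(candidate, validate=True)
        detected = bool(candidate)  # empty text does not count as a detection
    except ValueError:  # also covers binascii.Error, a subclass of ValueError
        detected = False
    if detected:
        print("Base64 encoding detected.")
    else:
        print("No Base64 encoding detected.")

# Run with the image path specified
image_path = "sample_image.png"
detect_base64_encoding(image_path)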
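
The validation step in Method 1 can be tried without an image or an OCR engine. The following is a minimal sketch that assumes the text has already been extracted; `is_strict_base64` is a hypothetical helper name used only for illustration, not part of any library.

```python
import base64


def is_strict_base64(text):
    """Return True if text, ignoring whitespace, decodes as strict Base64."""
    # OCR output usually contains spaces and line breaks; drop them first.
    candidate = "".join(text.split())
    if not candidate:
        return False
    try:
        # validate=True raises on characters outside the Base64 alphabet
        # and on incorrect padding.
        base64.b64decode(candidate, validate=True)
    except ValueError:
        # binascii.Error is a subclass of ValueError; non-ASCII input
        # also raises ValueError.
        return False
    return True


print(is_strict_base64("SGVsbG8gV29ybGQ="))
print(is_strict_base64("Hello, world!"))
```

Note that a short ordinary word whose length is a multiple of four and that uses only letters and digits also passes this check, so a positive result is best treated as a hint rather than proof.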

Method 2: Using pattern matching

You can also use a regular expression to search the text extracted from the image and check whether any part of it matches the pattern of Base64-encoded data.

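import re
from PIL import Image
import pytesseract

def detect_base64_encoding(image_path):
    # Open the image
    image = Image.open(image_path)
    # Extract text from the image via OCR
    text = pytesseract.image_to_string(image)
    # Check whether the text contains a run that matches the Base64 pattern.
    # At least four 4-character groups are required (an arbitrary threshold)
    # so that the empty string and short words do not count as matches.
    pattern = r"(?:[A-Za-z0-9+/]{4}){4,}(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
    match = re.search(pattern, text)
    if match:
        print("Base64 encoding detected.")
    else:
        print("No Base64 encoding detected.")

# Run with the image path specified
image_path = "sample_image.png"
detect_base64_encoding(image_path)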
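
A pattern match alone only shows that a run of characters looks like Base64. One possible refinement, sketched below, is to combine the two methods: find candidate runs with a regular expression, then keep only those that actually decode. The function name `find_base64_candidates` and the default `min_len=16` are assumptions made for this example, and the input is taken to be text that has already been extracted from the image.

```python
import base64
import re


def find_base64_candidates(text, min_len=16):
    """Return substrings of text that look like Base64 and decode cleanly."""
    # A run of at least min_len Base64-alphabet characters,
    # optionally followed by up to two '=' padding characters.
    pattern = re.compile(r"[A-Za-z0-9+/]{%d,}={0,2}" % min_len)
    results = []
    for match in pattern.finditer(text):
        token = match.group(0)
        # Padded Base64 always has a length that is a multiple of 4.
        if len(token) % 4 != 0:
            continue
        try:
            base64.b64decode(token, validate=True)
        except ValueError:
            continue
        results.append(token)
    return results


sample = "token: " + base64.b64encode(b"hello base64 world").decode("ascii")
print(find_base64_candidates(sample))
```

The length threshold is a trade-off: a lower value catches short payloads but also matches more ordinary words, while a higher value reduces false positives at the cost of missing short strings.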