Dec 12, 2025

Multimodal Image Analysis

Alex Rivera

Introduction

Multimodal models accept both text and images as input, so a single request can ask a question about a picture and receive a text answer. This cookbook demonstrates how to analyze images using the Hyperfold API.

Image Inputs

You can provide images to the model as URLs or as base64-encoded data. The basic example below passes an image by URL alongside a text prompt:

import Hyperfold from "hyperfold";

// Create the API client.
const client = new Hyperfold();

// Send a text prompt and an image URL together in a single request.
const response = await client.responses.create({
  model: "hf-2.0",
  input: [
    { type: "text", text: "What objects are in this image?" },
    {
      type: "image_url",
      image_url: { url: "https://example.com/photo.jpg" }
    }
  ],
});

// Print the model's text answer.
console.log(response.output_text);
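
The example above covers the URL case only. For base64-encoded data, a minimal sketch is to wrap the raw image bytes in a standard data URL. The helper names `toDataUrl` and `imagePartFromBytes` are hypothetical. It is also an assumption, not something this cookbook confirms, that the `image_url.url` field accepts a data URL in place of an https URL.

```javascript
// Hypothetical helper: wraps raw image bytes in a base64 data URL.
// Buffer is a Node.js global, so no import is needed.
function toDataUrl(bytes, mimeType) {
  const base64 = Buffer.from(bytes).toString("base64");
  return `data:${mimeType};base64,${base64}`;
}

// Hypothetical helper: builds an image part in the same shape as the URL example.
// ASSUMPTION: image_url.url accepts a data URL; check the API reference.
function imagePartFromBytes(bytes, mimeType) {
  return { type: "image_url", image_url: { url: toDataUrl(bytes, mimeType) } };
}

// The 8-byte PNG file signature stands in for real file contents here.
const pngSignature = Uint8Array.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const part = imagePartFromBytes(pngSignature, "image/png");

console.log(part.image_url.url); // data:image/png;base64,iVBORw0KGgo=
```

In practice the bytes would come from reading a local file, and the resulting part would take the place of the URL-based image entry in `input`.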

Analysis Techniques

The model can perform various types of image analysis:

Object detection and identification
Text extraction (OCR)
Scene description
Visual question answering
Image comparison
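
One way to exercise these analysis types is to vary only the text prompt while keeping the request shape from the Image Inputs example. The sketch below is illustrative. The task keys, the prompt wording, and the `buildInput` helper are all hypothetical. It is also an assumption that a single request may carry more than one image part, which the comparison case relies on.

```javascript
// Hypothetical prompt templates, one per analysis type.
const TASK_PROMPTS = new Map([
  ["objects", "List every distinct object you can identify in this image."],
  ["ocr", "Transcribe all text visible in this image exactly as written."],
  ["scene", "Describe the overall scene in two or three sentences."],
  ["compare", "Describe the differences between these images."],
]);

// Builds an input array: one text part followed by one image_url part per image.
// A string that is not a known task key is used as the prompt itself, which
// covers visual question answering with a free-form question.
// ASSUMPTION: the API accepts several image parts in one input array.
function buildInput(taskOrQuestion, imageUrls) {
  const text = TASK_PROMPTS.get(taskOrQuestion) ?? taskOrQuestion;
  const imageParts = imageUrls.map((url) => ({
    type: "image_url",
    image_url: { url },
  }));
  return [{ type: "text", text }, ...imageParts];
}

// Text extraction from a single image.
const ocrInput = buildInput("ocr", ["https://example.com/receipt.jpg"]);

// Visual question answering with a free-form question.
const vqaInput = buildInput("What color is the car?", ["https://example.com/photo.jpg"]);

// Image comparison with two images in one request.
const compareInput = buildInput("compare", [
  "https://example.com/before.jpg",
  "https://example.com/after.jpg",
]);

console.log(ocrInput.length, vqaInput.length, compareInput.length); // 2 2 3
```

Each returned array can be passed as the `input` field of `client.responses.create`, exactly where the hand-written array appears in the earlier example.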

Use Cases

Common applications include document processing, product image analysis, content moderation, and accessibility features like image descriptions.