Gemini Nano Banana Cheat Sheet for JavaScript Developers
All the Geminis, all the bananas. Generative images have entered a new era with Google’s new Nano Banana model. The rise of sophisticated Large Language Models (LLMs) has opened new frontiers in application development, particularly in multimodal AI: the ability to process and generate content across different data types, such as text and images.
At the forefront of this innovation is the Gemini Nano Banana model (officially gemini-2.5-flash-image), a powerful tool optimized for rapid, high-quality image generation, editing, and processing tasks. For JavaScript and TypeScript developers, harnessing this power requires a set of targeted best practices to ensure efficiency, maintainability, and security. Yes, security too!
This article is designed to provide you with thoughtful, quick, and easy techniques for integrating Gemini Nano Banana into your modern web applications. We leverage established tools, such as the AI SDK, to provide a type-safe, abstracted layer over the underlying API complexity, allowing you to focus purely on the application logic. This abstraction is key to superior code quality and future-proofing your AI features.
The core utility of Gemini Nano Banana in a JavaScript context often revolves around dynamic image generation and editing, such as creating photorealistic composites or stylized images from user-uploaded content. To achieve this successfully, we cover crucial operational practices: from efficiently fetching external images from the web directly into a model-compatible Uint8Array format, to the essential step of converting the model's raw binary output into a Data URL for client-side rendering.
Furthermore, we explore advanced prompt engineering strategies, specifically the technique of interleaving descriptive text and images within the prompt array. This method provides the granular context necessary for the model to execute complex compositional tasks, transforming vague requests into predictable, high-quality results.
Finally, recognizing the role of AI in developer workflows, we emphasize the importance of grounding agentic coding platforms in relevant documentation and integrating security checks through tools like Snyk Studio to ensure that coding speed doesn't compromise application security.
Let’s get started with the cheat sheet and get you up and running with solid Nano Banana development practices.
1. Use the AI SDK
The AI SDK provides a powerful, type-safe, and highly flexible framework for JavaScript and TypeScript developers seeking to build sophisticated AI applications.
While you can use the vendor-provided libraries from OpenAI, Google, and others, there’s value in having an abstraction over model APIs.
By providing a unified, declarative API surface, the SDK abstracts away the complexities of interacting directly with models like Gemini Nano Banana (referred to here as gemini-2.5-flash-image), enabling developers to focus on application logic rather than low-level API management.
This not only streamlines the development process but also ensures superior code quality and maintainability, particularly within a TypeScript environment, where strong typing significantly reduces runtime errors and enhances the developer experience.
Following is an example of using the AI SDK and Google’s Gemini 2.5 Flash Image model (known as Nano Banana):
import { readFile } from 'node:fs/promises'
import { join } from 'node:path'
import { generateText } from 'ai'
import { createGoogleGenerativeAI } from '@ai-sdk/google'

// Load the reference image from disk and convert it to a Uint8Array
const yodaImagePath = join(process.cwd(), 'data', 'yoda.jpg')
const yodaImageBuffer = await readFile(yodaImagePath)
const yodaImageUint8Array = new Uint8Array(yodaImageBuffer)

// Use AI SDK with generateText for Gemini 2.5 Flash Image (Nano Banana)
// This model supports image editing/generation with input images
// `apiKey`, `userImageUint8Array`, and `imageFile` are assumed to come from
// your server route (e.g., environment config and the uploaded user image)
const googleProvider = createGoogleGenerativeAI({ apiKey: apiKey })

const result = await generateText({
  model: googleProvider('gemini-2.5-flash-image'),
  prompt: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: 'Generate a realistic selfie photo of the person in the first image together with Yoda from Star Wars (shown in the second image). The person joins Yoda in the scene where he is at and hugs him in selfie style while they are both smiling and looking at the camera. The image should look natural and photorealistic, as if taken with a smartphone camera.'
        },
        {
          type: 'image',
          image: userImageUint8Array,
          mediaType: imageFile.type || 'image/jpeg'
        },
        {
          type: 'image',
          image: yodaImageUint8Array,
          mediaType: 'image/jpeg'
        }
      ]
    }
  ]
})

2. Extract Nano Banana model results into a data URL
After successfully generating an image using a multimodal model like Gemini Nano Banana (gemini-2.5-flash-image), the result is typically a raw image file, often as a Uint8Array buffer.
To display this image immediately on a client-side frontend application (such as a website built with React, Vue, or plain JavaScript), the raw binary data must be converted into a format that the web browser can natively interpret.
The Data URL format provides the most direct and efficient method for this, embedding the base64-encoded image data directly into the src attribute of an <img> tag, allowing the generated result to be instantly rendered for the end-user without requiring a separate server endpoint or saving the file to disk.
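Before looking at the server-side conversion, here is a minimal sketch of how a frontend might consume such a data URL once the server returns it. The /api/generate-selfie endpoint and the #selfie-input and #result element IDs are illustrative assumptions, not part of the code that follows.

// Client-side sketch (illustrative endpoint and element IDs; adapt to your app)
const fileInput = document.querySelector<HTMLInputElement>('#selfie-input')
const resultImage = document.querySelector<HTMLImageElement>('#result')
const file = fileInput?.files?.[0]

if (file && resultImage) {
  const formData = new FormData()
  formData.append('image', file)

  // Hypothetical server route that runs the generateText() call from the previous section
  const response = await fetch('/api/generate-selfie', { method: 'POST', body: formData })
  const { image } = await response.json()

  // `image` is a data URL (data:image/png;base64,...), so it can be assigned directly
  resultImage.src = image
}

On the server side, the route handler extracts the generated image and builds the data URL like this: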
try {
  // Call the model with await generateText({})
  // ...

  // Extract the generated image from result.files
  const generatedImageFile = result.files?.find(file => file.mediaType?.startsWith('image/'))

  if (!generatedImageFile) {
    console.error('=== Error: No image file found ===')
    console.error('Available files:', result.files)
    console.error('Result text content:', result.text)
    throw new Error('No image was generated in the response')
  }

  // Convert Uint8Array to base64 data URL for the frontend
  const base64String = Buffer.from(generatedImageFile.uint8Array).toString('base64')
  const dataUrl = `data:${generatedImageFile.mediaType};base64,${base64String}`

  return {
    success: true,
    image: dataUrl
  }
} catch (error: any) {
  console.error('Error generating selfie:', error)
  throw createError({
    statusCode: error.statusCode || 500,
    message: error.message || 'Failed to generate selfie'
  })
}

3. Fetch image data from the web
Instead of loading the image file from the local filesystem, you can also fetch it on the fly from a remote URL and convert the response into a compatible Uint8Array format for the model API.
This is a common scenario for modern JavaScript applications working with remote assets: loading from the local filesystem might be restricted by security policies or simply not feasible for dynamic web content. Fetching on the fly gives you greater flexibility, allowing the model to process images hosted on content delivery networks (CDNs) or external servers.
To make this remote data compatible with the model API, which expects a specific data structure such as a Uint8Array, you need to perform a conversion. The standard procedure is to use the fetch API to retrieve the resource, obtain the raw response as an ArrayBuffer, and then convert it into the required Uint8Array format.
// Fetch Yoda image
const yodaResponse = await fetch('https://example.com/image.jpg')

if (!yodaResponse.ok) {
  throw new Error('Failed to fetch Yoda image')
}

const yodaArrayBuffer = await yodaResponse.arrayBuffer()
const yodaImageUint8Array = new Uint8Array(yodaArrayBuffer)

This process ensures that the binary data of the image, whether a JPEG, PNG, or other supported format, is correctly represented as an array of 8-bit unsigned integers, which is the expected input format for the model's processing functions. This seamless integration of remote data fetching and format conversion is crucial for building robust and dynamic JavaScript applications powered by multimodal AI.
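As a small variation, you may prefer not to hardcode the media type. Assuming the remote host sets an accurate Content-Type header (an assumption worth validating), you can derive the mediaType from the response and pass it along with the bytes. This is a sketch, not a requirement of the API:

// Variation: derive the media type from the response headers instead of hardcoding it
// (assumes the remote server sends an accurate Content-Type header)
const response = await fetch('https://example.com/image.jpg')

if (!response.ok) {
  throw new Error(`Failed to fetch image: ${response.status}`)
}

const mediaType = response.headers.get('content-type') ?? 'image/jpeg'
const imageUint8Array = new Uint8Array(await response.arrayBuffer())

// The pair can then be used in a prompt content entry:
// { type: 'image', image: imageUint8Array, mediaType }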
4. Interleave descriptive text for detailed prompts
The model API and the generateText({}) API don’t provide a way to annotate each image with a description of what it is, so if you specify several files, their order is implicit and could confuse the model, especially if the prompt is ambiguous.
To overcome this and achieve better prompt performance, you can construct the prompt as if it were a sequence of chat messages, where you can interleave text descriptions before and after the images, providing the model with better context to interpret your meaning.
Consider the following method to interleave text as part of the model prompt:
const result = await generateText({
  model: googleProvider('gemini-2.5-flash-image'),
  prompt: [
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: 'COMPOSITION INSTRUCTIONS: Generate a single, hyper-realistic, high-detail selfie photograph. The foundational scene and background must be derived exclusively from the Yoda background image (shown below); this setting serves as the environment for the final image.'
        },
        {
          type: 'text',
          text: '[Yoda Background Image]'
        },
        {
          type: 'image',
          image: yodaImageUint8Array,
          mediaType: 'image/jpeg'
        },
        {
          type: 'text',
          text: 'Integrate the person from the user selfie image (shown below) into the Yoda background, placing them standing close to Yoda. The final image should have a selfie-style perspective (close-up, intimate framing) as if taken from a first-person viewpoint, but DO NOT include any visible hands, smartphones, or camera equipment in the frame. The image should appear as if the camera is invisible, capturing a natural moment without showing the device or hand holding it.'
        },
        {
          type: 'text',
          text: '[User Selfie Image]'
        },
        {
          type: 'image',
          image: userImageUint8Array,
          mediaType: imageFile.type || 'image/jpeg'
        },
        {
          type: 'text',
          text: 'Both individuals must be smiling, looking directly into the lens, and their precise facial likenesses, clothing, and details from their respective input images must be accurately and seamlessly preserved. Maintain consistent, warm, soft, natural daylight and ensure the overall vibe is joyful and spontaneous. The final image must appear as a spontaneous, single-shot photograph taken in the original environment of the Yoda background image. IMPORTANT: Do not include any visible hands, smartphones, phones, or camera equipment in the final image - only show the people and background.'
        }
      ]
    }
  ]
})

This technique of interleaving text and media within the prompt's content array is a powerful pattern for gaining granular control over multimodal models, such as Gemini Nano Banana.
By treating the prompt as a structured conversation or narrative, developers move beyond simple text-image inputs and unlock complex tasks, such as precise in-painting, compositional editing, and grounding the generation in specific visual elements. This ultimately leads to more predictable and higher-quality results from the image generation API.
5. Leverage model rules for AI-driven development
If you’re working with Gemini CLI, Claude Code, Cursor, or other agentic coding platforms, you likely want to provide them with context and grounding from the API documentation so they can perform better and avoid hallucinating non-existent method calls. Providing the agent with up-to-date documentation also helps ensure that the LLM coding assistant uses the newer API versions you’ve chosen.
In this example, we’ll work with Cursor rules. Create a file named .cursor/rules/ai-sdk-google-provider-docs.mdc with the following contents:
---
alwaysApply: false
description: AI SDK API documentation for the Google Gemini LLM provider
globs: server/api/generate-selfie.post.ts
---
# Google Generative AI Provider
The [Google Generative AI](https://ai.google.dev) provider contains language and embedding model support for
the [Google Generative AI](https://ai.google.dev/api/rest) APIs.
// ...
// REST OF THE CONTENT
// ...

Simply grab the contents to paste from the official documentation, such as the AI SDK docs at https://v6.ai-sdk.dev/providers/ai-sdk-providers/google-generative-ai#gemini-25-models, or any other API documentation, to provide the model with up-to-date context.
6. Vibe code with security using Snyk Studio
Unfortunately, many developers who vibe code and leverage agentic coding for AI code generation find out too late that they have traded secure coding practices and other application security responsibilities for coding speed.
Real-world examples include leaked secrets and SSRF vulnerabilities, which can be prevented if you equip the coding agent with a security brain.
Snyk Studio integrates into the agentic coding loop by equipping the coding agent with application security checks that run natively within the agent's workflow, via MCP (Model Context Protocol) servers, agent hooks, agent rules, and other mechanisms.
Snyk Studio is free to use and features a 1-click installation for Cursor, GitHub Copilot, and Windsurf, as well as integration with many other coding agents, including the Cline marketplace and Factory AI, among others.
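For illustration, wiring Snyk into an agent like Cursor via MCP might look like the sketch below: a .cursor/mcp.json entry that launches a Snyk MCP server over stdio. Treat the command and arguments here as assumptions and follow the Snyk Studio documentation for the exact setup for your coding agent.

{
  "mcpServers": {
    "snyk": {
      "command": "snyk",
      "args": ["mcp", "-t", "stdio"]
    }
  }
}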

Want to learn more about vibe coding? Check out our on-demand workshop, "Securing Vibe Coding: Addressing the Security Challenges of AI-Generated Code," where Snyk Staff Developer Advocate Sonya Moisset breaks down the security implications of vibe coding and shares actionable strategies to secure AI-generated code at scale.