Building a Product Discovery API with Gemini Vision Pro


Introduction

The rapid expansion of generative AI model capabilities enables us to develop many businesses around GenAI. Today’s models not only generate text; powerful multimodal models such as GPT-4 and Gemini can also leverage image data to generate information. This capability has huge potential in the business world: for example, you can use any image to get information about it directly from the AI without any overhead. In this article, we will go through the process of using the Gemini Vision Pro multimodal model to get product information from an image and then creating a FastAPI-based REST API to consume the extracted information. So, let’s start learning by building a product discovery API.

Learning Objectives

  • What is REST architecture?
  • Using REST APIs to access web data
  • How to use FastAPI and Pydantic for creating REST APIs
  • What steps to take to build APIs using Google Gemini Vision Pro
  • How to use the LlamaIndex library to access Google Gemini models

This article was published as a part of the Data Science Blogathon.

What is a REST API?

A REST API or RESTful API is an application programming interface (API) that uses the design principles of the Representational State Transfer architecture. It helps developers integrate application components in a microservices architecture.

An API is a way to enable an application or service to access resources within another service or application.

Let’s take a restaurant analogy to understand the concepts.

You are a restaurant owner, so you have two services running when the restaurant is open.

  • One is the kitchen, where the delicious food is made.
  • Two is the seating or table area, where people sit and eat.

Here, the kitchen is the SERVER, where delicious food (data) is produced for the people (clients). The clients check the menu (API) and place an order (request) with the kitchen (server) using specific codes (HTTP methods) like “GET”, “POST”, “PUT”, or “DELETE”.

Understand the HTTP methods using the restaurant analogy

  • GET: The client browses the menu before ordering food.
  • POST: The client places an order, which means the kitchen starts making the food, i.e., creating data on the server for the client.
  • PUT: Imagine that after placing your order, you realize you forgot to add ice cream, so you update the existing order, which means updating the data.
  • DELETE: If you want to cancel the order, you delete the data using the “DELETE” method.

These are the most used methods for building APIs using the REST framework.
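
The analogy above can be sketched in plain Python. This is only an illustration and not part of the project code: the Kitchen class and its method names are hypothetical, and a dictionary stands in for the server's data store. Each method mirrors what the corresponding HTTP method would do to an order resource.

```python
from itertools import count


class Kitchen:
    """Hypothetical in-memory 'server' that stores orders by id."""

    def __init__(self):
        self._orders = {}
        self._ids = count(1)

    def get(self, order_id):
        # GET: read a resource without changing it
        return self._orders.get(order_id)

    def post(self, items):
        # POST: create a new resource and return its id
        order_id = next(self._ids)
        self._orders[order_id] = list(items)
        return order_id

    def put(self, order_id, items):
        # PUT: replace an existing resource with the updated version
        if order_id not in self._orders:
            raise KeyError(f"order {order_id} does not exist")
        self._orders[order_id] = list(items)

    def delete(self, order_id):
        # DELETE: remove the resource; True if something was removed
        return self._orders.pop(order_id, None) is not None


kitchen = Kitchen()
order_id = kitchen.post(["pizza"])             # place an order
kitchen.put(order_id, ["pizza", "ice cream"])  # add the forgotten ice cream
updated_order = kitchen.get(order_id)          # check the order
cancelled = kitchen.delete(order_id)           # cancel it
```

In a real REST API the same four operations are exposed over HTTP routes rather than method calls, which is what the FastAPI code later in this article does for POST and GET.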

What is the FastAPI framework?

FastAPI is a high-performance modern web framework for API development. It is built on top of Starlette for the web parts and Pydantic for data validation and serialization. Its most notable features are:

  • High Performance: It is based on ASGI (Asynchronous Server Gateway Interface), so FastAPI can use asynchronous programming to handle high-concurrency scenarios without much overhead.
  • Data Validation: FastAPI uses the widely used Pydantic library for data validation. We will learn about it later in the article.
  • Automatic API documentation using Swagger UI with full OpenAPI-standard JSON data.
  • Easy Extensibility: FastAPI integrates easily with other Python libraries and frameworks.
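
To see why asynchronous handling helps with concurrency, here is a small sketch that uses only Python's standard asyncio module. It does not use FastAPI itself; handle_request is a hypothetical stand-in for a handler that waits on I/O (for example, a call to an external API), simulated here with asyncio.sleep.

```python
import asyncio
import time


async def handle_request(request_id: int, delay: float = 0.2) -> str:
    # Simulate waiting on I/O; the event loop can serve other requests meanwhile
    await asyncio.sleep(delay)
    return f"response {request_id}"


async def serve_all(n: int) -> list[str]:
    # Run n handlers concurrently on a single event loop
    return await asyncio.gather(*(handle_request(i) for i in range(n)))


start = time.perf_counter()
responses = asyncio.run(serve_all(5))
elapsed = time.perf_counter() - start
```

Because the five waits overlap, the total elapsed time stays close to a single 0.2-second wait instead of adding up to about one second.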

What is LlamaIndex?

LlamaIndex is a tool that acts as a bridge between your data and LLMs. The LLMs can be local, using Ollama (which runs LLMs on a local machine), or an API service such as OpenAI, Gemini, etc. LlamaIndex can be used to build Q&A systems, chat applications, intelligent agents, and other LLM-based applications. It lays the foundation of Retrieval Augmented Generation (see the image below) with ease in three well-defined steps

[Figure: Retrieval Augmented Generation in three steps]

  • Step One: Knowledge Base (Input)
  • Step Two: Trigger/Query (Input)
  • Step Three: Task/Action (Output)

In the context of this article, we will build Step Two and Step Three. We will give an image as input and retrieve the product information for the product in the image.

Set up the project environment

Here is the not-so-good flowchart of the application:

[Figure: flowchart of the application]

I’ll use conda to set up the project environment. Follow the steps below.

Schematic project structure

[Figure: schematic project structure]

Step 1: Create a conda environment

conda create --name api-dev python=3.11

conda activate api-dev

Step 2: Install the required libraries

# LlamaIndex libraries
pip install llama-index llama-index-llms-gemini llama-index-multi-modal-llms-gemini

# Google generative AI
pip install google-generativeai google-ai-generativelanguage

# for API development (Uvicorn serves the app, Jinja2 renders templates, python-dotenv loads .env)
pip install fastapi uvicorn jinja2 python-dotenv

Step 3: Get the Gemini API key

Go to Google AI and click Get an API Key. This takes you to Google AI Studio, where you can create an API key.

[Figure: Google AI Studio API key page]
Source: Google API Page

Keep it safe and save it; we will need it later.

Implementing the REST API

Create a separate folder for the project; let’s name it gemini_productAPI

# create an empty project directory
mkdir gemini_productAPI

# activate the conda environment
conda activate api-dev

To check that FastAPI is installed correctly, create a Python file named main.py and copy the code below into it.

from fastapi import FastAPI

app = FastAPI()

# Landing page for the application
@app.get("/")
def index():
    return "Hello World"

Because FastAPI is an ASGI framework, we will use an asynchronous web server to run the FastAPI application. There are two kinds of server gateway interfaces: WSGI and ASGI. Both sit between a web server and a Python web application or framework and handle incoming client requests, but they do it differently.

  • WSGI or Web Server Gateway Interface: It is synchronous, which means each worker handles one request at a time and blocks the next one until the previous request is completed. The popular Python web framework Flask is a WSGI framework.
  • ASGI or Asynchronous Server Gateway Interface: It is asynchronous, which means it can handle multiple requests concurrently without blocking others. It is more modern and better suited to many clients, long-lived connections, and bidirectional communication such as real-time messaging, video calls, etc.
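
The difference shows up in the shape of the application callable each interface expects. The sketch below writes a bare WSGI app and a bare ASGI app by hand and calls them directly, without any server, to show the two calling conventions. The function names and the tiny fake environ, scope, receive, and send objects are illustrative assumptions; in practice a server such as Gunicorn (WSGI) or Uvicorn (ASGI) supplies them.

```python
import asyncio


def wsgi_app(environ, start_response):
    # WSGI: a plain synchronous callable that returns the body as an iterable
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from WSGI"]


async def asgi_app(scope, receive, send):
    # ASGI: a coroutine that sends the response as a sequence of event messages
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"Hello from ASGI"})


# Call the WSGI app directly with a minimal fake environ
wsgi_status = []
wsgi_body = b"".join(
    wsgi_app(
        {"REQUEST_METHOD": "GET", "PATH_INFO": "/"},
        lambda status, headers: wsgi_status.append(status),
    )
)

# Call the ASGI app directly, collecting the messages it sends
asgi_messages = []


async def fake_receive():
    return {"type": "http.request", "body": b"", "more_body": False}


async def fake_send(message):
    asgi_messages.append(message)


asyncio.run(
    asgi_app({"type": "http", "method": "GET", "path": "/"}, fake_receive, fake_send)
)
```

FastAPI hides this low-level protocol behind its decorators, but the app object it creates is an ASGI callable of the second kind.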

Uvicorn is an Asynchronous Server Gateway Interface (ASGI) web server implementation for Python. ASGI provides a standard interface between async-capable Python web servers, frameworks, and applications. FastAPI is an ASGI framework and is commonly run with Uvicorn.

Now start the Uvicorn server and go to http://127.0.0.1:8000 in your browser. You will see Hello World written on it.

# open your VS Code terminal and type

uvicorn main:app --reload

Now, we’re set to start coding the main project.

Importing Libraries

import os
from typing import List

# loads variables from the .env file
from dotenv import load_dotenv

# fastapi libs
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import HTMLResponse
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates

# pydantic libs
from pydantic import BaseModel, ConfigDict, Field, HttpUrl

# llamaindex libs
from llama_index.multi_modal_llms.gemini import GeminiMultiModal
from llama_index.core.program import MultiModalLLMCompletionProgram
from llama_index.core.output_parsers import PydanticOutputParser
from llama_index.core.multi_modal_llms.generic_utils import load_image_urls

After importing the libraries, create a .env file and put in the Google Gemini API key you got earlier.

# put it in the .env file

GOOGLE_API_KEY="AB2CD4EF6GHIJKLM-NO6P3QR6ST"

Now instantiate the FastAPI class and load the Google API key from the environment

app = FastAPI()
load_dotenv()
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
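
If the key is missing, os.getenv silently returns None and the failure only surfaces later, when the Gemini client is called. A small guard can make that failure explicit at startup. This is an optional sketch and not part of the original project; require_env is a hypothetical helper name, and the demo variable below is a placeholder.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell"
        )
    return value


# Example with a placeholder variable (hypothetical name and value, illustration only)
os.environ["DEMO_API_KEY"] = "not-a-real-key"
demo_key = require_env("DEMO_API_KEY")
```

In main.py, the same helper could be called as require_env("GOOGLE_API_KEY") right after load_dotenv().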

Create a simple landing page

Create a GET method for the project’s simple landing page.

# Landing page for the application

@app.get("/", response_class=HTMLResponse)
def landingpage(request: Request):
    return templates.TemplateResponse(
        "landingpage.html",
        {"request": request},
    )

To render HTML in FastAPI, we use Jinja templates. To do this, create a templates directory at the root of your project, and for static files such as CSS and JavaScript files, create a directory named static. Copy the code below into your main.py after the app instance.

# Linking the templates directory using Jinja templates
templates = Jinja2Templates(directory="templates")

# Mounting static files from the static directory
app.mount("/static", StaticFiles(directory="static"), name="static")

The code above links your templates and static directories to the FastAPI application.

Now, create a file called landingpage.html in the templates directory. Go to the GitHub link and copy the code from /template/landingpage.html to your project.

Truncated code snippets

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Product Discovery API</title>
    <link rel="stylesheet" href="{{ url_for('static', path='/landingpage.css') }}">
</head>
<body>
    <header class="header">
        <div class="container header-container">
            <h1 class="content-head is-center">
            Discover Products with Google Gemini Pro Vision
            ....
    </header>

    <main>
        ....
    </main>

    <footer class="footer">
        <div class="container footer-container">
            <p class="footer-text">© 2024 Product Discovery API. 
            All rights reserved.</p>
        </div>
    </footer>

</body>
</html>

After that, create a directory named static and two files, landingpage.css and landingpage.js, in your static directory. Now, go to the GitHub link and copy the code from landingpage.js to your project.

Truncated code snippets


document.getElementById('upload-btn').addEventListener('click', function() {
    const fileInput = document.getElementById('upload');
    const file = fileInput.files[0];

   ......

document.getElementById('contact-form').addEventListener(
'submit', function(event) {
    event.preventDefault();
    alert('Message sent!');
    this.reset();
}
);

For the CSS, go to the GitHub link and copy the code from landingpage.css to your project.

Truncated code snippets

body {
    font-family: 'Arial', sans-serif;
    margin: 5px;
    padding: 5px;
    box-sizing: border-box;
    background-color: #f4f4f9;
}

.container {
    max-width: 1200px;
    margin: 0 auto;
    padding: 0 20px;
}

The final page will look like this:

[Figure: the final landing page]

This is a very basic landing page created for the article. You can make it more attractive using CSS styling or a React UI.

We will use the Google Gemini Pro Vision model to extract product information from an image.

def gemini_extractor(model_name, output_class, image_documents, prompt_template_str):
    gemini_llm = GeminiMultiModal(api_key=GOOGLE_API_KEY, model_name=model_name)

    llm_program = MultiModalLLMCompletionProgram.from_defaults(
        output_parser=PydanticOutputParser(output_class),
        image_documents=image_documents,
        prompt_template_str=prompt_template_str,
        multi_modal_llm=gemini_llm,
        verbose=True,
    )

    response = llm_program()
    return response
    

We will use LlamaIndex’s GeminiMultiModal API to work with the Google Gemini API in this function. The LlamaIndex MultiModalLLMCompletionProgram API takes the output parser, image data, prompt, and GenAI model and returns our desired response from the Gemini Pro Vision model.

To extract information from the image, we have to engineer a prompt for this purpose

prompt_template_str = """
                You are an expert system designed to extract products from images for
                an e-commerce application. Please provide the product name, product color,
                product category and a descriptive query to search for the product.
                Accurately identify every product in an image and provide a descriptive
                query to search for the product. You just return a correctly formatted
                JSON object with the product name, product color, product category and
                description for each product in the image
"""

With this prompt, we tell the model that it is an expert system that can extract information from an image. It will extract the information below from the given image input.

  • Name of product
  • Color
  • Category
  • Description

This prompt will be passed as an argument to the gemini_extractor function above later.

We all know that a generative AI model can often produce undesired responses. This is a problem when working with a generative AI model because it will not always follow the prompt. To mitigate this kind of issue, Pydantic comes into the picture: it validates the model’s output against a schema we define. FastAPI itself is built on Pydantic and uses it to validate its API models.

Creating a Product model using Pydantic

class Product(BaseModel):
    id: int
    product_name: str
    color: str
    category: str
    description: str


class ExtractedProductsResponse(BaseModel):
    products: List[Product]

class ImageRequest(BaseModel):
    url: str

    model_config = ConfigDict(
        json_schema_extra={
            "examples": [
                {
                    "url": "https://images.pexels.com/photos/356056/pexels-photo-356056.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1"
                }
            ]
        }
    )

The Product class above defines a data model for a product, the ExtractedProductsResponse class represents a response structure that contains a list of these products, and the ImageRequest class describes the input image URL supplied by clients. We use Pydantic for data validation and serialization to ensure the structural integrity of the data.
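
As a small illustration of how Pydantic guards against malformed model output, the sketch below validates two hand-written JSON strings against a simplified model. It assumes Pydantic v2 (model_validate_json); DemoProduct and the sample strings are made up for this example and are not output from Gemini.

```python
from pydantic import BaseModel, ValidationError


class DemoProduct(BaseModel):
    # Simplified, hypothetical model used only for this illustration
    product_name: str
    color: str
    category: str


good_reply = '{"product_name": "Laptop", "color": "silver", "category": "electronics"}'
bad_reply = '{"product_name": "Laptop", "color": "silver"}'  # category is missing

# A well-formed reply is parsed into a typed object
product = DemoProduct.model_validate_json(good_reply)

try:
    DemoProduct.model_validate_json(bad_reply)
    error_fields = []
except ValidationError as exc:
    # Each error records the location of the offending field
    error_fields = [err["loc"][0] for err in exc.errors()]
```

A reply that does not match the schema raises a ValidationError instead of flowing silently into the API response, which is the behavior the output parser relies on.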

all_products = []

@app.post("/extracted_products")
def extracted_products(image_request: ImageRequest):
    responses = gemini_extractor(
        model_name="models/gemini-pro-vision",
        output_class=ExtractedProductsResponse,
        image_documents=load_image_urls([image_request.url]),
        prompt_template_str=prompt_template_str,
    )
    all_products.append(responses)
    return responses
    

In the above code snippet, we create an endpoint in the FastAPI application using the POST method with the decorator @app.post(“/extracted_products”), which processes the requested image to extract product information. The extracted_products function handles requests to this endpoint. It takes an image_request parameter of type ImageRequest.

We call the gemini_extractor function we created previously for information extraction, and the response is stored in the all_products list. We use a built-in Python list to store the responses for simplicity. You can add database logic to store the responses in a database. MongoDB would be a good choice for storing this kind of JSON data.

Requesting an image from the OpenAPI docs

Go to http://127.0.0.1:8000/docs in your browser; you will get the OpenAPI docs page

[Figure: OpenAPI docs page]

Expand the /extracted_products endpoint and click Try It Out on the right

Then click Execute, and it will extract the product information from the image using the Gemini Vision Pro model.

The response from the POST method will look like this.

[Figure: response from the POST method]

In the above image, you can see the request URL and the response body, which is the response generated by the Gemini model
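
The same request can be made from code instead of the docs page. The sketch below builds the POST request with Python's standard urllib and only sends it when you call send_request yourself, since that needs the Uvicorn server from earlier to be running. The helper names are hypothetical; the base URL and endpoint path are the ones used in this article, and the image URL is the example from the ImageRequest model.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local Uvicorn server from this article


def build_extract_request(image_url: str) -> urllib.request.Request:
    # The endpoint expects a JSON body matching the ImageRequest model: {"url": ...}
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/extracted_products",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(request: urllib.request.Request) -> dict:
    # Requires the API server to be running; returns the parsed JSON response
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_extract_request(
    "https://images.pexels.com/photos/356056/pexels-photo-356056.jpeg"
    "?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1"
)
```

With the server running, send_request(request) should return the same JSON body that the docs page shows after clicking Execute.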

Creating a products endpoint with a GET method for fetching the data

@app.get("/api/products/", response_model=list[ExtractedProductsResponse])
async def get_all_products_api():
    return all_products

Go to http://127.0.0.1:8000/api/products to see all the products

[Figure: JSON response listing all products]

In the above code, we created an endpoint to fetch the extracted data stored in the all_products list, which stands in for a database here. Others can use this JSON data for their own products, such as building e-commerce sites.

All the code used in this article is available in the GitHub repository Gemini-Product-Application.

Conclusion

This is a simple yet systematic way to access and utilize the Gemini multimodal model to make a minimum viable product discovery API. You can use this technique to build a more robust product discovery system directly from an image. This type of application has very useful business potential, e.g., an Android application that uses the camera API to take photos and the Gemini API to extract product information from those photos, which can then be used to buy products directly from Amazon, Flipkart, etc.

Key Takeaways

  • The architecture of Representational State Transfer for building high-performance APIs for business.
  • LlamaIndex has an API to connect to different GenAI models. In this article, we learned how to use LlamaIndex with the Gemini API to extract image data using the Gemini Vision Pro model.
  • Many Python frameworks, such as Flask, Django, and FastAPI, are used to build REST APIs. We learned how to use FastAPI to build a robust REST API.
  • Prompt engineering to get the expected response from the Gemini model.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

Frequently Asked Questions

Q2. How to use frontend frameworks such as NextJS, Vite, and React with FastAPI?

A: You can use any UI framework you want with FastAPI. You have to create a frontend directory and a backend directory for the API application in the FastAPI root and link your frontend UI with the backend. See the Full Stack application here.

Q3. Which database is good for storing responses?

A: The responses from the model are JSON, so any document database such as MongoDB would be good for storing the responses and retrieving the data for the application. See MongoDB with FastAPI here.
