How to Build Your Personal AI Assistant with Huggingface SmolLM


Introduction

In the not-so-distant past, the idea of having a personal AI assistant felt like something out of a sci-fi movie. Picture a tech-savvy inventor named Alex, who dreamed of having a smart companion to answer questions and offer insights without relying on the cloud or third-party servers. With advancements in small language models (SLMs), Alex's dream became a reality. This article will take you on Alex's journey to build an AI Chat CLI application using Huggingface's SmolLM model. We will combine the power of SmolLM with LangChain's flexibility and Typer's user-friendly interface. By the end, you'll have a functional AI assistant, just like Alex, capable of chatting, answering queries, and saving conversations, all from your terminal. Let's dive into this exciting new world of on-device AI and see what you can create.

Learning Outcomes

  • Understand Huggingface SmolLM models and their applications.
  • Leverage SLMs for on-device AI applications.
  • Explore Grouped-Query Attention and its role in SLM architecture.
  • Build interactive CLI applications using the Typer and Rich libraries.
  • Integrate Huggingface models with LangChain for robust AI applications.

This article was published as part of the Data Science Blogathon.

What is Huggingface SmolLM?

SmolLM is a series of state-of-the-art small language models available in three sizes: 135M, 360M, and 1.7B parameters. These models are built on a high-quality training corpus comprising Cosmopedia v2, a collection of synthetic textbooks and stories generated by Mixtral (28B tokens); Python-Edu, educational Python samples from The Stack (4B tokens); and FineWeb-Edu, educational web samples from FineWeb (220B tokens). According to Huggingface, these models outperform other models in their size categories across a variety of benchmarks testing common-sense reasoning and world knowledge.

Performance Comparison Chart

It uses 5,000 topics belonging to 51 categories, generated with Mixtral, to create subtopics for certain topics, and the final distribution of subtopics is shown below:

Histogram of the subtopic distribution

The architecture of the 135M and 360M parameter models uses a design similar to MobileLLM, incorporating Grouped-Query Attention (GQA) and prioritizing depth over width.

What is Grouped-Query Attention?

There are three types of attention architecture:

  • Multi-Head Attention (MHA): Each attention head has its own independent query, key, and value heads. This is computationally expensive, especially for large models.
  • Multi-Query Attention (MQA): Shares key and value heads across all attention heads, but each head keeps its own query. This is more efficient than MHA but can still be computationally intensive.
  • Grouped-Query Attention (GQA): Imagine you have a team working on a big project. Instead of every team member working independently, you decide to form smaller groups, and each group shares some tools and resources. This is similar to what Grouped-Query Attention (GQA) does in a generative model.

Understanding Grouped-Query Attention (GQA)

GQA is a technique that lets models process information more efficiently. It divides the model's attention heads into groups, and each group shares a single set of key and value heads. This differs from the traditional approach, where each attention head has its own key and value heads.


Key Points:

  • GQA-G: GQA with G groups.
  • GQA-1: A special case with only one group; this is equivalent to Multi-Query Attention (MQA).
  • GQA-H: Here the number of groups equals the number of attention heads; this is equivalent to Multi-Head Attention (MHA).

Why Use GQA?

  • Speed: GQA can process information faster than traditional methods in large models.
  • Efficiency: It reduces the amount of data the model has to handle, saving memory and processing power.
  • Balance: GQA strikes a sweet spot between speed and accuracy.

By grouping attention heads, GQA helps large models work better without sacrificing much speed or accuracy.
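To make the grouping concrete, here is a minimal PyTorch sketch (not SmolLM's actual implementation) with hypothetical sizes: 8 query heads share 2 key/value groups, so every 4 query heads reuse the same K/V pair.

import torch
import torch.nn.functional as F

batch, seq_len, d_model = 1, 16, 256
n_q_heads, n_kv_groups = 8, 2            # GQA-2: 4 query heads per KV group
head_dim = d_model // n_q_heads

x = torch.randn(batch, seq_len, d_model)

# Full-width projection for queries, narrower projections for shared K/V
w_q = torch.nn.Linear(d_model, n_q_heads * head_dim, bias=False)
w_k = torch.nn.Linear(d_model, n_kv_groups * head_dim, bias=False)
w_v = torch.nn.Linear(d_model, n_kv_groups * head_dim, bias=False)

q = w_q(x).view(batch, seq_len, n_q_heads, head_dim).transpose(1, 2)    # (B, 8, T, d)
k = w_k(x).view(batch, seq_len, n_kv_groups, head_dim).transpose(1, 2)  # (B, 2, T, d)
v = w_v(x).view(batch, seq_len, n_kv_groups, head_dim).transpose(1, 2)

# Repeat each KV group so that 4 query heads attend to the same K/V
k = k.repeat_interleave(n_q_heads // n_kv_groups, dim=1)                # (B, 8, T, d)
v = v.repeat_interleave(n_q_heads // n_kv_groups, dim=1)

out = F.scaled_dot_product_attention(q, k, v)                           # (B, 8, T, d)
out = out.transpose(1, 2).reshape(batch, seq_len, d_model)
print(out.shape)  # torch.Size([1, 16, 256])

The key saving is in the K/V projections and cache: only 2 key/value heads are stored and reused, instead of 8.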

How to use SmolLM?

Install the required libraries, PyTorch and Transformers, using pip. Then put the following code into a main.py file.

Here, I used the SmolLM 360M Instruct model; you can use higher-parameter models such as SmolLM-1.7B.

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM-360M-Instruct"

device = "cpu"  # use "cuda" for GPU, if available
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

messages = [
    {"role": "user", "content": "List the steps to bake a chocolate cake from scratch."}
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False)

print(input_text)

inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
outputs = model.generate(
    inputs, max_new_tokens=100, temperature=0.6, top_p=0.92, do_sample=True
)
print(tokenizer.decode(outputs[0]))

Output:


What is Typer?

Typer is a library for building command-line interface (CLI) applications. It was created by Tiangolo, who also developed the highly performant Python web framework FastAPI. Typer is to the CLI what FastAPI is to the web.

What are the benefits of using it?

  • User-Friendly and Intuitive:
    • Easy to Write: Thanks to excellent editor support and code completion everywhere, you will spend less time debugging and reading documentation.
    • Simple for Users: Automatic help and completion for all shells make it easy for end users.
  • Efficient:
    • Concise Code: Minimize code duplication, with multiple features coming from each parameter declaration, leading to fewer bugs.
    • Start Small: You can get started with just two lines of code: one import and one function call.
  • Scalable:
    • Grow as Needed: Increase complexity as much as you want, creating complex command trees and subcommands with options and arguments.
  • Versatile:
    • Run Scripts: Typer includes a command/program to run scripts, automatically converting them to CLIs, even if they don't use Typer internally.

How to use Typer?

Here is a simple Hello CLI using Typer. First, install Typer using pip.

$ pip install typer

Now create a main.py file and type in the code below.

import typer

app = typer.Typer()


@app.command()
def main(name: str):
    print(f"Hello {name}")


if __name__ == "__main__":
    app()

In the code above, we first import Typer and then create an app using the typer.Typer() method.

@app.command() is a decorator; a decorator in Python does something (user-defined) with the function it is placed on. Here, Typer turns main() into a CLI command.

Output:

First with the --help flag and then with a name argument.
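Roughly, the two runs look like this (a sketch; the exact help text depends on your Typer version, and the name passed is just an example):

$ python main.py --help
Usage: main.py [OPTIONS] NAME

$ python main.py Alex
Hello Alex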


Setting Up the Project

To get started with our Personal AI Chat CLI application, we need to set up our development environment and install the required dependencies. Here's how to do it.

Create a Conda Environment

# Create a conda env
$ conda create --name slmdev python=3.11

# Activate your env
$ conda activate slmdev

Create a new directory for the project

$ mkdir personal-ai-chat
$ cd personal-ai-chat

Install the required packages

pip install langchain langchain-huggingface huggingface_hub transformers torch rich typer

Implementing the Chat Application

First, create a main.py file in your project directory.

Let's import the required modules and initialize our application.

import typer
from langchain_huggingface.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
from rich.console import Console
from rich.panel import Panel
from rich.markdown import Markdown
import json
from typing import List, Dict

app = typer.Typer()
console = Console()

Now, we will set up our SmolLM model and a text-generation pipeline.

# Initialize the SmolLM model
model_name = "HuggingFaceTB/SmolLM-360M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)

# Create a text-generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=256,
    truncation=True,
    temperature=0.7,
    do_sample=True,
    repetition_penalty=1.2,
)

# Create a LangChain LLM
llm = HuggingFacePipeline(pipeline=pipe)

In the code above, we set the model name to SmolLM 360M Instruct and use AutoTokenizer for tokenization. We then load the model with Huggingface's AutoModelForCausalLM.

Finally, we set up a Huggingface pipeline for running the LLM.

Crafting the Prompt Template and LangChain Chain

Now we have to create a prompt template for our assistant. For this application, we will use a prompt that asks for concise and informative answers.

# Create a prompt template
template = """
You are a helpful assistant. Provide a concise and informative answer to the following query:

Query: {query}

Answer:
"""

prompt = PromptTemplate(template=template, input_variables=["query"])

# Create a LangChain chain
chain = prompt | llm

If you have followed along this far, congratulations.
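Before wiring the chain into the CLI, you can optionally sanity-check it with a one-off call (a minimal snippet; the query string is just an example and can be removed afterwards):

# Optional quick check of the prompt | llm chain
test_reply = chain.invoke("What is grouped-query attention?")
print(test_reply)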

Now we will implement the core functionality of our application.

Create a function called generate_response

def generate_response(query: str) -> str:
    try:
        with console.status("Thinking...", spinner="dots"):
            response = chain.invoke(query)
        return response
    except Exception as e:
        print(f"An error occurred: {e}")
        return "Sorry, I encountered a problem. Please try rephrasing your query."

In this function, we create a console status that displays a loading message, "Thinking...", with a spinner animation while a response is being generated. This provides visual feedback to the user.

Then we call LangChain's chain.invoke method and pass the user's query as input. This queries SmolLM and produces a response.

The except block handles any exception that may arise during response generation.

Generating Responses and Handling Conversations

Next, create a function for saving the conversation.

def save_conversation(conversation: List[Dict[str, str]]):
    """Save the conversation history to a JSON file."""
    filename = typer.prompt(
        "Enter a filename to save the conversation (without extension)"
    )
    try:
        with open(f"{filename}.json", "w") as f:
            json.dump(conversation, f, indent=2)
        console.print(f"Conversation saved to {filename}.json", style="green")
    except Exception as e:
        print(f"An error occurred while saving: {e}")

In the snippet above, we create a conversation-saving function. The user enters a filename, and the function saves the entire conversation to a JSON file.
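For illustration, the structure written to the JSON file is simply the list of role/content dictionaries built up during the chat; a short example (contents are hypothetical):

# Example of what <filename>.json contains after a short chat
conversation = [
    {"role": "user", "content": "what is a small language model?"},
    {"role": "assistant", "content": "A small language model (SLM) is a compact model..."},
]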

Implementing the CLI Application Command

## Code Block 1

@app.command()
def start():
    console.print(Panel.fit("Hi, I am your Personal AI!", style="bold magenta"))
    conversation = []

    ## Code Block 2
    while True:
        console.print("How may I help you?", style="cyan")
        query = typer.prompt("You")

        if query.lower() in ["exit", "quit", "bye"]:
            break

        response = generate_response(query)
        conversation.append({"role": "user", "content": query})
        conversation.append({"role": "assistant", "content": response})

        console.print(Panel(Markdown(response), title="Assistant", expand=False))

        ## Code Block 3
        while True:
            console.print(
                "\nChoose an action:",
                style="bold yellow",
            )
            console.print(
                "1. follow-up\n2. new-query\n3. end-chat\n4. save-and-exit",
                style="yellow",
            )
            action = typer.prompt("Enter the number corresponding to your choice")

            if action == "1":
                follow_up = typer.prompt("Follow-up question")
                query = follow_up.lower()
                response = generate_response(query)

                conversation.append({"role": "user", "content": query})
                conversation.append({"role": "assistant", "content": response})

                console.print(
                    Panel(Markdown(response), title="Assistant", expand=False)
                )
            elif action == "2":
                new_query = typer.prompt("New query")
                query = new_query.lower()
                response = generate_response(query)

                conversation.append({"role": "user", "content": query})
                conversation.append({"role": "assistant", "content": response})

                console.print(
                    Panel(Markdown(response), title="Assistant", expand=False)
                )
            elif action == "3":
                return
            elif action == "4":
                save_conversation(conversation)
                return
            else:
                console.print(
                    "Invalid choice. Please choose a valid option.", style="red"
                )

    ## Code Block 4
    if typer.confirm("Would you like to save this conversation?"):
        save_conversation(conversation)

    console.print("Good Bye!! Happy Hacking", style="bold green")

Code Block 1

Introduction and welcome message. Here:

  • The code starts with a start() function that is triggered when you run the application; the @app.command() decorator turns this start function into a command in our CLI application.
  • It displays a colorful welcome message using the Rich library.

Code Block 2

The main conversation loop. Here:

  • The code enters a loop that continues until you exit the conversation.
  • Inside that loop:
    • It asks "How may I help you?" with a colored prompt.
    • It captures your query using typer.prompt.
    • It checks whether your query (lowercased) is an exit command such as "exit", "quit", or "bye"; if so, it exits the loop.
    • Otherwise, it calls the generate_response function to process your query and get a response.
    • It stores your query and the response in the conversation history.
    • It displays the assistant's response in a formatted box using Rich's Console, Panel, and Markdown.

Output:


Code Block 3

Handling the User's Choice

  • In this inner while loop, most things work the same as before; the only difference is that you can choose an option for continuing the conversation, such as asking a follow-up question, starting a new query, ending the chat, or saving the conversation and exiting.

Code Block 4

Saving the conversation and the farewell message

Here, the assistant asks whether you want to save your chat history as a JSON file for later review. It prompts you for a filename (without ".json") and saves the history in your working directory.

Then it prints a farewell message.

Output:


Output of Saved File:


Conclusion

Building your own personal AI CLI application using Huggingface SmolLM is more than just a fun project. It is a gateway to understanding and applying cutting-edge AI technology in a practical, accessible way. Through this article, we have explored how to use the power of a compact SLM to create an engaging user interface right in your terminal.

All the code used in this Personal AI Chat CLI

Key Takeaways

  • The article demonstrates that building a personal AI assistant is within reach for developers across a wide range of skill levels and hardware.
  • By utilizing SmolLM, a compact yet capable language model, the project shows how to create an AI chat application that doesn't require heavy computational resources, making it suitable for small, low-power hardware.
  • The project showcases the power of integrating different technologies to create a functional, feature-rich application.
  • Through the use of the Typer and Rich libraries, the article emphasizes the importance of creating an intuitive and visually appealing command-line interface, enhancing the user experience even in a text-based environment.

Frequently Asked Questions

Q1. Can I customize the AI's responses or train it on my own data?

A. Yes. You can tweak the prompt template for your domain-specific use case, and experiment with the prompt and with model parameters such as temperature and max_length to adjust the style of the responses. For training with your own data, you can try PEFT-style fine-tuning such as LoRA, or use a RAG-style application to use your data directly without changing the model weights.
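For example, reusing the objects defined in main.py, a domain-specific variant might look like this (a sketch; the prompt text and parameter values are illustrative, not the article's settings):

# Hypothetical tweaks: a domain-specific prompt and more conservative sampling
template = """
You are a helpful Python tutor. Answer the following query with a short example:

Query: {query}

Answer:
"""
prompt = PromptTemplate(template=template, input_variables=["query"])

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,      # allow longer answers
    truncation=True,
    temperature=0.3,     # less random, more focused
    do_sample=True,
)
chain = prompt | HuggingFacePipeline(pipeline=pipe)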

Q2. Is this personal AI chat secure for handling sensitive information?

A. This personal AI chat is designed for local use, so your data stays private as long as you are careful about how you update the model weights through fine-tuning. If your training data contains any personal information, fine-tuning will leave an imprint of it in the model weights, so be careful and sanitize your dataset before fine-tuning.

Q3. How does the SmolLM model compare to a larger language model like GPT-3?

A. SLMs are built from high-quality training data and targeted at small devices, with only about 100M to 3B parameters, while LLMs are trained on (and for) large, computationally heavy hardware and have 100B to trillions of parameters, typically trained on data from across the internet. An SLM will not compete with an LLM in breadth and depth, but SLM performance holds up well within its size category.

The media shown in this article is not owned by Analytics Vidhya and is used at the author's discretion.

