Qwen2: Alibaba Cloud's Open-Source LLM


Introduction

Many new companies have been popping up and releasing open-source Large Language Models in recent years. As time progresses, these models are getting closer and closer to the paid closed-source models. The companies release these models in different sizes and make sure to keep their licenses permissive so that anyone can use them commercially. One such family of models is Qwen. Its earlier models proved to be among the best open-source models, alongside Mistral and Zephyr, and the team has now announced a version 2, called Qwen2.


Learning Objectives

  • Learn about Qwen, Alibaba Cloud's family of open-source language models.
  • Discover Qwen2's new features.
  • Review Qwen2's performance benchmarks.
  • Try out Qwen2 with the HuggingFace Transformers library.
  • Recognize Qwen2's commercial and open-source potential.

This article was published as a part of the Data Science Blogathon.

What is Qwen?

Qwen refers to a family of Large Language Models backed by Alibaba Cloud, a firm based in China. It has made a great contribution to the AI space by releasing many open-source models that are on par with the top models on the HuggingFace leaderboard. Qwen has released its models in different sizes, ranging from a 7-billion-parameter model to a 72-billion-parameter model. The team has not just released the models but finetuned them in a way that put them at the top of the leaderboard when they were released.

But Qwen did not stop there. It has also released chat-finetuned models and LLMs trained heavily on mathematics and code, as well as vision-language models, and the Qwen team is moving into the audio space with text-to-speech models. Qwen is trying to create an ecosystem of open-source models that is readily available for everyone to start building applications with, without restrictions and for commercial purposes.

What is Qwen2?

Qwen received much appreciation from the open-source community when it was released, and many derivatives were created from it. Recently, the Qwen team announced a series of successor models to the previous generation, called Qwen2, with more models and more finetuned versions compared to earlier generations.

Qwen2 was released in five different sizes: 0.5B, 1.5B, 7B, 57B-A14B (a Mixture-of-Experts model), and 72B. These models were pretrained on data covering more than 27 languages and are significantly improved in the areas of code and mathematics compared to the earlier generation of models. A nice touch is that even the 0.5B and 1.5B models come with a 32k context length, while the 7B and 72B come with a 128k context length.
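If you want to confirm the advertised context length for a particular checkpoint, one option is to read it from the model configuration. The snippet below is a small sanity check (it downloads only the config file, not the weights) for the 1.5B Instruct model used later in this article:

from transformers import AutoConfig

# Downloads only the model's config.json, not the weights
cfg = AutoConfig.from_pretrained("Qwen/Qwen2-1.5B-Instruct")
# Maximum sequence length the model was configured for
print(cfg.max_position_embeddings)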

All of these models use Grouped Query Attention, which greatly speeds up attention and reduces the amount of memory required to store intermediate results (the KV cache) during inference.
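To make the idea concrete, here is a minimal PyTorch sketch of grouped-query attention. This is a toy illustration with hypothetical head counts, not Qwen2's actual implementation (a real layer would also apply causal masking and positional embeddings); the point is that K and V are stored with far fewer heads than Q:

import torch

batch, seq_len, head_dim = 1, 8, 64
num_q_heads, num_kv_heads = 12, 2          # hypothetical head counts
group_size = num_q_heads // num_kv_heads   # query heads sharing one K/V head

q = torch.randn(batch, num_q_heads, seq_len, head_dim)
# K and V are computed (and cached) with only num_kv_heads heads,
# shrinking the KV cache by a factor of group_size.
k = torch.randn(batch, num_kv_heads, seq_len, head_dim)
v = torch.randn(batch, num_kv_heads, seq_len, head_dim)

# At attention time, each K/V head is broadcast to its group of query heads.
k = k.repeat_interleave(group_size, dim=1)  # -> (batch, num_q_heads, seq_len, head_dim)
v = v.repeat_interleave(group_size, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
out = scores.softmax(dim=-1) @ v
print(out.shape)  # torch.Size([1, 12, 8, 64])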

Performance and Benchmarks

Starting with the base model comparisons, the Qwen2 72B Large Language Model outperforms the newly released Llama3 70B model and the mixture-of-experts Mixtral 8x22B model. We can see the benchmark scores in the picture below. The Qwen2 model outperforms both Llama3 and Mixtral on many benchmarks, such as MMLU, MMLU-Pro, TheoremQA, HumanEval, GSM8K, and many more.

[Image: Qwen2 72B base model benchmark scores compared with Llama3 70B and Mixtral 8x22B]

Moving to the smaller model, the Qwen2 7B Instruct model, it likewise outperforms newly released SOTA (state-of-the-art) models like the Llama3 8B model and the GLM-4 9B model. Despite Qwen2 being the smallest of the three, it outperforms both of them, and the results for all the benchmarks can be seen in the picture below.

[Image: Qwen2 7B Instruct benchmark scores compared with Llama3 8B and GLM-4 9B]

Qwen2 in Action

We will be working with Google Colab to try out the Qwen2 model.

Step 1: Download Libraries

To get started, we need to download a few helper libraries. For this, we work with the code below:

!pip install -U -q transformers accelerate
  • transformers: A popular Python package from HuggingFace with which we can download deep learning models and work with them. Note that Qwen2 needs a fairly recent version, as checked in the snippet right after this list.
  • accelerate: Another package developed by HuggingFace. It helps increase the inference speed of Large Language Models when they run on a GPU.
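One caveat worth checking before going further: support for the Qwen2 architecture landed in transformers 4.37, and older versions fail to load the model (typically with a KeyError mentioning qwen2). A quick version check:

import transformers

# Qwen2 support was added in transformers 4.37; the -U flag above
# should have pulled in a new enough version.
print(transformers.__version__)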

Step 2: Download the Qwen Model

Now we will write the code to download the Qwen model and test it. The code for this will be:

from transformers import pipeline

device = "cuda"

pipe = pipeline("text-generation",
                model="Qwen/Qwen2-1.5B-Instruct",
                device=device,
                max_new_tokens=512,
                do_sample=True,
                temperature=0.7,
                top_p=0.95,
                )
  • We start by importing the pipeline function from the transformers library.
  • Then we set the device the model should be mapped to. Here, we set it to cuda, which means the model will run on the GPU.
  • model="Qwen/Qwen2-1.5B-Instruct": This specifies the pre-trained model to work with, and device=device tells the pipeline which device to run the model on.
  • max_new_tokens=512: Here, we set the maximum number of new tokens to be generated.
  • do_sample=True: This enables sampling during generation for increased diversity in the output.
  • temperature=0.7: This controls the randomness of the generated text. Higher values lead to more creative and unpredictable outputs.
  • top_p=0.95: This sets the probability mass to be considered for the next token during generation.
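As an aside, if you want finer control than the pipeline provides, the same model can also be loaded directly with AutoModelForCausalLM and the tokenizer's chat template. The sketch below follows the standard chat-template pattern from the Transformers library and mirrors the sampling settings used above:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B-Instruct", torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")

messages = [{"role": "user", "content": "What is life?"}]
# Format the chat messages into the model's expected prompt string
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.95
)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
))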

Step 3: Giving a List of Messages to the Model

Now, let us try giving the model a list of messages as input and see the output it generates for the given list of messages.

messages = [
    {"role": "system",
     "content": "You are a funny assistant. You must respons to user questions in funny way"},
    {"role": "user", "content": "What is life?"},
]

response = pipe(messages)

print(response[0]['generated_text'][-1]['content'])
  • Here, the first message is a system message that instructs the assistant to be funny.
  • The second message is a user message that asks "What is life?".
  • We put both of these messages as items in a list.
  • Then we pass this list of messages to the pipeline object, that is, to our model.
  • The model then processes these messages and generates a response.
  • Finally, we extract the content of the last generated message from the response.

Running this code produced the following output:

[Image: model output]

We see that the model indeed tried to generate a funny answer.
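To see why we index the response the way we do, it helps to look at the structure the pipeline returns for chat-style input (structure shown for illustration; the actual wording varies between runs):

# The pipeline returns a list with one dict per input; 'generated_text'
# holds the full conversation, ending with the new assistant message:
# [
#   {"generated_text": [
#       {"role": "system", "content": "You are a funny assistant. ..."},
#       {"role": "user", "content": "What is life?"},
#       {"role": "assistant", "content": "<the model's funny answer>"},
#   ]}
# ]
print(type(response), len(response))
print([m["role"] for m in response[0]["generated_text"]])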

Step 4: Testing the Model with Math Questions

Now let us test the model with a few math questions. The code for this will be:

messages = [
    {"role": "user",
     "content": "If a car travels at a constant speed of 60 miles per hour, how far will it travel in 45 minutes?"},
    {"role": "assistant",
     "content": "To find the distance, use the formula: distance = speed × time. Here, speed = 60 miles per hour and time = 45 minutes = 45/60 hours. So, distance = 60 × (45/60) = 45 miles."},
    {"role": "user",
     "content": "How far will it travel in 2.5 hours? Explain step by step"}
]

response = pipe(messages)

print(response[0]['generated_text'][-1]['content'])
  • Here again, we create a list of messages.
  • The first message is a user message that asks how far a car will travel in 45 minutes at a constant speed of 60 miles per hour.
  • The second message is an assistant message that provides the solution to the user's question using the formula distance = speed × time.
  • The third message is again a user message asking the assistant a follow-up question.
  • Then we pass this list of messages to the pipeline.
  • The model will then process these messages and generate a response.

The output generated by running the code can be seen below:

[Image: model output]

We can see that the Qwen2 1.5B model started reasoning step by step to answer the user's question. It first started by defining the formula to calculate distance. Following that, it wrote down the information it had regarding the speed and time. Then it finally put these things together to arrive at the final answer. Despite being just a 1.5-billion-parameter model, it performs remarkably well here.
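As a quick sanity check, the expected answer is easy to compute ourselves:

# distance = speed × time
print(60 * 2.5)  # 150.0 miles, the answer we expect the model to arrive at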

Testing with More Examples

Let us test the model with a few more examples:

messages = [
    {"role": "user",
     "content": "A clock shows 12:00 p.m. now. How many degrees will the minute hand move in 15 minutes?"},
    {"role": "assistant",
     "content": "The minute hand moves 360 degrees in one hour (60 minutes). Therefore, in 15 minutes, it will move (15/60) * 360 degrees = 90 degrees."},
    {"role": "user",
     "content": "How many degrees does the hour hand move in 15 minutes?"}
]

response = pipe(messages)

print(response[0]['generated_text'][-1]['content'])
[Image: model output]
messages = [
    {"role": "user", "content": "Convert 100 degrees Fahrenheit to Celsius."},
    {"role": "assistant",
     "content": "To convert Fahrenheit to Celsius, use the formula: C = (F - 32) × 5/9. So, for 100 degrees Fahrenheit, C = (100 - 32) × 5/9 = 37.78 degrees Celsius."},
    {"role": "user", "content": "What is 0 degrees Celsius in Fahrenheit?"}
]

response = pipe(messages)

print(response[0]['generated_text'][-1]['content'])
[Image: model output]
messages = [
    {"role": "user", "content": "What gets wetter as it dries?"},
    {"role": "assistant",
     "content": "A towel gets wetter as it dries because it absorbs the water from the body, becoming wetter itself."},
    {"role": "user", "content": "What has keys but can't open locks?"}
]

response = pipe(messages)

print(response[0]['generated_text'][-1]['content'])
[Image: model output]

Here we have additionally tested the model with three more examples. The first two are math questions again, and we see that Qwen2 1.5B was able to understand each question well and generate a satisfying answer.
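For reference, the expected answers to the two math follow-ups can be computed directly:

# Hour hand: 360 degrees in 12 hours, i.e. 0.5 degrees per minute
print(360 / (12 * 60) * 15)  # 7.5 degrees in 15 minutes

# Celsius to Fahrenheit: F = C × 9/5 + 32
print(0 * 9 / 5 + 32)        # 32.0 °F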

But on the third example, the riddle, it failed. The answer to the riddle is a piano: a piano has keys but cannot open locks. The model failed to produce this and came up with a different answer, a keychain, even giving a supporting statement for it. We cannot say it has failed outright, because technically a keychain itself cannot open locks, but the keys on it can.

Overall, we see that despite being a 1.5-billion-parameter model, Qwen2 1.5B answered the mathematical questions correctly and provided good reasoning around the answers it generated. This suggests that the bigger models like Qwen2 7B and 72B can perform extremely well across a variety of tasks.

Conclusion

Qwen2, a new series of open-source models from Alibaba Cloud, represents a great advancement in the field of large language models (LLMs). Building on the success of its predecessor, Qwen2 offers a range of models from 0.5B to 72B parameters, excelling across diverse benchmarks. The models are designed to be versatile and commercially accessible, supporting multiple languages and featuring improved capabilities in code, mathematics, and more. Qwen2's impressive performance and open accessibility position it as a formidable competitor to closed-source alternatives, fostering innovation and application development in AI.

Key Takeaways

  • Qwen2 continues the trend of high-quality open-source LLMs, providing strong alternatives to closed-source models.
  • The Qwen2 series includes models from 0.5 billion to 72 billion parameters, catering to diverse computational needs and use cases.
  • Qwen2 models are pretrained on more than 27 languages, enhancing their applicability in global contexts.
  • Licenses that allow commercial use promote widespread adoption of, and innovation on top of, Qwen2 models.
  • Developers and researchers can easily integrate and use the models through popular tools like HuggingFace's transformers library, making them accessible.

Frequently Asked Questions

Q1. What is Qwen?

A. Qwen is a family of large language models backed by Alibaba Cloud. The team releases open-source models in different sizes that are competitive with paid models.

Q2. What is Qwen2?

A. Qwen2 is the latest generation of Qwen models, with improved performance and more features. It comes in different sizes, ranging from 0.5 billion to 72 billion parameters.

Q3. How do I use Qwen2 for text generation?

A. You can use the pipeline function from the Transformers library to generate text with Qwen2. The code example in this article shows how to do that.

Q4. How does Qwen2 perform?

A. Qwen2 outperforms other leading models on many benchmarks, covering language understanding, code generation, and mathematical reasoning.

Q5. Can Qwen2 answer math questions?

A. Yes. Qwen2, especially the larger models, can answer math questions and provide explanations for its answers.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.
