Improving Text2SQL Performance with Ease on Databricks

Want to raise your LLM into the top 10 of Spider, a widely used benchmark for text-to-SQL tasks? Spider evaluates how well LLMs can convert natural language questions into SQL code.

For those unfamiliar with text-to-SQL, its significance lies in transforming how businesses interact with their data. Instead of relying on SQL experts to write queries, people can simply ask questions of their data in plain English and receive precise answers. This democratizes access to data, enhancing business intelligence and enabling more informed decision-making.

The Spider benchmark is a widely recognized standard for evaluating the performance of text-to-SQL systems. It challenges LLMs to translate natural language queries into precise SQL statements, requiring a deep understanding of database schemas and the ability to generate syntactically and semantically correct SQL code.

In this post, we’ll dive into how we achieved scores of 79.9% on the Spider development dataset and 78.9% on the test dataset in less than a day of work using the open-source Llama3 8B Instruct model, a 19-point improvement over the baseline. This performance would place it in a top-10 spot on the now-frozen Spider leaderboard, thanks to strategic prompting and fine-tuning on Databricks.

How to Crush the Spider Benchmark with Ease on Databricks

Zero-shot Prompting for Baseline Performance

Let’s start by evaluating the performance of Meta Llama 3 8B Instruct on the Spider dev dataset using a very simple prompt format consisting of the CREATE TABLE statements that created the tables and a question we’d like to answer using those tables:

{create_table_queries}

-- {question}
SELECT

This type of prompt is often referred to as “zero-shot” because there are no other examples in the prompt. For the first question in the Spider dev dataset this prompt format produces:

CREATE TABLE stadium (
Stadium_ID int,
Location text,
Name text,
Capacity int,
Highest int,
Lowest int,
Average int,
PRIMARY KEY (Stadium_ID)
)
…<omitted the singer, concert, and singer_in_concert tables for brevity>

-- How many singers do we have?
SELECT
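As a rough illustration, the zero-shot format above can be assembled with a few lines of Python. This is a minimal sketch rather than the exact code used for the benchmark run: the function name `build_zero_shot_prompt` and the abbreviated `singer` table are assumptions made for the example.

```python
def build_zero_shot_prompt(create_table_statements, question):
    """Join the schema, the question as a SQL comment, and a trailing SELECT."""
    schema = "\n\n".join(stmt.strip() for stmt in create_table_statements)
    return f"{schema}\n\n-- {question.strip()}\nSELECT"


# Abbreviated singer table, used purely for illustration.
singer_table = """
CREATE TABLE singer (
Singer_ID int,
Name text,
PRIMARY KEY (Singer_ID)
)
"""

prompt = build_zero_shot_prompt([singer_table], "How many singers do we have?")
print(prompt)
```

Ending the prompt with SELECT nudges the model to continue with the body of a query rather than with prose.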

Running the Spider benchmark on the dev dataset using this format produces an overall score of 60.9 when measured using execution accuracy and greedy decoding. This means that 60.9% of the time the model produces SQL that, when executed, returns the same results as a “gold” query representing the correct solution.

	Easy	Medium	Hard	Extra	All
Zero-shot	78.6	69.3	42.5	31.3	60.9
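To make the idea of execution accuracy concrete, here is a simplified sketch that runs a predicted query and a gold query against the same SQLite database and compares the returned rows. It is only an approximation for intuition: the evaluation in this post used the test suite named in the appendix, and the helper names and the tiny in-memory table below are assumptions made for the example.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the result rows as a multiset, or None if the SQL fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_match(conn, predicted_sql, gold_sql):
    """True when both queries run and return the same rows, ignoring order."""
    gold = run_query(conn, gold_sql)
    predicted = run_query(conn, predicted_sql)
    return gold is not None and predicted == gold


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE singer (Singer_ID int, Name text, PRIMARY KEY (Singer_ID));
    INSERT INTO singer VALUES (1, 'Joe Sharp'), (2, 'Timbaland'), (3, 'Justin Brown');
    """
)
gold = "SELECT count(*) FROM singer"
matches = [
    execution_match(conn, "SELECT count(Singer_ID) FROM singer", gold),
    execution_match(conn, "SELECT count(*) FROM singers", gold),  # wrong table name
]
accuracy = 100 * sum(matches) / len(matches)
print(accuracy)
```

The first prediction is worded differently from the gold query but returns the same rows, so it counts as correct; the second fails to execute and counts as a miss.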

With the baseline score established, before we even get into fine-tuning let’s try different prompting strategies to raise the score for the base model on the Spider dev benchmark dataset.

Prompting With Sample Rows

One of the drawbacks of the first prompt we used is that it doesn’t include any information about the data in the columns beyond the data type. A paper on evaluating text-to-SQL capabilities of models with Spider found that adding sampled rows to the prompt led to a higher score, so let’s try that.

We can update the prompt format above so that the create table queries also include the first few rows from each table. For the same question from earlier we now have an updated prompt:

CREATE TABLE stadium (
Stadium_ID int,
Location text,
Name text,
Capacity int,
Highest int,
Lowest int,
Average int,
PRIMARY KEY (Stadium_ID)
)
/*
Stadium_ID    Location    Name    Capacity    Highest    Lowest    Average
1    Raith Rovers    Stark's Park    10104    4812    1294    2106
2    Ayr United    Somerset Park    11998    2363    1057    1477
3    East Fife    Bayview Stadium    2000    1980    533    864
*/
…<omitted the singer, concert, and singer_in_concert tables for brevity>

-- How many singers do we have?
SELECT
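One way to produce the sample-row comment block is to read the first few rows of each table and join the values with four spaces. The sketch below uses Python's built-in sqlite3 module; the function name `sample_rows_comment` is hypothetical, and the in-memory table holds only a subset of the stadium columns for illustration.

```python
import sqlite3


def sample_rows_comment(conn, table_name, limit=3):
    """Format the first `limit` rows of a table as a SQL block comment."""
    # The table name is interpolated into the SQL, so it must come from a
    # trusted schema listing rather than from user input.
    cursor = conn.execute(f'SELECT * FROM "{table_name}" LIMIT {int(limit)}')
    header = [column[0] for column in cursor.description]
    lines = ["    ".join(header)]
    for row in cursor.fetchall():
        lines.append("    ".join(str(value) for value in row))
    return "/*\n" + "\n".join(lines) + "\n*/"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stadium (Stadium_ID int, Location text, Name text, Capacity int)")
conn.executemany(
    "INSERT INTO stadium VALUES (?, ?, ?, ?)",
    [
        (1, "Raith Rovers", "Stark's Park", 10104),
        (2, "Ayr United", "Somerset Park", 11998),
        (3, "East Fife", "Bayview Stadium", 2000),
    ],
)
comment = sample_rows_comment(conn, "stadium")
print(comment)
```

The resulting comment can be appended directly after the matching CREATE TABLE statement when building the prompt.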

Including sample rows for each table raises the overall score by about 6 percentage points to 67.0:

	Easy	Medium	Hard	Extra	All
Zero-shot with sample rows	80.6	75.3	51.1	41.0	67.0
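The All column is consistent with an average of the four difficulty scores weighted by how many questions fall into each level. The per-level counts below (248, 446, 174, and 166, for 1034 dev questions) are not stated in this post; they are the commonly reported split for the Spider dev set and should be read as an assumption.

```python
# Assumed number of Spider dev questions per difficulty level.
DEV_COUNTS = {"easy": 248, "medium": 446, "hard": 174, "extra": 166}


def overall_score(per_level):
    """Average the per-level scores, weighted by question count."""
    total = sum(DEV_COUNTS.values())
    weighted = sum(per_level[level] * DEV_COUNTS[level] for level in DEV_COUNTS)
    return weighted / total


zero_shot = {"easy": 78.6, "medium": 69.3, "hard": 42.5, "extra": 31.3}
with_rows = {"easy": 80.6, "medium": 75.3, "hard": 51.1, "extra": 41.0}

zero_shot_all = round(overall_score(zero_shot), 1)
with_rows_all = round(overall_score(with_rows), 1)
print(zero_shot_all, with_rows_all)
```

Because Medium is the largest group, the 6-point gain there contributes more to the overall score than the larger gains on Hard and Extra.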

Few-shot Prompting

Few-shot prompting is a well-known strategy used with LLMs where we can improve the performance on a task such as generating correct SQL by including some examples demonstrating the task to be performed. With a zero-shot prompt we provided the schemas and then asked a question. With a few-shot prompt we provide some schemas, a question, the SQL that answers that question, and then repeat that sequence a couple of times before getting to the actual question we want to ask. This generally results in better performance than a zero-shot prompt.

A good source of examples demonstrating the SQL generation task is the Spider training dataset itself. We can take a random sample of a few questions from this dataset with their corresponding tables and construct a few-shot prompt demonstrating the SQL that can answer each of these questions. Since we are now using sample rows as of the previous prompt, we should also ensure at least one of these examples includes sample rows to demonstrate their usage.

Another improvement we can make on the previous zero-shot prompt is to also include a “system prompt” at the beginning. System prompts are typically used to provide detailed guidance to the model that outlines the task to be performed. While a user may ask multiple questions throughout the course of a chat with a model, the system prompt is provided just once before the user even asks a question, essentially establishing expectations for how the “system” should perform during the chat.

With these strategies in mind, we can construct a few-shot prompt that starts with a system message represented as a large SQL comment block at the top, followed by three examples:

/*
You are a helpful assistant who answers questions about database tables 
by responding with SQL queries.  Users will provide you with a set of 
tables represented as CREATE TABLE statements.  Each CREATE TABLE 
statement may optionally be followed by the first few rows from the 
table in order to help write the correct SQL to answer questions. After 
the CREATE TABLE statements users will ask a question using a SQL 
comment starting with two dashes. You should answer the user's question 
by writing a SQL statement starting with SELECT and ending with a 
semicolon.
*/

CREATE TABLE "Campuses" (
	"Id" INTEGER PRIMARY KEY,
	"Campus" TEXT,
	"Location" TEXT,
	"County" TEXT,
	"Year" INTEGER
);
/*
Id    Campus    Location    County    Year
1    California State University-Bakersfield    Bakersfield    Kern    1965
2    California State University-Channel Islands    Camarillo    Ventura    2002
3    California State University-Chico    Chico    Butte    1887
*/

… <more tables omitted>

-- Please answer the following question using the tables above.
-- Find the name of the campuses that is in Northridge, Los Angeles or 
-- in San Francisco, San Francisco.
SELECT Campus FROM Campuses WHERE Location="Northridge" AND County="Los Angeles" 
UNION SELECT Campus FROM Campuses WHERE Location="San Francisco" AND County="San Francisco";

… <two more examples omitted>

CREATE TABLE stadium (
Stadium_ID int,
Location text,
Name text,
Capacity int,
Highest int,
Lowest int,
Average int,
PRIMARY KEY (Stadium_ID)
)
/*
Stadium_ID    Location    Name    Capacity    Highest    Lowest    Average
1    Raith Rovers    Stark's Park    10104    4812    1294    2106
2    Ayr United    Somerset Park    11998    2363    1057    1477
3    East Fife    Bayview Stadium    2000    1980    533    864
*/
…<omitted the singer, concert, and singer_in_concert tables for brevity>

-- How many singers do we have?
SELECT
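The layout above (a system comment block, a few worked examples, then the target question) can be assembled programmatically. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the single worked example and its question are short placeholders rather than records taken from Spider.

```python
QUESTION_PREAMBLE = "-- Please answer the following question using the tables above."


def format_shot(schema, question, sql):
    """One worked example: schema, the question as a comment, then its SQL."""
    return f"{schema.strip()}\n\n{QUESTION_PREAMBLE}\n-- {question.strip()}\n{sql.strip()}"


def build_few_shot_prompt(system_text, shots, target_schema, target_question):
    """System comment block, worked examples, then the question to answer."""
    parts = [f"/*\n{system_text.strip()}\n*/"]
    parts.extend(format_shot(*shot) for shot in shots)
    parts.append(f"{target_schema.strip()}\n\n-- {target_question.strip()}\nSELECT")
    return "\n\n".join(parts)


# Placeholder example; a real prompt would sample these from the training set.
shots = [
    (
        'CREATE TABLE "Campuses" ("Id" INTEGER PRIMARY KEY, "Campus" TEXT);',
        "How many campuses are there?",
        "SELECT count(*) FROM Campuses;",
    ),
]
prompt = build_few_shot_prompt(
    "You are a helpful assistant who answers questions about database tables "
    "by responding with SQL queries.",
    shots,
    "CREATE TABLE singer (Singer_ID int, Name text, PRIMARY KEY (Singer_ID))",
    "How many singers do we have?",
)
print(prompt)
```

Each worked example ends with a complete statement and a semicolon, while the final block stops at SELECT so the model supplies the rest.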

This new prompt has resulted in a score of 70.8, which is another 3.8 percentage point improvement over our previous score. We have raised the score nearly 10 percentage points from where we started just through simple prompting strategies.

	Easy	Medium	Hard	Extra	All
Few-shot with sample rows	83.9	79.1	55.7	44.6	70.8

We are probably now reaching the point of diminishing returns from tweaking our prompt. Let’s fine-tune the model to see what further gains can be made.

Fine-Tuning with LoRA

If we are fine-tuning the model, the first question is what training data to use. Spider includes a training dataset, so this seems like a good place to start. To fine-tune the model we will use QLoRA so that we can efficiently train the model on a single A100 80GB Databricks GPU cluster such as Standard_NC24ads_A100_v4. This can be completed in about four hours using the 7k records in the Spider training dataset. We have previously discussed fine-tuning with LoRA in an earlier blog post; readers can refer to that post for more details. We can follow standard training recipes using the trl, peft, and bitsandbytes libraries.

Although we are getting the training records from Spider, we still need to format them in a way that the model can learn from. The goal is to map each record, consisting of the schema (with sample rows), question, and SQL, into a single text string. We start by performing some processing on the raw Spider dataset. From the raw data we produce a dataset where each record consists of three fields: schema_with_rows, question, and query. The schema_with_rows field is derived from the tables corresponding to the question, following the formatting of the CREATE TABLE statement and rows used in the few-shot prompt earlier.

Next, load the tokenizer:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

We’ll define a mapping function that will convert each record from our processed Spider training dataset into a text string. We can use apply_chat_template from the tokenizer to conveniently format the text into the chat format expected by the Instruct model. Although this isn’t the exact same format we’re using for our few-shot prompt, the model generalizes well enough to work even if the boilerplate formatting of the prompts is slightly different.

def _mapper(rec):
    schema = rec["schema_with_rows"].strip()
    question = rec["question"].strip()
    query = rec["query"].strip()

    user_message = USER_MESSAGE_FORMAT.format(schema=schema, question=question)

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
        {"role": "assistant", "content": query},
    ]
    # tokenize=False returns the formatted string instead of token IDs.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=False
    )
    return {"text": prompt}

For SYSTEM_PROMPT we use the same system prompt used in the few-shot prompt earlier. For USER_MESSAGE_FORMAT we similarly use:

{schema}

Please answer the following question using the tables above.
{question}

With this function defined, all that is left is to transform the processed Spider dataset with it and save it as a JSONL file.

dataset.map(_mapper)
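To show roughly what each training record looks like once it has been rendered and saved, here is a dependency-free sketch. The `render_llama3_chat` function is a hand-written approximation of the Llama 3 Instruct chat layout, written only for illustration; real training code should rely on `tokenizer.apply_chat_template` as shown above. The `write_jsonl` helper, the file name, and the placeholder record are likewise assumptions.

```python
import json
import os
import tempfile


def render_llama3_chat(messages):
    """Hand-written approximation of the Llama 3 Instruct chat layout."""
    text = "<|begin_of_text|>"
    for message in messages:
        text += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        text += f"{message['content'].strip()}<|eot_id|>"
    return text


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


# Placeholder record; real records come from the processed Spider training set.
messages = [
    {"role": "system", "content": "You are a helpful assistant ..."},
    {
        "role": "user",
        "content": "CREATE TABLE singer (Singer_ID int)\n\nHow many singers do we have?",
    },
    {"role": "assistant", "content": "SELECT count(*) FROM singer;"},
]
record = {"text": render_llama3_chat(messages)}

path = os.path.join(tempfile.mkdtemp(), "spider_train.jsonl")
write_jsonl([record], path)
with open(path, encoding="utf-8") as handle:
    loaded = [json.loads(line) for line in handle]
print(loaded[0]["text"])
```

The assistant header that precedes the SQL is the same string listed as the data collator response template in the appendix, which marks where the completion begins within each training string.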

We are now ready to train. A few hours later we have a fine-tuned Llama3 8B Instruct. Rerunning our few-shot prompt on this new model resulted in a score of 79.9, which is another 9 percentage point improvement over our previous score. We have now raised the total score by ~19 percentage points over our simple zero-shot baseline.

	Easy	Medium	Hard	Extra	All
Few-shot with sample rows (Fine-tuned Llama3 8B Instruct)	91.1	85.9	72.4	54.8	79.9
Few-shot with sample rows (Llama3 8B Instruct)	83.9	79.1	55.7	44.6	70.8
Zero-shot with sample rows (Llama3 8B Instruct)	80.6	75.3	51.1	41.0	67.0
Zero-shot (Llama3 8B Instruct)	78.6	69.3	42.5	31.3	60.9

You might be wondering now how the Llama3 8B Instruct model and the fine-tuned version compare against a larger model such as Llama3 70B Instruct. We have repeated the evaluation process using the off-the-shelf 70B model on the dev dataset with eight A100 40 GB GPUs and recorded the results below.

	Easy	Medium	Hard	Extra	All
Few-shot with sample rows (Llama3 70B Instruct)	89.5	83.0	64.9	53.0	76.7
Zero-shot with sample rows (Llama3 70B Instruct)	83.1	81.8	59.2	36.7	71.1
Zero-shot (Llama3 70B Instruct)	82.3	80.5	57.5	31.9	69.2

As expected, comparing the off-the-shelf models, the 70B model beats the 8B model when measured using the same prompt format. But what’s surprising is that the fine-tuned Llama3 8B Instruct model scores higher than the Llama3 70B Instruct model by 3 percentage points. When focused on specific tasks such as text-to-SQL, fine-tuning can result in small models that are comparable in performance with models that are much larger in size.

Deploy to a Model Serving Endpoint

Llama3 is supported by Mosaic AI Model Serving, so we could even deploy our fine-tuned Llama3 model to an endpoint and use it to power applications. All we need to do is log the fine-tuned model to Unity Catalog and then create an endpoint using the UI. Once it is deployed we can query it using common libraries.
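As a loose illustration of what querying such an endpoint might look like, the sketch below builds (but does not send) an HTTP request using only the Python standard library. Everything specific here is an assumption: the workspace URL, the endpoint name `text2sql`, the token, and the chat-style payload, whose exact shape depends on how the model was logged and served. Check the endpoint's own documentation before relying on it.

```python
import json
import urllib.request


def build_chat_request(workspace_url, endpoint_name, token, system_prompt, user_message):
    """Build a POST request for a chat-style serving endpoint (assumed schema)."""
    payload = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": 0.0,  # greedy decoding, matching the evaluation setup
        "max_tokens": 1024,
    }
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values; nothing is sent over the network in this sketch.
request = build_chat_request(
    "https://my-workspace.example.com/",
    "text2sql",
    "MY_TOKEN",
    "You are a helpful assistant who answers questions about database tables.",
    "CREATE TABLE singer (Singer_ID int)\n\nHow many singers do we have?",
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; an OpenAI-compatible client or an SDK may be a more convenient choice in an application.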

Wrapping Up

We kicked off our journey with Llama3 8B Instruct on the Spider dev dataset using a zero-shot prompt, achieving a modest score of 60.9. By enhancing this with a few-shot prompt (complete with a system message, multiple examples, and sample rows) we boosted our score to 70.8. Further gains came from fine-tuning the model on the Spider training dataset, propelling us to 79.9 on Spider dev and 78.9 on Spider test. This 19-point climb from our starting point, and a 3-point lead over the base Llama3 70B Instruct, would secure a spot in the top-10 results on Spider.

Learn more about how to leverage the power of open source LLMs and the Data Intelligence Platform by registering for Data+AI Summit.

Appendix

Evaluation Setup

Generation was performed using vLLM, greedy decoding (temperature of 0), two A100 80 GB GPUs, and 1024 max new tokens. To evaluate the generations we used the test suite from the taoyds/test-suite-sql-eval repo on GitHub.
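Because the SQL-comment prompts end with SELECT, the generated text is a continuation rather than a full statement. This post does not describe its post-processing, so the following is only one plausible approach, with a hypothetical function name: keep everything up to the first semicolon and put the leading SELECT back.

```python
def completion_to_sql(completion):
    """Rebuild a full statement from text generated after the trailing SELECT."""
    # Keep only the first statement. A semicolon inside a string literal
    # would cut the query short, which this simple sketch does not handle.
    body = completion.split(";", 1)[0]
    # Collapse line breaks and repeated spaces so the query fits on one line.
    return "SELECT " + " ".join(body.split()) + ";"


sql = completion_to_sql(" count(*) FROM singer;\n\n-- Which stadium is largest?\nSELECT")
print(sql)
```

Cutting at the first semicolon also discards any further questions and answers the model may keep generating until it reaches the token limit.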

Training Setup

Here are the specific details about the fine-tuning setup:

Base Model	Llama3 8B Instruct
GPUs	Single A100 80GB
Max Steps	100
Spider train dataset records	7000
Lora R	16
Lora Alpha	32
Lora Dropout	0.1
Learning Rate	1.5e-4
Learning Rate Scheduler	Constant
Gradient Accumulation Steps	8
Gradient Checkpointing	True
Train Batch Size	12
LoRA Target Modules	q_proj,v_proj,k_proj,o_proj,gate_proj,up_proj,down_proj
Data Collator Response Template	<|start_header_id|>assistant<|end_header_id|>
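A quick back-of-the-envelope calculation shows how much data these settings cover. It assumes that the train batch size is per device and that the run used the single GPU listed above; under those assumptions, 100 steps amount to a little over one pass through the training records.

```python
train_batch_size = 12            # assumed to be per device
gradient_accumulation_steps = 8
num_gpus = 1
max_steps = 100
train_records = 7000

# Sequences contributing to each optimizer update.
sequences_per_step = train_batch_size * gradient_accumulation_steps * num_gpus
# Total sequences processed over the whole run.
sequences_seen = sequences_per_step * max_steps
# Approximate number of passes over the Spider training records.
passes_over_data = sequences_seen / train_records

print(sequences_per_step, sequences_seen, round(passes_over_data, 2))
```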

Zero-shot Prompt Example

This is the first record from the dev dataset we used for evaluation, formatted as a zero-shot prompt that includes the table schemas. The tables the question concerns are represented using the CREATE TABLE statements that created them.

CREATE TABLE stadium (
Stadium_ID int,
Location text,
Name text,
Capacity int,
Highest int,
Lowest int,
Average int,
PRIMARY KEY (Stadium_ID)
)

CREATE TABLE singer (
Singer_ID int,
Name text,
Country text,
Song_Name text,
Song_release_year text,
Age int,
Is_male bool,
PRIMARY KEY (Singer_ID)
)

CREATE TABLE concert (
concert_ID int,
concert_Name text,
Theme text,
Stadium_ID text,
Year text,
PRIMARY KEY (concert_ID),
FOREIGN KEY (Stadium_ID) REFERENCES stadium(Stadium_ID)
)

CREATE TABLE singer_in_concert (
concert_ID int,
Singer_ID text,
PRIMARY KEY (concert_ID,Singer_ID),
FOREIGN KEY (concert_ID) REFERENCES concert(concert_ID),
FOREIGN KEY (Singer_ID) REFERENCES singer(Singer_ID)
)

-- How many singers do we have?
SELECT

Zero-shot with Sample Rows Prompt Example

This is the first record from the dev dataset we used for evaluation, formatted as a zero-shot prompt that includes the table schemas and sample rows. The tables the question concerns are represented using the CREATE TABLE statements that created them. The rows were selected using “SELECT * FROM {table_name} LIMIT 3” from each table, with the column names appearing as a header.

CREATE TABLE stadium (
Stadium_ID int,
Location text,
Name text,
Capacity int,
Highest int,
Lowest int,
Average int,
PRIMARY KEY (Stadium_ID)
)
/*
Stadium_ID    Location    Name    Capacity    Highest    Lowest    Average
1    Raith Rovers    Stark's Park    10104    4812    1294    2106
2    Ayr United    Somerset Park    11998    2363    1057    1477
3    East Fife    Bayview Stadium    2000    1980    533    864
*/

CREATE TABLE singer (
Singer_ID int,
Name text,
Country text,
Song_Name text,
Song_release_year text,
Age int,
Is_male bool,
PRIMARY KEY (Singer_ID)
)
/*
Singer_ID    Name    Country    Song_Name    Song_release_year    Age    Is_male
1    Joe Sharp    Netherlands    You    1992    52    F
2    Timbaland    United States    Dangerous    2008    32    T
3    Justin Brown    France    Hey Oh    2013    29    T
*/

CREATE TABLE concert (
concert_ID int,
concert_Name text,
Theme text,
Stadium_ID text,
Year text,
PRIMARY KEY (concert_ID),
FOREIGN KEY (Stadium_ID) REFERENCES stadium(Stadium_ID)
)
/*
concert_ID    concert_Name    Theme    Stadium_ID    Year
1    Auditions    Free choice    1    2014
2    Super bootcamp    Free choice 2    2    2014
3    Home Visits    Bleeding Love    2    2015
*/

CREATE TABLE singer_in_concert (
concert_ID int,
Singer_ID text,
PRIMARY KEY (concert_ID,Singer_ID),
FOREIGN KEY (concert_ID) REFERENCES concert(concert_ID),
FOREIGN KEY (Singer_ID) REFERENCES singer(Singer_ID)
)
/*
concert_ID    Singer_ID
1    2
1    3
1    5
*/

-- How many singers do we have?
SELECT

Few-shot with Sample Rows Prompt Example

This is the first record from the dev dataset we used for evaluation, formatted as a few-shot prompt that includes the table schemas and sample rows. The tables the question concerns are represented using the CREATE TABLE statements that created them. The rows were selected using “SELECT * FROM {table_name} LIMIT 3” from each table, with the column names appearing as a header.

/*
You are a helpful assistant who answers questions about database tables by 
responding with SQL queries.  Users will provide you with a set of tables 
represented as CREATE TABLE statements.  Each CREATE TABLE statement may 
optionally be followed by the first few rows from the table in order to help 
write the correct SQL to answer questions. After the CREATE TABLE statements 
users will ask a question using a SQL comment starting with two dashes. You 
should answer the user's question by writing a SQL statement starting with 
SELECT and ending with a semicolon.
*/

CREATE TABLE "Campuses" (
	"Id" INTEGER PRIMARY KEY,
	"Campus" TEXT,
	"Location" TEXT,
	"County" TEXT,
	"Year" INTEGER
);
/*
Id    Campus    Location    County    Year
1    California State University-Bakersfield    Bakersfield    Kern    1965
2    California State University-Channel Islands    Camarillo    Ventura    2002
3    California State University-Chico    Chico    Butte    1887
*/

CREATE TABLE "csu_fees" (
	"Campus" INTEGER PRIMARY KEY,
	"Year" INTEGER,
	"CampusFee" INTEGER,
	FOREIGN KEY (Campus) REFERENCES Campuses(Id)
);
/*
Campus    Year    CampusFee
1    1996    1951
2    2003    1868
3    1996    2042
*/

CREATE TABLE "degrees" (
	"Year" INTEGER,
	"Campus" INTEGER,
	"Degrees" INTEGER,
	PRIMARY KEY (Year, Campus),
	FOREIGN KEY (Campus) REFERENCES Campuses(Id)
);
/*
Year    Campus    Degrees
1990    1    701
1991    1    681
1992    1    791
*/

CREATE TABLE "discipline_enrollments" (
	"Campus" INTEGER,
	"Discipline" INTEGER,
	"Year" INTEGER,
	"Undergraduate" INTEGER,
	"Graduate" INTEGER,
	PRIMARY KEY (Campus, Discipline),
	FOREIGN KEY (Campus) REFERENCES Campuses(Id)
);
/*
Campus    Discipline    Year    Undergraduate    Graduate
1    4    2004    248    0
1    5    2004    811    73
1    6    2004    199    0
*/

CREATE TABLE "enrollments" (
	"Campus" INTEGER,
	"Year" INTEGER,
	"TotalEnrollment_AY" INTEGER,
	"FTE_AY" INTEGER,
	PRIMARY KEY(Campus, Year),
	FOREIGN KEY (Campus) REFERENCES Campuses(Id)
);
/*
Campus    Year    TotalEnrollment_AY    FTE_AY
1    1956    384    123
1    1957    432    151
1    1958    422    178
*/

CREATE TABLE "faculty" (
	"Campus" INTEGER,
	"Year" INTEGER,
	"Faculty" REAL,
	FOREIGN KEY (Campus) REFERENCES Campuses(Id)
);
/*
Campus    Year    Faculty
1    2002    357.1
2    2002    48.4
3    2002    742.8
*/

-- Please answer the following question using the tables above.
-- Find the name of the campuses that is in Northridge, Los Angeles or in San Francisco, San Francisco.
SELECT Campus FROM Campuses WHERE Location="Northridge" AND County="Los Angeles" 
UNION SELECT Campus FROM Campuses WHERE Location="San Francisco" AND County="San Francisco";


CREATE TABLE Allergy_Type (
       Allergy 		  VARCHAR(20) PRIMARY KEY,
       AllergyType 	  VARCHAR(20)
);

CREATE TABLE Has_Allergy (
       StuID 		 INTEGER,
       Allergy 		 VARCHAR(20),
       FOREIGN KEY(StuID) REFERENCES Student(StuID),
       FOREIGN KEY(Allergy) REFERENCES Allergy_Type(Allergy)
);

CREATE TABLE Student (
        StuID        INTEGER PRIMARY KEY,
        LName        VARCHAR(12),
        Fname        VARCHAR(12),
        Age      INTEGER,
        Sex      VARCHAR(1),
        Major        INTEGER,
        Advisor      INTEGER,
        city_code    VARCHAR(3)
 );

-- Please answer the following question using the tables above.
-- Which allergy type has most number of allergies?
SELECT AllergyType FROM Allergy_Type GROUP BY AllergyType ORDER BY count(*) DESC LIMIT 1;


CREATE TABLE "building" (
"building_id" text,
"Name" text,
"Street_address" text,
"Years_as_tallest" text,
"Height_feet" int,
"Floors" int,
PRIMARY KEY("building_id")
);

CREATE TABLE "Institution" (
"Institution_id"  text,
"Institution" text,
"Location" text,
"Founded" real,
"Type" text,
"Enrollment" int,
"Team" text,
"Primary_Conference" text,
"building_id" text,
PRIMARY KEY("Institution_id"),
FOREIGN  KEY ("building_id") REFERENCES "building"("building_id")
);

CREATE TABLE "protein" (
"common_name" text,
"protein_name" text,
"divergence_from_human_lineage" real,
"accession_number" text,
"sequence_length" real,
"sequence_identity_to_human_protein" text,
"Institution_id" text,
PRIMARY KEY("common_name"),
FOREIGN KEY("Institution_id") REFERENCES "Institution"("Institution_id")
);


-- Please answer the following question using the tables above.
-- For each building, show the name of the building and the number of institutions in it.
SELECT T1.name, count(*) FROM building AS T1 JOIN Institution AS T2 ON 
T1.building_id=T2.building_id GROUP BY T1.building_id;


CREATE TABLE stadium (
Stadium_ID int,
Location text,
Name text,
Capacity int,
Highest int,
Lowest int,
Average int,
PRIMARY KEY (Stadium_ID)
)
/*
Stadium_ID    Location    Name    Capacity    Highest    Lowest    Average
1    Raith Rovers    Stark's Park    10104    4812    1294    2106
2    Ayr United    Somerset Park    11998    2363    1057    1477
3    East Fife    Bayview Stadium    2000    1980    533    864
*/

CREATE TABLE singer (
Singer_ID int,
Name text,
Country text,
Song_Name text,
Song_release_year text,
Age int,
Is_male bool,
PRIMARY KEY (Singer_ID)
)
/*
Singer_ID    Name    Country    Song_Name    Song_release_year    Age    Is_male
1    Joe Sharp    Netherlands    You    1992    52    F
2    Timbaland    United States    Dangerous    2008    32    T
3    Justin Brown    France    Hey Oh    2013    29    T
*/

CREATE TABLE concert (
concert_ID int,
concert_Name text,
Theme text,
Stadium_ID text,
Year text,
PRIMARY KEY (concert_ID),
FOREIGN KEY (Stadium_ID) REFERENCES stadium(Stadium_ID)
)
/*
concert_ID    concert_Name    Theme    Stadium_ID    Year
1    Auditions    Free choice    1    2014
2    Super bootcamp    Free choice 2    2    2014
3    Home Visits    Bleeding Love    2    2015
*/

CREATE TABLE singer_in_concert (
concert_ID int,
Singer_ID text,
PRIMARY KEY (concert_ID,Singer_ID),
FOREIGN KEY (concert_ID) REFERENCES concert(concert_ID),
FOREIGN KEY (Singer_ID) REFERENCES singer(Singer_ID)
)
/*
concert_ID    Singer_ID
1    2
1    3
1    5
*/

-- How many singers do we have?
SELECT
