第5章 多表查询
# 第5章 多表查询
在第2章中,我展示了如何通过一种称为规范化(normalization)的过程,将相关概念拆分为独立的部分。这个过程的最终结果是得到了两个表:person
表和favorite_food
表。然而,如果你想要生成一份显示个人姓名、地址和最喜欢食物的单一报告,就需要一种机制将这两个表中的数据重新组合在一起;这种机制被称为连接(join),本章将重点介绍最简单且最常见的连接类型——内连接(inner join)。第10章将展示所有不同的连接类型。
# 什么是连接?
对单个表进行查询并不罕见,但你会发现,大多数查询都需要涉及两个、三个甚至更多的表。为了说明这一点,让我们先查看employee
表和department
表的定义,然后定义一个从这两个表中检索数据的查询:
mysql> DESC employee;
+---------------------+---------------------------------+------+-----+---------+
| Field | Type | Null | Key | Default |
+---------------------+---------------------------------+------+-----+---------+
| emp_id | smallint(5) unsigned | NO | PRI | NULL |
| fname | varchar(20) | NO | | NULL |
| lname | varchar(20) | NO | | NULL |
| start_date | date | NO | | NULL |
| end_date | date | YES | | NULL |
| superior_emp_id | smallint(5) unsigned | YES | MUL | NULL |
| dept_id | smallint(5) unsigned | YES | MUL | NULL |
| title | varchar(20) | YES | | NULL |
| assigned_branch_id | smallint(5) unsigned | YES | MUL | NULL |
+---------------------+---------------------------------+------+-----+---------+
9 rows in set (0.11 sec)
mysql> DESC department;
+-----------+---------------------+------+-----+---------+
| Field | Type | Null | Key | Default |
+-----------+---------------------+------+-----+---------+
| dept_id | smallint(5) unsigned | No | PRI | NULL |
| name | varchar(20) | No | | NULL |
+-----------+---------------------+------+-----+---------+
2 rows in set (0.03 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
假设你想要检索每位员工的名字和姓氏,以及他们所属部门的名称。因此,你的查询需要检索employee.fname
、employee.lname
和department.name
列。但是,如何在同一个查询中从两个表中检索数据呢?答案在于employee.dept_id
列,该列保存了每位员工所属部门的ID(更正式地说,employee.dept_id
列是department
表的外键(foreign key))。你很快就会看到,该查询会指示服务器使用employee.dept_id
列作为employee
表和department
表之间的桥梁,从而允许将两个表中的列都包含在查询结果集中。这种操作被称为连接。
# 笛卡尔积
最简单的方法是将employee
表和department
表放入查询的FROM
子句中,看看会发生什么。下面这个查询检索员工的名字、姓氏以及部门名称,FROM
子句中列出了两个表,并用JOIN
关键字分隔:
mysql> SELECT e.fname, e.lname, d.name
-> FROM employee e JOIN department d;
+-----------+-----------------+---------------------+
| fname | lname | name |
+-----------+-----------------+---------------------+
| Michael | Smith | Operations |
| Michael | Smith | Loans |
| Michael | Smith | Administration |
| Susan | Barker | Operations |
| Susan | Barker | Loans |
| Susan | Barker | Administration |
| Robert | Tyler | Operations |
| Robert | Tyler | Loans |
| Robert | Tyler | Administration |
| Susan | Hawthorne | Operations |
| Susan | Hawthorne | Loans |
| Susan | Hawthorne | Administration |
| John | Gooding | Operations |
| John | Gooding | Loans |
| John | Gooding | Administration |
| Helen | Fleming | Operations |
| Helen | Fleming | Loans |
| Helen | Fleming | Administration |
| Chris | Tucker | Operations |
| Chris | Tucker | Loans |
| Chris | Tucker | Administration |
| Sarah | Parker | Operations |
| Sarah | Parker | Loans |
| Sarah | Parker | Administration |
| Jane | Grossman | Operations |
| Jane | Grossman | Loans |
| Jane | Grossman | Administration |
| Paula | Roberts | Operations |
| Paula | Roberts | Loans |
| Paula | Roberts | Administration |
| Thomas | Ziegler | Operations |
| Thomas | Ziegler | Loans |
| Thomas | Ziegler | Administration |
| Samantha | Jameson | Operations |
| Samantha | Jameson | Loans |
| Samantha | Jameson | Administration |
| John | Blake | Operations |
| John | Blake | Loans |
| John | Blake | Administration |
| Cindy | Mason | Operations |
| Cindy | Mason | Loans |
| Cindy | Mason | Administration |
| Frank | Portman | Operations |
| Frank | Portman | Loans |
| Frank | Portman | Administration |
| Theresa | Markham | Operations |
| Theresa | Markham | Loans |
| Theresa | Markham | Administration |
| Beth | Fowler | Operations |
| Beth | Fowler | Loans |
| Beth | Fowler | Administration |
| Rick | Tulman | Operations |
| Rick | Tulman | Loans |
| Rick | Tulman | Administration |
+-----------+-----------------+---------------------+
54 rows in set (0.23 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
嗯…… 只有18名员工和3个不同的部门,那么结果集怎么会有54行呢?仔细观察可以发现,这18名员工的信息重复了3次,除了部门名称之外,所有数据都相同。由于查询没有指定两个表应该如何连接,数据库服务器生成了笛卡尔积(Cartesian product),即两个表的所有排列组合(18名员工×3个部门 = 54种排列组合)。这种连接类型被称为交叉连接(cross join),并且很少被使用(至少是很少被有意使用)。交叉连接是我们将在第10章学习的连接类型之一。
# 内连接
要修改前面的查询,使结果集中只包含18行(每位员工对应一行),你需要描述两个表之间的关系。前面我展示过,employee.dept_id
列是两个表之间的链接,所以需要将这个信息添加到FROM
子句的ON
子句中:
mysql> SELECT e.fname, e.lname, d.name
-> FROM employee e JOIN department d
-> ON e.dept_id = d.dept_id;
+-----------+-----------------+---------------------+
| fname | lname | name |
+-----------+-----------------+---------------------+
| Michael | Smith | Administration |
| Susan | Barker | Administration |
| Robert | Tyler | Administration |
| Susan | Hawthorne | Operations |
| John | Gooding | Loans |
| Helen | Fleming | Operations |
| Chris | Tucker | Operations |
| Sarah | Parker | Operations |
| Jane | Grossman | Operations |
| Paula | Roberts | Operations |
| Thomas | Ziegler | Operations |
| Samantha | Jameson | Operations |
| John | Blake | Operations |
| Cindy | Mason | Operations |
| Frank | Portman | Operations |
| Theresa | Markham | Operations |
| Beth | Fowler | Operations |
| Rick | Tulman | Operations |
+-----------+-----------------+---------------------+
18 rows in set (0.00 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
由于添加了ON
子句,现在结果集有18行,这正是我们期望的行数。ON
子句指示服务器通过使用dept_id
列在两个表之间进行遍历,从而连接employee
表和department
表。例如,employee
表中Susan Hawthorne的行在dept_id
列中的值为1(示例中未显示)。服务器使用这个值在department
表中查找dept_id
列值为1的行,然后从该行的name
列中检索出“Operations”这个值。
如果一个表的dept_id
列中有值,而另一个表中没有,那么包含该值的行的连接就会失败,这些行将被排除在结果集之外。这种连接类型被称为内连接,它是最常用的连接类型。需要说明的是,如果department
表包含营销部门的第四行数据,但没有员工被分配到该部门,那么营销部门将不会包含在结果集中。同样,如果某些员工被分配到了department
表中不存在的部门ID 99,那么这些员工也将被排除在结果集之外。如果你希望无论是否存在匹配项,都包含一个表或另一个表中的所有行,则需要指定外连接(outer join),但本书后面会介绍这个内容。
在前面的示例中,我在FROM
子句中没有指定使用哪种连接类型。然而,当你希望使用内连接来连接两个表时,应该在FROM
子句中明确指定;下面是相同的示例,添加了连接类型(注意INNER
关键字):
mysql> SELECT e.fname, e.lname, d.name
-> FROM employee e INNER JOIN department d
-> ON e.dept_id = d.dept_id;
+-----------+-----------------+---------------------+
| fname | lname | name |
+-----------+-----------------+---------------------+
| Michael | Smith | Administration |
| Susan | Barker | Administration |
| Robert | Tyler | Administration |
| Susan | Hawthorne | Operations |
| John | Gooding | Loans |
| Helen | Fleming | Operations |
| Chris | Tucker | Operations |
| Sarah | Parker | Operations |
| Jane | Grossman | Operations |
| Paula | Roberts | Operations |
| Thomas | Ziegler | Operations |
| Samantha | Jameson | Operations |
| John | Blake | Operations |
| Cindy | Mason | Operations |
| Frank | Portman | Operations |
| Theresa | Markham | Operations |
| Beth | Fowler | Operations |
| Rick | Tulman | Operations |
+-----------+-----------------+---------------------+
18 rows in set (0.00 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
如果你不指定连接类型,服务器将默认进行内连接。然而,正如你将在本书后面看到的,连接类型有好几种,所以你应该养成明确指定所需连接类型的习惯。
如果用于连接两个表的列名相同(前面的查询就是这种情况),你可以使用USING
子句来代替ON
子句,如下所示:
mysql> SELECT e.fname, e.lname, d.name
-> FROM employee e INNER JOIN department d
-> USING (dept_id);
+-----------+-----------------+---------------------+
| fname | lname | name |
+-----------+-----------------+---------------------+
| Michael | Smith | Administration |
| Susan | Barker | Administration |
| Robert | Tyler | Administration |
| Susan | Hawthorne | Operations |
| John | Gooding | Loans |
| Helen | Fleming | Operations |
| Chris | Tucker | Operations |
| Sarah | Parker | Operations |
| Jane | Grossman | Operations |
| Paula | Roberts | Operations |
| Thomas | Ziegler | Operations |
| Samantha | Jameson | Operations |
| John | Blake | Operations |
| Cindy | Mason | Operations |
| Frank | Portman | Operations |
| Theresa | Markham | Operations |
| Beth | Fowler | Operations |
| Rick | Tulman | Operations |
+-----------+-----------------+---------------------+
18 rows in set (0.01 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
由于USING
是一种简写表示法,只能在特定情况下使用,所以我更喜欢始终使用ON
子句以避免混淆。
# ANSI连接语法
本书中用于表连接的表示法源自ANSI SQL标准的SQL92版本。所有主流数据库(甲骨文数据库(Oracle Database)、微软SQL Server、MySQL、IBM DB2通用数据库(IBM DB2 Universal Database)以及赛贝斯自适应服务器(Sybase Adaptive Server))均采用了SQL92连接语法。由于这些数据库服务器大多在SQL92规范发布前就已存在,所以它们也都包含一种旧的连接语法。例如,所有这些服务器都能理解前面查询的以下变体:
mysql> SELECT e.fname, e.lname, d.name
-> FROM employee e, department d
-> WHERE e.dept_id = d.dept_id;
+----------------+-----------------+----------------------------------+
| fname | lname | name |
+----------------+-----------------+----------------------------------+
| Michael | Smith | Administration |
| Susan | Barker | Administration |
| Robert | Tyler | Administration |
| Susan | Hawthorne | Operations |
| John | Gooding | Loans |
| Helen | Fleming | Operations |
| Chris | Tucker | Operations |
| Sarah | Parker | Operations |
| Jane | Grossman | Operations |
| Paula | Roberts | Operations |
| Thomas | Ziegler | Operations |
| Samantha | Jameson | Operations |
| John | Blake | Operations |
| Cindy | Mason | Operations |
| Frank | Portman | Operations |
| Theresa | Markham | Operations |
| Beth | Fowler | Operations |
| Rick | Tulman | Operations |
+----------------+-----------------+----------------------------------+
18 rows in set (0.01 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
这种旧的连接指定方法不包含on
子句;相反,在from
子句中用逗号分隔列出表名,连接条件则包含在where
子句中。虽然你可能会决定忽略SQL92语法而选择旧的连接语法,但ANSI连接语法具有以下优点:
- 连接条件和过滤条件被分别放在两个不同的子句中(即
on
子句和where
子句),这使得查询更容易理解。 - 每对表的连接条件都包含在各自的
on
子句中,从而降低了连接部分被误遗漏的可能性。 - 使用SQL92连接语法的查询可以在不同的数据库服务器间移植,而旧语法在不同服务器上会略有差异。 对于同时包含连接条件和过滤条件的复杂查询而言,SQL92连接语法的优势更为明显。考虑以下查询,它返回所有由经验丰富的出纳员(2007年前入职)目前分配到沃本分行(Woburn branch)所开设的账户:
mysql> SELECT a.account_id, a.cust_id, a.open_date, a.product_cd
-> FROM account a, branch b, employee e
-> WHERE a.open_emp_id = e.emp_id
-> AND e.start_date < '2007-01-01'
-> AND e.assigned_branch_id = b.branch_id
-> AND (e.title = 'Teller' OR e.title = 'Head Teller')
-> AND b.name = 'Woburn Branch';
+-------------+---------+-------------+------------+
| account_id | cust_id | open_date | product_cd |
+-------------+---------+-------------+------------+
| 1 | 1 | 2000-01-15 | CHK |
| 2 | 1 | 2000-01-15 | SAV |
| 3 | 1 | 2004-06-30 | CD |
| 4 | 2 | 2001-03-12 | CHK |
| 5 | 2 | 2001-03-12 | SAV |
| 17 | 7 | 2004-01-12 | CD |
| 27 | 11 | 2004-03-22 | BUS |
+-------------+---------+-------------+------------+
7 rows in set (0.00 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
在这个查询中,不太容易判断where
子句中的哪些条件是连接条件,哪些是过滤条件。也不容易直接看出使用了哪种连接类型(要确定连接类型,你需要仔细查看where
子句中的连接条件,看是否使用了特殊字符),而且也难以确定是否有连接条件被误遗漏。下面是使用SQL92连接语法的相同查询:
mysql> SELECT a.account_id, a.cust_id, a.open_date, a.product_cd
-> FROM account a INNER JOIN employee e
-> ON a.open_emp_id = e.emp_id
-> INNER JOIN branch b
-> ON e.assigned_branch_id = b.branch_id
-> WHERE e.start_date < '2007-01-01'
-> AND (e.title = 'Teller' OR e.title = 'Head Teller')
-> AND b.name = 'Woburn Branch';
+-------------+---------+-------------+------------+
| account_id | cust_id | open_date | product_cd |
+-------------+---------+-------------+------------+
| 1 | 1 | 2000-01-15 | CHK |
| 1 | 1 | 2000-01-15 | SAV |
| 1 | 1 | 2004-06-30 | CD |
| 2 | 2 | 2001-03-12 | CHK |
| 2 | 2 | 2001-03-12 | SAV |
| 17 | 7 | 2004-01-12 | CD |
| 27 | 11 | 2004-03-22 | BUS |
+-------------+---------+-------------+------------+
7 rows in set (0.05 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
希望你会认同,使用SQL92连接语法的版本更容易理解。
# 连接三个或更多表
连接三个表与连接两个表类似,但有一个细微差别。在两表连接中,from
子句中有两个表和一种连接类型,并且有一个on
子句来定义表的连接方式。在三表连接中,from
子句中有三个表和两种连接类型,以及两个on
子句。下面是另一个两表连接查询的示例:
mysql> SELECT a.account_id, c.fed_id
-> FROM account a INNER JOIN customer c
-> ON a.cust_id = c.cust_id
-> WHERE c.cust_type_cd = 'B';
+-------------+------------------+
| account_id | fed_id |
+-------------+------------------+
| 24 | 04-1111111 |
| 25 | 04-1111111 |
| 27 | 04-2222222 |
| 28 | 04-3333333 |
| 29 | 04-4444444 |
+-------------+------------------+
5 rows in set (0.15 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
这个查询返回所有企业账户的账户ID和联邦税号,现在来看应该相当直观。然而,如果在查询中添加employee
表,以便同时检索开设每个账户的出纳员的姓名,查询会如下所示:
mysql> SELECT a.account_id, c.fed_id, e.fname, e.lname
-> FROM account a INNER JOIN customer c
-> ON a.cust_id = c.cust_id
-> INNER JOIN employee e
-> ON a.open_emp_id = e.emp_id
-> WHERE c.cust_type_cd = 'B';
+-------------+------------------+----------+----------+
| account_id | fed_id | fname | lname |
+-------------+------------------+----------+----------+
| 24 | 04-1111111 | Theresa | Markham |
| 25 | 04-1111111 | Theresa | Markham |
| 27 | 04-2222222 | Paula | Roberts |
| 28 | 04-3333333 | Theresa | Markham |
| 29 | 04-4444444 | John | Blake |
+-------------+------------------+----------+----------+
5 rows in set (0.00 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
现在,from
子句中列出了三个表、两种连接类型和两个on
子句,所以内容变得更加繁杂。乍一看,表名的顺序可能会让你认为employee
表正在与customer
表连接,因为首先列出的是account
表,然后是customer
表,最后是employee
表。然而,如果你交换前两个表的顺序,你会得到完全相同的结果:
mysql> SELECT a.account_id, c.fed_id, e.fname, e.lname
-> FROM customer c INNER JOIN account a
-> ON a.cust_id = c.cust_id
-> INNER JOIN employee e
-> ON a.open_emp_id = e.emp_id
-> WHERE c.cust_type_cd = 'B';
+-------------+------------------+----------+----------+
| account_id | fed_id | fname | lname |
+-------------+------------------+----------+----------+
| 24 | 04-1111111 | Theresa | Markham |
| 25 | 04-1111111 | Theresa | Markham |
| 27 | 04-2222222 | Paula | Roberts |
| 28 | 04-3333333 | Theresa | Markham |
| 29 | 04-4444444 | John | Blake |
+-------------+------------------+----------+----------+
5 rows in set (0.09 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
现在customer
表排在第一位,接着是account
表,然后是employee
表。由于on
子句没有改变,所以结果是相同的。为了完整性,下面再次给出相同的查询,但表的顺序完全颠倒(从employee
到account
再到customer
):
mysql> SELECT a.account_id, c.fed_id, e.fname, e.lname
-> FROM employee e INNER JOIN account a
-> ON e.emp_id = a.open_emp_id
-> INNER JOIN customer c
-> ON a.cust_id = c.cust_id
-> WHERE c.cust_type_cd = 'B';
+-------------+------------------+----------+----------+
| account_id | fed_id | fname | lname |
+-------------+------------------+----------+----------+
| 24 | 04-1111111 | Theresa | Markham |
| 25 | 04-1111111 | Theresa | Markham |
| 27 | 04-2222222 | Paula | Roberts |
| 28 | 04-3333333 | Theresa | Markham |
| 29 | 04-4444444 | John | Blake |
+-------------+------------------+----------+----------+
5 rows in set (0.00 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
# 连接顺序重要吗?
如果你对账户/员工/客户查询的三个版本为什么会产生相同的结果感到困惑,请记住,SQL是一种非过程性语言。这意味着你只需描述想要检索的内容以及涉及哪些数据库对象,而由数据库服务器来决定如何以最佳方式执行你的查询。服务器会利用从数据库对象收集的统计信息,从三个表中选择一个作为起点(所选的表此后称为驱动表),然后决定按什么顺序连接其余的表。因此,表在from
子句中出现的顺序并不重要。
然而,如果你认为查询中的表应该始终按特定顺序连接,在MySQL中,你可以按所需顺序放置表,然后指定关键字STRAIGHT_JOIN
;在SQL Server中,你可以请求FORCE ORDER
选项;在甲骨文数据库(Oracle Database)中,你可以使用ORDERED
或LEADING
优化器提示。例如,要告诉MySQL服务器使用customer
表作为驱动表,然后连接account
表和employee
表,你可以这样做:
mysql> SELECT STRAIGHT_JOIN a.account_id, c.fed_id, e.fname, e.lname
-> FROM customer c INNER JOIN account a
-> ON a.cust_id = c.cust_id
-> INNER JOIN employee e
-> ON a.open_emp_id = e.emp_id
-> WHERE c.cust_type_cd = 'B';
2
3
4
5
6
可以把使用三个或更多表的查询想象成一个从山上滚下的雪球。前两个表推动雪球滚动,随后的每个表在雪球下山时堆积在上面。你可以把雪球看作是中间结果集,随着后续表的连接,它会包含越来越多的列。因此,employee
表实际上并不是与account
表连接,而是与customer
表和account
表连接时创建的中间结果集连接。(如果你想知道我为什么选择雪球这个比喻,是因为我在新英格兰的冬天写这一章:到目前为止已经下了110英寸的雪,明天还会下更多。真开心。)
# 使用子查询作为表
你已经见过几个使用三个表的查询示例,但有一个变体值得一提:如果部分数据集是由子查询生成的,该怎么办。子查询是第9章的重点,但我在上一章已经介绍了在from
子句中使用子查询的概念。下面是前面一个查询(查找目前分配到沃本分行的经验丰富的出纳员开设的所有账户)的另一个版本,它将account
表与针对branch
表和employee
表的子查询进行连接:
1 SELECT a.account_id, a.cust_id, a.open_date, a.product_cd
2 FROM account a INNER JOIN
3 (SELECT emp_id, assigned_branch_id
4 FROM employee
5 WHERE start_date < '2007-01-01'
6 AND (title = 'Teller' OR title = 'Head Teller')) e
7 ON a.open_emp_id = e.emp_id
8 INNER JOIN
9 (SELECT branch_id
10 FROM branch
11 WHERE name = 'Woburn Branch') b
12 ON e.assigned_branch_id = b.branch_id;
2
3
4
5
6
7
8
9
10
11
12
第一个子查询从第3行开始,别名为e
,它查找所有经验丰富的出纳员。第二个子查询从第9行开始,别名为b
,它查找沃本分行的ID。首先,account
表通过员工ID与经验丰富的出纳员子查询进行连接,然后将结果表通过分行ID与沃本分行子查询进行连接。结果与前一个版本的查询相同(你可以自己试试),但查询看起来却大不相同。
这里并没有什么令人惊讶的地方,但可能需要一些时间来弄清楚是怎么回事。例如,注意主查询中没有where
子句;由于所有过滤条件都是针对employee
表和branch
表的,所以过滤条件都在子查询中,因此主查询中不需要任何过滤条件。一种可视化过程的方法是运行子查询并查看结果集。以下是针对employee
表的第一个子查询的结果:
mysql> SELECT emp_id, assigned_branch_id
-> FROM employee
-> WHERE start_date < '2007-01-01'
-> AND (title = 'Teller' OR title = 'Head Teller');
+--------+-----------------------+
| emp_id | assigned_branch_id |
+--------+-----------------------+
| 8 | 1 |
| 9 | 1 |
| 10 | 2 |
| 11 | 2 |
| 13 | 3 |
| 14 | 3 |
| 16 | 4 |
| 17 | 4 |
| 18 | 4 |
+--------+-----------------------+
9 rows in set (0.03 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
这个结果集由一组员工ID及其相应的分行ID组成。当它们通过emp_id
列与account
表连接时,你现在就有了一个中间结果集,其中包含account
表的所有行,并且额外增加了一列,用于存放开设每个账户的员工所在分行的ID。以下是针对branch
表的第二个子查询的结果:
mysql> SELECT branch_id
-> FROM branch
-> WHERE name = 'Woburn Branch';
+-----------+
| branch_id |
+-----------+
| 2 |
+-----------+
1 row in set (0.02 sec)
2
3
4
5
6
7
8
9
这个查询返回一行,其中包含单个列:沃本分行的ID。这个表与中间结果集的assigned_branch_id
列进行连接,从而将所有由非沃本分行员工开设的账户从最终结果集中过滤掉。
# 多次使用同一个表
在连接多个表时,你可能会发现需要多次连接同一个表。例如,在示例数据库中,account
表(账户开户的分支机构)和employee
表(员工工作的分支机构)都有指向branch
表的外键。如果你想在结果集中同时包含这两个分支机构的信息,你可以在from
子句中两次引入branch
表,一次与employee
表连接,一次与account
表连接。为使此操作生效,你需要给branch
表的每个实例一个不同的别名,这样服务器就能知道在各个子句中你引用的是哪一个实例,如下所示:
mysql> SELECT a.account_id, e.emp_id,
-> b_a.name open_branch, b_e.name emp_branch
-> FROM account a INNER JOIN branch b_a
-> ON a.open_branch_id = b_a.branch_id
-> INNER JOIN employee e
-> ON a.open_emp_id = e.emp_id
-> INNER JOIN branch b_e
-> ON e.assigned_branch_id = b_e.branch_id
-> WHERE a.product_cd = 'CHK';
+------------------+-----------+-----------------------+-----------------------+
| account_id | emp_id | open_branch | emp_branch |
+------------------+-----------+-----------------------+-----------------------+
| 10 | 1 | Headquarters | Headquarters |
| 14 | 1 | Headquarters | Headquarters |
| 21 | 1 | Headquarters | Headquarters |
| 1 | 10 | Woburn Branch | Woburn Branch |
| 4 | 10 | Woburn Branch | Woburn Branch |
| 7 | 13 | Quincy Branch | Quincy Branch |
| 13 | 16 | So. NH Branch | So. NH Branch |
| 18 | 16 | So. NH Branch | So. NH Branch |
| 24 | 16 | So. NH Branch | So. NH Branch |
| 28 | 16 | So. NH Branch | So. NH Branch |
+------------------+-----------+-----------------------+-----------------------+
10 rows in set (0.16 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
这个查询展示了每个支票账户的开户人、开户的分支机构,以及开户员工目前所属的分支机构。branch
表被引入了两次,分别使用别名b_a
和b_e
。通过给branch
表的每个实例赋予不同的别名,服务器就能理解你引用的是与account
表连接的实例,还是与employee
表连接的实例。因此,这是一个需要使用表别名的查询示例。
# 自连接
你不仅可以在同一个查询中多次引入同一个表,实际上还可以将一个表与自身进行连接。乍一看,这可能有点奇怪,但这么做是有合理原因的。例如,employee
表包含一个自引用外键,这意味着它有一个列(superior_emp_id
)指向同一张表中的主键。这个列指向员工的经理(除非该员工是最高领导,这种情况下该列为空值)。使用自连接,你可以编写一个查询,列出每个员工的姓名以及其经理的姓名:
mysql> SELECT e.fname, e.lname,
-> e_mgr.fname mgr_fname, e_mgr.lname mgr_lname
-> FROM employee e INNER JOIN employee e_mgr
-> ON e.superior_emp_id = e_mgr.emp_id;
+-----------------+-----------------+-----------------+-----------------+
| fname | lname | mgr_fname | mgr_lname |
+-----------------+-----------------+-----------------+-----------------+
| Susan | Barker | Michael | Smith |
| Robert | Tyler | Michael | Smith |
| Susan | Hawthorne | Robert | Tyler |
| John | Gooding | Susan | Hawthorne |
| Helen | Fleming | Susan | Hawthorne |
| Chris | Tucker | Helen | Fleming |
| Sarah | Parker | Helen | Fleming |
| Jane | Grossman | Helen | Fleming |
| Paula | Roberts | Susan | Hawthorne |
| Thomas | Ziegler | Paula | Roberts |
| Samantha | Jameson | Paula | Roberts |
| John | Blake | Susan | Hawthorne |
| Cindy | Mason | John | Blake |
| Frank | Portman | John | Blake |
| Theresa | Markham | Susan | Hawthorne |
| Beth | Fowler | Theresa | Markham |
| Rick | Tulman | Theresa | Markham |
+-----------------+-----------------+-----------------+-----------------+
17 rows in set (0.00 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
这个查询包含了employee
表的两个实例:一个用于提供员工姓名(表别名是e
),另一个用于提供经理姓名(表别名是e_mgr
)。on
子句使用这些别名,通过superior_emp_id
外键将employee
表与自身进行连接。这是另一个需要使用表别名的查询示例;否则,服务器将无法知道你引用的是员工还是员工的经理。
虽然employee
表中有18行记录,但该查询只返回了17行;银行行长迈克尔·史密斯(Michael Smith)没有上级(他的superior_emp_id
列是空值),所以针对他那一行的连接失败了。要将迈克尔·史密斯包含在结果集中,你需要使用外连接(outer join),我们将在第10章介绍外连接。
# 等值连接与非等值连接
到目前为止展示的所有多表查询都使用了等值连接,这意味着只有当两个表中的值匹配时,连接才会成功。等值连接总是使用等号,例如:
ON e.assigned_branch_id = b.branch_id
虽然你的大多数查询会使用等值连接,但你也可以通过值的范围来连接表,这被称为非等值连接。下面是一个通过值的范围进行连接的查询示例:
SELECT e.emp_id, e.fname, e.lname, e.start_date
FROM employee e INNER JOIN product p
ON e.start_date >= p.date_offered
AND e.start_date <= p.date_retired
WHERE p.name = 'no-fee checking';
2
3
4
5
这个查询连接了两个没有外键关系的表。其目的是找出在“无手续费支票”产品推出期间开始在银行工作的所有员工。因此,员工的入职日期必须在该产品推出日期和停止发售日期之间。
你可能还会遇到自非等值连接的需求,即使用非等值连接将一个表与自身连接。例如,假设运营经理决定为所有银行出纳员举办一场国际象棋比赛。你被要求创建一份所有对阵组合的列表。你可能会尝试将employee
表中所有出纳员(title = 'Teller'
)的记录进行自连接,并返回所有emp_id
不匹配的行(因为一个人不能和自己下棋):
mysql> SELECT e1.fname, e1.lname, 'VS' vs, e2.fname, e2.lname
-> FROM employee e1 INNER JOIN employee e2
-> ON e1.emp_id != e2.emp_id
-> WHERE e1.title = 'Teller' AND e2.title = 'Teller';
+-----------------+-----------------+-------+-----------------+-----------------+
| fname | lname | vs | fname | lname |
+-----------------+-----------------+-------+-----------------+-----------------+
| Sarah | Parker | VS | Chris | Tucker |
| Jane | Grossman | VS | Chris | Tucker |
| Thomas | Ziegler | VS | Chris | Tucker |
| Samantha | Jameson | VS | Chris | Tucker |
| Cindy | Mason | VS | Chris | Tucker |
| Frank | Portman | VS | Chris | Tucker |
| Beth | Fowler | VS | Chris | Tucker |
| Rick | Tulman | VS | Chris | Tucker |
| Chris | Tucker | VS | Sarah | Parker |
| Jane | Grossman | VS | Sarah | Parker |
| Thomas | Ziegler | VS | Sarah | Parker |
| Samantha | Jameson | VS | Sarah | Parker |
| Cindy | Mason | VS | Sarah | Parker |
| Frank | Portman | VS | Sarah | Parker |
| Beth | Fowler | VS | Sarah | Parker |
| Rick | Tulman | VS | Sarah | Parker |
| Chris | Tucker | VS | Jane | Grossman |
| Thomas | Ziegler | VS | Jane | Grossman |
| Samantha | Jameson | VS | Jane | Grossman |
| Cindy | Mason | VS | Jane | Grossman |
| Frank | Portman | VS | Jane | Grossman |
| Beth | Fowler | VS | Jane | Grossman |
| Rick | Tulman | VS | Jane | Grossman |
| Chris | Tucker | VS | Thomas | Ziegler |
| Sarah | Parker | VS | Thomas | Ziegler |
| Jane | Grossman | VS | Thomas | Ziegler |
| Samantha | Jameson | VS | Thomas | Ziegler |
| Cindy | Mason | VS | Thomas | Ziegler |
| Frank | Portman | VS | Thomas | Ziegler |
| Beth | Fowler | VS | Thomas | Ziegler |
| Rick | Tulman | VS | Thomas | Ziegler |
| Chris | Tucker | VS | Samantha | Jameson |
| Sarah | Parker | VS | Samantha | Jameson |
| Jane | Grossman | VS | Samantha | Jameson |
| Thomas | Ziegler | VS | Samantha | Jameson |
| Cindy | Mason | VS | Samantha | Jameson |
| Frank | Portman | VS | Samantha | Jameson |
| Beth | Fowler | VS | Samantha | Jameson |
| Rick | Tulman | VS | Samantha | Jameson |
| Chris | Tucker | VS | Cindy | Mason |
| Sarah | Parker | VS | Cindy | Mason |
| Jane | Grossman | VS | Cindy | Mason |
| Thomas | Ziegler | VS | Cindy | Mason |
| Samantha | Jameson | VS | Cindy | Mason |
| Frank | Portman | VS | Cindy | Mason |
| Beth | Fowler | VS | Cindy | Mason |
| Rick | Tulman | VS | Cindy | Mason |
| Chris | Tucker | VS | Frank | Portman |
| Sarah | Parker | VS | Frank | Portman |
| Jane | Grossman | VS | Frank | Portman |
| Thomas | Ziegler | VS | Frank | Portman |
| Samantha | Jameson | VS | Frank | Portman |
| Cindy | Mason | VS | Frank | Portman |
| Beth | Fowler | VS | Frank | Portman |
| Rick | Tulman | VS | Frank | Portman |
| Chris | Tucker | VS | Beth | Fowler |
| Sarah | Parker | VS | Beth | Fowler |
| Jane | Grossman | VS | Beth | Fowler |
| Thomas | Ziegler | VS | Beth | Fowler |
| Samantha | Jameson | VS | Beth | Fowler |
| Cindy | Mason | VS | Beth | Fowler |
| Frank | Portman | VS | Beth | Fowler |
| Rick | Tulman | VS | Beth | Fowler |
| Chris | Tucker | VS | Rick | Tulman |
| Sarah | Parker | VS | Rick | Tulman |
| Jane | Grossman | VS | Rick | Tulman |
| Thomas | Ziegler | VS | Rick | Tulman |
| Samantha | Jameson | VS | Rick | Tulman |
| Cindy | Mason | VS | Rick | Tulman |
| Frank | Portman | VS | Rick | Tulman |
| Beth | Fowler | VS | Rick | Tulman |
+-----------------+-----------------+-------+-----------------+-----------------+
72 rows in set (0.01 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
你的思路是对的,但这里存在一个问题:对于每一组对阵(例如,莎拉·帕克对阵克里斯·塔克 ),还会有一组反向对阵(例如,克里斯·塔克对阵莎拉·帕克)。要得到想要的结果,一种方法是使用连接条件e1.emp_id < e2.emp_id
,这样每个出纳员(teller)就只会与员工ID更高的出纳员进行配对(如果你愿意,也可以使用e1.emp_id > e2.emp_id
):
mysql> SELECT e1.fname, e1.lname, 'VS' vs, e2.fname, e2.lname
-> FROM employee e1 INNER JOIN employee e2
-> ON e1.emp_id < e2.emp_id
-> WHERE e1.title = 'Teller' AND e2.title = 'Teller';
+-----------------+-----------------+-------+-----------------+-----------------+
| fname | lname | vs | fname | lname |
+-----------------+-----------------+-------+-----------------+-----------------+
| Chris | Tucker | VS | Sarah | Parker |
| Chris | Tucker | VS | Jane | Grossman |
| Sarah | Parker | VS | Jane | Grossman |
| Chris | Tucker | VS | Thomas | Ziegler |
| Sarah | Parker | VS | Thomas | Ziegler |
| Jane | Grossman | VS | Thomas | Ziegler |
| Chris | Tucker | VS | Samantha | Jameson |
| Sarah | Parker | VS | Samantha | Jameson |
| Jane | Grossman | VS | Samantha | Jameson |
| Thomas | Ziegler | VS | Samantha | Jameson |
| Chris | Tucker | VS | Cindy | Mason |
| Sarah | Parker | VS | Cindy | Mason |
| Jane | Grossman | VS | Cindy | Mason |
| Thomas | Ziegler | VS | Cindy | Mason |
| Samantha | Jameson | VS | Cindy | Mason |
| Chris | Tucker | VS | Frank | Portman |
| Sarah | Parker | VS | Frank | Portman |
| Jane | Grossman | VS | Frank | Portman |
| Thomas | Ziegler | VS | Frank | Portman |
| Samantha | Jameson | VS | Frank | Portman |
| Cindy | Mason | VS | Frank | Portman |
| Chris | Tucker | VS | Beth | Fowler |
| Sarah | Parker | VS | Beth | Fowler |
| Jane | Grossman | VS | Beth | Fowler |
| Thomas | Ziegler | VS | Beth | Fowler |
| Samantha | Jameson | VS | Beth | Fowler |
| Cindy | Mason | VS | Beth | Fowler |
| Frank | Portman | VS | Beth | Fowler |
| Chris | Tucker | VS | Rick | Tulman |
| Sarah | Parker | VS | Rick | Tulman |
| Jane | Grossman | VS | Rick | Tulman |
| Thomas | Ziegler | VS | Rick | Tulman |
| Samantha | Jameson | VS | Rick | Tulman |
| Cindy | Mason | VS | Rick | Tulman |
| Frank | Portman | VS | Rick | Tulman |
| Beth | Fowler | VS | Rick | Tulman |
+-----------------+-----------------+-------+-----------------+-----------------+
36 rows in set (0.00 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
现在你有了一个包含36组对阵的列表,从9个不同的对象中选择两两组合时,这是正确的组合数量。
# 连接条件与过滤条件
你现在已经熟悉了连接条件应放在on
子句中,而过滤条件应放在where
子句中的概念。然而,SQL在条件放置位置上具有灵活性,所以在构建查询时你需要格外小心。例如,下面这个查询使用单个连接条件连接两个表,并在where
子句中包含单个过滤条件:
mysql> SELECT a.account_id, a.product_cd, c.fed_id
-> FROM account a INNER JOIN customer c
-> ON a.cust_id = c.cust_id
-> WHERE c.cust_type_cd = 'B';
+------------------+------------------+------------------+
| account_id | product_cd | fed_id |
+------------------+------------------+------------------+
| 24 | CHK | 04-1111111 |
| 25 | BUS | 04-1111111 |
| 27 | BUS | 04-2222222 |
| 28 | CHK | 04-3333333 |
| 29 | SBL | 04-4444444 |
+------------------+------------------+------------------+
5 rows in set (0.01 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
这很简单直接,但如果你错误地把过滤条件放在on
子句而不是where
子句中会怎样呢?
mysql> SELECT a.account_id, a.product_cd, c.fed_id
-> FROM account a INNER JOIN customer c
-> ON a.cust_id = c.cust_id
-> AND c.cust_type_cd = 'B';
+------------------+------------------+------------------+
| account_id | product_cd | fed_id |
+------------------+------------------+------------------+
| 24 | CHK | 04-1111111 |
| 25 | BUS | 04-1111111 |
| 27 | BUS | 04-2222222 |
| 28 | CHK | 04-3333333 |
| 29 | SBL | 04-4444444 |
+------------------+------------------+------------------+
5 rows in set (0.01 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
如你所见,第二个版本把两个条件都放在on
子句中且没有where
子句,却生成了相同的结果。如果把两个条件都放在where
子句中,但from
子句仍使用ANSI连接语法会怎样呢?
mysql> SELECT a.account_id, a.product_cd, c.fed_id
-> FROM account a INNER JOIN customer c
-> WHERE a.cust_id = c.cust_id
-> AND c.cust_type_cd = 'B';
+------------------+------------------+------------------+
| account_id | product_cd | fed_id |
+------------------+------------------+------------------+
| 24 | CHK | 04-1111111 |
| 25 | BUS | 04-1111111 |
| 27 | BUS | 04-2222222 |
| 28 | CHK | 04-3333333 |
| 29 | SBL | 04-4444444 |
+------------------+------------------+------------------+
5 rows in set (0.01 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
MySQL服务器再次生成了相同的结果集。你需要把条件放在合适的位置,这样你的查询才易于理解和维护。
# 知识测验
以下练习旨在测试你对内部连接(inner join)的理解。练习答案请见附录C。
# 练习5 - 1
填写以下查询中的空白处(用<#>
表示),以获得后续的结果:
mysql> SELECT e.emp_id, e.fname, e.lname, b.name
-> FROM employee e INNER JOIN <1> b
-> ON e.assigned_branch_id = b.<2>;
+-----------+-----------------+-----------------+-------------------------------+
| emp_id | fname | lname | name |
+-----------+-----------------+-----------------+-------------------------------+
| 1 | Michael | Smith | Headquarters |
| 2 | Susan | Barker | Headquarters |
| 3 | Robert | Tyler | Headquarters |
| 4 | Susan | Hawthorne | Headquarters |
| 5 | John | Gooding | Headquarters |
| 6 | Helen | Fleming | Headquarters |
| 7 | Chris | Tucker | Headquarters |
| 8 | Sarah | Parker | Headquarters |
| 9 | Jane | Grossman | Headquarters |
| 10 | Paula | Roberts | Woburn Branch |
| 11 | Thomas | Ziegler | Woburn Branch |
| 12 | Samantha | Jameson | Woburn Branch |
| 13 | John | Blake | Quincy Branch |
| 14 | Cindy | Mason | Quincy Branch |
| 15 | Frank | Portman | Quincy Branch |
| 16 | Theresa | Markham | So. NH Branch |
| 17 | Beth | Fowler | So. NH Branch |
| 18 | Rick | Tulman | So. NH Branch |
+-----------+-----------------+-----------------+-------------------------------+
18 rows in set (0.03 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# 练习5 - 2
编写一个查询,返回每个非商业客户(customer.cust_type_cd = 'I'
)的账户ID,以及客户的联邦ID(customer.fed_id
)和账户所基于的产品名称(product.name
)。
# 练习5 - 3
构建一个查询,查找所有直属上级(supervisor)被分配到不同部门的员工。检索员工的ID、名字和姓氏。