第10章 再谈连接
# 第10章 再谈连接
现在,你应该已经对第5章中介绍的内连接(inner join)概念很熟悉了。本章将重点介绍其他连接表的方式,包括外连接(outer join)和交叉连接(cross join)。
# 外连接
到目前为止,在所有涉及多个表的示例中,我们都没有考虑到连接条件可能无法找到表中所有行的匹配项这种情况。例如,在将account
表与customer
表进行连接时,我没有提及account
表中cust_id
列的值可能与customer
表中cust_id
列的值不匹配的可能性。如果出现这种情况,那么其中一个表的某些行将被排除在结果集之外。
为了确保无误,我们来检查表中的数据。以下是account
表中的account_id
和cust_id
列:
mysql> SELECT account_id,
-> cust_id
-> FROM account;
+------------+---------+
| account_id | cust_id |
+------------+---------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
| 7 | 3 |
| 8 | 3 |
| 10 | 4 |
| 11 | 4 |
| 12 | 4 |
| 13 | 5 |
| 14 | 6 |
| 15 | 6 |
| 17 | 7 |
| 18 | 8 |
| 19 | 8 |
| 21 | 9 |
| 22 | 9 |
| 23 | 9 |
| 24 | 10 |
| 25 | 10 |
| 27 | 11 |
| 28 | 12 |
| 29 | 13 |
+------------+---------+
24 rows in set (1.50 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
有24个账户,涉及13个不同的客户,客户ID从1到13,每个ID至少对应一个账户。以下是customer
表中的客户ID集合:
mysql> SELECT cust_id
-> FROM customer;
+---------+
| cust_id |
+---------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
| 11 |
| 12 |
| 13 |
+---------+
13 rows in set (0.02 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
customer
表中有13行数据,ID从1到13,所以每个客户ID至少在account
表中出现一次。因此,当两个表根据cust_id
列进行连接时,预计结果集将包含所有24行数据(除非有其他过滤条件):
mysql> SELECT a.account_id,
-> c.cust_id
-> FROM account a
-> INNER JOIN customer c
-> ON a.cust_id = c.cust_id;
+------------+---------+
| account_id | cust_id |
+------------+---------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
| 7 | 3 |
| 8 | 3 |
| 10 | 4 |
| 11 | 4 |
| 12 | 4 |
| 13 | 5 |
| 14 | 6 |
| 15 | 6 |
| 17 | 7 |
| 18 | 8 |
| 19 | 8 |
| 21 | 9 |
| 22 | 9 |
| 23 | 9 |
| 24 | 10 |
| 25 | 10 |
| 27 | 11 |
| 28 | 12 |
| 29 | 13 |
+------------+---------+
24 rows in set (0.06 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
正如预期的那样,结果集中包含了所有24个账户。但是,如果将account
表与其中一个特定的客户表(如business
表)进行连接,会发生什么呢?
mysql> SELECT a.account_id,
-> b.cust_id,
-> b.name
-> FROM account a
-> INNER JOIN business b
-> ON a.cust_id = b.cust_id;
+------------+---------+------------------------+
| account_id | cust_id | name |
+------------+---------+------------------------+
| 24 | 10 | Chilton Engineering |
| 25 | 10 | Chilton Engineering |
| 27 | 11 | Northeast Cooling Inc. |
| 28 | 12 | Superior Auto Body |
| 29 | 13 | AAA Insurance Inc. |
+------------+---------+------------------------+
5 rows in set (0.10 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
结果集中现在只有5行数据,而不是24行。我们来查看business
表,看看原因:
mysql> SELECT cust_id,
-> name
-> FROM business;
+---------+------------------------+
| cust_id | name |
+---------+------------------------+
| 10 | Chilton Engineering |
| 11 | Northeast Cooling Inc. |
| 12 | Superior Auto Body |
| 13 | AAA Insurance Inc. |
+---------+------------------------+
4 rows in set (0.01 sec)
2
3
4
5
6
7
8
9
10
11
12
在customer
表的13行数据中,只有4个是企业客户,并且由于其中一个企业客户有两个账户,所以account
表中总共有5行数据与企业客户相关联。
但是,如果你希望查询返回所有账户,并且仅当账户与企业客户相关联时才包含企业名称,该怎么办呢?在这种情况下,你需要在account
表和business
表之间进行外连接,如下所示:
mysql> SELECT a.account_id,
-> a.cust_id,
-> b.name
-> FROM account a
-> LEFT OUTER JOIN business b
-> ON a.cust_id = b.cust_id;
+------------+---------+------------------------+
| account_id | cust_id | name |
+------------+---------+------------------------+
| 1 | 1 | NULL |
| 2 | 1 | NULL |
| 3 | 1 | NULL |
| 4 | 2 | NULL |
| 5 | 2 | NULL |
| 7 | 3 | NULL |
| 8 | 3 | NULL |
| 10 | 4 | NULL |
| 11 | 4 | NULL |
| 12 | 4 | NULL |
| 13 | 5 | NULL |
| 14 | 6 | NULL |
| 15 | 6 | NULL |
| 17 | 7 | NULL |
| 18 | 8 | NULL |
| 19 | 8 | NULL |
| 21 | 9 | NULL |
| 22 | 9 | NULL |
| 23 | 9 | NULL |
| 24 | 10 | Chilton Engineering |
| 25 | 10 | Chilton Engineering |
| 27 | 11 | Northeast Cooling Inc. |
| 28 | 12 | Superior Auto Body |
| 29 | 13 | AAA Insurance Inc. |
+------------+---------+------------------------+
24 rows in set (0.04 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
外连接会包含一个表中的所有行,并且仅当找到匹配行时才包含第二个表中的数据。在这种情况下,由于我指定了LEFT OUTER JOIN
,并且account
表在连接定义的左侧,所以account
表中的所有行都被包含在内。除了四个企业客户(客户ID为10、11、12和13)之外,name
列对于所有行都是NULL
。以下是一个类似的查询,这次是与individual
表进行外连接,而不是business
表:
mysql> SELECT a.account_id,
-> a.cust_id,
-> i.fname,
-> i.lname
-> FROM account a
-> LEFT OUTER JOIN individual i
-> ON a.cust_id = i.cust_id;
+------------+---------+----------+---------+
| account_id | cust_id | fname | lname |
+------------+---------+----------+---------+
| 1 | 1 | James | Hadley |
| 2 | 1 | James | Hadley |
| 3 | 1 | James | Hadley |
| 4 | 2 | Susan | Tingley |
| 5 | 2 | Susan | Tingley |
| 7 | 3 | Frank | Tucker |
| 8 | 3 | Frank | Tucker |
| 10 | 4 | John | Hayward |
| 11 | 4 | John | Hayward |
| 12 | 4 | John | Hayward |
| 13 | 5 | Charles | Frasier |
| 14 | 6 | John | Spencer |
| 15 | 6 | John | Spencer |
| 17 | 7 | Margaret | Young |
| 18 | 8 | George | Blake |
| 19 | 8 | George | Blake |
| 21 | 9 | Richard | Farley |
| 22 | 9 | Richard | Farley |
| 23 | 9 | Richard | Farley |
| 24 | 10 | NULL | NULL |
| 25 | 10 | NULL | NULL |
| 27 | 11 | NULL | NULL |
| 28 | 12 | NULL | NULL |
| 29 | 13 | NULL | NULL |
+------------+---------+----------+---------+
24 rows in set (0.09 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
这个查询本质上与上一个查询相反:为个人客户提供了名字和姓氏,而对于企业客户,这些列是NULL
。
# 左外连接与右外连接
在上一节的每个外连接示例中,我都指定了LEFT OUTER JOIN
。关键字LEFT
表示连接左侧的表负责确定结果集中的行数,而右侧的表用于在找到匹配项时提供列值。考虑以下查询:
mysql> SELECT c.cust_id,
-> b.name
-> FROM customer c
-> LEFT OUTER JOIN business b
-> ON c.cust_id = b.cust_id;
+---------+------------------------+
| cust_id | name |
+---------+------------------------+
| 1 | NULL |
| 2 | NULL |
| 3 | NULL |
| 4 | NULL |
| 5 | NULL |
| 6 | NULL |
| 7 | NULL |
| 8 | NULL |
| 9 | NULL |
| 10 | Chilton Engineering |
| 11 | Northeast Cooling Inc. |
| 12 | Superior Auto Body |
| 13 | AAA Insurance Inc. |
+---------+------------------------+
13 rows in set (0.00 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
FROM
子句指定了左外连接(left outer join),所以customer
表中的所有13行数据都包含在结果集中,business
表为四个企业客户在结果集的第二列中提供值。如果你执行相同的查询,但指定RIGHT OUTER JOIN
,你会看到以下结果集:
mysql> SELECT c.cust_id,
-> b.name
-> FROM customer c
-> RIGHT OUTER JOIN business b
-> ON c.cust_id = b.cust_id;
+---------+------------------------+
| cust_id | name |
+---------+------------------------+
| 10 | Chilton Engineering |
| 11 | Northeast Cooling Inc. |
| 12 | Superior Auto Body |
| 13 | AAA Insurance Inc. |
+---------+------------------------+
4 rows in set (0.00 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
现在,结果集中的行数由business
表中的行数决定,这就是为什么结果集中只有4行数据。
请记住,这两个查询都在执行外连接;关键字LEFT
和RIGHT
只是为了告诉服务器哪个表的数据可以存在空缺。如果你想对表A和表B进行外连接,并且希望包含表A中的所有行,以及在有匹配数据时包含表B中的额外列,你可以指定A LEFT OUTER JOIN B
或B RIGHT OUTER JOIN A
。
# 三向外部连接
在某些情况下,你可能想要将一个表与另外两个表进行外连接。例如,你可能希望得到一份所有账户的列表,对于个人账户,显示客户的名和姓;对于企业账户,则显示企业名称。示例如下:
mysql> SELECT a.account_id,
-> a.product_cd,
-> CONCAT(i.fname, ' ', i.lname) person_name,
-> b.name business_name
-> FROM account a
-> LEFT OUTER JOIN individual i
-> ON a.cust_id = i.cust_id
-> LEFT OUTER JOIN business b
-> ON a.cust_id = b.cust_id;
+------------+------------+-----------------+------------------------+
| account_id | product_cd | person_name | business_name |
+------------+------------+-----------------+------------------------+
| 1 | CHK | James Hadley | NULL |
| 2 | SAV | James Hadley | NULL |
| 3 | CD | James Hadley | NULL |
| 4 | CHK | Susan Tingley | NULL |
| 5 | SAV | Susan Tingley | NULL |
| 7 | CHK | Frank Tucker | NULL |
| 8 | MM | Frank Tucker | NULL |
| 10 | CHK | John Hayward | NULL |
| 11 | SAV | John Hayward | NULL |
| 12 | MM | John Hayward | NULL |
| 13 | CHK | Charles Frasier | NULL |
| 14 | CHK | John Spencer | NULL |
| 15 | CD | John Spencer | NULL |
| 17 | CD | Margaret Young | NULL |
| 18 | CHK | George Blake | NULL |
| 19 | SAV | George Blake | NULL |
| 21 | CHK | Richard Farley | NULL |
| 22 | MM | Richard Farley | NULL |
| 23 | CD | Richard Farley | NULL |
| 24 | CHK | NULL | Chilton Engineering |
| 25 | BUS | NULL | Chilton Engineering |
| 27 | BUS | NULL | Northeast Cooling Inc. |
| 28 | CHK | NULL | Superior Auto Body |
| 29 | SBL | NULL | AAA Insurance Inc. |
+------------+------------+-----------------+------------------------+
24 rows in set (0.08 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
结果集包含account
表中的全部24行数据,以及来自两个外连接表中的个人姓名或企业名称。
我不知道MySQL对于可与同一表进行外连接的表数量是否存在限制,但你始终可以使用子查询来减少查询中的连接数量。例如,你可以将前面的示例重写如下:
mysql> SELECT account_ind.account_id,
-> account_ind.product_cd,
-> account_ind.person_name,
-> b.name business_name
-> FROM (
SELECT a.account_id,
a.product_cd,
a.cust_id,
CONCAT(i.fname, ' ', i.lname) person_name
FROM account a
LEFT OUTER JOIN individual i
ON a.cust_id = i.cust_id
) account_ind
-> LEFT OUTER JOIN business b
-> ON account_ind.cust_id = b.cust_id;
+------------+------------+-----------------+------------------------+
| account_id | product_cd | person_name | business_name |
+------------+------------+-----------------+------------------------+
| 1 | CHK | James Hadley | NULL |
| 2 | SAV | James Hadley | NULL |
| 3 | CD | James Hadley | NULL |
| 4 | CHK | Susan Tingley | NULL |
| 5 | SAV | Susan Tingley | NULL |
| 7 | CHK | Frank Tucker | NULL |
| 8 | MM | Frank Tucker | NULL |
| 10 | CHK | John Hayward | NULL |
| 11 | SAV | John Hayward | NULL |
| 12 | MM | John Hayward | NULL |
| 13 | CHK | Charles Frasier | NULL |
| 14 | CHK | John Spencer | NULL |
| 15 | CD | John Spencer | NULL |
| 17 | CD | Margaret Young | NULL |
| 18 | CHK | George Blake | NULL |
| 19 | SAV | George Blake | NULL |
| 21 | CHK | Richard Farley | NULL |
| 22 | MM | Richard Farley | NULL |
| 23 | CD | Richard Farley | NULL |
| 24 | CHK | NULL | Chilton Engineering |
| 25 | BUS | NULL | Chilton Engineering |
| 27 | BUS | NULL | Northeast Cooling Inc. |
| 28 | CHK | NULL | Superior Auto Body |
| 29 | SBL | NULL | AAA Insurance Inc. |
+------------+------------+-----------------+------------------------+
24 rows in set (0.08 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
在这个查询版本中,individual
表在名为account_ind
的子查询中与account
表进行外连接,子查询的结果再与business
表进行外连接。因此,每个查询(子查询和包含子查询的主查询)都只使用了一个外连接。如果你使用的不是MySQL数据库,并且想要对多个表进行外连接,可能就需要采用这种策略。
# 自外连接
在第5章中,我向你介绍了自连接的概念,即一个表与自身进行连接。下面是第5章中的一个自连接示例,它将employee
表与自身连接,以生成员工及其主管的列表:
mysql> SELECT e.fname,
-> e.lname,
-> e_mgr.fname mgr_fname,
-> e_mgr.lname mgr_lname
-> FROM employee e
-> INNER JOIN employee e_mgr
-> ON e.superior_emp_id = e_mgr.emp_id;
+----------+-----------+-----------+-----------+
| fname | lname | mgr_fname | mgr_lname |
+----------+-----------+-----------+-----------+
| Susan | Barker | Michael | Smith |
| Robert | Tyler | Michael | Smith |
| Susan | Hawthorne | Robert | Tyler |
| John | Gooding | Susan | Hawthorne |
| Helen | Fleming | Susan | Hawthorne |
| Chris | Tucker | Helen | Fleming |
| Sarah | Parker | Helen | Fleming |
| Jane | Grossman | Helen | Fleming |
| Paula | Roberts | Susan | Hawthorne |
| Thomas | Ziegler | Paula | Roberts |
| Samantha | Jameson | Paula | Roberts |
| John | Blake | Susan | Hawthorne |
| Cindy | Mason | John | Blake |
| Frank | Portman | John | Blake |
| Theresa | Markham | Susan | Hawthorne |
| Beth | Fowler | Theresa | Markham |
| Rick | Tulman | Theresa | Markham |
+----------+-----------+-----------+-----------+
17 rows in set (0.02 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
这个查询运行良好,但存在一个小问题:没有主管的员工被排除在了结果集之外。不过,通过将连接类型从内连接更改为外连接,结果集将包含所有员工,包括那些没有主管的员工:
mysql> SELECT e.fname,
-> e.lname,
-> e_mgr.fname mgr_fname,
-> e_mgr.lname mgr_lname
-> FROM employee e
-> LEFT OUTER JOIN employee e_mgr
-> ON e.superior_emp_id = e_mgr.emp_id;
+----------+-----------+-----------+-----------+
| fname | lname | mgr_fname | mgr_lname |
+----------+-----------+-----------+-----------+
| Michael | Smith | NULL | NULL |
| Susan | Barker | Michael | Smith |
| Robert | Tyler | Michael | Smith |
| Susan | Hawthorne | Robert | Tyler |
| John | Gooding | Susan | Hawthorne |
| Helen | Fleming | Susan | Hawthorne |
| Chris | Tucker | Helen | Fleming |
| Sarah | Parker | Helen | Fleming |
| Jane | Grossman | Helen | Fleming |
| Paula | Roberts | Susan | Hawthorne |
| Thomas | Ziegler | Paula | Roberts |
| Samantha | Jameson | Paula | Roberts |
| John | Blake | Susan | Hawthorne |
| Cindy | Mason | John | Blake |
| Frank | Portman | John | Blake |
| Theresa | Markham | Susan | Hawthorne |
| Beth | Fowler | Theresa | Markham |
| Rick | Tulman | Theresa | Markham |
+----------+-----------+-----------+-----------+
18 rows in set (0.00 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
现在的结果集包含了迈克尔·史密斯(Michael Smith),他是银行行长,因此没有主管。该查询使用左外连接生成所有员工的列表,并且在适用的情况下,列出他们的主管。如果你将连接更改为右外连接,将会看到以下结果:
mysql> SELECT e.fname,
-> e.lname,
-> e_mgr.fname mgr_fname,
-> e_mgr.lname mgr_lname
-> FROM employee e
-> RIGHT OUTER JOIN employee e_mgr
-> ON e.superior_emp_id = e_mgr.emp_id;
+----------+-----------+-----------+-----------+
| fname | lname | mgr_fname | mgr_lname |
+----------+-----------+-----------+-----------+
| Susan | Barker | Michael | Smith |
| Robert | Tyler | Michael | Smith |
| NULL | NULL | Susan | Barker |
| Susan | Hawthorne | Robert | Tyler |
| John | Gooding | Susan | Hawthorne |
| Helen | Fleming | Susan | Hawthorne |
| Paula | Roberts | Susan | Hawthorne |
| John | Blake | Susan | Hawthorne |
| Theresa | Markham | Susan | Hawthorne |
| NULL | NULL | John | Gooding |
| Chris | Tucker | Helen | Fleming |
| Sarah | Parker | Helen | Fleming |
| Jane | Grossman | Helen | Fleming |
| NULL | NULL | Chris | Tucker |
| NULL | NULL | Sarah | Parker |
| NULL | NULL | Jane | Grossman |
| Thomas | Ziegler | Paula | Roberts |
| Samantha | Jameson | Paula | Roberts |
| NULL | NULL | Thomas | Ziegler |
| NULL | NULL | Samantha | Jameson |
| Cindy | Mason | John | Blake |
| Frank | Portman | John | Blake |
| NULL | NULL | Cindy | Mason |
| NULL | NULL | Frank | Portman |
| Beth | Fowler | Theresa | Markham |
| Rick | Tulman | Theresa | Markham |
| NULL | NULL | Beth | Fowler |
| NULL | NULL | Rick | Tulman |
+----------+-----------+-----------+-----------+
28 rows in set (0.00 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
这个查询展示了每个主管(仍在第三列和第四列)以及他或她所管理的员工集合。因此,迈克尔·史密斯作为苏珊·巴克(Susan Barker)和罗伯特·泰勒(Robert Tyler)的主管出现了两次;苏珊·巴克作为主管,但没有管理任何人(第一列和第二列为空值)出现了一次。所有18名员工至少在第三列和第四列中出现一次,如果一名员工管理多名员工,则会出现多次,结果集中总共有28行。这与前面的查询结果大不相同,而仅仅是因为更改了一个关键字(从左外连接改为右外连接)。因此,在使用外连接时,一定要仔细考虑是指定左外连接还是右外连接。
# 交叉连接
在第5章中,我介绍了笛卡尔积(Cartesian product)的概念,它本质上是在不指定任何连接条件的情况下连接多个表的结果。笛卡尔积经常被意外使用(例如,忘记在from
子句中添加连接条件),但在其他情况下并不常见。不过,如果你确实想要生成两个表的笛卡尔积,应该指定交叉连接,示例如下:
mysql> SELECT pt.name,
-> p.product_cd,
-> p.name
-> FROM product p
-> CROSS JOIN product_type pt;
+-------------------------------+------------+-------------------------+
| name | product_cd | name |
+-------------------------------+------------+-------------------------+
| Customer Accounts | AUT | auto loan |
| Customer Accounts | BUS | business line of credit |
| Customer Accounts | CD | certificate of deposit |
| Customer Accounts | CHK | checking account |
| Customer Accounts | MM | money market account |
| Customer Accounts | MRT | home mortgage |
| Customer Accounts | SAV | savings account |
| Customer Accounts | SBL | small business loan |
| Insurance Offerings | AUT | auto loan |
| Insurance Offerings | BUS | business line of credit |
| Insurance Offerings | CD | certificate of deposit |
| Insurance Offerings | CHK | checking account |
| Insurance Offerings | MM | money market account |
| Insurance Offerings | MRT | home mortgage |
| Insurance Offerings | SAV | savings account |
| Insurance Offerings | SBL | small business loan |
| Individual and Business Loans | AUT | auto loan |
| Individual and Business Loans | BUS | business line of credit |
| Individual and Business Loans | CD | certificate of deposit |
| Individual and Business Loans | CHK | checking account |
| Individual and Business Loans | MM | money market account |
| Individual and Business Loans | MRT | home mortgage |
| Individual and Business Loans | SAV | savings account |
| Individual and Business Loans | SBL | small business loan |
+-------------------------------+------------+-------------------------+
24 rows in set (0.00 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
这个查询生成了product
表和product_type
表的笛卡尔积,结果有24行(8个产品行×3个产品类型行)。既然你已经知道了什么是交叉连接以及如何指定它,那么它有什么用呢?大多数SQL书籍会描述交叉连接是什么,然后告诉你它很少有用,但我想和你分享一种我认为交叉连接非常有用的情况。
在第9章中,我讨论了如何使用子查询来构造表。我给出的示例展示了如何构建一个有三行的表,该表可与其他表进行连接。以下是示例中构造的表:
mysql> SELECT 'Small Fry' name,
-> 0 low_limit,
-> 4999.99 high_limit
-> UNION ALL
-> SELECT 'Average Joes' name,
-> 5000 low_limit,
-> 9999.99 high_limit
-> UNION ALL
-> SELECT 'Heavy Hitters' name,
-> 10000 low_limit,
-> 9999999.99 high_limit;
+---------------+-----------+------------+
| name | low_limit | high_limit |
+---------------+-----------+------------+
| Small Fry | 0 | 4999.99 |
| Average Joes | 5000 | 9999.99 |
| Heavy Hitters | 10000 | 9999999.99 |
+---------------+-----------+------------+
3 rows in set (0.00 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
虽然这个表对于根据客户的总账户余额将他们划分为三组来说是完全适用的,但如果你需要构造一个大表,使用集合运算符UNION ALL
合并单行表的这种策略就不太好用了。
比如说,你想要创建一个查询,为2008年的每一天生成一行数据,但是你的数据库中没有包含每一天数据的表。按照第9章示例中的策略,你可能会这样做:
SELECT '2008-01-01' dt UNION ALL
SELECT '2008-01-02' dt UNION ALL
SELECT '2008-01-03' dt UNION ALL
...
...
...
SELECT '2008-12-29' dt UNION ALL
SELECT '2008-12-30' dt UNION ALL
SELECT '2008-12-31' dt
2
3
4
5
6
7
8
9
构建一个合并366个查询结果的查询有点繁琐,所以或许需要换一种策略。要是生成一个有366行(2008年是闰年)且只有一列,该列包含0到366之间数字的表,然后将这个数字代表的天数加到2008年1月1日上面,会怎么样呢?下面是一种生成这样一个表的可能方法:
mysql> SELECT ones.num + tens.num + hundreds.num
-> FROM
-> (
SELECT 0 num
UNION ALL
SELECT 1 num
UNION ALL
SELECT 2 num
UNION ALL
SELECT 3 num
UNION ALL
SELECT 4 num
UNION ALL
SELECT 5 num
UNION ALL
SELECT 6 num
UNION ALL
SELECT 7 num
UNION ALL
SELECT 8 num
UNION ALL
SELECT 9 num
) ones
-> CROSS JOIN
-> (
SELECT 0 num
UNION ALL
SELECT 10 num
UNION ALL
SELECT 20 num
UNION ALL
SELECT 30 num
UNION ALL
SELECT 40 num
UNION ALL
SELECT 50 num
UNION ALL
SELECT 60 num
UNION ALL
SELECT 70 num
UNION ALL
SELECT 80 num
UNION ALL
SELECT 90 num
) tens
-> CROSS JOIN
-> (
SELECT 0 num
UNION ALL
SELECT 100 num
UNION ALL
SELECT 200 num
UNION ALL
SELECT 300 num
) hundreds;
+------------------------------------+
| ones.num + tens.num + hundreds.num |
+------------------------------------+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
| 11 |
| 12 |
...
...
...
| 391 |
| 392 |
| 393 |
| 394 |
| 395 |
| 396 |
| 397 |
| 398 |
| 399 |
+------------------------------------+
400 rows in set (0.00 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
如果你对集合{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}、{0, 10, 20, 30, 40, 50, 60, 70, 80, 90}和{0, 100, 200, 300}进行笛卡尔积(Cartesian product)运算,并将三列中的值相加,会得到一个包含400行的结果集,其中包含0到399之间的所有数字。虽然这比生成2008年的日期集所需的366行多,但去掉多余的行很容易,我很快就会告诉你怎么做。
下一步是将数字集转换为日期集。为此,我将使用date_add()
函数,把结果集中的每个数字加到2008年1月1日。然后,我会添加一个过滤条件,去掉任何进入2009年的日期:
mysql> SELECT DATE_ADD('2008-01-01',
-> INTERVAL (ones.num + tens.num + hundreds.num) DAY) dt
-> FROM
-> (SELECT 0 num UNION ALL
-> SELECT 1 num UNION ALL
-> SELECT 2 num UNION ALL
-> SELECT 3 num UNION ALL
-> SELECT 4 num UNION ALL
-> SELECT 5 num UNION ALL
-> SELECT 6 num UNION ALL
-> SELECT 7 num UNION ALL
-> SELECT 8 num UNION ALL
-> SELECT 9 num) ones
-> CROSS JOIN
-> (SELECT 0 num UNION ALL
-> SELECT 10 num UNION ALL
-> SELECT 20 num UNION ALL
-> SELECT 30 num UNION ALL
-> SELECT 40 num UNION ALL
-> SELECT 50 num UNION ALL
-> SELECT 60 num UNION ALL
-> SELECT 70 num UNION ALL
-> SELECT 80 num UNION ALL
-> SELECT 90 num) tens
-> CROSS JOIN
-> (SELECT 0 num UNION ALL
-> SELECT 100 num UNION ALL
-> SELECT 200 num UNION ALL
-> SELECT 300 num) hundreds
-> WHERE DATE_ADD('2008-01-01',
-> INTERVAL (ones.num + tens.num + hundreds.num) DAY) < '2009-01-01'
-> ORDER BY 1;
+---------------------+
| dt |
+---------------------+
| |
| |
| |
| |
| |
| |
| |
| |
| |
| 2008-01-01 |
| 2008-01-02 |
| 2008-01-03 |
| 2008-01-04 |
| 2008-01-05 |
| 2008-01-06 |
| 2008-01-07 |
| 2008-01-08 |
| 2008-01-09 |
| 2008-01-10 |
...
...
...
| 2008-02-20 |
| 2008-02-21 |
| 2008-02-22 |
| 2008-02-23 |
| 2008-02-24 |
| 2008-02-25 |
| 2008-02-26 |
| 2008-02-27 |
| 2008-02-28 |
| 2008-02-29 |
| 2008-03-01 |
...
...
...
| 2008-12-20 |
| 2008-12-21 |
| 2008-12-22 |
| 2008-12-23 |
| 2008-12-24 |
| 2008-12-25 |
| 2008-12-26 |
| 2008-12-27 |
| 2008-12-28 |
| 2008-12-29 |
| 2008-12-30 |
| 2008-12-31 |
+---------------------+
366 rows in set (0.01 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
这种方法的好处在于,结果集会自动包含额外的闰日(2月29日),无需你手动干预,因为数据库服务器在将59天加到2008年1月1日时会自行计算出来。
既然你已经有了生成2008年所有日期的方法,那该怎么用呢?嗯,可能会要求你编写一个查询,显示2008年的每一天,以及当天进行的银行交易数量、当天开设的账户数量等等。下面是一个回答第一个问题的示例:
mysql> SELECT days.dt, COUNT(t.txn_id)
-> FROM transaction t RIGHT OUTER JOIN
-> (SELECT DATE_ADD('2008-01-01',
-> INTERVAL (ones.num + tens.num + hundreds.num) DAY) dt
-> FROM
-> (SELECT 0 num UNION ALL
-> SELECT 1 num UNION ALL
-> SELECT 2 num UNION ALL
-> SELECT 3 num UNION ALL
-> SELECT 4 num UNION ALL
-> SELECT 5 num UNION ALL
-> SELECT 6 num UNION ALL
-> SELECT 7 num UNION ALL
-> SELECT 8 num UNION ALL
-> SELECT 9 num) ones
-> CROSS JOIN
-> (SELECT 0 num UNION ALL
-> SELECT 10 num UNION ALL
-> SELECT 20 num UNION ALL
-> SELECT 30 num UNION ALL
-> SELECT 40 num UNION ALL
-> SELECT 50 num UNION ALL
-> SELECT 60 num UNION ALL
-> SELECT 70 num UNION ALL
-> SELECT 80 num UNION ALL
-> SELECT 90 num) tens
-> CROSS JOIN
-> (SELECT 0 num UNION ALL
-> SELECT 100 num UNION ALL
-> SELECT 200 num UNION ALL
-> SELECT 300 num) hundreds
-> WHERE DATE_ADD('2008-01-01',
-> INTERVAL (ones.num + tens.num + hundreds.num) DAY) <
-> '2009-01-01') days
-> ON days.dt = t.txn_date
-> GROUP BY days.dt
-> ORDER BY 1;
+---------------------+-----------------+
| dt | COUNT(t.txn_id) |
+---------------------+-----------------+
| 2008-01-01 | 0 |
| 2008-01-02 | 0 |
| 2008-01-03 | 0 |
| 2008-01-04 | 0 |
| 2008-01-05 | 21 |
| 2008-01-06 | 0 |
| 2008-01-07 | 0 |
| 2008-01-08 | 0 |
| 2008-01-09 | 0 |
| 2008-01-10 | 0 |
| 2008-01-11 | 0 |
| 2008-01-12 | 0 |
| 2008-01-13 | 0 |
| 2008-01-14 | 0 |
| 2008-01-15 | 0 |
...
| 2008-12-31 | 0 |
+---------------------+-----------------+
366 rows in set (0.03 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
这是本书目前为止比较有趣的查询之一,因为它包含了交叉连接(cross joins)、外连接(outer joins)、日期函数、分组、集合操作(union all
)和聚合函数(count()
)。这也不是解决给定问题最优雅的方法,但它可以作为一个示例,展示只要有点创造力并熟练掌握这门语言,你就能让像交叉连接这样很少使用的特性成为SQL工具包中的有力工具。
# 自然连接(Natural Joins)
如果你偷懒(其实我们都偷懒),可以选择一种连接类型,让你只需指定要连接的表,而让数据库服务器确定连接条件。这种连接类型称为自然连接,它依赖于多个表中相同的列名来推断正确的连接条件。例如,account
表包含一个名为cust_id
的列,它是customer
表的外键,而customer
表的主键也叫cust_id
。因此,你可以编写一个使用自然连接来连接这两个表的查询:
mysql> SELECT a.account_id, a.cust_id, c.cust_type_cd, c.fed_id
-> FROM account a NATURAL JOIN customer c;
+-------------+---------+------------------+------------------+
| account_id | cust_id | cust_type_cd | fed_id |
+-------------+---------+------------------+------------------+
| 1 | 1 | 111-11-1111 |
| 1 | 1 | 111-11-1111 |
| 1 | 1 | 111-11-1111 |
| 2 | 1 | 222-22-2222 |
| 2 | 1 | 222-22-2222 |
| 3 | 1 | 333-33-3333 |
| 3 | 1 | 333-33-3333 |
| 4 | 1 | 444-44-4444 |
| 4 | 1 | 444-44-4444 |
| 4 | 1 | 444-44-4444 |
| 11 | 5 | 555-55-5555 |
| 12 | 6 | 666-66-6666 |
| 13 | 6 | 666-66-6666 |
| 14 | 7 | 777-77-7777 |
| 15 | 8 | 888-88-8888 |
| 16 | 8 | 888-88-8888 |
| 17 | 9 | 999-99-9999 |
| 18 | 9 | 999-99-9999 |
| 19 | 9 | 999-99-9999 |
| 20 | 10 | B | 04-1111111 |
| 21 | 10 | B | 04-1111111 |
| 22 | 11 | B | 04-2222222 |
| 23 | 12 | B | 04-3333333 |
| 24 | 13 | B | 04-4444444 |
+-------------+---------+------------------+------------------+
24 rows in set (0.02 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
因为你指定了自然连接,服务器会检查表定义,并添加连接条件a.cust_id = c.cust_id
来连接这两个表。
这一切都很好,但如果表之间的列名不一样会怎么样呢?例如,account
表也有一个到branch
表的外键,但account
表中的列名为open_branch_id
,而不是branch_id
。让我们看看如果在account
表和branch
表之间使用自然连接会发生什么:
mysql> SELECT a.account_id, a.cust_id, a.open_branch_id, b.name
-> FROM account a NATURAL JOIN branch b;
+-------------+---------+------------------+---------------------+
| account_id | cust_id | open_branch_id | name |
+-------------+---------+------------------+---------------------+
| 1 | 1 | 2 | Headquarters |
| 1 | 1 | 2 | Woburn Branch |
| 1 | 1 | 2 | Quincy Branch |
| 1 | 1 | 2 | So. NH Branch |
| 2 | 1 | 2 | Headquarters |
| 2 | 1 | 2 | Woburn Branch |
| 2 | 1 | 2 | Quincy Branch |
| 2 | 1 | 2 | So. NH Branch |
| 3 | 1 | 2 | Headquarters |
| 3 | 1 | 2 | Woburn Branch |
| 3 | 1 | 2 | Quincy Branch |
| 3 | 1 | 2 | So. NH Branch |
| 4 | 2 | 2 | Headquarters |
| 4 | 2 | 2 | Woburn Branch |
| 4 | 2 | 2 | Quincy Branch |
| 4 | 2 | 2 | So. NH Branch |
| 5 | 2 | 2 | Headquarters |
| 5 | 2 | 2 | Woburn Branch |
| 5 | 2 | 2 | Quincy Branch |
| 5 | 2 | 2 | So. NH Branch |
| 7 | 3 | 3 | Headquarters |
| 7 | 3 | 3 | Woburn Branch |
| 7 | 3 | 3 | Quincy Branch |
| 7 | 3 | 3 | So. NH Branch |
| 8 | 3 | 3 | Headquarters |
...
...
...
+-------------+---------+------------------+---------------------+
96 rows in set (0.07 sec)
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
看起来出问题了;由于account
表中有24行数据,这个查询应该返回不超过24行。实际发生的情况是,由于服务器在两个表中找不到两个同名的列,所以没有生成连接条件,而是对两个表进行了交叉连接,结果产生了96行(24个账户×4个分支机构)。
那么,不用输入连接条件,减少手指的劳累,这点好处值得带来这些麻烦吗?绝对不值得;你应该避免使用这种连接类型,而是使用带有明确连接条件的内连接(inner joins)。
# 知识测验
以下练习考查你对外连接和交叉连接的理解。答案请见附录C。
# 练习10 - 1
编写一个查询,返回所有产品名称以及基于该产品的账户信息(使用account
表中的product_cd
列连接到product
表)。包括所有产品,即使没有为该产品开设任何账户。
# 练习10 - 2
重新编写练习10 - 1中的查询,使用另一种外连接类型(例如,如果你在练习10 - 1中使用了左外连接,这次就使用右外连接),使得结果与练习10 - 1相同。
# 练习10 - 3
通过account.cust_id
列将account
表与individual
表和business
表进行外连接,使得结果集每行代表一个账户。要包含的列有account.account_id
、account.product_cd
、individual.fname
、individual.lname
和business.name
。
# 练习10 - 4(附加题)
设计一个查询,生成集合{1, 2, 3, ..., 99, 100}。(提示:使用至少两个from
子句子查询进行交叉连接。)