# Chapter 10 Formatted Output

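The C++ IOStream library provides only inconvenient and limited ways to format output (specifying a field width, fill characters, and so on). For that reason, formatted output often still relies on functions such as sprintf().

C++20 introduces a new library for formatted output, which this chapter describes. It makes formatting attributes convenient to specify and is extensible.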

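# 10.1 Formatted Output by Example

Before we go into details, let us look at some motivating examples.

# 10.1.1 Using `std::format()`

The basic function of the formatting library for application programmers is std::format(). It combines a format string with the values of the passed arguments, which are inserted according to the formatting specified inside pairs of braces. A simple example uses the value of each passed argument: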

#include <format>
#include <iostream>
#include <string>

std::string str{"hello"};
...
std::cout << std::format("String '{}' has {} chars\n", str, str.size());

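The function std::format(), defined in the header <format>, takes a format string (a string literal, string, or string view known at compile time) in which {...} stands for the value of the next argument (here, using the default formatting of its type). It returns a std::string, for which it allocates memory.

The output of this example is as follows:

String 'hello' has 5 chars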

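An optional integral value right after the opening brace specifies the index of the argument, so that you can process the arguments in a different order or use them multiple times. As soon as one replacement field uses an index, all of them have to. For example:

std::cout << std::format("{1} is the size of string '{0}'\n", str, str.size());

The output is:

5 is the size of string 'hello'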

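Note that you do not have to specify the type of the argument explicitly. This means that you can easily use std::format() in generic code. Consider the following example:

void print2(const auto& arg1, const auto& arg2) {
    std::cout << std::format("args: {} and {}\n", arg1, arg2);
}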

If you call this function as follows:

print2(7.7, true);
print2("character:", '?');

the output is:

args: 7.7 and true
args: character: and ?

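Formatting even works for user-defined types, provided formatted output is supported for them. The formatted output of the chrono library is one example. A call such as the following:

print2(std::chrono::system_clock::now(), std::chrono::seconds{13});

might have the following output:

args: 2022-06-19 08:46:45.3881410 and 13s

For your own types, you need a formatter, which is described later.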

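Inside a placeholder, after a colon, you can specify details of the formatting of the passed argument. For example, you can define a field width:

std::format("{:7}", 42)      // yields "     42"
std::format("{:7}", 42.0)    // yields "     42"
std::format("{:7}", 'x')     // yields "x      "
std::format("{:7}", true)    // yields "true   "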

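Note that the different types have different default alignments: numbers are right-aligned, whereas characters, strings, and Boolean values are left-aligned. Note also that for type bool, false and true are printed instead of 0 and 1 as with iostream output using the operator <<.

You can also specify the alignment explicitly (< for left, ^ for center, and > for right) and specify a fill character:

std::format("{:*<7}", 42)    // yields "42*****"
std::format("{:*>7}", 42)    // yields "*****42"
std::format("{:*^7}", 42)    // yields "**42***"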

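You can specify further formatting details, such as forcing a specific notation, a specific precision (or limiting strings to a certain size), fill characters, or a positive sign:

std::format("{:7.2f} Euro", 42.0)    // yields "  42.00 Euro"
std::format("{:7.4}", "corner")      // yields "corn   "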

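By using the position of the argument, we can print a value in multiple forms. For example:

std::cout << std::format("'{0}' has value {0:02X} {0:+4d} {0:03o}\n", '?');
std::cout << std::format("'{0}' has value {0:02X} {0:+4d} {0:03o}\n", 'y');

This prints the hexadecimal, decimal, and octal value of ? and y using the actual character set, which might look as follows:

'?' has value 3F  +63 077
'y' has value 79 +121 171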

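# 10.1.2 Using `std::format_to_n()`

The implementation of std::format() has pretty good performance compared with other ways of formatting. However, it has to allocate memory for the resulting string. To avoid that cost, you can use std::format_to_n(), which writes to a preallocated array of characters. You have to specify both the buffer to write to and its size. For example:

char buffer[64];
...
auto ret = std::format_to_n(buffer, std::size(buffer) - 1,
                            "String '{}' has {} chars\n", str, str.size());
*(ret.out) = '\0';   // write the trailing null character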

Or:

std::array<char, 64> buffer;
...
auto ret = std::format_to_n(buffer.begin(), buffer.size() - 1,
                            "String '{}' has {} chars\n", str, str.size());
*(ret.out) = '\0';   // write the trailing null character

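Note that std::format_to_n() does not write a trailing null character. However, the return value holds all the information needed to deal with this. It is a data structure of type std::format_to_n_result that has two members:

- out for the position of the first character not written
- size for the number of characters that would have been written without truncating them to the passed size

For that reason, we store a null character at the position that ret.out points to. Note that we pass only buffer.size() - 1 to std::format_to_n() to ensure that there is room for the trailing null character:

auto ret = std::format_to_n(buffer.begin(), buffer.size() - 1, ... );
*(ret.out) = '\0';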

Alternatively, we can initialize the buffer with {} to ensure that all characters are initialized with the null character.

It is not an error if the size is not sufficient for the value. In that case, the written value is simply truncated. For example:

std::array<char, 5> mem{};
std::format_to_n(mem.data(), mem.size() - 1, "{}", 123456.78);
std::cout << mem.data() << '\n';

The output is:

1234

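# 10.1.3 Using `std::format_to()`

The formatting library also provides std::format_to(), which writes the characters of the formatted output without any size limit. Using this function with limited memory is a risk, because writing more characters than the destination can hold results in undefined behavior. However, with an output stream buffer iterator, you can safely use it to write directly to a stream:

std::format_to(std::ostreambuf_iterator<char>{std::cout},
               "String '{}' has {} chars\n", str, str.size());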

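In general, std::format_to() takes any output iterator for characters. For example, you can also use a back inserter to append the characters to a string:

std::string s;
std::format_to(std::back_inserter(s),
               "String '{}' has {} chars\n", str, str.size());

The helper function std::back_inserter() creates an object that calls push_back() for each character. Note that the implementation of std::format_to() can recognize that back insert iterators are passed and write multiple characters at once for certain containers, so that the performance is still good.¹

¹ Thanks to Victor Zverovich for pointing this out.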

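# 10.1.4 Using `std::formatted_size()`

To know in advance how many characters a formatted output would write (without writing any), you can use std::formatted_size(). For example:

auto sz = std::formatted_size("String '{}' has {} chars\n", str, str.size());

That allows you to reserve enough memory or to double-check that the reserved memory is sufficient.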

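# 10.2 Performance of the Formatting Library

One reason people still use sprintf() is that it performs significantly better than output string streams or std::to_string(). The formatting library was designed to do better in this respect: formatting should be at least as fast as with sprintf(), or even faster.

Current (draft) implementations show that equal or even better performance is achievable. Rough measurements show that:

- std::format() should be as fast as or even faster than sprintf().
- std::format_to() and std::format_to_n() should be even faster.

The fact that compilers can usually check format strings at compile time helps a lot: it helps to avoid formatting errors and also improves performance significantly.

However, the final performance depends on the quality of the implementation of the formatting library on your specific platform (for example, at the time of writing this section, the /utf-8 option significantly improved formatting performance with Visual C++). Therefore, you should measure the performance yourself. The program format/formatperf.cpp may give you some hints about the situation on your platform.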

# 10.2.1 Using `std::vformat()` and `vformat_to()`

To support this goal, an important fix was adopted for the formatting library after C++20 was standardized (see http://wg21.link/p2216r3). With this fix, std::format(), std::format_to(), and std::format_to_n() require format strings that are compile-time values. You have to pass a string literal or a constexpr string. For example:

const char* fmt1 = "{}\n";              // runtime format string
std::cout << std::format(fmt1, 42);     // compile-time ERROR: runtime format string
constexpr const char* fmt2 = "{}\n";    // compile-time format string
std::cout << std::format(fmt2, 42);     // OK

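As a consequence, invalid format specifications become compile-time errors:

std::cout << std::format("{:7.2f}\n", 42);   // compile-time ERROR: invalid format
constexpr const char* fmt2 = "{:7.2f}\n";    // compile-time format string
std::cout << std::format(fmt2, 42);          // compile-time ERROR: invalid format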

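Of course, applications sometimes have to compute formatting details at runtime (such as computing the best width according to the passed values). In that case, you have to use std::vformat() or std::vformat_to() and pass all arguments to these functions with std::make_format_args():

const char* fmt3 = "{} {}\n";    // runtime format string
int i = 42;                      // named variables: newer library versions
double d = 1.7;                  // accept only lvalues in make_format_args()
std::cout << std::vformat(fmt3, std::make_format_args(i, d));   // OK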

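If a runtime format string is used and the formatting is not valid for the passed argument, the call throws a runtime error of type std::format_error:

const char* fmt4 = "{:7.2f}\n";
int value = 42;
// runtime ERROR: throws a std::format_error exception
std::cout << std::vformat(fmt4, std::make_format_args(value));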

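# 10.3 Formatted Output in Detail

This section describes the syntax of formatting in detail.

# 10.3.1 General Format of Format Strings

The general way to specify the formatting of arguments is to pass a format string that can contain replacement fields specified by {...} and ordinary characters. All other characters are printed as they are. To print the characters { and }, use {{ and }}.

For example:

std::format("With format {{}}: {}", 42);    // yields "With format {}: 42"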

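A replacement field may have an index to specify the argument and, after a colon, a format specifier:

- {}: use the next argument with its default formatting.
- {n}: use the n-th argument (the first argument has index 0) with its default formatting.
- {:fmt}: use the next argument formatted according to fmt.
- {n:fmt}: use the n-th argument formatted according to fmt.

Either none or all of the replacement fields must have an argument index:

std::format("{}: {}", key, value);          // OK
std::format("{1}: {0}", value, key);        // OK
std::format("{}: {} or {0}", value, key);   // ERROR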

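Additional arguments are ignored.

The syntax of the format specifier depends on the type of the passed argument:

- For arithmetic types, strings, and raw pointers, the formatting library itself defines a standard format.
- In addition, C++20 specifies a standard formatting of chrono types (durations, timepoints, and calendrical types).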

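# 10.3.2 Standard Format Specifiers

Standard format specifiers have the following format (each specifier is optional): fill align sign # 0 width.prec L type

- fill: the character used to fill the value up to width (default: space). It can be specified only if align is also specified.
- align:
  - <: left-aligned
  - >: right-aligned
  - ^: centered
  The default alignment depends on the type.
- sign:
  - -: only the negative sign for negative values (default)
  - +: positive or negative sign
  - space: negative sign for negative values, a space for other values
- #: switches to an alternative form of some notations:
  - For the binary, octal, and hexadecimal notation of integral values, it writes a prefix such as 0b, 0, or 0x.
  - It forces the floating-point notation to always write a decimal point.
- 0 in front of width: pads arithmetic values with zeros.
- width: specifies a minimum field width.
- prec: the precision after a dot:
  - For floating-point types, it specifies how many digits are printed after the decimal point or in total (depending on the notation).
  - For string types, it specifies the maximum number of characters processed from the string.
- L: activates locale-dependent formatting (this may affect the format of arithmetic types and bool).
- type: specifies the general notation of the formatting. This allows you to print characters as integral values (or vice versa) or to choose the general notation of floating-point values.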

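# 10.3.3 Width, Precision, and Fill Characters

For all values printed, a positive integral value after the colon (without a leading dot) specifies a minimum field width for the output of the whole value (including the sign and so on). It can be used together with an alignment specification. For example:

std::format("{:7}", 42);       // yields "     42"
std::format("{:7}", "hi");     // yields "hi     "
std::format("{:^7}", "hi");    // yields "  hi   "
std::format("{:>7}", "hi");    // yields "     hi"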

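You can also specify padding zeros and a fill character. The padding 0 can be used only for arithmetic types (except char and bool) and is ignored if an alignment is specified:

std::format("{:07}", 42);     // yields "0000042"
std::format("{:^07}", 42);    // yields "  42   "
std::format("{:>07}", -1);    // yields "     -1"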

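The padding 0 is different from the general fill character, which is specified immediately after the colon (in front of the alignment):

std::format("{:^07}", 42);     // yields "  42   "
std::format("{:0^7}", 42);     // yields "0042000"
std::format("{:07}", "hi");    // invalid (padding 0 not allowed for strings)
std::format("{:0<7}", "hi");   // yields "hi00000"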

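The precision is used for floating-point types and strings:

For floating-point types, you can specify a precision other than the default. Without a precision and a type, the shortest representation that reads back as the same value is printed; with the notations f, e, and g, the default precision is 6:

std::format("{}", 0.12345678);          // yields "0.12345678"
std::format("{:.5}", 0.12345678);       // yields "0.12346"
std::format("{:10.5}", 0.12345678);     // yields "   0.12346"
std::format("{:^10.5}", 0.12345678);    // yields " 0.12346  "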

Note that, depending on the floating-point notation, the precision might apply to the value as a whole or to the digits after the decimal point.

For strings, you can use it to specify a maximum number of characters:

std::format("{}", "counterproductive");          // yields "counterproductive"
std::format("{:20}", "counterproductive");       // yields "counterproductive   "
std::format("{:.7}", "counterproductive");       // yields "counter"
std::format("{:20.7}", "counterproductive");     // yields "counter             "
std::format("{:^20.7}", "counterproductive");    // yields "      counter       "

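Note that the width and the precision can themselves be parameters. For example, the following code:

int width = 10;
int precision = 2;
for (double val : {1.0, 12.345678, -777.7}) {
    std::cout << std::format("{:+{}.{}f}\n", val, width, precision);
}

has the following output:

     +1.00
    +12.35
   -777.70

Here, we specify at runtime that we have a minimum field width of 10 with two digits after the decimal point (using the fixed notation).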

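# 10.3.4 Format/Type Specifiers

By specifying a format or type specifier, you can force various notations for integral types, floating-point types, and raw pointers.

# Specifiers for Integral Types

Table 10.1 "Formatting options for integral types" lists the possible formatting type options for integral types (including bool and char).

According to http://wg21.link/lwg3648, the support of the specifier c for bool is probably an error and will be removed.

| Spec. | 42 | '@' | true | Meaning |
|---|---|---|---|---|
| none | 42 | @ | true | Default format |
| `d` | 42 | 64 | 1 | Decimal notation |
| `b` / `B` | 101010 | 1000000 | 1 | Binary notation |
| `#b` | 0b101010 | 0b1000000 | 0b1 | Binary notation with prefix |
| `#B` | 0B101010 | 0B1000000 | 0B1 | Binary notation with prefix |
| `o` | 52 | 100 | 1 | Octal notation |
| `x` | 2a | 40 | 1 | Hexadecimal notation |
| `X` | 2A | 40 | 1 | Hexadecimal notation |
| `#x` | 0x2a | 0x40 | 0x1 | Hexadecimal notation with prefix |
| `#X` | 0X2A | 0X40 | 0X1 | Hexadecimal notation with prefix |
| `c` | * | @ | '\1' | As the character with that value |
| `s` | invalid | invalid | true | bool as string |

Table 10.1 Formatting options for integral types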

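For example:

std::cout << std::format("{:#b} {:#b} {:#b}\n", 42, '@', true);

will print:

0b101010 0b1000000 0b1

Note the following:

- The default notations are:
  - d (decimal) for integral types
  - c (as character) for character types
  - s (as string) for type bool
- If L is specified (in the format specifier it comes right before the type character, if there is one), the locale-dependent character sequence for Boolean values and the locale-dependent thousands separator and decimal point characters for arithmetic values are used.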

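# Specifiers for Floating-Point Types

Table 10.2 "Formatting options for floating-point types" lists the possible formatting type options for floating-point types.

For example:

std::cout << std::format("{0} {0:#} {0:#g} {0:e}\n", -1.0);

will print:

-1 -1. -1.00000 -1.000000e+00

Note that passing the integer -1 instead would be a formatting error.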

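| Spec. | -1.0 | 0.0009765625 | 1785856.0 | Meaning |
|---|---|---|---|---|
| none | -1 | 0.0009765625 | 1785856 | Default format (shortest representation) |
| `#` | -1. | 0.0009765625 | 1785856. | Forces decimal point |
| `f` / `F` | -1.000000 | 0.000977 | 1785856.000000 | Fixed notation (default precision after dot: 6) |
| `g` | -1 | 0.000976562 | 1.78586e+06 | Fixed or exponential notation (default full precision: 6) |
| `G` | -1 | 0.000976562 | 1.78586E+06 | Fixed or exponential notation (default full precision: 6) |
| `#g` | -1.00000 | 0.000976562 | 1.78586e+06 | Fixed or exponential notation (forced dot and zeros) |
| `#G` | -1.00000 | 0.000976562 | 1.78586E+06 | Fixed or exponential notation (forced dot and zeros) |
| `e` | -1.000000e+00 | 9.765625e-04 | 1.785856e+06 | Exponential notation (default precision after dot: 6) |
| `E` | -1.000000E+00 | 9.765625E-04 | 1.785856E+06 | Exponential notation (default precision after dot: 6) |
| `a` | -1p+0 | 1p-10 | 1.b4p+20 | Hexadecimal floating-point notation |
| `A` | -1P+0 | 1P-10 | 1.B4P+20 | Hexadecimal floating-point notation |
| `#a` | -1.p+0 | 1.p-10 | 1.b4p+20 | Hexadecimal floating-point notation |
| `#A` | -1.P+0 | 1.P-10 | 1.B4P+20 | Hexadecimal floating-point notation |

Table 10.2 Formatting options for floating-point types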

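# Specifiers for Strings

For string types, the default format specifier is s. However, you do not have to provide it because it is the default. Note also that for strings, you can specify a precision, which is interpreted as the maximum number of characters used:

std::format("{}", "counter");      // yields "counter"
std::format("{:s}", "counter");    // yields "counter"
std::format("{:.5}", "counter");   // yields "count"
std::format("{:.5}", "hi");        // yields "hi"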

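Note that only the standard string types of the character types char and wchar_t are supported. Strings and sequences of the types u8string and char8_t, u16string and char16_t, or u32string and char32_t are not supported. In fact, the C++ standard library provides formatters for the following types:

- char* and const char*
- const char[n] (string literals)
- std::string and std::basic_string<char, traits, allocator>
- std::string_view and std::basic_string_view<char, traits>
- wchar_t* and const wchar_t*
- const wchar_t[n] (wide string literals)
- std::wstring and std::basic_string<wchar_t, traits, allocator>
- std::wstring_view and std::basic_string_view<wchar_t, traits>

Note that the format string and the arguments for it must have the same character type:

auto ws1 = std::format("{}", L"K\u00F6ln");             // compile-time ERROR
std::wstring ws2 = std::format(L"{}", L"K\u00F6ln");    // OK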

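# Specifiers for Pointers

For pointer types, the default format specifier is p, which usually writes the address in hexadecimal notation with the prefix 0x. On platforms that have no type uintptr_t, the format is implementation-defined:

void* ptr = ... ;
std::format("{}", ptr)      // usually yields a value such as 0x7ff688ee64
std::format("{:p}", ptr)    // usually yields a value such as 0x7ff688ee64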

Note that only the following pointer types are supported:

- void* and const void*
- std::nullptr_t

Thus, you can pass nullptr or a raw pointer, which you have to cast to type (const) void*:

int i = 42;
std::format("{}", &i)                                 // compile-time ERROR
std::format("{}", static_cast<void*>(&i))             // OK (e.g., 0x7ff688ee64)
std::format("{:p}", static_cast<void*>(&i))           // OK (e.g., 0x7ff688ee64)
std::format("{}", static_cast<const void*>("hi"))     // OK (e.g., 0x7ff688ee64)
std::format("{}", nullptr)                            // OK (usually 0x0)
std::format("{:p}", nullptr)                          // OK (usually 0x0)

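# 10.4 Internationalization

If L is specified in a format, the locale-specific notation is used:

- For bool, the locale strings of std::numpunct::truename and std::numpunct::falsename are used.
- For integral values, the locale-dependent thousands separator character is used.
- For floating-point values, the locale-dependent decimal point and thousands separator characters are used.
- For several notations of types in the chrono library (durations, timepoints, and so on), their locale-specific formats are used.

To activate the locale-specific notation, you also have to pass a locale to std::format(). For example:

// initialize a locale for "German in Germany":
#ifdef _MSC_VER
std::locale locG{"deu_deu.1252"};
#else
std::locale locG{"de_DE"};
#endif

// use it for formatting:
std::format(locG, "{0} {0:L}", 1000.7)    // yields 1000.7 1.000,7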

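See format/formatgerman.cpp for a complete example.

Note that the locale is used only if you use the locale specifier L. Without it, the default locale "C" is used, which uses the American formatting.

Alternatively, you can set the global locale and use the L specifier:

std::locale::global(locG);              // set German locale globally
std::format("{0} {0:L}", 1000.7)        // yields 1000.7 1.000,7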

You might have to create your own locale (usually based on an existing locale with modified facets). For example:

// format/formatbool.cpp
#include <iostream>
#include <locale>
#include <string>
#include <format>

// define facet for German bool names:
class GermanBoolNames : public std::numpunct_byname<char> {
public:
    GermanBoolNames(const std::string& name) : std::numpunct_byname<char>(name) {
    }
protected:
    std::string do_truename() const override {
        return "wahr";
    }
    std::string do_falsename() const override {
        return "falsch";
    }
};

int main() {
    // create locale with German bool names:
    std::locale locBool{std::cin.getloc(),
                        new GermanBoolNames{""}};
    // use the locale to print Boolean values:
    std::cout << std::format(locBool, "{0} {0:L}\n", false);   // false falsch
}

The program has the following output:

false falsch

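Note that for output with wide-character strings (which is an issue especially with Visual C++), both the format string and the arguments have to be wide-character strings. For example:

std::wstring city = L"K\u00F6ln";               // Köln
auto ws1 = std::format("{}", city);             // compile-time ERROR
std::wstring ws2 = std::format(L"{}", city);    // OK: ws2 is a std::wstring
std::wcout << ws2 << L'\n';                     // OK

Strings of the character types char8_t (UTF-8 characters), char16_t, and char32_t are not supported yet.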

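# 10.5 Error Handling

Ideally, C++ compilers should detect bugs at compile time rather than at runtime. Because string literals are known at compile time, C++ can check for format violations when string literals are used as format strings, and does so in std::format():

std::format("{:d}", 42)    // OK
std::format("{:s}", 42)    // compile-time ERROR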

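If you pass a format string that has already been initialized or computed, the formatting library handles format errors as follows:

- std::format(), std::format_to(), and format_to_n() take only format strings known at compile time:
  - String literals
  - constexpr character pointers
  - Compile-time strings that can be converted to a compile-time string view
- To use format strings computed at runtime, use:
  - std::vformat()
  - std::vformat_to()
- For std::formatted_size(), you can only use format strings known at compile time.

For example:

const char* fmt1 = "{:d}";                        // runtime format string
int i = 42;
std::format(fmt1, 42);                            // compile-time ERROR
std::vformat(fmt1, std::make_format_args(i));     // OK (argument passed as an lvalue)
constexpr const char* fmt2 = "{:d}";              // compile-time format string
std::format(fmt2, 42);                            // OK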

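Using fmt1 does not compile with std::format() because the passed argument is not a compile-time string. However, using fmt1 with std::vformat() works fine (but you have to convert all arguments with std::make_format_args()). Passing fmt2 to std::format() does compile because it is initialized as a compile-time string.

If a format error is detected at runtime, an exception of type std::format_error is thrown. This new standard exception type is derived from std::runtime_error and offers the usual API of standard exceptions: it is initialized with a string for the error message, which you get by calling what().

For example:

try {
    const char* fmt4 = "{:s}";
    int value = 42;
    std::vformat(fmt4, std::make_format_args(value));   // throws std::format_error
}
catch (const std::format_error& e) {
    std::cerr << "FORMATTING EXCEPTION: " << e.what() << std::endl;
}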

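# 10.6 User-Defined Formatted Output

The formatting library can define formatting for user-defined types. What you need is a formatter, which is pretty straightforward to implement.

# 10.6.1 Basic Formatter API

A formatter is a specialization of the class template std::formatter<> for your type. Inside the formatter, two member functions have to be defined:

- parse() to implement how to parse the format string specifiers for your type
- format() to perform the actual formatting for an object/value of your type

Let us look at a first minimal example (we will improve it step by step) that specifies how to format an object/value that has a fixed value. Assume the type is defined like this (see format/always40.hpp):

class Always40 {
public:
    int getValue() const {
        return 40;
    }
};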

For this type, we can define a first formatter (which we should definitely improve) as follows:

// format/formatalways40.hpp
#include "always40.hpp"
#include <format>
#include <iostream>

template<>
struct std::formatter<Always40> {
    // parse the format string for this type:
    constexpr auto parse(std::format_parse_context& ctx) {
        return ctx.begin();       // return position of } (hopefully there)
    }

    // format by always writing its value:
    auto format(const Always40& obj, std::format_context& ctx) const {
        return std::format_to(ctx.out(), "{}", obj.getValue());
    }
};

This is already good enough so that the following works:

Always40 val;
std::cout << std::format("Value: {}\n", val);
std::cout << std::format("Twice: {0} {0}\n", val);

The output would be:

Value: 40
Twice: 40 40

We define the formatter as a specialization of std::formatter<> for our type Always40:

template<>
struct std::formatter<Always40> {
    // ...
};

Because we only have public members, we use struct instead of class.

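# Parsing the Format String

In parse(), we implement the function to parse the format string:

// parse the format string for this type:
constexpr auto parse(std::format_parse_context& ctx) {
    return ctx.begin();       // return position of } (hopefully there)
}

The function takes a std::format_parse_context, which provides an API to iterate over the remaining characters of the passed format string. ctx.begin() points to the first character of the format specifier for the value to parse or, if there is no specifier, to the }:

- If the format string is "Value: {:7.2f}", ctx.begin() points to "7.2f}".
- If the format string is "Twice: {0} {0}", ctx.begin() points to "} {0}" when called for the first time.
- If the format string is "{}\n", ctx.begin() points to "}\n".

There is also a ctx.end(), which points to the end of the whole format string. This means that the opening { was already parsed and you have to parse all characters until the corresponding closing }.

For the format string "Val: {1:_>20}cm \n", ctx.begin() is the position of the _ and ctx.end() is the end of the whole format string after the \n. The task of parse() is to parse the specified format of the passed argument, which means that you have to parse only the characters _>20 and then return the position of the end of the format specifier, which is the trailing } behind the character 0.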

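In our implementation, we do not support any format specifier yet. Therefore, we simply return the position of the first character we get, which works only if the next character really is the } (dealing with characters before the closing } is the first thing we have to improve). Calling std::format() with any specified format character will not work:

Always40 val;
std::format("{:7}", val)    // ERROR

Note that the parse() member function should be constexpr to support compile-time computing of the format string. This means that the code has to accept all restrictions of constexpr functions (which were relaxed with C++20).

However, you can see how this API allows programmers to parse whatever format they have specified for their types. This is used, for example, to support formatted output of the chrono library. Of course, we should follow the conventions of the standard specifiers to avoid programmer confusion.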

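# Performing the Formatting

In format(), we implement the function to format the passed value:

// format by always writing its value:
auto format(const Always40& value, std::format_context& ctx) const {
    return std::format_to(ctx.out(), "{}", value.getValue());
}

The function takes two parameters:

- The value passed as an argument to std::format() (or similar functions)
- A std::format_context, which provides the API to write the resulting characters of the formatting (according to the parsed format)

The most important function of the format context is out(), which yields an object you can pass to std::format_to() to write the actual characters of the formatting. The function has to return the new position for further output, which is returned by std::format_to().

Note that the format() member function of a formatter should be const. According to the original C++20 standard, that was not required (see http://wg21.link/lwg3636 for details).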

# 10.6.2 Improved Parsing

Let us improve the example we saw before. First, we should ensure that the parser deals with the format specifiers in a better way:

- We should take care of all characters up to the closing }.
- We should throw an exception when illegal formatting arguments are specified.
- We should deal with valid formatting arguments (such as a specified field width).

Let us look at all of these topics by looking at an improved version of the previous formatter (this time dealing with a type that always has the value 41):

// format/formatalways41.hpp
#include "always41.hpp"
#include <format>

template<>
class std::formatter<Always41> {
    int width = 0;   // specified width of the field
public:
    // parse the format string for this type:
    constexpr auto parse(std::format_parse_context& ctx) {
        auto pos = ctx.begin();
        while (pos != ctx.end() && *pos != '}') {
            if (*pos < '0' || *pos > '9') {
                throw std::format_error{std::format("invalid format '{}'", *pos)};
            }
            width = width * 10 + *pos - '0';   // new digit for the width
            ++pos;
        }
        return pos;                            // return position of }
    }

    // format by always writing its value:
    auto format(const Always41& obj, std::format_context& ctx) const {
        return std::format_to(ctx.out(), "{:{}}", obj.getValue(), width);
    }
};

Our formatter now has a member to store the specified field width:

template<>
class std::formatter<Always41> {
    int width = 0;   // specified width of the field
    ...
};

The field width is initialized with 0 but can be specified by the format string. The parser now has a loop that processes all characters up to the trailing }:

constexpr auto parse(std::format_parse_context& ctx) {
    auto pos = ctx.begin();
    while (pos != ctx.end() && *pos != '}') {
        ...
        ++pos;
    }
    return pos;                            // return position of }
}

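Note that the loop has to check for both whether there is still a character and whether it is the trailing }, because the programmer calling std::format() might have forgotten the trailing }.

Inside the loop, we multiply the current width by 10 and add the integral value of the digit character:

width = width * 10 + *pos - '0';   // new digit for the width

If the character is not a digit, we throw a std::format_error exception initialized with std::format():

if (*pos < '0' || *pos > '9') {
    throw std::format_error{std::format("invalid format '{}'", *pos)};
}

Note that we cannot use std::isdigit() here because it is not a function we could call at compile time. You can test the formatter with the program format/always41.cpp, which has the following output:

41
Value: 41
Twice: 41 41
With width: ' 41'
Format Error: invalid format 'f'

The value is right-aligned because this is the default alignment for integral values.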

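# 10.6.3 Using Standard Formatters for User-Defined Formatters

We can still improve the formatter implemented above:

- We can allow alignment specifiers.
- We can support fill characters.

Fortunately, we do not have to implement the complete parsing ourselves. Instead, we can use the standard formatters to benefit from the format specifiers they support. In fact, there are two approaches for doing that:

- You can delegate the work to a local standard formatter.
- You can inherit from a standard formatter.

# Delegating Formatting to Standard Formatters

To delegate formatting to standard formatters, you have to:

- Declare a local standard formatter
- Let the parse() function delegate the work to the standard formatter
- Let the format() function delegate the work to the standard formatter

In general, this should look as follows: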

// format/formatalways42ok.hpp
#include "always42.hpp"
#include <format>

// *** formatter for type Always42:
template<>
struct std::formatter<Always42> {
    // use a standard int formatter that does the work:
    std::formatter<int> f;

    // delegate parsing to the standard formatter:
    constexpr auto parse(std::format_parse_context& ctx) {
        return f.parse(ctx);
    }

    // delegate formatting of the value to the standard formatter:
    auto format(const Always42& obj, std::format_context& ctx) const {
        return f.format(obj.getValue(), ctx);
    }
};

As usual, we declare a specialization of std::formatter<> for type Always42. However, this time, we use a local standard formatter for type int that does the work. We delegate both the parsing and the formatting to it. In fact, we extract the value from our type with getValue() and use the standard int formatter to do the rest of the formatting.

We can test the formatter with the following program:

// format/always42.cpp
#include "always42.hpp"
#include "formatalways42.hpp"
#include <iostream>

int main() {
    try {
        Always42 val;
        std::cout << val.getValue() << '\n';
        std::cout << std::format("Value: {}\n", val);
        std::cout << std::format("Twice: {0} {0}\n", val);
        std::cout << std::format("With width: '{:7}'\n", val);
        std::cout << std::format("With all:   '{:.^7}'\n", val);
    }
    catch (const std::format_error& e) {
        std::cerr << "Format Error: " << e.what() << std::endl;
    }
}

The program has the following output:

42
Value: 42
Twice: 42 42
With width: '     42'
With all:   '..42...'

Note that the value is still right-aligned by default because that is the default alignment for int.

Note also that in practice, you might need some modifications for this code, which is discussed later in detail:

- Declaring format() as const might not compile unless the formatter is declared as mutable.
- Declaring parse() as constexpr might not compile.

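# Inheriting From Standard Formatters

Usually, it is even enough to derive from a standard formatter so that the formatter member and its parse() function are implicitly available:

// format/formatalways42inherit.hpp
#include "always42.hpp"
#include <format>

// *** formatter for type Always42:
// - use standard int formatter
template<>
struct std::formatter<Always42> : std::formatter<int> {
    auto format(const Always42& obj, std::format_context& ctx) const {
        // delegate formatting of the value to the standard formatter:
        return std::formatter<int>::format(obj.getValue(), ctx);
    }
};

However, note that in practice, you might also need some modifications for this code:

- Declaring format() as const might not compile.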

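# Using Standard Formatters in Practice

In practice, there are some issues with what was standardized with C++20, so that some things had to be clarified afterwards:

- C++20 as originally standardized did not require that the format() member function of a formatter be const (see http://wg21.link/lwg3636 for details). To support implementations of the C++ standard library that do not declare format() as a const member function, you have to either declare it as a non-const function or declare the local formatter as mutable.
- Existing implementations might not yet support compile-time parsing with a parse() member function that is constexpr, because compile-time parsing was added after C++20 was standardized (see http://wg21.link/p2216r3). In that case, we cannot delegate compile-time parsing to a standard formatter.

As a consequence, in practice, the formatter for type Always42 might have to look as follows:

// format/formatalways42.hpp
#include "always42.hpp"
#include <format>

// *** formatter for type Always42:
template<>
struct std::formatter<Always42> {
    // use a standard int formatter that does the work:
#if __cpp_lib_format < 202106
    mutable      // in case the standard formatters have a non-const format()
#endif
    std::formatter<int> f;

    // delegate parsing to the standard int formatter:
#if __cpp_lib_format >= 202106
    constexpr    // in case standard formatters don't support constexpr parse() yet
#endif
    auto parse(std::format_parse_context& ctx) {
        return f.parse(ctx);
    }

    // delegate formatting of the int value to the standard int formatter:
    auto format(const Always42& obj, std::format_context& ctx) const {
        return f.format(obj.getValue(), ctx);
    }
};

As you can see, the code:

- Declares parse() with constexpr only if the corresponding fix was adopted
- Might declare the local formatter as mutable so that the const format() member function can call a standard format() function that is not const

For both, the implementation uses a feature test macro that signals that compile-time parsing is supported (expecting that its adoption also makes the format() member function of the standard formatter const).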

# 10.6.4 Using Standard Formatters for Strings

If you have more complicated types to format, one common approach is to create a string and then use the standard formatter for strings (std::string or, if only string literals are used, std::string_view).

For example, we can define an enumeration type and a formatter for it as follows:

// format/color.hpp
#include <format>
#include <string>

enum class Color { red, green, blue };

// *** formatter for enum type Color:
template<>
struct std::formatter<Color> : public std::formatter<std::string> {
    auto format(Color c, std::format_context& ctx) const {
        // initialize a string for the value:
        std::string value;
        switch (c) {
        using enum Color;
        case red:
            value = "red";
            break;
        case green:
            value = "green";
            break;
        case blue:
            value = "blue";
            break;
        default:
            value = std::format("Color{}", static_cast<int>(c));
            break;
        }
        // and delegate the rest of formatting to the string formatter:
        return std::formatter<std::string>::format(value, ctx);
    }
};

By inheriting the formatter from the formatter for strings, we inherit its parse() function, which means that we support all format specifiers that strings have. In the format() function, we then perform the mapping to a string and let the standard formatter do the rest of the formatting.

We can use the formatter as follows:

// format/color.cpp
#include "color.hpp"
#include <iostream>
#include <string>
#include <format>

int main() {
    for (auto val : {Color::red, Color::green, Color::blue, Color{13}}) {
        // use user-provided formatter for enum Color:
        std::cout << std::format("Color {:_>8} has value {:02}\n",
                                 val, static_cast<int>(val));
    }
}

The program has the following output:

Color _____red has value 00
Color ___green has value 01
Color ____blue has value 02
Color _Color13 has value 13

If you do not introduce your own format specifiers, this approach usually works fine. If only string literals are used as possible values, you could even use the formatter for std::string_view instead.

# 10.7 Afternotes

Formatted output was first proposed by Victor Zverovich and Lee Howes in http://wg21.link/p0645r0. The finally accepted wording was formulated by Victor Zverovich in http://wg21.link/p0645r10.

After C++20 was standardized, several fixes against C++20 were accepted. The most important one was that format strings are checked and must be known at compile time, as formulated by Victor Zverovich in http://wg21.link/p2216r3. Others are http://wg21.link/p2372r3 (fixing locale handling of chrono formatters) and http://wg21.link/p2418r2 (format arguments become universal/forwarding references).

Last updated: 2025/03/20, 19:44:38
