String formatting: % vs. .format vs. f-string literal

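Python 2.6 introduced the str.format() method, with a slightly different syntax from the existing % operator. Which is better, and in which situations?
Python 3.6 has now introduced another string formatting option, formatted string literals (also known as "f" strings), via the syntax f"my string".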

The following uses each method and produces the same result, so what is the difference?

 #!/usr/bin/python
 sub1 = "python string!"
 sub2 = "an arg"

 sub_a = "i am a %s" % sub1
 sub_b = "i am a {0}".format(sub1)
 sub_c = f"i am a {sub1}"

 arg_a = "with %(kwarg)s!" % {'kwarg':sub2}
 arg_b = "with {kwarg}!".format(kwarg=sub2)
 arg_c = f"with {sub2}!"

 print(sub_a)    # "i am a python string!"
 print(sub_b)    # "i am a python string!"
 print(sub_c)    # "i am a python string!"

 print(arg_a)    # "with an arg!"
 print(arg_b)    # "with an arg!"
 print(arg_c)    # "with an arg!"

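Furthermore, when does string formatting occur in Python? For example, if my logging level is set to HIGH, will I still take a hit for performing the following % operation? And if so, is there a way to avoid this?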

 log.debug("some debug info: %s" % some_info)

Similar to stackoverflow.com/questions/3691975/…

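For beginners: here is a very nice tutorial that teaches both styles. I personally use the older % style more often, because if you do not need the improved capabilities of the format() style, the % style is often a lot more convenient.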

For reference: the Python 3 documentation for the newer format() formatting style and the older %-based formatting style.

See also: Python's many ways of string formatting

To answer your second question: since 3.2 you can use {} format if you use a custom formatter (see docs.python.org/3/library/logging.html#logging.Formatter)
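As a rough illustration of the comment above, the sketch below configures a logging.Formatter with style='{' (available since Python 3.2). The logger name "demo", the in-memory stream and the template text are assumptions chosen for the example. Note that style governs the formatter's own template; the arguments passed to log.debug are still merged into the message with %.

```python
import io
import logging

# Capture log output in memory so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
# style="{" makes the formatter's template use str.format-style fields.
handler.setFormatter(logging.Formatter("{levelname}: {message}", style="{"))

log = logging.getLogger("demo")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.addHandler(handler)
log.propagate = False  # keep the root logger out of the picture

# Message arguments are still interpolated with %-style placeholders.
log.debug("some debug info: %s", "an arg")

output = stream.getvalue()
print(output, end="")
```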

Answer #1

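To answer your first question... .format just seems more sophisticated in many ways. An annoying thing about % is also how it can take either a single variable or a tuple. You would think the following would always work: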

"hi there %s" % name

Yet if name happens to be (1, 2, 3), it will throw a TypeError. To guarantee that it always prints, you would need to do

"hi there %s" % (name,)   # supply the single argument as a single-item tuple

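which is just ugly. .format doesn't have those issues. Also, in the second example you gave, the .format version looks much cleaner.

Why would you not use it?

not knowing about it (me, before reading this)
having to be compatible with Python 2.5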
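The single-value pitfall with % described above can be reproduced in a few lines. This is only a small sketch; the variable names are chosen for illustration.

```python
name = (1, 2, 3)

# A tuple on the right-hand side is unpacked into separate arguments,
# so one %s placeholder receives three values and formatting fails.
try:
    "hi there %s" % name
    failed = False
except TypeError:
    failed = True

# Wrapping the value in a one-item tuple makes % treat it as one argument.
with_percent = "hi there %s" % (name,)

# str.format takes its arguments explicitly, so no wrapping is needed.
with_format = "hi there {}".format(name)

print(failed)
print(with_percent)
print(with_format)
```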

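To answer your second question, string formatting happens at the same time as any other operation: when the string formatting expression is evaluated. And Python, not being a lazy language, evaluates expressions before calling functions, so in your log.debug example the expression "some debug info: %s" % some_info will first evaluate to, e.g., "some debug info: roflcopters are active", and then that string will be passed to log.debug().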
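The evaluation order can be made visible with a stand-in for the logger. In this sketch, describe and debug are hypothetical helpers, not part of the logging module; debug simply ignores its argument, the way a logger would when the debug level is disabled.

```python
calls = []

def describe():
    # Record that the argument expression was evaluated.
    calls.append("evaluated")
    return "roflcopters are active"

def debug(message):
    # Stand-in for a logger that filters out debug messages:
    # the finished string arrives here but is never used.
    pass

# The % expression runs first; debug() only ever sees the finished string.
debug("some debug info: %s" % describe())

print(calls)
```

The formatting work is done even though nothing is logged, which is the cost the question asks about.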

"%(a)s, %(a)s" % {'a': 'test'}

– ted
Aug 23 '12 at 9:53

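Note that you will waste time with log.debug("something: %s" % x) but not with log.debug("something: %s", x). In the second form the string formatting is handled inside the method, and you won't take the performance hit if the message is not logged. As always, Python anticipates your needs =)

– Darkfeline
Dec 14 '12 at 23:13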

ted: that is a worse-looking hack to do the same thing as '{0}, {0}'.format('test').

– flying sheep
Jan 30 '13 at 20:43

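The point is: the recurring argument that the new syntax allows reordering of items is moot, because you can do the same with the old syntax. Most people do not know that this is actually already defined in the ANSI C99 Std! Check out a recent copy of man sprintf and learn about the $ notation inside % placeholders

– cfi
Feb 20 '13 at 12:42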

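@cfi: If you mean something like printf("%2$d", 1, 3) to print out "3", that is specified in POSIX, not C99. The very man page you referenced notes, "The C99 standard does not include the style using '$'…".

– Thanatos
Jul 7 '13 at 23:55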

Answer #2

Something that the modulo operator (%) can't do, afaik:

tu = (12,45,22222,103,6)
print '{0} {2} {1} {2} {3} {2} {4} {2}'.format(*tu)

Result:

12 22222 45 22222 103 22222 6 22222

Very useful.

Another point: format(), being a function, can be used as an argument to other functions:

li = [12,45,78,784,2,69,1254,4785,984]
print map('the number is {}'.format,li)   

print

from datetime import datetime,timedelta

once_upon_a_time = datetime(2010, 7, 1, 12, 0, 0)
delta = timedelta(days=13, hours=8,  minutes=20)

gen =(once_upon_a_time +x*delta for x in xrange(20))

print '\n'.join(map('{:%Y-%m-%d %H:%M:%S}'.format, gen))

Results in:

['the number is 12', 'the number is 45', 'the number is 78', 'the number is 784', 'the number is 2', 'the number is 69', 'the number is 1254', 'the number is 4785', 'the number is 984']

2010-07-01 12:00:00
2010-07-14 20:20:00
2010-07-28 04:40:00
2010-08-10 13:00:00
2010-08-23 21:20:00
2010-09-06 05:40:00
2010-09-19 14:00:00
2010-10-02 22:20:00
2010-10-16 06:40:00
2010-10-29 15:00:00
2010-11-11 23:20:00
2010-11-25 07:40:00
2010-12-08 16:00:00
2010-12-22 00:20:00
2011-01-04 08:40:00
2011-01-17 17:00:00
2011-01-31 01:20:00
2011-02-13 09:40:00
2011-02-26 18:00:00
2011-03-12 02:20:00
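The listing above is Python 2 (print statement, xrange). A rough Python 3 equivalent might look like the sketch below; the shortened input list and the three-step range are simplifications made for the example, not part of the original answer.

```python
from datetime import datetime, timedelta

tu = (12, 45, 22222, 103, 6)
# An index such as {2} can appear as many times as needed.
reused = "{0} {2} {1} {2} {3} {2} {4} {2}".format(*tu)

# A bound str.format method is an ordinary callable, so map() accepts it.
# In Python 3, map() is lazy, hence the list() call.
li = [12, 45, 78]
labelled = list(map("the number is {}".format, li))

once_upon_a_time = datetime(2010, 7, 1, 12, 0, 0)
delta = timedelta(days=13, hours=8, minutes=20)
# range replaces Python 2's xrange; three steps are enough to show the idea.
stamps = list(map("{:%Y-%m-%d %H:%M:%S}".format,
                  (once_upon_a_time + x * delta for x in range(3))))

print(reused)
print(labelled)
print("\n".join(stamps))
```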

You can use old-style formatting in map just as easily as format: map('some_format_string_%s'.__mod__, some_iterable)

– agf
Nov 28 '12 at 5:49

@cfi: please prove you are right by rewriting the example above in C99

– MarcH
Feb 15 '14 at 14:11

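@MarcH: printf("%2$s %1$s\n", "One", "Two"); compiled with gcc -std=c99 test.c -o test, the output is Two One. But I stand corrected: it is actually a POSIX extension and not C. I cannot find it again in the C/C++ standard, where I thought I had seen it. The code works even with the 'c90' std flag. sprintf man page. This does not list it, but allows libs to implement a superset. My original argument is still valid, replacing C with Posix

– cfi
Feb 15 '14 at 15:18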

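My first comment here does not apply to this answer. I regret the phrasing. In Python we cannot use the modulo operator % for reordering placeholders. I would still like to keep that first comment for the sake of comment consistency here. I apologize for having vented my anger here. It was directed against the often-made statement that the old syntax per se would not allow this. Instead of creating a completely new syntax we could have introduced the std Posix extensions. We could have both.

– cfi
Feb 15 '14 at 15:25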

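"Modulo" refers to the operator that evaluates the remainder after a division. In this case the percent sign is not a modulo operator.

– Octopus
May 8 '14 at 21:09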

Answer #3

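Assuming you're using Python's logging module, you can pass the string formatting arguments as arguments to the .debug() method rather than doing the formatting yourself: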

log.debug("some debug info: %s", some_info)

which avoids doing the formatting unless the logger actually logs something.
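The difference can be observed by counting how often an argument is converted to text. In this sketch, Tracked is a hypothetical helper and "lazy_demo" is an arbitrary logger name; the logger is set to INFO so that debug records are discarded.

```python
import logging

class Tracked:
    """Hypothetical helper that counts how often it is rendered as text."""
    def __init__(self):
        self.rendered = 0

    def __str__(self):
        self.rendered += 1
        return "some_info"

log = logging.getLogger("lazy_demo")
log.setLevel(logging.INFO)            # DEBUG records are discarded
log.addHandler(logging.NullHandler())
log.propagate = False

eager = Tracked()
lazy = Tracked()

log.debug("some debug info: %s" % eager)  # formatted before the call
log.debug("some debug info: %s", lazy)    # formatting deferred, then skipped

print(eager.rendered, lazy.rendered)
```

Only the eagerly formatted argument is ever converted to a string; the deferred one is left untouched because the record is never emitted.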

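This is some useful info that I just learned now. It's a pity it doesn't have its own question, as it seems separate from the main question. Pity the OP didn't split his question into two separate questions.

– snth
Nov 14 '12 at 7:36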

You can use dict formatting like this: log.debug("Some debug info: %(this)s and %(that)s", dict(this='Tom', that='Jerry')) However, you can't use the new-style .format() syntax here, not even in Python 3.3, which is a shame.

– Cito
Nov 25 '12 at 17:00

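The primary benefit of this is not performance (doing the string interpolation will be quick compared to whatever you do with the logging output, e.g. displaying it in a terminal or saving it to disk). It is that if you have a logging aggregator, it can tell you "you got 12 instances of this error message", even if they all had different 'some_info' values. If the string formatting is done before passing the string to log.debug, this is impossible. The aggregator can only say "you had 12 different log messages"

– Jonathan Hartley
Oct 10 '13 at 8:04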

If you're concerned about performance, use the literal dict {} syntax instead of a dict() class instantiation: doughellmann.com/2012/11/…

– trojjer
Feb 14 '14 at 11:03

Answer #4

As of Python 3.6 (2016) you can use f-strings to substitute variables:

>>> origin = "London"
>>> destination = "Paris"
>>> f"from {origin} to {destination}"
'from London to Paris'

Note the f" prefix. If you try this in Python 3.5 or earlier, you'll get a SyntaxError.

See https://docs.python.org/3.6/reference/lexical_analysis.html#f-strings

This doesn't answer the question. Another answer that mentions f-strings at least talks about performance: stackoverflow.com/a/51167833/7851470

– George
Jun 17 '19 at 12:00

Answer #5

PEP 3101 proposes replacing the % operator with the new, advanced string formatting in Python 3, where it would be the default.

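Untrue: "The backwards compatibility can be maintained by leaving the existing mechanisms in place."; of course, .format won't replace % string formatting.

– Tobias
Jan 28 '13 at 23:32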

No, BrainStorms' postulation is true: "intended as a replacement for the existing '%'". Tobias' quote means both systems will coexist for some time. RTFPEP

– phobie
Aug 19 '15 at 15:00

Answer #6

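But please be careful: just now I discovered one issue when trying to replace all % with .format in existing code: '{}'.format(unicode_string) will try to encode unicode_string and will probably fail.

Just look at this Python interactive session log: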

Python 2.7.2 (default, Aug 27 2012, 19:52:55) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
; s='й'
; u=u'й'
; s
'\xd0\xb9'
; u
u'\u0439'

s is just a string (called a 'byte array' in Python 3) and u is a Unicode string (called a 'string' in Python 3):

; '%s' % s
'\xd0\xb9'
; '%s' % u
u'\u0439'

When you give a Unicode object as a parameter to the % operator, it will produce a Unicode string even if the original string wasn't Unicode:

; '{}'.format(s)
'\xd0\xb9'
; '{}'.format(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u0439' in position 0: ordinal not in range(256)

but the .format function will raise "UnicodeEncodeError":

; u'{}'.format(s)
u'\xd0\xb9'
; u'{}'.format(u)
u'\u0439'

and it will work fine with a Unicode argument only if the original string was Unicode.

; '{}'.format(u'i')
'i'

or if the argument string can be converted to a string (the so-called 'byte array')

There is simply no reason to change working code unless the additional features of the new format method are really needed.

– Tobias
Jan 28 '13 at 22:51

Absolutely agree with you, Tobias, but sometimes it is needed when upgrading to newer versions of Python

– wobmene
Jan 30 '13 at 13:17

For instance? AFAIK, it has never been needed; I don't consider it likely that % string interpolation will ever go away.

– Tobias
Jan 31 '13 at 13:45

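I consider the .format() function safer than % for strings. I often see beginners' mistakes like "p1=%s p2=%d" % "abc", 2 or "p1=%s p2=%s" % (tuple_p1_p2,). You might think it's the coder's fault, but I think it is just weird, faulty syntax that looks nice for a quick script but is bad for production code.

– wobmene
Jan 6 '14 at 15:07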

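But I don't like the syntax of .format(); I would be happier with good old %s, %02d, as in "p1=%s p2=%02d".format("abc", 2). I blame those who invented and approved the curly-brace formatting, which needs you to escape braces like {{}} and looks ugly.

– wobmene
Jan 6 '14 at 15:15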

Answer #7

From my test, % gives better performance than format.

Test code:

Python 2.7.2:

import timeit
print 'format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')")
print '%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')")

Result:

format: 0.470329046249
%: 0.357107877731

Python 3.5.2

import timeit
print('format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')"))
print('%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')"))

Result:

format: 0.5864730989560485
%: 0.013593495357781649

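In Python 2 the difference looks small, whereas in Python 3, % is much faster than format.

Thanks @Chris Cogdon for the sample code.

Edit 1:

Tested again in Python 3.7.2 in July 2019.

Result:

format: 0.86600608
%: 0.630180146

There is not much difference. I guess Python is improving gradually.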

Edit 2:

After someone mentioned Python 3's f-strings in a comment, I ran a test with the following code under Python 3.7.2:

import timeit
print('format:', timeit.timeit("'{}{}{}'.format(1, 1.23, 'hello')"))
print('%:', timeit.timeit("'%s%s%s' % (1, 1.23, 'hello')"))
print('f-string:', timeit.timeit("f'{1}{1.23}{\"hello\"}'"))

Result:

format: 0.8331376779999999
%: 0.6314778750000001
f-string: 0.766649943

It seems f-strings are still slower than % but better than format.

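Instead, str.format gives more functionality (especially type-specialized formatting, e.g. '{0:%Y-%m-%d}'.format(datetime.datetime.utcnow())). Performance cannot be the absolute requirement of every job. Use the right tool for the job.

– minhee
Sep 18 '11 at 17:25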

"Premature optimization is the root of all evil", or so Donald Knuth once said...

– Yatharth Agarwal
Oct 17 '12 at 13:07

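Sticking to a well-known formatting scheme (as long as it suits the needs, which it does in the vast majority of cases) that is also twice as fast is not "premature optimization" but simply reasonable. BTW, the % operator allows reusing printf knowledge; dictionary interpolation is a very simple extension of the principle.

– Tobias
Jan 28 '13 at 23:03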

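From my test there is also a huge difference between Python 3 and Python 2.7. % is much more efficient than format() in Python 3. The code that I used can be found here: github.com/rasbt/python_efficiency_tweaks/blob/master/test_code/… and github.com/rasbt/python_efficiency_tweaks/blob/master/test_code/…

– user2489252
Jan 24 '14 at 14:15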

I've actually experienced the opposite in one situation: new-style formatting was faster. Can you provide the test code you used?

– David Sanders
Oct 21 '14 at 19:54

Answer #8

Yet another advantage of .format (which I don't see in the answers): it can take object properties.

In [12]: class A(object):
   ....:     def __init__(self, x, y):
   ....:         self.x = x
   ....:         self.y = y
   ....:         

In [13]: a = A(2,3)

In [14]: 'x is {0.x}, y is {0.y}'.format(a)
Out[14]: 'x is 2, y is 3'

Or, as a keyword argument:

In [15]: 'x is {a.x}, y is {a.y}'.format(a=a)
Out[15]: 'x is 2, y is 3'

This is not possible with % as far as I can tell.
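For readers not using IPython, the same idea can be written as a plain script. The sketch below also covers item lookup inside a field and, for comparison, the % and f-string spellings; the settings dict and the variable names are assumptions added for illustration.

```python
class A:
    def __init__(self, x, y):
        self.x = x
        self.y = y

a = A(2, 3)
settings = {"bar": "baz"}  # hypothetical data for the item-lookup case

# Attribute lookup (.x) and item lookup ([bar]) are resolved inside the field.
by_attribute = "x is {a.x}, y is {a.y}".format(a=a)
by_item = "{foo[bar]}".format(foo=settings)

# With a positional %s, the values are pulled out before formatting.
with_percent = "x is %s, y is %s" % (a.x, a.y)

# An f-string evaluates the expression inline (Python 3.6+).
with_fstring = f"x is {a.x}, y is {a.y}"

print(by_attribute)
print(by_item)
```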

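This looks less readable than necessary compared to the equivalent 'x is {0}, y is {1}'.format(a.x, a.y). It should only be used when the a.x operation is very costly.

– dtheodor
Mar 29 '15 at 14:03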

@dtheodor With a tweak to use a keyword argument instead of a positional argument... 'x is {a.x}, y is {a.y}'.format(a=a). More readable than both examples.

– CivFan
Apr 17 '15 at 21:11

@CivFan Or, if you have more than one object, 'x is {a.x}, y is {a.y}'.format(**vars())

– Jack
Jun 18 '15 at 17:02

Also note this one, in the same fashion: '{foo[bar]}'.format(foo={'bar': 'baz'}).

– Antoine Pinsard
Jul 23 '16 at 22:01

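This is incredibly useful for customer-facing applications, where your application supplies a standard set of formatting options along with a user-supplied format string. I use this all the time. The configuration file, for instance, will have some "messagestring" property, which the user can set to Your order, number {order[number]} was processed at {now:%Y-%m-%d %H:%M:%S}, will be ready at about {order[eta]:%H:%M:%S} or whatever they wish. This is far cleaner than trying to offer the same functionality with the old formatter. It makes user-supplied format strings far more powerful.

– Taywee
Sep 30 '16 at 21:33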

Answer #9

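As I discovered today, the old way of formatting strings via % doesn't support Decimal, Python's module for decimal fixed-point and floating-point arithmetic, out of the box.

Example (using Python 3.3.5):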

#!/usr/bin/env python3

from decimal import *

getcontext().prec = 50
d = Decimal('3.12375239e-24') # no magic number, I rather produced it by banging my head on my keyboard

print('%.50f' % d)
print('{0:.50f}'.format(d))

Output:

0.00000000000000000000000312375239000000009907464850
0.00000000000000000000000312375239000000000000000000

There surely are workarounds, but you might still consider using the format() method right away.
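A compact way to check the behaviour shown above is to compare both results against the exact decimal expansion. This sketch assumes that %f converts its argument to a binary float while str.format hands the format spec to Decimal itself; the variable names are illustrative.

```python
from decimal import Decimal

d = Decimal("3.12375239e-24")

# %-formatting with %f goes through a binary float, so digits beyond
# float precision are noise.
old_style = "%.50f" % d

# str.format passes the spec to Decimal, which keeps the exact value.
new_style = "{0:.50f}".format(d)

# The exact expansion: 23 zeros, the 9 significant digits, then zero padding.
exact = "0." + "0" * 23 + "312375239" + "0" * 18

print(old_style)
print(new_style)
print(new_style == exact, old_style == exact)
```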

That's probably because new-style formatting calls str(d) before expanding the parameter, whereas old-style formatting probably calls float(d) first.

– David Sanders
Oct 21 '14 at 19:53

You would think so, but str(d) returns "3.12375239e-24", not "0.00000000000000000000000312375239000000000000000000"

– Jack
Jun 18 '15 at 16:52

Answer #10

If your Python is >= 3.6, the f-string formatted literal is your new friend.

It's simpler, cleaner, and performs better.

In [1]: params=['Hello', 'adam', 42]

In [2]: %timeit "%s %s, the answer to everything is %d."%(params[0],params[1],params[2])
448 ns ± 1.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: %timeit "{} {}, the answer to everything is {}.".format(*params)
449 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit f"{params[0]} {params[1]}, the answer to everything is {params[2]}."
12.7 ns ± 0.0129 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

Answer #11

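As a side note, you do not have to take a performance hit to use new-style formatting with logging. You can pass any object that implements the __str__ magic method to logging.debug, logging.info, etc. When the logging module has decided that it must emit your message object (whatever it is), it calls str(message_object) before doing so. So you could do something like this: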

import logging


class NewStyleLogMessage(object):
    def __init__(self, message, *args, **kwargs):
        self.message = message
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        args = (i() if callable(i) else i for i in self.args)
        kwargs = dict((k, v() if callable(v) else v) for k, v in self.kwargs.items())

        return self.message.format(*args, **kwargs)

N = NewStyleLogMessage

# Neither one of these messages are formatted (or calculated) until they're
# needed

# Emits "Lazily formatted log entry: 123 foo" in log
logging.debug(N('Lazily formatted log entry: {0} {keyword}', 123, keyword='foo'))


def expensive_func():
    # Do something that takes a long time...
    return 'foo'

# Emits "Expensive log entry: foo" in log
logging.debug(N('Expensive log entry: {keyword}', keyword=expensive_func))

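This is all described in the Python 3 documentation (https://docs.python.org/3/howto/logging-cookbook.html#formatting-styles). However, it will work with Python 2.6 as well (https://docs.python.org/2.6/library/logging.html#using-arbitrary-objects-as-messages).

One of the advantages of using this technique, other than the fact that it is formatting-style agnostic, is that it allows for lazy values, e.g. the function expensive_func above. This provides a more elegant alternative to the advice given in the Python docs here: https://docs.python.org/2.6/library/logging.html#optimization.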

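I wish I could upvote this more. It allows logging with .format without the performance hit, does so by overriding __str__ precisely as logging was designed for, shortens the function call to a single letter (N), which feels very similar to some of the standard ways to define strings, AND allows for lazy function calling. Thank you! +1

– CivFan
Jan 26 '15 at 22:21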

Is this any different in outcome from using the logging.Formatter(style='{') parameter?

– davidA
May 10 '17 at 2:03

Answer #12

One situation where % may help is when you are formatting regex expressions. For example,

'{type_names} [a-z]{2}'.format(type_names='triangle|square')

raises IndexError. In this situation, you can use:

'%(type_names)s [a-z]{2}' % {'type_names': 'triangle|square'}

This avoids writing the regex as '{type_names} [a-z]{{2}}'. This can be useful when you have two regexes, where one is used alone without format, but the concatenation of both is formatted.
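The three variants discussed above can be compared side by side. This is a small sketch; the sample input "square ab" is an assumption chosen so that the resulting pattern matches.

```python
import re

type_names = "triangle|square"

# A literal regex quantifier {2} looks like replacement field 2 to str.format.
try:
    "{type_names} [a-z]{2}".format(type_names=type_names)
    raised = False
except IndexError:
    raised = True

# Doubling the braces escapes them for str.format ...
with_format = "{type_names} [a-z]{{2}}".format(type_names=type_names)
# ... whereas % leaves braces alone.
with_percent = "%(type_names)s [a-z]{2}" % {"type_names": type_names}

print(raised)
print(with_format == with_percent)
print(bool(re.fullmatch(with_percent, "square ab")))
```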

Or just use '{type_names} [a-z]{{2}}'.format(type_names='triangle|square'). It's like saying .format() can help when using strings which already contain a percent character. Sure. You have to escape them then.

– Alfe
Mar 2 '17 at 16:24

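@Alfe You are right, and that is why the answer starts with "One situation where % may help is when you are formatting regex expressions." Specifically, assume that a = r"[a-z]{2}" is a regex chunk that you will use in two different final expressions (e.g. c1 = b + a and c2 = a). Assume that c1 needs to be formatted (e.g. b needs to be formatted at runtime), but c2 does not. Then you need a = r"[a-z]{2}" for c2 and a = r"[a-z]{{2}}" for c1.format(...).

– Jorge Leitao
Mar 2 '17 at 16:41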

Answer #13

I would add that since version 3.6, we can use f-strings like the following

foo = "john"
bar = "smith"
print(f"My name is {foo} {bar}")

Which gives

My name is john smith

Everything is converted to strings

mylist = ["foo", "bar"]
print(f"mylist = {mylist}")

Result:

mylist = ['foo', 'bar']

You can call functions inside them, as with the other formatting methods

import time

print(f'Hello, here is the date : {time.strftime("%d/%m/%Y")}')

Giving, for example

Hello, here is the date : 16/04/2018

Answer #14

For Python version >= 3.6 (see PEP 498)

s1='albha'
s2='beta'

f'{s1}{s2:>10}'

#output
'albha      beta'
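The :>10 part is a format spec: right-align within a field of width 10. As a brief sketch, the same spec language works in str.format, and < and ^ select left and centered alignment; the trailing | is only there to make the padding visible.

```python
s1 = "albha"
s2 = "beta"

# The same format-spec mini-language works in f-strings and str.format.
right = f"{s1}{s2:>10}"            # pad s2 on the left to width 10
same = "{}{:>10}".format(s1, s2)
left = f"{s2:<10}|"                # pad on the right
centered = f"{s2:^10}|"            # pad on both sides

print(repr(right))
print(repr(left))
print(repr(centered))
```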

Answer #15

Python 3.6.7 comparison:

#!/usr/bin/env python
import timeit

def time_it(fn):
    """
    Measure time of execution of a function
    """
    def wrapper(*args, **kwargs):
        t0 = timeit.default_timer()
        fn(*args, **kwargs)
        t1 = timeit.default_timer()
        print("{0:.10f} seconds".format(t1 - t0))
    return wrapper


@time_it
def new_new_format(s):
    print("new_new_format:", f"{s[0]} {s[1]} {s[2]} {s[3]} {s[4]}")


@time_it
def new_format(s):
    print("new_format:", "{0} {1} {2} {3} {4}".format(*s))


@time_it
def old_format(s):
    print("old_format:", "%s %s %s %s %s" % s)


def main():
    samples = (("uno", "dos", "tres", "cuatro", "cinco"), (1,2,3,4,5), (1.1, 2.1, 3.1, 4.1, 5.1), ("uno", 2, 3.14, "cuatro", 5.5),) 
    for s in samples:
        new_new_format(s)
        new_format(s)
        old_format(s)
        print("-----")


if __name__ == '__main__':
    main()

Output:

new_new_format: uno dos tres cuatro cinco
0.0000170280 seconds
new_format: uno dos tres cuatro cinco
0.0000046750 seconds
old_format: uno dos tres cuatro cinco
0.0000034820 seconds
-----
new_new_format: 1 2 3 4 5
0.0000043980 seconds
new_format: 1 2 3 4 5
0.0000062590 seconds
old_format: 1 2 3 4 5
0.0000041730 seconds
-----
new_new_format: 1.1 2.1 3.1 4.1 5.1
0.0000092650 seconds
new_format: 1.1 2.1 3.1 4.1 5.1
0.0000055340 seconds
old_format: 1.1 2.1 3.1 4.1 5.1
0.0000052130 seconds
-----
new_new_format: uno 2 3.14 cuatro 5.5
0.0000053380 seconds
new_format: uno 2 3.14 cuatro 5.5
0.0000047570 seconds
old_format: uno 2 3.14 cuatro 5.5
0.0000045320 seconds
-----

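You should run each example several times; a single run may be misleading, e.g. the operating system may be generally busy, so execution of your code gets delayed. See the docs: docs.python.org/3/library/timeit.html. (Nice avatar, Guybrush!)

– jake77
Feb 8 '19 at 7:33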
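Following the comment's suggestion, a measurement that repeats each variant many times and leaves print out of the timed code could look like the sketch below. The sample tuple, the repetition counts, and the choice of the minimum as the summary figure are assumptions of this example; absolute numbers will vary by machine and Python version.

```python
import timeit

s = ("uno", 2, 3.14, "cuatro", 5.5)

candidates = {
    "f-string": lambda: f"{s[0]} {s[1]} {s[2]} {s[3]} {s[4]}",
    "str.format": lambda: "{0} {1} {2} {3} {4}".format(*s),
    "percent": lambda: "%s %s %s %s %s" % s,
}

results = {}
for label, fn in candidates.items():
    # repeat() returns one total per run; the minimum is the least noisy figure.
    timings = timeit.repeat(fn, number=10_000, repeat=5)
    results[label] = min(timings) / 10_000
    print(f"{label:>10}: {results[label] * 1e9:.0f} ns per call")
```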

Answer #16

One thing, however, is that if you have nested curly braces, it won't work with format, but % will work.

Example:

>>> '{{0}, {1}}'.format(1,2)
Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    '{{0}, {1}}'.format(1,2)
ValueError: Single '}' encountered in format string
>>> '{%s, %s}'%(1,2)
'{1, 2}'
>>>

You can do this, but I agree it's awful: '{{{0}, {1}}}'.format(1, 2)

– Sylvan LE DEUNFF
Nov 15 '18 at 11:15
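To spell out the escaping rule from the comment: a doubled brace stands for one literal brace, in both str.format and f-strings. A minimal sketch follows; the variables a and b are introduced only for the f-string case.

```python
# Unescaped outer braces are parsed as (broken) replacement fields.
try:
    "{{0}, {1}}".format(1, 2)
    raised = False
except ValueError:
    raised = True

# Doubled braces are literal; {0} and {1} stay replacement fields.
escaped = "{{{0}, {1}}}".format(1, 2)

# % does not treat braces specially.
with_percent = "{%s, %s}" % (1, 2)

# f-strings use the same doubling rule.
a, b = 1, 2
with_fstring = f"{{{a}, {b}}}"

print(raised, escaped, with_percent, with_fstring)
```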
