Why is %s better than + for concatenation?

  • I understand that we should use %s to concatenate a string rather than + in Python.

    I could do any of:

    hello = "hello"
    world = "world"
    
    print(hello + " " + world)
    print("%s %s" % (hello, world))
    print("{} {}".format(hello, world))
    print(' '.join([hello, world]))
    

    But why should I use anything other than the +? It's quicker to write concatenation with a simple +. Then if you look at the formatting string, you specify the types e.g. %s and %d and such. I understand it could be better to be explicit about the type.

    But then I read that using + for concatenation should be avoided even though it's easier to type. Is there a clear reason that strings should be concatenated in one of those other ways?

    Who told you it's better?

    `%s` isn't for concatenation, it's a conversion specification for string formatting derived from C's `printf(3)`. There are cases for using that or a concatenation operator; which you use should be based on judgment of the situation, not dogma. How easy it is to write the code is entirely irrelevant because you're only going to do that once.

    I've refocused the question to *just* python (though I'm not a python person and there might still be glitches in the code). Please make sure that this is the question you are asking, make any appropriate updates and consider asking a *different* question if you are interested in C or Java.

    And now we have the superior f-strings! `print(f"{hello} {world}")`, has readability of concatenation since variables are seen where they occur in the string, and is faster than `str.format`.

  • Lie Ryan

    Lie Ryan Correct answer

    5 years ago
    1. Readability. The format string syntax is more readable, as it separates style from the data. Also, in Python, the %s syntax will automatically coerce any non-str types to str, while concatenation only works with str, so you can't concatenate str with int.

    2. Performance. In Python str is immutable, so the left and right strings have to be copied into the new string for every pairwise concatenation. If you concatenate four strings of length 10, you will be copying (10+10) + ((10+10)+10) + (((10+10)+10)+10) = 90 characters, instead of just 40 characters. And things get quadratically worse as the number and size of the strings increase. Java optimizes this case some of the time by transforming the series of concatenations to use StringBuilder, but CPython doesn't.

    3. For some use cases, the logging library provides an API that uses a format string to create the log entry string lazily (logging.info("blah: %s", 4)). This improves performance when the logging library decides that the current log entry will be discarded by a log filter, because it then doesn't need to format the string.
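    The three points above can be sketched in a few lines. This is only an illustration, assuming Python 3. The helper `chars_copied_by_plus`, the class `Expensive`, and the logger name `format_demo` are hypothetical names made up for the example, and the copy count models the naive left-to-right behaviour described in point 2, not what any particular interpreter is guaranteed to do.

```python
import logging

# Point 1: %s coerces non-str values, + does not.
count = 4
formatted = "blah: %s" % count      # count is converted with str()
try:
    "blah: " + count                # str + int raises TypeError
    plus_failed = False
except TypeError:
    plus_failed = True

# Point 2: characters copied by naive left-to-right concatenation.
def chars_copied_by_plus(lengths):
    """Model: every + copies both operands into a brand new string."""
    copied = 0
    running = lengths[0]
    for n in lengths[1:]:
        running += n                # length of the new intermediate string
        copied += running           # all of it had to be copied
    return copied

naive = chars_copied_by_plus([10, 10, 10, 10])   # 20 + 30 + 40
single_pass = sum([10, 10, 10, 10])              # each character copied once

# Point 3: logging formats lazily, only if the record is kept.
class Expensive:
    """Hypothetical object that records whether it was ever formatted."""
    def __init__(self):
        self.formatted = False

    def __str__(self):
        self.formatted = True
        return "expensive"

logger = logging.getLogger("format_demo")
logger.setLevel(logging.WARNING)    # INFO records are discarded

lazy = Expensive()
logger.info("blah: %s", lazy)       # discarded before any formatting happens

eager = Expensive()
logger.info("blah: " + str(eager))  # string is built before logging sees it

print(formatted, plus_failed, naive, single_pass)
print(lazy.formatted, eager.formatted)
```

    In this sketch the lazily logged object is never converted to a string, while the concatenated one is converted even though the record is thrown away.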

    Do you have any scientific or empirical source for #1? Because I think it's much **much** less readable (especially with more than two or three arguments).

    @L.Möller: I'm not quite sure what kind of source you expect for what is ultimately a subjective experience (ease of reading), but if you want my reasoning: 1) %s requires 2 extra characters per placeholder, while + requires a minimum of 4 (or 8 if you follow PEP8, 13 if you coerce), 2) %s is enclosed in a single string, so it's easier to parse visually, while with + you have more moving parts: close string, operator, variable, operator, open string, 3) with syntax coloring, %s has one color for each function: string and placeholder, while with + you get three colorings: string, operator, and variable.

    @L.Möller: 4) I have the option to put longer format strings in a variable or dictionary, away from where the formatting needs to be done, 5) the format string can be user specified from a config file, command args, or database; the same can't be said of concatenation. But yeah, I also wouldn't use %s when I have more than 4-5 things to interpolate; instead I'd use the %(varname)s variant or "{foo}".format() in Python. I think the explicit names improve readability for longer format strings with lots of interpolated variables.
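    A minimal sketch of the named-placeholder variants mentioned above, with the format strings kept in a dictionary away from the place where they are used. The `templates` dictionary, its keys, and the sample values are hypothetical and stand in for whatever a config file or database would supply.

```python
# Hypothetical templates, as they might be loaded from a config file.
templates = {
    "percent": "%(user)s logged in from %(host)s",
    "braces": "{user} logged in from {host}",
}

# Values are looked up by name, so their order does not matter.
values = {"host": "example.org", "user": "alice"}

percent_result = templates["percent"] % values
braces_result = templates["braces"].format(**values)

print(percent_result)
print(braces_result)
```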

    Thing is, ease of reading is _not_ a (purely) subjective thing. (E.g. the 80 chars per line restriction: our monitors got larger, our range of view didn't; or the fact that Gestalt principles also hold true for text.) That's why I wondered whether you know a source. However, if I have `logging.info("blah: %s %s %s %s %s %s", "a", "b", "c", "d", "e", "f", "a")` it can easily be less readable than the alternative, since you can't even see at first glance whether the number of params is correct. If you use line breaks, you might not see the connection to the method anymore.
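    The mismatch in that example (six placeholders, seven arguments) is easy to miss by eye but does surface at formatting time. A small sketch, assuming the `%` operator is applied directly rather than through `logging`; the variable names are made up for the illustration.

```python
fmt = "blah: %s %s %s %s %s %s"               # six placeholders
args = ("a", "b", "c", "d", "e", "f", "a")    # seven arguments

placeholders = fmt.count("%s")

try:
    fmt % args
    error = None
except TypeError as exc:
    # The % operator refuses to silently drop the extra argument.
    error = str(exc)

print(placeholders, len(args), error)
```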

    I don't know what's "true", that's why I ask if you have evidence :-). Really agree with your second comment.

    @L.Möller: ultimately though, it's my personal experience that I find that generally format string is easier to read than concatenation. If you want to say your experience says otherwise, I'm not going to argue with that. It's your experience, you are free to interpret it as you wish :)

    I find #2 to be suspect - do you have documented proof? I'm not supremely familiar with Java, but in C# *concatenation is faster than string interpolation*. I completely agree with #1 and really rely on that for deciding when to use which, but you have to remember interpolation requires an amount of string parsing and complexity where concatenation requires none of that.

    @JimmyHoffa: as I said, Java can optimize some cases of concatenation, so it can be as performant as or better than %s in Java. If the format string is a constant, the compiler can in theory parse it at compile time to produce optimized code which should be just as fast as StringBuilder. I don't know whether Java does this. Now that I think about it more though, I agree with you that the performance claim is probably not true in Java.

    @JimmyHoffa: But Python definitely doesn't optimize repeated concatenations. Also, in Python % is a single bytecode and the format string is parsed in C, while each + involves one bytecode and the whole chain has quadratic growth behaviour. I think I'm going to remove Java from #2.

    Are we trying to compare the performance of `sprintf("%s%s",foo,bar)` with `strcat(foo,bar)`? And logging has ***other*** issues at hand (`logging.info("foo: %s; bar: %s", foo, bar);` is less expensive than `logging.info("foo: " + foo + " bar: " + bar);` ***if*** the logging level is warn or higher and *more* costly if it is info or lower). That said, these are points that are unclear in the question and lead to problems in the answer trying to guess at the OP's intent.

    I've attempted to focus the question to just python in an attempt to make sure that it matches the question that you answered, though there may be some different points you would like to bring up.

    @LieRyan: I very much doubt #2. Do you have any actual measurements which prove that concatenating with % is faster than +?

    @JacquesB: when formatting an IPv6-like string, %s is about 25% faster than + concatenation. If you increase to n=15 (IPv15?), the gap widens to the point that + concatenation takes double the time of %s. Even faster than either of these would be a `str.join()` based method, but `str.join()` is inflexible when you have more complex formats. In practice though, whenever you're dealing with strings you probably have to deal with a file or socket, and that will almost always overwhelm any performance gain you get from changing how you concatenate strings.
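    The benchmark referred to above is not reproduced in this thread, so the following is only a rough sketch of how such a comparison could be set up with `timeit`. The eight sample groups, the function names, and the repetition count are assumptions chosen for illustration; timings depend heavily on the interpreter and version, so no results are claimed here.

```python
import timeit

# Eight hypothetical groups of an IPv6-like address.
a, b, c, d, e, f, g, h = (
    "2001", "0db8", "85a3", "0000", "0000", "8a2e", "0370", "7334"
)

def with_plus():
    return a + ":" + b + ":" + c + ":" + d + ":" + e + ":" + f + ":" + g + ":" + h

def with_percent():
    return "%s:%s:%s:%s:%s:%s:%s:%s" % (a, b, c, d, e, f, g, h)

def with_join():
    return ":".join([a, b, c, d, e, f, g, h])

# Time each variant; compare the numbers on your own interpreter.
for func in (with_plus, with_percent, with_join):
    seconds = timeit.timeit(func, number=200000)
    print("%-13s %.3f s" % (func.__name__, seconds))
```

    All three functions build the same string, so any difference in the printed times comes from the construction method alone.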

    @JacquesB: PyPy is much smarter than CPython when optimizing code, and is able to do massive optimizations here. In PyPy, with the same benchmark, + concatenation is more than 40 times faster than using %s, as you would have expected. I'd attribute this to PyPy being able to infer the types of the variables and produce JIT-optimized code for the snippet that recognizes that it can avoid quadratic copying.

    @LieRyan: Thanks for the timing code, very cool. But I tried with only three strings as in the original question, and here + is faster than %s. In the most common case, which is concatenating two strings, + is significantly faster. I don't really agree that performance is an important argument either way, since you are only dealing with a small and fixed number of strings.

    #2 is actually not true in CPython: It concatenates in place if possible, see https://stackoverflow.com/questions/4435169/good-way-to-append-to-a-string

    @LieRyan Since you acknowledge that the statement #1 is subjective and cannot be established as fact, perhaps it makes sense to make the other two items #1 and #2, and turn #1 into a statement that is qualified as being an opinion, subjective, or based on experience.

    @ThomasCarlisle: reading is a subjective experience, as is hearing, thinking, etc. But there are objective facts about these activities, despite the activities themselves being a subjective experience. There are colour combinations that are objectively bad for readability and there are character sequences that are objectively difficult to untangle. I wasn't saying that #1 is a subjective opinion.

    For Java devs: I find readability to be the main reason to go with templating instead of concatenation. It also makes it easier to search for the strings in the code (when you need to). However, I would not find format() to be a big deal for short strings, as is the case in the OP's example.

Licensed under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM