Why doesn't Python need a compiler?

  • Just wondering (now that I've started with C++ which needs a compiler) why Python doesn't need a compiler?

    I just enter the code, save it as an exec, and run it. In C++ I have to make builds and all of that other fun stuff.

    Python is just a language with many implementations. IronPython is compiled in the same way C# and C++ are compiled, and there may be other implementations like it.

    C# and C++ are not compiled the same way - though you could argue that they both end up as machine instructions eventually; but if you go that far, you can say BASIC is compiled the same way too.

    @gbjbaanb but then again, English is not compiled, and the semantic analysis of one sentence might yield two equally valid results; the above could be read as "IronPython is compiled, just as C# and C++ are compiled".

    What platform / software are you using to write your Python code? If you write a .py file, it is not an executable; it is still a source code file. From the command line you use the `python` command to interpret the .py file, or if you use IDLE or Eclipse, the IDE does it for you.

  • Python has a compiler! You just don't notice it because it runs automatically. You can tell it's there, though: look at the .pyc (or .pyo if you have the optimizer turned on) files that are generated for modules that you import.

    Also, it does not compile to the native machine's code. Instead, it compiles to a byte code that is used by a virtual machine. The virtual machine is itself a compiled program. This is very similar to how Java works; so similar, in fact, that there is a Python variant (Jython) that compiles to the Java Virtual Machine's byte code instead! There's also IronPython, which compiles to Microsoft's CLR (used by .NET). (The standard Python implementation, with its byte code compiler and virtual machine, is called CPython to distinguish it from these alternatives.)
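
    For example, here is a minimal sketch (assuming CPython 3; the file and function names are made up):

    ```python
    import dis
    import pathlib
    import py_compile

    # Write a tiny module, then compile it explicitly. Normally this happens
    # automatically the first time the module is imported; the result is a
    # .pyc file under __pycache__/.
    pathlib.Path("example_mod.py").write_text("def add(a, b):\n    return a + b\n")
    py_compile.compile("example_mod.py")

    # Peek at the byte code the compiler produces for a small function.
    def add(a, b):
        return a + b

    dis.dis(add)  # prints instructions such as LOAD_FAST and a binary-add
                  # opcode (the exact names vary between CPython versions)
    ```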

    C++ needs to expose its compilation process because the language itself is incomplete: it does not specify everything the linker needs to know to build your program, nor can it specify compile options portably (some compilers let you use #pragma, but that's not standard). So you have to do the rest of the work with makefiles and possibly auto hell (autoconf/automake/libtool). This is really just a holdover from how C did it. And C did it that way because it made the compiler simple, which is one of the main reasons it became so popular (anyone could crank out a simple C compiler in the '80s).


    Some things that can affect the compiler's or linker's operation but are not specified within C or C++'s syntax:

    • dependency resolution
    • external library requirements (including dependency order)
    • optimizer level
    • warning settings
    • language specification version
    • linker mappings (which section goes where in the final program)
    • target architecture

    Some of these can be detected, but they can't be specified; e.g. I can detect which C++ version is in use with __cplusplus, but I can't specify within the code itself that C++98 is the one to use for my code; I have to pass it as a flag to the compiler in the Makefile, or make a setting in a dialog.

    While you might think that a "dependency resolution" system exists in the compiler, automatically generating dependency records, these records only say which header files a given source file uses. They cannot indicate what additional source code modules are required to link into an executable program, because there is no standard way in C or C++ to indicate that a given header file is the interface definition for another source code module, as opposed to just a bunch of lines you want to show up in multiple places so you don't repeat yourself. There are traditions in file naming conventions, but these are not known or enforced by the compiler and linker.

    Several of these can be set using #pragma, but this is non-standard, and I was speaking of the standard. All of these things could be specified by a standard, but have not been, in the interest of backward compatibility. The prevailing wisdom is that makefiles and IDEs aren't broken, so don't fix them.

    Python handles all this in the language. For example, import specifies an explicit module dependency, implies the dependency tree, and modules are not split into header and source files (i.e. interface and implementation).
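
    A minimal sketch of what that looks like (the module and function names here are hypothetical):

    ```python
    # greetings.py -- a hypothetical module; the single file is both the
    # interface and the implementation, there is no separate header
    def hello(name):
        return f"Hello, {name}!"
    ```

    ```python
    # main.py
    import greetings            # declares the dependency explicitly; Python
                                # locates the module, compiles it to byte code,
                                # and caches the result as a .pyc file

    print(greetings.hello("world"))
    ```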

    The C implementation of Python is **CPython**, Cython is something different.

    Guess my memory got scrambled back in a "Why can't we get rid of the GIL" thread. Fixed.

    Other reasons why C was compiled to machine code were that it was intended to be little more than a glorified assembler, that byte code interpreters were infeasible on the hardware of the day, and that one of its most important tasks was writing an OS kernel.

    I don't see how C/C++ doesn't specify what the linker needs; after all, there are a lot of C and C++ programs out there.

    @Billy: I expanded on that in my answer. Short version is that makefiles aren't actually part of C or C++.

    @Mike: Erm, Python has the same problem, because it doesn't do dependency resolution at all (it always reads/parses all executed code on every run, even with JIT optimized implementations). You could make C++ compilation behave like Python compilation if you'd put everything in a single translation unit.

    @BillyONeal with the one big exception that in C/C++ you as a programmer have to do stuff in a certain way (either makefiles or dumping everything into the same blob); in Python you just do your work and the compiler, together with the VM, takes care of the rest

    @RuneFS: Python has no object files to induce dependencies. There's nothing like make involved because everything is parsed on every run. I would call that a limitation of Python (that the entire source must be parsed every time) more than I'd call it a limitation of other languages.

    Unless you delete the `.pyc`/`.pyo` files, everything is *not* parsed on every run. This is actually a known source of unexpected behavior. For example, if you delete a `.py` file but not its corresponding `.pyc` file, Python will act as if the module still exists; it won't look for the `.py` file once it finds the `.pyc` file.
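
    (If you want to see where that cached byte code lives, CPython can tell you; in CPython 3 it sits under `__pycache__`. A quick sketch, with a made-up file name:)

    ```python
    import importlib.util

    # Compute the cache path CPython would use for a given source file;
    # this is a pure path calculation, the file does not need to exist.
    print(importlib.util.cache_from_source("example_mod.py"))
    # e.g. __pycache__/example_mod.cpython-312.pyc (the tag depends on the version)
    ```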

    @Billy I didn't say anything about object files. I said there were things you need to do in C/C++ that you don't need to do when developing in Python, which your statement supports, so I guess we agree

    _"C++ needs to expose its compilation process because the language itself is incomplete"_ Er, what??

    You read the part *right after that*, right? "it does not specify everything the linker needs to know to build your program, nor can it specify compile options portably." You can't just build *any* C++ file by feeding it to a compiler; frequently you have to provide metadata like compile flags, include paths, etc. This metadata isn't specified by the standard and is not portable, which is why we have to drag in other things like make, cmake, Visual Studio, or whatever to finish the job. So the standard has to call out some things as in the compilation unit and others as program-wide.

  • Python is an interpreted language. This means that there is software on your computer that reads the Python code, and sends the "instructions" to the machine. The Wikipedia article on interpreted languages might be of interest.

    When a language like C++ (a compiled language) is compiled, it means that it is converted into machine code to be read directly by the hardware when executed. The Wikipedia article on compiled languages might provide an interesting contrast.

    There are a lot more steps than "reads Python and sends the instructions to the machine". One of those steps is indeed a *compiler*. Now, you might say that the bytecode used by CPython is interpreted, but that's just a CPython thing and isn't the same across all Python implementations. For example, Jython compiles to the Java VM which gets JIT compiled to machine code. Or, PyPy compiles directly from Python to machine code.

    There is no such thing as an interpreted or a compiled language. A language is an abstract set of mathematical rules. A language isn't compiled or interpreted. A language just *is*. Compilation and interpretation are traits of the compiler or interpreter (duh!), not the language. Every language can be implemented with a compiler and every language can be implemented with an interpreter. Most languages have both compiled and interpreted implementations. There are interpreters for C++ and there are compilers for Python. (In fact, *all* currently existing Python implementations have compilers.)

    The majority of modern high-performance language implementations combine both an interpreter and a compiler (or even several compilers) for maximum performance. Actually, it is *impossible* to run *any* program *without* an interpreter. After all, a compiler is just a program which translates a program from one language to another language. But at *some* point you have to actually *run* the program, which is done by an interpreter (which may or may not be implemented in silicon).

    Coming back to my original point: the term "compiled language" doesn't make sense. It's not even *wrong*, it's simply nonsensical. If English were a typed language, "compiled language" would be a compile error.

    @JörgWMittag: You are technically right. However, most languages were designed for an interpreted context or for full compilation. Writing an interpreter for GW BASIC or Common Lisp is much easier than writing one for, say, C++ or C#; Python loses many of its selling points without the interactive environment; writing a compiler for PHP is pretty damn hard, and probably horribly inefficient, as the compiled executable would have to contain the entire PHP interpreter, due to eval() and similar constructs - one could argue that such a compiler would be cheating.

    @tdammers, yes. We can reasonably use "compiled language" to mean "language usually compiled." But that misses the point that PHP, Java, Python, Lua, and C# are all implemented as compilers to bytecode. All of these languages have also had JITs implemented for them. So you can't really call some of these languages compiled and some interpreted, because they've got the same implementation strategy.

    It's also worth pointing out that there is a PHP compiler called HipHop, developed by Facebook. However, it works by not supporting features like `eval`.

    My point is that I think it is reasonable to talk about "compiled" vs. "interpreted" languages; even though the terms are technically wrong, it should be somewhat obvious what is meant.

    @WinstonEwert: Strictly speaking, C# and Java are compiled, and their language semantics work that way. The bytecode is JIT'd of course.

    @BillyONeal, but PHP/Python/Lua are also all compiled to a bytecode, just like C# and Java. I do not think there is a useful distinction between compiled and interpreted languages as they stand today.

    @Winston: The difference is that Python/PHP/Lua require the source to be available during execution; the bytecode bit for these languages is an implementation detail rather than being part of the language itself. These languages take advantage of the fact that you have to deploy the entire interpreter to a machine in order to run code written in such languages. There are good things and bad things about either approach, of course. At the end though, Python/PHP/Lua are designed like scripting languages. C/C++/C#/Java/Pascal/etc. are not. A lot of that is the result of their compilation models.

    @BillyONeal, not true, at least for Python. You can distribute Python bytecode and run that without the source. But it is true that you can't distribute Python without a compiler.

    @WinstonEwert Strictly speaking, HipHop is not a compiler but a translator (the difference being that the output is a high-level language).

    @YannisRizos, I don't see that I've mentioned HipHop anywhere in this discussion much less called it a compiler.

    @YannisRizos, I swear I did a page search to see if I mentioned that...

    @Jörg W Mittag: When people say that *Python is an interpreted language* they refer to the reference implementation of Python, which uses an interpreter; they don't say that it is impossible to write a Python compiler. It is just a matter of agreed-upon terms. Right or wrong, like it or not, that's what most people use to classify programming languages in this regard.

    @golem: However, CPython also uses a compiler. In fact, CPython *never* interprets Python, it *always* compiles it to a completely different language and then interprets *that* language. So, CPython compiles Python and interprets a completely different language; how does that make it a Python interpreter? If the completely different language it compiles to happened to be x86 machine code which then gets interpreted by the CPU, would that still make it a Python interpreter? What is the difference between Python getting compiled to a different language and that language then being interpreted …

    … by a "virtual CPU" and C being compiled to a different language and that language then being interpreted by a virtual CPU (e.g. QEmu)?

    @Jörg W Mittag: The Python interpreter is called that because people agreed to call the facility that translates Python code to bytecode and then interprets it a Python interpreter. For example, check the first two paragraphs of the latest official Python tutorial. They call the Python language *interpreted* and use the term *Python interpreter*. When a processor interprets machine code, they say that the processor executes it, though you can always say the processor just interprets the opcodes. It's just that you will be understood by fewer people. Terms may be ...

    ... technically wrong, but if they let people understand what is going on more easily and clearly they will be preferred to more complex abstractions.

  • Not all compiled languages have an in-your-face edit-compile-link-run cycle.

    What you're running into is a feature/limitation of C++ (or at least C++ implementations).

    To do anything, you must store your code into files, and build a monolithic image by a process called linking.

    In particular, it's this monolithic linking process which is mistaken for the distinction between compiling and interpreting.

    Some languages do all this stuff much more dynamically, by eliminating the clumsy monolithic linking step, not by eliminating compiling to machine code. Source is still compiled to object files, but these are loaded into a run-time image, rather than linked into a monolithic executable.

    You say "reload this module", and it loads the source and interprets it, or compiles it, depending on some mode switch.

    Linux kernel programming has some of this flavor even though you're working in C. You can recompile a module and load and unload it. Of course, you're still aware that you're producing some executable thing, and it's managed by a complex build system, with still some manual steps. But the fact is that in the end you can unload and re-load just that small module and not have to restart the whole kernel.

    Some languages have an even more fine grained modularization than this, and the building and loading is done from within their run-time, so it is more seamless.

  • What a diversion from the initial question... A point not mentioned is that the source of a Python program is what you use and distribute; from a user perspective, it IS the program. We tend to simplify things into categories that are not well defined.

    Compiled programs are usually considered to be stand-alone files of machine code (admittedly, often containing links to dynamic link libraries associated with specific operating systems). That said, there are variations of most programming languages that could be described as compiled or interpreted.

    Python does not need a separate compile step because it relies on an application (called an interpreter) that compiles and runs the code without storing the resulting machine code in a form that you can easily access or distribute.

  • All programming languages require translation from human concepts into a target machine's code. Even assembly language must be translated into machine code. That translation usually takes place in the following phases:

    Phase 1: Analysis and translation (parsing) into an intermediate code.

    Phase 2: Translation of the intermediate code into target machine code, with placeholders for external references.

    Phase 3: Resolution of the external references and packaging into a machine-executable program.

    Depending on when it happens, this translation is referred to as pre-compiling or as "just in time" (JIT) / run-time compiling.

    Languages such as C, C++, COBOL, Fortran, Pascal (not all variants) and assembly are precompiled languages whose programs can be executed directly by the operating system without the need for an interpreter.

    Languages like Java, BASIC, C# and Python are interpreted. They all use the intermediate code created in Phase 1, but sometimes differ in how they translate it into machine code. The simplest forms use the intermediate code to drive machine code routines that do the expected work. Others compile the intermediate code down to machine code and resolve the external dependencies at runtime; once compiled, it can be executed immediately. The machine code is also stored in a cache of previously compiled, reusable machine code that can be reused if the function is needed again later. If a function has already been cached, the interpreter does not need to compile it again.

    Most modern high-level languages fall into the interpreted (with JIT) category. It is mostly the older languages like C & C++ that are precompiled.

Licensed under CC BY-SA with attribution

