How Does A Compiler Work?
Note: I am surprised that this hasn't been asked before, and if it has I could not find it in a search.
I've been on tons of websites, I've read tons of articles, and I have heard tons of explanations. Most of them were good, but they were all either to broad or too complicated or just plain bad. So my question is , how does a compiler work?
If this is a difficult, broad question, please tell me. But if not, please answer the question.
Too broad, at least the "How does it work" part. There are whole books written on that topic.
http://en.wikipedia.org/wiki/Compiler would be the Wikipedia link that is trivial to find, what specifically are you wondering? The question is broad enough that I'd be tempted to give the smart alec response of, "Compilers translate code from one language to another," as that is the general idea that has a lot of nuances within that once one starts to look at what does that really involve.
@JB King: I guess my question is "How does the compiler convert code from high-level to low-level"
May I suggest looking into loaders, linkers, and interpreters? Interpreters also convert code from high level to low level in a different way while loaders and linkers are used with compilers to create executables and libraries to some extent.
Any explanation of how a compiler works will either be too broad or too complicated. It's a complicated subject, and the compilers classes were the hardest computer-related courses I ever took.
A compiler is a program that translates the source code for another program from a programing language into executable code.
The source code is typically in a high-level programming language (e. g. Pascal, C, C++, Java, Perl, C#, etc.). The executable code may be a sequence of machine instructions that can be executed by the CPU directly, or it may be an intermediate representation that is interpreted by a virtual machine (e. g. Java byte code).
In short, a compiler converts a program from a human-readable format into a machine-readable format.
As to how a compiler works, that is indeed complicated. There are books and university courses on the subject. I will attempt to briefly outline the main stages of the process, but this will be a very cursory overview.
- Lexing - break up the text of the program into "tokens". The tokens are the "words" of the programming language, such as identifiers (keywords, variable names, function names, etc.) or operators (=, *, &, etc.).
- Parsing - convert the sequence of tokens into a parse tree, which is a data structure representing various language constructs: type declarations, variable declarations, function definitions, loops, conditionals, expressions, etc.
- Optimization - evaluate constant expressions, optimize away unused variables or unreachable code, unroll loops if possible, etc.
- Translate the parse tree into machine instructions (or JVM byte code).
Again, I stress that this is a very brief description. Modern compilers are very smart, and, consequently, very complicated.
Actually, it transform a language into another. Early C++ compiler did compile to C. The same goes for Vala compiler. Java compiler compiles into bytecode that isn't executable without a JVM's JIT compiler.
@deadalnix IMHO, the point is that you go from non-executable code to executable code. I would argue that C-front was not a compiler but a front-end to the C compiler. Or a stage in the compilation process, if you will. Virtual machines blur the boundary between "executable" and "non-executable", of course. Here I would simply consider executable code to be whatever goes into the virtual machine, like the byte code, and abstract away whatever goes on inside the VM, like JIT.
@Dima, it doesn't have to be from non-executable code to executable code. For instance you cannot execute JVM byte code directly on Windows machines.
@Thorbjørn Ravn Andersen: but the byte code is executable by the JVM. Isn't the whole point of a "virtual machine" to look like a real machine to the programmer?
@Dima, modern JVM's actually compile byte codes into machine code - yes, another compiler. Hence the distinction of "source language" and “executable code" is at best blurry.
@Thorbjørn Ravn Andersen: Yes, I know. But the question was posed at a high level of detail, at which we can just treat a JVM as a black box that executes byte code, without going into the details of JIT, etc. Just as we can simply say that a CPU executes machine code produced by a C compiler, without going into the details of microcode.
I would argue that traditionally a compiler converted a program from a human-readable format into a machine-readable format, just as Dima said. Variations such as Cfront converting C++ to C or javac converting Java to bytecode are more advanced topics which should probably be left until after explaining the basic, traditional concept to someone unfamiliar with it.