How to find out what programming language a website is built in?
I think that it's fundamental for security testers to gather information about how a web application works and eventually what language it's written in.
I know that URL extensions, HTTP headers, session cookies, HTML comments and style sheets may reveal some information, but it's still hard and not guaranteed.
So I was wondering: is there a way to determine what technology and framework are behind a website?
@HagenvonEitzen If HTML had been a programming language it would have been named HTPL rather than HTML.
`I think that it's fundamental for security testers to gather information about how a web application works and what language it's written in.` I think that, if even a security tester can't figure out what language the site is built in, that makes it more secure because then no one will know which exploits to try. (Yes, there are occasionally valid use cases for security through obscurity.)
@MasonWheeler: figuring out what language the site is built in will only determine which exploits *not* to try. That won't make the site more secure.
@BenoitEsnard well, if an attacker uses it to determine which exploits *not* to try, then it would be a security improvement if a site successfully misleads the attacker into thinking it's something different and thus the attacker skips trying the "proper" exploits.
There's no way to be 100% sure if you don't have access to the server, so it's about guessing. Here are some clues:
- File extensions: `login.php` is most likely a PHP script.
- HTTP headers: they may leak some information about the language which is running on the server, and some additional details like the version: `X-Powered-By: PHP/7.0.0` means that the page was rendered by PHP.
- HTTP Parameter Pollution: platforms resolve a duplicated parameter differently (first value, last value, or all values joined), so if you managed to guess which server is running, you can refine the guess.
- Language limits: maximum POST data size, maximum number of variables in GET and POST data, etc. It may be useful if the webmaster kept the default values.
- Specific input: for example, PHP had some easter eggs.
- Errors: triggering errors may also leak the language. `Warning: Division by zero in /var/www/html/index.php on line 3` is PHP, for example.
- File uploads: libraries may add metadata if the file is being modified server-side. For example, most sites resize users' avatars, and checking for EXIF data will leak `CREATOR: gd-jpeg v1.0 (using IJG JPEG v90), default quality`, which may help to guess which language is used.
- Default filenames: check if `/` and `/index.php` are the same page.
- Exploits: reading a backup file, or executing arbitrary code on the server.
- Open source: the website may have been open-sourced and be available somewhere on the Internet.
- About page: the webmaster may have thanked the language community in a "FAQ" or "About" page.
- Jobs page: the development team may be recruiting, and they may have detailed the technologies they're using.
- Social Engineering: ask the webmaster!
- Public profiles: if you know who is working on the website (check LinkedIn and `/humans.txt`), you can check their public repos or their skills on online profiles (GitHub, LinkedIn, Twitter, ...).
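As a rough illustration of the first few clues (file extensions, HTTP headers and error messages), here is a minimal Python sketch. The function names and the extension table are made up for this example and are far from complete; the inputs are plain strings and dicts, so nothing here talks to a real server.

```python
import re
from posixpath import splitext
from urllib.parse import urlparse

# Illustrative and deliberately incomplete mapping (an assumption, not a standard list).
EXTENSION_HINTS = {".php": "PHP", ".aspx": "ASP.NET", ".jsp": "Java"}

# Matches the general shape of the PHP warning quoted in the list above.
PHP_ERROR = re.compile(r"(Warning|Notice|Fatal error): .+ in \S+\.php on line \d+")


def clues_from_url(url):
    """Guess from the file extension in the URL path, ignoring the query string."""
    extension = splitext(urlparse(url).path)[1].lower()
    return [EXTENSION_HINTS[extension]] if extension in EXTENSION_HINTS else []


def clues_from_headers(headers):
    """Guess from X-Powered-By; header names are compared case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("x-powered-by", "")
    name = value.split("/")[0].strip()  # "PHP/7.0.0" -> "PHP"
    return [name] if name else []


def clues_from_body(body):
    """Guess from a leaked error message in the response body."""
    return ["PHP"] if PHP_ERROR.search(body) else []
```

Each function returns a list of guesses rather than a verdict, since any of these clues can be faked or simply absent.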
You may also want to know if the website is built with a framework or a CMS, since this will give information about the language used:
- URLs: directories and pages are specific to certain CMSs. For example, if some resources are located in the `/wp-content/` directory, it means that WordPress has been used.
- Session cookies: name and format.
- CSRF tokens: name and format.
- Rendered HTML: for example: meta tags order, comments.
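The CMS and framework clues can be checked in a similar way. The sketch below is only an illustration: the signature tables are hypothetical and tiny, `PHPSESSID` and `JSESSIONID` are included as well-known default session cookie names, and the `generator` meta tag is an extra hint that many CMSs emit but that can be removed or forged.

```python
import re

# Hypothetical, minimal signature tables for illustration only.
PATH_SIGNATURES = {"/wp-content/": "WordPress"}
COOKIE_SIGNATURES = {"PHPSESSID": "PHP", "JSESSIONID": "Java"}

# Looks for <meta name="generator" content="..."> in the rendered HTML.
GENERATOR_META = re.compile(
    r'<meta\s+name=["\']generator["\']\s+content=["\']([^"\']+)', re.IGNORECASE
)


def fingerprint(html, cookie_names):
    """Return (source, technology) pairs found in the HTML and cookie names."""
    clues = []
    for fragment, technology in PATH_SIGNATURES.items():
        if fragment in html:
            clues.append(("url", technology))
    for name in cookie_names:
        technology = COOKIE_SIGNATURES.get(name)
        if technology:
            clues.append(("cookie", technology))
    match = GENERATOR_META.search(html)
    if match:
        clues.append(("meta", match.group(1)))
    return clues
```

Keeping the source next to each guess makes it easier to judge later how much independent evidence there really is.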
Note that all information coming from the server may be altered to trick you. You should always try to use multiple sources to validate your guess.
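One way to put that advice into practice is to accept a guess only when several independent sources agree. The following Python sketch assumes clues are collected as `(source, technology)` pairs; the function name and the threshold of two sources are arbitrary choices for this example.

```python
def best_guess(clues, min_sources=2):
    """Return the technology backed by the most distinct sources, or None.

    clues is an iterable of (source, technology) pairs. A technology only
    counts if at least min_sources different sources point to it.
    """
    sources_by_technology = {}
    for source, technology in clues:
        sources_by_technology.setdefault(technology, set()).add(source)
    if not sources_by_technology:
        return None
    technology, sources = max(
        sources_by_technology.items(), key=lambda item: len(item[1])
    )
    return technology if len(sources) >= min_sources else None
```

For example, a `Server` header claiming IIS on its own would not be enough here, while a PHP error message plus a `.php` extension would be. Agreement between sources still proves nothing if the server operator controls all of them.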
You forgot to mention some examples from Java, which generally uses a JSESSIONID cookie for session management. The login URL can betray the underlying technology too, the Spring default URL for instance. Those examples are for Java, but the same is surely true of some other languages.
Just a note: just because the HTTP headers *say* they're powered by PHP, doesn't mean the site actually is. Although this example is more about the server platform, I know of a guy who would make his nginx server return `Server: Microsoft-IIS/5.0` with every request so he could trick attackers into using the wrong attacks against the server. "It's too easy!" ~ *the attacker*. You're right about that! (This just goes to show that you can't trust headers.)
I liked the Parameter Pollution technique. I'm sure that there are many more ways, though.
Another good one is checking the source to see if there are tell-tale signs of the use of some templating engine specific to a language.
Nitpick: the first 9 will really only tell you what language was used to *deploy* the site, not to *build* it. E.g., if you determine that the site was deployed on a JVM, that doesn't tell you much, there are over 400 languages with implementations for the JVM, the site may have been built in Scala, Groovy, Clojure (which also has implementations for the CLI and ECMAScript), Fantom (ditto), Ruby (JRuby), Python (Jython), PHP (IBM P8, Quercus), ECMAScript (Mozilla Rhino, Oracle Nashorn, dyn.js). The same applies to the CLI (IronPython, IronRuby, IronJS, …). There are also many compilers that …
@mowwwalker: I've added that sign under the "rendered HTML" part. I'm not sure if you were thinking about another sign though, so let me know if I missed something!