How to find out what programming language a website is built in?

  • I think that it's fundamental for security testers to gather information about how a web application works and, if possible, what language it's written in.

    I know that URL extensions, HTTP headers, session cookies, HTML comments and style sheets may reveal some information, but it's still hard and not reliable.

    So I was wondering: is there a way to determine what technology and framework are behind a website?


    My tomcat server returns "CERN httpd" just to mess with people

    My first guess would be HTML

    @HagenvonEitzen If HTML had been a programming language it would have been named HTPL rather than HTML.

    `I think that it's fundamental for security testers to gather information about how a web application works and what language it's written in.` I think that, if even a security tester can't figure out what language the site is built in, that makes it more secure because then no one will know which exploits to try. (Yes, there are occasionally valid use cases for security through obscurity.)

    @MasonWheeler: figuring out what language the site is built in will only determine which exploits *not* to try. That won't make the site more secure.

    @BenoitEsnard well, if an attacker uses it to determine which exploits *not* to try, then it would be a security improvement if a site successfully misleads the attacker into thinking it's something different and thus the attacker skips trying the "proper" exploits.

    I used to be satisfied with just checking for .php or .aspx to tell whether a website runs on PHP or ASP.NET Web Forms. Nowadays, with URL routing and MVC frameworks, it is quite hard for me to tell them apart. :p Thanks for the question.

  • There's no way to be 100% sure if you don't have access to the server, so it's about guessing. Here are some clues:

    • File extensions: login.php is most likely a PHP script.
    • HTTP headers: they may leak the language running on the server, and sometimes details such as the version. For example, X-Powered-By: PHP/7.0.0 indicates that the page was rendered by PHP 7.0.0.
    • HTTP Parameter Pollution: server stacks handle duplicated parameters (for example ?a=1&a=2) differently, so once you have guessed which server is running, the observed behavior can refine the guess.
    • Language limits: maximum size of POST data, maximum number of variables in GET and POST data, etc. This is useful if the webmaster kept the default values.
    • Specific input: for example, PHP had some easter eggs.
    • Errors: triggering errors may also leak the language. Warning: Division by zero in /var/www/html/index.php on line 3 is PHP, for example.
    • File uploads: libraries may add metadata if the file is being modified server-side. For example, most sites resize users' avatars, and checking for EXIF data will leak CREATOR: gd-jpeg v1.0 (using IJG JPEG v90), default quality, which may help to guess which language is used.
    • Default filenames: Check if / and /index.php are the same page.
    • Exploits: reading a backup file, or executing arbitrary code on the server.
    • Open source: the website may have been open-sourced, with its code available somewhere on the Internet.
    • About page: the webmaster may have thanked the language community in a "FAQ" or "About" page.
    • Jobs page: the development team may be recruiting, and they may have detailed the technologies they're using.
    • Social Engineering: ask the webmaster!
    • Public profiles: if you know who is working on the website (check LinkedIn and /humans.txt), you can check their public repos or their skills on online profiles (GitHub, LinkedIn, Twitter, ...).
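
    A few of the clues above (file extensions, the X-Powered-By header, leaked error messages) can be checked mechanically. The following is a minimal sketch, not a definitive fingerprinting tool: the function name guess_language and the two lookup tables are hypothetical, the tables are deliberately tiny, and the function works on a response you have already fetched rather than making any request itself.

```python
import re
from urllib.parse import urlparse

# Illustrative clue tables (assumptions, not an exhaustive or authoritative list).
EXTENSION_HINTS = {".php": "PHP", ".aspx": "ASP.NET", ".jsp": "Java"}
ERROR_PATTERNS = [
    (re.compile(r"Warning: .* in .*\.php on line \d+"), "PHP"),
    (re.compile(r"Traceback \(most recent call last\)"), "Python"),
]


def guess_language(url, headers, body=""):
    """Collect (clue, language) pairs from one already-fetched response."""
    clues = []

    # File extension in the URL path, e.g. /login.php
    path = urlparse(url).path.lower()
    for ext, lang in EXTENSION_HINTS.items():
        if path.endswith(ext):
            clues.append((f"extension {ext}", lang))

    # X-Powered-By header; header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    powered_by = lowered.get("x-powered-by", "")
    if powered_by.upper().startswith("PHP"):
        clues.append((f"X-Powered-By: {powered_by}", "PHP"))
    elif "ASP.NET" in powered_by.upper():
        clues.append((f"X-Powered-By: {powered_by}", "ASP.NET"))

    # Error messages leaked into the response body.
    for pattern, lang in ERROR_PATTERNS:
        if pattern.search(body):
            clues.append(("error message", lang))

    return clues
```

    Each returned pair names the clue it came from, so a misleading header can be spotted when it disagrees with the other clues.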

    You may also want to know if the website is built with a framework or a CMS, since this will give information about the language used:

    • URLs: some directories and pages are specific to a particular CMS. For example, if some resources are located in the /wp-content/ directory, it most likely means that WordPress is in use.
    • Session cookies: name and format.
    • CSRF tokens: name and format.
    • Rendered HTML: for example: meta tags order, comments.
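
    The framework clues in this list can be sketched the same way. This is only an illustration under stated assumptions: guess_framework and the three tables are hypothetical names, and apart from /wp-content/ the signatures shown (default session cookie names and form field names) are well-known defaults added here as examples. A site can rename or fake any of them.

```python
# Illustrative signatures only; every one of these can be renamed or faked.
COOKIE_HINTS = {
    "PHPSESSID": "PHP",
    "JSESSIONID": "Java",
    "ASP.NET_SessionId": "ASP.NET",
}
PATH_HINTS = {"/wp-content/": "WordPress (PHP)"}
MARKUP_HINTS = {
    "__VIEWSTATE": "ASP.NET Web Forms",
    "csrfmiddlewaretoken": "Django (Python)",
}


def guess_framework(cookie_names, html):
    """Collect (clue, technology) pairs from cookie names and rendered HTML."""
    clues = []

    # Session cookies: the default name often reveals the platform.
    for name in cookie_names:
        if name in COOKIE_HINTS:
            clues.append((f"cookie {name}", COOKIE_HINTS[name]))

    # URLs referenced by the page, e.g. stylesheets under /wp-content/.
    for path, tech in PATH_HINTS.items():
        if path in html:
            clues.append((f"path {path}", tech))

    # Hidden fields and CSRF token names in the rendered HTML.
    for marker, tech in MARKUP_HINTS.items():
        if marker in html:
            clues.append((f"markup {marker}", tech))

    return clues
```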

    Note that all information coming from the server may be altered to trick you. You should always try to use multiple sources to validate your guess.
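
    One simple way to act on that advice is to count how many independent clues agree before trusting a guess. The sketch below is only one possible approach: the function name tally, the (clue, language) pair format and the min_sources threshold are all assumptions made for illustration.

```python
from collections import Counter


def tally(clues, min_sources=2):
    """Return the language backed by at least min_sources clues, else None.

    clues is a list of (clue description, language) pairs gathered from
    different sources (headers, cookies, error messages, ...).
    """
    votes = Counter(language for _clue, language in clues)
    ranked = votes.most_common(2)
    if not ranked:
        return None
    # Refuse to pick a winner when the two best candidates are tied.
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    language, count = ranked[0]
    return language if count >= min_sources else None
```

    With this rule, a single spoofed Server or X-Powered-By header is not enough to produce a verdict on its own.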

    You forgot to mention some examples from Java, which generally uses a JSESSIONID cookie for session management. The login URL can betray the underlying technology too, the Spring default URL for instance. Those examples are for Java, but the same is surely true of some other platforms.

    Just a note: just because the http headers *say* they're powered by php, doesn't mean the site actually is. Although this example is more about the server platform, I know of a guy who would make his nginx server return Server: Microsoft-IIS/5.0 with every request so he could trick attackers into using the wrong attacks against the server. "It's too easy!" ~ *the attacker*. You're right about that! (This just goes to show that you can't trust headers)

    I liked the Parameter Pollution technique .. I'm sure that there are many more ways though

    @Walfrat: I've just detailed the CMS / framework part!

    @AhmedJerbi: I've added more techniques.

    @Benoit: thank you .. Many docs to read for the weekend :-)

    Another good one is checking the source to see if there are tell-tale signs of the use of some templating engine specific to a language.

    You forgot one of the simplest - looking at the jobs page. :)

    Nitpick: the first 9 will really only tell you what language was used to *deploy* the site, not to *build* it. E.g., if you determine that the site was deployed on a JVM, that doesn't tell you much, there are over 400 languages with implementations for the JVM, the site may have been built in Scala, Groovy, Clojure (which also has implementations for the CLI and ECMAScript), Fantom (ditto), Ruby (JRuby), Python (Jython), PHP (IBM P8, Quercus), ECMAScript (Mozilla Rhino, Oracle Nashorn, dyn.js). The same applies to the CLI (IronPython, IronRuby, IronJS, …). There are also many compilers that …

    … target PHP: haXe, Hack, Wasabi, …

    @mowwwalker: I've added that sign under the "rendered HTML" part. I'm not sure if you were thinking about another sign though, so let me know if I missed something!

    How about humans.txt?

    Or maybe I'm trolling you. /cgi-bin/postcomment.exe turns out to be a ksh script.

    If there's a hidden field named "__VIEWSTATE", and/or if the buttons say "href=javascript:__doPostBack", it's likely ASP.NET. Off the top of my head I can't think of comparable "signatures" in other platforms, but, etc.

Licensed under CC-BY-SA with attribution

Content dated before 6/26/2020 9:53 AM