How do you dive into large code bases?

  • What tools and techniques do you use for exploring and learning an unknown code base?

    I am thinking of tools like grep, ctags, unit tests, functional tests, class-diagram generators, call graphs, code metrics like sloccount, and so on. I'd be interested in your experiences, the helpers you used or wrote yourself, and the size of the code base you worked with.

    I realize that becoming acquainted with a code base is a process that happens over time, and familiarity can mean anything from "I'm able to summarize the code" to "I can refactor and shrink it to 30% of the size". But how do you even begin?

    I would like to see how this gets answered as well; usually I end up just rewriting everything if the code is too complex (or poorly written), and that's probably unacceptable or unwise for large projects.

  • ist_lion

    ist_lion Correct answer

    10 years ago

    What I've always done is the following:

    Open multiple copies of my editor (Visual Studio/Eclipse/whatever), then run the code under the debugger, set breakpoints, and step through it. Find out the flow of the code, walk the stack trace to see where the key points are, and go from there.

    I can look at method after method, but it's nice if I can click on something, see where in the code it's executed, and follow along. That lets me get a feel for how the developer wanted things to work.

    Yes, set a breakpoint on a button that launched an important piece of logic, and step through. That's what I always do.

    +1 Yeah, that's what I do too, but I don't know of any way to make the job easy. In my experience, it can take weeks before I feel safe making any changes, and months before I'm "at home" in the code. It certainly helps if you can ask questions of the developers.

    In addition: I usually start from a feature. Say I want to know how the application sends emails. I search for "sendEmail", set a breakpoint there, and then proceed as described. Then you find some magical component that does something, so you go into that and see how THAT works.
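The "look for sendEmail" step can be done with grep or an IDE search; as a minimal sketch, the same idea in Python might look like the following. The function name `find_identifier` and the default list of file extensions are assumptions made for illustration, not part of any tool mentioned in this thread.

```python
import os
import re


def find_identifier(root, name, extensions=(".py", ".java", ".cs")):
    """Yield (path, line_number, line) for each whole-word match of name under root.

    Hypothetical helper: the extension list is only an example and should be
    adjusted to the languages used in the code base being explored.
    """
    # \b keeps "sendEmail" from also matching "sendEmailLater".
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not filename.endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            # errors="replace" avoids crashing on files with an unexpected encoding.
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line_number, line in enumerate(handle, start=1):
                    if pattern.search(line):
                        yield path, line_number, line.strip()
```

Iterating over `find_identifier(".", "sendEmail")` and printing each tuple gives a list of candidate places to set the first breakpoint.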

    +1, but sometimes, before setting any breakpoints, I add a print statement as the first line of almost every function to see the call hierarchy.

    @mrz Adding a print statement to every function is an interesting idea. I think a tool could be made to automate this, and it would not have to print: it could call a custom logging function instead. Then, whenever we try out a feature in unfamiliar code, we could easily find the chain of method calls behind that feature in the log the tool generates.
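For Python code, one possible way to automate the "print at the top of every function" idea without editing any function is the standard `sys.setprofile` hook. The sketch below is only an illustration under that assumption: the helper names `make_call_logger` and `trace_calls`, and the toy `send_email` functions, are hypothetical and not taken from any existing tool.

```python
import sys


def make_call_logger(log):
    """Return a profile function that appends one indented line per Python call."""
    depth = 0

    def profiler(frame, event, arg):
        nonlocal depth
        if event == "call":
            # Indent by call depth so the log reads like a call tree.
            log.append("  " * depth + frame.f_code.co_name)
            depth += 1
        elif event == "return":
            depth = max(depth - 1, 0)
        # c_call / c_return events (built-in functions) are ignored on purpose.

    return profiler


def trace_calls(func, *args, **kwargs):
    """Run func and return (result, log), where log lists the calls it made."""
    log = []
    sys.setprofile(make_call_logger(log))
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always remove the hook, even if func raises
    return result, log


# Toy stand-ins for an unfamiliar feature (hypothetical names).
def render_body(to):
    return "Hello " + to


def deliver(to, body):
    return len(body) > 0


def send_email(to):
    body = render_body(to)
    return deliver(to, body)


result, log = trace_calls(send_email, "a@example.com")
print("\n".join(log))
```

The log shows `send_email` at the top level with `render_body` and `deliver` indented beneath it. Swapping `log.append` for a call into a logging framework would give the custom logging variant described above; note that this hook only sees calls made on the current thread.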

Licensed under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM