for vs. foreach vs. LINQ

  • When I write code in Visual Studio, ReSharper (God bless it!) often suggests me to change my old-school for loop in the more compact foreach form.

    And often, when I accept this change, ReSharper goes a step forward, and suggests me to change it again, in a shiny LINQ form.

    So, I wonder: are there some real advantages, in these improvements? In pretty simple code execution, I cannot see any speed boost (obviously), but I can see the code becoming less and less readable... So I wonder: is it worth it?

    Just a note - LINQ syntax is actually pretty readable if you're familiar with SQL syntax. There are also two formats for LINQ (the SQL-like lambda expressions, and the chained methods), which might make it easier to learn. It could just be ReSharper's suggestions that make it seem unreadable.

    As a rule of thumb, I typically use foreach unless working with a known length array or similar cases where the number of iterations is relevant. As to LINQ-ifying it, I'll usually see what ReSharper makes of a foreach, and if the resulting LINQ statement is tidy / trivial / readable I use it, and otherwise I revert it back. If it would be a chore to re-write the original non-LINQ logic if requirements changed or if it might be necessary to granularly debug thru the logic the LINQ statement is abstracting away from, I don't LINQ it and leave it in long form.

    one common mistake with `foreach` is removing items from a collection while enumerating it, where usually a `for` loop is needed to start from the last element.

    You might take value from Øredev 2013 - Jessica Kerr - Functional Principles for Object-Oriented Developers. Linq comes into the presentation soon after the 33 minute mark, under the title "Declarative Style".

  • for vs. foreach

    There is a common confusion that those two constructs are very similar and that both are interchangeable like this:

    foreach (var c in collection)
    {
        DoSomething(c);
    }
    

    and:

    for (var i = 0; i < collection.Count; i++)
    {
        DoSomething(collection[i]);
    }
    

    The fact that both keywords start by the same three letters doesn't mean that semantically, they are similar. This confusion is extremely error-prone, especially for beginners. Iterating through a collection and doing something with the elements is done with foreach; for doesn't have to and shouldn't be used for this purpose, unless you really know what you're doing.

    Let's see what's wrong with it with an example. At the end, you'll find the full code of a demo application used to gather the results.

    In the example, we are loading some data from the database, more precisely the cities from Adventure Works, ordered by name, before encountering "Boston". The following SQL query is used:

    select distinct [City] from [Person].[Address] order by [City]
    

    The data is loaded by ListCities() method which returns an IEnumerable<string>. Here is what foreach looks like:

    foreach (var city in Program.ListCities())
    {
        Console.Write(city + " ");
    
        if (city == "Boston")
        {
            break;
        }
    }
    

    Let's rewrite it with a for, assuming that both are interchangeable:

    var cities = Program.ListCities();
    for (var i = 0; i < cities.Count(); i++)
    {
        var city = cities.ElementAt(i);
    
        Console.Write(city + " ");
    
        if (city == "Boston")
        {
            break;
        }
    }
    

    Both return the same cities, but there is a huge difference.

    • When using foreach, ListCities() is called one time and yields 47 items.
    • When using for, ListCities() is called 94 times and yields 28153 items overall.

    What happened?

    IEnumerable is lazy. It means that it will do the work only at the moment when the result is needed. Lazy evaluation is a very useful concept, but has some caveats, including the fact that it's easy to miss the moment(s) where the result will be needed, especially in the cases where the result is used multiple times.

    In a case of a foreach, the result is requested only once. In a case of a for as implemented in the incorrectly written code above, the result is requested 94 times, i.e. 47 × 2:

    • Every time cities.Count() is called (47 times),

    • Every time cities.ElementAt(i) is called (47 times).

    Querying a database 94 times instead of one is terrible, but not the worse thing which may happen. Imagine, for example, what would happen if the select query would be preceded by a query which also inserts a row in the table. Right, we would have for which will call the database 2,147,483,647 times, unless it hopefully crashes before.

    Of course, my code is biased. I deliberately used the laziness of IEnumerable and wrote it in a way to repeatedly call ListCities(). One can note that a beginner will never do that, because:

    • The IEnumerable<T> doesn't have the property Count, but only the method Count(). Calling a method is scary, and one can expect its result to not be cached, and not suitable in a for (; ...; ) block.

    • The indexing is unavailable for IEnumerable<T> and it's not obvious to find the ElementAt LINQ extension method.

    Probably most beginners would just convert the result of ListCities() to something they are familiar with, like a List<T>.

    var cities = Program.ListCities();
    var flushedCities = cities.ToList();
    for (var i = 0; i < flushedCities.Count; i++)
    {
        var city = flushedCities[i];
    
        Console.Write(city + " ");
    
        if (city == "Boston")
        {
            break;
        }
    }
    

    Still, this code is very different from the foreach alternative. Again, it gives the same results, and this time the ListCities() method is called only once, but yields 575 items, while with foreach, it yielded only 47 items.

    The difference comes from the fact that ToList() causes all data to be loaded from the database. While foreach requested only the cities before "Boston", the new for requires all cities to be retrieved and stored in memory. With 575 short strings, it probably doesn't make much difference, but what if we were retrieving only few rows from a table containing billions of records?

    So what is foreach, really?

    foreach is closer to a while loop. The code I previously used:

    foreach (var city in Program.ListCities())
    {
        Console.Write(city + " ");
    
        if (city == "Boston")
        {
            break;
        }
    }
    

    can be simply replaced by:

    using (var enumerator = Program.ListCities().GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            var city = enumerator.Current;
            Console.Write(city + " ");
    
            if (city == "Boston")
            {
                break;
            }
        }
    }
    

    Both produce the same IL. Both have the same result. Both have the same side effects. Of course, this while can be rewritten in a similar infinite for, but it would be even longer and error-prone. You're free to choose the one you find more readable.

    Want to test it yourself? Here's the full code:

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;
    using System.Diagnostics;
    using System.Linq;
    
    public class Program
    {
        private static int countCalls;
    
        private static int countYieldReturns;
    
        public static void Main()
        {
            Program.DisplayStatistics("for", Program.UseFor);
            Program.DisplayStatistics("for with list", Program.UseForWithList);
            Program.DisplayStatistics("while", Program.UseWhile);
            Program.DisplayStatistics("foreach", Program.UseForEach);
    
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey(true);
        }
    
        private static void DisplayStatistics(string name, Action action)
        {
            Console.WriteLine("--- " + name + " ---");
    
            Program.countCalls = 0;
            Program.countYieldReturns = 0;
    
            var measureTime = Stopwatch.StartNew();
            action();
            measureTime.Stop();
    
            Console.WriteLine();
            Console.WriteLine();
            Console.WriteLine("The data was called {0} time(s) and yielded {1} item(s) in {2} ms.", Program.countCalls, Program.countYieldReturns, measureTime.ElapsedMilliseconds);
            Console.WriteLine();
        }
    
        private static void UseFor()
        {
            var cities = Program.ListCities();
            for (var i = 0; i < cities.Count(); i++)
            {
                var city = cities.ElementAt(i);
    
                Console.Write(city + " ");
    
                if (city == "Boston")
                {
                    break;
                }
            }
        }
    
        private static void UseForWithList()
        {
            var cities = Program.ListCities();
            var flushedCities = cities.ToList();
            for (var i = 0; i < flushedCities.Count; i++)
            {
                var city = flushedCities[i];
    
                Console.Write(city + " ");
    
                if (city == "Boston")
                {
                    break;
                }
            }
        }
    
        private static void UseForEach()
        {
            foreach (var city in Program.ListCities())
            {
                Console.Write(city + " ");
    
                if (city == "Boston")
                {
                    break;
                }
            }
        }
    
        private static void UseWhile()
        {
            using (var enumerator = Program.ListCities().GetEnumerator())
            {
                while (enumerator.MoveNext())
                {
                    var city = enumerator.Current;
                    Console.Write(city + " ");
    
                    if (city == "Boston")
                    {
                        break;
                    }
                }
            }
        }
    
        private static IEnumerable<string> ListCities()
        {
            Program.countCalls++;
            using (var connection = new SqlConnection("Data Source=mframe;Initial Catalog=AdventureWorks;Integrated Security=True"))
            {
                connection.Open();
    
                using (var command = new SqlCommand("select distinct [City] from [Person].[Address] order by [City]", connection))
                {
                    using (var reader = command.ExecuteReader(CommandBehavior.SingleResult))
                    {
                        while (reader.Read())
                        {
                            Program.countYieldReturns++;
                            yield return reader["City"].ToString();
                        }
                    }
                }
            }
        }
    }
    

    And the results:

    --- for ---
    Abingdon Albany Alexandria Alhambra [...] Bonn Bordeaux Boston

    The data was called 94 time(s) and yielded 28153 item(s).

    --- for with list ---
    Abingdon Albany Alexandria Alhambra [...] Bonn Bordeaux Boston

    The data was called 1 time(s) and yielded 575 item(s).

    --- while ---
    Abingdon Albany Alexandria Alhambra [...] Bonn Bordeaux Boston

    The data was called 1 time(s) and yielded 47 item(s).

    --- foreach ---
    Abingdon Albany Alexandria Alhambra [...] Bonn Bordeaux Boston

    The data was called 1 time(s) and yielded 47 item(s).

    LINQ vs. traditional way

    As for LINQ, you may want to learn functional programming (FP) - not C# FP stuff, but real FP language like Haskell. Functional languages have a specific way to express and present the code. In some situations, it is superior to non-functional paradigms.

    FP is known being much superior when it comes to manipulating lists (list as a generic term, unrelated to List<T>). Given this fact, the ability to express C# code in a more functional way when it comes to lists is rather a good thing.

    If you're not convinced, compare the readability of code written in both functional and non-functional ways in my previous answer on the subject.

    Question about the ListCities() example. Why would it run only once ? I have had no problems foreaching over yield returns in the past.

    He's not saying you would only get one result from the IEnumerable - he's saying that the SQL query (which is the expensive part of the method) would only execute once - this is a good thing. It would then read and yield all of the results from the query.

    @HappyCat Oh, ok. I understood as if there is only 1 result.

    "Given this fact, the ability to express C# code in a more functional way when it comes to lists is rather a good thing.": In functional languages traditional imperative-style loops are discouraged (see e.g. Scala) or even completely absent (see e.g. Haskell). I think it is a good thing that a language chooses one approach or the other: offering both can be confusing and, in this sense, beccoblu's question it totally understandable.

    @Giorgio: While this question understandable, having a language's semantics pander to what a beginner might find confusing would not leave us with a very effective language.

    LINQ isn't just semantic sugar. It provides delayed execution. And in the case of IQueryables (e.g. Entity Framework) allows for the query to be passed and composed until it's iterated (meaning that adding a where clause to a returned IQueryable will result in the SQL passed to the server upon iteration to include that where clause offloading the filtering onto the server).

    @SnOrfus: I agree with you. On the other hand I know of experienced programmers that are confused by programming languages that are too complex (or that offer too many different but equivalent ways of solving the same problem). I do not know if `C#` is a good example of this because I have never used `C#`, but this question caught my attention because I have observed this in practice.

    Much as I like this answer, I think the examples are somewhat contrived. The summary at the end suggests that `foreach` is more efficient than `for`, when in actual fact the disparity is a result of deliberately broken code. The thoroughness of the answer redeems itself, but it's easy to see how a casual observer might come to the wrong conclusions.

    @RobertHarvey: you're right, giving performance metrics like this is terrible, since some readers might just look at them without reading the whole answer. I removed the metrics; feel free to modify the answer if it will help to avoid more the confusion.

    "Iterating through a collection and doing something with the elements is done with foreach;" is not true if you modify the collection (ex. deleting an item). Otherwise, it's an excellent answer!

    I wish I could give this answer many more up-votes, but alas I cannot. This was an awesome answer!

  • While there are some great expositions already on the differences between for and foreach. There are some gross misrepresentations of the role of LINQ.

    LINQ syntax is not just syntactic sugar giving a functional programming approximation to C#. LINQ provides Functional constructs including all the benefits thereof to C#. Combined with returning IEnumerable instead of IList, LINQ provides deferred execution of the iteration. What people typically do now is construct and return an IList from their functions like so

    public IList<Foo> GetListOfFoo()
    {
       var retVal=new List<Foo>();
       foreach(var foo in _myPrivateFooList)
       {
          if(foo.DistinguishingValue == check)
          {
             retVal.Add(foo);
          }
       }
       return retVal;
    }
    

    Instead use the yield return syntax to create a deferred enumeration.

    public IEnumerable<Foo> GetEnumerationOfFoo()
    {
       //no need to create an extra list
       //var retVal=new List<Foo>();
       foreach(var foo in _myPrivateFooList)
       {
          if(foo.DistinguishingValue == check)
          {
             //yield the match compiler handles the complexity
             yield return foo;
          }
       }
       //no need for returning a list
       //return retVal;
    }
    

    Now the enumeration won't occur until you ToList or iterate over it. And it only occurs as needed (here's an enumeration of Fibbonaci that doesn't have a stack overflow problem)

    /**
    Returns an IEnumerable of fibonacci sequence
    **/
    public IEnumerable<int> Fibonacci()
    {
      int first, second = 1;
      yield return first;
      yield return second;
      //the 46th fibonacci number is the largest that
      //can be represented in 32 bits. 
      for (int i = 3; i < 47; i++)
      {
        int retVal = first+second;
        first=second;
        second=retVal;
        yield return retVal;
      }
    }
    

    Performing a foreach over the Fibonacci function will return the sequence of 46. If you want the 30th that's all that will be calculated

    var thirtiethFib=Fibonacci().Skip(29).Take(1);
    

    Where we get to have a lot of fun is the support in the language for lambda expressions (combined with the IQueryable and IQueryProvider constructs, this allows for functional composition of queries against a variety of data sets, the IQueryProvider is responsible for interpreting the passed in expressions and creating and executing a query using the source's native constructs). I won't go into the nitty gritty details here, but there is a series of blog posts showing how to create a SQL Query Provider here

    In summary, you should prefer returning IEnumerable over IList when the consumers of your function will perform a simple iteration. And use the capabilities of LINQ to defer execution of complex queries until they are needed.

  • but I can see the code becoming less and less readable

    Readability is in the eye of the beholder. Some people might say

    var common = list1.Intersect(list2);
    

    is perfectly readable; others might say that this is opaque, and would prefer

    List<int> common = new List<int>();
    for(int i1 = 0; i1 < list1.Count; i1++)
    {
        for(int i2 = 0; i2 < list2.Count; i2++)
        {
            if (list1[i1] == list2[i2])
            {
                common.Add(i1);
                break;
            }
        }
    }
    

    as making it clearer what's being done. We can't tell you what you find more readable. But you might be able to detect some of my own bias in the example I have constructed here...

    Honestly I'd say that Linq makes the intention objectively more readable while for loops make the mechanism objectively more readable.

    I would run as fast as I can from someone that tells me that the for-for-if version is more readable than the intersect version.

    @Konamiman - That would depend on *what* a person looks for when they think of "readability." jk.'s comment illustrates this perfectly. The loop is more readable in the sense that you can easily see *how* it's getting its end result, while the LINQ is more readable in *what* the end result should be.

    That's why the loop goes into the implementation, and then you use Intersect everywhere.

    @Shauna: Imagine the for-loop version inside a method doing several other things; it's a mess. So, naturally, you split it into it's own method. Readability-wise, this is the same as IEnumerable.Intersect, but now you've duplicated framework functionality and introduced more code to maintain. The only excuse is if you need a custom implementation for behavioral reasons, but we're only talking about readability here.

    @Chris - I completely agree that it's not ideal (I'll take LINQ over the above any day, and I don't even really like .Net), but like you said, we're talking about readability, here. In that context, the for-loop, in and of itself, is readable in the mechanism sense. Anything can be rendered unreadable when mixed with a number of questionably related things, but that's getting into larger architectural/coding issues.

    I'd be suprised if anyone found the latter more readable; I would suspect something like that would be quickly turned into a re-usable method / function named Intersect... which is now part of the framework.

    When talking about readability I'd go for the linq expression most of the time. There are however complex expressions that I find completely unreadable

  • The difference between LINQ and foreach really boils down to two different programming styles: imperative and declarative.

    • Imperative: in this style you tell the computer "do this... now do this... now do this now do this". You feed it a program one step at a time.

    • Declarative: in this style you tell the computer what you want the result to be and let it figure out how to get there.

    A classic example of these two styles is comparing assembly code (or C) with SQL. In assembly you give instructions (literally) one at a time. In SQL you express how to join data together and what result you want from that data.

    A nice side-effect of declarative programming is that it tends to be a bit higher level. This allows for the platform to evolve underneath you without you having to change your code. For instance:

    var foo = bar.Distinct();
    

    What is happening here? Is Distinct using one core? Two? Fifty? We don't know and we don't care. The .NET devs could rewrite it at any time, as long as it continues to perform the same purpose our code could just magically get faster after a code update.

    This is the power of functional programming. And the reason that you'll find that code in languages like Clojure, F# and C# (written with a functional programming mindset) is often 3x-10x smaller than it's imperative counterparts.

    Finally I like the declarative style because in C# most of the time this allows me to write code that does not mutate the data. In the above example, Distinct() does not change bar, it returns a new copy of the data. This means that whatever bar is, and where ever it came from, it not suddenly changed.

    So like the other posters are saying, learn functional programming. It'll change your life. And if you can, do it in a true functional programming language. I prefer Clojure, but F# and Haskell are also excellent choices.

    LINQ processing is deferred until you actually iterate over it. `var foo = bar.Distinct()` is essentially an `IEnumerator` until you call `.ToList()` or `.ToArray()`. That is an important distinction because if you aren't aware of that it can lead to difficult to understand bugs.

  • Can other developers in the team read LINQ?

    If not then don't use it or one of two things will happen:

    1. Your code will be unmaintainable
    2. You'll be stuck with maintaining all your code and everything which relies on it

    A for each loop is perfect for iterating through a list but if that's not what you what to do then don't use one.

    hmm, I appreciate that for a single project this may be the answer, but for the medium to long term you should train your staff, otherwise you have a race to the bottom of code comprehension which doesn't sound like a good idea.

    That's great if you're at a company which allows time for formal training so you can book some time to train staff when implementation of LINQ begins. However if you're not then you're running the risk of the LINQ guy leaving before they've passed on the knowledge and being left with code which no one can understand.

    then make sure your new hires understand Linq if you cant train.

    Actually, there's a third thing that could happen: the other developers could put forth a small amount of effort and actually learn something new and useful. It's not unheard of.

    @InvertedLlama if I were at a company where the developers need formal training to understand new language concepts then I'd be thinking about finding a new company.

    -1 Linq is how many years old now? Either A) found a company with competent programmers, or B) tell them to stop wasting "down time" and learn their craft.

    "Linq is how many years old now": Not every new stuff that comes out is automatically worthwhile learning. If the existing techniques get the job done they are OK. You invest time (= energy and money) to learn something new if it makes you more effective. I am not saying that LINQ is not a good technology, but it is not worth learning if a team does not need it just for the sake of learning it.

    Perhaps you can get away with that attitude with libraries, but when it comes to core language features, that doesn't cut it. You can pick-and-choose frameworks. But a good .NET programmer needs to understand each and every feature of the language, and of the core platform (System.*). And considering that you can't even use EF properly without using Linq, I have to say...in this day and age, if you are a .NET programmer and you don't know Linq, you are incompetent.

    I would also add that training need not be that formal - code review is an excellent way of training people on 'new' language features

    if the question was some obscure technology I would agree. There is quite a bit of LINQ expertise on the web so even someone who does not understand it can probably figure it out with some good use of search engines.

    This has enough downvotes already, so I won't add to that, but an argument supporting ignorant/incompetent coworkers is never a valid one.

    learning new things like LINQ arguably isn't even a time/money-investment for any company, if the programmer is worth 10 cents they would spend a couple of hours each week staying up to date, reading blogs, trying new stuff out and overall getting better in their spare time. I have real issues seeing how you can stay competetive if you don't have a genuine interest in the craft.

License under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM