Most of us are familar with a For Each loop. If you have a WinForms app with any type of collection (List(Of T), array, ListBox Items, etc) there’s a good chance you have some type of loop, maybe one comparing two collections. Did you know there is a .NET method that has some serious advantages over a loop such as this. If you’d like to learn more,
A funny thing happened to me on the way to this blog…I learned something. Two things actually!
After seeing another post in the Forums about comparing lines in a TextBox, I decided to write an article on one of my favorite, but rarely seen, VB.NET methods: Intersect. The next article will discuss it’s counterpart: Except.
In the thread the OP laid out this situation: there are two multiline TextBoxes, each of which has a list of names from an XML file. The first contained one list, and the second contained a partial list of the first. The question was how to find the names in the first TextBox that were not in the second.
Immediately, the .NET Collection Except method came to mind (which will be covered in the next article). In this article we will review the Intersect method, because that was where I found some startling information. The reason for the discovery was because after posting a solution using Except, I had a thought: “I wonder if there really is any advantage of one over the other?”.
In order to compare Intersect over a For Each loop, which others had posted, I created an app with two ListBoxes, each with randomly generated items. The code to populate the ListBoxes, along with the Class level variables used:
<code>
Dim rndm As New Random
Dim sw As Stopwatch
Private Sub btnAddData_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnAddData.Click
For Each lb In Controls.OfType(Of ListBox)()
lb.Items.Clear()
Next
For i As Integer = 1 To 10000
lbNames1.Items.Add("A" & rndm.Next(0, 100001))
lbNames2.Items.Add("A" & rndm.Next(0, 100001))
Next
Debug.WriteLine("Item count in base collection: " & lbNames1.Items.Count)
End Sub
</code>
I then used the Intersect method in one Button Click event and a loop in another Button Click event. I also added a StopWatch to determine the amount of time each took. I wasn’t expecting all that much difference – previously when I’ve used a loop on a collection with a few dozen items, the results came back almost immediately.
<code>
Private Sub btnIntersect_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnIntersect.Click
lbIntersectItems.Items.Clear()
sw = New Stopwatch
sw.Start()
Dim arrSameLB1toLB2 As IEnumerable(Of String) = lbNames1.Items().OfType(Of String).Intersect(lbNames2.Items().OfType(Of String))
lbIntersectItems.Items.AddRange(arrSameLB1toLB2.Cast(Of String).ToArray)
sw.Stop()
Label1.Text = "Matches: " & lbIntersectItems.Items.Count.ToString & vbCrLf & _
"Elapsed: " & sw.ElapsedMilliseconds.ToString
Debug.WriteLine("Intersect:")
Debug.WriteLine(vbTab & "Matches: " & lbIntersectItems.Items.Count)
Debug.WriteLine(vbTab & "Elapsed: " & sw.ElapsedMilliseconds)
sw.Reset()
End Sub
Private Sub btnLoop_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnLoop.Click
lbLoopItems.Items.Clear()
sw = New Stopwatch
sw.Start()
For Each s As String In lbNames1.Items
If lbNames2.Items.Contains(s) Then
lbLoopItems.Items.Add(s)
End If
Next
sw.Stop()
Label2.Text = "Matches: " & lbLoopItems.Items.Count.ToString & vbCrLf & _
"Elapsed: " & sw.ElapsedMilliseconds.ToString
Debug.WriteLine("Loop:")
Debug.WriteLine(vbTab & "Matches: " & lbLoopItems.Items.Count)
Debug.WriteLine(vbTab & "Elapsed: " & sw.ElapsedMilliseconds)
sw.Reset()
End Sub
</code>
And with each ListBox containing 100 items, that was pretty much the case. However as I increased the number of items in each ListBox, the time difference grew substantially.
Here is a breakdown of the times (in milliseconds) to run each (note that I ran each 5 times, except the 100,000, which I only ran 3 times):
. Intersect Loop
. Range(ms) Avg(ms) Range(ms) Avg(ms)
# Items
100 . 1-1 1 2-5 3
1000 . 2-2 2 28-36 33
10000 . 34-47 40 2099-2180 2134
100000 . 189-197 193 ??? ??
As you can see, with only 100 items, both were comparable in average elapsed time. And even at 1000 items, while there is the beginning of a separation, an average of 33 ms for the loop is not a problem. However, once you get to 10,000 items, there is a significant difference: 40 ms for Intersect versus 2134 ms for the loop. Then at 100,000 items, Intersect still came in at a respectable 193 ms (substantially faster than the loop using 10,000 items). There are no times for the loop using 100,000 items because my app became unresponsive while running it, and after 3 tries at it and failing each time, I gave up.
That was my first interesting discovery – the time difference. The second came about because I noticed the number of matches being displayed in the labels were different. For example, in the 10,000 item test, if Intersect displayed 900 matches, the loop displayed 950. At first I was confused, then I thought “maybe Intersect is not only reporting the matches, but the distinct matches!”. To test this, I then added a LINQ query to the items returned from the loop and compared that number to Intersect. And guess what? They were the same.
<code>
Private Sub btnLoopDist_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnLoopDist.Click
lbLoopItemsDistinct.Items.Clear()
sw = New Stopwatch
sw.Start()
For Each s As String In lbNames1.Items
If lbNames2.Items.Contains(s) Then
lbLoopItemsDistinct.Items.Add(s)
End If
Next
Dim distitem = ((From d In lbLoopItemsDistinct.Items).Distinct().ToArray).Count
sw.Stop()
Label3.Text = "Matches: " & distitem.ToString & vbCrLf & _
"Elapsed: " & sw.ElapsedMilliseconds.ToString
Debug.WriteLine("Loop w/ Distinct:")
Debug.WriteLine(vbTab & "Matches: " & distitem)
Debug.WriteLine(vbTab & "Elapsed: " & sw.ElapsedMilliseconds)
sw.Reset()
End Sub
</code>
So not only is Intersect much faster the greater the number of items being compared, but it throws in a Distinct query also, free of charge.








[...] the first part of this article found here I reviewed the Collection Intersect method, explaining why it can be a valuable tool when [...]