Groupin' ain't easy

Using the group_by method in Ruby

April 9, 2015

I spent some time this week learning about the #group_by method for arrays.

Simply put, the group_by method takes an array and a code block and returns a hash that contains your array split up and grouped by the instructions of your code block. Check out the examples below to get an idea of how it works.

I started with an array called sf_group that contains all of my SF cohort's names.

Group by first letter of first name

sf_group.group_by { |i| i[0] }

The code block here looks at every first letter in every element of the array and groups them up accordingly. This hash is returned very nicely because it's implictly in alphabetical order.

#=> {"A"=>["Alex Cusack", "Alex Kass", "Andrew Donato", "Andrew Dowd", "Arielle Chen"], "B"=>["Brian Kennedy"], "C"=>["Carissa Blossom", "Christopher William Kramlich"], "D"=>["David Matthew Grotting"], "E"=>["Ellis Marte", "Emily Boerner", "Eric Flores", "Eric Saxman"], "I"=>["Ian Andrew Harris"], "J"=>["James Newman", "Jeffrey Chang", "Jennifer Jaochico", "Josh Ullman"], "K"=>["Karan Aditya"], "L"=>["Le Tran"], "M"=>["Michael Farr", "Miche Weinberg"], "P"=>["Patrick Betti"], "R"=>["Ryan Posthumus"], "T"=>["Tanay Arora", "Travis R Siebenhaar"], "V"=>["Vivek Poola"]}

I thought about grouping by letters in last name, but that would require a lot more logic. In theory I would start from the back of the name and stop when I hit the first bit of white space. Then I would use the first letter after the whitespace and assume that was the first letter in the last name. But that doesn't work for names like Van Dam. If I had to do such a thing I might break the data up FIRST, and then populate my array. The key to good output is good input.

Group by names that include 'k'

sf_group.group_by { |i| i.scan(/k/i) }

The block I used with the group_by method above returns groups of the letter K. One group has the lowercase k's, one has the uppercase K's and the last group is everything without a k. Color added for emphasis.

#=> {["k"]=>["Alex Cusack", "Patrick Betti", "Vivek Poola"], ["K"]=>["Alex Kass", "Brian Kennedy", "Christopher William Kramlich", "Karan Aditya"], []=>["Andrew Donato", "Andrew Dowd", "Arielle Chen", "Carissa Blossom", "David Matthew Grotting", "Ellis Marte", "Emily Boerner", "Eric Flores", "Eric Saxman", "Ian Andrew Harris", "James Newman", "Jeffrey Chang", "Jennifer Jaochico", "Josh Ullman", "Le Tran", "Michael Farr", "Miche Weinberg", "Ryan Posthumus", "Tanay Arora", "Travis R Siebenhaar"]}

I first tried to group by names including K's using #include?("k"). But that method would only group values with lowercase K's. I wanted uppercase and lowercase K's. So I read up on regular expressions and found #scan. This is going to come in handy! /k/ searches for the letter K and the i at the end ignores the case of the letter.

Group by amount of whitespace

sf_group.group_by { |i| i.scan(/\s/) }

My goal for this was to find out who had more than two names by seeing how many spaces were in each name. But that's not really the point of this method. The point is to show you how many occurences there were and group them together.

#=> {[" "]=>["Alex Cusack", "Alex Kass", "Andrew Donato", "Andrew Dowd", "Arielle Chen", "Brian Kennedy", "Carissa Blossom", "Ellis Marte", "Emily Boerner", "Eric Flores", "Eric Saxman", "James Newman", "Jeffrey Chang", "Jennifer Jaochico", "Josh Ullman", "Karan Aditya", "Le Tran", "Michael Farr", "Miche Weinberg", "Patrick Betti", "Ryan Posthumus", "Tanay Arora", "Vivek Poola"], [" ", " "]=>["Christopher William Kramlich", "David Matthew Grotting", "Ian Andrew Harris", "Travis R Siebenhaar"]}

But this does get me some results that I can then use to display who has more than two names.

Group by length of name

sf_group.group_by { |i| i.length }

#=> {11=>["Alex Cusack", "Andrew Dowd", "Ellis Marte", "Eric Flores", "Eric Saxman", "JoshUllman", "Tanay Arora", "Vivek Poola"], 9=>["Alex Kass"], 13=>["Andrew Donato", "Brian Kennedy", "Emily Boerner", "Jeffrey Chang", "Patrick Betti"], 12=>["Arielle Chen", "James Newman", "Karan Aditya", "Michael Farr"], 15=>["Carissa Blossom"], 28=>["Christopher William Kramlich"], 22=>["David Matthew Grotting"], 17=>["Ian Andrew Harris", "Jennifer Jaochico"], 7=>["Le Tran"], 14=>["Miche Weinberg", "Ryan Posthumus"], 19=>["Travis R Siebenhaar"]}

We were able to group everything up by the length of each full name, but the hash returned sorted by the first name of the first value. That's kinda funky, as I would prefer to see the hash ordered by the key. But hashes are unsorted by nature, so that can be bothersome to us detail-oriented folks.

If you run #sort on the hash, you will get an array that is much more to my liking.

#=> [[7, ["Le Tran"]], [9, ["Alex Kass"]], [11, ["Alex Cusack", "Andrew Dowd", "Ellis Marte", "Eric Flores", "Eric Saxman", "Josh Ullman", "Tanay Arora", "Vivek Poola"]], [12, ["Arielle Chen", "James Newman", "Karan Aditya", "Michael Farr"]], [13, ["Andrew Donato", "Brian Kennedy", "Emily Boerner", "Jeffrey Chang", "Patrick Betti"]], [14, ["Miche Weinberg", "Ryan Posthumus"]], [15, ["Carissa Blossom"]], [17, ["Ian Andrew Harris", "Jennifer Jaochico"]], [19, ["Travis R Siebenhaar"]], [22, ["David Matthew Grotting"]], [28, ["Christopher William Kramlich"]]]