the singularity of being and nothingness
PHP Diversion: Making Nested Arrays from Query Data
As if the title doesn’t give away the whole plot, I’m currently working on a project that uses PHP–gasp! No, I’ve not abandoned ColdFusion, and too be completely honest, I’m kind of a PHP noob.
Sure, I can do a fair amount of things in PHP…you know, querying databases, echoing content, that kind of stuff. Where I really get hurt, though, is when things start to get complex.
NOTE: If you don’t particularly feel like reading and just want to see some code, be sure to check out my GitHub repo for this.
Some Context
For example, I recently needed to build out an ExtJS View, and I wanted to have some pretty heavily-nested data. My ideal outcome would print something to the page like so:
Main Group 1 Category 1 Name 1 Name 2 Fav Color 1 Fav Color 2 Main Group 2 Category 1 Name 1 Name 2 Main Group 3 Category 2 Name 1 Category 3 Name 1
And so on. In other words, each step in the “tree” can have multiple children, which can themselves have multiple children, ad infinitum.
Now in ExtJS, creating a View is dead simple. You simply have to pass the View a JSON structure wherein each level is an array of objects. My problem, however, was not with the ExtJS side. Rather, it was turning a MySQL query result set into a structure that could be serialized into the proper format and sent off to ExtJS.
An Initial Attempt
Unfortunately, there is nothing in PHP like ColdFusion’s amazing “group” attribute in the <cfoutput> tag. So not knowing what to do, I headed out to The Google, and asked it to tell me about “creating nested json” and “nested data” and the like. For the most part, the answers I found suggested doing something like this:
- Set a “flag” for the group (this will be updated when you move to a different group)
- Loop over record set. When you reach a new “group” (e.g., group value != flag), reset the group flag, create a new array for the group data, nest child data, etc.
While this approach certainly works (I used it in my first several *attempts*), it gets hairy really quickly. Sure, it’s not a huge deal when you have a simple “group” -> “item” relationship. However, when you have to start nesting data in more complex fashions, it becomes incredibly unwieldy. Not only is it unsustainable as you have to replicate the functionality for EVERY change in data, but it’s also a nightmare because there are only so many “flags” and “nested flags” that the human brain can keep track of at a single time.
A Better Approach…But Still Not Great
I quickly abandoned this process. it was a headache to write, and the thought of having to do it N times for each query was maddening. So I stepped back a bit, and tried to think a bit more critically about my approach. One big problem, in my estimation, is that I was getting tripped up by the conversion from the flat result set from MySQL to the nested-array model I needed. The flags became a mess really quickly, because they were each trying to keep track of their place within the whole data set, instead of a “parent” only being concerned about whatever children it might have.
With this in mind, my second approach took the form of trying to iteratively reduce the data being dealt with at any particular “level” of the process. My first step was to create a quick function to transform the result set from MySQL into a super-dumb array.
function convertquerytoarray($query) { while($row = mysql_fetch_assoc($query)) { $rows[] = $row; } return $rows; }
With this done, I proceeded to leverage PHP’s foreach() to loop over each record:
$gidx = ''; $groups = array(); $allrows = convertquerytoarray($sqlresult); foreach($allrows as $row) { if($row['group'] != $gidx) { $carr = filterarraybykey($allrows,'group',$row['group']); ... } }
Now you’ll notice that in the midst of the “group” foreach() loop, I create a new array ($carr) and assign to it the value returned from the filterarraybykey() method. Here’s what that looks like:
function filterarraybykey($array,$key,$val) { return array_filter($array,function($row) use($key,$val) { return $val==$row[$key]; }); }
As I mentioned, the idea here is that for each “parent”, evaluating the children (and children’s children) would seem to be much easier if we limit down the data being evaluated within that relationship to only that data which is contained within the relationship.
So, let’s say that we have 3 groups: “Cats, Dogs, Hamsters”. When the “Cats” group is being evaluated, the filterarraybykey() method will take the full result set ($allrows), and return an array that has only the values that have a “group” value of “Cats”…maybe 5 records instead of the whole 20 of the full result set.
Now that I’ve limited the result set, I can continue the pattern farther down, applying the same technique to “categories” and “names”:
$gidx = ''; $groups = array(); $allrows = convertquerytoarray($sqlresult); foreach($allrows as $row) { if($row['group'] != $gidx) { $carr = filterarraybykey($allrows,'group',$row['group']); $cidx = ''; $categories = array(); foreach($carr as $row) { if($row['category'] != $cidx) { $narr = filterarraybykey($carr,'category',$row['category']); $nidx = ''; $names = array(); foreach($narr as $row) { if($row['name'] != $nidx) { $name = array('name'=>row['name']; array_push($names,$name); $nidx = $row['name']; } } $category = array('category'=>$row['category'],'names'=>$names); array_push($categories,$category); $cidx = $row['category']; } }; $group = array('group'=>$row['group'],'categories'=>$categories); array_push($groups,$group); $idx = $row['group']; } }
Alright, this feels a little better. Even though it still has a bunch of “flags,” the logic is pretty easy to follow. Since the result set being evaluated at each “level” of the nesting is constantly getting whittled, it eases the burden of trying to figure out where the particular record exists within the WHOLE data set. In this approach, you only have to worry about the current parent and siblings. Plus, since we have nested foreach() loops, each “level” is more or less self-contained.
Although this is a bit better, it still suffers from many of the same issues as the first. Namely, every time this needs to be changed, or every time I need something like this elsewhere (perhaps a grouping of only groups and names, with no categories), I have to recreate this nesting nightmare. If at any point along the way my flags or row pointers get off (via typo, omission, etc.), things will break, and frustration will ensue.
A Final (Hopefully Better) Solution
So while the last approach is still kind of kludgy, I think it’s moving in the right direction. Since each “level” of nesting represents a thin, relationally isolated slice of the total data set, it’s easier to drop things in and out of it to get the results you want. Therefore, with this as a basis, my last shot at a solution was an attempt to abstract the philosophy of this approach. I’ll paste the whole example, and then explain what’s going on:
function makenestedarray($array,$model) { // check if array is a mysql result; if so, convert it to a flat array $array = gettype($array)=='resource' ? convertquerytoarray($array) : $array; // set blank index for tracking which rows we've already added $idx = ''; // blank array to store all the data $master = array(); // retrieve value of "key" for the current iteratin of the data model $key = $model['key']; // loop over each row in the passed data array foreach($array as $row) { // if the value is not equal to the index value, evaluate row if($row[$key] != $idx) { // retrieve data for row based on model definition $item = makerowdata($row,$model); // if current level has "children" defined, evaluate each child if(isset($model['children'])) { // loop over array of children foreach($model['children'] as $child=>$val) { // whittle it down based on the child's data key $childarr = filterarraybykey($array,$key,$row[$key]); // recursively call makenestedarray() $children = makenestedarray($childarr,$val); // add nested data to parent array $item[$child]=$children; } } // done with all the looping; add current data item to master array array_push($master,$item); // update the index and kip rows if they have the same index $idx = $row[$key]; } } // yay! all finished; return the completed data object return $master; } function makerowdata($row,$struct) { $columns = array(); $keys = array_keys($struct); foreach($keys as $pos=>$key) { if($key != 'key' && $key != 'children') { $columns[$struct[$key]] = $row[$key]; } } return $columns; } $model = array( 'key'=>'group', 'group'=>'group', 'groupid'=>'id', 'children'=>array( 'categories'=>array( 'key'=>'category', 'category'=>'category', 'children'=>array( 'names'=>array( 'key'=>'names', 'nameid'=>'id', 'name'=>'name' ) ) ) ); $data = makenestedarray($sqlresult,$model); $returnjson = array("groups" => $data); // serialize this mug, and return the string return json_encode($returnjson);
My solution involves creating a recursive method to do what the last approach did–whittle down the data set being evaluated as the “level” increases. The big difference, of course, is that the recursive method couldn’t care less what the data being thrown at it is, and doesn’t particularly care how many levels of nesting there might be.
And this is what I like about this approach. Instead of manually looping and nesting data, all I have to do is define a “model” of the data in the form I’d like it be prepared, and the recursive makenestedarray() will do the rest.
This has a couple big benefits.
First, if I need to create various nested data arrays for ExtJS of different configurations (and I definitely do), I no longer have to worry about the headache of creating the nested nightmares of the first two “solutions.” I simply plot out exactly the data model that I want to return, and the method takes care of it for me.
And second, the “model” provides a nice visual verification of how the data I’m preparing is going to look when it’s returned as JSON. And if I need to make a quick tweak (like adding an additional data column to be returned at some level within this structure), it’s easy to add as I only need to update the model, and not the mechanism creating the ultimate output.
Wrapping Up
So what are your thoughts? This is the result of a lot of pain, headaches, and trial and error. Since I’m a total PHP noob, I acknowledge that it’s entirely possible (if not probable) that there are much cleaner, easier, and elegant solutions to this seemingly routine need. However, I wanted to write this up, if for nothing else than for my own benefit of reminiscing about the process I went through to get from absolute frustration to a sustainable solution that I feel pretty happy about.
Print article | This entry was posted by existdissolve on March 1, 2012 at 3:49 am, and is filed under PHP. Follow any responses to this post through RSS 2.0. You can leave a response or trackback from your own site. |
about 13 years ago
Hi,
Well I’m a Ext-JS, CF noob but php and mysql experienced. Without seeing what the mysql data structure looks like I don’t understand exactly what you’re trying to do, but let me guess:
It seems that you’re trying to recreate a hierarchy to pass back as JSON, from a hierarchy stored as parent/child records in a mysql table. I guess that you might want some filtering of results and/or mapping of field names to your JSON object names along the way.
If so, then yes, you must have a recursive function that returns the children of each parent. The question is where to put it. Personally I would always get mysql to do the heavy lifting.
So normally I would make the recursive function do a new query for each set of children and then loop through each child calling itself i.e. something like:
function get_children($parent_key, $extra_criteria) {
$cursor = mysql_query(“select * from mytable where parent_key = $parent_key and $extra_criteria”);
while ($row = mysql_fetch_array($cursor))
{
$row[‘children’] = get_children($row[‘child_key’],$extra_criteria);
$rows[] = $row;
}
mysql_free_result($cursor);
return $rows;
}
$rows = get_children($root,$extra_criteria);
If you want to map fields to new names you could pass your mappings into get_children at the start and then just loop through them inside the per-row loop so you only store the fields you want in rows and you change their names to whatever you want.
This means a lot more queries for mysql than simply picking up the entire set of data in one go but that should be ok as long as you’ve got an index on “parent_key”. And you do keep your memory use to a minimum so it should be the most scaleable approach.
In fact, if this isn’t the fastest, cleanest approach I’d go as far as to say that you must have a problem with your mysql setup and/or server that’s going to bite you somewhere else until you fix it! 😉
Does that make sense?
George
about 13 years ago
Hi George–
Thanks for the input! As far as what I’m trying to accomplish, it’s more generic than trying to recreate MySQL hierarchies in JSON…my ultimate goal with the method I outlined is to have a method I can run against any data set returned from MySQL and transform it into whatever data model I want, and finally return it in a JSON format that my JS app will understand and be able to use.
As far as the approach you outlined, it is something that I experimented with, but ultimately found it to be lacking. What you’re suggesting presumes a model where the data is housed within the same table. My need–which ultimately led me to the solution I outlined–was to create a “realized” hierarchy of data out of disparate tables (some of which have parent->child relationships, other which have sibling relationships, etc.). In order to use the method you’ve outlined, I’d have to significantly modify it to allow for dynamic references to table names, joins, etc., and I think it would become quickly muddled.
But on a higher level, one of the reasons I like my approach is that it’s (almost) completely agnostic of the data model and, more importantly, of the data structure in MySQL. In other words, I can spit out whatever I want from MySQL, apply my data model to it, and then pass it to the agnostic method. In this way, I don’t have to modify the table name being used each time, and all of the specificity for any particular result set can be isolated into my model and query files.
Re: the broader discussions of performance, I am curious as to which approach would be faster. Some of my datasets have a hierarchy depth of about 5. So let’s say we have 200 records being returned…that’s a lot of queries. In my approach, increasingly-filtered arrays are passed to each subsequent level of the hierarchy. I’m assuming arrays in PHP are generally pretty fast (especially if the array’s domain is progressively shrunk), but that’s just an assumption.
Obviously, either way there is a lot to process, so would querying and processing a result be faster than working on a filtered array? I have no reason to assume this, but it seems like working with the initial result would be better than dozens and dozens of queries. Perhaps I’ll have to work up an example of the alternate approach and see which performs better.
Thanks again for the input and for stimulating the conversation–look forward to any additional thoughts you might have!
about 12 years ago
I have like 30+ queries to map into complex JSON structures. This is going simplify things greatly. Thanks much!!
FYI – pulled the code from your git repo. In your makerowdata function you have $columns[$smodel[$key]] = $row[$key];
should be
$columns[$model[$key]] = $row[$key];
(drop the s off model)
about 12 years ago
Hi Mitch–
Glad it’s helpful to you! Thanks, also, for the heads-up on the typo. I’ve updated the git repo with the change. Best of luck!
about 11 years ago
Mitch,
This is outstanding. I thoughts this up the other day in my head and knew it would be a real task to write it.
You have put this together exactly as I would hope it would exist with recursion allowing it to nest as deeply as you want.
Thanks for this. Great job!
about 11 years ago
Thanks, glad it was helpful!
about 8 years ago
Thank you very much for your code. This snippets helped me a lot. The “model” solution very handy if i have different column. Once more, thank you very much.