Thursday, June 25, 2015

Implementing a tree using two python dictionaries

A tree is a data structure where one to many relationships can be represented using nodes. These nodes can have values and may be represented using objects in an Object Oriented manner. Yet the idea of object usage may not be very appealing where the operations are time critical and the implementation subjects to memory constraints.

Though trees are used in heavy computational tasks such as neural networks (in a modified manner). Another area where related nodes/ objects are used is graphs. These can be used to simulate relationships that we see in social networks and so on.

This post mainly focuses on building a tree structure to support large number of raw data at a very high access and modifying speed.
This also supports path tracing where the path between two nodes could be derived as a list of nodes in between.

There are two dictionaries involved and their tasks are as below

treeDic = {}

This is used to store the parent nodes and their child nodes. Each entry will carry the value allocated for the tree node (parent) as well. This has been done for the completeness. Yet this could be performed in many other ways.

Data here is stored as follows.
treeDic = {1 : [ value, [ 2, 3 ] ] }
  • 1 - This is the parent node value
  • value - This is an integer to store the value of the node/ the data item or even a separate object
  • [ 2 , 3 ] - The array containing the child nodes
parentDic = {}

This is used to keep the parent nodes of each of the nodes. It is important that we note that there will be no parent node for the tree root and this has to be checked before we proceed with the upward tree traversal

The preconditions that are required are as follows
  •  All the parent-child nodes should be connected in some form
  • A child node can have only one parent
  • There could be no or more child nodes (This is not limited to a binary tree, this is k-ary)
 A small note on the getPath(node1, node2) function

This uses two temporary arrays where each array will save the path from the node to the tree root. This is done for both the nodes and then the node list is concatenated followed by getting the set(). This will provide the set of nodes that are there in between the given two nodes.

This could further be optimized which I have not mentioned but here is how. We could iterate through the nodes simultaneously and once we see a common node we may stop and then concatenate the arrays of nodes accordingly. Here the simultaneous looping will not negatively affect the time complexity since the deepest node will anyway have to iterate that number of times to meet the common node.

Before viewing the code, here are some advantages of this implementation.
  • Fast access to every node O(1)
  • Low memory consumption
  • Can have millions of nodes
  • Search is O(1) if we know the node ID
  • Tree is capable of traversing child-parent and parent-child either ways

No comments:

Post a Comment