python draw tree from dictionary


For the full source code, please see the GitHub repository.

Q1.Hash Tree

We are given the hash function shown in the figure below.

hash function
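The figure itself is not reproduced here, but the hash function can be read off the splitting code further below. As a sketch (the function name `branch` is mine, not the assignment's):

```python
def branch(key: int) -> int:
    """Return the child index (0 = left, 1 = middle, 2 = right) for a key.

    Mirrors the assignment's hash function: 1, 3, 7 go left;
    2, 4, 8 go to the middle; 5, 6, 9 go right; 0 is undefined.
    """
    if key in (1, 3, 7):
        return 0
    if key in (2, 4, 8):
        return 1
    if key in (5, 6, 9):
        return 2
    raise ValueError("hash function is not defined on 0")
```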

(a) Please write a program to generate a hash tree with max leaf size 3, output the nested list (or nested dict) of the hash tree hierarchically, and draw the structure of the hash tree (you may write a program to draw it, or draw it manually according to the nested list you output).

Build a LinkedList data structure

At the last level, a leaf node may contain more than 3 items even though the max leaf size is 3. In that case we use a linked list to store the items whose index >= 2. We first build the LinkedList structure.

    class Node(object):
        def __init__(self, data):
            self.data = data
            self.next = None


    class LinkedList(object):
        def __init__(self):
            self.head = None
            self.tail = None

        # is empty => True
        def is_empty(self):
            return self.head is None

        # Append a node at the end of the linked list
        def append(self, data):
            node = Node(data)
            if self.head is None:
                self.head = node
                self.tail = node
            else:
                self.tail.next = node
                self.tail = node

        # Traverse all elements in the linked list
        def iter(self):
            if not self.head:
                return
            cur = self.head
            yield cur.data
            while cur.next:
                cur = cur.next
                yield cur.data

Define the Hash Tree function

    import sys

    def divide_subnode(node, depth, max_level, max_leaf_size):
        """
        Divide a node recursively.

        :node: list of itemsets at this node
        :depth: depth of the item
        :max_level: the maximum level of the tree
        :max_leaf_size: the maximum leaf size
        """
        # If the size of the node <= max_leaf_size, we don't need to split it
        if len(node) <= max_leaf_size:
            return node
        # When depth == max_level, the node must be handled separately
        # (len(node) > max_leaf_size is guaranteed here by the check above)
        if depth == max_level:
            # Since the max leaf size is 3, when depth == max_level and
            # len(node) > 3 we need another structure to store this node:
            # a linked list holds the items whose index >= 2, i.e.
            # structure = [item, item, LinkedList]
            linked_list = LinkedList()
            for item in node[2:]:
                linked_list.append(item)
            structure = [node[0], node[1], linked_list]
            # Show the linked list
            for item in linked_list.iter():
                print(item)
            print()
            return structure

        # Apply the hash function to divide the node
        subnode = [[], [], []]
        for item in node:
            if item[depth] in [1, 3, 7]:
                subnode[0].append(item)
            elif item[depth] in [2, 4, 8]:
                subnode[1].append(item)
            elif item[depth] in [5, 6, 9]:
                subnode[2].append(item)
            else:
                # If a 0 appears, terminate: the hash function is not defined on 0
                print("Cannot apply Hash function on 0")
                sys.exit(1)
        return [divide_subnode(subnode[0], depth + 1, max_level, max_leaf_size),
                divide_subnode(subnode[1], depth + 1, max_level, max_leaf_size),
                divide_subnode(subnode[2], depth + 1, max_level, max_leaf_size)]


    def create_hash_tree(input_list, begin_depth=0, max_level=3, max_leaf_size=3):
        """
        Create a hash tree.

        :input_list: (list) nested list of candidate itemsets
        :begin_depth: the beginning depth, default = 0
        :max_level: the maximum level of the tree, default = 3
        :max_leaf_size: the maximum leaf size, default = 3

        :return: (list) hash_tree
        """
        hash_tree = divide_subnode(input_list, begin_depth, max_level, max_leaf_size)
        return hash_tree
    # The candidates are 3-itemsets, so we build a hash tree with max level 3, starting from depth 0
    # Example input
    candidates = [[1,2,3],[1,3,9],[1,4,5],[1,4,6],[1,5,7],[1,5,9],[1,6,8],[1,6,9],[1,8,9],
                  [2,3,9],[2,5,6],[2,5,7],[2,5,9],[2,6,7],[2,6,8],[2,6,9],[2,7,8],[2,7,9],[2,8,9],
                  [3,4,6],[3,4,8],[3,7,8],
                  [4,5,6],[4,5,8],[4,5,9],[4,7,8],[4,7,9],[4,8,9],
                  [5,6,7],[5,6,8],[5,7,8],[5,7,9],[5,8,9],
                  [6,7,9],[6,8,9],
                  [7,8,9]]
    create_hash_tree(candidates, begin_depth=0, max_level=3)
    [1, 8, 9]
    [3, 4, 6]
    [7, 8, 9]

    [2, 6, 9]
    [4, 5, 6]
    [4, 5, 9]
    [[[[1, 3, 9], [3, 7, 8]],
      [[[1, 2, 3]],
       [[3, 4, 8]],
       [[1, 4, 5], [1, 4, 6], <__main__.LinkedList at 0x2b9156641d0>]],
      [[[1, 5, 7]], [[1, 6, 8]], [[1, 5, 9], [1, 6, 9]]]],
     [[[], [[2, 7, 8], [4, 7, 8]], [[2, 3, 9], [2, 7, 9], [4, 7, 9]]],
      [[2, 8, 9], [4, 8, 9]],
      [[[2, 5, 7], [2, 6, 7]],
       [[2, 6, 8], [4, 5, 8]],
       [[2, 5, 6], [2, 5, 9], <__main__.LinkedList at 0x2b915664518>]]],
     [[[5, 7, 8], [5, 7, 9], [6, 7, 9]],
      [[5, 8, 9], [6, 8, 9]],
      [[5, 6, 7], [5, 6, 8]]]]

Therefore, we can draw the hash tree below

hash tree

(b) Given a transaction that contains items {1, 3, 5, 6, 7, 9}, how many comparisons are needed using the hash tree which you generate above? Please circle these candidates in the hash tree. No programming required.

Matching the transaction against the hash tree yields the 10 candidates circled in the figure. In total, 34 comparisons are needed with the generated hash tree.

    1 + 35679
        13 + 5679    2+2+2+1 = 7
        15 + 679     2+1+1 = 4
        16 + 79      1+2 = 3
        17 + 9       2

    3 + 5679
        35 + 679     2+1+2 = 5
        36 + 79      1+2 = 3
        37 + 9       2

    5 + 679
        56 + 79      1+2 = 3
        57 + 9       2

    679              3
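The bookkeeping above can be sanity-checked by enumerating the candidate probes directly: the transaction {1, 3, 5, 6, 7, 9} has C(6, 3) = 20 three-item subsets, grouped by their two-item prefixes just as in the hand enumeration. This is only a sketch of the counting, not the tree traversal itself:

```python
from itertools import combinations

transaction = [1, 3, 5, 6, 7, 9]
subsets = list(combinations(transaction, 3))  # all 3-item candidate probes
print(len(subsets))  # 20 probes in total

# Group probes by their 2-item prefix, mirroring the enumeration above
prefixes = {}
for s in subsets:
    prefixes.setdefault(s[:2], []).append(s)
print(sorted(prefixes))  # 10 distinct prefixes, e.g. (1, 3) covers 4 probes
```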

tree compare

Q2. FP-Tree

Suppose you are given some transactions and a vocabulary that maps terms to indexes. Please use
the FP-Tree algorithm to discover the frequent itemsets.

Data Description:

  • topic-i.txt

Input file for the frequent pattern mining algorithm. Each line represents a transaction, given as indices of terms.

format: term1_index term2_index term3_index ...

Columns are separated by a space.

  • vocab.txt

Dictionary that maps term index to term.

format: term_index term

Columns are separated by a space.

  • pattern-i.txt

The file you need to submit, which contains your results for this frequent pattern mining task. Each line represents one frequent itemset; lines are sorted in descending order of support count.

format: support_count term1 term2 ...

support_count and term1 are separated by a tab, while terms are separated by a space.

Here we give an example:

    233	rule association
    230	random
    227	finding
    203	mining pattern
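Lines in this format can be produced with a simple formatting expression; a minimal sketch using made-up sample data:

```python
# Hypothetical patterns: (support_count, [terms...]), already sorted by support
patterns = [(233, ["rule", "association"]), (230, ["random"]),
            (227, ["finding"]), (203, ["mining", "pattern"])]

# support_count and the first term are tab-separated; terms are space-separated
lines = ["{}\t{}".format(count, " ".join(terms)) for count, terms in patterns]
print("\n".join(lines))
```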

Questions:

(a) Please write a program to implement FP-growth algorithm and find all frequent itemsets with support >= 400 in the given dataset.

(b) Based on the result of (a), please print out those FP-conditional trees whose height is larger than 1.

Q2-output

Define the preprocessing functions

  • read_file() function that converts the txt file into a nested list
  • create_init_set() function that generates the transaction dictionary used as the initial input to the create_fp_tree() function
    def read_file(file):
        """
        Read a transaction file (txt).
        """
        items_bought = []  # original data
        with open(file, "r") as f:
            for line in f.readlines():
                items_bought.append(line.strip().split())
        return items_bought


    def create_init_set(items_bought):
        trans_dict = {}
        for trans in items_bought:
            key = frozenset(trans)
            if key not in trans_dict:
                trans_dict[key] = 1
            else:
                trans_dict[key] += 1
        return trans_dict
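create_init_set() collapses identical transactions by keying on a frozenset. Since the real functions read from files, here is a standalone illustration of the same counting idea on toy data (the helper name and data are mine):

```python
def count_transactions(transactions):
    # Same idea as create_init_set(): identical item sets share one key
    trans_dict = {}
    for trans in transactions:
        key = frozenset(trans)
        trans_dict[key] = trans_dict.get(key, 0) + 1
    return trans_dict

demo = [["1", "2"], ["2", "1"], ["3"]]
print(count_transactions(demo))  # two keys: {'1','2'} counted twice, {'3'} once
```

Note that frozenset keys also merge transactions that differ only in item order (as in the demo) or in duplicated items.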

Define the TreeNode class and build a vocab dict

    # Define the vocab_dict, which is used to transform an index into its vocabulary term
    vocab_dict = {}
    with open("vocab.txt", 'r') as f:
        for line in f.readlines():
            term = line.strip().split("\t")
            vocab_dict[term[0]] = term[1]
    # Define the structure of TreeNode
    class TreeNode:
        def __init__(self, name_value, num_occur, parent_node):
            self.name = name_value
            self.node_link = None
            self.count = num_occur
            self.parent = parent_node
            self.children = {}

        # Add to the count of the node
        def inc(self, num_occur):
            self.count += num_occur

        # Print the tree's structure in the terminal
        def display(self, index=1):
            print('  ' * index, self.name, ' ', self.count)
            for child in self.children.values():
                child.display(index + 1)

        # Calculate the height of the tree
        def get_height(self, index=1):
            mid = 0
            for child in self.children.values():
                if child.get_height(index + 1) >= mid:
                    mid = child.get_height(index + 1)
            return max(index, mid)

        # Transform the tree into a nested list
        # Following the output EXAMPLE => [parent, children]
        def transform_to_list(self):
            if self.name in vocab_dict:
                node_info = vocab_dict[self.name] + "    " + str(self.count)
            else:
                node_info = self.name + "    " + str(self.count)
            if len(self.children) == 0:
                return node_info

            local_list = []
            for child in self.children.values():
                local_list.append(child.transform_to_list())

            return [node_info, local_list]
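As a quick standalone check of the get_height() recursion, the same logic on a pared-down node class (MiniNode is mine, not part of the assignment code):

```python
class MiniNode:
    def __init__(self, name):
        self.name = name
        self.children = {}

    def get_height(self, index=1):
        # Height = the deepest level reached by any child, or this node's level
        best = 0
        for child in self.children.values():
            best = max(best, child.get_height(index + 1))
        return max(index, best)

root = MiniNode("Null Set")
root.children["a"] = MiniNode("a")
root.children["a"].children["b"] = MiniNode("b")
print(root.get_height())  # a root -> a -> b chain has height 3
```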

Define FP-growth function

    def update_header(node_to_test, target_node):
        """
        Update the header table's linked list.

        :node_to_test: the entry node of the linked list
        :target_node: target node to link
        """
        # Walk to the node whose node_link is None
        while node_to_test.node_link != None:
            node_to_test = node_to_test.node_link
        # Link target_node at the end
        node_to_test.node_link = target_node


    def update_fp_tree(items, fp_tree, header_table, count):
        """
        Update the FP-tree.

        :items: items in the frequent itemset
        :fp_tree: TreeNode
        :header_table: header table
        :count: count to add
        """
        if items[0] in fp_tree.children:
            # If items[0] is already a child node, increase its count
            fp_tree.children[items[0]].inc(count)
        else:
            # If items[0] is not a child node, create a new branch
            fp_tree.children[items[0]] = TreeNode(items[0], count, fp_tree)
            # Update the linked list for this item
            if header_table[items[0]][1] == None:
                header_table[items[0]][1] = fp_tree.children[items[0]]
            else:
                update_header(header_table[items[0]][1], fp_tree.children[items[0]])
        # Recursion
        if len(items) > 1:
            update_fp_tree(items[1:], fp_tree.children[items[0]], header_table, count)


    def create_fp_tree(trans_dict, threshold=400):
        """
        Create the FP-tree.

        :trans_dict: transaction dictionary
        :threshold: items with frequency below this bound are removed from
                    transactions. Default value = 400

        :return: fp_tree, header_table, filtered_trans_dict
        """
        header_table = {}

        # First scan: construct the header table
        for items in trans_dict:
            for item in items:
                header_table[item] = header_table.get(item, 0) + trans_dict[items]
        # Delete the elements with frequency < threshold
        for k in set(header_table.keys()):
            if header_table[k] < threshold:
                del(header_table[k])

        # Sort the header table by descending frequency
        header_table = dict(sorted(header_table.items(), key=lambda p: p[1], reverse=True))
        # Frequent items are those whose frequency >= threshold
        freq_itemsets = set(header_table.keys())

        if len(freq_itemsets) == 0:
            return None, None, None

        for k in header_table:
            header_table[k] = [header_table[k], None]

        # Initialize the FP-tree with a root node
        fp_tree = TreeNode('Null Set', 1, None)

        # Second scan: remove the infrequent items from each transaction and sort it
        filtered_trans_dict = {}
        for trans, count in trans_dict.items():
            local_dict = {}
            for item in trans:
                if item in freq_itemsets:
                    local_dict[item] = header_table[item][0]
            if len(local_dict) > 0:
                # Sort transaction items by descending frequency
                ordered_items = [x[0] for x in sorted(local_dict.items(), key=lambda p: p[1], reverse=True)]

                # Build the filtered transaction dictionary
                key = frozenset(ordered_items)
                if key not in filtered_trans_dict:
                    filtered_trans_dict[key] = count
                else:
                    filtered_trans_dict[key] += count

                # Update the FP-tree with the ordered items
                update_fp_tree(ordered_items, fp_tree, header_table, count)

        # Sort the filtered transaction dictionary for output
        filtered_trans_dict = dict(sorted(filtered_trans_dict.items(), key=lambda p: p[1], reverse=True))

        return fp_tree, header_table, filtered_trans_dict

Generate FP-Tree from txt file

    # Test for topic-0.txt
    items_bought = read_file("topic-0.txt")
    trans_dict = create_init_set(items_bought)
    fp_tree, header_table, filtered_trans_dict = create_fp_tree(trans_dict, threshold=400)

    # fp_tree.display()
    print(filtered_trans_dict)
    header_table
            {frozenset({'248'}): 668, frozenset({'390'}): 624, frozenset({'458'}): 500, frozenset({'298'}): 328, frozenset({'382'}): 324, frozenset({'118'}): 303, frozenset({'390', '382'}): 296, frozenset({'473'}): 217, frozenset({'723'}): 208, frozenset({'225'}): 170, frozenset({'382', '723'}): 123, frozenset({'382', '225'}): 108, frozenset({'473', '248'}): 67, frozenset({'458', '248'}): 52, frozenset({'298', '248'}): 47, frozenset({'390', '248'}): 45, frozenset({'390', '298'}): 38, frozenset({'473', '390'}): 37, frozenset({'118', '248'}): 37, frozenset({'382', '458'}): 33, frozenset({'382', '248'}): 28, frozenset({'118', '390'}): 26, frozenset({'225', '248'}): 23, frozenset({'390', '458'}): 23, frozenset({'390', '225'}): 23, frozenset({'390', '723'}): 23, frozenset({'390', '382', '723'}): 23, frozenset({'298', '458'}): 21, frozenset({'390', '382', '248'}): 20, frozenset({'390', '382', '298'}): 20, frozenset({'118', '723'}): 20, frozenset({'723', '298'}): 18, frozenset({'118', '458'}): 18, frozenset({'473', '382'}): 18, frozenset({'390', '382', '225'}): 15, frozenset({'382', '225', '248'}): 14, frozenset({'473', '118'}): 13, frozenset({'118', '298'}): 13, frozenset({'473', '382', '723'}): 12, frozenset({'118', '390', '382'}): 12, frozenset({'473', '723'}): 11, frozenset({'473', '225'}): 11, frozenset({'382', '723', '458'}): 11, frozenset({'382', '298'}): 11, frozenset({'473', '298'}): 9, frozenset({'458', '298', '248'}): 9, frozenset({'473', '458'}): 9, frozenset({'473', '390', '382'}): 9, frozenset({'225', '298'}): 9, frozenset({'118', '382'}): 9, frozenset({'723', '458'}): 9, frozenset({'473', '382', '225'}): 8, frozenset({'473', '118', '248'}): 8, frozenset({'118', '382', '723'}): 7, frozenset({'723', '248'}): 7, frozenset({'382', '723', '248'}): 7, frozenset({'458', '473', '248'}): 7, frozenset({'473', '390', '248'}): 6, frozenset({'382', '723', '298'}): 6, frozenset({'473', '382', '248'}): 5, frozenset({'118', '225'}): 5, frozenset({'118', '390', '723'}): 4, 
frozenset({'473', '390', '298'}): 4, frozenset({'390', '382', '458'}): 4, frozenset({'225', '723'}): 4, frozenset({'390', '382', '225', '248'}): 4, frozenset({'473', '118', '390'}): 3, frozenset({'118', '382', '458'}): 3, frozenset({'390', '382', '723', '298'}): 3, frozenset({'118', '382', '225'}): 3, frozenset({'473', '298', '458'}): 2, frozenset({'118', '390', '248'}): 2, frozenset({'458', '473', '723', '248'}): 2, frozenset({'382', '225', '298'}): 2, frozenset({'473', '118', '382'}): 2, frozenset({'390', '225', '248'}): 2, frozenset({'390', '723', '458'}): 2, frozenset({'473', '118', '382', '723'}): 2, frozenset({'473', '382', '225', '248'}): 2, frozenset({'473', '390', '382', '723'}): 2, frozenset({'118', '298', '458'}): 2, frozenset({'473', '390', '382', '225'}): 1, frozenset({'473', '723', '298'}): 1, frozenset({'473', '390', '225', '723'}): 1, frozenset({'225', '458'}): 1, frozenset({'118', '382', '248'}): 1, frozenset({'118', '723', '458'}): 1, frozenset({'473', '118', '298'}): 1, frozenset({'118', '390', '298'}): 1, frozenset({'473', '382', '723', '298'}): 1, frozenset({'118', '723', '248'}): 1, frozenset({'390', '723', '298'}): 1, frozenset({'298', '382', '248'}): 1, frozenset({'473', '225', '248'}): 1, frozenset({'225', '382', '723'}): 1, frozenset({'298', '382', '458'}): 1, frozenset({'473', '382', '723', '248'}): 1, frozenset({'225', '382', '723', '298'}): 1, frozenset({'473', '382', '298'}): 1, frozenset({'298', '390', '248'}): 1, frozenset({'118', '225', '248'}): 1, frozenset({'118', '298', '723', '248'}): 1, frozenset({'458', '225', '248'}): 1, frozenset({'390', '723', '248'}): 1, frozenset({'118', '390', '458'}): 1, frozenset({'118', '390', '225'}): 1, frozenset({'382', '248', '723', '473', '390'}): 1, frozenset({'473', '382', '723', '458'}): 1, frozenset({'118', '390', '382', '723'}): 1, frozenset({'225', '723', '298'}): 1, frozenset({'458', '390', '382', '248'}): 1, frozenset({'225', '723', '248'}): 1, frozenset({'473', '390', '723'}): 1, 
frozenset({'298', '390', '473', '248'}): 1, frozenset({'473', '118', '390', '723'}): 1, frozenset({'458', '118', '248'}): 1, frozenset({'473', '118', '723', '248'}): 1, frozenset({'473', '390', '382', '248'}): 1, frozenset({'473', '118', '723'}): 1, frozenset({'473', '118', '225'}): 1, frozenset({'118', '298', '248'}): 1, frozenset({'390', '225', '458'}): 1, frozenset({'458', '382', '248'}): 1, frozenset({'458', '390', '248'}): 1, frozenset({'298', '723', '248'}): 1, frozenset({'473', '382', '458'}): 1, frozenset({'458', '473', '382', '248'}): 1, frozenset({'473', '723', '248'}): 1, frozenset({'118', '723', '298'}): 1, frozenset({'473', '298', '248'}): 1, frozenset({'298', '248', '723', '473', '390'}): 1}                      
    {'390': [1288, <__main__.TreeNode at 0x2b915671518>],
     '382': [1163, <__main__.TreeNode at 0x2b915671320>],
     '248': [1087, <__main__.TreeNode at 0x2b915671278>],
     '458': [720, <__main__.TreeNode at 0x2b915671630>],
     '298': [560, <__main__.TreeNode at 0x2b915671588>],
     '723': [528, <__main__.TreeNode at 0x2b9156715f8>],
     '118': [509, <__main__.TreeNode at 0x2b915671208>],
     '473': [488, <__main__.TreeNode at 0x2b915671550>],
     '225': [416, <__main__.TreeNode at 0x2b915671400>]}

Begin to Mine FP-tree

To mine the FP-tree, we start from a given item and walk from each of its nodes toward the root to build the conditional pattern base. The pattern base is then used to generate the FP-conditional tree. Finally, by recursively mining the FP-conditional trees, we obtain the frequent itemset patterns.
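A conditional pattern base is simply a dict from prefix paths (frozensets) to counts, which create_fp_tree() can consume directly. A made-up example of the shape, with the support tally that create_fp_tree()'s first scan performs on it:

```python
# Hypothetical conditional pattern base for some suffix item:
# prefix path toward the root -> how often that path was traversed
cond_base = {
    frozenset(["390"]): 296,
    frozenset(["390", "248"]): 20,
    frozenset(["248"]): 28,
}

# Tally item supports inside the base, as create_fp_tree()'s first scan does
support = {}
for path, count in cond_base.items():
    for item in path:
        support[item] = support.get(item, 0) + count
print(support)  # '390' -> 296 + 20 = 316, '248' -> 20 + 28 = 48
```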

    # Ascend the FP-tree from a leaf toward the root
    def ascend_fp_tree(leaf_node, prefix_path):
        if leaf_node.parent != None:
            prefix_path.append(leaf_node.name)
            ascend_fp_tree(leaf_node.parent, prefix_path)


    # Find the conditional pattern base
    def find_cond_pat_base(base_pat, header_table):
        tree_node = header_table[base_pat][1]
        cond_pats = {}
        while tree_node != None:
            prefix_path = []
            ascend_fp_tree(tree_node, prefix_path)
            if len(prefix_path) > 1:
                cond_pats[frozenset(prefix_path[1:])] = tree_node.count
            tree_node = tree_node.node_link
        return cond_pats
    def mine_fp_tree(in_tree, header_table, threshold, pre_fix, freq_item_list, min_height):
        """
        Mine the frequent itemset patterns from the FP-tree.

        :in_tree: not used
        :threshold: the minimum support
        :pre_fix: the frequent itemset from the previous recursion
        :freq_item_list: final frequent itemset patterns to output
        :min_height: print conditional FP-trees taller than this height
        """
        # Note: the sort key must be p[1][0] (the support count); sorting by p[0]
        # (the item name) would be a bug
        items_list = [(v[0], v[1][0]) for v in sorted(header_table.items(), key=lambda p: p[1][0])]

        for item, count in items_list:
            new_freq_set = pre_fix.copy()
            new_freq_set.add(vocab_dict[item])
            freq_item_list.append([count, new_freq_set])

            # Construct the conditional pattern base
            cond_pattern_base = find_cond_pat_base(item, header_table)
            # print(cond_pattern_base)
            # Build the conditional FP-tree
            cond_tree, cond_header_table, _ = create_fp_tree(cond_pattern_base, threshold)
            # If the FP-conditional tree exists
            if cond_tree:
                # Report FP-conditional trees whose height exceeds min_height
                if cond_tree.get_height() > min_height:
                    print("height = " + str(cond_tree.get_height()))
                    print("list form of cond tree:")
                    print(cond_tree.transform_to_list())
                    print("display cond tree")
                    cond_tree.display()
            # If the header table exists, keep mining recursively
            if cond_header_table:
                mine_fp_tree(cond_tree, cond_header_table, threshold, new_freq_set, freq_item_list, min_height)
    def mine_from_txt(topic_file, pattern_file, threshold=400, min_height=1):
        """
        Mine frequent itemset patterns from a topic file.

        :topic_file: (string) the path of the topic file
        :pattern_file: (string) the path of the output pattern file
        :threshold: the minimum support
        :min_height: the minimum height of FP-conditional trees to print
        """
        items_bought = read_file(topic_file)  # read the topic txt file into a nested list
        trans_dict = create_init_set(items_bought)  # list => transaction dict

        fp_tree, header_table, _ = create_fp_tree(trans_dict, threshold)  # create FP-tree and header table

        freq_items = []  # frequent itemset pattern list
        mine_fp_tree(fp_tree, header_table, threshold, set([]), freq_items, min_height)

        # Sort freq_items by descending support count
        freq_items = sorted(freq_items, key=lambda p: p[0], reverse=True)
        print("frequent items pattern:")
        print(freq_items)

        # Write the patterns to the txt file
        # (indices were already replaced with terms via vocab_dict in mine_fp_tree)
        with open(pattern_file, "w") as f:
            for count, vocabs in freq_items:
                f.write(str(count) + "\t")
                f.write(" ".join(vocabs))
                f.write("\n")

Print the FP-conditional tree and write pattern in file

According to the problem description, for each topic we need to write a pattern.txt file, and we also need to find the FP-conditional trees with height greater than 1. Therefore we do this:

    # Process all topic files
    for i in range(5):
        topic_file = "topic-" + str(i) + ".txt"
        pattern_file = "pattern-" + str(i) + ".txt"
        print("processing " + topic_file)
        mine_from_txt(topic_file, pattern_file, threshold=400, min_height=1)
        print("write the pattern in " + pattern_file)
        print()
    processing topic-0.txt
    height = 2
    list form of cond tree:
    ['Null Set    1', ['data    413']]
    display cond tree
       Null Set   1
         390   413
    frequent items pattern:
    [[1288, {'data'}], [1163, {'mining'}], [1087, {'algorithm'}], [720, {'graph'}], [560, {'time'}], [528, {'pattern'}], [509, {'tree'}], [488, {'efficient'}], [416, {'rule'}], [413, {'mining', 'data'}]]
    write the pattern in pattern-0.txt

    processing topic-1.txt
    frequent items pattern:
    [[1488, {'learning'}], [1050, {'using'}], [819, {'model'}], [715, {'based'}], [582, {'classification'}], [488, {'feature'}], [474, {'clustering'}], [463, {'network'}]]
    write the pattern in pattern-1.txt

    processing topic-2.txt
    height = 2
    list form of cond tree:
    ['Null Set    1', ['information    475']]
    display cond tree
       Null Set   1
         190   475
    frequent items pattern:
    [[1226, {'web'}], [1211, {'information'}], [1114, {'retrieval'}], [863, {'based'}], [757, {'system'}], [707, {'search'}], [564, {'document'}], [490, {'language'}], [475, {'information', 'retrieval'}], [421, {'model'}], [414, {'semantic'}]]
    write the pattern in pattern-2.txt

    processing topic-3.txt
    frequent items pattern:
    [[1074, {'database'}], [928, {'system'}], [743, {'knowledge'}], [558, {'learning'}], [514, {'data'}], [506, {'logic'}], [493, {'reasoning'}], [446, {'model'}], [426, {'constraint'}]]
    write the pattern in pattern-3.txt

    processing topic-4.txt
    frequent items pattern:
    [[1713, {'query'}], [1163, {'database'}], [1040, {'data'}], [767, {'system'}], [637, {'processing'}], [567, {'distributed'}], [528, {'object'}], [508, {'efficient'}], [458, {'xml'}]]
    write the pattern in pattern-4.txt




Source: http://www.yelbee.top/index.php/archives/166/
