performance - Fast(est) Mutable Heap Implementation in C++ -

i'm looking fastest data structure in c++ fulfils requirements:

i start few million entries need inserted.
in each iteration want peek @ maximum element , update around 10 other elements. decreasing keys, prefer update (increase , decrease functionality).

i don't need deletes/insertions (apart initial one) nor else. thought heap preferable choice. after looking stl found data structures not support update (which crucial part). solution delete , reinsert elements seems quite slow (bottleneck of program). looked @ heaps provided boost , found pairing_heap gives me best results. heaps are, however, still slower delete/insert process on multimap. have suggestion, other approach/implementation try?

thank much.

again completeness sake, here current timings:

multimap stl (delete/insert): ~70 sec
fibonacci boost: ~110 sec
d-ary heap boost ~(best choice: d=150): ~120 sec
pairing heap boost: ~90 sec
skew heap boost: 105 sec

edited post clarify few things:

my entries doubles (double key, still have arbitrary value attached it) why don't think hashing idea.
i talking priorityqueue incorrect. instead first implementation used multimap values erased , reinserted (with new value). updated post. sorry confusion.
i don't see how std::make_heap can solve problem.
to update elements have separate lookup table in maintain handle element. can directly update element without searching it.

reducing generality

i tried hand @ since interesting , think use such structure myself. it's extremely difficult beat professionally-written standard libraries unless can make narrowing assumptions restrict generality (and possibly robustness) of solution more theirs.

for example, extremely hard beat malloc , free if goal write memory allocator general-purpose , well-rounded malloc. however, if make lot of assumptions specific use case, it's easy write allocator beats these specific case.

it's why, if have questions data structures, suggest providing information peculiar details of specific use case possible (what need out of structure iteration, sorting, searching, removal middle, insertion front, range of keys, types, etc). test code (even pseudocode) of want helpful. want many of narrowing assumptions possible.

sub-par search algorithms highly dynamic content

in case, have peculiarly dynamic use of such large heap-type structure. come old school gaming background , in kinds of dynamic cases, sub-par search algorithm works better 1 has superior algorithmic complexity since search superiority comes price tag of more expensive build/update process (slower insertions/removals).

for example, if have loads of sprites moving around every frame in 2d game, evenly crudely-written, algorithmically-inferior fixed-grid accelerator nearest-neighbor searches can work better in practice algorithmically-superior, professionally-written quad-tree, cost of moving things around , rebalancing tree can add overhead outweighs superior acceleration , theoretical logarithmic complexity of everything. fixed grid has pathological cases when sprites bunch in 1 area, happens.

so took kind of strategy. i'm making assumptions, biggest of keys reasonably distributed across range. other can approximate maximum number of elements container should handle (though can go on or under number, works best crude knowledge , anticipation). , didn't bother provide iterators iterate through container (it's possible if want, keys won't sorted std::multimap, they'll 'kinda' sorted), offer removals middle in addition popping element minimum key. pathological case exists in solution doesn't exist in, say, std::multimap, if have loads of elements keys same value (ex: 0.000001, 0.00000012, 0.000000011, etc) million elements, degrades linear search through elements , performs considerably worse multimap.

but got solution ~8 times faster std::multmap if assumptions fit use case.

note: hasty code , written lot of quick , dirty profiler-assisted micro-optimizations, providing pool allocator , manipulating things @ bits , bytes level alignment assumptions (using max alignment assuming that's "portable enough"). doesn't bother things exception safety. should safe use c++ objects, however.

as test case, created million random keys , started popping minimum keys, changing them, , reinserting them. did both multimaps , structure compare performance.

balanced-distribution heap/priority queue (kinda)

#include <iostream> #include <cassert> #include <utility> #include <stdexcept> #include <algorithm> #include <cmath> #include <ctime> #include <map> #include <vector> #include <malloc.h>  // max alignment #if defined(_msc_ver)     #define max_align __declspec(align(16)) #else     #define max_align __attribute__((aligned(16))) #endif  using namespace std;  static void* max_malloc(size_t amount) {     #ifdef _msc_ver         return _aligned_malloc(amount, 16);     #else         void* mem = 0;         posix_memalign(&mem, 16, amount);         return mem;     #endif }  static void max_free(void* mem) {     #ifdef _msc_ver         return _aligned_free(mem);     #else         free(mem);     #endif }  // balanced priority queue quick insertions ,  // removals when keys balanced across distributed range. template <class key, class value, class keytoindex> class balancedqueue { public:     enum {zone_len = 256};      /// creates queue 'n' buckets.     explicit balancedqueue(int n):          num_nodes(0), num_buckets(n+1), min_bucket(n+1), buckets(static_cast<bucket*>(max_malloc((n+1) * sizeof(bucket)))), free_nodes(0), pools(0)     {         const int num_zones = num_buckets / zone_len + 1;         zone_counts = new int[num_zones];         (int j=0; j < num_zones; ++j)             zone_counts[j] = 0;          (int j=0; j < num_buckets; ++j)         {             buckets[j].num = 0;             buckets[j].head = 0;         }     }      /// destroys queue.     ~balancedqueue()     {         clear();         max_free(buckets);         while (pools)         {             pool* to_free = pools;             pools = pools->next;             max_free(to_free);         }         delete[] zone_counts;     }      /// makes queue empty.     void clear()     {         const int num_zones = num_buckets / zone_len + 1;         (int j=0; j < num_zones; ++j)             zone_counts[j] = 0;         (int j=0; j < num_buckets; ++j)         {             while (buckets[j].head)             {                 node* to_free = buckets[j].head;                 buckets[j].head = buckets[j].head->next;                 node_free(to_free);             }             buckets[j].num = 0;         }         num_nodes = 0;         min_bucket = num_buckets+1;     }      /// pushes element queue.     void push(const key& key, const value& value)     {         const int index = keytoindex()(key);         assert(index >= 0 && index < num_buckets && "key out of range!");          node* new_node = node_alloc();         new (&new_node->key) key(key);         new (&new_node->value) value(value);         new_node->next = buckets[index].head;         buckets[index].head = new_node;         assert(new_node->key == key && new_node->value == value);         ++num_nodes;         ++buckets[index].num;         ++zone_counts[index/zone_len];         min_bucket = std::min(min_bucket, index);     }      /// @return size() == 0.     bool empty() const     {         return num_nodes == 0;     }      /// @return number of elements in queue.     int size() const     {         return num_nodes;     }      /// pops element minimum key queue.     std::pair<key, value> pop()     {         assert(!empty() && "queue empty!");         (int j=min_bucket; j < num_buckets; ++j)         {             if (buckets[j].head)             {                 node* node = buckets[j].head;                 node* prev_node = node;                 node* min_node = node;                 node* prev_min_node = 0;                 const key* min_key = &min_node->key;                 const value* min_val = &min_node->value;                 (node = node->next; node; prev_node = node, node = node->next)                 {                     if (node->key < *min_key)                     {                         prev_min_node = prev_node;                         min_node = node;                         min_key = &min_node->key;                         min_val = &min_node->value;                     }                 }                 std::pair<key, value> kv(*min_key, *min_val);                 if (min_node == buckets[j].head)                     buckets[j].head = buckets[j].head->next;                 else                 {                     assert(prev_min_node);                     prev_min_node->next = min_node->next;                 }                 removed_node(j);                 node_free(min_node);                 return kv;             }         }         throw std::runtime_error("trying pop empty queue.");     }      /// erases element middle of queue.     /// @return true if element found , removed.     bool erase(const key& key, const value& value)     {         assert(!empty() && "queue empty!");         const int index = keytoindex()(key);         if (buckets[index].head)         {             node* node = buckets[index].head;             if (node_key(node) == key && node_val(node) == value)             {                 buckets[index].head = buckets[index].head->next;                 removed_node(index);                 node_free(node);                 return true;             }              node* prev_node = node;             (node = node->next; node; prev_node = node, node = node->next)             {                 if (node_key(node) == key && node_val(node) == value)                 {                     prev_node->next = node->next;                     removed_node(index);                     node_free(node);                     return true;                 }             }         }         return false;     }  private:     // didn't bother make copyable -- left exercise.     balancedqueue(const balancedqueue&);     balancedqueue& operator=(const balancedqueue&);      struct node     {         key key;         value value;         node* next;     };     struct bucket     {         int num;         node* head;     };     struct pool     {         pool* next;         max_align char buf[1];     };     node* node_alloc()     {         if (free_nodes)         {             node* node = free_nodes;             free_nodes = free_nodes->next;             return node;         }          const int pool_size = std::max(4096, static_cast<int>(sizeof(node)));         pool* new_pool = static_cast<pool*>(max_malloc(sizeof(pool) + pool_size - 1));         new_pool->next = pools;         pools = new_pool;          // push new pool's nodes free stack.         (int j=0; j < pool_size; j += sizeof(node))         {             node* node = reinterpret_cast<node*>(new_pool->buf + j);             node->next = free_nodes;             free_nodes = node;         }         return node_alloc();     }     void node_free(node* node)     {         // destroy key , value , push node free stack.         node->key.~key();         node->value.~value();         node->next = free_nodes;         free_nodes = node;     }     void removed_node(int bucket_index)     {         --num_nodes;         --zone_counts[bucket_index/zone_len];         if (--buckets[bucket_index].num == 0 && bucket_index == min_bucket)         {             // if bucket became empty, search next occupied minimum zone.             const int num_zones = num_buckets / zone_len + 1;             (int j=bucket_index/zone_len; j < num_zones; ++j)             {                 if (zone_counts[j] > 0)                 {                     (min_bucket=j*zone_len; min_bucket < num_buckets && buckets[min_bucket].num == 0; ++min_bucket) {}                     assert(min_bucket/zone_len == j);                     return;                 }             }             min_bucket = num_buckets+1;             assert(empty());         }     }     int* zone_counts;     int num_nodes;     int num_buckets;     int min_bucket;     bucket* buckets;     node* free_nodes;     pool* pools; };  /// test parameters enum {num_keys = 1000000}; enum {buckets = 100000};  static double sys_time() {     return static_cast<double>(clock()) / clocks_per_sec; }  struct keytoindex {     int operator()(double val) const     {         return static_cast<int>(val * buckets);     } };  int main() {     vector<double> keys(num_keys);     (int j=0; j < num_keys; ++j)         keys[j] = static_cast<double>(rand()) / rand_max;      (int k=0; k < 5; ++k)     {         // multimap         {             const double start_time = sys_time();             multimap<double, int> q;             (int j=0; j < num_keys; ++j)                 q.insert(make_pair(keys[j], j));              // pop each key, modify it, , reinsert.             (int j=0; j < num_keys; ++j)             {                 pair<double, int> top = *q.begin();                 q.erase(q.begin());                 top.first = static_cast<double>(rand()) / rand_max;                 q.insert(top);             }             cout << (sys_time() - start_time) << " secs multimap" << endl;         }          // balanced queue         {             const double start_time = sys_time();             balancedqueue<double, int, keytoindex> q(buckets);             (int j=0; j < num_keys; ++j)                 q.push(keys[j], j);              // pop each key, modify it, , reinsert.             (int j=0; j < num_keys; ++j)             {                 pair<double, int> top = q.pop();                 top.first = static_cast<double>(rand()) / rand_max;                 q.push(top.first, top.second);             }             cout << (sys_time() - start_time) << " secs balancedqueue" << endl;         }         cout << endl;     } }

results on machine:

3.023 secs multimap 0.34 secs balancedqueue  2.807 secs multimap 0.351 secs balancedqueue  2.771 secs multimap 0.337 secs balancedqueue  2.752 secs multimap 0.338 secs balancedqueue  2.742 secs multimap 0.334 secs balancedqueue

Search This Blog

Ruby Code