$Id$ 1. Introduction NB: Wmem still does not provide all of the functionality of emem (see README.malloc), although it should provide most of it. New code may still need to use emem for the time being. The 'emem' memory manager (described in README.malloc) has been a part of Wireshark since 2005 and has served us well, but is starting to show its age. The framework has become increasingly difficult to maintain, and limitations in the API have blocked progress on other long-term goals such as multi- threading, and opening multiple files at once. The 'wmem' memory manager is an attempt to write a new memory management framework to replace emem. It provides a significantly updated API, a more modular design, and it isn't all jammed into one 2500-line file. Wmem was originally conceived in this email to the wireshark-dev mailing list: https://www.wireshark.org/lists/wireshark-dev/201210/msg00178.html The wmem code can now be found in epan/wmem/ in the Wireshark source tree. 2. Usage for Consumers If you're writing a dissector, or other "userspace" code, then using wmem should be very similar to using emem. All you need to do is include the header (epan/wmem/wmem.h) and get a handle to a memory pool (if you want to *create* a memory pool, see the section "3. Usage for Producers" below). A memory pool is an opaque pointer to an object of type wmem_allocator_t, and it is the very first parameter passed to almost every call you make to wmem. Other than that parameter (and the fact that functions are prefixed wmem_ instead of ep_ or se_) usage is exactly like that of emem. For example: wmem_alloc(myPool, 20); allocates 20 bytes in the pool pointed to by myPool. 2.1 Available Pools 2.1.1 (Sort Of) Global Pools Dissectors that include the wmem header file will have three pools available to them automatically: wmem_packet_scope(), wmem_file_scope() and wmem_epan_scope(); The packet pool is scoped to the dissection of each packet, replacing emem's ep_ allocators. The file pool is scoped to the dissection of each file, replacing emem's se_ allocators. For example: ep_malloc(32); se_malloc(sizeof(guint)); could be replaced with wmem_alloc(wmem_packet_scope(), 32); wmem_alloc(wmem_file_scope(), sizeof(guint)); NB: Using these pools outside of the appropriate scope (eg using the packet pool when there isn't a packet being dissected) will throw an assertion. See the comment in epan/wmem/wmem_scopes.c for details. The epan pool is scoped to the library's lifetime - memory allocated in it is not freed until epan_cleanup() is called, which is typically at the end of the program. 2.1.2 Pinfo Pool Certain places (such as AT_STRINGZ address allocations) need their memory to stay around a little longer than the usual packet scope - basically until the next packet is dissected. This is effectively the scope of Wireshark's pinfo structure, so the pinfo struct has a 'pool' member which is a wmem pool scoped to the lifetime of the pinfo struct. 2.2 API Full documentation for each function (parameters, return values, behaviours) lives (or will live) in Doxygen-format in the header files for those functions. This is just an overview of which header files you should be looking at. 2.2.1 Core API wmem_core.h - Basic memory management functions like malloc, realloc and free. 2.2.2 Strings wmem_strutl.h - Utility functions for manipulating null-terminated C-style strings. Functions like strdup and strdup_printf. wmem_strbuf.h - A managed string object implementation, similar to std::string in C++ or GString from Glib. 2.2.3 Container Data Structures wmem_slist.h - A singly-linked list implementation. wmem_stack.h - A stack implementation (push, pop, etc). 2.3 Callbacks WARNING: You probably don't actually need these; use them only when you're sure you understand the dangers. Sometimes (though hopefully rarely) it may be necessary to store data in a wmem pool that requires additional cleanup before it is freed. For example, perhaps you have a pointer to a file-handle that needs to be closed. In this case, you can register a callback with the wmem_register_cleanup_callback function declared in wmem_user_cb.h. This function takes as parameters: - the allocator - boolean indicating whether or not the callback should be recurring (ie happens once if FALSE, or every time if TRUE) - the callback function (signature defined in wmem_user_cb.h) - a void user_data pointer Every time the memory in a pool is freed, all registered cleanup functions are called first, being passed: - a pointer to the allocator - a boolean indicating if this is just a free_all (FALSE) or if this was triggered by destruction of the entire pool (TRUE) - mostly useful for recurring callbacks - whatever user_data was registered with that callback. Note that the user_data pointer is not freed when a callback is finished, you have to do that yourself in the callback, or just allocate it in the appropriate wmem pool. Also note that callback calling order is not defined, you cannot rely on a certain callback being called before or after another. WARNING: Manually freeing memory with wmem_free will NOT trigger any callbacks. It is an error to call wmem_free on memory if you have a callback registered to deal with the contents of a that memory. 3. Usage for Producers NB: If you're just writing a dissector, you probably don't need to read this section. One of the problems with the old emem framework was that there were basically two allocator backends (glib and mmap) that were all mixed together in a mess of if statements, environment variables and #ifdefs. In wmem the different allocator backends are cleanly separated out, and it's up to the owner of the pool to pick one. 3.1 Available Allocator Back-Ends Each available allocator type has a corresponding entry in the wmem_allocator_type_t enumeration defined in wmem_core.h. The currently available allocators are: - WMEM_ALLOCATOR_SIMPLE (wmem_allocator_simple.*) A trivial allocator that g_allocs requested memory and tracks allocations via a GHashTable. As simple as possible, intended more as a demo than for practical usage. Also has the benefit of being friendly to tools like valgrind. - WMEM_ALLOCATOR_BLOCK (wmem_allocator_block.*) A block allocator that grabs large chunks of memory at a time (8 MB currently) and serves allocations out of those chunks. Designed for efficiency, especially in the free_all operation. - WMEM_ALLOCATOR_STRICT (wmem_allocator_strict.*) An allocator that does its best to find invalid memory usage via things like canaries and scrubbing freed memory. Valgrind is the better choice on platforms that support it. 3.2 Creating a Pool To create a pool, include the regular wmem header and call the wmem_allocator_new() function with the appropriate type value. For example: #include "wmem/wmem.h" wmem_allocator_t *myPool; myPool = wmem_allocator_new(WMEM_ALLOCATOR_SIMPLE); From here on in, you don't need to remember which type of allocator you used (although allocator authors are welcome to expose additional allocator-specific helper functions in their headers). The "myPool" variable can be passed around and used as normal in allocation requests as described in section 2 of this document. 3.3 Destroying a Pool Regardless of which allocator you used to create a pool, it can be destroyed with a call to the function wmem_destroy_allocator(). For example: #include "wmem/wmem.h" wmem_allocator_t *myPool; myPool = wmem_allocator_new(WMEM_ALLOCATOR_SIMPLE); /* Allocate some memory in myPool ... */ wmem_destroy_allocator(myPool); Destroying a pool will free all the memory allocated in it. 3.4 Reusing a Pool It is possible to free all the memory in a pool without destroying it, allowing it to be reused later. Depending on the type of allocator, doing this (by calling wmem_free_all()) can be significantly cheaper than fully destroying and recreating the pool. This method is therefore recommended, especially when the pool would otherwise be scoped to a single iteration of a loop. For example: #include "wmem/wmem.h" wmem_allocator_t *myPool; myPool = wmem_allocator_new(WMEM_ALLOCATOR_SIMPLE); for (...) { /* Allocate some memory in myPool ... */ /* Free the memory, faster than destroying and recreating the pool each time through the loop. */ wmem_free_all(myPool); } wmem_destroy_allocator(myPool); 4. Internal Design Despite being written in Wireshark's standard C90, wmem follows a fairly object-oriented design pattern. Although efficiency is always a concern, the primary goals in writing wmem were maintainability and preventing memory leaks. 4.1 struct _wmem_allocator_t The heart of wmem is the _wmem_allocator_t structure defined in the wmem_allocator.h header file. This structure uses C function pointers to implement a common object-oriented design pattern known as an interface (also known as an abstract class to those who are more familiar with C++). Different allocator implementations can provide exactly the same interface by assigning their own functions to the members of an instance of the structure. The structure has eight members in three groups. 4.1.1 Implementation Details - private_data - type The private_data pointer is a void pointer that the allocator implementation can use to store whatever internal structures it needs. A pointer to private_data is passed to almost all of the other functions that the allocator implementation must define. The type field is an enumeration of type wmem_allocator_type_t (see section 3.1). Its value is set by the wmem_allocator_new() function, not by the implementation-specific constructor. This field should be considered read-only by the allocator implementation. 4.1.2 Consumer Functions - alloc() - free() - realloc() These function pointers should be set to functions with semantics obviously similar to their standard-library namesakes. Each one takes an extra parameter that is a copy of the allocator's private_data pointer. Note that realloc() and free() are not expected to be called directly by user code in most cases - they are primarily optimisations for use by data structures that wmem might want to implement (it's hard, for example, to implement a dynamically sized array without some form of realloc). Also note that allocators do not have to handle NULL pointers or 0-length requests in any way - those checks are done in an allocator-agnostic way higher up in wmem. Allocator authors can assume that all incoming pointers (to realloc and free) are non-NULL, and that all incoming lengths (to malloc and realloc) are non-0. 4.1.3 Producer/Manager Functions - free_all() - gc() - destroy() The free_all() function takes the private_data pointer and should free all the memory currently allocated in the pool. Note that this is not necessarilly exactly the same as calling free() on all the allocated blocks - free_all() is allowed to do additional cleanup or to make use of optimizations not available when freeing one block at a time. The gc() function takes the private_data pointer and should do whatever it can to reduce excess memory usage in the dissector by returning unused blocks to the OS, optimizing internal data structures, etc. The destroy() function does NOT take the private_data pointer - it instead takes a pointer to the allocator structure as a whole, since that structure may also need freeing. This function can assume that free_all() has been called immediately before it (though it can make no assumptions about whether or not gc() has ever been called). 4.2 Pool-Agnostic API One of the issues with emem was that the API (including the public data structures) required wrapper functions for each scope implemented. Even if there was a stack implementation in emem, it wasn't necessarily available for use with file-scope memory unless someone took the time to write se_stack_ wrapper functions for the interface. In wmem, all public APIs take the pool as the first argument, so that they can be written once and used with any available memory pool. Data structures like wmem's stack implementation only take the pool when created - the provided pointer is stored internally with the data structure, and subsequent calls (like push and pop) will take the stack itself instead of the pool. 4.3 Debugging The primary debugging control for wmem is the WIRESHARK_DEBUG_WMEM_OVERRIDE environment variable. If set, this value forces all calls to wmem_allocator_new() to return the same type of allocator, regardless of which type is requested normally by the code. It currently has three valid values: - The value "simple" forces the use of WMEM_ALLOCATOR_SIMPLE. The valgrind script currently sets this value, since the simple allocator is the only one whose memory allocations are trackable properly by valgrind. - The value "strict" forces the use of WMEM_ALLOCATOR_STRICT. The fuzz-test script currently sets this value, since the goal when fuzz-testing is to find as many errors as possible. - The value "block" forces the use of WMEM_ALLOCATOR_BLOCK. This is not currently used by any scripts, but is useful for stress-testing the block allocator. Note that regardless of the value of this variable, it will always be safe to call allocator-specific helpers functions. They are required to be safe no-ops if the allocator argument is of the wrong type. 4.4 Testing There is a simple test suite for wmem that lives in the file wmem_test.c and should get automatically built into the binary 'wmem_test' when building Wireshark. It contains at least basic tests for all existing functionality. The suite is run automatically by the build-bots via the shell script test/test.sh which calls out to test/suite-unittests.sh. New features added to wmem (allocators, data structures, utility functions, etc.) must also have tests added to this suite. The test suite could potentially use a clean-up by someone more intimately familiar with Glib's testing framework, but it does the job. 5. TODO List The following is an incomplete list of things that emem provides but wmem has not yet implemented: - red-black tree - tvb_memdup The following is a list of things that emem doesn't provide but that it might be nice if wmem did provide them: - radix tree - dynamic array - hash table /* * Editor modelines - http://www.wireshark.org/tools/modelines.html * * Local variables: * c-basic-offset: 4 * tab-width: 8 * indent-tabs-mode: nil * End: * * vi: set shiftwidth=4 tabstop=8 expandtab: * :indentSize=4:tabSize=8:noTabs=true: */