summaryrefslogtreecommitdiff
path: root/firmware/malloc/THOUGHTS
blob: 27517361da2ba59bc253c6685ec5f6fb59d64bed (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
		    ====================================
		    Memory Allocation Algorithm Theories
		    ====================================

GOAL
  It is intended to be a 100% working memory allocation system. It should be
  capable of replacing an ordinary Operating System's own routines. It should
  work good in a multitasking, shared memory, non-virtual memory environment
  without clogging the memory. Primary aimed for small machines, CPUs and
  memory amounts.

  I use a best-fit algorithm with a slight overhead in order to increase speed
  a lot. It should remain scalable and work good with very large amount of
  memory and free/used memory blocks too.

TERMINOLOGY

  FRAGMENT - small identically sized parts of a larger BLOCK, they are _not_
             allocated when traversed in lists etc 
  BLOCK    - large memory area, if used for FRAGMENTS, they are linked in a
	     lists. One list for each FRAGMENT size supported.
  TOP      - head struct that holds information about and points to a chain
	     of BLOCKS for a particular FRAGMENT size.
  CHUNK    - a contiguous area of free memory

MEMORY SYSTEM

  We split the system in two parts. One part allocates small memory amounts
  and one part allocates large memory amounts, but all allocations are done
  "through" the small-part-system. There is an option to use only the small
  system (and thus use the OS for large blocks) or the complete package.

##############################################################################
 SMALL SIZE ALLOCATIONS
##############################################################################

  Keywords for this system is 'Deferred Coalescing' and 'quick lists'.

  ALLOC

  * Small allocations are "aligned" upwards to a set of preset sizes. In the
    current implementation I use 20, 28, 52, 116, 312, 580, 1016, 2032 bytes.
    Memory allocations of these sizes are referred to as FRAGMENTS.
    (The reason for these specific sizes is the requirement that they must be
    32-bit aligned and fit as good as possible within 4064 bytes.)

  * Allocations larger than 2032 will get a BLOCK for that allocation only.

  * Each of these sizes has it's own TOP. When a FRAGMENT is requested, a
    larger BLOCK will be allocated and divided into many FRAGMENTS (all of the
    same size). TOP points to a list with BLOCKS that contains FRAGMENTS of
    the same size. Each BLOCK has a 'number of free FRAGMENTS' counter and so
    has each TOP (for the entire chain).

  * A BLOCK is around 4064 bytes plus the size of the information header. This
    size is adjusted to make the allocation of the big block not require more
    than 4096 bytes. (This might not be so easy to be sure of, if you don't
    know how the big-block system works, but the BMALLOC system uses an
    extra header of 12 bytes and the header for the FRAGMENT BLOCK is 20 bytes
    in a general 32-bit environment.)

  * In case the allocation of a BLOCK fails when a FRAGMENT is required, the
    next size of FRAGMENTS will be checked for a free FRAGMENT. First when the
    larger size lists have been tested without success it will fail for real.

  FREE

  * When FRAGMENTS are freed so that a BLOCK becomes non-used, it is returned
    to the system.

  * FREEing a fragment adds the buffer in a LIFO-order. That means that the
    next request for a fragment from the same list, the last freed buffer will
    be returned first.

  REALLOC

  * REALLOCATION of a FRAGMENT does first check if the new size would fit
    within the same FRAGMENT and if it would use the same FRAGMENT size. If it
    does and would, the same pointer is returned.

  OVERHEAD

   Yes, there is an overhead on small allocations (internal fragmentation).
  Yet, I do believe that small allocations more often than larger ones are
  used dynamically. I believe that a large overhead is not a big problem if it
  remains only for a while. The big gain is with the extreme speed we can GET
  and RETURN small allocations. This has yet to be proven. I am open to other
  systems of dealing with the small ones, but I don`t believe in using the
  same system for all sizes of allocations.

  IMPROVEMENT

   An addition to the above described algorithm is the `save-empty-BLOCKS-a-
  while-afterwards`. It will be used when the last used FRAGMENT within a
  BLOCK is freed. The BLOCK will then not get returned to the system until "a
  few more" FRAGMENTS have been freed in case the last [few] freed FRAGMENTS
  are allocated yet again (and thus prevent the huge overhead of making
  FRAGMENTS in a BLOCK). The "only" drawback of such a SEBAWA concept is
  that it would mean an even bigger overhead...

  HEADERS (in allocated data)

    FRAGMENTS - 32-bit pointer to its parent BLOCK (lowest bit must be 0)
    BLOCK     - 32-bit size (lowest bit must be 1 to separate this from
		FRAGMENTS)

##############################################################################
 LARGER ALLOCATIONS
##############################################################################

  If the requested size is larger than the largest FRAGMENT size supported,
  the allocation will be made for this memory area alone, or if a BLOCK is
  allocated to fit lots of FRAGMENTS a large block is also desired.

  * We add memory to the "system" with the add_pool() function call. It
    specifies the start and size of the new block of memory that will be
    used in this memory allocation system. Several add_pool() calls are
    supported and they may or may not add contiguous memory.

  * Make all blocks get allocated aligned to BLOCKSIZE (sometimes referred to
    as 'grain size'), 64 bytes in my implementation. Reports tell us there is
    no real gain in increasing the size of the align.

  * We link *all* pieces of memory (AREAS), free or not free. We keep the list
    in address order and thus when a FREE() occurs we know instantly if there
    are FREE CHUNKS wall-to-wall. No list "travels" needed. Requires some
    extra space in every allocated BLOCK. Still needs to put the new CHUNK in
    the right place in size-sorted list/tree. All memory areas, allocated or
    not, contain the following header:
    - size of this memory area (31 bits)
    - FREE status (1 bit)
    - pointer to the next AREA closest in memory (32 bits)
    - pointer to the prev AREA closest in memory (32 bits)
    (Totally 12 bytes)

  * Sort all FREE CHUNKS in size-order. We use a SPLAY TREE algorithm for
    maximum speed. Data/structs used for the size-sorting functions are kept
    in an abstraction layer away from this since it is really not changing
    anything (except executing speed).

  ALLOC (RSIZE - requested size, aligned properly)

  * Fetch a CHUNK that RSIZE fits within. If the found CHUNK is larger than
    RSIZE, split it and return the RSIZE to the caller. Link the new CHUNK
    into the list/tree.

  FREE (AREA - piece of memory that is returned to the system)

  * Since the allocated BLOCK has kept its link-pointers, we can without
    checking any list instantly see if there are any FREE CHUNKS that are
    wall-to-wall with the AREA (both sides). If the AREA *is* wall-to-wall
    with one or two CHUNKS that or they are unlinked from the lists, enlarged
    and re-linked into the lists.

  REALLOC
 
  * There IS NO realloc() of large blocks, they are performed in the previous
    layer (dmalloc).


##############################################################################
 FURTHER READING
##############################################################################
 
 * "Dynamic Storage Allocation: A Survey and Critical Review" (Paul R. Wilson,
   Mark S. Johnstone, Michael Neely, David Boles)
   ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps

 * "A Memory Allocator" (Doug Lea)
   http://g.oswego.edu/dl/html/malloc.html