Modern malloc implementations (and its possible earlier ones circa 2001 did too; remember I was only 21 then!) have a separation between "small" and "large" (and "huge"!) objects. Small objects (say, under a page size) will generally go in a pool of just those object sizes. Large objects (from say page size to something larger, like a megabyte) will be allocated a multiple of pages.
This unfortunately means that my 4096 + 12 byte structure may suddenly take 8192 bytes of RAM! Oops.
I decided to test this out. This is what happens when you do that with FreeBSD's allocator. Henrik has tried this under GNUMalloc and has found that the 4108 byte allocation doesn't take two pages.
[adrian@sarah ~]$ ./test1 test1 131072
allocating 12, then 4096 byte structures 131072 times..
[adrian@sarah ~]$ ./test1 test2 131072
allocating 4108 byte structure 131072 times..