The major change this quarter is the addition of a generic low-level performance monitoring system, developed in large part by Rémy Noël.
This system lets users track low-level events such as cycles, instruction fetches or cache misses, either per thread or per processor. The implementation is completely real-time friendly and places no limit on the number of users. The number of events that can be tracked at any one time is limited, currently to 8, which matches the number of hardware counters available on current processors; this limit can be changed through configuration. Drivers are available for the architectural interfaces of AMD and Intel processors.
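One way to reconcile an unlimited number of users with a fixed number of tracked events is to let users of the same event share a slot. The following C sketch illustrates that idea only; it is not taken from the actual implementation, and every name in it (`sketch_event_attach`, `sketch_event_detach`, `SKETCH_MAX_EVENTS`, the integer event identifiers) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the default limit of 8 tracked events (hypothetical name). */
#define SKETCH_MAX_EVENTS 8

/* Hypothetical event identifier, e.g. 1 = cycles, 2 = cache misses. */
typedef unsigned int sketch_event_id;

struct sketch_event_slot {
    sketch_event_id id;     /* event tracked by this slot */
    unsigned int nr_users;  /* 0 means the slot is free */
};

static struct sketch_event_slot sketch_slots[SKETCH_MAX_EVENTS];

/*
 * Attach a user to an event. Users of an already tracked event share
 * its slot, so only the number of distinct events is limited.
 * Returns the slot index, or -1 if every slot tracks another event.
 */
static int
sketch_event_attach(sketch_event_id id)
{
    int free_slot = -1;

    for (int i = 0; i < SKETCH_MAX_EVENTS; i++) {
        if (sketch_slots[i].nr_users == 0) {
            if (free_slot < 0) {
                free_slot = i;
            }
        } else if (sketch_slots[i].id == id) {
            sketch_slots[i].nr_users++;
            return i;
        }
    }

    if (free_slot < 0) {
        return -1;
    }

    sketch_slots[free_slot].id = id;
    sketch_slots[free_slot].nr_users = 1;
    return free_slot;
}

/* Detach one user; the slot becomes free when the last user leaves. */
static void
sketch_event_detach(int slot)
{
    assert(slot >= 0 && slot < SKETCH_MAX_EVENTS);
    assert(sketch_slots[slot].nr_users > 0);
    sketch_slots[slot].nr_users--;
}
```

A real kernel would also need locking and per-thread or per-processor bookkeeping, which this sketch leaves out.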
Another major change is the addition of local atomic operations. Atomic operations have been massively reworked so that local atomic operations may transparently be used instead. This improves performance when building for a single processor on architectures where local atomic operations are faster than regular atomic operations. It also enables atomic operations on targets with a single processor and no atomic instructions (in particular most classic ARM processors).
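As a rough illustration of the idea, the C sketch below selects between a regular atomic operation and a local one at build time. It assumes that, on a single processor, masking interrupts around a plain read-modify-write is enough to make it atomic with respect to the rest of the system. All names (`SKETCH_SINGLE_PROCESSOR`, `sketch_atomic_fetch_add`, `sketch_intr_save`, `sketch_intr_restore`) are hypothetical, and the interrupt functions are user-space stand-ins rather than real kernel primitives.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical build option: 1 when building for a single processor. */
#ifndef SKETCH_SINGLE_PROCESSOR
#define SKETCH_SINGLE_PROCESSOR 1
#endif

/*
 * Stand-ins for primitives that mask and unmask interrupts on the
 * local processor. Real ones would also act as compiler barriers.
 */
static unsigned int sketch_intr_disable_depth;

static void
sketch_intr_save(void)
{
    sketch_intr_disable_depth++;
}

static void
sketch_intr_restore(void)
{
    assert(sketch_intr_disable_depth > 0);
    sketch_intr_disable_depth--;
}

#if SKETCH_SINGLE_PROCESSOR

/* Local variant: no atomic instruction needed, interrupts are masked. */
typedef unsigned long sketch_atomic_t;

static unsigned long
sketch_atomic_fetch_add(sketch_atomic_t *ptr, unsigned long val)
{
    unsigned long prev;

    sketch_intr_save();
    prev = *ptr;
    *ptr = prev + val;
    sketch_intr_restore();
    return prev;
}

#else /* SKETCH_SINGLE_PROCESSOR */

/* Regular variant: defer to the C11 atomic operations. */
typedef atomic_ulong sketch_atomic_t;

static unsigned long
sketch_atomic_fetch_add(sketch_atomic_t *ptr, unsigned long val)
{
    return atomic_fetch_add_explicit(ptr, val, memory_order_relaxed);
}

#endif /* SKETCH_SINGLE_PROCESSOR */
```

Callers use `sketch_atomic_fetch_add()` the same way in both configurations, which is what makes the substitution transparent in this sketch.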
Less important changes include the addition of bulletins, a lightweight notification system; per-CPU operations, which allow modules to run callbacks on each processor as it boots; and support for an embedded symbol table, which avoids relying on the boot loader to provide it.
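The per-CPU operations mentioned above can be pictured as a registry of callbacks that the boot path of each processor walks through. The C sketch below is only a minimal illustration under that assumption; the names (`sketch_percpu_register_op`, `sketch_percpu_cpu_boot`, `SKETCH_MAX_OPS`) are hypothetical and do not reflect the actual interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on registered operations. */
#define SKETCH_MAX_OPS 16

typedef void (*sketch_percpu_op_fn)(unsigned int cpu, void *arg);

struct sketch_percpu_op {
    sketch_percpu_op_fn fn;
    void *arg;
};

static struct sketch_percpu_op sketch_ops[SKETCH_MAX_OPS];
static size_t sketch_nr_ops;

/* Called by modules during initialization. Returns 0 on success, -1 on error. */
static int
sketch_percpu_register_op(sketch_percpu_op_fn fn, void *arg)
{
    if (fn == NULL || sketch_nr_ops == SKETCH_MAX_OPS) {
        return -1;
    }

    sketch_ops[sketch_nr_ops].fn = fn;
    sketch_ops[sketch_nr_ops].arg = arg;
    sketch_nr_ops++;
    return 0;
}

/* Called once by each processor as it boots: run every registered callback. */
static void
sketch_percpu_cpu_boot(unsigned int cpu)
{
    for (size_t i = 0; i < sketch_nr_ops; i++) {
        sketch_ops[i].fn(cpu, sketch_ops[i].arg);
    }
}

/* Example callback: record in a bitmask which processors ran it. */
static void
sketch_mark_cpu(unsigned int cpu, void *arg)
{
    unsigned int *mask = arg;

    assert(cpu < 32);
    *mask |= 1u << cpu;
}
```

In this sketch registration is expected to finish before any secondary processor boots, so the table needs no locking.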
Bug fixes include:
- Inter-processor interrupt generation, which could suffer from data races and fail silently
- The consume memory order, which was too relaxed and has been reset to the compiler value
- Out-of-tree builds