A new version of tracy has been released. A short summary of the new features:
Complete list of features:
- Breaking change: the format of trace files has changed.
  - Previous tracy versions will crash when trying to open new traces.
  - Loading traces saved by the previous version is supported.
  - Tracy will no longer crash when trying to load traces saved by future
    versions. Instead, a dialog advising to update will be displayed.
  - Tracy will no longer crash in most cases when trying to open files that
    are not traces. Some crashes are still possible, due to support of old,
- Ability to track every memory allocation in profiled program.
  - Allocation event queuing must be done in order, which requires exclusive
    access to the serialized queue on the client side. This has no effect on
    the rest of events, which are stored in a concurrent queue, as before.
  - You can search for a memory address and see where it was allocated, for
    how long, etc. This lists all matching allocations since the program was
  - All active (non-freed) allocations may be listed. This shows the current
    memory state by default, but can go back to any point in time.
  - Graphical representation of process memory map may be displayed. New
    allocations/frees are displayed in a bright color and fade out with
    time. This feature also can look back in time.
  - Memory usage plot is automatically generated.
  - Basic allocation information is displayed in memory plot tooltips.
  - A summary of memory events within a zone (and its children) is now
    printed in zone info window.
  - Support loading profile dumps with no memory allocation data (generated by
- Added ability to display global statistics of a selected zone from the
zone info window.
- Fixed regression with lock announce processing that appeared during
- Allow selecting/unselecting all locks for display.
- Performance improvements.
- Don't save unneeded lock information in trace file.
- Don't save trash in message list data.
- Allow expanding view span up to one hour, instead of one minute.
- Added trace comparison window.
  - An external trace has to be loaded first.
  - Zone query in both traces (current and external).
  - Both results are overlaid on the same histogram.
  - Graphs can be adjusted as if there was the same number of zones
- Read time directly from a hardware register on ARM/ARM64, if possible.
  - User-space access to the timer needs to be enabled in the kernel, so
    tracy will perform run-time checks and fall back to the old method if the
- Prevent connections in a TIME-WAIT state from blocking new listen
- Display y-range of plots.
- Added ability to unload traces loaded from files. To do so close the main
profiler window. You will return to the connect/open selection dialog.
Live captures cannot be terminated this way.
- Zones previously displayed in zone info window are remembered and you can
go back to them. Closing the zone info window or switching between CPU and
GPU zones will clear the memory.
- Improved message list window.
  - Messages are now displayed in columns.
  - Originating thread of each message is now included in the list.
- You can now navigate to next and previous frame.
- Zone statistics can now be displayed using only self times.
- Support for tracing GPU events using Vulkan.
- Timeline will now display "OpenGL context" or "Vulkan context" instead of
- Fixed regression causing invalid display of GPU context appearance time.
- Fixed regression causing invalid reporting of an active CPU in zone end
events, if MSVC rdtscp optimization was not enabled.
- Ability to collect true call stacks.
  - Supported on Windows, Linux, Android.
  - The following events can collect call stacks:
    - Memory alloc/free.
    - Zone begin.
    - GPU zone begin.
  - Zone stack trace now also displays frames from a real call trace.
  - On Linux call stack frame name resolution requires a call to dladdr,
    which in turn requires linking with libdl.
- Allow manual entry of GPU time drift value.
- Unix build system no longer shares object files between different build
  - Fixes inability to build debug and release versions of a single utility
    without "make clean".
  - Fixes incompatibility between "standalone" and "capture" utilities due
    to different set of used feature flags.
- On Windows "standalone" utility now adapts to system DPI setting.
- Optional per-call zone naming.
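
The memory tracking items in the list above (listing active allocations at any point in time, the automatically generated usage plot) all depend on allocation events being stored in order. The following is a minimal C++ sketch of that idea, not Tracy's actual data structures: `MemEvent`, `AllocationLog`, and every member name are hypothetical, and a plain mutex stands in for whatever exclusive-access mechanism the client really uses.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <vector>

// Hypothetical event record: one entry per alloc or free, in program order.
struct MemEvent {
    enum class Kind { Alloc, Free };
    Kind kind;
    std::uint64_t time;  // timestamp in any monotonic unit
    std::uint64_t addr;  // address returned by, or passed to, the allocator
    std::uint64_t size;  // bytes; unused for Free
};

// Hypothetical serialized log. The mutex gives each producer exclusive
// access, so events are appended in the order they happened.
class AllocationLog {
public:
    void OnAlloc(std::uint64_t time, std::uint64_t addr, std::uint64_t size) {
        std::lock_guard<std::mutex> lock(m_lock);
        // Assumption of this sketch: timestamps never go backwards.
        assert(m_events.empty() || m_events.back().time <= time);
        m_events.push_back({MemEvent::Kind::Alloc, time, addr, size});
    }

    void OnFree(std::uint64_t time, std::uint64_t addr) {
        std::lock_guard<std::mutex> lock(m_lock);
        assert(m_events.empty() || m_events.back().time <= time);
        m_events.push_back({MemEvent::Kind::Free, time, addr, 0});
    }

    // Replays events up to and including `time` and returns address -> size
    // for every allocation that is still live at that moment.
    std::map<std::uint64_t, std::uint64_t> ActiveAt(std::uint64_t time) const {
        std::lock_guard<std::mutex> lock(m_lock);
        std::map<std::uint64_t, std::uint64_t> live;
        for (const MemEvent& ev : m_events) {
            if (ev.time > time) break;  // valid because events are ordered
            if (ev.kind == MemEvent::Kind::Alloc) live[ev.addr] = ev.size;
            else live.erase(ev.addr);
        }
        return live;
    }

    // Total live bytes at `time`, i.e. one sample of a memory usage plot.
    std::uint64_t UsageAt(std::uint64_t time) const {
        std::uint64_t total = 0;
        for (const auto& kv : ActiveAt(time)) total += kv.second;
        return total;
    }

private:
    mutable std::mutex m_lock;
    std::vector<MemEvent> m_events;
};
```

Because the log is ordered, "going back in time" is just a replay that stops early; an out-of-order log would need sorting or per-address bookkeeping before it could answer the same query.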
Tracy is a real time, nanosecond resolution frame profiler that can be
used for remote or embedded telemetry of your application. It can
profile both CPU (C++, Lua) and GPU (OpenGL). It also can display locks
held by threads and their interactions with each other.
Tracy requires compiler support for C++11, Thread Local Storage and a
way to work around the static initialization order fiasco. There are no
other requirements. The following platforms are confirmed to be working
(this is not a complete list):
Q: I already use VTune/perf/Very Sleepy/callgrind/MSVC profiler. A: These are statistical profilers, which can be used to find hot spots in the code. This is very useful, but it won't show you the underlying reason for semi-random frame stutter that may occur every couple of seconds.
Q: You can use Telemetry for that. A: A Telemetry license costs about $8000 per year. Tracy is open source software. Telemetry doesn't have Lua bindings.
Q: You can use the free Brofiler. Crytek does use it, so it has to be good. A: After a cursory look at the Brofiler code I can tell that the timer resolution there is at 300 ns. Tracy can achieve 5 ns timer resolution. Brofiler event logging infrastructure seems to be overengineered. Brofiler can't track lock contention, nor does it have Lua bindings.
Q: So tracy is supposedly faster? A: My measurements show that logging a single zone with tracy takes only 15 ns. In theory, if the program was doing nothing else, tracy should be able to log 66 million zones per second.
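
The 66 million figure is simply the reciprocal of the per-zone cost quoted in the answer above. A small C++ check of that arithmetic follows; the function name `ZonesPerSecond` is made up for this illustration, and the result is a theoretical ceiling for a single thread that does nothing but log zones.

```cpp
#include <cstdint>

// Upper bound on zones logged per second, given the cost of logging one
// zone in nanoseconds (integer division, so the result is rounded down).
constexpr std::uint64_t ZonesPerSecond(std::uint64_t nsPerZone) {
    return 1000000000ull / nsPerZone;
}

// 15 ns per zone gives roughly 66 million zones per second.
static_assert(ZonesPerSecond(15) == 66666666ull, "1e9 / 15, rounded down");
```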
Q: Bullshit, RAD is advertising that they are only able to log about a million zones, over the network nevertheless: "Capture over a million timing zones per second in real-time!" A: Tracy can perform network transfer of 15 million zones per second. Should the client and server be on separate machines, this number will be even higher, but you will need more than a gigabit link to achieve the maximum throughput. https://www.youtube.com/watch?v=DSMIHShKGAc
Q: Can I connect to my application at any time and start profiling at this moment? A: No, all events are registered from the beginning of the program's execution and are waiting in a queue.
Q: Am I seeing correctly that the profiler allocates one gigabyte of memory per second? A: Only in extreme cases. Normal usage has much lower memory pressure.
Q: Why do you do magic with the static initialization order? Everyone says that's a bad practice. A: It allows tracking construction of static objects and memory allocations performed before main() is entered.
Q: There's no support for consoles. A: Welp. But there's mobile support.
Q: I do need console support. A: The code is open. Write your own, then send a patch.
Following is the annotated assembly code (generated from C++ sources) that's responsible for logging start of the zone:

    call        qword ptr [__imp_GetCurrentThreadId]
    mov         r14d,eax
    mov         qword ptr [rsp+0F0h],r14        // save thread id for later use
    mov         r12d,10h
    mov         rax,qword ptr gs:[58h]          // TLS
    mov         r15,qword ptr [rax]             // queue address
    mov         rdi,qword ptr [r12+r15]         // data address
    mov         rbp,qword ptr [rdi+20h]         // buffer counter
    mov         rbx,rbp
    and         ebx,7Fh                         // 128 item buffer
    jne         Application::InnerLoop+66h  --+
    mov         rdx,rbp                       |
    mov         rcx,rdi                       |
    call        enqueue_begin_alloc           | // reclaim/alloc next buffer
    shl         rbx,5  <----------------------+ // buffer items are 32 bytes
    add         rbx,qword ptr [rdi+40h]
    mov         byte ptr [rbx],4                // queue item type
    rdtscp
    mov         dword ptr [rbx+19h],ecx         // cpu id
    shl         rdx,20h
    or          rax,rdx                         // 64 bit timestamp
    mov         qword ptr [rbx+1],rax
    mov         qword ptr [rbx+9],r14           // thread id
    lea         rax,[__tracy_source_location]   // static struct address
    mov         qword ptr [rbx+11h],rax
    lea         rax,[rbp+1]                     // increment buffer counter
    mov         qword ptr [rdi+20h],rax
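
The store offsets in the listing above imply a byte-packed, 32-byte queue item and blocks of 128 slots. The C++ sketch below is consistent with those offsets. The struct, field, and function names are guesses made for illustration, not Tracy's actual declarations, and the trailing padding is an assumption made only to reach the 32-byte slot size.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)  // byte packing; supported by MSVC, GCC and Clang
// Hypothetical layout reconstructed from the stores in the assembly.
struct QueueItem {
    std::uint8_t  type;    // [rbx]      queue item type (4 in the listing)
    std::uint64_t time;    // [rbx+1]    64-bit timestamp from rdtscp
    std::uint64_t thread;  // [rbx+9]    thread id
    std::uint64_t srcloc;  // [rbx+11h]  address of the static source location
    std::uint32_t cpu;     // [rbx+19h]  cpu id (ecx after rdtscp)
    std::uint8_t  pad[3];  // assumed padding up to the 32-byte slot size
};
#pragma pack(pop)

static_assert(offsetof(QueueItem, time) == 0x01, "timestamp offset");
static_assert(offsetof(QueueItem, thread) == 0x09, "thread id offset");
static_assert(offsetof(QueueItem, srcloc) == 0x11, "source location offset");
static_assert(offsetof(QueueItem, cpu) == 0x19, "cpu id offset");
static_assert(sizeof(QueueItem) == 32, "one slot is 32 bytes");

constexpr std::uint64_t kBlockItems = 128;

// Slot index inside the current block: the `and ebx,7Fh` step.
constexpr std::uint64_t SlotIndex(std::uint64_t counter) {
    return counter & (kBlockItems - 1);
}

// Byte offset of that slot from the block base: the `shl rbx,5` step.
constexpr std::uint64_t SlotOffset(std::uint64_t counter) {
    return SlotIndex(counter) << 5;
}

// True when the counter starts a new block, which is the case where the
// `jne` is not taken and enqueue_begin_alloc is called.
constexpr bool NeedsNewBlock(std::uint64_t counter) {
    return SlotIndex(counter) == 0;
}

// rdtscp returns the low half in eax and the high half in edx; this mirrors
// `shl rdx,20h` followed by `or rax,rdx`.
constexpr std::uint64_t CombineTimestamp(std::uint32_t lo, std::uint32_t hi) {
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

The power-of-two block size is what keeps the fast path cheap: finding the slot is a mask and a shift, and the allocator is only reached once every 128 items.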