Thursday, 29 March 2018

Introduction to the tracy profiler

A short feature presentation and integration guide for the tracy profiler.


Saturday, 6 January 2018

Tracy frame profiler

Tracy is a real-time, nanosecond-resolution frame profiler that can be used for remote or embedded telemetry of your application. It can profile both CPU (C++, Lua) and GPU (OpenGL). It can also display locks held by threads and their interactions with each other.



Tracy requires compiler support for C++11, thread-local storage, and a way to work around the static initialization order fiasco. There are no other requirements. The following platforms are confirmed to work (this is not a complete list):
  • Windows (x86, x64)
  • Linux (x86, x64, ARM, ARM64)
  • Android (ARM, x86)
  • FreeBSD (x64)
  • Cygwin (x64)
  • WSL (x64)
  • OSX (x64)
The following compilers are supported:
  • MSVC
  • gcc
  • clang

Source code and more information: https://bitbucket.org/wolfpld/tracy

A quick FAQ:

Q: I already use VTune/perf/Very Sleepy/callgrind/MSVC profiler. 
A: These are statistical profilers, which can be used to find hot spots in the code. This is very useful, but it won't show you the underlying reason for semi-random frame stutter that may occur every couple of seconds.

Q: You can use Telemetry for that.
A: A Telemetry license costs about $8000 per year. Tracy is open source software. Telemetry also doesn't have Lua bindings.

Q: You can use the free Brofiler. Crytek does use it, so it has to be good.
A: After a cursory look at the Brofiler code I can tell that its timer resolution is 300 ns. Tracy can achieve 5 ns timer resolution. Brofiler's event logging infrastructure seems overengineered. Brofiler can't track lock contention, nor does it have Lua bindings.

Q: So tracy is supposedly faster?
A: My measurements show that logging a single zone with tracy takes only 15 ns. In theory, if the program was doing nothing else, tracy should be able to log about 66 million zones per second (1 s / 15 ns ≈ 66.7 million).

Q: Bullshit, RAD advertises that they are able to log only about a million zones, over the network no less: "Capture over a million timing zones per second in real-time!"
A: Tracy can perform network transfer of 15 million zones per second. Should the client and server be on separate machines, this number will be even higher, but you will need more than a gigabit link to achieve the maximum throughput. https://www.youtube.com/watch?v=DSMIHShKGAc

Q: Can I connect to my application at any time and start profiling at this moment?
A: No, all events are registered from the beginning of the program's execution and are waiting in a queue.

Q: Am I seeing correctly that the profiler allocates one gigabyte of memory per second?
A: Only in extreme cases. Normal usage has much lower memory pressure.

Q: Why do you do magic with the static initialization order? Everyone says that's a bad practice.
A: It allows tracking the construction of static objects and memory allocations performed before main() is entered.
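
For illustration, here is a minimal sketch of the kind of trick involved (not tracy's actual code): forcing a profiler object into an early static initialization group, so it is already alive when other static constructors and pre-main allocations run.

// Hypothetical sketch of early static construction; tracy's real mechanism
// differs in detail. The profiler object is placed in an early initialization
// group so it is constructed before ordinary static objects.
struct Profiler
{
    Profiler() { /* start worker thread, hook allocations, ... */ }
};

#ifdef _MSC_VER
#  pragma init_seg( lib )   // construct together with library initializers
static Profiler s_profiler;
#else
static Profiler s_profiler __attribute__(( init_priority( 101 ) ));
#endif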

Q: There's no support for consoles.
A: Welp. But there's mobile support.

Q: I do need console support.
A: The code is open. Write your own, then send a patch.

Following is the annotated assembly code (generated from the C++ sources) that's responsible for logging the start of a zone:
call        qword ptr [__imp_GetCurrentThreadId]
mov         r14d,eax
mov         qword ptr [rsp+0F0h],r14        // save thread id for later use
mov         r12d,10h
mov         rax,qword ptr gs:[58h]          // TLS
mov         r15,qword ptr [rax]             // queue address
mov         rdi,qword ptr [r12+r15]         // data address
mov         rbp,qword ptr [rdi+20h]         // buffer counter
mov         rbx,rbp
and         ebx,7Fh                         // 128 item buffer
jne         Application::InnerLoop+66h --+
mov         rdx,rbp                      |
mov         rcx,rdi                      |
call        enqueue_begin_alloc          |  // reclaim/alloc next buffer
shl         rbx,5  <---------------------+  // buffer items are 32 bytes
add         rbx,qword ptr [rdi+40h]
mov         byte ptr [rbx],4                // queue item type
rdtscp
mov         dword ptr [rbx+19h],ecx         // cpu id
shl         rdx,20h
or          rax,rdx                         // 64 bit timestamp
mov         qword ptr [rbx+1],rax
mov         qword ptr [rbx+9],r14           // thread id
lea         rax,[__tracy_source_location]   // static struct address
mov         qword ptr [rbx+11h],rax
lea         rax,[rbp+1]                     // increment buffer counter
mov         qword ptr [rdi+20h],rax
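
For comparison, here is a minimal sketch of what the instrumented C++ side looks like, based on the integration described in the repository documentation (it assumes TRACY_ENABLE is defined and Tracy.hpp is on the include path; this is an illustration, not a verbatim excerpt):

#include "Tracy.hpp"

void Application::InnerLoop()
{
    ZoneScoped;     // declares a static __tracy_source_location struct and an
                    // RAII object that enqueues the zone begin/end events
    // ... work to be measured ...
}

void Application::Run()
{
    for(;;)
    {
        InnerLoop();
        FrameMark;  // marks the end of a frame on the frame timeline
    }
}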

Wednesday, 23 August 2017

Traffic on the Polish Usenet

The chart below shows the number of messages posted to the Polish Usenet (based on the data contained in the archive at https://archive.org/details/usenet-uat-pl).


Sunday, 29 January 2017

The Usenet archive has become even better

Tools

Text-mode reader

A text-mode archive reader has recently become available. Users of slrn should feel right at home.



tbrowser has all the capabilities of the old graphical reader, while running faster and using less memory. It is now the primary program for browsing the archive. The Qt-based version is now deprecated and will not be developed further.

 

Galaxy

Until now, individual newsgroups were independent entities. The newly introduced "galaxy" mode makes it possible to switch between groups, jump to a message in another group by its Message-ID, and follow followups and crossposts.


To use the prepared database, download the galaxy.7z file from the archive and unpack it into the directory containing the newsgroup files. Then simply pass the path to the galaxy directory as a parameter to tbrowser.

Galaxy mode does not require all newsgroups to be present. If a group's archive is missing, it will still be listed, but it will not be possible to open it.

 

Search

The scoring algorithm for search results has been changed. Results in which the queried words appear close to each other receive an extra bonus. In addition, words similar to those given in the query also take part in the search, so messages containing typos can be found as well.

The search engine still leaves a lot to be desired, especially when it comes to presenting the results.

 

Tools and libraries

The shared part of the code has received many small optimizations, so both readers now run even faster. Similar improvements went into the tools used to build archives. Several new data-processing programs have also appeared. More details are available at https://bitbucket.org/wolfpld/usenetarchive.

 

Polish Usenet archive

The archive available at https://archive.org/details/usenet-uat-pl now contains groups from both the pl.* and alt.pl.* hierarchies. A few other Polish newsgroups can also be found there.

The archive content has been updated with messages posted up to December 2016. Some groups have also gained an additional year of archived messages. The oldest entries now reach back to 1995. The archive now contains over 56 million unique messages.

 

Compatibility

Previous versions of the archive can still be opened with the graphical browser, but the text-mode reader is not compatible with them. The current archive version will not work with older versions of the programs. In any case, given the numerous fixes and content additions, it is best to delete previously downloaded archives and fetch them anew.

Thursday, 1 September 2016

Polish Usenet archive

URL: https://archive.org/details/usenet-uat-pl

[Update 29.01.2017: http://zgredowo.blogspot.com/2017/01/archiwum-usenetu-stao-sie-jeszcze-lepsze.html]
[Update 18.11.2016: Polish characters in the descriptions of some groups have been fixed, and previously unavailable messages have been made visible again.]

Under the links above you can find the most complete archive of Polish newsgroups (Usenet, news). It can be read with the reader available at https://bitbucket.org/wolfpld/usenetarchive.

The archive was created in June and July 2016, using the following sources:
The oldest available messages date from 1996. Unfortunately, despite the use of many sources, some messages remain lost in the mists of time (or possibly in Google Groups, which amounts to much the same thing).

The archive was processed with the tools that make up the Usenet Archive Toolkit:
  • Duplicate messages are not stored.
  • All groups were run through a spam filter (only messages that started a thread and remained without replies were checked).
  • Messages were converted to UTF-8, accounting for most of the problems caused by newsreaders misusing the standards, wrong or missing character encoding declarations, and so on.
  • A precomputed message dependency graph (the threading structure) is available. Where possible, relations derived directly from quotations are also included (in cases where the appropriate headers are missing). This is especially helpful for groups that were bridged with mailing lists or with FidoNet.
  • Message search is also available.

Usenet Archive Toolkit

URL: https://bitbucket.org/wolfpld/usenetarchive

The Usenet Archive Toolkit project aims to provide a set of tools for processing various sources of usenet messages into a coherent, searchable archive.

Motivation

Usenet is dead. You may believe it's not, but it really is.

People went away to various forums, facebooks and twitters and seem fine there. Meanwhile, the old discussions slowly rot away. Google groups is a sad, unusable joke. The Archive.org dataset, at least with regard to Polish usenet archives, is vastly incomplete. There is no easy way to get the data, browse it, or search it. So, maybe something needs to be done. How hard can it be anyway? (Not very: one month for a working prototype, another one for polish and bugfixing.)

Advantages

Why use UAT? Why not use existing solutions, like google groups, archives from archive.org or NNTP servers with a long history?
  • UAT is designed for offline work. You don't need a network connection to access data in "the cloud". You don't need to wait for a reply to your query, or, god forbid, endure "web 2.0" interfaces.
  • UAT archives won't suddenly disappear. You have them on your disk. Google groups are deteriorating with each new iteration of the interface. Also, google is known for shutting down services it no longer feels are viable. Google reader, google code search, google code, etc. Other, smaller services are one disk crash away from completely disappearing from the network.
  • The UAT archive format is designed for fast access and efficient search. Each message is individually compressed, to facilitate instant access, but uses a whole-archive dictionary for better compression (see the compression sketch after this list). Search is achieved through a database similar in design to google's original paper. Total archive size is smaller than the uncompressed collection of messages.
  • Multiple message sources may be merged into a single UAT archive, without message duplication. This way you can fill blanks in source A (eg. an NNTP archive server) with messages from source B (eg. a much smaller archive.org dump). Archives created in such a way are the most complete collections of messages available.
  • UAT archives do not contain duplicate messages (which are common even on NNTP servers), nor stray messages from other groups (archive.org collections contain many bogus messages).
  • Other usenet archives are littered with spam messages. UAT can filter out spam, making previously unreadable newsgroups a breeze to read. A properly trained spam database has very low false positive and false negative rates.
  • All messages are transcoded to UTF-8, so that dumb clients may be used for display. UAT tries very hard to properly decode broken and/or completely invalid headers, and messages without a specified encoding or with a bad one. HTML parts of messages are removed. You also don't need to worry about parsing quoted-printable content (most likely malformed). And don't forget about search. Have fun grepping that base64 encoded message without UAT.
  • UAT archives contain a precalculated message connectivity graph, which removes the need to parse "references" headers (often broken), sort messages by date, etc. UAT can also "restore" missing connectivity that is not indicated in message headers, through a search for quoted text in other messages.
  • Access to archives is available through a trivial libuat interface.
  • UAT archives are mapped to memory and 100% disk backed. In high memory pressure situations archive pages may simply be purged and later reloaded on demand. No memory allocations are required during normal libuat operation, other than:
    • A small, statically growing buffer used to decompress a single message into.
    • std::vectors used during search operations.
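
To illustrate the per-message compression with a shared dictionary mentioned above, here is a minimal sketch using the public zstd API (a hypothetical helper, not the actual UAT repack-zstd code; the compression level and container layout are assumptions):

// Compress one message with a dictionary shared by the whole archive, so short
// messages still compress well while remaining individually decompressible.
#include <zstd.h>
#include <string>
#include <vector>

std::vector<char> CompressMessage( const std::string& msg,
                                   const std::vector<char>& dict, int level = 16 )
{
    std::vector<char> out( ZSTD_compressBound( msg.size() ) );
    ZSTD_CCtx* cctx = ZSTD_createCCtx();
    const size_t sz = ZSTD_compress_usingDict( cctx, out.data(), out.size(),
                                               msg.data(), msg.size(),
                                               dict.data(), dict.size(), level );
    ZSTD_freeCCtx( cctx );
    if( ZSTD_isError( sz ) ) out.clear(); else out.resize( sz );
    return out;
}

Decompressing a single message goes through ZSTD_decompress_usingDict with the same dictionary, which is what makes random access into the archive cheap.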

Toolkit description

UAT provides a multitude of utilities, each specialized for its own task. You can find a brief description of each one below.

Import Formats

Usenet messages may be retrieved from a number of different sources. Currently we support:
  • import-source-slrnpull --- Import from a directory where each file is a separate message (slrnpull was chosen because of the extra-simple setup required to get it working).
  • import-source-slrnpull-7z --- Import from a slrnpull directory compressed into a single 7z file.
  • import-source-mbox --- Archive.org keeps its collection of usenet messages in mbox format, in which all posts are merged into a single file.
Imported messages are stored in a per-message LZ4 compressed meta+payload database.

Data Processing

Raw imported messages have to be processed to be of any use. We provide the following utilities:
  • extract-msgid --- Extracts the unique identifier of each message and builds a reference table for fast access to any message through its ID.
  • extract-msgmeta --- Extracts the "From" and "Subject" fields, as a quick reference for archive browsers.
  • merge-raw --- Merges two imported data sets into one. Does not duplicate messages.
  • utf8ize --- Converts messages to a common character encoding, UTF-8.
  • connectivity --- Calculates the connectivity graph of messages. Also parses the "Date" field, as it's required for chronological sorting.
  • threadify --- Some messages do not have connectivity data embedded in their headers. Eg. it's a common artifact of using news-email gateways. This tool parses top-level messages, looking for quotations, then searches other messages for these quotes and creates (not restores! it was never there!) missing connectivity between children and parents (see the sketch after this list).
  • repack-zstd --- Builds a common dictionary for all messages and recompresses them to a zstd meta+payload+dict database.
  • repack-lz4 --- Converts zstd database to LZ4 database.
  • package --- Packages all databases into a single file. Supports unpacking.
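
As an illustration of the threadify idea, here is a rough sketch of quote-based parent matching (hypothetical helper types and thresholds, not the actual UAT implementation, which works on its own databases):

// Find a candidate parent for a message that has no References header, by
// looking for another message whose body contains the text this one quotes.
#include <string>
#include <vector>

struct Message { int id; std::string body; int parent = -1; };

// Collect lines quoted with the conventional "> " prefix, skipping short ones.
static std::vector<std::string> QuotedLines( const std::string& body )
{
    std::vector<std::string> out;
    size_t pos = 0;
    while( pos < body.size() )
    {
        size_t end = body.find( '\n', pos );
        if( end == std::string::npos ) end = body.size();
        if( body.compare( pos, 2, "> " ) == 0 && end - pos > 12 )
            out.emplace_back( body.substr( pos + 2, end - pos - 2 ) );
        pos = end + 1;
    }
    return out;
}

static int FindParent( const Message& child, const std::vector<Message>& all )
{
    for( auto& quote : QuotedLines( child.body ) )
        for( auto& other : all )
            if( other.id != child.id && other.body.find( quote ) != std::string::npos )
                return other.id;
    return -1;    // no parent found; the message stays top-level
}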

Data Filtering

Raw data right after import is unfit for direct use: messages are duplicated and there's spam. These utilities help clean it up:
  • kill-duplicates --- Removes duplicate messages. It is relatively rare, but data sets from even a single NNTP server may contain the same message twice.
  • filter-newsgroups --- Some data sources (eg. Archive.org's giganews collection) contain messages that were not sent to the collection's newsgroup. This utility will remove such bogus messages.
  • filter-spam --- Learns which messages look like spam and removes them.
Search in the archive is performed with the help of a word lexicon (a simplified illustration follows this list). The following tools are used for its preparation:
  • lexicon --- Build a list of words and hit-tables for each word.
  • lexopt --- Optimize lexicon string database.
  • lexstats --- Display lexicon statistics.
  • lexdist --- Calculate distances between words (unused).
  • lexhash --- Prepare lexicon hash table.
  • lexsort --- Sort lexicon data.
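
To give an idea of what a word lexicon with hit tables looks like, here is a simplified in-memory sketch (the field layout and weights are assumptions; the real UAT databases are packed, sorted and memory mapped):

// A lexicon maps each word to a table of hits; proximity of hits within one
// message is what the search scoring uses to rank results.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct Hit
{
    uint32_t message;    // which message the word occurs in
    uint8_t  position;   // rough position within the message, for proximity scoring
    uint8_t  inHeader;   // hits in From/Subject may be weighted differently
};

using Lexicon = std::unordered_map<std::string, std::vector<Hit>>;

void AddWord( Lexicon& lex, const std::string& word,
              uint32_t msg, uint8_t pos, uint8_t hdr )
{
    lex[word].push_back( Hit { msg, pos, hdr } );
}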

Data Access

These tools provide access to archive data:
  • query-raw --- Implements queries on LZ4 database. Requires results of extract-msgid utility. Supports:
    • Message count.
    • Listing of message identifiers.
    • Query message by identifier.
    • Query message by database record number.
  • libuat --- Archive access library. Operates on zstd database.
  • query --- Testbed for libuat. Exposes all provided functionality.

End-user Utilities

  • browser --- Graphical browser of archives.


Future work ideas

Here are some viable ideas that I'm not really planning to do any time soon, but which would be nice to have:
  • Implement a message extractor, for example to mbox format. It would need to properly encode headers and add content encoding information (UTF-8 everywhere).
  • Implement a read-only NNTP server. It would need to properly encode headers and add content encoding information. 7-bit cleanness would probably be nice, so also encode as quoted-printable. Some headers may need to be rewritten (eg. "Lines", which most probably won't be true, due to MIME processing). Sorting messages by date may be necessary to put some sense into the internal message numbers, which currently have no meaning at all.
  • Implement a pan-group search mechanism.
  • Query google groups for missing messages listed in "references" headers.

Workflow

Usenet Archive Toolkit operates on a couple of distinct databases. Each utility requires a specific set of these databases and produces its own database, or creates a completely new database indexing schema, which invalidates the rest of the databases.

slrnpull directory → import-source-slrnpull → produces: LZ4
slrnpull compressed → import-source-slrnpull-7z → produces: LZ4
mbox file → import-source-mbox → produces: LZ4
LZ4 → kill-duplicates → produces: LZ4
LZ4 → extract-msgid → adds: msgid
LZ4, msgid → connectivity → adds: conn
LZ4, conn → filter-newsgroups → produces: LZ4
LZ4, msgid, conn, str → filter-spam → produces: LZ4
LZ4 → extract-msgmeta → adds: str
(LZ4, msgid) + (LZ4, msgid) → merge-raw → produces: LZ4
LZ4 → utf8ize → produces: LZ4
LZ4 → repack-zstd → adds: zstd
zstd → repack-lz4 → adds: LZ4
LZ4, conn → lexicon → adds: lex
lex → lexopt → modifies: lex
lex → lexhash → adds: lexhash
lex → lexsort → modifies: lex
lex → lexdist → adds: lexdist (unused)
lex → lexstats → user interaction
LZ4, msgid → query-raw → user interaction
zstd, msgid, conn, str, lex, lexhash → libuat → user interaction
everything but LZ4 → package → one file archive
everything but LZ4 → threadify → modifies: conn, invalidates: lex, lexhash

Additional, optional information files, not created by any of the above utilities, but used in user-facing programs:
  • name --- Group name.
  • desc_short --- A short description of the purpose of the group (per section 7.6.6 of RFC 3977).
  • desc_long --- Group charter. (Some newsgroups regularly post a description to the group that describes its intention. These descriptions are posted by the people involved with the newsgroup creation and/or administration. If the group has such a description, it almost always includes the word "charter", so you can quickly find it by searching the newsgroup for that word. A charter is the "set of rules and guidelines" which supposedly govern the users of that group.)

Notes

Be advised that some utilities (repack-zstd, lexicon) do require enormous amounts of memory. Processing large groups (eg. 2 million messages, 3 GB data) will swap heavily on a 16 GB machine.

utf8ize doesn't compile on MSVC. Either compile it on cygwin, or have fun banging glib and gmime into submission. Your choice.

UAT only works on 64-bit machines.

License

GNU AGPL.

Wednesday, 27 January 2016

etcpak 0.5

etcpak strikes again, this time with version 0.5, which can encode planar blocks from the ETC2 standard. Color gradients, which previously were a sore spot in image quality, will now look much smoother. The new option is activated by passing the -etc2 parameter and comes at a small time cost (152% of the pure ETC1 time, 77 ms vs 117 ms). Example compressed image:


The planar block count in this image is quite high, as can be seen in the following debug image, where blue indicates planar mode:


It should be noted that the AVX2 version of planar block compression does not produce the same results as the scalar one. Keep that in mind on pre-Haswell machines.

Download: https://bitbucket.org/wolfpld/etcpak/downloads