Bringing up BPI-F3 - Part 2.5

Source: dev.to

This is a sort of intermission.

Getting perf to work up to a point

Apparently the OpenSBI-mediated access to the performance counters does not map properly, so the usual cycles and instructions events do not work in perf record. I got this board mainly to help with dav1d development efforts, so not having perf support would make it harder to reason about performance.

The best workaround, after a discussion in the forums, is to build perf with pmu-events that include the custom ones, and then rely on the overly precise CPU-specific events instead:

$ perf list | grep cycle
  bus-cycles                                         [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  ref-cycles                                         [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
  m_mode_cycle
       [M-mode cycles]
  rtu_flush_cycle
  s_mode_cycle
       [S-mode cycles]
  stalled_cycle_backend
       [Stalled cycles backend]
  stalled_cycle_frontend
       [Stalled cycles frontend]
  u_mode_cycle
       [U-mode cycles]
  vidu_total_cycle
  vidu_vec0_cycle
  vidu_vec1_cycle
...
$ perf list | grep inst
  branch-instructions OR branches                    [Hardware event]
  instructions                                       [Hardware event]
  br_inst
       [Branch instructions]
  cond_br_inst
       [Conditional branch instructions]
  indirect_br_inst
       [Indirect branch instructions]
  taken_cond_br_inst
       [Taken conditional branch instructions]
  uncond_br_inst
       [Unconditional branch instructions]
instruction:
  alu_inst
       [ALU (integer) instructions]
  amo_inst
       [AMO instructions]
  atomic_inst
       [Atomic instructions]
  bus_fence_inst
       [Bus FENCE instructions]
  csr_inst
       [CSR instructions]
  div_inst
       [Division instructions]
  ecall_inst
       [ECALL instructions]
  failed_sc_inst
       [Failed SC instructions]
  fence_inst
       [FENCE instructions]
  fp_div_inst
       [Floating-point division instructions]
  fp_inst
       [Floating-point instructions]
  fp_load_inst
       [Floating-point load instructions]
  fp_store_inst
       [Floating-point store instructions]
  load_inst
       [Load instructions]
  lr_inst
       [LR instructions]
  mult_inst
       [Multiplication instructions]
  sc_inst
       [SC instructions]
  store_inst
       [Store instructions]
  unaligned_load_inst
       [Unaligned load instructions]
  unaligned_store_inst
       [Unaligned store instructions]
  vector_div_inst
       [Vector division instructions]
  vector_inst
       [Vector instructions]
  vector_load_inst
       [Vector load instructions]
  vector_store_inst
       [Vector store instructions]
  id_inst_pipedown
       [ID instruction pipedowns]
  id_one_inst_pipedown
       [ID one instruction pipedowns]
  issued_inst
       [Issued instructions]
  rf_inst_pipedown
       [RF instruction pipedowns]
  rf_one_inst_pipedown
       [RF one instruction pipedowns]

Building perf

Perf's way of dealing with CPU-specific events is through some machinery called jevents.

It lives in tools/perf/pmu-events and you can trigger it manually with:

./jevents.py riscv arch pmu-events.c

This produces C code from a bunch of JSON files and a CSV map file.

The first time I built the sources I tried to cut the build down by setting most of the NO_{} make variables, and left NO_JEVENTS=1 set. Luckily I fixed it after noticing that the output shown in the forum was different from mine.

## I assume you have the custom Linux sources here
cd /usr/src/pi-linux/tools/perf
## Being lazy, I disabled nearly everything instead of installing the dependencies; the first time I disabled too much.
make -j 8 V=1 VF=1 HOSTCC=riscv64-unknown-linux-gnu-gcc HOSTLD=riscv64-unknown-linux-gnu-ld CC=riscv64-unknown-linux-gnu-gcc CXX=riscv64-unknown-linux-gnu-g++ AR=riscv64-unknown-linux-gnu-ar LD=riscv64-unknown-linux-gnu-ld NM=riscv64-unknown-linux-gnu-nm PKG_CONFIG=riscv64-unknown-linux-gnu-pkg-config prefix=/usr bindir_relative=bin tipdir=share/doc/perf-6.8 'EXTRA_CFLAGS=-O2 -pipe' 'EXTRA_LDFLAGS=-Wl,-O1 -Wl,--as-needed' ARCH=riscv BUILD_BPF_SKEL= BUILD_NONDISTRO=1 JDIR= CORESIGHT= GTK2= feature-gtk2-infobar= NO_AUXTRACE= NO_BACKTRACE= NO_DEMANGLE= NO_JEVENTS=0 NO_JVMTI=1 NO_LIBAUDIT=1 NO_LIBBABELTRACE=1 NO_LIBBIONIC=1 NO_LIBBPF=1 NO_LIBCAP=1 NO_LIBCRYPTO= NO_LIBDW_DWARF_UNWIND= NO_LIBELF= NO_LIBNUMA=1 NO_LIBPERL=1 NO_LIBPFM4=1 NO_LIBPYTHON=1 NO_LIBTRACEEVENT= NO_LIBUNWIND=1 NO_LIBZSTD=1 NO_SDT=1 NO_SLANG=1 NO_LZMA=1 NO_ZLIB= TCMALLOC= WERROR=0 LIBDIR=/usr/libexec/perf-core libdir=/usr/lib64 plugindir=/usr/lib64/perf/plugins -f Makefile.perf install
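A quick way to confirm that the jevents-generated events actually made it into the new binary is to look for one of the vendor events in the perf list output. The following is only a minimal sketch, assuming a POSIX shell with awk available; has_event is a hypothetical helper name, not something perf provides.

```shell
# has_event NAME: read `perf list` output on stdin and succeed only if
# NAME appears as the first word of some line.
# (has_event is a hypothetical helper, not a perf subcommand.)
has_event() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage on the board, after installing the freshly built perf:
#   perf list | has_event u_mode_cycle && echo "custom events present"
```

If the check fails, NO_JEVENTS is the first make variable worth re-checking.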

Now I have a perf where cycles and instructions still do not work with perf record. I wonder if there is a way at the OpenSBI or kernel level to aggregate events to make it work properly, but I have never had to look into perf internals, so I will probably poke at it much later if nobody addresses it otherwise. Anyway,

perf record --group -e u_mode_cycle,m_mode_cycle,s_mode_cycle

produces something close enough to cycles; actually, u_mode_cycle alone is enough.
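The same three counters can also be added up to approximate a total cycle count. The sketch below assumes that perf stat can count these events on this board (not verified here) and that its -x, CSV output puts the value in the first field and the event name in the third, which holds when no interval or per-CPU options are used. sum_mode_cycles and ./my_program are hypothetical names.

```shell
# sum_mode_cycles: read `perf stat -x,` CSV on stdin and print the sum of
# the u_mode_cycle, m_mode_cycle and s_mode_cycle counts.
# Assumes field 1 is the counter value and field 3 the event name.
sum_mode_cycles() {
    awk -F, '$3 ~ /^[ums]_mode_cycle/ { total += $1 } END { printf "%.0f\n", total }'
}

# Usage (perf stat writes its report to stderr, so swap the streams):
#   perf stat -x, -e u_mode_cycle,m_mode_cycle,s_mode_cycle ./my_program 2>&1 >/dev/null | sum_mode_cycles
```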

For instructions the situation is a bit more annoying:

perf record --group -e alu_inst,amo_inst,atomic_inst,fp_div_inst,fp_inst,fp_load_inst,fp_store_inst,load_inst,lr_inst,mult_inst,sc_inst,store_inst,unaligned_load_inst,unaligned_store_inst

comes close to counting all the scalar instructions, but trying to add vector_div_inst,vector_inst,vector_load_inst,vector_store_inst somehow makes perf record silently stop collecting samples. Adding just 3 more events works though, so I guess I can be happy with u_mode_cycle,alu_inst,atomic_inst,fp_inst,vector_inst at least.
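One thing that might be worth trying when a single large group misbehaves is to split the event list into several smaller groups, using perf's {a,b,c} group syntax. This is only a sketch: chunk_events is a hypothetical helper, the group size is a guess, and whether smaller groups avoid the silent failure on this board is an untested assumption.

```shell
# chunk_events MAX ev1 ev2 ...: print the events as perf group
# expressions, one "{...}" per line, with at most MAX events per group.
# (chunk_events is a hypothetical helper; MAX is a guess, not a known
# hardware limit of this board.)
chunk_events() {
    max=$1
    shift
    group=""
    count=0
    for ev in "$@"; do
        if [ "$count" -eq "$max" ]; then
            printf '{%s}\n' "$group"
            group=""
            count=0
        fi
        if [ -z "$group" ]; then
            group=$ev
        else
            group="$group,$ev"
        fi
        count=$((count + 1))
    done
    if [ -n "$group" ]; then
        printf '{%s}\n' "$group"
    fi
}

# Usage: turn each group into its own -e argument
#   perf record $(chunk_events 4 alu_inst fp_inst load_inst store_inst vector_inst | sed 's/^/-e /') ./my_program
```

Events in different groups are not guaranteed to be scheduled together, so counts from separate groups are less directly comparable than counts from a single group.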

...