#+TODO: TODO(t) IN-PROGRESS DONE #+STARTUP: indent #+STARTUP: beamer #+OPTIONS: ':nil *:t -:t ::t <:t H:2 \n:nil ^:t arch:headline #+OPTIONS: author:t broken-links:nil c:nil creator:nil #+OPTIONS: d:(not "LOGBOOK") date:t e:t email:nil f:t inline:t num:t #+OPTIONS: p:nil pri:nil prop:nil stat:t tags:t tasks:t tex:t #+OPTIONS: timestamp:t title:t toc:t todo:t |:t #+TITLE: Logs and Backtraces #+SUBTITLE: How to provide meaningful problem reports #+DATE: 2023-08-06 #+AUTHOR: Guido Günther #+EMAIL: agx@sigxcpu.org #+LANGUAGE: en #+SELECT_TAGS: export #+EXCLUDE_TAGS: noexport #+CREATOR: Emacs 25.2.2 (Org mode 9.0.9) #+latex_header: \mode{\usetheme{Copenhagen}} #+latex_header: \setbeamertemplate{headline}{} #+latex_header: \AtBeginSection[]{\begin{frame}\frametitle{Topic}\tableofcontents[currentsection]\end{frame}} #+latex_header: \hypersetup{ linkcolor=blue, pdfborder={0 0 0 0}} #+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %4BEAMER_col(Col) %10BEAMER_extra(Extra) * Introduction ** About me :PROPERTIES: :UNNUMBERED: t :END: - Debian Developer - phosh / GNOME contributor - Freelancing Free Software Developer - Working with Purism on the Librem 5 Phone ** This can happen #+ATTR_LATEX: :height 0.8\textheight [[./img/fail-whale.png]] ** Providing meaningful information, why? - You want the issue fixed as it impacts you badly - You want to be close to the fix to apply it quickly ** Providing meaningful information, why? - You want to know more on how the system works - You want to move out of the consumer seat ** Providing meaningful information, why? - Developer attention can be limited (focus on fixable bugs, assume good faith) ** You might know more than you think - Not a proprietary device #+BEAMER: \pause #+ATTR_LATEX: :height 0.4\textheight [[./img/Windows_9X_BSOD.png]] You can (and are invited to) look at everything. Problem is *where* to look. #+BEAMER: \pause - Developers *want* you to find information ** You might know more than you think - Phone → Desktop → Server If you know how to debug Linux desktops or server you know a lot of tools already ** You might know more than you think - command line: htop, netstat, powertop, strace, ls{cpu,mem,...}, dbus-monitor, gsettings, … Recommended: Michael Prokop's [[https://media.ccc.de/v/glt23-334-debugging-fr-sysadmins][Debugging for Sysadmin's talk]] ** You might know more than you think - GUI dconf-editor, d-feet, bustle, sysprof, … #+ATTR_LATEX: :height 0.4\textheight [[./img/bustle.png]] ** But how? 1. What is the next bit of information that I can extract that helps me to pinpoint the cause? 2. Write that down #+BEAMER: \pause Then go back to one ** But how? - Check the bugtracker Once you have an idea what the component is. #+BEAMER: \pause Gitlab can sort by recent activity! #+ATTR_LATEX: :height 0.4\textheight [[./img/sort-bugs.png]] ** But how? *** Try rubberducking #+ATTR_LATEX: :height 0.5\textheight [[./img/Rubber_duck_assisting_with_debugging.jpg]] #+BEAMER: \pause A bug report can be a rubber duck * Diving in ** High level overview #+begin_src ditaa :file _build/session.png :cmdline -s 3 -o :exports results +------------+---------------+--------------+ | Phoc | Phosh | Squeekboard | +------------+---------------+--+-----------+ |c7AD gnome—settings—daemon | feedbackd | +-------------------------------+-----------+ |c7AD gnome—session | +-------------------------------------------+ |c777 policykit, upower, iio—s—p | +-------------------------------------------+ |c777 NetworkManager, ModemManager | +-------------------------------------------+ |c555 Linux kernel | +-------------------------------------------+ #+end_src #+ATTR_LATEX: :height 0.4\textheight #+RESULTS: [[file:_build/session.png]] ** High level overview - There's a user session and system services - DBus is prevalent for inter process communication - There's two: user and system - Get an overview with d-feet or #+begin_example busctl --system --list busctl --user --list #+end_example ** High level overview Hence there's two systemd daemons: - =/sbin/init= - =systemd --user= Get an overview with #+begin_example systemctl --user status systemctl --system status systemctl --user --failed systemctl --system --failed #+end_example *** --system also contains dmesg :B_note: :PROPERTIES: :BEAMER_env: note :END: ** High level overview When you see a failure: system or user session? ** Log output - Getting Log Output No standard way on Linux. But #+BEAMER: \pause #+begin_example journalctl --user journalctl --system #+end_example #+BEAMER: \pause See a pattern? ** Log output - Get more output: Startup Time - Environment variables #+begin_example G_MESSAGES_DEBUG=all /usr/bin/gnome-calculator #+end_example - Command line optons #+begin_example chatty -vvvv #+end_example ** Log output - Get more output: Runtime #+begin_example kill -SIGUSR1 $(pidof /usr/libexec/phosh) #+end_example What works *should* be in the application's `README`. ** Log output - Don't cite but copy information: Details are important #+begin_quote "It says something like..." #+end_quote - Provide the *complete* output It might not make sense to you or me but maybe to others. ** Log output - But there's more details #+begin_example journalctl -o json | jq --unbuffered 'select(.GLIB_DOMAIN == "phosh-lockscreen")' #+end_example #+begin_src { "_AUDIT_LOGINUID": "1000", "__REALTIME_TIMESTAMP": "1691147633303845", "CODE_FUNC": "load_background", "MESSAGE": "Ignoring XML background 'file:///usr/share/phosh/backgrounds/logo.xml'", "CODE_FILE": "../src/lockscreen.c", "_COMM": "phosh", "GLIB_DOMAIN": "phosh-lockscreen", } #+end_src ** Log output - The above gives you a good idea where to look at - Can be educating as not (yet) programmer - Will likely give a clue if you're on the right track - For usable log domains: #+begin_src gbp import-dsc apt:phosh git grep G_LOG_DOMAIN #+end_src #+begin_example src/call-notification.c:#define G_LOG_DOMAIN "phosh-call-notification" src/call.c:#define G_LOG_DOMAIN "phosh-call" src/calls-manager.c:#define G_LOG_DOMAIN "phosh-calls-manager" src/connectivity-info.c:#define G_LOG_DOMAIN "phosh-connectivity-info" src/docked-info.c:#define G_LOG_DOMAIN "phosh-docked-info" src/docked-manager.c:#define G_LOG_DOMAIN "phosh-docked-manager" #+end_example ** Crashes *** What is a crash - Operating system ends the process: killed by a signal (usually SIGSEGV/11) - Program hits an internal assertion (SIGABRT/6) - Process exits (no crash dump) ** Crashes *** What is a crash or core dump - Memory image and stack of the process at crash time - Debugger gets a call stack - Needs to be enabled - Devices like GPUs can create their own core dumps ** Crashes - Investigate crash dumps again: first check the journal #+begin_example Aug 06 11:57:45 foo systemd-coredump[6148]: [.] Process 6092 (gnome-calculato) of user 1000 dumped core. #+end_example ** Crashes - Investigate crash dumps Now we're onto something #+begin_example coredumpctl list #+end_example #+begin_example TIME PID UID GID SIG COREFILE EXE Mon 2023-07-31 15:18:30 CEST 355561 1000 1000 SIGABRT present /usr/bin/phoc Mon 2023-07-31 15:18:30 CEST 355573 1000 1000 SIGABRT present /usr/bin/phoc Mon 2023-07-31 15:18:30 CEST 355576 1000 1000 SIGABRT present /usr/bin/phoc #+end_example - Investigate crash dump #+begin_example echo bt | coredumpctl debug 355576 #+end_example ** Crashes - Debug information makes the backtrace more meaningful - Debug info packages or debuginfod - Compile options =-Dbuilttype=debug= ** Crashes - Demo *** Demo this :B_note: :PROPERTIES: :BEAMER_env: note :END: First demo phosh's kill -SIGUSR1 and =G_MESSAGES_DEBUG= ** Crashes #+begin_example dpkg -S libglib-2.0.so.0 #+end_example #+begin_example echo "deb http://deb.debian.org/debian-debug/ sid-debug main" > /etc/apt/sources.list.d/debuginfo.list apt update apt install libglib2.0-0-dbgsym #+end_example ** Crashes - Again: Use the source ** Crashes - In case of Segmentation faults (SIGSEGV) Might just be a symptom: =valgrind= ** Practical contains - Replicate on desktop - Run phosh nested on desktop [[https://phosh.mobi/posts/phosh-dev-part-0/][(Phosh Nested blog post]]) ** Practical concerns - Get yourself non-osk access #+begin_example apt install ssh systemctl disable ssh #+end_example When needed #+begin_example systemctl start ssh systemctl stop ssh #+end_example ** Practical concerns - Use usb-gadget Connecting via USB to laptop should give you a shell: #+begin_example Bus 001 Device 010: ID 1d6b:0104 Linux Foundation Multifunction Composite Gadget #+end_example #+begin_example screen /dev/ttyACM0 115200 #+end_example ** Practical concerns - Capturing early boot output: Serial console #+ATTR_LATEX: :height 0.4\textheight [[./img/breakout_board_2.png]] *** Mast have for suspend/resume debugging :B_note: :PROPERTIES: :BEAMER_env: note :END: ** What else - Tracing (e.g. systemtap) - /sys/kernel/debug/tracing #+begin_example echo '1' > /sys/kernel/debug/tracing/\ events/tps6598x/enable echo '1' > /sys/kernel/debug/tracing/tracing_on cat /sys/kernel/debug/tracing/trace_pipe #+end_example *** Allows you to e.g. trace USB PD events :B_note: :PROPERTIES: :BEAMER_env: note :END: * Reporting Bugs ** Use the bug tracker Keeps more consistent record than chat. Chat might be good to get an idea where to file the bug. ** Avoid - Dismissive wording (annoying, totally broken, …) (Developers are humans too, assume good faith) - This is wrong because xyz makes it differently. It has to be done because foo does the same ** Avoid - plain: I'm seeing this too #+BEAMER: \pause Add device information, what varies from what the original reporter experienced. Add the details you figured out. ** Examples #+begin_quote After updating gnome-calls from *version* to *version* on my foo phone it crashes when I do *x*. This is the backtrace. #+end_quote ** What baseline information can I provide? - Device - Operating System and Version - Software versions of the relevant components ** What baseline information can I provide? *** phosh-mobile-settings #+ATTR_LATEX: :height 0.7\textheight [[./img/p-m-s-troubleshoot.png]] ** How to validate the fix? - Build yourself. (Might be easier than you think thanks to meson) - Development builds - Ask your distro maintainers for help (they might already provide nightly builds somewhere) ** Summary - Bug chasing can be exciting - Use the logs - Try to identify affected component - Look at crash/core dumps - There's usually always more information - Provide baseline information * Thanks ** Contact - Mail: - Fediverse: agx@librem.one - Matrix: @agx:librem.one - IRC OFTC: agx * Appendix :B_appendix: :PROPERTIES: :BEAMER_env: appendix :END: ** Image/Video references - cc-by-sa-3.0: - Screenshots: myself using https://gitlab.gnome.org/World/Phosh/phosh - Rubberduck: Tom Morris https://commons.wikimedia.org/wiki/File:Rubber_duck_assisting_with_debugging.jpg - Public Domain - Window 9X BSOD: Akhristov ** Links - Michael Prokop's: Debugging for Sysadmin's talk: https://media.ccc.de/v/glt23-334-debugging-fr-sysadmins - Phosh Nested blog post: https://phosh.mobi/posts/phosh-dev-part-0/