summaryrefslogtreecommitdiff
path: root/talk.org
blob: 64f73aeeb02a2e50ccb8525313981d66e00f2bc9 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
#+TODO: TODO(t) IN-PROGRESS DONE
#+STARTUP: indent
#+STARTUP: beamer
#+OPTIONS: ':nil *:t -:t ::t <:t H:2 \n:nil ^:t arch:headline
#+OPTIONS: author:t broken-links:nil c:nil creator:nil
#+OPTIONS: d:(not "LOGBOOK") date:t e:t email:nil f:t inline:t num:t
#+OPTIONS: p:nil pri:nil prop:nil stat:t tags:t tasks:t tex:t
#+OPTIONS: timestamp:t title:t toc:t todo:t |:t
#+TITLE: Logs and Backtraces
#+SUBTITLE: How to provide meaningful problem reports
#+DATE: 2023-08-06
#+AUTHOR: Guido Günther
#+EMAIL: agx@sigxcpu.org
#+LANGUAGE: en
#+SELECT_TAGS: export
#+EXCLUDE_TAGS: noexport
#+CREATOR: Emacs 25.2.2 (Org mode 9.0.9)
#+latex_header: \mode<beamer>{\usetheme{Copenhagen}}
#+latex_header: \setbeamertemplate{headline}{}
#+latex_header: \AtBeginSection[]{\begin{frame}<beamer>\frametitle{Topic}\tableofcontents[currentsection]\end{frame}}
#+latex_header: \hypersetup{ linkcolor=blue, pdfborder={0 0 0 0}}
#+COLUMNS: %40ITEM %10BEAMER_env(Env) %9BEAMER_envargs(Env Args) %4BEAMER_col(Col) %10BEAMER_extra(Extra)

* Introduction

** About me
:PROPERTIES:
:UNNUMBERED: t
:END:

- Debian Developer
- phosh / GNOME contributor
- Freelancing Free Software Developer
- Working with Purism on the Librem 5 Phone

** This can happen

#+ATTR_LATEX: :height 0.8\textheight
[[./img/fail-whale.png]]

** Providing meaningful information, why?

- You want the issue fixed as it impacts you badly
- You want to be close to the fix to apply it quickly

** Providing meaningful information, why?

- You want to know more on how the system works
- You want to move out of the consumer seat

** Providing meaningful information, why?

- Developer attention can be limited

  (focus on fixable bugs, assume good faith)

** You might know more than you think

- Not a proprietary device

  #+BEAMER: \pause

  #+ATTR_LATEX: :height 0.4\textheight
  [[./img/Windows_9X_BSOD.png]]

  You can (and are invited to) look at everything.
  Problem is *where* to look.

#+BEAMER: \pause
- Developers *want* you to find information

** You might know more than you think

- Phone → Desktop → Server

 If you know how to debug Linux desktops or server you know a lot
 of tools already

** You might know more than you think

- command line:

  htop, netstat, powertop, strace,  ls{cpu,mem,...},
  dbus-monitor, gsettings, …

  Recommended: Michael Prokop's [[https://media.ccc.de/v/glt23-334-debugging-fr-sysadmins][Debugging for Sysadmin's talk]]

** You might know more than you think

- GUI

  dconf-editor, d-feet, bustle, sysprof, …

  #+ATTR_LATEX: :height 0.4\textheight
  [[./img/bustle.png]]

** But how?

1. What is the next bit of information that I can
   extract that helps me to pinpoint the cause?

2. Write that down

   #+BEAMER: \pause
   Then go back to one

** But how?

- Check the bugtracker

  Once you have an idea what the component is.

  #+BEAMER: \pause
  Gitlab can sort by recent activity!

  #+ATTR_LATEX: :height 0.4\textheight
  [[./img/sort-bugs.png]]

** But how?

*** Try rubberducking

#+ATTR_LATEX: :height 0.5\textheight
[[./img/Rubber_duck_assisting_with_debugging.jpg]]

  #+BEAMER: \pause
  A bug report can be a rubber duck

* Diving in

** High level overview

#+begin_src ditaa :file _build/session.png :cmdline -s 3 -o :exports results

   +------------+---------------+--------------+
   |    Phoc    |     Phosh     |  Squeekboard |
   +------------+---------------+--+-----------+
   |c7AD gnome—settings—daemon     | feedbackd |
   +-------------------------------+-----------+
   |c7AD             gnome—session             |
   +-------------------------------------------+
   |c777      policykit, upower, iio—s—p       |
   +-------------------------------------------+
   |c777     NetworkManager, ModemManager      |
   +-------------------------------------------+
   |c555           Linux kernel                |
   +-------------------------------------------+
#+end_src

#+ATTR_LATEX: :height 0.4\textheight
#+RESULTS:
[[file:_build/session.png]]


** High level overview

- There's a user session and system services
- DBus is prevalent for inter process communication
- There's two: user and system
- Get an overview with d-feet or

   #+begin_example
   busctl --system --list
   busctl --user --list
   #+end_example

** High level overview

Hence there's two systemd daemons:

- =/sbin/init=
- =systemd --user=

Get an overview with

#+begin_example
systemctl --user status
systemctl --system status

systemctl --user --failed
systemctl --system  --failed
#+end_example

*** --system also contains dmesg                                   :B_note:
:PROPERTIES:
:BEAMER_env: note
:END:


** High level overview

When you see a failure: system or user session?

** Log output

- Getting Log Output

  No standard way on Linux. But

  #+BEAMER: \pause
  #+begin_example
  journalctl --user
  journalctl --system
  #+end_example

  #+BEAMER: \pause
  See a pattern?

** Log output

- Get more output: Startup Time

    - Environment variables

  #+begin_example
  G_MESSAGES_DEBUG=all /usr/bin/gnome-calculator
  #+end_example

    - Command line optons

   #+begin_example
   chatty -vvvv
   #+end_example

** Log output

- Get more output: Runtime

  #+begin_example
  kill -SIGUSR1 $(pidof /usr/libexec/phosh)
  #+end_example

What works *should* be in the application's `README`.

** Log output

- Don't cite but copy information: Details are important

#+begin_quote
"It says something like..."
#+end_quote

- Provide the *complete* output

  It might not make sense to you or me but maybe to others.

** Log output

- But there's more details

  #+begin_example
  journalctl -o json |
    jq  --unbuffered
      'select(.GLIB_DOMAIN == "phosh-lockscreen")'
  #+end_example

#+begin_src
{
  "_AUDIT_LOGINUID": "1000",
  "__REALTIME_TIMESTAMP": "1691147633303845",
  "CODE_FUNC": "load_background",
  "MESSAGE": "Ignoring XML background
    'file:///usr/share/phosh/backgrounds/logo.xml'",
  "CODE_FILE": "../src/lockscreen.c",
  "_COMM": "phosh",
  "GLIB_DOMAIN": "phosh-lockscreen",
}
#+end_src

** Log output

- The above gives you a good idea where to look at
- Can be educating as not (yet) programmer
- Will likely give a clue if you're on the right track
- For usable log domains:

  #+begin_src
  gbp import-dsc apt:phosh
  git grep G_LOG_DOMAIN
  #+end_src

  #+begin_example
src/call-notification.c:#define G_LOG_DOMAIN "phosh-call-notification"
src/call.c:#define G_LOG_DOMAIN "phosh-call"
src/calls-manager.c:#define G_LOG_DOMAIN "phosh-calls-manager"
src/connectivity-info.c:#define G_LOG_DOMAIN "phosh-connectivity-info"
src/docked-info.c:#define G_LOG_DOMAIN "phosh-docked-info"
src/docked-manager.c:#define G_LOG_DOMAIN "phosh-docked-manager"
  #+end_example

** Crashes

*** What is a crash

  - Operating system ends the process: killed by a signal (usually SIGSEGV/11)
  - Program hits an internal assertion (SIGABRT/6)

  - Process exits (no crash dump)

** Crashes

*** What is a crash or core dump

- Memory image and stack of the process at crash time
- Debugger gets a call stack
- Needs to be enabled
- Devices like GPUs can create their own core dumps

** Crashes

- Investigate crash dumps

  again: first check the journal

  #+begin_example
  Aug 06 11:57:45 foo systemd-coredump[6148]:
      [.] Process 6092 (gnome-calculato) of user
          1000 dumped core.
  #+end_example

** Crashes

- Investigate crash dumps

  Now we're onto something

  #+begin_example
  coredumpctl list
  #+end_example

  #+begin_example
TIME                             PID  UID  GID SIG
    COREFILE EXE
Mon 2023-07-31 15:18:30 CEST  355561 1000 1000 SIGABRT
    present  /usr/bin/phoc
Mon 2023-07-31 15:18:30 CEST  355573 1000 1000 SIGABRT
    present  /usr/bin/phoc
Mon 2023-07-31 15:18:30 CEST  355576 1000 1000 SIGABRT
    present  /usr/bin/phoc
  #+end_example

- Investigate crash dump

  #+begin_example
  echo bt | coredumpctl debug 355576
  #+end_example

** Crashes

- Debug information makes the backtrace more meaningful
- Debug info packages or debuginfod
- Compile options =-Dbuilttype=debug=

** Crashes

- Demo

*** Demo this                                                      :B_note:
:PROPERTIES:
:BEAMER_env: note
:END:

First demo phosh's kill -SIGUSR1 and =G_MESSAGES_DEBUG=

** Crashes

#+begin_example
dpkg -S libglib-2.0.so.0
#+end_example

#+begin_example
echo "deb http://deb.debian.org/debian-debug/ sid-debug main"
    > /etc/apt/sources.list.d/debuginfo.list
apt update
apt install libglib2.0-0-dbgsym
#+end_example

** Crashes

- Again: Use the source

** Crashes

- In case of Segmentation faults (SIGSEGV)
  Might just be a symptom: =valgrind=

** Practical contains

- Replicate on desktop
- Run phosh nested on desktop
  [[https://phosh.mobi/posts/phosh-dev-part-0/][(Phosh Nested blog post]])

** Practical concerns

- Get yourself non-osk access

  #+begin_example
  apt install ssh
  systemctl disable ssh
  #+end_example

  When needed

  #+begin_example
  systemctl start ssh
  systemctl stop ssh
  #+end_example

** Practical concerns

- Use usb-gadget

  Connecting via USB to laptop should give you a shell:

  #+begin_example
  Bus 001 Device 010: ID 1d6b:0104 Linux Foundation Multifunction Composite Gadget
  #+end_example

  #+begin_example
  screen /dev/ttyACM0 115200
  #+end_example

** Practical concerns

- Capturing early boot output: Serial console

  #+ATTR_LATEX: :height 0.4\textheight
  [[./img/breakout_board_2.png]]

*** Mast have for suspend/resume debugging                         :B_note:
:PROPERTIES:
:BEAMER_env: note
:END:

** What else

- Tracing (e.g. systemtap)
- /sys/kernel/debug/tracing

  #+begin_example
  echo '1' > /sys/kernel/debug/tracing/\
                    events/tps6598x/enable
  echo '1' > /sys/kernel/debug/tracing/tracing_on
  cat /sys/kernel/debug/tracing/trace_pipe
  #+end_example

*** Allows you to e.g. trace USB PD events                         :B_note:
:PROPERTIES:
:BEAMER_env: note
:END:

* Reporting Bugs

** Use the bug tracker

Keeps more consistent record than chat. Chat might be good to get an idea
where to file the bug.

** Avoid

- Dismissive wording (annoying, totally broken, …)
  (Developers are humans too, assume good faith)

- This is wrong because xyz makes it differently.
  It has to be done because foo does the same

** Avoid

- plain: I'm seeing this too

  #+BEAMER: \pause
  Add device information, what varies from what the original reporter
  experienced. Add the details you figured out.

** Examples

#+begin_quote
After updating gnome-calls from *version* to *version* on my foo phone
it crashes when I do *x*. This is the backtrace.
#+end_quote


** What baseline information can I provide?

- Device
- Operating System and Version
- Software versions of the relevant components

** What baseline information can I provide?

*** phosh-mobile-settings

#+ATTR_LATEX: :height 0.7\textheight
[[./img/p-m-s-troubleshoot.png]]

** How to validate the fix?

- Build yourself.
  (Might be easier than you think thanks to meson)
- Development builds
- Ask your distro maintainers for help
  (they might already provide nightly builds somewhere)

** Summary

- Bug chasing can be exciting
- Use the logs
- Try to identify affected component
- Look at crash/core dumps
- There's usually always more information
- Provide baseline information

* Thanks

** Contact
- Mail: <agx@sigxcpu.org>
- Fediverse: agx@librem.one
- Matrix: @agx:librem.one
- IRC OFTC: agx

* Appendix							 :B_appendix:
  :PROPERTIES:
  :BEAMER_env: appendix
  :END:

** Image/Video references

- cc-by-sa-3.0:
  - Screenshots: myself using https://gitlab.gnome.org/World/Phosh/phosh
  - Rubberduck: Tom Morris https://commons.wikimedia.org/wiki/File:Rubber_duck_assisting_with_debugging.jpg
- Public Domain
  - Window 9X BSOD: Akhristov

** Links

 - Michael Prokop's: Debugging for Sysadmin's talk: https://media.ccc.de/v/glt23-334-debugging-fr-sysadmins
 - Phosh Nested blog post: https://phosh.mobi/posts/phosh-dev-part-0/