Baloo/Debugging: Difference between revisions

From KDE Community Wiki
 
Latest revision as of 01:50, 29 November 2023

Baloo is responsible for searching through files. Because every system is different and different kinds of files can produce different results, problems with Baloo can occur. This page helps you work out what exactly is going on and how to report the issue appropriately.

If you're in doubt, and not sure what is going on, please just file a bug and provide whatever information you feel may be appropriate. The developers will ask for more information and provide steps for you to follow.

Seeing debug output

The baloo and kfilemetadata code logs informative messages that may be useful, using Qt's logging facility; only the categories you enable are printed. One way to turn on this logging (there are many others) is to create or modify the file $HOME/.config/QtProject/qtlogging.ini and add the following lines:

[Rules]
kf.filemetadata=true
kf.baloo=true

Then restart baloo and/or balooctl monitor, and these messages will appear wherever log messages go on your system.
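
The edit to qtlogging.ini can also be scripted. The following is a minimal sketch, not an official tool: the function name enable_baloo_logging is made up for illustration, and it assumes the file either does not exist yet or has no [Rules] section (if it has one, add the two lines by hand instead).

```shell
# Hypothetical helper: append the Baloo logging rules to a qtlogging.ini file.
# Assumption: the file has no [Rules] section yet; otherwise edit it manually.
enable_baloo_logging() {
    ini="${1:-$HOME/.config/QtProject/qtlogging.ini}"
    mkdir -p "$(dirname "$ini")"
    if [ -f "$ini" ] && grep -q '^\[Rules\]' "$ini"; then
        echo "$ini already has a [Rules] section; add the rules manually" >&2
        return 1
    fi
    # Same three lines as shown above.
    printf '[Rules]\nkf.filemetadata=true\nkf.baloo=true\n' >> "$ini"
}
```

Calling enable_baloo_logging without arguments writes to the default path named above; pass another path as the first argument to try it on a scratch file first.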

Baloo is consuming too much CPU / RAM / Disk IO?

The Baloo Project is responsible for many different parts of KDE. The developers need to know exactly which component is problematic. It would be best if you checked what the offending process is called.

KSysGuard and Plasma System Monitor should be able to tell CPU, RAM and disk usage for baloo processes, but KSysGuard requires you to manually add disk usage columns.
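
If you prefer a terminal, the process name can also be read from ps. This is only a sketch: the function names filter_baloo and baloo_usage are made up for illustration, and ps reports CPU and memory only, so disk IO still has to be checked in one of the graphical monitors.

```shell
# Keep the header line plus every row whose command name starts with "baloo".
filter_baloo() {
    awk 'NR == 1 || $4 ~ /^baloo/'
}

# List PID, CPU %, memory % and command name of all processes, then filter.
baloo_usage() {
    ps -eo pid,pcpu,pmem,comm | filter_baloo
}
```

Running baloo_usage prints one line per Baloo process; the last column is the name the developers need.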

baloo_file

This process is responsible for scheduling the indexing of files and saving the name of each file. If this process is consuming too much CPU or disk IO, it's probably due to the initial indexing. Depending on your hard drive speed, baloo_file can handle somewhere between 100 and 1000 files per second, so depending on how many files you have, the initial indexing could take a couple of minutes. It is best to simply wait for it to finish.

You can check which files are currently being scanned with balooctl monitor and the current indexing status with balooctl status.

If you feel that you have been waiting for quite some time, and the baloo_file process is still consuming too much CPU or disk IO, please file a bug.

Baloo does not find a file

If baloosearch or Dolphin Search > Content does not find a file that should be indexed, the rough steps to take in a terminal are:

  • balooctl status - is Baloo running?
  • balooshow -x /path/to/file to see if Baloo knows about the file
  • Check your Baloo configuration to make sure you have directed Baloo to index or not index that directory
  • See if Baloo indexes a simple file in the same location:
    • Run balooctl monitor
    • Create a simple file, e.g. echo balooTestMe > /path/to/testfile.txt
    • See if testfile.txt is indexed (baloosearch balooTestMe, then try the same steps as above)
  • If Baloo has not indexed the file, tell it to reindex with balooctl index /path/to/file
  • Check the mime type of the file with kfilemeta
  • Turn on debug info as described under "Seeing debug output" above.
  • Clear the file from the index with balooctl clear /path/to/file and re-index it.
  • ...
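
The first steps of this checklist can be wrapped in two small shell functions. This is a sketch under stated assumptions: the function names baloo_check_file and baloo_probe_dir are made up, and the default wait of 5 seconds before searching for the test file is a guess, not a documented indexing delay.

```shell
# Hypothetical helper: is Baloo running, and does it know about the file?
baloo_check_file() {
    file=$1
    echo "== balooctl status =="
    balooctl status
    echo "== balooshow -x $file =="
    balooshow -x "$file"
}

# Hypothetical helper: create a simple test file in a directory and search for it.
# $1 = directory to probe, $2 = seconds to wait before searching (assumed default: 5).
baloo_probe_dir() {
    dir=$1
    echo balooTestMe > "$dir/testfile.txt"
    sleep "${2:-5}"
    baloosearch balooTestMe
}
```

For example, baloo_check_file /path/to/file followed by baloo_probe_dir /path/to covers the status check, the balooshow check and the simple-file test; keep balooctl monitor running in another terminal while you do this.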

balooshow

balooshow prints debug information about a file.

   $ balooshow /home/vishesh/file.jpg
   4 /home/vishesh/file.jpg
           Width: 713
           Height: 955
           Photo X Dimension: 713
           Photo Y Dimension: 955

If baloosearch with your search terms does not find a file that should match, run balooshow -x /path/to/file on the file that Baloo does not return. It should print information similar to the output above, including the file information and the terms in the file that Baloo should have indexed. If no information is printed, please check the following:

  • The file is not in Baloo's list of exclude folders
  • The baloo_file process is running.

Find by filename

balooshow -x displays additional information, including the terms that Dolphin's Find > by Filename search matches. For example,

 $ balooshow -x path/to/日本国_déjà_γενεος.txt
 ...
 File Name Terms: Fdeja Ftxt Fγενεος deja txt γενεος

If you type more than one character in the Dolphin file browser's find by Filename field, it searches these terms anchored at the start, thus searching for "de", "γε", etc. should return this file.
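
To make "anchored at the start" concrete, here is a small illustration in shell. It is not Baloo's implementation, only a sketch of prefix matching; the function name matches_filename_terms is made up.

```shell
# Illustration only: succeed if the query is a prefix of at least one term.
# $1 = query typed by the user, remaining arguments = file name terms.
matches_filename_terms() {
    query=$1
    shift
    for term in "$@"; do
        case $term in
            "$query"*) return 0 ;;   # term starts with the query
        esac
    done
    return 1
}
```

With the terms from the example, matches_filename_terms de deja txt γενεος succeeds, while matches_filename_terms ja deja txt γενεος fails because "ja" only occurs in the middle of a term.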

baloosearch

The Baloosearch tool can be used to perform simple searches.

 $ baloosearch fire
   77040 /home/vishesh/kde5/src/qt5/qtimageformats/tests/shared/images/mng/fire.mng
   237751 /home/vishesh/Videos/Catching Fire Epic Review Special Edition.mp4
   237788 /home/vishesh/Videos/The Hunger Games Catching Fire (2013) [1080p]
   237790 /home/vishesh/Videos/fire.mp4
   53907 /home/vishesh/kde5/src/qt5/qtwebkit/Source/WebCore/html/MediaController.cpp
   54051 /home/vishesh/kde5/src/qt5/qtwebkit/Source/WebCore/html/HTMLMediaElement.cpp
   52257 /home/vishesh/kde5/src/qt5/qtwebkit/Source/WebCore/platform/ThreadTimers.cpp
   52867 /home/vishesh/kde5/src/qt5/qtwebkit/Source/WebCore/loader/NavigationScheduler.cpp
   53568 /home/vishesh/kde5/src/qt5/qtwebkit/Source/WebCore/html/track/TrackListBase.cpp
   52255 /home/vishesh/kde5/src/qt5/qtwebkit/Source/WebCore/platform/Timer.cpp

You should try searching for the file with this command.

If the file is not found even though it is indexed, then please file a bug with the exact search term, and the relevant words in the file.

General testing procedure

This procedure gives you a convenient way to check whether baloo is experiencing issues, and it produces files to attach to your bug reports. It assumes you have a single file that has failed indexing for whatever reason.

  • Open two Konsole windows and tile one to the left (A) and the other to the right (B);
  • On Konsole A, run balooctl monitor | tee -a baloomon.txt and leave it running; by the end of this procedure it will contain a log of a full index;
  • On Konsole B, run balooctl status

You should get output like the following if everything has already been indexed and one file failed:

Baloo File Indexer is running
Indexer state: Idle
Total files indexed: 1.000
Files waiting for content indexing: 0
Files failed to index: 1
Current size of index is 10,00 MiB

  • If you do have a failed file, run balooctl failed | tee -a baloofailed.txt to show which file has failed on your terminal and append it to a file named baloofailed.txt. Then, manually check the file path and run balooshow /path/to/faulty/file | tee -a baloofailed.txt to append the debug information about the file;
  • On Konsole B, run balooctl purge;
  • Watch the monitor output on Konsole A for anything suspicious, such as the indexer stalling at the failing file;
  • By the end, on Konsole B, run balooctl status again, and on Konsole A cancel the monitoring with Ctrl+C.

Is the file still failing to index? Then the issue is consistently reproducible; please report it at https://bugs.kde.org.

If the file is now indexing correctly, then the issue is not consistently reproducible, but it might still indicate something to be worked on and may require further reporting steps. Please report the issue at https://bugs.kde.org and follow any instructions the developers give you.

Remember to attach the generated files on your bug report!
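
The two commands that fill baloofailed.txt can be combined into one function. This is a sketch only: the function name collect_baloo_report is made up, and the path of the faulty file still has to be passed in by hand after reading the output of balooctl failed.

```shell
# Hypothetical helper: append the list of failed files and the balooshow
# output for one faulty file to a log file, while also showing it on screen.
# $1 = path of the faulty file, $2 = log file (default: baloofailed.txt).
collect_baloo_report() {
    faulty=$1
    log="${2:-baloofailed.txt}"
    {
        balooctl failed
        balooshow "$faulty"
    } | tee -a "$log"
}
```

For example, collect_baloo_report /path/to/faulty/file leaves a baloofailed.txt in the current directory, ready to attach.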

Note

If you have manually enabled the new systemd initialization, then you may also use journalctl --unit plasma-baloorunner --output=cat > baloojournal.txt to create a log file of baloo.


Indexing a file directly

Using the above technique you can copy or modify a problematic file and watch for errors. You can also run balooctl clear /path/to/problemfile followed by balooctl index /path/to/problemfile to force a reindex and possibly see additional debug output.
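
As a small convenience, the two commands can be chained so that the reindex only runs if the clear succeeded. The function name baloo_reindex is made up for illustration.

```shell
# Hypothetical helper: clear one file from the index, then index it again.
baloo_reindex() {
    balooctl clear "$1" && balooctl index "$1"
}
```

Run baloo_reindex /path/to/problemfile while balooctl monitor is open in another terminal to watch what happens.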

Determining content indexing duration

Here's a simple bash script that purges the index and then repeatedly prints and logs the relevant lines of balooctl status together with the current time, so that you can determine how fast your files get indexed:

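#!/bin/bash
# Purge the index so that indexing starts from scratch.
balooctl purge &>/dev/null
sleep 2
# Log the two status lines of interest, discarding balooctl's error output.
balooctl status 2>/dev/null | grep -B1 "indexing" | tee balootime.txt
date "+%T" | tee -a balootime.txt
# Repeat every two seconds until interrupted with Ctrl+C.
while sleep 2; do
    balooctl status 2>/dev/null | grep -B1 "indexing" | tee -a balootime.txt
    date "+%T" | tee -a balootime.txt
done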

What it does:

  • Purges the database
  • Keeps only the lines "Total files indexed" and "Files waiting for content indexing" from balooctl status, prints them, writes them to a file named balootime.txt, and discards error messages
  • Prints that and the current time continuously every two seconds

Since the script runs in an endless loop, you'll need to press Ctrl+C once "Files waiting for content indexing" reaches 0. If you have over 10,000 files, you may want to change the loop interval to between 5 and 15 seconds instead.
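
If you would rather not watch for the 0 yourself, a variation can stop on its own. This is a sketch under assumptions: the function name wait_for_content_indexing is made up, it relies on the "Files waiting for content indexing: 0" line shown in the status output earlier on this page, and it will loop forever if Baloo is not running, so call it only after the purge and the initial sleep.

```shell
# Hypothetical helper: poll balooctl status until no files are waiting for
# content indexing, then print how long that took.
# $1 = polling interval in seconds (default: 2).
wait_for_content_indexing() {
    interval="${1:-2}"
    start=$(date +%s)
    until balooctl status 2>/dev/null | grep -q 'Files waiting for content indexing: 0$'; do
        sleep "$interval"
    done
    end=$(date +%s)
    echo "Content indexing finished after $((end - start)) seconds"
}
```

The reported duration is only as precise as the polling interval.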

For a general comparison, on a machine with an SSD and 1559 files to be indexed, immediately after a purge, 4 seconds are required to index all files (going from 0 to 1559 instantly) and 35 seconds are required to index the content of all files. These values might be different for you depending on your machine's specs, the number of total files, and their format.

In case you think your indexing is taking more time than it should, please refer to the above general testing procedure to determine whether you should file a bug report.