=============================================================================== "defragment.txt" CONTENTS (last updated 24 June 2006) =============================================================================== How to determine if POPFile's SQLite database is fragmented POPFile "Open Discussion" Forum Thread How to defragment the SQLite 2.x database used by POPFile Download Links =============================================================================== How to determine if POPFile's SQLite database is fragmented =============================================================================== The POPFile SQLite Database Analyser Utility's report shows many technical details, almost all of which mean nothing to me. However this report makes it easy to determine whether or not POPFile's SQLite 2.x database is fragmented. The SQLite 2.x database file is organized as a series of pages. Normally there are three types of page in the database file: the header page, pages used to store the data and the "freelist" pages. When items are deleted from the database, for example when messages are deleted from the HISTORY page in the UI, the database does not shrink; instead, the pages which held the deleted data are added to the freelist so they can be reused later on. This makes the database file larger than it needs to be, but this can speed up subsequent database inserts. However eventually inserts and deletes can leave the database file structure fragmented and this slows down disk access to the database contents. When this happens the database can be defragmented to improve things. Another problem is that in some (rare?) cases the database file can contain some pages which do not belong to any of the previously mentioned three types of page. These "orphaned" pages are wasted space: they make the database file bigger than it should be and since they are not included in the "freelist" SQLite cannot reuse them. The POPFile SQLite Database Status Check utility's report will list any orphaned pages found, e.g. "Page 21830 is never used". The first few lines of the "Disk-Space Utilization Report" produced by the SQLite Database Analyser indicate how much unused space (freelist and orphaned pages) is present in the database file. This can be used as a simple indication of database fragmentation. There are five key values to be considered when determining whether or not a SQLite 2.x database is fragmented: A - the number of pages in the whole file B - the number of pages that store data C - the number of free pages (the freelist) D - the number of header pages (normally 1) E - the number of orphaned pages, E = A - (B + C + D) The "Pages on the freelist (calculated)" entry in the analysis report contains a convenient percentage value which can be used as a guesstimate of the amount of fragmentation (Fragmentation Guesstimate Percentage, FGP): Example 1: A database which is not fragmented: Pages in the whole file (measured).... 2005 Pages in the whole file (calculated).. 2005 Pages that store data................. 2004 99.950% Pages on the freelist (per header).... 0 0.000% Pages on the freelist (calculated).... 0 0.000% Header pages.......................... 1 0.050% A = 2005 (B + C + D) = (2004 + 0 + 1) = 2005 E = A - (B + C + D) = 2005 - 2005 = 0 FGP = 0.000% Example 2: A database which is slightly fragmented: Pages in the whole file (measured).... 3084 Pages in the whole file (calculated).. 3084 Pages that store data................. 3013 97.7% Pages on the freelist (per header).... 70 2.3% Pages on the freelist (calculated).... 70 2.3% Header pages.......................... 1 0.032% A = 3084 (B + C + D) = (3013 + 70 + 1) = 3084 E = A - (B + C + D) = 3084 - 3084 = 0 FGP = 2.3% Example 3: A database which includes orphaned pages: Pages in the whole file (measured).... 34848 Pages in the whole file (calculated).. 34725 Pages that store data................. 33796 97.0% Pages on the freelist (per header).... 928 2.7% Pages on the freelist (calculated).... 1051 3.0% Header pages.......................... 1 0.003% A = 34848 (B + C + D) = (33796 + 928 + 1) = 34725 E = A - (B + C + D) = 34848 - 34725 = 123 FGP = 3.0% Having some free space in the database can speed up inserts so having a non-zero FGP is not always a bad thing. However having a non-zero number of unused pages is a bad thing (I think). Suggested rules: (1) Always defragment the database when there are orphaned pages, i.e. when E = A - (B + C + D) is non-zero (2) Defragment the database when the FGP is "too high" Unfortunately I have no idea what limit to use for the FGP and it is possible that using rule (2) is not a very good idea. Further research is required, especially with systems using larger databases and message loads. Replacing rule (2) with one which examines "table fragmentation" might be a better approach but at present I don't know how to determine "table fragmentation". =========================================================== POPFile "Open Discussion" Forum Thread =========================================================== These notes were produced in response to the "vacuum of PF db and performance" thread in POPFile's "Open Discussion" forum: http://sourceforge.net/forum/forum.php?thread_id=1513567&forum_id=213099 If you have any comments or suggestions for improvements, please contribute to this thread. =============================================================================== How to defragment the SQLite 2.x database used by POPFile =============================================================================== The database can be defragmented by using the VACUUM command in the SQLite command-line utility. This command cleans the database by copying its contents to a temporary database file and reloading the original database file from the copy. This eliminates free pages, aligns table data to be contiguous, and otherwise cleans up the database file structure. The Windows installer for POPFile creates a shortcut in POPFile's 'User Data' folder which makes it easy to do this. If you are not sure of the 'User Data' folder's location, the installer creates a shortcut which will display the location for you: Start -- Programs -- POPFile -- Support -- PFI Diagnostic utility (simple) Before you use the "Run SQLite utility" shortcut, it is recommended that you make a backup copy of the database file (the default filename is 'popfile.db') in case anything goes wrong! Double-clicking the "Run SQLite utility" shortcut in the 'User Data' folder will open a DOS-box like this: SQLite version 2.8.16 Enter ".help" for instructions sqlite> Enter the command VACUUM followed by a semi-colon then press the ENTER key. You can use uppercase or lowercase for the VACUUM command. It may take a few seconds (or longer if the database is very big) but soon the "sqlite>" prompt should reappear, as shown here: SQLite version 2.8.16 Enter ".help" for instructions sqlite> VACUUM; sqlite> Now enter the command .q (that is a full-stop followed by a lowercase letter Q) and press the ENTER key to exit from the utility and close the DOS-box: SQLite version 2.8.16 Enter ".help" for instructions sqlite> VACUUM; sqlite> .q =========================================================== Download Links =========================================================== POPFile SQLite Database Analyser for Windows (approx 700 KB): http://www.sugelan.co.uk/popfile/download/test/pfidbanalyser.zip SQLite Database Analyser for Linux (binary, approx 1.3 MB): http://www.sqlite.org/sqlite_analyzer-2.8.16.bin.gz Official SQLite download page (for source and binaries): http://www.sqlite.org/download.html =========================================================== Brian Smith My POPFile Utilities download page: http://www.sugelan.co.uk/popfile/utilities.html =========================================================== (end)