Difference between revisions of "chm"

Revision as of 11:10, 31 March 2020

Overview

The FPC CHM package is a set of units, distributed with FPC 2.2.2 and newer, for reading and writing .chm compressed HTML help files on multiple platforms.

There are two projects that use the CHM package:

  1. the textmode IDE (FPC 2.2.4+)
  2. the Lazarus chmhelp package (which features lhelp, a separate help viewer, connected over TCP/IP)

The original author is Andrew Haines, but some parts are by others (especially Lars/z505).

The writing side is mostly used in combination with fpdoc, a Doxygen-like documentation tool, and chmcmd, the command-line CHM compiler.

It is expected that in the next FPC release, CHM will replace the doc-html archive with its thousands of separate HTML files.

The package is pure Pascal, and thus portable in principle:

  • The package is prepared for different endianness, but has not been extensively tested on big-endian systems.
  • No testing has been done on systems that raise an exception on unaligned memory access (like arm-wince). The reader parts in particular should be made aligned() clean.
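
Endianness matters because the fixed-size integer fields in the CHM file header are stored little-endian, whatever the host CPU is. As a rough illustration (in Python rather than Pascal, and not taken from the package itself), the sketch below decodes the first three header fields with an explicit byte order, so it behaves the same on little- and big-endian machines. The function name and the sample bytes are invented for the example; the field layout is assumed from the unofficial CHM specification.

```python
import struct

def parse_itsf_prefix(data: bytes) -> dict:
    """Decode signature, version and header length from the start of a CHM file."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes")
    # "<" forces little-endian decoding, independent of the host CPU.
    signature, version, header_len = struct.unpack_from("<4sII", data, 0)
    if signature != b"ITSF":
        raise ValueError("not a CHM file (missing ITSF signature)")
    return {"version": version, "header_len": header_len}

# Hand-made sample bytes: "ITSF", version 3, header length 0x60.
sample = b"ITSF" + (3).to_bytes(4, "little") + (0x60).to_bytes(4, "little")
info = parse_itsf_prefix(sample)
print(info)
```

Reading byte by byte or through an explicit format string like this also sidesteps unaligned access, which is the second concern in the list above.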

The package also ships two example programs:

  1. chmls
  2. chmcmd (see below)

Package status

The basic system has worked since FPC 2.2.4. However, keep in mind that the FPC help systems, while massive (30000+ web pages in the various CHMs), are generated, compiled and read mostly by FPC tools (with an occasional test in Windows/gnochm/kchmviewer to see if they open properly).

If you want to use FPC's CHM package/compiler for your own projects, take the following advice into consideration:

  • Work on data that is backed up. I haven't lost data yet, but better be safe than sorry.
  • Don't expect it to work right away.
  • Keep in mind what is implemented and what is not (see below)
  • Report bugs in the FPC tracker with a reproducible description (and, if you are in a hurry, with a patch :-)

Units

  1. chmbase - some structures, constants and helper functions (compare and compression streaming helpers)
  2. chmfiftimain - the unit that is responsible for reading and writing the search index of CHMs (TChmSearchReader, TChmSearchWriter)
  3. chmfilewriter - TChmProject, a class that describes a CHM project in the form of an XML description. Can be used to generate CHMs easily.
  4. chmreader - the base CHM Reader
  5. chmsitemap - Support for sitemaps (TOC, i.e. table of contents, and index). The unit helps transform these XML files into collections.
  6. chmspecialfiles - Streaming helpers for special index files in the chm
  7. chmtypes - CHM helper class types.
  8. chmwriter - the base CHM Writer.
  9. fasthtmlparser - Base skeleton of an HTML parser.
  10. htmlindexer - Contains classes that TChmWriter uses to parse and index the contents of HTML files for searching
  11. htmlutil - helper functions. Mostly getting info out of HTML/XML tags.
  12. lzxcompressthread - wrapper around paslzxcomp that will do compression in multiple threads
  13. paslznonslide - (de)compression routines
  14. paslzxcomp - (de)compression routines
  15. paslzx - (de)compression routines

Files

This describes the FPC 2.6+ situation.

Besides the library units, there are also two end user programs:

  • chmls - lists, extracts and unblocks CHM files (unblocking is needed on Windows XP SP2+ and presumably newer Windows versions)
  • chmcmd - creates a CHM from an XML file made with the TChmProject class. Roughly equivalent to a simple help compiler. Experimental support for .hhp is also available.

chmcmd is, like the library itself, more a backend tool than a user-friendly tool. It is unforgiving if your files are inconsistent (it will throw exceptions on missing files), and does not support goodies like autoindexing. It is probably only suitable for backend jobs where you ensure the input is consistent.
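
Since chmcmd stops with an exception on missing files, one way to ensure consistent input is to check the file list before invoking it. The Python sketch below is only an illustration of such a pre-flight check; `missing_inputs` is a hypothetical helper, not part of the package, and the file names are placeholders.

```python
from pathlib import Path
import tempfile

def missing_inputs(base_dir, relative_names):
    """Return the names from relative_names that do not exist under base_dir."""
    base = Path(base_dir)
    return [name for name in relative_names if not (base / name).is_file()]

# Small self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "index.html").write_text("<html></html>")
    demo_missing = missing_inputs(tmp, ["index.html", "intro.html"])
print(demo_missing)
```

A build script could refuse to start the compiler whenever the returned list is not empty.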

chmls

Help output of chmls (FPC trunk, September 2014):

chmls, a CHM utility. (c) 2010 Free Pascal core.

Usage: chmls [switches] [command] [command specific parameters]

Switches :
 -h, --help     : this screen
 -p, --no-page  : do not page list output
 -n,--name-only : only show "name" column in list output

Where command is one of the following or if omitted, equal to LIST.
 list       <filename> [section number]
            Shows contents of the archive's directory
 extract    <chm filename> <filename to extract> [saveasname]
            Extracts file "filename to get" from archive "filename",
            and, if specified, saves it to [saveasname]
 extractall <chm filename> [directory]
            Extracts all files from archive "filename" to directory
            "directory"
 unblockchm <filespec1> [filespec2] ..
            Mass unblocks (XPsp2+) the relevant CHMs. Multiple files
            and wildcards allowed
 extractalias <chmfilename> [basefilename] [symbolprefix]
            Extracts context info from file "chmfilename"
            to a "basefilename".h and "basefilename".ali,
            using symbols "symbolprefix"contextnr
 extracttoc <chmfilename> [filename]
            Extracts the toc (mainly to check binary TOC)
 extractindex <chmfilename> [filename]
            Extracts the index (mainly to check binary index)
 printidxhdr <chmfilename>
            prints #IDXHDR in readable format
 printsystem <chmfilename>
            prints #SYSTEM in readable format
 printwindows <chmfilename>
            prints #WINDOWS in readable format
 printtopics <chmfilename>
            prints #TOPICS in readable format
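
The commands in the help text can be driven from a script. The Python sketch below only assembles argument lists in the order given by the usage line (switches, then command, then parameters). `chmls_args` is a hypothetical helper and `docs.chm` and `out_dir` are placeholder names; the tool itself is only called when a chmls binary is on the PATH and the placeholder file exists.

```python
import os
import shutil
import subprocess

def chmls_args(command, *params, name_only=False):
    """Build an argument list following 'chmls [switches] [command] [parameters]'."""
    args = ["chmls"]
    if name_only:
        args.append("-n")  # switch from the help text: only show the "name" column
    args.append(command)
    args.extend(params)
    return args

list_cmd = chmls_args("list", "docs.chm", name_only=True)
extract_cmd = chmls_args("extractall", "docs.chm", "out_dir")
print(list_cmd)
print(extract_cmd)

# Only run the tool when it is installed and the placeholder file exists.
if shutil.which("chmls") and os.path.exists("docs.chm"):
    subprocess.run(list_cmd, check=False)
```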

Missing features

  • merged chm support/ #idxhdr
    • Combining of indexes of multiple chm files. (.CHW/.CHI?)
    • slave file support ? (making slave CHMs, and getting them out, will make for easy master/toc CHMs later)
  • Writing of binary indexes is multiple level, loading not. (???)
  • Expand TCHMProject to support
    • aliases. 2.4.2+ (via .hhp in 2.4.4+)
    • control over textual/binary toc generation, textual/binary index generation etc from the XML/project (2.4.2+)
    • also support .ini (MS helpcompiler compat?) (2.4.4+)
    • lcid/language (?)
    • Window list (in progress, probably 2.4.4+)
  • Alinks
  • auto index/toc(?) generation
  • (popup) comments
  • samples (fpdoc?) Make samples extractable, MSDN style.
  • exclusion lists (stoplists) for word indexing.
    • Why is an MS-generated index larger even though we also index words like "the" and "of"? Probably the parser doesn't see all text.
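
To make the stoplist idea concrete: an exclusion list simply drops very common words before they reach the full-text index. The Python sketch below is illustrative only. It is not how htmlindexer works, and the stoplist entries and the function name are invented for the example.

```python
import re

STOPLIST = {"the", "of", "and", "a", "to"}  # example entries only

def index_words(text, stoplist=STOPLIST):
    """Return the sorted set of lower-cased words in text that are not on the stoplist."""
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    return sorted({w for w in words if w not in stoplist})

indexed = index_words("The size of the index and the speed of the search")
print(indexed)
```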

Implemented features

  • Full text search
  • Binary Table of contents
  • Binary Index (experimental, completely rewritten in 3.2.0+)
  • Experimental threaded LZX Compression
  • context ids.
  • threaded compilation (requires recompile, see below)
  • #WINDOWS support
  • toc and indexfile are not always "default.hh[c/k]"
  • chmcmd and chmls totally revised, and chmcmd now also compiles .hhps.

Disable binary toc/index generation

Generating binary versions of the TOC and/or index can be disabled in fpdoc with --no-bintoc and --no-indextoc respectively.

Note that these options were not added to save space, but to be able to temporarily not generate them if some viewer has a problem with them, and refuses to fall back to default.

kchmviewer loads CHMs with a binary index and TOC noticeably faster.

How to enable threaded LZX Compression

If you want to test multithreaded chm compression add -dLZX_USETHREADS when compiling the package (or the whole snapshot).

The program using the package (for example fpdoc) must have cthreads added to its uses section on *nix.

On my Core2 1.73GHz laptop this reduced the time from 2 min 57 s to 2 min 7 s, in other words a reduction by slightly less than a third.
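
The size of that reduction can be checked with a few lines of arithmetic (Python used here purely as a calculator; the variable names are arbitrary):

```python
before_s = 2 * 60 + 57   # 2 min 57 s without threads
after_s = 2 * 60 + 7     # 2 min 7 s with threaded compression

saved_s = before_s - after_s
reduction = saved_s / before_s
print(f"saved {saved_s} s, reduction {reduction:.1%}")
```

That is 50 seconds out of 177, or about 28%, which is somewhat below one third.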

It is unknown how to configure the number of parallel threads.

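See also

  • Textmode IDE development#Chm Support
  • Lazarus chmhelp component
  • CheckHelpLinks (http://www.codeproject.com/Articles/7997/Check-Help-Links-Tool): tool that displays links etc. in CHM files; open the CHM (Fast mode is suggested to avoid crashes) and use File/Work to check links
  • Unofficial CHM file specs (http://www.nongnu.org/chmspec/latest/)
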
Go back to Packages List