Downloading wiki.mozilla.org Web Pages for Offline Browsing
In our 2005-2006 FIPS validation, we prepared the documentation using wiki.mozilla.org. Our documentation is a collection of web pages that don't form a directory hierarchy, so we can't describe their locations with a simple sentence like "the documentation is under this directory." Furthermore, unlike www.mozilla.org, where we can check out master copies of the web pages from CVS, the only way to get copies of the web pages on wiki.mozilla.org is to download them. This is a problem because we have to submit a copy of our documentation to the testing lab on a CD-ROM.
Fortunately, there are many tools, called offline browsers, that download web pages for local browsing, and they are inexpensive (under $50) or free. I tested an open source one called HTTrack. After a few hours of experimenting with its options, I got it to work. Here are the instructions.
Instructions
- Download HTTrack from http://www.httrack.com/.
- Install HTTrack.
- Start up HTTrack.
- Click the Next button to start a new project.
- Name the project NSSCryptoModuleSpec, and click Next.
- Add the URL: http://wiki.mozilla.org/NSSCryptoModuleSpec.
- Under "Preferences and mirror options", click "Set options..."
- Select the Scan Rules tab.
- Copy and paste the following rules under the default rule:
  +wiki.mozilla.org/*
  -http://wiki.mozilla.org/index.php?title=Special:Userlogin*
  -http://wiki.mozilla.org/Main_Page
  -http://wiki.mozilla.org/MozillaWiki:*
  -http://wiki.mozilla.org/Special:*
  -http://wiki.mozilla.org/Help:Contents
  -http://wiki.mozilla.org/index.php?title=*&action=history
  -http://wiki.mozilla.org/index.php?title=*&action=edit*
  -http://wiki.mozilla.org/index.php?title=*&action=watch
  -http://wiki.mozilla.org/NSS
  -www.mozilla.org/*
  +http://www.mozilla.org/css/*
  +http://www.mozilla.org/images/*
  +http://www.mozilla.org/projects/security/pki/nss/fips/nss-source/*
  +http://www.mozilla.org/projects/security/pki/nss/fips/audit-design.html
  +http://www.mozilla.org/projects/security/pki/nss/fips/secpolicy.pdf
  +http://www.mozilla.org/projects/security/pki/nss/overview.html
  +http://www.mozilla.org/projects/security/pki/nss/intro.html
  +http://www.mozilla.org/projects/security/pki/nss/nss-guidelines.html
  +http://www.mozilla.org/projects/security/pki/nss/devel/pk11wrap.pdf
  +http://www.mozilla.org/projects/security/pki/nss/pcertdb.html
  +http://www.mozilla.org/projects/security/pki/nss/*.gif
  -developer.mozilla.org/*
  +http://developer.mozilla.org/en/docs/skins/*
  +http://developer.mozilla.org/favicon.ico
  +http://developer.mozilla.org/css/*
  +http://developer.mozilla.org/en/docs/index.php?title=-&action=raw&smaxage=0&gen=js
  +http://developer.mozilla.org/en/docs/index.php?title=MediaWiki:Cavendish.css&action=raw&ctype=text/css&smaxage=18000
  +http://developer.mozilla.org/en/docs/index.php?title=-&action=raw&gen=css&maxage=18000
  +http://developer.mozilla.org/en/docs/FC_*
  +http://developer.mozilla.org/en/docs/NSS_reference*
  +http://developer.mozilla.org/en/docs/PKCS11_Module_Specs
- Select the Flow Control tab.
- Set "TimeOut(s)" to the maximum 1200 s and "Retries" to the maximum 3, and click OK.
- Click Next, and then click Finish.
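For repeat runs, the same project can probably be reproduced with HTTrack's command-line program, httrack, instead of the GUI wizard. The sketch below is a hedged illustration, not a tested recipe. It assumes the scan rules above are saved one per line in a file (the name scan_rules.txt is hypothetical, as are the helper names), and that httrack's -O (output directory), -T (timeout in seconds) and -R (number of retries) options behave as listed by `httrack --help`; check that output for your version. The script only prints the command so you can inspect it before running it.

```python
from __future__ import annotations

import shlex
import sys
from pathlib import Path


def load_rules(path: str) -> list[str]:
    """Read scan rules from a text file, one rule per line, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_httrack_command(url: str, out_dir: str, rules: list[str],
                          timeout: int = 1200, retries: int = 3) -> list[str]:
    """Return an argv list for a command-line httrack run (hypothetical helper).

    Assumed options: -O output directory, -T<seconds> timeout, -R<n> retries.
    Scan rules are passed through unchanged as extra arguments.
    """
    for rule in rules:
        if not rule.startswith(("+", "-")):
            raise ValueError(f"scan rule must start with '+' or '-': {rule!r}")
    return ["httrack", url, "-O", out_dir, f"-T{timeout}", f"-R{retries}", *rules]


if __name__ == "__main__":
    # Hypothetical rules file; pass a different path as the first argument.
    rules_file = sys.argv[1] if len(sys.argv) > 1 else "scan_rules.txt"
    cmd = build_httrack_command("http://wiki.mozilla.org/NSSCryptoModuleSpec",
                                "NSSCryptoModuleSpec", load_rules(rules_file))
    # Print a shell-quoted command line; the '?' and '&' in the rules need quoting.
    print(shlex.join(cmd))
```

Printing the quoted command rather than running it directly makes it easy to compare against the GUI settings first.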
Notes on the Scan Rules
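A rule starting with '-' excludes matching URLs; a rule starting with '+' includes them.
The rules seem to be interpreted sequentially, with a later matching rule overriding an earlier one, so their ordering matters. For example, "-www.mozilla.org/*" must come before the "+http://www.mozilla.org/..." rules that re-include specific files.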
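To see why ordering matters, here is a small Python model of rule evaluation. It is a simplification under stated assumptions, not HTTrack's actual matcher: '*' is the only wildcard (HTTrack supports a richer syntax), a leading "http://" is ignored on both the rule and the URL, and the last matching rule wins. The function name decide and the four-rule subset are illustrative only.

```python
from __future__ import annotations

import re


def _strip_scheme(url: str) -> str:
    # Assumption: "+wiki.mozilla.org/*" and "+http://wiki.mozilla.org/*" match alike.
    return url[len("http://"):] if url.startswith("http://") else url


def _rule_regex(pattern: str) -> re.Pattern:
    """Translate a rule pattern into a regex: '*' matches any run of
    characters, everything else (including '?' and '&') is literal."""
    return re.compile(".*".join(re.escape(piece) for piece in pattern.split("*")))


def decide(url: str, rules: list[str]) -> bool | None:
    """Return True (include), False (exclude), or None (no rule matched).

    Every rule is checked in order; a later match overrides an earlier one.
    """
    verdict = None
    target = _strip_scheme(url)
    for rule in rules:
        sign, pattern = rule[0], _strip_scheme(rule[1:])
        if _rule_regex(pattern).fullmatch(target):
            verdict = sign == "+"
    return verdict


# Illustrative subset of the scan rules above.
RULES = [
    "+wiki.mozilla.org/*",
    "-http://wiki.mozilla.org/Main_Page",
    "-www.mozilla.org/*",
    "+http://www.mozilla.org/projects/security/pki/nss/overview.html",
]

if __name__ == "__main__":
    for url in ["http://wiki.mozilla.org/NSSCryptoModuleSpec",
                "http://wiki.mozilla.org/Main_Page",
                "http://www.mozilla.org/projects/security/pki/nss/overview.html",
                "http://www.mozilla.org/index.html"]:
        print(url, decide(url, RULES))
```

Under this model, overview.html is included with the rules in the order shown, but reversing the list makes the blanket "-www.mozilla.org/*" rule match last and exclude it.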
The "excludes" rules keep the tool from wandering off and downloading irrelevant web pages from wiki.mozilla.org. I had to exclude all the navigation links in the top and left panels of each wiki page. If those links ever change, we will need to update the "excludes" rules.
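When the panel links change, one way to find candidates for new "excludes" rules is to list the wiki.mozilla.org links on a saved page that no '-' rule covers. The sketch below assumes you have the page's HTML as a string; the helper names are hypothetical, and the matcher uses the same simplified '*'-only wildcard model (ignoring a leading "http://") rather than HTTrack's real implementation. The result also contains the documentation's own links, so review it by hand.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def _strip_scheme(url: str) -> str:
    return url[len("http://"):] if url.startswith("http://") else url


def _matches(pattern: str, url: str) -> bool:
    # '*' is a wildcard; every other character is matched literally.
    regex = ".*".join(re.escape(p) for p in _strip_scheme(pattern).split("*"))
    return re.fullmatch(regex, _strip_scheme(url)) is not None


def uncovered_links(html: str, base_url: str, rules: list[str],
                    host: str = "wiki.mozilla.org") -> list[str]:
    """Return links to `host` on the page that no '-' rule excludes (deduplicated)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    excludes = [rule[1:] for rule in rules if rule.startswith("-")]
    result: list[str] = []
    for link in collector.links:
        if urlsplit(link).hostname != host or link in result:
            continue
        if not any(_matches(pattern, link) for pattern in excludes):
            result.append(link)
    return result
```

Any panel link that shows up in the result would be followed by HTTrack and probably needs a new '-' rule.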
For www.mozilla.org, I want only a few files, plus the css/ and images/ directories:
- the files under fips/nss-source
- a few "design specification" documents I referenced
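The www.mozilla.org rules follow an exclude-then-include pattern: one blanket '-' rule for the whole host, followed by a '+' rule for each wanted path. A minimal sketch of generating such a block, assuming the last-match-wins ordering described above; host_rules is a hypothetical helper, not part of HTTrack.

```python
from __future__ import annotations


def host_rules(host: str, wanted: list[str]) -> list[str]:
    """Exclude everything on `host`, then re-include each wanted path.

    The blanket '-' rule comes first so the later '+' rules can override it.
    """
    rules = [f"-{host}/*"]
    rules += [f"+http://{host}/{path.lstrip('/')}" for path in wanted]
    return rules


if __name__ == "__main__":
    for rule in host_rules("www.mozilla.org", [
        "css/*",
        "images/*",
        "projects/security/pki/nss/fips/nss-source/*",
        "projects/security/pki/nss/overview.html",
    ]):
        print(rule)
```

Keeping the wanted paths in one list makes it harder to accidentally place an include before the blanket exclude.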