PageVisualize.exe v1.0.0
PageVisualize Website Image Capture SDK
Copyright (c) 2004 Lucid Step Software
Introduction
The PageVisualize.exe v1.0.0 executable is a program that is used to capture images of websites. It can be used in GUI Mode (Graphical User Interface Mode) or Console Mode (which is used inside a DOS command prompt window). In GUI mode, a Windows user interface is presented with controls to configure the various available options. In Console Mode, options are submitted as command line switches. Both the GUI Mode and the Console Mode have the capability to process PageVisualize Command files (.pvc files), which are text files that contain a sequence of option statements and page image capture commands.
The program is multithreaded, and is therefore capable of performing multiple webpage image capture operations simultaneously. It is very flexible and provides many options to customize the website image capturing process. It can be used manually with a human operator present, or in an automated manner on a server with no human user present. The PageVisualize Command file processing functionality is ideal for automating the page capturing process. The page capturing engine is very fast and is capable of processing very large page capture batches.
Getting Started
To get started using the PageVisualize program, either go to the Windows Start Menu, Programs menu, PageVisualize submenu, and click the shortcut to start the program in GUI mode, or open a command prompt, change the current directory to the PageVisualize application folder ("C:\Program Files\PageVisualize\Application\" by default), and type "PageVisualize" at the command prompt to display a usage summary.
GUI Mode
When you run the installer, by default a PageVisualize menu is added to your Start Menu. The menu contains a shortcut to launch the PageVisualize software in GUI mode by passing in the "/gui" command line switch ("InstallPath\PageVisualize.exe /gui"). After the program has started, you can configure the options as desired using the controls provided. See the Options Reference section of this document for an explanation of each option.
Once the options are configured as desired, enter the URL of the page you would like to capture into the URL edit box. Then click on the "Capture Page Image" button to capture an image of the webpage. After the image has been captured, an image file is saved to disk. Also, the most recently captured image is displayed on the Captured Images tab of the PageVisualize main window. If the "LogFileName" option is blank, log messages are displayed on the "Log Messages" tab of the PageVisualize main window. Otherwise they are saved to the specified log file.
If the "Asynchronous" option is checked when you click the "Capture Page Image" button, you don't need to wait for the capture operation to finish before entering a new URL and clicking the button again. In fact you can enter new URLs and click the "Capture Page Image" button as many times as desired without waiting for the capture operations to complete, because the page capture processing will continue running asynchronously on background threads. If the "Asynchronous" option is not checked when you click the "Capture Page Image" button, you will need to wait for the capture operation to complete before you can do anything else.
When you configure the various options that are available to control the capturing process, you may notice that the text in the "Sample Command Line Options" text box and the "Sample Command File Page Capture Options" text box changes to reflect the options that you have configured. These sample option strings can be copied out of the text boxes and used on the command line (if you copied the "Sample Command Line Options" text) or in a PageVisualize Command File (if you copied the "Sample Command File Page Capture Options" text). The auto-generated Sample Options text string feature is intended to make it easy to create a set of command line options that you can paste into a command prompt window, or a command file capture command string that you can paste into a command file, without needing to remember or look up all of the available options. All you need to do is use the GUI to select the desired options, and the appropriate option strings are generated for you automatically.
To process a PageVisualize Command file (.pvc file), enter the file name into the CommandFileName edit box, or click on the "..." button to the right of the CommandFileName edit box to display a file dialog that you can use to select the file name. Then click on the "Go" button. You can observe the progress of the command file capture processing by watching the "Status" panel, switching to the "Captured Images" panel, or watching the log output. The log output will either be displayed on the "Log Messages" tab or saved to a log file, depending on whether the "LogFileName" option has been set to a non-blank value. The "LogFileName" option can be set using the edit box provided in the GUI, or by using an option statement in the command file.
The "Status" panel reports progress on the page capture requests that you submitted by pressing either the "Capture Page Image" button (to capture a single page image) or the "Go" button (to execute a PageVisualize Command file, which is generally used to capture multiple page images). The number of capture operations queued for processing (waiting to be processed) is displayed next to the "Queued:" label. The number of capture operations currently processing (in progress) is displayed next to the "Capturing:" label. The number of capture operations that have already successfully completed is displayed next to the "Completed:" label. The number of capture operations that have timed out before completing is displayed next to the "Timed Out:" label. The number of capture operations that have failed due to errors is displayed next to the "Failed:" label. The number of capture commands that have been parsed without actually being processed is displayed next to the "Parsed w/o Processing:" label.
Console Mode
To use the PageVisualize software in Console Mode, first open a command prompt window. Then change the current working directory to "C:\Program Files\PageVisualize\Application\", or to the directory in which the PageVisualize.exe program is located if you installed the software somewhere other than the default location. Alternatively you can add the directory where the PageVisualize.exe program is located to the system file search path, and then it won't be necessary to change the current working directory before running the program. A third option is to enter the fully qualified PageVisualize.exe file name (including the path information) whenever you need to run the program.
Enter "PageVisualize" at the command prompt and press enter to run the program. If you don't enter any command line switches, or if you enter "/?" or "/h" as a command line switch, a basic usage summary will be displayed.
The most basic way to use the PageVisualize program in command line mode is to enter the executable name followed by one or more URLs, like the following:
PageVisualize http://www.example.com
This will initiate a page capture operation for the specified URL using the default options. To use options other than the default, enter the command line switches prior to the URL(s) to which they should be applied, like the following:
PageVisualize /ImageResizePercent=50 /ImageFileName=ExamplePageCapture http://www.example.com
This will capture the page, resize it to 50 percent of the width and height of the default capture area (from 800x600 to 400x300), and save the image in the default image format (PNG) to the filename "ExamplePageCapture.png" in the current working directory. Note that if the URL were placed before the options, the options would not apply to the page capture of that URL.
Most command line switch names and values are case insensitive. The primary exception to this rule is the "CaptureDownloadOptions" switch, which uses case sensitive option values. Also, regarding the "CapturePageURL" option, some URLs may be case sensitive, but this depends on the server software used to serve up the webpage (Unix-based and Linux-based webservers generally treat the directory and filename portion of the URL in a case-sensitive manner, while Windows-based webservers generally treat the entire URL in a case-insensitive manner).
Abbreviated command line switch option names are available to shorten the command line text that must be entered. In most cases the abbreviated option name is simply the first letter of each word of the full-length option name. A shortened example that is equivalent to the previous example follows:
PageVisualize /irp=50 /ifn=ExamplePageCapture http://www.example.com
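The abbreviation rule described above can be sketched in Python. This is only an illustration, not part of the PageVisualize software: the function name "abbreviate" and the "IRREGULAR" table are hypothetical, and the two irregular entries are taken from the short names shown in the Options Reference section of this document ("url" and "cmp"). Options not listed in that section may have other irregular short names.

```python
import re

# Short names from the Options Reference that do not follow the
# first-letter-of-each-word pattern (hypothetical lookup table).
IRREGULAR = {
    "CapturePageURL": "url",
    "CaptureMinProgressToKeep": "cmp",
}

def abbreviate(option_name):
    """Return the abbreviated form of a full-length option name."""
    if option_name in IRREGULAR:
        return IRREGULAR[option_name]
    # Collect the capital letter that starts each word, then lowercase.
    return "".join(re.findall(r"[A-Z]", option_name)).lower()

if __name__ == "__main__":
    for name in ("ImageResizePercent", "ImageFileName", "CapturePageURL"):
        print(name, "->", abbreviate(name))
```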
In addition to the "/" (forward slash) character, the "-" (hyphen) character may be used as the first character of a command line switch. The following example is equivalent to the previous example:
PageVisualize -irp=50 -ifn=ExamplePageCapture http://www.example.com
Notice in the examples above that each command line switch consists of a single switch prefix character ("/" or "-"), the option full-length name or abbreviated name, the "=" (equals sign) character, and the option value. If the option value has spaces, it should be encapsulated in double quotes, as in the following example:
PageVisualize /irp=50 /ifn="Name With Spaces" http://www.example.com
In no case should there be spaces either immediately before or immediately after the "=" (equals sign) character. If spaces are included between the option name and the "=" character, or between the "=" character and the option value, the command line switch will not be parsed correctly and an error condition will result.
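For automated use, the switch syntax rules above can be applied by a small script that assembles the command line text. The following Python sketch is one possible approach under stated assumptions: the helper names "format_switch" and "build_command_line" are hypothetical, and only the quoting rule for values containing spaces is handled.

```python
def format_switch(name, value, prefix="/"):
    """Build one switch such as /irp=50, with no spaces around '='."""
    if prefix not in ("/", "-"):
        raise ValueError("switch prefix must be '/' or '-'")
    text = str(value)
    # Values that contain spaces are wrapped in double quotes.
    if " " in text:
        text = '"' + text + '"'
    return prefix + name + "=" + text

def build_command_line(switches, url):
    """Place all switches first, then the URL they should apply to."""
    parts = ["PageVisualize"]
    parts += [format_switch(name, value) for name, value in switches]
    parts.append(url)
    return " ".join(parts)

if __name__ == "__main__":
    print(build_command_line(
        [("irp", 50), ("ifn", "Name With Spaces")],
        "http://www.example.com",
    ))
```

The switches are emitted before the URL because an option only applies to URLs that follow it on the command line.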
It is acceptable to include multiple URLs in a single command line, as in the following:
PageVisualize /irp=50 /ift=JPEG /ifn=JPEGImage www.example.com /ift=GIF /ifn=GIFImage www.example.com
In order for an option to be in effect for a specific URL, it must precede the URL in the command line string. In the case where an option is repeated, the rightmost occurrence of the option that precedes the specific URL is the one that will be in effect when a page capture is performed for the URL. In the example above, the "/irp" (ImageResizePercent) option precedes both URLs and thus is in effect for both URLs. The first occurrences of the "/ift" (ImageFileType) and "/ifn" (ImageFileName) options precede the first URL and are in effect when it is captured, but they are overridden by the second occurrences of the "/ift" and "/ifn" options, which specify new values for these options, so the new values are in effect for the second URL. Thus two images will be captured and saved, one as a JPEG image named "JPEGImage.jpg", and one as a GIF image named "GIFImage.gif". Note that since the protocol prefix is omitted from the URL, the "http://" prefix is assumed.
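The left-to-right rule described above can be modeled with a short Python sketch. It is a simplified illustration, not the program's actual parser: the function name "effective_options" is hypothetical, the input is assumed to be already split into tokens, and full-length and abbreviated option names are not unified (so "/irp" and "/ImageResizePercent" would be tracked as separate keys).

```python
def effective_options(tokens):
    """Return a list of (url, options) pairs, one per URL, in order."""
    current = {}   # option values seen so far, rightmost value wins
    captures = []
    for token in tokens:
        if token.startswith(("/", "-")):
            # A switch: record or override the option value.
            name, _, value = token[1:].partition("=")
            current[name.lower()] = value
        else:
            # A URL: snapshot the options that precede it.
            captures.append((token, dict(current)))
    return captures

if __name__ == "__main__":
    line = ("/irp=50 /ift=JPEG /ifn=JPEGImage www.example.com "
            "/ift=GIF /ifn=GIFImage www.example.com")
    for url, options in effective_options(line.split()):
        print(url, options)
```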
To process a command file from the command line, use the "CommandFileName" option switch, as in the following example:
PageVisualize /CommandFileName=Sample-Basic.pvc
It is acceptable to mix various options, URLs, and command file names all in the same command line statement. If this is done, the options and commands will be processed in the sequence in which they are encountered on the command line and within the command file(s).
PageVisualize Command Files
A PageVisualize Command File (.pvc file) is a text file that contains global options and page capture commands. Any standard text editor, such as Windows Notepad, can be used to create the command file.
In command files, global options and capture commands use the same general set of option names as command line switches. However, some option names are permitted in one place but not another. For information about where each option name is permitted to be used, see the Options Reference section of this document.
In the command file, each global option or page capture command is listed on its own line. The first non-whitespace character in each line determines whether the line will be treated as a comment, a global option, or a capture command. Beginning and trailing whitespace characters are stripped off before processing each line, and are ignored.
If the first character of the line is either ";" or "#", the line is treated as a comment and will be ignored. The comment string terminates at the end of the line. To create a multi-line comment, prefix each line of the comment with a comment character. An example follows:
# This is a comment that
; spans two lines of text
If the first character of the line is either "/" or "-", the line is treated as a global option. Global options are similar to command line switches in that they take effect for all URLs and capture commands that they precede. Also, the syntax of global options is essentially the same as the syntax for command line switches. The primary difference is that only one global option is permitted per line of text in a command file, whereas multiple command line switches are permitted (and expected) on the same command line. For a description of the syntax of command file global option lines, please see the description of the syntax of command line options in the Console Mode section of this document. An example follows:
/LogFileName=PageVisualizeLogMessages.log
If the first character of the line is the "|" (pipe) character, the line is treated as a capture command. Each capture command consists of a pipe-delimited list of OptionName=value strings. Each option included in the capture command is effective only for the single page capture represented by the capture command in which it is embedded. For each option not included in the capture command, the value from the most recent preceding instance of the equivalent global option is effective, or the default value for the option is effective if there is no preceding instance of the equivalent global option. Either full-length or abbreviated option names may be used. In a capture command, the option name must not be preceded by a prefix character, unlike global options and command line switches. Spaces may optionally be present before the option name or after the option value, but spaces must not be present between the option name and the "=" character or between the "=" character and the option value. An example of a capture command follows:
| irp=50 | ifn="Name With Spaces" | http://www.example.com |
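To make the capture command layout concrete, the following Python sketch splits such a line into its options and its URL. It is an illustration under assumptions, not the program's parser: the function name "parse_capture_command" is hypothetical, an item without an option name is taken to be the URL (as in the example above), and a "|" character inside a quoted value is not supported.

```python
import re

# An option item looks like Name=value, where Name contains only letters.
OPTION_PATTERN = re.compile(r"^([A-Za-z]+)=(.*)$")

def parse_capture_command(line):
    """Split a '|' capture command into (options, url)."""
    options = {}
    url = None
    for item in line.strip().split("|"):
        item = item.strip()
        if not item:
            continue  # empty text before the first or after the last pipe
        match = OPTION_PATTERN.match(item)
        if match:
            name, value = match.groups()
            # Remove the double quotes around a quoted value.
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
            options[name.lower()] = value
        else:
            url = item  # an item with no option name is taken as the URL
    return options, url

if __name__ == "__main__":
    print(parse_capture_command(
        '| irp=50 | ifn="Name With Spaces" | http://www.example.com |'))
```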
If the first character of the line is none of the above-mentioned characters, the line is assumed to be a single URL, and is treated as a capture command that uses the global option values that are currently in effect. For any global option that is not specified somewhere in the file prior to the URL capture line, the default value for that option is in effect. An example of a single-URL capture line follows:
http://www.example.com
For detailed sample PageVisualize Command Files, please look in the "PVCFiles" subfolder of the folder where the PageVisualize software was installed.
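The first-character rules described in this section can be summarized in a few lines of Python. This is a minimal sketch for checking command files before submitting them; the function name "classify_line" is hypothetical, and the handling of blank lines is an assumption because this document does not state how they are treated.

```python
def classify_line(line):
    """Classify one line of a .pvc command file by its first character."""
    text = line.strip()  # leading and trailing whitespace is ignored
    if not text:
        return "blank"  # assumption: blank lines are not described here
    first = text[0]
    if first in ";#":
        return "comment"
    if first in "/-":
        return "global option"
    if first == "|":
        return "capture command"
    return "url"  # anything else is treated as a single-URL capture line

if __name__ == "__main__":
    sample = [
        "# This is a comment that",
        "/LogFileName=PageVisualizeLogMessages.log",
        "| irp=50 | http://www.example.com |",
        "http://www.example.com",
    ]
    for line in sample:
        print(classify_line(line), ":", line)
```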
Options Reference
The following is a list of all available options. The options can be used as command line switches, as global options in a PageVisualize Command File (.pvc file), or within capture commands in a PageVisualize Command File. All options are permitted to be used as command line switches. All options except the following (and their abbreviated equivalents) are permitted to be used as global options in a command file: Help, GraphicalUserInterface, CommandFileName, CapturePageURL, ImageFileName. All options except the following (and their abbreviated equivalents) are permitted to be used within capture commands in a command file: Help, MaxCaptureThreads, BatchTimeoutSeconds, ParseWithoutProcessing, GraphicalUserInterface, CommandFileName, LogFileName, LogFileOption.
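The placement rules in the paragraph above can be expressed as two exclusion sets. The Python sketch below is only an illustration: the function name "is_permitted" and the context labels "command_line", "global", and "capture" are hypothetical, and only full-length option names are checked (abbreviated names would need to be mapped to their full-length equivalents first).

```python
# Options that may NOT be used as global options in a command file.
NOT_GLOBAL = {
    "Help", "GraphicalUserInterface", "CommandFileName",
    "CapturePageURL", "ImageFileName",
}

# Options that may NOT be used within capture commands in a command file.
NOT_IN_CAPTURE = {
    "Help", "MaxCaptureThreads", "BatchTimeoutSeconds",
    "ParseWithoutProcessing", "GraphicalUserInterface",
    "CommandFileName", "LogFileName", "LogFileOption",
}

def is_permitted(option_name, context):
    """Return True if the option may be used in the given context."""
    if context == "command_line":
        return True  # all options are permitted as command line switches
    if context == "global":
        return option_name not in NOT_GLOBAL
    if context == "capture":
        return option_name not in NOT_IN_CAPTURE
    raise ValueError("unknown context: " + context)

if __name__ == "__main__":
    print(is_permitted("ImageFileName", "global"))   # False
    print(is_permitted("ImageFileName", "capture"))  # True
```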
/Help[=value], /hlp[=value], /h[=value], /?[=value]
BRIEF DESCRIPTION
Displays usage summary or usage information for a specific option.
ACCEPTED VALUES
The value may be blank, "all", "summary", or an option name.
DETAILED DESCRIPTION
Use the "Help" option to get information on how to use the program.
If the value is left blank, a basic usage summary is displayed. If
the value "all" is specified, more detailed help information is
displayed for all available options, in addition to a basic usage
summary. If the value "summary" is specified, a basic usage summary
is displayed. If a specific option name is specified, more detailed
help information is displayed for the specific option, in addition to
a basic usage summary.
EXAMPLES
/Help=MaxCaptureThreads
/hlp=gui
/h=all
/?
/CommandFileName=value, /cfn=value
BRIEF DESCRIPTION
Specifies the name of a PageVisualize command file (.pvc file) to be
processed.
ACCEPTED VALUES
The value must be the name of a valid PageVisualize command file.
DETAILED DESCRIPTION
The "CommandFileName" option is used to specify the name of a
PageVisualize command file (.pvc file), which is a text file that
contains special options and commands to be processed by the
PageVisualize website image capturing engine. The name may be a fully
qualified file name (including path information), or a relative file
name. The options and capture commands in the file must conform with
the PageVisualize command file syntax. In its most basic form, the
command file may contain simply a list of URLs, one URL per line. For
a detailed description of the PageVisualize command file syntax,
please see the PageVisualize documentation.
EXAMPLES
/CommandFileName=Sample-Basic.pvc
/cfn="C:\Program Files\PageVisualize\PVCFiles\Sample-Complete.PVC"
/LogFileName=value, /lfn=value
BRIEF DESCRIPTION
Specifies the name of a file to which log messages will be saved.
ACCEPTED VALUES
The value must either be blank, or a valid file name.
DETAILED DESCRIPTION
The "LogFileName" option is used to specify the name of a file to
which log messages will be saved. Log messages may be generated by
various events, including the successful capture of a page image, or
the failure to capture a page image. If the log file does not already
exist when a message is logged, it will be created. If it does
already exist, the message will be appended to the end of the file.
The log file is written in a plain text format. If no log file name
is specified, or if a blank file name is specified, log messages will
be output to the console.
EXAMPLES
/LogFileName=PageVisualize.log
/lfn=output.txt
/LogFileOption=value, /lfo=value
BRIEF DESCRIPTION
Controls whether messages are logged on success, on failure, or both.
ACCEPTED VALUES
"OnSuccess", "OnFailure", "OnSuccessOrFailure", "s", "f", "sf".
The default value is "OnSuccessOrFailure".
DETAILED DESCRIPTION
Use "LogFileOption" to control what types of messages are logged. If
"OnSuccess" or "s" is specified, only success messages will be logged.
If "OnFailure" or "f" is specified, only failure messages will be
logged. If "OnSuccessOrFailure" or "sf" is specified, both success
messages and failure messages will be logged.
EXAMPLES
/LogFileOption=OnFailure
/lfo=OnSuccessOrFailure
/lfo=sf
/lfo=f
/MaxCaptureThreads=value, /mct=value
BRIEF DESCRIPTION
Maximum number of simultaneous asynchronous image capture threads.
ACCEPTED VALUES
The value must be an integer in the range 1 - 9999.
The default value is 10.
DETAILED DESCRIPTION
The "MaxCaptureThreads" option is used to configure the maximum number
of simultaneous threads that will be used for asynchronous website
image capture operations. If the value is too low, the page capturing
engine will capture fewer website images in parallel and will thus be
unable to utilize fully the available hardware resources. If the
value is too high, the high processing demands may overwhelm the
hardware and the machine may become sluggish or unresponsive.
EXAMPLES
/MaxCaptureThreads=30
/mct=5
/BatchTimeoutSeconds=value, /bts=value
BRIEF DESCRIPTION
Maximum number of seconds before the batch capture will time out.
ACCEPTED VALUES
The value must be an integer in the range 1 - 9999999.
The default value is 600 (10 minutes).
DETAILED DESCRIPTION
The "BatchTimeoutSeconds" option is used to control the amount of time
that the page capturing engine will wait before timing out a batch of
image capture operations. If the number of page captures in the batch
is large, the value for this option should be increased to allow
sufficient time for the batch to complete. If the batch times out
prior to completion, the page capture operations that completed prior
to the timeout will have been already saved to disk and will not be
deleted.
EXAMPLES
/BatchTimeoutSeconds=6000
/bts=100
/ParseWithoutProcessing[=value], /pwp[=value]
BRIEF DESCRIPTION
Enables or disables parsing (syntax checking) without image capturing.
ACCEPTED VALUES
"1", "yes", "true", "enable"; "0", "no", "false", "disable".
If this option is not specified, the default value is "0" or
"disable". If this option is specified without a value, the default
value is "1" or "enable".
DETAILED DESCRIPTION
If the "ParseWithoutProcessing" option is enabled (using a value of
"1", "yes", "true", or "enable"), then all input will be parsed and
checked for syntax errors, but no page captures will be processed. If
this option is disabled (the default), then both parsing and normal
processing will occur. This option can be enabled to check a large
batch for syntax problems prior to the actual processing of the batch.
This option is most useful in conjunction with the "CommandFileName"
option.
EXAMPLES
/ParseWithoutProcessing=enable
/pwp=1
/pwp
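The accepted values for an enable/disable option such as this one can be interpreted as sketched below in Python. This is an illustration rather than the program's own logic: the function name "parse_flag" is hypothetical, "None" stands for a switch given without "=value", and case-insensitive matching is assumed from the general rule stated in the Console Mode section.

```python
TRUE_VALUES = {"1", "yes", "true", "enable"}
FALSE_VALUES = {"0", "no", "false", "disable"}

def parse_flag(value=None):
    """Interpret the value of an enable/disable switch as a boolean."""
    if value is None:
        return True  # switch present without a value means "enable"
    text = value.lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError("not an accepted value: " + value)

if __name__ == "__main__":
    print(parse_flag())           # as in /pwp
    print(parse_flag("enable"))   # as in /ParseWithoutProcessing=enable
    print(parse_flag("0"))        # as in /pwp=0
```

When the switch is absent altogether, the caller would apply the documented default of "disable".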
/GraphicalUserInterface[=value], /gui[=value]
BRIEF DESCRIPTION
Controls whether the program runs in GUI mode or console mode.
ACCEPTED VALUES
"1", "yes", "true", "enable"; "0", "no", "false", "disable".
If this option is not specified, the default value is "0" or
"disable". If this option is specified without a value, the default
value is "1" or "enable".
DETAILED DESCRIPTION
The "GraphicalUserInterface" option is used to control whether the
program will run in GUI mode (Graphical User Interface mode) or in
console mode. If GUI mode is enabled, a Windows graphical user
interface will be displayed. If GUI mode is disabled, the program
will run in console mode.
EXAMPLES
/GraphicalUserInterface=true
/gui=yes
/gui
/CapturePageURL=value, /url=value
BRIEF DESCRIPTION
The URL of the website for which an image is to be captured.
ACCEPTED VALUES
The value must be a valid URL that points to an HTML, XML, or text
document (the file extension should be .html, .htm, .xml, or .txt).
The protocol prefix should be "http://", "https://", or "file://"
("ftp://" and other protocols are disallowed).
DETAILED DESCRIPTION
The "CapturePageURL" option is used to specify the URL of the website
or local HTML document that should be used to render an image, which
is then captured and saved to a file. Since the "CapturePageURL"
option is the default when no option name is specified, it is
acceptable to specify one or more URLs while including neither the
long option name ("CapturePageURL") nor the short option name ("url").
If the URLs are specified on the command line, they should be
separated by spaces. If the URLs are specified in a command file,
each URL should be on a line by itself in the command text file. Note
that options on the command line take effect in the order in which
they appear on the command line, so each "CapturePageURL" value should
always appear in sequence on the command line in a position that is
after any options that should be applied to the page capture for the
specific URL.
EXAMPLES
/CapturePageURL=http://www.example.com
/url=http://www.example.com
http://www.example.com
/CapturePageWidth=value, /cpw=value
BRIEF DESCRIPTION
The width of the website page capture area in pixels.
ACCEPTED VALUES
The value must be an integer in the range 1 - 9999.
The default value is 800.
DETAILED DESCRIPTION
The "CapturePageWidth" option is used to set the width in pixels of
the rectangular area in which the page will be rendered and from
which a page image will be captured. Many websites are designed to
fit a width of 800, so this is the default value. The value can be
adjusted up or down as needed to fit the desired area.
EXAMPLES
/CapturePageWidth=800
/cpw=1024
/CapturePageHeight=value, /cph=value
BRIEF DESCRIPTION
The height of the website page capture area in pixels.
ACCEPTED VALUES
The value must be an integer in the range 1 - 9999.
The default value is 600.
DETAILED DESCRIPTION
The "CapturePageHeight" option is used to set the height in pixels of
the rectangular area in which the page will be rendered and from
which a page image will be captured. An image with a height of 600
will fit on most displays, and this is the default value. However, a
height of 600 will usually not be sufficient to capture the entire
length of the webpage, so if this is the objective, the value should
be increased as needed to capture the full height of the page, or as
much as needed. However, if the objective is only to provide a
preview image of the website, the default of 600 is generally
sufficient.
EXAMPLES
/CapturePageHeight=600
/cph=768
/CaptureTimeoutSeconds=value, /cts=value
BRIEF DESCRIPTION
Maximum number of seconds before the page capture will time out.
ACCEPTED VALUES
The value must be an integer in the range 1 - 9999.
The default value is 60 (one minute).
DETAILED DESCRIPTION
The "CaptureTimeoutSeconds" option is used to control the amount of
time that the page capturing engine will wait before timing out a
specific page image capture operation. If the specific page to be
captured is slow to load, the value for this option should be
increased to allow sufficient time for the page to load before an
image is captured. If the page capture times out prior to completion,
an image may or may not be captured and saved depending on the
page display progress percentage at timeout and the value of the
"CaptureMinProgressToKeep" option.
EXAMPLES
/CaptureTimeoutSeconds=120
/cts=40
/CaptureMinProgressToKeep=value, /cmp=value
BRIEF DESCRIPTION
Minimum page display progress percent to keep if the page times out.
ACCEPTED VALUES
The value must be an integer in the range 1 - 100.
The default value is 100.
DETAILED DESCRIPTION
The "CaptureMinProgressToKeep" option may be used to configure the
minimum page load and display progress percentage that will be kept
and captured in the event that the page loading and rendering process
times out before reaching 100 percent completion. This value has no
effect for pages that complete the loading and rendering process
before the timeout (as set by the "CaptureTimeoutSeconds" option)
expires. For pages that time out prior to 100 percent completion,
if the completion percentage is not equal to or greater than the value
of "CaptureMinProgressToKeep" then the operation fails and the
partially-downloaded page is discarded. Otherwise, the page image is
captured and saved, and if the completion percentage was less than 100
but equal to or greater than the value of "CaptureMinProgressToKeep",
a message is logged in addition to saving the captured image. This
option should generally be left at the default value of 100, since if
it is set to a lower value then an image may be captured of a
partially-downloaded page, which may in some circumstances even appear
to be blank.
EXAMPLES
/CaptureMinProgressToKeep=90
/cmp=75
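The decision described for "CaptureMinProgressToKeep" can be written out as a small Python function. This is a sketch of the documented behavior only: the function name "capture_outcome" and its three result labels are hypothetical, and the progress percentage is assumed to be available as an integer from 0 to 100.

```python
def capture_outcome(progress_percent, timed_out, min_progress_to_keep=100):
    """Return 'captured', 'captured_partial', or 'failed'.

    'captured_partial' means the image is saved and a message is logged
    because the page had not reached 100 percent at timeout.
    """
    if not timed_out:
        return "captured"  # the option has no effect without a timeout
    if progress_percent >= min_progress_to_keep:
        if progress_percent < 100:
            return "captured_partial"
        return "captured"
    return "failed"  # the partially downloaded page is discarded

if __name__ == "__main__":
    print(capture_outcome(80, timed_out=True, min_progress_to_keep=75))
    print(capture_outcome(80, timed_out=True, min_progress_to_keep=90))
    print(capture_outcome(80, timed_out=True))
```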
/CaptureDownloadOptions=value, /cdo=value
BRIEF DESCRIPTION
Options used to control the webpage download process.
ACCEPTED VALUES
The value must be a string formed from the concatenation of zero
or more of the following, without any spaces or other delimiters:
I, i, V, v, F, f, X, x, J, j, S, s, U, u, M, m, B, b, C, c, O, o,
R, r, G, g
The default value is "IvFxjSUmbcOrg".
DETAILED DESCRIPTION
Use "CaptureDownloadOptions" to control various aspects of the webpage
download process. Each letter in the option string either enables or
disables a specific download option. An uppercase letter enables the
corresponding option, and a lowercase letter disables the
corresponding option. The option letters and their meanings are as
follows:
I or i - AllowImages or DisallowImages
V or v - AllowVideos or DisallowVideos
F or f - AllowFrames or DisallowFrames
X or x - AllowActiveX or DisallowActiveX
J or j - AllowJava or DisallowJava
S or s - AllowScripts or DisallowScripts
U or u - AllowUTF8 or DisallowUTF8
M or m - AllowMetaCharSet or DisallowMetaCharSet
B or b - AllowBehaviors or DisallowBehaviors
C or c - AllowClientPull or DisallowClientPull
O or o - AllowOffline or DisallowOffline
R or r - AllowForceOffline or DisallowForceOffline
G or g - AllowIgnoreCache or DisallowIgnoreCache
EXAMPLES
/CaptureDownloadOptions=IvFxjSUmbcOrg
/cdo=xjS
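The letter encoding used by "CaptureDownloadOptions" can be decoded as in the following Python sketch. It is an illustration, not the program's implementation: the function name "decode_download_options" is hypothetical, and letters that are absent from the string are simply left out of the result because this document does not state what value they take.

```python
# Letter-to-option table from the detailed description above.
DOWNLOAD_OPTION_LETTERS = {
    "I": "Images", "V": "Videos", "F": "Frames", "X": "ActiveX",
    "J": "Java", "S": "Scripts", "U": "UTF8", "M": "MetaCharSet",
    "B": "Behaviors", "C": "ClientPull", "O": "Offline",
    "R": "ForceOffline", "G": "IgnoreCache",
}

def decode_download_options(value):
    """Map each letter to an Allow... setting (uppercase means enabled)."""
    settings = {}
    for letter in value:
        name = DOWNLOAD_OPTION_LETTERS.get(letter.upper())
        if name is None:
            raise ValueError("unknown option letter: " + letter)
        settings["Allow" + name] = letter.isupper()
    return settings

if __name__ == "__main__":
    for key, enabled in decode_download_options("IvFxjSUmbcOrg").items():
        print(key, "=", enabled)
```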
/ImageFilePath=value, /ifp=value
BRIEF DESCRIPTION
Specifies the directory path to which captured images will be saved.
ACCEPTED VALUES
The value must either be blank, or a valid file directory path.
DETAILED DESCRIPTION
The "ImageFilePath" option is used to specify the path of the
directory to which captured image files will be saved. It may be an
absolute path or a path relative to the current working directory.
If the directory does not already exist when an image is saved, the
directory will be automatically created.
EXAMPLES
/ImageFilePath="C:\Captured Website Images\"
/ifp=C:\Images
/ImageFileName=value, /ifn=value
BRIEF DESCRIPTION
Specifies the name of a file to which a captured image will be saved.
ACCEPTED VALUES
The value must either be blank, or a valid file name.
DETAILED DESCRIPTION
The "ImageFileName" option is used to specify the name of a file to
which a captured image will be saved. If a fully qualified file name
(including an absolute path) is specified for the "ImageFileName"
value, the value of "ImageFilePath" will be ignored and the path
specified in the "ImageFileName" value will be used. Otherwise, the
path information in the "ImageFilePath" value will be combined with
the file name (and optional relative path information) in the
"ImageFileName" value to create the fully qualified file name.
If the value for "ImageFileName" is blank, a file name will be
automatically generated from the page title, if available, or from
the URL if the page title is not available. If a file extension is
not specified, the correct file extension for the image file format
will be automatically appended. If a file already exists with the
same name as the file to be saved, either it will be overwritten,
or the file will be saved under a different name with numbers appended
to make it unique, depending on the value of the "ImageFileOverwrite"
option.
EXAMPLES
/ImageFileName=CapturedImage.png
/ifn=C:\Images\WebsiteImage.jpg
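The way "ImageFilePath" and "ImageFileName" combine can be approximated with Python's standard ntpath module, which applies Windows path rules on any platform. This is a sketch under assumptions: the function name "resolve_image_file" is hypothetical, the ".png", ".jpg", and ".gif" extensions are taken from the examples in this document while ".bmp" is assumed, and the cases of a blank file name or an existing file are not covered.

```python
import ntpath  # Windows path handling, usable on any platform

# File extensions per image type (".bmp" is an assumption).
EXTENSIONS = {"PNG": ".png", "JPEG": ".jpg", "GIF": ".gif", "BMP": ".bmp"}

def resolve_image_file(image_file_path, image_file_name, image_file_type="PNG"):
    """Return the file name that a captured image would be saved under."""
    if ntpath.isabs(image_file_name):
        # A fully qualified name overrides ImageFilePath.
        full_name = image_file_name
    else:
        full_name = ntpath.join(image_file_path, image_file_name)
    # Append the extension for the image type when none was given.
    if not ntpath.splitext(full_name)[1]:
        full_name += EXTENSIONS[image_file_type.upper()]
    return full_name

if __name__ == "__main__":
    print(resolve_image_file(r"C:\Images", "CapturedImage.png"))
    print(resolve_image_file(r"C:\Images", r"D:\Shots\WebsiteImage.jpg"))
    print(resolve_image_file(r"C:\Images", "JPEGImage", "JPEG"))
```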
/ImageFileType=value, /ift=value
BRIEF DESCRIPTION
Determines the file format in which the captured image will be saved.
ACCEPTED VALUES
"PNG", "JPEG", "BMP", "GIF".
The default value is "PNG".
DETAILED DESCRIPTION
The "ImageFileType" option is used to select the file format in which
the captured image will be saved. The "BMP" value is used to select
an uncompressed bitmap format. This will result in the largest image
file, but will not lose image detail if the captured image is saved at
full size. The "PNG" file format uses non-lossy compression that also
will not lose image detail, but results in a much smaller image file.
The "JPEG" file format uses lossy compression that will often create
the smallest image files for image captures with large amounts of
detail, but some of the detail will be lost, even if the captured
image is saved at full size. The "GIF" file format uses non-lossy
compression and can generate reasonably small images, but it uses a
reduced color palette so the colors of the captured image may not
be the same as the original colors.
EXAMPLES
/ImageFileType=PNG
/ift=JPEG
/ift=BMP
/ImageResizePercent=value, /irp=value
BRIEF DESCRIPTION
Determines the resize percent of the image to be saved.
ACCEPTED VALUES
The value must be an integer in the range 1 - 200.
The default value is 100.
DETAILED DESCRIPTION
The "ImageResizePercent" option is used to determine the size of the
image to be saved relative to the page image capture area as set by
the "CapturePageWidth" and "CapturePageHeight" options. For example,
if the "CapturePageWidth" and "CapturePageHeight" options are set to
800 and 600, respectively, and the "ImageResizePercent" option is set
to 50, then the width and height of the original page capture will be
800 and 600, but the width and height of the resized image, which is
the image that is saved to disk, will be 400 and 300. Note that the
"ImageResizePercent" option determines the relative width and height
of the resized image, not the relative image area (if the width and
height of the resized image are 50 percent of the original image, the
resized image area will actually be 25 percent of the original image
area). This option can be used either to shrink or to enlarge the
image to be saved relative to the captured page area. Note that some
blurring or loss of detail may occur when the image is resized.
EXAMPLES
/ImageResizePercent=80
/irp=33
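The arithmetic behind the "ImageResizePercent" option can be checked with a short Python sketch. The function name "resized_dimensions" is hypothetical, and the use of round() is an assumption, since the text does not say how the program rounds fractional pixel sizes.

```python
def resized_dimensions(capture_width, capture_height, resize_percent):
    """Return (width, height, area_percent) of the saved image."""
    if not 1 <= resize_percent <= 200:
        raise ValueError("ImageResizePercent must be in the range 1 - 200")
    # The percentage scales each side, so the area scales by its square.
    width = round(capture_width * resize_percent / 100)
    height = round(capture_height * resize_percent / 100)
    area_percent = resize_percent ** 2 / 100
    return width, height, area_percent

print(resized_dimensions(800, 600, 50))  # (400, 300, 25.0)
```

An 800 by 600 capture resized to 50 percent is saved at 400 by 300, which is 25 percent of the original area.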
/ImageCompressionQuality=value, /icq=value
BRIEF DESCRIPTION
Determines the compression quality ratio for JPEG images.
ACCEPTED VALUES
The value must be an integer in the range 1 - 100.
The default value is 80.
DETAILED DESCRIPTION
The "ImageCompressionQuality" option is used to determine the quality
of JPEG images, which use a form of lossy compression that sacrifices
some image detail or quality in exchange for smaller image file sizes.
A value of 100 gives the highest quality JPEG image, but also creates
the largest JPEG file size. A value of 1 gives the lowest quality
JPEG image, but creates the smallest JPEG file size. A value in the
range of 70 to 90 will generally provide a reasonable balance between
good quality and small image size. Note that this option is ignored
when image types other than JPEG are used.
EXAMPLES
/ImageCompressionQuality=80
/icq=30
/ImageGrayscaleConvert[=value], /igc[=value]
BRIEF DESCRIPTION
Determines whether captured images are converted to grayscale.
ACCEPTED VALUES
"1", "yes", "true", "enable"; "0", "no", "false", "disable".
If this option is not specified, the default value is "0" or
"disable". If this option is specified without a value, the default
value is "1" or "enable".
DETAILED DESCRIPTION
The "ImageGrayscaleConvert" option is used to determine whether an
image will be converted to grayscale prior to being saved to disk. If
this option is enabled, the final image saved to disk will be saved as
a grayscale (shades of gray ranging from black to white) image, which
means that there will be no color in the saved image, even if there
was color in the original captured page. If this option is disabled,
the final image saved to disk will have color as long as there was
color in the original captured image. An image file saved in
grayscale mode will generally be significantly smaller than an
equivalent image file saved in color mode.
EXAMPLES
/ImageGrayscaleConvert=no
/igc=true
/igc
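The three cases for a boolean switch (given with a value, given without a value, or not given at all) can be sketched in Python as shown here. This is an illustration only: the function name "parse_boolean_switch" is hypothetical, and case-insensitive matching of names and values is an assumption that the text does not confirm.

```python
TRUE_VALUES = {"1", "yes", "true", "enable"}
FALSE_VALUES = {"0", "no", "false", "disable"}

def parse_boolean_switch(args, long_name, short_name, default_if_absent=False):
    """Return the boolean value of a switch found in a list of arguments."""
    names = {long_name.lower(), short_name.lower()}
    for arg in args:
        name, sep, value = arg.lstrip("/").partition("=")
        if name.lower() not in names:
            continue
        if not sep:
            # Switch given without a value: treated as enabled.
            return True
        value = value.lower()
        if value in TRUE_VALUES:
            return True
        if value in FALSE_VALUES:
            return False
        raise ValueError(f"unrecognized value for /{name}: {value!r}")
    # Switch not given at all.
    return default_if_absent

print(parse_boolean_switch(["/igc"], "ImageGrayscaleConvert", "igc"))       # True
print(parse_boolean_switch(["/igc=no"], "ImageGrayscaleConvert", "igc"))    # False
print(parse_boolean_switch(["/ift=PNG"], "ImageGrayscaleConvert", "igc"))   # False
```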
/ImageFileOverwrite[=value], /ifo[=value]
BRIEF DESCRIPTION
Determines whether existing image files will be overwritten.
ACCEPTED VALUES
"1", "yes", "true", "enable"; "0", "no", "false", "disable".
The default value is "1" or "enable".
DETAILED DESCRIPTION
The "ImageFileOverwrite" option is used to determine whether an
existing file will be overwritten in the case where a new image file
is to be saved using the same file name. If this option is enabled
(and it is by default) then when a file of the same name already
exists, it will be overwritten by the new file as long as the existing
file is not already open with an exclusive lock that prevents writing
to the file. If this option is disabled, when a file already exists
with the same name as the new file to be saved, then the new file is saved
under a different, unique file name. The unique file name is
generated by appending one or more digits to the old file name as
needed until there is no existing file of the same name.
EXAMPLES
/ImageFileOverwrite=disable
/ifo=1
/ifo
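The unique file name behavior that applies when "ImageFileOverwrite" is disabled can be sketched in Python as follows. This is not the program's actual algorithm. The text only says that digits are appended until the name is unique, so the starting number, the placement of the digits before the extension, and the function name "unique_file_name" are all assumptions. The set of existing names is passed in so that the sketch does not touch the disk.

```python
import ntpath  # Windows path rules, usable on any platform

def unique_file_name(file_name, existing_names):
    """Return file_name, or a numbered variant not found in existing_names."""
    if file_name not in existing_names:
        return file_name
    stem, extension = ntpath.splitext(file_name)
    counter = 1
    # Assumption: digits are appended to the name just before the extension.
    while f"{stem}{counter}{extension}" in existing_names:
        counter += 1
    return f"{stem}{counter}{extension}"

existing = {"Home.png", "Home1.png"}
print(unique_file_name("Home.png", existing))   # Home2.png
print(unique_file_name("About.png", existing))  # About.png
```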
Support and Contact Information
For further information or to obtain support, please use the following contact information:
Website: http://www.pagevisualize.com
Email: support@pagevisualize.com
PageCaptureLibrary.dll v1.0.0
PageVisualize Website Image Capture SDK
Copyright (c) 2004 Lucid Step Software
Introduction
The PageCaptureLibrary.dll v1.0.0 is a COM automation object that is used to capture images of websites. It has a simple yet flexible API. It can be used from any Windows programming language that can make use of COM automation objects, including Visual Basic, C++, ASP, Delphi, C#, or any .NET language. It has the capability to process PageVisualize Command files (.pvc files), which are text files that contain a sequence of option statements and page image capture commands.
The COM object is multithreaded, and is therefore capable of performing multiple webpage image capture operations simultaneously. It is very flexible and provides many options to customize the website image capturing process. The PageCaptureLibrary.dll COM object is ideal for automating the page capturing process. The page capturing engine is very fast and is capable of processing very large page capture batches.
Getting Started
Before you can begin using the PageCaptureLibrary.dll COM object, it should first be registered with the operating system so that Windows will be able to locate it when needed. If installed by the installer, the COM object is registered automatically. Otherwise, it is necessary to register it manually. To register it manually, open a command prompt and change the current working directory to the directory where the dll file is located ("C:\Program Files\PageVisualize\Component\" by default). Then run the following command:
regsvr32 PageCaptureLibrary.dll
If you use the uninstaller to remove the PageCaptureLibrary.dll COM object, it will be unregistered automatically. If you need to unregister the COM object manually, open a command prompt and change the current working directory to the directory where the dll file is located ("C:\Program Files\PageVisualize\Component\" by default). Then run the following command:
regsvr32 /u PageCaptureLibrary.dll
When you are ready to start using the COM object, you should first test a simple example to make sure it is registered correctly and that everything works. Open Notepad (or your preferred text editor) and copy and paste in the following sample VBScript code. Change the URL from "http://www.example.com" to some other valid URL, if you would like to do so. Then save the file and name it something like "TestPageCapture.vbs".
' Create an instance of the page capture engine
Set pageCaptureEngine = CreateObject("PageCaptureLibrary.PageCapture")
' Capture a web page image using default options and save it to a file
pageCaptureEngine.CapturePageSynchronously "http://www.example.com", "ExampleFileName"
' Release the page capture engine instance
Set pageCaptureEngine = Nothing
' All finished, so quit
Wscript.Quit
In Windows Explorer, navigate to the folder where you saved the "TestPageCapture.vbs" file and double-click the file. It should execute quietly, and soon an image file named "ExampleFileName.png" should appear in the folder. Double-click the image file to open it in your default image viewer. If it shows a captured image of the URL you saved in the test script file, then everything is working correctly. Otherwise, double-check that you followed all of the instructions above.
API Methods Overview
The PageCaptureLibrary.dll COM object provides a COM interface named IPageCapture that exposes the following methods:
HRESULT _stdcall CapturePageSynchronously(
[in] BSTR URL,
[in] BSTR ImageFileName
);
HRESULT _stdcall CapturePageSynchronouslyWithOptions(
[in] BSTR URL,
[in] int CapturePageWidth,
[in] int CapturePageHeight,
[in] int CaptureTimeoutSeconds,
[in] int CaptureMinProgressToKeep,
[in] BSTR CaptureDownloadOptions,
[in] BSTR ImageFilePath,
[in] BSTR ImageFileName,
[in] BSTR ImageFileType,
[in] int ImageResizePercent,
[in] int ImageCompressionQuality,
[in] int ImageGrayscaleConvert,
[in] int ImageFileOverwrite
);
HRESULT _stdcall CapturePageAsynchronously(
[in] BSTR URL,
[in] BSTR ImageFileName
);
HRESULT _stdcall CapturePageAsynchronouslyWithOptions(
[in] BSTR URL,
[in] int CapturePageWidth,
[in] int CapturePageHeight,
[in] int CaptureTimeoutSeconds,
[in] int CaptureMinProgressToKeep,
[in] BSTR CaptureDownloadOptions,
[in] BSTR ImageFilePath,
[in] BSTR ImageFileName,
[in] BSTR ImageFileType,
[in] int ImageResizePercent,
[in] int ImageCompressionQuality,
[in] int ImageGrayscaleConvert,
[in] int ImageFileOverwrite
);
HRESULT _stdcall ProcessCommandFileAsynchronously(
[in] BSTR CommandFileName
);
HRESULT _stdcall ProcessCommandTextAsynchronously(
[in] BSTR CommandFileText
);
HRESULT _stdcall WaitForAsynchCompletion(
void
);
The "CapturePageSynchronously", "CapturePageSynchronouslyWithOptions", "CapturePageAsynchronously", and "CapturePageAsynchronouslyWithOptions" methods are all used to capture page images.
The two synchronous methods, "CapturePageSynchronously" and "CapturePageSynchronouslyWithOptions", will not return to the caller until the page capture operation has completed or timed out. The two asynchronous methods, "CapturePageAsynchronously" and "CapturePageAsynchronouslyWithOptions", will queue up the capture operation for asynchronous processing and then return to the caller immediately.
The methods named "CapturePageSynchronously" and "CapturePageAsynchronously" only accept a URL and an ImageFileName as parameters, and rely on the properties of the PageCapture object instance for the rest of the option values. The methods named "CapturePageSynchronouslyWithOptions" and "CapturePageAsynchronouslyWithOptions" accept all available options as parameters. However, it is still possible to fall back on the values stored in the properties of the PageCapture object instance by passing in a negative value (if the parameter is an integer) or an empty string (if the parameter is a string).
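The fallback rule for the "WithOptions" methods can be modeled with a few lines of Python. This is a model of the documented behavior only, not code from the library: the function name "effective_option" and the property values in the dictionary are hypothetical.

```python
def effective_option(parameter, property_value):
    """Pick the value a ...WithOptions call would use for one option."""
    if isinstance(parameter, str):
        # An empty string means "use the object's property".
        return parameter if parameter != "" else property_value
    # A negative integer means "use the object's property".
    return parameter if parameter >= 0 else property_value

# Hypothetical property values held by a PageCapture instance.
properties = {"CapturePageWidth": 1024, "ImageFileType": "PNG", "ImageGrayscaleConvert": 0}

print(effective_option(-1, properties["CapturePageWidth"]))      # 1024
print(effective_option("JPEG", properties["ImageFileType"]))     # JPEG
print(effective_option("", properties["ImageFileType"]))         # PNG
print(effective_option(1, properties["ImageGrayscaleConvert"]))  # 1
```

Because 0 is a meaningful value for the boolean options, only negative integers trigger the fallback.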
The "ProcessCommandFileAsynchronously" method is used to process the commands in a PageVisualize command file. The method accepts a single string parameter named CommandFileName, which should contain the name of the command file to be processed. Please see the PageVisualize Command Files section in the document named "Documentation-PageVisualize.rtf" for information about how to create and use command files.
The "ProcessCommandTextAsynchronously" method is similar to the "ProcessCommandFileAsynchronously" method, but instead of accepting the name of a command file as its sole parameter, it accepts the actual command text (the text that would otherwise be stored in a command file) as its sole parameter. Please see the PageVisualize Command Files section in the document named "Documentation-PageVisualize.rtf" for information about the command text syntax.
The "WaitForAsynchCompletion" method is used whenever it is necessary to wait (or block) until all queued and in-progress asynchronous page capture operations have completed. For example, if a program captures a batch of page images asynchronously and should then terminate, it could call "CapturePageAsynchronously" several times, followed by a single call to the "WaitForAsynchCompletion" method. It would then be safe for the program to finish and exit. If the call to "WaitForAsynchCompletion" at the end of the program were accidentally omitted, the asynchronous page capture operations would be started, but the process would terminate immediately, ending the page capturing threads before they had a chance to complete their work. For this reason, it is very important to call the "WaitForAsynchCompletion" method after any group of asynchronous operations is initiated. Note that it is not necessary to call the "WaitForAsynchCompletion" method after a synchronous method (such as the "CapturePageSynchronously" method) is called, since synchronous methods already wait for their own completion before returning to the caller.
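The queue-then-wait pattern can be sketched in Python as follows. The helper "capture_batch", the stand-in class "RecordingEngine", and the second URL are hypothetical. The stand-in only records calls so that the sketch runs anywhere. On Windows, the real object could presumably be obtained with win32com.client.Dispatch("PageCaptureLibrary.PageCapture") from the pywin32 package, but that pairing is an assumption and has not been verified against this library.

```python
def capture_batch(engine, jobs):
    """Queue every (url, image_file_name) pair, then block until all finish."""
    for url, image_file_name in jobs:
        # Returns immediately; the capture is queued for background processing.
        engine.CapturePageAsynchronously(url, image_file_name)
    # Without this call the process could exit before the captures complete.
    engine.WaitForAsynchCompletion()

class RecordingEngine:
    """Stand-in for the COM object that only records the calls it receives."""
    def __init__(self):
        self.calls = []

    def CapturePageAsynchronously(self, url, image_file_name):
        self.calls.append(("capture", url, image_file_name))

    def WaitForAsynchCompletion(self):
        self.calls.append(("wait",))

engine = RecordingEngine()
capture_batch(engine, [("http://www.example.com", "Example"),
                       ("http://www.example.com/about", "About")])
print(engine.calls[-1])  # ('wait',)
```

Keeping the wait inside the helper means a caller cannot queue a batch and forget the final call.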
API Properties Overview
The PageCaptureLibrary.dll COM object IPageCapture interface exposes the following properties:
LogFileName [out, retval] BSTR
LogFileOption [out, retval] BSTR
MaxCaptureThreads [out, retval] int
BatchTimeoutSeconds [out, retval] int
ParseWithoutProcessing [out, retval] int
CapturePageWidth [out, retval] int
CapturePageHeight [out, retval] int
CaptureTimeoutSeconds [out, retval] int
CaptureMinProgressToKeep [out, retval] int
CaptureDownloadOptions [out, retval] BSTR
ImageFilePath [out, retval] BSTR
ImageFileType [out, retval] BSTR
ImageResizePercent [out, retval] int
ImageCompressionQuality [out, retval] int
ImageGrayscaleConvert [out, retval] int
ImageFileOverwrite [out, retval] int
Each property that is of type string (BSTR) accepts or returns a string containing the option value. Each property that is of type integer (int) accepts or returns an integer containing the option value. The boolean options (options that can be either enabled or disabled) are represented as integers, and they accept or return the integer "0" to mean disabled and "1" to mean enabled. The boolean options are "ParseWithoutProcessing", "ImageGrayscaleConvert", and "ImageFileOverwrite".
For descriptions of the meaning and usage of each option property, please refer to the Options Reference section in the document named "Documentation-PageVisualize.rtf".
Sample Code
For sample code that demonstrates how to use each of the COM object API methods and properties, please refer to the "Samples" subfolder of the program installation root folder.
Support and Contact Information
For further information or to obtain support, please use the following contact information:
Website: http://www.pagevisualize.com
Email: support@pagevisualize.com