<!-- files-textopen.xml | Text File | 2004-11-01 | 57KB | 1,741 lines -->
<sect1 id="sect-file-textImport">
<title>Importing Text Files</title>
<!-- TODO: ask- In text import druid, what does row selection do? Why the highlight? -->
<para>
&gnum; can import data organized as text fields that are
structured in some systematic fashion, either from a file or from
the clipboard. Importing structured text may require extensive
intervention on the part of the user, so &gnum; provides a
<interface>Text Import</interface> druid, which is a three-panel
dialog with configuration options. For text imported from files,
this druid appears after the file has been opened using the file
format named "Text Import (configurable)" in the <interface>File
Open</interface> dialog, as is explained in <xref
linkend="sect-file-open" />. For text imported from the clipboard,
the druid appears when a user attempts to paste the text into a
worksheet, as is explained in <xref
linkend="sect-movecopy-xclipboard" />.
<para>
The text import druid contains three panels, but the middle panel
differs depending on the structuring system used: either data
fields separated by a special character or data fields occurring
at fixed positions in each line. The first panel allows the user
to configure the character encoding, line break characters,
structuring system, and line range. The second panel allows the
user to define the columns: for separated data, by setting the
separating character and the text delimiting character; for fixed
width data, by setting the column widths. The third panel allows
the user to select which columns to import and define their data
types.
</para>
<tip>
<title>The steps involved in the text import druid.</title>
<para></para>
<!-- TODO: render hack- remove this spacing hack -->
<orderedlist>
<listitem>
<para>
Launch the <interface>Text Import</interface> druid by choosing
<guimenuitem>Open</guimenuitem> in the <guimenu>File</guimenu>
menu and selecting the "Text import (configurable)" file format
type.
</para>
</listitem>
<listitem>
<para>
Define the character encoding of the text block.
</para>
</listitem>
<listitem>
<para>
Define the characters indicating the breaks between the lines.
</para>
</listitem>
<listitem>
<para>
Select the line range from the text block to be imported.
</para>
</listitem>
<listitem>
<para>
Go to the second panel, which will be different for data
structured by separating characters and data structured by
fixed spacing.
</para>
</listitem>
<listitem>
<para>
(For separated data) Define the separating character.
</para>
</listitem>
<listitem>
<para>
(For separated data) Define the character grouping a text field.
</para>
</listitem>
<listitem>
<para>
(For fixed width data) Define the field widths.
</para>
</listitem>
<listitem>
<para>
Go to the third panel.
</para>
</listitem>
<listitem>
<para>
Configure the inclusion of empty outside columns.
</para>
</listitem>
<listitem>
<para>
Select the locale that will influence the formatting of the
numerical elements in each column.
</para>
</listitem>
<listitem>
<para>
Select the numerical formats for the data in each column.
</para>
</listitem>
<listitem>
<para>
Select the columns to be included in the imported block.
</para>
</listitem>
<listitem>
<para>
Click on the <guibutton>Finish</guibutton> button.
</para>
</listitem>
</orderedlist>
</tip>
<para>
This explanation of the <interface>Text Import</interface> druid
starts with a discussion of text files, including character
encodings and line break delimiters. It then covers the various
strategies used to structure data in text files. Following these
discussions, the components of the druid are presented and,
finally, each step in the use of the druid is explained in
detail.
</para>
<sect2 id="sect-file-textImport-complex">
<title>The complexities of text format files</title>
<para>
The use of text format files to store and transmit data for use
in a spreadsheet involves three somewhat complex decisions which
determine how the file expresses and separates each data
value. These complexities must be understood for a user to be
able to use the <interface>Text Import</interface> druid
effectively. These complexities exist because of the limitations
of early computers and because of the historical development of
computer systems by different manufacturers and programmers, in
different countries, targeting different types of users,
speaking different languages.
</para>
<para>
The first complexity involves the different systems which relate
the contents of a computer file to the characters in a written
language. All files on a computer consist of a long
sequence of binary digits. Text files are files in which these
digits are used to indicate different textual
characters. Character 'encodings' are standardized systems which
relate the binary digits in a computer file to a formal system
of characters which includes both text glyphs (shapes) and
formatting indicators. Each encoding defines a way to interpret
the binary digits and uses the characters from a particular
character set. The alternative character encoding strategies are
explained in greater detail in <xref
linkend="sect-file-textImport-complex-encoding"/>, below.
</para>
<para>
The second complexity involves the decision of how to separate
the characters in a file into different lines. Text files
explicitly determine the end of each line of a file with a
specific character or sequence of characters. The complexity
involves the particular character sequence used to determine the
end of each line. Different conventions have been used in
different computer systems. The alternative line breaking
strategies are explained in greater detail in <xref
linkend="sect-file-textImport-complex-lineBreak"/>, below.
</para>
<para>
The third complexity involves the decision of how to separate
the characters in each line into separate value fields. Again,
different strategies exist. These can be separated into two
broad categories: strategies which use a character or sequence
of characters to separate the values, so called 'delimited' or
'separated' strategies, and strategies which use the position of
the character in the line to separate the values, so called
'fixed-width' strategies. The alternative data structuring
strategies are explained in greater detail in <xref
linkend="sect-file-textImport-complex-dataStruct"/>, below.
</para>
<para>
Fortunately, the &gnum; <interface>Text Import</interface> druid
provides users with a way to preview the information in a text
file. This enables users to change the settings which determine
each of these three conventions until the text in the preview
correctly shows the contents of the data file. Therefore, while
the details of these three steps are complex, the practical
impact on users is minimal. Users can simply experiment until
the file appears correct without having to understand each of
these complexities in detail.
</para>
<sect3 id="sect-file-textImport-complex-encoding">
<title>Character Encodings</title>
<para>
The use of text files to store data in a structured fashion for
use by spreadsheet programs, like the use of text files in
general, requires some scheme to relate the binary numbers in the
computer file itself to the characters of a written language. Such
schemes are called <wordasword>'encodings'</wordasword>.
</para>
<para>
The origin of computers led to the invention of a number of
different encoding schemes. Due to the limitation of early
computer hardware, these encoding schemes all restricted
themselves to character sets which contained only the most
essential characters of the English language. The desire to
support characters which were not in this basic set of
characters led to the creation of new encoding schemes,
many of which restricted themselves to the characters in
specific languages. One encoding scheme, called UTF-8, has now
emerged as the best encoding scheme for the future for a
multitude of reasons including its ability to co-exist with
current operating systems and its ability to encode all of the
characters in the largest set of characters which has been
consistently defined, the Universal Character Set. However, the
existence of the diversity of encoding schemes means that for
the foreseeable future, files will be created and distributed
using several different schemes. This is especially true for
files containing text in languages other than English.
</para>
<para>
This complex situation generally does not impact users. &gnum;
has been designed to deal with most of the complexity. Many
kinds of files, such as the &gnum; file format itself, describe
their encoding scheme internally in such a way that it can be
easily recognized. &gnum; also provides an easy approach to
changing the encoding scheme in case this proves necessary.
</para>
<para>
Encoding schemes are merely a hindrance to users when opening
files. There is no danger that data will be lost or that any
other serious problem will arise from selecting the wrong scheme.
If the wrong scheme is selected, either the file will contain
characters which are nonsensical and &gnum; will open an error
dialog asking the user to select a different encoding scheme, or
the preview area will display nonsensical characters. These
nonsensical characters may simply be characters grouped
together which do not occur in any language, such as
"åÕÛÛÞ", or may be characters for which
a graphical representation (a glyph) does not exist in the font
being used and which are therefore each displayed as a small box
with four numbers inside. Each of
these errors indicates that the encoding scheme used to read the
file was not the same encoding scheme as was used to create the
file. The difficulty is then to determine what encoding scheme
to use. A simple process of trial and error should lead to
picking the right scheme.
</para>
<para>
A basic strategy to find the right encoding for a file being
imported into &gnum; is, first, to use the scheme proposed by
&gnum; and, then, to hunt for the correct encoding. The default
encoding scheme is the one defined by the locale setting of the
user and this is also the default scheme &gnum; uses to create
text files.
<!-- TODO: encoding- add xref to locale. -->
If the default encoding is incorrect, the correct encoding must
be found by trial and error. One strategy is to try the major
western encodings first and then the major regional
encodings. The major western encoding schemes are ASCII,
ISO-8859-1, and UTF-8, but ASCII is a subset of the other two so
it does not need to be tried on its own. The major regional
encodings are the ISO-8859-x schemes since these have become
quite popular in GNU operating systems. Alternatively, the
various character sets used by the Microsoft operating systems
can be attempted. The encoding schemes are listed under
"Western", "Unicode", and the alphabet names.
</para>
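<para>
As an illustration only, the trial and error described above can
be sketched in a few lines of Python. This is not how &gnum;
itself is implemented; the function name
<literal>decode_with_first_match</literal> and the candidate list
are assumptions made for this example. UTF-8 is tried before
ISO-8859-1 because Python's ISO-8859-1 codec accepts every byte
value and therefore never reports an error.
</para>
<programlisting><![CDATA[
```python
# Illustrative sketch only; the names below are hypothetical, not Gnumeric API.
CANDIDATES = ("utf-8", "iso-8859-1")


def decode_with_first_match(raw, candidates=CANDIDATES):
    """Return (encoding, text) for the first candidate that decodes raw."""
    for name in candidates:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next scheme
    raise ValueError("no candidate encoding could decode the data")


# The word "cafe" with an accented final letter, saved as ISO-8859-1.
latin1_bytes = b"caf\xe9"
# A lone 0xE9 byte is invalid UTF-8, so the second candidate is chosen.
guess = decode_with_first_match(latin1_bytes)

# The same word saved as UTF-8 but read as ISO-8859-1: no error is raised,
# yet the accented letter turns into two unrelated characters.
misread = "caf\u00e9".encode("utf-8").decode("iso-8859-1")
```
]]></programlisting>
<para>
The second case is the reason a preview is useful: a wrong
encoding does not always produce an error, so only a person
looking at the decoded text can judge whether it is sensible.
</para>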
<!-- TODO: encoding- expand discussion of each type to be useful. -->
<!--
<para>
The ASCII character set and encoding
* single byte, only seven bits used.
The ISO-8859-x family of encoding schemes
* single byte, all eight bits used
[From Wikipedia: http://en.wikipedia.org/wiki/ISO_8859-1]
Albanian, Basque, Catalan, Danish, Dutch, English,
Faroese, French (missing only œ), Finnish, German
(missing „ and “), Icelandic, Irish, Italian, Norwegian,
Portuguese, Rhaeto-Romanic, Scottish, Spanish, Swedish. Other
languages covered include Afrikaans and Swahili. Thus, this
character encoding is used throughout the American continent,
Western Europe, Australia, and much of Africa.
UTF-8
</para>
-->
<para>
The World Wide Web has many resources dedicated to explaining
encoding systems and other related information. One of the best
sites discussing UTF-8 and Unicode is the <ulink type="http"
url="http://www.cl.cam.ac.uk/~mgk25/unicode.html" >UTF-8 and
Unicode FAQ for UNIX/Linux</ulink> page maintained by Markus
Kuhn.
The Unicode project has a <ulink type="http"
url="http://www.unicode.org">web site</ulink> which includes an
online copy of their standard character set.
A discussion of the ISO-8859 family of encodings can be found at
a page titled: "<ulink type="http"
url="http://czyborra.com/charsets/iso8859.html" >The ISO-8859
Alphabet Soup</ulink>", which may alternatively be found <ulink
type="http"
url="http://www.unicodecharacter.com/charsets/iso8859.html"
>here</ulink>. A similar discussion on Wikipedia, focusing on
the western alphabets, can be found <ulink type="http"
url="http://en.wikipedia.org/wiki/ISO_8859-1" >here</ulink>.
</para>
<!-- TODO: encoding- make a table of the available encodings. Here or below -->
<!-- TODO: ask- encodings available are determined by gnum/pango? -->
</sect3>
<sect3 id="sect-file-textImport-complex-lineBreak">
<title>Line break delimiters</title>
<para>
The use of text files to store data in a structured fashion
for use by spreadsheet programs requires a scheme to separate
each line of the file. Structured text files rely on explicitly
defined rows as one component of the structuring system. Each
of these rows is defined by a character sequence indicating the
end of a row.
</para>
<para>
Two characters that are part of the ASCII code, an early
encoding that became a widely followed standard, were included
to help define the end of the line. These are the 'linefeed'
character and the 'carriage return' character, named after the
two processes which occur when a typewriter starts a new line:
first the typewriter barrel rolls (the linefeed), then the
whole carriage with the sheet of paper moves back to the
starting point (the carriage return). In the same way that
different computing systems have used different encoding
schemes, three different approaches became common for defining
the end of the line.
</para>
<para>
In GNU operating systems and other systems that inherit from
the UNIX legacy, the end of a line is defined simply using the
'linefeed' character. The Macintosh operating system, in its
versions before Mac OS X, chose instead to use only the
'carriage return' character. The Windows operating system uses
both characters in the sequence 'carriage return' then
'linefeed'.
</para>
<para>
A user opening a file in &gnum; will see, in the preview area
of the <interface>Text Import</interface> druid, whether or not
the line breaks have been recognized correctly and will be able
to alter the recognition settings. An incompatible setup will
yield one of three results: a single unbroken line of text,
lines of text with extra, empty rows between them, or lines of
text with extra symbols at the start or end of each line.
</para>
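<para>
The three symptoms listed above can be reproduced with a short
Python sketch. It is only an illustration and makes no claim
about how &gnum; splits lines internally; the helper
<literal>split_lines</literal> and the sample strings are
assumptions made for this example.
</para>
<programlisting><![CDATA[
```python
import re


def split_lines(text, breaks):
    """Split text on any of the given line break sequences."""
    if not breaks:
        return [text]
    # Try longer sequences first so CR+LF is not seen as two breaks.
    ordered = sorted(breaks, key=len, reverse=True)
    pattern = "|".join(re.escape(b) for b in ordered)
    return re.split(pattern, text)


LF, CR, CRLF = "\n", "\r", "\r\n"
windows_text = "a,b" + CRLF + "c,d"
unix_text = "a,b" + LF + "c,d"

correct = split_lines(windows_text, [CRLF])        # two clean rows
stray_symbol = split_lines(windows_text, [LF])     # CR left at end of row 1
empty_rows = split_lines(windows_text, [CR, LF])   # empty row in between
single_line = split_lines(unix_text, [CR])         # nothing is split
```
]]></programlisting>
<para>
In this sketch, enabling all three sequences at once gives the
same clean result for both sample texts, because the two
character sequence is tried before the single characters.
</para>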
<!-- TODO: ask- line break delimters Does having all 3 set ever not work? -->
<para>
The correct line break delimiters can be established by
checking or unchecking the alternatives. The preview area will
then show the result of the file interpreted with these
settings.
</para>
</sect3>
<!-- TODO: write- section on data structuring strategies. -->
<sect3 id="sect-file-textImport-complex-dataStruct">
<title>Data Structuring Strategies</title>
<para>
The use of text files to store data in a structured fashion for
use by spreadsheet programs also requires some scheme to
separate each value within every line. Two different approaches
are used to separate these values. The first strategy uses a
particular character or character sequence to denote the start
and end of each value. Such strategies are called 'Separated
Value' or 'Delimited Value' systems. The second strategy places
each value starting at a specified position in the line. Such
strategies are called 'Fixed Width' strategies because they
inherently require that each value have a pre-determined size.
</para>
<para>
Separated Value structuring systems distinguish the contents of
each value using pre-determined characters to separate the
values. Certain characters have become common in such schemes.
For example, 'Comma Separated Value' files use a comma character
to separate values while 'Tab Separated Value' files use a tab
character. &gnum; allows the user to define the value separator
to be any one of several common characters or a specific
sequence of characters, either on their own or in
combination. For example, a file could use both space
characters and tab characters to separate values. Similarly, a
file could be read which used the entire word 'STOP' to separate
values, like the common scheme to separate sentences in a
telegram.
</para>
<para>
Separated Value structuring systems often also include a method
to surround a single text value which may itself contain the
character used to separate values. The quote character is often
used in this role but &gnum; allows users to configure any
character in this role. For example, a file which used the
comma to separate values could nonetheless contain a value like
"Zoe, Sally, Dodji" if this value had appropriate text
indicating characters at either end.
</para>
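<para>
The effect of a text indicating character can be seen with
Python's standard <literal>csv</literal> module. The sketch
below is an illustration and is unrelated to the &gnum; importer
itself; the sample data, including the name "Ami", is made up
for this example.
</para>
<programlisting><![CDATA[
```python
import csv
import io

# Made-up sample: the second row holds a text value containing commas.
data = 'name,friends\nAmi,"Zoe, Sally, Dodji"\n'

# Reading with a comma separator and the double quote as text indicator
# keeps the quoted value together as a single field.
rows = list(csv.reader(io.StringIO(data), delimiter=",", quotechar='"'))

# Splitting on every comma, ignoring the quotes, breaks the value apart
# and leaves the quote characters in the data.
naive = [line.split(",") for line in data.splitlines()]
```
]]></programlisting>
<para>
Here <literal>rows[1]</literal> has two fields while
<literal>naive[1]</literal> has four, which is the kind of
difference the preview area makes visible.
</para>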
<para>
Fixed Width structuring systems are common formats for the
output of database tables since the contents of these tables
have often been defined as variables of a particular size.
<!-- TODO: dataStruct- get example for dB variable CHAR14 -->
To import these files, users must specify exactly the start of
each column so that the importer can separate the values on each
row.
</para>
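<para>
A minimal Python sketch of fixed width splitting follows. It
assumes that each column is described by the position at which
it starts, counted from zero; the helper
<literal>split_fixed</literal>, the field sizes, and the sample
record are all hypothetical and are not taken from &gnum;.
</para>
<programlisting><![CDATA[
```python
def split_fixed(line, starts):
    """Cut line into fields that begin at the given start offsets."""
    # Each field ends where the next one starts; the last runs to the end.
    ends = list(starts[1:]) + [len(line)]
    return [line[s:e].strip() for s, e in zip(starts, ends)]


# A made-up record: a 10 character name, a 6 character year, an amount.
record = "Zoe".ljust(10) + "1998".ljust(6) + "12.50"

fields = split_fixed(record, [0, 10, 16])

# A column start that is off by two characters cuts the year in half.
misaligned = split_fixed(record, [0, 12, 16])
```
]]></programlisting>
<para>
The misaligned result shows why the start of each column has to
be specified exactly: nothing in the file itself marks where one
value ends and the next begins.
</para>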
</sect3>
</sect2>
<sect2 id="note-file-textImport-druid">
<title>
The Components of the <interface>Text Import</interface> Druid
</title>
<para>
The <interface>Text Import</interface> druid consists of three
panels with the middle panel differing according to the type of
data structuring used.
</para>
<para>
The first panel allows users to configure the character encoding
used by the file, to determine the character sequences used to
separate lines, to configure the type of structuring being used,
and to select the lines of the file to import. The second panel
allows the user to define the separation strategy used for each
value. For separated value files this involves defining the
separating character sequences and the text indicating
character. For fixed width files, this involves defining the
width of each column. The third panel allows the user to select
the columns to be included during the import and to select the
format of the values in each column.
</para>
<para>
Users navigate the <interface>Text Import</interface> druid by
clicking on the <guibutton>Forward</guibutton> button on each
panel after they have configured the settings properly. The
third panel contains a <guibutton>Finish</guibutton> button which
causes the file to be imported to a workbook using all the
settings as they are configured.
</para>
<sect3 id="sect-file-textImport-druid-panel1">
<title>
The first panel of the <interface>Text Import</interface> Druid.
</title>
<para>
The first panel of the <interface>Text Import</interface>
Druid allows users to set the file encoding, to determine the
character sequences used to separate lines, configure the type
of structuring being used and select the lines of the file to
import.
</para>
<figure id="fig-file-textImport-druid-panel1">
<title>
The first panel of the <interface>Text Import</interface>
druid with the component areas labeled with callouts.
</title>
<screenshot>
<mediaobject>
<imageobject>
<imagedata fileref="figures/textguru-import-panel1-withTags.png"
format="PNG" />
</imageobject>
<textobject>
<para>
This screenshot depicts the first panel 'Text Import'
druid with callouts labeling the different areas.
</para>
</textobject>
<caption>
<para>
The different components of the first panel of the
<interface>Text Import</interface> druid with each component
labeled with a callout.
</para>
</caption>
</mediaobject>
</screenshot>
</figure>
<para>
The purpose of each labeled component in <xref
linkend="fig-file-textImport-druid-panel1" /> is
explained below:
<variablelist>
<title>The components of the first panel</title>
<varlistentry>
<term>
<emphasis role="bold">1</emphasis> - The file encoding
selection menu.
</term>
<listitem>
<para>
This drop down menu provides a list of encoding
schemes for the characters in the text file. By
default, &gnum; selects the encoding scheme used by
the locale of the user. See <xref
linkend="sect-file-textImport-complex-encoding" /> for more
details.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis role="bold">2</emphasis> - The line break
character selector.
</term>
<listitem>
<para>
These three check boxes can be selected individually
or together to define the sequences which will be
interpreted as line break indicators. Generally,
selecting all three boxes will produce the correct
results.
<!-- TODO: Is having all three line separators checked ever wrong? -->
</para>
<para>
The errors produced if the wrong combination of boxes
is selected will include the entire file being placed
on a single line, empty lines appearing between the
lines of the file, or undefined symbols appearing at
the beginning or end of almost every line. See <xref
linkend="sect-file-textImport-complex-lineBreak" /> for more
details.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis role="bold">3</emphasis> - The data
structuring system selector.
</term>
<listitem>
<para>
These two push buttons allow the choice between the
two different structuring schemes, data structured by
placing a separating character between the data values
and data organized in fixed width columns. Note that
this choice will determine which panel will be shown
as the second panel of the druid. See <xref
linkend="sect-file-textImport-complex-dataStruct" /> for more
details.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis role="bold">4</emphasis> - The line range spinboxes.
</term>
<listitem>
<para>
These two spin boxes allow the user to select the
start and end rows for the data import. Each spin box
can be used either by typing a new value in the text
entry area where the number is displayed, or by
clicking with the mouse on the up arrow to
increase the number or the down arrow to decrease
it.
</para>
<para>
For instance, if the text file contained a large
header area with meta information, this header could
be excluded from the data imported to the &gnum;
worksheet by increasing the number of the starting,
"From", line.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis role="bold">5</emphasis> - The preview area.
</term>
<listitem>
<para>
This area displays a preview of the file as it will be
interpreted when the settings that are currently
selected in this first panel are applied.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis role="bold">6</emphasis> - The button area.
</term>
<listitem>
<para>
These four buttons allow the user to navigate the
druid. The <guibutton>Help</guibutton> button should
open the &gnum; manual to this section. The
<guibutton>Cancel</guibutton> button will dismiss the
dialog and return the user to the worksheet. The
<guibutton>Back</guibutton> button is disabled since
this is the first panel of the druid and the
<guibutton>Forward</guibutton> button will bring up
the next panel in the druid.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
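<para>
The effect of the line range setting can be pictured with the
following Python sketch. It assumes that lines are counted from
one and that both the "From" and the "To" line are included,
which this section does not state explicitly; the helper
<literal>select_lines</literal> and the sample lines are
hypothetical.
</para>
<programlisting><![CDATA[
```python
def select_lines(lines, first, last):
    """Keep lines first..last, counted from 1, both ends included."""
    return lines[first - 1:last]


# Made-up file contents: two header lines followed by the data.
file_lines = [
    "# exported by some instrument",
    "# units: metres",
    "x,y",
    "1,2",
    "3,4",
]

# Raising the starting line to 3 leaves the header out of the import.
imported = select_lines(file_lines, 3, 5)
```
]]></programlisting>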
</sect3>
<sect3 id="sect-file-textImport-druid-panel2separated">
<title>
The second panel of the <interface>Text Import</interface>
Druid used for separated data
</title>
<para>
The second panel of the <interface>Text Import</interface>
Druid used for separated data allows the user to configure the
character sequences used to separate the values in each row
and to configure the text delimiting characters. &gnum;, by
default, guesses which characters are being used to separate
values and pre-sets those characters. The user can, however,
reconfigure these characters. </para>
<figure id="fig-file-textImport-druid-panel2a">
<title>
The second panel of the <interface>Text Import</interface>
druid for separated data with
the component areas labeled with callouts.
</title>
<screenshot>
<mediaobject>
<imageobject>
<imagedata fileref="figures/textguru-import-panel2a-withTags.png"
format="PNG" />
</imageobject>
<textobject>
<para>
This screenshot depicts the second panel 'Text Import'
druid for separated data with callouts labeling the
different areas.
</para>
</textobject>
<caption>
<para>
The different components of the second panel of the
<interface>Text Import</interface> druid for separated data
with each component labeled with a callout.
</para>
</caption>
</mediaobject>
</screenshot>
</figure>
<para>
The purpose of each labeled component in <xref
linkend="fig-file-textImport-druid-panel2a" /> is
explained below:
<variablelist>
<title>The components of the second panel for separated data</title>
<varlistentry>
<term>
<emphasis role="bold">1</emphasis> - The separator
definition area.
</term>
<listitem>
<para>
This area allows the user to define the characters used
to separate data value fields within each
row. The checkboxes can be pressed to add or remove
characters from those treated as
separators. Additionally, the 'custom' type allows the
user to define either other single characters, or a
particular character sequence used to separate
values. The preview area in the panel will show the
file processed with the rules which have already been
applied.
</para>
<para>
Generally, this type of file structuring uses a single
character to separate fields but it is possible to use
either several different characters or to use a
sequence of characters. For example, it would be
possible to use the old telegraphic convention of
separating phrases with the word 'STOP' by selecting
the 'custom' separator type and entering the character
sequence 'STOP' in the text field.
</para>
<para>
This area also includes a checkbox that causes two
separator sequences which immediately follow one
another to be treated as a single separator. This
option will only be useful where data is imported with
one or more completely empty columns and no partially
filled columns. If this option is checked and the data
file has partially filled columns of data, the columns
will be jumbled during the text import operation.
</para>
<para>
See <xref linkend="sect-file-textImport-complex-dataStruct" />
for more details.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis role="bold">2</emphasis> - The text indicating
character area.
</term>
<listitem>
<para>
Separated value files often additionally define a
character used to indicate the start and end of a data
element which should be considered a single text
entry. This strategy allows the inclusion of text
entries which include the value separator.
</para>
<para>
For example, a file which is structured as a comma
separated value file could use the double quotation
mark to delimit text values and would then be able to
include text values such as: "Zoe, Mark, Sally".
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis role="bold">3</emphasis> - The preview area.
</term>
<listitem>
<para>
This area displays a preview of the file as it will be
interpreted when the settings that are currently
selected in the first and second panels are applied.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis role="bold">4</emphasis> - The button area.
</term>
<listitem>
<para>
These four buttons allow the user to navigate the
druid. The <guibutton>Help</guibutton> button should
open the &gnum; manual to this section. The
<guibutton>Cancel</guibutton> button will dismiss the
dialog and return the user to the worksheet. The
<guibutton>Back</guibutton> button will take the user
back to the first panel, without, however, changing
the settings in this second panel. The
<guibutton>Forward</guibutton> button will bring up
the next panel in the druid.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
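<para>
The warning about treating adjacent separators as one can be
illustrated with a short Python sketch. This is not the &gnum;
implementation; the function <literal>split_fields</literal>,
its <literal>merge_adjacent</literal> parameter, and the sample
rows are assumptions made for this example.
</para>
<programlisting><![CDATA[
```python
import re


def split_fields(line, sep=",", merge_adjacent=False):
    """Split line on sep, optionally treating runs of sep as one."""
    pattern = re.escape(sep)
    if merge_adjacent:
        pattern = "(?:" + pattern + ")+"
    return re.split(pattern, line)


# Every row has an empty second column: merging simply drops it.
full_a = split_fields("1,,3", merge_adjacent=True)
full_b = split_fields("4,,6", merge_adjacent=True)

# Only one row has an empty second column: merging shifts its last
# value into the wrong column, so the rows no longer line up.
partial_a = split_fields("1,2,3", merge_adjacent=True)
partial_b = split_fields("4,,6", merge_adjacent=True)
```
]]></programlisting>
<para>
In the second pair, the value 6 ends up in the second column of
its row while 3 stays in the third column of the row above,
which is the jumbling described for partially filled columns.
</para>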
</sect3>
<sect3 id="sect-file-textImport-druid-panel2fixed">
<title>
The second panel of the <interface>Text Import</interface>
Druid used for fixed width data
</title>
<para>
The second panel of the <interface>Text Import</interface>
Druid used for fixed width data allows the user to define the
widths of each column to be imported. &gnum; provides a
mechanism to automatically guess the widths of the columns and
allows the user, using the mouse, to define the widths of the
columns.
</para>
<figure id="fig-file-textImport-druid-panel2b">
<title>
The second panel of the <interface>Text Import</interface>
druid for fixed width data with the component areas labeled
with callouts.
</title>
<screenshot>
<mediaobject>
<imageobject>
<imagedata fileref="figures/textguru-import-panel2b-withTags.png"
format="PNG" />
</imageobject>
<textobject>
<para>
This screenshot depicts the second panel 'Text Import'
druid for fixed width data with callouts labeling the
different areas.
</para>
</textobject>
<caption>
<para>
The different components of the second panel of the
<interface>Text Import</interface> druid for fixed width
data with each component labeled with a callout.
</para>
</caption>
</mediaobject>
</screenshot>
</figure>
<para>
The purpose of each labeled component in <xref
linkend="fig-file-textImport-druid-panel2b" /> is
explained below:
<variablelist>
<title>
The components of the second panel for fixed width data
</title>
<varlistentry>
<term>
<emphasis role="bold">1</emphasis> - The automatic
column discovery button.
</term>
<listitem>
<para>
This leftmost button, named <guibutton>Auto Column
Discovery</guibutton>, will cause &gnum; to scan the
file and attempt to assign the columns
automatically. The example presented in <xref
linkend="fig-file-textImport-druid-panel2b" /> shows
one result after this button has been pressed: many of
the columns were discovered automatically, but the
second and third columns were
misidentified. Nonetheless, the automatic mechanism
provides a useful starting point. The definition of
the columns can be refined using the methods described
below.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis role="bold">2</emphasis> - The column
definition clearing button.
</term>
<listitem>
<para>
This rightmost button, named
<guibutton>Clear</guibutton>, will clear all the
column definitions and reset the file to a single
column. This button should be used cautiously since
there is no way to reverse its action and any
carefully prepared column definition layout will be
irretrievably lost.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis role="bold">3</emphasis> - The preview and
column width definition area.
</term>
<listitem>
<para>
This area acts as both a preview area and an area
where users can define the column widths.
</para>
<para>
As a preview area, this area
displays a preview of the file as it will be
interpreted when the settings that are currently
selected in the first and second panels are applied.
</para>
<para>
This area can also be used to define column
widths. When the panel first appears, a single column
will be defined. The automatic column discovery
mechanism may split this single column into many more
columns. The mouse can then be used to further divide
columns or to join previously separate columns.
</para>
<para>
A new column can be defined by placing the mouse
pointer where the column should start and
double-clicking with the primary mouse button. This
will split the column which used to contain this
position and add a new column starting at this
location.
</para>
<para>
To remove the definition of a column which already
exists or to alter the ending position of a column,
the context menu must be used. The context menu
appears when one of the secondary mouse buttons is
clicked. A column which has already been defined can
be merged with the column on the left or right using
the <guimenuitem>Delete and Merge Left</guimenuitem>
or <guimenuitem>Delete and Merge Right</guimenuitem>
menu items. The size of a column can be increased or
decreased by placing the mouse pointer inside the
column area or header and using the
<guimenuitem>Widen</guimenuitem> or
<guimenuitem>Narrow</guimenuitem> menu items,
respectively. Either of these will change the width of
the column by moving the right-hand end of the
column.
</para>
<para>
The context menu can also be used to define new
columns using the <guimenuitem>Split</guimenuitem> menu
item but the double-click approach described above
should be easier.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis role="bold">4</emphasis> - The button area.
</term>
<listitem>
<para>
These four buttons allow the user to navigate the
druid. The <guibutton>Help</guibutton> button should
open the &gnum; manual to this section. The
<guibutton>Cancel</guibutton> button will dismiss the
dialog and return the user to the worksheet. The
<guibutton>Back</guibutton> button will take the user
back to the first panel, without, however, changing
the settings in this second panel. The
<guibutton>Forward</guibutton> button will bring up
the next panel in the druid.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
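<para>
The column operations described above can be pictured as edits to
a sorted list of character offsets at which columns begin. The
following Python sketch is only an illustration under that
assumption and is not taken from the &gnum; source code. The
function names <function>split_column</function>,
<function>merge_left</function>, and
<function>cut_line</function>, as well as the sample line of
text, are hypothetical.
</para>
<programlisting>
from bisect import insort


def split_column(breaks, position):
    """Return a new break list with a column starting at `position`."""
    if position == 0 or position in breaks:
        return list(breaks)  # a column already starts here
    new_breaks = list(breaks)
    insort(new_breaks, position)  # keep the offsets sorted
    return new_breaks


def merge_left(breaks, column_index):
    """Join column `column_index` to its left neighbour."""
    if column_index == 0:
        return list(breaks)  # the first column has no left neighbour
    # Column i (counting from 0) starts at breaks[i - 1]; dropping that
    # offset removes the boundary between columns i - 1 and i.
    return breaks[:column_index - 1] + breaks[column_index:]


def cut_line(line, breaks):
    """Cut one line of text into fields at the given offsets."""
    edges = [0] + list(breaks) + [None]  # None means "to end of line"
    return [line[start:end] for start, end in zip(edges, edges[1:])]


line = "Alice     42Paris"
breaks = split_column([], 10)       # one column becomes two
breaks = split_column(breaks, 12)   # split again at offset 12
print(cut_line(line, breaks))       # ['Alice     ', '42', 'Paris']
print(cut_line(line, merge_left(breaks, 2)))  # ['Alice     ', '42Paris']
</programlisting>
<para>
In this picture, a double-click corresponds to adding one offset,
and a merge corresponds to removing one, which is why a merge
never discards any text from the preview.
</para>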
</sect3>
<sect3 id="sect-file-textImport-druid-panel3">
<title>
The third panel of the <interface>Text Import</interface>
Druid
</title>
<para>
This panel allows users to select and format the columns to be
imported to the &gnum; workbook. The first button allows the
exclusion of empty columns on either of the outer sides of the
columns with data. The second button allows the user to define
the locale used to interpret the values in the file. The
remaining area allows the user to predefine the data format to
be used for all the values in each column. This area also
allows users to select which columns in the file will be
imported to the &gnum; worksheet. Finally, this panel provides
the <guibutton>Finish</guibutton> button, which is used to dismiss the
dialog and import the file.
</para>
<figure id="fig-file-textImport-druid-panel3">
<title>
The third panel of the <interface>Text Import</interface>
druid with the component areas labeled with callouts.
</title>
<screenshot>
<mediaobject>
<imageobject>
<imagedata fileref="figures/textguru-import-panel3-withTags.png"
format="PNG" />
</imageobject>
<textobject>
<para>
This screenshot depicts the third panel of the 'Text Import'
druid with callouts labeling the different areas.
</para>
</textobject>
<caption>
<para>
The different components of the third panel of the
<interface>Text Import</interface> druid with each component
labeled with a callout.
</para>
</caption>
</mediaobject>
</screenshot>
</figure>
<para>
The purpose of each labeled component in <xref
linkend="fig-file-textImport-druid-panel3" /> is
explained below:
<variablelist>
<title>The components of the third panel</title>
<varlistentry>
<term>
<emphasis role="bold">1</emphasis> - The trim of empty
outer columns drop down list button.
</term>
<listitem>
<para>
This button provides a list allowing the user to
select whether to trim any outer columns which are
completely empty. The choices are to delete the
columns on both sides, on neither side, or on one side
only. This will only affect columns which have been
previously defined but which contain no data values at
all.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis role="bold">2</emphasis> - Locale definition
for import drop down menu button.
</term>
<listitem>
<para>
This button provides a list of locales which can be
set. The chosen locale will affect how numeric values
are interpreted when they are imported. For instance,
the locale will define the character expected as the
decimal separator, which is the period character (.) in
some locales and the comma character (,) in
others. Each locale generally then uses the other
character as the separator grouping the digits in
thousands.
<!-- TODO: add xref to localization discuss and to number formats. -->
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis role="bold">3</emphasis> - The column data
format selection list.
</term>
<listitem>
<para>
This list allows predetermining the format which
&gnum; will assign to each of the values in the columns
selected below. Cell data formats are explained in <xref
linkend="sect-data-format"/>.
</para>
<para>
To use this list, first select one or more columns in
the preview area below, then select a data format in
this list, and finally configure any details of the
format. Number formats, for instance, allow the user
to force numbers to contain a fixed number of digits
after the decimal point.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis role="bold">4</emphasis> - The column
selection, inclusion, and file preview area.
</term>
<listitem>
<para>
This area allows users to select columns which will be
preformatted, to select which columns to include in
the import and to preview the file. Each single column
can be selected by clicking with the mouse pointer on
the column header. Any single column can be excluded
from the data imported to the &gnum; worksheet by
clicking in the checkbox in the column header to
remove the check mark. The area also provides a
preview of the data in the text file showing the
effect of the current configuration.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
<emphasis role="bold">5</emphasis> - The button area.
</term>
<listitem>
<para>
These four buttons allow the user to navigate the
druid. The <guibutton>Help</guibutton> button should
open the &gnum; manual to this section. The
<guibutton>Cancel</guibutton> button will dismiss the
dialog and return the user to the worksheet. The
<guibutton>Back</guibutton> button will take the user
back to the second panel, without, however, changing
the settings in this third panel. The
<guibutton>Finish</guibutton> button will dismiss the
druid and cause the file to be imported into a new
worksheet using the selected configuration parameters.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
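<para>
The effect of the locale setting on numeric values can be
illustrated with a short Python sketch. It assumes that a locale
is reduced to just two characters, a decimal separator and a
thousands grouping separator. This is a simplification for
illustration and not a description of how &gnum; parses numbers;
the function <function>parse_number</function> and its parameters
are hypothetical.
</para>
<programlisting>
def parse_number(text, decimal_sep=".", group_sep=","):
    """Interpret `text` as a number using the given separators."""
    cleaned = text.strip()
    cleaned = cleaned.replace(group_sep, "")     # drop thousands grouping
    cleaned = cleaned.replace(decimal_sep, ".")  # normalise the decimal mark
    return float(cleaned)


# The same text yields different values under different conventions.
print(parse_number("1.234"))                                 # 1.234
print(parse_number("1.234", decimal_sep=",", group_sep="."))  # 1234.0
print(parse_number("1.234,56", decimal_sep=",", group_sep="."))  # 1234.56
</programlisting>
<para>
The first two calls show why the locale should be checked before
importing: identical text in the file produces values that differ
by a factor of one thousand.
</para>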
</sect3>
</sect2>
<!-- TODO: docbookv4.3 change middle <step>s into <stepalternative>s -->
<!-- TODO: write- section 'Procedure to use the text importer'. -->
<!--
<sect2 id="sect-file-textImport-druid-process">
<title>
The procedure to use the <interface>Text Import</interface>
Druid.
</title>
<para>
</para>
<para>
Explain the optional-ness of the options.
</para>
<procedure>
<title>
The procedure to use the <interface>Text Import</interface>
Druid.
</title>
<step>
<title>
Open the File using the "Text import (configurable)" format.
</title>
<para>
Step description
</para>
<substeps>
<step>
<title>
Launch the <interface>File Open</interface> dialog.
</title>
<para>
Substep description
</para>
</step>
<step>
<title>
Select the folder and file to be opened.
</title>
<para>
Substep description
</para>
</step>
<step>
<title>
Select the "Text import (configurable)" format type.
</title>
<para>
Substep description
</para>
</step>
<step>
<title>
(Optional) Select the character encoding scheme.
</title>
<para>
Substep description
</para>
</step>
<step>
<title>
Open the file.
</title>
<para>
Click on the <guibutton>Open</guibutton> button to open
the file using the <interface>Text Importer</interface>.
</para>
</step>
</substeps>
</step>
<step>
<title>
Configure the 1<superscript>st</superscript> panel.
</title>
<para>
Step description: encoding, line break, data structuring
scheme, line selection.
</para>
<substeps>
<step>
<title>
Re-define the character encoding.
</title>
<para>
Substep description
</para>
</step>
<step>
<title>
Define the line break separator character sequences.
</title>
<para>
Substep description
</para>
</step>
<step>
<title>
Select the data field structuring scheme.
</title>
<para>
Substep description
</para>
</step>
<step>
<title>
Select the line region to import.
</title>
<para>
Substep description
</para>
</step>
<step>
<title>
Move to the next panel
</title>
<para>
Click on the <guibutton>Forward</guibutton> to move to
the next panel. The panel which will appear will be
different for the two types of data structuring
strategies. There are two sections below describing the
second panel, section 3 and section 4, one for each of
the two data structuring schemes.
</para>
</step>
</substeps>
</step>
<step>
<title>
(Separated value structured file)
Configure the 2<superscript>nd</superscript> panel.
</title>
<para>
Step description
</para>
<substeps>
<step>
<title>
Define the character sequences acting as separators.
</title>
<para>
pick any combo of individual chars
</para>
<para>
define a char sequence.
</para>
<para>
Combine 2?
</para>
</step>
<step>
<title>
Define the characters used to bracket text fields.
</title>
<para>
Substep description
</para>
</step>
<step>
<title>
Move to the next panel
</title>
<para>
Click on the <guibutton>Forward</guibutton> to move to
the third panel.
</para>
</step>
</substeps>
</step>
<step>
<title>
(Fixed width structured file)
Configure the 2<superscript>nd</superscript> panel.
</title>
<para>
Step description
</para>
<substeps>
<step>
<title>
Define the fixed-width columns.
</title>
<para>
In this process, can restart at any time using the reset
button but CAUTION can't undo a reset.
</para>
<para>
Use the automatic column detection button.
</para>
<para>
Define the columns manually. Dbl click.
</para>
</step>
<step>
<title>
Move to the next panel
</title>
<para>
Click on the <guibutton>Forward</guibutton> to move to
the third panel.
</para>
</step>
</substeps>
</step>
<step>
<title>
Configure the 3<superscript>rd</superscript> panel.
</title>
<para>
Step description
</para>
<substeps>
<step>
<title>
Select which empty outer columns to trim during import.
</title>
<para>
Substep description
</para>
</step>
<step>
<title>
Configure the locale settings used to interpret data values.
</title>
<para>
Substep description
</para>
</step>
<step>
<title>
Select the columns to be imported.
</title>
<para>
Substep description
</para>
</step>
<step>
<title>
Preselect the data formats for the elements in each column.
</title>
<para>
Substep description
</para>
</step>
<step>
<title>
Import the file.
</title>
<para>
Click on the <guibutton>Finish</guibutton> button to
import the file using all the settings as currently
configured.
</para>
</step>
</substeps>
</step>
</procedure>
<para>
The file will be opened
</para>
</sect2>
section end comment to block out section -->
<!-- TODO: Remove the old text that follows. Kept now for inspiration.
********************************************************************
<sect2>
<title> OLD TEXT FOLLOWS: </title>
<sect4>
<title>The Number Formats</title>
<para>After selecting a column on the left select the appropriate
format on the right. In the preview section at the bottom of the
dialog, you can immediately see the effect of selecting that
format. The following types of formats are available:</para>
<variablelist>
<varlistentry>
<term>
General
</term>
<listitem>
<para>This format will guess for each field value whether it is text,
a number, a date, etc.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
Numbers
</term>
<listitem>
<para>You can choose between various number formats. The following list presents
just a short selection of those formats:</para>
<figure id="file-format-numberformats">
<title>Some Number Formats</title>
<screen>
0
0.00
#,##0
#,##0_);(#,##0)
#,##0.00_);[Red](#,##0.00)
</screen>
</figure>
<para>There are also formats facilitating the use of scientific notation,
see <xref linkend="file-format-scientificformats" />.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>
Currency Amounts
</term>
<listitem>
<para> You can choose between various currency formats. The following list presents
just a short selection of those formats:</para>
<figure id="file-format-currenyformats">
<title>Some Currency Formats</title>
<screen>
"$"#,##0
"$"#,##0_);(#,##0)
"$"#,##0.00_);[Red](#,##0.00)
</screen>
</figure>
</listitem>
</varlistentry>
<varlistentry>
<term>
Dates and Times
</term>
<listitem>
<para>You can choose between various date and time formats. Some of these formats will
recognize combined date/time entries. The following list presents just a short
selection of those formats:</para>
<figure id="file-format-dateformats">
<title>Some Date and Time Formats</title>
<screen>
m/d/yy
d-mmm-yyyy
d-mm
mmm/d
mmm/ddd/yyyy
mmmm-yyyy
m/d/yyyy h:mm
yyyy
h:mm:ss AM/PM
[h]:mm:ss
</screen>
</figure>
</listitem>
</varlistentry>
<varlistentry>
<term>
Percentages
</term>
<listitem>
<para>You can choose between various formats that recognize percentages.
The following list presents just a short
selection of those formats:</para>
<figure id="file-format-percentageformats">
<title>Some Percentage Formats</title>
<screen>
0%
0.00%
</screen>
</figure>
</listitem>
</varlistentry>
<varlistentry>
<term>
Fractions
</term>
<listitem>
<para>You can choose between a few formats that recognize fractions.
The following list presents just a short
selection of those formats:</para>
<figure id="file-format-fractionformats">
<title>Some Fraction Formats</title>
<screen>
# ?/?
# ??/??
</screen>
</figure>
</listitem>
</varlistentry>
<varlistentry>
<term>
Scientific Notation
</term>
<listitem>
<para>You can choose between a few formats that recognize numbers in scientific notation.
The following list presents just a short
selection of those formats:</para>
<figure id="file-format-scientificformats">
<title>Some Scientific Formats</title>
<screen>
0.00E+00
##0.0E+0
</screen>
</figure>
</listitem>
</varlistentry>
<varlistentry>
<term>
Text
</term>
<listitem>
<para>If you want the importer to simply read the field value as text without
attempting to interpret it in any way, use the following text format:</para>
<figure id="file-format-textformat">
<title>The Text Format</title>
<screen>
@
</screen>
</figure>
</listitem>
</varlistentry>
</variablelist>
<para>More details on the various formats can be found in
<xref linkend="file-format" />.</para>
<xref linkend="sect-data-format" />.</para>
</listitem>
<listitem><para>
Click the <quote><guibutton>Finish</guibutton></quote> button
to complete importing the file.</para>
</listitem>
</orderedlist>
</sect5>
<sect5>
<title>The Text Import Druid for Fixed Width Fields</title>
<orderedlist>
<listitem>
<para>If you selected fixed width fields you are asked to specify the widths for
each field. Click the <quote><guibutton>Auto Column Discovery</guibutton></quote> button
to have <application>Gnumeric</application> try to determine the fields widths automatically.</para>
<figure id="file-format-csv-import-ex5">
<title></title>
<screenshot>
<mediaobject>
<imageobject>
<imagedata fileref="figures/files-csv-import-ex5.png" format="PNG" />
</imageobject>
<textobject>
<phrase>An image of the third page of the text import
druid with fixed width customization.</phrase>
</textobject>
</mediaobject>
</screenshot>
</figure>
</listitem>
<listitem>
<para>Finally select the appropriate format for each input column as in
<xref linkend="file-format-csv-import-ex4" />.</para>
</listitem>
<listitem><para>
Click the <quote><guibutton>Finish</guibutton></quote> button
to complete importing the file.</para>
</listitem>
</orderedlist>
</sect5>
</sect4>
</sect2>
Old text. -->
</sect1>