Path: senator-bedfellow.mit.edu!bloom-beacon.mit.edu!news.kei.com!sol.ctr.columbia.edu!howland.reston.ans.net!darwin.sura.net!news-feed-1.peachnet.edu!concert!decwrl!looking!brad
Message-ID: <S615.439@clarinet.com>
Date: Sun, 21 Nov 93 2:40:08 EST
Expires: Wed, 22 Dec 93 2:40:08 EST
Newsgroups: clari.net.newusers,news.answers
From: brad@clarinet.com (Brad Templeton)
Reply-To: clarinet@clarinet.com
Followup-to: poster
Approved: brad@clarinet.com
Subject: ClariNet: How it works (Sep/93)
Lines: 420
Xref: senator-bedfellow.mit.edu clari.net.newusers:119 news.answers:14924
Archive-name: clarinet/howitworks
ClariNet draws news from a variety of sources. This news is
processed and converted into USENET format at ClariNet
facilities. It is then sent out via UUCP (the telephone/modem
based inter-unix communications facility) and TCP/IP (the
computer communications protocol used by many machines,
including those on leased line networks like the internet)
to ClariNet customers around the world.
We receive UPI (United Press International) wireservice news
directly via satellite, in the same way that newspapers
receive it. The wire news comes (more or less) in what is
known as the ANPA (American Newspaper Publishers
Association) format.
This format was designed some time ago. In the beginning,
all wires simply fed directly to printers or teletypes, at
speeds of 300 bps or less. The ANPA format was eventually
designed and revised to help newspapers that fed the wire
directly into the composing computer.
Even so, it is primitive compared to formats like the USENET
news format and modern electronic mail formats. Only a
small amount of information is formally specified. By and
large, the information is intended for use by computer
assisted humans, not an electronic newspaper system like
ClariNet.
The satellite feed also provides us with syndicated columns,
stocks, and other newspaper related services. The
syndicates all buy transmission time on the two main
newswire satellite networks (UPI and AP) -- charging it back
to their customers, of course.
For other sources, we either call pickup points by modem or
have the sources upload the information to us. Once again,
our software converts the information and injects it into
the USENET style news system.
Where possible, news is fed directly to customers with
minimal human intervention. Our software has been trained
to deal with the various inconsistencies in the wire feed so
that news goes out even outside of business hours. This
ensures that the news gets to you as quickly as possible.
The software takes category information provided by the
reporters and uses it to classify the articles into one or
more appropriate newsgroups. For example, all NASA stories
go to clari.tw.space.
During business hours (and often outside them, too) ClariNet
editors scan the report. We can delete bad stories, edit
them to make corrections, or adjust categorizations and
newsgroups. If a story is corrected, the old version is
canceled and the update re-issued.
We don't edit every single mistake we find. In general, we
edit serious errors and add or delete categorizations from
stories. Most of this news is written quickly, with the
goal of getting it to the client as soon as possible. As
such, we sometimes let typos and other minor mistakes stand,
in order to avoid excessive re-issuance of stories.
"Wireservices"
Long before USENET existed, the wireservices built the first
large scale text broadcast systems. Aside from the feeds to
newspapers -- done at first by telegraph, later by leased
lines and now by satellite -- the wires have their own
internal nets as well, where they can issue messages to
their own people and even engage in limited discussion.
These nets have been around since the 19th century, long
before computers even existed. Unfortunately, it seems at
times that their technology hasn't changed much since then.
As you will read, the reporters key in all the headers and
classifications by hand with cryptic single letter codes.
This is very prone to error. With luck, this system will be
replaced in the near future.
The largest wireservice in the world is the Associated
Press, or AP. AP is owned by member newspapers. It has its
own reporters, but also draws stories from the member
papers. In the USA, the #2 wire is United Press International,
or UPI. UPI is an independent wire, privately owned. UPI
draws revenue only from fees charged to client newspapers
and distributors like ClariNet. The third major wire is
Reuters. Reuters now makes the vast bulk of its revenue not
from newspapers, but by providing information to people in
the finance industries. Nonetheless its wireservice
components in the USA are similar in size to UPI.
As the #2 wire, UPI is far more willing to experiment with
new concepts like electronic publishing. This is what makes
ClariNet wireservice news possible.
Just like USENET, wireservices have their own vocabulary.
You'll see some of it in the advisories on ClariNet stories,
which we put in the Note: header line.
"Wire Activity"
All wire stories have the following main components:
1. A priority that marks the importance of the story.
2. A general category from one of about a dozen ANPA
defined codes.
3. A *slugword*, or unique keyword that identifies the
story for that day.
A variety of other fields are optional and described later.
"Priorities"
UPI covers a wide variety of topics. The most important
stories are termed *breaking* news. These stories are
assigned one of three special priorities -- flash, bulletin
and urgent.
*Flash* is the most extreme priority there is. Flash
stories are only one sentence long, and are followed almost
immediately by a bulletin. The last known flashes were
"space shuttle explodes" and "U.S. invades Iraq" -- this
gives you some idea of the importance of these stories. Any
flash, if and when it comes, will be posted to clari.news.flash.
If you're a system administrator, you might arrange for
special treatment and forwarding of such stories.
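For administrators who want to try this, here is a minimal Python
sketch of the detection step only. The group name clari.news.flash is
taken from the text above; the function name is_flash is made up for
illustration, and what to do with a detected flash (mail it, page
someone) is left to the site.

```python
from email import message_from_string

FLASH_GROUP = "clari.news.flash"  # group name as given in the text

def is_flash(raw_article):
    """Return True if the article's Newsgroups: line lists the flash group."""
    msg = message_from_string(raw_article)
    groups = [g.strip() for g in msg.get("Newsgroups", "").split(",")]
    return FLASH_GROUP in groups
```

The comparison is on whole group names, so a group whose name merely
starts with the same letters does not trigger it.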
*Bulletin* is the normal priority for the most important
breaking stories of the week. Bulletins can range from
major government announcements up to big events such as the
U.S. invasion of Panama. One normally doesn't see more
than a few bulletins per week, although, like world events,
bulletins come at random.
*Urgent* is a priority assigned reasonably frequently -- 3-6
times per day. The most important stories of the day get
this priority.
Most other news gets the *regular* (called *rush* in the
wire industry) priority. Some other news will see lower
priorities. These are listed in the description of the
Priority header line.
All breaking news stories are posted to special groups
dedicated to news of that priority. When a story is first
assigned a priority, we maintain it in the group for that
priority each time it is re-issued, even if the wire has
dropped the story's priority to a lower value.
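The "keep the first priority" rule can be sketched in a few lines of
Python. The priority names come from this document; the numeric
ranking, the function name posting_priority, and the handling of a
story whose priority is later raised are all assumptions, since the
text only describes the case where the wire lowers it.

```python
# Larger number = more important. Names are from the text; the
# numeric values are an assumed ordering for illustration.
RANK = {"regular": 0, "urgent": 1, "bulletin": 2, "flash": 3}

def posting_priority(remembered, wire_priority):
    """Pick the priority whose group a (re-)issued story is posted to.

    remembered    -- priority used for the previous version, or None
    wire_priority -- priority the wire put on this version
    """
    if remembered is None:
        return wire_priority          # first time we see the story
    if RANK[wire_priority] < RANK[remembered]:
        return remembered             # wire dropped it: stay put
    return wire_priority              # assumption: honour a raise
```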
"Scheduled News"
A lot of the major news that "moves" on the wires is not
unexpected. For example, a presidential press conference is
sure to produce a big story, and everybody knows what time
that story will arrive -- they just don't (usually) know
what it will say.
In addition, a number of stories are important, but not
particularly urgent, and are written with care for release
at a particular time. This is true of features and analysis
pieces, or pieces about developing world situations.
These types of stories are known as scheduled stories, or
"skedded" in the wire lingo. The editors release a schedule
of upcoming big stories for newspaper editors to use in
planning their pages. We assign any "skedded" story a
priority of *major*, and have created some special groups,
called "top" news groups, for such stories.
"Classification"
The ANPA category provides some useful information, with about a
dozen ANPA categories used regularly. To supplement this,
UPI has reporters and editors classify stories with special
custom codes. These map to keywords identifying several
hundred different story topics. It is these codes, along
with our own judgement, that classify most of the stories
into newsgroups.
"Story Updates"
When a newspaper goes to press, it wants the latest version
of any developing story. For this reason, almost all
breaking stories get issued several times during the day.
The reporter keeps the text in his or her laptop, edits it
as new details, quotes and corrections develop, and
re-issues the entire story whenever anything important
happens.
On a big story, as many as 20 updates may come in a day.
Most major stories see two or three.
All updates (should) come with the same *Slugword* -- the
unique keyword that identifies the story. When ClariNet
sees a story come in with the same slugword as a previous
story, we normally arrange to replace the old story with the
new one. This is done by canceling the old one (USENET
cancel message) and issuing the new one.
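As a rough illustration, the replace-by-slugword step might look like
the Python sketch below. It is not ClariNet's software: the function
name handle_story, the dictionary of live stories, and the example
Message-IDs are hypothetical, and the sketch ignores the fact that a
slugword is only unique within one day.

```python
def handle_story(live, slugword, message_id):
    """Record an incoming story and return the actions to take, in order.

    live -- dict mapping slugword -> Message-ID of the version now posted
            (mutated in place)
    """
    actions = []
    previous = live.get(slugword)
    if previous is not None:
        # Same slugword seen before: cancel the old version first.
        actions.append(("cancel", previous))
    actions.append(("post", message_id))
    live[slugword] = message_id
    return actions
```

For example, feeding two versions of a story slugged "shuttle" yields
a plain post for the first and a cancel followed by a post for the
second.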
Unfortunately, it's not as simple as that, and this feature
of wireservices is the source of the greatest problem in
interfacing a wire to USENET format news.
Often updates come only minutes apart. In these cases, the
cancel and update are done before the original article is
batched and sent to our clients. This means that you never
even see that original, which is good.
If updates are more widely spaced, you will get both
versions (or several versions) and the cancel message(s).
This means your newsgroups -- particularly the groups for
breaking news -- will be full of gaps formed by deleted
articles. This causes the original rn program to pause, and
can cause worse problems for the nn newsreader. This can be
fixed, however.
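The two cases (update arrives before or after the original has been
batched) can be pictured with the following Python sketch. The names
queue_update, pending and already_sent are invented for illustration,
and the real batching software may well work differently.

```python
def queue_update(pending, already_sent, old_id, new_id):
    """Queue an updated story for the next outgoing batch.

    pending      -- list of Message-IDs waiting to be batched (mutated)
    already_sent -- set of Message-IDs that clients have received
    Returns the Message-IDs for which a cancel must also be sent.
    """
    cancels = []
    if old_id in pending:
        # Update came quickly: drop the original before it ever leaves.
        pending.remove(old_id)
    elif old_id in already_sent:
        # Clients already hold the original: a cancel goes out, and
        # their newsgroup gets a gap where the old article was.
        cancels.append(old_id)
    pending.append(new_id)
    return cancels
```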
The worst question is how to present the updates to the
reader. This system works well for newspapers, for which it
was designed. They are only issued once a day, so readers
only get the story that was current at press time.
On ClariNet, however, if you read an article soon after its
release, and then come back to read again a few hours later,
you may well see the same article presented again. You
aren't seeing the same article, of course; you're seeing an
update. It is up to you to decide if you wish to read the
update for the latest details, or skip it.
Fortunately most updates have a Note: line indicating what
has changed in the article -- but only since the last
update. If several updates have been sent out since you
last read news, this may not tell you enough.
It is a dilemma. Either we present the subscriber with
redundant news that most readers will elect to skip, or we
keep potentially important updates from eager readers. We
have decided to do the former. The use of Newsclip, and
eventually fancier reading tools, can deal with this problem
in a more suitable fashion.
"Other Duplicates"
The update system isn't perfect, because the input from the
wire isn't perfect. Reporters sometimes forget to put
updating flags on stories, for example. Our software is
keyed to look for changes in the headline or byline on a
story. A changed headline more than a few hours after the
original story is treated as a new story by us. This works
about 95% of the time. Sometimes, however, you will see a
duplicated story appear under two headlines. We try to
correct these by hand.
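A simplified version of the headline test might look like this in
Python. The three hour threshold is only an assumed stand-in for "a
few hours", the function name is_new_story is hypothetical, and the
byline check mentioned above is left out to keep the sketch short.

```python
from datetime import datetime, timedelta

# Assumed value; the text says only "more than a few hours".
NEW_STORY_GAP = timedelta(hours=3)

def is_new_story(old_headline, old_time, new_headline, new_time,
                 gap=NEW_STORY_GAP):
    """Treat a late arrival with a changed headline as a new story."""
    changed = old_headline.strip().lower() != new_headline.strip().lower()
    return changed and (new_time - old_time) > gap
```

A story that is really an update but arrives late under a rewritten
headline is exactly the case this rule gets wrong, which is where the
duplicates come from.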
Another common source of duplicates is changed slugwords.
Sometimes an update comes to correct a mistyped or incorrect
slugword. As no information is provided as to what the old
slugword was, we can't arrange to cancel the story being
updated. A duplicate ensues.
The final major source of apparent duplicates comes from the
old concept of a wireservice being split into multiple
wires. One hears talk of the "news wire," the "sports wire"
and the "financial wire." In the old days, each wire went
to a different department in the newspaper. Today it's all
the same physical channel, processed by a computer.
If a story breaks that belongs in more than one category, it
may be sent out twice, with two entirely different
slugwords, and two different ANPA category codes. For
example, Pete Rose's expulsion from baseball was both a
sports story and a general news story.
"Standing Stories"
The wires put out a large variety of standing stories. These
are regular features, all with the same slugword, that
appear at some particular interval, such as every day or
every week.
A list of most of the major standing stories can be found in
a subsequent file.
"Wireservice Errors"
As noted, the wireservice coding schemes are particularly
prone to error. We have trained our software to catch many
typical errors, but the wires have little in the way of
formal specification for what they do put out, and they
don't always follow what formal rules they do have.
Thus you can expect some errors to reach you, particularly
after business hours, or in the lower importance groups
which don't receive full time scrutiny.
At first, we at ClariNet found these errors quite annoying.
One realizes, however, that with thousands of stories to put
out, even the best staff will make a few errors each day. By
and large, they do not interfere in any significant way with
your effort to find the news you want to read, and as such,
they can simply be ignored.
The most annoying are the coding errors, particularly those
from coding typos. You will sometimes see a story in a
group that has nothing to do with the topic of that group.
For example, a college football story, which a reporter
would code as sfc (Sports-Football-College) may get entered
as bfc (Business-manuFacturing-Computers) and thus posted to
our very popular computer group. Until we can convince UPI
reporters to adopt a new coding scheme, such things are
unfortunately possible.
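To see why a single keystroke is enough to misfile a story, consider
this small Python sketch. The codes sfc and bfc are the ones quoted
above; the newsgroup names are placeholders rather than real ClariNet
groups, and the lookup table and function names are hypothetical.

```python
# Hypothetical code table. The two codes are quoted from the text;
# the group names are placeholders, not real newsgroups.
CODE_TO_GROUP = {
    "sfc": "example.sports.football.college",
    "bfc": "example.biz.computers",
}

def newsgroups_for(codes, default="example.misc"):
    """Map reporter-keyed codes to a sorted, de-duplicated group list."""
    groups = {CODE_TO_GROUP.get(code.lower(), default) for code in codes}
    return sorted(groups)

def one_keystroke_apart(a, b):
    """True if two same-length codes differ in exactly one character."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1
```

Both sfc and bfc are valid entries in the table, so the software has
no way to tell that the second one was a slip of the finger.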
"Local/Regional Stories"
A great deal of a wireservice's output is regional news,
collected for newspaper clients in various U.S. states.
Now, ClariNet releases many of these stories in the
clari.local hierarchy. We have local hierarchies for 30
different U.S. and Canadian regions, in addition to our
international and national news.
Local stories of national importance are cross-posted
between local and national newsgroups.
In certain national groups, we do publish regional stories.
For example, the computer group, as well as most of the
other technical groups, contains regional stories. While
this sometimes results in the odd truly local computer
story ("Computer demo day at local University"), most of the
time it is worth it. Our editors delete stories of the
"demo day" form after the fact.
"Broadcast News"
ClariNet also buys some wireservice news meant for radio
stations. These are used to provide our hourly news
summaries (clari.news.cast and clari.news.headlines) along
with the various local news summaries in the clari.local
hierarchy.
Radio station wires contain shorter stories, and the stories
have no headlines. They are generally a bit sloppier, as
the reporters do not expect them to see print. In addition,
they contain phonetic spellings of unusual names, so that
radio announcers will read things correctly.
"Canadian Broadcast news"
To serve Canadian clients, as well as expatriate Canadians
around the world, ClariNet also offers Canadian news. UPI,
as a U.S. wire, offers very little coverage of Canada. This
is normal for U.S. media. The group clari.news.canada
contains the limited coverage that comes along the main wire
-- only truly major stories and financial news.
The clari.canada hierarchy provides a feed of a Canadian
broadcast wire (Standard Broadcast Wire, or SBW) to which we
have arranged access. All the problems of radio wires
described above apply.
The best group to read for those outside of Canada is
probably clari.canada.briefs which provides regularly
updated summaries of major Canadian stories. The group
clari.canada.newscast provides an hourly newscast outside
of business hours. It covers U.S. and world events as well
as Canadian news, so non-Canadian readers may wish to read
it for late night updates.
Canadian regional summaries (still from SBW) appear in the
clari.local hierarchy.
"Newsbytes"
Newsbytes articles are not as well classified as UPI
articles, but there is still some useful information. It is
put on the Keywords: line.
The most important keyword that appears on each line takes
the form Bureau-xxx where "xxx" is a three letter code for
the location of the bureau. You can use the presence of
these codes to track or filter stories from certain regions.
For example, filtering out Bureau-AUS will eliminate
Australian stories.
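A filter of this kind could be sketched in Python as follows. The
Bureau-xxx form and the AUS code come from the text; the assumption
that keywords are separated by commas or spaces, and the function
names bureau_codes and keep_story, are illustrative guesses rather
than a description of the actual header layout.

```python
def bureau_codes(keywords_line):
    """Return the set of bureau codes found in a Keywords: header value."""
    codes = set()
    # Assumption: keywords are separated by commas and/or spaces.
    for word in keywords_line.replace(",", " ").split():
        prefix, sep, code = word.partition("-")
        if sep and prefix.lower() == "bureau" and len(code) == 3:
            codes.add(code.upper())
    return codes

def keep_story(keywords_line, blocked=frozenset({"AUS"})):
    """Return False for stories filed by any blocked bureau."""
    return not (bureau_codes(keywords_line) & set(blocked))
```

Passing a different set as blocked, or inverting the test, turns the
same code into a tracker for a region instead of a filter.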
(International stories that are more likely to be of
regional interest are also likely to be coded with a country
prefix in the subject line, so you can use that in a filter
as well.)
Other keywords include things like exclusive, review and
correction, but it is less likely that you would filter on
these.
Newsbytes headlines arrive at ClariNet in upper case. Our
software converts them to a more readable mixed case.
Naturally such software can't be perfect, so the odd error
will occur, but this is surprisingly rare.
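A toy version of such a converter, in Python, shows both how it works
and why it cannot be perfect. The acronym list and the list of small
words are assumptions chosen for the example, not the tables our
software uses, and the function name mixed_case is hypothetical.

```python
KEEP_UPPER = {"IBM", "AT&T", "NASA"}   # assumed acronym list
SMALL_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to"}

def mixed_case(headline):
    """Convert an all-caps headline to mixed case, word by word."""
    out = []
    for i, word in enumerate(headline.split()):
        if word in KEEP_UPPER:
            out.append(word)                  # known acronym: leave alone
        elif i > 0 and word.lower() in SMALL_WORDS:
            out.append(word.lower())          # small word, not first
        else:
            out.append(word.capitalize())
    return " ".join(out)
```

Any acronym missing from the list is treated as an ordinary word, so
"US SALES RISE" comes out as "Us Sales Rise". That is the kind of odd
error referred to above.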
Newsbytes also tags important stories. These are
crossposted to the clari.nb.top newsgroup.
"Features"
Feature articles (such as the Dave Barry column) come in a
fashion similar to UPI material, but they will have no
keywords or location coding. This is not normally a
problem, as you usually will read every item in a feature
group.