Path: senator-bedfellow.mit.edu!bloom-beacon.mit.edu!xlink.net!howland.reston.ans.net!cs.utexas.edu!uunet!looking!brad
Message-ID: <S615.ad9@clarinet.com>
Date: Sun, 21 Nov 93 5:10:15 EST
Expires: Wed, 22 Dec 93 5:10:15 EST
Newsgroups: clari.net.newusers,news.answers
From: brad@clarinet.com (Brad Templeton)
Reply-To: clarinet@clarinet.com
Followup-to: poster
Approved: brad@clarinet.com
Subject: Decoding ClariNet special article headers (Oct/92)
Lines: 262
Xref: senator-bedfellow.mit.edu clari.net.newusers:123 news.answers:14929
Archive-name: clarinet/headers
ClariNet articles come in the USENET message interchange format. This is
a variant of the ARPA/Internet electronic mail format. Exact details on
that format can be found in documents known as RFC822 (Mail format) and
RFC1036 (USENET format) which are stored for anonymous pickup on UUNET and
a variety of other machines.
ClariNet articles use the standard USENET headers, plus a variety of
special custom ones. Here we explain how we use the standard headers and
the meanings of our extensions.
Standard headers:
From:
The mail address found here will almost always be clarinews@clarinet.com.
The comment, or user's full name, will be the reporter's name. In some
cases, a title like "Science Reporter" or an affiliation will be added.
In some cases, the From: address is an e-mail address that reaches the
news agency. In the case of UPI stories, this is not true, and replies
go simply to us.
Subject:
In most cases, this is a professional reporter/editor's headline for the story.
In some cases, such as standing (regular) stories -- stock reports, weather,
sports statistics, etc. -- a headline is filled in by ClariNet, possibly
including the date of the story.
UPI headlines are in mixed case. Some syndicated feature headlines are in
upper case. Newsbytes headlines come in upper case, but are converted by
ClariNet software to mixed case.
Keywords:
On this line, we translate the reporter's story coding along with our own
keywords. A list of possible regular keywords is available. All keywords
are human-generated by reporters and editors. Unfortunately, the coding
system UPI uses is prone to errors. It's very terse, and a single keystroke
error can create a ridiculous keyword. With thousands of stories moving
every day, this is frequent enough to be annoying, but infrequent enough to
be easily tolerated.
Newsgroups:
Articles are cross posted to a variety of newsgroups based on their coding
and keywords. In addition, certain regular stories are put in special
newsgroups based on their slugword (see below.) In general, a story is
crossposted to up to 5 groups, so that those following a topic get every
story related to that topic. All modern news reading software makes sure
that you never see a crossposted article more than once, no matter how many
groups it appears in.
Date:
The time at which we received the story from the wire, which reaches us
via satellite. On average, the story will not reach you for another two
hours, due to batching, propagation delays and deliberate delays required
by contract.
Message-id:
We form message-ids from the slugword and an encoding of the date and time.
Sometimes a checksum is used when the story arrives without a date and time.
References:
ClariNet messages contain References lines that can be used by thread
following newsreading tools such as trn. References are generated
when a story is an update to an earlier story, and when a story is a
sidebar to an earlier story. We do not list all the messages in a
reference chain -- normally, we will list only the immediate predecessor
of a story, and the root of the story tree. This is done for each level
of sidebar -- though normally sidebars only go one level deep.
If you use a threaded newsreader you will thus see chains of updates
grouped together. Not all updates replace their predecessor, so you can
see several real stories in a chain. For example, if you come in to
clari.sports.baseball after a few days, you might see an entire series
of ball game stories grouped together as one thread. You will also see
related stories on a major topic grouped together.
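As an illustration only, here is a minimal Python sketch of how a reader's own script might group articles into threads from these References lines. It assumes the root of the story tree is listed first on the line, as is conventional for USENET References headers; the text above does not state the order. The message-ids in the usage note are hypothetical.

```python
import re

# A message-id is taken to be anything of the form <...> without spaces.
MSG_ID = re.compile(r"<[^<>\s]+>")


def thread_root(message_id, references=""):
    """Return the message-id treated as the root of this article's thread.

    Assumption: the root of the story tree is the FIRST id on the
    References line. An article with no References is its own root.
    """
    ids = MSG_ID.findall(references or "")
    return ids[0] if ids else message_id


def group_by_thread(articles):
    """Group (message_id, references) pairs by thread root, in input order."""
    threads = {}
    for message_id, references in articles:
        root = thread_root(message_id, references)
        threads.setdefault(root, []).append(message_id)
    return threads
```

For example, group_by_thread([("<a@x>", ""), ("<b@x>", "<a@x>"), ("<c@x>", "<a@x> <b@x>")]) places all three articles under the root "<a@x>". Because only the root and the immediate predecessor are normally listed, a script like this cannot rebuild every intermediate step of a long chain.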
Supersedes:
On some stories, when a story replaces an earlier version, the Supersedes
header is used to specify the message-id of the replaced story. This
doesn't always work, so a cancel message is also issued. In some cases
only the cancel is issued, and we note what was replaced with an
x-supersedes header, which is really just a comment.
ClariNet Special Headers
Slugword:
This is a special story-specific keyword. Every story is assigned a
slugword. If the story is updated, it goes out again with the same slugword.
We use this to cancel the old story before issuing the update, so that only
one version of the story exists on your machine at a given time.
Most slugwords are just simple words. The main story on George Bush, for
example, is usually slugged "bush." There is no formal pattern to this that
you can use, however. It is a safe bet that any story slugged "bush"
would be about him, but if some other bush became news, it might be used
in that context as well.
Sidebars to stories will often use a component slug that links them to the
main story. For example, the Panama invasion was slugged "panama," and a
variety of stories around it were slugged "panama-response," "panama-nuncio"
and so on. Sometimes more levels will appear.
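A rough Python sketch of how a sidebar slug could be tied back to its main story follows. It assumes the hyphen is the separator between levels, which is inferred from the "panama-response" example and is not a documented rule; a simple slug that happens to contain a hyphen would be misread. The function names are hypothetical.

```python
def slug_parts(slugword):
    """Split a slugword into its levels, assuming '-' separates them."""
    return [part for part in slugword.strip().lower().split("-") if part]


def main_slug(slugword):
    """Return the first level of the slug, taken to be the main story."""
    parts = slug_parts(slugword)
    return parts[0] if parts else ""


def is_sidebar_of(slugword, main):
    """True if slugword has extra levels beneath the given main slug."""
    parts = slug_parts(slugword)
    return len(parts) > 1 and parts[0] == main.strip().lower()
```

With this sketch, main_slug("panama-nuncio") gives "panama", and is_sidebar_of("panama", "panama") is false because the main story is not its own sidebar.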
Slugwords can also be used to indicate standing stories -- those that
repeat with some frequency. The daily PEOPLE column is always slugged
"people." You can track a standing story by looking for its slug. A list
of standing stories is available.
Location:
This field provides the location for the story. Sometimes a comma delimited
list of locations is provided. Unfortunately, quite often the reporter does
not code the location of a story, particularly on U.S. domestic news. Most
international news is coded for location.
Possible location codes include country names such as "canada" or
"france" and state names such as "california." Regions and continents
are also coded, and even a few places like New York City.
In general, expect a location only on an International story or a U.S. regional
story.
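Since the field may hold a comma delimited list and is often missing altogether, a script that filters on location has to handle both cases. The following Python sketch shows one way; the function names are hypothetical, and lower-casing the codes is an assumption based on the examples above.

```python
def split_locations(value):
    """Split a Location value into a list of lower-cased location codes."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


def story_is_in(location_header, wanted):
    """True if the wanted location code appears in the Location value.

    location_header may be None or empty when the reporter did not
    code a location, in which case nothing matches.
    """
    return wanted.strip().lower() in split_locations(location_header or "")
```

A story with no Location line simply fails every location test, so a filter built this way will miss uncoded U.S. domestic stories.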
ACategory:
This provides the ANPA story category, a general classification of the
story. There are just over a dozen of these. Our keywords give far more
specific coding, so this field is useful mainly when you want only a broad
classification. The categories are:
usa             General U.S. related news
special         Special section (rarely used)
feature         Feature article
food            Recipes etc. (rarely used)
entertainment
financial
international   Non U.S. stories
commentary      Editorials etc.
lifestyle
weather
regional        Regions of the USA
national        Artificial category, local version of
                national story.
political
scoreboard      Sports score reports
racing          (Not covered by ClariNet)
sports
travel
advisory        (For editors only -- not released)
washington
reserved        (Unknown Category)
natbriefs       Radio National Briefs
briefs          Radio briefs
headlines       Radio headlines
reg-headlines   Radio regional headlines
markets         Radio stock market reports
billboard
television      Radio reports about Television
Most stories are usa, international, financial, sports or entertainment.
Most stories in clari.local groups are regional.
Priority:
This is a general indication of the importance of the story.
Priorities are:
"FLASH",            Once a decade type stories
"BULLETIN",         Top stories of the week
"urgent",           Top breaking stories of the day
"major",            Big non-breaking stories (artificial category)
"regular",          Most stories
"daily",            Lower priority stories
"deferred",         (Never used)
"release-at-will",  Advance material for release any time
"advance",          Material for future release
"weekend",          Material for weekend newspapers
Stories of the "flash," "bulletin" or "urgent" priority are what is known
as "breaking news." Each priority has its own newsgroup so that you can
track the biggest stories directly. We have not yet seen a posting to
clari.news.flash. The last known flash was "space shuttle explodes."
(Flashes are always 3 words, followed up by a bulletin.)
You usually see 2-4 bulletins a week, although there will always be
multiple versions of any bulletin story. You see 2-4 urgent stories per
day as well.
"major" is a priority we created. This is for stories that are, in wire
parlance, "skedded." They have a regular priority but are rated as important
stories by the desk editors. They go into the "top" news groups.
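As a small illustration, a script could pick out breaking news from the Priority line as sketched below in Python. The comparison ignores case because the priorities appear above both as "FLASH" and as "flash"; whether the header itself varies in case is an assumption. The names BREAKING and is_breaking are hypothetical.

```python
# The three priorities described above as "breaking news".
BREAKING = {"flash", "bulletin", "urgent"}


def is_breaking(priority):
    """True if the Priority value is flash, bulletin or urgent."""
    return priority.strip().lower() in BREAKING
```

Note that "major" is deliberately excluded: those stories carry a regular priority and are not breaking news, even though they go into the "top" news groups.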
Format:
The format field is somewhat redundant. It describes what sort of
story this article is. It is most useful on sports stories which come
in a variety of formats. Formats depend on the ACategory. Some formats,
like a "game story," are only possible on a sports story.
advisory                  For editors only -- not sent out
annual                    Annual summary (Sports/financial/some news)
audio advisory            For radio stations
breaking                  Urgent/bulletin/flash
briefs                    Short summaries of major stories
close                     Report at close of market trading
correspondent's advisory  For reporters only -- not sent out
daily                     Lower priority news
daybook                   For reporters only -- not sent out
feature                   Feature stories
game story                Report on a game
glances                   Sports at a glance report
headlines                 Two sentence summaries of major stories
interim                   Report while market is open
linescores                Broken down score reports
market wrapup             Final market report
open                      Market opening report
ratings                   Team rankings and reports
regular                   Most news
scorecard                 List of scores
snap scores               Quick scores for radio
summary                   Summaries (sports/stock market/etc.)
table                     Sports statistics
week-end                  Market reports at end of week
Some stories may have multiple formats, comma delimited. Unfortunately this
is more often than not a coding error.
ANPA:, Codes:, X-takes:
These lines mostly serve as comments, used by us to track how our
software decodes the stories from the non-formalized wire format.
While it is not a supported header, here are the meanings of the fields on
the ANPA line.
ANPA: Wc: 446; Id: a0723; Sel: na--i; Adate: 3-17-1235pes; Ver 2/0; V: sked ld
Wc:     Word count
Id:     Internal wire story ID -- unique number for the day
Sel:    Wire selector code
Adate:  Date story was written
Ver:    Major and minor version numbers for this story.
V:      Version field, sometimes indicates reason for update.
        Many keywords are possible here, which we won't document.
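Since the line is unsupported and may change, the following Python sketch is only an illustration of how the sample line above could be split into its fields. It assumes fields are separated by semicolons and that a name is followed either by a colon or, as with "Ver 2/0" in the sample, by a space. The function name parse_anpa is hypothetical.

```python
def parse_anpa(line):
    """Split an ANPA comment line into a dict of field name -> string value."""
    body = line.strip()
    # Drop the header name itself if it is present.
    if body.lower().startswith("anpa:"):
        body = body[len("anpa:"):]
    fields = {}
    for chunk in body.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        if ":" in chunk:
            # Usual form, e.g. "Wc: 446"
            name, value = chunk.split(":", 1)
        else:
            # Form without a colon, e.g. "Ver 2/0" in the sample line
            name, _, value = chunk.partition(" ")
        fields[name.strip()] = value.strip()
    return fields
```

Applied to the sample line, this gives "446" for Wc, "a0723" for Id, "2/0" for Ver and "sked ld" for V. All values are left as strings, since nothing above guarantees that any field is always numeric.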
Codes:
This comment line contains the original reporter's cryptic coding of the
story. We have translated all this into human readable information above.
It is there for our debugging purposes only. Not a supported header.
X-Takes:
If the story was sent to us in multiple parts (don't ask why), the number of
parts received is listed here. Not a supported header.