home *** CD-ROM | disk | FTP | other *** search
- Subject: v16i088: List identifiers and declarations for C sources
- Newsgroups: comp.sources.unix,comp.lang.c
- Approved: rsalz@uunet.UU.NET
-
- Submitted-by: arizona!rupley!local (John Rupley)
- Posting-number: Volume 16, Issue 88
- Archive-name: identlist
-
- TITLE: cdeclist, identlist - list external declarations
- for C source files; list identifiers
-
- SUMMARY: The attached Lex source files are for filters that
- generate, for a C source file:
- (1) a list of external declarations (functions, arrays,
- variables, structures);
- (2) a file of identifiers suitable for inverted indexing,
- making a word list, etc.
- Run under SysV or BSD. See README for test instructions:
-
-
- Hope you find them useful,
-
- John Rupley
- uucp: ..{cmcl2 | hao!ncar!noao}!arizona!rupley!local
- internet: rupley!local@megaron.arizona.edu
-
- -------------------------------------------------------------------------
- #!/bin/sh
- # to extract, remove the header and type "sh filename"
- if `test ! -s ./README`
- then
- echo "writing ./README"
- cat > ./README << '\Rogue\Monster\'
- README - Sat Aug 20 23:22:04 MST 1988
-
- The attached Lex source files are for filters that generate,
- for a C source file:
- (1) a list of external declarations (functions, arrays,
- variables, structures);
- each declaration on one line, for manipulation
- by awk, etc.;
- initializers replaced by a /*comment*/;
- function definitions with parameter declarations and
- a dummy statement with appropriate return:
- { [return [(n)];] }
- preprocessing with cpp replaces defined names and executes
- compilation conditionals;
- (2) a file of identifiers suitable for inverted indexing, making
- a word list, etc.
-
- Use of the filters is explained by example:
- (1) mk_cdeclist and the test targets in the Makefile show the
- generation of a declaration list;
- (2) mk_indentlist shows the generation of a list of identifiers.
-
- The make file works under csh/BSD and ksh/SYSV.
- To test:
- copy a C source file and any non-system #include dependencies into
- the directory with the unshared files from cdeclist.shar
- run:
- make TESTC="C_src_file" testall
- you should get a file LLLLfilename, which is the declaration list,
- some intermediate files LLfilename and temp?, which shou
- what is going on at each stage, and a file ZZZZfilename,
- which is the list of C keywords and identifiers,
-
-
- The cdeclist filters, although simple, should parse most styles of coding.
- Adjustment may be needed for the new ANSI standard. The output
- is suitable for making a lint library.
-
- The identlist filters are offered without much being claimed for them.
- I use the list as a guide and aid, without assuming it absolutely correct.
-
- The shell scripts mk_* were written under ksh on a SYSV machine, and they
- need modification to run under csh or BSD.
-
- John Rupley
- uucp: ..{cmcl2 | hao!ncar!noao}!arizona!rupley!local
- internet: rupley!local@megaron.arizona.edu
- (H) 30 Calle Belleza, Tucson AZ 85716 - (602) 325-4533
- (O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929
-
- +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
- MANIFEST: (for cdeclist.shr1)
-
- README
- Makefile
- cdeclist.1 man page
-
- uncomment.l |
- cdeclist1.l |
- cdeclist2.l | shell wrapper and filters generating declaration list
- cdeclist3.l |
- cdeclist4.l |
- mk_cdeclist |
-
- identlist.l | shell wrapper and filters generating file of identifiers
- identlist1.l |
- mk_identlist |
-
- \Rogue\Monster\
- else
- echo "will not over write ./README"
- fi
- if [ `wc -c ./README | awk '{printf $1}'` -ne 2358 ]
- then
- echo `wc -c ./README | awk '{print "Got " $1 ", Expected " 2358}'`
- fi
- if `test ! -s ./Makefile`
- then
- echo "writing ./Makefile"
- cat > ./Makefile << '\Rogue\Monster\'
- #MAKEFILE -
- #5 filters for extracting a list of declarations from a C source file;
- # delete: comments;
- # (uncomment.l)
- # filter with cpp: replace #defines; execute #ifdef control statements;
- # (cdeclist1.l pre-processes, cdeclist2.l post-processes)
- # delete: function {bodies}; initializations (= ....;);
- # (cdeclist3.l)
- # reformat: to one-line declarations; and a bit more;
- # (cdeclist4.l)
- #output should be suitable for:
- # making /*LINTLIBRARY*/, massaging with grep, awk..., etc;
- #all filters read stdin, write stdout;
- #probably need some adaptation for new ANSI standard;
- #
- #also 2 filters (identlist and identlist1) that prepare a C source
- #file for making a word list or index of identifiers; please don't
- #flame me if the word list misses a few identifiers or includes a few
- #non-existent ones -- I make no great claim for it -- and it assumes,
- #I am sure, an idiosyncratic style (mine).
- #
- #John Rupley
- # uucp: ..{cmcl2 | hao!ncar!noao}!arizona!rupley!local
- # internet: rupley!local@megaron.arizona.edu
- # (H) 30 Calle Belleza, Tucson AZ 85716 - (602) 325-4533
- # (O) Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721 - (602) 621-3929
-
- OPT=
-
- dummy:
- @echo please supply a target
-
- all: uncomment cdeclist1 cdeclist2 cdeclist3 cdeclist4 identlist identlist1
-
- #remove comments
- uncomment: uncomment.l
- lex -v uncomment.l
- $(CC) lex.yy.c $(CFLAGS) -ll $(LDFLAGS) -o uncomment
-
-
- #remove comments and quoted strings etc
- identlist: identlist.l
- lex -v identlist.l
- $(CC) lex.yy.c $(CFLAGS) -ll $(LDFLAGS) -o identlist
-
- #remove upper-case words and numbers
- identlist1: identlist1.l
- lex -v identlist1.l
- $(CC) lex.yy.c $(CFLAGS) -ll $(LDFLAGS) -o identlist1
-
- #prepare C source file for cpp processing
- cdeclist1: cdeclist1.l
- lex -v cdeclist1.l
- $(CC) lex.yy.c $(CFLAGS) -ll $(LDFLAGS) -o cdeclist1
-
- #remove additions associated with cpp processing
- cdeclist2: cdeclist2.l
- lex -v cdeclist2.l
- $(CC) lex.yy.c $(CFLAGS) -ll $(LDFLAGS) -o cdeclist2
-
- #remove function {bodies}, put in appropriate { return (n); }
- cdeclist3: cdeclist3.l
- lex -v cdeclist3.l
- $(CC) lex.yy.c $(CFLAGS) -ll $(LDFLAGS) -o cdeclist3
-
- #one-line declarations;remove initializatons;beautify a little
- cdeclist4: cdeclist4.l
- lex -v cdeclist4.l
- $(CC) lex.yy.c $(OPT) $(CFLAGS) -ll $(LDFLAGS) -o cdeclist4
-
- SHARLIST= \
- README\
- Makefile\
- cdeclist.1\
- uncomment.l\
- cdeclist1.l\
- cdeclist2.l\
- cdeclist3.l\
- cdeclist4.l\
- mk_cdeclist\
- identlist.l\
- identlist1.l\
- mk_identlist
-
- mkshar:
- shar -f cdeclist -c $(SHARLIST)
-
- TESTC="please define TESTC=C_src_file on make line"
- #HDIR=hard_wired_path_for_headers
- #CDIR=hard_wired_path_for_sources
- HDIR="."
- CDIR="."
-
- #test making a declaration list
- testc: all
- cat $(CDIR)/$(TESTC)|uncomment|cdeclist1 >temp1
- /lib/cpp -P -C -I$(HDIR) temp1 >temp2
- cat temp2|cdeclist2 >temp3
- echo /*$(TESTC)*/ >LL$(TESTC)
- echo >>LL$(TESTC)
- cat temp3|cdeclist3 >>LL$(TESTC)
-
- test2c: cdeclist2
- cat temp2|cdeclist2 >temp3
-
- test3c: cdeclist3
- echo /*$(TESTC)*/ >LL$(TESTC)
- echo >>LL$(TESTC)
- cat temp3|cdeclist3 >>LL$(TESTC)
-
- TESTLL=LL$(TESTC)
-
- testll: cdeclist4
- cat $(TESTLL)|cdeclist4 >LL$(TESTLL)
-
-
- #test making a word list
- testw: identlist identlist1
- cat $(TESTC)|identlist|identlist1 |\
- tr -s " " "\012\012"|sort|uniq >ZZZZ$(TESTC)
-
-
- #careful!
- testlint:
- lint -uvx -Ml -otemp LL$(TESTLL)
- cp llib-ltemp.ln /usr/lib/large
- lint -uvx -Ml -ltemp $(TESTC)
-
- testall: testc testll testw #testlint
- \Rogue\Monster\
- else
- echo "will not over write ./Makefile"
- fi
- if [ `wc -c ./Makefile | awk '{printf $1}'` -ne 3409 ]
- then
- echo `wc -c ./Makefile | awk '{print "Got " $1 ", Expected " 3409}'`
- fi
- if `test ! -s ./cdeclist.1`
- then
- echo "writing ./cdeclist.1"
- cat > ./cdeclist.1 << '\Rogue\Monster\'
- .TH CDECLIST 1
- .SH NAME
- mk_cdeclist, mk_identlist \- list external declarations
- for C source files; list identifiers
- .SH SYNOPSIS
- .B mk_cdeclist
- filelist
- .br
- .B mk_identlist
- filelist
- .SH DESCRIPTION
- .B Mk_cdeclist
- is a shell script that links a set of filters and the C preprocessor,
- .B cpp,
- to convert C source files
- into a list of external declarations (functions, arrays,
- variables, structures).
- Each declaration is on one line, to simplify subsequent manipulation
- by awk, etc.
- Initializers are replaced by a comment, /*INITIALIZED*/.
- Function definitions include parameter declarations and
- a dummy statement with appropriate return:
- .in +10
- { [return [(n)];] }
- .in
- The C preprocessor is used to replace defined names and to execute
- compilation conditionals;
- declarations introduced by the preprocessing are removed, and
- include directives are restored.
- .PP
- The output of
- .B mk_cdeclist
- can be compiled into a lint library.
- .PP
- .B Mk_identlist
- is a shell script with filters for converting C source files
- into a file of identifiers suitable for inverted indexing, making
- a word list, etc.
- .PP
- .SH FILES
- The filters for both
- .B mk_cdeclist
- and
- .B mk_identlist
- are from Lex source files:
- .in +5
- uncomment.l, cdeclist[1-4].l and identlist[ 1].l.
- .in
- The executables are correspondingly named.
- .br
- .B Cpp
- is used in
- .B mk_cdeclist.
- .SH AUTHOR
- John Rupley
- .br
- uucp: ..{cmcl2 | hao!ncar!noao}!arizona!rupley!local
- .br
- internet: rupley!local@megaron.arizona.edu
- .br
- Dept. Biochemistry, Univ. Arizona, Tucson AZ 85721
- .SH BUGS
- The cdeclist filters, although simple, should parse most styles of coding.
- Adjustment may be needed for the new ANSI standard.
- If there is a problem, try filtering first with
- .B cb
- or
- .B indent.
- .sp
- The identlist filters are offered without much being claimed for them.
- Use the list as a guide and aid, without assuming it absolutely correct.
- .sp
- The shell scripts mk_* were written under ksh on a SYSV machine, and they
- need modification to run under csh or BSD.
- \Rogue\Monster\
- else
- echo "will not over write ./cdeclist.1"
- fi
- if [ `wc -c ./cdeclist.1 | awk '{printf $1}'` -ne 1998 ]
- then
- echo `wc -c ./cdeclist.1 | awk '{print "Got " $1 ", Expected " 1998}'`
- fi
- if `test ! -s ./uncomment.l`
- then
- echo "writing ./uncomment.l"
- cat > ./uncomment.l << '\Rogue\Monster\'
- %{
- /*UNCOMMENT- based on usenet posting by: */
- /* Chris Thewalt; thewalt@ritz.cive.cmu.edu */
- %}
- STRING \"([^"\n]|\\\")*\"
- COMMENTBODY ([^*\n]|"*"+[^*/\n])*
- COMMENTEND ([^*\n]|"*"+[^*/\n])*"*"*"*/"
- %START COMMENT
- %%
- <COMMENT>{COMMENTBODY} ;
- <COMMENT>{COMMENTEND} BEGIN 0;
- <COMMENT>.|\n ;
- "/*" BEGIN COMMENT;
- {STRING} ECHO;
- .|\n ECHO;
-
-
- \Rogue\Monster\
- else
- echo "will not over write ./uncomment.l"
- fi
- if [ `wc -c ./uncomment.l | awk '{printf $1}'` -ne 349 ]
- then
- echo `wc -c ./uncomment.l | awk '{print "Got " $1 ", Expected " 349}'`
- fi
- if `test ! -s ./cdeclist1.l`
- then
- echo "writing ./cdeclist1.l"
- cat > ./cdeclist1.l << '\Rogue\Monster\'
- %{
- /*-CDECLIST1: prepare for cpp execution of #ifdefs, etc.; */
- /*i.e., setup to restore #includes & remove code added by cpp */
- %}
- %%
- ^\#[ \t]*include.*$ {printf("/*%s*/\n", yytext);
- printf("/*DINGDONGDELL*/\n");
- printf("%s\n", yytext);
- printf("/*DELLDONGDING*/\n");}
- .|\n ECHO;
-
-
- \Rogue\Monster\
- else
- echo "will not over write ./cdeclist1.l"
- fi
- if [ `wc -c ./cdeclist1.l | awk '{printf $1}'` -ne 289 ]
- then
- echo `wc -c ./cdeclist1.l | awk '{print "Got " $1 ", Expected " 289}'`
- fi
- if `test ! -s ./cdeclist2.l`
- then
- echo "writing ./cdeclist2.l"
- cat > ./cdeclist2.l << '\Rogue\Monster\'
- %{
- /*-CDECLIST2: remove cpp-included code, restore #include's */
- %}
- %START DING
- %%
- ^"/*DINGDONGDELL*/"$ BEGIN DING;
- ^"/*DELLDONGDING*/"$ BEGIN 0;
- <DING>^"/*#"[^*]*"*/"$ ;
- <DING>.|\n ;
- ^"/*#"[^*]*"*/"$ {yytext[yyleng-2] = 0;
- printf("%s", &yytext[2]);}
- ^[ \t]*\n ;
- .|\n ECHO;
-
-
- \Rogue\Monster\
- else
- echo "will not over write ./cdeclist2.l"
- fi
- if [ `wc -c ./cdeclist2.l | awk '{printf $1}'` -ne 291 ]
- then
- echo `wc -c ./cdeclist2.l | awk '{print "Got " $1 ", Expected " 291}'`
- fi
- if `test ! -s ./cdeclist3.l`
- then
- echo "writing ./cdeclist3.l"
- cat > ./cdeclist3.l << '\Rogue\Monster\'
- %{
- /*-CDECLIST3: for each function, remove the {function body}; */
- /* (look out for {}'s escaped or within " or ' pairs) */
- %}
- int curly, retval;
- WLF ([ \t\n\r\f]*)
- FUNCSTRT (\){WLF}\{|\;{WLF}\{)
- SKIPALLQUOTED (\"([^"\n]|\\\")*\"|\'.\'|\\.)
- %START CURLY
- %%
- <CURLY>\{ curly++;
- <CURLY>\} {if (--curly == 0) {
- if (retval > 1)
- printf(" return (%d); }", retval);
- else if (retval)
- printf(" return; }");
- else
- printf("}");
- BEGIN 0;
- }}
- <CURLY>{FUNCSTRT} curly++;
- <CURLY>{SKIPALLQUOTED}|.|\n ;
- <CURLY>"return"{WLF}\; retval |= 1;
- <CURLY>"return"{WLF}([^;]|{SKIPALLQUOTED})*\; retval |= 2;
- {FUNCSTRT} {curly=1;retval=0;ECHO;BEGIN CURLY;}
- ^[ \t]*\n ;
- .|\n ECHO;
-
-
- \Rogue\Monster\
- else
- echo "will not over write ./cdeclist3.l"
- fi
- if [ `wc -c ./cdeclist3.l | awk '{printf $1}'` -ne 720 ]
- then
- echo `wc -c ./cdeclist3.l | awk '{print "Got " $1 ", Expected " 720}'`
- fi
- if `test ! -s ./cdeclist4.l`
- then
- echo "writing ./cdeclist4.l"
- cat > ./cdeclist4.l << '\Rogue\Monster\'
- %{
- /*-CDECLIST4: process output of cdeclist3:
- delete initialization expressions;
- reformat for one-line declarations;
- some beautfying -- wise to process original source with cb|indent;
- parsing in DECLST is hacked; would be better to base it cleanly on std ANSI;
- to delete externs or whatever, try: {WLF}extern[^;]*\;{WLF}
- */
- %}
- int curly;
- W ([ \t]*)
- WLF ([ \t\n\f\r]*)
- LET [_a-zA-Z]
- DIGIT [0-9+-/*]
- DIGLET [_a-zA-Z0-9]
- NAME ([*]*{LET}{DIGLET}*)
- ARRAY (\[{DIGIT}*\])
- DECL ([;,*]|{WLF}|{NAME}|{ARRAY})
- FUNCPTR (\({DECL}*\)\({DECL}*\))
- DECLST ({DECL}|{FUNCPTR})*
- FINDSTRUCT (struct|union|enum){WLF}{NAME}?{WLF}\{
- ENDSTRUCT {WLF}\}{WLF}
- FINDFUNC \){DECLST}\{[ ]?(return[^}\n]*)?\}
- ENDFUNC {WLF}\{[ ]?(return[^}\n]*)?\}
- FINDINIT {WLF}\={WLF}
- ENDINIT1 {WLF}\;{WLF}
- ENDINIT2 {WLF}\,{WLF}
- SKIPALLQUOTED (\"([^"\n]|\\\")*\"|\'.\'|\\.)
- %START NORM DELETE DECL
- %{
- #include <ctype.h>
- main()
- {
- BEGIN NORM;
- yylex();
- return 0;
- }
- %}
- %%
- <NORM>"/*"[^\n]*"*/" print_skip(yytext);
- <NORM>"#"[^\n]*$ print_skip(yytext);
- <NORM>{WLF}{FINDINIT} {BEGIN 0;BEGIN DELETE;}
- <NORM>{FINDFUNC}|{FINDSTRUCT} {curly=0;BEGIN 0;BEGIN DECL;REJECT;}
- <DELETE>{SKIPALLQUOTED}|[^{},;] ;
- <DELETE>\{ curly++;
- <DELETE>\} curly--;
- <DELETE>{ENDINIT1} {if (curly == 0) {
- printf("\;\t/*INITIALIZED*/\n");
- ;BEGIN 0; BEGIN NORM;}}
- <DELETE>{ENDINIT2} {if (curly == 0) {
- printf(" /*INITIALIZED*/, ");
- ;BEGIN 0; BEGIN NORM;}}
- <DECL>{ENDSTRUCT} {printf("} ");
- if (--curly == 0) {
- BEGIN 0;BEGIN NORM;
- }}
- <DECL>{ENDFUNC} {printf(" ");print_skip(yytext);
- BEGIN 0;BEGIN NORM;}
- <DECL>\{ {ECHO;curly++;}
- <DECL>{WLF}\){WLF}/{ENDFUNC} printf(")");
- <DECL>{WLF}\;{WLF}/{ENDFUNC} printf(";");
- <DECL>{WLF}\;{WLF} printf("; ");
- <NORM>{WLF}\;{WLF} printf(";\n");
- <NORM,DECL>{WLF}\,{WLF} printf(", ");
- <NORM,DECL>{WLF}/[/#] ;
- <NORM,DECL>{WLF} printf(" ");
- <NORM,DECL>. ECHO;
- %%
- print_skip(s) char * s;
- {
- int c;
- while (isspace(*s))
- s++;
- printf("%s\n", s);
- while (isspace(c=input()))
- ;
- unput(c);
- }
-
-
-
-
- \Rogue\Monster\
- else
- echo "will not over write ./cdeclist4.l"
- fi
- if [ `wc -c ./cdeclist4.l | awk '{printf $1}'` -ne 2012 ]
- then
- echo `wc -c ./cdeclist4.l | awk '{print "Got " $1 ", Expected " 2012}'`
- fi
- if `test ! -s ./mk_cdeclist`
- then
- echo "writing ./mk_cdeclist"
- cat > ./mk_cdeclist << '\Rogue\Monster\'
- # MK_CDECLIST
-
- DEFAULTDIR="."
- CDIR=`dirname $1`
- if [ ${CDIR} = "." ]
- then
- CDIR=${DEFAULTDIR}
- fi
- HDIR=${CDIR}
- CDECLIST=""
- #option: if define LLOUT="Sall.1", then output concatenated onto Sall.1;
- #else get individual output files with prefix "LL";
- LLOUT="Sall.1"
- #LLOUT="dumdumdum"
-
- for i in $*
- do
- FILE=`basename ${i}`
- echo processing file: $i >&2
- if [ ${LLOUT} != "Sall.1" ]
- then
- LLOUT=LL${FILE}
- CDECLIST="${CDECLIST} ${LLOUT}"
- fi
- echo "\n\n/*********************************************/\n" >>${LLOUT}
- echo /*${FILE}*/ >>${LLOUT}
- echo >>${LLOUT}
- cat ${CDIR}/${FILE}|uncomment|cdeclist1|/lib/cpp -P -C -I${HDIR}|
- cdeclist2|cdeclist3|cdeclist4 >>${LLOUT}
- done
-
- #temporary++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
- exit
-
- #NOTE: what follows is an example of manipulation of the cdeclist output.
-
- #the following cats a list of separate "LL" files onto Sall.1
- echo
- echo +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
- echo combining files: ${CDECLIST}
- >Sall.1
- for i in ${CDECLIST}
- do
-
- cat ${i} >>Sall.1
- done
-
- #the following extracts a list of function definitions _only_;
- #(no static or external declarations; no definitions of arrays/vars/structs);
- #collect comment header lines, functions, and struct declarations;
- #get rid of register specs;
- #strip any variable declarations trailing struct declarations;
- #change return (n) -> return (0), so ok for return of pointers;
- #insert VARARGS where needed;
- #to include structure definitions, add to 1st egrep: |struct[ ]+[^;({]+\{
- #remove several functions with "static" scope -- and that give errors in
- #compilation without struct definitions (same scope);
- #this gives a partial lint library - full library needs array/variables;
-
- cat Sall.1|egrep '\/\*[^L]|\{\}|\ return[ ;]' |
- egrep -v '^static\ |INITIALIZED' |
- egrep -v '\ engr_at[(]|^del_engr[(]|\ outentry[(]|^ini_inv[(]' |
- sed -e "s/register[ ]\([^;, ][^;, ]*[;,]\)/int\ \1/g" \
- -e "s/register[ ]//g" \
- -e "s/\}[^;}]*\;$/\}\ \;/" \
- -e "s/return\ .[1-3]./return\ \(0\)/g" \
- -e "s/^panic/\/\*VARARGS1\*\/\\
- panic/" \
- -e "s/^error/\/\*VARARGS1\*\/\\
- error/" \
- -e "s/^pline/\/\*VARARGS1\*\/\\
- pline/" \
- -e "s/^impossible/\/\*VARARGS1\*\/\\
- impossible/p" >Sall.2
-
- \Rogue\Monster\
- else
- echo "will not over write ./mk_cdeclist"
- fi
- if [ `wc -c ./mk_cdeclist | awk '{printf $1}'` -ne 2230 ]
- then
- echo `wc -c ./mk_cdeclist | awk '{print "Got " $1 ", Expected " 2230}'`
- fi
- if `test ! -s ./identlist.l`
- then
- echo "writing ./identlist.l"
- cat > ./identlist.l << '\Rogue\Monster\'
- %{
- /*IDENTLIST- comments & strings deleted; punc to space; etc*/
- /*(a filter to prep C source for bib INDEXing, making a word list, ...)*/
- /*convert to white space all tokens except C keywords and identifiers;*/
- /*comment recognition based on usenet posting by: */
- /* Chris Thewalt; thewalt@ritz.cive.cmu.edu */
- %}
- W [ \t]
- STRING \"([^"\n]|\\\")*\"
- %START COMMENT
- %%
- <COMMENT>([^*\n]|"*"+[^*/\n])* ;
- <COMMENT>([^*\n]|"*"+[^*/\n])*"*"*"*/" BEGIN 0;
- <COMMENT>.|\n ;
- "/*" BEGIN COMMENT;
- {STRING} ;
- ^#.*$ ;
- [a-zA-Z_0-9]\.[a-zA-Z_0-9] ECHO;
- [a-zA-Z_0-9]\-\>[a-zA-Z_0-9] ECHO;
- \\|\||\+|\)|\(|\*|\&|\^|\%|\$ printf(" ");
- \#|\@|\!|\~|\`|\-|\=|\}|\{ printf(" ");
- \]|\[|\"|\:|\'|\;|\?|\/|\>|\.|\<|\, printf(" ");
- .|\n ECHO;
-
- \Rogue\Monster\
- else
- echo "will not over write ./identlist.l"
- fi
- if [ `wc -c ./identlist.l | awk '{printf $1}'` -ne 735 ]
- then
- echo `wc -c ./identlist.l | awk '{print "Got " $1 ", Expected " 735}'`
- fi
- if `test ! -s ./identlist1.l`
- then
- echo "writing ./identlist1.l"
- cat > ./identlist1.l << '\Rogue\Monster\'
- %{
- /*IDENTLIST1- remove strings that are all uppercase or all numbers*/
- /*(another filter to prep C source for bib INDEXing, making a word list, ...)*/
- /*
- ^[A-Z_0-9]+/{W}+ printf(" ");
- {W}+[A-Z_0-9]+$ printf(" ");
- ^[A-Z_0-9]+$ printf(" ");
- */
- %}
- W [ \t]
- %START COMMENT
- %%
- (^|{W}+)[A-Z_0-9]+/(\n|{W}+) printf(" ");
- .|\n ECHO;
-
- \Rogue\Monster\
- else
- echo "will not over write ./identlist1.l"
- fi
- if [ `wc -c ./identlist1.l | awk '{printf $1}'` -ne 335 ]
- then
- echo `wc -c ./identlist1.l | awk '{print "Got " $1 ", Expected " 335}'`
- fi
- if `test ! -s ./mk_identlist`
- then
- echo "writing ./mk_identlist"
- cat > ./mk_identlist << '\Rogue\Monster\'
- # MK_IDENTLIST
- #prepare C source files for making an inverted index, or whatever, using
- #identlist and identlist1 to: delete quoted strings, comments;
- #convert punctuation to white space; delete numbers and all upper case words;
- #the result, with some stuff flagged for later reversal,
- #should be suitable for making an inverted index of identifiers in each
- #source file.
- #
- #if use "invert" from the "bib" suite of programs, can exclude, as common
- #words, the C keywords and the system library
- #
- #(NOTE: as a test, the output of the loop containing the filters is
- #sent through tr, sort and uniq, to produce a word list)
-
- DEFAULTDIR="."
- CDIR=`dirname $1`
- if [ ${CDIR} = "." ]
- then
- CDIR=${DEFAULTDIR}
- fi
-
- for i in $*
- do
- FILE=`basename ${i}`
- echo processing file: $i >&2
- cat ${CDIR}/${FILE}|identlist|identlist1|
- sed -e '/^[ ]*$/d'
- done |tr -s " " "\012\012"|sort|uniq #produce a word list
-
-
- #temporary++++++++++++++++++++++++++++++++++++++++++++++++++
- exit
-
- #to make an inverted index, replace the above loop by the following:
-
- for i in $*
- do
- FILE=`basename ${i}`
- echo processing file: $i
- cat ${CDIR}/${FILE}|identlist|identlist1|
- sed -e 's/\([a-zA-Z_0-9]\)\.[^ \t]*[ \t]/\1\qsq\ /g' \
- -e 's/\([a-zA-Z_0-9]\)\-\>[^ \t]*[ \t]/\1qsq\ /g' \
- -e 's/\([a-zA-Z0-9]\)[_]\([a-zA-Z0-9]\)/\1quq\2/g' |
- sed -e '/^[ ]*$/d' >${FILE}
- FILELIST=${FILELIST}" "${FILE}
- done
-
-
- #NOTE: if the above changes are made, mk_identlist should be run in
- #an empty (scratch) directory;
- #intermediate files of the same name as the C source files are created;
- #the directory with the C source files can be given as DEFAULTDIR or as
- #the absolute pathname of the first file in the list;
-
- #NOTE: the following is an example of processing to get an inverted index.
-
- invert -ccommon -k5000 -l30 ${FILELIST}
- mv INDEX INDEX.long
- cat INDEX.long|
- sed -e 's/quq/\_/g' -e 's/qsq/STRUCT/g' \
- -e 's/\([^0-9]\)[0-9][0-9]*\/[0-9][0-9]*\([^0-9]\)/\1\2/g' \
- -e 's/\([^0-9]\)[0-9][0-9]*\/[0-9][0-9]*$/\1/' |
- sort -o INDEX
-
- \Rogue\Monster\
- else
- echo "will not over write ./mk_identlist"
- fi
- if [ `wc -c ./mk_identlist | awk '{printf $1}'` -ne 2001 ]
- then
- echo `wc -c ./mk_identlist | awk '{print "Got " $1 ", Expected " 2001}'`
- fi
- echo "Finished archive 1 of 1"
- exit
-
-
-
-