Tutorial 039 Funny Names - Extracting Data Neatly from Web-Pages
It's nice to be able to extract a table of data from a web-page
and process it using a BB4W program without having to do lots of tedious
retyping. (Programming is more fun than typing...)
If you just type "funny names" into Google you will find
lots of useful sites listing names such as Adam Zapple
(e.g. try http://website.lineone.net/~gardenworks/names-a-d.htm).
Once you have found such a web-page you can save it as a text file
from the browser's menu:
- Click File
- Click Save as...
- Choose a suitable directory
- Open the "Save as type" drop-down menu (click the arrow that looks like a v)
- Choose the text file option
- Edit the file name if you wish
- Click Save
In this way you might have a text file that looks like this:
We love funny names
A-DE-MN-ZFront Page
Click On Headings For Explanation
Adam Zapple
Al Beback
Al Lejance
Alf Abett
Ali Barster
Amanda Sol de Werk
Amos Skittow
Amy Stake
Andy Tover
Andy Wineriss
Angus Macoatup
Ann Chovie
Ann Jyna
Ann Tenor
etc
Obviously the first three lines of text are superfluous and can
be edited out of the text file by hand and the file resaved.
You will see that there are leading spaces before the names and
also blank lines. If the list were long it would be very tedious to edit
these out by hand. The following program demonstrates how to clean
out these unwanted spaces and blank lines automatically.
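The core of the clean-up can be tried out on a few in-memory strings before any file handling is involved. This is only an illustrative sketch: FNstrip$ and FNhastext are hypothetical helper names, not part of the author's program, and the sample strings are made up for the demonstration.

```bbcbasic
REM Sketch only: FNstrip$ and FNhastext are hypothetical helpers,
REM and the sample strings are invented for illustration.
DIM sample$(3)
sample$(0) = "   Adam Zapple"
sample$(1) = ""
sample$(2) = "     "
sample$(3) = " Al Beback"
FOR n% = 0 TO 3
  clean$ = FNstrip$(sample$(n%))
  IF FNhastext(clean$) THEN PRINT clean$
NEXT n%
END
:
REM Return s$ with any leading spaces removed
DEF FNstrip$(s$)
WHILE LEFT$(s$,1) = " "
  s$ = MID$(s$,2)
ENDWHILE
= s$
:
REM Return TRUE if s$ contains at least one visible character
REM (ASCII codes 33 to 126), otherwise FALSE
DEF FNhastext(s$)
LOCAL i%, code%, found%
found% = FALSE
FOR i% = 1 TO LEN(s$)
  code% = ASC(MID$(s$,i%,1))
  IF code% > 32 AND code% < 127 THEN found% = TRUE
NEXT i%
= found%
```

Run as it stands, this should print only the two names, with the empty string and the all-spaces string skipped.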
Here is the output resulting from processing the above file:
Press <SPACE> to choose a text file whose lines you wish to clean up :
Full Pathname of this text file :
C:\............\Names\ funnynames.txt
Adam Zapple
Al Beback
Al Lejance
Alf Abett
Ali Barster
Amanda Sol de Werk
Amos Skittow
Amy Stake
Andy Tover
Andy Wineriss
Angus Macoatup
Ann Chovie
Ann Jyna
Ann Tenor
Anna Dapter
Anna Kronism
Anna Reksic
Anna Notherthing
Anne Dryer
Anne Kersaway
Anne Tellope
Anne Yewelevent
Annette Kurtain
etc
This technique will be taken further in subsequent
tutorials when preparing randomised lists, DATA lines for programs and in
a novel phonebook program...
Listing :
REM : Removes leading spaces and blank lines
REM : from a text file
REM : Richard Weston, 9th July 2003
MODE 8
VDU14
COLOUR1
PRINT'" Press <SPACE> to choose a text file whose lines you wish to clean up"
G=GET
OFF
:
DIM of% 75, ff% 18, fn% 255
!of% = 76
of%!4 = @hwnd%
of%!12 = ff%
of%!28 = fn%
of%!32 = 256
of%!52 = 6
$ff% = "Text Files"+CHR$0+"*.txt"+CHR$0+CHR$0
:
SYS "GetOpenFileName", of% TO result%
IF result% filename$ = FNnulterm$(fn%)
COLOUR7
PRINT'" Full Pathname of this text file :"
COLOUR2
PRINT'filename$
PRINT'"Press SHIFT to scroll down through the results"
:
fnum=OPENIN filename$
IF fnum=0 THEN PRINT "No ";filename$;" data": END
:
COLOUR7
REPEAT
  line$=""
  REPEAT
    temp=BGET#fnum :REM Read byte
    line$+=CHR$(temp)
  UNTIL temp=10 OR temp=13
  PROCcheckline
  IF printworthy THEN
    PRINT line$
  ENDIF
UNTIL EOF#fnum
CLOSE#fnum
:
PRINT'" Press<SPACE> to go again..."
G=GET
RUN
END
:
DEF FNnulterm$(P%)
LOCAL A$
WHILE ?P% <> 0
  A$ += CHR$?P%
  P% += 1
ENDWHILE
= A$
:
DEF PROCcheckline
LOCAL i,L,char$,asc
:
WHILE LEFT$(line$,1)=" "
  line$=MID$(line$,2) : REM Remove leading spaces
ENDWHILE
:
L=LEN(line$)
printworthy=FALSE
FOR i=1 TO L
  char$=MID$(line$,i,1)
  asc=ASC(char$)
  IF asc>32 AND asc<127 THEN printworthy=TRUE
NEXT i
ENDPROC
Annotated Listing :
REM : Removes leading spaces and blank lines
REM : from a text file
REM : Richard Weston, 9th July 2003
MODE 8
VDU14 *** paged mode on ***
COLOUR1
PRINT'" Press <SPACE> to choose a text file whose lines you wish to clean up"
G=GET
OFF
:
DIM of% 75, ff% 18, fn% 255 *** usual routine for opening a text file ***
!of% = 76
of%!4 = @hwnd%
of%!12 = ff%
of%!28 = fn%
of%!32 = 256
of%!52 = 6
$ff% = "Text Files"+CHR$0+"*.txt"+CHR$0+CHR$0
:
SYS "GetOpenFileName", of% TO result%
IF result% filename$ = FNnulterm$(fn%)
*** end of routine ***
COLOUR7
PRINT'" Full Pathname of this text file :"
COLOUR2
PRINT'filename$
PRINT'"Press SHIFT to scroll down through the results"
:
fnum=OPENIN filename$ *** opens the file for reading ***
IF fnum=0 THEN PRINT "No ";filename$;" data": END
:
COLOUR7
REPEAT *** reads characters from file until end of line is detected at XXXX ***
  line$=""
  REPEAT
    temp=BGET#fnum :REM Read byte
    line$+=CHR$(temp) *** add new character ***
  UNTIL temp=10 OR temp=13 *** here's XXXX ***
  *** 10 (line feed) specifies "move cursor down one line" ***
  *** 13 (carriage return) specifies "move cursor to start of line" ***
  PROCcheckline *** removes leading spaces and decides whether line contains printable characters ***
  IF printworthy THEN
    PRINT line$
  ENDIF
UNTIL EOF#fnum *** repeat until the end of the file is reached ***
CLOSE#fnum *** close up the file ***
:
PRINT'" Press<SPACE> to go again..."
G=GET
RUN
END
:
DEF FNnulterm$(P%) *** needed for open file routine ***
LOCAL A$
WHILE ?P% <> 0
  A$ += CHR$?P%
  P% += 1
ENDWHILE
= A$
:
DEF PROCcheckline
LOCAL i,L,char$,asc *** ensures these variables are not available to the rest of the program ***
:
WHILE LEFT$(line$,1)=" "
  line$=MID$(line$,2) : REM Remove leading spaces
ENDWHILE
:
L=LEN(line$)
printworthy=FALSE
FOR i=1 TO L *** examines each character in the line ***
  char$=MID$(line$,i,1)
  asc=ASC(char$)
  IF asc>32 AND asc<127 THEN printworthy=TRUE *** these ASCII values are for the "visible" characters ***
NEXT i
ENDPROC
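Because the inner loop appends every byte it reads, each line$ still ends with its terminating byte (10 or 13) when it is printed. If the cleaned lines are to be reused rather than just displayed, that trailing byte may need removing as well. A minimal sketch, assuming a hypothetical helper FNchomp$ that is not part of the listing above:

```bbcbasic
REM Sketch only: FNchomp$ is a hypothetical helper, not part of the listing
test$ = "Adam Zapple" + CHR$(13)
PRINT "Length before: "; LEN(test$)
test$ = FNchomp$(test$)
PRINT "Length after: "; LEN(test$)
END
:
REM Return s$ with any trailing CR (13) or LF (10) bytes removed
DEF FNchomp$(s$)
WHILE RIGHT$(s$,1) = CHR$(13) OR RIGHT$(s$,1) = CHR$(10)
  s$ = LEFT$(s$, LEN(s$)-1)
ENDWHILE
= s$
```

The second length printed should be one less than the first. In the main program such a helper could be applied to line$ just before PRINT line$, though that is a suggestion rather than something the original listing does.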