Taking the same text as shown in challenge 8, can you (a) compile a dictionary of all the words, and find the commonest, and (b) find out the average sentence length in words?
You can strip out all punctuation, and treat variations of the same word as different (so “night” and “nights” would be distinct, but “night’s” would be the same as “nights” (and indeed “nights’”!)).
‘Twas the Night Before Christmas (II)
Re: ‘Twas the Night Before Christmas (II)
Clarification, please. Normally an exclamation mark would be a sentence delimiter, like a full stop. But when it's quoted, such as "On, Comet! On, Cupid! On, Donner and Blitzen!" is that really to be counted as three sentences? They don't have verbs!
"You can strip out all punctuation"
But how do I tell where sentences begin and end then?
Re: ‘Twas the Night Before Christmas (II)
Hi Richard,
These challenges were meant to be a bit of fun - no prizes, so no arguing about rules! You are welcome to interpret it in whatever way seems sensible to you. The idea of this challenge was to get people to think a little about text processing.
You are right that stripping all the punctuation will make it hard to detect ends of sentences, so it probably wouldn't be sensible for that part! I was really thinking about simplifying the first part, to detect different words and construct a dictionary.
Best wishes,
David
Re: ‘Twas the Night Before Christmas (II)
I don't know if this helps. It isn't the full challenge, just the part that constructs the dictionary.
Once the dictionary file was complete, it would be a simple matter to read the words into an array and sort them (or use the sort.lib facilities if speed is required), which would place duplicate words next to one another. Then go through the list and remove the duplicate words. (I think this is faster than checking the file each time a word is added to see whether it is already present.) However before removing duplicates, count them.
Sentences are usually defined as what lies between two terminal punctuation marks, without regard to grammatical niceties such as the presence of verbs or nouns. Determining sentence length is just a matter of finding the string which ends with a full stop, exclamation mark or question mark, counting the number of spaces and adding 1 (for the final word).
Code: Select all
a$="Twas the night before Christmas' fairies go on the prowl."
F%=OPENOUT(@dir$+"Dictionary.txt")
REPEAT
REM Find words
space%=INSTR(a$," ")
IFspace%>0THEN
w$=LEFT$(a$,space%-1)
a$=MID$(a$,space%+1)
ELSE
w$=a$:a$=""
ENDIF
PRINTw$
REM Remove punctuation etc.
punc$=",.;:!?'"+CHR$34
FORi%=1TOLENpunc$
REM Repeat so that every occurrence is removed, not just the first
REPEAT
punc%=INSTR(w$,MID$(punc$,i%,1))
IFpunc%>0w$=LEFT$(w$,punc%-1)+MID$(w$,punc%+1)
UNTILpunc%=0
NEXT
PRINTw$
REM Lower case
FORi%=1TOLENw$
c%=ASCMID$(w$,i%,1)
IF c%>=65 AND c%<=90 THEN c%+=32:REM Only convert upper-case letters, so hyphens etc. are left alone
MID$(w$,i%,1)=CHR$c%
NEXT
PRINTw$
g%=GET
REM Add to dictionary
REMBPUT#F%,w$+CHR$13
UNTILa$=""
CLOSE#F%
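As a rough illustration of the sentence-length idea described in this post, here is a minimal sketch. It is not code from the thread: the sample text and the names text$, words%, sentences% and inword% are made up for the example. It counts word starts rather than spaces, so that repeated spaces do not inflate the total, and it treats every full stop, exclamation mark or question mark as the end of a sentence, which means an abbreviation such as "St." would be miscounted.
Code: Select all
REM Sketch only: average sentence length (in words) of a short sample text.
REM text$, words%, sentences% and inword% are illustrative names.
text$ = "Now dash away! Dash away! Dash away all! He spoke not a word."
words% = 0 : sentences% = 0 : inword% = FALSE
FOR i% = 1 TO LEN(text$)
  c$ = MID$(text$, i%, 1)
  IF c$ = " " THEN
    REM A space ends the current word
    inword% = FALSE
  ELSE
    REM The first non-space character after a space starts a new word
    IF NOT inword% THEN words% += 1 : inword% = TRUE
    REM Terminal punctuation closes a sentence
    IF INSTR(".!?", c$) THEN sentences% += 1
  ENDIF
NEXT
IF sentences% > 0 PRINT "Average words per sentence: "; words% / sentences%
The sample text has 13 words in 4 sentences, so the sketch should report an average of 3.25.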
Re: ‘Twas the Night Before Christmas (II)
"These challenges were meant to be a bit of fun"
For whom?! I recently tried to tackle them because my health is deteriorating so rapidly that I thought they might act as a form of mental exercise. In practice the majority are impossibly difficult for me (several seem to be unrelated to programming and more about mathematics), and the few that I think I should be able to manage are proving difficult. It has been quite stressful so far, anything but "fun".

"You are welcome to interpret it in whatever way seems sensible to you."
They are not much of a challenge if I can redefine the rules so that 'anything goes'! I assumed that you would at least be checking any submissions for being a correct solution, otherwise what is the point?

"You are right that stripping all the punctuation will make it hard to detect ends of sentences, so it probably wouldn't be sensible for that part! I was really thinking about simplifying the first part, to detect different words and construct a dictionary."
I got as far as creating the dictionary (although I wasn't sure what to do with it, the most common word is 'the' unsurprisingly). But then when I came to the second part I realised that I would have to start from scratch because of having thrown away all the punctuation at an early stage. To all intents and purposes it is a separate challenge, so I'm inclined not to bother with it.
For what it's worth here is what I came up with for the first part:
Code: Select all
INSTALL @lib$ + "stringlib"
INSTALL @lib$ + "sortlib"
Sort%% = FN_sortinit(1, 0)
REM Create and initialise arrays:
DIM pop%(1000), word$(1000)
REM Read the poem into one long string:
poem$ = ""
REPEAT
READ temp$
IF temp$ = "" EXIT REPEAT
poem$ += " " + temp$
UNTIL FALSE
REM Convert to lowercase:
poem$ = FN_lower(poem$)
REM Replace all punctuation with spaces:
FOR char% = &21 TO &3F
REPEAT
I% = INSTR(poem$, CHR$char%)
IF I% MID$(poem$, I%, 1) = " "
UNTIL I% = 0
NEXT
REM Split into words using library routine:
num% = FN_split(poem$, " ", word$())
REM Sort into alphabetical order to aid counting duplicates:
C% = num% : CALL Sort%%, word$(0)
REM Count and eliminate duplicates:
pop%() = 1
FOR I% = 1 TO num%-1
IF word$(I%) = word$(I% - 1) pop%(I%) = pop%(I% - 1) + 1 : word$(I% - 1) = ""
NEXT
REM Find the most common:
C% = num% : CALL Sort%%, pop%(0), word$(0)
REM Report results:
PRINT "The top twenty most common words are:"
prev$ = ""
I% = 0
FOR N% = 1 TO 20
WHILE word$(I%) = prev$ OR word$(I%) = "" I% += 1 : ENDWHILE
PRINT TAB(3) word$(I%) TAB(12) pop%(I%)
prev$ = word$(I%)
NEXT
END
DATA "'Twas the night before Christmas, when all through the house"
DATA "Not a creature was stirring, not even a mouse;"
DATA "The stockings were hung by the chimney with care"
DATA "In hopes that St. Nicholas soon would be there;"
DATA "The children were nestled all snug in their beds,"
DATA "While visions of sugar-plums danced in their heads;"
DATA "And mamma in her kerchief, and I in my cap,"
DATA "Had just settled our brains for a long winter's nap,"
DATA "When out on the lawn there arose such a clatter,"
DATA "I sprang from the bed to see what was the matter."
DATA "Away to the window I flew like a flash,"
DATA "Tore open the shutters and threw up the sash."
DATA "The moon on the breast of the new-fallen snow"
DATA "Gave the lustre of mid-day to objects below,"
DATA "When, what to my wondering eyes should appear,"
DATA "But a miniature sleigh, and eight tiny reindeer,"
DATA "With a little old driver, so lively and quick,"
DATA "I knew in a moment it must be St. Nick."
DATA "More rapid than eagles his coursers they came,"
DATA "And he whistled, and shouted, and called them by name:"
DATA """Now, Dasher! Now, Dancer! Now, Prancer and Vixen!"
DATA "On, Comet! On, Cupid! On, Donner and Blitzen!"
DATA "To the top of the porch! To the top of the wall!"
DATA "Now dash away! Dash away! Dash away all!"""
DATA "As dry leaves that before the wild hurricane fly,"
DATA "When they meet with an obstacle, mount to the sky;"
DATA "So up to the house-top the coursers they flew,"
DATA "With the sleigh full of toys, and St. Nicholas too."
DATA "And then, in a twinkling, I heard on the roof"
DATA "The prancing and pawing of each little hoof."
DATA "As I drew in my head, and was turning around,"
DATA "Down the chimney St. Nicholas came with a bound."
DATA "He was dressed all in fur, from his head to his foot,"
DATA "And his clothes were all tarnished with ashes and soot;"
DATA "A bundle of Toys he had flung on his back,"
DATA "And he looked like a peddler just opening his pack."
DATA "His eyes - how they twinkled! His dimples how merry!"
DATA "His cheeks were like roses, his nose like a cherry!"
DATA "His droll little mouth was drawn up like a bow,"
DATA "And the beard of his chin was as white as the snow;"
DATA "The stump of a pipe he held tight in his teeth,"
DATA "And the smoke it encircled his head like a wreath;"
DATA "He had a broad face and a little round belly,"
DATA "That shook when he laughed, like a bowlful of jelly."
DATA "He was chubby and plump, a right jolly old elf,"
DATA "And I laughed when I saw him, in spite of myself;"
DATA "A wink of his eye and a twist of his head,"
DATA "Soon gave me to know I had nothing to dread;"
DATA "He spoke not a word, but went straight to his work,"
DATA "And filled all the stockings; then turned with a jerk,"
DATA "And laying his finger aside of his nose,"
DATA "And giving a nod, up the chimney he rose;"
DATA "He sprang to his sleigh, to his team gave a whistle,"
DATA "And away they all flew like the down of a thistle."
DATA "But I heard him exclaim, ere he drove out of sight,"
DATA """Happy Christmas to all, and to all a good-night."""
DATA ""
Re: ‘Twas the Night Before Christmas (II)
If I run your code I get this:
Code: Select all
Twas
Twas
twas
Incidentally you shouldn't attempt to write to the @dir$ directory, it is typically not writable by a non-admin. This code will fail on most platforms (perhaps not when run in the IDE, but commonly once 'compiled' to an application bundle):
Code: Select all
F%=OPENOUT(@dir$+"Dictionary.txt")
Re: ‘Twas the Night Before Christmas (II)
If you press any key you get the next word, and so on to the end of the string. The first occurrence shows that you have successfully detected a word, the second that punctuation has been stripped, the third that it is reduced to lower case. If it was a proper program, there would be no PRINTw$, no g%=GET, and the write to file would not be REMed out.
Your remonstrance regarding writing to @dir$ is perfectly correct - if the program is on the C: drive. Typically mine are not, so writing to @dir$ works just fine, provided I have saved it before running. If I haven't done so, then the IDE assumes that the program is in the same directory as BASIC and the fault you mentioned occurs.
Re: ‘Twas the Night Before Christmas (II)
Oh. How was I supposed to guess that?! There was no "press a key" prompt.

"Your remonstrance regarding writing to @dir$ is perfectly correct - if the program is on the C: drive."
It has nothing to do with the 'drive' (on most platforms the concept of 'drive' has no meaning anyway, that is an MS-DOS and Windows-specific thing). You certainly cannot assume that any program you publish at this forum is going to be run in Windows!
@dir$ represents the directory (folder) in which the program itself resides. In MacOS, Android and iOS this is always read-only so attempting to write to it will fail. In Windows and Linux it may be writable or it may not, depending on where the program is installed and the permissions (if it's in the 'official' place in Windows - C:\Program Files (x86) - it will definitely not be writable by a normal user). In the in-browser edition it will always be writable.
So the only safe thing is never to attempt to write to @dir$. If it's a 'temporary' file that you don't need to keep, store it in @tmp$; if it's a 'persistent' file store it in @usr$. Crucially, @usr$ is guaranteed to be a user-specific directory so storing a file there won't overwrite a file of the same name created by another user. That makes it ideal for things like 'saved games'.
The sole purpose of @dir$ is to identify where resource files can be found. Typically these will be resources like graphics sprites, sound-effects and music; they may also include code sub-modules if you use them. For neatness they will often be stored in a sub-directory of @dir$ rather than in @dir$ itself. Those resources will be embedded in a 'compiled' Application Bundle and automatically extracted when the program is first run (which is why you must have admin privileges for that first run).
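To make the advice about @usr$ concrete, a minimal sketch follows. It is illustrative rather than code from the thread: the file name is borrowed from the earlier listing and w$ is just a sample word.
Code: Select all
REM Sketch only: create the dictionary file in the user-specific directory
REM rather than in @dir$. w$ is a sample word for illustration.
w$ = "twas"
F% = OPENOUT(@usr$ + "Dictionary.txt")
REM OPENOUT returns zero if the file could not be created
IF F% = 0 THEN PRINT "Could not create the file" : END
BPUT#F%, w$ : REM write the word as text
CLOSE#F%
For a scratch file that need not be kept, @tmp$ would take the place of @usr$.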
Re: ‘Twas the Night Before Christmas (II)
Hated Moron wrote: ↑Sat 25 Nov 2023, 10:51
Those resources will be embedded in a 'compiled' Application Bundle and automatically extracted when the program is first run (which is why you must have admin privileges for that first run).
In the case of a Windows .exe, that is. On other platforms the 'extraction' of embedded files (if that's even the right word) happens when the bundle is installed, not when it is first run.