'Twas the Night Before Christmas! (I)

This started out as a series of Advent challenges a couple of Christmases ago, but is now open for anyone to post challenges on any topic!

Post by DDRM »

Determine the letter frequency distribution for this famous verse (text below, so we all use the same version), and give the top 12 results. Ignore all punctuation and other non-alphabetic characters. How does it compare with the classic “etaoinshrdlu”?

Bonus: calculate the root mean square difference of etaoinshrdlu from your result (i.e. if 10 of these letters come in the same places, 1 is 3 places away and 1 is 4 places away, this would be SQR( (10 x 0) + (1 x 9) + (1 x 16) ) = SQR(25) = 5). Yes, I know that example is impossible, but it’s mathematically simple! (Actually, thinking about it, it's not impossible...)

Hints: save the poem as a .txt file. In BBC BASIC you will be able to read it in as a single string (look up GET$# BY and EXT# for some hints!), which is then easy to process. Don’t forget to allow for upper and lower case letters!
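
For anyone who wants a starting point, here is a minimal sketch of the file-reading step only (the counting is still left as the challenge). The filename "poem.txt", its location in @dir$ and the error number 100 are assumptions made for illustration; adjust them to wherever you saved the verse.

```bbcbasic
      REM Read a whole text file into a single string.
      REM "poem.txt" is a hypothetical filename, assumed to sit alongside the program.
      F% = OPENIN(@dir$ + "poem.txt")
      IF F% = 0 ERROR 100, "Could not open poem.txt"
      REM EXT# gives the file length in bytes; GET$# ... BY reads that many bytes:
      poem$ = GET$#F% BY EXT#F%
      CLOSE #F%
      PRINT "Read "; LEN(poem$); " bytes"
```

Note that LEN counts bytes, so if the file is saved as UTF-8 each curly apostrophe occupies more than one byte; that doesn't matter here because only the letters A-Z and a-z are counted.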

’Twas the night before Christmas, when all through the house
Not a creature was stirring, not even a mouse;
The stockings were hung by the chimney with care
In hopes that St. Nicholas soon would be there;

The children were nestled all snug in their beds,
While visions of sugar-plums danced in their heads;
And mamma in her kerchief, and I in my cap,
Had just settled our brains for a long winter’s nap,

When out on the lawn there arose such a clatter,
I sprang from the bed to see what was the matter.
Away to the window I flew like a flash,
Tore open the shutters and threw up the sash.

The moon on the breast of the new-fallen snow
Gave the lustre of mid-day to objects below,
When, what to my wondering eyes should appear,
But a miniature sleigh, and eight tiny reindeer,

With a little old driver, so lively and quick,
I knew in a moment it must be St. Nick.
More rapid than eagles his coursers they came,
And he whistled, and shouted, and called them by name:

"Now, Dasher! Now, Dancer! Now, Prancer and Vixen!
On, Comet! On, Cupid! On, Donner and Blitzen!
To the top of the porch! To the top of the wall!
Now dash away! Dash away! Dash away all!"

As dry leaves that before the wild hurricane fly,
When they meet with an obstacle, mount to the sky;
So up to the house-top the coursers they flew,
With the sleigh full of toys, and St. Nicholas too.

And then, in a twinkling, I heard on the roof
The prancing and pawing of each little hoof.
As I drew in my head, and was turning around,
Down the chimney St. Nicholas came with a bound.

He was dressed all in fur, from his head to his foot,
And his clothes were all tarnished with ashes and soot;
A bundle of Toys he had flung on his back,
And he looked like a peddler just opening his pack.

His eyes - how they twinkled! His dimples how merry!
His cheeks were like roses, his nose like a cherry!
His droll little mouth was drawn up like a bow,
And the beard of his chin was as white as the snow;

The stump of a pipe he held tight in his teeth,
And the smoke it encircled his head like a wreath;
He had a broad face and a little round belly,
That shook when he laughed, like a bowlful of jelly.

He was chubby and plump, a right jolly old elf,
And I laughed when I saw him, in spite of myself;
A wink of his eye and a twist of his head,
Soon gave me to know I had nothing to dread;

He spoke not a word, but went straight to his work,
And filled all the stockings; then turned with a jerk,
And laying his finger aside of his nose,
And giving a nod, up the chimney he rose;

He sprang to his sleigh, to his team gave a whistle,
And away they all flew like the down of a thistle.
But I heard him exclaim, ere he drove out of sight,
"Happy Christmas to all, and to all a good-night."

Post by Hated Moron »

DDRM wrote: Wed 08 Dec 2021, 09:10 "Bonus: calculate the root mean square difference of etaoinshrdlu from your result (i.e. if 10 of these letters come in the same places, 1 is 3 places away and 1 is 4 places away, this would be SQR( (10 x 0) + (1 x 9) + (1 x 16) ) = SQR(25) = 5)."
I'm confused by this. It says to calculate the Root Mean Square difference, but where is the mean in your calculation? I would have expected it to be:

Code:

SQR( ( (10 x 0) + (1 x 9) + (1 x 16) ) / 12) =  SQR(25/12) ≈ 1.44
Or have I misunderstood something?

Post by DDRM »

You are asked to calculate the 12 most frequent letters (i.e. the order of them). I then gave you a sequence of 12 letters, which is one commonly quoted ordering (i.e. 'e' is the most common, then 't', and so on), but the sequence varies a bit depending on the source of the text (there are national variations, even within English), and randomly in short texts (which this still is).

What I meant was to square the difference of the positions you find with these positions, sum them, and then square root the result.
So,
Given: etaoinshrdlu
Yours: eatoishnrdlu

If I can read my own writing:
e,o,i,r,d,l,u are in the same positions, so each scores 0
a,t,s,h are each one place out, so each scores 1^2 = 1, for a total of 4
n is 2 places out, for a score of 2^2 = 4

Total is 8, so the answer would be SQR(8), or a bit less than 3.
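
The arithmetic above can be checked with a few lines of BBC BASIC. This is only a sketch: given$ and yours$ are illustrative names for the two orderings, and it assumes both strings contain exactly the same letters.

```bbcbasic
      REM Sum the squared position differences between two orderings.
      REM given$ and yours$ are illustrative names, not part of the challenge.
      given$ = "etaoinshrdlu"
      yours$ = "eatoishnrdlu"
      ss% = 0
      FOR I% = 1 TO LEN(given$)
        REM P% is where the I%th letter of given$ sits in yours$:
        P% = INSTR(yours$, MID$(given$, I%, 1))
        ss% += (P% - I%) ^ 2
      NEXT
      PRINT "Sum of squares = "; ss%
      PRINT "Square root of the sum = "; SQR(ss%)
```

For the two orderings shown, the sum of squares comes to 8.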

Best wishes,

D

Post by Hated Moron »

DDRM wrote: Mon 20 Nov 2023, 09:06 "You are asked to calculate...."
I understood all that; the task was perfectly clear.

DDRM wrote: "What I meant was to square the difference of the positions you find with these positions, sum them, and then square root the result."
But that's not RMS (Root Mean Square); it's what's returned by BBC BASIC's MOD function when used with an array. Physically it represents the length of a vector in multi-dimensional space; I wouldn't have thought it was a useful measure in this case.

DDRM wrote: "Total is 8, so the answer would be SQR(8), or a bit less than 3."
Surely you were right to ask for the RMS value in the original task? That's a useful measure of the 'average' distance (in this case it would be about 0.82).

Consider a set of equal values, let's say 2 2 2 2 2: the mean is (2 + 2 + 2 + 2 + 2) / 5, which is 2, and the RMS is SQR((4 + 4 + 4 + 4 + 4) / 5) = SQR(4), which is also 2. This is a key property of 'averages': if all the data values are the same, the average is that same value.

But your measure gives SQR(20) or about 4.47. If we double the number of data points to 10 (2 2 2 2 2 2 2 2 2 2) the mean, mode, median and RMS values are still all 2 of course, but your calculation gives SQR(40) or about 6.32. I just can't see how it's useful in this case.
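
A minimal sketch of that comparison for the five-value case, assuming the whole-array forms of SUM() and MOD() found in BBC BASIC for Windows and BBC BASIC for SDL 2.0. The names v() and N% are just illustrative.

```bbcbasic
      REM Five equal values: compare the mean, the modulus and the RMS.
      REM v() and N% are illustrative names.
      N% = 5
      REM DIM v(4) creates indices 0 to 4, i.e. five elements:
      DIM v(4)
      REM Set every element to 2:
      v() = 2
      PRINT "Mean    = "; SUM(v()) / N%
      PRINT "Modulus = "; MOD(v())
      REM RMS is the modulus divided by the square root of the count:
      PRINT "RMS     = "; MOD(v()) / SQR(N%)
```

The mean and the RMS both come out at 2, whereas the modulus is SQR(20) and keeps growing as more equal values are added.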

I'm neither a mathematician nor a statistician, but I did spend my entire career in electronics, where RMS is a very useful and practical measure. Are you quite sure that what you want is MOD() rather than the RMS value?

Post by Hated Moron »

Hated Moron wrote: Mon 20 Nov 2023, 09:56 "Are you quite sure that what you want is MOD() rather than the RMS value?"
Personally I'm confident that RMS is the more useful figure. So on that basis here's my solution to the challenge:

Code:

      INSTALL @lib$ + "sortlib"
      Sort%% = FN_sortinit(1, 0)

      REM Create and initialise arrays:
      DIM pop%(255), chr&(255)
      FOR I% = 0 TO 255 : chr&(I%) = I% : NEXT

      REM Read the poem and count the letters:
      REPEAT
        READ poem$
        IF poem$ = "" EXIT REPEAT
        FOR I% = 1 TO LEN(poem$) : pop%(ASCMID$(poem$,I%,1)) += 1 : NEXT
      UNTIL FALSE

      REM Combine upper and lower case:
      FOR I% = 65 TO 90 : pop%(I%) += pop%(I% + 32) : NEXT

      REM Sort A-Z into descending order of frequency:
      C% = 26 : CALL Sort%%, pop%(65), chr&(65)

      REM List the top twelve most common characters:
      FOR I% = 65 TO 76
        PRINT TAB(4) CHR$(chr&(I%) + 32), pop%(I%)
      NEXT

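      REM Calculate RMS difference from classic distribution.
      REM I% is a letter's rank in the poem, P% its rank in the classic
      REM sequence (zero if absent, in which case the letter is skipped):
      ss% = 0 : n% = 0
      FOR I% = 1 TO 26
        P% = INSTR("ETAOINSHRDLU", CHR$(chr&(I% + 64)))
        IF P% THEN ss% += (P% - I%) ^ 2 : n% += 1
      NEXT
      PRINT '"RMS difference = "; SQR(ss% / n%)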
      END

      DATA "'Twas the night before Christmas, when all through the house"
      DATA "Not a creature was stirring, not even a mouse;"
      DATA "The stockings were hung by the chimney with care"
      DATA "In hopes that St. Nicholas soon would be there;"

      DATA "The children were nestled all snug in their beds,"
      DATA "While visions of sugar-plums danced in their heads;"
      DATA "And mamma in her kerchief, and I in my cap,"
      DATA "Had just settled our brains for a long winter's nap,"

      DATA "When out on the lawn there arose such a clatter,"
      DATA "I sprang from the bed to see what was the matter."
      DATA "Away to the window I flew like a flash,"
      DATA "Tore open the shutters and threw up the sash."

      DATA "The moon on the breast of the new-fallen snow"
      DATA "Gave the lustre of mid-day to objects below,"
      DATA "When, what to my wondering eyes should appear,"
      DATA "But a miniature sleigh, and eight tiny reindeer,"

      DATA "With a little old driver, so lively and quick,"
      DATA "I knew in a moment it must be St. Nick."
      DATA "More rapid than eagles his coursers they came,"
      DATA "And he whistled, and shouted, and called them by name:"

      DATA """Now, Dasher! Now, Dancer! Now, Prancer and Vixen!"
      DATA "On, Comet! On, Cupid! On, Donner and Blitzen!"
      DATA "To the top of the porch! To the top of the wall!"
      DATA "Now dash away! Dash away! Dash away all!"""

      DATA "As dry leaves that before the wild hurricane fly,"
      DATA "When they meet with an obstacle, mount to the sky;"
      DATA "So up to the house-top the coursers they flew,"
      DATA "With the sleigh full of toys, and St. Nicholas too."

      DATA "And then, in a twinkling, I heard on the roof"
      DATA "The prancing and pawing of each little hoof."
      DATA "As I drew in my head, and was turning around,"
      DATA "Down the chimney St. Nicholas came with a bound."

      DATA "He was dressed all in fur, from his head to his foot,"
      DATA "And his clothes were all tarnished with ashes and soot;"
      DATA "A bundle of Toys he had flung on his back,"
      DATA "And he looked like a peddler just opening his pack."

      DATA "His eyes - how they twinkled! His dimples how merry!"
      DATA "His cheeks were like roses, his nose like a cherry!"
      DATA "His droll little mouth was drawn up like a bow,"
      DATA "And the beard of his chin was as white as the snow;"

      DATA "The stump of a pipe he held tight in his teeth,"
      DATA "And the smoke it encircled his head like a wreath;"
      DATA "He had a broad face and a little round belly,"
      DATA "That shook when he laughed, like a bowlful of jelly."

      DATA "He was chubby and plump, a right jolly old elf,"
      DATA "And I laughed when I saw him, in spite of myself;"
      DATA "A wink of his eye and a twist of his head,"
      DATA "Soon gave me to know I had nothing to dread;"

      DATA "He spoke not a word, but went straight to his work,"
      DATA "And filled all the stockings; then turned with a jerk,"
      DATA "And laying his finger aside of his nose,"
      DATA "And giving a nod, up the chimney he rose;"

      DATA "He sprang to his sleigh, to his team gave a whistle,"
      DATA "And away they all flew like the down of a thistle."
      DATA "But I heard him exclaim, ere he drove out of sight,"
      DATA """Happy Christmas to all, and to all a good-night."""

      DATA ""
The question was ambiguous in one respect: it didn't say whether the RMS (or MOD) difference is to be calculated for all twelve letters in “etaoinshrdlu” or only for those which also appear among the twelve most common letters in 'Twas the night ("ehatnoisldrw"), i.e. omitting U, which doesn't appear in both. I've assumed the former; if the latter is wanted, change FOR I% = 1 TO 26 to FOR I% = 1 TO 12 (this changes the RMS difference from 2.217 to 2.296).

Post by DDRM »

Hi Richard,

Yes, I'm sure you are right! I'm also neither a mathematician nor a statistician, but I didn't spend my career in electrical engineering either!

Best wishes,

D