

Gobbledegook

This relates to something I’ll blog about at greater length in the next few days, but I thought I’d ask this question first in case anyone reading knows the answer.

I’m going to try to approximate an English-language sentence in the simplest way possible. The method is to pick a number, N, and build each word as a random string of at most N letters.

  • If N = 2 a sentence would look like this: d fo mh j e l tx df d
  • If N = 5 a sentence would look like this: gh e kj jegns tyu dfa o wdu tah ttauo kk
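
For anyone who wants to play along, here is a minimal sketch of how sentences like the two above could be generated. It assumes each word's length is drawn uniformly from 1 to N and each letter uniformly from a to z; the names `random_word` and `gobbledegook` are made up for this illustration.

```python
import random
import string

def random_word(n, rng=random):
    """Return a random lowercase string whose length is uniform on 1..n (assumption)."""
    length = rng.randint(1, n)  # randint includes both endpoints
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

def gobbledegook(n, word_count=10, rng=random):
    """Join `word_count` random words of at most `n` letters into a 'sentence'."""
    return " ".join(random_word(n, rng) for _ in range(word_count))

print(gobbledegook(2))
print(gobbledegook(5))
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which is handy when comparing different values of N.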

So here’s my question:

If I want to approximate the distribution of word-lengths in the English language, which value of N should I choose?

I know it won’t be a very close approximation, but it’s very quick and easy to generate the words using this set-up.
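
One possible way to put a number on "closest" is sketched below. It is only a sketch under assumptions: it measures word lengths in whatever sample text you feed it, compares that distribution to a flat one on 1..N using total variation distance (one choice of yardstick among many), and returns the N with the smallest distance. The function names and the `max_n` cut-off are hypothetical, and the one-sentence sample is a placeholder; a sensible answer would need a large body of real English text.

```python
import re
from collections import Counter

def word_length_distribution(text):
    """Map each word length found in `text` to its relative frequency."""
    lengths = [len(w) for w in re.findall(r"[A-Za-z]+", text)]
    total = len(lengths)
    return {length: count / total for length, count in Counter(lengths).items()}

def distance_to_uniform(dist, n):
    """Total variation distance between `dist` and the flat distribution on 1..n."""
    support = set(dist) | set(range(1, n + 1))
    return 0.5 * sum(
        abs(dist.get(k, 0.0) - (1.0 / n if k <= n else 0.0)) for k in support
    )

def best_n(text, max_n=20):
    """Return the N in 1..max_n whose flat distribution is closest to the text's."""
    dist = word_length_distribution(text)
    return min(range(1, max_n + 1), key=lambda n: distance_to_uniform(dist, n))

# Placeholder sample only; swap in a real corpus for a meaningful result.
sample = ("If I want to approximate the distribution of word-lengths "
          "in the English language, which value of N should I choose?")
print(best_n(sample))
```

A simpler alternative would be to match averages instead: a flat distribution on 1..N has mean word length (N + 1) / 2, so N could be chosen to make that equal the average word length of the sample.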


2 Responses to “Gobbledegook”

    • wheresrhys says:

      Not really, unfortunately. The distribution I’ll be using for word lengths is … can’t remember what it’s called actually … but its graph is a straight, horizontal line. So I need the straight-line graph that is the closest approximation to the bell curve of the actual English language.

      Good and interesting research, though.
