Widgets — Reverse Engineering WordNet

With the proof copy of A Book About Hype in production, there’s not much to do but wait. There’s a significant delay. In frustration, I seriously considered printing the books myself. While it certainly can be done, that’s not what Photics is about. Patience is needed during the pandemic. Lulu has delivered quality in the past. I’m optimistic that they can do it again. Instead, I should focus my efforts on Widgets. I too need to deliver quality: a dictionary/thesaurus widget. Many people are waiting.

While working on the new book, I received emails about the Widgets app. Lots of feature requests were made. The most popular was a dictionary and/or thesaurus widget. Ah, that certainly would have been a helpful writing tool while making the book. The dictionary was one of the main reasons why I used Apple’s Dashboard. I want Widgets to have the same functionality. Having a dictionary available at a moment’s notice is nice.

The problem is that it’s not so easy to get a database of words and definitions. It’s either expensive, encumbered by license restrictions, or a technical challenge. Fortunately, there’s WordNet from Princeton University.

WordNet® is unencumbered, and may be used in commercial applications

Oooohhh… that works. So I downloaded a copy. Immediately, I saw two problems. One, it’s kinda big. 55 megabytes would increase the size of the app by 4-5 times. That’s not as terrible as other options I’ve seen, so I kept investigating. Two, uh… there are dozens of files here. I didn’t know where to begin. It just wasn’t making sense.

Oh wait… index.sense …I could look at that file.

[Screenshot: the WordNet index.sense file]

Apparently, all of the words are listed here. The eight-digit number appears to be some sort of index number. I’m not sure what the other numbers or characters mean. That’s when I looked at the other “index” files. The “index.adj” file is a list of adjectives.

active a 14 6 ! & ^ = + ; 14 8 00037570 01664870 00038863 01519363 00032087 00042677 00035578 00930614 00043630 00042258 00041840 00041583 00040548 00034823

I could see the word at the beginning. It seems the letter “a” is for adjective, but then I got lost… fourteen… six… exclamation point… ampersand… caret… equals… plus… semicolon… fourteen (again)… eight.

It didn’t make sense to me, but at least I’m closer. I know the first part, which is the word itself. The second part is the type of word. The third part… the “active” entry gave it away. It says fourteen, and there are fourteen eight-digit numbers at the end of the line. That’s the number of definitions for the word.
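For anyone following along, here is a minimal Python sketch of how that line could be split apart. It assumes the field layout described in WordNet's own database-format documentation (the wndb page): word, part of speech, synset count, pointer count, that many pointer symbols, sense count, tagged-sense count, and then one eight-digit synset offset per sense. The function name `parse_index_line` and the dictionary keys are made up for illustration. This is not code from Widgets.

```python
def parse_index_line(line):
    """Split one entry from a WordNet index file (e.g. index.adj) into named fields.

    Assumes the layout from WordNet's wndb documentation. The function name
    and the dictionary keys are hypothetical, chosen for this example.
    """
    fields = line.split()
    lemma, pos = fields[0], fields[1]
    synset_count = int(fields[2])
    pointer_count = int(fields[3])
    # The next pointer_count fields are relationship symbols, e.g. "!" for antonyms.
    pointers = fields[4:4 + pointer_count]
    rest = fields[4 + pointer_count:]
    sense_count = int(rest[0])
    tagged_sense_count = int(rest[1])
    # Everything after that is one eight-digit synset offset per sense.
    offsets = rest[2:2 + synset_count]
    return {
        "lemma": lemma,
        "pos": pos,
        "synset_count": synset_count,
        "pointers": pointers,
        "sense_count": sense_count,
        "tagged_sense_count": tagged_sense_count,
        "offsets": offsets,
    }


# The "active" entry quoted from index.adj.
line = (
    "active a 14 6 ! & ^ = + ; 14 8 "
    "00037570 01664870 00038863 01519363 00032087 00042677 00035578 "
    "00930614 00043630 00042258 00041840 00041583 00040548 00034823"
)
entry = parse_index_line(line)
print(entry["lemma"], entry["pos"], entry["synset_count"])
print(entry["pointers"])
print(len(entry["offsets"]))
```

Read this way, the "six" is the count of symbols that follow it, which would explain the run of punctuation, and the second "fourteen" repeats the number of senses.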

I’m starting to form how I want to structure the database. Hopefully I can keep it lightweight and fast — while unleashing the power of WordNet. It’s not quite a dictionary, nor is it exactly a thesaurus. WordNet is a bit of both and more. It shows relationships between words. I’ve barely scratched the surface here, but this is a good start.
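To make the "lightweight and fast" idea concrete, here is one possible shape for such a database, sketched in Python with SQLite. Everything here is an assumption for illustration: the choice of SQLite, the `senses` table, and its column names are hypothetical and not the final Widgets design. The three sample rows reuse the first three offsets from the "active" line above.

```python
import sqlite3

# In-memory database for the sketch; a real app would ship a file on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per (word, sense) pair.
conn.execute(
    """
    CREATE TABLE senses (
        word          TEXT    NOT NULL,
        pos           TEXT    NOT NULL,
        sense_rank    INTEGER NOT NULL,
        synset_offset TEXT    NOT NULL
    )
    """
)
# An index on the word column keeps lookups fast as the table grows.
conn.execute("CREATE INDEX idx_senses_word ON senses (word)")

# First three offsets from the "active" entry in index.adj.
offsets = ["00037570", "01664870", "00038863"]
conn.executemany(
    "INSERT INTO senses VALUES (?, ?, ?, ?)",
    [("active", "a", rank, offset) for rank, offset in enumerate(offsets, start=1)],
)

# Look up every sense of a word, in the order the index file lists them.
rows = conn.execute(
    "SELECT pos, sense_rank, synset_offset FROM senses "
    "WHERE word = ? ORDER BY sense_rank",
    ("active",),
).fetchall()
print(rows)
```

A table like this only maps words to offsets. The definitions themselves would still have to be pulled from the matching WordNet data file and stored alongside.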

The plan is to try and release a new widget every month — for the next four years. I might change the schedule, but that’s the plan for the immediate future. Creating a dictionary/thesaurus widget is the plan for June. Using WordNet, I might actually be able to get the job done.