Category Archives: Technology

MongoDB and FUD

The Problem:

Our data set consists of gigabytes upon gigabytes of pickled Python dictionaries, CSV files, and plain text, with the odd bit of Excel or Word. I have three goals:

  • Maintain this monstrosity
  • Create a searchable index
  • Build a new version for the future.

The entire app is a single monolithic Python app: there is no such thing as a “front end” or a “back end” or “middleware”. It’s a web app but there are no templates; it generates HTML via print statements. The same Python file may include standalone logic, or shared logic to be used by other components. It’s a bit of a mess. Lastly, the framework it uses is basically abandonware; I haven’t tried to see if it runs under any Python after 2.4, and you can be sure it won’t work under 3.

My first task was the search problem. I started with Whoosh but after about a year, it started to run into performance problems, and I’d also learned enough about information retrieval that I wanted some more features. The Whoosh guy is awesome and he’s done a hell of a thing, though; I cannot recommend it enough for smaller projects, but I needed more. I’d attended a talk at Pycon about Elasticsearch, so I switched to that, and it’s been awesome. 

My strategy was pretty simple: a cron job to regenerate the world. Since Elasticsearch is really, really fast, it took perhaps 30 minutes to reindex the entire data set, and since it’s not a 24/7 use case, running it at night is no big deal. (I’d like to provide real-time search but my users rarely need it; they’re content to have today’s new data appear tomorrow.)

This worked so well for two reasons. First, I’d learned enough about the “common data set” and my users’ search needs that I could ignore 99.9% of the data, which made the custom indexer pretty easy to work with. And second, Python dictionaries map really well to JSON, which Elasticsearch uses as its input and output.
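To make that second point concrete, here is a minimal sketch of turning one pickled record into the two JSON lines that Elasticsearch’s bulk endpoint takes per document. The index name `projects`, the field names, and the helper `to_bulk_lines` are all made up for illustration; this is not the actual indexer, and depending on the Elasticsearch version the action line may need more metadata (older releases also wanted a `_type`).

```python
import json
import pickle


def to_bulk_lines(record, index_name, id_field):
    """Return the action line and the document line for one bulk entry."""
    action = {"index": {"_index": index_name, "_id": str(record[id_field])}}
    # default=str is a blunt fallback for values JSON can't represent
    # natively (datetimes and the like); a real indexer may want more care.
    return [json.dumps(action), json.dumps(record, default=str, sort_keys=True)]


# Stand-in for one of the pickled dictionaries on disk (hypothetical fields).
blob = pickle.dumps({"projectid": 42, "title": "Widget audit", "tags": ["q3", "audit"]})

record = pickle.loads(blob)  # only unpickle data you trust
lines = to_bulk_lines(record, "projects", "projectid")
payload = "\n".join(lines) + "\n"  # a bulk body has to end with a newline
```

The resulting `payload` is what you would POST to the bulk endpoint; concatenating the lines for many records gives you the whole nightly reindex in a handful of requests.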

In building the regenerate-the-world scripts, I had written a huge amount of code to 1) walk the entire flat-file “database” and 2) make lots and lots of sense of it all. I did stuff like, “ensure that every disparate part of the app always refers to a Project by the faux-primary-key ‘projectid’ instead of ‘pj’ and ‘projid’ and whatever else”. My indexer did a pretty decent job of cleaning up this semi-schemaless data; so now what?
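The key cleanup is easy to picture in code. This is only a sketch of the idea, not the real scripts: the alias table and the `normalize_keys` helper are hypothetical, and the decision to raise on conflicting values is my assumption about what “sane” means here.

```python
# Hypothetical alias table: every spelling seen in the flat files -> canonical key.
KEY_ALIASES = {"pj": "projectid", "projid": "projectid"}


def normalize_keys(record, aliases=KEY_ALIASES):
    """Return a copy of record with aliased keys renamed to their canonical form."""
    cleaned = {}
    for key, value in record.items():
        canonical = aliases.get(key, key)
        if canonical in cleaned and cleaned[canonical] != value:
            # Two spellings disagree; refuse to guess which one is right.
            raise ValueError(f"conflicting values for {canonical!r}")
        cleaned[canonical] = value
    return cleaned


print(normalize_keys({"pj": 17, "owner": "alice"}))
```

Run over every record during the walk, something like this means everything downstream only ever has to know about ‘projectid’.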

Since our app uses CouchDB, it was my first choice, and very quickly abandoned. I loathe CouchDB. It makes a lot of sense in our app, but not for a general-purpose data store. 

Up next was “any ol’ RDBMS”, which means MySQL. Attempts to hammer the semi-schemaless data into relational format resulted in a data model so complex and byzantine, it was practically recursive. Instead of 3rd normal form I made a wormhole into a hell-dimension. So, no.

Despondent and generally upset, I tried MongoDB. And it worked! Experiments worked really well! 

  • As I said, Python dictionaries map very well to JSON/BSON so the amount of friction in import/export was minimal.
  • Ad-hoc queries
  • easy blob storage for stuff like Word documents
  • It’s fast (importing the world took perhaps 20 minutes)
  • It’s easy to set up (compile and go, basically)
  • Support for every language and platform I could think of
  • Has some replication capability in case I ever need it

I wasn’t really sure about a couple things, mainly backup-and-restore, but that was really my only concern, and the Mongo docs on the topic seemed straightforward enough; my users can tolerate an hour of downtime.

And now, the point of my little story: I think MongoDB is picked on more than just about any platform save PHP. There is so much fear, uncertainty, and doubt spread about it, it’s started to leak into my world and freak me out.

Consider the most recent thing, the “randomly log stuff” bit in the Java driver. Places like /r/shittyprogramming were all over it with digital brickbats. Every thread was then a free-for-all of “here’s how MongoDB screwed me over/here’s why MongoDB sucks” stories from all over the internets.

Panic set in. This data is mission-critical; while my users can tolerate small amounts of downtime and don’t need OTP-type features, it’s still mission-critical data. Have I fucked up royally here? Have I set myself up for epic fail? Or am I just giving in to the sort of FUD that pervades every goddamn internet discussion about any sort of technology? Let’s face it: people pile on and rarely are they anywhere nearly as awesome as they think they are. 

At this point I’m not entirely sure what to do. My thought was to return to the cold comfort of MySQL, using a Friendfeed-style schemaless system. It’s a huge orthogonal step but I’ve recovered horribly fucked MySQL databases after three-too-many bottles of Tequila, so it’s safe and well-understood. It puts the onus on me to write the entire friggin’ access layer, but whatever. I know about Postgres and JSON, but I don’t know Postgres at all.
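For anyone who hasn’t seen the Friendfeed-style approach, here’s a rough sketch of its general shape: one table of opaque serialized documents, plus a narrow index table per field you want to query. Everything here is an assumption for illustration. It uses SQLite from the standard library as a stand-in for MySQL, the table and function names are invented, and updating the index in the same transaction is my simplification.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")  # SQLite standing in for MySQL
conn.executescript("""
    CREATE TABLE entities (
        id   TEXT PRIMARY KEY,   -- opaque ID, never parsed
        body TEXT NOT NULL       -- the whole document, serialized as JSON
    );
    -- One narrow table per field you want to query on.
    CREATE TABLE index_projectid (
        projectid INTEGER NOT NULL,
        entity_id TEXT NOT NULL REFERENCES entities(id),
        PRIMARY KEY (projectid, entity_id)
    );
""")


def save(doc):
    """Store a document and its index row in one transaction."""
    entity_id = uuid.uuid4().hex
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT INTO entities (id, body) VALUES (?, ?)",
                     (entity_id, json.dumps(doc)))
        if "projectid" in doc:
            conn.execute("INSERT INTO index_projectid VALUES (?, ?)",
                         (doc["projectid"], entity_id))
    return entity_id


def find_by_projectid(projectid):
    """Find matching entities via the index table, then decode the bodies."""
    rows = conn.execute(
        "SELECT e.body FROM index_projectid i "
        "JOIN entities e ON e.id = i.entity_id WHERE i.projectid = ?",
        (projectid,))
    return [json.loads(body) for (body,) in rows]


save({"projectid": 42, "title": "Widget audit"})
save({"projectid": 7, "notes": "no title on this one"})
```

The two saved documents have different fields and the schema never had to change, which is the whole appeal. The price is exactly the access layer complained about above: every new queryable field means another index table and more code to keep it in sync.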

Am I giving in to FUD? Do I stay the course, trusting that my proven, real-world positives outweigh potential negatives?

A little Coda 2 plugin

I made a little Coda 2 plugin for fun: wrapping js-beautify. You can download it here

Pay close attention to the documentation; the script environment doesn’t do what you think it does, and so you’ll be very grouchy.

I want to love Coda a lot; as I mentioned in my review of Komodo IDE, one of Komodo’s downsides is that it’s so very not-native. Coda is extremely, wonderfully, beautifully native. So there’s that.

But it’s ultimately an app for cranking out WordPress plugins, not doing serious work. If you want to build serious, production web applications – and especially in something other than PHP – look elsewhere.

Homage vs Rip-Off

Via Daring Fireball, this shows a comparison between the designs of Dieter Rams and Apple/Jony Ive. 

The comments sum it up nicely: these aren’t even the same class of product and are separated by decades. A radio from 1967 compared to a desktop computer from 2003 is a sensible, sane comparison to a sensible, sane person? You can’t figure out the difference between those two things, and a company making cell phones today that are essentially copies of another company that makes cell phones today?

Put another way: There’s a world of difference between being in the band Teenage Bottlerocket and playing a song whose lyrics start with, “Hey, Ho, Let’s Go” and is mostly A and D barre chords.

NodeJS isn’t cancer (more like a bad case of food poisoning)

Recently at work I got involved in a somewhat lengthy (and at times a little heated) discussion about the efficacy of Node.js being introduced into our stack. 

For context, our application consists of a few disparate components: a kernel of applications written in C/C++/assembler, then a huge mass of mostly Python (654,852 SLOC according to sloccount) to handle everything from core system administration to the web. There is considerable Python expertise in-house; at one point we even employed core Python committers. Our build and test rigs use Python. One of my projects, converting the internal network applications to virtual machines, makes heavy use of libvirt’s Python API. There is more Erlang in production than JavaScript.

Naturally, for the next iteration of the web side of the application, a completely from-scratch rewrite using Node.js makes the most sense, right? Of course it does. At least, that’s the argument I lost. But I’m not bitter[1].

But! I’m an adult, and so I’m aware that I either do my job or find a new one. I think the AR15 platform is a piece of shit but I didn’t really waste any time yelling at the Marine Corps that there’s lots of better ways to kill bad guys, I just did my best to keep mine clean and my proficiency at peak. (Also, we’re talking about a prototype here; we have nothing to lose and everything to gain. If it doesn’t work, no problem, we have other directions to go. If it does work, our application is that much better.)

So to prepare for the clusterfuck^Wgreat leap forward in our application, I’ve been spending some quality time with Node.js. The original Node.js rant, “Node.js is cancer”, is no longer on the web, sadly. You can find other rants elsewhere; my current favorites are on YouTube, here and here. (“You may recall sequential code; that’s the code you can read.”)

I’ve found a few things about Node that I really like, though.  It’s not all cancer.

Fun Thing The First: Modules Are Pretty Great

I really, really don’t miss modeling everything as a class. Maybe I’m just bored of OO, but organizing code by gross function feels much more natural and easy these days. I’m sure over the long run it’ll lead to tears[2] but right now it’s very pleasant to work with.

Fun Thing The Second: The Real Deal On One True Platform

One of the holy grails of web work is that you can do all your real work in one platform. For most of history, people talked about the “web stack”. Oh, it was a stack alright; a stack of fecal matter. It was perfectly reasonable to have in one medium-sized web application a set of technologies like:

  • HTML
  • JavaScript
  • CSS
  • Some sort of scripting language (Perl, Python, PHP, Ruby) for core logic
  • SQL
  • Shell scripts
  • Probably another scripting language – perhaps 2 or more – driving all the tangential parts (sysadmin, deployment, back-end services, etc etc)
  • And if you’re lucky, something compiled, like a C library/extension for one or more of your scripting languages

I’ve seen people suffer from horrible task switching: one time a coworker spent over an hour typing what amounted to Perl into a PHP file, because he’d spent the first part of the day fixing a perl script. He went right into the next task and his brain didn’t come along. We’ve all done it.

I’m pretty sure Node.js comes very close to fixing this. Throw in a NoSQL database and you don’t even need to worry about SQL! The worst case is the odd shell script here and there. 

JSLint provides a fully automated dunce-cap to keep the stupid down 

Thanks to jslint you can pretty much punch anyone in the neck who tries to do something really stupid. If you are some sort of “all Node.js, all the time” and your code doesn’t at least mostly clear a reasonable set of jslint settings, congrats! You’re awful.

It’s not cancer but I’m pretty sure it’s a lot like having food poisoning

I’m immensely frustrated with Node.js. To name but one example, everyone insists on making everything a motherfucking EventEmitter despite Node.js shipping with fully synchronous functions. Yes, the core library ships with not even remotely asynchronous functions. Why? I don’t know, maybe the core team realized sometimes it’s just easier. This logic never filtered out to the great npm-famous masses, so every time I grab a simple library for doing anything, I’m on column 120 before I get to the heart of the matter.

Moreover, I feel like I’ve been sold a bill of goods when I see 3/4 of the README dedicated to hooking up the application to memcached and Redis. Node.js is scalable, please pay no attention to the 40 proxies, key-value caches, and other doodads actually keeping the application running.

But whatever. At the end of the day, every large web application requires all that stuff anyway so perhaps building it in from the start to serve the 5 people using your application is actually the right choice.

It’s probably worth it, if Node.js can truly deliver a smaller “stack”; given the size and complexity of our system, it may end up that we can iterate faster by not having to context-switch. Moreover, lots of really talented people are working on making JavaScript better; Python is stuck in that weird place of “we’re not porting our library to 3 until there’s a critical mass of users/we’re not switching our app to use 3 until everyone ports their libraries to it”. That in and of itself should give any reasonable person The Fear(™); Node.js and JavaScript aren’t going anywhere any time soon and to its credit Node.js has been very out in front of “this API is stable” stuff.

So no, it’s not cancer, but it gives me a lot of occasional irregularity. 

Footnotes:

[1] Haha, I’m lying, I’m insanely bitter.

[2] I imagine code reviews thus: “What the hell is this code doing here?” “Because fuck you, I was in a hurry.”

Komodo IDE 7

The Good:
Here’s the tl;dr and the bottom line to my ramblings:

There’s probably not a better IDE for working with dynamic languages, specifically Perl, Python, PHP, Ruby, or JavaScript, and the web stack in general.

The entire application is deeply knowledgeable about web and system development with those platforms. (As usual, Java-based dynamic languages like Clojure and Groovy will probably want to stick to a Java-based IDE.)

A long time ago, I mused that the perfect modern editor/IDE was like Emacs, but instead of a Lisp, it would use JavaScript. Komodo is that editor: there’s the addition of a little Python but for the most part, it’s JavaScript most of the way down (in the way that Emacs is built on top of a Lisp runtime in C).

Komodo one-ups this idea by building on top of XUL. That means it’s crazy-extensible. I mean, it’s so extensible that between macros, run commands, snippets, and full-on extensions, you could do about 80% of IDE in Edit. (It would be hacky and awkward and hard to maintain, so it’s probably not the best idea ever in terms of productivity, but it’s probably at least possible.) There’s a fair number of really good extensions available already, and it contains a template for making new ones.

If it’s doable with a dynamic language, and you need to do it, it’s probably in Komodo. Debugging, profiling, HTTP inspector, regular expression toolkit, the works. Code completion is fantastic, and doesn’t rely on tag files (actually I think it does, but it builds its own transparently in the background, instead of you needing to update or manage them; other implementations exist). 

Developers are responsive and helpful. Bug tracking is accessible; developers encourage you to use their Bugzilla to log feature requests and track the status of bugs. Documentation is very good, albeit a little obtuse at times.

IDEs have a reputation for being “Angry Fruit Salad“: lots of panes with little space devoted to the editor. Usually my editor looks like this:

[Screenshot of my editor, taken 2012-06-19]

and I have macros and keybindings to show/hide things as needed. At a glance, most people assume it’s Vim or Emacs or something.

Scintilla is a more capable editor component than it gets credit for, and the Activestate folks have really pushed it beyond “text editor widget”. Just look at the docs for creating user-defined languages: that’s some really powerful stuff there. It’s way complex but I imagine it’s pretty much impossible not to support your favorite language, be it code or markup. It also supports split views.

There’s a bunch of other functionality, like the Regex toolkit, integrated testing, and tons more.

The Not-so-Good:
Not native.

I know, I know. Just a minute ago I was raving about XUL. And it’s based on the Mozilla platform, so of course it’s native!

Mozilla is mostly native and easily the gold standard for “write once or twice, run in most places”. Still: this stuff matters to Mac users, quite often, and to simply hand-wave it away isn’t enough. It will almost certainly lack the tiny details that Mac users love. It will almost certainly trail somewhat far behind the cutting edge of Mac development. It will always be “a little weird”. If you’re counting on it instantly fitting in with your toolchain of other Mac OS X apps, you may be heartbroken or at least slightly miffed. It is so worth getting past that, because it’s really a powerful tool.

On the other hand, if you’re a Firefox die-hard, you probably stopped caring forever ago, and JavaScript is always better than Applescript for doing real work. 

Can be a little flaky at times. If you’re used to a rock-solid editor that never crashes or even acts weird, it’s a tad frustrating. This improves every time the Mozilla people improve the underlying components, and 7 is quite stable; but I can’t even remember the last time Vim crapped on me. Like, ever. I know Emacs people that consider restarting their editor an unclean act. If you’re one of these people, yeah, it’s going to make you sad.

Although it’s crazy-extensible, I find the Firefox extension-building process somewhat daunting. YMMV, but it’s an important consideration when thinking about “how can I bring over my workflow from another editor”. Most tasks can be handled by scripting, though; it’s probably not likely you’ll need to create a real extension, and I secretly hope they’ll have something similar to Mozilla Jetpack in a future version to make extensions easier to work with. Macros are super-powerful, though, so don’t think you’re forced to wade into extension development just to do something small.

The Other Stuff:
Most IDEs have a pretty large learning curve, and often aren’t as written-about as Vim, Emacs, or Textmate; so discovery of features, especially scripting, can take a while. The forums/knowledge base/FAQ is pretty good, though.

Vi emulation is not bad. It has a few rough edges (visual block mode, anyone?) but it’s certainly workable for your middle-of-the-road Vi user. (An example of things that might make a long-time Vim “power user” grouchy: there is no concept of a “leader”, as such, so if you remap your leader to something like , to avoid the default and slightly inaccessible \, you might not be able to get the results you want).

Publishing and remote browsing are workable but could use some improvement; publishing assumes a simple 1:1 mapping of a project on your machine to the remote machine, so some complex projects can’t use it. 

I really want a shell. Yes, it has language REPLs built-in (which is great!), but I want a real terminal. Sometimes you just need a shell, you know?

If you have an editor like Vim/Emacs/Textmate that you’re used to using for both code and prose, it’s unlikely to replace it for prose. It’s just not a prose editor, as such. It’s certainly adaptable but its forte is not, say, rendering your exquisite markdown in real-time. (But as I’ve said a million times, you can make it bend to your will – I have a macro to work with Marked.app, and it works great.)

Things I didn’t get to use:

  • Collaboration. Everyone at work that uses Komodo either uses Edit or isn’t on IDE 7.
  • Stackato. I’m not in the OMGCLOUD business right now; we deploy to physical hardware, and we’re currently building out our own cloudy thing. It sure looks great but I don’t have a lot of reason to expend effort on it at this time.
  • Database stuff. We use CouchDB at work, so I have no need for a SQL editor.

The Bottom Line:

I guess I already gave away the big finish: there’s no better code editor/IDE for the web stack. 

It’s like WebOS all over again

So a long time ago, Palm had a pretty neat idea: there’s web applications almost literally everywhere, so why can’t we start using them directly on hardware?

(Note that I have no idea if this was the actual question they asked, but for my purposes, it’s good enough.)

From this they created WebOS, which ran on their Pre phones.

WebOS was pretty freaking awesome. Everything on the hardware was exposed through a pretty nice JavaScript API; you could play music, access location, dial the phone, check email, contact instant messaging services, whatever. The UI was created entirely with HTML and CSS, and they shipped a pretty nice set of “widgets” (basically, CSS and JavaScript) to quickly add UI elements.

There were tons of features that were really, really genius on the Palm, that have yet to be copied to Android, iPhone, or WinPhone. If nothing else, I still believe that the Palm/WebOS touch-based interaction was so much farther ahead and just damn better than all the others.

Anyway, it wasn’t all sweetness and light. There were … problems:

  • They used a forked version of the otherwise wonderful WebKit, that never quite worked just like the iPhone or Android browsers, so you had to deal with yet another layout engine.
  • Their JavaScript engine was JavaScriptKit, which is a good engine but at the time it wasn’t as fast as, say, v8. Today the benchmarks war is often one of increments but at the time, it was provably slower.
  • At the time, people were raving over a bunch of games and apps (mostly on the iPhone) that simply aren’t possible with HTML5 today, much less 2009. Need proof? See this list. Show me HTML5 versions of those apps. Hell, show me Flash versions of those apps.
  • jwz detailed his difficulties in getting one of the first apps onto their app store, among other problems.

Palm eventually gave in and shipped a sort of “native” runtime, allowing developers to ship C/C++/whatever apps.

You could make small complaints about the hardware, as well: AAPL was really hitting their design stride in 2009, and the Pre was a noble but ultimately failed effort to make a device that really stands out. It was really good but only when placed against the not-an-iPhone pack.

(Daring Fireball has some thoughts from mid–2009 that are pretty spot-on.)

Ultimately I think had Palm been competing in a world without the iPhone, it would be the #1 device today. They just didn’t get a critical mass of developers to attract enough users, because everyone was gaga over the iPhone.

So what’s this got to do with anything? Well, there’s this: Boot2Gecko. The tl;dr is simple: “Take WebOS and s/WebKit/Gecko/g”. It’s the same story as WebOS.

Seriously. Watch this presentation and tell me that’s not the same stuff Palm said 3 years ago.

I really wish someone from Mozilla would explain how their thing is going to be any different, other than they have experience making “web APIs”. They’re not even targeting a hardware platform, like Palm did; so they are already 1 step behind Palm, who at least had a reference platform and “total package” right out of the gate.

I’ve reverted to Firefox and Tbird lately, because Safari is going through yet another spurt of growing pains and Google can kiss my ass. It’s pretty amazing how Firefox has been able to turn the ship and go from crashy bloatware to speedy; just a year ago Firefox was basically unusable for me (and if Chrome’s adoption is any indicator, for a lot of people) but I’m actually pretty content these days. I imagine others would be, too, but they’ve already leapt on board Chrome and one browser switch per generation is enough for most people, especially if the story is simply “hey, it sucks less now”.

So I’d really like to know why I should bet the farm on Boot2Gecko; why I should tell my bosses, “No way, you guys, the future isn’t Android and iPhone, it’s gonna be this MozillaPhone thing”. I’d really like to know how they’re going to overcome all the problems that killed Palm.

Translating the “Rails is DEAD!” talk to plain (?) English

So apparently a bunch of people woke up this morning and decided Rails sucks.

Let me translate some of this for you.

I’m becoming more and more certain that this means that Rails-style MVC frameworks on the server-side are going to end up being phased out in favour of leaner and meaner frameworks that better address the new needs of thick-client architecture.

Uh, yeah.

What that means is:

Part of the reason we have these monstrous frameworks is because at some point all software is just one giant if/switch around the seemingly endless list of bugs stemming from one of (client OS, client hardware, client browser, client plugins, deranged and clearly pants-on-head-stupid client workflows, and the vagaries of quantum physics).

But, because programmers think code rusts, and we love to rewrite the HELL out of it, we come up with new and exciting ways to do just that.

Most of the OP’s complaints can be easily and succinctly explained by simple things: IE6 is now dead, and most browser vendors have some sort of agreement not to put us all through the same old BS browser war we went through last time. JavaScript is now a “first class” language and not an annoyance for “know-nothing” Web designers or inflicted on the company “rock stars”. Enough JavaScript exists “in the wild” that we have stable platforms to use to build apps.

So! It’s the trifecta of awesome rewrite-your-app time:

“Everyone’s settled on non-blocking IO as the only way to scale.” “Client-side templating is where it’s at these days.” “We’ll have less trouble since it’s all the same language and platform everywhere!”

Time to rewrite those apps!

Fast-forward 2 years when “I’m becoming more and more certain that this means that Node.js-style single-threaded engines with client-side logic are going to end up being phased out in favor of leaner and meaner megathreading frameworks that better address the new needs of the quantum-core architecture”.

The Enemy of my enemy

In “Web Sites and a Plug-in Free Web“, IE’s “Program Manager Lead” says:

“A plug-in free Web benefits consumers and developers and we all take part in the transition.”

I’m sort of cynical about this whole “no plugins” thing, when it comes to Microsoft. IE was a submarine made of Swiss cheese for the most critical years of the web – crashy and a security nightmare.

The reason it was a security nightmare was because of plugins. And it was those plugins – mostly ActiveX controls and Flash, among others – that Microsoft needed as a critical cornerstone of its “embrace and extend” policies.

So: while the web community in general looks forward to a world without plugins, it’s sort of annoying to have you guys act magnanimous and not acknowledge that you played a pretty big role in creating the mess.

 

New Phone!

I got a new phone, and a new phone number. If you’re curious that number is:

 

 

Contrary to what you’d probably guess, I didn’t get an iPhone; I got a Blackberry, and the free one at that (a 9300 series).

“But Blackberry sucks! They’re going out of business!”, you may think. You’re right, sort of.

One thing I learned since I got an iPad was I don’t like the smartphone form factor. Increasing the size of the device (what seems like the current Android device solution) to me just seems to create a form factor that’s further inferior to both the slightly smaller iPhone and larger iPad.

The things I want most from a phone are the 3 “T’s”: Talk, Text, and Twitter. I’d like decent email support. And that’s … pretty much it. Maps or a web browser or Angry Birds or ssh or 10 billion other things, I simply don’t do. I made myself angry (and spent too much money) trying to convince myself otherwise.

For all of the OMG BLACKBERRY SUCKS talk, it is still a pretty good email and text device. From my point of view, the recent problems with RIM are more about “how did you idiots make so many bad decisions in a row, when you were on top?” than “your technology sucks”. They’re making bad decisions with an otherwise workable tech platform.

One day, when I go places and do things, maybe I’ll upgrade to an iPhone. Until then, my iPad is all I need for most everything, and the Blackberry is my ideal handset.

Syncing and RSS readers

Brent Simmons, he of NetNewsWire fame and demigod of the indie Mac scene, details all the problems with syncing in general and Google Reader in particular.

I disabled Google Reader sync last night and switched back to NetNewsWire on the desktop, doing local sync.

On the plus side, there’s not a HUGE amount of feeds I follow any more. Also, I never go anywhere so the lack of Google Reader sync isn’t crippling.

BUT: my main use case is, Mac downstairs as “digital hub” and iPad everywhere else (couch, bed, porch, etc). Unless I’m sitting at my Mac to do work, I’m using my iPad; I watch TV on my Apple TV; I listen to music streamed from the Mac via Home Sharing from iTunes.

What I really really want is to have NetNewsWire/iPad to sync to the desktop, much in the way the Yojimbo iPad app does.

I know it’s not some “look at me I’m SUPER MOBILE”, work-at-the-coffee-shop thing, but it works great and is probably (hopefully?) less error-prone than the scenarios Brent describes.