Care Bears and Dr. Sbaitso… the conspiracy is revealed!

My son is three, so I grant him a lot of leeway when it comes to his choices of video. However, I was shocked, appalled, and disturbed when my wife mentioned he was totally in love with the Care Bears: Big Wish Movie. Naturally I beat the crap out of him. Many times. But he still likes that god-forsaken video.

So one day he asked me to watch it with him. When a three-year-old asks you to do something, man, let me tell you, you’d better think real hard before refusing. Unless you really need to teach the kid that what he’s asking for is not allowed or bad for him, you obey. So anyway, I sat down and watched a fair amount of the movie.

I was once again totally shocked about midway through the movie when they made a reference to Dr. Sbaitso:

Funshine Bear: [dons Groucho Marx glasses and imitates Sigmund Freud] Hmm, interesting. Tell me about “caring.”
Wish Bear: I can’t. I feel all empty inside.
Funshine Bear: Interesting. Tell me about “empty inside.”

< Quotes modified from IMDB >

Okay, to be fair it could have been any ELIZA-like algorithm, but the point is clear – the designers of the Care Bears CGI movie (or the scriptwriter, I suppose) have a geeky background, and actually put in a reference that probably one in a thousand people watching the movie would ever get.

I don’t claim to like the Care Bears all of a sudden, but I can’t help but laugh every time I see or hear that scene. Whoever decided to put that into the movie KICKS ASS.

Yet another Awesome Software Discovery!

This time it’s a piece of javascript to compute ideal body weight in a variety of ways, the most interesting of which claims to tell you what other people like you consider their ideal body weight. Very “slick” little system, if you care about such things.

I’m always on the lookout for crazy new technology, so when I found this “ideal weight calculator”:http://www.halls.md/ideal-weight/body.htm, I was overjoyed by how many different algorithms seemed to be present. When I looked at the source and found that the author was using javascript, I was again very excited. This meant I could look at (and possibly learn) his algorithms!

And so here they are: “javascript source for calculating ideal weight”:http://www.halls.md/ideal-weight/body2.js.

But read the copyright message with me and bask in the author’s sheer genius! Clearly he (or maybe she? No idea, don’t care) considers the algorithms to be proprietary and will MESS YOU UP if you steal them! So I guess I won’t bother to learn them. Hell, merely looking at them is probably illegal.

So aside from the author’s painful arrogance and stupidity, what can we learn about him from this script? Simple – he thinks he’s some sort of omniscient deity (don’t mess with me lest I strike ye down, mortal! And I will know if you try: “you won’t get away with copying this code”), and yet he doesn’t have even the tiniest iota of smarts when it comes to securing what he claims is “truly my unique creation and algorithms”.

O, Great and Wonderful Physician (yes, that’s right kids, he points out to all us lesser mortals that he’s a god damn physician!), I beseech thee! A bit of simple and kind advice for you: if you want an algorithm to be protected, don’t publish it on the web. In un-obfuscated javascript no less. Obfuscation isn’t bulletproof, not by a long shot, but it’s better than nothing.

And really, go for a server-side approach if you’re as paranoid as you seem to be. Once you use javascript, everybody who visits your site has copied it. This is not because they’re all thieves, but because of a little thing called the browser cache. Not only that, but anybody can view your proprietary algorithm and rewrite it. Copyright it all you want, a rewritten version of the algorithm is going to be COMPLETELY LEGAL! Copyright only protects the code as written (exact or very nearly exact duplication), not the formula behind it. You need a patent to protect an algorithm. For a basic description of the algorithm, read below. I was gonna rewrite it in javascript, but it’s really quite worthless, so explaining it should piss off our good doctor well enough.

<By the way, the message should be “It’s copyrighted”, or since you’re talking about scripts (plural), maybe “They’re copyrighted”. Note the apostrophes. Apostrophes can be your friend.>

The good news is that his script is so mundane and, dare I say it, not unique – most of the script is other people’s work on pretty standard formulas. Why, you ask, is this good news? Because he doesn’t actually need to worry about people stealing it!

His “secret formula” is well worth discussion, however:

You go to the site. You put in your height and weight. His script uses a very standard formula to calculate BMI. His “people’s choice” code then cuts BMI down by 40% or 50% (gender determines this) and then adds a gender-specific value (11.5 for men, 11 + age x 0.03 for women). Then reversing the very standard BMI calculation gives you what other people supposedly consider to be an ideal weight!

That’s right, a simple algorithm that tells you what other people just like you consider ideal! But because of the simplicity of the script, it gets worse – say you’re a 440 pound, 5’6″ adult male. According to this brilliant physician, the average person of that height, weight, age, and gender thinks that 291 pounds is his ideal weight! That’s right, little ones. If you’re extremely obese, your beliefs about what is and isn’t an ideal weight become so skewed that you think being slightly less obese (but still very obese) is “ideal”. Funnier still, of course, is that as your weight changes, so will your ideal. So once our example 440lb guy gets down to 400, he thinks his ideal is 271lbs. Doesn’t matter if it takes him a month or fifteen years to drop 40lbs, his new ideal is still going to be 271.
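For the curious, here is a minimal sketch of that arithmetic in javascript. To be clear, this is my reading of the description above, not the doctor’s script: the function names are made up, the age of 30 is just a stand-in for “adult”, and the female factor is a guess (I’m taking “cut by 40%” to mean “keep 60%”). The male branch is the only one the examples above actually confirm.

```javascript
// Sketch of the "people's choice" arithmetic as described in the post.
// NOT the original script: all names here are hypothetical.

const IMPERIAL_BMI_FACTOR = 703; // standard constant for pounds and inches

// Standard BMI formula: 703 * weight (lb) / height (in) squared.
function bmi(weightLb, heightIn) {
  return (IMPERIAL_BMI_FACTOR * weightLb) / (heightIn * heightIn);
}

// The same formula, solved for weight.
function weightFromBmi(bmiValue, heightIn) {
  return (bmiValue * heightIn * heightIn) / IMPERIAL_BMI_FACTOR;
}

function peoplesChoiceWeight({ weightLb, heightIn, age, gender }) {
  const current = bmi(weightLb, heightIn);
  const idealBmi = gender === "male"
    ? current * 0.5 + 11.5              // consistent with the 440 -> 291 example
    : current * 0.6 + 11 + age * 0.03;  // ASSUMPTION: "cut by 40%" read as "keep 60%"
  return weightFromBmi(idealBmi, heightIn);
}

const guy = { heightIn: 66, age: 30, gender: "male" }; // 5'6", age is a placeholder
console.log(Math.round(peoplesChoiceWeight({ ...guy, weightLb: 440 }))); // 291
console.log(Math.round(peoplesChoiceWeight({ ...guy, weightLb: 400 }))); // 271
```

Work the male branch through by hand and it collapses to “half your current weight plus a constant that depends only on your height”, which is exactly why the “ideal” chases you down the scale.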

BUT WAIT, THERE’S MORE! When you’re not an adult, the script tells you that your peers consider your desired weight to be something that is based entirely on height and weight! So the average 440 pound, 5’6″, 18-year-old male longs to weigh 131 pounds. The moment he’s older than 18.5 (no idea where the doc pulled these numbers from), he longs to weigh 291. Yup. One day he goes to sleep hoping to be in a healthy weight range, then he wakes up thinking he was wrong, and should in fact weigh more than twice his original goal.

Arrogance, stupidity, bad programming, and then weird assumptions followed by even more stupidity. This is possibly my best Awesome Software Discovery yet!

Be careful of the RubyForge gem!

I just discovered a weird issue. The comments under my name will explain it better than I can here, but suffice it to say, if you use Net::HTTP in Ruby, do not install the RubyForge gem!

Read all about “the issue”:http://rubyforge.org/tracker/index.php?func=detail&aid=8907&group_id=1024&atid=4025 on the rubyforge bug tracking page.

The wonderful world of Cross-site scripting (XSS) – OR – why input filtering is bad

I have been dealing with XSS at my so-called “real job” recently, and it has come to my attention that a lot of people in this world are under the mistaken impression that it’s better to do “input filtering” than “output filtering”. As I pretty much came up with these terms myself (they may or may not exist elsewhere; I’m just too lazy to find out), I’ll define them for you:

Input Filtering: Scrubbing XSS-dangerous data out of your input before it gets saved anywhere.

Output Filtering: Scrubbing XSS-dangerous data only upon display.

Now, the most important concept here is that XSS is most dangerous when a user can see immediate results without alerting you, the web designer. So if you have a page that repeats their parameters back at them (say a search page where you put “Your search for $parameter could not be found”), that’s A) independent of input vs. output scrubbing, and B) by far the most dangerous kind of XSS vulnerability. Why? Because it allows an attacker to post a link to your site that executes malicious javascript in the browser of whoever clicks it. Bad, bad, bad.
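Here is a rough sketch of the fix for that search page, in javascript. The names escapeHtml and notFoundMessage are mine, invented for illustration, and the escaping shown is only meant for text dropped into the body of an HTML page (or a quoted attribute). It is not a complete XSS defense.

```javascript
// Hypothetical helper: escape the characters that matter in HTML text.
// The ampersand has to go first so later entities are not double-escaped.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical page fragment: the parameter is escaped at the moment it is echoed.
function notFoundMessage(searchParam) {
  return "Your search for " + escapeHtml(searchParam) + " could not be found";
}

console.log(notFoundMessage('<script>alert("xss")</script>'));
// Your search for &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt; could not be found
```

Nothing is stored here at all, which is the point: this kind of hole has to be closed where the value is echoed, whichever side of the input/output argument you’re on.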

After echoing user parameters is fixed, you have to look at how you display stored data. This is where the type of scrubbing comes into play – do you scrub the data before storing to your database / file system? Or do you only scrub when you’re about to display the data?

I will soon prove that input scrubbing is for pansies who are paranoid and tend to make up pathetic lies about their imaginary 20-year-old girlfriends.

Why input filtering is inefficient

  • It’s bad to store data in a display-specific way (have to unencode when displaying PDF, email/text reports, etc).
  • You have to modify other areas of code than just DB storage, such as searching (search for “<blah>” won’t yield “&lt;blah&gt;”), which may not be immediately obvious.
    • You could just auto-filter all incoming data, but there may be cases where you really can’t or don’t want to. I personally dislike blind filtering like this unless there is no better option.
  • If you have existing data, you have to check it for pre-existing problems. With large data, this can be very slow.
  • If you’re truly paranoid (as I am), you still won’t trust the DB data and will need to find a way to have input filtering work nicely with output filtering. This is a whole lot more work than just doing one or the other.
  • If you use a good MVC system like Rails, you can actually escape all text fields as they’re read from the database if you want. With a carefully written ActiveRecord plugin to Rails, I’d bet you could have all accessors automatically escape their data if it’s textual. And even provide a method for getting at the unsafe data.
    • I still don’t like such blind scrubbing logic, but better to blindly display scrubbed data than to blindly alter data before it hits your database.
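To make the search and display-format complaints above concrete, here is a toy sketch in javascript. The two arrays stand in for a database, and every name in it (escapeHtml, search, renderHtml) is made up for the example.

```javascript
// Hypothetical escaping helper used by both approaches.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const userInput = "how do I use <blah> tags?";

// Input filtering: scrub before storing, so the "database" holds HTML entities.
const filteredDb = [escapeHtml(userInput)];

// Output filtering: store exactly what the user typed.
const rawDb = [userInput];

// Naive substring search over the stored rows.
function search(db, term) {
  return db.filter((row) => row.includes(term));
}

console.log(search(filteredDb, "<blah>").length); // 0 (stored as &lt;blah&gt;)
console.log(search(rawDb, "<blah>").length);      // 1

// With raw storage, each output format applies its own encoding (or none).
function renderHtml(row) {
  return "<p>" + escapeHtml(row) + "</p>";
}

console.log(renderHtml(rawDb[0])); // <p>how do I use &lt;blah&gt; tags?</p>
console.log(rawDb[0]);             // plain text report: how do I use <blah> tags?
console.log(filteredDb[0]);        // plain text report: how do I use &lt;blah&gt; tags?
```

The last line is the PDF/email problem in miniature: the input-filtered row has to be decoded again before it is fit for anything that isn’t HTML.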

Why input filtering can be dangerous

  • If you can’t trust your programmers to do proper output filtering, why would you trust them to do proper input filtering?
    • Yes, input filtering is liable to be in fewer locations, particularly if you filter all incoming parameters, but it’s still not a silver bullet, and has a lot of long-term risks when mistakes do happen (read on for details).
  • Compare to output filtering in terms of the bug factor:
    • Bugs will happen. If you truly believe you don’t ever write code with bugs, then by all means ignore this section. I’ll get a good laugh when you tell me about your first big project that went from a two-week estimate to a six-month half-finished-and-then-rewritten-from-scratch project from hell.
    • If you mess up an output filter:
      • You probably have an issue that’s confined to a single area on your site (the area you messed up).
      • You do a quick hotfix, and the site is once again safe.
    • If you mess up an input filter:
      • Every area of the site that contains the data you missed is at risk.
      • You do a quick hotfix to stop anything new from coming in, but existing data is still currently at risk.
      • You find and quickly fix the very obvious offending data in the database.
      • You wait until the site is slow (or you can take it down) and run through all data entered since you suspect the exploit came into existence, fixing it record by record.
  • If future XSS issues arise, you have to retroactively fix your old data again instead of merely fixing your filter.
    • New XSS vulnerabilities won’t arise, you say? Maybe so, but how many times have we computer folk shot ourselves in the foot with presumptions about the future? (We’ll never need more than 640k memory, nobody will still be using this old software when y2k finally hits, etc.)
    • Note that XSS attackers have discovered that in some cases, the backtick character (`) will work to do specific JS-oriented attacks. This is not a character that is scrubbed by at least two different html_escape types of functions that I know of. Enjoy retroactive data-fixing? Me too!
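The backtick case makes a decent worked example of the retroactive-fix problem. This is only a sketch: escapeV1 and escapeV2 are hypothetical names for “the filter before you heard about backticks” and “the filter after”, and the payload string is just something with backticks in it, not a real exploit.

```javascript
// Hypothetical "before" filter: the usual five characters, no backtick.
function escapeV1(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical "after" filter: the same, plus the backtick.
function escapeV2(text) {
  return escapeV1(text).replace(/`/g, "&#96;");
}

const payload = "x` onmouseover=alert(1) `";

// Input filtering: the row was scrubbed with V1 when it was saved.
const inputFilteredRow = escapeV1(payload);

// Output filtering: the row was saved untouched.
const rawRow = payload;

// Now the filter gets upgraded to V2.
// The input-filtered row is still whatever V1 produced at save time:
console.log(inputFilteredRow.includes("`")); // true, old data needs a cleanup pass

// The raw row is protected the next time it is displayed:
console.log(escapeV2(rawRow)); // x&#96; onmouseover=alert(1) &#96;
```

With output filtering the fix is the one-line change to the filter. With input filtering the same change only covers data that arrives from now on.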

Why input filtering can be better (and my incessant arguments to prove that it really can’t)

The most logical argument I was given is that in a large enterprise, control of data output gets pretty tricky. So as far as I’m concerned, large companies are the only place the below issues even have a tiny bit of merit. And even then….

  • In a large enterprise, you know that nobody will inadvertently display unsafe data, because all data is safe.
    • Unless of course somebody writes a program that makes changes to the DB. Less likely than a rogue program that merely displays data, I agree, but still a possibility. In an organization that’s big enough to be at risk of multiple apps reading data that wasn’t built by the “proper” people, I’d say there is a definite risk that apps will be writing to said data as well.
    • At my job, there have been several cases where somebody who wasn’t even a part of IT (a manager and a content designer) modified data directly in SQL, bypassing any hope of safeguards.
    • In a large enterprise, I think it’s even more important than ever that all access to the DB goes through knowledgeable IT staff. Yes, I know this is a pipe dream, but I still think proper procedures can allow output filtering to be the clearly correct option.
  • You can detect problems with input filters more easily, because you have the data that could be dangerous right at your fingertips. If need be, write a program that periodically audits your data to check for unsafe characters. If you messed up an input filter, this program can save you.
    • Good testing does this same thing for output filtering. It’s far harder to write perfect tests for your app’s HTML output than to write a program to audit the DB for unsafe data, but it’s still the right way to do it.
    • Resource usage is wasteful in my opinion, when the resources are being used to prevent data from simply being stored in its original state.
    • If you have a large amount of data that is changing all the time, this solution may simply not be doable. In what situation would you have that much data changing that regularly? Oh… I don’t know… maybe in a big corporate enterprise?
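For what it’s worth, the periodic audit mentioned above could look something like this sketch. The record shape, the field names, and the list of “unsafe” characters are all assumptions on my part, and a real version would read from the database in batches rather than from an array.

```javascript
// ASSUMPTION: input-filtered data should never contain these raw characters.
// A bare ampersand is not flagged, since escaped data is full of entities.
const UNSAFE_CHARS = /[<>"'`]/;

// Hypothetical audit: returns the id and field of every suspicious value.
function auditRecords(records, textFields) {
  const findings = [];
  for (const record of records) {
    for (const field of textFields) {
      const value = record[field];
      if (typeof value === "string" && UNSAFE_CHARS.test(value)) {
        findings.push({ id: record.id, field: field });
      }
    }
  }
  return findings;
}

// Made-up rows standing in for a table.
const rows = [
  { id: 1, title: "Hello &amp; welcome", body: "fine" },
  { id: 2, title: "<b>oops</b>", body: "ok" },
  { id: 3, title: "ok", body: "it`s" },
];

console.log(JSON.stringify(auditRecords(rows, ["title", "body"])));
// [{"id":2,"field":"title"},{"id":3,"field":"body"}]
```

Note that this scans every text field of every row on each run, which is the resource cost complained about above.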