My Take on Passwords

I’ve been wanting to write about this for quite some time now. A lot of effort is put into making systems secure, and it all goes to waste when you choose a weak password. “123456789” is a weak password, “password” is a weak password, “god” is a weak password, your pet’s name is a weak password. A weak password is anything that is easy to guess for someone who knows you well, or for a computer. A password that is only a few characters long is a weak password no matter how many symbols or strange characters you use, because a computer can guess it easily. Nonetheless, websites all over the web want you to choose a password that is at least N characters or at most M characters long; a password that contains symbols, but does not contain your name, or part of your email, or your user name, and the list goes on. This is damn stupid.

Over the years, we’ve been trained to choose really bad passwords. We’ve been led to believe that “m00Npi3” is a strong password because it is over 4 characters long and has weird characters. Sure, your friends may not be able to guess it, but a computer could do it rather easily. However, we go on about our lives believing that is a good password, and then we use it for everything. Websites all over the web warn us not to use the same password everywhere, but who wants to remember a hundred passwords? Sure, you can use a plugin on your browser that remembers the passwords for you, but what about when you need to access your email from the public library, or from your friend’s house? Good luck!
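To put rough numbers on “a computer could do it rather easily”, here is a minimal Python sketch. It assumes the attacker simply tries every possible string, and that each character is picked at random, which overstates the strength of anything built from real words. The function name and the alphabet sizes are my own choices for illustration.

```python
import math


def brute_force_bits(length: int, alphabet_size: int) -> float:
    """Bits of work needed to try every string of this length.

    Assumes every character is chosen independently and uniformly at
    random, so treat the result as an upper bound, not a measurement.
    """
    return length * math.log2(alphabet_size)


# "m00Npi3": 7 characters drawn from upper case, lower case and digits (62 symbols).
short_bits = brute_force_bits(7, 62)

# A 20-character password made of nothing but lowercase letters (26 symbols).
long_bits = brute_force_bits(20, 26)

print(f"7 chars, 62 symbols:  about {short_bits:.0f} bits")
print(f"20 chars, 26 symbols: about {long_bits:.0f} bits")
```

Under these assumptions, length adds far more work for the attacker than a few weird characters do.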

Yet, we’ve been led to believe this is all for our own good.

I mentioned I’ve been wanting to write about this for a long time, but today PayPal was the last load of crap that I was willing to take before hitting the keyboard. I logged into my account, and PayPal kindly suggested that I should change my password. I started changing my passwords last week, so I thought “hey, perfect timing!”. I clicked the link they provided, and I was taken to a page that asked me to confirm that I was who I said I was by providing either my bank account number, my credit card number, or my debit card number. WTF? Why? I’m already in my account! But OK, let’s just pretend that this actually makes sense, because after all I could be an attacker trying to hijack someone else’s account. So I filled out my information, and I was taken to another page that asked me for my current password, my new password, and a confirmation of my new password. I went on and entered my new password. PayPal told me that my password was too weak because so far I had entered only letters, but I didn’t mind, I knew the juicy stuff would come in a little bit. But then, all of a sudden, PayPal said that I had entered all the allowed characters, which are not many (20). WTF? Why? Why can’t I have a long password? Why?!? PayPal just made me less secure by limiting the number of characters I can use for my password. Are they going to start charging for extra characters now? I would pay 1 cent apiece, no kidding, as long as I could get a longer password, but then that would be something really bad, wouldn’t it? Imagine a company that charges you to let you choose the password you want. Wouldn’t that be something?

Anyway, I decided to leave my current password as it is. Thanks PayPal!

What is the Big Deal?

Twenty characters are enough for a password, aren’t they? After all, people want to get 4-letter passwords so they don’t forget them, but that is just stupid. I can see a valid reason to set a minimum number of characters, but why limit the maximum?

You may be wondering why it is such a big deal for me. Let me explain how I set my passwords.

Choosing Long-A** Passwords that You Can Remember

I start by choosing something memorable to me. For example, I really like the movie V for Vendetta, so I may want to use a base for my password like:

Remember remember the fifth of november

The first problem I see is that there are spaces, and for some stupid reason a lot of websites don’t want you to use spaces in your password, so let’s fix that:

Rememberrememberthefifthofnovember

There you have it, 34 freaking characters, and this is just the base of my password. I should note that by removing the spaces I just made it harder to type, which is a bad thing. I’ll explain why later.

Now that we have a strong base, let’s add a few other characters. I will use a memorable date, for example. Note that I’m just choosing a random date here in this article, but in real life I would choose a really memorable date that few people know, such as the date of your first kiss, if you remember that.

Rememberrememberthefifthofnovember+12092000+

We are now at 44 characters, and our password has uppercase letters, numbers, and non-alphanumeric characters. This base is easy to remember because I’m using a memorable phrase and a memorable date, and the strange characters are just separators. In fact, you could use them in the phrase as well:

Remember+remember+the+fifth+of+november+12092000+

49 characters so far. Now, let’s make it unique for each site:

Twitter:

Remember+remember+the+fifth+of+november+12092000+bluebird

Facebook:

Remember+remember+the+fifth+of+november+12092000+oldfriends

Email:

Remember+remember+the+fifth+of+november+12092000+spamspam

Computer:

Remember+remember+the+fifth+of+november+12092000+stupidmachine

Good luck trying to guess those passwords, even with a computer. However, PayPal won’t let me use any of that. What a stingy website. They will only give me 20 characters. What am I supposed to do with that?
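The recipe so far is mechanical enough to write down as code. This is only a sketch of the steps described above, in Python; the function name and its parameters are made up for illustration, and the “+” separator is just the one used in this article.

```python
def build_password(phrase: str, date: str, site_word: str, sep: str = "+") -> str:
    """Join the words of the phrase, the date, and a per-site word with a separator."""
    parts = phrase.split() + [date, site_word]
    return sep.join(parts)


PHRASE = "Remember remember the fifth of november"
DATE = "12092000"  # the random example date from this article

for site, word in [("Twitter", "bluebird"), ("Email", "spamspam")]:
    password = build_password(PHRASE, DATE, word)
    print(site, len(password), password)
```

The point of the sketch is that the only thing that changes from site to site is the last word, so there is very little to remember.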

Following this recipe makes it easy to create long passwords that are easy to remember, and extremely hard to guess. Not only that, but it makes it easy to change them too. For example, if I wanted to update my Twitter password, I could just add something to it, which makes it even stronger:

Remember+remember+the+fifth+of+november+12092000+bluebird+new+fing+password!

That is 76 characters long, and I can guarantee you that I will remember it tomorrow without having to memorize it, because it is made up of stuff that I already know. But I sure won’t remember this:

“SIgz@OHis4!,Erw”

Which is a password generated by a random password generator, which by the way, says that it is “easy” to remember as:

“SIERRA INDIA golf zulu @ OSCAR HOTEL india sierra 4 ! , ECHO romeo whiskey”

WTF?

But a lot of websites recommend that you use one of those random password generators.

Hard to Write, Hard to Remember; Bad Combo

I mentioned before that by not letting me use spaces, websites make passwords harder to type, and that is not good. The reason is that if my password is hard to type I will have to either type it slowly, or attempt to type it a few times. This is bad because it gives people time to see what you type. You should be able to type your long-a** password at lightning speed. I don’t care if your system doesn’t take spaces for some stupid reason, fix that on your end. Get rid of my spaces before sending my password, or even better, fix your stupid system! I should be able to use as many characters as I want, and any of them. The password needs to be easy for me to remember, but hard for people and computers to guess. However, a lot of websites force me to create passwords that are easy for computers to guess and hard for me to remember and type.
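“Get rid of my spaces before sending my password” really is a one-liner. Here is a minimal Python sketch of the idea; the function name is hypothetical, and I’m assuming the site applies exactly the same stripping both when the password is set and every time it is checked.

```python
def strip_spaces(typed_password: str) -> str:
    """Return the password with every space character removed."""
    return typed_password.replace(" ", "")


typed = "Remember remember the fifth of november"
print(strip_spaces(typed))
```

That way I get to type the phrase naturally, and the system that can’t handle spaces never sees them.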

I don’t think there is any need to mention this, but if my password is hard to remember, then I’m already in a bad situation because I will have to write it down somewhere.

Why are We Using Passwords Anyway?

Seriously, why haven’t we come up with a better way? Oh, right, we have. There is OpenID, and Mozilla Persona, to mention a couple, but even those are not the perfect solution. There has to be a better way, and if we look hard enough we will find it. But we’ve settled for less. We have accepted passwords as the one way to do authentication, and to make it worse, we have made it hard for people to use passwords, and we have misled them into believing that a good password should be hard for them to remember, type, and guess. That is why people think that a random number is a good password, even if it consists of only 5 digits.

There is a lot more I can write about passwords, but the ultimate thought would be that we need to get rid of them. However, as bad as it is, we have to stick to passwords for now, but I wish websites would at least make that easy, and safe.

Finally, you should check out this comic by XKCD:
http://xkcd.com/936/

About Namespaces

Last Wednesday I had the first meeting with my C++ study group. If you read my last post, you will know what I’m talking about. In this first session, I noticed the professor had a little bit of trouble explaining namespaces mainly because explaining namespaces in 5 minutes to people with little programming background is hard. In this post, I will try to explain namespaces taking more than 5 minutes, and hope that it may be helpful for some people.

I think my first encounter with namespaces was back when I was doing Flash and ActionScript. They also came up when I was studying XML, and again in PHP and Python. JavaScript does not have namespaces, but it has pseudo-namespaces that help us accomplish the same basic task. Since the implementation of namespaces varies from language to language, this post is language agnostic, concentrating on the concept rather than on any particular implementation.

Identifiers

Before we can actually talk about namespaces, we need to look at a different concept: Identifiers. Understanding identifiers is key to understanding other concepts, among which is that of namespaces.

Identifiers are nothing more than names that we give to variables, constants, functions and classes (there are other “things” that we also give names to, but I’m not going to talk about them for the sake of simplicity). When we declare a variable, we normally do it using a name, and then we assign a value. When we create a function, we normally give it a name that we can use to invoke the function later. When we declare a constant, we give it a name so we can refer to it later. When we create a class, we give it a name so we can instantiate it later.

All of these names that we give to “things” are identifiers, and they should be unique to the thing we are naming. There are cases where the name is not unique, like in the case of overloading, but for now, let’s not consider those cases.
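Even though this post is language agnostic, a tiny example helps. Here is a sketch in Python with one identifier of each kind mentioned above; all of the names are made up for illustration.

```python
MAX_RETRIES = 3        # a constant (by convention only in Python), named MAX_RETRIES
greeting = "hello"     # a variable, named greeting


def shout(text: str) -> str:      # a function, named shout
    return text.upper() + "!"


class Greeter:                    # a class, named Greeter
    def __init__(self, name: str) -> None:
        self.name = name

    def greet(self) -> str:
        return f"{greeting}, {self.name}"


# Each identifier is used later to get back to the thing it names.
greeter = Greeter("Ada")
print(shout(greeter.greet()))
```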

Scope

Scope is another concept we need to understand. If you understand scope, you’ve understood half of closures, another important and powerful concept available in some programming languages, but we will talk about that some other time.

I like to think of scope as the reach that an identifier has within a program. Although this is not the proper definition, and it may not be the most accurate, it is one that I’ve found makes it easy to understand the concept. Simply put, the scope of an identifier is every place where you can use that identifier to find the entity that it refers to, the entity being the value of a variable, a function, a class or the value of a constant.

There are different kinds of scope, and knowing the kind of scope that applies to the language that you are learning is important in order to become a better programmer. For example, JavaScript has function scope for variables declared with var, meaning that all such variables declared within a function are available only within that function. Knowing this can save you from a lot of headaches. Some other languages, like C, have block scope, meaning that a variable is available (sometimes also called visible) only within the block where it was declared, the block being delimited by curly braces ({ and }). There are other kinds of scope, but we don’t need to know all of them. We only need to understand what scope is.
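To see the difference between the two kinds of scope, here is a small Python sketch. I’m using Python only because its blocks (if, for, and so on) do not create a new scope while its functions do, so it shows both sides at once. The function and variable names are made up.

```python
def classify(n: int) -> str:
    if n % 2 == 0:
        label = "even"   # assigned inside the if block
    else:
        label = "odd"    # assigned inside the else block
    # A block does not create a new scope in Python, so label is
    # still visible here, after the if/else has ended.
    return label


print(classify(4))

# Outside the function the name does not exist at all.
try:
    label
    leaked = True
except NameError:
    leaked = False

print("label visible outside classify:", leaked)
```

In a language with block scope, like C, a variable declared inside the braces of the if would not be usable after the closing brace.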

Scope by Example

Suppose you have a function f1, and a function f2 declared in a language that has function scope. Now, in f1, you declare a variable called v1, and in f2 you try to reference that variable. What you will get is an error because the variable v1 is local to the function f1, which prevents function f2 from accessing v1. That is scope.
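Here is that same example as a minimal Python sketch, using the names f1, f2 and v1 from the paragraph above. One assumption worth noting: in Python the error only shows up when f2 is actually called, while other languages may refuse to compile the program at all.

```python
def f1() -> int:
    v1 = 42       # v1 is local to f1
    return v1


def f2() -> int:
    return v1     # f2 has no v1 of its own, and there is no global v1


print(f1())

try:
    f2()
except NameError as error:
    print("f2 failed:", error)
```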

Namespaces

Now that you have an understanding of identifiers and scope, we can begin to talk about namespaces. First, let’s consider the following problem:

Suppose you are working on a program that has the functionality of an email client, and of a news reader. At some point during the program’s execution, you will need to fetch information from the net such as news and emails. You have divided your program into two different parts. One is the part that handles all of the email functionality, and the other one is the one that handles all of the reader functionality. Since you need to fetch information from the net, you have created two classes, one for each part of the program. One class fetches information from the email server, and the other one from the news server. You call both of these classes Fetcher.

I hope at this point you can see the problem, but if you can’t, look a bit closer and realize that we have two classes named Fetcher, and this will be a problem.

We will assume that the scope of these classes is global, meaning you can reach them from anywhere in the program. Depending on the language and the compiler, you can have one of multiple possible errors, but they all boil down to the same issue. You have two classes that are named the same, and no way to distinguish one from the other.
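As a sketch of what “no way to distinguish one from the other” looks like, here is the situation in Python. The method name and the strings are made up. Python happens not to raise an error here; it quietly lets the second definition replace the first, whereas a compiled language such as C++ would typically reject the second class as a redefinition.

```python
class Fetcher:
    """Written for the email part of the program."""

    def fetch(self) -> str:
        return "emails from the mail server"


class Fetcher:  # same name again: this definition replaces the one above
    """Written for the news part of the program."""

    def fetch(self) -> str:
        return "articles from the news server"


# There is now only one Fetcher, and the email version is unreachable by name.
print(Fetcher().fetch())
```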

Why would anyone do that? Why not just call them EmailFetcher, and NewsFetcher? Well, sometimes you did not write those classes. Maybe you are using a library that comes with those classes, so you have no real say in what the classes are named.

To solve that problem, some clever people came up with namespaces. Namespaces are, as the name suggests, spaces for names, or identifiers. Namespaces help us distinguish between two identifiers that have the same name but that belong to different contexts.

The simplest example of namespaces is computer directories. A computer directory, most commonly known as a folder, is a place where you can store files. Suppose you have a directory dir1. Inside dir1 you create a file called myFile.txt. Now, what would happen if you want to create a new file, with different content, but that is also called myFile.txt? The computer will either complain that you cannot have the same name on two different files, or worse, overwrite the original file. One solution to this problem would be to create a second directory called dir2, and in dir2 put the new myFile.txt file. Now you have two files that have the same name, but in different contexts (think of every directory as a separate context).

This is clearer with a more real-life-like example. Suppose you have your Music directory, where all your music is stored. Two of your favorite bands have a song called “The Wild Loop”, so the file has a name like the_wild_loop.mp3. What happens when you add both of these songs to your Music library? Do they get overwritten? No. What happens is that your music player program may be using a different directory to store the songs of each band.

Let’s say one of the bands is called The Pythonist, and the other one is called The Rubyist. The files in your Music directory may be organized like this:

Music
 |- The Pythonist
    |- Album Name
       |- the_wild_loop.mp3
 |- The Rubyist
    |- Album Name
       |- the_wild_loop.mp3

As you can see, you have two the_wild_loop.mp3 files, and not only that, but you also have two directories called Album Name, but one is in the context of The Pythonist, and the other one in the context of The Rubyist.
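The same idea can be modeled in a few lines of Python. This is only a sketch: the dictionaries stand in for a file system, and the stored strings stand in for the songs.

```python
from pathlib import PurePosixPath

# Keyed by bare file name: the second song replaces the first.
flat = {}
flat["the_wild_loop.mp3"] = "song by The Pythonist"
flat["the_wild_loop.mp3"] = "song by The Rubyist"

# Keyed by full path: the directories are the context that keeps them apart.
library = {}
for band in ("The Pythonist", "The Rubyist"):
    path = PurePosixPath("Music") / band / "Album Name" / "the_wild_loop.mp3"
    library[path] = f"song by {band}"

print(len(flat), len(library))
for path in library:
    print(path)
```

Both entries in library end in the same file name, but the full paths are different, so nothing gets overwritten.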

That is basically what namespaces do in programming languages. They allow us to have entities referred by the same name but in a different context. This context lets us distinguish one entity from the other.
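Going back to the two Fetcher classes, here is a minimal Python sketch of how a namespace keeps them apart. The outer classes mail and news are hypothetical stand-ins for namespaces; in real Python code they would normally be two separate modules, and other languages have their own syntax for this.

```python
class mail:
    """Stand-in for the email namespace."""

    class Fetcher:
        def fetch(self) -> str:
            return "emails from the mail server"


class news:
    """Stand-in for the news namespace."""

    class Fetcher:
        def fetch(self) -> str:
            return "articles from the news server"


# Same identifier, two contexts, and no ambiguity about which one is meant.
print(mail.Fetcher().fetch())
print(news.Fetcher().fetch())
print(mail.Fetcher is news.Fetcher)
```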

Since my study group is about C++, I will publish another post about the implementation of namespaces in C++ later today or tomorrow, but I hope this article has been a good introduction to namespaces.

Doing Responsive Design, but Doing it Wrong

I don’t do a lot of front end. Most of the time I’m doing backend, and database related stuff with PHP and MySQL. I have a draft about my complaints regarding PHP, but I will finish that later. Today I want to rant about responsive web design.

When it comes to responsive web design, I think we’ve made a lot of mistakes. I remember my first attempt at doing responsive web design. It was awful, but I quickly realized that I was doing it wrong. The problem, I believe, is that none of us really knew how to do it, and we were learning from people, who, like us, didn’t have any idea of what they were doing. I remember reading articles that talked about the iPhone resolution, the iPad resolution, portrait mode, landscape mode, etc. Few of them even bothered to mention other devices. Quickly, responsive design became just another way of saying iPhone/iPad optimized design.

Being a firm believer that the web should be for all, that really touched a nerve. I’ve always hated the fact that designers give too much attention to Apple products, and neglect the rest of the market. Regardless of the market share of Apple products, designers should not be neglecting the rest of the market.

I want to make a little parenthesis here to mention something I read a few days ago that really made me think. At a forum, someone was complaining that a certain thing was not working on IE, and he said “Don’t tell me to use another browser. IE users are as important as other browser users”. Not IE 6/7 users though, they should be burnt with fire.

I wanted to make that little parenthesis there because the same thought can be applied to the rest of users. The fact that I don’t use an iPhone doesn’t make me a less important user to your site. The web was meant to be inclusive, not leaving out anybody just because they don’t want to or can’t buy a certain device.

I don’t think we should provide support for every possible device/platform. There are moments when support simply needs to be dropped in order for technology to continue moving forward. We don’t support IE 6 anymore. We barely support IE 7, but we don’t go out of our way to make it work and look the same as modern browsers. IE 8 is going to be out of support soon too, so this is not about supporting all the things, but rather supporting as many users as possible.

One of the things that I think is wrong with responsive design is the idea of breakpoints. A user on Stack Overflow goes as far as saying:
“960 x 800 x 768 x 640 x 480 x 360 are the sizes you must follow for responsive web design”
Please notice that “must”. I think this is the worst answer I’ve seen in all my research on responsive design.

Breakpoints are a bad idea because they assume certain sizes in a world of ever increasing device sizes. We have huge screens that are widely used with massive resolutions, and we have small screens that are also widely used with tiny resolutions.

Sitting in your corner of the San Francisco cafe that you like to frequent, it is easy to forget about the rest of the world. You see people on their iPhones, their iPads, and MacBook Airs, and you assume it is the same everywhere else. So, why bother supporting other devices, right?

On the other hand, if you like to go to Tom n’ Toms in Koreatown L.A., you will begin to think you should support the latest Galaxy phone, and huge-screen laptops.

The point is that no matter where you are, it is easy to forget about the rest of the world. There are people in less modernized cities using cheap smartphones with tiny screens. I’ve seen many phones, of who knows what brand, that run Android and have small screens. A lot of these run on Metro PCS, so you know that their users have internet access, and they are using it. I wonder how your beautiful site with preset breakpoints looks on those phones.

People say you should design mobile first. Has anyone really done that though? And for that matter, what does mobile first even mean? Does it mean small screens? Low bandwidth? Low processing power and memory? Or is it all of the above?

Have you ever started a design at 350px wide? How did that work for you? Is mobile first really a good idea? Who knows. All we know so far is that we are trying to do responsive design, and many are doing it wrong.

The fact that devices all have different resolutions hints that breakpoint-based responsiveness is not a good idea, but there are times when you want to have that. My advice is to stop thinking in terms of breakpoints, and think in terms of intervals. Rather than trying to make your site look good at 800px wide, make it look good anywhere between 500 and 800px wide. That way you end up with a site that looks good on any resolution.

What is the difference?
Think of a site that has the following breakpoints:
1500 – max
1024 – it was popular a few years ago
800 – some tablets, and landscape mode smartphones.
600 – some smartphones
400 – older phones, or the Metro PCS ones I was talking about.

This site will look good at those resolutions, but what would happen if there is a user with a device that is 700px wide? The site, which is optimized for 800, would probably not display correctly because the site is 780px wide to allow a 10px margin on each side.

Now, think of a site that is designed with intervals. It would have the following intervals:

1025-1500
801-1024
601-800
401-600
300-400

Suddenly, the site has to look good on screens between 601 and 800px wide, which means the site will have to fit anywhere in that range, and thus it will cover the user with the 700px wide tablet.
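To make the arithmetic concrete, here is a small Python sketch that models the two approaches. A real site would express this with CSS media queries, not Python; the layout names and function names are made up, and the numbers are the ones from the lists above.

```python
FIXED_SITE_WIDTH = 780  # the "800" layout minus a 10px margin on each side

# (low, high, layout name); the layout names are hypothetical.
INTERVALS = [
    (1025, 1500, "wide"),
    (801, 1024, "desktop"),
    (601, 800, "tablet"),
    (401, 600, "phone"),
    (300, 400, "small-phone"),
]


def overflow(viewport_width: int, site_width: int = FIXED_SITE_WIDTH) -> int:
    """Pixels of a fixed-width layout that do not fit in the viewport."""
    return max(0, site_width - viewport_width)


def layout_for(viewport_width: int):
    """Return the interval layout that covers this width, or None."""
    for low, high, name in INTERVALS:
        if low <= viewport_width <= high:
            return name
    return None


print(overflow(700))     # the fixed 780px layout on the 700px device
print(layout_for(700))   # the interval that has to cover that device
```

With the breakpoint design, 80px of the site falls off the 700px screen. With intervals, 700 lands in the 601-800 range, so the layout built for that range has to fit it.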

However, a better way to do responsive design is to forget about breakpoints and intervals altogether, and adjust the site as needed. It is very simple: take your site, start re-sizing the browser window, and add media queries as needed.

I’ve seen many templates that have media queries with a comment like “iPhone 3G”, then another set of media queries with a comment like “iPhone 4s”, and another one with a comment of “iPad mini”, etc. That is the worst way of doing responsive design. In my experience, the best way of accommodating all possible devices is to stop thinking about devices, and just build a site that will render nicely all the way from 350px to 1500+px. This means that when you set your browser window at 2000px, and start re-sizing all the way down to 350px, the site should look good at all times. That is real responsive design.

No matter if you use intervals for fixed-width sites, or if you use fluid sites with just the necessary media queries, your sites should display nicely even if I set my browser at 437px wide in my 1920x1080px screen just because that is the way I roll.

Finally, stop assuming. I know a person who, when I point out that his site looks a little cut off, says most people use huge screens nowadays, so it should not matter. I’ve told him, please don’t put those notifications on the side of the site because in small resolutions they don’t display well, and he says, “People use big screens, so it should not matter”. I’ve told him, your site is weird because if I want to select a new option, I have to scroll up, select the option, and scroll down again, and he says “It’s OK, people use huge screens, so it should not matter”. I’ve told him, that new map feature is cool, I can hover over a name, and the map gets highlighted, but it is too big, I can either see the names or the map, but not both at the same time, so it defeats the purpose, and he says “It’s OK, people use huge screens so it should not matter.”

I use a pretty decent size screen, yet I have these problems. Why? Because I don’t use the browser at full width. I have it at around 1000px right now, and that is how I like it. I will not re-size my window so I can view your site. I will not change my habits just because you prefer to assume most people have a huge screen, and they have nothing better to do than to waste the huge screen with a full-sized browser. I will not wait for you to update your site. Whatever you offer, there are 1000 more people offering it too, and some of them care about people enough to create something that can be used on my 1000px wide window. You should never assume people will not see your 800px layout on a non-mobile device. So, stop looking at the headers of the request to determine if the person is using a mobile device. Always serve the content they ask for, and make sure your site is capable of adjusting itself to fit nicely in their window, regardless of the size of it. Just because the headers sent by my browser don’t say Android, it doesn’t mean I’m viewing your site at more than 1024px wide.

Lastly, optimize, optimize, and optimize. Don’t tailor your site to the user with the best bandwidth, the latest device, and the biggest screen. Your aim is not to make your site look good to them, but to make it look good, and work well, for your user with the lowest bandwidth, the smallest screen, and the simplest device. I’m not saying support that IE 6 user, but don’t ignore the users who can’t connect to your site from the best device out there. Knowing when to stop supporting a certain device, platform, or browser is a tricky thing, but as a general rule, the 2 to 3 latest versions, and 2 to 3 year old devices should be the limit, and remember, supporting retina displays is not as important as supporting 400px wide devices.

Design for people, not for devices.

The Dream of Semantic Web

When I first started learning web development, there was a huge emphasis on semantics. Back then there was no HTML 5, and XHTML was the hype. If I remember correctly, it was around early 2006. People back then seemed to be talking about semantics, and content vs presentation. CSS was on everyone’s mind, and there was a huge push to move JavaScript out of HTML. I learned HTML, and some PHP back then, and then I stepped away from the web scene. I used to take my text editor out every once in a while and write some code, but none of it ever did anything useful. By the end of 2009, I lost my job at a club, and was fortunate enough to get my first client almost right away. My first gig as a web programmer was to develop an online comic viewer which pre-loaded the 3 pages following the one you were currently viewing so that the experience was as smooth as possible. I wish I still had the source code.

By the time I came back to the web scene in January 2010, a lot had changed. HTML 5 was beginning to make a reputation, and the talk of semantics was not as loud anymore. I don’t believe semantics ever really took off, and even now, there are people trying to push semantics into the web, but honestly, I don’t feel like there is a lot of interest from most developers. I am not talking about famous people that give talks around the world. I am talking about the vast majority of developers; the ones that build the site for the local supermarket. These developers care about getting the job done, and getting paid. For them semantics is such a pain in the neck.

Why Semantics

I believe it is sad we don’t care about semantics. Semantics are a very important part of web authoring. They give meaning to what is otherwise a bunch of markup, and text. For you and me, it may be easy to look at a phone number on the screen and know that it is a phone number, and to know to whom it is related. Our brains are powerful enough to make the connection, to identify patterns, and to know that what they are looking at is a phone number, but computers are dumb. They need semantics to make the connection. They may be able to identify that something is a phone number based on patterns, but they don’t know what that phone number means. They don’t know to whom it belongs, or what in the page is related to that phone number.
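Here is a minimal Python sketch of that limitation. The pattern is a made-up one that only covers a single US-style format, and the sample text is invented, but it shows the point: the computer finds the numbers and still has no idea whose they are.

```python
import re

# Hypothetical pattern for one common US-style format, e.g. 555-123-4567.
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

page_text = "Call the store at 555-123-4567 or reach the owner at 555-987-6543."

# The pattern tells us *that* these are phone numbers, not *whose* they are.
numbers = PHONE_PATTERN.findall(page_text)
print(numbers)
```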

Another example would be a sale. A human can visit an online store and see that there is a big banner announcing a sale. A human will know that there is a sale, but a computer won’t, unless there is some kind of semantic bit that tells it that the banner is for a sale.

I hope you see why semantics are important. However, make no mistake, I’m not here trying to convince you to care about semantics. On the contrary, I’m just here to rant about how hard it is to actually implement semantics in the web, and ultimately to argue that we may be wrong in our efforts towards a semantic web.

I appreciate the semantics of the web, not as a developer, but as a web user. When I say web user, I don’t mean a web parasite: someone who just goes to YouTube, Facebook, and Twitter, or who spends all day on funny sites looking at memes and cats. What I mean by web user is someone who uses the web to gather information, do research, and collect data. When you do these things, you appreciate the importance of semantics, because better semantics means you can automate your search for information. When there are good semantics, a computer can talk to a website, and understand what it is saying without human interaction. That is the dream.

Imagine a web where you can sit and ask the computer, “What is the phone number of my closest pizza parlor?” The computer then connects to the internet, searches for pizza parlors close to your current location, grabs the phone number, and gives it back to you. We have things like this today, or at least very similar. The mobile industry is pushing hard towards this kind of interaction with the machine, and the only way this can be possible today is by incorporating semantics so that computers can talk to each other, and identify the information they need. Semantics tell the computer what a piece of information means.

Maybe it is because when I learned web development there was a huge emphasis on semantics that I believe that a good developer implements semantics. However, as a developer I have internal debates all the time regarding semantics, mostly because semantics get in the way of me developing happily.

Classes for Semantics

The problem, I believe, is how we try to implement semantics. The most basic step towards the semantic web is in the elements we use. Once we have used the semantically correct elements, we rely mostly on classes to give meaning to the data within those elements’ tags. However, classes are also the main means of specifying styles. Since most of us care more about getting the job done than we do about semantics, we usually end up creating classes like “inner_content”, “main_content”, “column”, “sidebar”, and others like them. Some of these classes are arguably semantic, and this uncovers one of the main problems with semantics: semantics are very subjective.

Because semantics are basically the meaning of things, they depend a lot on the interpreter. Something can be semantically correct for one person, but not so much for another. Let’s take, for example, the class name “main_content”. One could argue that it is semantically correct, since it specifies that the content of the element is the main content in the document, but it could also be argued that it provides no useful information on what this content is about, which is basically what semantics should do. Another example could be the class name “phone_number”. It is very semantic, because it specifies that the content of the element is a phone number, but at the same time, it does not provide any information regarding whose phone number it is. We could argue that if we want real semantics, we should use the class name “company_phone_number”, or “president_phone_number”, or “store_phone_number”.

However, if we use many class names like these, we end up with a document that is hard to style. A quick solution would be to use two class names for each phone number. One would specify that the element represents a phone number, and the other one would specify what the phone number means, or to whom it is related. But if we do that, we may end up with documents that have thousands of different class names. To overcome this issue, we have created standards such as microformats.
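To show what the two-class idea buys a program reading the page, here is a rough Python sketch using the standard library HTML parser. The markup, the second class names (“store”, “president”) and the PhoneCollector class are all made up for illustration; this is not how microformats are actually specified.

```python
from html.parser import HTMLParser

# Hypothetical markup: one class says "this is a phone number",
# the other says whose phone number it is.
MARKUP = """
<p>Store: <span class="phone_number store">555-123-4567</span></p>
<p>President: <span class="phone_number president">555-987-6543</span></p>
"""


class PhoneCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.phones = {}      # owner -> phone number
        self._owner = None    # set only while inside a phone_number element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "phone_number" in classes:
            owners = [name for name in classes if name != "phone_number"]
            self._owner = owners[0] if owners else "unknown"

    def handle_data(self, data):
        if self._owner is not None:
            self.phones[self._owner] = data.strip()

    def handle_endtag(self, tag):
        self._owner = None


collector = PhoneCollector()
collector.feed(MARKUP)
print(collector.phones)
```

With the second class name in place, the program knows not only that it found phone numbers, but also to whom each one belongs.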

False Semantics

The number one rule I learned regarding class names is that a class should specify what the element is about, not what it should look like. Following that principle, I try to avoid using class names such as “blue_button”, “floating_bar”, “left_column”, and the like. But this is not always possible. There are times when you need to design an element without knowing what content is actually going to be placed in it. Think, for example, of templates. We design templates, and we mean that some element should ideally contain a certain kind of content, but we don’t know what is actually going to be in there. This can create false semantics.

Imagine that you develop a template for a portfolio. The template has a section for portfolio items. These items have an image, a title, and a description. You give these elements a class name of “portfolio_item”. Someone sees the template, and thinks that the portfolio section would be great for creating a directory. The portfolio item image could serve as the person’s picture, the title as the person’s name, and the description could be used to enter the person’s details. There is now a site that has false semantics. This does happen in real life.

This raises the question: is class-based semantics really a good idea? Regardless of the answer to that question, one of the most widely adopted initiatives toward a semantic web, microformats, uses classes extensively to provide semantic meaning to data. To be fair, regardless of how we implement semantics in our documents, there will always be the risk of false semantics when working with templates. However, one of my main concerns with class names for semantics is that classes are also used to style elements.

Class Names to Describe Structure

As I mentioned earlier, one of the main rules regarding class names is that they should describe the contents of the element, not the visual representation. However, sometimes you need elements purely for the sake of design. Think of instances when you need to have columns available. You will most likely create some sort of markup to use every time you need columns. This markup needs some kind of styling, which means you will most likely add class names to the tags in the markup, but these classes describe the visual representation of the element, or at the very least they describe the type of visual structure that these elements create. For example, some of these classes may be “columns_container”, “column_one_of_four”, or “column_two_of_three”. These class names have no real semantic meaning. They simply describe how the elements are rendered in the page.

What is the Difference?

By thinking of that example, you may ask what the difference between a class name of “column” and one of “phone_number” is. For a person like me, who has invested a lot of time thinking about semantics, the difference is obvious, but for some people it may not be.

The easiest way to identify whether a class name is semantic is to see if you can guess what the element contains just by looking at the class name. If you cannot tell exactly what content the element has, then the class name is not semantic enough. The class name “phone_number” tells you right away that the content is going to be a phone number, but the class name “column” doesn’t tell you what the content is, only that the content will most likely be displayed in a column. Thinking like this makes it really easy to see why a class name of “main_content” is not very semantic. It tells you that the content is probably the most important part of the page, but you don’t really know what that content is about.

However, we need those kinds of non-semantic classes if we want to come up with beautiful websites with complex structures. Back when the internet was still a baby, there was not much need for beauty in sites. The web was mostly a way to share information. Websites were really simple, usually consisting of only text. It was mostly meant for the exchange of scientific information. However, as the web became available to a wider range of the population with different needs and likes, it was necessary to implement ways for developers to add beauty to the web. But the web was initially not meant to be like it is today. It evolved on its own, and it may have done so in the wrong way.

I believe that if we were to invent the web today, with the knowledge that we’ve gained, we would do it much differently. However, this is one of those situations where we need the experience we gained from the mistakes we made.

A Step in the “Right” Direction

I think the best move we’ve made so far in regards to semantics was the creation of XML. XML is hated by a lot of people, and for good reasons. XML is hard. I’m sure many of you will say XML is very easy. What most people know about XML is:

It is a markup language.
You create your own tags.

I can see why they think it is easy. But if you actually read the XML documentation, you will see that it is not as easy as it seems. XML is hard to work with and hard to parse; you need to write DTDs; it is very verbose and hard for humans to read; and it is easy to make mistakes in it, among many other reasons. But the one thing it got right is that you define your own tags. This means you can tackle semantics really well. But the closest we’ve come to XML on the web was XHTML, and that didn’t end very well.

Semantics is a Dream

Semantics is just a dream. We like to think that somewhere, somehow there are people who have the key to a semantic web, and to turning iron into gold. The truth is that semantics is most likely not possible to accomplish with the current approaches and technology. That is not to say you should not care at all. A little semantics is better than no semantics. But a pure semantic web is far from our reach.

A New Hope

One of the main problems with semantics is that it leaves the low-level work to the author. Programmers are almost inherently lazy when it comes to doing repetitive tasks. The sound of semantics doesn’t really appeal to a programmer because it requires manually specifying the meaning of data. It is true that the web is not entirely built by programmers. There are a lot of people out there who only know html and css, and they are building websites. Moreover, there is also a vast number of people who don’t really know html and css but who are also building websites. The first group may be willing to implement semantics, but they mostly care only about the paycheck. The second group doesn’t care at all about any of this. They haven’t even cared enough to learn to create proper markup. So, the three kinds of people that build the web are not really interested in investing time in low-level, repetitive tasks such as implementing semantics. I believe it is at this point that we should look for other alternatives such as machine learning, natural language processing, and content recognition algorithms. If our goal is really to create a method for computers to understand data without human interaction, the approach should not be trying to create languages that computers can understand, but creating computers that understand human language. While semantics may be a dream, creating such computers is the future.

Web Scraping with PHP

Web scraping is an interesting thing to do. There is a lot of data on the web, and there are many interesting things that can be done with it if it is scraped and organized in more meaningful ways. There are many ways of scraping data, and you may choose the one that is best for whatever it is you are trying to do. From the simplest of ways, manual copying and pasting, to the more complex, such as automatic link following and computer-simulated human interaction, web scraping is useful, interesting and fun.

Imagine you want to study plane ticket prices and how they fluctuate over time. You may want to just bookmark the site and visit it every day, copying the price and pasting it into a spreadsheet. This is OK since you only need to get one price, and you could set some kind of reminder to make sure you don’t forget to do it. But what happens when you want to get the price for the same ticket from different travel agencies, or even different airlines? Let’s say you want to compare 100 different agencies. Now copy and paste doesn’t seem like a good idea.

A few days ago I had to gather information on car dealers from a site that allowed you to find car dealers near you based on your zip code. I had to run all the US zip codes, and get all the information into a database. That definitely didn’t sound like something I wanted to do with copy and paste, so I did what programmers do best: let the computer do the work.

I think many programmers will agree when I say that programmers are inherently lazy. I’m not talking about programmers spending all their time like couch potatoes, because in fact many programmers work day and night. What I mean is that we like to do as little work as possible to accomplish a task. That is why we program in the first place: we want to automate tasks so that we don’t have to bother ourselves with repetitive work.

So, how do you program a web scraper? There are many ways to do it, but the basic idea is always the same: fetch a resource from the net, usually a web page; analyze the code, searching for the data that is relevant to you; and save that data somewhere. That is all you really need to know to start scraping data. So, let’s build a simple web scraper in php.

Before continuing, I’d like to mention that there are scraping solutions already made. There is software for scraping data, and there are libraries written for many languages that specialize in data scraping. However, for the sake of learning, we are going to code our scrapers here by hand.

What do we need?
We need php, and a way to interact with the DOM. If you read my previous entry where I talk about php and the DOM, you know of a few options to do that. We will also need an idea of what we want to scrape. Let’s start with something simple. We will scrape the box office information from IMDB. We will only do it this one time, but in a real life situation you may want to set a cron job to scrape the data daily, weekly or at any other interval of time.

The first thing we need to know is the URL of the page from which we want to scrape the data. In this case it is http://www.imdb.com/chart/
Then we need to know how to find the data in which we are interested. For that, since we are going to be using the DOM, you can just look at the source code of the page. You will notice that the information that we want to scrape is in a tr element with a class name of either chart_odd_row or chart_even_row. That is what we will use to identify the information.

Now, let the fun begin.

Create a file called box_office_scraper.php. I placed it inside a directory called IMDB. Open the file in your favorite editor, and get ready to type.

First, let’s get the document from the web:


$url = "http://www.imdb.com/chart/";
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$document = curl_exec($curl);

echo $document;

We declare a variable to hold the url of the document we want to fetch.
Then we initialize curl, passing in the url.
We set the CURLOPT_RETURNTRANSFER option so that curl returns a string containing the document rather than printing it.
Then we execute curl, and save the returned string in a variable.
Finally we echo the contents of the variable just to make sure we got everything right. You should now be seeing the same thing you would see if you visited the url directly. Once you have verified that, you can delete the echo line.

Now that we have the document, it is time to search for the data we want, but first we need to create a DOM representation of the document we have. Continue editing your file:


$dom_rep = new DOMDocument;
$dom_rep->loadHTML($document);

If you reload your page now, you should no longer see the page that you were seeing before (provided you deleted the echo). Depending on your php configuration, you may, however, see a bunch of warnings. That is because the document is malformed. Personally, I prefer to turn those warnings off in this case. Usually, I like to have all errors and warnings visible, so I can find ways to get rid of them by fixing whatever is causing them, but in this case those warnings are only polluting my page. If you want to get rid of the warnings, just add this line at the top of the document, right after the opening php tag:


error_reporting(E_ERROR);

This way you tell php to report only fatal errors. We still want to be able to see when something goes seriously wrong.

Now that we have a DOM representation of the document, we can start working with it. Since we know the data we want is in tr elements, we can just grab them all and see if they have the class names we are looking for. Unfortunately, the DOM library that comes with php has no way to get elements by class name, so we will have to write our own.


$all_trs = $dom_rep->getElementsByTagName('tr');
$trs_we_want = array();
foreach ($all_trs as $tr) {
  $class_name = $tr->getAttribute('class');
  if (preg_match("/chart_(even|odd)_row/", $class_name)) {
    $trs_we_want[] = $tr;
  } 
}

We wrote a simple loop, but we could have written a more robust function. In this case the loop is enough.
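For reference, a more robust check could compare whole class tokens instead of searching the attribute with a regular expression, so that a class such as “my_chart_odd_row_old” would not match by accident. The following is only a sketch, written in JavaScript so that it can run on its own; hasAnyClass and wantedRows are made-up names, and the same logic should port to php with preg_split and in_array.

```javascript
// Hypothetical helper (not part of the scraper above): returns true when the
// class attribute contains at least one of the wanted names as a whole token.
function hasAnyClass(classAttr, wanted) {
  // A class attribute is a whitespace-separated list of tokens.
  var tokens = classAttr.trim().split(/\s+/);
  for (var i = 0; i < tokens.length; i++) {
    if (wanted.indexOf(tokens[i]) !== -1) {
      return true;
    }
  }
  return false;
}

var wantedRows = ['chart_even_row', 'chart_odd_row'];

console.log(hasAnyClass('chart_odd_row highlighted', wantedRows)); // true
console.log(hasAnyClass('my_chart_odd_row_old', wantedRows));      // false
```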

Now that we have all the elements we need, we can proceed to get the data. One thing to notice is that we will get 30 tr elements, but we are only interested in the first 10. We get 30 because we also get the ones from the other two tables in the page.

Let’s loop over our elements up to the 10th and get the data:


for ($i = 0; $i < 10; $i++) {
  $the_tds = $trs_we_want[$i]->getElementsByTagName('td');
  $the_tds_arr = array();

  foreach ($the_tds as $td) {
    $the_tds_arr[] = $td;
  }

  $movie_title = $the_tds_arr[2]->nodeValue;
  $rank = $the_tds_arr[0]->nodeValue;
  $weekend = $the_tds_arr[3]->nodeValue;
  $gross = $the_tds_arr[4]->nodeValue;
  $weeks = $the_tds_arr[5]->nodeValue;
  echo "<div>";
  echo "<h2>$movie_title</h2>";
  echo "Rank: $rank<br />";
  echo "Weekend: $weekend<br />";
  echo "Gross: $gross<br />";
  echo "Weeks: $weeks<br />";
  echo "</div>";
}

As you can see, we are only looping and getting the data that we want. We created the $the_tds_arr array because we cannot access $the_tds as an array. We could have used more DOM, but the idea here was to keep it as simple as possible. In this example we are only printing the info on screen, but in a real life situation you may want to save it to a file, a database, a spreadsheet, or some other kind of back end that you have.
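As one example of saving to a file, each movie could be turned into a line of CSV text. This is only a sketch, written in JavaScript so that it runs on its own; toCsvRow and the sample values are made up for illustration. Quoting matters here because values such as the gross may contain commas.

```javascript
// Hypothetical helper: turns an array of values into one CSV line.
function toCsvRow(fields) {
  return fields.map(function (field) {
    var text = String(field);
    // Quote the field if it contains a comma, a double quote, or a line
    // break, doubling any double quotes inside it.
    if (/[",\r\n]/.test(text)) {
      return '"' + text.replace(/"/g, '""') + '"';
    }
    return text;
  }).join(',');
}

// Made-up sample values, in the order rank, title, weekend, gross, weeks.
console.log(toCsvRow([1, 'Some "Great" Movie', '$12.3M', '$1,234,567', 2]));
// prints: 1,"Some ""Great"" Movie",$12.3M,"$1,234,567",2
```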

Here is all the code:


error_reporting(E_ERROR);

$url = "http://www.imdb.com/chart/";
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$document = curl_exec($curl);

//echo $document;

$dom_rep = new DOMDocument;
$dom_rep->loadHTML($document);

$all_trs = $dom_rep->getElementsByTagName('tr');
$trs_we_want = array();
foreach ($all_trs as $tr) {
  $class_name = $tr->getAttribute('class');
  if (preg_match("/chart_(even|odd)_row/", $class_name)) {
    $trs_we_want[] = $tr;
  }
}

for ($i = 0; $i < 10; $i++) {
  $the_tds = $trs_we_want[$i]->getElementsByTagName('td');
  $the_tds_arr = array();

  foreach ($the_tds as $td) {
    $the_tds_arr[] = $td;
  }

  $movie_title = $the_tds_arr[2]->nodeValue;
  $rank = $the_tds_arr[0]->nodeValue;
  $weekend = $the_tds_arr[3]->nodeValue;
  $gross = $the_tds_arr[4]->nodeValue;
  $weeks = $the_tds_arr[5]->nodeValue;
  echo "<div>";
  echo "<h2>$movie_title</h2>";
  echo "Rank: $rank<br />";
  echo "Weekend: $weekend<br />";
  echo "Gross: $gross<br />";
  echo "Weeks: $weeks<br />";
  echo "</div>";
}

In the next post we will see how we could use the libraries we talked about in the past for working with the DOM to make scraping easier.

What Is Currying Useful For?

Have you ever heard of currying? Currying is basically taking a function that takes 2 or more arguments and turning it into one that returns a function that can be called with a single argument. The returned function then returns yet another function which also can be called with a single argument. This continues as many times as there are arguments in the original function. In other words, instead of doing this:


sum(1, 2, 3);

You do this:


sum(1)(2)(3);
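
To make that concrete, here is a minimal sketch of both forms. curriedSum and the generic curry helper are names made up for this example, and the helper assumes the original function declares a fixed number of arguments.

```javascript
// The usual version: all three arguments at once.
function sum(a, b, c) {
  return a + b + c;
}

// The curried version written by hand: one argument per call.
function curriedSum(a) {
  return function (b) {
    return function (c) {
      return a + b + c;
    };
  };
}

// Hypothetical generic helper: curries any function with a fixed number of
// declared arguments (fn.length).
function curry(fn) {
  function next(collected) {
    return function (arg) {
      var args = collected.concat([arg]);
      // Once every declared argument has been supplied, call the original.
      return args.length >= fn.length ? fn.apply(null, args) : next(args);
    };
  }
  return next([]);
}

console.log(sum(1, 2, 3));        // 6
console.log(curriedSum(1)(2)(3)); // 6
console.log(curry(sum)(1)(2)(3)); // 6
```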

But, why would you want to do that? At first sight it does not seem very useful. However, it is.

Imagine you have a function that takes two arguments. The first argument would be a DOM node reference, and the second would be a string. The function takes the string, and sets it as the text content of the element referenced as the first argument. In Javascript the function would be something like this:


function writeToNode(node, value) {
  node.innerHTML = value;
}

The curried version of that function would be something like this:


function writeToNode(node) {
  return function (value) {
    node.innerHTML = value;
  }
}

As you can see, writeToNode now returns a function that takes the value as the argument. This function is the one that performs the task. What is the benefit of that? Well, it all depends on what you are doing, but think, for example, of a situation where you want to be able to show a different message depending on some condition:


var nodeRef = writeToNode(node);

if (condition) {
  nodeRef('Condition is true');
} else {
  nodeRef('Condition is false');
}

This example is pretty simple, so the benefit may not even be noticeable, but in a more complex situation it becomes obvious.

Why would you want to change your code to be like that anyway? Why not just do this:


// Here the function is not curried.
if (condition) {
  writeToNode(node, 'Condition is true');
} else {
  writeToNode(node, 'Condition is false');
}

Well, that works fine in this example (if you ignore the fact that you are repeating yourself there anyway), but what happens when instead of passing a node reference you want to pass an id? The non-curried function becomes this:


function writeToNode(id, value) {
  var node = document.getElementById(id);
  node.innerHTML = value;
}

If you are used to optimizing your code, then I’m sure you already spotted the problem with the function. Every time you call that function, there is DOM interaction, which you should reduce as much as possible. Sure you could use a closure to save a reference to the DOM node, but that brings in its own problems:


var writeToNode = (function() {
  var node;
  return function (id, value) {
    if (!node) {
      node = document.getElementById(id);
    }
    node.innerHTML = value;
  }
}());

By doing this, you’ve accomplished a few things. You’ve made it so that the function only interacts with the DOM once, but you’ve also made it so that the node can never be changed. There are ways to overcome that but they make the function even more complex. Since we are talking about complexity, you have also made the function more complex than it needs to be.
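
For comparison, a curried version that takes an id could look something like the sketch below. The lookup happens once, when the id is supplied, and a different node only requires another call with a different id. To keep the sketch runnable outside a browser, the lookup function is passed in; makeWriter, fakeNodes and fakeLookup are names made up for this example.

```javascript
// Hypothetical factory: `lookup` stands in for document.getElementById.
function makeWriter(lookup) {
  return function writeToNode(id) {
    var node = lookup(id); // the only DOM interaction, once per id
    return function (value) {
      node.innerHTML = value;
    };
  };
}

// Stand-ins for real DOM nodes so the example runs anywhere.
var fakeNodes = { message: { innerHTML: '' }, footer: { innerHTML: '' } };
var lookups = 0;
function fakeLookup(id) {
  lookups++;
  return fakeNodes[id];
}

var writeToNode = makeWriter(fakeLookup);
var writeMessage = writeToNode('message');

writeMessage('Condition is true');
writeMessage('Condition is false');

console.log(fakeNodes.message.innerHTML); // Condition is false
console.log(lookups);                     // 1
```

In a page, the lookup passed to makeWriter would simply be function (id) { return document.getElementById(id); }.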

There are many other benefits of currying, but in the end it all comes down to a single simple principle: always use the best tool for the job. Sure, currying has benefits over the example shown here, but there are instances where currying will not be the way to go. So, always look for the best way to accomplish the task at hand.

Responsive Web Design.

Lately, I’ve been working on some responsive layouts. My task on these projects is to make a fixed-width layout into a responsive one. I’ve found it to be an engaging process.

Responsive web design is one of the “modern” big ideas of web development. It is becoming increasingly popular, and there is no sign of it going away anytime soon. The reason, I believe, is that responsive web design is just what we need in this era of small devices and huge screens. There has been an increased usage of big screens in the last few years. Whereas a few years ago 1024 was almost a standard, and we were all designing for those screens, nowadays people tend to prefer much wider screens. Most laptops today come with at least 1200px resolutions, and people who work with desktop machines usually prefer even greater resolutions.

Along with this tendency for higher screen resolutions, however, there is also an ever increasing usage of a wide variety of devices, each with a different screen resolution, and many of them with portrait and landscape modes. This has presented a new challenge for designers and developers. Whereas in the past we could easily settle on 1024 and leave out a few old timers with 600px resolutions, now we cannot do that anymore. If we continue on that path, we are leaving out the growing number of people who browse the web on mobile devices daily.

The question then became: how do we deal with this new challenge? The answer is responsive layouts. Responsive layouts started earlier than we realize. They have their origins in the old and never popular liquid layouts. Liquid layouts were the first step we took in dealing with the problem of different screen resolutions. However, they were never really popular because they presented a few challenges that, although not impossible to overcome, made fluid layouts more of a hassle than static (fixed-width) ones. I think one of the reasons for this preference for static layouts is that web designers are mostly not real web designers, but graphic designers who saw an opportunity to make a few, and sometimes more than a few, extra bucks. Graphic designers have some kind of obsession with pixel perfection. In itself that is not a bad thing at all, but when you bring that idea to the web, really bad things can happen. One of those things is that we are so used to pixel units that we have almost forgotten how to work with percentages and ems. This is a problem because those units are the building blocks of responsive web layouts.

I know I’ve been a bit aggressive towards graphic designers. Maybe I just envy their ability to create beautiful things out of a white canvas, but I prefer to think that my reasons are legitimate rather than selfish. I know that if you know and love the web as much as I do, you will agree with me. Back to our topic of concern: how is responsive web design different from liquid layouts?

Responsive web design has been made possible thanks to new CSS3 features, specifically media queries. Media queries are somehow old and new at the same time. CSS2 offered the possibility to set different stylesheets for different media like screen, print, and screen readers, but CSS3 offers the awesome possibility of detecting the width, max-width, orientation (landscape or portrait), and device width. This gives web designers and developers the chance to apply different styles depending on the width of the device, or the screen, or even the orientation. A two column layout can be easily converted into a single column layout when the size of the screen makes the columns too narrow to look beautiful or even to be readable. That is just awesome.
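
In a stylesheet, such a rule is written with @media, for example @media (max-width: 600px) { ... }. The same query can also be checked from a script through window.matchMedia, which is handy for seeing the idea in action. The sketch below assumes a made-up 600px breakpoint; columnCount and fakeMatchMedia are hypothetical helpers, and the matcher is passed in so the example also runs outside a browser.

```javascript
// Assumed breakpoint for this example: at 600px or narrower, use one column.
var SINGLE_COLUMN_QUERY = '(max-width: 600px)';

// Hypothetical helper: `matchMediaFn` is expected to behave like
// window.matchMedia, returning an object with a boolean `matches` property.
function columnCount(matchMediaFn) {
  return matchMediaFn(SINGLE_COLUMN_QUERY).matches ? 1 : 2;
}

// Stand-in for window.matchMedia that pretends the viewport has a given width.
// It only understands the single "(max-width: Npx)" form used above.
function fakeMatchMedia(viewportWidth) {
  return function (query) {
    var limit = parseInt(/max-width:\s*(\d+)px/.exec(query)[1], 10);
    return { matches: viewportWidth <= limit };
  };
}

console.log(columnCount(fakeMatchMedia(320)));  // 1
console.log(columnCount(fakeMatchMedia(1200))); // 2
```

In a page, the call would be columnCount(function (query) { return window.matchMedia(query); }).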

No longer do we need to have two or more versions of a single site to serve different devices. As programmers (I like to believe the web is built by programmers, please let me continue deceiving myself), most of us are lazy. Moreover, we really hate doing the same thing twice. That is why we build libraries and frameworks that simplify our work. I don’t think there is anything good or fun about having to maintain two or more versions of the same site.

There is also another problem associated with this multiple-site approach. Often some sniffing techniques are used to decide which version to serve. I really have no idea of what they sniff for, but my guess is they check the browser and/or the OS. This is a problem because sometimes you end up serving mobile versions of a site to devices that are fully capable of handling the desktop version of the site. For example, I have an Asus Transformer tablet with 1200px(?) screen resolution, but sometimes I’m served little layouts that waste most of my screen. One could argue that it is still a good idea to serve such a version to a mobile device given the cost of mobile internet, and the capabilities of it, but my tablet can connect to the internet only via wifi, so there is no real reason to do such a thing. I would love to have the developers of those sites in front of me and tell them to stop sniffing and to start looking at the capabilities of my device, or at the very least give me an option to use the full site, not a semi-functional version of it. This is why responsive layouts emerge as the champion on this matter.

Responsive layouts do not assume anything, but rather they set a few rules that apply or not depending on the device and window size. Moreover, responsive design has a mobile-first philosophy, which I see as the layout version of progressive enhancement. Mobile first is a great idea because it forces the designer into thinking about what is really important on each page. Mobile first lets us consider the content in a more responsible way, and realize what is important and what is not.

There is much more to talk about on this matter, and not everything is perfect. There are some parts of responsive web design that are not necessarily good, but overall, responsive web design is definitely one of the great ideas of web design that the modern world has come up with.

If you wish to read more on this matter, I recommend the following articles:

http://friendly-machine.com/posts/2011/when-a-trickle-becomes-a-flood

http://unstoppablerobotninja.com/entry/fluid-images

http://www.alistapart.com/articles/howtosizetextincss/

http://webdesignerwall.com/tutorials/css3-media-queries

http://www.alistapart.com/articles/responsive-web-design/

Some of them are not necessarily about responsive web design, but about some of the things that make responsive web design possible, like fluid images.

Would You Please Learn How to Design Websites

I’d like to start by saying that I am not, nor do I pretend to be a graphic designer. I don’t do beautiful illustrations, logos, or graphics. I am not in the business of creating nice business cards, or great corporate identities. I do web.

I see a lot of graphic designers who develop wordpress themes, and other kinds of web related designs. I want to concentrate on wordpress themes, because that is what I’ve been dealing with lately. It really makes me tired, bored, sick, and it frustrates me when some guy who has been doing graphic design all his life comes to me and tells me: “The sidebar looks 2 pixels to the left on IE6.” The web is like that; deal with it.

The beautiful web is more and more populated with sites that use too many resources. Websites nowadays load a stylesheet for modern browsers; one for IE7; another for IE6; another for hand-held devices; and tons of javascript to force their own font and to add shadows to sucky IE versions. Seriously, that is not the way to do web.

I blame the guys who like everything with tons of eye candy. Those people who think it is really cool to have a stupid menu flying around are the ones destroying the precious web. Whatever happened to sites that just worked, that were beautiful, functional, user friendly, and resource friendly? I admit it, it is pretty cool to have webGL, CSS animations, canvas coolness, and tons of other nice features, but you don’t have to use them all at once. Those tools should be only that: tools, but instead, they’ve become the website itself.

The root of the problem, I believe, is graphic designers, and graphic designer wannabes, who once learned about jQuery, or some other of those toys. They thought, “look, I can make a big poster that moves!” I would like to salute them all with a huge facepalm.

When real web designers design, magic happens. We think in elements, not just colors and lines. Real web designers have a pretty good idea, not only of what is possible and what is not, but of the amount of work and resources that something would take. We don’t think about an animation in terms of $('#elem').animate(), but in terms of timeouts, loops, calculations, site repaints, usability, and accessibility. How does this animation help the user? How does it not help? What happens if the user is blind? What happens if the user has some kind of body movement difficulties? What happens if the user has an old computer? What happens if the user is using an old device/software? What happens if the user’s computer is doing some sort of heavy computation? At what point do we stop this animation and fall back to a more resource-friendly version of the site? There is too much to consider to have time to worry about 2 pixels on the sidebar.

Real web designers know their site won’t look the same everywhere. This shadow won’t come out on IE6, but that is OK, because I’ve designed the site in such a way that even without the shadow the site looks beautiful, and more importantly it works. Real web designers know that it is more important to save an http request and a few KB than to force that shadow to appear on IE6. The website will not look the same everywhere, and real web designers know that. This is what allows them to design websites that look beautiful everywhere, even if they don’t look the same. We build for the user.

You are more likely to receive a complaint saying “Your website takes ages to load,” than one that says “on firefox the sidebar looks 2px to the right.” Please, learn how to design websites.

A Bit About Privacy on Javascript. Or, Take a Look, You Can’t See Me

So, it seems my last “talk” on nested functions left a big hole. I touched a little on privacy and showed a piece of code that partially implements private variables. This code turned out to have a big hole in it. A security breach, if you’d like to call it that way. The code I showed was purely for demonstration purposes and it was not meant to be a full implementation of private variables. That, plus the fact that I don’t work with so-called data types, led to a poor implementation of a powerful concept.

Let’s begin. I’m going to show the code first, because that is what you people like, right? Then you go ahead and find holes in it. Then come back here and read the explanation of what is going on and why it is good. If you find holes in the implementation of private variables, let us know in the comments.

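function foo(param1){
    // Runs the prototype's constructor (c, below) on the new object.
    this.constructor(param1);
}
foo.prototype = function(){
    var private = {}; // shared store holding every instance's private value
    var i = 0;        // counter added to the timestamp to keep keys unique
    function c(p){
        // Key: creation time plus the counter, in case two objects are
        // created at the same time.
        var t = new Date().getTime() + i++;
        private[t] = p;
        this.constructor = null;
        // The only function created per instance; it exposes the key,
        // not the value.
        this.getI = function(){
            return t;
        };
    }
    function b(){
        return private[this.getI()];
    }
    return {
        constructor : c,
        bar : b
    };
}();
var c = new foo("private, I am");
var d = new foo("or not so much.");
alert(c.bar());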

So, this is almost the same code as last time, but with a few minor changes and the introduction of one of those evil functions that get created every time an outer function is called. OMG, I’m the worst programmer (lol).

First we changed private to an object. Yes, an object within an object. Objection? (If you know memes, you’ll get the joke.) We need private to be an object to avoid the problem we had that caused the “private” variable to be modified by another object. Then we have our c function. Everything happens inside c. First, we create a variable t that holds the value returned by getTime, and we add i to it. This is in case two objects are created at the same time.

Next, we save our private value inside private with its reference set to t. Unset the constructor, and create a new method for the object on the fly. This is the part where people start dying because I created a function that gets created again every time the outer function executes.

So, why is Buzu teaching bad things? Short answer is I’m a rebel. Long answer is, it turns out these functions are not bad if used properly. There is nothing wrong with a little function like this, and in fact, it makes a lot of sense. This function belongs to the object itself, not to the prototype. Besides, look at it, it’s so little and cute. How bad can it be?

Anyway, I hope you are still with me. The newly created method returns data that is not useful to an attacker. Moreover, if an attacker wanted to access the private property of another object, he would have to know when that object was created and what the value of i was at that time, so it’s pretty hard. Now, even if an attacker guessed that value, how is he going to access private? It is inside a function that has already returned. Now, how is that different from this:

function Foo(paramOne) {
    var thisIsPrivate = paramOne;

    this.bar = function() {
        return thisIsPrivate;
    };
}

var foo = new Foo("Hello, Privacy!");
alert(foo.bar()); // alerts "Hello, Privacy!"

Well, the short answer is, it is a hella lot different. The long answer has to do with how many functions you have to create with each approach. With the method I suggested, you only need to create one function per object, no matter how many private variables you have. With this method, you need to create a function per private variable. That IS bad.
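A rough way to see the difference is to compare function identities across two objects. This is only a sketch: ClosureFoo, StoreFoo, init, getOne and getTwo are hypothetical names, and I use init instead of hijacking constructor to keep it short.

```javascript
// Closure style: every accessor is rebuilt for every instance.
function ClosureFoo(one, two) {
    this.getOne = function () { return one; };
    this.getTwo = function () { return two; };
}

// Keyed-store style: one small function per instance, accessors are shared.
function StoreFoo(one, two) {
    this.init(one, two);
}
StoreFoo.prototype = (function () {
    var store = {};
    var i = 0;
    return {
        init: function (one, two) {
            var key = new Date().getTime() + i++;
            store[key] = { one: one, two: two };
            this.getI = function () { return key; };
        },
        getOne: function () { return store[this.getI()].one; },
        getTwo: function () { return store[this.getI()].two; }
    };
})();

var a = new ClosureFoo(1, 2), b = new ClosureFoo(3, 4);
var c = new StoreFoo(1, 2), d = new StoreFoo(3, 4);

console.log(a.getOne === b.getOne); // false: a new function per object
console.log(c.getOne === d.getOne); // true: one function on the prototype
console.log(c.getI === d.getI);     // false: the single per-object function
console.log(d.getTwo());            // 4
```

Add a third private value and ClosureFoo grows by one more function per object, while StoreFoo still creates only getI.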

And, again, let me remind you that it is foolish to merely suggest privacy. If you want something to be private, make it so. Be proactive in your defence.

Now, I don’t usually work with data types, but I know them. They become useful if you find the need to extend objects, but I rarely find that need. In fact, most things can be done with pure objects. And with pure objects privacy can be achieved without functions that get created every time another function executes. Life is good when you work with pure objects. I would much rather have the ability to clone objects than to extend data types.
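To give an idea of what cloning can look like, here is a small sketch. It assumes that “clone” means creating a new object that delegates to an existing one through Object.create, which is my reading and not something spelled out above. The car and truck objects are made up, and the sketch says nothing about privacy.

```javascript
// A plain object: no constructor and no data type involved.
var car = {
    color: "blue",
    width: 200,
    describe: function () {
        return this.color + ", " + this.width + "px wide";
    }
};

// "Cloning" here means a new object that delegates to car for
// everything it does not define itself.
var truck = Object.create(car);
truck.width = 300;

console.log(car.describe());   // "blue, 200px wide"
console.log(truck.describe()); // "blue, 300px wide"
console.log(truck.describe === car.describe); // true: no function was recreated
```

Note that this is delegation rather than a copy, so later changes to car show through in truck unless truck overrides them.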

I Love My Work, but I Hate Web Development

The more I work on web development, the more I realise how much I hate it. This is a very strange situation, especially because I do really love my work; I just hate the way I have to do it.

Let me explain what I mean. I hope that by the end of this long read you get the idea that I’m setting out to communicate here, and that you join me in my rebellion against web development as it is today.

I will start with an example. Imagine a world where there are no computers, or typewriters, or even paper. All you have is rock and hard tools to scribe your thoughts. In this world there is a very high demand for books, which are made out of stone and cannot be mass-produced easily. You are a fiction writer. People love your books, and you love writing them. Except that you hate having to write them.

It is not the idea of creating content that you hate, or the motion of writing itself, but rather the fact that it takes you a very long time to put those books on stone because of the rudimentary tools that you have. This is precisely what I mean in the title.

It is not the idea of creating an awesome website or webapp that I hate, but rather the nuisance of actually creating one with the rudimentary set of technologies that we have available. Even more annoying is the fact that not only do we have to use a very basic set of tools, but we also need to support even more archaic tools. And to add insult to injury, we are supposed to abide by a set of rules that have been laid out by people who mostly do not understand the web of today. Let alone the one of tomorrow, which we should have gotten yesterday.

The more I work on the web, the more I realise how wrong it feels to do it. It really feels wrong to have to create my content using a limited set of tags that were thought of by people who knew nothing about what I’m trying to create, and who would probably never care. It feels wrong to have to use an API that is mostly redundant (DOM), and designed by people whose work is not to create with this API. It feels wrong to have to do the same thing more than once to make sure that it works across browsers and platforms. I’m sure that by now you are getting the idea.

This is why Ajax libraries have succeeded. Ajax libraries are created, maintained, pushed, shaped, and refined by content creators, not by specification-and-standards writers. The people behind jQuery, Dojo, MooTools, YUI, and many others, are people who have created these tools to solve problems they have encountered in their journey as web and application developers. We need more of this, and less of W3C nonsense.

I used to be standards oriented until I discovered that the people behind the standards have no idea of what they are doing. Don’t get me wrong here. I have a lot of respect for the people who set out to standardise a huge mess such as the web. It is just that they have spent so much time trying that they haven’t realised the web changed.

The best time machine mankind has invented is called web standards. While reading some of them, and not only web standards, you can feel like you are reliving 2000 all over again. And it’s scary because sometimes you are actually reading a year-2000 document. This is insane. You cannot standardise the web of today when your base for standards is the web of 2000.

There are many examples of things that should have been done differently. One of the most common scenarios that frustrates me is CSS. In my opinion, and I express it because this is a very subjective essay, CSS is wrong. It starts with classes, then it degrades with some aspects of selectors, especially how they are interpreted, and it hits bottom with the syntax, passing through some other stages along the way.

The idea of classes in a language that has no way of extending these classes is wrong. One can argue that you can extend a class by adding the new class name to the CSS declaration of the class you want to extend. Take this class:

.car{
color: blue;
width: 200px;
height: 50px;
}

If we want to create a new element that has the class name of truck, and we want it to extend the CSS of car, we can do it like this:

.car, .truck{
color: blue;
width: 200px;
height: 50px;
}
.truck{
width: 300px;
}

This is fairly simple, but not quite usable on real projects because, most of the time, by the moment you reach truck you have already added a hundred lines under the .car declaration. You have two options. One is to move your .truck declaration 100 lines up and mess up your CSS file organisation. The other is to leave them both 100 lines apart from each other, which is also not good because then it becomes a maintenance problem, which, by the way, the first option does too.

Moreover, this is not extending, it is merely subscribing two classes to the same set of rules. Subscriptions are good, but not as the base for extending classes.

How I think it should be done:

.car{
color: blue;
width: 200px;
height: 50px;
}
/*100 lines of code go here.*/
.truck extends .car{
width: 300px;
}

What is the big difference, you might ask? This option also has the problem of having both declarations 100 lines apart from each other. However, in this version, there is a clear way to know that the styles that you see for .truck are not the only ones affecting .truck, whereas in the current way of doing it, there is no such visual cue.
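To make the intended semantics concrete, here is a sketch of what an extends keyword might compute, written in Javascript since CSS has nothing like it. Everything in it is hypothetical: the rules object, the parent field that stands in for extends, and the resolve function are my own illustration, not any real tool’s API.

```javascript
// Hypothetical model: each rule is a plain object, and `parent` plays the
// role of the proposed `extends` keyword.
var rules = {
    ".car":   { declarations: { color: "blue", width: "200px", height: "50px" } },
    ".truck": { parent: ".car", declarations: { width: "300px" } }
};

// Resolve a selector by starting from its parent's declarations and then
// letting its own declarations override them. No cycle detection here.
function resolve(selector, ruleSet) {
    var rule = ruleSet[selector];
    var merged = rule.parent ? resolve(rule.parent, ruleSet) : {};
    for (var prop in rule.declarations) {
        if (Object.prototype.hasOwnProperty.call(rule.declarations, prop)) {
            merged[prop] = rule.declarations[prop];
        }
    }
    return merged;
}

console.log(resolve(".truck", rules));
// { color: 'blue', width: '300px', height: '50px' }
```

The .truck rule itself only says what is different, and the link to .car is stated right where you read it.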

The whole idea of classes is wrong in most programming languages, but that is material for a whole other essay.

There are many ideas that have been laid out to solve the many issues of CSS. Two that come to mind are OOCSS and Stylus. Both are worth a look.

CSS is not a programming language. Rather, it is a styling language, which was good enough for when the web was a bunch of static documents. It even made sense when webapps were things like forums, blogs, and e-papers. Today we need a styling language that borrows ideas from programming languages. One feature that would be a good first step in the right direction is adding variables and arithmetic capabilities to the language. Stylus implements these ideas.

CSS is a styling language, but we need it to become a state representation language. The web of today is state based, and CSS doesn’t do well in this area. Moreover, we need it to be an animation language. So far Javascript has done that job, but CSS has demonstrated that it has the potential to do it too.

Another reason why I hate web development is HTML. There is a lot of excitement about HTML5, which is good, but there are fundamental flaws in the design of HTML that just make the language not suitable for the web. There are, for example, security issues with window. This could be thought of as a problem with Javascript, but it is not. Javascript has no window; that is an HTML global variable. Javascript has a global object, and window refers back to that global object, but window is not part of Javascript; it is the global object implementation in HTML. HTML5 does nothing to solve this issue. Instead, it adds more complexity to a system that is in itself complex, and complexity is the first enemy of security. But there are more fundamental problems with HTML.
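On the point about window, a quick sketch shows the split. It leans on globalThis, the language-level name for the global object that was standardised in ES2020, well after the problem existed. The describeGlobal function is a made-up name for illustration.

```javascript
// globalThis comes from the language; window only exists when the host
// environment (a browser page) provides it.
function describeGlobal() {
    var hasWindow = typeof window !== "undefined";
    return {
        hasWindow: hasWindow,
        // In a browser page, window and the global object are the same thing.
        windowIsGlobal: hasWindow && window === globalThis
    };
}

console.log(describeGlobal());
```

Run it outside a browser, in Node for example, and hasWindow comes back false while the language keeps working, which is the whole point: window belongs to the host, not to Javascript.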

HTML is a very closed language. HTML has a defined set of tags. These tags have a defined behaviour, and you cannot expand these tags. Technically you can, but at some point you will run into trouble. There is no way to create semantically meaningful markup with the language that we have today. This can be argued, of course, but the truth is that HTML is too small to cover the needs of developers today. In order to create meaningful markup you need to use classes and IDs, and this becomes a problem when you are creating webapps.

Once I said that HTML5 was built for blogs. The new additions to HTML that make up the HTML5 specification show a clear influence from the blogging community. They wanted an article tag, an aside tag, a section tag, a header tag, a footer tag and all of the rest because that is what they need. If you get together with blog designers, all they talk about is content, footers, sidebars and sections. We cannot build the web on top of blogging engines.

We need a language that allows us to specify more than just content sections; we need to be able to specify UI elements other than input boxes and buttons. There are some new elements that allow us to do this, like some new form elements that promise a brighter future, but it is not enough. We should have gotten those back when the talks about web 2.0 were all over the web.

HTML is just not suited for web development today.

It seems that from the stack (HTML, CSS, and Javascript), the only language being pushed in the right direction is Javascript (actually Ecmascript). This is a good thing.

Finally, the DOM is a mess. When people say they hate Javascript, they usually mean this:

var a = document.getElementById('myID');
var b = document.createElement('span');
b.appendChild(document.createTextNode('My text'));
a.appendChild(b);

I look at them and say, please step back from the computer and read a book. That is not Javascript. What you are seeing right there, that mess is the DOM’s API.

The DOM has many design issues that are beyond my willingness to explain. The one thing you should know about it is that it was poorly designed and it’s an awful thing with which to work. It is ridiculously repetitive and has horrible method names. Don’t let people fool you into thinking that is the way to write methods, because it is not.
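This repetition is exactly what the libraries paper over. As a rough sketch of the idea, and not how any particular library does it, a tiny helper can swallow the create-fill-append dance. The appendElement name and its doc parameter are made up for this example; doc is passed in so the function does not depend on a global.

```javascript
// Hypothetical helper: builds <tag>text</tag> and appends it to a parent.
// `doc` is normally the browser's document object.
function appendElement(doc, parent, tag, text) {
    var node = doc.createElement(tag);
    node.appendChild(doc.createTextNode(text));
    parent.appendChild(node);
    return node;
}

// In a browser, this one call does the work of the four DOM lines above:
// appendElement(document, document.getElementById("myID"), "span", "My text");
```

It is still the DOM underneath, but at least you only have to look at it once.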

There are many other reasons to hate web development, like the one we call IE*, but I think you get the idea.

There are two things that need to happen for web development to really advance. The first is rethinking HTML and CSS, and the second is getting rid of the browsers that are holding back web development. Until this happens, we will be living in a world with a wasted web.

*IE is getting better, but that is of no use when the older versions stick around for too long like they do.