Shoot Yourself on the Foot by Using PHP’s Extract

PHP’s extract function is one of those function that we use mostly because we are lazy. It makes it really easy to access associative arrays’ members without having to write too much. For example, imagine you have to process a form submitted by the user that has 50 different fields. You would have to do something like this in order to access the field values:


$name = $_POST['name'];
$meail = $_POST['email'];
$company = $_POST['company'];
.....

You would have to write 50 lines that look like those ones above to access every field value in the submitted form. I don’t think I need to mention what a pain that is. In fact, it is such a burden that on a lazy evening we might just write something like this:


extract($_POST);

Which is a really nice way to shoot yourself on the foot.

The problem with that is that it may end up overwriting variables defined previously. I guess you could argue that the first piece of code in this post does the same thing. For example if $name is already defined, it would be overwritten, which is certainly true. However, you are aware of that, and even if you are not, you will notice it a some point. The problem with extract is that another variable could be inserted and you would never know it. Lets see an example of what I mean:

Yesterday, while working on a project, I realized the security risks of extract. I have been working on this project for the last few days, and I was using extract in a few places on the code, but I was feeling uneasy about that. I thought there might be security implications with extract. In my experience, any function that makes handling unsafe data easy has some security implications. I decided to look into that, and sure enough there are security risks if extract is used carelessly, and just because you are too lazy to do things the long, usually right, way.

The applications that I’m working on has members, so we need a way to log them in and we do it through a form. This for is posted to a file on the application’s engine that takes care of processing it and determining whether the user should be logged in or not. For testing purposes I was using a variable a flag for users being logged in or not. Usually you would use sessions to do that, but for testing purposes a variable was good enough. Lets imagine the code looked like this:


$logged_in;

if(logged_in){
   //render screen for logged in users
}else{
   //render log in form
}

As I mentioned before, the log in form posts to a file that takes care of the log in logic. Then, if something goes wrong, it redirects to the log in page, and sends the submitted data using GET. The log in page then checks if that data is been submitted via GET, and if so, it uses it to display the submitted values on the form so the user doesn’t have to write all his info again. With that in mind, lets modify the code a bit:


$logged_in;

if($_GET && isset($_GET['data'])){
    userialize($_GET['data']);
    extract($data);//This makes getting the data information easy.
}

if(logged_in){
   //render screen for logged in users
}else{
   //render log in form
}

You can see where we are using extract. It will create variables based on the indexes of the data array. Now, what happens if someone modifies that data array? Suppose the array looks like this:


$data = array('name'=>'John Doe', 'email'=>'john@doe.com');

When we use extract this would create two variables. This is what we want, so that instead of doing $data['name'] we can access the value directly by doing $name. But if someone modifies the array to something like this:


$data = array('name'=>'John Doe', 'email'=>'john@doe.com', 'logged_in'=>true);

Then we have problems. This will create a logged_in variable with true as its value. This would overwrite whatever value is set on our $logged_in variable, and let the user log in without the right credentials. Now, remember that the code above is all just for the sake of this example. I don’t think anyone who knows what they are doing would write code like that. However, it serves the purpose of showing how dangerous it can be to use extract carelessly.

The extract function can be called with parameter that specify how it should handled variable collisions. But in my opinion, you should just avoid using it when processing user submitted data or any other kind of unsafe data.

I recommend reading the extract documentation if you plan on using it.