Category Archives: Web Development

Are array_* functions faster than loops?

I was discussing the speed of filtering an array with a colleague, and I had been under the assumption that using the PHP array_* functions is considerably faster than using a loop (foreach, for, while). I could not find evidence of that when I did some Google searches, so I decided to run a little speed test of my own.

For all of these tests I set up the following beforehand:

ini_set('memory_limit', '500M');
$data = range(0, 1000000);

Test 1: array_filter vs loops

Type          Time (seconds)
foreach       0.37
while         0.58
for           0.61
array_filter  0.74

// array_filter loop average 0.74 seconds
$start = microtime(true);
// Store the result separately so $data stays intact for the other tests
$newData = array_filter($data, function ($item) {
    return $item%2;
});
$end = microtime(true);

echo $end - $start;

// Foreach loop average 0.37 seconds
$start = microtime(true);
$newData = array();
foreach ($data as $item) {
    if ($item%2) {
        $newData[] = $item;
    }
}
$end = microtime(true);

echo $end - $start;

// For loop average 0.61 seconds
$start = microtime(true);
$newData = array();
$numItems = count($data);
// Use < rather than <= so the loop does not read one index past the end
for ($i = 0; $i < $numItems; $i++) {
    if ($data[$i]%2) {
        $newData[] = $data[$i];
    }
}
$end = microtime(true);

echo $end - $start;

// While loop average 0.58 seconds
$start = microtime(true);
$newData = array();
$numItems = count($data);
$i = 0;
while ($i < $numItems) {
    if ($data[$i]%2) {
        $newData[] = $data[$i];
    }
    $i++;
}
$end = microtime(true);

echo $end - $start;

Test 2: array_map vs loops

Type       Time (seconds)
foreach    0.65
while      0.69
for        0.76
array_map  1.38

// array_map average 1.38 seconds
$start = microtime(true);
// Store the result separately so $data stays intact for the other tests
$newData = array_map(function ($item) {
    return $item+1;
}, $data);
$end = microtime(true);

echo $end - $start;

// Foreach loop average 0.65 seconds
$start = microtime(true);
$newData = array();
foreach ($data as $item) {
    $newData[] = $item+1;
}
$end = microtime(true);

echo $end - $start;

// For loop average 0.76 seconds
$start = microtime(true);
$newData = array();
$numItems = count($data);
// Use < rather than <= so the loop does not read one index past the end
for ($i = 0; $i < $numItems; $i++) {
    $newData[] = $data[$i]+1;
}
$end = microtime(true);

echo $end - $start;

// While loop average 0.69 seconds
$start = microtime(true);
$newData = array();
$numItems = count($data);
$i = 0;
while ($i < $numItems) {
    $newData[] = $data[$i]+1;
    $i++;
}
$end = microtime(true);

echo $end - $start;

Test 3: array_walk vs loops

Type        Time (seconds)
foreach     0.65
while       0.69
array_walk  0.72
for         0.76

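// array_walk average 0.72 seconds
$newData = $data; // array_walk changes the array in place, so work on a copy
$start = microtime(true);
// array_walk returns a boolean and ignores the callback's return value,
// so the item is taken by reference and updated directly
array_walk($newData, function (&$item) {
    $item = $item+1;
});
$end = microtime(true);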

echo $end - $start;

// Foreach loop average 0.65 seconds
$start = microtime(true);
$newData = array();
foreach ($data as $item) {
    $newData[] = $item+1;
}
$end = microtime(true);

echo $end - $start;

// For loop average 0.76 seconds
$start = microtime(true);
$newData = array();
$numItems = count($data);
// Use < rather than <= so the loop does not read one index past the end
for ($i = 0; $i < $numItems; $i++) {
    $newData[] = $data[$i]+1;
}
$end = microtime(true);

echo $end - $start;

// While loop average 0.69 seconds
$start = microtime(true);
$newData = array();
$numItems = count($data);
$i = 0;
while ($i < $numItems) {
    $newData[] = $data[$i]+1;
    $i++;
}
$end = microtime(true);

echo $end - $start;

End Notes

I was incorrect when I thought that array_* functions are faster! That said, the speed difference is pretty negligible when you only use them once or twice during page execution. I won’t stop using the array_* functions because of my findings, since they still offer a cleaner way of writing the code. I will only second guess using them when I am writing code that has the potential to be processed thousands of times.

I tested these with a pretty basic data set. Speed may vary depending on the type of data being used as well as the version of PHP (I used 5.5.4).
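
If you want to repeat the comparison yourself, a small timing helper keeps the runs consistent. This is only a sketch: averageTime() is a hypothetical helper, not something from the tests above, and the data set is smaller so it finishes quickly. Absolute numbers will differ by machine and PHP version.

```php
<?php
// Hypothetical helper: runs $fn $runs times and returns the average seconds per run.
function averageTime(callable $fn, $runs = 5)
{
    $total = 0.0;
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $fn();
        $total += microtime(true) - $start;
    }
    return $total / $runs;
}

$data = range(0, 100000); // smaller than the data set used in the post

$viaFilter = function () use ($data) {
    return array_filter($data, function ($item) {
        return $item % 2;
    });
};

$viaForeach = function () use ($data) {
    $newData = array();
    foreach ($data as $item) {
        if ($item % 2) {
            $newData[] = $item;
        }
    }
    return $newData;
};

// Sanity check: both approaches keep the same values.
// array_filter preserves the original keys, so only the values are compared.
var_dump(array_values($viaFilter()) === $viaForeach());

printf("array_filter: %.4f s\n", averageTime($viaFilter));
printf("foreach:      %.4f s\n", averageTime($viaForeach));
```

Because each closure returns a fresh array, the source data is never overwritten between runs, which keeps the comparison fair.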

GruntJS “Warning: Uglification failed.”

I was working with Grunt the other day attempting to update some legacy code to use uglify and I kept getting errors such as:

>> Uglifying source path/to/file.js failed.
Warning: Uglification failed.
Unexpected token punc «(», expected punc «:».
Line 12 in path/to/file.js
 Use --force to continue.

Aborted due to warnings.

When I went to line 12 I saw absolutely nothing wrong. It was a comment block like the following:

/*
An unobtrusive comment goes here
*/

I tried removing the file from the list being included and it happened again. Same error, different file. Everything I was reading said it was most likely invalid JavaScript doing it, but since I write code perfectly, that could not be the case. Fast forward 10-20 minutes and I found an issue on GitHub that sounded similar. A comment from Eric Range advised that their code editor had added a line ending of “\r\n” to the file and that caused their build to fail. So I took a look at line endings and sure enough, a handful of the files were using “\r\n”. I did a find/replace and changed them all to just a single new line (“\n”). I attempted to uglify it once more and yet again I was shown an error! I was beginning to think uglify and I were not going to be good friends.

Running "uglify:build" (uglify) task
{ message: 'Unexpected token: eof (undefined)',
  filename: 'file.js',
  line: 3032,
  col: 0,
  pos: 129381,
  stack: 'Error\n    at new JS_Parse_Error (/path/to/node/node_modules/grunt-contrib-uglify/node_modules/uglify-js/lib/parse.js:196:18)\n    at js_error (/path/to/nodenode_modules/grunt-contrib-uglify/node_modules/uglify-js/lib/parse.js:204:11)\n    at croak (/path/to/nodenode_modules/grunt-contrib-uglify/node_modules/uglify-js/lib/parse.js:678:41)\n    at token_error (/path/to/nodenode_modules/grunt-contrib-uglify/node_modules/uglify-js/lib/parse.js:682:9)\n    at unexpected (/path/to/nodenode_modules/grunt-contrib-uglify/node_modules/uglify-js/lib/parse.js:688:9)\n    at block_ (/path/to/nodenode_modules/grunt-contrib-uglify/node_modules/uglify-js/lib/parse.js:1000:28)\n    at /path/to/nodenode_modules/grunt-contrib-uglify/node_modules/uglify-js/lib/parse.js:974:25\n    at function_ (/path/to/nodenode_modules/grunt-contrib-uglify/node_modules/uglify-js/lib/parse.js:979:15)\n    at expr_atom (/path/to/nodenode_modules/grunt-contrib-uglify/node_modules/uglify-js/lib/parse.js:1187:24)\n    at maybe_unary (/path/to/nodenode_modules/grunt-contrib-uglify/node_modules/uglify-js/lib/parse.js:1357:19)' }
>> Uglifying source path/to/file.js failed.
Warning: Uglification failed.
Unexpected token: eof (undefined).
Line 3032 in path/to/file.js
 Use --force to continue.

Aborted due to warnings.

I looked at the differences between the files that worked and the ones that failed. The only difference I found was that the failed files used tabs instead of spaces (they were created at a time when there was no convention for tabs vs spaces). So I converted the files that were using tabs to spaces and tried again. Bingo! It worked perfectly.

Final Solution

My final solution was to change all files to use unix line endings (“\n”) and to convert tabs to spaces.
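
If there are too many files for a manual find/replace, the same cleanup can be scripted. Below is a rough sketch in PHP, assuming a hypothetical normalizeSource() helper and a tab width of four spaces. Note that it replaces every tab, including any inside string literals, so check the result before committing it.

```php
<?php
// Hypothetical helper: converts line endings to Unix style and tabs to spaces.
function normalizeSource($source, $tabWidth = 4)
{
    // Windows (\r\n) first, then any stray carriage returns, become \n.
    $source = str_replace(array("\r\n", "\r"), "\n", $source);

    // Every tab becomes a fixed run of spaces.
    return str_replace("\t", str_repeat(' ', $tabWidth), $source);
}

$before = "function a() {\r\n\treturn 1;\r\n}\r\n";
$after  = normalizeSource($before);

var_dump($after === "function a() {\n    return 1;\n}\n"); // bool(true)
```

To apply it to a file you would read it with file_get_contents(), pass the contents through the helper, and write the result back with file_put_contents().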

RequireJS “jQuery not defined”

This is a super quick and simple post… but maybe it’ll save someone time down the line. I sure won’t make the mistake again. I spent more time than I would like to admit figuring out this issue. There are a lot of posts sprinkled around about different causes of the error, none of which were the cause of my error.

Given the following RequireJS config:

require.config({
    "baseUrl":"\/path\/to\/dir\/",
    "paths":{
        // RequireJS appends the .js extension itself, so it is left off here
        "jQuery":"jquery"
    }
});

I was trying to do the following:

require([jQuery], function ($) {
    console.log($);
});

That gave me the dreaded “jQuery not defined” error!

Solution

Add quotes around “jQuery”!

require(["jQuery"], function ($) {
    console.log($);
});

Without the quotes, jQuery is treated as a JavaScript variable instead of a string, and that variable did not exist at the point where require() was called. The string “jQuery”, on the other hand, matches the name defined in the config.

Working with iterators and arrays SPL

Iterators are the next thing from the SPL I am going to take a look at. For those that are not familiar with iterators yet, an iterator is like a list or a collection of items that you can move through (traverse). If that sounds familiar to you, it should. They share a lot of the features and functionality that an array has in PHP. The biggest differences are that they are object oriented and can be more efficient with memory (and sometimes faster to process), because items can be handled one at a time instead of all being held in an array.

Iterators

There are a lot of iterators in PHP. Not all of them are part of the SPL though, such as SimpleXMLIterator, so keep that in mind as you may come across an iterator in places you may not expect. It can be pretty intimidating looking at the list, and it only gets worse when you see something called a RecursiveIteratorIterator.

ArrayIterator

ArrayIterator is probably the best spot to start. It bridges the gap between arrays and iterators and allows for conversion back and forth. In its most basic form it accepts an array as the parameter and converts it into an iterator.

$cities = array('Torrington', 'Burlington', 'New York City', 'Warwick');
$cities = new ArrayIterator($cities);

/*
You can alternatively use these methods to add items to an ArrayIterator
$cities = new ArrayIterator();
$cities->append('Torrington');
$cities[] = 'Burlington';
$cities[] = 'New York City';
$cities->append('Warwick');
*/

The first thing you will realize is that the ArrayIterator has methods for some of the more common array functions such as asort, ksort, natsort, key, etc. The second thing you will realize is that there is no 1:1 mapping between array functions and iterator methods. So it may not be a suitable replacement in every place where an array is currently used.
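
As a quick illustration of those built-in methods, here is a small sketch using the same list of cities. It relies only on ArrayIterator::asort() plus the Countable and ArrayAccess interfaces that ArrayIterator implements.

```php
<?php
$cities = new ArrayIterator(array('Torrington', 'Burlington', 'New York City', 'Warwick'));

// Sort by value while keeping the original keys, like the asort() function.
$cities->asort();

echo count($cities), "\n"; // 4, because ArrayIterator implements Countable
echo $cities[1], "\n";     // Burlington, keys are unchanged by asort()

foreach ($cities as $key => $city) {
    echo $key, ': ', $city, "\n";
}
// 1: Burlington
// 2: New York City
// 0: Torrington
// 3: Warwick
```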

Chain Iterators Together

One great thing about iterators is the ability to chain them together. It opens up some pretty powerful capabilities such as filtering, merging items, and limiting output. Much of this, such as limiting results or filtering, can also be done with the array_ functions (array_slice, array_filter), but iterators let you stack the steps without building an intermediate array for each one.

$cities = array('Torrington', 'Burlington', 'New York City', 'Warwick');
$cities = new ArrayIterator($cities);
$cities = new LimitIterator($cities, 0, 2);
foreach ($cities as $city) {
    echo $city;
}
// returns Torrington, Burlington
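
Chains can be longer than two links. The sketch below stacks a filter and a limit on top of the ArrayIterator. The length check is just an arbitrary condition chosen for illustration.

```php
<?php
$cities = new ArrayIterator(array('Torrington', 'Burlington', 'New York City', 'Warwick'));

// Step 1: keep only names longer than seven characters (drops Warwick).
$long = new CallbackFilterIterator($cities, function ($city) {
    return strlen($city) > 7;
});

// Step 2: skip the first match and take the next two.
$page = new LimitIterator($long, 1, 2);

// Nothing is evaluated until the chain is traversed.
// Passing false re-indexes the result instead of keeping the original keys.
$result = iterator_to_array($page, false);

print_r($result);
// Array ( [0] => Burlington [1] => New York City )
```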

FilterIterator

FilterIterator is an abstract class that you can use to create custom filters to apply to the data in an iterator.

class CityFilter extends FilterIterator {
    protected $City;

    public function __construct(Iterator $iterator, $city) {
        parent::__construct($iterator);
        $this->City = $city;
    }

    public function accept() {
        return $this->current() == $this->City;
    }
}

$cities = array('Torrington', 'Burlington', 'New York City', 'Warwick');
$cities = new ArrayIterator($cities);
$cities = new CityFilter($cities, 'Burlington');

foreach ($cities as $city) {
    echo $city;
}
// returns Burlington

Merging Iterators

I mentioned earlier that you can merge iterators together. You do that with AppendIterator.

$cities1 = array('Torrington', 'Burlington');
$cities1 = new ArrayIterator($cities1);

$cities2 = array('New York City', 'Warwick');
$cities2 = new ArrayIterator($cities2);

$cities = new AppendIterator();
$cities->append($cities1);
$cities->append($cities2);

foreach ($cities as $city) {
    echo $city;
}
// returns Torrington, Burlington, New York City, Warwick

Converting an Iterator to an Array

So I showed you how to go from an array to an iterator. But what if you want to take your iterator and go back to an array if you need to use a function that is not available as a method? You have two options.

Option 1: ArrayIterator::getArrayCopy()

$cities = array('Torrington', 'Burlington', 'New York City', 'Warwick');
$cities = new ArrayIterator($cities);
$cities = $cities->getArrayCopy();

Option 2: iterator_to_array() function

$cities = array('Torrington', 'Burlington', 'New York City', 'Warwick');
$cities = new ArrayIterator($cities);
$cities = iterator_to_array($cities);

One difference between the two options is that ArrayIterator::getArrayCopy() returns the array held by the ArrayIterator itself and ignores any iterators wrapped around it, while iterator_to_array() runs the whole chain.

$cities = array('Torrington', 'Burlington', 'New York City', 'Warwick');
$cities = new ArrayIterator($cities);
$cities = new LimitIterator($cities, 0, 2);

$cities1 = iterator_to_array($cities);
print_r($cities1);
// returns Array ( [0] => Torrington [1] => Burlington )

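// LimitIterator has no getArrayCopy() of its own, so reach the wrapped ArrayIterator explicitly
$cities2 = $cities->getInnerIterator()->getArrayCopy();
print_r($cities2);
// returns Array ( [0] => Torrington [1] => Burlington [2] => New York City [3] => Warwick )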

Closing

There are a lot more iterators I didn’t go over for you to explore, like the EmptyIterator in all its glory! I’m noticing one of the biggest advantages over array_ functions is the ease of reading the code. I can’t tell you how many times I’ve had to look at something that uses array_walk and array_map on the same variable and had to spend a few minutes working my way through what it does.

$something = array_walk(array_map(function(){
// some logic
}, $data) ,function() {
// some logic
});

An alternative to that might be to do:

// Both are classes, so they need to be instantiated with "new"
$something = new ArrayIterator($data);
$something = new CallbackFilterIterator($something, function($current, $key, $iterator) {
// some logic
});

Working with files and SPL

The Standard PHP Library (SPL) has been around for a while now. Most of it was introduced in PHP 5.3 back in 2009, but some of it has been around since PHP 5.0, released way back in 2004. The only experience I had with the SPL prior to writing this post was using a few of the objects based on recommendations from Stack Overflow answers. So I never really delved into the complete library to see what it has to offer. The biggest hurdle I found with learning more about the SPL is that the classes are not that well documented. In most cases the best documentation is in the user comments!

I’m sure most of us have had to work with files in some manner. For me it usually takes the form of a logger or parser (CSV or XML). One of the things I have always thought would be helpful is an object to wrap the fopen, fwrite, fread, fclose functions so you didn’t need to maintain the file handle. Enter the SPL. It wraps a lot of the reading, writing, searching, and parsing into easy to use objects so that you don’t need to preserve that file handle in a variable.

SplFileInfo

SplFileInfo offers a way to get information about a file (as the name would suggest). Because it handles some of the more basic aspects of working with files, it won’t come as a shock that a lot of the other file related SPL objects extend SplFileInfo. Need to get the extension? Use SplFileInfo::getExtension(). How about the last modified time? SplFileInfo::getMTime(). Is the file a symbolic link to a different file? SplFileInfo::isLink() to check and SplFileInfo::getRealPath() to resolve to the actual location. The only thing it doesn’t do is handle reading and writing, but that is where SplFileInfo::openFile() comes into the picture. The openFile method will return an instance of SplFileObject that can be used to do reading/writing.
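
Here is a short sketch of those methods in action. It creates a throwaway file in the system temp directory so that it can run anywhere. The file name and contents are made up for the example.

```php
<?php
// Create a small temporary file to inspect (hypothetical name and contents).
$path = sys_get_temp_dir() . '/spl_demo_' . uniqid() . '.txt';
file_put_contents($path, "hello\n");

$info = new SplFileInfo($path);

echo $info->getFilename(), "\n";                       // spl_demo_....txt
echo $info->getExtension(), "\n";                      // txt
echo date('Y-m-d H:i:s', $info->getMTime()), "\n";     // last modified time
var_dump($info->isLink());                             // bool(false)
echo $info->getRealPath(), "\n";                       // resolved absolute path

// openFile() hands back an SplFileObject for reading or writing.
$file = $info->openFile('r');
$firstLine = $file->fgets();
echo $firstLine;                                       // hello

// Release the handle before deleting the temporary file.
$file = null;
unlink($path);
```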

SplFileObject

SplFileObject inherits from SplFileInfo, so it has all of the same capabilities plus being able to read and write.

CSV Parsing

If you need to parse a CSV you can call the setFlags() method before reading the file to change how it is parsed. SplFileObject::setFlags(SplFileObject::READ_CSV) sets the stage for parsing a CSV file. While on the topic of CSV files… SplFileObject::setCsvControl() allows you to set the delimiter, enclosure, and escape character.
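
Putting those two methods together might look like the sketch below. The sample data and the semicolon delimiter are assumptions for the example, and the extra flags are there to skip the empty row that a trailing newline would otherwise produce.

```php
<?php
// Write some illustrative semicolon-delimited data to a temporary file.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "name;state\nBurlington;VT\nWarwick;RI\n");

$csv = new SplFileObject($path);

// READ_CSV makes each iteration return an array of fields.
// The other flags skip empty lines such as the one after the final newline.
$csv->setFlags(
    SplFileObject::READ_CSV
    | SplFileObject::READ_AHEAD
    | SplFileObject::SKIP_EMPTY
    | SplFileObject::DROP_NEW_LINE
);

// Delimiter, enclosure, and escape character.
$csv->setCsvControl(';', '"', '\\');

$rows = array();
foreach ($csv as $row) {
    $rows[] = $row;
}

print_r($rows);

// Release the handle before deleting the temporary file.
$csv = null;
unlink($path);
```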

Reading Lines

To read the file line by line it’s as easy as using a loop.

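// '/path/to/file.txt' is a placeholder, point it at a real file
$obj = new SplFileObject('/path/to/file.txt');
foreach ($obj as $line) {
    // SplFileObject::__toString() is set as an alias of SplFileObject::current(), which makes echo'ing the object return the current line.
    echo $line;
}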

DirectoryIterator

DirectoryIterator is used to list files and subdirectories. Like SplFileObject, it inherits from SplFileInfo. I’ll discuss iterators in another post, but for the sake of understanding the power of DirectoryIterator it is enough to know they are most often used in loops.

foreach (new DirectoryIterator('/path/to/dir') as $file) {
    // The loop variable is $file, so that is what must be used here
    echo $file->getFilename() . "\r\n";
}

This will yield something like:

.
..
filename.txt
directory
image.jpg

One useful method available in DirectoryIterator but not available in SplFileInfo is DirectoryIterator::isDot(). This returns true if the “file” is a “.” or “..” system file.
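
A small sketch of isDot() in use, run against a temporary directory so it is self-contained. The directory and file names are made up to mirror the listing above.

```php
<?php
// Build a throwaway directory with two files in it.
$dir = sys_get_temp_dir() . '/spl_dir_' . uniqid();
mkdir($dir);
touch($dir . '/filename.txt');
touch($dir . '/image.jpg');

$names = array();
foreach (new DirectoryIterator($dir) as $file) {
    // Skip the "." and ".." entries.
    if ($file->isDot()) {
        continue;
    }
    $names[] = $file->getFilename();
}

// Directory order is not guaranteed, so sort before printing.
sort($names);
print_r($names);
// Array ( [0] => filename.txt [1] => image.jpg )

// Clean up.
unlink($dir . '/filename.txt');
unlink($dir . '/image.jpg');
rmdir($dir);
```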

RecursiveDirectoryIterator

RecursiveDirectoryIterator is the recursive version of DirectoryIterator. It will go down into the subdirectories and list files and subdirectories and so on and so on.

$files = new RecursiveDirectoryIterator('/path/to/dir');
$files = new RecursiveIteratorIterator($files);
foreach ($files as $file) {
    echo $file . "\n";
}

This will yield something like:

.
..
filename.txt
directory
directory/file.txt
directory/subdirectory
directory/subdirectory/image.jpg
image.jpg
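
Two options are worth knowing about here. FilesystemIterator::SKIP_DOTS drops the “.” and “..” entries, and RecursiveIteratorIterator::SELF_FIRST makes the directories themselves show up in the results (by default only the leaves, i.e. the files, are returned). The sketch below builds a tiny temporary tree to show both. The names are made up.

```php
<?php
// Build a small throwaway tree: directory/file.txt and image.jpg.
$root = sys_get_temp_dir() . '/spl_tree_' . uniqid();
mkdir($root . '/directory', 0777, true);
touch($root . '/directory/file.txt');
touch($root . '/image.jpg');

$files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS),
    RecursiveIteratorIterator::SELF_FIRST // list directories as well as files
);

$paths = array();
foreach ($files as $file) {
    // Strip the root prefix and normalize separators for display.
    $relative = substr($file->getPathname(), strlen($root) + 1);
    $paths[] = str_replace('\\', '/', $relative);
}

sort($paths);
print_r($paths);
// Array ( [0] => directory [1] => directory/file.txt [2] => image.jpg )

// Clean up (release the iterator first so no directory handle stays open).
$files = null;
unlink($root . '/directory/file.txt');
unlink($root . '/image.jpg');
rmdir($root . '/directory');
rmdir($root);
```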

Closing

I only touched on some of the basic and useful parts of the SPL that I found make working with files much easier and more logical. There is so much more for you and me to learn about them though.

Install Elasticsearch Mac OSX 10.9.1

I waded through a few sets of instructions that didn’t work for me, so I figured I’d post the ones that did work.

  1. Download and unzip/untar the Elasticsearch files http://www.elasticsearch.org/download
  2. Download the latest version of Java (woo!)
  3. Move the directory to /usr/local: sudo mv elasticsearch /usr/local
  4. To start it just execute: /usr/local/elasticsearch/bin/elasticsearch -f
  5. To stop it, just hit control+c

Optional steps to allow you to start the service with just the ‘elasticsearch -f’ command:

  1. Open the .bash_profile file in your home directory
  2. Add export ES_HOME=/usr/local/elasticsearch
  3. Add export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_67.jdk/Contents/Home
  4. Add export PATH=$ES_HOME/bin:$JAVA_HOME/bin:$PATH
  5. Run ‘source ~/.bash_profile’ from terminal

PHP pcntl_waitpid and pcntl_wait not working

I did a fair amount of work with pcntl_fork recently to create child processes for an import script. The two main benefits of forking are that the processes are contained (if one fails, the others are not affected) and the speed increase. Instead of running an import of 100 posts sequentially, it would run 10 child processes importing 10 posts each.

Anyway, to get to the point. I was testing the importer from the browser and finding really inconsistent and perplexing things happening.

pcntl_wait and pcntl_waitpid not working

The code was set up to store an array of the pids in use, and to not allow more than 10 child processes at a time. So it was supposed to hit pcntl_wait or pcntl_waitpid and clear out the pid that had finished. The problem was that it never seemed to do that. I could see it was starting the processes and that they were exiting fine (ps -Hwfe from terminal verified there were no zombie processes).

My first thought after reading a bit about it was that perhaps the child process was finishing before the parent process had a chance to be informed the child process was starting. I tried adding sleep(10) into spots to try and delay the child process from finishing so quickly. Still didn’t work.

Solution

After realizing that pcntl_fork wasn’t supposed to be used from the browser when PHP was installed as an Apache module… I moved to using CLI to execute the script. I set it up so that I was using shell_exec() to execute the command after setting and uploading the import file. It worked great from CLI and I proceeded to bang my head into the wall a few times.

tl;dr: Use pcntl_fork, pcntl_wait, and pcntl_waitpid only in scripts run from the CLI. Do not use them in scripts served through the browser, as they do not work as expected there.
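
For reference, the general shape of a forking loop with a cap on the number of children is sketched below. This is not the importer’s actual code: importInChildren() is a hypothetical helper, it tracks a simple counter rather than an array of pids, and it bails out when it is not running from the CLI or the pcntl extension is missing.

```php
<?php
// Hypothetical helper: runs $work on each batch in its own child process,
// with at most $maxChildren running at once. Returns the number of children
// reaped, or null when forking is not available.
function importInChildren(array $batches, callable $work, $maxChildren = 10)
{
    if (PHP_SAPI !== 'cli' || !function_exists('pcntl_fork')) {
        return null; // only attempt this from the command line
    }

    $running = 0;
    $reaped = 0;

    foreach ($batches as $batch) {
        // At the limit: block until any child exits before starting another.
        if ($running >= $maxChildren && pcntl_wait($status) > 0) {
            $running--;
            $reaped++;
        }

        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('Could not fork');
        }
        if ($pid === 0) {
            // Child process: do the work, then exit so it never continues the loop.
            $work($batch);
            exit(0);
        }

        // Parent process: one more child is now running.
        $running++;
    }

    // Wait for the remaining children so none are left as zombies.
    while ($running > 0 && pcntl_wait($status) > 0) {
        $running--;
        $reaped++;
    }

    return $reaped;
}

$batches = array(array(1, 2), array(3, 4), array(5, 6));
$result = importInChildren($batches, function ($batch) {
    // the import for one batch would go here
}, 2);

var_dump($result); // int(3) from the CLI with pcntl loaded, NULL otherwise
```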

Using Illuminate without Laravel

If you have read past posts you know that I have decided to build a movie indexing app for personal use to make searching and finding movies I own easier. I decided to use just the Illuminate database component that Laravel uses to handle the CRUD.

First step, pull it in (I used composer).

{
    "name" : "name of app",
    "description" : "Desc of app",
    "license" : "MIT",
    "require" : {
        "illuminate/database": "*"
    }
}

Setting up the database connection is relatively easy if you use the docs on GitHub and the Laravel docs as a guide.

$config = array(
    'db' => array(
        'driver' => 'mysql',
        'host' => '127.0.0.1',
        'database' => 'imdb',
        'username' => 'username',
        'password' => 'password',
        'collation' => 'utf8_general_ci',
        'charset' => 'utf8',
        'prefix' => 'imdb_',
        'port' => ''
    )
);

$connFactory = new \Illuminate\Database\Connectors\ConnectionFactory(new \Illuminate\Container\Container());
$conn = $connFactory->make($config['db']);
$resolver = new \Illuminate\Database\ConnectionResolver();
$resolver->addConnection('default', $conn);
$resolver->setDefaultConnection('default');
\Illuminate\Database\Eloquent\Model::setConnectionResolver($resolver);

Then all you really need to do is create your class file:

<?php
namespace ProjectName\Models;

use Illuminate\Database\Eloquent\Model;

class Movie extends Model {
    protected $fillable = array('title', 'release_date');
    protected $guarded = array('id', 'updated_at', 'created_at');
}

You can easily add a record to the database by doing:

$movie = new Movie(array('title' => 'A movie title', 'release_date' => '2014-01-23'));
$movie->save();

Nice and quick to get rolling so you aren’t spending so much time on the repetitive task of CRUD for each model.

Parsing the IMDB movie list

I’ve been working on a personal project to help index the movies I own so it’s easier to search and pick a movie when we feel like watching one. The site will allow you to add movies by name, and it will auto populate the release date, actors, plot, and any other pertinent info.

What I found after some research is that there are not many options for searching the IMDB movie database, and certainly nothing official. After a few “close but no cigar” moments with some APIs others had built, I was determined to make use of what IMDB does provide. IMDB provides text files with movie, actor, actress, etc. info. You might be thinking, “Levi that is awesome, why didn’t you start with those?!”. Well, if you take a peek at them you will quickly find out the trouble with working with them.

  1. They do not have IDs to tie things together. That means you must rely on the movie title to match all of the bits of info together. That isn’t such a problem until you realize there is also a “movie AKA” file. So each movie can be known by more than one name.
  2. The data can be hard to figure out sometimes. For example:
    • “2010-11 Regular Season” (2007) {Hurricanes vs. Red Wings: November 21, 2013 (#2013.17)} 2013 – what this means is that the title is from 2010-11, it was released in 2007, and it is from season 2013 episode 17. Confusing, right?
    • Inherit the Earth (????) 2004 – the release date is normally in the parentheses. In this example it is not in the parentheses but at the end of the line.

 

After a lot of work this is the regular expression I ended up using to match as many titles as I could. I’m sure I’ll still need to tweak it over time to get it 100% working.

/^([\s\S]*)\(([\d{4}]*|\?*)(?:\/)?([\w]*)?\)(\s*{([\w!\s:;\/\.\-\'"?`_&@$%^*<>~+=\|\,\(\)]*)(\s*\(#([\d]*)\.([\d]*)\))?})?\s*([\d{4}]*)?(?:-)?([\d{4}]*)?/iu

It’ll probably be easiest to understand this using an example movie title.

“Blaulicht” (1959) {Das Gitter (#4.1)}1962

([\s\S]*) – this will match any whitespace and non-whitespace character
\( – this will stop that first matching when it runs into an opening parenthesis, normally signifying the start of a release date.
([\d{4}]*|\?*) – this matches the year, or a run of question marks when the year is unknown.
(?:\/)? – sometimes the year was entered as (2004/I) or (2005/IV). This allows an optional forward slash after the year.
([\w]*)? – Along with the previous item, this accommodates finding characters after the year, such as the I or IV above. Still not sure exactly what they mean though.
\) – this just signifies the end of the release date.
(\s*{([\w!\s:;\/\.\-\'"?`_&@$%^*<>~+=\|\,\(\)]*) – this looks way more complex than it is (it goes along with the next bit, so look at those as a whole as opposed to separate chunks). It looks for a space after the year and then it matches the content inside the curly brace until it hits a pound sign signaling an episode.
(\s*\(#([\d]*)\.([\d]*)\))?})? – This matches the episode name, season, and episode number if it runs into #.
\s* – just allows for as many spaces as needed between this and anything after.
([\d{4}]*)? – these last 3 go hand in hand. They will match an optional year range. Some of the shows have year ranges for how long it has been running.
(?:-)?
([\d{4}]*)?

 

The last important bit is the modifiers on the preg_match. “i” ensures a case insensitive match and “u” makes it do a utf-8 search. Otherwise it fails on some foreign characters not in ISO-8859-1.
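
One detail to watch when tweaking the pattern: inside square brackets, \d{4} is a character class (any digit, “{”, “4”, or “}”) rather than “exactly four digits”. Below is a simplified sketch, not the expression above, that writes \d{4} outside a class and uses named groups so the pieces are easier to pull out. The group names are my own, and it has only been checked against the two sample lines shown.

```php
<?php
// Illustrative pattern with named groups (a simplified variant, not the original).
$pattern = '/^(?<title>.+?)\s+\((?<year>\d{4}|\?{4})(?:\/(?<suffix>\w+))?\)'
    . '(?:\s*\{(?<episode>[^{}]*?)\s*(?:\(#(?<season>\d+)\.(?<number>\d+)\))?\})?'
    . '\s*(?<from>\d{4})?(?:-(?<to>\d{4}|\?{4}))?\s*$/u';

$line = '"Blaulicht" (1959) {Das Gitter (#4.1)}1962';

if (preg_match($pattern, $line, $m)) {
    echo $m['title'], "\n";   // "Blaulicht"
    echo $m['year'], "\n";    // 1959
    echo $m['episode'], "\n"; // Das Gitter
    echo $m['season'], "\n";  // 4
    echo $m['number'], "\n";  // 1
    echo $m['from'], "\n";    // 1962
}

// A line with an unknown year in the parentheses and the year at the end.
preg_match($pattern, 'Inherit the Earth (????) 2004', $m2);
echo $m2['title'], ' / ', $m2['year'], ' / ', $m2['from'], "\n";
// Inherit the Earth / ???? / 2004
```

Optional groups that did not take part in the match come back as empty strings, or are missing entirely if they are at the end of the pattern, so check with isset() before using them.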

json_encode setting a value to null?

I was recently working on generating a JSON object using data from a database and ran into the problem that my “description” field was showing as null after I ran it through json_encode(). I checked the array prior to running it through json_encode() and it showed the description like it should. After pondering, testing, and mentally throwing my computer out the window… I wondered if the problem might be that there was a character json_encode() couldn’t handle. I didn’t spot anything when I was looking at the value in the database, so I turned to Google. I read some posts suggesting that it may be caused by an encoding issue (the text was supposed to be encoded as UTF-8, but had been inserted with some invalid characters). The docs for json_encode say that the string being encoded must be UTF-8 to work.

If you need to clean a string before doing json_encode, this will ensure only valid UTF-8 characters are used:

// iconv() returns the cleaned string, so assign the result back
$string = iconv('UTF-8', 'UTF-8//IGNORE', $string);
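
It also helps to ask json_encode() what went wrong instead of guessing. The sketch below uses a made-up row containing a byte that is not valid UTF-8. json_last_error() confirms the cause, and on PHP 7.2 and later the JSON_INVALID_UTF8_IGNORE flag drops the bad bytes during encoding.

```php
<?php
// "\xE9" is "é" in ISO-8859-1, which is not a valid UTF-8 sequence on its own.
$row = array('title' => 'Blaulicht', 'description' => "Caf\xE9");

$encoded = json_encode($row);

if ($encoded === false && json_last_error() === JSON_ERROR_UTF8) {
    echo 'json_encode failed: ', json_last_error_msg(), "\n";

    // PHP 7.2+: ignore the invalid bytes while encoding.
    $encoded = json_encode($row, JSON_INVALID_UTF8_IGNORE);
}

echo $encoded, "\n";
// {"title":"Blaulicht","description":"Caf"}
```

If you would rather see where the bad characters were, JSON_INVALID_UTF8_SUBSTITUTE replaces them with the Unicode replacement character instead of removing them.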