Archive for Optimization

Saving Bytes: Efficient Data Storage (MySQL) - Part 1

Hey there! Welcome to Making the Web - my personal blog about website development. Feel free to subscribe to my RSS feed to keep up with the latest. Alternatively, subscribe by email. Hope you enjoy this article!

When storing data in databases, it is important to ensure the data is stored using a minimal amount of space, while retaining its value and processing efficiency.

There are many common practices amongst MySQL developers who write applications which store data in a human-interpretable way; they often don't think about the computer's interpretation of the data, and often fail to realise that it can be more efficient and space-saving to store data in a certain form.

It is especially important to consider data storage techniques in high-demand applications, and where speed and storage-efficiency are key values. Web services require these key values: it is important that they can cope with extra demand, and can satisfy their user's desire for immediacy.

MySQL DataTypes
(Screenshot from phpMyAdmin)

In the first part of Saving Bytes: Efficient Data Storage, we will look at the storage of strings.

__('Read the rest of this entry »')

Comments (3)

Even More Tips for faster PHP scripts

Following on from "Tips for faster PHP Scripts" and "More Tips for Faster PHP Scripts", I bring you Even More Tips for faster PHP Scripts.

Because I've already talked about the main, general, optimization tips, these ones will tend to be more specific and may not apply to everyone - nonetheless, it's worth a read in case you ever do come across these situations.

Here are tips 11-15:

11. is_numeric() vs. ctype_digit()
To see whether is_numeric() or ctype_digit() is the fastest method, I called each function 10,000,000 times. Here are the resuts:

is_numeric(): 9.943268 seconds
ctype_digit(): 11.801991 seconds
is_numeric() is 15.75% faster than ctype_digit()

It's worth using is_numeric() over ctype_digit() where it's appropriate to do so. Although a difference of 0.0000001858723 may seem small, it does all add up.

__('Read the rest of this entry »')

Comments (24)

More Tips for faster PHP scripts

This post is part 2 of a 3 part series. For the other parts, visit these posts:

Tips for faster PHP scripts
Even More Tips for faster PHP scripts

Here's the list:

6. Pre-increment is faster than post-increment
Try to pre-increment, rather than post-increment, where possible. It is faster because post-increment creates a temporary variable while in the process of incrementing. So, this:

++$var;

…is faster than this:

$var++;

This rule also applies to decrementation as well. To test this assertion, I created two for loops. The first for loop used the post-increment option, while the second for loop used pre-increment instead. The total number of iterations was 10,000,000. Here are the results:

Post-increment: 2.148 seconds
Pre-increment: 1.692 seconds
Time saved: 0.456 seconds; 21.23%

7. Regular Expressions for input validation?
It is always a good idea to try and avoid regular expressions, where possible and practical. There are functions in PHP which will do exactly what some regular expressions do, but faster. Take this example:

if(ereg('[0123456789]', $number)) {
// Is integer
}else{
// Is not integer
}

It is much faster to do this instead:

if(ctype_digit($number)) {
// Is integer
}else{
// Is not integer
}

To test this, I used ereg('[0123456789]', $number) 1,000,000 times, followed by using ctype_digit($number) 1,000,000 times. Here are the results:

Regular Expressions: 2.401 seconds
ctype_digit: 0.985 seconds
Time saved: 1.416 seconds; 58.98%

8. split() or explode()
The split() function supports regular expressions, while explode() does not. It is often faster to use explode() when you do not need to use regular expressions.I done yet another test. I used split() to split a string without regular expression requirements, and then used explode() to split the same string. I repeated this 1,000,000 times. My results are:

split(): 5.453 seconds
explode(): 3.556 seconds
Time saved: 1.897 seconds; 34.79%

9. Use time() rather than date('U')
When you want to get the current Unix timestamp, it is faster to use time() rather than date('U'). To test this, I used the time() function 100,000 times, followed by date('U') 100,000 times. My results are as follows:

date('U'): 19.162 seconds
time(): 0.057 seconds
Time saved: 19.105 seconds; 99.7%

10. Fastest type of loop
In PHP, there are a number of loops available for you to use. There are while loops, do-while loops, and for loops. To see which one of these were fastest, I used each of them to perform 100,000,000 iterations. These are the loops which I used:

while(++$a<100000000){}

for(;++$a<100000000;){}

do{}while(++$a<100000000)

Here are my results:

while(++$a<100000000){}: 15.519 seconds
for(;++$a<100000000;){}: 17.577 seconds
do{}while(++$a<100000000): 13.744 seconds

As you can see, my results show that a do-while loop is 21.81% faster, compared to a for loop.

Comments (31)

Choosing Optimal MySQL Data Types

It is too often that developers use the wrong data types in their databases. As queries to databases take a major amount of time in any application, it is essential that the database and its tables are as optimised as possible to ensure the best efficiency, especially in high-demand applications.

In MySQL, the most simple optimisation is to select the correct, most appropriate, and therefore most efficient data types. But many MySQL developers, and other database developers, sometimes fail to do this; non-MySQL developers - such as PHP programmers who use MySQL as just a necessity for their database-driven website - often fail miserably.

To help you understand the data types available, and hopefully use them correctly, I have written about some of the most common - the ones which you probably use every day:

SIGNED or UNSIGNED?
For people who don't know what signed or unsigned is, I'll give a brief explanation:

With signed integers, one bit is reserved so the numbers can be made negative. With unsigned, a bit is not reserved and numbers can only be positive. If you had a SIGNED INT, the value can be between -2,147,483,648 and 2,147,483,647. But, with an UNSIGNED INT, the value can range from 0 to 4,294,967,295.

If you will never use a negative number in a field, then you should use UNSIGNED. If you have an auto-incrementing primary key, then you'd want to use UNSIGNED because it allows for more rows.

Too big?
You should think carefully before creating fields. If you have a 'users' table on a small website, with UNSIGNED INT as the primary key, then ask yourself, "Will I ever get 4,294,967,295 users?" The most likely answer to this question is no. You've got to think about what the highest value is ever likely to be. Then, select the right numeric data type which can cover that amount. I've made a little numeric data type table with the minimum and maximum values:

Type Minimum Maximum
SIGNED TINYINT -128 127
UNSIGNED TINYINT 0 255
SIGNED SMALLINT -32,768 32,767
UNSIGNED SMALLINT 0 65,535
SIGNED MEDIUMINT -8,388,608 8,388,607
UNSIGNED MEDIUMINT 0 16,777,215
SIGNED INT -2,147,483,648 2,147,483,647
UNSIGNED INT 0 4,294,967,295
SIGNED BIGINT -9,223,372,036,854,775,808 9,223,372,036,854,775,807
UNSIGNED BIGINT 0 18,446,744,073,709,551,615

What's the difference between CHAR and VARCHAR?
The difference between CHAR and VARCHAR is that CHAR is fixed length, while VARCHAR is variable length. This means that CHAR is always the same size and takes up the same amount of bytes, while VARCHAR varies. VARCHAR also adds an extra byte. When should you use CHAR?

To store MD5 passwords, which are always 32 characters long (when stored in hexadecimal form; it may be better to store them as 16 bytes in a BINARY field?).
To store same-length barcodes.

When should you use VARCHAR?

To store email addresses, which vary in size
To store peoples names, which vary in size

You'd usually want to use VARCHAR when storing most user input, because their input can vary in size.

Store as numbers
When possible, you should store data in numbers, because it can be searched and ordered more quickly. For example, IP address can be stored like this:

123.123.123.123

using VARCHAR(15). If you first translated it to an IPv4 Internet network address (in PHP, you can use the ip2long() function), you could store it as:

2071690107

using a SIGNED INT.

Choosing the right date/time type
In MySQL, there are 5 different date and time types:

DATE
TIME
DATETIME
TIMESTAMP
YEAR

DATE gives the date to the nearest day, TIME gives the time (but not the date) to the nearest second, DATETIME gives the date and time to the nearest second, TIMESTAMP gives the number of seconds since epoch, and YEAR gives the date to the nearest year.

YEAR only requires 1 byte of storage, so is the best option if you only need to know the year.
DATE and TIME are the next smallest type of date/time storage, requiring just 3 bytes.
TIMESTAMP requires 4 bytes.
DATETIME requires 8 bytes.

In almost any case, it is probably best to use TIMESTAMP, which gives the date and time to the nearest second.

__('Read the rest of this entry »')

Comments (31)

Tips for faster PHP scripts

This post is part 1 of a 3 part series. For the other parts, visit these posts:

More Tips for faster PHP scripts
Even More Tips for faster PHP scripts

I have listed my top tips for writing faster (optimized) PHP code:

1. Multiple arguments with echo
When you use echo, you probably use something like this:

echo $variable1 . 'string1' . $variable2 . $variable3;

But, you may be forgetting that echo can take multiple arguments. So you can write it like this:

echo $variable1 , 'string1' , $variable2 , $variable3;

Passing multiple arguments to echo is faster than joining the strings first, and then passing them to echo. I done a little test in PHP. I echoed 10 different strings 500 thousand times, firstly with the concatenation (joining) method, and then with the multiple arguments method. Here are my results:

Time for concatenation method: 37.83755 seconds
Time for multiple arguments method: 37.68789 seconds
Time saved: 0.15966 seconds; 0.396%

As you can see, the difference is very small. It is not worth going over all your old PHP scripts and changing the dots to commas, unless you are extremely desperate for speed. It is more a case of preference, than there being a best way.

Oh, and another small tip (without it's own number): use echo instead of print - it's faster!

__('Read the rest of this entry »')

Comments (45)

Making the Web | Chalvedon School and Sixth Form College | Messenger History | GCSE(WIKI): Simple bitesize revision for secondary school, KS4 students. | Encrypt files for free - high-grade encryption
Who's Populating The Web?