Best optimization questions in November 2010

Using Assembly Language in C/C++

27 votes

I remember reading somewhere that, to really optimize and speed up certain sections of code, programmers write those sections in assembly language. My questions are:

  1. Is this practice still done, and how does one do it?
  2. Isn't writing in Assembly Language a bit too cumbersome & archaic?
  3. When we compile C code (with or without the -O3 flag), the compiler does some code optimization, links all libraries and converts the code to a binary object file. So when we run the program it is already in its most basic form, i.e. binary. So how does introducing assembly language help?

I am trying to understand this concept & any help or links is much appreciated.

UPDATE: Rephrasing point 3 as requested by dbemerlin - Because you might be able to write more effective assembly code than the compiler generates, but unless you are an assembler expert your code will probably run slower, because the compiler often optimizes the code better than most humans can.

The only time it's useful to revert to assembly language is when

  • the CPU instructions don't have functional equivalents in C++ (e.g. atomic operations, single-instruction-multiple-data instructions, explicit memory cache synchronisation, single instructions that give div and mod results)

    OR

  • for some inexplicable reason - the optimiser is failing to use the best CPU instructions

...AND...

  • the use of those CPU instructions would give some significant and useful performance boost to bottleneck code.

Simply using inline assembly to do an operation that can easily be expressed in C++ - like adding two values or searching in a string - is actively counterproductive, because:

  • the compiler knows how to do this equally well
    • to verify this, look at its assembly output (e.g. gcc -S) or disassemble the machine code
  • you're artificially restricting its choices regarding register allocation, CPU instructions etc., so it may take longer to prepare the CPU registers with the values needed to execute your hardcoded instruction, then longer to get back to an optimal allocation for future instructions
    • compiler optimisers can choose between equivalent-performance instructions specifying different registers to minimise copying between them, and may choose registers in such a way that a single core can process multiple instructions during one cycle, whereas forcing everything through specific registers would serialise it
      • in fairness, GCC has ways to express needs for specific types of registers without constraining the CPU to an exact register, still allowing such optimisations, but it's the only inline assembly I've ever seen that addresses this
  • if a new CPU model comes out next year with another instruction that's 1000% faster for that same logical operation, then the compiler vendor is more likely to update their compiler to use that instruction, and hence your program to benefit once recompiled, than you are
  • the compiler will select an optimal approach for the target architecture it's told about: if you hardcode one solution then it will need to be a lowest-common-denominator or #ifdef-ed for your platforms
  • assembly language isn't as portable as C++, both across CPUs and across compilers, and even if you seemingly port an instruction, it's possible to make a mistake re registers that are safe to clobber, argument passing conventions etc.
  • other programmers may not know or be comfortable with assembly

Optimizing javascript and css requests

8 votes

I need to optimize the loading speed of several existing websites. One of the issues that I have is the number of requests per page. The websites have 7 or more different types of pages, which should load different sets of CSS and JavaScript files because they contain different widgets or functionality. Currently each widget or piece of functionality has its own JavaScript file. I am planning to combine and minify the files to have fewer requests.

  1. Would it be a good practice to combine and minify all javascripts necessary on each type of page into one file (and to do the same for css)? e.g.
    • home page has just one homepage.js,
    • list pages have just listing.js,
    • detail pages have just detail.js,
    • etc.
  2. Is it better to combine only those files which are always used together? e.g.
    • jquery.js + jquery.cookie.js + common.js,
    • list.js + paging.js + favorite.js,
    • detail.js + favorite.js,
    • etc.
  3. What about having one file for all javascripts that should load in the head and one file for all javascripts that should load at the end of body, e.g.
    • init.js goes to <head> and do.js goes to <body>.
  4. What about having one file for common functions and one for administrative functions which is loaded if the user has specific permissions?
  5. Are there any strategies for balancing between 1., 2., 3., and 4.?
  6. What is a recommended amount of javascript and css requests for a page?

I am considering large-scale websites a.k.a. portals or social networks.

(BTW, there are some libraries whose requests I can't control, e.g. TinyMCE or Google Maps.)

Usually you can use the following pattern:

  1. main.js - bundle all scripts here that are used by several pages on the website.
  2. page.js - all js specific to the page. This would mean bundling together js of all widgets on a page.

With this practice, you just have 2 requests for your JS on each page and you get a clear separation/structure in your JS. For all pages except the first one, it will be just one request as main.js would be cached.
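As a rough sketch of how the main.js / page.js split could be decided, the snippet below counts how many page types use each source file and puts anything shared by several pages into the main bundle. The file names (such as slider.js), the `pageScripts` map and the `planBundles` helper are hypothetical, and the snippet only plans the bundles; concatenating and minifying is left to whatever build tool is in use.

```javascript
// Hypothetical input: which source scripts each page type needs.
var pageScripts = {
    home:    ['jquery.js', 'common.js', 'slider.js'],
    listing: ['jquery.js', 'common.js', 'paging.js', 'favorite.js'],
    detail:  ['jquery.js', 'common.js', 'favorite.js']
};

// Split scripts into a shared bundle (files used by at least `minPages`
// page types) and one bundle per page type holding whatever is left over.
// Assumes a file is listed at most once per page type.
function planBundles(pages, minPages) {
    var usage = {};
    Object.keys(pages).forEach(function (page) {
        pages[page].forEach(function (file) {
            usage[file] = (usage[file] || 0) + 1;
        });
    });

    var plan = { main: [], pages: {} };
    Object.keys(usage).forEach(function (file) {
        if (usage[file] >= minPages) { plan.main.push(file); }
    });
    Object.keys(pages).forEach(function (page) {
        plan.pages[page] = pages[page].filter(function (file) {
            return plan.main.indexOf(file) === -1;
        });
    });
    return plan;
}

var plan = planBundles(pageScripts, 2);
console.log(plan.main);          // files that would go into main.js
console.log(plan.pages.listing); // files that would go into the listing page bundle
```

Raising `minPages` keeps main.js smaller at the cost of repeating some code in several page bundles.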

You can use the same principle for the CSS as well. This is effective, but as mentioned in another answer, you can take it further and bundle everything into a single JS file. It's a matter of preference or style. I like to break it into 2 as it keeps things logical for me.

Ensure the following though:

  1. Namespace your JS, otherwise you might end up with name clashes and errors when you bundle the files together.
  2. Do your pages a favor and push your scripts to the bottom of the page.
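One common way to namespace bundled code is to hang every widget off a single global object and keep its internals inside a function scope. The sketch below assumes a hypothetical global called `MYSITE` and two made-up widgets; the names are illustrative only.

```javascript
// Reuse the namespace if another bundled file already created it.
// "MYSITE" is a hypothetical name.
var MYSITE = MYSITE || {};

// Each widget lives in its own function scope and exposes only what it needs to.
MYSITE.paging = (function () {
    var pageSize = 20; // private: invisible to other bundled files

    function pageCount(totalItems) {
        return Math.ceil(totalItems / pageSize);
    }

    return { pageCount: pageCount };
}());

MYSITE.favorite = (function () {
    var ids = []; // private list of favourite item ids

    // Adds an id once and returns how many favourites are stored.
    function add(id) {
        if (ids.indexOf(id) === -1) { ids.push(id); }
        return ids.length;
    }

    return { add: add };
}());

console.log(MYSITE.paging.pageCount(45));
console.log(MYSITE.favorite.add(7));
```

Because `pageSize` and `ids` never reach the global scope, two widgets can use the same internal variable names without clashing once their files are concatenated.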

EDIT: I thought I would update the answer to answer some of your points.

Point 2: Is it better to combine only those files which are always used together?

Ans: Personally, I don't think so. If you are serving all the files that are used together, it doesn't matter which group they belong to or how they end up on the page. This is because we combine JS files to reduce the number of HTTP requests.

Once your JS is combined, minified and in PROD, you are not expected to debug it or make sense of it. So binding together logically related JS files is a moot point. It's in your DEV environment that you would like to keep logically related code files together.

Point 3: What about having one file for all javascripts that should load in the head and one file for all javascripts that should load at the end of body?

Ans: There are certain cases where you are somehow forced to inject JS in the HEAD. Ideally, you shouldn't do it as SCRIPT tags are blocking in nature. So unless you really need to, place all your JS ( 1 or multiple files ) at the end of the BODY tag.

Point 4: What about having one file for common functions and one for administrative functions which is loaded if the user has specific permissions?

Ans: This seems like a reasonable approach to split your JS code. Depending upon the user privileges, you can fork your JS code.

Point 6: What is a recommended amount of javascript and css requests for a page?

Ans: This is a very subjective question. It depends on what you are building. If you are worried about too much JS being loaded on page load, you can always split it and use on-demand SCRIPT injection methods to split the load.
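A minimal sketch of on-demand SCRIPT injection is shown below. The `loadScript` helper, the `/js/admin.js` path and the `currentUser` / `initAdminTools` names in the usage comment are assumptions for illustration; the optional `doc` parameter exists only so the function can be exercised outside a browser.

```javascript
// Insert a <script> element at run time so the file is fetched only when needed.
// `doc` defaults to the browser's document.
function loadScript(src, onLoad, doc) {
    doc = doc || document;
    var script = doc.createElement('script');
    script.src = src;
    script.async = true; // do not block parsing while the file downloads
    if (onLoad) { script.onload = onLoad; }
    doc.getElementsByTagName('head')[0].appendChild(script);
    return script;
}

// Hypothetical usage: fetch the admin bundle only for users who need it.
// if (currentUser.isAdmin) { loadScript('/js/admin.js', initAdminTools); }
```

Scripts added this way do not block rendering the way a plain SCRIPT tag in the markup does, which is why the approach suits code that is needed late or only by some users.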

How do I optimize my stylesheet by removing unmatched and/or unnecessary CSS selectors

7 votes

I have inherited a massive stylesheet with many thousand selectors and I'm certain that a good number of them are unnecessary and never actually match elements on the site. In the interests of optimizing, I'd like to remove those orphaned selectors/rules.

Are there any tools that would allow me to compare the CSS against the entirety of the site to identify which selectors are required and which are not?

The site has AJAX components, so writing a curl/wget script to traverse the site and then loop through each selector and grep for a match isn't particularly feasible either (even though that would be kinda fun...)

All suggestions welcomed.

Thanks, JD

There is a Firefox plugin called "Dust-Me Selectors".

https://addons.mozilla.org/en-US/firefox/addon/5392/

"It extracts all the selectors from all the stylesheets on the page you're viewing, then analyzes that page to see which of those selectors are not used. The data is then stored so that when testing subsequent pages, selectors can be crossed off the list as they're encountered."

It's a fairly manual process but could be what you're looking for.

Why is this range-based query so much quicker?

7 votes

At work we had a query on a table that had the following structure:

ip_from(number), ip_to(number), country, city, state, isp, latitude, longitude.

This table had approx 6.1 million rows.

To find out the details for a given IP address we used a query like the following:

SELECT * 
  FROM Ip2location
WHERE
  :ip_num BETWEEN ip_from AND ip_to;

On Oracle 10 in our dev database this took approximately 17 seconds to return a row, depending on the ip_num passed in. On our beefier live system it took maybe 5-6 seconds, which was still too slow to do in real time and we needed to select this via a background job.

Not ideal, especially as our real time systems really needed the ip details.

The type of index used was a standard BTREE index spanning both ip_from and ip_to. We looked into a lot of things to try to speed this up such as range partitioning. We didn't apply that in the end as it requires Oracle Enterprise. We also looked at increasing the concurrency of the table but that had no noticeable effect.

Anyway, over my morning coffee I realised there might be a performance improvement from running the following query. (This is from memory, so there may be a couple of mistakes. We also selected individual fields, not everything.)

SELECT * 
  FROM ip2location
WHERE 
  ip_from = (
    SELECT max(ip_from)
      FROM ip2location
      WHERE ip_from <= :ip_num
  )
AND
  ip_to >= :ip_num;

This works for our data set because there are no overlapping ranges between ip_from and ip_to.

However what I wasn't prepared for is how much faster the second query is. The time on our dev database was reduced from 17 seconds to 0.007 seconds.

This makes little sense to me. I'd expect some performance increase, but not that much. Shouldn't the database statistics have figured out there is no overlap and optimised accordingly? Also there has to be a recognised quicker way to select using ranges?

My question is: why is the second query so much faster even using a sub-select?

The performance increase is to be expected. There is an index on ip_from, and an index keeps its values sorted, so MAX(ip_from) for ip_from <= :ip_num can be found with a single descent of the B-tree (effectively a binary search) instead of examining many rows. The outer equality on ip_from is then another cheap index lookup.

The first query, by contrast, has to scan a large part of the data to check both range bounds.
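The idea can be sketched outside the database. Assuming non-overlapping ranges kept sorted by their start (as an index on ip_from would keep them), finding the greatest start not exceeding the address is a binary search, and only that one candidate needs its end checked. The `ranges` data and the `lookup` function are made up for illustration and are not how Oracle implements the plan.

```javascript
// Hypothetical, non-overlapping ranges sorted by `from`.
var ranges = [
    { from: 0,    to: 99,   country: 'AA' },
    { from: 100,  to: 499,  country: 'BB' },
    { from: 1000, to: 1999, country: 'CC' }
];

// Step 1: binary search for the last range whose `from` is <= ip
//         (the role of the MAX(ip_from) subquery).
// Step 2: check that range's `to` (the role of ip_to >= :ip_num).
function lookup(sorted, ip) {
    var lo = 0, hi = sorted.length - 1, found = -1;
    while (lo <= hi) {
        var mid = (lo + hi) >> 1;
        if (sorted[mid].from <= ip) {
            found = mid;   // candidate; a later range may still qualify
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    if (found === -1 || sorted[found].to < ip) { return null; }
    return sorted[found];
}

console.log(lookup(ranges, 250)); // falls inside the second range
console.log(lookup(ranges, 700)); // falls in a gap between ranges
```

With about 6.1 million rows a search like this needs roughly 23 comparisons, whereas testing `from <= ip AND ip <= to` row by row may touch millions of entries before the match is confirmed.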

mySQL: is it possible to make this query any faster?

7 votes

I have a table "test" containing millions of entries. Each row contains a floating point "feature" and a "count" how often this feature is present in item "id". The primary key for this table is the combination of "id" and "feature", i.e. every item may have multiple features. There are usually a couple of hundred to a couple of thousand feature entries per item id.

create table test 
(
    id      int not null,
    feature double not null,
    count   int not null
);

The task is to find the 500 most similar items to a given reference item. Similarity is measured in number of identical feature values in both items. The query I have come up with is quoted below, but despite properly using indices its execution plan still contains "using temporary" and "using filesort", giving unacceptable performance for my use case.

select 
    t1.id,
    t2.id,
    sum( least( t1.count, t2.count )) as priority 
from test as t1
inner join test as t2 
     on t2.feature = t1.feature
where t1.id = {some user supplied id value} 
group by t1.id, t2.id 
order by priority desc
limit 500;

Any ideas on how to improve on this? The schema can be modified and indices added as needed.

With the current schema, this query can hardly be improved.

You already have an index on feature and this is the best you can do with the current schema design.

The problem is that "more similar than" is not an order relation. If a is more similar to b than it is to c, it does not imply that c is less similar to a than it is to b. Hence, you cannot build a single index describing this relationship, and need to do it for each item separately, which would make your index N^2 entries long, where N is the number of items.

If you always need only top 500 items, you can limit your index to that figure (in which case it will hold 500 * N entries).
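To make the stored values concrete, the sketch below computes the same measure as the question's query (the sum of the smaller count over shared features) and keeps only the best few matches per item. The `items` data and both function names are hypothetical, and an in-memory loop like this only illustrates what each precomputed row would hold; it is not a replacement for doing the work in MySQL.

```javascript
// Hypothetical data: item id -> { feature value: count }.
var items = {
    1: { '0.5': 3, '1.25': 1, '2': 4 },
    2: { '0.5': 1, '2': 6 },
    3: { '1.25': 2 },
    4: { '9.75': 5 }
};

// Sum of the smaller count over features both items share,
// mirroring SUM(LEAST(t1.count, t2.count)) in the question.
function similarity(a, b) {
    var total = 0;
    Object.keys(a).forEach(function (feature) {
        if (Object.prototype.hasOwnProperty.call(b, feature)) {
            total += Math.min(a[feature], b[feature]);
        }
    });
    return total;
}

// The `limit` items most similar to `refId`, best first.
// Items sharing no feature are dropped, as the inner join would drop them.
function topSimilar(all, refId, limit) {
    return Object.keys(all)
        .filter(function (id) { return String(id) !== String(refId); })
        .map(function (id) {
            return { id: Number(id), score: similarity(all[refId], all[id]) };
        })
        .filter(function (row) { return row.score > 0; })
        .sort(function (x, y) { return y.score - x.score; })
        .slice(0, limit);
}

console.log(topSimilar(items, 1, 2));
```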

MySQL does not support indexed or materialized views, so you will have to do it yourself:

  1. Create a table like this:

    CREATE TABLE similarity
            (
            id1 INT NOT NULL,
            id2 INT NOT NULL,
            similarity DOUBLE NOT NULL,
            PRIMARY KEY (id1, id2),
            KEY (id1, similarity)
            )
    
  2. Whenever you insert a new feature into the table, reflect the changes in the similarity:

    -- similarities as seen from the new item's side
    INSERT
    INTO    similarity
    SELECT  @newid, id,
            LEAST(@newcount, count) AS ns
    FROM    test
    WHERE   feature = @newfeature
            AND id <> @newid
    ON DUPLICATE KEY UPDATE
            similarity = similarity + VALUES(similarity);
    
    -- the same similarities as seen from the other items' side
    INSERT
    INTO    similarity
    SELECT  id, @newid,
            LEAST(@newcount, count) AS ns
    FROM    test
    WHERE   feature = @newfeature
            AND id <> @newid
    ON DUPLICATE KEY UPDATE
            similarity = similarity + VALUES(similarity);
    
  3. Periodically, remove the excess similarities:

    DELETE  s
    FROM    (
            SELECT  id1,
                    (
                    SELECT  similarity
                    FROM    similarity si
                    WHERE   si.id1 = ids.id1
                    ORDER BY
                            si.id1 DESC, si.similarity DESC
                    LIMIT 499, 1
                    ) AS cs
            FROM    (
                    SELECT  DISTINCT id1
                    FROM    similarity
                    ) ids
            ) q
    JOIN    similarity s
    ON      s.id1 = q.id1
            AND s.similarity < q.cs;
    
  4. Query your data:

    SELECT  id2
    FROM    similarity
    WHERE   id1 = @myid
    ORDER BY
            similarity DESC
    LIMIT 500
    

Optimise a function in Javascript

5 votes

I'm quite new to javascript, but have managed to write a working xml function :)

I was hoping someone could please give me a rundown of how to optimise the function. At present there's a different function for each state's weather, but I was hoping I could simplify this somehow.

Code is pasted here: http://pastie.org/private/ffuvwgbeenhyo07vqkkcsw

Any help is greatly appreciated. Thank you!

EDIT: ADDING CODE EXAMPLES OF BOTH XML FEEDS:

Function 1 (UV): http://pastie.org/private/jc9oxkexypn0cw5yaskiq

Function 2 (weather): http://pastie.org/private/pnckz4k4yabgvtdbsjvvrq

I would suggest you create an array that holds the UV element IDs, suffixes and weather station ids (stationid)

var areas = [
     {id:'sydney',suffix:'syd',stationid:'YSSY'},
     {id:'melbourne',suffix:'mel',stationid:'YMML'},
     {id:'brisbane',suffix:'bri',stationid:'YBBN'},
     {id:'perth',suffix:'per',stationid:'YPPH'},
     ...
 ]

and then for the UV

// FUNCTION 1 - UV
function getUV() {

    // BEGIN AUSTRALIAN UV FUNCTION

    $.get('http://www.arpansa.gov.au/uvindex/realtime/xml/uvvalues.xml', function(d) {

        // Handle every configured area with the same code
        $.each(areas, function(i, area) {
            // area.id must match the id attribute of the feed's <location> element
            $(d).find('location#' + area.id).each(function() {

                var $location = $(this);
                var uvtext = $location.find('index').text();
                // parseFloat gives NaN for a missing value, which falls through to 'Unavailable'
                var uvindex = parseFloat(uvtext);
                var $risk = $('#risk-' + area.suffix);
                var $curcon = $('#curcon-' + area.suffix);

                $('#uv-' + area.suffix).empty().append('<span>' + uvtext + '</span>'); // empty div first

                if (uvindex <= 2.0) {
                    $risk.empty().append('Low');
                    $curcon.empty().append('You can safely stay outdoors and use an SPF 15 moisturiser.');
                } else if (uvindex <= 5.0) {
                    $risk.empty().append('Moderate');
                    $curcon.empty().append('Wear protective clothing outdoors and use an SPF 15 or SPF 30 moisturiser.');
                } else if (uvindex <= 7.0) {
                    $risk.empty().append('High');
                    $curcon.empty().append('Wear protective clothing, limit your time outdoors and use an SPF 30 moisturiser.');
                } else if (uvindex <= 10.0) {
                    $risk.empty().append('Very High');
                    $curcon.empty().append('Use caution, limit exposure to the sun and use an SPF 30 moisturiser.');
                } else if (uvindex <= 20.0) {
                    $risk.empty().append('Extreme');
                    $curcon.empty().append('Use extreme caution, avoid exposure to the sun and use an SPF 30 moisturiser.');
                } else {
                    $risk.empty().append('Unavailable');
                    $curcon.empty().append('Information is currently unavailable.');
                }

            });
        });

        // END OF AUSTRALIAN UV FUNCTION

    });
}

and for the weather

function getWeather() {

    // BEGIN AUSTRALIA and NEW ZEALAND WEATHER FUNCTION
    $.each(areas, function(i, area) {
        // One request per weather station
        $.get('http://api.wxbug.net/getLiveCompactWeatherRSS.aspx?ACode=XXXPRIVATEXXX&stationid=' + area.stationid + '&unittype=1&outputtype=1', function(d) {
            $(d).find('weather').each(function() {

                var $weatherinfo = $(this);
                var degrees = $weatherinfo.find('temp').text().replace(/\.0$/, "");
                var conditions = $weatherinfo.find('current-condition').text();
                var icon = $weatherinfo.find('current-condition').attr('icon').replace(/\.gif$/i, ".png").split('http://deskwx.weatherbug.com/')[1];

                // Build each fragment directly; the element ids are derived from the area suffix
                $('#temp-' + area.suffix).empty().append('<span>' + degrees + '</span>');
                $('#info-' + area.suffix).empty().append('<span>' + conditions + '</span>');
                $('#icon-' + area.suffix).empty().append('<img src="' + icon + '" alt="Weather Unavailable" />');

            });
        });
    });
}
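
The if/else chain in the UV function could also be written as a small lookup table, which keeps the thresholds and messages in one place. This is only a sketch: `UV_BANDS`, `UV_UNKNOWN` and `uvRisk` are hypothetical names, and the thresholds and texts are copied from the code above.

```javascript
// Upper bound (inclusive) of each band, in ascending order.
var UV_BANDS = [
    { max: 2,  risk: 'Low',       advice: 'You can safely stay outdoors and use an SPF 15 moisturiser.' },
    { max: 5,  risk: 'Moderate',  advice: 'Wear protective clothing outdoors and use an SPF 15 or SPF 30 moisturiser.' },
    { max: 7,  risk: 'High',      advice: 'Wear protective clothing, limit your time outdoors and use an SPF 30 moisturiser.' },
    { max: 10, risk: 'Very High', advice: 'Use caution, limit exposure to the sun and use an SPF 30 moisturiser.' },
    { max: 20, risk: 'Extreme',   advice: 'Use extreme caution, avoid exposure to the sun and use an SPF 30 moisturiser.' }
];
var UV_UNKNOWN = { risk: 'Unavailable', advice: 'Information is currently unavailable.' };

// Map the raw text of the feed's <index> element to a risk band.
function uvRisk(rawIndex) {
    var value = parseFloat(rawIndex);
    if (isNaN(value)) { return UV_UNKNOWN; } // empty or non-numeric reading
    for (var i = 0; i < UV_BANDS.length; i++) {
        if (value <= UV_BANDS[i].max) { return UV_BANDS[i]; }
    }
    return UV_UNKNOWN; // above the highest band
}

console.log(uvRisk('6.3').risk);
```

Inside the loop over areas, the result would then be written with something like `$('#risk-' + area.suffix).text(band.risk)`, where `band` is the object returned by `uvRisk`.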