Introduction
If your routine work involves working with SQL queries, as is the case for many,
then there is no doubt that you come face to face with both simple and complex
queries.
Query performance becomes a real issue in production, because it directly affects
delivery and user-experience. There are some really low-hanging fruits when it comes to
query tuning or optimization, yet they are often overlooked.
This post will show you some of those low-hanging
fruits with respect to MySQL, but the tips also generally apply to other Relational Database Management Systems (RDBMS) such as Oracle, MS SQL Server, PostgreSQL, etc.
The Low-Hanging Fruits
Inefficient or slow queries may take a toll on system resources, i.e. memory, processor,
disk and network. Such queries need to be considered for optimization. You can
try to profile slow queries by adding the EXPLAIN keyword in front of a SELECT
statement. It will show you some details, including the execution path, and whether the query is using an index or the whole table scan. Generate query plan as follows:
EXPLAIN QUERY PLAN SELECT ...
When writing SQL queries in MySQL, the following basic tips will help to optimize
your queries. These basic tips are generally extensible to any other RDBMS, not just MySQL.
The following list of top 10 "AVOID" tips assumes that the reader is conversant with SQL, and
has written SQL queries before. As an individual, you will often have to think outside the box to find suitable alternatives as you try to avoid some of these pitfalls.
-
SELECT *: Avoid extravagance. Select only the required fields rather than SELECT *.
-
DISTINCT: Avoid using SELECT DISTINCT if possible. You rather creatively look out for other alternatives.
-
UNION: Avoid using UNION, instead use UNION ALL.
-
WILDCARDS: Avoid using wildcards at the beginning of a predicate. A predicate is a
condition or expression that evaluates to a Boolean i.e. True or False, used
for filtering in WHERE clause. A wildcard at the beginning of a search does not
apply indexing, but runs a full table scan – which is costly.
-
FUNCTIONS: Avoid using functions in predicates e.g. WHERE len(surname) > 10. Here,
len(surname) will not use index, hence slow. The solution is to ensure that you also index on the function. For example, CREATE INDEX len_idx ON table_name(LEN(suname)). This is called function-index or expression-index.
- OR: Avoid using too many instances of “OR” in predicates.
- WHERE: Avoid using WHERE for joining tables, Use JOIN rather than WHERE for joining tables. You need to understand INNER, LEFT, RIGHT, and CROSS JOINS very well.
- JOINS: Avoid using too many JOINS in a query. The same also applies to too many subqueries.
- ORDER BY: Avoid composite ORDER BY with different orders i.e. a mix of ASC and DESC. You may also combine this with INDEX to speed up
the sorting.
- INDEX: Avoid using non-indexed fields in predicates. Create INDEX for fields that you use for searching or filtering (i.e. where
you use WHERE), JOINs and ORDER BY. In the same vein, avoid keeping unused INDEXES since they will unnecessarily slow down saves in a transactional system (OLTP).
Note that indexes are updated every time a record is inserted, deleted or
updated.
Conclusion
While the above tips may look trivial to some – and often taken for granted – they are indeed powerful tools in the day-to-day SQL assignments. They are pretty easy to implement as long as you can make them part of your practice.
If properly implemented, you may gradually notice that your users will complain less about poor system performance.
Important hint about Index: Using index speeds up record retrieval ... it allows us to locate the exact data without scanning the whole table, just like in a text book.. Clustered index creates physical ordering of rows. Every primary key is automatically a clustered index, and there can be only one clustered index on a table at any one time. It is like a Dictionary, without a separate index page. To create another clustered index, first remove primary key and previous index. Non-Clustered index is an index structure separate from the data stored
in a table that reorders one or more selected columns. The non-clustered
index is created to improve the performance of frequently used queries
not covered by a clustered index. It’s like a textbook, where the index page
is created separately at the beginning of that book.