A database can be a virtual treasure trove of information, but like the treasure of the Caribbean pirates of long ago, the stuff that you really want is probably buried and hidden from view. The SQL SELECT statement is your tool for digging up this hidden information.
Even if you have a clear idea of what you want to retrieve, translating that idea into SQL can be a challenge. If your formulation is just a little off, you may end up with the wrong results — but results that are so close to what you expected that they mislead you. To reduce your chances of being misled, use the following tips.
If you retrieve data from a database and your results don't seem reasonable, check the database design.
Why this matters:
What to check:
Create a test database that has the same structure as your production database, but with only a few representative rows in the tables. Choose the data so that you know in advance what the results of your queries should be.
Best practices:
Run each test query on the test data and see whether the results match your expectations. If they don't, you may need to reformulate your queries.
Benefits:
Joins are notoriously counterintuitive. If your query contains one, make sure that it's doing what you expect before you add WHERE clauses or other complicating factors.
Common join mistakes:
INNER JOIN when you need LEFT JOINHow to verify joins:
SELECT * FROM each table separatelyEXPLAIN or query plans to understand executionExample of verifying a join:
-- First, check the individual tables
SELECT COUNT(*) FROM customers; -- Returns 1000
SELECT COUNT(*) FROM orders; -- Returns 5000
-- Then check the join result
SELECT COUNT(*)
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id; -- Should return ~5000, not 1000
-- If you get unexpected counts, investigate the join conditionQueries with subselects (subqueries) take data from one table and, based on what is retrieved, take some data from another table. Therefore, by definition, such queries can really be hard to get right.
Why subqueries are tricky:
Best practices for subqueries:
SELECT firstJOINs or CTEs are clearer and more efficientExample of testing a subquery:
-- Step 1: Test the inner query first
SELECT customer_id
FROM orders
WHERE order_date > '2024-01-01';
-- Step 2: Verify it returns what you expect
-- Step 3: Then use it in the outer query
SELECT *
FROM customers
WHERE id IN (
SELECT customer_id
FROM orders
WHERE order_date > '2024-01-01'
);If you have two or more levels of subselects, you need to be even more careful. Consider using Common Table Expressions (CTEs) for better readability and debugging.
Sometimes when you mix AND and OR, SQL doesn't process the expression in the order that you expect. Use parentheses in complex expressions to make sure that you get the desired results.
SQL operator precedence:
Without parentheses, SQL evaluates operators in this order:
NOTANDORThis means that condition1 AND condition2 OR condition3 is evaluated as (condition1 AND condition2) OR condition3, which may not be what you intended.
Example of potential confusion:
-- Without parentheses (may be confusing)
SELECT *
FROM orders
WHERE customer_id = 1
OR customer_id = 2
AND total > 100;
-- This is evaluated as:
-- WHERE customer_id = 1 OR (customer_id = 2 AND total > 100)
-- With parentheses (clear intent)
SELECT *
FROM orders
WHERE (customer_id = 1 OR customer_id = 2)
AND total > 100;
-- This clearly means: orders from customer 1 or 2, AND total > 100Best practices:
AND and ORParentheses also help to ensure that the NOT keyword is applied to the term or expression that you want it to apply to.
The only thing you can be absolutely sure of in this world is change. Count on it. Six months from now, you won't remember why you wrote things the way you did, unless you carefully document what you did and why you did it that way.
What to document:
Documentation methods:
-- Good: Documented query
/*
* Purpose: Find top 10 customers by revenue in 2024
* Business Rule: Include only completed orders
* Date: 2024-12-15
* Author: Data Team
*/
SELECT
c.name,
SUM(o.total) AS total_revenue
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.status = 'completed'
AND o.order_date >= '2024-01-01'
GROUP BY c.id, c.name
ORDER BY total_revenue DESC
LIMIT 10;Benefits of documentation:
SQL queries are often used to extract data for business analysis or reporting. Beginners may not fully understand the business requirements or the data model, which can lead to queries that produce incorrect or incomplete results.
Common issues:
How to avoid these issues:
Example of clarifying requirements:
SQL is a programming language, and like any programming language, it has its own syntax rules. One of the most common mistakes is syntax errors, such as using the wrong keyword, omitting a comma, or using the wrong syntax for a function.
Why documentation matters:
SQL provides a variety of built-in functions for data manipulation, aggregation, and analysis. However, it is important to use these functions correctly and to understand their parameters and return values.
What to read:
Where to find documentation:
Example of checking function documentation:
Before using a function like DATE_TRUNC() in PostgreSQL, check:
Mastering SQL requires more than just knowing the syntax. These tips will help you:
Remember: SQL is a tool for solving business problems. The best SQL query is one that correctly answers the business question, is maintainable, and can be understood by others.
Keep practicing, keep learning, and don't hesitate to refer back to documentation when needed. The more you work with SQL, the more intuitive it will become.