My SQL Habits

July 15, 2024

my-sql-habits

SQL has been my professional bread and butter for years. Over time, every analyst develops their own style, and I'm no exception. Every style works as long as it fits your analytical culture.

These aren't things they teach you in school, but they're invaluable in real-world scenarios. As someone who frequently interviews candidates, I can tell you that these practices clearly show you use SQL every day. They set you apart and demonstrate your expertise.

Here are some of the SQL habits I've developed over time that make the most sense to me:

One Field Per Line

Always put each field on a new line. This improves readability and allows for easy commenting.

Instead of this:

select id, title, content, author_id, published_date, category, tags from blog_posts where status = 'published';

Do this:

select
    id,
    title,
    content,
    author_id,
    published_date,
    category,
    tags
    -- views_count, -- Commented out for now, may be needed later
    -- last_updated_date
from blog_posts
where status = 'published'

This approach not only makes your code more readable but also makes it easier to work with when using Language Models (LLMs) for code generation or analysis.

Use CTEs, Avoid Subqueries

Common Table Expressions (CTEs) are your friends. They make your code more readable and often more efficient than subqueries.

Instead of this:

select title, author_name
from blog_posts
where author_id in (
    select author_id
    from blog_authors
    where country = 'Canada'
)

Do this:

with canadian_authors as (
    select author_id
    from blog_authors
    where country = 'Canada'
)
select
    blog_posts.title,
    blog_authors.author_name
from blog_posts
join canadian_authors
    on blog_posts.author_id = canadian_authors.author_id
join blog_authors
    on blog_posts.author_id = blog_authors.author_id

Use Descriptive CTE Names

When naming your CTEs, be descriptive. This makes your code self-documenting and easier to understand.

Instead of this:

with a as (
    select * from blog_views where view_date >= '2023-01-01'
),
b as (
    select * from blog_posts where category = 'Technology'
)
select a.post_id, b.title, count(*) as view_count
from a join b on a.post_id = b.id
group by a.post_id, b.title

Do this:

with recent_views as (
    select * from blog_views where view_date >= '2023-01-01'
),
tech_posts as (
    select * from blog_posts where category = 'Technology'
)
select
    recent_views.post_id,
    tech_posts.title,
    count(*) as view_count
from recent_views
join tech_posts
    on recent_views.post_id = tech_posts.id
group by recent_views.post_id, tech_posts.title

Use Full Table Names Instead of Aliases

While aliases can be useful, using full table names can make your queries more immediately understandable, especially in complex joins.

Of course, when the table names are too long, you can always abbreviate them.

Instead of this:

select u.username, bp.title, c.comment_text
from users u
join blog_posts bp on u.user_id = bp.author_id
join comments c on bp.id = c.post_id

Do this:

select
    users.username,
    blog_posts.title,
    comments.comment_text
from users
join blog_posts
    on users.user_id = blog_posts.author_id
join comments
    on blog_posts.id = comments.post_id

Use Lowercase for SQL Keywords

While some developers prefer uppercase SQL keywords, using lowercase can improve readability and reduce visual noise. It's a personal preference, but consistency is key.

Instead of this:

SELECT author_id, COUNT(*) AS post_count, AVG(word_count) AS avg_word_count
FROM blog_posts
WHERE published_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY author_id
HAVING COUNT(*) > 10

Do this:

select
    author_id,
    count(*) as post_count,
    avg(word_count) as avg_word_count
from blog_posts
where published_date between '2023-01-01' and '2023-12-31'
group by author_id
having count(*) > 10

Use the Last CTE as Your Final Query

As a personal habit, you can use the last CTE as your final query. This approach helps in wrapping up your thoughts and makes the query structure clear.

with author_stats as (
    select
        author_id,
        count(*) as post_count,
        avg(word_count) as avg_word_count
    from blog_posts
    where published_date >= '2023-01-01'
    group by author_id
),
prolific_authors as (
    select *
    from author_stats
    where post_count > 50
),
final_result as (
    select
        users.username,
        prolific_authors.post_count,
        prolific_authors.avg_word_count,
        count(distinct comments.id) as total_comments
    from users
    join prolific_authors
        on users.user_id = prolific_authors.author_id
    left join blog_posts
        on users.user_id = blog_posts.author_id
    left join comments
        on blog_posts.id = comments.post_id
    group by users.username, prolific_authors.post_count, prolific_authors.avg_word_count
)
select * from final_result

In this example, final_result is both a CTE and the query we're actually running, making it clear where our analysis ends.

Conclusion

By following these best practices, you'll create SQL code that's not only more efficient but also easier to read and maintain.

Always do what works for you. Happy querying!