Can I accomplish this in a single SQL statement?

RoundSparrow@lemmy.ml · edit-2 11 months ago

iZom@feddit.uk · 11 months ago

I think you can use RANK as a window function and specify TOP 1000

RoundSparrow@lemmy.ml · 11 months ago

Ok, I’m doing some reading: https://medium.com/@amulya349/how-to-select-top-n-rows-from-each-category-in-postgresql-39e3cfebb020

RoundSparrow@lemmy.ml · edit-2 11 months ago

Good results with this approach. I hadn’t considered using rank() OVER (PARTITION BY ...) on the criteria_a values, and it works like a champ. It scopes the ORDER BY to each criteria_a partition, performance seems decent enough, and the statement is short and easy to read.

SELECT COUNT(ranked_recency.*) AS post_row_count
FROM
  (
     SELECT id, post_id, community_id, published,
        rank() OVER (
           PARTITION BY community_id
           ORDER BY published DESC, id DESC
           )
     FROM post_aggregates) ranked_recency
WHERE rank <= 1000
;

Gives me the expected results over the 5+ million test rows I ran it against.

If you could elaborate on your idea of TOP, please do. I’m hoping there might be a way to push the LIMIT 1000 into the inner query, so the outer query doesn’t need a WHERE filter on rank across so many rows?

iZom@feddit.uk · 11 months ago

Glad this is working for you. Suggesting TOP was probably a bad idea (it is SQL Server syntax, and PostgreSQL uses LIMIT instead), and I think filtering on rank <= 1000 the way you did is the better approach.
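
On pushing the LIMIT 1000 into the inner query: one option in PostgreSQL is a LATERAL join, which runs a limited subquery once per community. This is only a sketch and I haven't run it against your data. It assumes there is a `community` table whose `id` matches `post_aggregates.community_id`, and it would probably want an index on (community_id, published DESC, id DESC) to be fast; both of those are assumptions on my part.

```sql
-- Sketch: top 1000 newest posts per community via LATERAL + LIMIT.
-- Assumes a community table with an id column (not shown in this thread).
SELECT COUNT(*) AS post_row_count
FROM community c
CROSS JOIN LATERAL (
   SELECT pa.id
   FROM post_aggregates pa
   WHERE pa.community_id = c.id
   -- Same ordering as the rank() version, so the same rows are kept
   -- as long as id is unique within post_aggregates.
   ORDER BY pa.published DESC, pa.id DESC
   LIMIT 1000
) newest;
```

Communities with no posts simply contribute no rows, so the count should match the rank() version. Whether it is actually faster depends on the number of communities and the available indexes, so compare the two with EXPLAIN ANALYZE.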

If there were a way to safely exclude some of the records, you could filter them out before ranking them. For example, if you knew that a row whose published value is older than X days/months/years would never make it into the final results, it could be dropped up front. That might squeeze a little more performance out of the query, but could be risky if the data isn’t predictable enough.
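
Roughly what that pre-filter could look like, as a sketch only. The 90-day cutoff is a made-up number for illustration, and the `recency_rank` alias is mine:

```sql
-- Sketch: discard old rows before the window function ranks them.
SELECT COUNT(*) AS post_row_count
FROM (
   SELECT id,
      rank() OVER (
         PARTITION BY community_id
         ORDER BY published DESC, id DESC
      ) AS recency_rank
   FROM post_aggregates
   -- Assumed cutoff (hypothetical): rows older than 90 days never qualify.
   -- WHERE is applied before the window function, so fewer rows get ranked.
   WHERE published >= now() - interval '90 days'
) ranked_recency
WHERE recency_rank <= 1000;
```

The risk is a community with fewer than 1000 posts inside the cutoff window: its older posts would be dropped here, but kept by the unfiltered query.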

RoundSparrow@lemmy.ml · 11 months ago

Thank you

bahmanm@lemmy.ml · 11 months ago

I’m not at my desk ATM but I think this is a prime use case for crosstabs.