{"id":891,"date":"2014-01-10T15:36:36","date_gmt":"2014-01-10T20:36:36","guid":{"rendered":"http:\/\/www.phishie.com\/wordpress\/?p=891"},"modified":"2014-01-10T15:36:38","modified_gmt":"2014-01-10T20:36:38","slug":"5-degrees-of-pokemon-bacon","status":"publish","type":"post","link":"https:\/\/www.phishie.com\/wordpress\/2014\/01\/5-degrees-of-pokemon-bacon\/","title":{"rendered":"5 Degrees of Pok\u00c3\u00a9mon Bacon"},"content":{"rendered":"<p>One of the side projects I work on outside of my <a href=\"http:\/\/omniti.com\">job<\/a> is a Japanese anime site called <a href=\"http:\/\/anidb.net\/\">AniDB<\/a>. It was put together by some German guys quite some time ago with Perl, <a href=\"http:\/\/www.postgresql.org\/\">PostgreSQL<\/a>, and some old donated hardware\/bandwidth from an ISP. They were kind\/foolish enough to let me help out with the site about two years ago. Anidb has almost anything you could possibly need to know about Japanese anime series (much like <a href=\"http:\/\/www.imdb.com\/\">imdb<\/a> for movies). Including things like what characters know each other between the different series.<!--more--><\/p>\n<p><a href=\"http:\/\/www.phishie.com\/wordpress\/wp-content\/uploads\/2014\/01\/satoshi.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-full wp-image-892\" alt=\"satoshi\" src=\"http:\/\/www.phishie.com\/wordpress\/wp-content\/uploads\/2014\/01\/satoshi.png\" width=\"216\" height=\"276\" \/><\/a>AniDB&#8217;s data is populated by its users. \u00c2\u00a0Unfortunately, not all of them are as careful as others about providing complete entries for the character data they add. \u00c2\u00a0Many of the minor characters get entered but never linked to the rest of the story. \u00c2\u00a0We needed to find these missing\/orphaned characters so they can be investigated and cleaned up by the AniDB staff.\u00c2\u00a0\u00c2\u00a0The question we were trying to answer was, &#8220;out of any character in any Pok\u00c3\u00a9mon series, who doesn&#8217;t have some sort of relationship to <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=character&amp;charid=6693\">Ash\/Satoshi <\/a>(the main character)?&#8221; It&#8217;s almost impossible to not SOMEHOW be related to the main character. \u00c2\u00a0 Because of how the reporting system is setup, we also wanted this to be a single query so we could just throw it in a report. \u00c2\u00a0AniDB&#8217;s tables are setup a little weird. This is in part due to necessary design and in part due to historical reasons that stem from organic growth. \u00c2\u00a0This design is part of the complexity of the problem.<\/p>\n<p>For the first part of our report, we wanted to get a list of EVERY character in any type of Pokemon anime. To do that, the tables we have are:<\/p>\n<pre style=\"font-weight: bold;\">animetb -- The Anime Table\r\n-------\r\nid -- Primary Key for the anime\r\n...\r\n\r\nseqtb -- The Sequel Table\r\n-----\r\naid -- Foreign Key to animetb. The subject anime.\r\nnext -- Foreign Key to animetb. The object anime (sequel)\r\ntype -- What kind of relation. Sequel, side-story, OVA, etc.<\/pre>\n<p>So to start with, we had\u00c2\u00a0<a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=anime&amp;aid=230\">the original Pokemon anime<\/a> that started it all in 1997. \u00c2\u00a0This is anime id <code>230<\/code> in our system. \u00c2\u00a0We need to traverse the tree structure of anime relations to get a list of every anime related to the original Pokemon. \u00c2\u00a0This in itself can be quite complicated. \u00c2\u00a0Here&#8217;s a <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=rel&amp;aid=230\">graph of how all these anime are related<\/a>. \u00c2\u00a0To do this I needed a great feature of PostgreSQL called <a href=\"http:\/\/www.postgresql.org\/docs\/8.4\/static\/queries-with.html\">Recursive Common Table Expressions<\/a>\u00c2\u00a0(CTEs). \u00c2\u00a0To explain how these work, lets go through the (slightly modified) example in the PostgresSQL docs.<\/p>\n<pre style=\"font-weight: bold;\">WITH RECURSIVE temp(n) AS (\r\n  VALUES (1)\r\n  UNION ALL\r\n  SELECT n+1 FROM temp WHERE n &lt; 10\r\n)\r\nSELECT sum(n) FROM temp;<\/pre>\n<p>The way this works is, we create a named recursive CTE called &#8220;temp&#8221;. \u00c2\u00a0Recursive CTEs contain several parts. \u00c2\u00a0The non-recursive term: <span style=\"font-weight: bold;\">VALUES(1)<\/span>,\u00c2\u00a0which unsurprisingly just returns the result set of &#8220;1&#8221;. \u00c2\u00a0 This is following by a <span style=\"font-weight: bold;\">UNION<\/span> or <span style=\"font-weight: bold;\">UNION ALL<\/span> connection. \u00c2\u00a0Then the fun part. The recursive part:<\/p>\n<pre style=\"font-weight: bold;\">SELECT n+1 FROM temp WHERE n &lt; 10<\/pre>\n<p>When Postgres runs this query, it will see the recursive CTE and evaluate the first part, which returns the single value 1. \u00c2\u00a0It then takes this result set as the content of the &#8220;temp&#8221; table and will then run the second recursive part with this result set. \u00c2\u00a0 Since we&#8217;re just selecting N+1, the result, in this case, will be &#8220;2&#8221;. \u00c2\u00a0Since this iteration created new values to add to &#8220;temp&#8221;, it will incrementally keep going and get &#8220;3&#8221; then &#8220;4&#8221; and so forth until the query returns nothing, at which point it will end. \u00c2\u00a0The final result of the recursive CTE is a &#8220;temp&#8221; table with 10 rows in it. \u00c2\u00a0Values \u00c2\u00a0&#8220;1&#8221; through &#8220;10.&#8221; \u00c2\u00a0 We can then query &#8220;temp&#8221;, in our final query, to sum() it and get our grand total of &#8220;55.&#8221;<\/p>\n<p>We&#8217;re going to use the same strategy with anime. \u00c2\u00a0We&#8217;re returning our hardcoded first Pokemon anime (<code>id 230<\/code>) as the starting value, and then \u00c2\u00a0start pulling the relations with this query:<\/p>\n<pre style=\"font-weight: bold;\">WITH RECURSIVE pokemon_animes(animeid) AS (\r\n  VALUES( 230 )\r\n  UNION\r\n  SELECT s.next FROM seqtb s, pokemon_animes p WHERE p.animeid = s.aid\r\n)\r\nSELECT animeid FROM pokemon_animes ORDER BY animeid;<\/pre>\n<p>Our first iteration will return <code>230<\/code>, which we will run the recursive query with returning our second set :<\/p>\n<pre style=\"font-weight: bold;\">921 | <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=anime&amp;aid=921\">Pocket Monsters: Mizu no Miyako no Mamorigami Latias to Latios<\/a>\r\n990 | <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=anime&amp;aid=990\">Gekijouban Pocket Monsters: Mewtwo no Gyakushuu<\/a>\r\n991 | <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=anime&amp;aid=991\">Pocket Monsters: Celebi Toki o Koeta Deai<\/a>\r\n992 | <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=anime&amp;aid=992\">Pocket Monsters: Kesshoutou no Teiou<\/a>\r\n1020 | <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=anime&amp;aid=1020\">Pocket Monsters: Maboroshi no Pokemon Lugia Bakutan<\/a>\r\n1041 | <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=anime&amp;aid=1041\">Pocket Monsters Advanced Generation<\/a>\r\n3491 | <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=anime&amp;aid=3491\">Pocket Monsters: Pikachu no Fuyuyasumi (1999)<\/a>\r\n3492 | <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=anime&amp;aid=3492\">Pocket Monsters: Pikachu no Fuyuyasumi (2000)<\/a>\r\n3493 | <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=anime&amp;aid=3493\">Pocket Monsters: Pikachu no Fuyuyasumi (2001)<\/a>\r\n7117 | <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=anime&amp;aid=7117\">Pocket Monsters Crystal: Raikou Ikazuchi no Densetsu<\/a><\/pre>\n<p>Our recursive query will then get run again with the above set of ids returning another, third set, and so on until we have no more results. \u00c2\u00a0This gives us a list of all the Pokemon anime that we can use to get all of the character ids. \u00c2\u00a0To pull these character ids will require use of another table from AniDB.<\/p>\n<pre style=\"font-weight: bold;\">chartb -- The Character Table\r\nid : Primary Key for the character\r\nparentid : The 'guise' for the character.\r\n\u00e2\u20ac\u00a6<\/pre>\n<p>By using our list of anime ids and AniDB&#8217;s &#8220;charanimereltb&#8221; link table, we can get a listing \u00c2\u00a0of all characters in any Pokemon anime with:<\/p>\n<pre style=\"font-weight: bold;\">WITH RECURSIVE pokemon_animes(animeid) AS (\r\n  VALUES( 230 )\r\n  UNION\r\n  SELECT s.next FROM seqtb s, pokemon_animes p WHERE p.animeid = s.aid\r\n)\r\nSELECT c.charid FROM charanimereltb c, pokemon_animes a WHERE c.aid = a.animeid;<\/pre>\n<p>That gets us halfway there! \u00c2\u00a0Now we just need all the characters related to Satoshi to compare the set to. \u00c2\u00a0The parentid\/guise from the table above was the first type of relationship tracked in AniDB. \u00c2\u00a0The guise is used to associate alternate forms of a character. \u00c2\u00a0Because this relationship is chronological in story lines, it can usually be represented this way without any problems. \u00c2\u00a0<span style=\"line-height: 1.5em;\">As time passed, a real relations table had been added. \u00c2\u00a0However, since guises were already in the chartb, these were never moved because of the number of changes that would be required to do so. \u00c2\u00a0Technical debt is incurred, but usually we can work around that. \u00c2\u00a0However, eventually this system was further abused by some of the data entry people. \u00c2\u00a0 \u00c2\u00a0The &#8216;wild&#8217; version of a Pokemon was created and guises were created for more specific versions of the individual trainer&#8217;s instances of these fur balls. \u00c2\u00a0This brings us to where we are now.<\/span><\/p>\n<pre style=\"font-weight: bold;\">charcharreltb -- The Character to Character Relation Table\r\ncharid : The subject of the relation. FK to chartb\r\nnext : The object of the relation. Also a FK to chartb.\r\ntype : How they are related, friend\/enemy\/etc\r\n...<\/pre>\n<p><a href=\"http:\/\/www.phishie.com\/wordpress\/wp-content\/uploads\/2014\/01\/fushi.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-full wp-image-893\" alt=\"fushi\" src=\"http:\/\/www.phishie.com\/wordpress\/wp-content\/uploads\/2014\/01\/fushi.jpg\" width=\"400\" height=\"300\" srcset=\"https:\/\/www.phishie.com\/wordpress\/wp-content\/uploads\/2014\/01\/fushi.jpg 400w, https:\/\/www.phishie.com\/wordpress\/wp-content\/uploads\/2014\/01\/fushi-300x225.jpg 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/a>This setup causes some fun problems. In this case, to really find out everyone who is related to a character, you not only want all of the relationships about that instance of the character, but any other versions\/guises of the character as well (or any other versions\/guises of the relations you are related to.)<\/p>\n<p>For example (and this will get geeky),\u00c2\u00a0<a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=character&amp;charid=6693\">Satoshi<\/a> and <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=character&amp;charid=22350\">Fushigidane<\/a> are related to each other as Fushigidane (Japanese for Bulbasaur), which is Satoshi&#8217;s personal trained Pok\u00c3\u00a9mon. Fushigidane is a type\/guise of <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=character&amp;charid=18\">Bulbasaur<\/a>. Bulbasaurs can evolve into any <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=character&amp;charid=20\">Ivysaur<\/a>. Thus all 3 forms in the system are &#8220;related&#8221; to Satoshi.<\/p>\n<p><b>6693 &lt;-related-&gt; 22350 -guise-&gt; 18 -relation-&gt; 20<\/b><\/p>\n<p>Or for even crazier example. \u00c2\u00a0A <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=character&amp;charid=169\">Clefairy\/Pippi<\/a> is a type of Pokemon. These are a guise of <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=character&amp;charid=23859\">a particular Clefairy<\/a> (id 23859) that was trained\/owned by <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=character&amp;charid=23857\">Akai Isamu<\/a>(id 23857). Akai also has another <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=character&amp;charid=23858\">personal trained Pokemon<\/a> (id 23858) that is a <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=character&amp;charid=69\">Pikachu type<\/a>\/guise (id 69). That, of course, is also a guise to the <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=character&amp;charid=8091\">Pikachu<\/a> (id 8091) that everyone knows and loves belonging to <a href=\"http:\/\/anidb.net\/perl-bin\/animedb.pl?show=character&amp;charid=6693\">Satoshi<\/a> (6693). Therefore, all of these characters are related to Satoshi.<\/p>\n<p><b>169 &lt;-guise_of- 23859 &lt;-related-&gt; 23857 &lt;-related-&gt; 23858 -guise_of-&gt; 69 &lt;-guise_of- 8091 &lt;-related-&gt; 6693<\/b><\/p>\n<p>This is a pretty nasty chain. And remember, we need to do it with a single query! \u00c2\u00a0One limitation of recursive queries is that you can only refer to the recursive table once and it may not be (unfortunately) in a subquery. \u00c2\u00a0 This provided a challenge, since we needed to return a single value, but I needed to pull from several different potential relations or guises! \u00c2\u00a0 I was finally able to work around this with two other great utilities that PostgreSQL offers -\u00c2\u00a0<a href=\"http:\/\/www.postgresql.org\/docs\/9.3\/static\/arrays.html\">Arrays<\/a>\u00c2\u00a0and the <a href=\"http:\/\/www.postgresql.org\/docs\/9.3\/static\/functions-array.html\">unnest() function<\/a>,\u00c2\u00a0which will expand an array into a set of rows. \u00c2\u00a0It ended up looking something like this:<\/p>\n<p><a href=\"http:\/\/www.phishie.com\/wordpress\/wp-content\/uploads\/2014\/01\/pippi.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright size-full wp-image-894\" alt=\"pippi\" src=\"http:\/\/www.phishie.com\/wordpress\/wp-content\/uploads\/2014\/01\/pippi.jpg\" width=\"235\" height=\"228\" \/><\/a><\/p>\n<pre style=\"font-weight: bold;\">WITH RECURSIVE related_to_satoshi(charid) AS (\r\n  VALUES (6693)\r\n  UNION\r\n  SELECT unnest(ARRAY[c.next, n.charid, d.parentid, e.parentid, g.id])\r\n  FROM related_to_satoshi s\r\n    -- related to character\r\n    LEFT JOIN charcharreltb c ON (c.charid = s.charid)\r\n      -- get the character to get any guise of them\r\n      LEFT JOIN chartb d ON (c.charid = d.id)\r\n        -- get any reverse relations to the character\r\n        LEFT JOIN charcharreltb n ON (n.next = s.charid)\r\n      -- get that character to get any guise from them\r\n      LEFT JOIN chartb e ON (n.next = e.id)\r\n    -- get my guise so we can look it up next iteration too\r\n    LEFT JOIN chartb g ON (s.charid = g.parentid)\r\n)<\/pre>\n<p>Basically, we pull in everyone who was related, or a guise of the character, and then keep iterating over them. \u00c2\u00a0We throw them all into the array, and then unnest it for the final result set. \u00c2\u00a0 Because we UNION everything we remove any duplicate values and keep us from going into any possible loops (important!).<\/p>\n<p>Adding in a non-recursive CTE for clarity we can come to our final solution of:<\/p>\n<pre style=\"font-weight: bold;\">WITH RECURSIVE related_to_satoshi(charid) AS (\r\n  VALUES (6693)\r\n  UNION\r\n  SELECT unnest(ARRAY[c.next, n.charid, d.parentid, e.parentid, g.id])\r\n  FROM related_to_satoshi s\r\n    LEFT JOIN charcharreltb c ON (c.charid = s.charid)\r\n      LEFT JOIN chartb d ON (c.charid = d.id)\r\n    LEFT JOIN charcharreltb n ON (n.next = s.charid)\r\n      LEFT JOIN chartb e ON (n.next = e.id)\r\n    LEFT JOIN chartb g ON (s.charid = g.parentid)\r\n),\r\npokemon_animes(animeid) AS (\r\n  VALUES( 230 )\r\n  UNION\r\n  SELECT s.next FROM seqtb s, pokemon_animes p WHERE p.animeid = s.aid\r\n),\r\nall_characters AS (\r\n  SELECT DISTINCT c.charid FROM charanimereltb c, pokemon_animes a WHERE c.aid = a.animeid\r\n)\r\nSELECT DISTINCT a.charid as not_related\r\nFROM all_characters a\r\nLEFT JOIN related_to_satoshi r USING(charid)\r\nWHERE r.charid IS NULL\r\nORDER BY a.charid;<\/pre>\n<p>The editors were thrilled and this has started to become a new report for helping clean up the system! \u00c2\u00a0 Finally we could catch them all! \u00c2\u00a0 PostgreSQL was chosen as the database for AniDB because of its reliability and ability to do these tough reports. \u00c2\u00a0It&#8217;s not a decision we&#8217;ve ever regretted.<\/p>\n<p><em>\u00c2\u00a0&#8220;Meowth! That&#8217;s right!&#8221;<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the side projects I work on outside of my job is a Japanese anime site called AniDB. It was put together by some German guys quite some time ago with Perl, PostgreSQL, and some old donated hardware\/bandwidth from an ISP. They were kind\/foolish enough to let me help out with the site about [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18,21,20],"tags":[],"class_list":["post-891","post","type-post","status-publish","format-standard","hentry","category-anime","category-postgresql","category-work"],"_links":{"self":[{"href":"https:\/\/www.phishie.com\/wordpress\/wp-json\/wp\/v2\/posts\/891","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.phishie.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.phishie.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.phishie.com\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.phishie.com\/wordpress\/wp-json\/wp\/v2\/comments?post=891"}],"version-history":[{"count":35,"href":"https:\/\/www.phishie.com\/wordpress\/wp-json\/wp\/v2\/posts\/891\/revisions"}],"predecessor-version":[{"id":930,"href":"https:\/\/www.phishie.com\/wordpress\/wp-json\/wp\/v2\/posts\/891\/revisions\/930"}],"wp:attachment":[{"href":"https:\/\/www.phishie.com\/wordpress\/wp-json\/wp\/v2\/media?parent=891"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.phishie.com\/wordpress\/wp-json\/wp\/v2\/categories?post=891"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.phishie.com\/wordpress\/wp-json\/wp\/v2\/tags?post=891"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}