Aoimirai - Kpop tools Full Disclaimers

As with any data mining, aggregation and statistics, there are plenty of disclaimers about how data are acquired and treated, as well as intentional limitations and characteristics of the system. As much as I would like to have every single detail available up front, it would make the site pretty unreadable with so much fine print and so many disclaimers everywhere, so this page contains all of them.

Artist database

Artists are added manually as they become famous, have significant sales or MV views, or when someone reports a missing one by mail or inside the account system. General data is gathered from Wikipedia and fansites. If you find anything wrong with an artist's data, such as name, debut video, social media and so on, mail me. I do not monitor new profiles and accounts for artists (other than those posted in r/kpop), so with time I will mostly have outdated information about their social media. Original data was gathered in December 2018, with a review of Instagram accounts in April 2019.

Some small (usually only one MV) sub-units or soloists are not added separately and are instead listed inside the main group for simplicity's sake. The biggest example of this is LOOΠΔ and its pre-debut sub-units. Small collabs between artists are also often not added, because there are too many combinations that have just one song and it wouldn't make much sense to add each as a whole new artist; the same applies when an artist is listed only as "featured". Exceptions are usually made when one of the collaborating artists is noteworthy enough that counting the collaboration's views/sales toward that artist makes sense. An example is Soyou, with her many collaborations.

As a side note, while we do not allow videos that are not hosted on official channels, some debut videos that were released before Korean studios started posting their releases on YouTube, and which were never officially posted, might be linked to an unofficial account if that is the only available video. The Debut system also uses a date correction in case the date of the debut video is not the release date (it was posted on YouTube later, so the date on YouTube is not when the video was originally released).

It's worth mentioning that Korean artists who move out of Korea and start a solo career in a foreign country, in a foreign language, will not have that work counted towards their Kpop profile. A good example is Tiffany after she left SM: she moved to the US, signed with a US label and releases America-oriented English-language music, which is therefore not Kpop. Here is a non-exhaustive list of artists that are NOT included on this site:

  • Tiffany Young (after moving to America)
  • Kris Wu (after moving to China)
  • Z-Girls (not K-pop; self-declared "international pop" that just happens to be based in Korea)
  • Z-Boys (same)
  • Jackson Wang (China)

YouTube data (views and likes)

Videos are added manually when they are released, when a new artist is added to the database, or when they are reported as missing. Since I maintain this site alone, with a very few kind souls using the account system to help, new releases might take a few days to be registered. Also, please allow a few days for me to catch up on reports and mails if you send one. Videos with very low view counts are added depending on their significance (debuts, only available video for an artist, etc.) but might be left out; the same goes for alternate versions.

The reason we do not add "Mirrored" videos (unless one is the only available dance practice and is on an official channel) is that fans use these videos to learn and practice the choreography and often replay them a lot for that reason, which gives such videos a huge view count for different reasons than usual MVs or even choreography videos.

Views are gathered by the main cron bot, which crawls YouTube every 15 seconds to gather views and likes. Since there are about 4000 videos to fetch and the bot averages 180 crawls per hour, it currently cycles through the whole database in about one day. Priority videos (recent releases, high views/day), however, get updated more frequently (every 3 hours).
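
The "one day" estimate follows from simple arithmetic, which the sketch below reproduces. Only the figures (about 4000 videos, 180 crawls per hour, a 3-hour refresh for priority videos) come from the text above; the function names and the `is_due` scheduling rule are assumptions for illustration, not the bot's actual code.

```python
from datetime import datetime, timedelta

# Figures taken from the text above; everything else is hypothetical.
VIDEO_COUNT = 4000
CRAWLS_PER_HOUR = 180
PRIORITY_INTERVAL = timedelta(hours=3)


def full_cycle_hours(video_count: int, crawls_per_hour: float) -> float:
    """Hours needed to visit every video once at a steady crawl rate."""
    return video_count / crawls_per_hour


def is_due(last_crawled: datetime, now: datetime, priority: bool,
           normal_interval: timedelta) -> bool:
    """Assumed rule: a video is due once its refresh interval has elapsed."""
    interval = PRIORITY_INTERVAL if priority else normal_interval
    return now - last_crawled >= interval


cycle = full_cycle_hours(VIDEO_COUNT, CRAWLS_PER_HOUR)
print(f"Full cycle: about {cycle:.1f} hours")
```

At these figures a full pass takes a little over 22 hours, which is where "about one day" comes from.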

On the first day of each month, the bot prioritizes videos and artists on the top lists, to get the most accurate top list around 18:00 UTC for the Top History page.

Videos that are deleted get flagged and are never updated again. Keep in mind that sometimes a video is temporarily disabled but returns later, and since I cannot check for that automatically, there is a small possibility that a video marked as removed ends up returning. From time to time I check some of these videos to see if they were only temporarily disabled and are back up.

Note that the bot only reads the HTML file from YouTube to fetch views and likes. It does not touch any media files and therefore never starts a view. It currently does not use the YouTube API; it fetches the page over HTTP.

Sales

Sales are calculated as the sum of (hard to find) data from before 2011 and data from 2011 onward, when GAON started.

For sales PRIOR to 2011, dubbed the MIAK era (Music Industry Association of Korea), data are hard to find and usually require digging into archives and fan pages. Fans often update Wikipedia with some data, but few artist pages are fully accurate and up to date. The initial MIAK data was gathered from Wikipedia, with other sources used for some artists depending on how readily available and trustworthy they are. While no HANTEO data is used for the GAON period, there is a good chance some of the MIAK-era data comes from HANTEO. Whenever possible, how the MIAK-era data was gathered is displayed when you click "Sales composition" on an artist.

GAON uses a different method than HANTEO: GAON counts physical copies shipped through distribution lines, not sales to end users, while HANTEO uses real-time reports from stores about sales to end users. For instance, when a new debut happens, the studio will order a certain number of physical copies to be distributed to stores. Stores will then purchase these as they see fit (they might not purchase all printed copies). GAON counts the copies that were shipped to stores, regardless of whether they end up being sold or not. Because of this, as time goes by and stores return unsold items, GAON corrects its data accordingly. GAON is known to be an accurate source for older releases because of that, but since it does not have current sales, it is a terrible source for debuts and new releases. That is why TV shows, awards and such use HANTEO data for up-to-date sales. Sometimes GAON data is used for end-of-year awards.

Unfortunately, GAON is far from accurate, and in my experience they are very disorganized. Beyond their internal disorganization, a problem common to GAON and HANTEO (less so with GAON) is that fans often buy releases in bulk to try to influence shows and awards, and then RETURN those copies for a full refund. Both GAON and HANTEO then have to revise the sales down, so sales figures often decrease (this happens faster with HANTEO since it is real-time, but it happens about as much with GAON anyway). The problem is that GAON does not report these updates in real time, and on top of that, it only reports the top 100 sales each month, so small numbers that can add up to something meaningful month after month are not shown. The end-of-year report, which does contain these updates, also only shows the top 100, so we still miss a lot of adjustments. Therefore, even raw GAON data is not reliable.

GAON sales on this site are gathered by a bot that sums all sales from the GAON monthly charts, then uses the GAON yearly chart to detect and correct updates. This is the most accurate one can get from GAON data, but as explained above, it has plenty of limitations. Also, since we need the yearly report to correct the data, all sales for the current year should be taken with a grain of salt. They could be incorrect, and we need to wait for the yearly report to correct them.
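
As a rough illustration of the "sum the monthly charts, then correct with the yearly chart" idea, here is a minimal sketch. It assumes the correction rule is simply that the yearly figure replaces the monthly sum whenever a release appears in the yearly chart. The actual rule the bot uses is not described on this page, and the function name, data layout and numbers are all made up.

```python
from typing import Dict, List


def corrected_sales(monthly_charts: List[Dict[str, int]],
                    yearly_chart: Dict[str, int]) -> Dict[str, int]:
    """Sum monthly sales per release, then apply yearly figures where present."""
    totals: Dict[str, int] = {}
    for chart in monthly_charts:
        for release, sold in chart.items():
            totals[release] = totals.get(release, 0) + sold

    # Assumed correction: the yearly report already includes returns and
    # other updates, so it overrides the monthly sum for listed releases.
    for release, sold in yearly_chart.items():
        totals[release] = sold
    return totals


# Hypothetical data: each dict is one monthly top-100 chart.
monthly = [
    {"Album A": 1000, "Album B": 200},
    {"Album A": 500},
]
yearly = {"Album A": 1400}
print(corrected_sales(monthly, yearly))
```

Here "Album A" has a monthly sum of 1500 that the yearly chart corrects to 1400 (for example after returns), while "Album B", which is missing from the yearly top 100, keeps its uncorrected monthly sum of 200. That second case is exactly the limitation described above.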

Only physical copies sold in Korea are counted.

Followers

Because I do not follow social media accounts closely, I do not suggest you trust the follower numbers on this site blindly. Artists with all their social media registered are fine, because the bot gathers their followers every week. The problem arises with artists that are missing some accounts in the database, since those accounts are not added to the follower totals. Also note that many artists choose not to use every platform (for instance, Taeyeon only uses Instagram), some have no social media at all, and some are not allowed to by their publishers, because their posts go out on the publisher's account instead (for instance, some SM artists don't have their own social media, and updates are posted on SM's accounts).

Since social media counts are so unreliable, I removed the focus on them from the site, even though they are still accessible: go to the artist page and click an artist to see the expanded information, where you will find follower totals and links to the social media accounts.

The social media follower bot is slower than the YouTube bot and takes about a week to update every artist.

Other tools

The "This day in Kpop" feature uses the date videos were published, usually in Korean time. The "This week" feature uses the week number a video was updated, taking into consideration that the first and last weeks of a year can overlap with the adjacent year (so the first week of 2018 contains the last week of 2017 and so on).
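
To show how a week can straddle two years, here is a small sketch using ISO 8601 week numbering from Python's standard library. That the site uses ISO weeks is an assumption (it may number weeks differently), and the helper name `week_key` is hypothetical.

```python
from datetime import date
from typing import Tuple


def week_key(day: date) -> Tuple[int, int]:
    """Return (ISO year, ISO week) for a date; the ISO year can differ
    from the calendar year near January 1st."""
    iso = day.isocalendar()
    return iso[0], iso[1]


print(week_key(date(2018, 12, 31)))  # (2019, 1): a Monday counted in 2019's first week
print(week_key(date(2017, 1, 1)))    # (2016, 52): a Sunday still in 2016's last week
```

Grouping videos by this pair rather than by calendar year plus week number keeps days from the same week together even when they fall on both sides of New Year.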

While debut videos are marked, note that not all debuts had videos, so some artists might not have debuts marked on the site. Also, most artists that debuted before 2005 don't have an MV available (some do, but it's rare). Debut videos that were posted at a later date (officially or by fans) have a "corrected" date, so they appear as a Debut on the correct historical date, not on the date the video was posted (but the video date is retained for average views/day).
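
The two-dates idea above can be sketched as follows. This is only an illustration under assumptions: the class, field and method names are hypothetical, and the dates and view count are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Video:
    views: int
    posted: date                              # date the video went up on YouTube
    original_release: Optional[date] = None   # the "corrected" date, if any

    def history_date(self) -> date:
        """Date used for debut/history listings: corrected date if set."""
        return self.original_release or self.posted

    def views_per_day(self, today: date) -> float:
        """Average views per day, always measured from the posting date."""
        days_online = max((today - self.posted).days, 1)
        return self.views / days_online


# Hypothetical debut from 2003 whose video was only uploaded in 2010.
debut = Video(views=1000, posted=date(2010, 1, 1),
              original_release=date(2003, 5, 1))
print(debut.history_date())                   # 2003-05-01
print(debut.views_per_day(date(2010, 1, 11))) # 100.0
```

Keeping both dates means the debut shows up in the right year while the views/day average is not diluted by years in which the video was not online.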

The Top 30 history has been gathered automatically on the first day of every month, around 18:00 UTC, ever since May 2017 (for a short period it might show doubled because of an auto-backup). Due to a bug in the data gathering, Artist totals were incorrect and were thus discarded up to May 2018. For that reason, for top MVs we have data starting May 2017, and for top Artists only from May 2018.

Supporting the site

This site is totally non-profit. The only sources of money were ads and tipping through PayPal. The ads were small and non-intrusive, and as of December 2018 they were removed since they were totally useless: in 2 years I did not reach the US$100 payment threshold across all my sites put together. While the tip links appear on all my sites (I run 2 domains, and both have more than just Kpop), I have yet to receive more than US$50 in donations. The server costs about US$15 per month, so I pay to keep the site up even after adding up all donations and ad revenue. I also often ask the community for help (usually via Twitter) to improve the site or the database, but it is extremely rare for someone to help and a lot more common for people to nitpick problems, so I don't get any non-monetary help to keep this up either. Therefore, I ask for some patience regarding missing artists, MVs or features, since this is a hobby site maintained only by me, and the majority of people are all talk but no help.


Donations/tips this year: $3. 2018 donations: $36. Yearly server cost: $180.