Aoimirai - Kpop tools Full Disclaimers

As with any data mining, aggregation and statistics, there are plenty of disclaimers about how data are acquired and treated, as well as intentional limitations or characteristics of the system. As much as I would like to make every single detail available up front, that would make the site pretty unreadable, with fine print and disclaimers everywhere, so this page collects all of them.

Artist database

Artists are added manually as they become famous, have significant sales or MV views, or when someone reports a missing one by mail or Reddit. General data is gathered from Wikipedia and fansites. If you find anything wrong with an artist's data, such as name, debut video, social media accounts and so on, mail me. I do not monitor new profiles and accounts for artists, so over time much of my information about their social media will become outdated.

Some small sub-units or soloists (usually with only one MV) are not added and are instead listed inside the main group for simplicity's sake. The biggest example is LOOΠΔ and its pre-debut sub-units. Small collabs between artists are also often not added, because there are too many combinations that have just one song, and it wouldn't make much sense to add each as a whole new artist. Exceptions are usually made when one of the collaborating artists is noteworthy enough that counting the collaboration's views/sales towards that artist makes sense. An example is Soyou, with her many collaborations.

As a side note, while we do not allow videos not hosted by official channels, some debut videos that were released before Korean studios started posting their releases on YouTube, and which were never officially posted, might be linked to an unofficial account if that is the only available video. The Debut system also uses a date correction in case the date of the debut video is not the release date (it was posted later, so the date on YouTube is not when the video was originally released).

It's worth mentioning that releases by Korean artists who move out of Korea and start a solo career in a foreign country, in a foreign language, will not be counted towards their Kpop profile. A good example is Tiffany after she left SM: she moved to the US, signed with a US label and releases America-oriented English-language music, which is therefore not Kpop.

YouTube data (views and likes)

Videos are added manually when they are released, when a new artist is added to the database, or when they are reported as missing. Since I maintain this site alone, new releases might take a few days to be registered. Also, please allow a few days for me to catch up on reports and mails if you send one.

Views are gathered by the main cron bot, which crawls YouTube for views and likes almost every minute. Since there are about 3000 videos to fetch data for, and the bot averages 50~70 crawls per hour, it can take up to 3 days to update every video before the cycle starts again. However, priority videos (big views, recent releases, high views/day) get updated more frequently (some daily, recent ones every 12 hours), which increases the time to update every video to 4 or 5 days.
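The cycle times above follow from simple arithmetic, sketched here in Python. This is only an illustration and not the bot's actual code: the function name and the `priority_share` parameter (the fraction of crawls spent re-visiting priority videos) are assumptions made up for this example.

```python
def full_cycle_days(video_count: int, crawls_per_hour: float,
                    priority_share: float = 0.0) -> float:
    """Days needed to visit every video once.

    priority_share is a hypothetical knob: the fraction of crawl slots
    spent re-visiting priority videos instead of advancing the cycle.
    """
    if not 0.0 <= priority_share < 1.0:
        raise ValueError("priority_share must be in [0, 1)")
    # Only the crawls not used for priority re-visits advance the cycle.
    effective_rate = crawls_per_hour * (1.0 - priority_share)
    hours = video_count / effective_rate
    return hours / 24


if __name__ == "__main__":
    # Plain cycle at the slow and fast ends of the 50~70 crawls/hour range.
    for rate in (50, 70):
        print(rate, "crawls/hour:", round(full_cycle_days(3000, rate), 2), "days")
    # Same cycle if half of the crawls went to priority videos (assumed share).
    print("with priority:", round(full_cycle_days(3000, 50, priority_share=0.5), 2), "days")
```

At 50 crawls per hour the plain cycle is 2.5 days, and at 70 per hour about 1.8 days, which is where the "up to 3 days" comes from. If roughly half of the crawls went to priority videos, the slow end would stretch to 5 days.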

On the first day of the month, the bot prioritizes videos and artists on the top lists, so that the top list is as accurate as possible around 18:00 UTC for the Top History page.

Videos that are deleted get flagged and are never updated again. Note that sometimes a video is temporarily disabled but returns later, and since I cannot check for that automatically, there is a small possibility that a video marked as removed ends up returning. From time to time I check some of these videos to see if they were only temporarily disabled and are back up.

Note that the bot only reads the HTML page from YouTube to fetch views and likes. It never touches the media files and therefore never starts a view.

Follower data

A second bot runs every 20 minutes to gather followers. It checks Facebook, Twitter, YouTube channels, YouTube users, vLive profiles and Instagram to get the follower count on each platform for each artist (on each run, it checks all those sites for a single artist). Since we have about 300 artists, it can take up to a week for all follower data to be updated. Profiles that are closed are removed from the update cycle.
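A one-artist-per-run cycle like this can be pictured as picking whoever has waited longest. The sketch below is an assumption about how such a rotation could work, not the bot's real code: `ArtistProfile`, `next_artist` and `full_cycle` are hypothetical names, and the closed flag is tracked per artist here for brevity rather than per platform profile.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ArtistProfile:
    # Hypothetical record: one entry per artist in the follower cycle.
    name: str
    last_checked: datetime
    closed: bool = False  # closed entries are skipped by the cycle


def next_artist(profiles: List[ArtistProfile]) -> Optional[ArtistProfile]:
    """Return the open entry that has gone longest without a check."""
    open_profiles = [p for p in profiles if not p.closed]
    if not open_profiles:
        return None
    return min(open_profiles, key=lambda p: p.last_checked)


def full_cycle(artist_count: int, minutes_per_run: int = 20) -> timedelta:
    """Time to visit every artist once when each run handles one artist."""
    return timedelta(minutes=artist_count * minutes_per_run)


if __name__ == "__main__":
    print(full_cycle(300))
```

With 300 artists and one run every 20 minutes, a full pass takes 6000 minutes, or a little over 4 days, if no run is skipped or delayed.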

Please note that Facebook rounds the number of followers to the nearest thousand. Also note that some artists don't have their own social media account and instead use the label's account or, in the case of soloists, the account of the group they belong to. For that reason, some high-profile artists might not have many followers.

Also note that I do not follow up when new accounts for artists are created, and therefore might be missing some. The only trustworthy ones are the vLive accounts, which are unique per artist and don't change. Again, some soloists don't have a vLive account and post their solo activities on the label's or main group's account.

Sales

Sales are calculated as the sum of two sources: data from before 2011 (which are hard to find) and GAON data from 2011 onwards, when GAON started.

For sales PRIOR to 2011, dubbed the MIAK era (Music Industry Association of Korea), data are hard to find and usually require digging into archives and fan pages. Fans often update Wikipedia with some data, but few artist pages are fully accurate and up to date. The initial MIAK data was gathered from Wikipedia, with other sources used for some artists depending on how readily available and trustworthy they are. While no HANTEO data is used for the GAON period, there is a big chance some of the MIAK era data comes from HANTEO. Whenever possible, how the MIAK era data were gathered is displayed when you click "Sales composition" on an artist.

GAON uses a different method than HANTEO: GAON counts physical copies shipped through distribution lines, not sales to end users, while HANTEO uses real-time reports from stores about sales to end users. For instance, when a new debut happens, the studio will order a certain number of physical copies to be distributed to stores. Stores will then purchase these as they see fit (they might not purchase all printed copies). GAON counts the copies that were shipped to stores, regardless of whether they end up being sold. Because of this, as time goes by and stores return unsold items, GAON corrects its data accordingly. GAON is known to be an accurate source for older releases because of that, but since it does not have current sales, it is a terrible source for debuts and new releases. TV shows, awards and the like therefore use HANTEO data for up-to-date sales. Sometimes GAON data is used for end-of-year awards.

Unfortunately, GAON is far from accurate, and in my experience they are very disorganized. Besides this internal disorganization, a problem common to both GAON and HANTEO (less so with GAON) is that fans often buy releases in bulk to try to influence shows and awards, and then RETURN those copies for a full refund. Both GAON and HANTEO then have to revise their numbers, so reported sales often go down (this happens faster with HANTEO since it's real-time, but it happens about the same with GAON anyway). The problem is that GAON does not report these updates in real time, and on top of that, it only reports the top 100 sales each month, so small numbers that can add up to something meaningful month after month are not shown. Their end-of-year report, which does contain these updates, also only shows the top 100, so we still miss a lot of adjustments. Therefore, even raw GAON data is not reliable.

GAON sales on this site are gathered by a bot that sums all monthly sales from GAON, then uses the GAON yearly report to detect and correct updates. This is the most accurate figure one can get from GAON data, but as explained above, it has plenty of limitations. Also, since we need the yearly report to correct the data, all sales for the current year should be taken with a grain of salt: they could be incorrect until the yearly report is available to correct them.
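One way to picture the monthly-then-yearly approach is the Python sketch below. It assumes the simplest possible correction rule, namely that a yearly figure, when one exists for a release, replaces that year's monthly sum. The real bot may reconcile the two differently. The function name and all numbers are made up for illustration.

```python
from typing import Dict, List, Tuple


def corrected_gaon_sales(monthly: Dict[int, List[int]],
                         yearly: Dict[int, int]) -> Tuple[int, List[int]]:
    """Total sales for one release, plus the years that are still provisional.

    monthly maps a year to the monthly chart figures found for the release;
    yearly maps a year to the figure from the yearly chart, if it appeared there.
    """
    total = 0
    provisional_years = []
    for year in sorted(set(monthly) | set(yearly)):
        if year in yearly:
            # Assumed rule: the yearly figure already includes returns
            # and other updates, so it overrides the monthly sum.
            total += yearly[year]
        else:
            # No yearly figure (yet): fall back to the raw monthly sum.
            total += sum(monthly[year])
            provisional_years.append(year)
    return total, provisional_years


if __name__ == "__main__":
    # Made-up numbers: 2017 has a yearly correction, 2018 does not yet.
    monthly = {2017: [12000, 3000, 1500], 2018: [8000, 2000]}
    yearly = {2017: 15800}
    print(corrected_gaon_sales(monthly, yearly))
```

In this made-up case the 2017 monthly sum of 16500 is replaced by the yearly 15800, while 2018 keeps its monthly sum of 10000 and is reported as provisional.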

Only physical copies sold in Korea are counted.

Other tools

The API and Database Download are provided as is. Data are acquired directly from the database. The API should be used for dynamic data (like views), while the database download is better for getting all data at once. At the moment, the download system is available on request.

The "This day in Kpop" feature uses the date videos were published, usually in Korean time. The "This week" feature uses the week number in which a video was updated, taking into consideration that the first and last weeks of a year can overlap with the adjacent year (so the first week of 2018 contains the last week of 2017 and so on).
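Both effects, the shift to Korean time and a week that straddles two years, can be seen in a short Python sketch. It assumes ISO 8601 week numbering and a fixed UTC+9 offset for Korean time. The site may number weeks differently, so treat this as an illustration only; `korean_day_and_week` is a hypothetical helper name.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Tuple

# Korea Standard Time is a fixed UTC+9 offset (no daylight saving).
KST = timezone(timedelta(hours=9))


def korean_day_and_week(published_utc: datetime) -> Tuple[date, int, int]:
    """Return (Korean calendar date, ISO week-year, ISO week number).

    published_utc must be a timezone-aware datetime.
    """
    local = published_utc.astimezone(KST)
    iso = local.isocalendar()  # (ISO year, ISO week, ISO weekday)
    return local.date(), iso[0], iso[1]


if __name__ == "__main__":
    # 16:00 UTC on 30 Dec 2019 is already 31 Dec in Korea, and under ISO
    # rules that day belongs to the first week of 2020.
    print(korean_day_and_week(datetime(2019, 12, 30, 16, 0, tzinfo=timezone.utc)))
```

The week-year is returned separately from the calendar year because, near New Year, the two can differ: the last days of December can fall in week 1 of the following year.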

While debut videos are marked, note that not all debuts had videos, so some artists might not have debuts marked on the site. Also, most artists that debuted before 2005 don't have an MV available (some do, but it's rare). Debut videos that were posted at a later date (officially or by fans) have a "corrected" date so that, as a Debut, they appear on the correct historical date and not on the date the video was posted (but the video date is retained for average views/day).
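The two dates can be thought of as separate fields with separate uses, as in this Python sketch. The class and field names are hypothetical and the figures are made up. It only illustrates the rule described above: the corrected date is used for the debut, the posting date for averages.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DebutVideo:
    # Hypothetical record for a video marked as a debut.
    posted_on: date                        # when the video went up on YouTube
    views: int
    corrected_debut: Optional[date] = None  # set only if posted after the real debut

    def debut_date(self) -> date:
        """Date shown for the debut: the corrected one when available."""
        return self.corrected_debut or self.posted_on

    def views_per_day(self, today: date) -> float:
        """Average views per day, always measured from the posting date."""
        days_online = max((today - self.posted_on).days, 1)
        return self.views / days_online


if __name__ == "__main__":
    # Made-up example: a 2000 debut whose video was only posted in 2010.
    video = DebutVideo(posted_on=date(2010, 1, 1), views=365000,
                       corrected_debut=date(2000, 1, 1))
    print(video.debut_date(), video.views_per_day(date(2011, 1, 1)))
```

Measuring the average from the posting date keeps old debuts from looking artificially slow: the video in the example has been collecting views for one year, not eleven.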

The Top 30 history has been gathered automatically on the first day of every month, around 18:00 UTC, ever since May 2017 (for a short period it might show doubled because of an auto-backup). Due to a bug in the data gathering, Artist totals were incorrect and thus discarded up to May 2018. For that reason, we have data for top MVs starting May 2017, and for top Artists only from May 2018.

Supporting the site

This site is totally non-profit, and the only ways it makes money are ads (which are small, non-intrusive and, as of December 2018, totally useless: it's been 2 years and I have not reached the US$100 threshold for payment) and tipping via PayPal. While these appear across the whole site (I run 2 domains, and both have more than just Kpop), I have yet to receive more than US$50 in donations. The cost of running the server is about US$20 per month, so I pay to keep the site up even if I add up all donations and ad revenue. I also often request help from the community (usually via Twitter) to improve the site or the database, but it is extremely rare for someone to help, and a lot more common for people to nitpick problems, so I also don't get any non-monetary help to keep this up. Therefore, I ask for some patience regarding missing artists, MVs or features, since this is a hobby site maintained only by me.

