Some insights from Malaysia Youth Data Bank System

I have just finished exploring some of the data available on the Malaysia Youth Data Bank System (“ydata”). I mentioned yesterday that I wanted to analyse the education data on the website, but the data available is boring and doesn’t give me many valuable insights. All I get is the total number of students by type of degree and by university (IPTA only) from 2015 to 2017. Assuming the data is correct, the key takeaways are:

  1. From 2015 to 2017, the total number of students at IPTAs remained relatively constant, averaging about 500,000. Almost 2/3 of them were taking a degree (~300,000), followed by diploma students (almost 20%). So the focus should really be on degree and diploma students; the rest (such as masters, PhD and professional courses) are relatively insignificant.
  2. Zooming into the degree students, almost a quarter of them studied at UiTM (i.e. slightly above 70,000 students per year), while the remaining students (~230,000 per year) are spread fairly evenly across the other IPTAs, with an average of 13,000 per university. So it seems that UiTM has been the preferred choice for a degree. That makes sense because it is our country’s largest higher learning institution in terms of size and population, with over 12 state campuses. The chart below shows the 2017 numbers.
Total number of students in Malaysia taking degree courses in IPTA in 2017
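As a quick sanity check on the shares quoted above, here is a minimal Python sketch. It uses only the rounded figures from this post (not the raw ydata numbers), and the variable names are my own.

```python
# Rounded figures quoted in the post; these are approximations, not raw ydata values.
total_students = 500_000   # average IPTA enrolment per year, 2015 to 2017
degree_students = 300_000  # students taking a degree
uitm_degree = 70_000       # degree students at UiTM (slightly above this)
other_avg = 13_000         # average degree students at each of the other IPTAs

# Share of all IPTA students who are taking a degree.
degree_share = degree_students / total_students

# Share of degree students who are at UiTM.
uitm_share = uitm_degree / degree_students

# Degree students left for the other IPTAs, and how many universities
# the quoted 13,000 average would imply.
remaining = degree_students - uitm_degree
implied_other_unis = remaining / other_avg

print(f"Degree share of all students: {degree_share:.0%}")
print(f"UiTM share of degree students: {uitm_share:.1%}")
print(f"Remaining degree students: {remaining:,}")
print(f"Implied number of other IPTAs: {implied_other_unis:.1f}")
```

With rounded inputs the results are only indicative, but they are a cheap way to check that the quoted shares and averages are consistent with each other.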

There is not much more I can gather from the data unless I have access to the performance of the students from various courses, at least from the most populated universities such as UiTM, UUM, USM and UM. The only reliable data I can refer to is the QS World University Rankings, which show UM as the top university in Malaysia, while UiTM, the most populated university, comes in at number 9 (source:

Anyway, the data in other categories such as social activities, media and technology are more intriguing to play with.

For the media and technology category, the data they have is the internet penetration rate for the 15-40 age group by gender (the whole website is in BM and it took me a while to translate that into English; I guess I need to improve my BM). You can see it here. The good thing about ydata is that they are trying to use Tableau to perform the analysis, and in fact the web-based software is available on the website if you want to explore the data yourself; it is as easy as ticking and un-ticking boxes. The problem is that they got so excited that they don’t always use the right chart for the right data. For example, if you have static data from different groups in different years, a bar chart is the best way to see the comparison, not a line graph.

I reproduced the chart from the media and technology category using Power BI, as below.

Internet penetration rate in Malaysia by gender

The trend is pretty much static over the years, with the average internet penetration rate for men and women being 45% and 35% respectively. By right, the total penetration rate should increase over time (an average of 80% is considered high, but there is definitely room to grow; Brunei is currently the Southeast Asian country with the highest internet penetration rate, at 95% as of June 2018). More importantly, the percentage of women using the internet should increase further. We need to encourage more women (especially the marginalized) to have access to the internet (or at least a smartphone) and learn how to use it to generate income and achieve financial independence. We can already see the trend on Instagram, with many of them selling products (mostly food) online. As Melinda Gates expressed in the Bill & Melinda Gates 2019 Annual Letter, “mobile phones are most powerful in the hands of poorest women”.
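To put a number on the gender gap, here is a small Python sketch. It assumes, as the paragraph above does, that the 45% and 35% averages add up to the 80% total; the variable names are only illustrative.

```python
# Average penetration rates quoted in the post (15-40 age group).
men_rate = 0.45
women_rate = 0.35

# Assumption: the two rates sum to the overall rate, as the post treats them.
total_rate = men_rate + women_rate

# Gap between men and women, in percentage points.
gap_points = round((men_rate - women_rate) * 100, 1)

# Women's share of the overall penetration figure.
women_share = women_rate / total_rate

print(f"Total penetration rate: {total_rate:.0%}")
print(f"Gender gap: {gap_points} percentage points")
print(f"Women's share of the total: {women_share:.1%}")
```

Tracking the gap in percentage points year by year would show whether women's access is catching up, which the static averages alone cannot tell us.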

Finally, some insights I gathered by eyeballing the data from social activities category are as follows:

  1. The total number of drug addicts increased significantly, from 15,765 in 2013 to 25,922 in 2017.
  2. Penang had the highest number of drug addicts from 2013 to 2017, followed by Johor. I find this bizarre; I thought Penang and JB were among the top places with a better standard of living. I guess that comes with a price, i.e. a higher cost of living, and hence some of the youth struggle to cope.
  3. The 16-20 age group, i.e. teenagers, experiences the highest number of accidents. Maybe an effect of having just gotten their driving licences?
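The increase in the first point can be made more concrete with a little arithmetic. Below is a minimal Python sketch using the two figures quoted above; the compound annual growth rate is my own addition and assumes smooth growth across the four year-on-year steps between 2013 and 2017, which the actual yearly data may not follow.

```python
# Figures quoted in the post.
addicts_2013 = 15_765
addicts_2017 = 25_922
periods = 2017 - 2013  # four year-on-year steps

# Overall percentage change from 2013 to 2017.
pct_change = (addicts_2017 - addicts_2013) / addicts_2013

# Compound annual growth rate, assuming steady growth each year
# (an illustrative assumption, not something the data confirms).
cagr = (addicts_2017 / addicts_2013) ** (1 / periods) - 1

print(f"Change 2013 to 2017: {pct_change:.1%}")
print(f"Implied average growth per year: {cagr:.1%}")
```

Expressing the change as a percentage makes it easier to compare against population growth or against other states than the raw counts do.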

I could go on and on, but that’s it for today. The whole point of this exercise is to show that collecting quality data and then analyzing it is extremely important if we want to better understand any type of issue (education, socioeconomic, etc.) and make better decisions and policies. It is something I would encourage the youth to do more of. And this is just basic, descriptive analytics.



