tables2graphs has useful examples including R code, but there's a simpler way. There's an R package for (almost) everything, and (of course) you'll find one to produce coefficient plots. Actually, there are several.
The one I end up using most is the coefplot function in the arm package. It handles most common models out of the box; for those it doesn't, you can simply supply the coefficients yourself. Here's the code for the coefficient plot shown. The first two lines are just there to fetch the data, in case you're interested in full replication.
The default in arm is a vertical layout, so coefplot(m1) works wonderfully. I often prefer the horizontal layout, which is easily done with vertical=FALSE; I also add custom margins so that the variable…
View original post 361 more words
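The vertical and horizontal layouts described above can be sketched roughly as follows. This is a minimal example, not the original post's code: the model m1 and the margin values here are illustrative assumptions.

```r
library(arm)

# Illustrative model -- the original post's data is not reproduced here
m1 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Default vertical layout
coefplot(m1)

# Horizontal layout; widen the left margin so variable names fit
par(mar = c(5, 8, 4, 2))   # illustrative margin values
coefplot(m1, vertical = FALSE)
```

The par(mar = ...) call is one way to make room for the labels; the exact values depend on the length of your variable names.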
In content mining, data mining, and social network analysis, and as part of data science work in general, mining social media is practically mandatory. In this blog entry I will describe how to crawl conversations and content from Twitter using the R language. An explanation of R can be found on its Wikipedia page. R is built by crowdsourcing: many scientists and programmers contribute specialized modules that extend the language's functionality.
One interesting package/library/module is twitteR. It was built to access the Twitter API, letting us perform operations such as viewing profiles, listing friends and followers, searching by keyword, and more. The operation I perform most often is keyword search, after which I do data mining, sentiment analysis, or social network analysis.
The first steps that need to be taken are…
View original post 204 more words
I wanted yet another opportunity to get to use the fabulous caret package, but also to finally give plot.ly a try. To scratch both itches, I dipped into the UCI machine learning library yet again and came up with a survey data set on the topic of contraceptive choice in Indonesia. This was an interesting opportunity for me to learn about a far-off place while practicing some fun data skills.
According to recent estimates, Indonesia is home to some 250 million people and, thanks to government intervention over the years, has seen its fertility rate fall from well over 5 births per woman to a current value of under 2.4. Despite this slowdown, Indonesia is not generating enough jobs to satisfy its population. Almost a fifth of its youth labour force (aged 15-24) is unemployed (19.6%), a whole 6.3% more than a recent…
View original post 1,514 more words
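A minimal sketch of loading that UCI data set and fitting a cross-validated model with caret might look like this. The URL, column names (taken from the data set's UCI description), and choice of rpart as the model are assumptions for illustration, not the original post's code.

```r
library(caret)

# UCI "Contraceptive Method Choice" data set; URL and column names assumed
# from the UCI repository's description of the data
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/cmc/cmc.data"
cmc <- read.csv(url, header = FALSE,
                col.names = c("wife_age", "wife_edu", "husband_edu",
                              "children", "wife_religion", "wife_working",
                              "husband_occ", "living_std", "media_exposure",
                              "method"))
cmc$method <- factor(cmc$method,
                     labels = c("none", "long_term", "short_term"))

# A quick 5-fold cross-validated classification tree
set.seed(1)
fit <- train(method ~ ., data = cmc, method = "rpart",
             trControl = trainControl(method = "cv", number = 5))
fit
```

From here the fitted model's resampling results can be passed to any plotting library, plot.ly included.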
Apologies for the long delay in posts; I have been busy learning about Big Data/Hadoop/Spark infrastructure lately, which makes for a perfect introduction to this post. In my experience, data scientists who can create compelling big-data visualizations are in extreme demand (not that I consider myself particularly capable in either area at this point; I'm just getting started). Being the curious programmer that I am, I wanted to delve into this area a little more on my own. Knowing that I am a very goal-driven person, I created a little project to train myself on the new data visualization technologies, and as a bonus I get to help others who are interested as well. So along came…
Visual Black Box – https://github.com/PaulMontgomery/VisualBlackBox
VBB is an open-source (Apache 2 licensed), push-driven, real-time HTML5 data visualization tool (line graphs, bar charts, heat maps, US…
View original post 742 more words
With the benefit of smart electricity meters it's possible to obtain half-hourly data showing household consumption in kWh. I downloaded this dataset for my own house in CSV format from United Energy's EnergyEasy portal.
With some massaging, the data can be reshaped into a structure that makes aggregation easier. The excellent tool OpenRefine made this task easy, effectively unpivoting the half-hourly measures spread across many columns into a single column, so that the data looks like this:
Day,Interval,Quantity
2012-01-01,0000,0.05
2012-01-01,0030,0.05
2012-01-01,0100,0.044
2012-01-01,0130,0.05
2012-01-01,0200,0.044
[...]
2013-12-31,2130,0.025
2013-12-31,2200,0.019
2013-12-31,2230,0.025
2013-12-31,2300,0.025
2013-12-31,2330,0.025
- During which hours of the day does the highest average energy consumption occur? Is this different in summer vs. winter? Has this changed from 2012 to 2013?
View original post 180 more words
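The aggregation questions above can be sketched in base R, assuming the unpivoted data shown earlier is saved as "energy.csv" (the file name is an assumption, not from the post). Note that for Australia, summer is December-February and winter is June-August.

```r
# Read the unpivoted half-hourly data; Interval kept as character so
# leading zeros ("0030") survive
energy <- read.csv("energy.csv",
                   colClasses = c("Date", "character", "numeric"))

# Hour of day from the interval label (e.g. "2130" -> 21)
energy$Hour <- as.integer(substr(energy$Interval, 1, 2))
energy$Year <- format(energy$Day, "%Y")

# Southern-hemisphere seasons: summer = Dec-Feb, winter = Jun-Aug
m <- as.integer(format(energy$Day, "%m"))
energy$Season <- ifelse(m %in% c(12, 1, 2), "summer",
                 ifelse(m %in% 6:8, "winter", NA))

# Average consumption per hour, split by season and year
avg <- aggregate(Quantity ~ Hour + Season + Year,
                 data = energy, FUN = mean)
avg[which.max(avg$Quantity), ]   # hour/season/year with the peak average
```

Comparing the avg table across Year values answers the 2012-vs-2013 question directly.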