Day 6 of #100daysofcode #Dataengineering
Welcome back, my most valued reader! Just so you know, you are part of the reason I'm still pushing this hard. I'm doing my best to keep you updated on how my day went as far as coding is concerned. Yes, I know you are eager to find out what I covered today.
Today is Friday, and my Fridays are always super busy, but I won't let that hold me back. I still managed to spend at least 3 hours today learning and practicing code. Today I focused mainly on grouping.
Grouping and Aggregate Functions
When I saw this topic, the first thing that came to mind was grouping in SQL, and I was quite right. In pandas, aggregation combines multiple pieces of data into a single result, and grouping lets you do that separately for each category.
By the end of this class I was genuinely happy. I was working with a large dataset of about 30 columns and 80k rows, and I ended the day with clean data. I also tried my SQL statements to see whether they work the same way. Yes, the same thing is possible in SQL, but I learned something new today.
I'm trying to keep this article to about a one-minute read, so I won't bug you much.
I learned and practiced how to use aggregate functions such as count and median. I saw that count returns the number of non-NaN rows, which basically means it counts only the rows that are not missing.
An example of count can be seen below.
Employee['salary'].count()
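To make the non-NaN point concrete, here is a minimal sketch on a tiny made-up table. The `employee` DataFrame and its values are assumptions for illustration only, not the dataset from the class.

```python
import numpy as np
import pandas as pd

# Hypothetical data: five rows, two of them with a missing salary
employee = pd.DataFrame({"salary": [50000, np.nan, 62000, np.nan, 58000]})

# count() only counts the rows that are not NaN
non_missing = employee["salary"].count()

# len() counts every row, missing or not
total_rows = len(employee)

# median() also skips NaN values by default
median_salary = employee["salary"].median()

print(non_missing, total_rows, median_salary)
```

Comparing `count()` with `len()` is a quick way to see how many values are missing in a column.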
I also learned how to use value_counts, which counts how often each distinct value appears in a column and can be applied to any survey data.
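Here is a small sketch of value_counts on survey-style answers. The `survey` DataFrame and the answers in it are made up for illustration; only the `SocialMedia` column name comes from this post.

```python
import pandas as pd

# Hypothetical survey answers; None stands for a skipped question
survey = pd.DataFrame({
    "SocialMedia": ["Reddit", "YouTube", "Reddit", None,
                    "Reddit", "YouTube", "WhatsApp"]
})

# How many times each answer appears (missing answers are left out by default)
counts = survey["SocialMedia"].value_counts()

# The same thing as a fraction of all non-missing answers
shares = survey["SocialMedia"].value_counts(normalize=True)

print(counts)
print(shares)
```

The result is sorted from the most common answer to the least common, which is why adding `.head(20)` gives a "top 20".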
I won't leave you in doubt about whether I practiced grouping itself. The answer is yes. Consider the code below and correct me on any errors.
One thing you need to know first is that a group-by operation in pandas involves some combination of splitting the object, applying a function, and combining the results. You can see this by reviewing the code below.
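The split, apply, combine idea can be sketched by doing the three steps by hand and then comparing with groupby. The `df` DataFrame, its columns, and its numbers are assumptions made up for this illustration.

```python
import pandas as pd

# Hypothetical data, not the real survey
df = pd.DataFrame({
    "country": ["India", "Nigeria", "India", "Nigeria", "India"],
    "salary": [40, 30, 60, 50, 80],
})

# The three steps done by hand
manual = {}
for country in df["country"].unique():
    subset = df[df["country"] == country]        # split: rows for one country
    manual[country] = subset["salary"].median()  # apply: one number per group
manual = pd.Series(manual)                       # combine: back into one result

# groupby does the same three steps in one line
grouped = df.groupby("country")["salary"].median()

print(manual)
print(grouped)
```

Both results hold one median salary per country, so groupby can be read as a shortcut for the loop.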
The code counts the respondents per country, groups the data by country using UnitedStates as the example group, and then views the first 20 rows of the social media counts.
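# Count how many respondents come from each country
developer['country'].value_counts()
# Group the rows by country (groupby is a method, so it needs parentheses)
developer_grp = developer.groupby('country')
# Pull out only the rows for one country
developer_grp.get_group('UnitedStates')
# The 20 most common answers in the SocialMedia column, across all countries
developer['SocialMedia'].value_counts().head(20)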
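If the goal is the social media counts for one country only, one possible way is to apply value_counts to the group instead of the whole table. This is just a sketch: the tiny `developer` DataFrame below is made up, and the `UnitedStates` label is assumed to match how the country is spelled in the data.

```python
import pandas as pd

# Hypothetical stand-in for the survey data
developer = pd.DataFrame({
    "country": ["UnitedStates", "India", "UnitedStates", "India", "UnitedStates"],
    "SocialMedia": ["Reddit", "WhatsApp", "Twitter", "WhatsApp", "Reddit"],
})

developer_grp = developer.groupby("country")

# Social media counts for a single country
us_counts = developer_grp.get_group("UnitedStates")["SocialMedia"].value_counts().head(20)

# Social media counts for every country at once (indexed by country, then answer)
per_country = developer_grp["SocialMedia"].value_counts()

print(us_counts)
print(per_country)
```

The second form saves calling get_group once per country when you want to compare several countries side by side.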