Besides pandas and seaborn, I would also like to give you a brief introduction to matplotlib, another important Python library for data visualization. The coding example below will get you started. The full matplotlib documentation is available here: https://matplotlib.org/index.html
```python
# importing matplotlib and matplotlib.pyplot
import matplotlib
import matplotlib.pyplot as plt

# importing pandas and numpy since we want to visualize data stored in data frames
import pandas
import numpy

# import a data set and show the header
data_df = pandas.read_csv("oica.csv", sep=",")
data_df.head()
```
| | year | country | output |
|---|---|---|---|
| 0 | 2018 | Argentina | 466649 |
| 1 | 2018 | Austria | 164900 |
| 2 | 2018 | Belgium | 308493 |
| 3 | 2018 | Brazil | 2879809 |
| 4 | 2018 | Canada | 2020840 |
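The plotting code in this post indexes the column as `"output "`, with a trailing space, which suggests that the CSV header carries stray whitespace (an assumption, since the file itself is not shown). Below is a minimal sketch of how such labels could be checked and cleaned. It uses a small in-memory frame built from the five rows above rather than `oica.csv`, and the names `sample_df` and `clean_df` are made up for this illustration.

```python
import pandas

# stand-in for the first rows of oica.csv; the trailing space in "output "
# is an assumption inferred from how the column is indexed in this post
sample_df = pandas.DataFrame({
    "year": [2018, 2018, 2018, 2018, 2018],
    "country": ["Argentina", "Austria", "Belgium", "Brazil", "Canada"],
    "output ": [466649, 164900, 308493, 2879809, 2020840],
})

# repr() makes stray whitespace in column labels visible
print([repr(name) for name in sample_df.columns])

# strip leading and trailing whitespace from every column label
clean_df = sample_df.rename(columns=str.strip)
print(list(clean_df.columns))
```

If the labels were cleaned this way, the column would be addressed as `"output"` instead of `"output "`.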
```python
# step 1: create the surface, i.e. the figure; this allows e.g. setting the size
# the figure is like a page; a figure can contain multiple plots, i.e. axes
plt.figure(figsize=(10,10))
```
<Figure size 720x720 with 0 Axes>
```python
# .subplots() returns the figure and the axes;
# the axes are the base coordinates that you plot on;
# as stated, a figure can contain multiple axes
plt.figure(figsize=(10,10))
plt.subplots()
```
(<Figure size 432x288 with 1 Axes>, <matplotlib.axes._subplots.AxesSubplot at 0x257b791bb08>)
<Figure size 720x720 with 0 Axes>
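The output above lists two figures: the 720x720 one from `plt.figure()`, which has no axes, and a separate default-sized one created by `plt.subplots()`. Below is a minimal sketch of creating the figure and its axes in a single call instead, by passing `figsize` straight to `plt.subplots()`. The Agg backend line is only there so that the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# one call creates the figure and a single axes on it
fig, ax = plt.subplots(figsize=(10, 10))

# inspect what was created
print(fig.get_size_inches())  # figure size in inches
print(len(fig.axes))          # number of axes attached to the figure
```

Capturing `fig` and `ax` here also provides the handlers that later plotting calls can use.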
```python
# step 2: plot the sorted output values as a line with dot markers;
# plt.plot() draws onto the axes of the current figure
plt.figure(figsize=(10,10))
plt.plot(numpy.sort(data_df["output "]), marker="o", markersize=2)
```
[<matplotlib.lines.Line2D at 0x257b6180108>]
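`plt.plot()` with a marker still draws a line through the points. If only the dots are wanted, `scatter()` is the usual call. Below is a minimal sketch that uses the five output values from the table above as stand-in data instead of the full `oica.csv` column; the names `outputs`, `positions` and `points` are made up for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy

# stand-in data: the five output values shown in the table, sorted ascending
outputs = numpy.sort([466649, 164900, 308493, 2879809, 2020840])
positions = numpy.arange(len(outputs))

# create figure and axes, then draw unconnected dots
fig, ax = plt.subplots(figsize=(10, 10))
points = ax.scatter(positions, outputs, s=20)

# label the axes
ax.set_xlabel("data point no.")
ax.set_ylabel("annual production output figure")
```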
```python
# step 3: add title and axis labels;
# -- set figure size
plt.figure(figsize=(20,10))
# -- create plot
plt.plot(numpy.sort(data_df["output "]),
         marker="o",
         markersize=6,
         linewidth=2,
         linestyle="--",
         color="orange")
# -- set title
plt.title("Automotive industry annual production output figures",
          fontdict={"fontname":"Times New Roman", "fontsize":32})
# -- assign xlabel
plt.xlabel("data point no.",
           fontdict={"fontname":"Comic Sans MS", "fontsize":18, "color":"red"})
# -- assign ylabel
plt.ylabel("annual production output figure", fontsize=18, color="green")
# -- adjust xticks
plt.xticks(size=16, color="purple")
```
(array([-200., 0., 200., 400., 600., 800., 1000.]), <a list of 7 Text xticklabel objects>)
```python
# a more structured way of working with matplotlib is to work with reference handlers
# -- set up some data vectors to be plotted
y1 = [1,2,3.3,5.1,7]
y2 = [2,4,5,5.5,5.75]
x = range(0,len(y1))
# -- create an empty picture (i.e. a figure); capture a handler
fig = plt.figure()  # fig indicates that this is a "figure"
# -- create a subplot on the empty picture, i.e. the empty figure; capture a handler
ax = plt.subplot()  # ax indicates that this is an "axes"; the axes is basically the graph
# -- create line plots on the axes, using the axes handler reference
ax.plot(x, y1, label='y1 = series 1, growing fast', color="black")
ax.plot(x, y2, label='y2 = series 2, growing slowly', color="grey")
# -- add a title to the axes, using the axes handler reference
ax.set_title('Comparison of two time series', fontsize=18, color="green")
# -- add x and y axis labels, using the axes handler reference
ax.set_xlabel("x axis values", fontsize=14, color="red")
ax.set_ylabel("y axis values", fontsize=14, color="purple")
# -- add legend, by default within plot frame
ax.legend(fontsize=10)
# -- add a grid; the switch is passed positionally because the keyword
# -- "b" was renamed to "visible" in newer matplotlib versions
ax.grid(True, color="blue", alpha=0.1)
# -- show everything plotted in this section up until this point
plt.show()
```
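Since a figure can contain multiple axes, the two series could also be drawn side by side. Below is a minimal sketch, assuming the same `y1` and `y2` vectors as above. With `nrows=1, ncols=2`, `plt.subplots()` returns an array holding one axes handler per panel.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# same data vectors as in the example above
y1 = [1, 2, 3.3, 5.1, 7]
y2 = [2, 4, 5, 5.5, 5.75]
x = range(len(y1))

# one figure with two axes next to each other, sharing the y axis
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 5), sharey=True)

# each axes handler is addressed individually
axes[0].plot(x, y1, color="black")
axes[0].set_title("series 1, growing fast")
axes[1].plot(x, y2, color="grey")
axes[1].set_title("series 2, growing slowly")

# settings common to both panels
for ax in axes:
    ax.set_xlabel("x axis values")
    ax.grid(True, color="blue", alpha=0.1)
axes[0].set_ylabel("y axis values")

# reduce overlap between the panels
fig.tight_layout()
```

Sharing the y axis keeps the two growth patterns directly comparable.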
```python
# let's now look at some additional examples;
# e.g. we can make histograms using matplotlib
# -- importing random to create some random numbers
import random
# -- use randint() from random to create some random integers
x = []
for i in range(0,100):
    x.append(random.randint(a=0,b=100))
# -- create a figure
fig = plt.figure(figsize=(10,10))
# -- add axes to figure
ax = plt.subplot()
# -- add histogram to axes, using axes object handler
ax.hist(x, bins=20, histtype="bar", color="pink")
# -- add title to histogram, using axes object handler
ax.set_title("a histogram, created with matplotlib in Python",
             fontsize=22, color="darkgreen")
# -- add labels to x and y axis, using axes object handler
ax.set_xlabel("observation value range", fontsize=16, color="darkgreen")
ax.set_ylabel("absolute frequency", fontsize=16, color="darkgreen")
# -- adjust x and y tick labels, using axes object handler
# -- also: adjust x and y axis ticks themselves, using axes object handler
ax.tick_params(axis="x", size=12, width=5, color="blue",
               labelsize=20, labelcolor="red")
ax.tick_params(axis="y", size=12, width=5, color="blue")
# -- show everything plotted in this section, up until this point
plt.show()
```
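`ax.hist()` also returns the bin counts and bin edges, which is handy for checking what was actually drawn. Below is a minimal sketch that swaps the `random.randint()` loop for NumPy's random generator (a substitution, not the approach used above). The seed value is arbitrary and only makes the run repeatable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy

# 100 random integers between 0 and 100; "high" is exclusive, hence 101
rng = numpy.random.default_rng(seed=42)
x = rng.integers(low=0, high=101, size=100)

# create figure and axes, then draw the histogram and keep its return values
fig, ax = plt.subplots(figsize=(10, 10))
counts, bin_edges, patches = ax.hist(x, bins=20, color="pink")

# the counts add up to the number of observations;
# there is always one more bin edge than there are bins
print(counts.sum())
print(len(bin_edges))
```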
```python
# another example: 3D contour plot with matplotlib in Python
# credit: https://stackoverflow.com/questions/3810865/matplotlib-unknown-projection-3d-error
# -- first, some data to plot
x = [1,2,3]
y = [1,2,3]
z = [[1,2,3],
     [1,2,3],
     [1,2,3]]
# -- create figure, using pyplot
fig = plt.figure(figsize=(10,10))
# -- create 3D axes on the figure;
# -- the import registers the 3D projection on older matplotlib versions
from mpl_toolkits.mplot3d import axes3d, Axes3D
ax = fig.add_subplot(projection="3d")
# -- create contour plot, extended into 3D
ax.contour(x,y,z,extend3d=True)
```
<matplotlib.contour.QuadContourSet at 0x257b7899f48>
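The example above draws contour lines. For a filled surface, `plot_surface()` on a 3D axes is one option. It expects 2D coordinate arrays, which `numpy.meshgrid()` can build from the 1D `x` and `y` lists. Below is a minimal sketch, assuming a matplotlib version in which `fig.add_subplot(projection="3d")` is available. The `Z` values are illustrative (z = x + y) rather than the data used above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older versions

# 1D coordinates, expanded into 2D grids
x = [1, 2, 3]
y = [1, 2, 3]
X, Y = numpy.meshgrid(x, y)

# illustrative height values: z = x + y at every grid point
Z = X + Y

# create the figure and a 3D axes on it
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(projection="3d")

# draw the surface and label the axes
surface = ax.plot_surface(X, Y, Z, cmap="viridis")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
```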