Introduction to Data Visualization

Bar Chart: A bar chart or bar graph presents categorical data with rectangular bars whose heights or lengths are proportional to the values they represent. The bars can be plotted vertically or horizontally. A vertical bar chart is sometimes called a column chart.

A bar graph is a nice way to display categorical data.
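As a rough illustration of the two orientations mentioned above, the sketch below draws the same values as vertical and as horizontal bars with Matplotlib. The category names and counts are made up for the example and are not from any real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical category counts, for illustration only
categories = ["apples", "bananas", "cherries"]
counts = [12, 7, 3]

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(8, 3))

# Vertical bars: the bar height is proportional to the value
vertical = ax_v.bar(categories, counts)
ax_v.set_ylabel("Count")

# Horizontal bars: the bar length is proportional to the value
horizontal = ax_h.barh(categories, counts)
ax_h.set_xlabel("Count")

fig.tight_layout()
```

Horizontal bars tend to be easier to read when the category labels are long.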

Below is an example of a bar chart:

[codesyntax lang=”python”]

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


[codesyntax lang=”python”]

# One bar per outcome; the probabilities 1/32 ... 1/32 sum to 1
prob = pd.DataFrame({
    'X_axis finite discrete random variables': ['x0=1', 'x1=5', 'x2=10', 'x3=10', 'x4=5', 'x5=1'],
    'col2': [1/32, 5/32, 10/32, 10/32, 5/32, 1/32],
})
ax = sns.barplot(x='X_axis finite discrete random variables', y='col2', data=prob)
ax.set_ylabel('Probability')

Histogram: A histogram is an accurate representation of the distribution of numerical data. It differs from a bar graph, in the sense that a bar graph relates two variables, but a histogram relates only one. To construct a histogram, the first step is to “bin” (or “bucket”) the range of values—that is, divide the entire range of values into a series of intervals—and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent, and are often (but are not required to be) of equal size.

A histogram is one of the best ways to display continuous data.
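The binning steps described above can be sketched with NumPy and Matplotlib. This is a minimal example under stated assumptions: the "ages" are synthetic values drawn from a normal distribution, and the choice of 10 equal-width bins is arbitrary.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt

# Synthetic "ages" drawn from a normal distribution, for illustration only
rng = np.random.default_rng(0)
ages = rng.normal(loc=35, scale=10, size=500)

# Step 1: divide the range of values into consecutive, equal-width bins
bin_edges = np.linspace(ages.min(), ages.max(), num=11)  # 11 edges give 10 bins

# Step 2: count how many values fall into each bin
counts, _ = np.histogram(ages, bins=bin_edges)

# Draw the histogram using the same bin edges
fig, ax = plt.subplots()
ax.hist(ages, bins=bin_edges, edgecolor="black")
ax.set_xlabel("Age")
ax.set_ylabel("Number of people")
```

Because the bins are adjacent and cover the whole range, every value lands in exactly one bin, so the counts add up to the sample size.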

Below is the pictorial representation for the histogram:

[codesyntax lang=”python”]

ax.set_ylabel("Number of people")


Box whisker plot: A box and whisker plot—also called a box plot—displays the five-number summary of a set of data. The five-number summary is the minimum, first quartile, median, third quartile, and maximum.

In a box plot, we draw a box from the first quartile to the third quartile. A line goes through the box at the median; it is vertical when the box is drawn horizontally and horizontal when the box is drawn vertically. The whiskers extend from each quartile to the minimum or maximum.

Box plots are non-parametric: they display variation in samples of a statistical population without making any assumptions of the underlying statistical distribution.
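To make the five-number summary concrete, here is a small sketch that computes it with NumPy and then draws the matching box plot with Matplotlib. The sample values are made up, and `whis=(0, 100)` is chosen here so that the whiskers reach the minimum and maximum as described above; Matplotlib's default whisker rule is different.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt

# Small made-up sample, for illustration only
data = np.array([2, 4, 4, 5, 7, 8, 9, 11, 12, 15, 18])

# Five-number summary: minimum, first quartile, median, third quartile, maximum
minimum, q1, median, q3, maximum = np.percentile(data, [0, 25, 50, 75, 100])
print(minimum, q1, median, q3, maximum)

fig, ax = plt.subplots()
# whis=(0, 100) extends the whiskers to the minimum and maximum,
# instead of Matplotlib's default of 1.5 times the interquartile range.
# Matplotlib draws the box vertically by default, so the median is a horizontal line.
box = ax.boxplot(data, whis=(0, 100))
ax.set_ylabel("Value")
```

With the default whisker setting, points beyond 1.5 times the interquartile range would be drawn as individual outliers rather than included in the whiskers.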

Here is an example of a box-and-whisker plot:

[codesyntax lang=”python”]



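Line plot: A line plot shows how data changes over an ordered sequence, such as time. When comparing two or more line plots, use the same scale, because different axis ranges can make similar trends look very different.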

A line chart allows us to track the development of several variables at the same time. It is best to use a line plot when comparing fewer than 25 numbers.
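The point about scale can be sketched as follows: two series are drawn on one set of axes with a single y-range so their trends are directly comparable. The month labels and both series are hypothetical values chosen for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt

# Made-up monthly prices for two products, for illustration only
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
product_a = [10, 11, 10, 12, 13, 12]
product_b = [14, 13, 15, 16, 15, 17]

fig, ax = plt.subplots()
line_a, = ax.plot(months, product_a, marker="o", label="Product A")
line_b, = ax.plot(months, product_b, marker="o", label="Product B")

# One shared y-axis starting at zero keeps the two trends comparable
ax.set_ylim(0, 20)
ax.set_xlabel("Month")
ax.set_ylabel("Price")
ax.legend()
```

Plotting each series on its own axes with automatically chosen limits would make the two lines look equally steep, even though their absolute levels differ.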

[codesyntax lang=”python”]

               'price':[10,11,10,12,13,12,12.5,17,15,16,17,15], \

# Draw both series on the same axes; do not rebind `plt`, which is the pyplot module
ax = sns.lineplot(x='month', y='price', data=sales_df, sort=False)
ax = sns.lineplot(x='month', y='sales', data=sales_df, sort=False, ax=ax)
ax.set(ylim=(0, 20))


Scatter plot: A scatter plot is a two-dimensional data visualization that uses dots to represent the values obtained for two different variables – one plotted along the x-axis and the other plotted along the y-axis.

Below is an example of a scatter plot:


[codesyntax lang=”python”]

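# Reproducible random data: 100 points with random x, y, and color values
rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)

# Color each point by its `colors` value using the viridis colormap
plt.scatter(x, y, c=colors, alpha=0.9, cmap='viridis')
plt.colorbar()  # legend for the color scale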




Vamsi Krishna Yadav Chukka

