Sunday, December 23, 2018

PyPlot - Best Line Fit


It's a pretty common need: take a scatter plot of data points and generate a 'best fit line' through them.



#!/usr/bin/python
import random
import matplotlib.pyplot as plt

def run():
  # Build 500 (x, y) points: y = 3x plus a random offset in [0, 200].
  L = []
  for x in range(0, 500):
    y = random.randint(0, 200) + x*3
    L.append((float(x), float(y)))

  # Plot all points with one scatter call; set labels and title once.
  plt.scatter([p[0] for p in L], [p[1] for p in L])
  plt.xlabel('X')
  plt.ylabel('Y')
  plt.title('My Title')

  plt.show()

#---main---
run()

The Python snippet above generates a scatter plot of 500 data points. Each point sits on the line y=3x plus a random offset dY between 0 and 200, so the cloud is centered on roughly y=3x+100.  Take a peek at the scatter plot, and it becomes clear that it follows a linear trend.


Generating a best-fit line is done by:
1) splitting the (x,y) tuples into a list of X values and a list of Y values.
2) plotting a best-fit line using the list of X and list of Y values.
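
As a minimal sketch of step 1, the tuples can also be split with zip instead of two list comprehensions. The sample points below are made up for illustration; the names Lx and Ly match the ones used in the plot command.

```python
# Hypothetical sample data; the post's script builds a similar list called L.
points = [(0, 5), (1, 9), (2, 11), (3, 16)]

# zip(*points) transposes the list of pairs into one tuple of X values
# and one tuple of Y values.
xs, ys = zip(*points)
Lx = list(xs)
Ly = list(ys)

print(Lx)  # [0, 1, 2, 3]
print(Ly)  # [5, 9, 11, 16]
```

Either approach yields the same two lists, so this is mostly a matter of taste.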

This is done with the following plot command, which fits a degree-1 polynomial (a straight line) to the data and draws it in red ('r') with a line width ('lw') of 5:

  plt.plot(np.unique(Lx), np.poly1d(np.polyfit(Lx, Ly, 1))(np.unique(Lx)), lw=5, color='r')
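
To see what that one-liner computes, here is a rough sketch that breaks it into separate steps without plotting. It assumes noise-free data on y=3x+100 (chosen only so the result is easy to check), and the names slope, intercept, line, x_sorted and y_fit are illustrative, not part of the original script.

```python
import numpy as np

# Noise-free points on y = 3x + 100, chosen so the fit is easy to verify.
Lx = np.arange(0, 10, dtype=float)
Ly = 3.0 * Lx + 100.0

# polyfit with degree 1 returns the least-squares coefficients,
# highest power first: [slope, intercept].
slope, intercept = np.polyfit(Lx, Ly, 1)

# poly1d wraps the coefficients in a callable polynomial.
line = np.poly1d([slope, intercept])

# np.unique returns the sorted distinct X values, so the fitted line
# is evaluated (and would be drawn) from left to right.
x_sorted = np.unique(Lx)
y_fit = line(x_sorted)

print(slope, intercept)
```

With the random offsets used in the script, the fitted slope should land close to 3 and the intercept somewhere near 100, though the exact values vary from run to run.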

The final script looks like this:


#!/usr/bin/python
import random
import matplotlib.pyplot as plt
import numpy as np

def run():
  # Build 500 (x, y) points: y = 3x plus a random offset in [0, 200].
  L = []
  for x in range(0, 500):
    y = random.randint(0, 200) + x*3
    L.append((float(x), float(y)))

  # Split the tuples into a list of X values and a list of Y values.
  Lx = [p[0] for p in L]
  Ly = [p[1] for p in L]

  # Plot all points with one scatter call; set labels and title once.
  plt.scatter(Lx, Ly)
  plt.xlabel('X')
  plt.ylabel('Y')
  plt.title('My Title')

  # Fit a degree-1 polynomial and draw it as a red line of width 5.
  plt.plot(np.unique(Lx), np.poly1d(np.polyfit(Lx, Ly, 1))(np.unique(Lx)), lw=5, color='r')

  plt.show()

#---main---
run()





Cheers.
