
Replacing MATLAB with Python - Part 3: Histograms and more data-indexing

I am sharing how I used Python to complete the second part of an example task originally done with MATLAB. Along the way, I learned more about slicing dataframes and making histograms with two different libraries.

Luis Medina

26 Sep 2022


Getting used to doing engineering or scientific computations in Python can be tricky, especially if you come from a MATLAB background. Some things turn out to be easier once you understand how Python works, while others seem to require a lot of additional work to get the same results as in MATLAB. That happened to me particularly when creating figures and plotting data: I struggled to obtain the exact same look as my MATLAB figures.

Speaking of replicating what we do in MATLAB, that is exactly what I’m doing in this series of posts about Replacing MATLAB with Python. The last few entries have centered on an example task that I first solved with MATLAB and am now trying to solve with Python instead. To keep this post short, I’ll skip the context and what we are trying to achieve, since I’ve already covered that in the previous two posts.

I hope these kinds of examples are useful if you are also learning Python and trying to do similar things for school, work, or just as a hobby.

In the previous post, I completed the first part of the task: estimating the vehicle speeds and elapsed times of two drivers from raw data. We saw different examples of how to index (slice) pandas dataframes quickly to sort out our data. I also used both Pyplot and Plotly to create figures with subplots of time series. This time, I’ll show you how I completed the second part of the task, which is a useful exercise for practicing histograms. I will again try both libraries, since they are very different and I want to learn both of them.

Let's begin!

Creating histograms with Pyplot

I want to compare how much time the drivers spent at different speeds. A common way to do this is by making a histogram of their speeds.

As with MATLAB, I learned two different ways to do this using the “standard” libraries for Python. In this case, I'm calling Pyplot “standard” only because it is probably the first plotting library most people see and start using when coming from MATLAB. Pyplot is part of Matplotlib, and its syntax and logic are quite similar to MATLAB's.

I will start by creating a figure with two subplots and use each subplot to illustrate a different approach. This is the same thing I did with MATLAB, and I want to highlight the parallelism.

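import numpy as np
import matplotlib.pyplot as plt

# One row, two columns; 3000 x 1000 pixels at 300 dpi
fig, axs = plt.subplots(1, 2, sharex=True, figsize=(3000/300, 1000/300), dpi=300)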
fig.suptitle('Different types of histogram using Pyplot (driver speed comparison)', fontweight='bold')

# Define common bins to use
histbins = np.linspace(0, 100, num=20, endpoint=True)
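
A quick check on the bin definition above: np.linspace counts edges, not bins, so num=20 yields 19 bins that are each about 5.26 km/h wide. The sketch below only inspects the edges. The num=21 variant is my own suggestion for getting round 5 km/h bins, not something used in the original figure.

```python
import numpy as np

# Same call as above: 20 edges between 0 and 100 -> 19 bins
histbins = np.linspace(0, 100, num=20, endpoint=True)
n_bins = len(histbins) - 1
bin_width = histbins[1] - histbins[0]
print(n_bins, round(bin_width, 2))

# Hypothetical alternative: 21 edges -> 20 bins of 5 km/h each
histbins_5 = np.linspace(0, 100, num=21, endpoint=True)
print(len(histbins_5) - 1, histbins_5[1] - histbins_5[0])
```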

Now I will plot two histograms at once on the left side. To do this, I put the two dataframe columns in a list using [<data1>, <data2>] - remember from the previous posts that we have our data stored in two dataframes, df1 and df2. Then, knowing the order of the list, I can pass the colors and labels in that same order so each data series is formatted consistently.

# Left side: Side-by-side bars, not normalized
axs[0].set_title('Side-by-side bars', loc = 'center')
axs[0].grid(alpha= 0.3)
axs[0].hist([df1['Vehicle Speed'], df2['Vehicle Speed']], bins = histbins, 
                                                          density=False, 
                                                          histtype='bar', 
                                                          color=['blue', 'red'], 
                                                          label=['Driver 1', 'Driver 2'], 
                                                          edgecolor = 'black')
axs[0].set_ylabel('Counts')
axs[0].set_xlabel('Vehicle speed [km/h]')
axs[0].legend()

Notice that these two histograms are not normalized (density = False) and the counts are represented using side-by-side bars.

Next, on the right side, I will create two overlapping histograms, this time normalized. Here it’s especially important to use common bins; otherwise, the comparison will be skewed. This is why I defined histbins in advance. It is also good practice to add some transparency using the alpha argument, so that the areas where the histograms overlap remain visible.

# Right side: Overlapping and normalized histograms
axs[1].set_title('Overlapping histograms', loc = 'center')
axs[1].grid(alpha= 0.3)
axs[1].hist(df1['Vehicle Speed'], bins = histbins, density=True, 
                                  histtype='bar',  color='blue', 
                                  label='Driver 1', edgecolor = 'black', 
                                  alpha = 0.5)

axs[1].hist(df2['Vehicle Speed'], bins = histbins, density=True, 
                                  histtype='bar',  color='red', 
                                  label='Driver 2', edgecolor = 'black', 
                                  alpha = 0.5)
axs[1].set_ylabel('Probability Density Function (PDF)')
axs[1].set_xlabel('Vehicle speed [km/h]')
axs[1].legend()
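
To see what density=True does to the bar heights, here is a minimal sketch using np.histogram, which as far as I know applies the same normalization as Pyplot's hist: each count is divided by the total number of samples and by the bin width, so the bar areas add up to 1. The speed values are randomly generated stand-ins, not the drivers' data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up speeds in km/h, only for illustration
speeds = rng.uniform(0, 100, size=500)

histbins = np.linspace(0, 100, num=20, endpoint=True)
counts, edges = np.histogram(speeds, bins=histbins, density=False)
density, _ = np.histogram(speeds, bins=histbins, density=True)

widths = np.diff(edges)
# Manual normalization: counts / (number of samples * bin width)
manual = counts / (counts.sum() * widths)

print(np.allclose(manual, density))
print((density * widths).sum())
```

This is also why the normalized histograms of the two drivers are comparable even if the two dataframes have a different number of rows.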

The result is very close to the MATLAB figure:

As we’ve already discussed, the second driver was faster (spent more time at higher speeds).

The next thing we want to do is figure out at what speeds the two of them were braking, i.e. at which vehicle speed they started to press the brake pedal at corner entries. This gives us a better idea of why their speeds differ so much, since both of them were using the exact same car (at different moments, of course).

Slicing Pandas dataframes to identify brake pedal pressings

We need to find the speeds at the start of braking. With MATLAB, I first tried doing this with a for-loop (which took ages) and then showed how much faster it was to use indexing operations instead. Here, I will jump straight to the indexing part.

As in the MATLAB example, I first defined a low threshold value of pressure to identify the start of pedal presses.

p_start = 2

Then, I want to find the rows that satisfy these three conditions:

The current pressure is lower than or equal to the threshold
The pressure at the next instant is greater than the threshold
The vehicle is moving at or above a minimum speed (15 km/h here)

To compare the two instants, I made copies of the columns containing the brake pressures and shifted them by one row upwards. This way I have both the current and the next pressure values in the same rows. Then I deleted the last row of each dataframe, since it would contain an empty value due to the column shift.

df1['BrakesShifted'] = df1['BrakeFront'].shift(-1)
df1 = df1[:-1]

df2['BrakesShifted'] = df2['BrakeFront'].shift(-1)
df2 = df2[:-1]
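
Here is a small sketch of what the shift does, using a made-up pressure column (toy is a hypothetical dataframe, not the real data). It also uses .iloc[:-1].copy() instead of a plain slice, which is my own variation: as far as I understand, working on an explicit copy avoids the SettingWithCopyWarning that pandas can raise if you later add columns to the sliced dataframe.

```python
import pandas as pd

# Hypothetical brake pressures in bar
toy = pd.DataFrame({'BrakeFront': [0.5, 1.0, 3.0, 8.0, 1.5]})

# shift(-1) moves every value one row up, so each row also holds the next sample
toy['BrakesShifted'] = toy['BrakeFront'].shift(-1)
print(toy)

# The last row has no "next" value (NaN), so drop it; copy() makes the result independent
toy = toy.iloc[:-1].copy()
print(len(toy), toy['BrakesShifted'].isna().sum())
```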

Maybe an unorthodox trick? I don’t know, but it worked great. With that in place, I just needed to use the loc indexer to slice the dataframes using the three conditions mentioned before, and store the results in a new pair of dataframes:

Brake_start1 = df1.loc[(df1.BrakeFront <= p_start) & 
                       (df1.BrakesShifted > p_start) & 
                       (df1['Vehicle Speed'] >= 15) ]

Brake_start2 = df2.loc[(df2.BrakeFront <= p_start) &
                       (df2.BrakesShifted > p_start) & 
                       (df2['Vehicle Speed'] >= 15) ]
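
Since the same filter is applied to both drivers, it could be wrapped in a small function. This is only a sketch: find_brake_starts and its parameter names are mine, and the toy dataframe is made up; only the column names match the ones used above. The function keeps the shifted pressures in a temporary variable instead of a new column. Comparisons against NaN evaluate to False in pandas, so the last row drops out of the mask on its own here.

```python
import pandas as pd

def find_brake_starts(df, p_start=2, v_min=15):
    """Return rows where the brake pressure crosses p_start upwards (hypothetical helper)."""
    next_pressure = df['BrakeFront'].shift(-1)
    mask = (
        (df['BrakeFront'] <= p_start)       # current pressure at or below the threshold
        & (next_pressure > p_start)         # next sample above the threshold
        & (df['Vehicle Speed'] >= v_min)    # vehicle is actually moving
    )
    return df.loc[mask]

# Made-up data: two pressure rises, but only the first happens above 15 km/h
toy = pd.DataFrame({
    'Time_s':        [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    'Vehicle Speed': [60,  58,  50,  10,  9,   8],
    'BrakeFront':    [0.5, 1.0, 6.0, 1.0, 5.0, 0.5],
})
starts = find_brake_starts(toy)
print(starts)
```

With the real data, the two calls would then be find_brake_starts(df1) and find_brake_starts(df2).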

Next, I plotted the results to make sure they make sense. This time I will use Plotly to practice with a different library, whose approach is very different from Matplotlib's. Just remember from the previous post that I imported the library and the function for working with subplots using:

import plotly.graph_objects as go
from plotly.subplots import make_subplots

The first thing to do after importing the library is to create a figure. I had to specify in advance that the figure will have a secondary axis. To do this, I pass to the function make_subplots a specs argument consisting of a nested list of dictionaries: one inner list per row and one dictionary per subplot (in this case just one, indicating that the secondary y-axis is enabled). How would I know that? Very simple, I learned it from the documentation.

fig = make_subplots(specs=[[{"secondary_y": True}]])
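
As a sketch of how the specs argument generalizes, the snippet below builds it for a hypothetical 2 x 2 grid in plain Python. The grid size is my own example; the result would be passed as make_subplots(rows=2, cols=2, specs=specs).

```python
rows, cols = 2, 2  # hypothetical grid size, not used in this post

# One inner list per row, one dict per subplot cell
specs = [[{"secondary_y": True} for _ in range(cols)] for _ in range(rows)]
print(specs)

# The single-plot case used in this post is just the 1 x 1 version
single = [[{"secondary_y": True}]]
```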

This created a figure, and now I can add traces for the vehicle speed and brake pressure. I want to overlap the two of them, using a common x-axis (time base). To do this, I added two independent traces, one for each variable, but the data for the x-axis is the same for both.

# Add line plots (go.Scatter with mode='lines'; go.Line is deprecated in recent Plotly versions)
fig.add_trace(
    go.Scatter(y=df1['Vehicle Speed'], x=df1.Time_s, mode='lines', name='Vehicle Speed'),
    secondary_y=False)
fig.update_yaxes(title_text="Speed [km/h]", range=[0, 80], secondary_y=False)

fig.add_trace(
    go.Scatter(y=df1['BrakeFront'], x=df1.Time_s, mode='lines', name='Brakes Pressure'),
    secondary_y=True)
fig.update_yaxes(title_text="Pressure [bar]", range=[0, 50], secondary_y=True, color='red')

Notice that I specified the option secondary_y=True for the second set of data (in this case the pressure). In MATLAB, I would have issued a yyaxis right command just before creating the second line plot. The procedure here is simply different.

Next, I wanted to add vertical lines to indicate the instants that I identified as starts of braking maneuvers. If the pressure was rising suddenly starting from those points, and the vehicle speed was plausible for a braking maneuver, then the identified points make sense.

To make the lines, I had to resort to a for-loop this time:

for time in Brake_start1.Time_s:
    fig.add_shape(go.layout.Shape(type = "line",
                                    yref = "y", xref = "x",
                                    x0 = time, y0=0,
                                    x1 = time, y1 = 100))

Shame on me for not having a better method. This won’t let me sleep at night until I figure out how to do it more efficiently. Or maybe not, we’ll see.
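
One possible way to avoid calling add_shape once per line is to build all the shape definitions first and hand them to the layout in a single call. The sketch below only builds the list of plain dictionaries, with the same properties as in the loop above. brake_times holds made-up values standing in for Brake_start1.Time_s, and the final update_layout call is left as a comment because it needs the figure created earlier.

```python
# Hypothetical braking instants in seconds (stand-in for Brake_start1.Time_s)
brake_times = [703.4, 712.9, 725.1, 741.6]

# One vertical line per instant, described with the same keys used in add_shape
shapes = [
    dict(type="line", xref="x", yref="y", x0=t, x1=t, y0=0, y1=100)
    for t in brake_times
]
print(len(shapes))

# With the figure from above, all lines could then be set at once:
# fig.update_layout(shapes=shapes)
```

There is still a loop hiding in the list comprehension, but the figure is updated only once instead of once per line.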

To finish, I added some formatting with the following code:

fig.update_xaxes(title_text="Time [s]", range = [700, 750] )
fig.update_layout(title = {
                            'text': '<b>Identified starts of braking maneuvers</b>', 
                            'xanchor' : 'center',
                            'x' : 0.5, 'y': .95, 'font_size' : 12
                            },
                showlegend=False)

fig

The result looks quite good! And the interactive plot can be used in websites, like this one 😃

Creating histograms with Plotly

OK! The points identified as starts of braking maneuvers make sense. Now it’s time to use Plotly to make some histograms.

I created a figure and defined the common bins to use. Notice that the bins are defined using a dictionary:

#Create figure
fig = go.Figure()

#Define common bins
bins = dict(start= 10,
            end=  100,
            size= 5)
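
If I read the xbins settings correctly, start, end and size describe evenly spaced bin edges. Under that assumption, the same edges can be written out with NumPy, which is handy for reusing identical bins in Pyplot. plotly_bin_edges is a hypothetical helper name, not a Plotly function.

```python
import numpy as np

bins = dict(start=10, end=100, size=5)

def plotly_bin_edges(xbins):
    """Evenly spaced edges from start to end (inclusive), assuming size divides the range."""
    n_bins = round((xbins["end"] - xbins["start"]) / xbins["size"])
    return xbins["start"] + xbins["size"] * np.arange(n_bins + 1)

edges = plotly_bin_edges(bins)
print(edges[0], edges[-1], len(edges) - 1)
```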

Then, I added the traces and formatted the figure:

fig.add_trace(go.Histogram(x=Brake_start1['Vehicle Speed'],
                           xbins = bins, 
                           histnorm='probability density', 
                           opacity=0.9,
                           name = 'Driver 1'))

fig.add_trace(go.Histogram(x=Brake_start2['Vehicle Speed'],
                           xbins = bins, 
                           histnorm='probability density',
                           opacity=0.7,
                           name = 'Driver 2'))                            

fig.update_layout(
    title_text='<b>Vehicle speeds at the start of braking</b>', 
    xaxis_title_text='Vehicle speed [km/h]', 
    yaxis_title_text='Probability Density', 
    bargap=0.2,)

fig.show()

The histograms don’t lie! Again, but this time with Python, we can see that the first driver was braking earlier than the second one. Although they both reached speeds of 80 km/h, it seems that the first driver never made a braking maneuver above 75 km/h. Instead, driver 1 made a significantly greater number of braking maneuvers between 65 and 70 km/h - the peak for driver 2 is around 75 - 80 km/h instead.

Conclusion

Just like that, we have completed the last part of the original post I wrote about doing this with MATLAB. This was a very simplified or superficial analysis, but I think it was a great toy example for me to practice and learn more Python with some specific tasks. I had to filter data, index dataframes, plot time-series data and create different types of histograms. I even used figures with subplots and secondary axes.

I’m starting to embrace the fact that Python is meant to be used in a different way, and things are actually easier now that I focus on learning how to use the language to complete a task, rather than replicating exactly what I did using MATLAB, especially when it comes to plotting data. If I get the same look, then it’s fine. If I don’t, but the results are decent, I don’t stress too much about formatting - unless it is strictly required.

I really enjoyed making this series of posts, and I hope you have found some useful things here. The Replacing MATLAB with Python series does not finish here, though. I am just making some space for other articles I have prepared, and I will certainly add more posts to this series in the future, perhaps more closely related to the original subject of this blog, which is actually making stuff. We’ll see.

Thank you for reading this and stay tuned for the next one!

Cheers!

The Pythonic way of doing things

One of the most popular things about Python is its readability. When the language is being used properly, the code is more efficient, and it almost reads like plain English. The best way to achieve this is to follow a set of guidelines for coding style, which are known as the “Pythonic” way. I will try to learn this and apply it to my code as I go. I think it makes you think more in terms of code efficiency and make better use of the language.