fn_babs_groupby_multiindex

Practical Data Analysis in Python - Grouping and Aggregating Data

Grouping and querying data

Now we will slice and dice our dataset using grouping, aggregation functions, summary statistics, and various querying and indexing operations. To really understand the bike sharing data, we need to look at natural groupings such as starting landmark, subscription type, and the time of day of the start of a bicycle trip. The work done in this section will set up some beautiful and informative plots in the next section of the analysis. Most of the tables below have plotting representations in the visualization section.

Grouping and aggregating or reducing data and obtaining summary measures and statistics is the heart of data analysis and work together with the visualization process including those of an exploratory nature. The iterative process of data discovery at best leads to new questions and insights being discovered which inform and refine the data carpentry process. This is where the interactive repl excels. Think of it as a circular inquistive process of data munging, grouping and aggregating, and visualizing. We will explore using SQL inspired methods of grouping operations in Pandas and the use of mult-indexes and hierarchial indexing.

In [2]:
import numpy as np
import pandas as pd
from datetime import datetime, time
In [3]:
%matplotlib inline
import seaborn as sns
from ggplot import *
In [8]:
print pd.__version__
print np.__version__
0.14.1
1.9.0

In [4]:
%load_ext ipycache

Grouping and aggregating

It's time to explore various ways of grouping and aggregating data. Let's review some terminology. Dockcount refers to the number of docks installed, subscription type is either customer or subscriber, and landmark refers to an area covered by San Francisco, San Jose, Redwood City, or Palo Alto.

In the data munging notebook, we created a column indicating whether the duration of a segment of a bicycle trip was under thirty minutes or over thirty minutes as this is the alloted time before being charged overtime fees. So, what percentage of total trips were over thirty minutes?

What percentage of total trips were over thirty minutes?

In [88]:
dmerge4.thirtymin.value_counts()
Out[88]:
in_thirty      134912
over_thirty      9103
dtype: int64

Only 6% of total rides had a duration over thirty minutes so it looks like people are good about returning bikes on time. Or, were they? When we look at the statistics based on landmark and subscriber type, we will get a better picture of who and where have longer duration times.

How does average duration vary by time of day for trips both under thirty minutes and over thirty minutes?

In [6]:
dmerge4.groupby(['thirtymin','timeofday'])[['duration_f']].mean()
Out[6]:
duration_f
thirtymin timeofday
in_thirty evening 9.736020
mid_day 10.387775
morning 8.914918
night 9.641814
wee_hours 10.150168
over_thirty evening 161.593282
mid_day 164.873060
morning 218.527759
night 271.402983
wee_hours 345.829388

Here is another way to check on how many observations out of the total had trip durations over thirty minutes

In [94]:
len(dmerge4[dmerge4['thirtymin']=='over_thirty'])
Out[94]:
9103

How many trips began and ended at the same station?

Only 6878 out 144015 observations or 4.8% were round trips so it would seem that people are really using the bicycles to get from point A to point B

In [96]:
len(dmerge4[dmerge4['startstation']== dmerge4['endstation']])
Out[96]:
6878

Let's look at average duration by subscription type. Customers are defined has having either a twenty-four hour or three day pass while Subscribers are defined as having an annual pass. Average durations for subscribers is almost ten minutes while it is sixty minutes for customers. Perhaps customers are recreational riders who take longer trips while subscribers are shorter distance commuters. We'll learn more when we slice and dice the data later.

What is average duration by subscription type?

In [97]:
dmerge4.groupby(['subscriptiontype']).agg({'duration_f' : np.mean})
Out[97]:
duration_f
subscriptiontype
Customer 60.491974
Subscriber 9.832834

How many observations in the dataset belong to each subscription type?

In [6]:
dmerge4.groupby('subscriptiontype').size()
Out[6]:
subscriptiontype
Customer             30368
Subscriber          113647
dtype: int64

What landmark occurs the most frequently among subscription type of subscribers?

In [98]:
dmerge4[dmerge4['subscriptiontype'] =="Subscriber"].landmark.value_counts()
Out[98]:
San Francisco    102735
San Jose           7219
Mountain View      2157
Palo Alto           905
Redwood City        631
dtype: int64

So, we've learned that the majority of total subscribers are in San Francisco and take very short trips.

What are the top five start stations?

In [99]:
dmerge4.groupby('startstation').duration.sum().order(ascending=False)[:5]
Out[99]:
startstation
Harry Bridges Plaza (Ferry Building)        12579544
Embarcadero at Sansome                      10294998
San Francisco Caltrain (Townsend at 4th)     8457338
Market at 4th                                7706843
Powell Street BART                           6100023
Name: duration, dtype: int64

What landmark has the highest total duration?

The table below indicates that San Francisco had the highest total duration but keep in mind that the majority of total observations are also in San Francisco so of course the total would be higher. Using summary statistics will give a better idea of bicycling behavior than looking at total values. Although, for some purposes, it is good to know the total as well. For example, bicycle wear and tear based on total durations.

In [104]:
dmerge4.groupby("landmark")['duration_i','subscriptiontype'].aggregate(np.sum)
Out[104]:
duration_i
landmark
Mountain View 121895
Palo Alto 131810
Redwood City 38924
San Francisco 2454177
San Jose 207658
In [5]:
%%cache station.pkl stationdata
stationdata

What percentage of total bicycle docks area located in each landmark?

54% of all docks are located in San Francisco so this may partially explain why total duration and bicycle trips taken in San Francisco are higher and why durations are higher in Palo Alto where only 6% of the docks are located.

In [10]:
stationdata.groupby('landmark')[['dockcount']].sum()/stationdata[['dockcount']].sum()
Out[10]:
dockcount
landmark
Mountain View 0.095823
Palo Alto 0.061425
Redwood City 0.094185
San Francisco 0.544636
San Jose 0.203931

What are the dockcounts by landmark and station name?

In [6]:
stationdata.groupby(['landmark','name'])[['dockcount']].sum()
Out[6]:
dockcount
landmark name
Mountain View Castro Street and El Camino Real 11
Evelyn Park and Ride 15
Mountain View Caltrain Station 23
Mountain View City Hall 15
Rengstorff Avenue / California Street 15
San Antonio Caltrain Station 23
San Antonio Shopping Center 15
Palo Alto California Ave Caltrain Station 15
Cowper at University 11
Palo Alto Caltrain Station 23
Park at Olive 15
University and Emerson 11
Redwood City Broadway at Main 15
Franklin at Maple 15
Mezes Park 15
Redwood City Caltrain Station 25
Redwood City Medical Center 15
Redwood City Public Library 15
San Mateo County Center 15
San Francisco 2nd at Folsom 19
2nd at South Park 15
2nd at Townsend 27
5th at Howard 15
Beale at Market 19
Broadway St at Battery St 15
Civic Center BART (7th at Market) 23
Clay at Battery 15
Commercial at Montgomery 15
Davis at Jackson 15
Embarcadero at Bryant 15
Market at Sansome 27
Mechanics Plaza (Market at Battery) 19
Post at Kearney 19
Powell Street BART 19
Powell at Post (Union Square) 19
San Francisco Caltrain (Townsend at 4th) 19
San Francisco Caltrain 2 (330 Townsend) 23
San Francisco City Hall 19
South Van Ness at Market 19
Spear at Folsom 19
Steuart at Market 23
Temporary Transbay Terminal (Howard at Beale) 23
Townsend at 7th 15
Washington at Kearney 15
Yerba Buena Center of the Arts (3rd @ Howard) 19
San Jose Adobe on Almaden 19
Arena Green / SAP Center 19
Japantown 15
MLK Library 19
Paseo de San Antonio 15
SJSU - San Salvador at 9th 15
SJSU 4th at San Carlos 19
San Jose City Hall 15
San Jose Civic Center 15
San Jose Diridon Caltrain Station 27
San Jose Government Center 15
San Pedro Square 15
San Salvador at 1st 15
Santa Clara at Almaden 11
St James Park 15

69 rows × 1 columns

What are the total number of bicycle trips by landmark?

When looking at number of bicycle trips based on landmark, we see that 129,853 or 90% of total observations in the dataset were in San Francisco. 6% of total observations were in San Jose while only 1% were in Palo Alto.

In [102]:
dmerge4.groupby("landmark")[['duration_i']].aggregate(np.size)
Out[102]:
duration_i
landmark
Mountain View 2728
Palo Alto 1706
Redwood City 793
San Francisco 129853
San Jose 8935

What are average durations by landmark?

Looking at average duration by landmark, San Francisco has the lowest duration followed by San Jose while Palo Alto had the highest duration. We can hypothesize that this may be partly due to fewer docks in Palo Alto or greater distances to travel as well as the subscription profile of Palo Alto bicycle users.

In [11]:
dmerge4.groupby("landmark")[['duration_i']].aggregate(np.mean)
Out[11]:
duration_i
landmark
Mountain View 44.682918
Palo Alto 77.262603
Redwood City 49.084489
San Francisco 18.899656
San Jose 23.240963

What are average duration and dockcount by landmark and time of day for first six months of the bike share system?

In [106]:
timelandmark = dmerge4.groupby(['timeofday','landmark']).agg({'dockcount' : np.mean, 'duration_f': np.mean,  }).reset_index()
timelandmark

timelandmark.dockcount.max()
Out[106]:
timeofday landmark duration_f dockcount
0 evening Mountain View 34.096655 18.079393
1 evening Palo Alto 66.580671 14.565495
2 evening Redwood City 50.191700 18.198381
3 evening San Francisco 16.224655 19.349130
4 evening San Jose 18.737437 17.385234
5 mid_day Mountain View 82.030353 18.465995
6 mid_day Palo Alto 109.684100 14.439331
7 mid_day Redwood City 76.925088 17.982456
8 mid_day San Francisco 27.520457 19.436506
9 mid_day San Jose 23.002496 16.776488
10 morning Mountain View 29.786291 19.715105
11 morning Palo Alto 52.318829 16.978308
12 morning Redwood City 22.378051 22.647059
13 morning San Francisco 15.172821 19.948991
14 morning San Jose 18.945759 19.782161
15 night Mountain View 126.402177 20.061224
16 night Palo Alto 83.482982 14.017544
17 night Redwood City 65.530455 20.000000
18 night San Francisco 19.055272 19.432370
19 night San Jose 35.876195 18.286119
20 wee_hours Mountain View 81.724706 21.117647
21 wee_hours Palo Alto 150.501852 16.185185
22 wee_hours Redwood City 13.585000 20.000000
23 wee_hours San Francisco 47.574147 18.926380
24 wee_hours San Jose 127.363475 17.474576

Where and at when was the highest average dockcount?

The highest dockcount was in Redwood City in the morning.

In [107]:
timelandmark.dockcount.max()
Out[107]:
22.647058823529413

What are the top five longest average duration grouped by time of day and landmark?

The highest average duration was in Palo Alto in the wee hours.

In [113]:
timelandmark.duration_f.order(ascending=False)[:5]
Out[113]:
21    150.501852
24    127.363475
15    126.402177
6     109.684100
16     83.482982
Name: duration_f, dtype: float64

What are the average dockcount and durations by time of day and landmark for the first six months of the bike share system?

Pandas produces nice readable multi-index tables.

In [33]:
np.round(dmerge4.groupby(['timeofday','landmark']).agg({'dockcount' : np.mean, 'duration_i': np.mean,  }))
Out[33]:
dockcount duration_i
timeofday landmark
evening Mountain View 18 34
Palo Alto 15 67
Redwood City 18 50
San Francisco 19 16
San Jose 17 19
mid_day Mountain View 18 82
Palo Alto 14 110
Redwood City 18 77
San Francisco 19 28
San Jose 17 23
morning Mountain View 20 30
Palo Alto 17 52
Redwood City 23 22
San Francisco 20 15
San Jose 20 19
night Mountain View 20 126
Palo Alto 14 83
Redwood City 20 66
San Francisco 19 19
San Jose 18 36
wee_hours Mountain View 21 82
Palo Alto 16 151
Redwood City 20 14
San Francisco 19 48
San Jose 17 127

Grouping and aggregating time-series data

Let's obtain mean values by day of the month by creating a grouped object from the day component of the timestamp and applying an aggregation function. We want to know if there is higher demand or trip duration at certain days of the month.

In [8]:
day_means = dmerge4.groupby('day').aggregate(np.mean)
day_means
Out[8]:
tripid duration startterminal endterminal bike max_temperature_f mean_temperature_f min_temperaturef max_dew_point_f meandew_point_f station_id lat long dockcount hour month hourdiff durationminutes duration_i duration_f
day
1 73878.288738 1816.236080 56.575208 56.547493 432.526833 69.336105 60.620559 51.125976 51.413706 46.602923 56.575208 37.743179 -122.356093 19.177879 13.309146 8.684051 0.312169 30.270601 30.271353 30.270668
2 80351.443674 1273.880168 57.369151 57.382025 433.812825 66.287447 58.723942 50.482298 51.430800 47.488240 57.369151 37.747079 -122.360427 19.446150 13.231245 8.443179 0.175043 21.231336 21.227532 21.231300
3 98238.235528 1135.100522 57.273354 57.356113 436.655799 63.523302 56.361338 49.060397 47.898433 42.093626 57.273354 37.743680 -122.357780 19.405016 13.084222 7.262696 0.238662 18.918342 18.918704 18.918305
4 94721.030403 1101.818939 57.161696 57.186871 437.853021 64.680093 56.829783 48.094694 44.157242 39.132843 57.161696 37.742673 -122.356006 19.465918 13.127808 8.324942 0.201588 18.363649 18.365414 18.363654
5 95326.050161 1173.435091 56.902045 56.801292 432.594833 65.471259 56.496878 47.178902 44.220883 37.509580 56.902045 37.742166 -122.354954 19.422390 13.254682 8.448439 0.231001 19.557252 19.561895 19.557175
6 94048.015109 1202.085258 57.029570 57.127131 439.644507 68.017052 58.334556 48.195985 49.679258 42.862724 57.029570 37.744666 -122.357646 19.334988 13.356357 7.755666 0.209152 20.034754 20.031945 20.034757
7 89738.111635 1146.163488 57.201200 57.143561 438.932719 71.251125 62.068138 52.430684 52.893936 45.455967 57.201200 37.745178 -122.358731 19.379687 13.105207 7.541033 0.214699 19.102725 19.097493 19.102704
8 84325.793095 1386.051549 57.085836 57.125325 436.400804 67.203358 59.201939 50.980137 51.354694 47.276425 57.085836 37.746047 -122.359317 19.342871 13.099551 8.004493 0.278080 23.100859 23.105226 23.100865
9 87579.505991 1144.179954 57.035023 57.111521 439.260829 64.475576 56.658986 48.722811 47.882949 43.535714 57.035023 37.744060 -122.357155 19.301843 13.283641 8.159217 0.232028 19.069666 19.067512 19.069680
10 100164.634270 941.555230 57.384660 57.602459 434.971507 63.406518 56.257026 48.694184 49.707650 46.169789 57.384660 37.746870 -122.359709 19.375878 13.193404 7.282397 0.194770 15.692587 15.689500 15.692576
11 96293.107869 982.990743 57.493862 57.524452 436.820487 62.358221 56.467498 50.252767 48.957939 46.354397 57.493862 37.746363 -122.359027 19.456027 13.155162 8.425639 0.219763 16.383179 16.385389 16.383139
12 101958.670921 1180.773218 57.565482 57.644021 441.782250 63.251522 56.464559 49.364422 50.882388 46.859415 57.565482 37.748906 -122.361073 19.408404 13.290006 8.206362 0.233458 19.679554 19.679560 19.679543
13 108374.918449 1113.472481 57.505760 57.708173 439.695922 65.535564 57.216310 48.298775 50.977144 46.897605 57.505760 37.747396 -122.361078 19.364600 13.081185 7.314683 0.224173 18.557875 18.563357 18.557855
14 107914.239321 1331.676583 57.442601 57.464912 435.837338 68.113272 59.215294 50.194889 49.682113 45.583715 57.442601 37.747588 -122.360382 19.458810 13.210717 7.004195 0.222349 22.194610 22.194699 22.194594
15 100907.198866 1433.743385 57.403612 57.506510 440.304704 71.000420 61.042839 50.341033 47.715456 41.582108 57.403612 37.751215 -122.364489 19.319194 13.206636 7.464301 0.265855 23.895723 23.896472 23.895792
16 101172.588106 1202.279385 58.006985 57.971463 437.107763 71.047895 60.637597 49.693474 49.862303 42.463381 58.006985 37.750837 -122.364332 19.475354 13.297945 7.656955 0.217322 20.037990 20.040511 20.038014
17 99680.420043 1112.376430 57.745106 57.859081 438.638884 68.343090 59.417523 49.817794 49.477418 44.306842 57.745106 37.749572 -122.362625 19.422563 13.311882 7.760225 0.187633 18.539607 18.532468 18.539643
18 103648.507761 1175.414987 57.896343 57.921677 439.719536 65.938805 57.212310 48.155219 50.161285 45.867440 57.896343 37.749978 -122.363236 19.535593 13.156824 8.155754 0.191258 19.590250 19.589652 19.590268
19 109548.449822 1014.061177 57.204903 57.208045 434.562958 67.482296 59.006704 50.093861 51.146868 45.393882 57.204903 37.745790 -122.358802 19.439556 12.929395 7.885397 0.254766 16.901020 16.898387 16.901001
20 112970.702050 1087.413808 56.822654 57.027832 436.416613 65.947573 56.888457 47.428479 50.418554 45.236893 56.822654 37.740981 -122.353057 19.434520 13.388997 7.395901 0.189213 18.123563 18.130744 18.123556
21 121094.538382 954.694196 56.866029 56.913252 435.960266 65.016434 56.554816 47.660495 49.190347 43.927606 56.866029 37.740547 -122.353294 19.523403 13.082172 6.843561 0.169128 15.911570 15.910547 15.911610
22 111480.550011 1387.006294 56.852102 57.083390 429.968982 66.873455 58.093504 48.799506 47.600135 42.593392 56.852102 37.743398 -122.355859 19.332209 13.213082 7.225893 0.271297 23.116772 23.113059 23.116777
23 104470.365624 1258.963995 56.603943 56.477925 435.362195 67.403986 58.234248 48.505787 49.957565 44.531719 56.603943 37.737760 -122.349702 19.325761 13.089156 7.512859 0.253751 20.982733 20.980283 20.982722
24 114953.246078 1067.051404 57.187242 57.069777 437.834228 65.759496 57.860652 49.440132 50.049752 46.008258 57.187242 37.741583 -122.354357 19.540875 13.002064 6.421965 0.200041 17.784190 17.784476 17.784242
25 107939.261435 1299.720000 57.650335 57.680383 440.331866 64.618756 57.094163 48.879234 49.399043 45.445550 57.650335 37.750310 -122.363495 19.331866 13.258947 7.657225 0.262201 21.662000 21.659522 21.661989
26 101270.468742 1384.907149 57.155632 57.270905 438.265102 66.128198 57.792931 49.027697 48.403060 44.431285 57.155632 37.744375 -122.357659 19.326035 13.073068 8.448958 0.225007 23.081786 23.084674 23.081862
27 121824.804303 1052.402609 57.767910 57.717098 432.240787 65.005264 57.621881 50.223850 48.471275 45.037308 57.767910 37.747197 -122.359888 19.470588 13.196384 6.535134 0.177386 17.540043 17.534218 17.540053
28 120628.722106 1437.819277 57.629387 57.609481 439.961498 65.838397 58.754584 51.148245 52.253798 47.578575 57.629387 37.748278 -122.361829 19.413305 13.337873 6.576218 0.270299 23.963655 23.969094 23.963620
29 88695.445887 1615.017825 57.283422 57.482047 433.769544 67.336389 60.667176 53.387319 53.541380 49.878024 57.283422 37.746765 -122.360245 19.294882 13.633308 7.565317 0.260250 26.916964 26.918513 26.916883
30 87779.856531 1533.849251 57.189507 57.323769 440.363812 66.780514 59.447323 51.803640 53.407709 49.461456 57.189507 37.745982 -122.358819 19.537473 13.335974 7.986938 0.234904 25.564154 25.560385 25.564158
31 98845.505424 1693.046905 57.640715 57.788449 437.254627 66.005105 57.257498 48.310785 49.848117 44.868220 57.640715 37.748891 -122.362839 19.306318 13.128909 7.504467 0.272176 28.217448 28.219847 28.217553

31 rows × 36 columns

We can also create a grouped object using Pandas grouper. We can actually specify a resampling and a grouping using the Grouper. So, in this case I want the frequency to be monthly and group by landmark and subscription type.

In [114]:
three_grouper = dmerge4.groupby([pd.Grouper(freq='M',key='startdate'),"landmark","subscriptiontype"])[['duration_f']].mean().reset_index()
three_grouper
Out[114]:
startdate landmark subscriptiontype duration_f
0 2013-08-31 Mountain View Customer 39.107407
1 2013-08-31 Mountain View Subscriber 108.082692
2 2013-08-31 Palo Alto Customer 43.618750
3 2013-08-31 Palo Alto Subscriber 16.942083
4 2013-08-31 Redwood City Customer 101.255000
5 2013-08-31 Redwood City Subscriber 9.770526
6 2013-08-31 San Francisco Customer 67.455091
7 2013-08-31 San Francisco Subscriber 12.192543
8 2013-08-31 San Jose Customer 56.080820
9 2013-08-31 San Jose Subscriber 11.547216
10 2013-09-30 Mountain View Customer 182.846638
11 2013-09-30 Mountain View Subscriber 8.582500
12 2013-09-30 Palo Alto Customer 160.404739
13 2013-09-30 Palo Alto Subscriber 9.989910
14 2013-09-30 Redwood City Customer 70.426667
15 2013-09-30 Redwood City Subscriber 6.197643
16 2013-09-30 San Francisco Customer 46.922383
17 2013-09-30 San Francisco Subscriber 10.614401
18 2013-09-30 San Jose Customer 72.983308
19 2013-09-30 San Jose Subscriber 9.735506
20 2013-10-31 Mountain View Customer 112.790000
21 2013-10-31 Mountain View Subscriber 12.124853
22 2013-10-31 Palo Alto Customer 83.578580
23 2013-10-31 Palo Alto Subscriber 12.032308
24 2013-10-31 Redwood City Customer 131.333600
25 2013-10-31 Redwood City Subscriber 28.843174
26 2013-10-31 San Francisco Customer 52.653610
27 2013-10-31 San Francisco Subscriber 9.647517
28 2013-10-31 San Jose Customer 104.471317
29 2013-10-31 San Jose Subscriber 8.921913
40 2013-12-31 Mountain View Customer 240.204133
41 2013-12-31 Mountain View Subscriber 11.058692
42 2013-12-31 Palo Alto Customer 145.392346
43 2013-12-31 Palo Alto Subscriber 11.520840
44 2013-12-31 Redwood City Customer 323.863000
45 2013-12-31 Redwood City Subscriber 3.947755
46 2013-12-31 San Francisco Customer 57.743898
47 2013-12-31 San Francisco Subscriber 9.461105
48 2013-12-31 San Jose Customer 42.284125
49 2013-12-31 San Jose Subscriber 20.031462
50 2014-01-31 Mountain View Customer 290.265333
51 2014-01-31 Mountain View Subscriber 6.763415
52 2014-01-31 Palo Alto Customer 114.681616
53 2014-01-31 Palo Alto Subscriber 31.803939
54 2014-01-31 Redwood City Customer 297.106923
55 2014-01-31 Redwood City Subscriber 7.313182
56 2014-01-31 San Francisco Customer 54.000413
57 2014-01-31 San Francisco Subscriber 9.358774
58 2014-01-31 San Jose Customer 53.236957
59 2014-01-31 San Jose Subscriber 8.794458
60 2014-02-28 Mountain View Customer 152.230787
61 2014-02-28 Mountain View Subscriber 6.197237
62 2014-02-28 Palo Alto Customer 164.791429
63 2014-02-28 Palo Alto Subscriber 10.650678
64 2014-02-28 Redwood City Customer 342.915000
65 2014-02-28 Redwood City Subscriber 5.328372
66 2014-02-28 San Francisco Customer 59.656208
67 2014-02-28 San Francisco Subscriber 9.561857
68 2014-02-28 San Jose Customer 125.013058
69 2014-02-28 San Jose Subscriber 8.391254

70 rows × 4 columns

By unstacking the data created in the grouper object, we now have duration by customer and subscriber by landmark. This is a nice way to present the data in tabular format.

In [6]:
three_grouper_unstack = dmerge4.groupby([pd.Grouper(freq='M',key='startdate'),"landmark","subscriptiontype"])[['duration']].mean().unstack().reset_index()
three_grouper_unstack.head()
Out[6]:
startdate landmark duration
subscriptiontype Customer Subscriber
0 2013-08-31 Mountain View 2346.444444 6485.000000
1 2013-08-31 Palo Alto 2617.075000 1016.541667
2 2013-08-31 Redwood City 6075.250000 586.157895
3 2013-08-31 San Francisco 4047.301633 731.557012
4 2013-08-31 San Jose 3364.852459 692.814433
In [7]:
three_grouper_unstack = dmerge4.groupby([pd.Grouper(freq='M',key='startdate'),"landmark","subscriptiontype"])[['duration']].mean().unstack()
three_grouper_unstack.head()
Out[7]:
duration
subscriptiontype Customer Subscriber
startdate landmark
2013-08-31 Mountain View 2346.444444 6485.000000
Palo Alto 2617.075000 1016.541667
Redwood City 6075.250000 586.157895
San Francisco 4047.301633 731.557012
San Jose 3364.852459 692.814433

Average duration by month faceted by landmark

In [12]:
grouperl = dmerge4.groupby([pd.Grouper(freq='M',key='startdate'),'landmark']).mean()[['duration_f']]
grouperl.head(10)
Out[12]:
duration_f
startdate landmark
2013-08-31 Mountain View 72.944340
Palo Alto 33.615000
Redwood City 25.680870
San Francisco 44.081851
San Jose 28.740570
2013-09-30 Mountain View 67.345988
Palo Alto 108.553665
Redwood City 17.353211
San Francisco 24.351433
San Jose 29.502793

Grouping on a variable with many values is also possible. In this case, we can see summary statistics on every single duration as an example.

In [25]:
dmerge4.groupby([pd.Grouper(freq='M',key='startdate'),'duration_f']).sum()[:5]
Out[25]:
tripid duration startterminal endterminal bike max_temperature_f mean_temperature_f min_temperaturef max_dew_point_f meandew_point_f station_id lat long dockcount day hour month hourdiff durationminutes duration_i
startdate duration_f
2013-08-31 1.03 7416 62 60 60 511 71 64 57 57 56 60 37.804770 -122.403234 15 31 20 8 0 1.033333 1
1.05 4576 63 66 66 520 74 68 61 61 58 66 37.774814 -122.418954 19 29 14 8 0 1.050000 1
1.13 7250 68 35 35 100 76 69 62 61 59 35 37.444521 -122.163093 11 31 16 8 0 1.133333 1
1.15 13518 138 122 122 965 149 133 117 118 114 122 75.590069 -244.799470 34 61 35 16 0 2.300000 2
1.17 4607 70 10 10 661 81 72 63 62 61 10 37.337391 -121.886995 15 29 14 8 0 1.166667 1

5 rows × 36 columns

We can be very specific and the level of granularity can be very detailed. We can pinpoint the average duration for a very specific profile or demographic which can be very powerful for a data-driven decision making including marketing or civic infrastructure goals for the bike sharing system.

This table gives an indication of monthly durations based on starting landmark and endstation. In other words, we get an idea of how long it takes to reach by bicycle and the differences between customers and subscribers all in one table. We can identify which customers are keeping the bicycles for longer periods of time.

In [39]:
np.round(dmerge4.groupby([pd.Grouper(freq='M',key='startdate'),"landmark","subscriptiontype","endstation"])[['duration_f']].mean())
Out[39]:
duration_f
startdate landmark subscriptiontype endstation
2013-08-31 Mountain View Customer Evelyn Park and Ride 26
Mountain View Caltrain Station 61
Mountain View City Hall 11
Park at Olive 34
Rengstorff Avenue / California Street 16
San Antonio Caltrain Station 52
Subscriber Evelyn Park and Ride 22
Mountain View Caltrain Station 8
Mountain View City Hall 215
Park at Olive 453
Rengstorff Avenue / California Street 10
San Antonio Caltrain Station 15
Palo Alto Customer California Ave Caltrain Station 19
Cowper at University 24
Palo Alto Caltrain Station 105
Park at Olive 17
Rengstorff Avenue / California Street 25
San Antonio Caltrain Station 17
University and Emerson 62
Subscriber California Ave Caltrain Station 13
Cowper at University 4
Palo Alto Caltrain Station 25
Park at Olive 15
San Antonio Caltrain Station 30
University and Emerson 17
Redwood City Customer Franklin at Maple 31
Palo Alto Caltrain Station 279
Redwood City Caltrain Station 64
Subscriber Broadway at Main 4
Franklin at Maple 7
2014-02-28 San Francisco Subscriber Yerba Buena Center of the Arts (3rd @ Howard) 8
San Jose Customer Adobe on Almaden 28
Arena Green / SAP Center 73
Japantown 19
MLK Library 360
Paseo de San Antonio 26
SJSU - San Salvador at 9th 19
SJSU 4th at San Carlos 236
San Jose City Hall 56
San Jose Civic Center 106
San Jose Diridon Caltrain Station 177
San Pedro Square 181
San Salvador at 1st 72
Santa Clara at Almaden 37
St James Park 87
Subscriber Adobe on Almaden 5
Arena Green / SAP Center 5
Japantown 11
MLK Library 18
Paseo de San Antonio 8
SJSU - San Salvador at 9th 12
SJSU 4th at San Carlos 6
San Jose City Hall 10
San Jose Civic Center 8
San Jose Diridon Caltrain Station 7
San Jose Government Center 19
San Pedro Square 7
San Salvador at 1st 6
Santa Clara at Almaden 5
St James Park 9

1001 rows × 1 columns

Let's look at the breakdown of average duration by both landmark and subscription type. We can see some clear differences between customers and subscribers. In most cases, subscribers are taking short trips.

In [40]:
np.round(dmerge4.groupby([pd.Grouper(freq='M',key='startdate'),"landmark","subscriptiontype"])[['duration_f']].mean()).unstack()
Out[40]:
duration_f
subscriptiontype Customer Subscriber
startdate landmark
2013-08-31 Mountain View 39 108
Palo Alto 44 17
Redwood City 101 10
San Francisco 67 12
San Jose 56 12
2013-09-30 Mountain View 183 9
Palo Alto 160 10
Redwood City 70 6
San Francisco 47 11
San Jose 73 10
2013-10-31 Mountain View 113 12
Palo Alto 84 12
Redwood City 131 29
San Francisco 53 10
San Jose 104 9
2013-11-30 Mountain View 135 6
Palo Alto 249 10
Redwood City 175 6
San Francisco 59 10
San Jose 81 8
2013-12-31 Mountain View 240 11
Palo Alto 145 12
Redwood City 324 4
San Francisco 58 9
San Jose 42 20
2014-01-31 Mountain View 290 7
Palo Alto 115 32
Redwood City 297 7
San Francisco 54 9
San Jose 53 9
2014-02-28 Mountain View 152 6
Palo Alto 165 11
Redwood City 343 5
San Francisco 60 10
San Jose 125 8

Let's look at each level of duration on an hourly basis.

In [43]:
dmerge4.groupby([pd.Grouper(freq='H',key='startdate'),'duration_f']).mean().head()
Out[43]:
tripid duration startterminal endterminal bike max_temperature_f mean_temperature_f min_temperaturef max_dew_point_f meandew_point_f station_id lat long dockcount day hour month hourdiff durationminutes duration_i
startdate duration_f
2013-08-29 09:00:00 2.90 4069 174 64 64 288 74 68 61 61 58 64 37.782259 -122.392738 15 29 9 8 0 2.900000 3
2.97 4086 178 45 45 379 74 68 61 61 58 45 37.794231 -122.402923 15 29 9 8 0 2.966667 3
3.63 4081 218 27 27 150 80 70 64 65 61 27 37.389218 -122.081896 15 29 9 8 0 3.633333 4
4.78 4084 287 27 27 138 80 70 64 65 61 27 37.389218 -122.081896 15 29 9 8 0 4.783333 5
12.73 4080 764 66 69 315 74 68 61 61 58 66 37.774814 -122.418954 19 29 9 8 0 12.733333 13

5 rows × 36 columns

In this table, we are looking at hourly data by landmark.

In [47]:
hourlytry = dmerge4.groupby([pd.Grouper(freq='H',key='startdate'),'landmark'],as_index=False)[['duration_f']].mean()
hourlytry
Out[47]:
startdate landmark duration_f
0 2013-08-29 09:00:00 Mountain View 4.205000
1 2013-08-29 09:00:00 Redwood City 63.820000
2 2013-08-29 09:00:00 San Francisco 14.695556
3 2013-08-29 10:00:00 Mountain View 6.325000
4 2013-08-29 10:00:00 Palo Alto 18.775000
5 2013-08-29 10:00:00 San Francisco 37.356250
6 2013-08-29 10:00:00 San Jose 21.225000
7 2013-08-29 11:00:00 San Francisco 40.284737
8 2013-08-29 11:00:00 San Jose 2.957500
9 2013-08-29 12:00:00 Palo Alto 11.658000
10 2013-08-29 12:00:00 Redwood City 8.353333
11 2013-08-29 12:00:00 San Francisco 16.223173
12 2013-08-29 12:00:00 San Jose 12.387500
13 2013-08-29 13:00:00 Mountain View 1628.550000
14 2013-08-29 13:00:00 Palo Alto 23.544286
15 2013-08-29 13:00:00 San Francisco 53.158529
16 2013-08-29 13:00:00 San Jose 26.937273
17 2013-08-29 14:00:00 Mountain View 15.880000
18 2013-08-29 14:00:00 San Francisco 24.205366
19 2013-08-29 14:00:00 San Jose 6.595000
20 2013-08-29 15:00:00 Mountain View 878.300000
21 2013-08-29 15:00:00 Palo Alto 11.620000
22 2013-08-29 15:00:00 San Francisco 23.209492
23 2013-08-29 15:00:00 San Jose 16.200000
24 2013-08-29 16:00:00 Mountain View 4.450000
25 2013-08-29 16:00:00 San Francisco 14.293684
26 2013-08-29 16:00:00 San Jose 5.440000
27 2013-08-29 17:00:00 Mountain View 12.970000
28 2013-08-29 17:00:00 Palo Alto 10.422000
29 2013-08-29 17:00:00 Redwood City 8.925000
9781 2014-02-28 11:00:00 San Jose 5.270000
9782 2014-02-28 12:00:00 Palo Alto 3.320000
9783 2014-02-28 12:00:00 San Francisco 6.805385
9784 2014-02-28 12:00:00 San Jose 5.766667
9785 2014-02-28 13:00:00 Palo Alto 4.850000
9786 2014-02-28 13:00:00 San Francisco 7.604667
9787 2014-02-28 13:00:00 San Jose 43.920000
9788 2014-02-28 14:00:00 San Francisco 9.297692
9789 2014-02-28 14:00:00 San Jose 5.252000
9790 2014-02-28 15:00:00 Mountain View 4.330000
9791 2014-02-28 15:00:00 San Francisco 14.967143
9792 2014-02-28 15:00:00 San Jose 12.970000
9793 2014-02-28 16:00:00 Mountain View 4.470000
9794 2014-02-28 16:00:00 San Francisco 49.152857
9795 2014-02-28 16:00:00 San Jose 5.080000
9796 2014-02-28 17:00:00 Mountain View 5.625000
9797 2014-02-28 17:00:00 San Francisco 9.254571
9798 2014-02-28 17:00:00 San Jose 6.200000
9799 2014-02-28 18:00:00 Mountain View 4.740000
9800 2014-02-28 18:00:00 Palo Alto 13.415000
9801 2014-02-28 18:00:00 San Francisco 8.176364
9802 2014-02-28 19:00:00 Mountain View 9.435000
9803 2014-02-28 19:00:00 San Francisco 132.491364
9804 2014-02-28 19:00:00 San Jose 11.815000
9805 2014-02-28 20:00:00 Mountain View 4.620000
9806 2014-02-28 20:00:00 Redwood City 24.380000
9807 2014-02-28 20:00:00 San Francisco 7.406667
9808 2014-02-28 21:00:00 San Francisco 7.203333
9809 2014-02-28 22:00:00 San Francisco 8.572500
9810 2014-02-28 23:00:00 San Francisco 539.150000

9811 rows × 3 columns

Group operations on a datetime index

In [34]:
dmerge4.groupby(dmerge4.index.month).agg(['count','sum','mean','min','max'])[['duration_f']]
Out[34]:
duration_f
count sum mean min max
1 24428 412751.48 16.896655 1.00 9772.60
2 19024 332260.54 17.465335 1.02 3077.22
8 2102 90672.74 43.136413 1.03 4022.23
9 25243 663946.54 26.302204 1.00 9958.62
10 29105 578249.15 19.867691 1.00 7156.40
11 24219 500883.27 20.681418 1.00 12037.27
12 19894 375728.66 18.886532 1.00 10322.03

Let's make a table of average dockcount by each precipitation level and subscription type

In [29]:
i = pd.DataFrame(dmerge4.groupby(['precipitation_in','subscriptiontype']).dockcount.mean()).reset_index()
i
Out[29]:
precipitation_in subscriptiontype dockcount
0 0 Customer 18.940762
1 0 Subscriber 19.472218
2 0.01 Customer 19.660550
3 0.01 Subscriber 19.833333
4 0 Customer 18.942500
5 0 Subscriber 19.528072
6 0.01 Customer 18.397083
7 0.01 Subscriber 19.411516
8 0.02 Customer 19.000000
9 0.02 Subscriber 18.341772
10 0.03 Customer 12.000000
11 0.03 Subscriber 17.666667
12 0.04 Customer 17.166667
13 0.04 Subscriber 18.132075
14 0.05 Subscriber 17.545455
15 0.06 Customer 19.915663
16 0.06 Subscriber 18.818182
17 0.07 Customer 11.000000
18 0.07 Subscriber 15.000000
19 0.08 Customer 17.666667
20 0.08 Subscriber 19.000000
21 0.09 Customer 19.952381
22 0.09 Subscriber 19.588079
23 0.1 Customer 16.600000
24 0.1 Subscriber 19.102564
25 0.12 Subscriber 25.000000
26 0.14 Customer 17.000000
27 0.14 Subscriber 19.000000
28 0.16 Subscriber 15.000000
29 0.17 Customer 13.000000
48 0.45 Subscriber 18.428571
49 0.47 Customer 21.000000
50 0.47 Subscriber 19.105263
51 0.49 Customer 11.000000
52 0.49 Subscriber 11.000000
53 0.53 Subscriber 17.000000
54 0.61 Subscriber 17.608696
55 0.63 Customer 18.714286
56 0.63 Subscriber 19.572519
57 0.64 Customer 20.161290
58 0.64 Subscriber 19.519608
59 0.66 Customer 16.756098
60 0.66 Subscriber 17.800000
61 0.68 Subscriber 18.407407
62 0.74 Customer 18.902439
63 0.74 Subscriber 19.868009
64 0.78 Customer 15.000000
65 0.78 Subscriber 18.200000
66 0.83 Subscriber 15.000000
67 0.85 Customer 19.355556
68 0.85 Subscriber 18.407407
69 0.88 Customer 15.000000
70 0.88 Subscriber 18.000000
71 0.97 Customer 19.000000
72 0.97 Subscriber 18.130435
73 1.06 Subscriber 13.285714
74 1.71 Customer 20.333333
75 1.71 Subscriber 17.000000
76 T Customer 19.195219
77 T Subscriber 19.671620

78 rows × 3 columns

Whate are the total number of observations in each of the initial six months. Other than first month of operation, the distribution of observations distribution is pretty evenly spread out.

In [39]:
dmerge4.groupby(dmerge4.index.month).size()
Out[39]:
1     24428
2     19024
8      2102
9     25243
10    29105
11    24219
12    19894
dtype: int64
In [77]:
dmerge4.groupby(dmerge4.index.month).size().order(ascending=False)[:5]
Out[77]:
10    29105
9     25243
1     24428
11    24219
12    19894
dtype: int64

Working with multi-level indexes

Let's create a multi-level index without grouping because we'll often want to perform queries on the full dataset.

In [31]:
dmerge4twolevels = dmerge4.set_index(['landmark','subscriptiontype'])
dmerge4twolevels
Out[31]:
lubstartdate startstation tripid duration startdate startterminal enddate endstation endterminal bike day hour month timeofday diff hourdiff durationminutes duration_i duration_f thirtymin
landmark subscriptiontype
San Francisco Subscriber 2013-08-29 2nd at Folsom 4636 186 2013-08-29 15:11:00 62 2013-08-29 15:14:00 2nd at Townsend 61 366 29 15 8 evening 00:03:00 0 3.100000 3 3.10 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4820 813 2013-08-29 17:35:00 62 2013-08-29 17:48:00 Townsend at 7th 65 409 29 17 8 evening 00:13:00 0 13.550000 14 13.55 in_thirty
Customer 2013-08-29 2nd at Folsom 5001 236 2013-08-29 20:00:00 62 2013-08-29 20:04:00 Market at Sansome 77 567 29 20 8 night 00:04:00 0 3.933333 4 3.93 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4812 186 2013-08-29 17:30:00 62 2013-08-29 17:33:00 2nd at Folsom 62 409 29 17 8 evening 00:03:00 0 3.100000 3 3.10 in_thirty
Customer 2013-08-29 2nd at Folsom 4946 827 2013-08-29 19:07:00 62 2013-08-29 19:21:00 Embarcadero at Vallejo 48 342 29 19 8 evening 00:14:00 0 13.783333 14 13.78 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4390 825 2013-08-29 12:33:00 62 2013-08-29 12:47:00 2nd at South Park 64 331 29 12 8 mid_day 00:14:00 0 13.750000 14 13.75 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4939 816 2013-08-29 19:01:00 62 2013-08-29 19:14:00 Harry Bridges Plaza (Ferry Building) 50 539 29 19 8 evening 00:13:00 0 13.600000 14 13.60 in_thirty
Customer 2013-08-29 2nd at Folsom 4949 1524 2013-08-29 19:11:00 62 2013-08-29 19:36:00 Commercial at Montgomery 45 400 29 19 8 evening 00:25:00 0 25.400000 25 25.40 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4582 335 2013-08-29 14:14:00 62 2013-08-29 14:20:00 2nd at Folsom 62 342 29 14 8 mid_day 00:06:00 0 5.583333 6 5.58 in_thirty
Customer 2013-08-29 2nd at Folsom 4951 1409 2013-08-29 19:13:00 62 2013-08-29 19:36:00 Commercial at Montgomery 45 380 29 19 8 evening 00:23:00 0 23.483333 23 23.48 in_thirty
Customer 2013-08-29 2nd at Folsom 4945 797 2013-08-29 19:08:00 62 2013-08-29 19:21:00 Embarcadero at Vallejo 48 406 29 19 8 evening 00:13:00 0 13.283333 13 13.28 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4938 840 2013-08-29 19:01:00 62 2013-08-29 19:15:00 Harry Bridges Plaza (Ferry Building) 50 607 29 19 8 evening 00:14:00 0 14.000000 14 14.00 in_thirty
Subscriber 2013-08-29 2nd at South Park 4557 130 2013-08-29 13:57:00 64 2013-08-29 13:59:00 2nd at South Park 64 371 29 13 8 mid_day 00:02:00 0 2.166667 2 2.17 in_thirty
Subscriber 2013-08-29 2nd at South Park 4069 174 2013-08-29 09:08:00 64 2013-08-29 09:11:00 2nd at South Park 64 288 29 9 8 morning 00:03:00 0 2.900000 3 2.90 in_thirty
Subscriber 2013-08-29 2nd at South Park 4564 1693 2013-08-29 14:04:00 64 2013-08-29 14:32:00 2nd at South Park 64 272 29 14 8 mid_day 00:28:00 0 28.216667 28 28.22 in_thirty
Subscriber 2013-08-29 2nd at South Park 4465 419 2013-08-29 13:11:00 64 2013-08-29 13:18:00 Post at Kearney 47 498 29 13 8 mid_day 00:07:00 0 6.983333 7 6.98 in_thirty
Subscriber 2013-08-29 2nd at South Park 4566 1699 2013-08-29 14:04:00 64 2013-08-29 14:33:00 2nd at South Park 64 371 29 14 8 mid_day 00:29:00 0 28.316667 28 28.32 in_thirty
Subscriber 2013-08-29 2nd at South Park 5083 755 2013-08-29 22:05:00 64 2013-08-29 22:18:00 Harry Bridges Plaza (Ferry Building) 50 270 29 22 8 night 00:13:00 0 12.583333 13 12.58 in_thirty
Subscriber 2013-08-29 2nd at South Park 4503 728 2013-08-29 13:29:00 64 2013-08-29 13:41:00 Embarcadero at Vallejo 48 331 29 13 8 mid_day 00:12:00 0 12.133333 12 12.13 in_thirty
Subscriber 2013-08-29 2nd at South Park 4645 850 2013-08-29 15:20:00 64 2013-08-29 15:35:00 San Francisco Caltrain (Townsend at 4th) 70 288 29 15 8 evening 00:15:00 0 14.166667 14 14.17 in_thirty
Subscriber 2013-08-29 2nd at South Park 4426 982 2013-08-29 12:52:00 64 2013-08-29 13:08:00 2nd at South Park 64 593 29 12 8 mid_day 00:16:00 1 16.366667 16 16.37 in_thirty
Subscriber 2013-08-29 2nd at South Park 4640 335 2013-08-29 15:19:00 64 2013-08-29 15:24:00 San Francisco Caltrain (Townsend at 4th) 70 438 29 15 8 evening 00:05:00 0 5.583333 6 5.58 in_thirty
Subscriber 2013-08-29 2nd at South Park 4567 1684 2013-08-29 14:04:00 64 2013-08-29 14:33:00 2nd at South Park 64 569 29 14 8 mid_day 00:29:00 0 28.066667 28 28.07 in_thirty
Customer 2013-08-29 2nd at Townsend 4276 173 2013-08-29 11:45:00 61 2013-08-29 11:48:00 2nd at South Park 64 280 29 11 8 mid_day 00:03:00 0 2.883333 3 2.88 in_thirty
Customer 2013-08-29 2nd at Townsend 4876 309 2013-08-29 18:09:00 61 2013-08-29 18:14:00 San Francisco Caltrain 2 (330 Townsend) 69 565 29 18 8 evening 00:05:00 0 5.150000 5 5.15 in_thirty
Subscriber 2013-08-29 2nd at Townsend 4411 223 2013-08-29 12:40:00 61 2013-08-29 12:43:00 San Francisco Caltrain 2 (330 Townsend) 69 259 29 12 8 mid_day 00:03:00 0 3.716667 4 3.72 in_thirty
Customer 2013-08-29 2nd at Townsend 4397 950 2013-08-29 12:36:00 61 2013-08-29 12:52:00 Steuart at Market 74 312 29 12 8 mid_day 00:16:00 0 15.833333 16 15.83 in_thirty
Customer 2013-08-29 2nd at Townsend 4888 1077 2013-08-29 18:16:00 61 2013-08-29 18:34:00 Grant Avenue at Columbus Avenue 73 309 29 18 8 evening 00:18:00 0 17.950000 18 17.95 in_thirty
Customer 2013-08-29 2nd at Townsend 4402 1776 2013-08-29 12:39:00 61 2013-08-29 13:09:00 Embarcadero at Folsom 51 313 29 12 8 mid_day 00:30:00 1 29.600000 30 29.60 in_thirty
Customer 2013-08-29 2nd at Townsend 4460 693 2013-08-29 13:08:00 61 2013-08-29 13:20:00 Harry Bridges Plaza (Ferry Building) 50 314 29 13 8 mid_day 00:12:00 0 11.550000 12 11.55 in_thirty
Subscriber 2014-02-28 Temporary Transbay Terminal (Howard at Beale) 198483 839 2014-02-28 16:41:00 55 2014-02-28 16:55:00 San Francisco Caltrain (Townsend at 4th) 70 281 28 16 2 evening 00:14:00 0 13.983333 14 13.98 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198770 850 2014-02-28 22:19:00 65 2014-02-28 22:34:00 Post at Kearney 47 479 28 22 2 night 00:15:00 0 14.166667 14 14.17 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198405 317 2014-02-28 14:08:00 65 2014-02-28 14:13:00 San Francisco Caltrain (Townsend at 4th) 70 412 28 14 2 mid_day 00:05:00 0 5.283333 5 5.28 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198750 254 2014-02-28 20:22:00 65 2014-02-28 20:26:00 San Francisco Caltrain (Townsend at 4th) 70 268 28 20 2 night 00:04:00 0 4.233333 4 4.23 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198468 267 2014-02-28 16:33:00 65 2014-02-28 16:38:00 San Francisco Caltrain (Townsend at 4th) 70 400 28 16 2 evening 00:05:00 0 4.450000 4 4.45 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198213 854 2014-02-28 09:06:00 65 2014-02-28 09:20:00 Market at Sansome 77 409 28 9 2 morning 00:14:00 0 14.233333 14 14.23 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198098 558 2014-02-28 07:55:00 65 2014-02-28 08:04:00 South Van Ness at Market 66 515 28 7 2 morning 00:09:00 1 9.300000 9 9.30 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198587 610 2014-02-28 17:32:00 65 2014-02-28 17:42:00 Harry Bridges Plaza (Ferry Building) 50 435 28 17 2 evening 00:10:00 0 10.166667 10 10.17 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198648 408 2014-02-28 18:20:00 65 2014-02-28 18:27:00 Civic Center BART (7th at Market) 72 414 28 18 2 evening 00:07:00 0 6.800000 7 6.80 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198465 891 2014-02-28 16:32:00 65 2014-02-28 16:46:00 Temporary Transbay Terminal (Howard at Beale) 55 570 28 16 2 evening 00:14:00 0 14.850000 15 14.85 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198633 243 2014-02-28 18:08:00 65 2014-02-28 18:12:00 San Francisco Caltrain (Townsend at 4th) 70 389 28 18 2 evening 00:04:00 0 4.050000 4 4.05 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198128 221 2014-02-28 08:12:00 65 2014-02-28 08:16:00 San Francisco Caltrain (Townsend at 4th) 70 514 28 8 2 morning 00:04:00 0 3.683333 4 3.68 in_thirty
Palo Alto Subscriber 2014-02-28 University and Emerson 198399 291 2014-02-28 13:49:00 35 2014-02-28 13:54:00 University and Emerson 35 165 28 13 2 mid_day 00:05:00 0 4.850000 5 4.85 in_thirty
Subscriber 2014-02-28 University and Emerson 198625 233 2014-02-28 18:03:00 35 2014-02-28 18:07:00 Cowper at University 37 138 28 18 2 evening 00:04:00 0 3.883333 4 3.88 in_thirty
San Francisco Subscriber 2014-02-28 Washington at Kearney 198275 570 2014-02-28 09:44:00 46 2014-02-28 09:54:00 Powell Street BART 53 384 28 9 2 morning 00:10:00 0 9.500000 10 9.50 in_thirty
Subscriber 2014-02-28 Washington at Kearney 198277 918 2014-02-28 09:45:00 46 2014-02-28 10:00:00 South Van Ness at Market 66 399 28 9 2 morning 00:15:00 1 15.300000 15 15.30 in_thirty
Subscriber 2014-02-28 Washington at Kearney 198671 956 2014-02-28 18:34:00 46 2014-02-28 18:50:00 Embarcadero at Bryant 54 548 28 18 2 evening 00:16:00 0 15.933333 16 15.93 in_thirty
Subscriber 2014-02-28 Washington at Kearney 198145 555 2014-02-28 08:28:00 46 2014-02-28 08:37:00 Howard at 2nd 63 407 28 8 2 morning 00:09:00 0 9.250000 9 9.25 in_thirty
Customer 2014-02-28 Washington at Kearney 198381 727 2014-02-28 13:07:00 46 2014-02-28 13:19:00 Market at 4th 76 324 28 13 2 mid_day 00:12:00 0 12.116667 12 12.12 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198533 292 2014-02-28 17:03:00 68 2014-02-28 17:08:00 San Francisco Caltrain 2 (330 Townsend) 69 479 28 17 2 evening 00:05:00 0 4.866667 5 4.87 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198491 278 2014-02-28 16:44:00 68 2014-02-28 16:49:00 San Francisco Caltrain (Townsend at 4th) 70 314 28 16 2 evening 00:05:00 0 4.633333 5 4.63 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198305 522 2014-02-28 10:05:00 68 2014-02-28 10:14:00 Steuart at Market 74 326 28 10 2 morning 00:09:00 0 8.700000 9 8.70 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198289 561 2014-02-28 09:50:00 68 2014-02-28 10:00:00 Market at 10th 67 518 28 9 2 morning 00:10:00 1 9.350000 9 9.35 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198453 282 2014-02-28 16:19:00 68 2014-02-28 16:24:00 San Francisco Caltrain (Townsend at 4th) 70 437 28 16 2 evening 00:05:00 0 4.700000 5 4.70 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198710 323 2014-02-28 19:13:00 68 2014-02-28 19:19:00 San Francisco Caltrain 2 (330 Townsend) 69 339 28 19 2 evening 00:06:00 0 5.383333 5 5.38 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198685 271 2014-02-28 18:48:00 68 2014-02-28 18:52:00 San Francisco Caltrain (Townsend at 4th) 70 454 28 18 2 evening 00:04:00 0 4.516667 5 4.52 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198087 159 2014-02-28 07:42:00 68 2014-02-28 07:44:00 5th at Howard 57 435 28 7 2 morning 00:02:00 0 2.650000 3 2.65 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198639 342 2014-02-28 18:13:00 68 2014-02-28 18:19:00 Civic Center BART (7th at Market) 72 292 28 18 2 evening 00:06:00 0 5.700000 6 5.70 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198566 338 2014-02-28 17:20:00 68 2014-02-28 17:25:00 San Francisco Caltrain (Townsend at 4th) 70 477 28 17 2 evening 00:05:00 0 5.633333 6 5.63 in_thirty
Customer 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198386 394 2014-02-28 13:22:00 68 2014-02-28 13:28:00 Steuart at Market 74 442 28 13 2 mid_day 00:06:00 0 6.566667 7 6.57 in_thirty

144015 rows × 52 columns

Creates a groupby object dataframe on which aggregate actions can be performed and can group by the multiindexes

In [40]:
#creates a groupby object dataframe and can group by the multiindexes
grouplevel0 = dmerge4twolevels.groupby(level=0)
grouplevel0
Out[40]:
<pandas.core.groupby.DataFrameGroupBy object at 0x7ffbc802c8d0>
In [41]:
grouplevel0.mean()
#same result as dmerge4.groupby('landmark').mean()
Out[41]:
tripid duration startterminal endterminal bike max_temperature_f mean_temperature_f min_temperaturef max_dew_point_f meandew_point_f lat long dockcount day hour month hourdiff durationminutes duration_i duration_f
landmark
Mountain View 113506.415689 2681.418988 28.782991 28.980572 266.556085 67.732405 54.907991 44.681085 48.479106 42.678886 37.394312 -122.083341 18.888563 15.521261 13.072947 6.898827 0.251833 44.690316 44.682918 44.690359
Palo Alto 92832.010551 4635.689918 35.669988 35.193435 208.521688 69.908558 59.389215 48.392145 48.810082 44.058617 37.440460 -122.156431 15.171161 15.759086 13.461313 7.947831 0.566237 77.261499 77.262603 77.261483
Redwood City 82721.219420 2945.804540 23.134931 24.148802 237.013871 68.611602 59.397226 50.197982 51.931904 45.466583 37.486049 -122.229363 19.766709 15.145019 12.639344 8.464061 0.161412 49.096742 49.084489 49.096608
San Francisco 101092.908674 1133.965253 61.825657 61.868821 455.360123 66.115962 58.201636 49.776278 49.848767 45.052167 37.787144 -122.400225 19.565963 15.746290 13.177293 7.668687 0.231462 18.899421 18.899656 18.899424
San Jose 98496.783100 1394.662899 7.371684 7.563290 284.057862 69.322104 58.393397 46.986682 47.791046 42.897146 37.334148 -121.892161 17.977952 15.714605 13.638500 7.585115 0.095467 23.244382 23.240963 23.244376

5 rows × 37 columns

In [42]:
grouplevel1 = dmerge4twolevels.groupby(level=1)
grouplevel1
Out[42]:
<pandas.core.groupby.DataFrameGroupBy object at 0x7ffb96e0db90>
In [43]:
grouplevel1.mean()
Out[43]:
tripid duration startterminal endterminal bike max_temperature_f mean_temperature_f min_temperaturef max_dew_point_f meandew_point_f lat long dockcount day hour month hourdiff durationminutes duration_i duration_f
subscriptiontype
Customer 79116.721714 3629.519659 56.400586 56.781909 433.907238 68.987487 60.303115 51.149137 51.473953 46.698992 37.745722 -122.358957 18.944810 15.439970 13.707949 8.313192 0.580051 60.491994 60.490022 60.491974
Subscriber 106806.906130 589.969564 57.538334 57.505486 437.863692 65.713868 57.602277 49.049971 49.218783 44.373692 37.745863 -122.358849 19.526261 15.816247 13.069760 7.481157 0.132648 9.832826 9.833106 9.832834

2 rows × 37 columns

The level can also be specified by name

In [44]:
#can also specify level by name (same as above)
dmerge4twolevels.groupby(level='subscriptiontype').sum()
#dmerge4twolevels.sum(level='subscriptiontype')  #same as above
Out[44]:
tripid duration startterminal endterminal bike max_temperature_f mean_temperature_f min_temperaturef max_dew_point_f meandew_point_f lat long dockcount day hour month hourdiff durationminutes duration_i duration_f
subscriptiontype
Customer 2402616605 110221253 1712773 1724353 13176895 2095012 1831285 1553297 1563161 1418155 1146262.097908 -3715796.806334 575316 468881 416283 252455 17615 1837020.883333 1836961 1837020.26
Subscriber 12138284461 67048271 6539059 6535326 49761895 7468184 6546326 5574382 5593567 5042937 4289704.103845 -13905716.142004 2219101 1797469 1485339 850211 15075 1117471.183333 1117503 1117472.12

2 rows × 37 columns

Create the level without dropping it as a column

In [45]:
#don't drop the level as a column; another way to select the level of a multi-level index
dmerge4twolevels.xs('Subscriber', level='subscriptiontype',drop_level=False)
Out[45]:
lubstartdate startstation tripid duration startdate startterminal enddate endstation endterminal bike day hour month timeofday diff hourdiff durationminutes duration_i duration_f thirtymin
landmark subscriptiontype
San Francisco Subscriber 2013-08-29 2nd at Folsom 4636 186 2013-08-29 15:11:00 62 2013-08-29 15:14:00 2nd at Townsend 61 366 29 15 8 evening 00:03:00 0 3.100000 3 3.10 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4820 813 2013-08-29 17:35:00 62 2013-08-29 17:48:00 Townsend at 7th 65 409 29 17 8 evening 00:13:00 0 13.550000 14 13.55 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4812 186 2013-08-29 17:30:00 62 2013-08-29 17:33:00 2nd at Folsom 62 409 29 17 8 evening 00:03:00 0 3.100000 3 3.10 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4390 825 2013-08-29 12:33:00 62 2013-08-29 12:47:00 2nd at South Park 64 331 29 12 8 mid_day 00:14:00 0 13.750000 14 13.75 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4939 816 2013-08-29 19:01:00 62 2013-08-29 19:14:00 Harry Bridges Plaza (Ferry Building) 50 539 29 19 8 evening 00:13:00 0 13.600000 14 13.60 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4582 335 2013-08-29 14:14:00 62 2013-08-29 14:20:00 2nd at Folsom 62 342 29 14 8 mid_day 00:06:00 0 5.583333 6 5.58 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4938 840 2013-08-29 19:01:00 62 2013-08-29 19:15:00 Harry Bridges Plaza (Ferry Building) 50 607 29 19 8 evening 00:14:00 0 14.000000 14 14.00 in_thirty
Subscriber 2013-08-29 2nd at South Park 4557 130 2013-08-29 13:57:00 64 2013-08-29 13:59:00 2nd at South Park 64 371 29 13 8 mid_day 00:02:00 0 2.166667 2 2.17 in_thirty
Subscriber 2013-08-29 2nd at South Park 4069 174 2013-08-29 09:08:00 64 2013-08-29 09:11:00 2nd at South Park 64 288 29 9 8 morning 00:03:00 0 2.900000 3 2.90 in_thirty
Subscriber 2013-08-29 2nd at South Park 4564 1693 2013-08-29 14:04:00 64 2013-08-29 14:32:00 2nd at South Park 64 272 29 14 8 mid_day 00:28:00 0 28.216667 28 28.22 in_thirty
Subscriber 2013-08-29 2nd at South Park 4465 419 2013-08-29 13:11:00 64 2013-08-29 13:18:00 Post at Kearney 47 498 29 13 8 mid_day 00:07:00 0 6.983333 7 6.98 in_thirty
Subscriber 2013-08-29 2nd at South Park 4566 1699 2013-08-29 14:04:00 64 2013-08-29 14:33:00 2nd at South Park 64 371 29 14 8 mid_day 00:29:00 0 28.316667 28 28.32 in_thirty
Subscriber 2013-08-29 2nd at South Park 5083 755 2013-08-29 22:05:00 64 2013-08-29 22:18:00 Harry Bridges Plaza (Ferry Building) 50 270 29 22 8 night 00:13:00 0 12.583333 13 12.58 in_thirty
Subscriber 2013-08-29 2nd at South Park 4503 728 2013-08-29 13:29:00 64 2013-08-29 13:41:00 Embarcadero at Vallejo 48 331 29 13 8 mid_day 00:12:00 0 12.133333 12 12.13 in_thirty
Subscriber 2013-08-29 2nd at South Park 4645 850 2013-08-29 15:20:00 64 2013-08-29 15:35:00 San Francisco Caltrain (Townsend at 4th) 70 288 29 15 8 evening 00:15:00 0 14.166667 14 14.17 in_thirty
Subscriber 2013-08-29 2nd at South Park 4426 982 2013-08-29 12:52:00 64 2013-08-29 13:08:00 2nd at South Park 64 593 29 12 8 mid_day 00:16:00 1 16.366667 16 16.37 in_thirty
Subscriber 2013-08-29 2nd at South Park 4640 335 2013-08-29 15:19:00 64 2013-08-29 15:24:00 San Francisco Caltrain (Townsend at 4th) 70 438 29 15 8 evening 00:05:00 0 5.583333 6 5.58 in_thirty
Subscriber 2013-08-29 2nd at South Park 4567 1684 2013-08-29 14:04:00 64 2013-08-29 14:33:00 2nd at South Park 64 569 29 14 8 mid_day 00:29:00 0 28.066667 28 28.07 in_thirty
Subscriber 2013-08-29 2nd at Townsend 4411 223 2013-08-29 12:40:00 61 2013-08-29 12:43:00 San Francisco Caltrain 2 (330 Townsend) 69 259 29 12 8 mid_day 00:03:00 0 3.716667 4 3.72 in_thirty
Subscriber 2013-08-29 5th at Howard 4909 1220 2013-08-29 18:36:00 57 2013-08-29 18:57:00 Embarcadero at Vallejo 48 404 29 18 8 evening 00:21:00 0 20.333333 20 20.33 in_thirty
Subscriber 2013-08-29 5th at Howard 4970 404 2013-08-29 19:36:00 57 2013-08-29 19:43:00 Powell at Post (Union Square) 71 320 29 19 8 evening 00:07:00 0 6.733333 7 6.73 in_thirty
Subscriber 2013-08-29 5th at Howard 4657 298 2013-08-29 15:30:00 57 2013-08-29 15:35:00 5th at Howard 57 631 29 15 8 evening 00:05:00 0 4.966667 5 4.97 in_thirty
Subscriber 2013-08-29 5th at Howard 4892 1492 2013-08-29 18:21:00 57 2013-08-29 18:46:00 Embarcadero at Vallejo 48 577 29 18 8 evening 00:25:00 0 24.866667 25 24.87 in_thirty
Subscriber 2013-08-29 5th at Howard 4950 1424 2013-08-29 19:12:00 57 2013-08-29 19:35:00 Harry Bridges Plaza (Ferry Building) 50 508 29 19 8 evening 00:23:00 0 23.733333 24 23.73 in_thirty
San Jose Subscriber 2013-08-29 Adobe on Almaden 4419 556 2013-08-29 12:45:00 5 2013-08-29 12:55:00 San Jose Civic Center 3 671 29 12 8 mid_day 00:10:00 0 9.266667 9 9.27 in_thirty
Subscriber 2013-08-29 Adobe on Almaden 4741 412 2013-08-29 16:45:00 5 2013-08-29 16:52:00 San Jose City Hall 10 679 29 16 8 evening 00:07:00 0 6.866667 7 6.87 in_thirty
Subscriber 2013-08-29 Adobe on Almaden 4502 210 2013-08-29 13:28:00 5 2013-08-29 13:31:00 San Jose Civic Center 3 664 29 13 8 mid_day 00:03:00 0 3.500000 4 3.50 in_thirty
Subscriber 2013-08-29 Arena Green / SAP Center 4365 1094 2013-08-29 12:24:00 14 2013-08-29 12:42:00 Arena Green / SAP Center 14 647 29 12 8 mid_day 00:18:00 0 18.233333 18 18.23 in_thirty
San Francisco Subscriber 2013-08-29 Beale at Market 4787 479 2013-08-29 17:12:00 56 2013-08-29 17:20:00 Market at 4th 76 546 29 17 8 evening 00:08:00 0 7.983333 8 7.98 in_thirty
Subscriber 2013-08-29 Beale at Market 4374 563 2013-08-29 12:23:00 56 2013-08-29 12:32:00 Commercial at Montgomery 45 410 29 12 8 mid_day 00:09:00 0 9.383333 9 9.38 in_thirty
Subscriber 2014-02-28 Temporary Transbay Terminal (Howard at Beale) 198170 888 2014-02-28 08:47:00 55 2014-02-28 09:02:00 Grant Avenue at Columbus Avenue 73 319 28 8 2 morning 00:15:00 1 14.800000 15 14.80 in_thirty
Subscriber 2014-02-28 Temporary Transbay Terminal (Howard at Beale) 198105 277 2014-02-28 07:59:00 55 2014-02-28 08:03:00 Harry Bridges Plaza (Ferry Building) 50 420 28 7 2 morning 00:04:00 1 4.616667 5 4.62 in_thirty
Subscriber 2014-02-28 Temporary Transbay Terminal (Howard at Beale) 198483 839 2014-02-28 16:41:00 55 2014-02-28 16:55:00 San Francisco Caltrain (Townsend at 4th) 70 281 28 16 2 evening 00:14:00 0 13.983333 14 13.98 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198770 850 2014-02-28 22:19:00 65 2014-02-28 22:34:00 Post at Kearney 47 479 28 22 2 night 00:15:00 0 14.166667 14 14.17 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198405 317 2014-02-28 14:08:00 65 2014-02-28 14:13:00 San Francisco Caltrain (Townsend at 4th) 70 412 28 14 2 mid_day 00:05:00 0 5.283333 5 5.28 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198750 254 2014-02-28 20:22:00 65 2014-02-28 20:26:00 San Francisco Caltrain (Townsend at 4th) 70 268 28 20 2 night 00:04:00 0 4.233333 4 4.23 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198468 267 2014-02-28 16:33:00 65 2014-02-28 16:38:00 San Francisco Caltrain (Townsend at 4th) 70 400 28 16 2 evening 00:05:00 0 4.450000 4 4.45 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198213 854 2014-02-28 09:06:00 65 2014-02-28 09:20:00 Market at Sansome 77 409 28 9 2 morning 00:14:00 0 14.233333 14 14.23 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198098 558 2014-02-28 07:55:00 65 2014-02-28 08:04:00 South Van Ness at Market 66 515 28 7 2 morning 00:09:00 1 9.300000 9 9.30 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198587 610 2014-02-28 17:32:00 65 2014-02-28 17:42:00 Harry Bridges Plaza (Ferry Building) 50 435 28 17 2 evening 00:10:00 0 10.166667 10 10.17 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198648 408 2014-02-28 18:20:00 65 2014-02-28 18:27:00 Civic Center BART (7th at Market) 72 414 28 18 2 evening 00:07:00 0 6.800000 7 6.80 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198465 891 2014-02-28 16:32:00 65 2014-02-28 16:46:00 Temporary Transbay Terminal (Howard at Beale) 55 570 28 16 2 evening 00:14:00 0 14.850000 15 14.85 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198633 243 2014-02-28 18:08:00 65 2014-02-28 18:12:00 San Francisco Caltrain (Townsend at 4th) 70 389 28 18 2 evening 00:04:00 0 4.050000 4 4.05 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198128 221 2014-02-28 08:12:00 65 2014-02-28 08:16:00 San Francisco Caltrain (Townsend at 4th) 70 514 28 8 2 morning 00:04:00 0 3.683333 4 3.68 in_thirty
Palo Alto Subscriber 2014-02-28 University and Emerson 198399 291 2014-02-28 13:49:00 35 2014-02-28 13:54:00 University and Emerson 35 165 28 13 2 mid_day 00:05:00 0 4.850000 5 4.85 in_thirty
Subscriber 2014-02-28 University and Emerson 198625 233 2014-02-28 18:03:00 35 2014-02-28 18:07:00 Cowper at University 37 138 28 18 2 evening 00:04:00 0 3.883333 4 3.88 in_thirty
San Francisco Subscriber 2014-02-28 Washington at Kearney 198275 570 2014-02-28 09:44:00 46 2014-02-28 09:54:00 Powell Street BART 53 384 28 9 2 morning 00:10:00 0 9.500000 10 9.50 in_thirty
Subscriber 2014-02-28 Washington at Kearney 198277 918 2014-02-28 09:45:00 46 2014-02-28 10:00:00 South Van Ness at Market 66 399 28 9 2 morning 00:15:00 1 15.300000 15 15.30 in_thirty
Subscriber 2014-02-28 Washington at Kearney 198671 956 2014-02-28 18:34:00 46 2014-02-28 18:50:00 Embarcadero at Bryant 54 548 28 18 2 evening 00:16:00 0 15.933333 16 15.93 in_thirty
Subscriber 2014-02-28 Washington at Kearney 198145 555 2014-02-28 08:28:00 46 2014-02-28 08:37:00 Howard at 2nd 63 407 28 8 2 morning 00:09:00 0 9.250000 9 9.25 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198533 292 2014-02-28 17:03:00 68 2014-02-28 17:08:00 San Francisco Caltrain 2 (330 Townsend) 69 479 28 17 2 evening 00:05:00 0 4.866667 5 4.87 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198491 278 2014-02-28 16:44:00 68 2014-02-28 16:49:00 San Francisco Caltrain (Townsend at 4th) 70 314 28 16 2 evening 00:05:00 0 4.633333 5 4.63 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198305 522 2014-02-28 10:05:00 68 2014-02-28 10:14:00 Steuart at Market 74 326 28 10 2 morning 00:09:00 0 8.700000 9 8.70 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198289 561 2014-02-28 09:50:00 68 2014-02-28 10:00:00 Market at 10th 67 518 28 9 2 morning 00:10:00 1 9.350000 9 9.35 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198453 282 2014-02-28 16:19:00 68 2014-02-28 16:24:00 San Francisco Caltrain (Townsend at 4th) 70 437 28 16 2 evening 00:05:00 0 4.700000 5 4.70 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198710 323 2014-02-28 19:13:00 68 2014-02-28 19:19:00 San Francisco Caltrain 2 (330 Townsend) 69 339 28 19 2 evening 00:06:00 0 5.383333 5 5.38 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198685 271 2014-02-28 18:48:00 68 2014-02-28 18:52:00 San Francisco Caltrain (Townsend at 4th) 70 454 28 18 2 evening 00:04:00 0 4.516667 5 4.52 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198087 159 2014-02-28 07:42:00 68 2014-02-28 07:44:00 5th at Howard 57 435 28 7 2 morning 00:02:00 0 2.650000 3 2.65 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198639 342 2014-02-28 18:13:00 68 2014-02-28 18:19:00 Civic Center BART (7th at Market) 72 292 28 18 2 evening 00:06:00 0 5.700000 6 5.70 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198566 338 2014-02-28 17:20:00 68 2014-02-28 17:25:00 San Francisco Caltrain (Townsend at 4th) 70 477 28 17 2 evening 00:05:00 0 5.633333 6 5.63 in_thirty

113647 rows × 52 columns

Querying data

By setting up the initial query, We can then change the inputs for interactive reporting of results. We can answer very specific questions by querying.

Let's created a grouped object, apply aggregate functions, and then query the data. In this table, we have the average duration for subscription type of customer only.

In [28]:
grouped1 = dmerge4.groupby(['landmark','subscriptiontype'])
group2 = grouped1[['duration']].agg([np.mean, np.std])
group2
query1 = group2.query('subscriptiontype == "Customer"')
query1
Out[28]:
duration
mean std
landmark subscriptiontype
Mountain View Customer 10679.844133 35553.063019
Palo Alto Customer 8856.419476 39184.301869
Redwood City Customer 11584.882716 23257.019736
San Francisco Customer 3212.968434 9861.482292
San Jose Customer 4675.431818 17693.747427

Return a dataset from a multi-index dataframe that contains only startterminal number 62.

In [71]:
#subseting and indexing a non-grouped multi index:
dmerge4twolevels.query('startterminal == 62')
Out[71]:
lubstartdate startstation tripid duration startdate startterminal enddate endstation endterminal bike day hour month timeofday diff hourdiff durationminutes duration_i duration_f thirtymin
landmark subscriptiontype
San Francisco Subscriber 2013-08-29 2nd at Folsom 4636 186 2013-08-29 15:11:00 62 2013-08-29 15:14:00 2nd at Townsend 61 366 29 15 8 evening 00:03:00 0 3.100000 3 3.10 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4820 813 2013-08-29 17:35:00 62 2013-08-29 17:48:00 Townsend at 7th 65 409 29 17 8 evening 00:13:00 0 13.550000 14 13.55 in_thirty
Customer 2013-08-29 2nd at Folsom 5001 236 2013-08-29 20:00:00 62 2013-08-29 20:04:00 Market at Sansome 77 567 29 20 8 night 00:04:00 0 3.933333 4 3.93 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4812 186 2013-08-29 17:30:00 62 2013-08-29 17:33:00 2nd at Folsom 62 409 29 17 8 evening 00:03:00 0 3.100000 3 3.10 in_thirty
Customer 2013-08-29 2nd at Folsom 4946 827 2013-08-29 19:07:00 62 2013-08-29 19:21:00 Embarcadero at Vallejo 48 342 29 19 8 evening 00:14:00 0 13.783333 14 13.78 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4390 825 2013-08-29 12:33:00 62 2013-08-29 12:47:00 2nd at South Park 64 331 29 12 8 mid_day 00:14:00 0 13.750000 14 13.75 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4939 816 2013-08-29 19:01:00 62 2013-08-29 19:14:00 Harry Bridges Plaza (Ferry Building) 50 539 29 19 8 evening 00:13:00 0 13.600000 14 13.60 in_thirty
Customer 2013-08-29 2nd at Folsom 4949 1524 2013-08-29 19:11:00 62 2013-08-29 19:36:00 Commercial at Montgomery 45 400 29 19 8 evening 00:25:00 0 25.400000 25 25.40 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4582 335 2013-08-29 14:14:00 62 2013-08-29 14:20:00 2nd at Folsom 62 342 29 14 8 mid_day 00:06:00 0 5.583333 6 5.58 in_thirty
Customer 2013-08-29 2nd at Folsom 4951 1409 2013-08-29 19:13:00 62 2013-08-29 19:36:00 Commercial at Montgomery 45 380 29 19 8 evening 00:23:00 0 23.483333 23 23.48 in_thirty
Customer 2013-08-29 2nd at Folsom 4945 797 2013-08-29 19:08:00 62 2013-08-29 19:21:00 Embarcadero at Vallejo 48 406 29 19 8 evening 00:13:00 0 13.283333 13 13.28 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4938 840 2013-08-29 19:01:00 62 2013-08-29 19:15:00 Harry Bridges Plaza (Ferry Building) 50 607 29 19 8 evening 00:14:00 0 14.000000 14 14.00 in_thirty
Subscriber 2013-08-30 2nd at Folsom 6115 69 2013-08-30 16:30:00 62 2013-08-30 16:31:00 2nd at Folsom 62 633 30 16 8 evening 00:01:00 0 1.150000 1 1.15 in_thirty
Customer 2013-08-30 2nd at Folsom 5370 1074 2013-08-30 11:03:00 62 2013-08-30 11:21:00 Harry Bridges Plaza (Ferry Building) 50 559 30 11 8 mid_day 00:18:00 0 17.900000 18 17.90 in_thirty
Customer 2013-08-30 2nd at Folsom 5210 493 2013-08-30 09:36:00 62 2013-08-30 09:44:00 Clay at Battery 41 360 30 9 8 morning 00:08:00 0 8.216667 8 8.22 in_thirty
Customer 2013-08-30 2nd at Folsom 5371 1052 2013-08-30 11:03:00 62 2013-08-30 11:21:00 Harry Bridges Plaza (Ferry Building) 50 385 30 11 8 mid_day 00:18:00 0 17.533333 18 17.53 in_thirty
Subscriber 2013-08-30 2nd at Folsom 6125 393 2013-08-30 16:35:00 62 2013-08-30 16:42:00 San Francisco Caltrain (Townsend at 4th) 70 633 30 16 8 evening 00:07:00 0 6.550000 7 6.55 in_thirty
Customer 2013-08-30 2nd at Folsom 6292 229 2013-08-30 18:15:00 62 2013-08-30 18:18:00 Market at Sansome 77 272 30 18 8 evening 00:03:00 0 3.816667 4 3.82 in_thirty
Customer 2013-08-30 2nd at Folsom 5222 27103 2013-08-30 09:55:00 62 2013-08-30 17:27:00 Townsend at 7th 65 396 30 9 8 morning 07:32:00 8 451.716667 452 451.72 over_thirty
Subscriber 2013-08-30 2nd at Folsom 6310 655 2013-08-30 18:22:00 62 2013-08-30 18:33:00 Civic Center BART (7th at Market) 72 341 30 18 8 evening 00:11:00 0 10.916667 11 10.92 in_thirty
Subscriber 2013-08-30 2nd at Folsom 6343 689 2013-08-30 18:49:00 62 2013-08-30 19:00:00 Embarcadero at Sansome 60 324 30 18 8 evening 00:11:00 1 11.483333 11 11.48 in_thirty
Subscriber 2013-08-30 2nd at Folsom 5702 170 2013-08-30 13:43:00 62 2013-08-30 13:46:00 2nd at South Park 64 442 30 13 8 mid_day 00:03:00 0 2.833333 3 2.83 in_thirty
Subscriber 2013-08-30 2nd at Folsom 6117 213 2013-08-30 16:31:00 62 2013-08-30 16:35:00 2nd at Folsom 62 633 30 16 8 evening 00:04:00 0 3.550000 4 3.55 in_thirty
Customer 2013-08-31 2nd at Folsom 6916 1455 2013-08-31 13:36:00 62 2013-08-31 14:00:00 Harry Bridges Plaza (Ferry Building) 50 291 31 13 8 mid_day 00:24:00 1 24.250000 24 24.25 in_thirty
Customer 2013-08-31 2nd at Folsom 6614 382 2013-08-31 10:52:00 62 2013-08-31 10:59:00 Post at Kearney 47 505 31 10 8 morning 00:07:00 0 6.366667 6 6.37 in_thirty
Customer 2013-08-31 2nd at Folsom 7019 671 2013-08-31 14:30:00 62 2013-08-31 14:42:00 Civic Center BART (7th at Market) 72 425 31 14 8 mid_day 00:12:00 0 11.183333 11 11.18 in_thirty
Customer 2013-08-31 2nd at Folsom 7041 5061 2013-08-31 14:43:00 62 2013-08-31 16:07:00 Powell at Post (Union Square) 71 588 31 14 8 mid_day 01:24:00 2 84.350000 84 84.35 over_thirty
Customer 2013-08-31 2nd at Folsom 7372 262 2013-08-31 18:42:00 62 2013-08-31 18:47:00 Market at Sansome 77 584 31 18 8 evening 00:05:00 0 4.366667 4 4.37 in_thirty
Customer 2013-08-31 2nd at Folsom 6835 679 2013-08-31 13:04:00 62 2013-08-31 13:15:00 2nd at Folsom 62 267 31 13 8 mid_day 00:11:00 0 11.316667 11 11.32 in_thirty
Customer 2013-08-31 2nd at Folsom 7042 5067 2013-08-31 14:43:00 62 2013-08-31 16:08:00 Powell at Post (Union Square) 71 439 31 14 8 mid_day 01:25:00 2 84.450000 84 84.45 over_thirty
Subscriber 2014-02-27 2nd at Folsom 197102 403 2014-02-27 08:59:00 62 2014-02-27 09:05:00 5th at Howard 57 558 27 8 2 morning 00:06:00 1 6.716667 7 6.72 in_thirty
Subscriber 2014-02-27 2nd at Folsom 196871 301 2014-02-27 06:44:00 62 2014-02-27 06:49:00 San Francisco Caltrain (Townsend at 4th) 70 386 27 6 2 morning 00:05:00 0 5.016667 5 5.02 in_thirty
Subscriber 2014-02-27 2nd at Folsom 197328 158 2014-02-27 12:21:00 62 2014-02-27 12:23:00 2nd at South Park 64 549 27 12 2 mid_day 00:02:00 0 2.633333 3 2.63 in_thirty
Subscriber 2014-02-27 2nd at Folsom 197822 484 2014-02-27 18:33:00 62 2014-02-27 18:41:00 Davis at Jackson 42 397 27 18 2 evening 00:08:00 0 8.066667 8 8.07 in_thirty
Subscriber 2014-02-27 2nd at Folsom 197942 1190 2014-02-27 20:01:00 62 2014-02-27 20:20:00 Golden Gate at Polk 59 576 27 20 2 night 00:19:00 0 19.833333 20 19.83 in_thirty
Subscriber 2014-02-27 2nd at Folsom 197166 659 2014-02-27 09:31:00 62 2014-02-27 09:42:00 2nd at Townsend 61 502 27 9 2 morning 00:11:00 0 10.983333 11 10.98 in_thirty
Subscriber 2014-02-27 2nd at Folsom 197330 403 2014-02-27 12:24:00 62 2014-02-27 12:31:00 5th at Howard 57 560 27 12 2 mid_day 00:07:00 0 6.716667 7 6.72 in_thirty
Subscriber 2014-02-27 2nd at Folsom 197353 349 2014-02-27 12:48:00 62 2014-02-27 12:54:00 Embarcadero at Folsom 51 335 27 12 2 mid_day 00:06:00 0 5.816667 6 5.82 in_thirty
Subscriber 2014-02-27 2nd at Folsom 197641 446 2014-02-27 17:14:00 62 2014-02-27 17:22:00 Powell Street BART 53 449 27 17 2 evening 00:08:00 0 7.433333 7 7.43 in_thirty
Subscriber 2014-02-27 2nd at Folsom 197573 203 2014-02-27 16:32:00 62 2014-02-27 16:36:00 Market at Sansome 77 289 27 16 2 evening 00:04:00 0 3.383333 3 3.38 in_thirty
Subscriber 2014-02-27 2nd at Folsom 196847 175 2014-02-27 05:53:00 62 2014-02-27 05:56:00 Market at Sansome 77 472 27 5 2 morning 00:03:00 0 2.916667 3 2.92 in_thirty
Subscriber 2014-02-27 2nd at Folsom 197043 506 2014-02-27 08:43:00 62 2014-02-27 08:52:00 Washington at Kearney 46 399 27 8 2 morning 00:09:00 0 8.433333 8 8.43 in_thirty
Subscriber 2014-02-27 2nd at Folsom 197091 771 2014-02-27 08:57:00 62 2014-02-27 09:10:00 Townsend at 7th 65 542 27 8 2 morning 00:13:00 1 12.850000 13 12.85 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198421 1729 2014-02-28 14:54:00 62 2014-02-28 15:23:00 Spear at Folsom 49 379 28 14 2 mid_day 00:29:00 1 28.816667 29 28.82 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198718 76 2014-02-28 19:23:00 62 2014-02-28 19:24:00 Howard at 2nd 63 563 28 19 2 evening 00:01:00 0 1.266667 1 1.27 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198168 764 2014-02-28 08:46:00 62 2014-02-28 08:59:00 Grant Avenue at Columbus Avenue 73 222 28 8 2 morning 00:13:00 0 12.733333 13 12.73 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198579 288 2014-02-28 17:28:00 62 2014-02-28 17:33:00 Spear at Folsom 49 391 28 17 2 evening 00:05:00 0 4.800000 5 4.80 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198451 494 2014-02-28 16:16:00 62 2014-02-28 16:24:00 Powell at Post (Union Square) 71 463 28 16 2 evening 00:08:00 0 8.233333 8 8.23 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198222 450 2014-02-28 09:12:00 62 2014-02-28 09:20:00 Beale at Market 56 583 28 9 2 morning 00:08:00 0 7.500000 8 7.50 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198546 464 2014-02-28 17:10:00 62 2014-02-28 17:18:00 San Francisco Caltrain (Townsend at 4th) 70 259 28 17 2 evening 00:08:00 0 7.733333 8 7.73 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198349 513 2014-02-28 11:42:00 62 2014-02-28 11:50:00 Powell Street BART 53 408 28 11 2 mid_day 00:08:00 0 8.550000 9 8.55 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198174 720 2014-02-28 08:50:00 62 2014-02-28 09:02:00 Townsend at 7th 65 389 28 8 2 morning 00:12:00 1 12.000000 12 12.00 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198369 252 2014-02-28 12:44:00 62 2014-02-28 12:49:00 Spear at Folsom 49 589 28 12 2 mid_day 00:05:00 0 4.200000 4 4.20 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198610 378 2014-02-28 17:57:00 62 2014-02-28 18:03:00 5th at Howard 57 487 28 17 2 evening 00:06:00 1 6.300000 6 6.30 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198489 216 2014-02-28 16:44:00 62 2014-02-28 16:47:00 2nd at Townsend 61 557 28 16 2 evening 00:03:00 0 3.600000 4 3.60 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198768 225 2014-02-28 21:54:00 62 2014-02-28 21:57:00 Yerba Buena Center of the Arts (3rd @ Howard) 68 635 28 21 2 night 00:03:00 0 3.750000 4 3.75 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198573 413 2014-02-28 17:27:00 62 2014-02-28 17:34:00 Powell Street BART 53 373 28 17 2 evening 00:07:00 0 6.883333 7 6.88 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198172 491 2014-02-28 08:48:00 62 2014-02-28 08:57:00 San Francisco Caltrain (Townsend at 4th) 70 518 28 8 2 morning 00:09:00 0 8.183333 8 8.18 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198717 615 2014-02-28 19:22:00 62 2014-02-28 19:33:00 Civic Center BART (7th at Market) 72 603 28 19 2 evening 00:11:00 0 10.250000 10 10.25 in_thirty
Subscriber 2014-02-28 2nd at Folsom 198606 663 2014-02-28 17:50:00 62 2014-02-28 18:01:00 Powell at Post (Union Square) 71 604 28 17 2 evening 00:11:00 1 11.050000 11 11.05 in_thirty

3776 rows × 52 columns

Query a multi-index dataframe for subscriptiontype of subscriber.

In [32]:
dmerge4twolevels.query('subscriptiontype == "Subscriber"')
Out[32]:
lubstartdate startstation tripid duration startdate startterminal enddate endstation endterminal bike day hour month timeofday diff hourdiff durationminutes duration_i duration_f thirtymin
landmark subscriptiontype
San Francisco Subscriber 2013-08-29 2nd at Folsom 4636 186 2013-08-29 15:11:00 62 2013-08-29 15:14:00 2nd at Townsend 61 366 29 15 8 evening 00:03:00 0 3.100000 3 3.10 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4820 813 2013-08-29 17:35:00 62 2013-08-29 17:48:00 Townsend at 7th 65 409 29 17 8 evening 00:13:00 0 13.550000 14 13.55 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4812 186 2013-08-29 17:30:00 62 2013-08-29 17:33:00 2nd at Folsom 62 409 29 17 8 evening 00:03:00 0 3.100000 3 3.10 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4390 825 2013-08-29 12:33:00 62 2013-08-29 12:47:00 2nd at South Park 64 331 29 12 8 mid_day 00:14:00 0 13.750000 14 13.75 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4939 816 2013-08-29 19:01:00 62 2013-08-29 19:14:00 Harry Bridges Plaza (Ferry Building) 50 539 29 19 8 evening 00:13:00 0 13.600000 14 13.60 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4582 335 2013-08-29 14:14:00 62 2013-08-29 14:20:00 2nd at Folsom 62 342 29 14 8 mid_day 00:06:00 0 5.583333 6 5.58 in_thirty
Subscriber 2013-08-29 2nd at Folsom 4938 840 2013-08-29 19:01:00 62 2013-08-29 19:15:00 Harry Bridges Plaza (Ferry Building) 50 607 29 19 8 evening 00:14:00 0 14.000000 14 14.00 in_thirty
Subscriber 2013-08-29 2nd at South Park 4557 130 2013-08-29 13:57:00 64 2013-08-29 13:59:00 2nd at South Park 64 371 29 13 8 mid_day 00:02:00 0 2.166667 2 2.17 in_thirty
Subscriber 2013-08-29 2nd at South Park 4069 174 2013-08-29 09:08:00 64 2013-08-29 09:11:00 2nd at South Park 64 288 29 9 8 morning 00:03:00 0 2.900000 3 2.90 in_thirty
Subscriber 2013-08-29 2nd at South Park 4564 1693 2013-08-29 14:04:00 64 2013-08-29 14:32:00 2nd at South Park 64 272 29 14 8 mid_day 00:28:00 0 28.216667 28 28.22 in_thirty
Subscriber 2013-08-29 2nd at South Park 4465 419 2013-08-29 13:11:00 64 2013-08-29 13:18:00 Post at Kearney 47 498 29 13 8 mid_day 00:07:00 0 6.983333 7 6.98 in_thirty
Subscriber 2013-08-29 2nd at South Park 4566 1699 2013-08-29 14:04:00 64 2013-08-29 14:33:00 2nd at South Park 64 371 29 14 8 mid_day 00:29:00 0 28.316667 28 28.32 in_thirty
Subscriber 2013-08-29 2nd at South Park 5083 755 2013-08-29 22:05:00 64 2013-08-29 22:18:00 Harry Bridges Plaza (Ferry Building) 50 270 29 22 8 night 00:13:00 0 12.583333 13 12.58 in_thirty
Subscriber 2013-08-29 2nd at South Park 4503 728 2013-08-29 13:29:00 64 2013-08-29 13:41:00 Embarcadero at Vallejo 48 331 29 13 8 mid_day 00:12:00 0 12.133333 12 12.13 in_thirty
Subscriber 2013-08-29 2nd at South Park 4645 850 2013-08-29 15:20:00 64 2013-08-29 15:35:00 San Francisco Caltrain (Townsend at 4th) 70 288 29 15 8 evening 00:15:00 0 14.166667 14 14.17 in_thirty
Subscriber 2013-08-29 2nd at South Park 4426 982 2013-08-29 12:52:00 64 2013-08-29 13:08:00 2nd at South Park 64 593 29 12 8 mid_day 00:16:00 1 16.366667 16 16.37 in_thirty
Subscriber 2013-08-29 2nd at South Park 4640 335 2013-08-29 15:19:00 64 2013-08-29 15:24:00 San Francisco Caltrain (Townsend at 4th) 70 438 29 15 8 evening 00:05:00 0 5.583333 6 5.58 in_thirty
Subscriber 2013-08-29 2nd at South Park 4567 1684 2013-08-29 14:04:00 64 2013-08-29 14:33:00 2nd at South Park 64 569 29 14 8 mid_day 00:29:00 0 28.066667 28 28.07 in_thirty
Subscriber 2013-08-29 2nd at Townsend 4411 223 2013-08-29 12:40:00 61 2013-08-29 12:43:00 San Francisco Caltrain 2 (330 Townsend) 69 259 29 12 8 mid_day 00:03:00 0 3.716667 4 3.72 in_thirty
Subscriber 2013-08-29 5th at Howard 4909 1220 2013-08-29 18:36:00 57 2013-08-29 18:57:00 Embarcadero at Vallejo 48 404 29 18 8 evening 00:21:00 0 20.333333 20 20.33 in_thirty
Subscriber 2013-08-29 5th at Howard 4970 404 2013-08-29 19:36:00 57 2013-08-29 19:43:00 Powell at Post (Union Square) 71 320 29 19 8 evening 00:07:00 0 6.733333 7 6.73 in_thirty
Subscriber 2013-08-29 5th at Howard 4657 298 2013-08-29 15:30:00 57 2013-08-29 15:35:00 5th at Howard 57 631 29 15 8 evening 00:05:00 0 4.966667 5 4.97 in_thirty
Subscriber 2013-08-29 5th at Howard 4892 1492 2013-08-29 18:21:00 57 2013-08-29 18:46:00 Embarcadero at Vallejo 48 577 29 18 8 evening 00:25:00 0 24.866667 25 24.87 in_thirty
Subscriber 2013-08-29 5th at Howard 4950 1424 2013-08-29 19:12:00 57 2013-08-29 19:35:00 Harry Bridges Plaza (Ferry Building) 50 508 29 19 8 evening 00:23:00 0 23.733333 24 23.73 in_thirty
San Jose Subscriber 2013-08-29 Adobe on Almaden 4419 556 2013-08-29 12:45:00 5 2013-08-29 12:55:00 San Jose Civic Center 3 671 29 12 8 mid_day 00:10:00 0 9.266667 9 9.27 in_thirty
Subscriber 2013-08-29 Adobe on Almaden 4741 412 2013-08-29 16:45:00 5 2013-08-29 16:52:00 San Jose City Hall 10 679 29 16 8 evening 00:07:00 0 6.866667 7 6.87 in_thirty
Subscriber 2013-08-29 Adobe on Almaden 4502 210 2013-08-29 13:28:00 5 2013-08-29 13:31:00 San Jose Civic Center 3 664 29 13 8 mid_day 00:03:00 0 3.500000 4 3.50 in_thirty
Subscriber 2013-08-29 Arena Green / SAP Center 4365 1094 2013-08-29 12:24:00 14 2013-08-29 12:42:00 Arena Green / SAP Center 14 647 29 12 8 mid_day 00:18:00 0 18.233333 18 18.23 in_thirty
San Francisco Subscriber 2013-08-29 Beale at Market 4787 479 2013-08-29 17:12:00 56 2013-08-29 17:20:00 Market at 4th 76 546 29 17 8 evening 00:08:00 0 7.983333 8 7.98 in_thirty
Subscriber 2013-08-29 Beale at Market 4374 563 2013-08-29 12:23:00 56 2013-08-29 12:32:00 Commercial at Montgomery 45 410 29 12 8 mid_day 00:09:00 0 9.383333 9 9.38 in_thirty
Subscriber 2014-02-28 Temporary Transbay Terminal (Howard at Beale) 198170 888 2014-02-28 08:47:00 55 2014-02-28 09:02:00 Grant Avenue at Columbus Avenue 73 319 28 8 2 morning 00:15:00 1 14.800000 15 14.80 in_thirty
Subscriber 2014-02-28 Temporary Transbay Terminal (Howard at Beale) 198105 277 2014-02-28 07:59:00 55 2014-02-28 08:03:00 Harry Bridges Plaza (Ferry Building) 50 420 28 7 2 morning 00:04:00 1 4.616667 5 4.62 in_thirty
Subscriber 2014-02-28 Temporary Transbay Terminal (Howard at Beale) 198483 839 2014-02-28 16:41:00 55 2014-02-28 16:55:00 San Francisco Caltrain (Townsend at 4th) 70 281 28 16 2 evening 00:14:00 0 13.983333 14 13.98 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198770 850 2014-02-28 22:19:00 65 2014-02-28 22:34:00 Post at Kearney 47 479 28 22 2 night 00:15:00 0 14.166667 14 14.17 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198405 317 2014-02-28 14:08:00 65 2014-02-28 14:13:00 San Francisco Caltrain (Townsend at 4th) 70 412 28 14 2 mid_day 00:05:00 0 5.283333 5 5.28 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198750 254 2014-02-28 20:22:00 65 2014-02-28 20:26:00 San Francisco Caltrain (Townsend at 4th) 70 268 28 20 2 night 00:04:00 0 4.233333 4 4.23 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198468 267 2014-02-28 16:33:00 65 2014-02-28 16:38:00 San Francisco Caltrain (Townsend at 4th) 70 400 28 16 2 evening 00:05:00 0 4.450000 4 4.45 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198213 854 2014-02-28 09:06:00 65 2014-02-28 09:20:00 Market at Sansome 77 409 28 9 2 morning 00:14:00 0 14.233333 14 14.23 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198098 558 2014-02-28 07:55:00 65 2014-02-28 08:04:00 South Van Ness at Market 66 515 28 7 2 morning 00:09:00 1 9.300000 9 9.30 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198587 610 2014-02-28 17:32:00 65 2014-02-28 17:42:00 Harry Bridges Plaza (Ferry Building) 50 435 28 17 2 evening 00:10:00 0 10.166667 10 10.17 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198648 408 2014-02-28 18:20:00 65 2014-02-28 18:27:00 Civic Center BART (7th at Market) 72 414 28 18 2 evening 00:07:00 0 6.800000 7 6.80 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198465 891 2014-02-28 16:32:00 65 2014-02-28 16:46:00 Temporary Transbay Terminal (Howard at Beale) 55 570 28 16 2 evening 00:14:00 0 14.850000 15 14.85 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198633 243 2014-02-28 18:08:00 65 2014-02-28 18:12:00 San Francisco Caltrain (Townsend at 4th) 70 389 28 18 2 evening 00:04:00 0 4.050000 4 4.05 in_thirty
Subscriber 2014-02-28 Townsend at 7th 198128 221 2014-02-28 08:12:00 65 2014-02-28 08:16:00 San Francisco Caltrain (Townsend at 4th) 70 514 28 8 2 morning 00:04:00 0 3.683333 4 3.68 in_thirty
Palo Alto Subscriber 2014-02-28 University and Emerson 198399 291 2014-02-28 13:49:00 35 2014-02-28 13:54:00 University and Emerson 35 165 28 13 2 mid_day 00:05:00 0 4.850000 5 4.85 in_thirty
Subscriber 2014-02-28 University and Emerson 198625 233 2014-02-28 18:03:00 35 2014-02-28 18:07:00 Cowper at University 37 138 28 18 2 evening 00:04:00 0 3.883333 4 3.88 in_thirty
San Francisco Subscriber 2014-02-28 Washington at Kearney 198275 570 2014-02-28 09:44:00 46 2014-02-28 09:54:00 Powell Street BART 53 384 28 9 2 morning 00:10:00 0 9.500000 10 9.50 in_thirty
Subscriber 2014-02-28 Washington at Kearney 198277 918 2014-02-28 09:45:00 46 2014-02-28 10:00:00 South Van Ness at Market 66 399 28 9 2 morning 00:15:00 1 15.300000 15 15.30 in_thirty
Subscriber 2014-02-28 Washington at Kearney 198671 956 2014-02-28 18:34:00 46 2014-02-28 18:50:00 Embarcadero at Bryant 54 548 28 18 2 evening 00:16:00 0 15.933333 16 15.93 in_thirty
Subscriber 2014-02-28 Washington at Kearney 198145 555 2014-02-28 08:28:00 46 2014-02-28 08:37:00 Howard at 2nd 63 407 28 8 2 morning 00:09:00 0 9.250000 9 9.25 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198533 292 2014-02-28 17:03:00 68 2014-02-28 17:08:00 San Francisco Caltrain 2 (330 Townsend) 69 479 28 17 2 evening 00:05:00 0 4.866667 5 4.87 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198491 278 2014-02-28 16:44:00 68 2014-02-28 16:49:00 San Francisco Caltrain (Townsend at 4th) 70 314 28 16 2 evening 00:05:00 0 4.633333 5 4.63 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198305 522 2014-02-28 10:05:00 68 2014-02-28 10:14:00 Steuart at Market 74 326 28 10 2 morning 00:09:00 0 8.700000 9 8.70 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198289 561 2014-02-28 09:50:00 68 2014-02-28 10:00:00 Market at 10th 67 518 28 9 2 morning 00:10:00 1 9.350000 9 9.35 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198453 282 2014-02-28 16:19:00 68 2014-02-28 16:24:00 San Francisco Caltrain (Townsend at 4th) 70 437 28 16 2 evening 00:05:00 0 4.700000 5 4.70 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198710 323 2014-02-28 19:13:00 68 2014-02-28 19:19:00 San Francisco Caltrain 2 (330 Townsend) 69 339 28 19 2 evening 00:06:00 0 5.383333 5 5.38 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198685 271 2014-02-28 18:48:00 68 2014-02-28 18:52:00 San Francisco Caltrain (Townsend at 4th) 70 454 28 18 2 evening 00:04:00 0 4.516667 5 4.52 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198087 159 2014-02-28 07:42:00 68 2014-02-28 07:44:00 5th at Howard 57 435 28 7 2 morning 00:02:00 0 2.650000 3 2.65 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198639 342 2014-02-28 18:13:00 68 2014-02-28 18:19:00 Civic Center BART (7th at Market) 72 292 28 18 2 evening 00:06:00 0 5.700000 6 5.70 in_thirty
Subscriber 2014-02-28 Yerba Buena Center of the Arts (3rd @ Howard) 198566 338 2014-02-28 17:20:00 68 2014-02-28 17:25:00 San Francisco Caltrain (Townsend at 4th) 70 477 28 17 2 evening 00:05:00 0 5.633333 6 5.63 in_thirty

113647 rows × 52 columns

What is the average duration in descending order by endstation when the startterminal is number 62?

We can get this type of granular data by querying. By setting up the initial query, we can then change the inputs for interactive reporting of results.

In [46]:
duration62 = dmerge4twolevels.query('startterminal == 62').groupby('endstation')['duration_f'].mean().order(ascending=False)
duration62
Out[46]:
endstation
2nd at Folsom                                    84.510405
Beale at Market                                  28.968750
Civic Center BART (7th at Market)                25.788571
Embarcadero at Vallejo                           22.056000
Powell at Post (Union Square)                    21.702200
Embarcadero at Sansome                           20.633906
Mechanics Plaza (Market at Battery)              18.325526
Golden Gate at Polk                              18.093333
San Francisco Caltrain (Townsend at 4th)         17.642372
South Van Ness at Market                         16.725909
Commercial at Montgomery                         15.988511
Market at 10th                                   15.887500
Townsend at 7th                                  15.233402
Grant Avenue at Columbus Avenue                  15.224000
San Francisco City Hall                          14.389286
Davis at Jackson                                 11.523784
Broadway St at Battery St                        11.013333
Washington at Kearney                            10.850811
Harry Bridges Plaza (Ferry Building)             10.394581
5th at Howard                                     9.919462
Powell Street BART                                9.627881
Clay at Battery                                   9.399346
Post at Kearney                                   8.780957
Steuart at Market                                 8.627549
Embarcadero at Bryant                             8.034000
Yerba Buena Center of the Arts (3rd @ Howard)     7.567119
Market at 4th                                     7.431879
San Francisco Caltrain 2 (330 Townsend)           7.380276
Embarcadero at Folsom                             5.832000
Temporary Transbay Terminal (Howard at Beale)     5.819636
Spear at Folsom                                   5.556141
2nd at South Park                                 5.551429
2nd at Townsend                                   4.662283
Market at Sansome                                 4.334730
Howard at 2nd                                     3.326883
Name: duration_f, dtype: float64

Merging and concatenating

Let's concatenate the morning monthly and evening monthly created in the time series section into a dataframe for the plotting notebook.

In [6]:
concatenated = pd.concat([morning_monthly,evening_monthly], keys=['morning', 'evening'],axis=1)
concatenated
Out[6]:
morning evening
duration_f duration_f
startdate
2013-08-31 37.648913 37.240278
2013-09-30 17.973624 24.621531
2013-10-31 13.302877 16.752767
2013-11-30 13.209249 17.277542
2013-12-31 15.384694 14.980330
2014-01-31 11.679421 14.962542
2014-02-28 11.441120 15.718395

Merge same named column from two datasets but keep both columns but rename them with a suffix

In [27]:
merge = pd.merge(morning_monthly,evening_monthly,how='outer',left_index=True, right_index=True,suffixes=['_morning', '_evening'])
merge
Out[27]:
duration_f_morning duration_f_evening
startdate
2013-08-31 37.648913 37.240278
2013-09-30 17.973624 24.621531
2013-10-31 13.302877 16.752767
2013-11-30 13.209249 17.277542
2013-12-31 15.384694 14.980330
2014-01-31 11.679421 14.962542
2014-02-28 11.441120 15.718395

Ad-hoc data summary or string operations

Use a lambda function to get summary statistics of average duration by landmark. The maximum average duration was in San Jose, the highest average duration was in Palo Alto and the lowest was in San Francisco.

In [11]:
dmerge4.groupby("landmark")[['duration_f']].apply(lambda x: x.describe()).unstack()
Out[11]:
duration_f
count mean std min 25% 50% 75% max
landmark
Mountain View 2728 44.690359 283.766147 1.03 4.100 5.18 13.080 9772.60
Palo Alto 1706 77.261483 459.154620 1.10 6.055 15.57 34.865 12037.27
Redwood City 793 49.096608 233.252426 1.00 3.800 4.73 7.830 3831.90
San Francisco 129853 18.899424 81.849843 1.00 6.000 8.98 13.280 7156.40
San Jose 8935 23.244376 173.987471 1.03 4.770 7.42 11.320 10322.03

Make a grouped object without making it the index.

Another use of the lambda function is to map a function. For example sake, to check the length of a string.

In [12]:
#apply a lambda function
f = lambda x: len(str(x))
dmerge4[['landmark']].applymap(f)[:3]
Out[12]:
landmark
startdate
2013-08-29 15:11:00 13
2013-08-29 17:35:00 13
2013-08-29 20:00:00 13

Here is another string operation by way of mapping a lambda function. Very useful.

In [13]:
dmerge4['endstation'].map(lambda x: x.startswith('Grant')).head()
Out[13]:
startdate
2013-08-29 15:11:00    False
2013-08-29 17:35:00    False
2013-08-29 20:00:00    False
2013-08-29 17:30:00    False
2013-08-29 19:07:00    False
Name: endstation, dtype: bool

Make a grouped object without making it the index.

In [3]:
dmerge4.groupby('landmark', as_index=False).mean() 
Out[3]:
landmark tripid duration startterminal endterminal bike max_temperature_f mean_temperature_f min_temperaturef max_dew_point_f lat long dockcount day hour month hourdiff durationminutes duration_i duration_f
0 Mountain View 113506.415689 2681.418988 28.782991 28.980572 266.556085 67.732405 54.907991 44.681085 48.479106 37.394312 -122.083341 18.888563 15.521261 13.072947 6.898827 0.251833 44.690316 44.682918 44.690359
1 Palo Alto 92832.010551 4635.689918 35.669988 35.193435 208.521688 69.908558 59.389215 48.392145 48.810082 37.440460 -122.156431 15.171161 15.759086 13.461313 7.947831 0.566237 77.261499 77.262603 77.261483
2 Redwood City 82721.219420 2945.804540 23.134931 24.148802 237.013871 68.611602 59.397226 50.197982 51.931904 37.486049 -122.229363 19.766709 15.145019 12.639344 8.464061 0.161412 49.096742 49.084489 49.096608
3 San Francisco 101092.908674 1133.965253 61.825657 61.868821 455.360123 66.115962 58.201636 49.776278 49.848767 37.787144 -122.400225 19.565963 15.746290 13.177293 7.668687 0.231462 18.899421 18.899656 18.899424
4 San Jose 98496.783100 1394.662899 7.371684 7.563290 284.057862 69.322104 58.393397 46.986682 47.791046 37.334148 -121.892161 17.977952 15.714605 13.638500 7.585115 0.095467 23.244382 23.240963 23.244376

5 rows × 38 columns