Posted: 2023-06-19 02:07:28. RSS feed digest for 2023-06-19 02:00 (9 items)

Category, Site name, Article title / trending keyword, Link URL, Frequent words and summary / search volume, Registration date
AWS New posts tagged AWS - Qiita Reviewing how things worked before EC2 Instance Connect Endpoint https://qiita.com/kado__gen/items/2921d391c2daea94f617 ecinstanceco 2023-06-19 01:37:50
Overseas Tech MakeUseOf How to Fix Stuttering in Warhammer 40,000: Boltgun on Windows https://www.makeuseof.com/stuttering-warhammer-40000-boltgun-windows/ boltgun 2023-06-18 16:15:20
Overseas Tech DEV Community Mastering MultiIndexes in Pandas: A Powerful Tool for Complex Data Analysis https://dev.to/glennviroux/mastering-multiindexes-in-pandas-a-powerful-tool-for-complex-data-analysis-4lkf

Mastering MultiIndexes in Pandas: A Powerful Tool for Complex Data Analysis

Pandas is a widely used data manipulation library in Python that offers extensive capabilities for handling various types of data. One of its notable features is the ability to work with MultiIndexes, also known as hierarchical indexes. In this blog post, we will delve into the concept of MultiIndexes and explore how they can be leveraged to tackle complex, multidimensional datasets.

Understanding MultiIndexes: Analyzing Sports Performance Data

A MultiIndex is a pandas data structure that allows indexing and accessing data across multiple dimensions or levels. It enables the creation of hierarchical structures for rows and columns, providing a flexible way to organize and analyze data. To illustrate this, let's consider a scenario where you are a personal trainer or coach monitoring the health parameters of your athletes during their sports activities. You want to track various parameters such as heart rate, running pace, and cadence over a specific time interval.

Synthetic Health Performance Data

To work with this type of data, let's begin by writing Python code that simulates health performance data, specifically heart rates and running cadences:

    from __future__ import annotations
    from datetime import datetime, timedelta
    import numpy as np
    import pandas as pd

    start = datetime(…)
    end = start + timedelta(hours=…, minutes=…)
    timestamps = pd.date_range(start, end, freq=timedelta(minutes=…), inclusive="left")

    def get_heart_rate(begin_hr: int, end_hr: int, break_point: int) -> pd.Series[float]:
        noise = np.random.normal(loc=…, scale=…, size=…)
        heart_rate = np.concatenate((np.linspace(begin_hr, end_hr, num=break_point), [end_hr] * (… - break_point))) + noise
        return pd.Series(data=heart_rate, index=timestamps)

    def get_cadence(mean_cadence: int) -> pd.Series[float]:
        noise = np.random.normal(loc=…, scale=…, size=…)
        cadence = pd.Series(data=mean_cadence + noise, index=timestamps)
        cadence[…] = np.NAN
        cadence[…] = np.NAN
        return cadence.ffill().fillna(mean_cadence)

The code snippet generates synthetic heart rate and cadence data for a sports activity. It begins by importing the necessary modules: datetime, numpy, and pandas. The duration of the sports activity is defined as … minutes, and the pd.date_range function generates a series of timestamps at one-minute intervals to cover this period.

The get_heart_rate function generates synthetic heart rate data, assuming a linear increase in heart rate up to a certain level, followed by a constant level for the remainder of the activity. Gaussian noise is added to make the heart rate data more realistic.

Similarly, the get_cadence function generates synthetic cadence data, assuming a relatively constant cadence throughout the activity. Gaussian noise is added to create variability in the cadence values, with the noise values being updated every three minutes instead of every minute, reflecting the stability of cadence compared to heart rate.

With the data generation functions in place, it is now possible to create synthetic data for two athletes, Bob and Alice:

    bob_hr = get_heart_rate(begin_hr=…, end_hr=…, break_point=…)
    alice_hr = get_heart_rate(begin_hr=…, end_hr=…, break_point=…)
    bob_cadence = get_cadence(mean_cadence=…)
    alice_cadence = get_cadence(mean_cadence=…)

At this point, we have the heart rates and cadences of Bob and Alice. Let's plot them using matplotlib to get some more insight into the data:

    from __future__ import annotations
    import matplotlib.dates as mdates
    import matplotlib.pyplot as plt

    date_formatter = mdates.DateFormatter("%H:%M:%S")  # Customize the date format as needed
    fig = plt.figure(figsize=(…))
    ax = fig.add_subplot(…)
    ax.xaxis.set_major_formatter(date_formatter)
    ax.plot(bob_hr, color="red", label="Heart Rate Bob", marker="…")
    ax.plot(alice_hr, color="red", label="Heart Rate Alice", marker="v")
    ax.grid()
    ax.legend()
    ax.set_ylabel("Heart Rate BPM")
    ax.set_xlabel("Time")
    ax_cadence = ax.twinx()
    ax_cadence.plot(bob_cadence, color="purple", label="Cadence Bob", marker="…", alpha=…)
    ax_cadence.plot(alice_cadence, color="purple", label="Cadence Alice", marker="v", alpha=…)
    ax_cadence.legend()
    ax_cadence.set_ylabel("Cadence SPM")
    ax_cadence.set_ylim(…)

Great! The initial analysis of the data provides interesting observations. We can easily distinguish the differences between Bob and Alice in terms of their maximum heart rate and the rate at which it increases. Additionally, Bob's cadence appears to be notably higher than Alice's.

Using Dataframes for Scalability

However, as you might have already noticed, the current approach of using separate variables (bob_hr, alice_hr, bob_cadence, and alice_cadence) for each health parameter and athlete is not scalable. In real-world scenarios with a larger number of athletes and health parameters, this approach quickly becomes impractical and cumbersome.

To address this issue, we can use a pandas DataFrame to represent the data for multiple athletes and health parameters. Each row of the DataFrame can correspond to a specific timestamp, and each column can represent a health parameter for a particular athlete. By using a DataFrame, we eliminate the need for separate variables and store all the data in a single object. This enhances code clarity, simplifies data handling, and provides a more intuitive representation of the overall dataset.

    bob_df = pd.concat([bob_hr.rename("heart_rate"), bob_cadence.rename("cadence")], axis="columns")

(Table: the dataframe for Bob's health data, with columns heart_rate and cadence; values not preserved.)

Introducing Hierarchical Dataframes

The last dataframe looks better already. But we still have to create a new dataframe for each athlete. This is where the pandas MultiIndex can help. Let's take a look at how we can elegantly merge the data of multiple athletes and health parameters into one dataframe:

    from itertools import product

    bob_df = bob_hr.to_frame("value")
    bob_df["athlete"] = "Bob"
    bob_df["parameter"] = "heart_rate"

    values = {
        "Bob": {"heart_rate": bob_hr, "cadence": bob_cadence},
        "Alice": {"heart_rate": alice_hr, "cadence": alice_cadence},
    }

    sub_dataframes: list[pd.DataFrame] = []
    for athlete, parameter in product(["Bob", "Alice"], ["heart_rate", "cadence"]):
        sub_df = values[athlete][parameter].to_frame("values")
        sub_df["athlete"] = athlete
        sub_df["parameter"] = parameter
        sub_dataframes.append(sub_df)

    df = pd.concat(sub_dataframes).set_index(["athlete", "parameter"], append=True)
    df.index = df.index.set_names(["timestamps", "athlete", "parameter"])

This code processes heart rate and cadence data for athletes Bob and Alice. It performs the following steps:

- Create a DataFrame for Bob's heart rate data and add metadata columns for athlete and parameter.
- Define a dictionary that stores heart rate and cadence data for Bob and Alice.
- Generate combinations of athletes and parameters (Bob, Alice) and (heart_rate, cadence).
- For each combination, create a sub-dataframe with the corresponding data and metadata columns.
- Concatenate all sub-dataframes into a single dataframe.
- Set the index to include levels for timestamps, athlete, and parameter. This is where the actual MultiIndex is created.

(Table: the hierarchical dataframe df, with index levels timestamps, athlete, and parameter and a single values column; values not preserved.)

At this point, we have a single dataframe that holds all information for an arbitrary number of athletes and health parameters. We can now use the xs method to query the hierarchical dataframe:

    df.xs("Bob", level="athlete")  # get all health data for Bob
    df.xs("heart_rate", level="parameter")  # get all heart rates
    df.xs("Bob", level="athlete").xs("heart_rate", level="parameter")  # get heart rate data for Bob

Use Case: Earth Temperature Changes

To demonstrate the power of hierarchical dataframes, let's explore a real-world and complex use case: analyzing the changes in Earth's surface temperatures over the last decades. For this task, we'll use a dataset available on Kaggle, which summarizes the Global Surface Temperature Change data distributed by the National Aeronautics and Space Administration Goddard Institute for Space Studies (NASA-GISS).

Inspect and Transform Original Data

Let's begin by reading and inspecting the data. This step is crucial to gain a better understanding of the dataset's structure and contents before delving into the analysis:

    from pathlib import Path

    file_path = Path("data/Environment_Temperature_change_E_All_Data_NOFLAG.csv")
    df = pd.read_csv(file_path, encoding="cp…")
    df.describe()

(Table: output of df.describe(), with columns Area Code, Months Code, Element Code, and one Y-prefixed column per year, and rows count, mean, std, min, …, max; values not preserved.)

From this initial inspection, it becomes evident that the data is organized in a single dataframe, with separate rows for different months and countries. However, the values for different years are spread across several columns, labeled with the prefix Y. This format makes it challenging to read and visualize the data effectively. To address this issue, we will transform the data into a more structured, hierarchical dataframe that is easier to query and visualize:

    from dataclasses import dataclass, field
    from datetime import date
    from pydantic import BaseModel

    MONTHS = {
        "January": 1, "February": 2, "March": 3, "April": 4, "May": 5, "June": 6,
        "July": 7, "August": 8, "September": 9, "October": 10, "November": 11, "December": 12,
    }

    class GistempDataElement(BaseModel):
        area: str
        timestamp: date
        value: float

    @dataclass
    class GistempTransformer:
        temperature_changes: list[GistempDataElement] = field(default_factory=list)
        standard_deviations: list[GistempDataElement] = field(default_factory=list)

        def process_row(self, row) -> None:
            relevant_elements = ["Temperature change", "Standard Deviation"]
            if (element := row["Element"]) not in relevant_elements or (month := MONTHS.get(row["Months"])) is None:
                return None
            for year, value in row.filter(regex=…).items():  # pattern selecting the Y-prefixed year columns
                new_element = GistempDataElement(
                    timestamp=date(year=int(year.replace("Y", "")), month=month, day=…),
                    area=row["Area"],
                    value=value,
                )
                if element == "Temperature change":
                    self.temperature_changes.append(new_element)
                else:
                    self.standard_deviations.append(new_element)

        @property
        def df(self) -> pd.DataFrame:
            temp_changes_df = pd.DataFrame.from_records([elem.dict() for elem in self.temperature_changes])
            temp_changes = temp_changes_df.set_index(["timestamp", "area"]).rename(columns={"value": "temp_change"})
            std_deviations_df = pd.DataFrame.from_records([elem.dict() for elem in self.standard_deviations])
            std_deviations = std_deviations_df.set_index(["timestamp", "area"]).rename(columns={"value": "std_deviation"})
            return pd.concat([temp_changes, std_deviations], axis="columns")

        def process(self):
            environment_data = Path("data/Environment_Temperature_change_E_All_Data_NOFLAG.csv")
            df = pd.read_csv(environment_data, encoding="cp…")
            df.apply(self.process_row, axis="columns")

This code introduces the GistempTransformer class, which processes temperature data from a CSV file and creates a hierarchical DataFrame containing temperature changes and standard deviations. The class, defined as a dataclass, includes two lists, temperature_changes and standard_deviations, to store the processed data elements.

The process_row method handles each row of the input DataFrame. It checks for relevant elements (Temperature change and Standard Deviation), extracts the month from the Months column, and creates instances of the GistempDataElement class. These instances are then appended to the appropriate list based on the element type.

The df property returns a DataFrame that combines the temperature_changes and standard_deviations lists. This hierarchical DataFrame has a MultiIndex with levels representing the timestamp and area.

    transformer = GistempTransformer()
    transformer.process()
    df = transformer.df

(Table: the resulting dataframe, indexed by timestamp and area, with columns temp_change and std_deviation; the first rows shown are for Afghanistan; values not preserved.)

Analyzing Climate Data

Now that we have consolidated all the relevant data into a single dataframe, we can proceed with inspecting and visualizing the data. Our focus is on the linear regression lines for each area, as they show the overall trend of temperature changes over the past decades. To facilitate this visualization, we will create a function that plots the temperature changes along with their corresponding regression lines:

    def plot_temperature_changes(areas: list[str]) -> None:
        fig = plt.figure(figsize=(…))
        ax1 = fig.add_subplot(…)
        ax2 = fig.add_subplot(…)
        for area in areas:
            df_country = df[df.index.get_level_values("area") == area].reset_index()
            dates = df_country["timestamp"].map(datetime.toordinal)
            gradient, offset = np.polyfit(dates, df_country.temp_change, deg=1)
            ax1.scatter(df_country.timestamp, df_country.temp_change, label=area, s=…)
            ax2.plot(df_country.timestamp, gradient * dates + offset, label=area)
        ax1.grid()
        ax2.grid()
        ax2.legend()
        ax2.set_ylabel("Regression Lines °C")
        ax1.set_ylabel("Temperature change °C")

In this function, we use the get_level_values method on a pandas MultiIndex to query the data in our hierarchical dataframe on different levels. Let's use this function to visualize temperature changes on the different continents:

    plot_temperature_changes(["Africa", "Antarctica", "Americas", "Asia", "Europe", "Oceania"])

From this plot, we can draw several key conclusions:

- The regression lines for all continents have a positive gradient, indicating a global trend of increasing Earth surface temperatures.
- The regression line for Europe is notably steeper than those of the other continents, implying that the temperature increase in Europe has been more pronounced. This finding aligns with observations of accelerated warming in Europe compared to other regions.

The specific factors contributing to the higher temperature increase in Europe compared to Antarctica are complex and require detailed scientific research. However, one contributing factor may be the influence of ocean currents. Europe is influenced by warm ocean currents, such as the Gulf Stream, which transport heat from the tropics towards the region. These currents play a role in moderating temperatures and can contribute to the relatively higher warming observed in Europe. In contrast, Antarctica is surrounded by cold ocean currents, and its climate is heavily influenced by the Southern Ocean and the Antarctic Circumpolar Current, which act as barriers to the incursion of warmer waters, thereby limiting the warming effect.

Now let's focus our analysis on Europe itself by examining temperature changes in different regions within Europe:

    plot_temperature_changes(["Southern Europe", "Eastern Europe", "Northern Europe", "Western Europe"])

From the plotted temperature changes, we observe that the overall temperature rises across the European continent are quite similar. While there are slight variations in the steepness of the regression lines between regions, such as Eastern Europe having a slightly steeper line than Southern Europe, no significant differences can be observed among the regions.

Ten Countries Most and Least Affected by Climate Change

Now let's shift our focus to identifying the ten countries that have experienced the highest average temperature increase since the year …:

    df[df.index.get_level_values(level="timestamp") >= date(…)].groupby("area").mean().sort_values(by="temp_change", ascending=False).head(10)

(Table: columns area, temp_change, std_deviation; numeric values not preserved. Areas in order: Svalbard and Jan Mayen Islands; Estonia (std_deviation NaN); Kuwait; Belarus (NaN); Finland; Slovenia (NaN); Russian Federation (NaN); Bahrain; Eastern Europe; Austria.)

To extract these countries, we perform the following steps:

- Filter the dataframe to include only rows where the year is greater than or equal to … using df.index.get_level_values(level="timestamp") >= date(…).
- Group the data by area (country) using groupby("area").
- Calculate the mean temperature change for each country using mean().
- Select the ten countries with the largest mean temperature change using sort_values(by="temp_change", ascending=False).head(10).

This result aligns with our previous observations, confirming that Europe experienced the highest rise in temperature compared to other continents.

Continuing with our analysis, let's now explore the ten countries that are least affected by the rise in temperature. We can use the same method as before, sorting in ascending order instead:

    df[df.index.get_level_values(level="timestamp") >= date(…)].groupby("area").mean().sort_values(by="temp_change", ascending=True).head(10)

(Table: columns area, temp_change, std_deviation; numeric values not preserved. Areas in order: Pitcairn Islands; Marshall Islands (std_deviation NaN); South Georgia and the South Sandwich Islands; Micronesia (Federated States of) (NaN); Chile; Wake Island (NaN); Norfolk Island; Argentina; Zimbabwe; Antarctica.)

We observe that the majority of entries in this list are small, remote islands located in the southern hemisphere. This finding further supports our previous conclusion that southern continents, particularly Antarctica, are less affected by climate change than other regions.

Temperature Changes during Summer and Winter

Now let's delve into more complex queries using the hierarchical dataframe. In this use case, our focus is on analyzing temperature changes during winters and summers. For the purpose of this analysis, we define winters as the months of December, January, and February, while summers encompass June, July, and August. Using the hierarchical dataframe, we can visualize the temperature changes during these seasons in Europe:

    all_winters = df[df.index.get_level_values(level="timestamp").map(lambda x: x.month in [12, 1, 2])]
    all_summers = df[df.index.get_level_values(level="timestamp").map(lambda x: x.month in [6, 7, 8])]
    winters_europe = all_winters.xs("Europe", level="area").sort_index()
    summers_europe = all_summers.xs("Europe", level="area").sort_index()

    fig = plt.figure(figsize=(…))
    ax = fig.add_subplot(…)
    ax.plot(winters_europe.index, winters_europe.temp_change, label="Winters", marker="o", markersize=…)
    ax.plot(summers_europe.index, summers_europe.temp_change, label="Summers", marker="o", markersize=…)
    ax.grid()
    ax.legend()
    ax.set_ylabel("Temperature Change °C")

From this figure, we can observe that temperature changes during the winters exhibit greater volatility than temperature changes during the summers. To quantify this difference, let's calculate the standard deviation of the temperature changes for both seasons:

    pd.concat([winters_europe.std().rename("winters"), summers_europe.std().rename("summers")], axis="columns")

(Table: columns winters and summers, row temp_change; values not preserved.)

Conclusion

Mastering MultiIndexes in pandas provides a powerful tool for complex data analysis tasks. With MultiIndexes, users can organize and analyze multidimensional datasets in a flexible and intuitive manner. Hierarchical structures for rows and columns enhance code clarity, simplify data handling, and enable simultaneous analysis of multiple variables. Whether it's tracking health parameters of athletes or analyzing Earth's temperature changes over time, understanding MultiIndexes unlocks the full potential of the library for handling complex data scenarios.

2023-06-18 16:49:01
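
The MultiIndex queries described in the DEV Community entry above (xs, get_level_values, and a per-group mean) can be tried on a small table. The following is a minimal sketch, not the article's code: the timestamps and readings are made-up placeholders, and the index is built with pd.MultiIndex.from_product rather than the article's loop over itertools.product. The level names timestamps, athlete, and parameter follow the entry.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three one-minute timestamps, two athletes, two parameters.
timestamps = pd.date_range("2023-01-01 09:00", periods=3, freq="min")
index = pd.MultiIndex.from_product(
    [timestamps, ["Bob", "Alice"], ["heart_rate", "cadence"]],
    names=["timestamps", "athlete", "parameter"],
)
# Placeholder readings 0.0 .. 11.0 so that each selection is easy to check by eye.
df = pd.DataFrame({"values": np.arange(len(index), dtype=float)}, index=index)

# Cross-section on one level: all of Bob's data (the "athlete" level is dropped).
bob = df.xs("Bob", level="athlete")

# Cross-section on two levels at once: only Bob's heart rate, indexed by timestamp.
bob_hr = df.xs(("Bob", "heart_rate"), level=["athlete", "parameter"])

# Boolean mask built from one level's values; all three index levels are kept.
alice = df[df.index.get_level_values("athlete") == "Alice"]

# Aggregate over one level, analogous to the per-area mean in the entry.
mean_per_athlete = df.groupby(level="athlete")["values"].mean()

print(bob_hr)
print(mean_per_athlete)
```

Passing a tuple to xs together with a list of levels gives the same rows as chaining two xs calls, as the entry does, while keeping the selection in a single expression.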
Apple AppleInsider - Frontpage News Sihoo Doro C300 Ergonomic Office Chair review: extra lumbar support where you need it https://appleinsider.com/articles/23/06/18/sihoo-doro-c300-ergonomic-office-chair-review-extra-lumbar-support-where-you-need-it?utm_medium=rss If you suffer from aches and pains from sitting at your desk all day, the Sihoo Doro C300 may be the fix you're looking for, provided you're the right size. Working a desk job is surprisingly hard on your body. Long periods of inactivity can cause muscle stiffness, back pain, and increased fatigue, and the list goes on. While we encourage you to get up and move around as much as your job allows, a good chair can help carry you through those times when getting up isn't an option. 2023-06-18 16:43:28
News BBC News - Home Greece boat disaster: Ship tracking casts doubt on Greek Coastguard's account https://www.bbc.co.uk/news/world-europe-65942426?at_medium=RSS&at_campaign=KARANGA accountthe 2023-06-18 16:27:51
News BBC News - Home Mortgage help 'under review', says Michael Gove https://www.bbc.co.uk/news/business-65922072?at_medium=RSS&at_campaign=KARANGA govemichael 2023-06-18 16:13:36
News BBC News - Home Antony Blinken hails candid talks on high stakes China trip https://www.bbc.co.uk/news/world-us-canada-65941659?at_medium=RSS&at_campaign=KARANGA china 2023-06-18 16:10:28
News BBC News - Home Birmingham police save 'slippery customer' boa constrictor spotted in road https://www.bbc.co.uk/news/uk-england-birmingham-65943681?at_medium=RSS&at_campaign=KARANGA birmingham 2023-06-18 16:01:33
News BBC News - Home Challenge Cup: Wigan 14-12 Warrington - Warriors hold on with 12 men https://www.bbc.co.uk/sport/rugby-league/65902189?at_medium=RSS&at_campaign=KARANGA challenge 2023-06-18 16:01:37
