Posted: 2023-07-30 11:09:34 RSS feed 2023-07-30 11:00 digest (12 items)

Category | Site | Article title / trend words | Link URL | Frequent words / summary / search volume | Registered
IT 気になる、記になる… X (Twitter) may be changing ad labels to a less noticeable form https://taisy0.com/2023/07/30/174690.html twitter 2023-07-30 01:50:06
IT ITmedia overall article list [ITmedia Business Online] "Appealing buffet" ranking: No. 3 "dessert," No. 2 "sushi," and No. 1 is? https://www.itmedia.co.jp/business/articles/2307/30/news033.html itmedia 2023-07-30 10:30:00
AWS New posts tagged lambda - Qiita Resolving "libcrypt.so.1: cannot open shared object file: No such file or directory" on Lambda https://qiita.com/yanagiii/items/fbefdb7f7be5e820bda7 eloadingsharedlibrariesl 2023-07-30 10:43:08
python New posts tagged Python - Qiita The basics of unpacking in Python https://qiita.com/akeyi2018/items/2b2a74681e22df6f93f0 fromchatgptpython 2023-07-30 10:55:18
Ruby New posts tagged Ruby - Qiita Resolving "libcrypt.so.1: cannot open shared object file: No such file or directory" on Lambda https://qiita.com/yanagiii/items/fbefdb7f7be5e820bda7 eloadingsharedlibrariesl 2023-07-30 10:43:08
Ruby New posts tagged Ruby - Qiita [Ruby] What is frozen_string_literal: true? https://qiita.com/ym0628/items/dd5bc2d703d6afb41c31 frozenstringliteraltrue 2023-07-30 10:26:07
AWS New posts tagged AWS - Qiita Resolving "libcrypt.so.1: cannot open shared object file: No such file or directory" on Lambda https://qiita.com/yanagiii/items/fbefdb7f7be5e820bda7 eloadingsharedlibrariesl 2023-07-30 10:43:08
Ruby New posts tagged Rails - Qiita [Ruby] What is frozen_string_literal: true? https://qiita.com/ym0628/items/dd5bc2d703d6afb41c31 frozenstringliteraltrue 2023-07-30 10:26:07
Overseas TECH DEV Community Python Networking: HTTP https://dev.to/cwprogram/python-networking-http-2o3

Python Networking: HTTP

So far we've seen what takes place behind servers and networking. Now, the modern web is certainly more than just a network of echo servers. Many websites are powered by something known as HTTP (HyperText Transfer Protocol). This article will discuss some of the inner workings of HTTP using various Python code and modules. For those looking for more resources, I highly recommend the Mozilla Developer Network documentation on anything web related.

Contents: Security Notes, HTTP Versions, A URL, A Basic Request, Response, A Better Server, Headers, Type Headers, Enhancing The Server Even More, Redirection, Caching, User Agent, Cookies, Request Types (GET, HEAD, POST), Status Codes, Conclusion.

Security Notes

The code presented here is for learning purposes. Given the complexity of modern-day web services, I highly discourage trying to roll your own web server outside of learning purposes in an isolated network. You should instead evaluate a secure and well-maintained web server that meets your needs. Traffic here is also unencrypted, meaning anyone could snoop on the data. So to summarize:

- Don't use this code in production
- Always make sure your network communication is encrypted and that the encryption method is not outdated or insecure

HTTP Versions

The HTTP protocol has seen a number of revisions over the years. Version 1.0 was released in 1996 as RFC 1945. This was followed in 1997 by HTTP/1.1, which added a number of features that are widely used in the modern web. Currently HTTP/2 is considered the modern standard; many of its features helped to work out performance issues with the way modern web applications worked. HTTP/3 is the current in-progress standard, which is based on a UDP encapsulation protocol. In particular it looks to reduce connection duplication when negotiating secure connections. Taking support into consideration, this article will cover standards set by HTTP/1.1.

A URL

URL stands for Uniform Resource Locator and is a subset of URI, or Uniform Resource Identifier. The specifics of URLs are defined in RFC 1738. Despite seeming like it, URLs are not necessarily for reaching out to an HTTP server, though that's certainly one of the more popular use cases. The scheme section allows them to work with other services as well, such as FTP and Gopher. The URL schemes supported at the time can be found in the RFC; the IANA keeps a more up-to-date and extensive list. Python offers the urllib module, which can be used to work with URLs:

```python
from urllib.parse import urlparse

# Anchor fragment reconstructed for illustration
URL = 'https://datatracker.ietf.org/doc/html/rfc1738#section-3'
print(urlparse(URL))
```

This gives the output:

```
ParseResult(scheme='https', netloc='datatracker.ietf.org', path='/doc/html/rfc1738', params='', query='', fragment='section-3')
```

With a more complex example:

```python
from urllib.parse import urlparse

URL = 'https://user:password@domain.com:8080'  # port value reconstructed
parsed_url = urlparse(URL)
print(parsed_url.hostname)
print(parsed_url.username)
print(parsed_url.password)
print(parsed_url.port)
```

Output:

```
domain.com
user
password
8080
```

There are some cases where a URL path may have non-alphabetic characters, such as a space character. To deal with such cases the values can be URL encoded. This is done by taking the ASCII hex value of the character (including the extended ASCII table) and adding a % in front of it. urllib.parse.quote is able to handle such encoding:

```python
from urllib.parse import quote

print(quote('/path with spaces'))
```

Output:

```
/path%20with%20spaces
```
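As a quick aside, urllib.parse can also decode these strings and build whole query strings, which comes up again in the GET section later; a minimal sketch:

```python
from urllib.parse import quote, unquote, urlencode

# Round-trip a quoted path back to its original form
encoded = quote('/path with spaces')
print(unquote(encoded))  # /path with spaces

# urlencode builds a query string from a dict (spaces become '+')
print(urlencode({'q': 'hello world', 'page': 2}))  # q=hello+world&page=2
```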
A Basic Request

To look at the request process I'll use a basic socket server on port 80 and the requests library, which can be installed with pip install requests.

```python
import os
import pwd
import grp
import socketserver


def drop_privileges(uid_name='nobody', gid_name='nogroup'):
    if os.getuid() != 0:
        # We're not root, so like whatever dude
        return

    # Get the uid/gid from the names
    running_uid = pwd.getpwnam(uid_name).pw_uid
    running_gid = grp.getgrnam(gid_name).gr_gid

    # Remove group privileges
    os.setgroups([])

    # Try setting the new uid/gid
    os.setgid(running_gid)
    os.setuid(running_uid)

    # owner/group r+w+x; the exact mask was lost here, 0o007 is a stand-in
    old_umask = os.umask(0o007)


class MyTCPHandler(socketserver.StreamRequestHandler):
    """
    The request handler class for our server.

    It is instantiated once per connection to the server and must
    override the handle() method to implement communication to the
    client.
    """

    def handle(self):
        self.data = self.rfile.readline()
        print(self.data)


if __name__ == "__main__":
    HOST, PORT = "localhost", 80

    # Create the server, binding to localhost on port 80
    with socketserver.TCPServer((HOST, PORT), MyTCPHandler) as server:
        # Activate the server; this will keep running until you
        # interrupt the program with Ctrl-C
        print(f"Server bound to port {PORT}")
        drop_privileges()
        server.serve_forever()
```

and the client:

```python
import requests

requests.get('http://localhost')
```

Now currently this is an incomplete request on purpose, so I can show things line by line. This means simple_client.py will error out with "connection reset by peer", as it's expecting an HTTP response. On the server side we see:

```
Server bound to port 80
b'GET / HTTP/1.1\r\n'
```

The first line is indicated by the HTTP RFC as the Request Line. First is the method, followed by the request path on the host, the HTTP version to use, and finally a CRLF (Carriage Return \r, Line Feed \n). So the GET method is being used on the / path, requesting HTTP/1.1 as the version. Now you'll notice that in the request we didn't have to declare that we were using port 80. This is because it's defined by IANA as a service port, so implementations know to use port 80 by default (or 443 for https).

Response

Next I'll read in the rest of the lines for the HTTP request:

```python
def handle(self):
    self.data = self.rfile.readlines()
    print(self.data)
```

```
Server bound to port 80
[b'GET / HTTP/1.1\r\n', b'Host: localhost\r\n', b'User-Agent: python-requests\r\n', b'Accept-Encoding: gzip, deflate\r\n', b'Accept: */*\r\n', b'Connection: keep-alive\r\n', b'\r\n']
```

Now to clean up the output a bit:

```
GET / HTTP/1.1
Host: localhost
User-Agent: python-requests
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
```

After the request line we see a number of key/value pairs separated by a colon. This is something we'll get to shortly, but for now I'll pass back data to finish the connection. While I'm here I'll also re-organize the handler class to make it easier to follow:

```python
class MyTCPHandler(socketserver.BaseRequestHandler):
    """
    The request handler class for our server.
    """

    def read_http_request(self):
        print('reading request')
        # Buffer size was lost here; 1024 is a stand-in
        self.data = self.request.recv(1024)
        print(self.data)

    def write_http_response(self):
        print('writing response')
        response_lines = [
            b'HTTP/1.1 200 OK\r\n',
            b'Content-Type: text/plain\r\n',
            b'Content-Length: 12\r\n',
            b'Location: http://localhost\r\n',
            b'\r\n',
            b'Hello World\n',
        ]
        for response_line in response_lines:
            self.request.send(response_line)
        print('response sent')

    def handle(self):
        self.read_http_request()
        self.write_http_response()
        self.request.close()
        print('connection closed')
```

and the client is also slightly modified:

```python
import requests

r = requests.get('http://localhost')
print(r.headers)
print(r.text)
```

So the handler class has been changed back to socketserver.BaseRequestHandler, as I don't need single-line reads anymore. I'm also writing back a static response for right now. Finally, the handle method gives a nice overview of the different steps. Now as an example:

Server:

```
Server bound to port 80
reading request
b'GET / HTTP/1.1\r\nHost: localhost\r\nUser-Agent: python-requests\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
writing response
response sent
connection closed
```

Client:

```
{'Content-Type': 'text/plain', 'Content-Length': '12', 'Location': 'http://localhost'}
Hello World
```

As with the request, responses also have their own response line, which in this case is:

```
HTTP/1.1 200 OK\r\n
```

First is the HTTP version, as a confirmation that the server can communicate in that version. Next is a status code to indicate the nature of the response. I'll touch on some of the status codes later on in the article. In this case, 200 is confirmation that the request is valid and everything went okay. With an initial response working, it's time to look at things in a bit more depth.
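For contrast, here's a minimal sketch of the same exchange done by hand over a plain socket, which is essentially what requests is doing underneath:

```python
import socket

# Hand-rolled equivalent of requests.get('http://localhost'):
# send a request line plus headers, then read the raw response bytes.
with socket.create_connection(('localhost', 80)) as sock:
    sock.sendall(b'GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n')
    response = b''
    while chunk := sock.recv(4096):
        response += chunk

print(response.decode('utf-8', errors='replace'))
```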
A Better Server

Now that we've seen the raw elements of an HTTP request, it's time to abstract out a bit. Python actually has an http.server module that can be used to extend the socketserver module. It has various components to facilitate serving HTTP traffic. So now our server looks something like this:

```python
import os
import pwd
import grp
from http.server import ThreadingHTTPServer, BaseHTTPRequestHandler

# drop_privileges() is unchanged from the previous listing
# <snip>


class MyHTTPHandler(BaseHTTPRequestHandler):

    def read_http_request(self):
        self.log_message(f'Reading request from {self.client_address}')
        print(dict(self.headers.items()))

    def write_http_response(self):
        self.log_message(f'Writing response to {self.client_address}')
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b'Hello World\n')

    def do_GET(self):
        self.read_http_request()
        self.write_http_response()
        self.request.close()


if __name__ == "__main__":
    HOST, PORT = "localhost", 80

    with ThreadingHTTPServer((HOST, PORT), MyHTTPHandler) as server:
        print(f"Server bound to port {PORT}")
        drop_privileges()
        server.serve_forever()
```

Now the server is doing some of the heavy lifting for us. It has information about the request line and sends sensible default headers for the response. Speaking of which, let's look at headers.

Headers

So after doing another run with the new server:

```
{'Host': 'localhost', 'User-Agent': 'python-requests', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
```

We see the headers sent in the request to us. Now the thing with headers is that, with the exception of Host (required by the HTTP/1.1 standard), the others are not something the server really has to concern itself with. This is because a single IP can host multiple domains; in fact, this becomes more common with CDNs. So if I add an /etc/hosts entry like so:

```
127.0.0.1 webserver
```

Then I could make the following change:

```python
def write_http_response(self):
    self.log_message(f'Writing response to {self.client_address}')
    self.send_response(200)
    self.end_headers()
    self.wfile.write(bytes(f'Hello {self.headers["Host"]}\n', 'utf-8'))
```

And as an example:

```
{'Server': 'BaseHTTP/... Python/...', 'Date': 'Tue, ... Jul 2023 ... GMT'}
Hello localhost

{'Server': 'BaseHTTP/... Python/...', 'Date': 'Tue, ... Jul 2023 ... GMT'}
Hello webserver
```

Despite connecting via the same IP address, I'm getting different content back. Now, as for the rest of the headers, it's a pretty long list. With this in mind, I'll go over some of the fundamental ones; some other more specific ones will be covered in later sections.
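A minimal sketch of where that Host lookup can go: name-based virtual hosting is little more than a dictionary keyed on the Host header (VHOSTS here is a hypothetical table, not part of the server above):

```python
# Hypothetical name-based virtual hosting table. Note that Host may
# include a port (e.g. 'localhost:8080') on non-default ports.
VHOSTS = {
    'localhost': b'Hello localhost\n',
    'webserver': b'Hello webserver\n',
}


def write_http_response(self):
    self.send_response(200)
    self.end_headers()
    # Fall back to a default body for unknown hosts
    body = VHOSTS.get(self.headers['Host'], b'Hello stranger\n')
    self.wfile.write(body)
```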
Type Headers

These headers have to do with the type of content that's being delivered. Request versions will ask for certain types of content, and response versions will give metadata about the content. Accept is one of the more important ones and is related to the type of content, indicated by MIME (Multipurpose Internet Mail Extensions). This is a way to indicate the type of a file, and it was originally created as a way to give information about non-textual content in the normally textual format of email. This helps in differentiating between parsing HTML and showing an image. Not surprisingly, IANA manages the official list of MIME types. Python has a mimetypes module which maps file extensions to the system's MIME type database:

```python
import mimetypes

mimetypes.init()
print(mimetypes.guess_all_extensions('text/plain'))
print(mimetypes.types_map['.html'])
```

Output (system dependent):

```
['.txt', '.bat', '.c', '.h', '.ksh', '.pl', '.csh']
text/html
```

Now this is of course assuming that a file with a certain extension is actually that type of file. Realistically, though, a malicious actor could simply rename their malware to .jpg or similar, so it's not a very good source of truth if you can't completely trust the intentions of your users. Instead we can use python-magic. So after doing a pip install python-magic (Windows users will need to install python-magic-bin instead, which includes required DLLs):

```python
import magic

f = magic.Magic(mime=True)
print(f.from_file('/bin/bash'))
```

Output:

```
application/x-sharedlib
```

Some content types you'll likely deal with:

- text/plain: Plain text
- text/html: HTML
- application/json: JSON

Mozilla also has a more extensive list. Now, looking at the request headers, we see:

```
Accept: */*
Accept-Encoding: gzip, deflate
```

For basic transactions, `*/*` is fairly standard if the client isn't sure what the server will respond with. This essentially says it doesn't have a preference on the MIME type a server will return. A more complex example would be:

```
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
```

This will accept most HTML formats. There's also a q number here, which indicates the preference for the MIME type. If there's no preference indicated, then everything is weighted the same and the most specific type will be selected. The server version of this is Content-Type, which indicates the type of content the server will send. Now, if you decide to mangle the type to something which it's not:

```python
def write_http_response(self):
    self.log_message(f'Writing response to {self.client_address}')
    self.send_response(200)
    self.send_header('Content-Type', 'image/jpeg')
    self.end_headers()
    self.wfile.write(bytes(f'Hello {self.headers["Host"]}\n', 'utf-8'))
```

The requests-based client (or really things like curl and wget, which only download) won't care, as it doesn't render the image. Actual browsers, on the other hand, will throw an error or show a placeholder broken image.

Accept-Encoding indicates that the client supports compressed data being returned. Compression is recommended when possible through the server specs to reduce the amount of data transferred. As it's not uncommon to use this metric for pricing, it can also help reduce cost. A server can send Content-Encoding back to indicate it's sending compressed data:

```python
import gzip

def write_http_response(self):
    self.log_message(f'Writing response to {self.client_address}')
    self.send_response(200)
    self.send_header('Content-Type', 'text/plain')
    self.send_header('Content-Encoding', 'gzip')
    self.end_headers()
    return_data = gzip.compress(bytes(f'Hello {self.headers["Host"]}\n', encoding='utf-8'))
    self.wfile.write(return_data)
```

requests is able to handle compressed data out of the box, so no changes are needed there, and a run shows that the compression works:

```
{'Server': 'BaseHTTP/... Python/...', 'Date': 'Wed, ... Jul 2023 ... GMT', 'Content-Type': 'text/plain', 'Content-Encoding': 'gzip'}
Hello localhost
```
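The q weighting can be parsed with ordinary string handling; a rough sketch (real servers also break ties by how specific each type is):

```python
def rank_accept(accept_header):
    """Rank an Accept header's media types by q value, highest first."""
    ranked = []
    for entry in accept_header.split(','):
        media_type, _, params = entry.strip().partition(';')
        q = 1.0  # no q parameter means full weight
        for param in params.split(';'):
            name, _, value = param.strip().partition('=')
            if name == 'q':
                q = float(value)
        ranked.append((q, media_type))
    return sorted(ranked, reverse=True)


print(rank_accept('text/html,application/xml;q=0.9,*/*;q=0.8'))
# [(1.0, 'text/html'), (0.9, 'application/xml'), (0.8, '*/*')]
```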
Enhancing The Server Even More

Now, to look at some of the other header options, I'll update the HTTP server:

```python
import datetime
import grp
import gzip
import hashlib
import os
import pwd
from email.utils import parsedate_to_datetime
from http.server import ThreadingHTTPServer, BaseHTTPRequestHandler

# drop_privileges() is unchanged from the previous listings
# <snip>


class MyHTTPHandler(BaseHTTPRequestHandler):

    ROUTES = {
        '/': 'serve_front_page',
        '/index.html': 'serve_html',
        '/python_logo': 'serve_python_logo',
        '/js/myjs.js': 'serve_js',
        '/favicon.ico': 'serve_favicon',
    }
    HTTP_DT_FORMAT = '%a, %d %b %Y %H:%M:%S GMT'

    def read_http_request(self):
        self.log_message(f'Reading request from {self.client_address}')
        print(dict(self.headers.items()))

    def serve_front_page(self):
        self.log_message(f'Writing response to {self.client_address}')
        # 301 is a stand-in; the exact redirect code was lost here
        self.send_response(301)
        self.send_header('Location', '/index.html')
        return b''

    def serve_python_logo(self):
        return self.serve_file_with_caching('python-logo-only.png', 'image/png')

    def serve_favicon(self):
        return self.serve_file_with_caching('favicon.ico', 'image/x-icon')

    def serve_html(self):
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        return (b'<html><head><title>Old Website</title>'
                b'<script type="text/javascript" src="js/myjs.js"></script></head>'
                b'<body><img src="python_logo"></body></html>')

    def serve_js(self):
        js_code = b'const a = Math.random();'
        etag = hashlib.md5(js_code).hexdigest()
        if 'If-None-Match' in self.headers and self.headers['If-None-Match'] == etag:
            self.send_response(304)
            return b''
        else:
            self.send_response(200)
            self.send_header('Etag', etag)
            self.send_header('Content-Type', 'text/javascript')
            # max-age value was lost here; 60 is a stand-in
            self.send_header('Cache-Control', 'public, max-age=60')
            return js_code

    def write_data(self, bytes_data):
        self.send_header('Content-Encoding', 'gzip')
        return_data = gzip.compress(bytes_data)
        self.send_header('Content-Length', len(return_data))
        self.end_headers()
        self.wfile.write(return_data)

    def check_cache(self, filename):
        if 'If-Modified-Since' in self.headers:
            cache_date = parsedate_to_datetime(self.headers['If-Modified-Since'])
            filename_date = datetime.datetime.fromtimestamp(
                os.path.getmtime(filename),
                tz=datetime.timezone.utc).replace(microsecond=0)
            return filename_date <= cache_date
        return False

    def serve_file_with_caching(self, filename, file_type):
        self.log_message(f'Writing response to {self.client_address}')
        if self.check_cache(filename):
            self.send_response(304)
            return b''
        else:
            self.send_response(200)
            self.send_header('Content-Type', file_type)
            file_date = datetime.datetime.fromtimestamp(
                os.path.getmtime(filename),
                tz=datetime.timezone.utc).replace(microsecond=0)
            self.send_header('Last-Modified', file_date.strftime(self.HTTP_DT_FORMAT))
            self.send_header('Cache-Control', 'public, max-age=60')
            expires = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(seconds=60)
            self.send_header('Expires', expires.strftime(self.HTTP_DT_FORMAT))
            with open(filename, 'rb') as file_fp:
                file_data = file_fp.read()
            return file_data

    def do_GET(self):
        self.read_http_request()
        bytes_data = getattr(self, self.ROUTES[self.path])()
        self.write_data(bytes_data)
        self.request.close()


if __name__ == "__main__":
    HOST, PORT = "localhost", 80

    with ThreadingHTTPServer((HOST, PORT), MyHTTPHandler) as server:
        print(f"Server bound to port {PORT}")
        drop_privileges()
        server.serve_forever()
```

This will require two files in the same directory as the server: the Python logo in PNG format (the version with only the two snakes) as python-logo-only.png, and a favicon.ico. ROUTES allows the handler to act as a simple router, mapping paths to methods in the handler class. The data write method is also abstracted out to gzip compress the data every time. Another method deals with the caching logic. I'll go over the parts with respect to the header logic.
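One gap worth noting: the ROUTES lookup raises KeyError for unknown paths. A minimal sketch of a 404 fallback using dict.get (serve_not_found is a hypothetical addition, not part of the listing above):

```python
def serve_not_found(self):
    self.send_response(404)
    self.send_header('Content-Type', 'text/plain')
    return b'Not Found\n'

def do_GET(self):
    self.read_http_request()
    # dict.get falls back to the 404 handler for unrouted paths
    bytes_data = getattr(self, self.ROUTES.get(self.path, 'serve_not_found'))()
    self.write_data(bytes_data)
    self.request.close()
```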
Redirection

The Location header can be utilized with a few status codes to indicate the location of a file that needs to be redirected to. Looking here:

```python
def serve_front_page(self):
    self.log_message(f'Writing response to {self.client_address}')
    self.send_response(301)
    self.send_header('Location', '/index.html')
    return b''
```

This will redirect the user to the /index.html page. Note that some CLI-based HTTP clients will require an additional option to actually handle the redirection. Web browsers, on the other hand, will handle this seamlessly.

Caching

The first useful caching header is Cache-Control. The main usage is to indicate how long a file can stay in a client's local cache. This is then supplemented with Last-Modified and/or Etag. So here:

```python
self.send_header('Cache-Control', 'public, max-age=60')
```

We're telling the client it can be cached locally, without re-verification, for the max-age number of seconds. For Last-Modified, I set it to the modification time of the file in UTC:

```python
file_date = datetime.datetime.fromtimestamp(
    os.path.getmtime(filename),
    tz=datetime.timezone.utc).replace(microsecond=0)
self.send_header('Last-Modified', file_date.strftime(self.HTTP_DT_FORMAT))
```

The replace(microsecond=0) part is due to the granularity of microseconds causing comparison issues. getmtime will get the modification time of the file since the epoch; tz being set to UTC makes it timezone-aware as a UTC date/time. Now, for a standard web browser, when the client has had the file locally cached for more than max-age seconds it will query the server with If-Modified-Since:

```python
def check_cache(self, filename):
    if 'If-Modified-Since' in self.headers:
        cache_date = parsedate_to_datetime(self.headers['If-Modified-Since'])
        filename_date = datetime.datetime.fromtimestamp(
            os.path.getmtime(filename),
            tz=datetime.timezone.utc).replace(microsecond=0)
        return filename_date <= cache_date
    return False
```

Now the server will check the value provided and compare it against the modified time of the file. If it's greater than If-Modified-Since, then the server will return the file as usual with a new Last-Modified value:

```python
if self.check_cache(filename):
    self.send_response(304)
    return b''
else:
    self.send_response(200)
```

Otherwise the server sends a 304 to indicate the file hasn't changed. The Cache-Control max-age timer will be reset and the cycle continues. Now, the problem is situations where content is dynamically generated. In this case, Etag can be used. This value does not have an exact method of generation. As the MDN web docs state, typically the ETag value is a hash of the content, a hash of the last modification timestamp, or just a revision number:

```python
js_code = b'const a = Math.random();'
etag = hashlib.md5(js_code).hexdigest()
```

In this case I use the md5 hash. This is sent to the client when they request the file, and the client will then attach this etag value to the cache entry. When max-age is up, instead of sending If-Modified-Since, the If-None-Match header is utilized:

```python
if 'If-None-Match' in self.headers and self.headers['If-None-Match'] == etag:
    self.send_response(304)
    return b''
else:
    ...
```

Note that Firefox implemented a feature called Race Cache With Network (RCWN). Firefox will calculate whether the network is faster than pulling the file from disk. If the network is faster, it will pull the content anyway, regardless of cache settings. This will most likely proc if you're doing a localhost to localhost connection or are on a very high-speed network. There is currently no server-side way to disable this; it must be done through the browser side instead.
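For file-backed content, a common alternative to hashing the whole body is deriving the ETag from file metadata; a minimal sketch:

```python
import hashlib
import os


def file_etag(filename):
    # Hash the modification time and size instead of the file contents;
    # cheap for large files, and it changes whenever the file does
    stat = os.stat(filename)
    return hashlib.md5(f'{stat.st_mtime_ns}-{stat.st_size}'.encode()).hexdigest()
```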
User Agent

This is a rather interesting header that's supposed to indicate what client is currently requesting the content. For example, Chrome may show something like:

```
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36
```

The problem is that it's very easy to spoof, because at the end of the day it's just a normal header. As an example:

```python
import requests

# Representative Chrome UA string; the exact version numbers are illustrative
spoofed_ua = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/115.0.0.0 Safari/537.36')
r = requests.get('http://localhost/js/myjs.js',
                 headers={'User-Agent': spoofed_ua})
print(r.headers)
print(r.text)
```

Looking at the server-side request log:

```
{'Host': 'localhost', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
```

So even though I'm actually using requests, I can spoof myself as Chrome using the user agent string from earlier. If you do a lot of JS development, this is why best practice for browser detection is by feature and not by user agent string.

Cookies

Security Note: As this is an HTTP server being used for learning purposes, encryption is not set up. In a real-world setup, cookies should always be sent over HTTPS with strong encryption to help prevent things like session hijacking.

While cookies are technically just another header, I find that their functionality is enough to warrant a dedicated section. Cookies are essentially a method for having state between HTTP calls. In general operation, each HTTP call is distinct from the others. Without cookies to bridge this gap, it would be difficult to maintain state such as a user being authenticated to a service. Cookies start with the server sending one or more Set-Cookie headers. So I'll add another route here:

```python
from http.cookies import SimpleCookie

# <snip>
    ROUTES = {
        '/': 'serve_front_page',
        '/index.html': 'serve_html',
        '/python_logo': 'serve_python_logo',
        '/js/myjs.js': 'serve_js',
        '/favicon.ico': 'serve_favicon',
        '/cookie_test': 'serve_cookies',
    }
# <snip>

    def serve_cookies(self):
        self.send_response(200)
        cookies_list = SimpleCookie()
        # Cookie names reconstructed; the originals were lost
        cookies_list['var1'] = 'test'
        cookies_list['var2'] = 'test'
        cookies_list['var2']['path'] = '/'
        for morsel in cookies_list.values():
            self.send_header('Set-Cookie', morsel.OutputString())
        return self.serve_html()
```

This uses the SimpleCookie class to set up our headers. requests puts these cookies into their own dedicated property:

```python
import gzip
import requests

r = requests.get('http://localhost/cookie_test')
print(r.headers)
print(dict(r.cookies))
print(gzip.decompress(r.content))
```

Output:

```
{'Server': 'BaseHTTP/... Python/...', 'Date': 'Thu, ... Jul 2023 ... GMT', 'Set-Cookie': 'var1=test, var2=test; Path=/', ...}
{'var1': 'test', 'var2': 'test'}
b'<html><head><title>Old Website</title><script type="text/javascript" src="js/myjs.js"></script></head><body><img src="python_logo"></body></html>'
```

Now, adding some more routes and adjusting the cookie logic:

```python
    ROUTES = {
        '/': 'serve_front_page',
        '/index.html': 'serve_html',
        '/python_logo': 'serve_python_logo',
        '/js/myjs.js': 'serve_js',
        '/favicon.ico': 'serve_favicon',
        '/cookie_test': 'serve_cookies',
        '/cookie_test2': 'serve_cookies',
    }
# <snip>

    def serve_cookies(self):
        self.send_response(200)
        cookies_list = SimpleCookie()
        cookies_list['var1'] = 'test'
        cookies_list['path_specific'] = 'test'
        cookies_list['path_specific']['path'] = '/cookie_test'
        cookies_list['shady_cookie'] = 'test'
        cookies_list['shady_cookie']['domain'] = 'shadysite.com'
        for morsel in cookies_list.values():
            self.send_header('Set-Cookie', morsel.OutputString())
        return self.serve_html()
```

After visiting once in a browser: as long as I'm in /cookie_test and its sub-paths, the path_specific cookie will show up. However, if I browse to /cookie_test2, it won't, as the paths don't match. If we also take a look at the shady cookie: Chrome refuses to register it, as it's not for the same domain as the host. This is generally known as a third-party cookie. While there are some ways to deal with proper usage, in general expect that third-party cookies will be denied by most browsers. This is mostly done because third-party cookies often contain advertising/tracking related content.
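For completeness, SimpleCookie morsels also accept the security attributes you'd want over HTTPS; a minimal sketch (session_id is a hypothetical cookie name):

```python
from http.cookies import SimpleCookie

cookies_list = SimpleCookie()
cookies_list['session_id'] = 'hypothetical-value'
cookies_list['session_id']['secure'] = True      # only send over HTTPS
cookies_list['session_id']['httponly'] = True    # hide from JavaScript
cookies_list['session_id']['samesite'] = 'Lax'   # limit cross-site sends
print(cookies_list['session_id'].OutputString())
# e.g. session_id=hypothetical-value; Secure; HttpOnly; SameSite=Lax
```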
Now, when cookies have been sent, the browser will use logic to figure out which cookies are valid for a request and send them back in a Cookie header. This can then be used by the server to keep some kind of state:

```python
def parse_cookies(self):
    # parse_cookies() needs to run before routing, e.g. at the top of do_GET
    if 'Cookie' in self.headers:
        raw_cookies = self.headers['Cookie']
        self.cookies = SimpleCookie()
        self.cookies.load(raw_cookies)
    else:
        self.cookies = None

def get_cookie(self, key, default=None):
    if not self.cookies:
        return default
    elif key not in self.cookies:
        return default
    else:
        return self.cookies[key].value

def serve_html(self):
    self.send_response(200)
    self.send_header('Content-Type', 'text/html')
    title_cookie = self.get_cookie('path_specific', 'Old Website')
    return bytes(f'<html><head><title>{title_cookie}</title>'
                 '<script type="text/javascript" src="js/myjs.js"></script></head>'
                 '<body><img src="python_logo"></body></html>', encoding='utf-8')
```

So here we've modified serve_html to use the cookie value as the title. If it doesn't exist, then we use the 'Old Website' value instead. SimpleCookie doubles as a way to parse cookies, letting us reuse it for reading the Cookie header.

Security Note: Having cookie values inserted directly into HTML is a terrible idea. This was done for simple illustration purposes only.

Now on the client side:

```python
import gzip
import requests

r = requests.get('http://localhost/cookie_test')
print(r.headers)
print(dict(r.cookies))
print(gzip.decompress(r.content))

r = requests.get('http://localhost/cookie_test', cookies=r.cookies)
print(r.headers)
print(dict(r.cookies))
print(gzip.decompress(r.content))
```

Which will output:

```
{'Server': 'BaseHTTP/... Python/...', 'Date': 'Fri, ... Jul 2023 ... GMT', 'Set-Cookie': 'var1=test, path_specific=test; Path=/cookie_test, shady_cookie=test; Domain=shadysite.com', ...}
{'var1': 'test', 'path_specific': 'test'}
b'<html><head><title>Old Website</title>...</html>'

{'Server': 'BaseHTTP/... Python/...', 'Date': 'Fri, ... Jul 2023 ... GMT', 'Set-Cookie': 'var1=test, path_specific=test; Path=/cookie_test, shady_cookie=test; Domain=shadysite.com', ...}
{'var1': 'test', 'path_specific': 'test'}
b'<html><head><title>test</title>...</html>'
```

I'll also note that even requests removed the third-party shady site cookie:

```
{'Host': 'localhost', 'User-Agent': 'python-requests', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Cookie': 'path_specific=test; var1=test'}
```

Cookies can be removed by expiring them: either setting an expiration in the past, or setting an expiration time in the future and having time pass until then. Here's an example of a cookie that will be removed:

```python
import time

HTTP_DT_FORMAT = '%a, %d %b %Y %H:%M:%S GMT'
INSTANT_EXPIRE = time.strftime(HTTP_DT_FORMAT, time.gmtime(0))

cookies_list['var1'] = 'test'
cookies_list['var1']['expires'] = INSTANT_EXPIRE
```

This will set the expiration to the start of the epoch, or 'Thu, 01 Jan 1970 00:00:00 GMT'. One important thing to note is that this is the case when an implementation handles things to spec. A rogue client could simply choose to send the expired cookies regardless.
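On the client side, requests.Session keeps the cookie jar across calls automatically instead of threading r.cookies through by hand; a minimal sketch:

```python
import requests

with requests.Session() as session:
    session.get('http://localhost/cookie_test')      # server sets cookies
    r = session.get('http://localhost/cookie_test')  # jar sent back automatically
    print(dict(session.cookies))
```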
Request Types

Now that we're done working with headers, I'll take the server code back to a simplified form again:

```python
import datetime
import grp
import gzip
import hashlib
import os
import pwd
from email.utils import parsedate_to_datetime
from http.server import ThreadingHTTPServer, BaseHTTPRequestHandler

# drop_privileges() is unchanged from the earlier listings
# <snip>


class MyHTTPHandler(BaseHTTPRequestHandler):

    ROUTES = {
        '/': 'serve_front_page',
        '/index.html': 'serve_html',
        '/python_logo': 'serve_python_logo',
        '/js/myjs.js': 'serve_js',
        '/favicon.ico': 'serve_favicon',
    }
    HTTP_DT_FORMAT = '%a, %d %b %Y %H:%M:%S GMT'

    # read_http_request, serve_front_page, serve_python_logo, serve_favicon,
    # serve_html, serve_js, write_data, check_cache and serve_file_with_caching
    # are unchanged from the caching server above
    # <snip>

    def do_GET(self):
        self.read_http_request()
        bytes_data = getattr(self, self.ROUTES[self.path])()
        self.write_data(bytes_data)
        self.request.close()


if __name__ == "__main__":
    HOST, PORT = "localhost", 80

    with ThreadingHTTPServer((HOST, PORT), MyHTTPHandler) as server:
        print(f"Server bound to port {PORT}")
        drop_privileges()
        server.serve_forever()
```

In the HTTP standard there are various request methods that can be used. I'll be going over the three that I would consider the core ones. If you're developing a REST API, it's likely you'll utilize more of them.

GET

This is the standard method you will see for the majority of web interactions. It indicates a read-only action to obtain some form of content, and it should not change the content used by the server. Due to the read-only nature of GET, the contents of the request body are ignored. In order to pass in any kind of parameters, a query string can be used after the path. As the HTTP server pulls in query strings as part of the path, we'll need to parse them before using the routing dictionary:

```python
from urllib.parse import urlparse, parse_qs

# <snip>
    ROUTES = {
        '/': 'serve_front_page',
        '/index.html': 'serve_html',
        '/python_logo': 'serve_python_logo',
        '/js/myjs.js': 'serve_js',
        '/favicon.ico': 'serve_favicon',
        '/query_test': 'serve_html',
    }
# <snip>

    def do_GET(self):
        self.read_http_request()
        segments = urlparse(self.path)
        self.query = parse_qs(segments.query)
        self.log_message(f'{self.query}')
        bytes_data = getattr(self, self.ROUTES[segments.path])()
        self.write_data(bytes_data)
        self.request.close()
```

urlparse allows us to break up the path and query string components. parse_qs will then parse the query string to give us a dictionary value. Note that both of these examples are valid:

```
# Handled by the code
http://website/query_test?test=test&test2=test2
# Valid, but not handled by our code (trailing-slash variant)
http://website/query_test/?test=test&test2=test2
```

But I'm only handling the first case, on purpose, to keep things simple; feature-rich web servers can deal with this. We'll update our client to pass in some parameters and see the result:

```python
import requests

r = requests.get('http://localhost/query_test?test=foo&test2=bar&test3=hello world')
print(r.headers)
print(r.content)
```

Which will give the following output from the server:

```
{'test': ['foo'], 'test2': ['bar'], 'test3': ['hello world']}
```

Now, the reason the values are lists is that by using the same key in a query string you can allow for multiple values:

```python
r = requests.get('http://localhost/query_test?test=foo&test2=bar&test3=hello world&test2=baz&test2=nothing')
```

```
{'test': ['foo'], 'test2': ['bar', 'baz', 'nothing'], 'test3': ['hello world']}
```
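Client side, urlencode with doseq=True builds exactly this kind of multi-value query string; a minimal sketch:

```python
from urllib.parse import urlencode

# doseq=True expands list values into repeated keys
print(urlencode({'test': 'foo', 'test2': ['bar', 'baz']}, doseq=True))
# test=foo&test2=bar&test2=baz
```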
If you only wish to support single values with unique keys, parse_qsl can be used instead:

```python
segments = urlparse(self.path)
# Returns key/value pair tuples
self.query = dict(parse_qsl(segments.query))
self.log_message(f'{self.query}')
bytes_data = getattr(self, self.ROUTES[segments.path])()
```

```python
r = requests.get('http://localhost/query_test?test=foo&test2=bar&test3=hello world')
```

```
{'test': 'foo', 'test2': 'bar', 'test3': 'hello world'}
```

```python
r = requests.get('http://localhost/query_test?test=foo&test2=bar&test3=hello world&test2=baz&test2=nothing')
```

```
{'test': 'foo', 'test2': 'nothing', 'test3': 'hello world'}
```

As you can see, the multiple-values version still works, but it only takes the last defined value. Again, another good reason to go with a feature-rich web server for practical use.

HEAD

This is essentially the same as a GET request, except only the headers are returned. It's useful for things like figuring out if a file exists without downloading the entire thing. That said, even though the response body is blank, the headers still have to be calculated exactly the same as if the file were being downloaded. Server-side, this isn't too bad for static files. Having to dynamically generate a large amount of data just to push back an empty body is not ideal, though: something to consider in your method logic. With the base HTTP server, do_HEAD will need to be added with the logic, and the write_data method will need another version to handle the headers properly. I'll ignore query string parsing for simplicity here:

```python
def write_head_data(self, bytes_data):
    self.send_header('Content-Encoding', 'gzip')
    return_data = gzip.compress(bytes_data)
    self.send_header('Content-Length', len(return_data))
    self.end_headers()
    self.wfile.write(b'')

def do_HEAD(self):
    self.read_http_request()
    bytes_data = getattr(self, self.ROUTES[self.path])()
    self.write_head_data(bytes_data)
    self.request.close()
```

Now requests will need to call head instead of get:

```python
import requests

r = requests.head('http://localhost/index.html')
print(r.headers)
print(r.content)
```

```
{'Server': 'BaseHTTP/... Python/...', 'Date': 'Sat, ... Jul 2023 ... GMT', 'Content-Type': 'text/html', 'Content-Encoding': 'gzip', 'Content-Length': '...'}
b''
```

Server log:

```
"HEAD /index.html HTTP/1.1" 200 -
```

So Content-Length properly shows the number of bytes that would have come from the compressed HTML, but the body of the response is empty.
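A small usage sketch of HEAD in practice, checking for a file before committing to the download:

```python
import requests

r = requests.head('http://localhost/index.html')
if r.status_code == 200:
    print(f"exists, {r.headers.get('Content-Length')} bytes compressed")
elif r.status_code == 404:
    print('no such file')
```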
POST

POSTs are meant for cases where data is to be changed on the server side. It's important to note that even if an HTML form is present, it's not a guarantee that the result is a POST. Search functionality may have a form for search parameters where the results are a GET query with a query string containing the parameters. Due to the fact that POST lets you declare data in the body, query strings in the URL have little practical use here and should be avoided.

The first type of POST is a key/value post encoded as application/x-www-form-urlencoded in the body. First we'll just print out the headers and body to see what it looks like:

```python
def read_post_request(self):
    self.log_message(f'Reading request from {self.client_address}')
    print(dict(self.headers.items()))
    content_length = int(self.headers['Content-Length'])
    data = self.rfile.read(content_length)
    print(data)

def serve_post_response(self):
    self.send_response(200)
    self.send_header('Content-Type', 'text/html')
    return bytes('<html><head><title>Old Website</title>'
                 '<script type="text/javascript" src="js/myjs.js"></script></head>'
                 '<body><img src="python_logo"></body></html>', encoding='utf-8')

def do_POST(self):
    self.read_post_request()
    bytes_data = self.serve_post_response()
    self.write_data(bytes_data)
    self.request.close()
```

And the client:

```python
import requests

r = requests.post('http://localhost', data={'var1': 'test', 'var2': 'test2'})
print(r.headers)
print(r.content)
```

After running the client, we see this on the server side:

```
{'Host': 'localhost', 'User-Agent': 'python-requests', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '20', 'Content-Type': 'application/x-www-form-urlencoded'}
b'var1=test&var2=test2'
```

Due to the client sending information, the Content-Type and Content-Length headers are now sent as well. The body can be parsed on the server side using parse_qsl:

```python
from urllib.parse import parse_qsl

def read_post_request(self):
    self.log_message(f'Reading request from {self.client_address}')
    print(dict(self.headers.items()))
    content_length = int(self.headers['Content-Length'])
    data = self.rfile.read(content_length)
    self.data = dict(parse_qsl(data.decode('utf-8')))
    print(self.data)
```

Output:

```
{'var1': 'test', 'var2': 'test2'}
```

As data is being read from a connection, it comes in as bytes, which can be turned into a string using decode. Content-Length is also an interesting predicament, security-wise. When doing a read on sockets, if you attempt to read in more than the client sent, the server can get into a stuck phase. This is due to expecting the possibility that more packets are still arriving and the network is simply slow. A malicious attacker could simply set Content-Length to more bytes than are actually sent, causing a server-side read to hang. It's important to ensure your connections have timeouts in this case (a sketch follows at the end of this section).

Now, another option is to simply post a format such as JSON. This is so popular with REST APIs that requests even has an option for it:

```python
import requests

r = requests.post('http://localhost', json={'var1': 'test', 'var2': 'test2'})
print(r.headers)
print(r.content)
```

Which can then be decoded as JSON on the server side:

```python
import json

def read_post_request(self):
    self.log_message(f'Reading request from {self.client_address}')
    print(dict(self.headers.items()))
    content_length = int(self.headers['Content-Length'])
    data = self.rfile.read(content_length)
    self.data = json.loads(data)
    print(self.data)
```

In this case json.loads accepts bytes, so we don't need to decode it ourselves. The output is otherwise the same, but the content type has changed to JSON:

```
{'Host': 'localhost', 'User-Agent': 'python-requests', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '...', 'Content-Type': 'application/json'}
{'var1': 'test', 'var2': 'test2'}
```
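As promised above, a minimal timeout sketch: handlers derived from socketserver.StreamRequestHandler (which BaseHTTPRequestHandler is) honor a class-level timeout in seconds, so a stalled read gets cut off instead of hanging forever:

```python
from http.server import BaseHTTPRequestHandler


class MyHTTPHandler(BaseHTTPRequestHandler):
    # Reads that stall longer than this raise a timeout, which
    # http.server treats as "log it and close the connection".
    timeout = 5  # a stand-in value; tune to your traffic
```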
Now, another method is one called a multipart POST. This is mainly used for cases where you might be dealing with binary input along with other form fields, generally a file selection input in an HTML form. To see what this looks like, I'll update our client:

```python
import requests

# Field names and values reconstructed; the originals were lost
multipart_data = {
    'image_data': ('python_logo.png', open('python-logo-only.png', 'rb'), 'image/png'),
    'field1': (None, 'value1'),
    'field2': (None, 'value2'),
}
r = requests.post('http://localhost', files=multipart_data)
print(r.headers)
print(r.content)
```

So each multipart data entry is a key for the field name and a tuple value. Actual files will have a filename as the first part, a file pointer as the second part, and an optional MIME type for the contents. Regular fields simply have None as the filename and the string contents of the value as the second part. This all gets passed in via the files keyword argument of the requests post. Now to check what the server will receive out of this:

```python
def read_post_request(self):
    self.log_message(f'Reading request from {self.client_address}')
    print(dict(self.headers.items()))
    content_length = int(self.headers['Content-Length'])
    self.data = self.rfile.read(content_length)
    print(self.data)
```

Quite a lot of data comes back from this:

```
{'Host': 'localhost', ..., 'Content-Type': 'multipart/form-data; boundary=cfcdfddeefcc'}
b'--cfcdfddeefcc\r\nContent-Disposition: form-data; name="image_data"; filename="python_logo.png"\r\nContent-Type: image/png\r\n\r\n\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR... <snip: lots of binary here> ...\r\n--cfcdfddeefcc\r\nContent-Disposition: form-data; name="field1"\r\n\r\nvalue1\r\n--cfcdfddeefcc\r\nContent-Disposition: form-data; name="field2"\r\n\r\nvalue2\r\n--cfcdfddeefcc--\r\n'
```

So what's happening here is that we have something called a boundary. This helps show the separation between fields. Cleaning up the output for the last part, it ends up looking like this:

```
--cfcdfddeefcc
Content-Disposition: form-data; name="field1"

value1
--cfcdfddeefcc
Content-Disposition: form-data; name="field2"

value2
--cfcdfddeefcc--
```

So as you can see, the boundary declared in the Content-Type header has -- before it to indicate a new field on its own line. The very last one has a -- at the end to show completion of all the fields. Much of this is from email standards, which used multiparts as a way of indicating file attachments. Now, all of this looks quite tedious to deal with, but thankfully there is a package we can install via pip install multipart which makes it easier to work with:

```python
from multipart import MultipartParser

# <snip>

def read_post_request(self):
    self.log_message(f'Reading request from {self.client_address}')
    print(dict(self.headers.items()))
    content_length = int(self.headers['Content-Length'])
    content_boundary = self.headers['Content-Type'].split('boundary=')[1]
    self.data = MultipartParser(self.rfile, content_boundary, content_length)
    print(self.data.get('field1').value)
    print(self.data.get('field2').value)
```

Now, after starting the server and running the client again:

```
{'Host': 'localhost', ..., 'Content-Type': 'multipart/form-data; boundary=bedfdaccedcf'}
value1
value2
```

The data is being shown. multipart also gives a handy save_as method for downloading the file:

```python
def read_post_request(self):
    self.log_message(f'Reading request from {self.client_address}')
    print(dict(self.headers.items()))
    content_length = int(self.headers['Content-Length'])
    content_boundary = self.headers['Content-Type'].split('boundary=')[1]
    self.data = MultipartParser(self.rfile, content_boundary, content_length)
    image_entry = self.data.get('image_data')
    image_entry.save_as(image_entry.filename)
```

This will write the image to the current directory with the python_logo.png name we gave it in the requests data.
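Tying back to the Type Headers section, a minimal sketch of sniffing the uploaded bytes with python-magic before trusting the client-supplied filename or type (save_upload is a hypothetical helper; the 2048-byte sniff size is an arbitrary choice):

```python
import magic


def save_upload(image_entry):
    # Sniff the first bytes of the upload rather than trusting the
    # client-supplied filename or Content-Type
    sniff = image_entry.file.read(2048)
    image_entry.file.seek(0)
    if magic.from_buffer(sniff, mime=True) != 'image/png':
        raise ValueError('upload is not a PNG')
    image_entry.save_as(image_entry.filename)
```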
Status Codes

Now we look at some of the HTTP status codes. Instead of going through every one, I'll simply cover what the different categories entail.

2xx: These indicate a success. Out of all of them, 200 is what you'll most likely see in the majority of cases.

3xx: These generally deal with redirections. 304 is a bit of an odd one, indicating that the contents have not been modified; it's used in coordination with the caching system. 301 can be used to indicate a redirection to another location.

4xx: These mostly show that something is bad with the request. A few notable codes:

- 400: your client request is completely wrong (missing/malformed headers)
- 401: you're not authorized to view the page
- 404: it's difficult to find someone who hasn't hit this before; used to indicate a page doesn't exist
- 418: I'm a teapot; based on an April Fools standard about a Coffee Pot Protocol

5xx: These codes are all related to the server being broken. 500 is the generic "this server is broken" code. The other versions can provide more specifics as to the exact nature of what went wrong.

Conclusion

This concludes a look at the HTTP protocol using Python. It will also be the final installment of this series. I believe that HTTP is a sufficient level at which to stop deep diving, as modern abstractions such as user sessions can be reasoned about more quickly by understanding all the concepts presented up to now. The networking parts of this guide can also be helpful for those in a DevOps role who might need to troubleshoot more unique situations. If there's one thing I hope you get out of this, it's that despite all the code shown, it's not even a complete HTTP server implementation that properly handles all use cases. Security-wise, communication isn't encrypted, there's no timeout handling, and the parsing of headers in general could use work. So basically, trying to do it yourself, where you have to keep several use cases in mind and deal with potential malicious actors, is not worth it. Work with your security needs, threat model, and use cases to find a comprehensive server that fits. Thank you to all the new folks who have followed me over the last few weeks. Look forward to more articles ahead! 2023-07-30 01:09:44
Overseas Science NYT > Science Ancient Worms Revived From Permafrost After 46,000 Years https://www.nytimes.com/2023/07/29/science/roundworm-nematodes-siberia-permafrost.html periods 2023-07-30 01:14:37
News BBC News - Home Sunak orders review of low traffic neighbourhoods in pro-motorist message https://www.bbc.co.uk/news/uk-politics-66351785?at_medium=RSS&at_campaign=KARANGA people 2023-07-30 01:17:48
Business Toyo Keizai Online The serious circumstances behind an actor earning 8 million yen a year joining the strike: the case of one actress striking in Hollywood | Movies & Music | Toyo Keizai Online https://toyokeizai.net/articles/-/690678?utm_source=rss&utm_medium=http&utm_campaign=link_back Toyo Keizai Online 2023-07-30 10:30:00
