Posted: 2023-02-12 18:12:28 RSS feed digest for 2023-02-12 18:00 (11 items)
Category | Site name | Article title / trending word | Link URL | Frequent words / summary / search volume | Registered date |
---|---|---|---|---|---|
TECH | Techable(テッカブル) | Handle Hitachi appliance inquiries and repair requests entirely in an app! Hitachi releases the "Happiness Up" app | https://techable.jp/archives/195768 | Hitachi Global Life Solutions, Inc. | 2023-02-12 08:00:33 |
python | New posts tagged Python - Qiita | Learning Python: building a ball-drawing game | https://qiita.com/kmd-H/items/1e53c72392d6fc7d782b | taking out | 2023-02-12 17:26:42 |
js | New posts tagged JavaScript - Qiita | A problem set for putting "The Art of Readable Code" into practice [Javascript] (Section 1, Chapters 2-4) | https://qiita.com/kyok01_japan/items/cc6df24164bc291677bd | typescript | 2023-02-12 17:45:28 |
Ruby | New posts tagged Ruby - Qiita | Binary search in Ruby | https://qiita.com/massaaaaan/items/f550515d3917f3f94c57 | arraybsearch | 2023-02-12 17:17:38 |
AWS | New posts tagged AWS - Qiita | [AWS] Building a server as a beginner | https://qiita.com/kawai_t25/items/5d7521ca12f7bf5f49f4 | vpcvirtualprivatecloud | 2023-02-12 17:40:01 |
Docker | New posts tagged docker - Qiita | I built a semi-automated Rust environment for AtCoder with Docker | https://qiita.com/rokoooouribo/items/76a0057c75694fd943f5 | atcoder | 2023-02-12 17:05:01 |
Azure | New posts tagged Azure - Qiita | Attaching and detaching a VM to/from Azure LB using az cli | https://qiita.com/watyanabe164/items/ea0ac62d27059cd592bf | azcli | 2023-02-12 17:16:09 |
Overseas TECH | DEV Community | Make your original git! (Analyze section) | https://dev.to/nopenoshishi/make-your-original-git-analyze-section-139d | Make your original git Analyze section Hello Dev community I m noshishi an apprentice engineer in Tokyo This article is about understanding Git from the inside by creating a simple program that can add and commit But it s a very long story so I ll post the development section separately ForewordThe starting point is If I could understand git I could make it I took this opportunity to try out a new programming language so I decided to try Rust this time The repository I actually created is My original git nss The quality of the code isn t quite there yet and many parts are still incomplete but you can do a straight line of local development If you give me a star I ll be happy to fly and of course I ll be waiting for your contributions Feel free to touch this repository any way you like Please forgive us for not being able to explain some of the details in this article alone Also we use Rust for development but Python for the stage where we uncover Git s internals TOCGit Inside Where is repositoryObjectIndexAnalyze ObjectblobtreecommitSummaryAnalyze IndexSpecificationIndexSummaryBackground of CommandaddcommitDigressionDeciphering TreeHEAD and BranchPlumbing commandsFinallyWhat you need Git InsideFirst we will unpack how Git handles data based on the official documentation The Git command system is very complex But the Git data structure is very simple Where is repositoryA repository is the directory under the control of Git and the folder git in the directory created by init or clone is the actual state of the repository Let s put an empty folder called project under Git s control pwd home noshishi project ls a nothing yet git initInitialized empty Git repository in home noshishi project git ls a gitThis git directory consists of the following git├ーHEAD├ー index Not created by init ├ーconfig ├ーobjects └ーrefs ├ーheads └ーtags info The path 
types of Git repositories are difficult to understand at first glance We have added to the directory path so that you can refer to it Also we have omitted parts that are not explained in this article ObjectGit manages versions by file data called objects Objects are stored in git objects TypesObjects have four types blob, tree, commit, tag The contents of each and the corresponding data will be as follows blob File datatree Directory datacommit Metadata to manage the tree of the repositorytag Metadata for a specific commit Not explained in this article Image with first txt in the project repository StructureThe Object is FILE DATA so it has a file name path and the data stored in it just like a normal file File name path The file name path is a character string This is a hash sha of object data Actually the first two are the directory path and the remaining are the file path DataObject data is compressed by zlib The decompressed data consists of two parts header and content The two elements are then separated by a null byte header is a combination of the object type and the size of content content contains the corresponding data in an easy to handle format as indicated by the type Later we will see the details How to Create blob Object Index staging area The actual index used when you add is a file git index StructureThe index stores data of files marked by add with meta information The stored data contains the latest file data at the time of add It is important to note that all data recorded in the index is in file data units I will describe meta information in detail later but the storage format is exactly defined as shown in index format Hmm feel sleepy Wait Let s actually analyze the object and the index Analyze ObjectBefore starting the analysis work create all of the blob tree and commit Just add the files in project and commit Creating the following two files first txtHello World This is first txt second pydef second print This is second py next add and commit git 
add Agit commit m initial Then the contents of git objects are now as follows git └ーobjects ├ー └ーcaebbadadafdcdbaf ├ーaf └ーdfcedfbcbabaf ├ーda └ーffffadacedcbabcc └ーf └ーfbdbbfcfacfcf From now on hash values in the text will omit the number of characters The corresponding data and hash values for each are summarized below hash valueObjectcorrespond dataffbblobfirst txtafblobsecond pydafftreeproject directorycacommitcommit version The analysis work will be conducted interactively using Python an interpreted language blobblob is an object corresponding to file data The image looks like this DataFirst let s look at ffb which corresponds to first txt Oops I failed python gt gt gt with open git objects f fbdbbfcfacfcf r as f content f read UnicodeDecodeError utf codec can t decode byte xca in position invalid continuation byteSince the content is compressed attempting to read the content as is as a string will fail Therefore we read the content as binary gt gt gt with open git objects f fbdbbfcfacfcf rb as f read binary content f read gt gt gt contentb x xK xca xcORd xfH xcd xc xcW x xcf xcaIQ xe n xc xc V x xa xb xcc xa xe x xbd x xa x x xfa r x Then I successfully read the byte string Now decompress the content with zlib as described in the official documentation gt gt gt import zlib gt gt gt decompressed zlib decompress content gt gt gt decompressedb blob xHello World nThis is first txt gt gt gt decompressed split b b blob b Hello World nThis is first txt We found that a blob consists of the following elementsheader blob Null byte x Note hex notationcontent Hello World nThis is first txt File nameWe should check whether the hash value of the object is indeed correct The file name of the object should be the value obtained by hashing decompressed with the hash function sha so check it gt gt gt import hashlib gt gt gt blob b blob xHello World nThis is first txt gt gt gt sha hashlib sha blob hexdigest gt gt gt sha ffbdbbfcfacfcf Great exact match How about another fileLet 
s also look at af which corresponds to the other second py gt gt gt with open git objects af dfcedfbcbabaf rb as f content f read gt gt gt decompressed zlib decompress content gt gt gt decompressedb blob xdef second n print This is second py gt gt gt blob b blob xdef second n print This is second py gt gt gt sha hashlib sha blob hexdigest gt gt gt sha afdfcedfbcbabaf It can be summarized as followsheader blob Null byte xcontent def second n print This is second py And the sha values hash values derived from the data also matched SupplementalThe blob itself does not hold the filename of the corresponding file data Instead of blob the object that manages its name is tree Treetree is an object corresponding to directory data The image looks like this We will analyze it in the same way as for blob gt gt gt with open git objects da ffffadacedcbabcc rb as f content f read gt gt gt decompressed zlib decompress content gt gt gt decompressedb tree x first txt x xf xf xb x x xd x xbb x xf x xc x xf xaFc xcf xcf x second py x xaf x b xf xc xe xdfR x xb xcb xa x xX xbQ xaf gt gt gt decompressed split b b tree b first txt b xf xf xb x x xd x xbb x xf x xc x xf xaFc xcf xcf x second py b xaf x b xf xc xe xdfR x xb xcb xa x xX xbQ xaf The tree has multiple contents so it seems a bit complicated The tree content is composed of repeating mode path and hash which are meta information about the data in the directory If you simply separate them with the hash value of the previous data and the meta information of the next file data are attached to each other This is because the meta information and the hash value are separated by First we will check the data stored in the first one Looking at the split like first txt is stored right gt gt gt temp decompressed split b gt gt gt temp b first txt gt gt gt temp b xf xf xb x x xd x xbb x xf x xc x xf xaFc xcf xcf x second py In order to split temp well let s take it out by bytes Array access of byte strings can be byte gt gt gt temp b xf xf 
xb x x xd x xbb x xf x xc x xf xaFc xcf xcf x gt gt gt temp hex ffbdbbfcfacfcf gt gt gt temp b second py Repeating the same process revealed the following header tree Null byte xcontent first txt xffb content second py xaf The management of tree hashes is described in Digression deciphering Tree bytes SupplementalA tree may contain not only a blob but also a tree That is if there is a directory within a directory This is because tree like blob does not keep the directory name of itself and the corresponding data Commitcommit contains the tree of the repository directory with meta information The image looks like this Let s analyze gt gt gt with open git objects caebbadadafdcdbaf rb as f content f read gt gt gt decompressed zlib decompress content gt gt gt decompressedb commit xtree daffffadacedcbabcc nauthor nopeNoshishi lt nope noshishi jp gt ncommitter nopeNoshishi lt nope noshishi jp gt n ninitial n gt gt gt decompressed split b b commit b tree daffffadacedcbabcc nauthor nopeNoshishi lt nope noshishi jp gt ncommitter nopeNoshishi lt nope noshishi jp gt n ninitial n a little bit more gt gt gt header content decompressed split b gt gt gt headerb commit gt gt gt contentb tree daffffadacedcbabcc nauthor nopeNoshishi lt nope noshishi jp gt ncommitter nopeNoshishi lt nope noshishi jp gt n ninitial n gt gt gt content split b n b tree daffffadacedcbabcc b author nopeNoshishi lt nope noshishi jp gt b committer nopeNoshishi lt nope noshishi jp gt b b initial b The stored data are as follows header commit Null byte xtree tree daffffadacedcbabccauthor author nopeNoshishi lt nope noshishi jp gt committer committer nopeNoshishi lt nope noshishi jp gt message initialYou can see that it contains the tree hash value that you saw in the tree chapter earlier information about the repository owner and the person who made the commit and the message I will go ahead with the commit and analyze it again Edit first txt as follows and add and commit again first txt version Hello World 
This is first txt Versiongit add first txtgit commit m second Then the contents of git objects are now as follows git └ーobjects ├ーf └ーf new tree project repo version ├ー └ーcb new commit second ├ー └ーcae old commit initial ├ーaf └ーd old blob second py version ├ーc └ーbdb new blob first txt version ├ーda └ーfff old tree project repo version └ーf └ーfb old blob first txt version See the new commit gt gt gt with open git objects cbcebbbbba rb as f content f read gt gt gt decompressed zlib decompress content gt gt gt decompressedb commit xtree ffcafaacccde nparent caebbadadafdcdbaf nauthor nopeNoshishi lt nope noshishi jp gt ncommitter nopeNoshishi lt nope noshishi jp gt n nsecond n The stored data are as follows header commit Null byte xtree tree ffcafaacccdeparent parent caebbadadafdcdbafauthor author nopeNoshishi lt nope noshishi jp gt committer committer nopeNoshishi lt nope noshishi jp gt message secondThe new commit stored the hash value of the previous version of commit SupplementalThe difference from blob or tree is that commit does not store the actual data in the repository But it has meta data starting from tree Key Value StoreSome of you may have an idea of what I m talking about If you unravel a commit you can get a tree and if you unravel a tree you can get a blob The version flow shows the history because commit knows the hash value of the previous commit This image shows the history of the current commit So Git manages file versions from the starting point which is the hash value of the object Info Officially Git is called a content addressable filesystem The hash function itself is a non invertible transformation so the original data cannot be restored from the hash value but as long as the hash value depends on the contents of the object to begin with it may be called a value value store SummaryIn a world without version control systems like Git what do you do when you want to keep your current files and work on something new with the same files Perhaps one way 
you might think of doing this is to copy the file and put it in another folder In fact this seemingly weird management method is the closest to the form of version control that underlies Git Info Git is a storage system that makes clever use of the OS file system Analyze IndexThe index staging area is veiled but like the object the design is very simple On the other hand it is a bit quirky to analyze The dismantling of the index sucked up dozens of hours I m going to analyze git index which has been committed for the second time SpecificationIn order to analyze we need to understand the design specification of index Referring to Index format in the official document we found the following specifications Index FormatHeader bytes Index header DIRC bytes Index version basic version bits number of entries in index Entries are the meta information for each file Entry bits create file time bits create file time at nano bits modify file time bits modify file time at nano bits device id bits inode bits Permission mode bits user id bits group id bits file size bits blob hash value bits filename size Number of bytes in filename string bytes filename Variable depending on file name bytes padding Variable depending on entry The same thing continues by number of entries IndexNow that we have the specifications we will read them again in python The index is uncompressed but is read in binary format just like the objects because all meta information is stored in bytes gt gt gt with open git index rb as f index f read gt gt gt indexb DIRC x x x x x x x xc xd xf x xeb x xbc xd xf x xeb x xb x x x x x xb x x x x xa x x x xf x x x x x x x xc x M xb x xe xdZ x xefV xbfK xeeQ xe x x x tfirst txt xc xdhv x xa xnc xdhv x xa xn x x x x x xb x x x x xa x x x xf x x x x x x x xaf x b xf xc xe xdfR x xb xcb xa x xX xbQ xaf x tsecond py xTREE x x x x x n xf xca xf x xt x xaa l xc xc xd xe xf xe xd x xc x xd xe xf xfp xc N xcdX xa It looks readable in places You can see the original DIRC first txt and 
second py Since bits is bytes it can be easily pulled out gt gt gt index b DIRC Index header gt DIRC gt gt gt index b x x x x Index version gt gt gt gt index b x x x x number of entries gt The index manages metadata per file so you will have two entries first txt and second py For the purpose of this article I will just take a quick look at the meta information from the next creation time to the group ID which is not very important except for the mode gt gt gt index b c xd xf ctime gt gt gt index b x xeb x xb ctime nano gt gt gt index b xd xf mtime gt gt gt index b x xeb x xb mtime nano gt gt gt index b x x x x dev id gt gt gt index b x xb x inode gt gt gt index b x x x xa mode gt gt gt index b x x xf user id gt gt gt index b x x x x group idHere are the key points to look at First is the file size file size gt gt gt index b x x x gt gt gt index gt gt gt index gt gt gt index gt gt gt index The file size of the next file to come is found to be bytes Next is the hash value hash gt gt gt index b xc x M xb x xe xdZ x xefV xbfK xeeQ xe x x gt gt gt index hex cbdbedaefbfbeee We see the hash value matches the one in version first txt And the size of the filename filename size gt gt gt index b x t gt gt gt index gt gt gt index This size in bytes is very important without it you will have to search for the next file name by guesswork Now that we know the filename is bytes we can gt gt gt index b first txt We can extract the file name without missing anything Finally padding depends on the number of bytes used to represent the entry The calculation method is to find X bytes such that the bytes up to the padding plus the X bytes to be padded is a multiple of Expressed as a formula X padding y filename size a remainder In this case from creation time to file size bytes and the file name is bytes We found the bytes of padding was byte gt gt gt index b x gt gt gt index b xc There s one that isn t a null byte and it s from the second byte gt gt gt index b xc xd The bytes of 
padding up to the next entry creation time was correctly matched SummaryActually when you add a tree is not created When you commit a tree is generated from the index The index has an important role to link added file data to blobs and manage which versions of files are committed You may have heard that Git deals with snapshots not differences In other words when indexes have not been updated file data will always remain unless explicitly excluded And that means that everything you commit can be restored through the index Info index is an important entity that holds the key to whether or not a file is subject to version control in Git Background of CommandNow that we know how Git handles data let s take a quick look at how the commands behave The command has many options so more complex behavior can be achieved but I only describe a basic role addadd is responsible for adding deleting and updating the target file data to the index When added git creates a blob of the instantaneous latest file data The plumbing commands that make this happen are hash object and update index Note I describe the details in the Plumbing commands chapter commitGit creates a tree corresponding to the repository directory based on the index created and then creates a commit After the commit is successfully created change the hash value of the commit that the HEAD and branch point to The plumbing commands that accomplish this are write tree commit tree and update ref Digression Deciphering TreeWe ll look into the byte in a bit What is the maximum value of a number that can be represented by a single unsigned byte This corresponds to the maximum number that can be represented by two hexadecimal digits gt gt gt temp xf I used the hex function quickly above but if you look at it one byte at a time gt gt gt hash gt gt gt for hex in temp hash format hex x gt gt gt hash ffbdbbfcfacfcf I can get the hash value of the blob corresponding to first txt as a string hash are characters but each character is a 
value calculated in hexadecimal So the trick is that one byte can represent two characters commit stores the hash value as a string but for some reason the tree stores the hash value directly as bytes not as a string There was some discussion on stackoverflow as to why HEAD and BranchThe Branch is responsible for marking specific commit objects It is stored under git refs heads You can easily see the contents with the Linux command cat Since we were working on the master branch earlier we can look at git refs heads master and see cat git refs heads mastercbcebbbbbaThe hash value of the last committed commit object was stored The HEAD indicates which commit object you are basing your file edits on HEAD can point directly to a commit object but it basically goes through branch git HEAD is what it is The data is stored as follows cat git HEADref refs heads masterIt contained the path about where the master branch is stored If you want to point directly to a commit detached head use checkout to move HEAD git checkout cbcebbbbba cat git HEADref cbcebbbbba Plumbing commandsTo further manipulate Git at a low level there is a command for every single action These are god like commands created by Mr Linus for ordinary people like me cat fileThis command allows you to see the contents of an object We worked hard earlier to analyze the object but this single command is the solution See object type git cat file t afdfcedfbcbabaf second pyblob Output object content git cat file p afdfcedfbcbabaf second pydef second print This is second py hash objectYou can hash file data etc or store them in git objects Let s create third rs struct Third message String calculate hash value git hash objectaaeeddffebeacd Create blob object git hash object waaeeddffebeacd ls git objects aaeeddffebeacd update indexThis command adds the target file to the index Note however that no object is created ls filesThis command provides a concise view of the contents of the index see the latest index git 
ls filesfirst txtsecond py add index third rs cache git update index add third rs git ls filesfirst txtsecond pythird rs git ls files s cbdbedaefbfbeee first txt afdfcedfbcbabaf second py aaeeddffebeacd third rs write treeWe create a tree based on the contents of the index All directories not just repository directory git write treeeacaaebefbbefe ls git objects eacaaebefbbefe commit treeWe create the commit with the hash of the repository directory tree Enter the hash value of the parent commit and the hash value of the tree you just created git commit tree p cbcebbbbba m third commit eacaaebefbbefeddbcddffecdbffdbf ls git objects dd bcddffecdbffdbf object is created update refWe can t just commit tree and follow the history because no one will see the commits you made Because the git log follows the history sequentially from the commit pointed to by HEAD the commit you just created is not yet referenced git logcommit cbcebbbbba HEAD gt master Author nopeNoshishi lt nope noshishi jp gt Date Tue Jan secondcommit caebbadadafdcdbafAuthor nopeNoshishi lt nope noshishi jp gt Date Sun Jan initial Change the branch s references git update ref refs heads master ddbcd c new hash old hash git logcommit ddbcddffecdbffdbf HEAD gt master Author nopeNoshishi lt nope noshishi jp gt Date Thu Feb third commitcommit cbcebbbbbaAuthor nopeNoshishi lt nope noshishi jp gt Date Tue Jan secondIn creating Git it is difficult to suddenly create something as sophisticated as add or commit Therefore while implementing the plumbing command we will create add and commit in the development section to bypass the functionality of this command FinallyThank you for reading all the way to the end This is still a rough explanation but I hope it contributes to your understanding If you like it please star my repository Reference SiteOfficial Documentation What you needListed here are the key elements in making git BinaryByteBitwise operationn 
decimal system and character stringsStringCompression algorithmsHash functionFile system Annotation zlib This is a free software to compress data losslessly The main compression algorithm called Deflate is very interesting Official Site back to article sha One of the very famous SHA based hash functions characterized by the generation of a bit byte hash value Incidentally the probability of a collision of sha hash values is said to be astronomical The Reality of SHA back to article hash number When you specify a hash value directly in a Git command you may only use characters As mentioned in ano this means that even with a small input hash value we can find a specific object because there are almost no hash collisions It is similar to the feeling of pressing tab in shell to receive input assistance back to article compressed string Compressed data is stored in a form that does not correspond to a character code Therefore it cannot be read as a specific character code back to article mode The mode permission can of course also be expressed in binary And since there are few combinations certain combinations can be expressed in computation back to article | 2023-02-12 08:18:13 |
Overseas TECH | CodeProject Latest Articles | InferJS Library & Compiler | https://www.codeproject.com/Articles/5353381/InferJS-Library-Compiler | javascript | 2023-02-12 08:37:00 |
News | BBC News - Home | Quake-hit Turkey issues 113 building arrest warrants | https://www.bbc.co.uk/news/world-middle-east-64615349?at_medium=RSS&at_campaign=KARANGA | arrest | 2023-02-12 08:28:13 |
News | BBC News - Home | Pregnant Russian women flying to Argentina for citizenship, officials say | https://www.bbc.co.uk/news/world-64610954?at_medium=RSS&at_campaign=KARANGA | citizenship | 2023-02-12 08:48:42 |
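
The DEV Community entry in the table above describes a Git blob object as a header (object type and content size), a null byte, and the content. The object is zlib-compressed on disk and named by the SHA-1 of the uncompressed bytes, with the first two hex characters used as the directory name. The following is a minimal Python sketch of that layout, not code from the article's repository; the helper names (`git_blob_object`, `git_object_id`, `object_path`) and the sample content are illustrative assumptions.

```python
import hashlib
import zlib


def git_blob_object(content: bytes) -> bytes:
    """Build the uncompressed blob object: b"blob <size>" + null byte + content."""
    header = b"blob " + str(len(content)).encode("ascii")
    return header + b"\x00" + content


def git_object_id(obj: bytes) -> str:
    """Return the SHA-1 hex digest of the uncompressed object bytes."""
    return hashlib.sha1(obj).hexdigest()


def object_path(object_id: str) -> str:
    """First two hex characters are the directory, the rest is the file name."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


# Hypothetical sample content, chosen only for illustration.
content = b"hello\n"
obj = git_blob_object(content)
oid = git_object_id(obj)

# What would be written to disk is the zlib-compressed object.
stored = zlib.compress(obj)

# Reading it back: decompress, then split header and content at the first null byte.
header, body = zlib.decompress(stored).split(b"\x00", 1)

print(oid)
print(object_path(oid))
print(header, body)
```

Inside a real repository, the printed id can be compared with the output of `git hash-object` for a file holding the same bytes.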