Posted: 2023-02-03 01:26:09 RSS feed digest for 2023-02-03 01:00 (29 items)

Category Site name Article title or trend word Link URL Frequent word or summary / search volume Registered date
python New posts tagged Python - Qiita [Natural Language Processing] Trying the Kaggle 1st-place Thai sentiment analysis approach in Japanese [TF-IDF logistic regression] https://qiita.com/konbu9640/items/81f579b9a9339ca55b3d cstorm 2023-02-03 00:44:16
python New posts tagged Python - Qiita About Python comprehensions https://qiita.com/Yamakoshi/items/ea3ef44a576d5c7c132d numbersdoublednum 2023-02-03 00:40:55
python New posts tagged Python - Qiita Breaking the language barrier: GPT with a multilingual index https://qiita.com/yakigac/items/d350ec3c94f2c640c3cf harrypotte 2023-02-03 00:26:56
python New posts tagged Python - Qiita I looked into recent illustration-style image-generation AI https://qiita.com/kajiyai/items/41e55636f4d8f9dc3ca6 distribution 2023-02-03 00:11:20
js New posts tagged JavaScript - Qiita I researched and compared Cookie and LocalStorage [JavaScript] https://qiita.com/nanamihirooka/items/8cea5c5d40e1474b844f localstorage 2023-02-03 00:35:10
Docker New posts tagged docker - Qiita Docker notes https://qiita.com/kajiyai/items/3e55844b9b39cb177128 docker 2023-02-03 00:33:14
Overseas Tech MakeUseOf What Is a Smart Home EV Charging Station, and How Does It Work? https://www.makeuseof.com/what-is-smart-home-ev-charging-station/ adding 2023-02-02 15:45:15
Overseas Tech MakeUseOf 6 Ways Technology Can Help You Manage Diabetes https://www.makeuseof.com/technology-manage-diabetes-ways-help/ diabetes 2023-02-02 15:30:15
Overseas Tech MakeUseOf 10 Pro Tips for Naming and Organizing Files in Windows https://www.makeuseof.com/pro-tips-naming-organizingfiles-windows/ windows 2023-02-02 15:15:15
Overseas Tech MakeUseOf What Is MAC-Binding, and How Does It Work? https://www.makeuseof.com/what-is-mac-binding-how-does-it-work/ network 2023-02-02 15:01:16
Overseas Tech DEV Community 10 best GitHub repos for developers ✅ https://dev.to/mariamarsh/10-best-github-repos-for-developers-5gmp 10 best GitHub repos for developers. With the help of GitHub, developers can easily access and share their code with others. It has become an essential tool for developers to collaborate on projects and stay up to date with the latest trends in development. For developers, GitHub is an invaluable resource for finding the best repositories to help them with their development projects. With so many repos available, it can be difficult to know which ones are the most useful and reliable. That's why I've compiled a list of the top GitHub repositories for developers. This list includes repositories that are popular with developers, as well as those that offer unique features or tools to make development easier and more efficient. I hope this list will provide you with some great starting points for your next project. Public APIs: Public APIs have become an essential tool for developers looking to build modern web and mobile applications. The Public APIs repository on GitHub is a fantastic resource for finding free APIs to use in your projects and applications. It covers a wide range of topics, including business, anime, animals, news, finance, games, and more. This repository contains very simple APIs, like ones that return information about animals, for example, as well as more complex ones, such as the Gmail API or the Google Analytics API. This is a huge collection, so go check it out for yourself. FreeCodeCamp: FreeCodeCamp is a large collection of repos on GitHub that are designed to help developers learn and practice coding. It contains a wide range of projects, tutorials, and resources for developers to use in their development journey. With its wide range of development tools and resources, FreeCodeCamp is the perfect place for developers to learn and grow their skills. With its free access to a large collection of repos on GitHub, it's easy for developers to find the exact code they need for their project. Whether you're just starting out or an experienced developer, FreeCodeCamp can help you take your development skills to the next level. Free Ebook Foundation: This repository was also created for educational purposes, like the previous one. The Free Ebook Foundation offers users a free library with different books on various topics about development, testing, code writing, etc. There are links to free books in many languages. Over a thousand books covering many programming languages and millions of concepts are available. Storybook: Storybook allows developers to quickly build, test, and iterate on their UI components without having to worry about the underlying code. It also provides an easy way for developers to share their work with others and collaborate on projects. The Storybook GitHub repo is a great resource for anyone who wants to get started with UI development. It runs outside of your app. This allows you to develop UI components in isolation, which can improve component reuse, testability, and development speed. You can build quickly without having to worry about application-specific dependencies. Also, there is a lot of understandable information on how to implement it in your projects. Build Your Own X: Another great GitHub repo, created by codecrafters.io, is a collection of well-written, step-by-step instructions for recreating our favourite technologies from scratch. This amazing repository contains tutorials on how to build your own technology of any kind. There are examples of how to create a command line tool, an operating system, a search engine, a 3D renderer, and plenty of other things. Have you ever considered developing your own programming language? Or maybe your own Docker or Git? Then you've come to the right place. The Node.js best practices: This repository is an excellent resource for staying up to date with the Node world while also learning about best practices. This repository, which has a large number of stars and contributors, is updated almost every day. The Node.js best practices repository contains a summary and curation of the most popular content on Node.js, as well as its integration with other tools such as Docker, Kubernetes, and so on. It now contains a large set of best practices, style guides, and architectural tips. Developer roadmap: This repository contains interactive roadmaps, guides, and other educational content to help developers grow in their careers. While it may appear to be a bit overwhelming at first, it is a useful guide for what is possible and required in this rapidly changing industry. It is updated on a weekly, monthly, and annual basis. They also have their own website with roadmaps to becoming a frontend, backend, Android, DevOps, React, and PostgreSQL developer. The Algorithms: This is an open source resource for learning data structures and algorithms and their implementation in any programming language. It is one of the best GitHub repositories for learning different languages, data structures, and algorithms. Every computer science student should be familiar with data structures. This repository has something for everyone, whether you are a Python developer, a Java developer, a Go developer, or an old-school C developer. All of the algorithms and data structures presented here are easily explained. They also have a website where you can easily access all of the code. Gitignore: The purpose of this repository is quite simple: it is a collection of .gitignore templates. To filter what gets uploaded, every new project you create as a GitHub repository must include a .gitignore file. This file's content varies depending on the project and language. The repository includes templates for almost any language or framework, including Rails, Python, Perl, Laravel, Java, and others. The art of command line: One of the most popular tools a developer has ever used is the command line. It becomes important for every developer to master it. There are numerous commands that can save you many hours each day. The README.md file is available in different languages. Despite the fact that this repository has not been updated in a couple of years, it still contains plenty of useful information for developers. Thanks for reading! Write about your favourite GitHub repositories in the comments. 2023-02-02 15:43:48
Overseas Tech DEV Community Twitter to revoke free access to the Twitter API beginning Feb 9 https://dev.to/erinposting/twitter-to-revoke-free-access-to-the-twitter-api-beginning-feb-9-4ahg Twitter to revoke free access to the Twitter API beginning Feb 9. This just in, straight from the bird's mouth (beak). Twitter Dev (twitterdev): "Starting February 9, we will no longer support free access to the Twitter API, both v and v. A paid basic tier will be available instead." Twitter Dev (twitterdev): "Twitter data are among the world's most powerful data sets. We're committed to enabling fast & comprehensive access so you can continue to build with us. We'll be back with more details on what you can expect next week." Well, DEVCommunity, now you know. What do you think? 2023-02-02 15:23:41
Overseas Tech DEV Community You are probably testing wrong https://dev.to/wparad/you-are-probably-testing-wrong-4il3 You are probably testing wrong. I love having to answer the questions that come up regarding testing. It's amazing that something that is pure waste (according to lean) for our customers and users can still offer us so much value. In less mature organizations, an often unfortunate and incorrect assumption is that we need to ensure test coverage. And so the question gets asked: how many tests should we have? This is a question whose only wrong answers are those that contain a number (no wait, test coverage is correct). But the only right answer is: the right number is based on the problem we are trying to solve. My colleagues often joke that this is just another example of "It Depends". Because it really does, the only wrong answers to this question are based on fixed numbers or percentages. And I'll share exactly why. Why write tests at all? First we need to start with talking about why we should write tests at all. Normally I encourage my teams to write tests like they write automation: Is the code frequently changing recently? Do you want to prevent a regression? Is the code critical to the success of the application, where a failure would spell huge problems? Or is there a business reason the code is the way it is (in other words, a unit test is better than a code comment)? How many tests this comes out to be is a result of the domain and those criteria, but my perspective is that the exact percentage doesn't matter: as long as those things above are tested and nothing else is, then I'm good. The caveat is libraries, which can have much heavier component testing because of the widespread usage, and that's because of Hyrum's Law: we know all the aspects of the library will become used. A deeper look: Let's review those different areas where we should write tests. Frequently changing code: Writing tests in the wrong spot is purely a waste, but when you have an area where code changes frequently, this is one of the places to put tests. It might sound counterintuitive, since adding tests will slow down development. But since this area changes so much, bugs here will happen with higher probability and frequency. The reason this happens is because there are usually N bugs per so many lines of code, and with every additional line there is risk of a new bug. Add a test. Preventing regressions: At some point you'll change something that has been working before. It rightfully doesn't have any tests, because it wasn't that important when it was written and it's mostly straightforward. However, you are adding something new and really don't want to break what was there. This is the agile test. Since we didn't need a test before, we didn't write one. But now that we are changing the code, this is the perfect time. We'll add the test based on the existing code to prevent additional changes from having an adverse impact. Add a test. High impact code: Does this code being wrong spell disaster for the company? Is it more than k if we get this wrong? If the answer is yes, then there should be a test. "User can't log in" isn't k, and often even "User can't click buy" isn't even k, since those users come back. But a bug that automatically deletes your whole database because it doesn't have a where clause, now that's bad. Add a test. Business logic justifications: Comments are the worst. And the worst kinds of comments are like "The sum of a and b" above "sum = a + b". Completely worthless. Good comments explain the why. Here's a good example: "We use quick sort because we expect this to be the fastest based on our expectations of what the data looks like; the data is expected to look like this", above "result = quicksort(array)". But that's not great, and it makes a lot of this confusing, and worse if we had some other requirement like "IMPORTANT, DO NOT CHANGE: we only allow alphanumeric because we use this in XYZ other process", above a string.replace call with a character-class regex. This is the perfect time to write a unit test and express that dependency as the test description. That way we can be sure no one will accidentally change or delete the comment. Add a test. Types of tests: Now that we know which areas of our code we should test, we can start to think about how to actually test these things. You'll notice all the examples above lend themselves to be unit tests. And that's because almost everything should be a unit test. How do I know that? I look to the test pyramid, which points to a majority of the tests being unit tests. Above unit tests we have, in ascending order: component/service level; integration, also known as production tests; and exploratory manual tests. If we think we need a service level test on our endpoints, then I would expect an exponential number of tests at the unit level; same goes for integration. Each level up the pyramid should have far fewer tests than the unit level below it. And the only time you want exploratory manual tests is during pull requests where new functionality is being added. Your engineering team, during the PR, should be diving into the automatically built PR ephemeral environment and trying it out to see if it breaks. So shouldn't everything be tested? You'll notice that nowhere in the pyramid are E2E (End to End) tests. That's because in any real technology deployment it's impossible to test anything end to end, and we actually don't need to. That's because our end to end isn't run synchronously, so we don't need synchronous tests to validate that. Further, most of this will happen anyway when our users use our technology. We'll see what is and isn't working and we can fix it on the spot. Does Data IN and Data OUT both work? And do all the unit tests for how that data is loaded and saved work? Then the process is isomorphic and will work; we don't need a test for IN and OUT. Also, the easiest answer is just "Definitely not". At most companies it makes sense to have a couple, that means validations that test the most important value, that thing that must always work or else the company will go under. But for everything else, the test costs more to write and maintain than it does to fix the problem. With great disdain, I'll reference one company that found being on one extreme to work for them. So how many tests do we really need? The answer is the tests that make up those things we need to test. For one service the coverage may need to be one value, and for another component it might be a different one. But arbitrarily setting a value is irresponsible. And we start to develop antipatterns. Testing antipatterns: There are three fundamental antipatterns that exist with test coverage. The first one is "we must have X test coverage". Let's dive into that. It's so bad because it falls prey to "we get what we measure". If we are forced to write tests, then we will write the simplest and easiest tests. And that means we are neither ensuring that the tests are correct nor that the tests are the ones we should have. And if it was easy to write the tests in the right spot, then we would have from the start. So clearly the tests we need are the ones we don't want to write. This causes all the wrong tests to be written, and thus not only is this waste according to lean, but it also doesn't give us any value. The biggest example of the tests that make no sense is testing user login. We never need to test user login, because if it doesn't work we can know immediately, since every change an engineer makes needs login to work. Further, we have monitoring up and running, so that if login doesn't work we'll know. Also, let's take a look at login. Your team didn't write login; your company didn't even write it. You used a third party product to handle your login needs, be it Auth0 in the B2C space or Authress, for example, in the B2B space. Do not test software from another company. They are already testing their software all the time, and if you don't trust them to not break it, you need to find a different provider. The second antipattern is "the deletion of code causes the test coverage to go down". I love this one. You might think of a rule like "if the test coverage goes down, block the pull request". But let's say you have a problem with ten lines of code and part of that code is tested. If you remove a line of code that is not tested, your coverage goes up, BUT if you remove a line of code that was tested, then your test coverage goes down. That means as an engineer you aren't allowed to remove dead code, code that does the wrong thing, or just fix something and do it more effectively, because you wrote a rule that doesn't make sense. The last antipattern is "production is never broken". If production never breaks, then we have too many tests. Full stop. The goal is to have tests that prevent production failures, and we don't want tests that will never prevent prod failures. But how do we know which those are? "But still, I can't see why prod not breaking is not something we should aim for." It's simple, actually: if production never breaks, we have too many tests. That means we could waste less time writing tests and spend more time delivering value. If you never see a problem, then it is too far away. And as mindful testers, we should be focusing on preventing real problems, not imaginary ones. Preventing production problems that will never happen is a waste of everyone's time. This brings us back to the original point of adding tests where we know we need to. But it isn't always so easy to know where those things above are. Thankfully, we can use the DORA metrics to help us. And specifically: How many prod failures do we have today? This is called the Change Failure Rate (CFR). What's the Mean Time To Resolution (MTTR)? Those are two of the four DORA metrics, and if we don't know the answers to those, then we also don't know how many tests we should have. If you aren't tracking how many production problems you are getting and how long it takes to fix them, then adding metrics for testing and creating arbitrary tests is the wrong solution. You simply are spraying tests everywhere, hoping to hit something. Don't add tests randomly. Another way of looking at this is: I want to see prod breaking in ways that don't matter, but I don't want it to break in ways that do. If production isn't breaking at all, you are violating this, and of course if it is breaking in ways that do matter, then you need to add more tests. What is defensive programming? In the guise of testing, we often forget about logging, monitoring, and, more importantly, how to write better code. The former I've talked about at length, so I'm not going to go into it here other than to say: if we don't know when we have a problem, and also details about what that problem is, then we have no idea what the fix should be. The latter is defensive programming. If your code could break something, no amount of unit testing, service level testing, production testing, or exploratory testing is going to find it unless you are really, really good. So in many places you can throw an extra try/catch around your code, or execute the current code and new code in parallel and then compare the results in memory before returning the result. It doesn't matter in these situations if your code is tested, because this is a much easier, faster, more reliable, and safer way to be correct. If you can write the simple test, write the test, but that doesn't mean you shouldn't also prevent bugs in production using non-testing-based strategies. That's why we have PRs, after all. Doing the followup analysis: Some teams do RCA, or Root Cause Analysis, on problems that happen in production. RCAs are great, but the only wrong answer is "more tests". It's the wrong answer because the only place where you will have problems is where you didn't test correctly. So coming up with this as the answer is almost always wrong. Instead, we need to look at the long term solution for each problem. For instance, how did we break login? Is it a component that has an issue, is it something bespoke that we did custom, or a change we don't understand? The fix isn't a test; it's potentially to stop doing a custom thing and do it in a standard way. Or maybe we need education. Adding a test should only be done if we are doing the right long term thing, education is there, but it falls into one of the above risk categories. Conclusion: Adding unit tests to all our code creates a burden for future development. So we need to trade off extra burden for extra value. Arbitrarily adding tests to meet a test coverage target always results in the wrong tests. Even the simple thing like "test that user can log in" is a way more complex discussion than it seems. Is it one user, ten users, or every user? How about users that are currently logged in, is it a problem for them? Do we have any ongoing expected user activities, or a possibly unexpected spike in user activity that will happen? The simple thing, "let's test user login", isn't straightforward. If no one is logging in and it only affects a couple of users, then that's not important; instead we can be reactive. Further, we need to understand when to be reactive versus being proactive. Testing is proactive: find problems before they happen. We know via the Pareto Principle it would take an infinite amount of time to prevent all bugs, which means we have to let some through. We don't even have a large finite amount of time, let alone infinite. So don't test everywhere, test only in some places. The highest value places. Those we can be proactive about, but everywhere else we should optimize for being reactive. The truth is that we likely don't need anywhere close to the number of tests that you are collectively running today. I'm going to say anecdotally that some modest amount of unit test coverage, a handful of service tests per service, and very few production tests per team on average is the right amount. Clever tests in the right spot are worth so much more than an arbitrary percentage. "Should this thing have automated testing?" is a conversation; it definitely can't be "we have some arbitrary metric to hit". Like, we should never say "we must have a fixed percentage of our code unit tested"; besides that being a ridiculously high amount, it's actually detrimental to have more than a few tests, since every test we add is a burden on building new things. You want tests where your risks are. Risks that will end the company. Cause revenue loss. An easy way for me to look at this is: you aren't at a scale that's appropriate for having more than one or two E2E tests at the whole company. I'd rather see us move faster and break some things. If production never breaks, then we have too many tests. 2023-02-02 15:01:06
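The second antipattern in the "You are probably testing wrong" entry (deleting code can push coverage down) comes down to simple arithmetic. The sketch below is only an illustration: the percentages from the original article did not survive in this digest, so the starting point of 10 lines with 5 tested is an assumed value, and `coverage` is a hypothetical helper name rather than anything from the article or a real coverage tool.

```python
from fractions import Fraction


def coverage(tested_lines: int, total_lines: int) -> Fraction:
    """Return line coverage as an exact fraction: tested lines over all lines."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    if not 0 <= tested_lines <= total_lines:
        raise ValueError("tested_lines must be between 0 and total_lines")
    return Fraction(tested_lines, total_lines)


# Assumed starting point (not from the article): 10 lines, 5 covered by tests.
before = coverage(5, 10)

# Delete one untested line, for example dead code: 5 tested out of 9.
after_removing_untested = coverage(5, 9)

# Delete one tested line, for example a simplification: 4 tested out of 9.
after_removing_tested = coverage(4, 9)

for label, value in [
    ("before", before),
    ("removed untested line", after_removing_untested),
    ("removed tested line", after_removing_tested),
]:
    print(f"{label}: {float(value):.1%}")
```

Under these assumed numbers, a rule that blocks any pull request whose coverage drops would accept the first deletion and reject the second, even though both simply remove one line.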
Overseas Tech DEV Community Who likes stickers more? Software developers/computing professionals or five-year-olds? And why? https://dev.to/cicirello/who-likes-stickers-more-software-developerscomputing-professionals-or-five-year-olds-and-why-4a9k Who likes stickers more? Software developers/computing professionals or five-year-olds? And why? Over the past few weeks, I've received laptop stickers from multiple sources. First came a batch of stickers along with the t-shirt for participating in Hacktoberfest. Then there were a couple in with my membership to FSF. And most recently, a bunch of stickers from DEV as a thank you for volunteering as a Tag Moderator/Trusted User (thank you, DEV, for the thank-you stickers). All of these stickers made me wonder: which group of people collectively like stickers more? Software developers/computing professionals or five-year-olds? Little kids like stickers so much that it is not uncommon for parents and teachers to use sticker charts as a form of operant conditioning to shape their behavior. Are the laptops of computing professionals just expensive sticker charts? Or are computing professionals somehow more likely to find childlike enjoyment out of simple things like stickers than those in other fields? Go to the comments section to discuss. Why do you like stickers? Why do you think software developers collectively like stickers? Feel free to also include pictures of your stickers, such as on your laptops, etc. Anyway, here are a few photos of just the stickers (I don't actually have any on my current laptop). From DEV (thanks, DEV!). From FSF. From Hacktoberfest. 2023-02-02 15:00:53
Apple AppleInsider - Frontpage News Daily Deals Feb. 2: AirPods for $99, $200 off MacBook Air, iPad Air 5 $499 & more https://appleinsider.com/articles/23/02/02/daily-deals-feb-2-airpods-for-99-200-off-macbook-air-ipad-air-5-499-more?utm_medium=rss Daily Deals Feb. 2: AirPods for $99, $200 off MacBook Air, iPad Air 5 $499 & more. Some of today's best deals include discounts on a Google Hello video doorbell, a Samsung Frame smart TV, JBL Tune in-ear headphones, a MacBook Pro, and more. Save on a Google Hello Video Doorbell. The AppleInsider team searches for unbeatable deals at online retailers to curate a list of can't-miss deals on the top tech products, including discounts on Apple products, TVs, accessories, and other gadgets. We share the best deals in our Daily Deals list to help you get the most bang for your buck. 2023-02-02 15:29:45
Overseas Tech Engadget 'Star Wars: Visions' Volume 2 debuts May 4th with an Aardman short https://www.engadget.com/star-wars-visions-volume-2-release-data-aardman-154149439.html?src=rss 'Star Wars: Visions' Volume 2 debuts May 4th with an Aardman short. Star Wars: Visions is returning for a second season, this time with a more international scope, including a studio you might not have expected. Disney has announced that Star Wars: Visions Volume 2 will premiere May 4th (aka Star Wars Day) with shorts from nine countries, including one from UK stop-motion legend Aardman. Details of the project, "I Am Your Mother," aren't available, but it's directed by Wallace & Gromit veteran Magdalena Osinska. Other titles come from Pictures (India), Cartoon Saloon (Ireland), D'art Shtajio (Japan), El Guiri (Spain), Punkrobot (Chile), Studio La Cachette (France), Studio Mir (South Korea) and Triggerfish (South Africa). Some of the creators have illustrious credentials. El Guiri's Rodrigo Blaas is a Pixar alumnus, for example, while Triggerfish has worked on BBC titles like The Highway Rat and Stick Man. The first Visions focused on Japanese anime studios' approach to the Star Wars universe, including well-known names like Production I.G (Ghost in the Shell: Stand Alone Complex) and Trigger (Kill la Kill). Creators were given more creative freedom than those producing canonical movies and TV shows: they were free to not only pursue different art styles and themes, but to break continuity with the official storyline. That's likely to continue with Volume 2, as series executive producer James Waugh says the anthology is about "celebratory expressions" of Star Wars that open "bold new ways" of telling stories in the space fantasy setting. The Visions release date bolsters an increasingly packed Star Wars release schedule at Disney. It starts with The Mandalorian season three on March 1st, but will also include Young Jedi Adventures (spring), Ahsoka and Skeleton Crew. You'll have plenty to watch, then, even if the animated shorts aren't to your liking. 2023-02-02 15:41:49
Overseas Tech Engadget Microsoft rolls out Teams Premium with OpenAI-powered features https://www.engadget.com/microsoft-teams-premium-openai-artificial-intelligence-152658424.html?src=rss Microsoft rolls out Teams Premium with OpenAI-powered features. Fresh off the heels of news that Microsoft is making a multibillion-dollar investment into OpenAI, it's integrating the company's tech into more of its products and services. Microsoft has announced that Teams Premium is now broadly available. The service features large language models powered by OpenAI's GPT, along with other tech geared toward making meetings more "intelligent, personalized and protected." Microsoft says Teams Premium offers AI-generated chapters in PowerPoint Live and "personalized timeline markers" for when you leave and join a meeting. Live translations in captions are currently available too. In the coming months, Teams Premium will be able to automatically generate meeting notes with the help of GPT. Users will have access to AI-generated task and action item suggestions as well. Microsoft will likely expand the Teams Premium AI features over time. The company previously introduced the Azure OpenAI Service for developers, along with a tool to help beginners build their own apps and a graphic design app that are both powered by OpenAI tech. Word on the street is that Microsoft is building ChatGPT, OpenAI's astonishingly popular chatbot, into Bing. Google is said to be working on an AI chatbot for Search too. 2023-02-02 15:26:58
Overseas Science NYT > Science A Proud Ship Turned Into a Giant Recycling Problem. Brazil Plans to Sink It. https://www.nytimes.com/2023/02/02/climate/brazil-aircraft-carrier-asbestos.html A Proud Ship Turned Into a Giant Recycling Problem. Brazil Plans to Sink It. The old aircraft carrier, once the navy's flagship, is packed with asbestos. No country, including Brazil, will let it dock to be dismantled. 2023-02-02 15:20:36
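Finance RSS FILE - Japan Securities Dealers Association PSJ forecast statistics https://www.jsda.or.jp/shiryoshitsu/toukei/psj/psj_toukei.html statistics 2023-02-02 16:00:00
Finance RSS FILE - Japan Securities Dealers Association Status of stock lending and borrowing transactions (weekly) https://www.jsda.or.jp/shiryoshitsu/toukei/kabu-taiw/index.html lending and borrowing 2023-02-02 15:30:00
Finance Financial Services Agency website The 4th meeting of the "Study Group on Efforts by Financial Institutions and Others Toward Decarbonization" will be held. https://www.fsa.go.jp/news/r4/singi/20230202.html financial institutions 2023-02-02 17:00:00
Finance Financial Services Agency website Summary of the post-Cabinet-meeting press conference by Suzuki, Minister of Finance and Minister of State for Special Missions (January 31, 2023), has been posted. https://www.fsa.go.jp/common/conference/minister/2023a/20230131-1.html Minister of State for Special Missions 2023-02-02 16:15:00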
News BBC News - Home Image of witness released in missing mum search https://www.bbc.co.uk/news/uk-england-lancashire-64501150?at_medium=RSS&at_campaign=KARANGA bulley 2023-02-02 15:30:42
News BBC News - Home UK to see shorter recession, says Bank of England https://www.bbc.co.uk/news/business-64487179?at_medium=RSS&at_campaign=KARANGA englandthe 2023-02-02 15:24:36
News BBC News - Home Omagh bombing: UK government announces independent statutory inquiry https://www.bbc.co.uk/news/uk-northern-ireland-64495873?at_medium=RSS&at_campaign=KARANGA investigation 2023-02-02 15:34:05
News BBC News - Home British Gas admits agents break into struggling homes https://www.bbc.co.uk/news/business-64491243?at_medium=RSS&at_campaign=KARANGA meters 2023-02-02 15:01:23
News BBC News - Home Lucy Letby: Nurse sent card to grieving parents, jury told https://www.bbc.co.uk/news/uk-england-merseyside-64496406?at_medium=RSS&at_campaign=KARANGA hears 2023-02-02 15:10:57
News BBC News - Home Six Nations 2023: Ollie Hassell-Collins to make England debut against Scotland https://www.bbc.co.uk/sport/rugby-union/64500383?at_medium=RSS&at_campaign=KARANGA twickenham 2023-02-02 15:02:06
GCP Cloud Blog Advancing cancer research with public imaging datasets from the National Cancer Institute Imaging Data Commons https://cloud.google.com/blog/topics/developers-practitioners/advancing-cancer-research-public-imaging-datasets-national-cancer-institute-imaging-data-commons/ Advancing cancer research with public imaging datasets from the National Cancer Institute Imaging Data CommonsMedical imaging offers remarkable opportunities in research for advancing our understanding of cancer discovering new non invasive methods for its detection and improving overall patient care Advancements in artificial intelligence AI in particular have been key in unlocking our ability to use this imaging data as part of cancer research Development of AI powered research approaches however requires access to large quantities of high quality imaging data  Sample images from NCI Imaging Data Commons  Left Magnetic Resonance Imaging MRI of the prostate credit along with the annotations of the prostate gland and substructures  Right highly multiplexed fluorescence tissue imaging of melanoma credit The US National Cancer Institute NCI has long prioritized collection curation and dissemination of comprehensive publicly available cancer imaging datasets Initiatives like The Cancer Genome Atlas TCGA and Human Tumor Atlas Network HTAN to name a few work to make robust standardized datasets easily accessible to anyone interested in contributing their expertise students learning the basics of AI engineers developing commercial AI products researchers developing innovative proposals for image analysis and of course the funders evaluating those proposals Even so there continue to be challenges that complicate sharing and analysis of imaging data Data is spread across a variety of repositories which means replicating data to bring it together or within reach of tooling such as cloud based resources Images are often stored in vendor specific or specialized research formats which complicates 
analysis workflows and increases maintenance costs. Lack of a common data model or tooling makes capabilities such as search, visualization, and analysis of data difficult and repository- or dataset-specific. Achieving reproducibility of the analysis workflows, a critical function in research, is challenging and often lacking in practice. Introducing Imaging Data Commons. To address these issues, as part of the Cancer Research Data Commons (CRDC) initiative that establishes the national cancer research ecosystem, NCI launched the Imaging Data Commons (IDC), a cloud-based repository of publicly available cancer imaging data with several key advantages. Colocation: Image files are curated into Google Cloud Storage buckets, side by side with on-demand computational resources and cloud-based tools, making the data easier and faster for you to access and analyze. Format: Images, annotations, and analysis results are harmonized into the standard DICOM (Digital Imaging and Communications in Medicine) format to improve interoperability with tools and support uniform processing pipelines. Tooling: IDC maintains tools that, without having to download anything, allow you to explore and search the data and visualize images and annotations. You can easily access IDC data from the cloud-based tools available in Google Cloud, such as Vertex AI and Colab, or deploy your own tools in highly configurable virtual environments. Reproducibility: Sharing reproducible analysis workflows is streamlined through maintaining persistent, versioned data that you can use to precisely define cohorts used to train or validate algorithms, which in turn can be deployed in virtual environments that can provide consistent software and hardware configuration. IDC ingests and harmonizes de-identified data from a growing list of repositories and initiatives, spanning a broad range of image types and scales, cancer types, and manufacturers. A significant portion of these images are accompanied by annotations and clinical data. For a quick summary of what is
available in IDC, check the IDC Portal or this Looker Studio dashboard. Exploring the IDC data. IDC Portal. A great place to start exploring the data is the IDC Portal. From this in-browser portal, you can use some of the key metadata attributes to navigate the images and visualize them. [Figure: navigating the IDC portal to view dataset images.] As an example, here are the steps you can follow to find slide microscopy images for patients with lung cancer. From the IDC Portal, proceed to "Explore images". In the top right portion of the exploration screen, use the summary pie chart to select Chest as the primary site (you could alternatively select Lung, noting that annotation of cancer location can use different terms). In the same pie chart summary section, navigate to Modality and select Slide Microscopy. In the right-hand panel, scroll to the Collections section, which will now list all collections containing relevant images. Select one or more collections using the checkboxes. Navigate to the Selected Cases section just below, where you will find a list of patients within the selected collections that meet the search criteria. Next, select a given patient using the checkbox. Navigating to the Selected Studies section just below will now show the list of studies (think of these as specific imaging exams) available for this patient. Click the "eye" icon on the far right, which will open the viewer, allowing you to see the images themselves. BigQuery Public Dataset. When it's time to search and select the subsets (or cohorts) of the data that you need to support your analysis more precisely, you'll head to the public dataset in BigQuery. This dataset contains the comprehensive set of metadata available for the IDC images (beyond the subset contained in the IDC portal), which you can use to precisely define your target data subset with a custom standard SQL query. You can run these queries from the in-browser BigQuery Console by creating a BigQuery sandbox. The BigQuery sandbox enables you to query data within the limits of
the Google Cloud free tier without needing a credit card. If you decide to enable billing and go above the free tier threshold, you are subject to regular BigQuery pricing. However, we expect most researchers' needs will fit within this tier. To get started with an exploratory query, you can select studies corresponding to the same criteria you just used in your exploration of the IDC Portal: SELECT DISTINCT StudyInstanceUID FROM `bigquery-public-data.idc_current.dicom_all` WHERE tcia_tumorLocation = "Chest" AND Modality = "SM" Alright, now you're ready to write a query that creates precisely defined cohorts. This time we'll shift from exploring digital pathology images to subsetting Computed Tomography (CT) scans that meet certain criteria. The following query selects all files, identified by their unique storage path in the gcs_url column, corresponding to CT series that have SliceThickness between … and … mm. It also builds a URL in series_viewer_url that you can follow to visualize the series in the IDC Portal viewer. For the sake of this example, the results are limited to only one series: SELECT collection_id, PatientID, SeriesDescription, SliceThickness, gcs_url, CONCAT("…", StudyInstanceUID, "?seriesInstanceUID=", SeriesInstanceUID) AS series_viewer_url FROM `bigquery-public-data.idc_current.dicom_all` WHERE SeriesInstanceUID IN ( SELECT SeriesInstanceUID FROM `bigquery-public-data.idc_current.dicom_all` WHERE Modality = "CT" AND SAFE_CAST(SliceThickness AS FLOAT64) > … AND SAFE_CAST(SliceThickness AS FLOAT64) < … LIMIT 1 ) As you start to write more complex queries, it will be important to familiarize yourself with the DICOM format and how it is connected with the IDC dataset.
This getting started tutorial is a great place to start learning more. What can you do with the results of these queries? For example: You can build the URL to open the IDC Portal viewer and examine individual studies, as we demonstrated in the second query above. You can learn more about the patients and studies that meet these search criteria by exploring what annotations or clinical data are available to accompany these images. The getting started tutorial provides several example queries along these lines. You can link DICOM metadata describing imaging collections with related clinical information, which is linked when available. This notebook can help in navigating clinical data available for IDC collections. Finally, you can download all images contained in the resulting studies. Thanks to the support of the Google Cloud Public Dataset Program, you are able to download IDC image files from Cloud Storage without cost. Integrating with other Cloud tools. There are several Cloud tools we want to mention that can help in your explorations of the IDC data. Colab: Colab is a hosted Jupyter notebook solution that allows you to write and share notebooks that combine text and code, download images from IDC, and execute the code in the cloud with a free virtual machine. You can expand beyond the free tier to use custom VMs or GPUs while still controlling costs with fixed monthly pricing plans. Notebooks can easily be shared with colleagues, such as readers of your academic manuscript. Check out these example Colab notebooks to help you get started. Vertex AI: Vertex AI is a platform to handle all the steps of the ML workflow. Again, it includes managed Jupyter notebooks, but with more control over the environment and hardware you use. As part of Google Cloud, it also comes with enterprise-grade security, which may be important to your use case, especially if you are joining in your own proprietary data. Its Experiments functionality allows you to automatically track architectures, hyperparameters, and training
environments to help you discover the optimal ML model faster. Looker Studio: Looker Studio is a platform for developing and sharing custom interactive dashboards. You can create dashboards that focus on a specific subset of the metadata accompanying the images and cater to users who prefer an interactive interface over SQL queries. As an example, this dashboard provides a summary of IDC data, and this dashboard focuses on the preclinical datasets within the IDC. Cloud Healthcare API: IDC relies on Cloud Healthcare API to extract and manage DICOM metadata with BigQuery, and to maintain DICOM stores that make IDC data available via the standard DICOMweb interface. IDC users can utilize these tools to store and provide access to the artifacts resulting from the analysis of IDC images. As an example, a DICOM store can be populated with the results of image segmentation, which could be visualized using a user-deployed, Firebase-hosted instance of OHIF Viewer (deployment instructions are available here). Next Steps. The IDC dataset is a powerful tool for accelerating data-driven research and scientific discovery in cancer prevention, treatment, and diagnosis. We encourage researchers, engineers, and students alike to get started by following the onboarding steps we laid out in this post: familiarize yourself with the data by heading to the IDC portal, tailor your cohorts using the BigQuery public dataset, and then download the images to analyze with your on-prem tools, with Google Cloud services, or with Colab. The "Getting started with the IDC" notebook series should help you get familiar with the resource. For questions, you can reach the IDC team at support@canceridc.dev or join the IDC community and post your questions. Also see the IDC user guide for more details, including official documentation. 2023-02-02 17:00:00
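The cohort query described in the entry above can be sketched in Python as follows. This is a minimal illustration, not code from the original post: the function names `build_ct_cohort_query` and `split_gcs_url` are hypothetical, the slice-thickness bounds are caller-supplied because the values did not survive in the feed text, the bounds are treated as inclusive by assumption, and `gcs_url` values are assumed to have the usual `gs://bucket/object` form. Only the table name and column names come from the post.

```python
from typing import Tuple
from urllib.parse import urlparse

# Public table named in the post; everything else below is illustrative.
IDC_TABLE = "bigquery-public-data.idc_current.dicom_all"


def build_ct_cohort_query(min_mm: float, max_mm: float, limit: int = 1) -> str:
    """Build SQL selecting files of CT series within a slice-thickness range.

    The bounds are hypothetical inputs (inclusive by assumption); the post's
    own values were lost in extraction.
    """
    if min_mm > max_mm:
        raise ValueError("min_mm must not exceed max_mm")
    return (
        "SELECT collection_id, PatientID, SeriesDescription, "
        "SliceThickness, gcs_url\n"
        f"FROM `{IDC_TABLE}`\n"
        "WHERE SeriesInstanceUID IN (\n"
        "  SELECT SeriesInstanceUID\n"
        f"  FROM `{IDC_TABLE}`\n"
        "  WHERE Modality = 'CT'\n"
        f"    AND SAFE_CAST(SliceThickness AS FLOAT64) >= {float(min_mm)}\n"
        f"    AND SAFE_CAST(SliceThickness AS FLOAT64) <= {float(max_mm)}\n"
        f"  LIMIT {int(limit)})"
    )


def split_gcs_url(gcs_url: str) -> Tuple[str, str]:
    """Split a gs://bucket/object URL into (bucket, object_name)."""
    parsed = urlparse(gcs_url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URL: {gcs_url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    # Example bounds chosen arbitrarily for demonstration.
    print(build_ct_cohort_query(1.0, 2.5))
    print(split_gcs_url("gs://example-bucket/series/file.dcm"))
```

The resulting SQL string could be pasted into the BigQuery Console, and the bucket and object names could be passed to whichever Cloud Storage client you prefer for the download step.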
