Posted: 2020-06-27 05:19:56. RSS feed digest for 2020-06-27 05:00 (29 items)

Category	Site name	Article title / trend word	Link URL	Frequent words / summary / search volume	Registered at
AWS	AWS Machine Learning Blog	Developing NER models with Amazon SageMaker Ground Truth and Amazon Comprehend	https://aws.amazon.com/blogs/machine-learning/developing-ner-models-with-amazon-sagemaker-ground-truth-and-amazon-comprehend/	Named entity recognition (NER) involves sifting through text data to locate noun phrases called named entities and categorizing each with a label, such as person, organization, or brand. For example, in the statement “I recently subscribed to Amazon Prime,” Amazon Prime is the named entity and can be categorized as a brand. Building an accurate …	2020-06-26 19:19:35
AWS	AWS	AWS Builder Stories	https://www.youtube.com/watch?v=qW4mV4Epkvw	The unique stories and backgrounds of our Builders here at AWS bring inspiration to us every day. They contribute their knowledge and experiences to help us navigate our way through uncharted territory at work and at home.	2020-06-26 19:17:00
AWS	AWS - Webinar Channel	Modernize and Simplify Data Archiving with AWS Storage - AWS Online Tech Talks	https://www.youtube.com/watch?v=wR4QaTnM_UM	AWS offers a complete set of cloud storage services for data archiving. Customers can choose Amazon S3 Glacier and S3 Glacier Deep Archive for affordable, less time-sensitive cloud storage, or Amazon S3 for faster cloud storage, depending on their retention and retrieval needs. With AWS Storage Gateway and our solution provider ecosystem, customers can build a comprehensive storage solution for active archiving or digital asset preservation. Join us to learn best practices and architectures for long-term data retention. Learning objectives: learn why AWS is the best solution for archival use cases; learn best practices of archiving data; learn AWS Storage solutions for long-term data retention.	2020-06-26 19:32:21
Program	New questions for [All tags] | teratail	C++ object orientation: values between classes	https://teratail.com/questions/273202?rss=all	How should I write it, in an object-oriented way, when I want to use a value of class A in a method of class B? Specifically, I created a window class and a button class with winapi.	2020-06-27 04:42:14
Overseas Tech	Ars Technica	Amazon pays $1.2 billion for self-driving startup Zoox	https://arstechnica.com/?p=1687685	service	2020-06-26 19:35:13
Apple	AppleInsider - Frontpage News	Apple uses WWDC to launch assaults on Google strongholds	https://appleinsider.com/articles/20/06/26/at-wwdc-2020-apple-launched-new-assaults-on-google-strongholds	Apple's upcoming stable of software updates will introduce a fiery onslaught of new features designed to lure consumers away from some of Google's legacy businesses and services.	2020-06-26 19:28:14
Overseas Tech	Engadget	NASA made a necklace that reminds you not to touch your face	https://www.engadget.com/nasa-coronavirus-covid-19-necklace-jpl-stop-touching-your-face-194312229.html	NASA has released open-source instructions for a 3D-printed necklace designed to help you stop touching your face. We've heard time and time again that we shouldn't touch our mush with our fingers to limit our chances of contracting COVID-19. However …	2020-06-26 19:43:12
Cisco	Cisco Blog	Back to the Office or Not? Why it Matters and Why it Shouldn’t	https://blogs.cisco.com/collaboration/back-to-the-office-or-not-why-it-matters-and-why-it-shouldnt	The beauty of Webex Rooms is that you can seamlessly work from anywhere, back to the office or not.	2020-06-26 19:35:45
Overseas Tech	CodeProject Latest Articles	Run Linux in Microsoft Windows in a VirtualBox - 2017	https://www.codeproject.com/Articles/1196557/Run-Linux-in-Microsoft-Windows-in-a-VirtualBox	miscellaneous	2020-06-26 19:46:00
Overseas Science	NYT > Science	Coronavirus Live News and Updates	https://www.nytimes.com/2020/06/26/world/coronavirus-live-updates.html	Trump's coronavirus task force gave its first briefing in nearly two months as new daily cases in Florida shot past … The W.H.O. said it needs … billion to speed production of a vaccine.	2020-06-26 19:56:27
Overseas News	Japan Times latest articles	Shareholders meetings peak in Japan as pandemic forces changes	https://www.japantimes.co.jp/news/2020/06/26/business/corporate-business/shareholders-meetings-peak-japan-pandemic/	Some of the companies held virtual meetings that helped investors participate from afar, while others scaled down the annual events to prevent infections.	2020-06-27 05:53:04
Overseas News	Japan Times latest articles	Filipino phenom Thirdy Ravena leaves comfort zone to join NeoPhoenix in Aichi	https://www.japantimes.co.jp/sports/2020/06/26/basketball/b-league/filipino-phenom-thirdy-ravena-leaves-comfort-zone-to-join-neophoenix-in-shizuoka/	The …-year-old shooting guard is the first to join the B. League under a new rule encouraging the use of other Asian players.	2020-06-27 04:01:14
Overseas News	Japan Times latest articles	Japan’s structural problems exposed by COVID-19 crisis	https://www.japantimes.co.jp/opinion/2020/06/26/commentary/japan-commentary/japans-structural-problems-exposed-covid-19-crisis/	A crisis exposes the contradiction inherent in a society in various forms. That is an important lesson from history. The confusion caused by the COVID-19 …	2020-06-27 04:05:07
News	BBC News - Home	UN chief 'shocked and disturbed' by video of car sex act in Israel	https://www.bbc.co.uk/news/53191444	israel	2020-06-26 19:37:16
News	BBC News - Home	Liverpool: Jurgen Klopp 'absolutely overwhelmed' by Premier League title win	https://www.bbc.co.uk/sport/av/football/53200874	Liverpool manager Jurgen Klopp tells BBC sports editor Dan Roan that he was completely overwhelmed when his side were confirmed as Premier League champions.	2020-06-26 19:41:12
News	BBC News - Home	Battle of the Brits: Kyle Edmund continues fine form with win over Broady - best shots	https://www.bbc.co.uk/sport/av/tennis/53200998	brits	2020-06-26 19:38:41
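Business	Diamond Online - New Articles	The "courage to pause" that Momoka Ariyasu wants to convey now - I Almost Lived Giving It My All	https://diamond.jp/articles/-/239185	giving it one's all	2020-06-27 04:55:00
Business	Diamond Online - New Articles	The ultimate stretch that quickly relieves a stiff neck - Stiffness-Relieving Stretches That Remove All the Fatigue of Desk Work	https://diamond.jp/articles/-/241584	"Muscles stiffen if you do not move them. …	2020-06-27 04:50:00
Business	Diamond Online - New Articles	[Toshiharu Kokubun × Seiji Yamashita] What should I do to earn 100 million yen a year? - Habits of People Who Earn 100 Million Yen a Year	https://diamond.jp/articles/-/239820		2020-06-27 04:45:00
Business	Diamond Online - New Articles	Keio's junior high school entrance exam asks the price of "one bag of bean sprouts" - Junior High Entrance Exams: The University-Affiliated School Admission Bible	https://diamond.jp/articles/-/239748		2020-06-27 04:40:00
Business	Diamond Online - New Articles	The fate of people eager to show off their own "competence" - The Strategist's Way of Thinking	https://diamond.jp/articles/-/239960	Will you remain merely an "excellent subordinate," or be recognized as a "strategist"?	2020-06-27 04:35:00
Business	Diamond Online - New Articles	Why flowcharts in presentation materials are best laid out "left to right" - The Fastest PowerPoint Work Techniques	https://diamond.jp/articles/-/239915		2020-06-27 04:30:00
Business	Diamond Online - New Articles	[Condominium management] How should you go about reviewing management outsourcing fees? - How to Rethink Your Condominium Management	https://diamond.jp/articles/-/238258	Noise disputes between upstairs and downstairs neighbors, cleaning problems at garbage collection areas, flooding from heavy rain, defective construction, increases in management fees, damage from major earthquakes…	2020-06-27 04:25:00
Business	Diamond Online - New Articles	Can't you find a partner unless you lower your standards? - What Adults Have Always Wanted to Know to Live True to Themselves	https://diamond.jp/articles/-/241435	"Can I get married in my …s?" "What do I do about my aging parents?" "What if I lose my job in my …s?" "How do I stay beautiful?" "What are the benefits of being an adult?" Full of hints for the real-life worries of women in their …s, the things they want to ask but cannot. From the candid essay by an author in her …s that became hugely popular in France, learn how to live true to yourself without being bound by age.	2020-06-27 04:20:00
Business	Diamond Online - New Articles	"What did you draw here?": the "blind spot" of parents who cannot help asking their children - Art Thinking from Age 13	https://diamond.jp/articles/-/236573	Although it is the first book by an art teacher, "Art Thinking from Age 13" won high praise from opinion leaders in many fields and from the media, and became an unusual hit, selling more than … copies within … months of release.	2020-06-27 04:15:00
Business	Diamond Online - New Articles	Has the human brain not evolved for more than 10,000 years!? How should we view the emergence of President Trump historically? … Q&A from a lecture by Haruaki Deguchi, an intellectual giant of our time, Part 2 - A Complete History of Philosophy and Religion	https://diamond.jp/articles/-/238381		2020-06-27 04:10:00
Business	Diamond Online - New Articles	Why are men in their 50s who want to be popular starting Instagram one after another? - The Doctor-Acclaimed Way of Walking: The Slimming Three-Beat Walk	https://diamond.jp/articles/-/238442	Making the rankings one after another. The doctor-acclaimed way of walking, the slimming three-beat walk, which you can do at home, while watching TV, while commuting to work or school, and during work.	2020-06-27 04:05:00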
Azure	Azure updates	Azure Sphere OS quality update 20.06 is now available	https://azure.microsoft.com/ja-jp/updates/azure-sphere-os-quality-update-2006-is-now-available/	retail	2020-06-26 19:00:42
GCP	Cloud Blog	How the Google AI Community Used Cloud to Help Biomedical Researchers	https://cloud.google.com/blog/products/ai-machine-learning/google-ai-community-used-cloud-to-help-biomedical-researchers/	In response to the global pandemic, the White House and a coalition of research groups published the CORD-19 dataset on Kaggle, the world's largest online data science community. The goal, to further our understanding of coronaviruses and other diseases, caught the attention of many in the health policy, research, and medical community. The Kaggle challenge has received almost … million page views since it launched in mid-March, according to an article in Nature. The dataset, freely available to researchers and the general public, contains over … scholarly articles, thousands just on COVID-19, making it almost impossible to stay up to date on the latest literature. Furthermore, there are millions of medical publications with information that could enhance our scientific understanding of COVID-19 and other diseases. However, much of this literature is not readily consumable by machines and is difficult to digest and analyze using modern natural language processing tools. Enter the Google artificial intelligence (AI) community. External to the company, this is a group of data scientists known as Machine Learning Google Developer Experts (ML GDEs), a highly skilled community of AI practitioners from all over the world. With the support of Google Cloud credits and credits from the TensorFlow Research Cloud (TFRC), the ML GDEs began to tackle the problem of understanding the research literature. While not healthcare experts, they quickly realized they could help with the current crisis by applying their knowledge of big data and AI to the biomedical domain. The team came together in April under the audacious name of AI versus COVID-19 (aiscovid org) and established the objective of using state-of-the-art machine learning and cloud technologies to help biomedical researchers discover new insights faster from research literature. Designing the Dataset: The first step by the ML GDE team was to reach out to biomedical researchers to better understand their workflows, tools, challenges and, most importantly, what is relevant in medical literature. They found some common insights: an overwhelming amount of existing and new information; ambiguous and inconsistent sources of truth; limited information retrieval functionality in current tools; search based only on simple keywords; multiple scattered datasets; and an inability to understand the meaning of words in context. One of the pillars of the current AI revolution is the ability of these systems to become better as they analyze more data. Recent work (BERT, XLNet, T5, GPT) uses millions of documents to train state-of-the-art neural networks for NLP tasks. Based on these insights, they determined the best way to help the research community was to create a single dataset containing a very large corpus of papers, and then to make that dataset available in machine-usable formats. Inspired by the Open Access movement and initiatives such as the Chan Zuckerberg Initiative's Meta, they sought to find as many relevant and unique freely available publications as possible and collect them into one easily accessible dataset designed specifically to train AI systems. Introducing BREATHE: The Biomedical Research Extensive Archive To Help Everyone (BREATHE) is a large-scale biomedical database containing entries from top biomedical research repositories. The dataset contains titles, abstracts, and full body texts (when licensing permitted) for over … million biomedical articles published in English. They released the first version in June and expect to release new versions as the corpus of articles is constantly updated by their search crawlers. Collecting articles originally written in languages other than English is among the ideas for further improving the dataset and the domain-specific knowledge that it tries to capture. While there are several COVID-19-specific datasets, BREATHE differs in that it is: broad, containing many different sources; machine readable; publicly accessible and free to use; and hosted on a scalable, easy-to-analyze, cost-effective data warehouse, Google BigQuery. BREATHE Development Approach: The ML GDE team identified the top ten web archives, or sources, with potential material based on three main factors: amount of data, quality of data, and availability. These sources are listed in Table 1 (Medical Archives). Data Mining Approach & Tools: The development and automation of the article download workflow was significantly accelerated by using Google Cloud infrastructure. This system, internally called the “ingestion pipeline,” has the classical three stages: Extract, Transform, and Load (ETL). (Figure: Google Cloud Platform, BREATHE dataset creation.) Extract: For all the resources, the ML GDE team first verified the content licensing, making sure they were abiding by the source's terms of use, and then employed APIs and FTP servers when available. For the remaining resources, they adopted the ethical scraping philosophy to ingest the public data. To easily prototype the main logic of the scrapers, their interns used a Google Colaboratory notebook (Colab). Colab is a hosted Python Jupyter notebook that enables users to write and execute Python in the browser with no additional setup or configuration, and provides free, limited access to GPUs, making it an attractive tool for many machine learning practitioners. Colab also made it easy to share code among the interns and collaborators. The scrapers are written using Selenium, a suite of tools for automating web browsers, among which they chose Chromium in headless mode. Chromium is the open-source project on which the Google Chrome browser is based. All the raw data from the different sources is downloaded directly to their Google Cloud Storage bucket. Transform: The ML GDE team ingested over … million articles from ten different sources, each one with raw data formatted in CSV, JSON, or XML and its own unique schema. Their tool of choice to efficiently process this amount of data was Google Dataflow, a fully managed service for executing Apache Beam pipelines on Google Cloud. In the transform stage, the pipeline processes every single raw document, applying cleaning, normalization, and multiple heuristic rules to extract a final general schema formatted in JSONL. The heuristics applied include checks for null values, invalid strings, and duplicate entries. They also verified the consistency between fields with different names in different tables that represented the same entity. Documents going through these stages end up in three different sink buckets based on the status of the operation: Success, for documents correctly processed; Rejected, for documents that did not match one or more of their rules; and Error, for documents that the pipeline failed to process. Apache Beam let the team express logic that is not straightforward in an easy-to-read syntax (Snippet 1: Google Dataflow Processing Example). Google Dataflow makes it easy to scale this process across many Google Cloud compute instances without having to change any code. The pipeline was applied to the full raw data, distilling it to … million records for a total of … GB of JSONL text data. Load: Finally, the data was loaded into Google Cloud Storage buckets and Google BigQuery tables. BigQuery did not require the team to manage any infrastructure, nor does it need a database administrator, making it ideal for their project, which is composed mainly of data science experts. They iterated several times on the ingestion process as they scaled the number of total documents processed. In the initial stages of data exploration, data scientists were able to explore the contents of the data loaded into BigQuery by simply using standard Structured Query Language (SQL). One useful technique in this phase is to “sample” the dataset to discover non-conforming documents; for example, extracting … of the whole dataset takes only a simple query. For more advanced queries they used Google Colab and the BigQuery Python API, for example to count the number of lines in each table. Using this approach it was easy to calculate aggregate statistics about their dataset. If one considers all the abstracts in BREATHE, there are … billion total words and … million unique words. Using Python and Colab it was also easy to do some exploratory data analysis, such as plotting the word frequencies. Google Public Dataset Program: The ML GDE team believes other data scientists may find value in the dataset, so they chose to make it available via the Google Public Dataset Program. This public dataset is hosted in Google BigQuery and is included in BigQuery's free tier. Each user can process up to 1 TB for free every month. This quota can be used by anyone to explore the BREATHE dataset using simple SQL commands through the BigQuery public access program. What can YOU do with this dataset? The BREATHE dataset can be used in many ways to better understand and synthesize voluminous biomedical research and uncover new insights into biomedical challenges such as the COVID-19 pandemic. The ML GDE team thinks there are many other interesting things that data scientists can build using BREATHE, such as training biomedical-specific language models, building biomedical information retrieval systems, or deriving new forms of unsupervised classification for niches of research in the vast biomedical domain. Some ideas may even address the challenging task of accurately translating articles into many different languages, since non-native English-speaking researchers and clinicians are often forced to understand material in the original author's language. The team is looking forward to seeing what the AI community can create with the BREATHE dataset. Cloud collaboration, it takes a village: One of the distinct advantages of working in the cloud is that many geographically separated developers can work together on a single project. In this case, generating the dataset involved no fewer than … people on three continents and five time zones. Dan Goncharov, head of the 42 Silicon Valley AI and Robotics Lab, led the team that drove the BREATHE dataset creation. 42 is a private, nonprofit, tuition-free computer programming school with locations worldwide. The ML GDE team would like to acknowledge the work of Blaire Hunter, Simon Ewing, Khloe Hou, Gulnozai Khodizoda, Antoine Delorme, Ishmeet Kaur, Suzanne Repellin, Igor Popov, Uliana Popova, and especially the work of Ivan Kozlov, Francesco Mosconi (Zero to Deep Learning), and Fabricio Milo (Entropy Source). Next time, building a search tool using TensorFlow and state-of-the-art natural language architectures: In this post we went through the project background, the design principles, and the development process for creating BREATHE, a publicly available, machine-readable dataset for biomedical researchers. In the next post, the ML GDE team will walk through how they built a simple search tool on top of this dataset using open-source and state-of-the-art natural language understanding tools. Tools used in creating BREATHE: Google Networking & Compute, Google Dataflow, Google BigQuery (BQ), Google Cloud Storage (GCS), Google Cloud Public Dataset Program, Selenium, Google Colab, Python. Notes: “Unique” articles are determined by DOI; however, many that are listed with the same DOI contain valuable additional information. JAMA contained …k articles with full body text that technically were duplicated in abstract form from other sources.	2020-06-26 19:30:00
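
The transform stage described in the Cloud Blog entry above (checks for null values, invalid strings, and duplicate entries, with documents routed to Success, Rejected, or Error sinks) can be illustrated with a minimal plain-Python sketch. This is not the team's Apache Beam code, which the digest does not reproduce. The function name `transform`, the `REQUIRED_FIELDS` tuple, and the `doi`, `title`, and `abstract` field names are all assumptions made for illustration; duplicate detection by DOI follows the footnote's remark that uniqueness was determined by DOI.

```python
import json

# Hypothetical schema: these field names are assumptions for illustration,
# not the actual BREATHE schema.
REQUIRED_FIELDS = ("doi", "title", "abstract")


def transform(raw_lines):
    """Route raw JSON lines into success / rejected / error sinks."""
    sinks = {"success": [], "rejected": [], "error": []}
    seen_dois = set()
    for line in raw_lines:
        try:
            doc = json.loads(line)
            if not isinstance(doc, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Error sink: the document could not be processed at all.
            sinks["error"].append(line)
            continue
        values = [doc.get(name) for name in REQUIRED_FIELDS]
        # Rules 1 and 2: reject null values and non-string or blank strings.
        if any(not isinstance(v, str) or not v.strip() for v in values):
            sinks["rejected"].append(doc)
            continue
        # Normalization: trim whitespace and lowercase the DOI.
        clean = {name: v.strip() for name, v in zip(REQUIRED_FIELDS, values)}
        clean["doi"] = clean["doi"].lower()
        # Rule 3: reject duplicate entries, keyed on the normalized DOI.
        if clean["doi"] in seen_dois:
            sinks["rejected"].append(doc)
            continue
        seen_dois.add(clean["doi"])
        # Success sink: one JSONL record per accepted document.
        sinks["success"].append(json.dumps(clean))
    return sinks


raw = [
    '{"doi": "10.1/A", "title": " T ", "abstract": "x"}',
    '{"doi": "10.1/a", "title": "dup", "abstract": "y"}',
    '{"doi": "10.2/b", "title": null, "abstract": "z"}',
    "not json",
]
result = transform(raw)
print({name: len(docs) for name, docs in result.items()})
# prints {'success': 1, 'rejected': 2, 'error': 1}
```

In a Beam pipeline the same three-way routing would more likely be expressed with tagged outputs, so that each sink can be written to its own bucket; the in-memory dictionary here only stands in for those buckets.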
