Posted: 2021-06-29 02:31:45  RSS feed digest for 2021-06-29 02:00 (33 items)

Category  Site  Article title / trend word  Link URL  Frequent words / summary / search volume  Registered date
python New posts tagged Python - Qiita Building an executable that handles Photoshop-format files (.psd) with Python https://qiita.com/bellvine/items/d42b05c3cbdd823dc77a 2021-06-29 01:48:14
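As a hedged illustration of the kind of workflow the Qiita post title describes (the post's own code is not reproduced in this digest), the psd-tools package can read .psd files from Python, and a script like the sketch below could later be frozen into an executable with a tool such as PyInstaller. File names and the export step are assumptions, not the article's actual approach.

```python
# Minimal sketch (assumed approach): read a .psd with psd-tools and export a
# flattened PNG preview. Requires: pip install psd-tools
from psd_tools import PSDImage

def export_preview(psd_path: str, out_path: str) -> None:
    psd = PSDImage.open(psd_path)           # parse the Photoshop file
    for layer in psd:                        # list top-level layers
        print(layer.name, layer.kind, layer.visible)
    psd.composite().save(out_path)           # flatten all layers and save as PNG

if __name__ == "__main__":
    export_preview("sample.psd", "preview.png")
    # A standalone executable could then be built with, e.g.:
    #   pyinstaller --onefile this_script.py
```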
Program New questions for [all tags] - teratail A 404 error occurs only when a certain file is added https://teratail.com/questions/346651?rss=all The error does not appear during normal use, but it suddenly occurs only when I create a file that sends the entered data to a confirmation screen. 2021-06-29 01:51:11
Program New questions for [all tags] - teratail How to extract and arrange multiple values with a shell script https://teratail.com/questions/346650?rss=all How to extract and arrange multiple values with a shell script. Given a text file with lines like "…T START RequestId ff Version LATEST qwre", "…T hello world asdf", "…T END RequestId ff asdf", "…T REPORT RequestId ff zxcv", I want to extract, for each line, the leading date-time portion ending at "T" and the trailing four-letter alphabetic string, and obtain a result like the one below. 2021-06-29 01:28:58
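The question itself asks for a shell-script solution and its sample data is truncated in this feed; purely as an illustrative sketch in Python (the language used for examples in this digest), under the assumption that each line starts with an ISO-style date before the "T" and ends with a four-letter token, the extraction could look like this:

```python
# Illustrative sketch only (the original question wants a shell script).
# Assumed line shape: "2021-06-29T01:00:00Z START RequestId ff ... qwre"
import re

pattern = re.compile(r"^(\d{4}-\d{2}-\d{2})T.*?([A-Za-z]{4})$")

with open("log.txt", encoding="utf-8") as f:   # hypothetical input file
    for line in f:
        m = pattern.match(line.strip())
        if m:
            date_part, tail = m.groups()        # leading date, trailing 4 letters
            print(date_part, tail)
```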
Program New questions for [all tags] - teratail How to count occurrences per value with dplyr::summarize https://teratail.com/questions/346649?rss=all I want to count the number of rows per value using dplyr::summarize; this is a question about aggregating a data frame in R. 2021-06-29 01:01:48
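The question is about R's dplyr (where the usual idiom is `df %>% group_by(col) %>% summarize(n = n())`). As a loose illustration only, here is the same "count rows per value" aggregation in Python/pandas, with hypothetical column names and data:

```python
# Loose pandas equivalent of dplyr's group_by + summarize(n = n()).
import pandas as pd

df = pd.DataFrame({"category": ["a", "b", "a", "c", "b", "a"]})  # hypothetical data

counts = df.groupby("category").size().reset_index(name="n")
print(counts)
#   category  n
# 0        a  3
# 1        b  2
# 2        c  1
```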
Overseas TECH Ars Technica Amazon is using algorithms with little human intervention to fire Flex workers https://arstechnica.com/?p=1776745 delivery 2021-06-28 16:40:03
Overseas TECH Ars Technica Why is Russia launching a new module to the space station if it’s pulling out? https://arstechnica.com/?p=1776683 module 2021-06-28 16:10:00
Overseas TECH DEV Community Airflow at Adyen: Adoption as ETL/ML Orchestrator https://dev.to/adyen/airflow-at-adyen-adoption-as-etl-ml-orchestrator-k5a Airflow at Adyen: Adoption as ETL/ML Orchestrator. By Igor Lushchyk and Ravi Autar. Adyen makes many decisions within and outside of the payment flow to provide state-of-the-art payment processing. Challenges that need to be solved include optimizing payment conversion rates, rescuing failed subscription payments, or predicting and monitoring payment volumes, just to name a few. All of these decisions are made by enabling an array of specialized data teams to leverage the vast amount of data generated throughout the payment flow. However, to leverage this data we need a versatile platform and toolset that caters to all common needs of the data teams while still giving each team the flexibility to work on their unique and domain-specific solution. Building such a platform allows us to achieve operational excellence and lets our data teams launch fast and iterate on their solutions. In this blog post, let's see how we kickstarted with an in-house-built ETL framework, the issues we faced with it, and how we migrated to Airflow.

Spoink: At the beginning of Adyen's data initiative we developed a framework for creating and scheduling data processing pipelines; we called it Spoink. We built the Spoink framework with a lot of design concepts taken from Airflow, so our framework inherited much of Airflow's API, such as DAG and task dependency definition. The initial plan was to grow Spoink into a feature-complete open source ETL framework. In a previous blog post we discussed the various reasons for designing our own ETL framework, among which lightweight security and alignment with existing infrastructure at Adyen were the key ones. Its simplicity of use for stakeholders played a key role as an increasing number of teams adopted the tool for data analysis and data preparation. Furthermore, many machine learning pipelines were being deployed through Spoink as well. After it became a central component of the data infrastructure, we understood that we had a crucial dependency on Spoink.

Problems with Spoink: As our understanding of and use cases for our big data platform grew over the years, so did the technical debt we had incurred for Spoink; it had grown to such an extent that it was beyond maintenance. One such decision was the use of a single DAG in which all streams had shared ownership, as opposed to modular ownership based on the data product. Another implementation detail made it impossible to submit Spark jobs in cluster mode, which led to increased operational overhead since a single edge node would be overloaded all the time. Scheduling and backfilling jobs required users to have intricate knowledge of the Spoink framework, and any mistakes would lead to big operational overhead for both the engineering and infrastructure teams. Adding to these issues, the most prominent problem with Spoink was its closed-source nature. With the increase in technical debt and the simultaneous increase in teams and products dependent on the Big Data platform, supporting Spoink's codebase became increasingly difficult. Being closed source also meant that we were missing out on a plethora of recent developments in ETL orchestration made by the open source community, and continuing to work on Spoink would close off the possibility of ever contributing back to it. In summary, it was clear that we needed to reassess the way we scheduled ETL jobs and how we managed data ownership.

Evolution of Data Approach: Before deciding on a new orchestration framework, we first had to rethink the way we managed data organizationally, in terms of ETL tasks and data ownership. The Spoink framework had a single daily DAG which contained all the ETL jobs across multiple product teams. The DAG was therefore updated and maintained by every team, resulting in huge run times, decreased flexibility, and increased operational overhead in case of failed runs. We needed to shift to a more decentralized approach in which teams had clear ownership of their ETL processes and increased clarity in data ownership as well. To achieve this, we adopted the data mesh architecture put forward in this blogpost.

Data Mesh at Adyen: Each data team at Adyen is specialized in the problems they are solving, developing and maintaining the entire data pipeline for their solution. Depending on the team and the problem, the data product can come in different forms, such as dashboards, reports, or ML artifacts. Starting from the raw data, the team holds ownership of all the intermediate tables and artifacts required to facilitate their data solution. Many challenges need to be taken into consideration when applying the data mesh architecture in practice. Giving teams ownership of their ETL processes also introduces more variation in the types of use cases the CDI teams need to account for. Some of them are:
ETL scheduling: One of the undisputed requirements is the ability to schedule different ETLs with unique characteristics. While most teams require their ETL jobs to run daily, some jobs need to run on an hourly, weekly, or monthly basis. Teams not only need the flexibility to specify different scheduling intervals, but also different start and end times and retry behaviors for their specific ETL.
Task dependencies: Teams also need to specify dependencies between different ETL jobs. These can be dependencies between jobs owned by a single team, but can also extend to dependencies on jobs owned by other teams, i.e. cross-team dependencies. An example is when the Business Intelligence team wants to reuse a table created by the Authentication team to build summary tables that eventually power their dashboards.
Undoing and backfilling: Every team in Adyen strives to productionize their tables fast and iterate on them. This usually means that teams need to rerun some of their ETLs multiple times. Sometimes data might be corrupted or incomplete for certain date ranges, which inevitably requires rerunning their ETL pipelines for specified date ranges for certain tables, while also considering downstream dependencies and possibly varying schedule intervals.

Adoption of Airflow: The previously mentioned problems and the change in how we view working with data prompted us to look for a replacement framework, for which we chose Airflow. Airflow is an open source scheduling framework that lets you benefit from the rapid developments made by the open source community. There were multiple reasons we chose it over competitors, just to name a few:
Scalability: By design it can scale with minimum effort from the infrastructure team.
Extensible model: It is extremely easy to add custom functionality to Airflow to fulfill specific needs.
Built-in retry policy, sensors, and backfilling: With these features we can add DAG tasks and retroactively run ETL, or stay on the safe side waiting for an event to trigger a DAG.
Monitoring and management interface: A built-in interface to interact with logs.

Our data system is built around Spark and Hadoop for running our ETL and ML jobs, with HDFS as data storage; we use Apache YARN as the main resource manager. This standard setup made installing and deploying Airflow much easier, as Airflow comes with built-in support for submitting Spark jobs through YARN. We also have the following Airflow components running:
Airflow web server: The main component responsible for the UI that all our stakeholders interact with. However, downtime of the web server does not automatically translate to ETLs not being able to run; that is handled by the scheduler and workers.
Airflow scheduler: The brains of Airflow, responsible for DAG serialization, defining the DAG execution plan, and communicating with Airflow workers.
Airflow worker: The workhorse of the installation; it gets tasks from the scheduler and runs them in a specific manner. With workers we can scale indefinitely, and there can be different types of workers with different configurations. At Adyen we make use of Celery workers.

Apart from the standard Airflow components, we also need a couple of other services to support our installation. The broker queue is responsible for keeping track of tasks that were scheduled and still need to be executed; the technology of your choice here should be reliable and scalable, and at Adyen we use Redis. A relational database stores the metadata needed for DAGs and Airflow to run, as well as the results of task executions; at Adyen we make use of a Postgres database. Flower is optional, if you want to monitor and understand what is happening with Celery workers and the tasks they are executing. At least the following need high availability: Airflow workers, the PostgreSQL database, and Redis, which means more instances and more load on the cluster. After careful thinking we introduced a new type of machine to our Hadoop installation. These machines have all the required clients to interact with Spark, HDFS, Apache Ranger, and Apache YARN, but do not host any workload for running ETL or ML tasks; we call them edge nodes. The machines which run the ETL/ML workloads are the workers. This blog post will not dive into the exact architecture of every single component involved in our Big Data platform, but here is an architectural diagram that depicts the general setup. With this separation between the machines that run jobs and the machines that control them, we get painless maintenance and stay secure if something fails: with a worker failure we keep all the information about the success or failure of the tasks and can reschedule them in the future, and with an edge node failure we can still complete ongoing tasks. Update: we have recently upgraded to Airflow 2 and now also run the Airflow scheduler in HA mode.

Migration to Airflow: One of the biggest challenges during the adoption of Airflow was the migration of already existing pipelines from Spoink. We needed to choose our strategy carefully, since most of the jobs running on Spoink were also production-critical for our product teams, and we had to support uninterrupted operation of the existing infrastructure while simultaneously deploying a new architecture and migrating production jobs and users. For such an activity we chose a blue-green approach. This relatively simple method allowed us to adhere to the aforementioned constraints during the migration. To follow this approach you need to consider these assumptions: we needed to have the old and new installations running at the same time and achieve
feature parity, which essentially meant having all production jobs running simultaneously on both Spoink and Airflow for multiple weeks; you do not add new features to the old installation (we introduced a code freeze for the duration of the migration to avoid adding more moving components to the migration process); and you do not migrate all teams at once, but slowly, with proper testing and validation. With regards to ETL pipelines and data ownership, we decided to tackle the problem structurally by reflecting the respective ownerships directly in the codebase. As a result, the codebase containing the logic for each ETL pipeline was segregated by product team, each team being the first point of contact for its specific logic. Ownership of tables was also reflected using DDL (Data Definition Language) files, which contain the schema of each table, again segregated between the teams that own them. The left image shows the ETL pipeline definitions segregated between different teams, while the right image shows the table definitions (DDLs) segregated between data teams; this segregation highlights the ownership and responsibilities of the different streams. Each team then has its own Airflow DAGs and the tables they create or update using those DAGs. In this sense, Airflow made it possible for us to split up the single massive DAG we had in Spoink into multiple smaller DAGs, each owned by its specific stream with its own scheduling configuration.

Results: We extended Airflow by introducing custom Airflow views, operators, sensors, and hooks tailored for running ETLs on Adyen's Big Data platform. By doing this we built tools and functionality common across different streams, while still giving streams the freedom to work on the data solution they are the domain experts in. With Airflow's built-in functionality for managing schedules and defining within-DAG dependencies, our data teams leveraged the newly gained flexibility and were suddenly able to define dozens of tasks with intricate dependencies between each other (example image shown). While the out-of-the-box features of Airflow already solved a wide range of problems we faced with our in-house framework, we still encountered multiple operational problems with regard to backfilling and specifying dependencies across multiple Airflow DAGs. In our next "Airflow at Adyen" post we dive further into the challenges we faced with cross-DAG dependencies and backfilling, and how we extended Airflow's functionality to address them.

Technical careers at Adyen: We are on the lookout for talented engineers and technical people to help us build the infrastructure of global commerce. Check out developer vacancies. Developer newsletter: get updated on new blog posts and other developer news. Subscribe now. Originally published at … on May … 2021-06-28 16:41:05
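To make the entry above concrete, here is a minimal, hedged sketch of what one of the per-team DAGs it describes could look like in Airflow 2.x, with a daily schedule, retries, and a within-DAG dependency. The DAG id, task names, and callables are hypothetical and are not Adyen's actual pipeline code.

```python
# Minimal Airflow 2.x DAG sketch (hypothetical names, not Adyen's code).
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**context):
    print("extract raw payment data for", context["ds"])

def build_summary(**context):
    print("build summary table for", context["ds"])

with DAG(
    dag_id="bi_daily_summary",            # one smaller DAG per team/stream
    schedule_interval="@daily",           # could also be hourly/weekly/monthly
    start_date=datetime(2021, 6, 1),
    catchup=False,                        # backfills triggered explicitly instead
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    summary_task = PythonOperator(task_id="build_summary", python_callable=build_summary)

    extract_task >> summary_task          # within-DAG task dependency
```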
Overseas TECH DEV Community Quick Introduction to header files in C++ https://dev.to/guptaaastha/quick-introduction-to-header-files-in-c-4fda Quick Introduction to header files in C++. Using a variable, function, class, and so on without declaring it is an error in C++, more specifically this error: In function 'int main()': error: 'x' was not declared in this scope. This means it is important to declare an element before using it or assigning it a value in C++ (the word "element" refers to functions, classes, structs, variables, etc. throughout this article). We might end up declaring and defining various elements repeatedly in our different C++ programs if they're useful in varied cases. This would result in a lot of code repetition throughout our code base, and making one change in an element would mean copying the same change over and over again in different files (sounds scary). Therefore it makes sense to declare and define commonly used elements beforehand and use them later without worrying about their declaration. To keep our code modular, it fits to combine such utility elements in one single file and keep importing it in other C++ programs. Enter header files: these contain declarations and definitions of various elements which are ready to be used in a C++ program. Header files have a .h extension instead of a .cpp one; this tells everyone that "my file.h is a header file that contains declarations and definitions of various utility elements". A header file can then be included in a .cpp program. Want to write a good header file? Keep the following points in mind while writing one: do not bring complete namespaces into scope with the using directive (read more about using here), as this might conflict with other elements present in the file the header is included into; use const variable definitions to make sure a program including the header file cannot change them; and use inline function definitions and named namespaces. Standard library header files enable various useful functionalities in our C++ programs, and we can also write our own header files to make our code base modular and thus more understandable. This article was a short introduction; if you want to know more about various header files in C++, I encourage you to head over here. Thanks for giving this article a read, and I'll see you in the next one. PS: This is an article in my series "Quick Introduction to a concept in C++". You can find all the articles in this series here. 2021-06-28 16:19:09
Overseas TECH DEV Community Why Blender Is the Best Software for the 3D Workflow https://dev.to/hugop/why-blender-is-the-best-software-for-the-3d-workflow-3b4l Why Blender Is the Best Software for the 3D Workflow. 3D, short for the three dimensions of space we live in, is a catch-all term used to describe the varied technologies used to create virtual worlds. The 3D technology stack can be roughly split into two broad categories: asset creation and asset scripting. Asset creation is the process of creating assets: virtual objects, scenes, and materials. Asset scripting is the process of manipulating those assets and their interactions over the fourth dimension of time. Decades of progress have resulted in sophisticated software tools that make 3D workflows more automated and straightforward, but a significant amount of human expertise and artistic talent is still required.

Asset Creation: Assets are digital representations of a 3D object. One type of asset is a mesh, a connected graph of 3D points (also called vertices) which define the surface of an object. Edges interconnect vertices, and a closed loop of vertices creates a polygon known as a face. The engineering and manufacturing world creates meshes using computer-aided design (CAD) software such as AutoCAD, Solidworks, Onshape, and Rhino. The entertainment industry creates meshes using modeling software such as Maya, 3ds Max, and Cinema 4D. Whereas a mesh describes the shape and form of an object, a material asset describes its texture and appearance. A material may define rules for the reflectivity, specularity, and metallic-ness of the object as a function of lighting conditions; shader programs use materials to calculate the exact pixel values to render for each face of a mesh polygon. Modeling software usually comes packaged with tools for the creation and configuration of materials. Finally, asset creation encompasses the process of scene composition. Assets can be organized into scenes, which may contain other unique virtual objects such as simulated lights and cameras. Deciding where to place assets, especially lights, is still almost entirely done by hand; automatic scene composition remains a tremendous challenge in the 3D technology stack.

Asset Scripting: The fourth perceivable dimension of our reality is time, and asset scripting is the process of defining the behaviors of assets within scenes over time. One type of asset scripting is called animation, which consists of creating sequential mesh deformations that produce the illusion of natural movement. Animation is a tedious manual task because an artist must define every frame; expert animators spend decades honing their digital puppeteering skills. Specialized software is often used to automate this task as much as possible, and technologies such as Motion Capture (MoCap) can record the movement of real objects and play those movements back on virtual assets. Game engines are software tools that allow for more structured and systematic asset scripting, mostly by providing software interfaces (i.e. code) to control the virtual world. Used extensively in the video game industry, after which they were named, examples include Unity, Unreal Engine, Godot, and Roblox. These game engines support rule-based spawning, animation, and complex interactions between assets in the virtual world. Programming within game engines is a separate skillset from modeling and animating, and is usually done by separate engineers within an organization.

Blender: Blender is an open source 3D software tool first released in the 1990s. It has grown steadily over the decades
and has become one of the most popular 3D tools available, with a massive online community of users. Blender's strength is in its breadth: it provides simple tools for every part of the 3D workflow rather than specializing in a narrow slice. Organizations such as game studios have traditionally preferred specialization, with separate engineers using separate tools such as Maya for modeling and Unreal Engine for scripting. However, the convenience of using a single tool and the myriad advantages of a single engineer being able to see a project from start to finish make a strong case for Blender as the ultimate winner in the 3D software tools race. Many of the world's new 3D developers opt to get started and build their expertise in Blender for its open source, community-emphasizing offering. This is an example of a common product flywheel: using a growing community of users to improve a product over time. With big industry support from Google, Amazon, and even Unreal, Blender also has the funding required to improve its tools with this user feedback. In addition to supporting the full breadth of the 3D workflow, Blender has the unique strength of using Python as the programming language of choice for asset scripting. Python has emerged as the lingua franca for modern deep learning, in part due to the popularity of open source frameworks such as TensorFlow, PyTorch, and Scikit-Learn. Successful adoption of synthetic data will require Machine Learning Engineers to perform asset scripting, and these engineers will be much more comfortable in Blender's Python environment than with Unity's C# or Unreal Engine's C++ tools.

Conclusion: Thanks for getting this far. If you're interested in 3D and what it can do for synthetic data, check out our open source data development toolkit zpy. Everything you need to generate and iterate synthetic data for computer vision is available for free. Your feedback, commits, and feature requests are invaluable as we continue to build a more robust set of tools for generating synthetic data. In the meantime, if you need our support with a particularly tricky problem, please reach out. 2021-06-28 16:17:02
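Since the entry above highlights Python as Blender's asset-scripting language, here is a tiny, hedged sketch of Blender's bpy API: it adds a cube, attaches a new material, and renders a still. It assumes it runs inside Blender (or via `blender --background --python script.py`) with the default scene's camera and light present; the object, material, and output names are hypothetical.

```python
# Tiny bpy sketch: add a cube, give it a material, render a still image.
import bpy

# Create a simple asset: a 2x2x2 cube sitting above the origin.
bpy.ops.mesh.primitive_cube_add(size=2.0, location=(0.0, 0.0, 1.0))
cube = bpy.context.active_object

# Minimal material setup attached to the cube's mesh data.
mat = bpy.data.materials.new(name="DemoMaterial")
mat.use_nodes = True
cube.data.materials.append(mat)

# Render the current scene (assumes the default camera/light) to a file.
bpy.context.scene.render.filepath = "/tmp/cube_render.png"
bpy.ops.render.render(write_still=True)
```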
Apple AppleInsider - Frontpage News GOP lawmakers mull taxing Big Tech to subsidize broadband access https://appleinsider.com/articles/21/06/28/gop-lawmakers-mull-taxing-big-tech-to-subsidize-broadband-access?utm_medium=rss GOP lawmakers mull taxing Big Tech to subsidize broadband access. Key Republican lawmakers in the House and Senate are warming up to the idea of levying taxes on big U.S. tech companies to fund broadband subsidy programs. Credit: WikiMedia Commons. The idea of compelling technology giants to pay into a pool of money to subsidize broadband access first originated with Republican FCC commissioner Brendan Carr, Axios reported Monday. Several key GOP lawmakers have expressed interest in the proposal. Read more 2021-06-28 16:53:10
Overseas TECH Engadget Juul will pay $40 million to settle a vaping lawsuit in North Carolina https://www.engadget.com/juul-north-carolina-settlement-vaping-teens-163805341.html?src=rss_b2c carolinaseveral 2021-06-28 16:38:05
Overseas TECH Engadget Heineken made a cute but garish autonomous beer cooler https://www.engadget.com/heineken-beer-outdoor-transporter-162029463.html?src=rss_b2c heineken 2021-06-28 16:20:29
Overseas TECH Engadget How to watch today's Samsung Wear OS event https://www.engadget.com/how-to-watch-samsung-wear-os-event-mwc-161523864.html?src=rss_b2c google 2021-06-28 16:15:23
Overseas TECH Engadget YouTube to open a 6,000-seat live entertainment arena in California https://www.engadget.com/youtube-theater-live-arena-california-160056935.html?src=rss_b2c YouTube to open a 6,000-seat live entertainment arena in California. YouTube is opening a three-story, 6,000-seat live arena for concerts, esports, and creator events at Hollywood Park in Inglewood, California. 2021-06-28 16:00:56
Finance RSS FILE - Japan Securities Dealers Association JSDA activities in video https://www.jsda.or.jp/about/gaiyou/movie/index.html JSDA 2021-06-28 17:33:00
News JETRO Business News (Tsusho Koho) Assessment of Belgium's recovery plan completed; focus on energy efficiency and decarbonization of buildings https://www.jetro.go.jp/biznews/2021/06/8e6f8192bf4c859e.html recovery plan 2021-06-28 16:40:00
News JETRO Business News (Tsusho Koho) Hainan Free Trade Port to improve the business environment in 15 areas https://www.jetro.go.jp/biznews/2021/06/39724096c84bb508.html business environment improvement 2021-06-28 16:30:00
News JETRO Business News (Tsusho Koho) European Commission completes assessment of Latvia's recovery plan https://www.jetro.go.jp/biznews/2021/06/8b3fc68c2128103d.html recovery plan 2021-06-28 16:20:00
News JETRO Business News (Tsusho Koho) Health claims on food in France: many violations of EU rules https://www.jetro.go.jp/biznews/2021/06/52198581b8214187.html rule violations 2021-06-28 16:10:00
News BBC News - Home Covid-19: England lockdown end still set for 19 July - Javid https://www.bbc.co.uk/news/uk-57643694 confirms 2021-06-28 16:48:51
News BBC News - Home Ministry of Defence 'sorry' after secret papers left at bus stop https://www.bbc.co.uk/news/uk-57642108 jeremy 2021-06-28 16:09:05
News BBC News - Home Spain, Malta and Portugal restrict non-vaccinated travellers https://www.bbc.co.uk/news/business-57634932 travellers 2021-06-28 16:08:43
News BBC News - Home Elephant and Castle fire: Six hurt in huge blaze at railway arches https://www.bbc.co.uk/news/uk-england-london-57642027 castle 2021-06-28 16:15:50
News BBC News - Home Third seed Tsitsipas beaten by unseeded American Tiafoe https://www.bbc.co.uk/sport/tennis/57638801 wimbledon 2021-06-28 16:47:34
News BBC News - Home Euro 2020: 'It's only Germany!' - Dion Dublin and Martin Keown's plan for England to win against Germany https://www.bbc.co.uk/sport/av/football/57633426 Euro 2020: 'It's only Germany!' - Dion Dublin and Martin Keown's plan for England to win against Germany. Match of the Day pundits Martin Keown and Dion Dublin discuss how England could line up in Tuesday's Euro 2020 tie against Germany at Wembley. 2021-06-28 16:52:46
News BBC News - Home Tour de France 2021: Geraint Thomas loses time after crash as Tim Merlier wins stage three https://www.bbc.co.uk/sport/cycling/57642354 Tour de France 2021: Geraint Thomas loses time after crash as Tim Merlier wins stage three. Geraint Thomas is one of several riders involved in crashes as Tim Merlier wins a dramatic stage three of the Tour de France. 2021-06-28 16:49:18
News BBC News - Home Covid-19 in the UK: How many coronavirus cases are there in my area? https://www.bbc.co.uk/news/uk-51768274 cases 2021-06-28 16:17:53
News BBC News - Home Covid: What's the roadmap for lifting lockdown? https://www.bbc.co.uk/news/explainers-52530518 covid 2021-06-28 16:40:29
News BBC News - Home What are the rules for travelling to green, amber and red list countries? https://www.bbc.co.uk/news/explainers-52544307 countries 2021-06-28 16:42:40
News BBC News - Home Vaccine passports: How can I prove I've had both my Covid jabs? https://www.bbc.co.uk/news/explainers-55718553 covid 2021-06-28 16:05:24
Hokkaido Hokkaido Shimbun University president's improper spending of ¥7 million confirmed by Asahikawa Medical University selection committee; 34 instances of inappropriate conduct https://www.hokkaido-np.co.jp/article/560927/ improper spending confirmed / selection committee / inappropriate conduct 2021-06-29 01:16:00
Hokkaido Hokkaido Shimbun Hitachi to strengthen its digital business in Europe and the US, leveraging the US company it acquired in a major deal https://www.hokkaido-np.co.jp/article/560926/ Hitachi, Ltd. 2021-06-29 01:15:00
Azure Azure updates Expansion of the public preview of on-demand disk bursting for Premium SSD to more regions https://azure.microsoft.com/ja-jp/updates/expansion-of-the-public-preview-of-ondemand-disk-bursting-for-premium-ssd-to-more-regions/ Expansion of the public preview of on-demand disk bursting for Premium SSD to more regions. The preview of on-demand disk bursting for Premium SSDs (P30 and greater, larger than 512 GiB) is now expanded to all production regions. 2021-06-28 16:00:43
