Posted: 2022-05-15 03:09:09 RSS feed digest for 2022-05-15 03:00 (10 items)

Category Site name Article title / trend word Link URL Frequent words, summary / search volume Registered date
python New posts tagged Python - Qiita How to get only a [specific child element] under a [specific parent element] with beautifulsoup https://qiita.com/std-flower/items/991b8b5e2d37760206f6 beautifulsoup 2022-05-15 02:25:59
AWS New posts tagged AWS - Qiita Deploying Serverless+GraphQL+NestJS+TypeORM+RDS to AWS https://qiita.com/shinobu_shiva/items/62a221d086443d188b3a graphqlnestjsty 2022-05-15 02:52:01
Overseas TECH MakeUseOf 8 Things You Need to Start Vlogging on Your Smartphone https://www.makeuseof.com/smartphone-vlogging-things-you-need/ camera 2022-05-14 17:30:13
Overseas TECH MakeUseOf How to Implement Object-Oriented Programming Concepts in Go https://www.makeuseof.com/how-to-implement-object-oriented-programming-concepts-in-go/ concepts 2022-05-14 17:16:13
Overseas TECH DEV Community What is the Lakehouse, the latest Direction of Big Data Architecture? https://dev.to/qazmkop/what-is-the-lakehouse-the-latest-direction-of-big-data-architecture-59i What is the Lakehouse, the latest Direction of Big Data Architecture? Explanation of nouns: Because there are many nouns in the article, the leading nouns are briefly introduced to make it easier to read. Database: In the sense of the word, databases have been used in computers since the s. However, the database structure at this stage is mainly hierarchical or network, and there is an extreme dependence between data and programs, so the application is relatively limited. Databases are now commonly referred to as relational databases. A relational database is a database that uses a relational model to organize data. It stores data in the form of rows and columns and has the advantages of high structuration, strong independence, and low redundancy. In the birth of the relational database, which truly and completely separated software data and programs, became an integral part of the mainstream computer system. The relational database has become one of the most important database products. Almost all the new database products of database manufacturers support relational databases; even some non-relational database products almost always have an interface to support relational databases. Relational databases are mainly used for Online Transaction Processing (OLTP). OLTP mainly processes basic and routine transactions, such as bank transactions. Data warehouse: With the large-scale application of databases, the data of the information industry grows explosively. To study the relationship between data and excavate the hidden value of data, more and more people need to use Online Analytical Processing (OLAP) to analyze data and explore deep-seated relationships and information. However, it isn't easy to share data between different databases, and data integration and analysis are also very challenging. To solve the problem of
enterprise data integration and analysis, Bill Inmon, the father of the data warehouse, proposed Data Warehouse in . The primary function of a data warehouse is to apply OLAP to the large amount of data accumulated by OLTP over the years, through the unique data storage architecture of the data warehouse, and to help decision makers quickly and effectively analyze valuable information from a large amount of data and provide decision support. Since the emergence of the data warehouse, the information industry began to develop from relational database based operational systems to decision support systems. Compared with a database, the data warehouse has the following two characteristics. Data warehouse is subject-oriented integration: The data warehouse is built to support various businesses and data from scattered operational data. Therefore, the required data must be extracted from multiple heterogeneous sources, processed and integrated, reorganized according to the topic, and finally entered into the data warehouse. Data warehouse is mainly used to support enterprise decision analysis, and the data operation involved is mostly data query: Therefore, the data warehouse can improve query speed and reduce overhead by optimizing table structure and storage mode. Although warehouses are well suited for structured data, many modern enterprises must deal with unstructured data, semi-structured data, and data with high diversity, speed, and volume. Data warehousing is not suitable for many of these scenarios and is not the most cost-effective. Data lake: The essence of a data lake is a solution composed of data storage architecture and data processing tools. The data storage architecture must be scalable and reliable enough to store massive data of any type, including structured, semi-structured, and unstructured data. Data processing tools fall into two broad categories. The first type of tool focuses on how to move data into the lake. It includes defining data sources, formulating data synchronization policies, moving data, and compiling data
catalogs. The second type of tool focuses on how to analyze, mine, and utilize data from the lake. A data lake needs to have perfect data management ability, diversified data analysis ability, comprehensive data life cycle management ability, and safe data acquisition and data release ability. Without these data management tools, metadata will be missing, the data quality of the lake will not be guaranteed, and eventually the data lake will deteriorate into a data swamp. It has become a common understanding within the enterprise that data is an important asset. With the continuous development of enterprises, data keeps piling up. Enterprises hope to keep all relevant data in production and operation completely, carry out effective management and centralized governance, and dig and explore data value. Data lakes are created in this context. The data lake is a large data warehouse that centrally stores structured and unstructured data. It can store original data from multiple data sources and various data types. Data can be accessed, processed, analyzed, and transmitted without structural processing. The data lake can help enterprises quickly complete federated analysis of heterogeneous data sources, mining and exploring data value. With the development of big data and AI, the value of data in the data lake is gradually rising and being redefined. The data lake can bring a variety of capabilities to enterprises, such as centralized data management, help enterprises build more optimized operation models, and provide other capabilities for enterprises, such as predictive analysis, recommendation models, etc., which can stimulate the subsequent growth of enterprise capabilities. The data warehouse and a data lake can be likened to the difference between a warehouse and a lake: a warehouse stores goods from a specific source. Lake water comes from rivers, streams, and other sources and is raw data. Data lakes, while good for storing data, lack some key features: they do not support transaction processing, do not guarantee
data quality, and lack consistency and isolation, making it almost impossible to mix appends and reads and to do batch and streaming jobs. For these reasons, many of the data lake capabilities are not yet implemented, and the benefits of a data lake are lost. Data lakehouse: Wikipedia does not give a specific definition of the lakehouse. It considers the advantages of both data lake and data warehouse. On the low-cost cloud storage in an open format, it realizes functions similar to data structure and data management functions in the data warehouse. It includes the following features: concurrent data reads and writes, architecture support with data governance mechanism, direct access to source data, separation of storage and computing resources, open storage formats, support for structured and semi-structured data (audio and video), and end-to-end streaming. Evolution direction of big data system: In recent years, many new computing and storage frameworks have emerged in the field of big data. For example, a standard computing engine represented by Spark and Flink and an OLAP system represented by ClickHouse emerged as computing frameworks. In storage, object storage has become a new storage standard, representing an important base for integrating data lake and lake warehouse. At the same time, Alluxio, JuiceFS, and other local cache acceleration layers have emerged. Several key evolution directions in the field of big data: Cloud-native: Public and private clouds provide computing and storage hardware abstraction, abstracting the traditional IaaS management, operation, and maintenance. An important feature of cloud native is that both computing and storage provide elastic capabilities. Making good use of elastic capabilities and reducing costs while improving resource utilization is an issue that both computing and storage frameworks need to consider. Real-time: Traditional Hive is an offline data warehouse that provides T data processing. It cannot meet new service requirements. The traditional Lambda
architecture introduces complexity and data inconsistencies that fail to meet business requirements. So how to build an efficient real-time data warehouse system and realize real-time or quasi-real-time write updates and analysis on low-cost cloud storage are new challenges for computing and storage frameworks. Computing engine diversification: Big data computing engines are blooming, and while MapReduce is dying out, Spark, Flink, and various OLAP frameworks are still thriving. Each framework has its design focus, some deep in vertical scenarios, others with converging features, and the selection of big data frameworks is becoming more and more diverse. In this context, the lakehouse and stream-batch unification emerged. What problems can be solved by integrating the lakehouse? Connect data storage and computing: Many companies have not diminished the need for flexible, high-performance systems for a wide range of data applications, including SQL analysis, real-time monitoring, data science, and machine learning. Most of the latest advances in AI are based on models that better handle unstructured data (text, images, video, audio). The two-dimensional relational tables of a completely pure data warehouse can no longer handle semi-structured and unstructured data, and AI engines cannot run solely on pure data warehouse models. A common solution is to combine the advantages of the data lake and warehouse to establish the lakehouse and then solve the limitations of the data lake: directly realize the data structure and data management functions similar to those in the data warehouse on the low-cost storage for the data lake. The data warehouse platform is developed based on big data demand, and the data lake platform is developed based on the demand for AI. These two big data platforms are completely separated at the cluster level, and data and computation cannot flow freely between the two platforms. With the lakehouse, the seamless flow between data lake and data warehouse can be realized, opening up different data storage and
computation levels. Flexibility and ecological richness: Lakehouse can give full play to the flexibility and ecological richness of the data lake and the growth and enterprise capability of the data warehouse. Its main advantages are as follows. Data duplication: If an organization maintains a data lake and multiple data warehouses simultaneously, there is no doubt that there is data redundancy. At best, this can lead to inefficient data processing, but it can lead to inconsistent data at worst. The lakehouse can remove the repeatability of data and truly achieve uniqueness. Data lakehouse has the following advantages. High storage costs: Data warehouses and data lakes are designed to reduce the cost of data storage. Data warehouses often reduce costs by reducing redundancy and integrating heterogeneous data sources. On the other hand, data lakes tend to use big data file systems and Spark to store computational data on inexpensive hardware. The goal of the lakehouse integrated architecture is to combine these technologies to maximize cost reduction. Differences between reporting and analysis applications: Data science tends to work with data lakes, using various analytical techniques to deal with raw data. On the other hand, reporting analysts tend to use consolidated data, such as data warehouses or data marts. There is often not much overlap between the two teams in an organization, but there are certain repetitions and contradictions between them. Both teams can work on the same data architecture with the all-in-one architecture, avoiding unnecessary duplication. Data stagnation: Data stagnation is one of the most severe problems in the data lake, which can quickly become a data swamp if it remains ungoverned. We tend to throw data into the lake easily but lack effective governance, and in the long run the timeliness of data becomes increasingly difficult to trace. The lakehouse for massive data management can help improve the timeliness of analysis data more effectively. Risk of potential
incompatibilities: Data analytics is still an emerging technology, and new tools and techniques emerge every year. Some technologies may only be compatible with data lakes, while others may only be compatible with data warehouses. The lakehouse means preparing for both. Conclusion: In general, the lakehouse has the following key characteristics. Transaction support: Data is often read and written concurrently to business systems in an enterprise. ACID support for transactions ensures consistency and correctness of concurrent data access, especially in SQL access mode. Data modeling and data governance: The lakehouse can support the realization and transformation of various data models and support DW mode architecture, such as the star and snowflake models. The system should ensure data integrity and have robust governance and audit mechanisms. BI support: The lakehouse supports the use of BI tools directly on the source data, speeding up analysis and reducing data delay. In addition, it is more cost-effective than operating two copies of the data separately in a data lake and a data warehouse. Separation of storage and compute: The architecture that separates storage from compute also enables the system to scale to higher concurrency and larger data capacity. Some newer data warehouses have adopted this architecture. Openness: With open, standardized storage formats such as Parquet and rich API support, various tools and engines, including machine learning and Python/R libraries, can efficiently access the data directly. Support for multiple data types (structured and unstructured): Lakehouse provides data warehousing, transformation, analysis, and access for many applications. Data types include images, video, audio, semi-structured data, and text. Support for various workloads: including data science, machine learning, SQL queries, and analysis. These workloads may require multiple tools, but they are all supported by the same database. End-to-end streaming: Real-time reporting has become a normal requirement in the
enterprise. With support for streaming, building a dedicated system for real-time data services is no longer necessary as it was before. Four best open source data lake warehouse projects: Hudi: Hudi is an open source project providing tables, transactions, efficient upserts and deletes, advanced indexes, streaming ingestion services, data clustering and compaction optimizations, and concurrency, all while keeping your data in open source file formats. Apache Hudi brings core warehouse and database functionality directly to a data lake, which is great for streaming workloads, letting users create efficient incremental batch pipelines. Besides, Hudi is very compatible: for example, it can be used on any cloud, and it supports Apache Spark, Flink, Presto, Trino, Hive, and many other query engines. Iceberg: Iceberg is an open table format for huge analytic datasets, with schema evolution, hidden partitioning, partition layout evolution, time travel, version rollback, etc. Iceberg was built for huge tables and is used in production where a single table can contain tens of petabytes of data; even these huge tables can be read without a distributed SQL engine. Iceberg is famous for its fast scan planning, advanced filtering, working with any cloud store, serializable isolation, multiple concurrent writers, etc. LakeSoul: LakeSoul is a unified streaming and batch table storage solution built on the Apache Spark engine. It supports scalable metadata management, ACID transactions, efficient and flexible upsert operation, schema evolution, and streaming & batch unification. LakeSoul specializes in row and column level incremental upserts, high concurrent write, and bulk scan for data on cloud storage. The cloud native computing and storage separation architecture makes deployment very simple while supporting huge amounts of data at a lower cost. Delta Lake: Delta Lake is an open source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs for Scala, Java, Rust, Ruby, and Python, providing
ACID transactions and scalable metadata handling, and unifies streaming and batch data processing on top of existing data lakes such as S, ADLS, GCS, and HDFS. Hudi focuses more on the fast landing of streaming data and the correction of delayed data. Iceberg focuses on providing a unified operation API by shielding the differences of the underlying data storage formats, forming a standard, open, and universal data organization lattice so that different engines can access through API. LakeSoul, now based on Spark, focuses more on building a standardized pipeline of data lakehouse. Delta Lake, an open source project from Databricks, tends to address storage formats such as Parquet and ORC on the Spark level. 2022-05-14 17:06:38
Apple AppleInsider - Frontpage News Compared: USB 3, USB 4, Thunderbolt 3, Thunderbolt 4, USB-C - what you need to know https://appleinsider.com/articles/20/08/24/usb-3-usb-4-thunderbolt-usb-c----everything-you-need-to-know?utm_medium=rss Compared: USB 3, USB 4, Thunderbolt 3, Thunderbolt 4, USB-C - what you need to know. With the varieties of USB and Thunderbolt terminology floating about, as well as new versions being adopted by Apple and other device producers, sorting out the mess can be a problem. Here's what you need to know about USB 3, USB 4, Thunderbolt 3, and Thunderbolt 4. For most users, there are two general families of multi-purpose connections: USB and Thunderbolt. Both have their benefits and their foibles, as well as sharing many characteristics, but the two technology trees are largely quite different. If you don't read any further, here's your main takeaway: The term USB-C by itself doesn't specify anything for data, charging, or video beyond the physicality of the connector. But as you might expect, there are a lot of details behind USB 3, USB 4, Thunderbolt 3, and Thunderbolt 4 and how they pertain to the USB-C connector. Read more 2022-05-14 17:49:08
Overseas TECH Engadget Proposed Ohio legislation would criminalize AirTag stalking https://www.engadget.com/ohio-anti-airtag-stalking-bill-172740034.html?src=rss Proposed Ohio legislation would criminalize AirTag stalking. A group of bipartisan lawmakers in Ohio has introduced a bill to criminalize AirTag stalking. If passed by the state legislature, HB would “prohibit a person from knowingly installing a tracking device or application on another's property without the other person's consent.” Ohio lawmakers decided to tackle the growing problem of remote tracker stalking after News lobbied the government to take action. In February, the news station found a loophole in state law that allows those with no prior record of stalking or domestic violence to track someone without potential penalty. According to an investigation by the outlet, fewer than two dozen states have enacted laws against electronic tracking, Ohio being among the group that has not drafted specific legislation against the behavior. A recent report from Motherboard suggested AirTag stalking isn't an issue limited to a few high-profile incidents. After the outlet requested any records mentioning AirTags from a dozen US police departments, it received reports. Of those, involved cases where women thought someone was secretly using the device to track them. In February, Apple said it would implement additional safety features to prevent AirTag stalking. Later in the year, the company plans to add a precision finding feature that will allow those with iPhone and series devices to find their way to an unknown AirTag. The tool will display the direction of and distance to an unwanted AirTag. Apple said it would also update its unwanted tracking alerts to notify people of potential stalkers earlier. “AirTag was designed to help people locate their personal belongings, not to track people or another person's property, and we condemn in the strongest possible terms any malicious use of our products,” the company said at the time. “We
design our products to provide a great experience, but also with safety and privacy in mind. Across Apple's hardware, software, and services teams, we're committed to listening to feedback.” 2022-05-14 17:27:40
Overseas TECH CodeProject Latest Articles Automated Web Application Code Testing On A Budget https://www.codeproject.com/Articles/5331903/Automated-Web-Application-Code-Testing-On-A-Budget budgetlearn 2022-05-14 17:34:00
News BBC News - Home FA Cup final: Marcos Alonso free-kick hits bar for Chelsea against Liverpool https://www.bbc.co.uk/sport/av/football/61451803?at_medium=RSS&at_campaign=KARANGA FA Cup final: Marcos Alonso free-kick hits bar for Chelsea against Liverpool. Chelsea's Marcos Alonso hits the Liverpool crossbar with a free-kick early in the second half as the FA Cup final at Wembley goes to extra time after remaining goalless in minutes. 2022-05-14 17:43:10
News BBC News - Home Charlotte Edwards Cup round-Up: Lauren Winfield-Hill stars for Northern Diamonds https://www.bbc.co.uk/sport/cricket/61451359?at_medium=RSS&at_campaign=KARANGA Charlotte Edwards Cup round-Up: Lauren Winfield-Hill stars for Northern Diamonds. Northern Diamonds, South East Stars, Southern Vipers, and Central Sparks win their opening fixtures of the Charlotte Edwards Cup. 2022-05-14 17:07:25
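
The DEV Community lakehouse entry in the table above lists upserts, time travel, and version rollback among the features of open table formats. As a rough illustration of those three ideas only, here is a minimal in-memory sketch in Python. The class `VersionedTable`, its methods, and the sample rows are all hypothetical; none of this reflects how Hudi, Iceberg, LakeSoul, or Delta Lake are actually implemented, and it ignores files, concurrency, and durability entirely.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Hypothetical in-memory table with snapshot history (illustration only)."""

    key: str = "id"  # name of the primary-key column (an assumption of this sketch)
    # Version 0 is the empty table; every commit appends one new snapshot.
    snapshots: list = field(default_factory=lambda: [{}])

    def upsert(self, rows):
        """Insert new rows or overwrite rows with a matching key, then commit."""
        current = dict(self.snapshots[-1])  # copy, so older snapshots stay untouched
        for row in rows:
            current[row[self.key]] = dict(row)
        self.snapshots.append(current)  # the commit is a single append
        return len(self.snapshots) - 1  # version number of the new snapshot

    def read(self, version=None):
        """Return the rows of the latest snapshot, or of an older one (time travel)."""
        index = len(self.snapshots) - 1 if version is None else version
        snapshot = self.snapshots[index]
        return [snapshot[k] for k in sorted(snapshot)]

    def rollback(self, version):
        """Make an older snapshot current again by committing it as a new version."""
        self.snapshots.append(dict(self.snapshots[version]))
        return len(self.snapshots) - 1


table = VersionedTable()
v1 = table.upsert([{"id": 1, "city": "Tokyo"}, {"id": 2, "city": "Osaka"}])
v2 = table.upsert([{"id": 2, "city": "Kyoto"}, {"id": 3, "city": "Nagoya"}])

print(table.read())            # latest version: id 2 was updated, id 3 was inserted
print(table.read(version=v1))  # time travel: the table as of the first commit
```

Keeping every snapshot as a full copy is wasteful and is chosen here only for readability; the point is that readers of an older version never see a half-applied upsert, because each commit replaces the current snapshot in one step.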
